It's wasteful to call splitlines() and only get the first line from
it. However, Python doesn't seem to provide a built-in way of doing
just one split based on the set of bytes used by splitlines(). As a
workaround, we do an initial split on just LF and then call
splitlines() on the result. Thanks to Joerg for this suggestion. I
didn't bother to also split on CR, so users with old Mac editors (or
repos created by such editors) will not get this performance
improvement.
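
The workaround described above can be sketched roughly as follows. This is a minimal illustration and not necessarily the committed code; the function name `firstline` and the `bytes` type annotations are assumptions made for the example.

```python
def firstline(text: bytes) -> bytes:
    """Return the first line of text without splitting the whole input.

    Illustrative sketch only; the name and signature are assumptions.
    """
    # Cheap pass: cut at the first LF so that splitlines() only has to
    # look at a short prefix instead of the entire message.
    head = text.split(b'\n', 1)[0]
    # Precise pass: let splitlines() apply its own line-boundary rules
    # (for example a trailing CR from a CRLF ending) to that prefix.
    lines = head.splitlines()
    # splitlines() returns an empty list for empty input.
    return lines[0] if lines else b''
```

Input that contains only CR line endings has no LF to cut at, so the whole text still goes through splitlines(); the result is the same, just without the speedup.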
Details
- Reviewers: Alphare
- Group Reviewers: hg-reviewers
- Commits: rHG75794847ef62: stringutil: try to avoid running `splitlines()` only to get first line
Diff Detail
- Repository: rHG Mercurial
- Lint: Automatic diff as part of commit; lint not applicable.
- Unit: Automatic diff as part of commit; unit tests not applicable.
Event Timeline
Note that you should at least try to check if text[i-1] is \r, otherwise the result changes from before. I'm perfectly fine with not supporting the ancient Mac convention of using \r only, but we should work properly for DOS-style \r\n.
How does the behavior change when using \r\n?
I should also update the commit message with s/Windows/Mac/. Thanks for pointing that out.
Well, for DOS-style line endings, the find will point to the \n, and we should return everything before the \r. If we ignore line endings mixed with the old Mac style, just checking the character before the \n is enough to cover both the DOS and Unix conventions.
Oh, sorry. I thought you returned the slice directly, which should avoid some extra memory allocations.
Hehe, I thought that was your idea :) But I'm glad I misunderstood you because I think this solution is good (we avoid splitting a long message into many lines, and we preserve the current behavior around \v etc.). I've updated the commit message to say "Mac" instead of "Windows".
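
For comparison, the slice-based alternative that came up in the thread (find the first \n, check whether the preceding byte is \r, return the slice directly) might look like the sketch below. The name `firstline_slice` is hypothetical, and this is not the approach the change adopted.

```python
def firstline_slice(text: bytes) -> bytes:
    """Return everything before the first LF, dropping a preceding CR.

    Hypothetical sketch of the alternative discussed in review.
    """
    i = text.find(b'\n')
    if i == -1:
        # No LF at all: the whole input is treated as a single line.
        return text
    # For a DOS-style CRLF ending, step back over the CR as well.
    if i > 0 and text[i - 1:i] == b'\r':
        i -= 1
    return text[:i]
```

Unlike splitlines(), this sketch does not treat a lone CR as a line boundary, so its result differs for old Mac-style input.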