This is an archive of the discontinued Mercurial Phabricator instance.

stringutil: try to avoid running `splitlines()` only to get first line
ClosedPublic

Authored by martinvonz on Mar 25 2022, 12:55 PM.

Details

Summary

It's wasteful to call splitlines() and only get the first line from
it. However, Python doesn't seem to provide a built-in way of doing
just one split based on the set of bytes used by splitlines(). As a
workaround, we do an initial split on just LF and then call
splitlines() on the result. Thanks to Joerg for this suggestion. I
didn't bother to also split on CR, so users with old Mac editors (or
repos created by such editors) will not get this performance
improvement.

Diff Detail

Repository
rHG Mercurial
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

martinvonz created this revision.Mar 25 2022, 12:55 PM

Note that you should at least try to check if text[i-1] is \r, otherwise the result changes from before. I'm perfectly fine with not supporting the ancient Mac convention of using \r only, but we should work properly for DOS-style \r\n.

Note that you should at least try to check if text[i-1] is \r, otherwise the result changes from before. I'm perfectly fine with not supporting the ancient Mac convention of using \r only, but we should work properly for DOS-style \r\n.

How does the behavior change when using \r\n?

I should also update the commit message with s/Windows/Mac/. Thanks for pointing that out.

Well, for DOS-style line ending, the find will point to the \n and we should return everything before the \r. If we ignore line endings mixed with old Mac-style, just checking the character before is enough to cover both DOS and Unix convention.

Well, for DOS-style line ending, the find will point to the \n and we should return everything before the \r.

But the text.splitlines()[0] that comes after should lose the \r, right?

Oh, sorry. I thought you returned the slice directly, which should avoid some extra memory allocations.

martinvonz edited the summary of this revision. (Show Details)Mar 28 2022, 11:39 AM

Oh, sorry. I thought you returned the slice directly, which should avoid some extra memory allocations.

Hehe, I thought that was your idea :) But I'm glad I misunderstood you because I think this solution is good (we avoid splitting a long message into many lines, and we preserve the current behavior around \v etc.). I've updated the commit message to say "Mac" instead of "Windows".

Alphare accepted this revision.Apr 6 2022, 6:06 AM
This revision is now accepted and ready to land.Apr 6 2022, 6:06 AM