This is an archive of the discontinued Mercurial Phabricator instance.

Differential D12413

stringutil: try to avoid running `splitlines()` only to get first line
ClosedPublic

Authored by martinvonz on Mar 25 2022, 12:55 PM.

Details

Reviewers

Alphare

Group Reviewers

hg-reviewers

Commits

rHG75794847ef62: stringutil: try to avoid running `splitlines()` only to get first line

Summary

It's wasteful to call splitlines() and only get the first line from
it. However, Python doesn't seem to provide a built-in way of doing
just one split based on the set of bytes used by splitlines(). As a
workaround, we do an initial split on just LF and then call
splitlines() on the result. Thanks to Joerg for this suggestion. I
didn't bother to also split on CR, so users with old Mac editors (or
repos created by such editors) will not get this performance
improvement.
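
The workaround described above can be sketched as follows. The `firstline` body mirrors the change in this revision; `firstline_naive` and the sample `message` are illustrative names added here for comparison only and are not part of the patch.

```python
def firstline_naive(text):
    """Split the whole input into lines, then keep only the first one."""
    try:
        return text.splitlines()[0]
    except IndexError:
        return b''


def firstline(text):
    """Cut at the first LF, then let splitlines() handle the short prefix."""
    # Try to avoid running splitlines() on the whole string
    i = text.find(b'\n')
    if i != -1:
        text = text[:i]
    try:
        return text.splitlines()[0]
    except IndexError:
        return b''


# Illustrative input: a short summary followed by a very long body.
message = b'summary line\n' + b'body line\n' * 100000

assert firstline(message) == firstline_naive(message) == b'summary line'
assert firstline(b'') == b''
```

Both functions should return the same value; the difference is that `firstline` only hands the text before the first LF to `splitlines()`, so the long body is never split.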

Diff Detail

Repository

rHG Mercurial

Lint

Automatic diff as part of commit; lint not applicable.

Unit

Automatic diff as part of commit; unit tests not applicable.

Event Timeline

martinvonz created this revision. Mar 25 2022, 12:55 PM

Herald added a reviewer: hg-reviewers. Mar 25 2022, 12:55 PM

Herald added a subscriber: mercurial-patches.

Note that you should at least try to check if text[i-1] is \r, otherwise the result changes from before. I'm perfectly fine with not supporting the ancient Mac convention of using \r only, but we should work properly for DOS-style \r\n.

In D12413#189957, @joerg.sonnenberger wrote:

Note that you should at least try to check if text[i-1] is \r, otherwise the result changes from before. I'm perfectly fine with not supporting the ancient Mac convention of using \r only, but we should work properly for DOS-style \r\n.

How does the behavior change when using \r\n?

I should also update the commit message with s/Windows/Mac/. Thanks for pointing that out.

Well, for DOS-style line ending, the find will point to the \n and we should return everything before the \r. If we ignore line endings mixed with old Mac-style, just checking the character before is enough to cover both DOS and Unix convention.

In D12413#189959, @joerg.sonnenberger wrote:

Well, for DOS-style line ending, the find will point to the \n and we should return everything before the \r.

But the text.splitlines()[0] that comes after should lose the \r, right?

Oh, sorry. I thought you returned the slice directly, which should avoid some extra memory allocations.

martinvonz edited the summary of this revision. Mar 28 2022, 11:39 AM

In D12413#189961, @joerg.sonnenberger wrote:

Oh, sorry. I thought you returned the slice directly, which should avoid some extra memory allocations.

Hehe, I thought that was your idea :) But I'm glad I misunderstood you because I think this solution is good (we avoid splitting a long message into many lines, and we preserve the current behavior around \v etc.). I've updated the commit message to say "Mac" instead of "Windows".
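
The difference discussed in this thread can be illustrated with a small sketch. `firstline_slice_only` is a hypothetical variant that returns the slice before the first LF directly (the approach Joerg assumed); it is not code from this revision. `firstline` follows the committed change.

```python
def firstline_slice_only(text):
    """Hypothetical variant: return everything before the first LF as-is."""
    i = text.find(b'\n')
    return text if i == -1 else text[:i]


def firstline(text):
    """Committed approach: cut at the first LF, then apply splitlines()."""
    i = text.find(b'\n')
    if i != -1:
        text = text[:i]
    try:
        return text.splitlines()[0]
    except IndexError:
        return b''


dos = b'summary\r\nbody\r\n'   # DOS-style line endings
oldmac = b'summary\rbody\r'    # CR-only line endings (old Mac convention)

# Returning the slice directly would keep the trailing CR of a CRLF pair.
assert firstline_slice_only(dos) == b'summary\r'
# The trailing splitlines() call drops it, matching the earlier behavior.
assert firstline(dos) == b'summary'

# CR-only input contains no LF, so the whole string reaches splitlines():
# the result is still correct, just without the speedup.
assert firstline(oldmac) == b'summary'
assert firstline_slice_only(oldmac) == oldmac
```

Under these assumptions, the slice-only variant would need the extra `text[i-1]` check to handle `\r\n`, while the committed version relies on `splitlines()` for that.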

Alphare accepted this revision. Apr 6 2022, 6:06 AM

This revision is now accepted and ready to land. Apr 6 2022, 6:06 AM

martinvonz added a commit: rHG75794847ef62: stringutil: try to avoid running `splitlines()` only to get first line. Apr 6 2022, 6:10 AM

Closed by commit rHG75794847ef62: stringutil: try to avoid running `splitlines()` only to get first line (authored by martinvonz).

This revision was automatically updated to reflect the committed changes.

Revision Contents
Changeset List

			Path	Packages
M			mercurial/utils/stringutil.py (4 lines)

Status	Author	Revision
Closed	martinvonz	D12413 stringutil: try to avoid running `splitlines()` only to get first line
Closed	martinvonz	D12412 logcmdutil: use new function for getting first line of string
Closed	martinvonz	D12411 filemerge: use new function for getting first line of string
Closed	martinvonz	D12410 absorb: use new function for getting first line of string
Closed	martinvonz	D12409 extensions: use new function for getting first line of string
Closed	martinvonz	D12408 bookmarks: use new function for getting first line of string
Closed	martinvonz	D12407 help: use new function for getting first line of string
Closed	martinvonz	D12406 histedit: remove an unnecessary default value of `b''` for commit message
Closed	martinvonz	D12405 histedit: use new function for getting first line of a string
Closed	martinvonz	D12404 templates: extract function to `stringutil` for getting first line of text
Closed	martinvonz	D12403 templates: make `firstline` filter not keep '\v', '\f' and similar

Diff 32813

mercurial/utils/stringutil.py

     >>> isauthorwellformed(b'Bad Author <author>')
     False
     """
     return _correctauthorformat.match(author) is not None
 
 
 def firstline(text):
     """Return the first line of the input"""
+    # Try to avoid running splitlines() on the whole string
+    i = text.find(b'\n')
+    if i != -1:
+        text = text[:i]
     try:
         return text.splitlines()[0]
     except IndexError:
         return b''
 
 
 def ellipsis(text, maxlength=400):
     """Trim string to at most maxlength (default: 400) columns in display."""

Diff	ID	Description	Created	Lint	Unit
Base		Base
Diff 1	32715		Mar 25 2022, 12:55 PM	★	★
Diff 2	32813	rHG75794847ef6240419d162dd1f475119ea690f0ec	Mar 25 2022, 11:33 AM	★	★