encoding.trim() iterated over the possible lengths smaller than the
input and created a slice for each. It then calculated the column
width of the result, which is of course O(n), so the overall algorithm
was O(n^2). This patch rewrites it to iterate over the Unicode
characters, keeping track of the column width so far. Also, the old
algorithm started from the end of the string, which made it much worse
when the input is large and the limit is small (such as the typical 72
we pass to it).
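
For reference, the new approach is roughly the following. This is an
illustrative sketch only, not the actual encoding.trim() code;
char_width() here is a hypothetical stand-in for whatever
per-character column-width helper is used:

import unicodedata

def char_width(uc):
    # assume wide/fullwidth characters take two columns, others one
    return 2 if unicodedata.east_asian_width(uc) in ('W', 'F') else 1

def trim_width(ustr, maxwidth):
    # single pass from the start: accumulate the column width and stop
    # at the first character that would push the total past maxwidth,
    # so only the kept prefix is ever examined
    width = 0
    for i, uc in enumerate(ustr):
        width += char_width(uc)
        if width > maxwidth:
            return ustr[:i]
    return ustr

With this shape, trimming u"0123456789" * 1000 to width 5 only looks
at the first few characters instead of repeatedly slicing and
re-measuring the whole string, which is where the speedup comes from.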
You can time it by running something like this:
time python3 -c 'from mercurial.utils import stringutil; print(stringutil.ellipsis(b"0123456789" * 1000, 5))'
That drops from 4.05 s to 83 ms with this patch (and most of that is
of course startup time).