This is an archive of the discontinued Mercurial Phabricator instance.

Differential D6422

copies: avoid calling matcher if matcher.always()
ClosedPublic

Authored by martinvonz on May 21 2019, 8:32 PM.

Details

Reviewers

None

Group Reviewers

hg-reviewers

Commits

rHGc0b51449bf6b: copies: avoid calling matcher if matcher.always()

Summary

When storing copy information in the changesets
(experimental.copies.read-from=changeset-only), this patch speeds up

hg debugpathcopies FENNEC_58_0_2_BUILD1 FIREFOX_59_0b8_BUILD2

from 5.9s to 4.7s. At the start of this series (b162229e), that
command took 18min.
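
The idea behind the patch can be sketched outside of Mercurial as follows. This is only an illustration: `CountingMatcher`, `filter_naive`, and `filter_hoisted` are hypothetical names, not Mercurial's matcher API. The only assumption carried over from the patch is that a matcher is callable on a path and has an `always()` method.

```python
class CountingMatcher:
    """Hypothetical stand-in for a Mercurial matcher; not the real API."""

    def __init__(self, prefix=None):
        self._prefix = prefix  # None means "match every path"
        self.calls = 0         # number of per-path calls made so far

    def always(self):
        return self._prefix is None

    def __call__(self, path):
        self.calls += 1
        return self._prefix is None or path.startswith(self._prefix)


def filter_naive(match, copies):
    # Calls the matcher once per destination, even if it matches everything.
    return {dst: src for dst, src in copies.items() if match(dst)}


def filter_hoisted(match, copies):
    # Ask always() once up front; skip the per-path call when it is true.
    alwaysmatch = match.always()
    return {dst: src for dst, src in copies.items()
            if alwaysmatch or match(dst)}


copies = {"b.txt": "a.txt", "src/d.py": "src/c.py"}
naive, hoisted = CountingMatcher(), CountingMatcher()
assert filter_naive(naive, copies) == filter_hoisted(hoisted, copies)
print(naive.calls, hoisted.calls)
```

Both functions return the same dict; the hoisted variant simply avoids one function call per path when the matcher is known to accept everything, which adds up when many copy destinations are visited.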

Diff Detail

Repository

rHG Mercurial

Lint

Automatic diff as part of commit; lint not applicable.

Unit

Automatic diff as part of commit; unit tests not applicable.

Event Timeline

martinvonz created this revision. May 21 2019, 8:32 PM

Herald added a reviewer: hg-reviewers. May 21 2019, 8:32 PM

Herald added a subscriber: mercurial-devel.

marmoute added a subscriber: marmoute. May 22 2019, 4:19 AM

This comment was removed by marmoute.

marmoute added a comment. May 22 2019, 6:59 AM

This comment was removed by marmoute.

martinvonz added a child revision: D6431: copies: also encode p[12]copies destination as index into "files" list. May 22 2019, 1:02 PM

Can you give a summary of the total speedup of the series (from the base to the last changeset)? Also, I am not sure which case these numbers apply to. Is this the compatibility mode or after repository conversion? Can we have numbers for both?

To get a more diverse picture of the performance of these changes, can you provide timing data for the following cases?

mozilla-central: hg perfpathcopies 76caed42cf7cb7098aa0eb58242dd36054d06865 1daa622bbe42f8a85e0b4880c5c25df8ea60e95f
pypy:            hg perfpathcopies 3c8ac35c653afe108127ca75688e2f8278192512 d7746d32bf9d785bbc0c6afc9aa6015410a38c8f
mercurial:       hg perfpathcopies 7adb1274a4f930e13b35545ef23914ccae7d5534 0c6c600c03fddabcc45f1046e869f84b276fb467
netbeans:        hg perfpathcopies 588c2d1ced709885eb0bc6b88137efbadbb35b76 1aad62e59ddde2ce37882af12fbb202d3b7961dc

martinvonz edited the summary of this revision. May 22 2019, 3:15 PM

In D6422#93466, @marmoute wrote:

Can you give a summary of the total speedup of the series (from the base to the last changeset)?

Sure, done.

Also, I am not sure which case these numbers apply to. Is this the compatibility mode or after repository conversion?

After repo conversion.

Can we have numbers for both?

The compatibility number is going to be similar to before this series, since it won't benefit from having the set of removed files available cheaply. A follow-up that speeds up compatibility mode by not filtering out removed files would make sense. I'm not sure if that should be a separate option or not.

To get a more diverse picture of the performance of these changes, can you provide timing data for the following cases?

mozilla-central: hg perfpathcopies 76caed42cf7cb7098aa0eb58242dd36054d06865 1daa622bbe42f8a85e0b4880c5c25df8ea60e95f
pypy:            hg perfpathcopies 3c8ac35c653afe108127ca75688e2f8278192512 d7746d32bf9d785bbc0c6afc9aa6015410a38c8f
mercurial:       hg perfpathcopies 7adb1274a4f930e13b35545ef23914ccae7d5534 0c6c600c03fddabcc45f1046e869f84b276fb467
netbeans:        hg perfpathcopies 588c2d1ced709885eb0bc6b88137efbadbb35b76 1aad62e59ddde2ce37882af12fbb202d3b7961dc

Can you provide some tags you're curious about instead so it's easier to run the same command in both repos (the hashes are different)? I have the mozilla-unified repo and the hg repo converted.

The nodes in the above example were selected by a script because they had interesting properties. They are not based on a tag, so I can't give you one. How did you convert the repo? I think hg convert keeps a map somewhere; otherwise, using the commit message could work.

In D6422#93543, @marmoute wrote:

The nodes in the above example were selected by a script because they had interesting properties. They are not based on a tag, so I can't give you one. How did you convert the repo? I think hg convert keeps a map somewhere; otherwise, using the commit message could work.

Fair enough. For the mozilla repo, it takes 25s with copies in filelogs and 1m40s with copies in changesets (after this patch). For the mercurial repo, it takes 180ms with either format.

(I did some experiments; this seems like a good spot to report them.)

I built a crude cache (CBOR-based storage) for the data that needs caching after this series and tested it against my pypy test case.

filelog-based: 15s
compatibility mode; without cache: 75s
compatibility mode; caching copies without this series: 60s
compatibility mode; caching copies with this series: 40s
compatibility mode; caching all data with this series: 7s (65% spent parsing CBOR cache data)

This is very promising, even if we need to check more diverse cases (various factors can influence performance: number of files considered, number of changesets traversed, number of intermediate versions, etc.).

The timing above is enough motivation for me to look seriously into a caching/alternative storage plan.
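
To put the timings above in proportion, the ratios can be computed as in this small sketch. The numbers are copied from the list above; the dictionary keys are shortened labels chosen here for illustration.

```python
# Timings in seconds, copied from the list above (pypy test case).
timings = {
    "filelog-based": 15,
    "compat, no cache": 75,
    "compat, cached copies, without series": 60,
    "compat, cached copies, with series": 40,
    "compat, all data cached, with series": 7,
}

# Speedup of each configuration relative to uncached compatibility mode.
baseline = timings["compat, no cache"]
speedups = {name: baseline / secs for name, secs in timings.items()}

for name, factor in speedups.items():
    print(f"{name}: {timings[name]}s, {factor:.1f}x vs. uncached compatibility mode")
```

On these figures, caching all data is roughly ten times faster than uncached compatibility mode and about twice as fast as the filelog-based run, for this single test case.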

In D6422#93611, @marmoute wrote:
(I did some experiments; this seems like a good spot to report them.)
I built a crude cache (CBOR-based storage) for the data that needs caching after this series and tested it against my pypy test case.
filelog-based: 15s
compatibility mode; without cache: 75s
compatibility mode; caching copies without this series: 60s
compatibility mode; caching copies with this series: 40s
compatibility mode; caching all data with this series: 7s (65% spent parsing CBOR cache data)
This is very promising, even if we need to check more diverse cases (various factors can influence performance: number of files considered, number of changesets traversed, number of intermediate versions, etc.).
The timing above is enough motivation for me to look seriously into a caching/alternative storage plan.

Nice :) Thanks for working on a way to get this stuff out to existing repos (which has not been a priority for me, since that is not Google's use case).

martinvonz removed a child revision: D6431: copies: also encode p[12]copies destination as index into "files" list. Jun 6 2019, 12:43 PM

martinvonz added a commit: rHGc0b51449bf6b: copies: avoid calling matcher if matcher.always(). Jun 17 2019, 1:33 PM

This revision was not accepted when it landed; it landed in state Needs Review.

Closed by commit rHGc0b51449bf6b: copies: avoid calling matcher if matcher.always() (authored by martinvonz).

This revision was automatically updated to reflect the committed changes.

Revision Contents
Changeset List

			Path	Packages
M			mercurial/copies.py (5 lines)

Status	Author	Revision
Closed	martinvonz	D6422 copies: avoid calling matcher if matcher.always()
Closed	martinvonz	D6421 copies: avoid unnecessary copying of copy dict
Closed	martinvonz	D6420 copies: don't filter out copy targets created on other side of merge commit
Closed	martinvonz	D6419 copies: do full filtering at end of _changesetforwardcopies()
Closed	martinvonz	D6418 copies: split up _chain() in naive chaining and filtering steps
Closed	martinvonz	D6417 context: get filesadded() and filesremoved() from changeset if configured
Closed	martinvonz	D6416 changelog: optionally store added and removed files in changeset extras
Closed	martinvonz	D6369 templatekw: make {file_*} compare to both merge parents (issue4292)
Closed	martinvonz	D6370 templatekw: move showfileadds() close to showfile{mods,dels}()
Closed	martinvonz	D6368 tests: add test for {file_mods}, {file_adds}, {file_dels} on merge commit
Closed	martinvonz	D6367 context: add ctx.files{modified,added,removed}() methods

Diff 15546

mercurial/copies.py

                 children[p].append(r)

     roots = set(children) - set(missingrevs)
     # 'work' contains 3-tuples of a (revision number, parent number, copies).
     # The parent number is only used for knowing which parent the copies dict
     # came from.
     work = [(r, 1, {}) for r in roots]
     heapq.heapify(work)
+    alwaysmatch = match.always()
     while work:
         r, i1, copies1 = heapq.heappop(work)
         if work and work[0][0] == r:
             # We are tracing copies from both parents
             r, i2, copies2 = heapq.heappop(work)
             copies = {}
             allcopies = set(copies1) | set(copies2)
             # TODO: perhaps this filtering should be done as long as ctx
             # is merge, whether or not we're tracing from both parent.
             for dst in allcopies:
-                if not match(dst):
+                if not alwaysmatch and not match(dst):
                     continue
                 # Unlike when copies are stored in the filelog, we consider
                 # it a copy even if the destination already existed on the
                 # other branch. It's simply too expensive to check if the
                 # file existed in the manifest.
                 if dst in copies1:
                     # If it was copied on the p1 side, mark it as copied from
                     # that side, even if it was also copied on the p2 side.
 [...]
             childctx = repo[c]
             if r == childctx.p1().rev():
                 parent = 1
                 childcopies = childctx.p1copies()
             else:
                 assert r == childctx.p2().rev()
                 parent = 2
                 childcopies = childctx.p2copies()
-            if not match.always():
+            if not alwaysmatch:
                 childcopies = {dst: src for dst, src in childcopies.items()
                                if match(dst)}
             # Copy the dict only if later iterations will also need it
             if i != len(children[r]) - 1:
                 copies = copies.copy()
             if childcopies:
                 childcopies = _chain(copies, childcopies)
             else:

Diff	ID	Description	Created	Lint	Unit
Base		Base
Diff 1	15220		May 21 2019, 8:32 PM	★	★
Diff 2	15546	rHGc0b51449bf6b70de368ffa439c7e6ea7f12ef235	May 3 2019, 2:39 AM	★	★