This is an archive of the discontinued Mercurial Phabricator instance.

copies: make two version of the changeset centric algorithm
ClosedPublic

Authored by marmoute on Sep 28 2020, 12:32 PM.

Details

Reviewers

Alphare
pulkit

Group Reviewers

hg-reviewers

Commits

rHGad6ebb6f0dfe: copies: make two version of the changeset centric algorithm

Summary

There are two main ways to run the changeset-centric copy-tracing algorithm:
one fed from data stored in side-data, which is still in development, and one
based on data stored in extra (with a "compatibility" mode).

The extra-based variant is used in production at Google, but is still marked
experimental in the code. It is mostly unsuitable for other users because it
affects the changeset hash.

The side-data based storage and algorithm have been evolving to store more
data, cover more cases (mostly around merges, which Google does not really care
about), and use lower-level storage for efficiency.

All these changes make it increasingly hard to maintain the common code base
without impacting code complexity and performance. For example, the
compatibility mode requires keeping things at a different level than what we
need for side-data.

So, I am duplicating the involved functions. The newly added _extra variants
will be kept as they are today, while I will do some deeper rework of the
side-data versions.

Long term, the side-data version should be more featureful and performant than
the extra-based version, so I expect the duplicated _extra functions to
eventually get dropped.
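
The duplication described in this summary boils down to a small dispatch on the repository's copy-tracing mode. The sketch below only illustrates that shape and is not Mercurial's code: `ToyRepo`, `combine_sidedata`, `combine_extra` and `combine` are hypothetical names, and the only value taken from the diff is `b'changeset-sidedata'`.

```python
class ToyRepo:
    """Hypothetical stand-in for a repository object (assumption)."""

    def __init__(self, filecopiesmode):
        self.filecopiesmode = filecopiesmode


def combine_sidedata(revs):
    # Placeholder for the side-data variant, which is free to evolve.
    return ("sidedata", list(revs))


def combine_extra(revs):
    # Placeholder for the frozen "extra" variant.
    return ("extra", list(revs))


def combine(repo, revs):
    # Pick the variant once, at the call site, so the two
    # implementations no longer have to share internals.
    if repo.filecopiesmode == b'changeset-sidedata':
        return combine_sidedata(revs)
    return combine_extra(revs)
```

Keeping the choice at a single call site is what lets one variant be reworked without touching the other.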

Diff Detail

Repository

rHG Mercurial

Lint

Automatic diff as part of commit; lint not applicable.

Unit

Automatic diff as part of commit; unit tests not applicable.

Event Timeline

marmoute created this revision. · Sep 28 2020, 12:32 PM

Herald added a reviewer: hg-reviewers. · Sep 28 2020, 12:32 PM

Herald added a subscriber: mercurial-patches.

marmoute added a child revision: D9115: copies: use dedicated `_revinfo_getter` function and call. · Sep 28 2020, 12:32 PM

Alphare accepted this revision. · Oct 1 2020, 11:49 AM

marmoute edited parent revisions, added: D9092: changing-files: retrieve changelogrevision.files from the sidedata block; removed: D9113: copies: rename some function to the new naming scheme. · Oct 2 2020, 12:39 PM

pulkit accepted this revision. · Oct 6 2020, 4:20 AM

This revision is now accepted and ready to land. · Oct 6 2020, 4:20 AM

marmoute added a commit: rHGad6ebb6f0dfe: copies: make two version of the changeset centric algorithm. · Oct 6 2020, 5:03 AM

Closed by commit rHGad6ebb6f0dfe: copies: make two version of the changeset centric algorithm (authored by marmoute).

This revision was automatically updated to reflect the committed changes.

Revision Contents
Changeset List

			Path	Packages
M			mercurial/copies.py (104 lines)

Status	Author	Revision
Closed	marmoute	D9141 copies: move `merged` testing sooner
Closed	marmoute	D9140 copies: return None instead of ChangingFiles when relevant
Closed	marmoute	D9139 copies: add a HASCOPIESINFO flag to highlight rev with useful data
Closed	marmoute	D9130 salvaged: properly deal with salvaged file during copy tracing
Closed	marmoute	D9129 salvaged: persist the salvaged set on disk
Closed	marmoute	D9128 changing-files: add clean computation of changed file for merges
Closed	marmoute	D9127 changing-files: add clean computation of changed files for linear changesets
Closed	marmoute	D9126 changing-files: add clean computation of changed files for roots
Closed	marmoute	D9125 changing-files: add a debug command display changed files
Closed	marmoute	D9124 side-data: add a test to check sidedata upgrade
Closed	marmoute	D9123 changing-files: split the changing files computation from encoding
Closed	marmoute	D9120 salvaged: record salvaged in ChangingFiles at commit time
Closed	marmoute	D9119 salvaged: track removal-candidates in more cases
Closed	marmoute	D9122 salvaged: explicitly skip salvaged file while encoding
Closed	marmoute	D9118 changing-files: add a "salvaged" set to track file that were not removed
Closed	marmoute	D9117 copies: directly pass a changes object to the copy tracing code
Closed	marmoute	D9116 copies: no longer change the sidedata flag
Closed	marmoute	D9115 copies: use dedicated `_revinfo_getter` function and call
Closed	marmoute	D9114 copies: make two version of the changeset centric algorithm
Closed	marmoute	D9113 copies: rename some function to the new naming scheme
Closed	marmoute	D9112 changing-files: cache the various property
Closed	marmoute	D9111 changing-files: always use `mark_touched` to update the touched set
Closed	marmoute	D9092 changing-files: retrieve changelogrevision.files from the sidedata block
Closed	marmoute	D9091 changing-files: drop the now useless changelogrevision argument
Closed	marmoute	D9090 changing-files: rework the way we store changed files in side-data
Closed	marmoute	D9143 changing-files: fix docstring

Diff 23048

mercurial/copies.py

         cl.reachableroots(min_root, [b.rev()], list(roots), includepath=True)
     )

     iterrevs = set(from_head)
     iterrevs &= mrset
     iterrevs.update(roots)
     iterrevs.remove(b.rev())
     revs = sorted(iterrevs)

-    return _combine_changeset_copies(
-        revs, children, b.rev(), revinfo, match, isancestor
-    )
+    if repo.filecopiesmode == b'changeset-sidedata':
+        return _combine_changeset_copies(
+            revs, children, b.rev(), revinfo, match, isancestor
+        )
+    else:
+        return _combine_changeset_copies_extra(
+            revs, children, b.rev(), revinfo, match, isancestor
+        )


 def _combine_changeset_copies(
     revs, children, targetrev, revinfo, match, isancestor
 ):
     """combine the copies information for each item of iterrevs

     revs: sorted iterable of revision to visit
[...]
             if (
                 new_tt == other_tt
                 or not isancestor(new_tt, other_tt)
                 or ismerged(dest)
             ):
                 minor[dest] = value


+def _combine_changeset_copies_extra(
+    revs, children, targetrev, revinfo, match, isancestor
+):
+    """version of `_combine_changeset_copies` that works with the Google
+    specific "extra" based storage for copy information"""
+    all_copies = {}
+    alwaysmatch = match.always()
+    for r in revs:
+        copies = all_copies.pop(r, None)
+        if copies is None:
+            # this is a root
+            copies = {}
+        for i, c in enumerate(children[r]):
+            p1, p2, p1copies, p2copies, removed, ismerged = revinfo(c)
+            if r == p1:
+                parent = 1
+                childcopies = p1copies
+            else:
+                assert r == p2
+                parent = 2
+                childcopies = p2copies
+            if not alwaysmatch:
+                childcopies = {
+                    dst: src for dst, src in childcopies.items() if match(dst)
+                }
+            newcopies = copies
+            if childcopies:
+                newcopies = copies.copy()
+                for dest, source in pycompat.iteritems(childcopies):
+                    prev = copies.get(source)
+                    if prev is not None and prev[1] is not None:
+                        source = prev[1]
+                    newcopies[dest] = (c, source)
+                assert newcopies is not copies
+            for f in removed:
+                if f in newcopies:
+                    if newcopies is copies:
+                        # copy on write to avoid affecting potential other
+                        # branches. when there are no other branches, this
+                        # could be avoided.
+                        newcopies = copies.copy()
+                    newcopies[f] = (c, None)
+            othercopies = all_copies.get(c)
+            if othercopies is None:
+                all_copies[c] = newcopies
+            else:
+                # we are the second parent to work on c, we need to merge our
+                # work with the other.
+                #
+                # In case of conflict, parent 1 take precedence over parent 2.
+                # This is an arbitrary choice made anew when implementing
+                # changeset based copies. It was made without regards with
+                # potential filelog related behavior.
+                if parent == 1:
+                    _merge_copies_dict_extra(
+                        othercopies, newcopies, isancestor, ismerged
+                    )
+                else:
+                    _merge_copies_dict_extra(
+                        newcopies, othercopies, isancestor, ismerged
+                    )
+                    all_copies[c] = newcopies
+
+    final_copies = {}
+    for dest, (tt, source) in all_copies[targetrev].items():
+        if source is not None:
+            final_copies[dest] = source
+    return final_copies
+
+
+def _merge_copies_dict_extra(minor, major, isancestor, ismerged):
+    """version of `_merge_copies_dict` that works with the Google
+    specific "extra" based storage for copy information"""
+    for dest, value in major.items():
+        other = minor.get(dest)
+        if other is None:
+            minor[dest] = value
+        else:
+            new_tt = value[0]
+            other_tt = other[0]
+            if value[1] == other[1]:
+                continue
+            # content from "major" wins, unless it is older
+            # than the branch point or there is a merge
+            if (
+                new_tt == other_tt
+                or not isancestor(new_tt, other_tt)
+                or ismerged(dest)
+            ):
+                minor[dest] = value
+
+
 def _forwardcopies(a, b, base=None, match=None):
     """find {dst@b: src@a} copy mapping where a is an ancestor of b"""

     if base is None:
         base = a
     match = a.repo().narrowmatch(match)
     # check for working copy
     if b.rev() is None:
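
To make the data flow of the `_extra` functions in the diff easier to follow, here is a minimal, self-contained sketch run on a toy history. It is not Mercurial's code: the names `merge_copies_extra` and `combine_copies_extra`, the string file names (Mercurial uses bytes), the integer-comparison `isancestor` and the hand-written `revinfo` are assumptions for illustration, and the `match` filtering is left out.

```python
def merge_copies_extra(minor, major, isancestor, ismerged):
    """Fold `major` into `minor` in place; values are (rev, source) pairs."""
    for dest, value in major.items():
        other = minor.get(dest)
        if other is None:
            minor[dest] = value
            continue
        if value[1] == other[1]:
            continue  # both sides agree on the source
        new_tt, other_tt = value[0], other[0]
        # "major" wins unless its information is strictly older
        if new_tt == other_tt or not isancestor(new_tt, other_tt) or ismerged(dest):
            minor[dest] = value


def combine_copies_extra(revs, children, targetrev, revinfo, isancestor):
    all_copies = {}
    for r in revs:
        copies = all_copies.pop(r, None)
        if copies is None:
            copies = {}  # r is a root
        for c in children[r]:
            p1, p2, p1copies, p2copies, removed, ismerged = revinfo(c)
            if r == p1:
                parent, childcopies = 1, p1copies
            else:
                assert r == p2
                parent, childcopies = 2, p2copies
            newcopies = copies
            if childcopies:
                newcopies = copies.copy()
                for dest, source in childcopies.items():
                    prev = copies.get(source)
                    if prev is not None and prev[1] is not None:
                        source = prev[1]  # chain through an earlier copy
                    newcopies[dest] = (c, source)
            for f in removed:
                if f in newcopies:
                    if newcopies is copies:
                        newcopies = copies.copy()  # copy on write
                    newcopies[f] = (c, None)  # a removal hides the copy
            othercopies = all_copies.get(c)
            if othercopies is None:
                all_copies[c] = newcopies
            elif parent == 1:
                merge_copies_extra(othercopies, newcopies, isancestor, ismerged)
            else:
                merge_copies_extra(newcopies, othercopies, isancestor, ismerged)
                all_copies[c] = newcopies
    return {d: s for d, (_, s) in all_copies[targetrev].items() if s is not None}


# Toy linear history 0 -> 1 -> 2 (hypothetical data):
#   rev 1 copies "a" to "b"; rev 2 renames "b" to "c".
never_merged = lambda f: False
info = {
    1: (0, -1, {"b": "a"}, {}, [], never_merged),
    2: (1, -1, {"c": "b"}, {}, ["b"], never_merged),
}
isancestor = lambda a, b: a <= b  # valid only for this linear toy history

result = combine_copies_extra([0, 1], {0: [1], 1: [2]}, 2, info.__getitem__, isancestor)
print(result)  # {'c': 'a'}

# Merging two dicts directly: the newer entry for "b" replaces the older one.
minor = {"b": (1, "a")}
major = {"b": (2, "x"), "c": (2, "a")}
merge_copies_extra(minor, major, isancestor, never_merged)
print(minor)  # {'b': (2, 'x'), 'c': (2, 'a')}
```

In the first run the rename in rev 2 is chained through the copy made in rev 1, so "c" is reported as coming from "a", while "b" drops out of the result because it was removed.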

Diff	ID	Description	Created	Lint	Unit
Base		Base
Diff 1	22903		Sep 28 2020, 12:32 PM	★	★
Diff 2	23048	rHGad6ebb6f0dfe69a2657105bd5c3eb636f5928751	Sep 25 2020, 8:39 AM	★	★