
Details

Reviewers

Group Reviewers

Commits

rHG0903d6b9b1df: repository: introduce register_changeset callback

Summary

The new callback is called whenever a changeset is added to the repository
(commit, unbundle or exchange). Since the bulk operations already parse
the changeset (readfiles or full changesetrevision), always use the
latter to avoid redundant lookups. The first consumer of the new
interface needs to look at extra.
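
As a rough illustration of the callback described in the summary, the sketch below shows how a consumer might read `extra` from the object it is handed. It does not import Mercurial: `FakeChangelogRevision` and `BranchTrackingRepo` are hypothetical stand-ins, and only the `register_changeset(rev, changelogrevision)` signature is taken from this revision.

```python
from collections import namedtuple

# Hypothetical stand-in for Mercurial's changelogrevision object; only the
# attributes used in this sketch are modelled.
FakeChangelogRevision = namedtuple(
    "FakeChangelogRevision", ["manifest", "files", "extra"]
)


class BranchTrackingRepo:
    """Toy repository whose callback records the branch of each changeset."""

    def __init__(self):
        self.branch_by_rev = {}

    def register_changeset(self, rev, changelogrevision):
        # The caller has already parsed the changeset, so reading `extra`
        # here needs no additional changelog lookup.
        branch = changelogrevision.extra.get(b"branch", b"default")
        self.branch_by_rev[rev] = branch


repo = BranchTrackingRepo()
repo.register_changeset(0, FakeChangelogRevision(b"m0", [b"a.txt"], {}))
repo.register_changeset(
    1, FakeChangelogRevision(b"m1", [b"b.txt"], {b"branch": b"stable"})
)
print(repo.branch_by_rev)
```

The first real consumer is the rev-branch-cache update in the child revision D9781; the dictionary here only stands in for such a cache.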

Diff Detail

Repository

rHG Mercurial

Lint

Automatic diff as part of commit; lint not applicable.

Unit

Automatic diff as part of commit; unit tests not applicable.

Event Timeline

joerg.sonnenberger created this revision. Jan 14 2021, 8:19 PM

Herald added a reviewer: hg-reviewers. Jan 14 2021, 8:19 PM

Herald added a subscriber: mercurial-patches.

joerg.sonnenberger added a child revision: D9781: branchmap: update rev-branch-cache incrementally. Jan 14 2021, 8:19 PM

joerg.sonnenberger mentioned this in D9573: branchmap: update rev-branch-cache incrementally. Jan 14 2021, 8:20 PM

I am still not convinced by the API, so I am poking at what else would be possible.
If nothing obvious emerges, we should move forward with this series before the freeze; the final change is quite valuable.

So I am still not a fan of this API:

it passes around quite a high-level object where I think a rev would be more appropriate,
it relies on revision adders to call the callback themselves in a "high level" layer, which makes the odds of this being forgotten high and the API more fragile,
it triggers the method even for revisions "not actually added because we know them already".

I poked around at an alternative approach that is lower level (so less fragile), is only called for revisions actually added, and passes revision numbers around, relying on a simple cache mechanism to keep things performant.

See D9826, D9827, and D9828
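
The shape of that lower-level alternative can be sketched roughly as follows. This is not the code from D9826 to D9828: `ToyChangelog`, its methods, and the consumer names are hypothetical, and the sketch only assumes what the comment states, namely that the hook fires inside the revision adder, fires only for revisions actually added, receives a revision number, and relies on a small cache to avoid parsing the same revision twice.

```python
class ToyChangelog:
    """Hypothetical, heavily simplified stand-in for a changelog revlog."""

    def __init__(self, on_new_revision):
        self._rev_by_node = {}  # node -> rev
        self._texts = []  # rev -> raw changeset text
        self._on_new_revision = on_new_revision
        self._last = None  # one-entry cache: (rev, parsed) or None
        self.parse_count = 0

    def addrevision(self, node, text):
        """Store a revision and return its rev; notify only if it is new."""
        if node in self._rev_by_node:
            # Already known: nothing is added, so no callback fires.
            return self._rev_by_node[node]
        rev = len(self._texts)
        self._texts.append(text)
        self._rev_by_node[node] = rev
        self._on_new_revision(self, rev)
        return rev

    def changelogrevision(self, rev):
        """Parse a revision, reusing the most recently parsed one."""
        if self._last is not None and self._last[0] == rev:
            return self._last[1]
        self.parse_count += 1
        parsed = {"text": self._texts[rev]}  # placeholder for real parsing
        self._last = (rev, parsed)
        return parsed


notified = []


def notify(cl, rev):
    # Two independent consumers look up the same revision by number.
    for name in ("branch-cache", "tags-cache"):
        notified.append((name, rev, cl.changelogrevision(rev)["text"]))


cl = ToyChangelog(notify)
cl.addrevision(b"node0", b"first")
cl.addrevision(b"node1", b"second")
cl.addrevision(b"node0", b"first")  # duplicate: no notification
print(len(notified), cl.parse_count)
```

In this toy, each new revision is parsed once even though two consumers ask for it, and re-adding a known node triggers nothing.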

This revision now requires changes to proceed. Jan 18 2021, 1:15 PM

In D9780#148725, @marmoute wrote:

it passes around quite a high-level object where I think a rev would be more appropriate,

I don't disagree, but that's a more general change IMO. I really dislike D9826 as it introduces new hidden assumptions about how a revlog backend works. I was looking at whether the higher layers actually care about the new node much, but haven't pushed into that direction yet to either give the callbacks always both node and rev or just rev. So any change in that area should start by _addrevision returning either both node,rev or just rev. But that's quite a different scope all by itself.

it relies on revision adders to call the callback themselves in a "high level" layer, which makes the odds of this being forgotten high and the API more fragile,

I did the original versions by hooking into _addrevision, but ultimately decided against it because there are very few such high-level places and they already fail badly if changes slip through. Given that it completely avoids the need for another caching layer for changelogrevisions to perform well, it seems much better to do it on the higher layer. I've also tried such a caching layer, but it came at a measurable price in my tests since it is actually not that hot right now.

it triggers the method even for revisions "not actually added because we know them already".

This is only possible for commitctx and I'm not sure that is expected to trigger a duplicate either. The other cases all have to deal with the difference between duplicate and newly added anyway.

In D9780#148752, @joerg.sonnenberger wrote:

In D9780#148725, @marmoute wrote:

it passes around quite a high-level object where I think a rev would be more appropriate,

I don't disagree, but that's a more general change IMO. I really dislike D9826 as it introduces new hidden assumptions about how a revlog backend works.

How so?

I was looking at whether the higher layers actually care about the new node much, but haven't pushed into that direction yet to either give the callbacks always both node and rev or just rev. So any change in that area should start by _addrevision returning either both node,rev or just rev. But that's quite a different scope all by itself.

I am not sure what you mean, but that is probably related to the fact I did not get the first part :-)

it relies on revision adders to call the callback themselves in a "high level" layer, which makes the odds of this being forgotten high and the API more fragile,

I did the original versions by hooking into _addrevision, but ultimately decided against it because there are very few such high-level places and they already fail badly if changes slip through. Given that it completely avoids the need for another caching layer for changelogrevisions to perform well, it seems much better to do it on the higher layer. I've also tried such a caching layer, but it came at a measurable price in my tests since it is actually not that hot right now.

Do you have a way to get numbers for my proposal? The caching layer is really simple and I don't expect a large overhead. Even if there are few high-level places, I really prefer a contained interface.

it triggers the method even for revisions "not actually added because we know them already".

This is only possible for commitctx and I'm not sure that is expected to trigger a duplicate either. The other cases all have to deal with the difference between duplicate and newly added anyway.

It is possible for bundles too, is it not? You just have to apply a bundle you already have, don't you? Having the callback called only for new revs seems simpler, as the code in each such callback does not have to deal with the deduplication itself.

@joerg.sonnenberger and I had a discussion on IRC about this API. The result is that Joerg prefers the higher-level API for reasons I now understand better without necessarily finding them decisive, while I still prefer the lower-level API for reasons that (hopefully) Joerg understands better without finding them more decisive. So we are going to need a bit more time to think about that (and probably about the broader picture of the full API of the involved objects), probably with more people.

So let's delay this until later, during/after the freeze.

Don't forget that we both like the two previous Diffs (D9778 and D9779) and we would like to see them in 5.7 if possible.

joerg.sonnenberger edited parent revisions, added: D9831: exchangev2: avoid second look-up by node; removed: D9779: changelog: move branchinfo to changelogrevision. Jan 18 2021, 6:51 PM

joerg.sonnenberger updated this revision to Diff 25144.

As discussed, move to using revision for the new function. Most of the prep work is factored out into smaller changesets, except changegroup.py, since that would create a small penalty by itself for no good reason. The goal for follow-up changes is to provide the revision from _addrevision directly (either instead or in addition to the node).

joerg.sonnenberger updated this revision to Diff 25163. Jan 19 2021, 9:55 PM

I like the fact we now take a rev as argument. However, two discussions remain:

could/should we stop passing the changelogrevision as an argument?
should we move this within _addrevision or keep it at a higher level?

In D9780#149086, @marmoute wrote:

I like the fact we now take a rev as argument. However, two discussions remain:

could/should we stop passing the changelogrevision as an argument?

The performance-critical path is IMO during changegroup application (unbundle), and that one is already doing most of the work.
There is a good chance that consumers of the API will look at older revisions in at least some use cases, so the simple
last-use cache won't work as well in that case. As such, I think it is both simpler and more predictable to do the work once.
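
The concern about a last-use cache can be made concrete with a small, self-contained sketch. `OneEntryCache`, `parse`, and the two consumers are hypothetical; the only assumption taken from the discussion is that some consumer also looks at an older revision while handling a new one.

```python
class OneEntryCache:
    """Remembers only the most recently parsed revision."""

    def __init__(self, parse):
        self._parse = parse
        self._key = None
        self._value = None
        self.misses = 0

    def get(self, rev):
        if rev != self._key:
            self.misses += 1
            self._key, self._value = rev, self._parse(rev)
        return self._value


def parse(rev):
    # Placeholder for an expensive changelog parse.
    return {"rev": rev}


cache = OneEntryCache(parse)


def consumer_a(rev):
    cache.get(rev)
    if rev > 0:
        # Looking at an older revision evicts the entry for `rev`.
        cache.get(rev - 1)


def consumer_b(rev):
    # Needs the new revision again; misses whenever consumer_a evicted it.
    cache.get(rev)


for rev in range(3):
    consumer_a(rev)
    consumer_b(rev)

print(cache.misses)
```

Here three new revisions cost seven parses, because the second consumer re-parses every revision whose entry was evicted. Handing the already parsed object to every consumer keeps the cost for the new revision at one parse, whatever else the consumers look up.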

should we move this within _addrevision or keep it at a higher level?

I think the behavior is still more consistent on the higher level. I am looking at a follow up change for the tags cache and that needs a secondary extension point for when manifests are added after changesets. I'm not sure yet if there is a use case for notification on all manifests.

joerg.sonnenberger edited parent revisions, added: D9779: changelog: move branchinfo to changelogrevision; removed: D9831: exchangev2: avoid second look-up by node. Jan 21 2021, 12:56 PM

joerg.sonnenberger updated this revision to Diff 25204.

In D9780#149093, @joerg.sonnenberger wrote:

In D9780#149086, @marmoute wrote:

I like the fact we now take a rev as argument. However, two discussions remain:

could/should we stop passing the changelogrevision as an argument?

The performance-critical path is IMO during changegroup application (unbundle), and that one is already doing most of the work.
There is a good chance that consumers of the API will look at older revisions in at least some use cases, so the simple
last-use cache won't work as well in that case. As such, I think it is both simpler and more predictable to do the work once.

There are enough good arguments here (for example, the caching being fragile if callbacks do bad things) to convince me. Can you elaborate the method documentation a bit to highlight that?

should we move this within _addrevision or keep it at a higher level?

I think the behavior is still more consistent on the higher level. I am looking at a follow up change for the tags cache and that needs a secondary extension point for when manifests are added after changesets. I'm not sure yet if there is a use case for notification on all manifests.

I am not convinced yet, but this is kind of a minor point. I am not sure what is needed for the manifests, but for ChangedFiles-related things, we will definitely need to do a second pass on changesets once the manifests and filelogs are in. So there might be things in common here.

joerg.sonnenberger updated this revision to Diff 25207. Jan 21 2021, 5:28 PM

joerg.sonnenberger added a commit: rHG0903d6b9b1df: repository: introduce register_changeset callback. Jan 22 2021, 3:39 PM

This revision was not accepted when it landed; it landed in state Needs Review.

Closed by commit rHG0903d6b9b1df: repository: introduce register_changeset callback (authored by joerg.sonnenberger).

This revision was automatically updated to reflect the committed changes.

Path
M	mercurial/changegroup.py (5 lines)
M	mercurial/commit.py (3 lines)
M	mercurial/exchangev2.py (2 lines)
M	mercurial/interfaces/repository.py (8 lines)
M	mercurial/localrepo.py (3 lines)

Diff	ID	Description	Created	Lint	Unit
Base		Base
Diff 1	24866		Jan 14 2021, 8:19 PM	★	★
Diff 2	25144		Jan 18 2021, 6:51 PM	★	★
Diff 3	25163		Jan 19 2021, 9:55 PM	★	★
Diff 4	25204		Jan 21 2021, 12:56 PM	★	★
Diff 5	25207		Jan 21 2021, 5:28 PM	★	★
Diff 6	25226	rHG0903d6b9b1dfc66f52fcb5794a71a2cde22313ef	Jan 18 2021, 6:20 PM	★	★

mercurial/changegroup.py

     efilesset = set()
     cgnodes = []

     def ondupchangelog(cl, node):
         if cl.rev(node) < clstart:
             cgnodes.append(node)

     def onchangelog(cl, node):
-        efilesset.update(cl.readfiles(node))
+        rev = cl.rev(node)
+        ctx = cl.changelogrevision(rev)
+        efilesset.update(ctx.files)
+        repo.register_changeset(rev, ctx)

     self.changelogheader()
     deltas = self.deltaiter()
     if not cl.addgroup(
         deltas,
         csmap,
         trp,
         addrevisioncb=onchangelog,

mercurial/commit.py

         tr,
         p1.node(),
         p2.node(),
         user,
         ctx.date(),
         extra,
     )
     rev = repo[n].rev()
+    if oldtip != repo.changelog.tiprev():
+        repo.register_changeset(rev, repo.changelog.changelogrevision(rev))

     xp1, xp2 = p1.hex(), p2 and p2.hex() or b''
     repo.hook(
         b'pretxncommit',
         throw=True,
         node=hex(n),
         parent1=xp1,
         parent2=xp2,
     )

mercurial/exchangev2.py

         rev = cl.rev(node)
         revision = cl.changelogrevision(rev)
         added.append(node)

         # We need to preserve the mapping of changelog revision to node
         # so we can set the linkrev accordingly when manifests are added.
         manifestnodes[rev] = revision.manifest

+        repo.register_changeset(rev, revision)

     nodesbyphase = {phase: set() for phase in phases.phasenames.values()}
     remotebookmarks = {}

     # addgroup() expects a 7-tuple describing revisions. This normalizes
     # the wire data to that format.
     #
     # This loop also aggregates non-revision metadata, such as phase
     # data.

This is an archive of the discontinued Mercurial Phabricator instance.

repository: introduce register_changeset callback
ClosedPublic

mercurial/interfaces/repository.py

         """Return the list of bookmarks pointing to the specified node."""

     def branchmap():
         """Return a mapping of branch to heads in that branch."""

     def revbranchcache():
         pass

+    def register_changeset(rev, changelogrevision):
+        """Extension point for caches for new nodes.
+
+        Multiple consumers are expected to need parts of the changelogrevision,
+        so it is provided as optimization to avoid duplicate lookups. A simple
+        cache would be fragile when other revisions are accessed, too."""
+        pass
+
     def branchtip(branchtip, ignoremissing=False):
         """Return the tip node for a given branch."""

     def lookup(key):
         """Resolve the node for a revision."""

     def lookupbranch(key):
         """Look up the branch name of the given revision or branch name."""

mercurial/localrepo.py

         return self._branchcaches[self]

     @unfilteredmethod
     def revbranchcache(self):
         if not self._revbranchcache:
             self._revbranchcache = branchmap.revbranchcache(self.unfiltered())
         return self._revbranchcache

+    def register_changeset(self, rev, changelogrevision):
+        pass
+
     def branchtip(self, branch, ignoremissing=False):
         """return the tip node for a given branch

         If ignoremissing is True, then this method will not raise an error.
         This is helpful for callers that only expect None for a missing branch
         (e.g. namespace).

         """
