This is an archive of the discontinued Mercurial Phabricator instance.

branchmap: explicitly warm+write all subsets of the branchmap caches
ClosedPublic

Authored by spectral on Aug 2 2019, 9:26 PM.


Details

Reviewers

marmoute
pulkit

Group Reviewers

hg-reviewers

Commits

rHGcdf0e9523de1: branchmap: explicitly warm+write all subsets of the branchmap caches

Summary

'full' claims it will warm all of the caches that are known about, but this was
not the case - it did not actually warm the branchmap caches for subsets that we
haven't requested, or for subsets that are still considered "valid". By
explicitly writing them to disk, we can force the subsets for ex: "served" to be
written ("immutable" and "base"), making it cheaper to calculate "served" the
next time it needs to be updated.
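The cost argument in the summary can be sketched with a toy model. This is not Mercurial code: the function name, the mapping, and the revision counts below are all hypothetical and only illustrate why having "immutable" and "base" on disk could make a later update of "served" cheaper.

```python
# Toy model only; none of these names are Mercurial APIs.
# Each filter maps to its nearest subset, as described in the summary:
# "immutable" and "base" are subsets of "served".
NEAREST_SUBSET = {"served": "immutable", "immutable": "base", "base": None}

# Hypothetical number of revisions visible through each filter.
REVS_COVERED = {"base": 90, "immutable": 95, "served": 100}


def revs_to_scan(filt, on_disk):
    """Count the revisions to scan when rebuilding the cache for `filt`,
    starting from the nearest subset cache found in `on_disk`
    (or from scratch if no subset cache is available)."""
    subset = NEAREST_SUBSET[filt]
    while subset is not None and subset not in on_disk:
        subset = NEAREST_SUBSET[subset]
    covered = REVS_COVERED[subset] if subset is not None else 0
    return REVS_COVERED[filt] - covered


# No subset cache on disk: "served" is rebuilt from scratch.
print(revs_to_scan("served", on_disk=set()))                  # 100
# Subsets were written explicitly: only the difference is scanned.
print(revs_to_scan("served", on_disk={"immutable", "base"}))  # 5
```

In this model, the explicit write simply ensures that the `on_disk` set contains every subset, so the rebuild never falls back to a full scan.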

Diff Detail

Repository

rHG Mercurial

Lint

Lint Skipped

Unit

Unit Tests Skipped

Event Timeline

spectral created this revision.Aug 2 2019, 9:26 PM

Herald added a reviewer: hg-reviewers.Aug 2 2019, 9:26 PM

Herald added a subscriber: mercurial-devel.

spectral added a child revision: D6711: branchheads: store wdir-dependent caches in wcache (issue6181).Aug 2 2019, 9:26 PM

Needs some test updates.

The overall principle seems good. I made a couple of inline comments.

mercurial/localrepo.py
2200	Should we store this list explicitly, next to the filtermap? That would seem more robust to future changes.
2224	Why the explicit write here? We don't seem to need it for the previous section. Is it because the write would be skipped if the cache of the previous subset is valid? If so, consider clarifying that in your comment.

spectral marked an inline comment as done.Aug 5 2019, 9:07 PM

spectral added a parent revision: D6719: branchmap: refresh all "heads" of the branchmap subsets.

spectral retitled this revision from branchmap: properly refresh/warm all branchmap caches to branchmap: explicitly warm+write all subsets of the branchmap caches.

spectral edited the summary of this revision. (Show Details)

spectral updated this revision to Diff 16129.

spectral added inline comments.Aug 5 2019, 9:12 PM

mercurial/localrepo.py
2224	Actually it's because the documentation for the function states that it will "warm the caches", "even the ones usually loaded more lazily". If nothing in hg actually explicitly requests the subset, it won't be written:

  $ hg init; echo hi > foo; hg ci -qAm foo; ls .hg/cache
  branch2-served
  evoext-obscache-00
  rbc-names-v1
  rbc-revs-v1

This would have, I'd thought, written out -served, -immutable, and -base, since -immutable and -base are subsets of -served, but that doesn't seem to happen. Even if I run `hg debugupdatecache` (before this change) they don't get written:

  branch2-served
  evoext-obscache-00
  hgtagsfnodes1
  rbc-names-v1
  rbc-revs-v1
  tags2
  tags2-served

If the intent of `hg debugupdatecache` is to actually warm all levels of cache, it should probably warm -immutable and -base, so that they're kept up to date? Or is that undesirable for some reason? Maybe it causes additional computation every time the cache for -served is updated if -immutable and -base exist, since they'd also possibly have to be updated.

I'd think it'd be the opposite: -base is very cheap to calculate and unlikely to go stale, it can be used to make calculating -immutable quicker, and that can be used to make calculating -served quicker. Without them, -served has to start from scratch each time; this seems to be the reason for the subsettable :). But I'm not familiar enough with the caching code (and uses of it) to know if this is actually true in practice.

That said, I agree that these are two separate concerns, and the number of tests that need to be changed is pretty significant for this one, so I've split this change into two.
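The manual `ls .hg/cache` check in the comment above can be scripted. The following is a minimal sketch, not part of the change under review: the helper name `missing_branch_caches` is hypothetical, and the `branch2-<filter>` file naming is taken only from the directory listings quoted in the comment.

```python
import os


def missing_branch_caches(cache_dir, filters=("served", "immutable", "base")):
    """Return the branch2-<filter> file names that are absent from cache_dir.

    The file naming follows the listings quoted in the review thread;
    the helper itself is illustrative and not a Mercurial API.
    """
    present = set(os.listdir(cache_dir)) if os.path.isdir(cache_dir) else set()
    expected = ["branch2-%s" % f for f in filters]
    return [name for name in expected if name not in present]


if __name__ == "__main__":
    # Run from the root of a repository to see which subset caches are missing.
    for name in missing_branch_caches(os.path.join(".hg", "cache")):
        print("missing: %s" % name)
```

Based on the listings above, running this before the change would be expected to report `branch2-immutable` and `branch2-base` as missing even after `hg debugupdatecache`.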

Forcing this write seems like a good idea. Having it in its own
changeset seems like a good idea (and please add a comment about forcing
the write).

In D6710#98322, @marmoute wrote:

Forcing this write seems like a good idea. Having it in its own
changeset seems like a good idea (and please add a comment about forcing
the write).

I've already split the 'full' change from the one that changes which subsets we invalidate; this revision will carry the 'full' change since it had the most comments. The comment has been added. Please take another look at the whole stack.

We could warm them in increasing order to improve efficiency. However, this is only for the full cache warming, so this looks good enough (consider doing them in order in a follow-up).

Note: this change seems independent of the previous one, so it could be taken on its own.
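As a rough illustration of the "increasing order" idea for a possible follow-up, the filters could be sorted so that each one is warmed after its nearest subset. This is a sketch under stated assumptions: `warm_order` is a hypothetical helper, and the mapping passed to it is a simplified stand-in for the subset table that contains only the three filters named in this thread.

```python
def warm_order(nearest_subset):
    """Order filter names so that every filter comes after its nearest subset.

    `nearest_subset` maps a filter name to the name of its nearest subset
    (or to None). Keys are assumed to be strings; this is an illustrative
    helper, not a Mercurial API.
    """
    def depth(filt):
        # Number of subset links between `filt` and the smallest subset.
        d = 0
        while nearest_subset.get(filt) is not None:
            filt = nearest_subset[filt]
            d += 1
        return d

    names = set(nearest_subset)
    names.update(s for s in nearest_subset.values() if s is not None)
    return sorted(names, key=lambda f: (depth(f), f))


# Simplified table: "base" is a subset of "immutable", which is a subset of "served".
print(warm_order({"served": "immutable", "immutable": "base"}))
# ['base', 'immutable', 'served']
```

Warming in this order means each cache can be built on top of an already warm subset rather than the other way around.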

pulkit accepted this revision.Aug 8 2019, 4:42 PM

This revision is now accepted and ready to land.Aug 8 2019, 4:42 PM

Thanks @marmoute for the review.

spectral added a commit: rHGcdf0e9523de1: branchmap: explicitly warm+write all subsets of the branchmap caches.Aug 8 2019, 6:18 PM

Closed by commit rHGcdf0e9523de1: branchmap: explicitly warm+write all subsets of the branchmap caches (authored by spectral).

This revision was automatically updated to reflect the committed changes.

Revision Contents
Changeset List

			Path	Packages
M			mercurial/localrepo.py (15 lines)

Diff	ID	Description	Created	Lint	Unit
Base		Base
Diff 1	16115		Aug 2 2019, 9:26 PM	★	★
Diff 2	16129		Aug 5 2019, 9:07 PM	★	★
Diff 3	16163	rHGcdf0e9523de12a98b9395192b7a25108a7d0b36d	Aug 5 2019, 4:31 PM	★	★

Commit	Parents	Author	Summary	Date
f5e0fea188ed	e079e001d536	Kyle Lippincott		Aug 2 2019, 8:55 PM

Status	Author	Revision
Abandoned	spectral	D6711 branchheads: store wdir-dependent caches in wcache (issue6181)
Closed	spectral	D6710 branchmap: explicitly warm+write all subsets of the branchmap caches
Abandoned	spectral	D6719 branchmap: refresh all "heads" of the branchmap subsets

Diff 16115

mercurial/localrepo.py

	up-to-date data. Even the ones usually loaded more lazily.			up-to-date data. Even the ones usually loaded more lazily.
	"""			"""
	if tr is not None and tr.hookargs.get('source') == 'strip':			if tr is not None and tr.hookargs.get('source') == 'strip':
	# During strip, many caches are invalid but			# During strip, many caches are invalid but
	# later call to `destroyed` will refresh them.			# later call to `destroyed` will refresh them.
	return			return

	if tr is None or tr.changes['origrepolen'] < len(self):			if tr is None or tr.changes['origrepolen'] < len(self):
	# accessing the 'served' branchmap should refresh all the others,			# There are three "heads" to the cache hierarchy: visible,
				# visible-hidden, and served.hidden. Updating any of these three
				# should cause all of the others (currently: served, immutable,
				# base) that are stale to be updated.
	self.ui.debug('updating the branch cache\n')			self.ui.debug('updating the branch cache\n')
	self.filtered('served').branchmap()			for filt in ['visible', 'visible-hidden', 'served.hidden']:
				marmoute (Done): Should we store this list explicitly, next to the filtermap? That would seem more robust to future changes.
	self.filtered('served.hidden').branchmap()			self.filtered(filt).branchmap()

	if full:			if full:
	unfi = self.unfiltered()			unfi = self.unfiltered()
	rbc = unfi.revbranchcache()			rbc = unfi.revbranchcache()
	for r in unfi.changelog:			for r in unfi.changelog:
	rbc.branchinfo(r)			rbc.branchinfo(r)
	rbc.write()			rbc.write()

	# ensure the working copy parents are in the manifestfulltextcache			# ensure the working copy parents are in the manifestfulltextcache
	for ctx in self['.'].parents():			for ctx in self['.'].parents():
	ctx.manifest() # accessing the manifest is enough			ctx.manifest() # accessing the manifest is enough

	# accessing fnode cache warms the cache			# accessing fnode cache warms the cache
	tagsmod.fnoderevs(self.ui, unfi, unfi.changelog.revs())			tagsmod.fnoderevs(self.ui, unfi, unfi.changelog.revs())
	# accessing tags warm the cache			# accessing tags warm the cache
	self.tags()			self.tags()
	self.filtered('served').tags()			self.filtered('served').tags()

				# Warm the branchmap caches even for caches we haven't needed yet,
				# including forcing a write to disk.
				for filt in repoview.filtertable.keys():
				filtered = self.filtered(filt)
				filtered.branchmap().write(filtered)
				marmoute (Not Done): Why the explicit write here? We don't seem to need it for the previous section. Is it because the write would be skipped if the cache of the previous subset is valid? If so, consider clarifying that in your comment.
				spectral (Author, Done): Actually it's because the documentation for the function states that it will "warm the caches", "even the ones usually loaded more lazily". If nothing in hg actually explicitly requests the subset, it won't be written. After `hg init; echo hi > foo; hg ci -qAm foo`, `ls .hg/cache` shows: branch2-served, evoext-obscache-00, rbc-names-v1, rbc-revs-v1. This would have, I'd thought, written out -served, -immutable, and -base, since -immutable and -base are subsets of -served, but that doesn't seem to happen. Even if I run `hg debugupdatecache` (before this change) they don't get written; the directory then contains: branch2-served, evoext-obscache-00, hgtagsfnodes1, rbc-names-v1, rbc-revs-v1, tags2, tags2-served. If the intent of `hg debugupdatecache` is to actually warm all levels of cache, it should probably warm -immutable and -base, so that they're kept up to date? Or is that undesirable for some reason? Maybe it causes additional computation every time the cache for -served is updated if -immutable and -base exist, since they'd also possibly have to be updated. I'd think it'd be the opposite: -base is very cheap to calculate and unlikely to go stale, it can be used to make calculating -immutable quicker, and that can be used to make calculating -served quicker. Without them, -served has to start from scratch each time; this seems to be the reason for the subsettable :). But I'm not familiar enough with the caching code (and uses of it) to know if this is actually true in practice. That said, I agree that these are two separate concerns, and the number of tests that need to be changed is pretty significant for this one, so I've split this change into two.

	def invalidatecaches(self):			def invalidatecaches(self):

	if r'_tagscache' in vars(self):			if r'_tagscache' in vars(self):
	# can't use delattr on proxy			# can't use delattr on proxy
	del self.__dict__[r'_tagscache']			del self.__dict__[r'_tagscache']

	self._branchcaches.clear()			self._branchcaches.clear()
	self.invalidatevolatilesets()			self.invalidatevolatilesets()