This is an archive of the discontinued Mercurial Phabricator instance.

Differential D4642

localrepo: iteratively derive local repository type
Closed · Public

Authored by indygreg on Sep 18 2018, 6:40 PM.


Details

Reviewers

None

Group Reviewers

hg-reviewers

Commits

rHGe4e881572382: localrepo: iteratively derive local repository type

Summary

This commit implements the dynamic local repository type derivation
that was explained in the recent commit
bfeab472e3c0 "localrepo: create new function for instantiating a local
repo object."

Instead of a static localrepository class/type which must be customized
after construction, we now dynamically construct a type by building up
base classes/types to represent specific repository interfaces.

Conceptually, the end state is similar to what was happening when
various extensions would monkeypatch the class of newly-constructed
repo instances. However, the approach is inverted. Instead of making
the instance and then customizing it, we do the customization up front
by influencing the behavior of the type, and then we instantiate that
custom type.

This approach gives us much more flexibility. We can use completely
separate classes for implementing different aspects of the repository.
For example, we could have one class representing revlog-based file
storage and another representing non-revlog based file storage. We
then choose which implementation to use based on the presence of repo
requirements.
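To make the requirement-driven selection concrete, here is a minimal sketch of a file storage factory choosing between two interchangeable classes. The class names, the `pick_file_storage` function, and the `exp-alt-storage` requirement are hypothetical stand-ins, not Mercurial code; only the shape (a factory that receives `requirements` and returns a type rather than an instance) follows `makefilestorage()` in the diff.

```python
class RevlogFileStorage(object):
    """Hypothetical stand-in for revlog-backed file storage."""

    def file(self, path):
        # Like the diff, drop a single leading slash before lookup.
        if path.startswith('/'):
            path = path[1:]
        return ('revlog', path)


class AltFileStorage(object):
    """Hypothetical stand-in for a non-revlog storage backend."""

    def file(self, path):
        if path.startswith('/'):
            path = path[1:]
        return ('alt', path)


def pick_file_storage(requirements, **kwargs):
    """Return a storage *type* (not an instance) based on requirements."""
    # 'exp-alt-storage' is an invented requirement name for illustration.
    if 'exp-alt-storage' in requirements:
        return AltFileStorage
    return RevlogFileStorage


print(pick_file_storage({'revlogv1', 'store'}).__name__)        # RevlogFileStorage
print(pick_file_storage({'exp-alt-storage'})().file('/a.txt'))  # ('alt', 'a.txt')
```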

A concern with this approach is that it creates a lot more types
and complexity and that complexity adds overhead. Yes, it is true that
this approach will result in more types being created. Yes, this is
more complicated than traditional "instantiate a static type." However,
I believe the alternatives to supporting alternate storage backends
are just as complicated. (Before I arrived at this solution, I had
patches storing factory functions on local repo instances for e.g.
constructing a file storage instance. We ended up having a handful
of these. And this was logically identical to assigning custom
methods. Since we were logically changing the type of the instance,
I figured it would be better to just use specialized types instead
of introducing levels of abstraction at run-time.)

On the performance front, I don't believe that having N base classes
has any significant performance overhead compared to just a single base
class. Intuition says that Python will need to iterate the base classes
to find an attribute. However, CPython caches method lookups: as long as
the class or MRO isn't changing, method attribute lookup should be
constant time after first access. And non-method attributes are stored
in `__dict__`, of which there is only one per object, so the number of
base classes is irrelevant for `__dict__`.
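The lookup structure described above can be seen with plain Python: a type built from many bases resolves methods through its MRO, while instance data sits in a single per-object `__dict__`. This is an illustrative sketch with made-up mixin names; it shows where attributes live, not how fast CPython's method cache makes the lookup.

```python
def make_method(i):
    # Return a method that reports which mixin defined it.
    def method(self):
        return i
    return method


# Ten hypothetical mixins, each contributing one method.
mixins = [type('Mixin%d' % i, (object,), {'method%d' % i: make_method(i)})
          for i in range(10)]

Derived = type('Derived', tuple(mixins), {})
obj = Derived()
obj.answer = 42

# Methods are found by walking Derived.__mro__: Derived, 10 mixins, object.
print(len(Derived.__mro__))    # 12
print(obj.method7())           # 7

# Instance attributes live in the single per-object __dict__, regardless of
# how many base classes the type has.
print(vars(obj))               # {'answer': 42}
print('method7' in vars(obj))  # False
```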

Anyway, this commit splits up the monolithic completelocalrepository
interface into sub-interfaces: 1 for file storage and 1 representing
everything else.

We've taught `makelocalrepository()` to call a series of factory
functions which will produce types implementing specific interfaces.
It then calls type() to create a new type from the built-up list of
base types.
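A stripped-down sketch of the loop that `makelocalrepository()` now runs may help. It uses `str` instead of bytes, invented `MainRepo`/`RevlogFileStorage` stand-ins rather than Mercurial's real classes, a hypothetical `makerepo()` driver, and passes far less state than the diff does; the mechanics are roughly the same, though: ask each factory for a type, check that it really is a type, then hand the collected bases to `type()`.

```python
class MainRepo(object):
    """Hypothetical stand-in for the main repository class."""

    def __init__(self, root, requirements):
        self.root = root
        self.requirements = requirements


class RevlogFileStorage(object):
    """Hypothetical stand-in for revlog-based file storage."""

    def file(self, path):
        if path.startswith('/'):
            path = path[1:]
        return 'filelog:' + path


def makemain(**kwargs):
    return MainRepo


def makefilestorage(requirements, **kwargs):
    return RevlogFileStorage


# (interface label, factory) pairs, consulted in order.
REPO_FACTORIES = [
    ('main', makemain),
    ('filestorage', makefilestorage),
]


def makerepo(root, requirements):
    bases = []
    for iface, fn in REPO_FACTORIES:
        # Every factory receives the same keyword state, plus the bases so far.
        typ = fn(requirements=requirements, baseclasses=bases)
        if not isinstance(typ, type):
            raise TypeError('unable to construct type for %s' % iface)
        bases.append(typ)

    # type() accepts names that are not valid Python identifiers.
    name = 'derivedrepo:%s<%s>' % (root, ','.join(sorted(requirements)))
    cls = type(name, tuple(bases), {})
    return cls(root, requirements)


repo = makerepo('/tmp/repo', {'store', 'revlogv1'})
print(type(repo).__name__)  # derivedrepo:/tmp/repo<revlogv1,store>
print(repo.file('/a.txt'))  # filelog:a.txt
```

Since `MainRepo` comes first in the bases, its `__init__` is the one the derived type uses; the storage mixin only contributes methods.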

This commit should be considered a start and not the end state. I
suspect we'll hit a number of problems as we start to implement
alternate storage backends:

Passing custom arguments to `__init__` and setting custom attributes on `__dict__`.
Customizing the set of interfaces that are needed. e.g. the "readonly" intent could translate to not requesting an interface providing methods related to writing.
More ergonomic way for extensions to insert themselves so their callbacks aren't unconditionally called.
Wanting to modify vfs instances, other arguments passed to `__init__`.

That being said, this code is usable in its current state and I'm
convinced future commits will demonstrate the value in this approach.

Diff Detail

Repository

rHG Mercurial

Lint

Automatic diff as part of commit; lint not applicable.

Unit

Automatic diff as part of commit; unit tests not applicable.

Event Timeline

indygreg created this revision. Sep 18 2018, 6:40 PM

Herald added a reviewer: hg-reviewers. Sep 18 2018, 6:40 PM

Herald added a subscriber: mercurial-devel.

indygreg added a child revision: D4643: filelog: custom filelog to be used with narrow repos. Sep 18 2018, 6:40 PM

+ Extensions should wrap these factory functions to customize repository type
+ creation. Note that an extension's wrapped function may be called even if
+ that extension is not loaded for the repo being constructed. Extensions
+ should check if their `__name__` appears in the
+ `extensionmodulenames` set passed to the factory function and no-op if
+ not.

I assume this will be revisited later. I think it's a source of bugs to
rely on extensions to check if they are enabled. It's also cumbersome
to wrap a function referenced from another table.
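Both the docstring's advice and the concern about tables show up in a small, self-contained sketch. Everything here is hypothetical: `core` stands in for a module, `wrapfunction` is a simplified imitation of a function-wrapping helper rather than `mercurial.extensions.wrapfunction`, and `EXTNAME` stands in for the extension's `__name__`.

```python
import types

# Hypothetical stand-in for a core module that exposes a factory function.
core = types.SimpleNamespace()


class RevlogFileStorage(object):
    def file(self, path):
        return 'revlog:' + path


def makefilestorage(requirements, **kwargs):
    return RevlogFileStorage


core.makefilestorage = makefilestorage

# A table that captured the function before any extension wrapped it.
REPO_FACTORIES = [core.makefilestorage]


def wrapfunction(container, funcname, wrapper):
    """Minimal stand-in for a wrapfunction-style helper (not Mercurial's)."""
    orig = getattr(container, funcname)

    def wrapped(*args, **kwargs):
        return wrapper(orig, *args, **kwargs)

    setattr(container, funcname, wrapped)


# --- Code that would live in a hypothetical extension module ---
EXTNAME = 'hgext_myext'  # a real extension would compare against __name__


def extmakefilestorage(orig, requirements, extensionmodulenames, **kwargs):
    typ = orig(requirements=requirements,
               extensionmodulenames=extensionmodulenames, **kwargs)
    # The wrapper runs for every repo in the process, so no-op unless this
    # extension is enabled for the repo being constructed.
    if EXTNAME not in extensionmodulenames:
        return typ

    class myextfilestorage(typ):
        def file(self, path):
            return 'myext:' + super(myextfilestorage, self).file(path)

    return myextfilestorage


wrapfunction(core, 'makefilestorage', extmakefilestorage)

on = core.makefilestorage(requirements=set(), extensionmodulenames={EXTNAME})
off = core.makefilestorage(requirements=set(), extensionmodulenames=set())
print(on().file('a'))                        # myext:revlog:a
print(off is RevlogFileStorage)              # True
print(REPO_FACTORIES[0] is makefilestorage)  # True: the table is unaffected
```

The last line illustrates the table problem: wrapping the module attribute does not change a list that already holds a reference to the original function, so the wrapper never runs for code that iterates that list.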

Closed by commit rHGe4e881572382: localrepo: iteratively derive local repository type (authored by indygreg). Sep 23 2018, 5:38 AM

This revision was automatically updated to reflect the committed changes.

In D4642#71269, @yuja wrote:

+ Extensions should wrap these factory functions to customize repository type
+ creation. Note that an extension's wrapped function may be called even if
+ that extension is not loaded for the repo being constructed. Extensions
+ should check if their `__name__` appears in the
+ `extensionmodulenames` set passed to the factory function and no-op if
+ not.

I assume this will be revisited later. I think it's a source of bugs to
rely on extensions to check if they are enabled. It's also cumbersome
to wrap a function referenced from another table.

I agree it is not great and I hope to revisit this problem.

FWIW we have similar problems with extensions.wrapfunction(). Many function wrappings bury their head in the sand with regards to universal function wrapping in multi-repo contexts. e.g. in hgweb, repo A can load an extension which wraps a function. Repo B doesn't want that extension loaded but the process still has the function wrapped. I'm not sure how to best handle this :/

In D4642#71399, @indygreg wrote:

I agree it is not great and I hope to revisit this problem.
FWIW we have similar problems with extensions.wrapfunction(). Many function wrappings bury their head in the sand with regards to universal function wrapping in multi-repo contexts. e.g. in hgweb, repo A can load an extension which wraps a function. Repo B doesn't want that extension loaded but the process still has the function wrapped. I'm not sure how to best handle this :/

I like the @decorator style registration that evolve does. I wonder if that can be combined with the repository "features" functionality, such that the extension can still globally wrap functions, and add features in, say, reposetup(). The decorator could be declared with a feature that needs to be present, and the backing code would check for it by default before calling either the wrapper or orig(). I guess one problem is that not every function takes a repo. But without a repo, I'm not sure how any decision can be made outside of individual extensions.

Even if this only covers 75% of the cases, it at least gives visibility to the problem to extension authors. I didn't realize it was a problem until I ran into it with LFS.
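The decorator idea above might look roughly like the following. This is a sketch under stated assumptions, not evolve's or Mercurial's API: `wrapwithfeature` is an invented name, `core` is a stand-in module, and it assumes each repo carries a set-like `features` attribute that an extension populates (for example from `reposetup()`). As the comment notes, it only works for functions whose first argument is a repo.

```python
import functools
import types

# Hypothetical stand-in for a core module with a function taking a repo.
core = types.SimpleNamespace()


def status(repo):
    return 'core-status'


core.status = status


def wrapwithfeature(container, funcname, feature):
    """Hypothetical decorator: wrap container.funcname, but only invoke the
    wrapper when the repo passed as the first argument has ``feature``."""
    def decorator(wrapper):
        orig = getattr(container, funcname)

        @functools.wraps(orig)
        def dispatch(repo, *args, **kwargs):
            # Assumes repos expose a set-like ``features`` attribute.
            if feature in getattr(repo, 'features', ()):
                return wrapper(orig, repo, *args, **kwargs)
            return orig(repo, *args, **kwargs)

        setattr(container, funcname, dispatch)
        return wrapper
    return decorator


@wrapwithfeature(core, 'status', 'lfs')
def lfsstatus(orig, repo):
    return 'lfs+' + orig(repo)


repoa = types.SimpleNamespace(features={'lfs'})  # extension enabled here
repob = types.SimpleNamespace(features=set())    # ...but not here
print(core.status(repoa))  # lfs+core-status
print(core.status(repob))  # core-status
```

The function is still wrapped process-wide, as in the hgweb scenario above, but the feature check moves out of each wrapper and into one shared dispatch path.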

That's a good idea to use decorators for this problem space! I'll think about that as I write more code for instantiating repo types...

Revision Contents
Changeset List

M	mercurial/localrepo.py (115 lines)
M	mercurial/repository.py (27 lines)
M	mercurial/statichttprepo.py (3 lines)
M	tests/test-check-interfaces.py (4 lines)

Status	Author	Revision
Closed	indygreg	D4649 narrow: remove narrowrevlog
Closed	indygreg	D4648 localrepo: enable ellipsis flag on revlogs when repo is narrow
Closed	indygreg	D4647 revlog: add opener option to enable ellipsis flag processor
Closed	indygreg	D4646 revlog: store flag processors per revlog
Closed	indygreg	D4645 revlog: define ellipsis flag processors in core
Closed	indygreg	D4644 narrow: remove custom filelog type
Closed	indygreg	D4643 filelog: custom filelog to be used with narrow repos
Closed	indygreg	D4642 localrepo: iteratively derive local repository type
Closed	indygreg	D4641 localrepo: pass root manifest into manifestlog.__init__

Diff 11268

mercurial/localrepo.py

 # is capable of opening. Functions will typically add elements to the
 # set to reflect that the extension knows how to handle that requirements.
 featuresetupfuncs = set()

 def makelocalrepository(baseui, path, intents=None):
     """Create a local repository object.

     Given arguments needed to construct a local repository, this function
-    derives a type suitable for representing that repository and returns an
-    instance of it.
+    performs various early repository loading functionality (such as
+    reading the ``.hg/requires`` and ``.hg/hgrc`` files), validates that
+    the repository can be opened, derives a type suitable for representing
+    that repository, and returns an instance of it.

     The returned object conforms to the ``repository.completelocalrepository``
     interface.
+
+    The repository type is derived by calling a series of factory functions
+    for each aspect/interface of the final repository. These are defined by
+    ``REPO_INTERFACES``.
+
+    Each factory function is called to produce a type implementing a specific
+    interface. The cumulative list of returned types will be combined into a
+    new type and that type will be instantiated to represent the local
+    repository.
+
+    The factory functions each receive various state that may be consulted
+    as part of deriving a type.
+
+    Extensions should wrap these factory functions to customize repository type
+    creation. Note that an extension's wrapped function may be called even if
+    that extension is not loaded for the repo being constructed. Extensions
+    should check if their ``__name__`` appears in the
+    ``extensionmodulenames`` set passed to the factory function and no-op if
+    not.
     """
     ui = baseui.copy()
     # Prevent copying repo configuration.
     ui.copy = baseui.copy

     # Working directory VFS rooted at repository root.
     wdirvfs = vfsmod.vfs(path, expandpath=True, realpath=True)
...
     # process any new extensions that it may have pulled in.
     try:
         ui.readconfig(hgvfs.join(b'hgrc'), root=wdirvfs.base)
     except IOError:
         pass
     else:
         extensions.loadall(ui)

+    # Set of module names of extensions loaded for this repository.
+    extensionmodulenames = {m.__name__ for n, m in extensions.extensions(ui)}
+
     supportedrequirements = gathersupportedrequirements(ui)

     # We first validate the requirements are known.
     ensurerequirementsrecognized(requirements, supportedrequirements)

     # Then we validate that the known set is reasonable to use together.
     ensurerequirementscompatible(ui, requirements)
...
     storevfs = store.vfs
     storevfs.options = resolvestorevfsoptions(ui, requirements)

     # The cache vfs is used to manage cache files.
     cachevfs = vfsmod.vfs(cachepath, cacheaudited=True)
     cachevfs.createmode = store.createmode

-    return localrepository(
+    # Now resolve the type for the repository object. We do this by repeatedly
+    # calling a factory function to produces types for specific aspects of the
+    # repo's operation. The aggregate returned types are used as base classes
+    # for a dynamically-derived type, which will represent our new repository.
+
+    bases = []
+    extrastate = {}
+
+    for iface, fn in REPO_INTERFACES:
+        # We pass all potentially useful state to give extensions tons of
+        # flexibility.
+        typ = fn(ui=ui,
+                 intents=intents,
+                 requirements=requirements,
+                 wdirvfs=wdirvfs,
+                 hgvfs=hgvfs,
+                 store=store,
+                 storevfs=storevfs,
+                 storeoptions=storevfs.options,
+                 cachevfs=cachevfs,
+                 extensionmodulenames=extensionmodulenames,
+                 extrastate=extrastate,
+                 baseclasses=bases)
+
+        if not isinstance(typ, type):
+            raise error.ProgrammingError('unable to construct type for %s' %
+                                         iface)
+
+        bases.append(typ)
+
+    # type() allows you to use characters in type names that wouldn't be
+    # recognized as Python symbols in source code. We abuse that to add
+    # rich information about our constructed repo.
+    name = pycompat.sysstr(b'derivedrepo:%s<%s>' % (
+        wdirvfs.base,
+        b','.join(sorted(requirements))))
+
+    cls = type(name, tuple(bases), {})
+
+    return cls(
         baseui=baseui,
         ui=ui,
         origroot=path,
         wdirvfs=wdirvfs,
         hgvfs=hgvfs,
         requirements=requirements,
         supportedrequirements=supportedrequirements,
         sharedpath=storebasepath,
...
         options[b'maxchainlen'] = maxchainlen

     for r in requirements:
         if r.startswith(b'exp-compression-'):
             options[b'compengine'] = r[len(b'exp-compression-'):]

     return options

-@interfaceutil.implementer(repository.completelocalrepository)
+def makemain(**kwargs):
+    """Produce a type conforming to ``ilocalrepositorymain``."""
+    return localrepository
+
+@interfaceutil.implementer(repository.ilocalrepositoryfilestorage)
+class revlogfilestorage(object):
+    """File storage when using revlogs."""
+
+    def file(self, path):
+        if path[0] == b'/':
+            path = path[1:]
+
+        return filelog.filelog(self.svfs, path)
+
+def makefilestorage(requirements, **kwargs):
+    """Produce a type conforming to ``ilocalrepositoryfilestorage``."""
+    return revlogfilestorage
+
+# List of repository interfaces and factory functions for them. Each
+# will be called in order during ``makelocalrepository()`` to iteratively
+# derive the final type for a local repository instance.
+REPO_INTERFACES = [
+    (repository.ilocalrepositorymain, makemain),
+    (repository.ilocalrepositoryfilestorage, makefilestorage),
+]
+
+@interfaceutil.implementer(repository.ilocalrepositorymain)
 class localrepository(object):
+    """Main class for representing local repositories.
+
+    All local repositories are instances of this class.
+
+    Constructed on its own, instances of this class are not usable as
+    repository objects. To obtain a usable repository object, call
+    ``hg.repository()``, ``localrepo.instance()``, or
+    ``localrepo.makelocalrepository()``. The latter is the lowest-level.
+    ``instance()`` adds support for creating new repositories.
+    ``hg.repository()`` adds more extension integration, including calling
+    ``reposetup()``. Generally speaking, ``hg.repository()`` should be
+    used.
+    """

     # obsolete experimental requirements:
     #  - manifestv2: An experimental new manifest format that allowed
     #    for stem compression of long paths. Experiment ended up not
     #    being successful (repository sizes went up due to worse delta
     #    chains), and the code was deleted in 4.6.
     supportedformats = {
         'revlogv1',
...
         '''the type of shared repository (None if not shared)'''
         if self.sharedpath != self.path:
             return 'store'
         return None

     def wjoin(self, f, *insidef):
         return self.vfs.reljoin(self.root, f, *insidef)

-    def file(self, f):
-        if f[0] == '/':
-            f = f[1:]
-        return filelog.filelog(self.svfs, f)
-
     def setparents(self, p1, p2=nullid):
         with self.dirstate.parentchange():
             copies = self.dirstate.setparents(p1, p2)
             pctx = self[p1]
             if copies:
                 # Adjust copy records, the dirstate cannot do it, it
                 # requires access to parents manifests. Preserve them
                 # only for entries added to first parent.

mercurial/repository.py

         """Clear caches associated with this collection."""

     def rev(node):
         """Obtain the revision number for a binary node.

         Raises ``error.LookupError`` if the node is not known.
         """

-class completelocalrepository(interfaceutil.Interface):
-    """Monolithic interface for local repositories.
+class ilocalrepositoryfilestorage(interfaceutil.Interface):
+    """Local repository sub-interface providing access to tracked file storage.
+
+    This interface defines how a repository accesses storage for a single
+    tracked file path.
+    """
+
+    def file(f):
+        """Obtain a filelog for a tracked path.
+
+        The returned type conforms to the ``ifilestorage`` interface.
+        """
+
+class ilocalrepositorymain(interfaceutil.Interface):
+    """Main interface for local repositories.

     This currently captures the reality of things - not how things should be.
     """

     supportedformats = interfaceutil.Attribute(
         """Set of requirements that apply to stream clone.

         This is actually a class attribute and is shared among all instances.
...
         pass

     def shared():
         """The type of shared repository or None."""

     def wjoin(f, *insidef):
         """Calls self.vfs.reljoin(self.root, f, *insidef)"""

-    def file(f):
-        """Obtain a filelog for a tracked path.
-
-        The returned type conforms to the ``ifilestorage`` interface.
-        """
-
     def setparents(p1, p2):
         """Set the parent nodes of the working directory."""

     def filectx(path, changeid=None, fileid=None):
         """Obtain a filectx for the given file revision."""

     def getcwd():
         """Obtain the current working directory from the dirstate."""
...
     def listkeys(namespace):
         pass

     def debugwireargs(one, two, three=None, four=None, five=None):
         pass

     def savecommitmessage(text):
         pass
+
+class completelocalrepository(ilocalrepositorymain,
+                              ilocalrepositoryfilestorage):
+    """Complete interface for a local repository."""

mercurial/statichttprepo.py

     return statichttpvfs

 class statichttppeer(localrepo.localpeer):
     def local(self):
         return None
     def canpush(self):
         return False

-class statichttprepository(localrepo.localrepository):
+class statichttprepository(localrepo.localrepository,
+                           localrepo.revlogfilestorage):
     supported = localrepo.localrepository._basesupported

     def __init__(self, ui, path):
         self._url = path
         self.ui = ui

         self.root = path
         u = util.url(path.rstrip('/') + "/.hg")

tests/test-check-interfaces.py

     checkzobject(bundlerepo.bundlepeer(dummyrepo()))

     ziverify.verifyClass(repository.ipeerbase, statichttprepo.statichttppeer)
     checkzobject(statichttprepo.statichttppeer(dummyrepo()))

     ziverify.verifyClass(repository.ipeerbase, unionrepo.unionpeer)
     checkzobject(unionrepo.unionpeer(dummyrepo()))

-    ziverify.verifyClass(repository.completelocalrepository,
+    ziverify.verifyClass(repository.ilocalrepositorymain,
                          localrepo.localrepository)
+    ziverify.verifyClass(repository.ilocalrepositoryfilestorage,
+                         localrepo.revlogfilestorage)
     repo = localrepo.makelocalrepository(ui, rootdir)
     checkzobject(repo)

     ziverify.verifyClass(wireprototypes.baseprotocolhandler,
                          wireprotoserver.sshv1protocolhandler)
     ziverify.verifyClass(wireprototypes.baseprotocolhandler,
                          wireprotoserver.sshv2protocolhandler)
     ziverify.verifyClass(wireprototypes.baseprotocolhandler,
Diff	ID	Description	Created
Diff 1	11176		Sep 18 2018, 6:40 PM
Diff 2	11268	rHGe4e8815723821db0351cbca4fb33647f50f7a9b2	Sep 18 2018, 6:29 PM