This is an archive of the discontinued Mercurial Phabricator instance.

Differential D4366

treemanifest: introduce lazy loading of subdirs
Closed, Public

Authored by spectral on Aug 24 2018, 1:36 AM.


Details

Reviewers

None

Group Reviewers

hg-reviewers

Commits

rHG93486cc46125: treemanifest: introduce lazy loading of subdirs
rHG43af08f3205f: treemanifest: introduce lazy loading of subdirs

Summary

An earlier patch series made what to load the responsibility of the calling code,
which works fine until manifests are copied: when they're copied, they're
loaded completely, and we lose the entire benefit.
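The copying problem can be seen in a toy model. This is a hypothetical Python 3 sketch (`Node`, `eager_copy`, and `lazy_copy` are illustrative names, not Mercurial APIs): copying a tree by visiting every child materializes every pending subtree, whereas handing the pending loaders to the copy keeps it lazy. Note that `copy()` in this revision still calls `_loadalllazy()`, per its own `OPT` comment, so the lazy variant illustrates what that comment wishes for, not what this patch does.

```python
# Toy model (hypothetical, not Mercurial's treemanifest): a node keeps
# already-read children in `children` and unread ones as loader callables
# in `lazy`. `Node.loads` counts how many subtrees were actually read.


class Node:
    loads = 0

    def __init__(self):
        self.children = {}  # name -> Node, already read from storage
        self.lazy = {}      # name -> zero-argument loader, not read yet

    def _load(self, name):
        Node.loads += 1
        self.children[name] = self.lazy.pop(name)()

    def eager_copy(self):
        # Copying by walking every child forces every pending subtree to load.
        for name in list(self.lazy):
            self._load(name)
        c = Node()
        c.children = {k: v.eager_copy() for k, v in self.children.items()}
        return c

    def lazy_copy(self):
        # Copy what is loaded; hand the pending loaders to the copy unchanged.
        c = Node()
        c.children = {k: v.lazy_copy() for k, v in self.children.items()}
        c.lazy = dict(self.lazy)
        return c


def make_root(n):
    root = Node()
    for i in range(n):
        root.lazy['d%d/' % i] = Node  # each loader builds a fresh empty Node
    return root


root = make_root(1000)
root.lazy_copy()
print(Node.loads)  # 0
root.eager_copy()
print(Node.loads)  # 1000
```

Sharing the loaders is safe in this model only because each loader builds a fresh object when called, much as re-reading a subtree from storage would.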

By lazy loading everything, we can avoid having to pass the matcher to ~every
manifest function, and we handle copies correctly as well. This changeset doesn't
go as far as it could in loading only the necessary subsets; that will happen
in later changes in this series. At the moment, except in a few situations, we
just load everything as soon as we want to interact with treemanifest._dirs.
This is thus most likely to be a small slowdown whenever treemanifest is in use,
regardless of whether narrow is in use, but it should be easier to verify for
correctness and to review.
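A minimal Python 3 sketch of the mechanism, assuming a toy in-memory store: parsing records each subdirectory as a `(path, node, readsubtree)` tuple in `_lazydirs`, a single-path lookup reads only the top-level directory it touches (as the diff's `__contains__`, `get`, `flags`, etc. do), and anything that needs the whole tree calls `_loadalllazy()` first. The `_dirs`/`_lazydirs` names mirror the diff; `STORE`, `READS`, and this `readsubtree` are hypothetical stand-ins for revlog storage.

```python
def _splittopdir(f):
    # 'a/b/c' -> ('a/', 'b/c'); 'README' -> ('', 'README')
    if '/' in f:
        dir, subpath = f.split('/', 1)
        return dir + '/', subpath
    return '', f


class LazyTree:
    def __init__(self, dir=''):
        self._dir = dir
        self._dirs = {}      # 'name/' -> LazyTree that has been read
        self._lazydirs = {}  # 'name/' -> (path, node, readsubtree), not read yet
        self._files = {}     # file name -> node

    def _loadlazy(self, d):
        path, node, readsubtree = self._lazydirs.pop(d)
        self._dirs[d] = readsubtree(path, node)

    def _loadalllazy(self):
        for d in list(self._lazydirs):
            self._loadlazy(d)

    def __contains__(self, f):
        dir, subpath = _splittopdir(f)
        if dir:
            if dir in self._lazydirs:
                self._loadlazy(dir)  # read only the one subdir this lookup needs
            return dir in self._dirs and subpath in self._dirs[dir]
        return f in self._files

    def __len__(self):
        self._loadalllazy()  # a full count needs every subdir
        return len(self._files) + sum(len(m) for m in self._dirs.values())


# Hypothetical storage: node -> (files, {subdir: child node}).
STORE = {
    'root': (['README'], {'a/': 'n_a', 'b/': 'n_b'}),
    'n_a': (['x.txt'], {}),
    'n_b': (['y.txt', 'z.txt'], {}),
}
READS = []  # paths actually read, in order


def readsubtree(path, node):
    READS.append(path)
    files, subdirs = STORE[node]
    t = LazyTree(path)
    t._files = dict.fromkeys(files, node)
    for d, child in subdirs.items():
        t._lazydirs[d] = (path + d, child, readsubtree)  # record, don't read
    return t


root = readsubtree('', 'root')
print('a/x.txt' in root, READS)  # True ['', 'a/']
print(len(root), READS)          # 4 ['', 'a/', 'b/']
```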

This is part of a series of speedups. It is not expected to produce any real speed
improvement by itself, but the numbers show that it doesn't impose a large speed
penalty in any common case, and where it does impose one, the absolute cost is
small (even if the percentage is large).

Timing numbers were collected with a command of this form:

hyperfine --prepare <preparation_script> 'hg status'

HGRCPATH points to a file with the following contents:

[extensions]
narrow =
strip =
rebase =

mozilla-unified (called m-u below) was at revision #468856.

      regular hash: eb39298e432d
treemanifests hash: 0553b7f29eaf

large-dir-repo (called l-d-r below) was generated with the following script:

#!/bin/bash
hg init large-dir-repo
mkdir -p large-dir-repo/third_party/rust/log
touch large-dir-repo/third_party/rust/log/foo.txt
for i in $(seq 1 30000); do
    d=$(mktemp -d large-dir-repo/third_party/XXXXXXXXX)
    touch $d/file.txt
done
hg -R large-dir-repo ci -Am 'rev0' --user test --date '0 0'
echo hi > large-dir-repo/third_party/rust/log/bar.txt
hg -R large-dir-repo ci -Am 'rev1' --user test --date '0 0'
echo hi > large-dir-repo/third_party/rust/log/baz.txt
hg -R large-dir-repo ci -Am 'rev2' --user test --date '0 0'

For the repos that use narrow, the narrowspec was:

[include]
rootfilesin:accessible/jsat
rootfilesin:accessible/tests/mochitest/jsat
rootfilesin:mobile/android/chrome/content
rootfilesin:mobile/android/modules/geckoview
rootfilesin:third_party/rust/log
[exclude]

This narrowspec was chosen because of the size of the third_party/rust directory
(which was *not* modified in revision #468856 of mozilla-unified), and it also
covers all the directories that *were* modified in that revision.
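rootfilesin:DIR selects only the files that sit directly in DIR, not files in its subdirectories. The Python 3 sketch below models that, and adds a simplified, hypothetical stand-in for a matcher's directory-visiting check, to show why in large-dir-repo the 30,000 generated siblings under third_party/ could in principle be skipped while third_party/ itself and the path down to third_party/rust/log cannot. It is not Mercurial's matcher implementation; `rootfilesin_matches` and `may_contain_matches` are illustrative names.

```python
import posixpath

# The [include] patterns above, minus their "rootfilesin:" prefix.
INCLUDE = {
    'accessible/jsat',
    'accessible/tests/mochitest/jsat',
    'mobile/android/chrome/content',
    'mobile/android/modules/geckoview',
    'third_party/rust/log',
}


def rootfilesin_matches(path):
    # rootfilesin:DIR matches files directly in DIR, not in its subdirectories.
    return posixpath.dirname(path) in INCLUDE


def may_contain_matches(d):
    # Hypothetical simplified visitdir: a directory is worth descending into
    # only if it is an included dir or an ancestor of one ('' is the root).
    return d == '' or any(inc == d or inc.startswith(d + '/') for inc in INCLUDE)


print(rootfilesin_matches('third_party/rust/log/foo.txt'))    # True
print(rootfilesin_matches('third_party/rust/log/sub/x.txt'))  # False
print(may_contain_matches('third_party/rust'))                # True
print(may_contain_matches('third_party/abcdefghi'))           # False
```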

Importantly, when using narrow, these repos had everything checked out (in the
case of large-dir-repo, that means all 30,001 directories) *before* the
narrowspec was added. This simulates a virtual filesystem that shows everything
to the user even if they haven't added it to the narrowspec yet. This is not a
supported configuration, and `hg update` and `hg rebase` will not really do the
"correct" thing if there are mutations outside of the narrowspec (which is not
the case in these tests, thanks to a carefully crafted narrowspec), but
non-mutating commands should behave correctly.

I'm not claiming anything smaller than a 5% speed win as an improvement due to
this change; those are probably either measurement artifacts or constant-time
improvements. The unchanged numbers are shown primarily to demonstrate that
this doesn't make anything worse in any case I plan to test during this
series.
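The "% of before" column is the after mean divided by the before mean. A small Python sketch (the helper names are hypothetical) that reproduces a few rows from the tables and applies the 5% bar stated above: the first two rows clear it (83.2% and 93.3%), while the rebase row at 95.1% does not.

```python
def pct_of_before(before, after):
    # The tables' last column: the "after" mean as a percentage of "before".
    return 100.0 * after / before


def is_claimed_win(before, after, threshold_pct=5.0):
    # Only a reduction of at least threshold_pct counts as an improvement.
    return pct_of_before(before, after) <= 100.0 - threshold_pct


# Means in seconds, taken from the tables.
rows = {
    'diff -c . --git, m-u, N': (0.4220, 0.3512),
    'diff -c . --git, l-d-r, T': (5.519, 5.149),
    'rebase, l-d-r, N+T': (6.228, 5.924),
}
for name, (b, a) in rows.items():
    print('%-28s %6.1f%%  %s' % (name, pct_of_before(b, a), is_claimed_win(b, a)))
```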

'before' is hg from commit 6268fed3
'N' indicates narrow in use
'T' indicates treemanifest in use

Please note that these commands and the narrowspec are a little different from
the ones in a similar table that I made in a3cabe9415e1.

Important: as I understand it, the numbers below are *not super reliable*. The
large slowdowns may be artifacts of some odd interaction between GC and Python
module/code complexity. Another changeset of mine (D4351) showed large timing
differences when ~empty, uncalled functions were added to match.py, though only
when using --color=never or redirecting to /dev/null. We seem to be on some
cusp of complexity or code size that, at my best guess (based on Linux perf
measurements), changes GC behavior and causes a 200-400ms difference in
timings. I haven't had a chance to replicate these results on another machine.
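One quick way to see which rows sit within run-to-run spread is to express the change in means in units of the combined standard deviation. This is a rough heuristic of my own choosing, not something used for the tables: it ignores the number of runs and assumes independent, roughly normal timings. It also cannot test the GC hypothesis above, because a systematic shift caused by code layout or GC would not show up as extra spread. `separation` is a hypothetical helper.

```python
import math


def separation(before, after):
    """How many combined standard deviations apart two (mean, stdev) results are.

    A rough heuristic only: it ignores the number of runs and assumes the two
    measurements are independent and roughly normal.
    """
    (mb, sb), (ma, sa) = before, after
    return abs(ma - mb) / math.sqrt(sb ** 2 + sa ** 2)


# (mean, stdev) pairs in ms from the tables.
print(round(separation((422.0, 26.0), (351.2, 6.4)), 1))  # 2.6 (m-u, N, diff -c . --git)
print(round(separation((116.8, 1.5), (118.9, 1.6)), 1))   # 1.0 (l-d-r, N+T, diff --git)
```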

diff --git:
repo  | N | T | before (mean +- stdev) | after (mean +- stdev) | % of before
------+---+---+------------------------+-----------------------+------------
m-u   |   |   | 1.580 s +-  0.034 s    | 1.576 s +-  0.022 s   |  99.7%
m-u   |   | x | 1.568 s +-  0.025 s    | 1.584 s +-  0.044 s   | 101.0%
m-u   | x |   | 1.569 s +-  0.031 s    | 1.554 s +-  0.025 s   |  99.0%
m-u   | x | x | 107.3 ms +-   1.6 ms   | 106.3 ms +-   1.5 ms  |  99.1%
l-d-r |   |   | 232.5 ms +-   5.9 ms   | 233.5 ms +-   5.3 ms  | 100.4%
l-d-r |   | x | 236.6 ms +-   6.3 ms   | 233.6 ms +-   7.0 ms  |  98.7%
l-d-r | x |   | 118.4 ms +-   2.1 ms   | 118.4 ms +-   1.4 ms  | 100.0%
l-d-r | x | x | 116.8 ms +-   1.5 ms   | 118.9 ms +-   1.6 ms  | 101.8%

diff -c . --git:
repo  | N | T | before (mean +- stdev) | after (mean +- stdev) | % of before
------+---+---+------------------------+-----------------------+------------
m-u   |   |   | 354.4 ms +-  16.6 ms   | 351.0 ms +-   6.9 ms  |  99.0%
m-u   |   | x | 207.2 ms +-   3.0 ms   | 206.2 ms +-   2.7 ms  |  99.5%
m-u   | x |   | 422.0 ms +-  26.0 ms   | 351.2 ms +-   6.4 ms  |  83.2% <--
m-u   | x | x | 166.7 ms +-   2.1 ms   | 169.5 ms +-   4.1 ms  | 101.7%
l-d-r |   |   | 98.4 ms +-   4.5 ms    | 98.5 ms +-   2.1 ms   | 100.1%
l-d-r |   | x | 5.519 s +-  0.060 s    | 5.149 s +-  0.042 s   |  93.3% <--
l-d-r | x |   | 99.1 ms +-   3.2 ms    | 102.6 ms +-   9.7 ms  | 103.5% <--?
l-d-r | x | x | 994.9 ms +-  10.7 ms   | 1.026 s +-  0.012 s   | 103.1% <--?

rebase -r . --keep -d .^^:
repo  | N | T | before (mean +- stdev) | after (mean +- stdev) | % of before
------+---+---+------------------------+-----------------------+------------
m-u   |   |   | 6.639 s +-  0.168 s    | 6.559 s +-  0.097 s   |  98.8%
m-u   |   | x | 6.601 s +-  0.143 s    | 6.640 s +-  0.207 s   | 100.6%
m-u   | x |   | 6.582 s +-  0.098 s    | 6.543 s +-  0.098 s   |  99.4%
m-u   | x | x | 678.4 ms +-  57.7 ms   | 703.7 ms +-  52.4 ms  | 103.7% <--?
l-d-r |   |   | 780.0 ms +-  23.9 ms   | 776.0 ms +-  12.6 ms  |  99.5%
l-d-r |   | x | 7.520 s +-  0.255 s    | 7.395 s +-  0.044 s   |  98.3%
l-d-r | x |   | 331.9 ms +-  16.5 ms   | 327.0 ms +-   3.4 ms  |  98.5%
l-d-r | x | x | 6.228 s +-  0.113 s    | 5.924 s +-  0.044 s   |  95.1%

status --change . --copies:
repo  | N | T | before (mean +- stdev) | after (mean +- stdev) | % of before
------+---+---+------------------------+-----------------------+------------
m-u   |   |   | 330.8 ms +-   7.2 ms   | 329.0 ms +-   7.1 ms  |  99.5%
m-u   |   | x | 182.9 ms +-   2.7 ms   | 183.5 ms +-   2.7 ms  | 100.3%
m-u   | x |   | 330.0 ms +-   7.6 ms   | 327.1 ms +-   5.4 ms  |  99.1%
m-u   | x | x | 146.2 ms +-   2.4 ms   | 147.1 ms +-   1.3 ms  | 100.6%
l-d-r |   |   | 95.3 ms +-   1.4 ms    | 95.9 ms +-   1.5 ms   | 100.6%
l-d-r |   | x | 5.157 s +-  0.035 s    | 5.166 s +-  0.058 s   | 100.2%
l-d-r | x |   | 99.7 ms +-   3.0 ms    | 100.2 ms +-   4.4 ms  | 100.5%
l-d-r | x | x | 993.6 ms +-  13.1 ms   | 1.025 s +-  0.015 s   | 103.2% <--?

status --copies:
repo  | N | T | before (mean +- stdev) | after (mean +- stdev) | % of before
------+---+---+------------------------+-----------------------+------------
m-u   |   |   | 2.348 s +-  0.031 s    | 2.329 s +-  0.019 s   |  99.2%
m-u   |   | x | 2.337 s +-  0.026 s    | 2.346 s +-  0.034 s   | 100.4%
m-u   | x |   | 2.354 s +-  0.015 s    | 2.342 s +-  0.021 s   |  99.5%
m-u   | x | x | 120.6 ms +-   4.3 ms   | 119.2 ms +-   2.1 ms  |  98.8%
l-d-r |   |   | 731.5 ms +-  11.1 ms   | 719.6 ms +-   9.8 ms  |  98.4%
l-d-r |   | x | 729.0 ms +-  15.5 ms   | 725.7 ms +-  10.6 ms  |  99.5%
l-d-r | x |   | 211.0 ms +-   3.9 ms   | 212.8 ms +-   3.7 ms  | 100.9%
l-d-r | x | x | 211.5 ms +-   4.2 ms   | 211.0 ms +-   3.3 ms  |  99.8%

update $rev^; ~/src/hg/hg{hg}/hg update $rev:
repo  | N | T | before (mean +- stdev) | after (mean +- stdev) | % of before
------+---+---+------------------------+-----------------------+------------
m-u   |   |   | 3.910 s +-  0.055 s    | 3.920 s +-  0.075 s   | 100.3%
m-u   |   | x | 3.613 s +-  0.056 s    | 3.630 s +-  0.056 s   | 100.5%
m-u   | x |   | 3.873 s +-  0.055 s    | 3.864 s +-  0.049 s   |  99.8%
m-u   | x | x | 400.4 ms +-   7.4 ms   | 403.6 ms +-   5.0 ms  | 100.8%
l-d-r |   |   | 531.6 ms +-  10.0 ms   | 528.8 ms +-   9.6 ms  |  99.5%
l-d-r |   | x | 10.377 s +-  0.049 s   | 9.955 s +-  0.046 s   |  95.9%
l-d-r | x |   | 308.3 ms +-   4.4 ms   | 306.8 ms +-   3.7 ms  |  99.5%
l-d-r | x | x | 1.805 s +-  0.015 s    | 1.834 s +-  0.020 s   | 101.6%

Diff Detail

Repository

rHG Mercurial

Lint

Lint Skipped

Unit

Unit Tests Skipped

Event Timeline

spectral created this revision. Aug 24 2018, 1:36 AM

Herald added a reviewer: hg-reviewers. Aug 24 2018, 1:36 AM

Herald added a subscriber: mercurial-devel.

spectral added a child revision: D4367: treemanifest: attempt to avoid loading all lazily-loaded subdirs in _isempty. Aug 24 2018, 1:36 AM

I'm not sure how I feel about so many methods having the if dir in self._lazydirs: self._loadlazy(dir) pattern.

On one hand, action at a distance when it involves caching can be dangerous. And doing the lookup inline will avoid a Python function call.

On the other, it seems very redundant.

Overall I'm OK with the patch. I just feel like using a collections.defaultdict or implementing __missing__ to automagically resolve missing keys might work out better. Do you think this is a reasonable request?

In D4366#67144, @indygreg wrote:

I'm not sure how I feel about so many methods having the if dir in self._lazydirs: self._loadlazy(dir) pattern.
On one hand, action at a distance when it involves caching can be dangerous. And doing the lookup inline will avoid a Python function call.
On the other, it seems very redundant.
Overall I'm OK with the patch. I just feel like using a collections.defaultdict or implementing __missing__ to automagically resolve missing keys might work out better. Do you think this is a reasonable request?

I think it's a reasonable request, but I'm not sure it's feasible with how collections.defaultdict/__missing__ work. They are *only* invoked by __getitem__. We'd need a separate dict subclass for _dirs that handles __contains__ as well, at the very least, for all the cases of if dir not in self._dirs: return <something>

I'm playing around with such a thing; it's not difficult, but I think it might be more awkward than helpful. Even if that doesn't bear fruit, while implementing it I noticed something: I'm not actually populating self._lazydirs in this patch; I appear to have lost it during split/merge/whatever. Oops. I have to hope that the numbers in the commit description were taken before I lost this, since otherwise it's completely incomprehensible that this patch would produce a performance benefit (well, maybe not: if dead code can cause a performance loss, why not the other way around?)
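The point about __missing__ can be checked directly. In this Python 3 sketch (`LazyDirs` is a hypothetical name), __missing__ fires only through d[key]: an `in` test like the `if dir not in self._dirs` guards in the diff, or a .get(), silently reports the pending subdirectory as absent. collections.defaultdict has the same limitation, and its default_factory is called with no arguments, so it could not tell which subdirectory to load anyway.

```python
class LazyDirs(dict):
    """Hypothetical dict that reads a pending subdir the first time d[key] is used."""

    def __init__(self, pending):
        super().__init__()
        self.pending = dict(pending)  # key -> zero-argument loader

    def __missing__(self, key):
        # dict.__getitem__ calls this for absent keys; `in` and .get() do not.
        if key not in self.pending:
            raise KeyError(key)
        value = self.pending.pop(key)()
        self[key] = value
        return value


d = LazyDirs({'rust/': lambda: 'tree for rust/'})
print('rust/' in d)    # False: the `in` check never reaches __missing__
print(d.get('rust/'))  # None: dict.get does not call it either
print(d['rust/'])      # tree for rust/
print('rust/' in d)    # True, now that d['rust/'] has loaded it
```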

spectral edited the summary of this revision. Sep 6 2018, 3:01 PM

spectral updated this revision to Diff 10815.

In D4366#67153, @spectral wrote:

I think it's a reasonable request, but I'm not sure it's feasible with how collections.defaultdict/__missing__ work. They are *only* invoked by __getitem__. We'd need a separate dict subclass for _dirs that handles __contains__ as well, at the very least, for all the cases of if dir not in self._dirs: return <something>
I'm playing around with such a thing; it's not difficult, but I think it might be more awkward than helpful. Even if that doesn't bear fruit, while implementing it I noticed something: I'm not actually populating self._lazydirs in this patch; I appear to have lost it during split/merge/whatever. Oops. I have to hope that the numbers in the commit description were taken before I lost this, since otherwise it's completely incomprehensible that this patch would produce a performance benefit (well, maybe not: if dead code can cause a performance loss, why not the other way around?)

It turns out that it got quite messy and caused a significant performance loss, though at least some of that difference was likely something I did wrong. To handle the two dictionaries, and the fact that entries migrate from one to the other, I needed to override __missing__, get, __setitem__, __nonempty__, __iter__, items, iteritems, values, ... I basically had to reimplement all of dict, and that felt a fair amount messier than the duplication here. If you think that might still be preferable, I can try harder to figure out why there's a significant (~15-30% of full runtime on some of the commands) performance difference.
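Why the override list keeps growing can be seen in a small hypothetical Python 3 sketch (`TwoLevelDirs` is an illustrative name, not the code spectral tried): once __contains__ and get are made to see pending entries, len() and iteration still report only the loaded half, and __setitem__, __delitem__, items() and values() would all need the same treatment.

```python
class TwoLevelDirs(dict):
    """Hypothetical: loaded entries live in the dict itself, unread ones in `pending`."""

    def __init__(self, pending):
        super().__init__()
        self.pending = dict(pending)  # key -> zero-argument loader

    def __missing__(self, key):
        if key not in self.pending:
            raise KeyError(key)
        value = self.pending.pop(key)()
        dict.__setitem__(self, key, value)
        return value

    def __contains__(self, key):
        # Report pending keys as present without reading them.
        return dict.__contains__(self, key) or key in self.pending

    def get(self, key, default=None):
        return self[key] if key in self else default


d = TwoLevelDirs({'a/': lambda: 'A', 'b/': lambda: 'B'})
print('a/' in d, d.get('b/'))  # True B
print(len(d), sorted(d))       # 1 ['b/']: len() and iteration still miss 'a/'
```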

Closed by commit rHG43af08f3205f: treemanifest: introduce lazy loading of subdirs (authored by spectral). Sep 10 2018, 2:45 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents
Changeset List

			Path	Packages
M			mercurial/manifest.py (77 lines)

Diff	ID	Description	Created	Lint	Unit
Base		Base
Diff 1	10544		Aug 24 2018, 1:36 AM	★	★
Diff 2	10815		Sep 6 2018, 3:01 PM	★	★
Diff 3	10870	rHG43af08f3205f7ba1743c444d6c57c62f9c6b7ce8	Aug 16 2018, 3:31 PM	★	★

Commit	Parents	Author	Summary	Date
		spectral		Aug 16 2018, 3:31 PM

Status	Author	Revision
Closed	spectral	D4371 treemanifest: use visitchildrenset when doing a walk
Closed	spectral	D4370 treemanifest: use visitchildrenset when filtering a manifest to a matcher
Closed	spectral	D4369 treemanifest: avoid loading everything just to get their nodeid
Closed	spectral	D4368 treemanifest: avoid unnecessary copies/processing when using alwaysmatcher
Closed	spectral	D4367 treemanifest: attempt to avoid loading all lazily-loaded subdirs in _isempty
Closed	spectral	D4366 treemanifest: introduce lazy loading of subdirs
Closed	spectral	D4365 match: make exactmatcher.visitchildrenset return file children as well
Closed	spectral	D4364 match: document that visitchildrenset might return files
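In the diff that follows, dirtext() and hasdir() are the places that use _lazydirs without loading it: a pending subdirectory already carries its node, which is all dirtext() needs to emit its 't' row, and its presence as a key is all hasdir() needs. A simplified, hypothetical Python 3 sketch with plain dicts standing in for treemanifest state; here `dirs` maps straight to a node rather than to a treemanifest object, and the loader raises if anything tries to read the subtree.

```python
def dirtext_rows(files, flags, dirs, lazydirs):
    """Hypothetical sketch of the rows treemanifest.dirtext() sorts and serializes.

    files:    name -> node; flags: name -> flag string
    dirs:     'sub/' -> node of a subtree that has already been read
    lazydirs: 'sub/' -> (path, node, readsubtree); readsubtree is never called here
    """
    rows = [(f, n, flags.get(f, '')) for f, n in files.items()]
    rows += [(d[:-1], n, 't') for d, n in dirs.items()]
    rows += [(d[:-1], node, 't') for d, (path, node, readsubtree) in lazydirs.items()]
    return sorted(rows)


def hasdir_top(dirs, lazydirs, name):
    # Like the new tail of hasdir(): a pending subdir counts without being read.
    slash = name + '/'
    return slash in dirs or slash in lazydirs


def must_not_read(path, node):
    raise AssertionError('subtree %s was read' % path)


lazy = {'rust/': ('third_party/rust/', 'n_rust', must_not_read)}
print(dirtext_rows({'README': 'n1'}, {}, {'js/': 'n_js'}, lazy))
# [('README', 'n1', ''), ('js', 'n_js', 't'), ('rust', 'n_rust', 't')]
print(hasdir_top({}, lazy, 'rust'))  # True
```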

Diff 10815

mercurial/manifest.py

	class treemanifest(object):			class treemanifest(object):
	def __init__(self, dir='', text=''):			def __init__(self, dir='', text=''):
	self._dir = dir			self._dir = dir
	self._node = nullid			self._node = nullid
	self._loadfunc = _noop			self._loadfunc = _noop
	self._copyfunc = _noop			self._copyfunc = _noop
	self._dirty = False			self._dirty = False
	self._dirs = {}			self._dirs = {}
				self._lazydirs = {}
	# Using _lazymanifest here is a little slower than plain old dicts			# Using _lazymanifest here is a little slower than plain old dicts
	self._files = {}			self._files = {}
	self._flags = {}			self._flags = {}
	if text:			if text:
	def readsubtree(subdir, subm):			def readsubtree(subdir, subm):
	raise AssertionError('treemanifest constructor only accepts '			raise AssertionError('treemanifest constructor only accepts '
	'flat manifests')			'flat manifests')
	self.parse(text, readsubtree)			self.parse(text, readsubtree)
	self._dirty = True # Mark flat manifest dirty after parsing			self._dirty = True # Mark flat manifest dirty after parsing

	def _subpath(self, path):			def _subpath(self, path):
	return self._dir + path			return self._dir + path

				def _loadalllazy(self):
				for k, (path, node, readsubtree) in self._lazydirs.iteritems():
				self._dirs[k] = readsubtree(path, node)
				self._lazydirs = {}

				def _loadlazy(self, d):
				path, node, readsubtree = self._lazydirs[d]
				self._dirs[d] = readsubtree(path, node)
				del self._lazydirs[d]

	def __len__(self):			def __len__(self):
	self._load()			self._load()
	size = len(self._files)			size = len(self._files)
				self._loadalllazy()
	for m in self._dirs.values():			for m in self._dirs.values():
	size += m.__len__()			size += m.__len__()
	return size			return size

	def __nonzero__(self):			def __nonzero__(self):
	# Faster than "__len() != 0" since it avoids loading sub-manifests			# Faster than "__len() != 0" since it avoids loading sub-manifests
	return not self._isempty()			return not self._isempty()

	__bool__ = __nonzero__			__bool__ = __nonzero__

	def _isempty(self):			def _isempty(self):
	self._load() # for consistency; already loaded by all callers			self._load() # for consistency; already loaded by all callers
				self._loadalllazy()
	return (not self._files and (not self._dirs or			return (not self._files and (not self._dirs or
	all(m._isempty() for m in self._dirs.values())))			all(m._isempty() for m in self._dirs.values())))

	def __repr__(self):			def __repr__(self):
	return ('<treemanifest dir=%s, node=%s, loaded=%s, dirty=%s at 0x%x>' %			return ('<treemanifest dir=%s, node=%s, loaded=%s, dirty=%s at 0x%x>' %
	(self._dir, hex(self._node),			(self._dir, hex(self._node),
	bool(self._loadfunc is _noop),			bool(self._loadfunc is _noop),
	self._dirty, id(self)))			self._dirty, id(self)))
	return self._node			return self._node

	def setnode(self, node):			def setnode(self, node):
	self._node = node			self._node = node
	self._dirty = False			self._dirty = False

	def iterentries(self):			def iterentries(self):
	self._load()			self._load()
				self._loadalllazy()
	for p, n in sorted(itertools.chain(self._dirs.items(),			for p, n in sorted(itertools.chain(self._dirs.items(),
	self._files.items())):			self._files.items())):
	if p in self._files:			if p in self._files:
	yield self._subpath(p), n, self._flags.get(p, '')			yield self._subpath(p), n, self._flags.get(p, '')
	else:			else:
	for x in n.iterentries():			for x in n.iterentries():
	yield x			yield x

	def items(self):			def items(self):
	self._load()			self._load()
				self._loadalllazy()
	for p, n in sorted(itertools.chain(self._dirs.items(),			for p, n in sorted(itertools.chain(self._dirs.items(),
	self._files.items())):			self._files.items())):
	if p in self._files:			if p in self._files:
	yield self._subpath(p), n			yield self._subpath(p), n
	else:			else:
	for f, sn in n.iteritems():			for f, sn in n.iteritems():
	yield f, sn			yield f, sn

	iteritems = items			iteritems = items

	def iterkeys(self):			def iterkeys(self):
	self._load()			self._load()
				self._loadalllazy()
	for p in sorted(itertools.chain(self._dirs, self._files)):			for p in sorted(itertools.chain(self._dirs, self._files)):
	if p in self._files:			if p in self._files:
	yield self._subpath(p)			yield self._subpath(p)
	else:			else:
	for f in self._dirs[p]:			for f in self._dirs[p]:
	yield f			yield f

	def keys(self):			def keys(self):
	return list(self.iterkeys())			return list(self.iterkeys())

	def __iter__(self):			def __iter__(self):
	return self.iterkeys()			return self.iterkeys()

	def __contains__(self, f):			def __contains__(self, f):
	if f is None:			if f is None:
	return False			return False
	self._load()			self._load()
	dir, subpath = _splittopdir(f)			dir, subpath = _splittopdir(f)
	if dir:			if dir:
				if dir in self._lazydirs:
				self._loadlazy(dir)

	if dir not in self._dirs:			if dir not in self._dirs:
	return False			return False

	return self._dirs[dir].__contains__(subpath)			return self._dirs[dir].__contains__(subpath)
	else:			else:
	return f in self._files			return f in self._files

	def get(self, f, default=None):			def get(self, f, default=None):
	self._load()			self._load()
	dir, subpath = _splittopdir(f)			dir, subpath = _splittopdir(f)
	if dir:			if dir:
				if dir in self._lazydirs:
				self._loadlazy(dir)

	if dir not in self._dirs:			if dir not in self._dirs:
	return default			return default
	return self._dirs[dir].get(subpath, default)			return self._dirs[dir].get(subpath, default)
	else:			else:
	return self._files.get(f, default)			return self._files.get(f, default)

	def __getitem__(self, f):			def __getitem__(self, f):
	self._load()			self._load()
	dir, subpath = _splittopdir(f)			dir, subpath = _splittopdir(f)
	if dir:			if dir:
				if dir in self._lazydirs:
				self._loadlazy(dir)

	return self._dirs[dir].__getitem__(subpath)			return self._dirs[dir].__getitem__(subpath)
	else:			else:
	return self._files[f]			return self._files[f]

	def flags(self, f):			def flags(self, f):
	self._load()			self._load()
	dir, subpath = _splittopdir(f)			dir, subpath = _splittopdir(f)
	if dir:			if dir:
				if dir in self._lazydirs:
				self._loadlazy(dir)

	if dir not in self._dirs:			if dir not in self._dirs:
	return ''			return ''
	return self._dirs[dir].flags(subpath)			return self._dirs[dir].flags(subpath)
	else:			else:
	if f in self._dirs:			if f in self._lazydirs or f in self._dirs:
	return ''			return ''
	return self._flags.get(f, '')			return self._flags.get(f, '')

	def find(self, f):			def find(self, f):
	self._load()			self._load()
	dir, subpath = _splittopdir(f)			dir, subpath = _splittopdir(f)
	if dir:			if dir:
				if dir in self._lazydirs:
				self._loadlazy(dir)

	return self._dirs[dir].find(subpath)			return self._dirs[dir].find(subpath)
	else:			else:
	return self._files[f], self._flags.get(f, '')			return self._files[f], self._flags.get(f, '')

	def __delitem__(self, f):			def __delitem__(self, f):
	self._load()			self._load()
	dir, subpath = _splittopdir(f)			dir, subpath = _splittopdir(f)
	if dir:			if dir:
				if dir in self._lazydirs:
				self._loadlazy(dir)

	self._dirs[dir].__delitem__(subpath)			self._dirs[dir].__delitem__(subpath)
	# If the directory is now empty, remove it			# If the directory is now empty, remove it
	if self._dirs[dir]._isempty():			if self._dirs[dir]._isempty():
	del self._dirs[dir]			del self._dirs[dir]
	else:			else:
	del self._files[f]			del self._files[f]
	if f in self._flags:			if f in self._flags:
	del self._flags[f]			del self._flags[f]
	self._dirty = True			self._dirty = True

	def __setitem__(self, f, n):			def __setitem__(self, f, n):
	assert n is not None			assert n is not None
	self._load()			self._load()
	dir, subpath = _splittopdir(f)			dir, subpath = _splittopdir(f)
	if dir:			if dir:
				if dir in self._lazydirs:
				self._loadlazy(dir)
	if dir not in self._dirs:			if dir not in self._dirs:
	self._dirs[dir] = treemanifest(self._subpath(dir))			self._dirs[dir] = treemanifest(self._subpath(dir))
	self._dirs[dir].__setitem__(subpath, n)			self._dirs[dir].__setitem__(subpath, n)
	else:			else:
	self._files[f] = n[:21] # to match manifestdict's behavior			self._files[f] = n[:21] # to match manifestdict's behavior
	self._dirty = True			self._dirty = True

	def _load(self):			def _load(self):
	if self._loadfunc is not _noop:			if self._loadfunc is not _noop:
	lf, self._loadfunc = self._loadfunc, _noop			lf, self._loadfunc = self._loadfunc, _noop
	lf(self)			lf(self)
	elif self._copyfunc is not _noop:			elif self._copyfunc is not _noop:
	cf, self._copyfunc = self._copyfunc, _noop			cf, self._copyfunc = self._copyfunc, _noop
	cf(self)			cf(self)

	def setflag(self, f, flags):			def setflag(self, f, flags):
	"""Set the flags (symlink, executable) for path f."""			"""Set the flags (symlink, executable) for path f."""
	self._load()			self._load()
	dir, subpath = _splittopdir(f)			dir, subpath = _splittopdir(f)
	if dir:			if dir:
				if dir in self._lazydirs:
				self._loadlazy(dir)
	if dir not in self._dirs:			if dir not in self._dirs:
	self._dirs[dir] = treemanifest(self._subpath(dir))			self._dirs[dir] = treemanifest(self._subpath(dir))
	self._dirs[dir].setflag(subpath, flags)			self._dirs[dir].setflag(subpath, flags)
	else:			else:
	self._flags[f] = flags			self._flags[f] = flags
	self._dirty = True			self._dirty = True

	def copy(self):			def copy(self):
	copy = treemanifest(self._dir)			copy = treemanifest(self._dir)
	copy._node = self._node			copy._node = self._node
	copy._dirty = self._dirty			copy._dirty = self._dirty
	if self._copyfunc is _noop:			if self._copyfunc is _noop:
	def _copyfunc(s):			def _copyfunc(s):
	self._load()			self._load()
	for d in self._dirs:			# OPT: it'd be nice to not load everything here. Unfortunately
	s._dirs[d] = self._dirs[d].copy()			# this makes a mess of the "dirty" state tracking if we don't.
				self._loadalllazy()
				sdirs = s._dirs
				for d, v in self._dirs.iteritems():
				sdirs[d] = v.copy()
	s._files = dict.copy(self._files)			s._files = dict.copy(self._files)
	s._flags = dict.copy(self._flags)			s._flags = dict.copy(self._flags)
	if self._loadfunc is _noop:			if self._loadfunc is _noop:
	_copyfunc(copy)			_copyfunc(copy)
	else:			else:
	copy._copyfunc = _copyfunc			copy._copyfunc = _copyfunc
	else:			else:
	copy._copyfunc = self._copyfunc			copy._copyfunc = self._copyfunc
	return copy			return copy

	def filesnotin(self, m2, match=None):			def filesnotin(self, m2, match=None):
	'''Set of files in this manifest that are not in the other'''			'''Set of files in this manifest that are not in the other'''
	if match:			if match:
	m1 = self.matches(match)			m1 = self.matches(match)
	m2 = m2.matches(match)			m2 = m2.matches(match)
	return m1.filesnotin(m2)			return m1.filesnotin(m2)

	files = set()			files = set()
	def _filesnotin(t1, t2):			def _filesnotin(t1, t2):
	if t1._node == t2._node and not t1._dirty and not t2._dirty:			if t1._node == t2._node and not t1._dirty and not t2._dirty:
	return			return
	t1._load()			t1._load()
	t2._load()			t2._load()
				t1._loadalllazy()
				t2._loadalllazy()
	for d, m1 in t1._dirs.iteritems():			for d, m1 in t1._dirs.iteritems():
	if d in t2._dirs:			if d in t2._dirs:
	m2 = t2._dirs[d]			m2 = t2._dirs[d]
	_filesnotin(m1, m2)			_filesnotin(m1, m2)
	else:			else:
	files.update(m1.iterkeys())			files.update(m1.iterkeys())

	for fn in t1._files:			for fn in t1._files:

	def dirs(self):			def dirs(self):
	return self._alldirs			return self._alldirs

	def hasdir(self, dir):			def hasdir(self, dir):
	self._load()			self._load()
	topdir, subdir = _splittopdir(dir)			topdir, subdir = _splittopdir(dir)
	if topdir:			if topdir:
				if topdir in self._lazydirs:
				self._loadlazy(topdir)
	if topdir in self._dirs:			if topdir in self._dirs:
	return self._dirs[topdir].hasdir(subdir)			return self._dirs[topdir].hasdir(subdir)
	return False			return False
	return (dir + '/') in self._dirs			dirslash = dir + '/'
				return dirslash in self._dirs or dirslash in self._lazydirs

	def walk(self, match):			def walk(self, match):
	'''Generates matching file names.			'''Generates matching file names.

	Equivalent to manifest.matches(match).iterkeys(), but without creating			Equivalent to manifest.matches(match).iterkeys(), but without creating
	an entirely new manifest.			an entirely new manifest.

	It also reports nonexistent files by marking them bad with match.bad().			It also reports nonexistent files by marking them bad with match.bad().

	def _walk(self, match):			def _walk(self, match):
	'''Recursively generates matching file names for walk().'''			'''Recursively generates matching file names for walk().'''
	if not match.visitdir(self._dir[:-1] or '.'):			if not match.visitdir(self._dir[:-1] or '.'):
	return			return

	# yield this dir's files and walk its submanifests			# yield this dir's files and walk its submanifests
	self._load()			self._load()
				self._loadalllazy()
	for p in sorted(list(self._dirs) + list(self._files)):			for p in sorted(list(self._dirs) + list(self._files)):
	if p in self._files:			if p in self._files:
	fullp = self._subpath(p)			fullp = self._subpath(p)
	if match(fullp):			if match(fullp):
	yield fullp			yield fullp
	else:			else:
	for f in self._dirs[p]._walk(match):			for f in self._dirs[p]._walk(match):
	yield f			yield f
	for fn in self._files:			for fn in self._files:
	fullp = self._subpath(fn)			fullp = self._subpath(fn)
	if not match(fullp):			if not match(fullp):
	continue			continue
	ret._files[fn] = self._files[fn]			ret._files[fn] = self._files[fn]
	if fn in self._flags:			if fn in self._flags:
	ret._flags[fn] = self._flags[fn]			ret._flags[fn] = self._flags[fn]

				# OPT: use visitchildrenset to avoid loading everything
				self._loadalllazy()
	for dir, subm in self._dirs.iteritems():			for dir, subm in self._dirs.iteritems():
	m = subm._matches(match)			m = subm._matches(match)
	if not m._isempty():			if not m._isempty():
	ret._dirs[dir] = m			ret._dirs[dir] = m

	if not ret._isempty():			if not ret._isempty():
	ret._dirty = True			ret._dirty = True
	return ret			return ret
	return m1.diff(m2, clean=clean)			return m1.diff(m2, clean=clean)
	result = {}			result = {}
	emptytree = treemanifest()			emptytree = treemanifest()
	def _diff(t1, t2):			def _diff(t1, t2):
	if t1._node == t2._node and not t1._dirty and not t2._dirty:			if t1._node == t2._node and not t1._dirty and not t2._dirty:
	return			return
	t1._load()			t1._load()
	t2._load()			t2._load()
				# OPT: do we need to load everything?
				t1._loadalllazy()
				t2._loadalllazy()
	for d, m1 in t1._dirs.iteritems():			for d, m1 in t1._dirs.iteritems():
	m2 = t2._dirs.get(d, emptytree)			m2 = t2._dirs.get(d, emptytree)
	_diff(m1, m2)			_diff(m1, m2)

	for d, m2 in t2._dirs.iteritems():			for d, m2 in t2._dirs.iteritems():
	if d not in t1._dirs:			if d not in t1._dirs:
	_diff(emptytree, m2)			_diff(emptytree, m2)


	_diff(self, m2)			_diff(self, m2)
	return result			return result

	def unmodifiedsince(self, m2):			def unmodifiedsince(self, m2):
	return not self._dirty and not m2._dirty and self._node == m2._node			return not self._dirty and not m2._dirty and self._node == m2._node

	def parse(self, text, readsubtree):			def parse(self, text, readsubtree):
				selflazy = self._lazydirs
				subpath = self._subpath
	for f, n, fl in _parse(text):			for f, n, fl in _parse(text):
	if fl == 't':			if fl == 't':
	f = f + '/'			f = f + '/'
	self._dirs[f] = readsubtree(self._subpath(f), n)			selflazy[f] = (subpath(f), n, readsubtree)
	elif '/' in f:			elif '/' in f:
	# This is a flat manifest, so use __setitem__ and setflag rather			# This is a flat manifest, so use __setitem__ and setflag rather
	# than assigning directly to _files and _flags, so we can			# than assigning directly to _files and _flags, so we can
	# assign a path in a subdirectory, and to mark dirty (compared			# assign a path in a subdirectory, and to mark dirty (compared
	# to nullid).			# to nullid).
	self[f] = n			self[f] = n
	if fl:			if fl:
	self.setflag(f, fl)			self.setflag(f, fl)
	return _text(self.iterentries())			return _text(self.iterentries())

	def dirtext(self):			def dirtext(self):
	"""Get the full data of this directory as a bytestring. Make sure that			"""Get the full data of this directory as a bytestring. Make sure that
	any submanifests have been written first, so their nodeids are correct.			any submanifests have been written first, so their nodeids are correct.
	"""			"""
	self._load()			self._load()
	flags = self.flags			flags = self.flags
				lazydirs = [(d[:-1], node, 't') for
				d, (path, node, readsubtree) in self._lazydirs.iteritems()]
	dirs = [(d[:-1], self._dirs[d]._node, 't') for d in self._dirs]			dirs = [(d[:-1], self._dirs[d]._node, 't') for d in self._dirs]
	files = [(f, self._files[f], flags(f)) for f in self._files]			files = [(f, self._files[f], flags(f)) for f in self._files]
	return _text(sorted(dirs + files))			return _text(sorted(dirs + files + lazydirs))

	def read(self, gettext, readsubtree):			def read(self, gettext, readsubtree):
	def _load_for_read(s):			def _load_for_read(s):
	s.parse(gettext(), readsubtree)			s.parse(gettext(), readsubtree)
	s._dirty = False			s._dirty = False
	self._loadfunc = _load_for_read			self._loadfunc = _load_for_read

	def writesubtrees(self, m1, m2, writesubtree):			def writesubtrees(self, m1, m2, writesubtree):
	self._load() # for consistency; should never have any effect here			self._load() # for consistency; should never have any effect here
	m1._load()			m1._load()
	m2._load()			m2._load()
	emptytree = treemanifest()			emptytree = treemanifest()
				# OPT: Do we really need to load everything? Presumably things in lazy
				# aren't dirty and don't need to be written.
				self._loadalllazy()
				m1._loadalllazy()
				m2._loadalllazy()
	for d, subm in self._dirs.iteritems():			for d, subm in self._dirs.iteritems():
	subp1 = m1._dirs.get(d, emptytree)._node			subp1 = m1._dirs.get(d, emptytree)._node
	subp2 = m2._dirs.get(d, emptytree)._node			subp2 = m2._dirs.get(d, emptytree)._node
	if subp1 == nullid:			if subp1 == nullid:
	subp1, subp2 = subp2, subp1			subp1, subp2 = subp2, subp1
	writesubtree(subm, subp1, subp2)			writesubtree(subm, subp1, subp2)

	def walksubtrees(self, matcher=None):			def walksubtrees(self, matcher=None):
	"""Returns an iterator of the subtrees of this manifest, including this			"""Returns an iterator of the subtrees of this manifest, including this
	manifest itself.			manifest itself.

	If `matcher` is provided, it only returns subtrees that match.			If `matcher` is provided, it only returns subtrees that match.
	"""			"""
	if matcher and not matcher.visitdir(self._dir[:-1] or '.'):			if matcher and not matcher.visitdir(self._dir[:-1] or '.'):
	return			return
	if not matcher or matcher(self._dir[:-1]):			if not matcher or matcher(self._dir[:-1]):
	yield self			yield self

	self._load()			self._load()
				# OPT: use visitchildrenset to avoid loading everything.
				self._loadalllazy()
	for d, subm in self._dirs.iteritems():			for d, subm in self._dirs.iteritems():
	for subtree in subm.walksubtrees(matcher=matcher):			for subtree in subm.walksubtrees(matcher=matcher):
	yield subtree			yield subtree

	class manifestfulltextcache(util.lrucachedict):			class manifestfulltextcache(util.lrucachedict):
	"""File-backed LRU cache for the manifest cache			"""File-backed LRU cache for the manifest cache

	File consists of entries, up to EOF:			File consists of entries, up to EOF: