remotefilelog/repack.py
323	packs

remotefilelog/repack.py
239	Since this change picks a minimum of 2 packs for repacking instead of 3 as before, there can be a convergence problem (the number of packs considered for repacking is almost always smaller than the number of new packs being generated), as pointed out by @durham. In particular, since the second-highest generation starts at `100MB`, if we have enough packs > `100MB` we will pack only 2 of them at a time. To address this, I am increasing the default `repacksizelimit` from `100MB` to `1GB`, which should have the desired effect. Note that the `maxrepackpacks` limit ensures that not all of the smaller packs are considered for repacking, even though they might fall under the `repacksizelimit`.

remotefilelog/repack.py
239	Reverted this to `100MB`. Raising the minimum from 2 files to 3 was only required under the earlier behavior, where we either packed all the files in a generation if they were under the size limit, or packed only 2 files from the generation (later changed to 3). That is not required now that we pack across generations until we hit the size limit.
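
The selection behavior these comments discuss can be sketched as a standalone function. This is a minimal sketch in Python 3, modeled on the right-hand side of the diff rather than copied from it. The name `choose_packs`, its argument shapes, and the pack names `a`, `b`, `c` are assumptions made for illustration; the sizes and limits are the ones used in the first incremental repack of tests/test-treemanifest-repack.t.

```python
import itertools


def choose_packs(sizes, generations, gencountlimit=2,
                 repacksizelimit=100 * 1024 * 1024, maxrepackpacks=50):
    """Pick packs for an incremental repack (illustrative sketch).

    sizes: dict mapping pack name -> size in bytes.
    generations: list of lists of pack names, largest generation first.
    """
    # Prefer the largest generation with more than gencountlimit packs.
    # Candidates are sorted smallest last so pop() yields the smallest.
    genpacks = []
    for gen in generations:
        if len(gen) > gencountlimit:
            genpacks = sorted(gen, key=lambda p: sizes[p], reverse=True)
            break

    if not genpacks:
        # No generation qualifies: consider packs across generations,
        # skipping the highest generation because it can hold huge packs.
        genpacks = sorted(itertools.chain(*generations[1:]),
                          key=lambda p: sizes[p], reverse=True)

    if len(genpacks) < 2:
        # Zero or one candidate: nothing to repack.
        return []

    # The two smallest packs are always chosen, whatever the limits say.
    chosen = genpacks[-2:]
    genpacks = genpacks[:-2]
    total = sum(sizes[p] for p in chosen)

    # Keep adding the next smallest pack while both limits allow it.
    while (total < repacksizelimit and genpacks
           and len(chosen) < maxrepackpacks):
        chosen.append(genpacks.pop())
        total += sizes[chosen[-1]]
    return chosen


# Generations for limits 300 and 200: one pack above 300 bytes,
# two packs between 200 and 300 bytes, none below.
sizes = {"a": 248, "b": 386, "c": 248}
generations = [["b"], ["a", "c"], []]
print(choose_packs(sizes, generations, repacksizelimit=300))  # ['a', 'c']
```

With these inputs no generation has more than 2 packs, so the two 248-byte packs are taken across generations even though their combined size already exceeds the 300-byte limit. This matches the test, where they are replaced by a single 505-byte pack.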

Diff	ID	Description	Created	Lint	Unit
Base		Base
Diff 1	3741		Nov 21 2017, 2:59 PM	★	★
Diff 2	3745		Nov 21 2017, 4:18 PM	★	★
Diff 3	3757		Nov 21 2017, 9:10 PM	★	★
Diff 4	4017		Nov 30 2017, 4:23 PM	★	★
Diff 5	4068		Dec 1 2017, 7:49 PM	★	★

Diff 4068

remotefilelog/repack.py

	from __future__ import absolute_import			from __future__ import absolute_import

				import itertools
	import os			import os
	from hgext3rd.extutil import runshellcommand, fcntllock			from hgext3rd.extutil import runshellcommand, fcntllock
	from mercurial import (			from mercurial import (
	error,			error,
	extensions,			extensions,
	mdiff,			mdiff,
	policy,			policy,
	scmutil,			scmutil,
	generations = ui.configlist("remotefilelog", "data.generations",			generations = ui.configlist("remotefilelog", "data.generations",
	['1GB', '100MB', '1MB'])			['1GB', '100MB', '1MB'])
	generations = list(sorted((util.sizetoint(s) for s in generations),			generations = list(sorted((util.sizetoint(s) for s in generations),
	reverse=True))			reverse=True))
	generations.append(0)			generations.append(0)

	gencountlimit = ui.configint('remotefilelog', 'data.gencountlimit', 2)			gencountlimit = ui.configint('remotefilelog', 'data.gencountlimit', 2)
	repacksizelimit = ui.configbytes('remotefilelog', 'data.repacksizelimit',			repacksizelimit = ui.configbytes('remotefilelog', 'data.repacksizelimit',
	'100MB')			'100MB')
				singhsrb (author): Since this change picks a minimum of 2 packs for repacking instead of 3 as before, there can be a convergence problem (the number of packs considered for repacking is almost always smaller than the number of new packs being generated), as pointed out by @durham. In particular, since the second-highest generation starts at `100MB`, if we have enough packs > `100MB` we will pack only 2 of them at a time. To address this, I am increasing the default `repacksizelimit` from `100MB` to `1GB`, which should have the desired effect. Note that the `maxrepackpacks` limit ensures that not all of the smaller packs are considered for repacking, even though they might fall under the `repacksizelimit`.
				singhsrb (author): Reverted this to `100MB`. Raising the minimum from 2 files to 3 was only required under the earlier behavior, where we either packed all the files in a generation if they were under the size limit, or packed only 2 files from the generation (later changed to 3). That is not required now that we pack across generations until we hit the size limit.
	maxrepackpacks = ui.configint('remotefilelog', 'data.maxrepackpacks', 50)			maxrepackpacks = ui.configint('remotefilelog', 'data.maxrepackpacks', 50)

	return _computeincrementalpack(ui, files, generations, datapack.PACKSUFFIX,			return _computeincrementalpack(ui, files, generations, datapack.PACKSUFFIX,
	datapack.INDEXSUFFIX, gencountlimit,			datapack.INDEXSUFFIX, gencountlimit,
	repacksizelimit, maxrepackpacks)			repacksizelimit, maxrepackpacks)

	def _computeincrementalhistorypack(ui, files):			def _computeincrementalhistorypack(ui, files):
	generations = ui.configlist("remotefilelog", "history.generations",			generations = ui.configlist("remotefilelog", "history.generations",
	size = stat.st_size			size = stat.st_size
	sizes[prefix] = size			sizes[prefix] = size
	for i, limit in enumerate(limits):			for i, limit in enumerate(limits):
	if size > limit:			if size > limit:
	generations[i].append(prefix)			generations[i].append(prefix)
	break			break

	# Steps for picking what packs to repack:			# Steps for picking what packs to repack:
	# 1. Pick the largest generation with >2 pack files.			# 1. Pick the largest generation with > gencountlimit pack files.
	# 2. Take the smallest three packs.			# 2. If no such generation exists, consider pack files across generations.
	# 3. While total-size-of-packs < repacksizelimit: add another pack			# Exclude packs from the highest generation while considering packs
				# across generations because it can have huge packs.
				# 3. Starting with two smallest packs, take as many packs as we can within
				# the constraints. The current constraints include the total size and the
				# number of packs considered for the repacking.

	# Find the largest generation with more than gencountlimit packs			# Find the largest generation with more than gencountlimit packs.
				# Packs will always be sorted to be smallest last, for easy popping later.
	genpacks = []			genpacks = []
	for i, limit in enumerate(limits):			for i, limit in enumerate(limits):
	if len(generations[i]) > gencountlimit:			if len(generations[i]) > gencountlimit:
	# Sort to be smallest last, for easy popping later
	genpacks.extend(sorted(generations[i], reverse=True,			genpacks.extend(sorted(generations[i], reverse=True,
	key=lambda x: sizes[x]))			key=lambda x: sizes[x]))
	break			break

	# Take as many packs from the generation as we can			if not genpacks:
	chosenpacks = genpacks[-3:]			# No generation has gencountlimit packs. Therefore, we need to select
	genpacks = genpacks[:-3]			# packs across generations.
				genpacks = sorted(
				itertools.chain(*generations[1:]),
				reverse = True,
				key = lambda x: sizes[x]
				)

				if len(genpacks) < 2:
				# There is no need to repack since we have 0 or 1 packs.
				phillco: packs
				chosenpacks = []
				else:
				# At least 2 packs will always be chosen irrespective of any
				# constraints.
				chosenpacks = genpacks[-2:]
				genpacks = genpacks[:-2]
	repacksize = sum(sizes[n] for n in chosenpacks)			repacksize = sum(sizes[n] for n in chosenpacks)
	while (repacksize < repacksizelimit and genpacks and			while (repacksize < repacksizelimit and genpacks and
	len(chosenpacks) < maxrepackpacks):			len(chosenpacks) < maxrepackpacks):
	chosenpacks.append(genpacks.pop())			chosenpacks.append(genpacks.pop())
	repacksize += sizes[chosenpacks[-1]]			repacksize += sizes[chosenpacks[-1]]

	# If there aren't any good candidates for a repack,
	# repack the two largest ones.
	if not chosenpacks and len(generations[0]) > 1:
	chosenpacks = generations[0]

	return chosenpacks			return chosenpacks

	def _runrepack(repo, data, history, packpath, category, fullhistory=None,			def _runrepack(repo, data, history, packpath, category, fullhistory=None,
	options=None):			options=None):
	shallowutil.mkstickygroupdir(repo.ui, packpath)			shallowutil.mkstickygroupdir(repo.ui, packpath)

	def isold(repo, filename, node):			def isold(repo, filename, node):
	"""Check if the file node is older than a limit.			"""Check if the file node is older than a limit.

tests/test-treemanifest-repack.t

	# Test incremental repacking of trees			# Test incremental repacking of trees
	$ echo b >> dir/b && hg commit -Aqm 'modify dir/b'			$ echo b >> dir/b && hg commit -Aqm 'modify dir/b'
	$ echo b >> dir/b && hg commit -Aqm 'modify dir/b'			$ echo b >> dir/b && hg commit -Aqm 'modify dir/b'
	$ ls_l .hg/store/packs/manifests \| grep datapack			$ ls_l .hg/store/packs/manifests \| grep datapack
	-r--r--r-- 248 21501384df03b8489b366c5218be639fa08830e4.datapack			-r--r--r-- 248 21501384df03b8489b366c5218be639fa08830e4.datapack
	-r--r--r-- 386 d15c09a9a5a13bb689bd9764455a415a20dc885e.datapack			-r--r--r-- 386 d15c09a9a5a13bb689bd9764455a415a20dc885e.datapack
	-r--r--r-- 248 d7e689a91ac63385be120a118af9ce8663748f28.datapack			-r--r--r-- 248 d7e689a91ac63385be120a118af9ce8663748f28.datapack

	- repack incremental does nothing here because there are so few packs			- repack incremental always repacks at least 2 smallest packs outside the
	$ hg repack --incremental --config remotefilelog.data.generations=300,200 --config remotefilelog.data.repacksizelimit=300			highest generation despite the constraints even if it has to go across
				generations.
				$ hg repack --incremental --config remotefilelog.data.generations=300,200 \
				> --config remotefilelog.data.repacksizelimit=300
	$ ls_l .hg/store/packs/manifests \| grep datapack			$ ls_l .hg/store/packs/manifests \| grep datapack
	-r--r--r-- 248 21501384df03b8489b366c5218be639fa08830e4.datapack			-r--r--r-- 505 63e9ec504e6f48299553359c9a00bc85d562fc01.datapack
	-r--r--r-- 386 d15c09a9a5a13bb689bd9764455a415a20dc885e.datapack			-r--r--r-- 386 d15c09a9a5a13bb689bd9764455a415a20dc885e.datapack
	-r--r--r-- 248 d7e689a91ac63385be120a118af9ce8663748f28.datapack

	$ echo b >> dir/b && hg commit -Aqm 'modify dir/b'			$ echo b >> dir/b && hg commit -Aqm 'modify dir/b'
	$ echo b >> dir/b && hg commit -Aqm 'modify dir/b'			$ echo b >> dir/b && hg commit -Aqm 'modify dir/b'
	$ echo b >> dir/b && hg commit -Aqm 'modify dir/b'			$ echo b >> dir/b && hg commit -Aqm 'modify dir/b'
	$ ls_l .hg/store/packs/manifests \| grep datapack			$ ls_l .hg/store/packs/manifests \| grep datapack
	-r--r--r-- 248 21501384df03b8489b366c5218be639fa08830e4.datapack
	-r--r--r-- 248 347263bf1efbdb5bf7e1d1565b6b504073fb9093.datapack			-r--r--r-- 248 347263bf1efbdb5bf7e1d1565b6b504073fb9093.datapack
	-r--r--r-- 248 544a3b46a61732209116ae50847ec333b75e3765.datapack			-r--r--r-- 248 544a3b46a61732209116ae50847ec333b75e3765.datapack
				-r--r--r-- 505 63e9ec504e6f48299553359c9a00bc85d562fc01.datapack
	-r--r--r-- 248 863908ef8149261ab0d891c2344d8e8766c39441.datapack			-r--r--r-- 248 863908ef8149261ab0d891c2344d8e8766c39441.datapack
	-r--r--r-- 386 d15c09a9a5a13bb689bd9764455a415a20dc885e.datapack			-r--r--r-- 386 d15c09a9a5a13bb689bd9764455a415a20dc885e.datapack
	-r--r--r-- 248 d7e689a91ac63385be120a118af9ce8663748f28.datapack
	$ cd .hg/store/packs/manifests			- Now, we have 3 packs in the generation with > 200 bytes. Therefore, the next
	$ cp d7e689a91ac63385be120a118af9ce8663748f28.datapack x7e689a91ac63385be120a118af9ce8663748f28.datapack			incremental repack will consider packs from that generation. Also, the size
	$ cp d7e689a91ac63385be120a118af9ce8663748f28.dataidx x7e689a91ac63385be120a118af9ce8663748f28.dataidx			limit will be honored and one of the packs with size 248 won't be considered for
	$ cp 21501384df03b8489b366c5218be639fa08830e4.datapack x1501384df03b8489b366c5218be639fa08830e4.datapack			repacking.
	$ cp 21501384df03b8489b366c5218be639fa08830e4.dataidx x1501384df03b8489b366c5218be639fa08830e4.dataidx			$ hg repack --incremental --config remotefilelog.data.generations=300,200 \
	$ cp 347263bf1efbdb5bf7e1d1565b6b504073fb9093.datapack x47263bf1efbdb5bf7e1d1565b6b504073fb9093.datapack			> --config remotefilelog.data.repacksizelimit=300
	$ cp 347263bf1efbdb5bf7e1d1565b6b504073fb9093.dataidx x47263bf1efbdb5bf7e1d1565b6b504073fb9093.dataidx			$ ls_l .hg/store/packs/manifests \| grep datapack
	$ cd ../../../../			-r--r--r-- 505 63e9ec504e6f48299553359c9a00bc85d562fc01.datapack
				-r--r--r-- 505 75394b4a2dce16d46dcaa882386a6d8b91246f96.datapack
	- repack incremental kicks in once there are a number of packs			-r--r--r-- 248 863908ef8149261ab0d891c2344d8e8766c39441.datapack
	- (set the repacksizelimit so that we test that we only repack up to 1500 bytes,			-r--r--r-- 386 d15c09a9a5a13bb689bd9764455a415a20dc885e.datapack
	- and it leaves one datapack behind)
	$ hg repack --incremental --config remotefilelog.data.generations=300,200 --config remotefilelog.data.repacksizelimit=1500B			- Now, we have 3 packs in the generation with > 300 bytes. Therefore, the next
				incremental repack will consider packs from that generation. Also, the size
				limit will be honored and one of the packs with size 505 won't be considered for
				repacking.
				$ hg repack --incremental --config remotefilelog.data.generations=300,200 \
				> --config remotefilelog.data.repacksizelimit=300
				$ ls_l .hg/store/packs/manifests \| grep datapack
				-r--r--r-- 505 63e9ec504e6f48299553359c9a00bc85d562fc01.datapack
				-r--r--r-- 890 813e64604219dfc585465b77dcd570a0bc631022.datapack
				-r--r--r-- 248 863908ef8149261ab0d891c2344d8e8766c39441.datapack

				- No generation has sufficient number of packs to be considered for the repack.
				Therefore, we will repack across generations till we don't go beyond the
				constraints. In this case, we have to set the size limit high enough for all
				packs to be considered for the repacking and we have to ensure the packs are not
				in the highest generation because they are not considered for repacking.
				$ hg repack --incremental --config remotefilelog.data.repacksizelimit=1600B \
				> --config remotefilelog.data.generations=900,300,200
	$ ls_l .hg/store/packs/manifests \| grep datapack \| wc -l			$ ls_l .hg/store/packs/manifests \| grep datapack \| wc -l
	.*3 (re)			.*1 (re)
	$ ls_l .hg/store/packs/manifests \| grep datapack \| grep 248
	-r--r--r-- 248 *.datapack (glob)
	- Clean up the pile of packs we made
	$ hg repack

	Test repacking from revlogs to pack files on the server			Test repacking from revlogs to pack files on the server
	$ cd ../master			$ cd ../master

	$ cat >> .hg/hgrc <<EOF			$ cat >> .hg/hgrc <<EOF
	> [extensions]			> [extensions]
	> treemanifest=$TESTDIR/../treemanifest			> treemanifest=$TESTDIR/../treemanifest
	> remotefilelog=$TESTDIR/../remotefilelog			> remotefilelog=$TESTDIR/../remotefilelog
	-r--r--r-- 264 e9093d2d887ff14457d43338fcb3994e92051853.datapack			-r--r--r-- 264 e9093d2d887ff14457d43338fcb3994e92051853.datapack

	- Only one pack, means don't repack it. Only turn revlogs into a pack.			- Only one pack, means don't repack it. Only turn revlogs into a pack.
	$ hg repack --incremental --config remotefilelog.data.generations=300,20			$ hg repack --incremental --config remotefilelog.data.generations=300,20
	$ ls_l .hg/cache/packs/manifests/ \| grep datapack			$ ls_l .hg/cache/packs/manifests/ \| grep datapack
	-r--r--r-- 264 e9093d2d887ff14457d43338fcb3994e92051853.datapack			-r--r--r-- 264 e9093d2d887ff14457d43338fcb3994e92051853.datapack
	-r--r--r-- 154 f9657fdc11d7c9847208da3f1245b38c5981df79.datapack			-r--r--r-- 154 f9657fdc11d7c9847208da3f1245b38c5981df79.datapack

	- Two packs doesn't meet the bar for repack. Only turn revlogs into a pack.			- Two packs meets the bar. Repack new revlogs and old pack into one.
	$ echo >> a			$ echo >> a
	$ hg commit -m 'modify a'			$ hg commit -m 'modify a'
	$ hg repack --incremental --config remotefilelog.data.generations=300,20			$ hg repack --incremental --config remotefilelog.data.generations=300,20
	$ ls_l .hg/cache/packs/manifests/ \| grep datapack			$ ls_l .hg/cache/packs/manifests/ \| grep datapack
	-r--r--r-- 154 0adbde90bc92c6f23e46180a9d7885c8e2499173.datapack
	-r--r--r-- 264 e9093d2d887ff14457d43338fcb3994e92051853.datapack
	-r--r--r-- 154 f9657fdc11d7c9847208da3f1245b38c5981df79.datapack

	- Three packs meets the bar. Repack new revlogs and old pack into one.
	$ hg repack --incremental --config remotefilelog.data.generations=300,20
	$ ls_l .hg/cache/packs/manifests/ \| grep datapack
	-r--r--r-- 496 bc6c2ebb080844d7a227dacbc847a5b375ec620c.datapack			-r--r--r-- 496 bc6c2ebb080844d7a227dacbc847a5b375ec620c.datapack

			Path	Packages
M			remotefilelog/repack.py (48 lines)
M			tests/test-treemanifest-repack.t (70 lines)
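
The pack listings expected in tests/test-treemanifest-repack.t depend on how packs are bucketed into generations before any selection happens. The following is a minimal Python 3 sketch of that bucketing, modeled on the loop in remotefilelog/repack.py. The function name `bucket_by_generation` and the pack names are assumptions made for illustration, and the limits are passed as plain integers rather than parsed from config strings such as `100MB`.

```python
def bucket_by_generation(sizes, limits):
    """Group packs into generations by size (illustrative sketch).

    sizes: dict mapping pack name -> size in bytes.
    limits: generation thresholds in bytes, in any order.
    Returns one list of pack names per generation, largest first.
    """
    # Largest threshold first, plus a final catch-all generation at 0.
    limits = sorted(limits, reverse=True) + [0]
    generations = [[] for _ in limits]
    for name, size in sizes.items():
        for i, limit in enumerate(limits):
            # A pack lands in the first generation whose threshold
            # it strictly exceeds.
            if size > limit:
                generations[i].append(name)
                break
    return generations


# Sizes from the first listing in the test, with
# remotefilelog.data.generations=300,200.
sizes = {"a": 248, "b": 386, "c": 248}
print(bucket_by_generation(sizes, [300, 200]))  # [['b'], ['a', 'c'], []]
```

Because the comparison is strict, a pack whose size equals a threshold falls into the next lower generation. In the test, the 386-byte pack is alone in the highest generation and is therefore left out of the cross-generation repack.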

This is an archive of the discontinued Mercurial Phabricator instance.

incremental-repack: prefer small packs across generations over largest ones
Abandoned, Public
