When calling prefetch in remotefilelog, also prefetch lfs
files.
We are using the same hook mechanism that remotefilelog is already using
for LFS by having remotefilelog call into LFS.
( )
quark |
Restricted Project |
When calling prefetch in remotefilelog, also prefetch lfs
files.
We are using the same hook mechanism that remotefilelog is already using
for LFS by having remotefilelog call into LFS.
run tests on test-lfs-remotefilelog-prefetch.t
Automatic diff as part of commit; lint not applicable. |
Automatic diff as part of commit; unit tests not applicable. |
Nice feature! Could you move the fileserverclient wrapper-related methods from lfs to remotefilelog?
Currently, lfs does not couple with any remotefilelog internals (ex. fileserverclient) intentionally as an attempt to make future upstreaming work easier. The filelog wrapper code works both for remotefilelog and hg filelog.
lgtm but @quark's request is probably reasonable.
hgext3rd/lfs/wrapper.py | ||
---|---|---|
258 | Generally, "hash" is used to refer to the 40-character version, and "node" refers to the 20-character one. We usually don't import the node module directly because it prevents us from using 'node' as a variable, so we just import bin/hex directly. |
Path | Packages | |||
---|---|---|---|---|
M | hgext3rd/lfs/wrapper.py (5 lines) | |||
M | remotefilelog/__init__.py (1 line) | |||
M | remotefilelog/fileserverclient.py (24 lines) | |||
M | tests/library.sh (24 lines) | |||
A | M | tests/test-remotefilelog-lfs-prefetch.t (99 lines) |
def _lfsloaded(loaded=False):
    """afterloaded hook: wire the lfs extension into remotefilelog.

    If lfs is enabled, wrap the remotefilelog filelog class so lfs
    pointers are handled, and hand the lfs module to fileserverclient
    so prefetch can also fetch lfs blobs.
    """
    try:
        lfsmod = extensions.find('lfs')
    except KeyError:
        # lfs extension not enabled; nothing to hook up.
        return
    if not lfsmod:
        return
    lfsmod.wrapfilelog(remotefilelog.remotefilelog)
    fileserverclient._lfsmod = lfsmod
extensions.afterloaded('lfs', _lfsloaded) | extensions.afterloaded('lfs', _lfsloaded) | ||||
# debugdata needs remotefilelog.len to work | # debugdata needs remotefilelog.len to work | ||||
extensions.wrapcommand(commands.table, 'debugdata', debugdatashallow) | extensions.wrapcommand(commands.table, 'debugdata', debugdatashallow) | ||||
def cloneshallow(orig, ui, repo, *args, **opts): | def cloneshallow(orig, ui, repo, *args, **opts): | ||||
if opts.get('shallow'): | if opts.get('shallow'): | ||||
repos = [] | repos = [] |
import hashlib, os, time, io, struct | import hashlib, os, time, io, struct | ||||
import itertools | import itertools | ||||
from mercurial.i18n import _ | from mercurial.i18n import _ | ||||
from mercurial.node import hex, bin, nullid | from mercurial.node import hex, bin, nullid | ||||
from mercurial import ( | from mercurial import ( | ||||
error, | error, | ||||
httppeer, | httppeer, | ||||
revlog, | |||||
sshpeer, | sshpeer, | ||||
util, | util, | ||||
util, | |||||
wireproto, | wireproto, | ||||
) | ) | ||||
from . import ( | from . import ( | ||||
connectionpool, | connectionpool, | ||||
constants, | constants, | ||||
shallowutil, | shallowutil, | ||||
wirepack, | wirepack, | ||||
) | ) | ||||
from .contentstore import unioncontentstore | from .contentstore import unioncontentstore | ||||
from .metadatastore import unionmetadatastore | from .metadatastore import unionmetadatastore | ||||
from .lz4wrapper import lz4decompress | from .lz4wrapper import lz4decompress | ||||
# Statistics for debugging
fetchcost = 0
fetches = 0
fetched = 0
fetchmisses = 0

# The lfs extension module when it is enabled; populated by
# remotefilelog's 'afterloaded' hook (_lfsloaded) so that prefetch can
# also download lfs blobs.  Stays None when lfs is not loaded.
_lfsmod = None

# Topic label used for download progress output.
_downloading = _('downloading')
def getcachekey(reponame, file, id):
    """Return the shared-cache path for revision ``id`` of ``file``.

    The path is sharded by the sha1 of the file name (first two hex
    chars as a directory level) under the repo name.
    """
    digest = hashlib.sha1(file).hexdigest()
    shard, rest = digest[:2], digest[2:]
    return os.path.join(reponame, shard, rest, id)
def getlocalkey(file, id): | def getlocalkey(file, id): | ||||
pathhash = hashlib.sha1(file).hexdigest() | pathhash = hashlib.sha1(file).hexdigest() | ||||
missingids = [(file, hex(id)) for file, id in missingids] | missingids = [(file, hex(id)) for file, id in missingids] | ||||
fetched += len(missingids) | fetched += len(missingids) | ||||
start = time.time() | start = time.time() | ||||
missingids = self.request(missingids) | missingids = self.request(missingids) | ||||
if missingids: | if missingids: | ||||
raise error.Abort(_("unable to download %d files") % | raise error.Abort(_("unable to download %d files") % | ||||
len(missingids)) | len(missingids)) | ||||
fetchcost += time.time() - start | fetchcost += time.time() - start | ||||
self._lfsprefetch(fileids) | |||||
def _lfsprefetch(self, fileids):
    # After the filelog revisions in ``fileids`` (list of (path, hexnode)
    # pairs) have been prefetched, also batch-download any lfs blobs they
    # point to, so a later checkout does not fetch them one at a time.
    # No-op unless the lfs extension is loaded (_lfsmod is set by the
    # afterloaded hook) and this repo has an lfs local blobstore.
    if not _lfsmod or not hasattr(self.repo.svfs, 'lfslocalblobstore'):
        return
    # Respect the lfs download policy for this repo (may be disabled).
    if not _lfsmod.wrapper.candownload(self.repo):
        return
    pointers = []
    store = self.repo.svfs.lfslocalblobstore
    for file, id in fileids:
        # ``id`` is a 40-char hex node; filelog APIs take binary nodes.
        nodehash = bin(id)
        rlog = self.repo.file(file)
        # REVIDX_EXTSTORED marks revisions whose raw text is an lfs
        # pointer instead of the real content.
        if rlog.flags(nodehash) & revlog.REVIDX_EXTSTORED:
            text = rlog.revision(nodehash, raw=True)
            p = _lfsmod.pointer.deserialize(text)
            oid = p.oid()
            # Only request blobs missing from the local store.
            if not store.has(oid):
                pointers.append(p)
    if len(pointers) > 0:
        # Single batched request for every missing blob.
        self.repo.svfs.lfsremoteblobstore.readbatch(pointers, store)
        # NOTE(review): assert is stripped under -O; presumably readbatch
        # raises on failure, so this is a sanity check only.
        assert all(store.has(p.oid()) for p in pointers)
def logstacktrace(self):
    """Record the current Python call stack in the repo's blackbox-style
    log under the 'remotefilelog' event, to help diagnose code paths
    that trigger excessive remote fetching."""
    import traceback
    stack = ''.join(traceback.format_stack())
    self.ui.log('remotefilelog',
                'excess remotefilelog fetching:\n%s',
                stack)
[remotefilelog] | [remotefilelog] | ||||
reponame=master | reponame=master | ||||
datapackversion=1 | datapackversion=1 | ||||
[phases] | [phases] | ||||
publish=False | publish=False | ||||
EOF | EOF | ||||
} | } | ||||
hgcloneshallowlfs() {
  # Shallow-clone a repo with the lfs extension enabled.
  #   $1 = source repo, $2 = destination, $3 = lfs blob store url;
  #   remaining arguments are passed through to 'hg clone'.
  # Fix: the original declared 'local name' (never used) while leaving
  # orig/dest/lfsdir as globals, and left $orig/$dest/$@ unquoted,
  # which breaks on paths containing spaces.
  local orig dest lfsdir
  orig=$1
  shift
  dest=$1
  shift
  lfsdir=$1
  shift
  hg clone --shallow --config "extensions.lfs=" --config "lfs.url=$lfsdir" --config remotefilelog.reponame=master "$orig" "$dest" "$@"
  cat >> "$dest/.hg/hgrc" <<EOF
[extensions]
lfs=
[lfs]
url=$lfsdir
[remotefilelog]
reponame=master
datapackversion=1
[phases]
publish=False
EOF
}
hginit() {
  # Create a plain (non-shallow) repository named $1; any extra
  # arguments are forwarded to 'hg init'.
  # Fix: quote "$name" and use "$@" so names/flags containing spaces
  # are not word-split.
  local name
  name=$1
  shift
  hg init "$name" "$@"
}
clearcache() { | clearcache() { |
$ PYTHONPATH=$TESTDIR/..:$PYTHONPATH | |||||
$ export PYTHONPATH | |||||
$ LFSPATH=$TESTTMP/lfs | |||||
$ export LFSPATH | |||||
$ mkdir $LFSPATH | |||||
$ . "$TESTDIR/library.sh" | |||||
$ hginit master | |||||
$ cd master | |||||
$ cat >> $HGRCPATH <<EOF | |||||
> [extensions] | |||||
> lfs=$TESTDIR/../hgext3rd/lfs | |||||
> [lfs] | |||||
> url=file://$LFSPATH | |||||
> EOF | |||||
$ cat >> .hg/hgrc <<EOF | |||||
> [remotefilelog] | |||||
> server=True | |||||
> EOF | |||||
$ echo x > x | |||||
$ echo z > z | |||||
$ hg commit -qAm x | |||||
$ echo x2 > x | |||||
$ echo y > y | |||||
$ hg commit -qAm y | |||||
$ echo large > large | |||||
$ hg --config 'lfs.threshold=1' commit -qAm y | |||||
$ hg bookmark foo | |||||
$ hg debuglfsupload -r tip | |||||
$ cd .. | |||||
# prefetch a revision | |||||
$ hgcloneshallowlfs ssh://user@dummy/master shallow file://$LFSPATH --noupdate | |||||
streaming all changes | |||||
2 files to transfer, 774 bytes of data | |||||
transferred 774 bytes in * seconds (*/sec) (glob) | |||||
searching for changes | |||||
no changes found | |||||
$ cd shallow | |||||
$ hg prefetch -r 0 | |||||
2 files fetched over 1 fetches - (2 misses, 0.00% hit ratio) over *s (glob) | |||||
$ hg cat -r 0 x | |||||
x | |||||
# prefetch a range of revisions | |||||
$ clearcache | |||||
$ hg prefetch -r 0::1 | |||||
4 files fetched over 1 fetches - (4 misses, 0.00% hit ratio) over *s (glob) | |||||
$ hg cat -r 0 x | |||||
x | |||||
$ hg cat -r 1 x | |||||
x2 | |||||
# prefetch certain files | |||||
$ clearcache | |||||
$ hg prefetch -r 1 x | |||||
1 files fetched over 1 fetches - (1 misses, 0.00% hit ratio) over *s (glob) | |||||
$ hg cat -r 1 x | |||||
x2 | |||||
$ hg cat -r 1 y | |||||
y | |||||
1 files fetched over 1 fetches - (1 misses, 0.00% hit ratio) over *s (glob) | |||||
# prefetch large file | |||||
$ hg prefetch -r 2 | |||||
2 files fetched over 1 fetches - (2 misses, 0.00% hit ratio) over *s (glob) | |||||
# prefetch on pull when configured | |||||
$ printf "[remotefilelog]\npullprefetch=bookmark()\n" >> .hg/hgrc | |||||
$ hg strip tip | |||||
saved backup bundle to $TESTTMP/shallow/.hg/strip-backup/730e2b7b175c-acada81e-backup.hg (glob) | |||||
$ clearcache | |||||
$ hg pull | |||||
pulling from ssh://user@dummy/master | |||||
searching for changes | |||||
adding changesets | |||||
adding manifests | |||||
adding file changes | |||||
added 1 changesets with 0 changes to 0 files | |||||
updating bookmark foo | |||||
(run 'hg update' to get a working copy) | |||||
prefetching file contents | |||||
4 files fetched over 1 fetches - (4 misses, 0.00% hit ratio) over *s (glob) | |||||
$ hg up tip | |||||
4 files updated, 0 files merged, 0 files removed, 0 files unresolved |
Generally, "hash" is used to refer to the 40-character version, and "node" refers to the 20-character one. We usually don't import the node module directly because it prevents us from using 'node' as a variable, so we just import bin/hex directly.