diff --git a/hgext/remotefilelog/README.md b/hgext/remotefilelog/README.md new file mode 100644 --- /dev/null +++ b/hgext/remotefilelog/README.md @@ -0,0 +1,111 @@ +remotefilelog +============= + +The remotefilelog extension allows Mercurial to clone shallow copies of a repository such that all file contents are left on the server and only downloaded on demand by the client. This greatly speeds up clone and pull performance for repositories that have long histories or that are growing quickly. + +In addition, the extension allows using a caching layer (such as memcache) to serve the file contents, thus providing better scalability and reducing server load. + +Installing +========== + +**NOTE:** See the limitations section below to check if remotefilelog will work for your use case. + +remotefilelog can be installed like any other Mercurial extension. Download the source code and add the remotefilelog subdirectory to your `hgrc`: + + :::ini + [extensions] + remotefilelog=path/to/remotefilelog/remotefilelog + +The extension currently has a hard dependency on lz4, so the [lz4 python library](https://pypi.python.org/pypi/lz4) must be installed on both servers and clients. + +Configuring +----------- + +**Server** + +* `server` (required) - Set to 'True' to indicate that the server can serve shallow clones. +* `serverexpiration` - The server keeps a local cache of recently requested file revision blobs in .hg/remotefilelogcache. This setting specifies how many days they should be kept locally. Defaults to 30. + +An example server configuration: + + :::ini + [remotefilelog] + server = True + serverexpiration = 14 + +**Client** + +* `cachepath` (required) - the location to store locally cached file revisions +* `cachelimit` - the maximum size of the cachepath. By default it's 1000 GB. +* `cachegroup` - the default unix group for the cachepath. Useful on shared systems so multiple users can read and write to the same cache. 
+* `cacheprocess` - the external process that will handle the remote caching layer. If not set, all requests will go to the Mercurial server. +* `fallbackpath` - the Mercurial repo path to fetch file revisions from. By default it uses the paths.default repo. This setting is useful for cloning from shallow clones and still talking to the central server for file revisions. +* `includepattern` - a list of regex patterns matching files that should be kept remotely. Defaults to all files. +* `excludepattern` - a list of regex patterns matching files that should not be kept remotely and should always be downloaded. +* `pullprefetch` - a revset of commits whose file content should be prefetched after every pull. The most common value for this will be '(bookmark() + head()) & public()'. This is useful in environments where offline work is common, since it will enable offline updating to, rebasing to, and committing on every head and bookmark. + +An example client configuration: + + :::ini + [remotefilelog] + cachepath = /dev/shm/hgcache + cachelimit = 2 GB + +Using as a largefiles replacement +--------------------------------- + +remotefilelog can theoretically be used as a replacement for the largefiles extension. You can use the `includepattern` setting to specify which directories or file types are considered large and they will be left on the server. Unlike the largefiles extension, this can be done without converting the server repository. Only the client configuration needs to specify the patterns. + +The include/exclude settings haven't been extensively tested, so this feature is still considered experimental. 
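As a rough illustration of how the include and exclude settings interact, here is a minimal Python sketch. It assumes the patterns are plain regular expressions, as the settings list describes them, and that an exclude match always overrides an include match. The helper `kept_on_server` is hypothetical and not part of remotefilelog; the extension does its real matching through Mercurial's own matcher, whose pattern syntax may differ.

```python
import re


def kept_on_server(path, includepattern=(), excludepattern=()):
    """Return True if `path` would be left on the server and fetched on demand.

    Hypothetical helper for illustration only; not part of remotefilelog.
    """
    # An empty include list means "all files", mirroring the documented default.
    included = not includepattern or any(
        re.search(p, path) for p in includepattern)
    # Assumption: an exclude match always wins over an include match.
    excluded = any(re.search(p, path) for p in excludepattern)
    return included and not excluded


# Illustrative patterns only; adjust them to your repository layout.
include = [r"\.sql3$", r"^bin/"]
exclude = [r"^\.hgtags$"]

for name in ("data/users.sql3", "bin/tool", "src/main.py", ".hgtags"):
    print(name, kept_on_server(name, include, exclude))
```

Files for which the helper returns False behave like ordinary Mercurial files: their history is downloaded in full instead of being fetched lazily.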
+ +An example largefiles style client configuration: + + :::ini + [remotefilelog] + cachepath = /dev/shm/hgcache + cachelimit = 2 GB + includepattern = *.sql3 + bin/* + +Usage +===== + +Once you have configured the server, you can get a shallow clone by doing: + + :::bash + hg clone --shallow ssh://server//path/repo + +After that, all normal Mercurial commands should work. + +Occasionally the client or server caches may grow too big. Run `hg gc` to clean up the cache. It will remove cached files that appear to no longer be necessary, or any files that exceed the configured maximum size. This does not improve performance; it just frees up space. + +Limitations +=========== + +1. The extension must be used with Mercurial 3.3 (commit d7d08337b3f6) or higher (earlier versions of the extension work with earlier versions of Mercurial though, up to Mercurial 2.7). + +2. remotefilelog has only been tested on Linux with case-sensitive filesystems. It should work on other Unix systems but may have problems on case-insensitive filesystems. + +3. remotefilelog only works with ssh based Mercurial repos. http based repos are currently not supported, though it shouldn't be too difficult for some motivated individual to implement. + +4. Tags are not supported in completely shallow repos. If you use tags in your repo you will have to specify `excludepattern=.hgtags` in your client configuration to ensure that file is downloaded. The include/excludepattern settings are experimental at the moment and have yet to be deployed in a production environment. + +5. A few commands will be slower. `hg log <file>` will be much slower since it has to walk the entire commit history instead of just the filelog. Use `hg log -f <file>` instead, which remains very fast. + +Contributing +============ + +Patches are welcome as pull requests, though they will be collapsed and rebased to maintain a linear history.
+Tests can be run via: + + :::bash + cd tests + ./run-tests --with-hg=path/to/hgrepo/hg + +We (Facebook) have to ask for a "Contributor License Agreement" from someone who sends in a patch or code that we want to include in the codebase. This is a legal requirement; a similar situation applies to Apache and other ASF projects. + +If we ask you to fill out a CLA we'll direct you to our [online CLA page](https://developers.facebook.com/opensource/cla) where you can complete it easily. We use the same form as the Apache CLA so that friction is minimal. + +License +======= + +remotefilelog is made available under the terms of the GNU General Public License version 2, or any later version. See the COPYING file that accompanies this distribution for the full text of the license. diff --git a/hgext/remotefilelog/__init__.py b/hgext/remotefilelog/__init__.py new file mode 100644 --- /dev/null +++ b/hgext/remotefilelog/__init__.py @@ -0,0 +1,1106 @@ +# __init__.py - remotefilelog extension +# +# Copyright 2013 Facebook, Inc. +# +# This software may be used and distributed according to the terms of the +# GNU General Public License version 2 or any later version. +"""remotefilelog causes Mercurial to lazily fetch file contents (EXPERIMENTAL) + +Configs: + + ``packs.maxchainlen`` specifies the maximum delta chain length in pack files + ``packs.maxpacksize`` specifies the maximum pack file size + ``packs.maxpackfilecount`` specifies the maximum number of packs in the + shared cache (trees only for now) + ``remotefilelog.backgroundprefetch`` runs prefetch in background when True + ``remotefilelog.bgprefetchrevs`` specifies revisions to fetch on commit and + update, and on other commands that use them. Different from pullprefetch.
+ ``remotefilelog.gcrepack`` does garbage collection during repack when True + ``remotefilelog.nodettl`` specifies maximum TTL of a node in seconds before + it is garbage collected + ``remotefilelog.repackonhggc`` runs repack on hg gc when True + ``remotefilelog.prefetchdays`` specifies the maximum age of a commit in + days after which it is no longer prefetched. + ``remotefilelog.prefetchdelay`` specifies delay between background + prefetches in seconds after operations that change the working copy parent + ``remotefilelog.data.gencountlimit`` constrains the minimum number of data + pack files required to be considered part of a generation. In particular, + minimum number of pack files > gencountlimit. + ``remotefilelog.data.generations`` list for specifying the lower bound of + each generation of the data pack files. For example, list ['100MB','1MB'] + or ['1MB', '100MB'] will lead to three generations: [0, 1MB), [ + 1MB, 100MB) and [100MB, infinity). + ``remotefilelog.data.maxrepackpacks`` the maximum number of pack files to + include in an incremental data repack. + ``remotefilelog.data.repackmaxpacksize`` the maximum size of a pack file for + it to be considered for an incremental data repack. + ``remotefilelog.data.repacksizelimit`` the maximum total size of pack files + to include in an incremental data repack. + ``remotefilelog.history.gencountlimit`` constrains the minimum number of + history pack files required to be considered part of a generation. In + particular, minimum number of pack files > gencountlimit. + ``remotefilelog.history.generations`` list for specifying the lower bound of + each generation of the history pack files. For example, list [ + '100MB', '1MB'] or ['1MB', '100MB'] will lead to three generations: [ + 0, 1MB), [1MB, 100MB) and [100MB, infinity). + ``remotefilelog.history.maxrepackpacks`` the maximum number of pack files to + include in an incremental history repack.
+ ``remotefilelog.history.repackmaxpacksize`` the maximum size of a pack file + for it to be considered for an incremental history repack. + ``remotefilelog.history.repacksizelimit`` the maximum total size of pack + files to include in an incremental history repack. + ``remotefilelog.backgroundrepack`` automatically consolidate packs in the + background + ``remotefilelog.cachepath`` path to cache + ``remotefilelog.cachegroup`` if set, make cache directory sgid to this + group + ``remotefilelog.cacheprocess`` binary to invoke for fetching file data + ``remotefilelog.debug`` turn on remotefilelog-specific debug output + ``remotefilelog.excludepattern`` pattern of files to exclude from pulls + ``remotefilelog.includepattern`` pattern of files to include in pulls + ``remotefilelog.fetchpacks`` if set, fetch pre-packed files from the server + ``remotefilelog.fetchwarning`` message to print when too many + single-file fetches occur + ``remotefilelog.getfilesstep`` number of files to request in a single RPC + ``remotefilelog.getfilestype`` if set to 'threaded' use threads to fetch + files, otherwise use optimistic fetching + ``remotefilelog.pullprefetch`` revset for selecting files that should be + eagerly downloaded rather than lazily + ``remotefilelog.reponame`` name of the repo. If set, used to partition + data from other repos in a shared store.
+ ``remotefilelog.server`` if true, enable server-side functionality + ``remotefilelog.servercachepath`` path for caching blobs on the server + ``remotefilelog.serverexpiration`` number of days to keep cached server + blobs + ``remotefilelog.validatecache`` if set, check cache entries for corruption + before returning blobs + ``remotefilelog.validatecachelog`` if set, check cache entries for + corruption before returning metadata + +""" +from __future__ import absolute_import + +import os +import time +import traceback + +from mercurial.node import hex +from mercurial.i18n import _ +from mercurial import ( + changegroup, + changelog, + cmdutil, + commands, + configitems, + context, + copies, + debugcommands as hgdebugcommands, + dispatch, + error, + exchange, + extensions, + hg, + localrepo, + match, + merge, + node as nodemod, + patch, + registrar, + repair, + repoview, + revset, + scmutil, + smartset, + templatekw, + util, +) +from . import ( + debugcommands, + fileserverclient, + remotefilectx, + remotefilelog, + remotefilelogserver, + repack as repackmod, + shallowbundle, + shallowrepo, + shallowstore, + shallowutil, + shallowverifier, +) + +# ensures debug commands are registered +hgdebugcommands.command + +try: + from mercurial import streamclone + streamclone._walkstreamfiles + hasstreamclone = True +except Exception: + hasstreamclone = False + +cmdtable = {} +command = registrar.command(cmdtable) + +configtable = {} +configitem = registrar.configitem(configtable) + +configitem('remotefilelog', 'debug', default=False) + +configitem('remotefilelog', 'reponame', default='') +configitem('remotefilelog', 'cachepath', default=None) +configitem('remotefilelog', 'cachegroup', default=None) +configitem('remotefilelog', 'cacheprocess', default=None) +configitem('remotefilelog', 'cacheprocess.includepath', default=None) +configitem("remotefilelog", "cachelimit", default="1000 GB") + +configitem('remotefilelog', 'fetchpacks', default=False) +configitem('remotefilelog', 
'fallbackpath', default=configitems.dynamicdefault, + alias=[('remotefilelog', 'fallbackrepo')]) + +configitem('remotefilelog', 'validatecachelog', default=None) +configitem('remotefilelog', 'validatecache', default='on') +configitem('remotefilelog', 'server', default=None) +configitem('remotefilelog', 'servercachepath', default=None) +configitem("remotefilelog", "serverexpiration", default=30) +configitem('remotefilelog', 'backgroundrepack', default=False) +configitem('remotefilelog', 'bgprefetchrevs', default=None) +configitem('remotefilelog', 'pullprefetch', default=None) +configitem('remotefilelog', 'backgroundprefetch', default=False) +configitem('remotefilelog', 'prefetchdelay', default=120) +configitem('remotefilelog', 'prefetchdays', default=14) + +configitem('remotefilelog', 'getfilesstep', default=10000) +configitem('remotefilelog', 'getfilestype', default='optimistic') +configitem('remotefilelog', 'batchsize', configitems.dynamicdefault) +configitem('remotefilelog', 'fetchwarning', default='') + +configitem('remotefilelog', 'includepattern', default=None) +configitem('remotefilelog', 'excludepattern', default=None) + +configitem('remotefilelog', 'gcrepack', default=False) +configitem('remotefilelog', 'repackonhggc', default=False) +configitem('remotefilelog', 'datapackversion', default=0) +configitem('repack', 'chainorphansbysize', default=True) + +configitem('packs', 'maxpacksize', default=0) +configitem('packs', 'maxchainlen', default=1000) + +configitem('remotefilelog', 'historypackv1', default=False) +# default TTL limit is 30 days +_defaultlimit = 60 * 60 * 24 * 30 +configitem('remotefilelog', 'nodettl', default=_defaultlimit) + +configitem('remotefilelog', 'data.gencountlimit', default=2), +configitem('remotefilelog', 'data.generations', + default=['1GB', '100MB', '1MB']) +configitem('remotefilelog', 'data.maxrepackpacks', default=50) +configitem('remotefilelog', 'data.repackmaxpacksize', default='4GB') +configitem('remotefilelog', 
'data.repacksizelimit', default='100MB') + +configitem('remotefilelog', 'history.gencountlimit', default=2), +configitem('remotefilelog', 'history.generations', default=['100MB']) +configitem('remotefilelog', 'history.maxrepackpacks', default=50) +configitem('remotefilelog', 'history.repackmaxpacksize', default='400MB') +configitem('remotefilelog', 'history.repacksizelimit', default='100MB') + +# Note for extension authors: ONLY specify testedwith = 'ships-with-hg-core' for +# extensions which SHIP WITH MERCURIAL. Non-mainline extensions should +# be specifying the version(s) of Mercurial they are tested with, or +# leave the attribute unspecified. +testedwith = 'ships-with-hg-core' + +repoclass = localrepo.localrepository +repoclass._basesupported.add(shallowrepo.requirement) + +def uisetup(ui): + """Wraps user facing Mercurial commands to swap them out with shallow + versions. + """ + hg.wirepeersetupfuncs.append(fileserverclient.peersetup) + + entry = extensions.wrapcommand(commands.table, 'clone', cloneshallow) + entry[1].append(('', 'shallow', None, + _("create a shallow clone which uses remote file " + "history"))) + + extensions.wrapcommand(commands.table, 'debugindex', + debugcommands.debugindex) + extensions.wrapcommand(commands.table, 'debugindexdot', + debugcommands.debugindexdot) + extensions.wrapcommand(commands.table, 'log', log) + extensions.wrapcommand(commands.table, 'pull', pull) + + # Prevent 'hg manifest --all' + def _manifest(orig, ui, repo, *args, **opts): + if shallowrepo.requirement in repo.requirements and opts.get('all'): + raise error.Abort(_("--all is not supported in a shallow repo")) + + return orig(ui, repo, *args, **opts) + extensions.wrapcommand(commands.table, "manifest", _manifest) + + # Wrap remotefilelog with lfs code + def _lfsloaded(loaded=False): + lfsmod = None + try: + lfsmod = extensions.find('lfs') + except KeyError: + pass + if lfsmod: + lfsmod.wrapfilelog(remotefilelog.remotefilelog) + fileserverclient._lfsmod = lfsmod 
+ extensions.afterloaded('lfs', _lfsloaded) + + # debugdata needs remotefilelog.len to work + extensions.wrapcommand(commands.table, 'debugdata', debugdatashallow) + +def cloneshallow(orig, ui, repo, *args, **opts): + if opts.get('shallow'): + repos = [] + def pull_shallow(orig, self, *args, **kwargs): + if shallowrepo.requirement not in self.requirements: + repos.append(self.unfiltered()) + # set up the client hooks so the post-clone update works + setupclient(self.ui, self.unfiltered()) + + # setupclient fixed the class on the repo itself + # but we also need to fix it on the repoview + if isinstance(self, repoview.repoview): + self.__class__.__bases__ = (self.__class__.__bases__[0], + self.unfiltered().__class__) + self.requirements.add(shallowrepo.requirement) + self._writerequirements() + + # Since setupclient hadn't been called, exchange.pull was not + # wrapped. So we need to manually invoke our version of it. + return exchangepull(orig, self, *args, **kwargs) + else: + return orig(self, *args, **kwargs) + extensions.wrapfunction(exchange, 'pull', pull_shallow) + + # Wrap the stream logic to add requirements and to pass include/exclude + # patterns around. + def setup_streamout(repo, remote): + # Replace remote.stream_out with a version that sends file + # patterns. 
+ def stream_out_shallow(orig): + caps = remote.capabilities() + if shallowrepo.requirement in caps: + opts = {} + if repo.includepattern: + opts['includepattern'] = '\0'.join(repo.includepattern) + if repo.excludepattern: + opts['excludepattern'] = '\0'.join(repo.excludepattern) + return remote._callstream('stream_out_shallow', **opts) + else: + return orig() + extensions.wrapfunction(remote, 'stream_out', stream_out_shallow) + if hasstreamclone: + def stream_wrap(orig, op): + setup_streamout(op.repo, op.remote) + return orig(op) + extensions.wrapfunction( + streamclone, 'maybeperformlegacystreamclone', stream_wrap) + + def canperformstreamclone(orig, pullop, bundle2=False): + # remotefilelog is currently incompatible with the + # bundle2 flavor of streamclones, so force us to use + # v1 instead. + if 'v2' in pullop.remotebundle2caps.get('stream', []): + pullop.remotebundle2caps['stream'] = [ + c for c in pullop.remotebundle2caps['stream'] + if c != 'v2'] + if bundle2: + return False, None + supported, requirements = orig(pullop, bundle2=bundle2) + if requirements is not None: + requirements.add(shallowrepo.requirement) + return supported, requirements + extensions.wrapfunction( + streamclone, 'canperformstreamclone', canperformstreamclone) + else: + def stream_in_shallow(orig, repo, remote, requirements): + setup_streamout(repo, remote) + requirements.add(shallowrepo.requirement) + return orig(repo, remote, requirements) + extensions.wrapfunction( + localrepo.localrepository, 'stream_in', stream_in_shallow) + + try: + orig(ui, repo, *args, **opts) + finally: + if opts.get('shallow'): + for r in repos: + if util.safehasattr(r, 'fileservice'): + r.fileservice.close() + +def debugdatashallow(orig, *args, **kwds): + oldlen = remotefilelog.remotefilelog.__len__ + try: + remotefilelog.remotefilelog.__len__ = lambda x: 1 + return orig(*args, **kwds) + finally: + remotefilelog.remotefilelog.__len__ = oldlen + +def reposetup(ui, repo): + if not isinstance(repo, 
localrepo.localrepository): + return + + # put here intentionally bc doesnt work in uisetup + ui.setconfig('hooks', 'update.prefetch', wcpprefetch) + ui.setconfig('hooks', 'commit.prefetch', wcpprefetch) + + isserverenabled = ui.configbool('remotefilelog', 'server') + isshallowclient = shallowrepo.requirement in repo.requirements + + if isserverenabled and isshallowclient: + raise RuntimeError("Cannot be both a server and shallow client.") + + if isshallowclient: + setupclient(ui, repo) + + if isserverenabled: + remotefilelogserver.setupserver(ui, repo) + +def setupclient(ui, repo): + if not isinstance(repo, localrepo.localrepository): + return + + # Even clients get the server setup since they need to have the + # wireprotocol endpoints registered. + remotefilelogserver.onetimesetup(ui) + onetimeclientsetup(ui) + + shallowrepo.wraprepo(repo) + repo.store = shallowstore.wrapstore(repo.store) + +clientonetime = False +def onetimeclientsetup(ui): + global clientonetime + if clientonetime: + return + clientonetime = True + + changegroup.cgpacker = shallowbundle.shallowcg1packer + + extensions.wrapfunction(changegroup, '_addchangegroupfiles', + shallowbundle.addchangegroupfiles) + extensions.wrapfunction( + changegroup, 'makechangegroup', shallowbundle.makechangegroup) + + def storewrapper(orig, requirements, path, vfstype): + s = orig(requirements, path, vfstype) + if shallowrepo.requirement in requirements: + s = shallowstore.wrapstore(s) + + return s + extensions.wrapfunction(localrepo, 'makestore', storewrapper) + + extensions.wrapfunction(exchange, 'pull', exchangepull) + + # prefetch files before update + def applyupdates(orig, repo, actions, wctx, mctx, overwrite, labels=None): + if shallowrepo.requirement in repo.requirements: + manifest = mctx.manifest() + files = [] + for f, args, msg in actions['g']: + files.append((f, hex(manifest[f]))) + # batch fetch the needed files from the server + repo.fileservice.prefetch(files) + return orig(repo, actions, wctx, 
mctx, overwrite, labels=labels) + extensions.wrapfunction(merge, 'applyupdates', applyupdates) + + # Prefetch merge checkunknownfiles + def checkunknownfiles(orig, repo, wctx, mctx, force, actions, + *args, **kwargs): + if shallowrepo.requirement in repo.requirements: + files = [] + sparsematch = repo.maybesparsematch(mctx.rev()) + for f, (m, actionargs, msg) in actions.iteritems(): + if sparsematch and not sparsematch(f): + continue + if m in ('c', 'dc', 'cm'): + files.append((f, hex(mctx.filenode(f)))) + elif m == 'dg': + f2 = actionargs[0] + files.append((f2, hex(mctx.filenode(f2)))) + # batch fetch the needed files from the server + repo.fileservice.prefetch(files) + return orig(repo, wctx, mctx, force, actions, *args, **kwargs) + extensions.wrapfunction(merge, '_checkunknownfiles', checkunknownfiles) + + # Prefetch files before status attempts to look at their size and contents + def checklookup(orig, self, files): + repo = self._repo + if shallowrepo.requirement in repo.requirements: + prefetchfiles = [] + for parent in self._parents: + for f in files: + if f in parent: + prefetchfiles.append((f, hex(parent.filenode(f)))) + # batch fetch the needed files from the server + repo.fileservice.prefetch(prefetchfiles) + return orig(self, files) + extensions.wrapfunction(context.workingctx, '_checklookup', checklookup) + + # Prefetch the logic that compares added and removed files for renames + def findrenames(orig, repo, matcher, added, removed, *args, **kwargs): + if shallowrepo.requirement in repo.requirements: + files = [] + parentctx = repo['.'] + for f in removed: + files.append((f, hex(parentctx.filenode(f)))) + # batch fetch the needed files from the server + repo.fileservice.prefetch(files) + return orig(repo, matcher, added, removed, *args, **kwargs) + extensions.wrapfunction(scmutil, '_findrenames', findrenames) + + # prefetch files before mergecopies check + def computenonoverlap(orig, repo, c1, c2, *args, **kwargs): + u1, u2 = orig(repo, c1, c2, *args, 
**kwargs) + if shallowrepo.requirement in repo.requirements: + m1 = c1.manifest() + m2 = c2.manifest() + files = [] + + sparsematch1 = repo.maybesparsematch(c1.rev()) + if sparsematch1: + sparseu1 = [] + for f in u1: + if sparsematch1(f): + files.append((f, hex(m1[f]))) + sparseu1.append(f) + u1 = sparseu1 + + sparsematch2 = repo.maybesparsematch(c2.rev()) + if sparsematch2: + sparseu2 = [] + for f in u2: + if sparsematch2(f): + files.append((f, hex(m2[f]))) + sparseu2.append(f) + u2 = sparseu2 + + # batch fetch the needed files from the server + repo.fileservice.prefetch(files) + return u1, u2 + extensions.wrapfunction(copies, '_computenonoverlap', computenonoverlap) + + # prefetch files before pathcopies check + def computeforwardmissing(orig, a, b, match=None): + missing = list(orig(a, b, match=match)) + repo = a._repo + if shallowrepo.requirement in repo.requirements: + mb = b.manifest() + + files = [] + sparsematch = repo.maybesparsematch(b.rev()) + if sparsematch: + sparsemissing = [] + for f in missing: + if sparsematch(f): + files.append((f, hex(mb[f]))) + sparsemissing.append(f) + missing = sparsemissing + + # batch fetch the needed files from the server + repo.fileservice.prefetch(files) + return missing + extensions.wrapfunction(copies, '_computeforwardmissing', + computeforwardmissing) + + # close cache miss server connection after the command has finished + def runcommand(orig, lui, repo, *args, **kwargs): + try: + return orig(lui, repo, *args, **kwargs) + finally: + # repo can be None when running in chg: + # - at startup, reposetup was called because serve is not norepo + # - a norepo command like "help" is called + if repo and shallowrepo.requirement in repo.requirements: + repo.fileservice.close() + extensions.wrapfunction(dispatch, 'runcommand', runcommand) + + # disappointing hacks below + templatekw.getrenamedfn = getrenamedfn + extensions.wrapfunction(revset, 'filelog', filelogrevset) + revset.symbols['filelog'] = revset.filelog + 
extensions.wrapfunction(cmdutil, 'walkfilerevs', walkfilerevs) + + # prevent strip from stripping remotefilelogs + def _collectbrokencsets(orig, repo, files, striprev): + if shallowrepo.requirement in repo.requirements: + files = list([f for f in files if not repo.shallowmatch(f)]) + return orig(repo, files, striprev) + extensions.wrapfunction(repair, '_collectbrokencsets', _collectbrokencsets) + + # Don't commit filelogs until we know the commit hash, since the hash + # is present in the filelog blob. + # This violates Mercurial's filelog->manifest->changelog write order, + # but is generally fine for client repos. + pendingfilecommits = [] + def addrawrevision(orig, self, rawtext, transaction, link, p1, p2, node, + flags, cachedelta=None, _metatuple=None): + if isinstance(link, int): + pendingfilecommits.append( + (self, rawtext, transaction, link, p1, p2, node, flags, + cachedelta, _metatuple)) + return node + else: + return orig(self, rawtext, transaction, link, p1, p2, node, flags, + cachedelta, _metatuple=_metatuple) + extensions.wrapfunction( + remotefilelog.remotefilelog, 'addrawrevision', addrawrevision) + + def changelogadd(orig, self, *args): + oldlen = len(self) + node = orig(self, *args) + newlen = len(self) + if oldlen != newlen: + for oldargs in pendingfilecommits: + log, rt, tr, link, p1, p2, n, fl, c, m = oldargs + linknode = self.node(link) + if linknode == node: + log.addrawrevision(rt, tr, linknode, p1, p2, n, fl, c, m) + else: + raise error.ProgrammingError( + 'pending multiple integer revisions are not supported') + else: + # "link" is actually wrong here (it is set to len(changelog)) + # if changelog remains unchanged, skip writing file revisions + # but still do a sanity check about pending multiple revisions + if len(set(x[3] for x in pendingfilecommits)) > 1: + raise error.ProgrammingError( + 'pending multiple integer revisions are not supported') + del pendingfilecommits[:] + return node + extensions.wrapfunction(changelog.changelog, 
'add', changelogadd) + + # changectx wrappers + def filectx(orig, self, path, fileid=None, filelog=None): + if fileid is None: + fileid = self.filenode(path) + if (shallowrepo.requirement in self._repo.requirements and + self._repo.shallowmatch(path)): + return remotefilectx.remotefilectx(self._repo, path, + fileid=fileid, changectx=self, filelog=filelog) + return orig(self, path, fileid=fileid, filelog=filelog) + extensions.wrapfunction(context.changectx, 'filectx', filectx) + + def workingfilectx(orig, self, path, filelog=None): + if (shallowrepo.requirement in self._repo.requirements and + self._repo.shallowmatch(path)): + return remotefilectx.remoteworkingfilectx(self._repo, + path, workingctx=self, filelog=filelog) + return orig(self, path, filelog=filelog) + extensions.wrapfunction(context.workingctx, 'filectx', workingfilectx) + + # prefetch required revisions before a diff + def trydiff(orig, repo, revs, ctx1, ctx2, modified, added, removed, + copy, getfilectx, *args, **kwargs): + if shallowrepo.requirement in repo.requirements: + prefetch = [] + mf1 = ctx1.manifest() + for fname in modified + added + removed: + if fname in mf1: + fnode = getfilectx(fname, ctx1).filenode() + # fnode can be None if it's a edited working ctx file + if fnode: + prefetch.append((fname, hex(fnode))) + if fname not in removed: + fnode = getfilectx(fname, ctx2).filenode() + if fnode: + prefetch.append((fname, hex(fnode))) + + repo.fileservice.prefetch(prefetch) + + return orig(repo, revs, ctx1, ctx2, modified, added, removed, + copy, getfilectx, *args, **kwargs) + extensions.wrapfunction(patch, 'trydiff', trydiff) + + # Prevent verify from processing files + # a stub for mercurial.hg.verify() + def _verify(orig, repo): + lock = repo.lock() + try: + return shallowverifier.shallowverifier(repo).verify() + finally: + lock.release() + + extensions.wrapfunction(hg, 'verify', _verify) + + scmutil.fileprefetchhooks.add('remotefilelog', _fileprefetchhook) + +def getrenamedfn(repo, 
endrev=None): + rcache = {} + + def getrenamed(fn, rev): + '''looks up all renames for a file (up to endrev) the first + time the file is given. It indexes on the changerev and only + parses the manifest if linkrev != changerev. + Returns rename info for fn at changerev rev.''' + if rev in rcache.setdefault(fn, {}): + return rcache[fn][rev] + + try: + fctx = repo[rev].filectx(fn) + for ancestor in fctx.ancestors(): + if ancestor.path() == fn: + renamed = ancestor.renamed() + rcache[fn][ancestor.rev()] = renamed + + return fctx.renamed() + except error.LookupError: + return None + + return getrenamed + +def walkfilerevs(orig, repo, match, follow, revs, fncache): + if not shallowrepo.requirement in repo.requirements: + return orig(repo, match, follow, revs, fncache) + + # remotefilelog's can't be walked in rev order, so throw. + # The caller will see the exception and walk the commit tree instead. + if not follow: + raise cmdutil.FileWalkError("Cannot walk via filelog") + + wanted = set() + minrev, maxrev = min(revs), max(revs) + + pctx = repo['.'] + for filename in match.files(): + if filename not in pctx: + raise error.Abort(_('cannot follow file not in parent ' + 'revision: "%s"') % filename) + fctx = pctx[filename] + + linkrev = fctx.linkrev() + if linkrev >= minrev and linkrev <= maxrev: + fncache.setdefault(linkrev, []).append(filename) + wanted.add(linkrev) + + for ancestor in fctx.ancestors(): + linkrev = ancestor.linkrev() + if linkrev >= minrev and linkrev <= maxrev: + fncache.setdefault(linkrev, []).append(ancestor.path()) + wanted.add(linkrev) + + return wanted + +def filelogrevset(orig, repo, subset, x): + """``filelog(pattern)`` + Changesets connected to the specified filelog. + + For performance reasons, ``filelog()`` does not show every changeset + that affects the requested file(s). See :hg:`help log` for details. For + a slower, more accurate result, use ``file()``. 
+ """ + + if not shallowrepo.requirement in repo.requirements: + return orig(repo, subset, x) + + # i18n: "filelog" is a keyword + pat = revset.getstring(x, _("filelog requires a pattern")) + m = match.match(repo.root, repo.getcwd(), [pat], default='relpath', + ctx=repo[None]) + s = set() + + if not match.patkind(pat): + # slow + for r in subset: + ctx = repo[r] + cfiles = ctx.files() + for f in m.files(): + if f in cfiles: + s.add(ctx.rev()) + break + else: + # partial + files = (f for f in repo[None] if m(f)) + for f in files: + fctx = repo[None].filectx(f) + s.add(fctx.linkrev()) + for actx in fctx.ancestors(): + s.add(actx.linkrev()) + + return smartset.baseset([r for r in subset if r in s]) + +@command('gc', [], _('hg gc [REPO...]'), norepo=True) +def gc(ui, *args, **opts): + '''garbage collect the client and server filelog caches + ''' + cachepaths = set() + + # get the system client cache + systemcache = shallowutil.getcachepath(ui, allowempty=True) + if systemcache: + cachepaths.add(systemcache) + + # get repo client and server cache + repopaths = [] + pwd = ui.environ.get('PWD') + if pwd: + repopaths.append(pwd) + + repopaths.extend(args) + repos = [] + for repopath in repopaths: + try: + repo = hg.peer(ui, {}, repopath) + repos.append(repo) + + repocache = shallowutil.getcachepath(repo.ui, allowempty=True) + if repocache: + cachepaths.add(repocache) + except error.RepoError: + pass + + # gc client cache + for cachepath in cachepaths: + gcclient(ui, cachepath) + + # gc server cache + for repo in repos: + remotefilelogserver.gcserver(ui, repo._repo) + +def gcclient(ui, cachepath): + # get list of repos that use this cache + repospath = os.path.join(cachepath, 'repos') + if not os.path.exists(repospath): + ui.warn(_("no known cache at %s\n") % cachepath) + return + + reposfile = open(repospath, 'r') + repos = set([r[:-1] for r in reposfile.readlines()]) + reposfile.close() + + # build list of useful files + validrepos = [] + keepkeys = set() + + _analyzing = 
_("analyzing repositories") + + sharedcache = None + filesrepacked = False + + count = 0 + for path in repos: + ui.progress(_analyzing, count, unit="repos", total=len(repos)) + count += 1 + try: + path = ui.expandpath(os.path.normpath(path)) + except TypeError as e: + ui.warn(_("warning: malformed path: %r:%s\n") % (path, e)) + traceback.print_exc() + continue + try: + peer = hg.peer(ui, {}, path) + repo = peer._repo + except error.RepoError: + continue + + validrepos.append(path) + + # Protect against any repo or config changes that have happened since + # this repo was added to the repos file. We'd rather this loop succeed + # and too much be deleted, than the loop fail and nothing gets deleted. + if shallowrepo.requirement not in repo.requirements: + continue + + if not util.safehasattr(repo, 'name'): + ui.warn(_("repo %s is a misconfigured remotefilelog repo\n") % path) + continue + + # If garbage collection on repack and repack on hg gc are enabled + # then loose files are repacked and garbage collected. + # Otherwise regular garbage collection is performed. 
+ repackonhggc = repo.ui.configbool('remotefilelog', 'repackonhggc') + gcrepack = repo.ui.configbool('remotefilelog', 'gcrepack') + if repackonhggc and gcrepack: + try: + repackmod.incrementalrepack(repo) + filesrepacked = True + continue + except (IOError, repackmod.RepackAlreadyRunning): + # If repack cannot be performed due to not enough disk space + # continue doing garbage collection of loose files w/o repack + pass + + reponame = repo.name + if not sharedcache: + sharedcache = repo.sharedstore + + # Compute a keepset which is not garbage collected + def keyfn(fname, fnode): + return fileserverclient.getcachekey(reponame, fname, hex(fnode)) + keepkeys = repackmod.keepset(repo, keyfn=keyfn, lastkeepkeys=keepkeys) + + ui.progress(_analyzing, None) + + # write list of valid repos back + oldumask = os.umask(0o002) + try: + reposfile = open(repospath, 'w') + reposfile.writelines([("%s\n" % r) for r in validrepos]) + reposfile.close() + finally: + os.umask(oldumask) + + # prune cache + if sharedcache is not None: + sharedcache.gc(keepkeys) + elif not filesrepacked: + ui.warn(_("warning: no valid repos in repofile\n")) + +def log(orig, ui, repo, *pats, **opts): + if shallowrepo.requirement not in repo.requirements: + return orig(ui, repo, *pats, **opts) + + follow = opts.get('follow') + revs = opts.get('rev') + if pats: + # Force slowpath for non-follow patterns and follows that start from + # non-working-copy-parent revs. + if not follow or revs: + # This forces the slowpath + opts['removed'] = True + + # If this is a non-follow log without any revs specified, recommend that + # the user add -f to speed it up. 
+ if not follow and not revs: + match, pats = scmutil.matchandpats(repo['.'], pats, opts) + isfile = not match.anypats() + if isfile: + for file in match.files(): + if not os.path.isfile(repo.wjoin(file)): + isfile = False + break + + if isfile: + ui.warn(_("warning: file log can be slow on large repos - " + + "use -f to speed it up\n")) + + return orig(ui, repo, *pats, **opts) + +def revdatelimit(ui, revset): + """Update revset so that only changesets no older than 'prefetchdays' days + are included. The default value is set to 14 days. If 'prefetchdays' is set + to zero or negative value then date restriction is not applied. + """ + days = ui.configint('remotefilelog', 'prefetchdays') + if days > 0: + revset = '(%s) & date(-%s)' % (revset, days) + return revset + +def readytofetch(repo): + """Check that enough time has passed since the last background prefetch. + This only relates to prefetches after operations that change the working + copy parent. Default delay between background prefetches is 2 minutes. + """ + timeout = repo.ui.configint('remotefilelog', 'prefetchdelay') + fname = repo.vfs.join('lastprefetch') + + ready = False + with open(fname, 'a'): + # the with construct above is used to avoid race conditions + modtime = os.path.getmtime(fname) + if (time.time() - modtime) > timeout: + os.utime(fname, None) + ready = True + + return ready + +def wcpprefetch(ui, repo, **kwargs): + """Prefetches in background revisions specified by bgprefetchrevs revset. + Does background repack if backgroundrepack flag is set in config. 
+ """ + shallow = shallowrepo.requirement in repo.requirements + bgprefetchrevs = ui.config('remotefilelog', 'bgprefetchrevs') + isready = readytofetch(repo) + + if not (shallow and bgprefetchrevs and isready): + return + + bgrepack = repo.ui.configbool('remotefilelog', 'backgroundrepack') + # update a revset with a date limit + bgprefetchrevs = revdatelimit(ui, bgprefetchrevs) + + def anon(): + if util.safehasattr(repo, 'ranprefetch') and repo.ranprefetch: + return + repo.ranprefetch = True + repo.backgroundprefetch(bgprefetchrevs, repack=bgrepack) + + repo._afterlock(anon) + +def pull(orig, ui, repo, *pats, **opts): + result = orig(ui, repo, *pats, **opts) + + if shallowrepo.requirement in repo.requirements: + # prefetch if it's configured + prefetchrevset = ui.config('remotefilelog', 'pullprefetch') + bgrepack = repo.ui.configbool('remotefilelog', 'backgroundrepack') + bgprefetch = repo.ui.configbool('remotefilelog', 'backgroundprefetch') + + if prefetchrevset: + ui.status(_("prefetching file contents\n")) + revs = scmutil.revrange(repo, [prefetchrevset]) + base = repo['.'].rev() + if bgprefetch: + repo.backgroundprefetch(prefetchrevset, repack=bgrepack) + else: + repo.prefetch(revs, base=base) + if bgrepack: + repackmod.backgroundrepack(repo, incremental=True) + elif bgrepack: + repackmod.backgroundrepack(repo, incremental=True) + + return result + +def exchangepull(orig, repo, remote, *args, **kwargs): + # Hook into the callstream/getbundle to insert bundle capabilities + # during a pull. 
+ def localgetbundle(orig, source, heads=None, common=None, bundlecaps=None, + **kwargs): + if not bundlecaps: + bundlecaps = set() + bundlecaps.add('remotefilelog') + return orig(source, heads=heads, common=common, bundlecaps=bundlecaps, + **kwargs) + + if util.safehasattr(remote, '_callstream'): + remote._localrepo = repo + elif util.safehasattr(remote, 'getbundle'): + extensions.wrapfunction(remote, 'getbundle', localgetbundle) + + return orig(repo, remote, *args, **kwargs) + +def _fileprefetchhook(repo, revs, match): + if shallowrepo.requirement in repo.requirements: + allfiles = [] + for rev in revs: + if rev == nodemod.wdirrev or rev is None: + continue + ctx = repo[rev] + mf = ctx.manifest() + sparsematch = repo.maybesparsematch(ctx.rev()) + for path in ctx.walk(match): + if path.endswith('/'): + # Tree manifest that's being excluded as part of narrow + continue + if (not sparsematch or sparsematch(path)) and path in mf: + allfiles.append((path, hex(mf[path]))) + repo.fileservice.prefetch(allfiles) + +@command('debugremotefilelog', [ + ('d', 'decompress', None, _('decompress the filelog first')), + ], _('hg debugremotefilelog '), norepo=True) +def debugremotefilelog(ui, path, **opts): + return debugcommands.debugremotefilelog(ui, path, **opts) + +@command('verifyremotefilelog', [ + ('d', 'decompress', None, _('decompress the filelogs first')), + ], _('hg verifyremotefilelogs '), norepo=True) +def verifyremotefilelog(ui, path, **opts): + return debugcommands.verifyremotefilelog(ui, path, **opts) + +@command('debugdatapack', [ + ('', 'long', None, _('print the long hashes')), + ('', 'node', '', _('dump the contents of node'), 'NODE'), + ], _('hg debugdatapack '), norepo=True) +def debugdatapack(ui, *paths, **opts): + return debugcommands.debugdatapack(ui, *paths, **opts) + +@command('debughistorypack', [ + ], _('hg debughistorypack '), norepo=True) +def debughistorypack(ui, path, **opts): + return debugcommands.debughistorypack(ui, path) + 
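The `localgetbundle` wrapper above follows a common wrapping convention: the wrapper receives the original callable as its first argument and decides how to delegate to it. A minimal standalone sketch of that convention is shown here. The names `wrapfunction`, `FakeRemote`, and `addcapability` are hypothetical illustrations written for this note; they are not Mercurial's `extensions.wrapfunction` or any real peer class.

```python
def wrapfunction(container, name, wrapper):
    """Hypothetical stand-in for a wrapping helper: replace
    container.<name> with a callable that passes the original
    callable to `wrapper` as its first argument."""
    orig = getattr(container, name)

    def wrapped(*args, **kwargs):
        return wrapper(orig, *args, **kwargs)

    setattr(container, name, wrapped)
    return orig


class FakeRemote(object):
    """Illustrative peer object; it only echoes what it was asked for."""

    def getbundle(self, source, heads=None, common=None, bundlecaps=None):
        return {'source': source, 'bundlecaps': bundlecaps}


def addcapability(orig, source, heads=None, common=None, bundlecaps=None,
                  **kwargs):
    # Copy the caller's capabilities (if any) and add our own marker
    # before delegating to the original function.
    caps = set(bundlecaps or ())
    caps.add('remotefilelog')
    return orig(source, heads=heads, common=common, bundlecaps=caps, **kwargs)


remote = FakeRemote()
wrapfunction(remote, 'getbundle', addcapability)
result = remote.getbundle('pull')
```

Unlike the patch, this sketch copies the incoming capability set instead of mutating it, which is a design choice of the example only.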
+@command('debugkeepset', [ + ], _('hg debugkeepset')) +def debugkeepset(ui, repo, **opts): + # The command is used to measure keepset computation time + def keyfn(fname, fnode): + return fileserverclient.getcachekey(repo.name, fname, hex(fnode)) + repackmod.keepset(repo, keyfn) + return + +@command('debugwaitonrepack', [ + ], _('hg debugwaitonrepack')) +def debugwaitonrepack(ui, repo, **opts): + return debugcommands.debugwaitonrepack(repo) + +@command('debugwaitonprefetch', [ + ], _('hg debugwaitonprefetch')) +def debugwaitonprefetch(ui, repo, **opts): + return debugcommands.debugwaitonprefetch(repo) + +def resolveprefetchopts(ui, opts): + if not opts.get('rev'): + revset = ['.', 'draft()'] + + prefetchrevset = ui.config('remotefilelog', 'pullprefetch', None) + if prefetchrevset: + revset.append('(%s)' % prefetchrevset) + bgprefetchrevs = ui.config('remotefilelog', 'bgprefetchrevs', None) + if bgprefetchrevs: + revset.append('(%s)' % bgprefetchrevs) + revset = '+'.join(revset) + + # update a revset with a date limit + revset = revdatelimit(ui, revset) + + opts['rev'] = [revset] + + if not opts.get('base'): + opts['base'] = None + + return opts + +@command('prefetch', [ + ('r', 'rev', [], _('prefetch the specified revisions'), _('REV')), + ('', 'repack', False, _('run repack after prefetch')), + ('b', 'base', '', _("rev that is assumed to already be local")), + ] + commands.walkopts, _('hg prefetch [OPTIONS] [FILE...]')) +def prefetch(ui, repo, *pats, **opts): + """prefetch file revisions from the server + + Prefetches file revisions for the specified revs and stores them in the + local remotefilelog cache. If no rev is specified, the default rev is + used which is the union of dot, draft, pullprefetch and bgprefetchrevs. + File names or patterns can be used to limit which files are downloaded. + + Return 0 on success. 
+ """ + + if shallowrepo.requirement not in repo.requirements: + raise error.Abort(_("repo is not shallow")) + + opts = resolveprefetchopts(ui, opts) + revs = scmutil.revrange(repo, opts.get('rev')) + repo.prefetch(revs, opts.get('base'), pats, opts) + + # Run repack in background + if opts.get('repack'): + repackmod.backgroundrepack(repo, incremental=True) + +@command('repack', [ + ('', 'background', None, _('run in a background process'), None), + ('', 'incremental', None, _('do an incremental repack'), None), + ('', 'packsonly', None, _('only repack packs (skip loose objects)'), None), + ], _('hg repack [OPTIONS]')) +def repack_(ui, repo, *pats, **opts): + if opts.get('background'): + repackmod.backgroundrepack(repo, incremental=opts.get('incremental'), + packsonly=opts.get('packsonly', False)) + return + + options = {'packsonly': opts.get('packsonly')} + + try: + if opts.get('incremental'): + repackmod.incrementalrepack(repo, options=options) + else: + repackmod.fullrepack(repo, options=options) + except repackmod.RepackAlreadyRunning as ex: + # Don't propagate the exception if the repack is already in + # progress, since we want the command to exit 0. + repo.ui.warn('%s\n' % ex) diff --git a/hgext/remotefilelog/basepack.py b/hgext/remotefilelog/basepack.py new file mode 100644 --- /dev/null +++ b/hgext/remotefilelog/basepack.py @@ -0,0 +1,543 @@ +from __future__ import absolute_import + +import collections +import errno +import hashlib +import mmap +import os +import struct +import time + +from mercurial.i18n import _ +from mercurial import ( + policy, + pycompat, + util, + vfs as vfsmod, +) +from . import shallowutil + +osutil = policy.importmod(r'osutil') + +# The pack version supported by this implementation. This will need to be +# rev'd whenever the byte format changes. Ex: changing the fanout prefix, +# changing any of the int sizes, changing the delta algorithm, etc. 
+PACKVERSIONSIZE = 1 +INDEXVERSIONSIZE = 2 + +FANOUTSTART = INDEXVERSIONSIZE + +# Constant that indicates a fanout table entry hasn't been filled in. (This does +# not get serialized) +EMPTYFANOUT = -1 + +# The fanout prefix is the number of bytes that can be addressed by the fanout +# table. Example: a fanout prefix of 1 means we use the first byte of a hash to +# look in the fanout table (which will be 2^8 entries long). +SMALLFANOUTPREFIX = 1 +LARGEFANOUTPREFIX = 2 + +# The number of entries in the index at which point we switch to a large fanout. +# It is chosen to balance the linear scan through a sparse fanout, with the +# size of the bisect in actual index. +# 2^16 / 8 was chosen because it trades off (1 step fanout scan + 5 step +# bisect) with (8 step fanout scan + 1 step bisect) +# 5 step bisect = log(2^16 / 8 / 255) # fanout +# 8 step fanout scan = 2^16 / (2^16 / 8) # fanout space divided by entries +SMALLFANOUTCUTOFF = 2**16 / 8 + +# The amount of time to wait between checking for new packs. This prevents an +# exception when data is moved to a new pack after the process has already +# loaded the pack list. +REFRESHRATE = 0.1 + +if pycompat.isposix: + # With glibc 2.7+ the 'e' flag uses O_CLOEXEC when opening. + # The 'e' flag will be ignored on older versions of glibc. + PACKOPENMODE = 'rbe' +else: + PACKOPENMODE = 'rb' + +class _cachebackedpacks(object): + def __init__(self, packs, cachesize): + self._packs = set(packs) + self._lrucache = util.lrucachedict(cachesize) + self._lastpack = None + + # Avoid cold start of the cache by populating the most recent packs + # in the cache. + for i in reversed(range(min(cachesize, len(packs)))): + self._movetofront(packs[i]) + + def _movetofront(self, pack): + # This effectively makes pack the first entry in the cache. 
+ self._lrucache[pack] = True + + def _registerlastpackusage(self): + if self._lastpack is not None: + self._movetofront(self._lastpack) + self._lastpack = None + + def add(self, pack): + self._registerlastpackusage() + + # This method will mostly be called when packs are not in cache. + # Therefore, add the pack to the cache. + self._movetofront(pack) + self._packs.add(pack) + + def __iter__(self): + self._registerlastpackusage() + + # Cache iteration is based on LRU. + for pack in self._lrucache: + self._lastpack = pack + yield pack + + cachedpacks = set(pack for pack in self._lrucache) + # Yield for paths not in the cache. + for pack in self._packs - cachedpacks: + self._lastpack = pack + yield pack + + # Data not found in any pack. + self._lastpack = None + +class basepackstore(object): + # Default cache size limit for the pack files. + DEFAULTCACHESIZE = 100 + + def __init__(self, ui, path): + self.ui = ui + self.path = path + + # lastrefresh is 0 so we'll immediately check for new packs on the first + # failure. + self.lastrefresh = 0 + + packs = [] + for filepath, __, __ in self._getavailablepackfilessorted(): + try: + pack = self.getpack(filepath) + except Exception as ex: + # An exception may be thrown if the pack file is corrupted + # somehow. Log a warning but keep going in this case, just + # skipping this pack file. + # + # If this is an ENOENT error then don't even bother logging. + # Someone could have removed the file since we retrieved the + # list of paths. 
+ if getattr(ex, 'errno', None) != errno.ENOENT: + ui.warn(_('unable to load pack %s: %s\n') % (filepath, ex)) + continue + packs.append(pack) + + self.packs = _cachebackedpacks(packs, self.DEFAULTCACHESIZE) + + def _getavailablepackfiles(self): + """For each pack file (an index/data file combo), yields: + (full path without extension, mtime, size) + + mtime will be the mtime of the index/data file (whichever is newer) + size is the combined size of index/data file + """ + indexsuffixlen = len(self.INDEXSUFFIX) + packsuffixlen = len(self.PACKSUFFIX) + + ids = set() + sizes = collections.defaultdict(lambda: 0) + mtimes = collections.defaultdict(lambda: []) + try: + for filename, type, stat in osutil.listdir(self.path, stat=True): + id = None + if filename[-indexsuffixlen:] == self.INDEXSUFFIX: + id = filename[:-indexsuffixlen] + elif filename[-packsuffixlen:] == self.PACKSUFFIX: + id = filename[:-packsuffixlen] + + # Since we expect to have two files corresponding to each ID + # (the index file and the pack file), we can yield once we see + # it twice. + if id: + sizes[id] += stat.st_size # Sum both files' sizes together + mtimes[id].append(stat.st_mtime) + if id in ids: + yield (os.path.join(self.path, id), max(mtimes[id]), + sizes[id]) + else: + ids.add(id) + except OSError as ex: + if ex.errno != errno.ENOENT: + raise + + def _getavailablepackfilessorted(self): + """Like `_getavailablepackfiles`, but also sorts the files by mtime, + yielding newest files first. + + This is desirable, since it is more likely newer packfiles have more + desirable data. + """ + files = [] + for path, mtime, size in self._getavailablepackfiles(): + files.append((mtime, size, path)) + files = sorted(files, reverse=True) + for mtime, size, path in files: + yield path, mtime, size + + def gettotalsizeandcount(self): + """Returns the total disk size (in bytes) of all the pack files in + this store, and the count of pack files. 
+ + (This might be smaller than the total size of the ``self.path`` + directory, since this only considers fully-written pack files, and not + temporary files or other detritus in the directory.) + """ + totalsize = 0 + count = 0 + for __, __, size in self._getavailablepackfiles(): + totalsize += size + count += 1 + return totalsize, count + + def getmetrics(self): + """Returns metrics on the state of this store.""" + size, count = self.gettotalsizeandcount() + return { + 'numpacks': count, + 'totalpacksize': size, + } + + def getpack(self, path): + raise NotImplementedError() + + def getmissing(self, keys): + missing = keys + for pack in self.packs: + missing = pack.getmissing(missing) + + # Ensures better performance of the cache by keeping the most + # recently accessed pack at the beginning in subsequent iterations. + if not missing: + return missing + + if missing: + for pack in self.refresh(): + missing = pack.getmissing(missing) + + return missing + + def markledger(self, ledger, options=None): + for pack in self.packs: + pack.markledger(ledger) + + def markforrefresh(self): + """Tells the store that there may be new pack files, so the next time it + has a lookup miss it should check for new files.""" + self.lastrefresh = 0 + + def refresh(self): + """Checks for any new packs on disk, adds them to the main pack list, + and returns a list of just the new packs.""" + now = time.time() + + # If we experience a lot of misses (like in the case of getmissing() on + # new objects), let's only actually check disk for new stuff every once + # in a while. Generally this code path should only ever matter when a + # repack is going on in the background, and that should be pretty rare + # to have that happen twice in quick succession. 
+ newpacks = [] + if now > self.lastrefresh + REFRESHRATE: + self.lastrefresh = now + previous = set(p.path for p in self.packs) + for filepath, __, __ in self._getavailablepackfilessorted(): + if filepath not in previous: + newpack = self.getpack(filepath) + newpacks.append(newpack) + self.packs.add(newpack) + + return newpacks + +class versionmixin(object): + # Mix-in for classes with multiple supported versions + VERSION = None + SUPPORTED_VERSIONS = [0] + + def _checkversion(self, version): + if version in self.SUPPORTED_VERSIONS: + if self.VERSION is None: + # only affect this instance + self.VERSION = version + elif self.VERSION != version: + raise RuntimeError('inconsistent version: %s' % version) + else: + raise RuntimeError('unsupported version: %s' % version) + +class basepack(versionmixin): + # The maximum amount we should read via mmap before remapping so the old + # pages can be released (100MB) + MAXPAGEDIN = 100 * 1024**2 + + SUPPORTED_VERSIONS = [0] + + def __init__(self, path): + self.path = path + self.packpath = path + self.PACKSUFFIX + self.indexpath = path + self.INDEXSUFFIX + + self.indexsize = os.stat(self.indexpath).st_size + self.datasize = os.stat(self.packpath).st_size + + self._index = None + self._data = None + self.freememory() # initialize the mmap + + version = struct.unpack('!B', self._data[:PACKVERSIONSIZE])[0] + self._checkversion(version) + + version, config = struct.unpack('!BB', self._index[:INDEXVERSIONSIZE]) + self._checkversion(version) + + if 0b10000000 & config: + self.params = indexparams(LARGEFANOUTPREFIX, version) + else: + self.params = indexparams(SMALLFANOUTPREFIX, version) + + @util.propertycache + def _fanouttable(self): + params = self.params + rawfanout = self._index[FANOUTSTART:FANOUTSTART + params.fanoutsize] + fanouttable = [] + for i in pycompat.xrange(0, params.fanoutcount): + loc = i * 4 + fanoutentry = struct.unpack('!I', rawfanout[loc:loc + 4])[0] + fanouttable.append(fanoutentry) + return fanouttable + + 
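The fanout table read by `_fanouttable` maps a node's leading byte(s) to the offset of the first index entry sharing that prefix, so a lookup only has to search one slice of the sorted index. The following is a simplified standalone sketch of that idea, assuming a one-byte prefix and index entries that consist of the 20-byte node only. `buildfanout`, `lookup`, and `ENTRYLENGTH` are hypothetical names for this note; real pack indexes carry more fields per entry and are not scanned linearly like this.

```python
import hashlib
import struct

FANOUTPREFIX = 1      # bytes of the node used as the fanout key
ENTRYLENGTH = 20      # assumed entry size: the node and nothing else
EMPTY = -1


def buildfanout(nodes):
    """Return (fanout, index) for an iterable of 20-byte nodes."""
    nodes = sorted(nodes)
    fanout = [EMPTY] * (2 ** (FANOUTPREFIX * 8))
    for i, node in enumerate(nodes):
        key = struct.unpack('!B', node[:FANOUTPREFIX])[0]
        if fanout[key] == EMPTY:
            fanout[key] = i * ENTRYLENGTH
    # Unfilled slots inherit the previous filled offset, mirroring the
    # fill loop in writeindex.
    last = 0
    for key, offset in enumerate(fanout):
        if offset == EMPTY:
            fanout[key] = last
        else:
            last = offset
    return fanout, b''.join(nodes)


def lookup(fanout, index, node):
    """Return the byte offset of node within index, or None."""
    key = struct.unpack('!B', node[:FANOUTPREFIX])[0]
    start = fanout[key]
    # The slice ends at the first later fanout offset that differs.
    end = len(index)
    for offset in fanout[key + 1:]:
        if offset != start:
            end = offset
            break
    # Linear scan for brevity; a bisect would also work on sorted entries.
    for pos in range(start, end, ENTRYLENGTH):
        if index[pos:pos + ENTRYLENGTH] == node:
            return pos
    return None


nodes = [hashlib.sha1(str(i).encode('ascii')).digest() for i in range(50)]
fanout, index = buildfanout(nodes)
```

With a two-byte prefix the table grows to 2^16 slots and each slice shrinks accordingly, which is the trade-off that the `SMALLFANOUTCUTOFF` comment describes.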
@util.propertycache + def _indexend(self): + if self.VERSION == 0: + return self.indexsize + else: + nodecount = struct.unpack_from('!Q', self._index, + self.params.indexstart - 8)[0] + return self.params.indexstart + nodecount * self.INDEXENTRYLENGTH + + def freememory(self): + """Unmap and remap the memory to free it up after known expensive + operations. Return True if self._data and self._index were reloaded. + """ + if self._index: + if self._pagedin < self.MAXPAGEDIN: + return False + + self._index.close() + self._data.close() + + # TODO: use an opener/vfs to access these paths + with open(self.indexpath, PACKOPENMODE) as indexfp: + # memory-map the file, size 0 means whole file + self._index = mmap.mmap(indexfp.fileno(), 0, + access=mmap.ACCESS_READ) + with open(self.packpath, PACKOPENMODE) as datafp: + self._data = mmap.mmap(datafp.fileno(), 0, access=mmap.ACCESS_READ) + + self._pagedin = 0 + return True + + def getmissing(self, keys): + raise NotImplementedError() + + def markledger(self, ledger, options=None): + raise NotImplementedError() + + def cleanup(self, ledger): + raise NotImplementedError() + + def __iter__(self): + raise NotImplementedError() + + def iterentries(self): + raise NotImplementedError() + +class mutablebasepack(versionmixin): + + def __init__(self, ui, packdir, version=0): + self._checkversion(version) + + opener = vfsmod.vfs(packdir) + opener.createmode = 0o444 + self.opener = opener + + self.entries = {} + + shallowutil.mkstickygroupdir(ui, packdir) + self.packfp, self.packpath = opener.mkstemp( + suffix=self.PACKSUFFIX + '-tmp') + self.idxfp, self.idxpath = opener.mkstemp( + suffix=self.INDEXSUFFIX + '-tmp') + self.packfp = os.fdopen(self.packfp, 'w+') + self.idxfp = os.fdopen(self.idxfp, 'w+') + self.sha = hashlib.sha1() + self._closed = False + + # The opener provides no way of doing permission fixup on files created + # via mkstemp, so we must fix it ourselves. 
We can probably fix this + # upstream in vfs.mkstemp so we don't need to use the private method. + opener._fixfilemode(opener.join(self.packpath)) + opener._fixfilemode(opener.join(self.idxpath)) + + # Write header + # TODO: make it extensible (ex: allow specifying compression algorithm, + # a flexible key/value header, delta algorithm, fanout size, etc) + versionbuf = struct.pack('!B', self.VERSION) # unsigned 1 byte int + self.writeraw(versionbuf) + + def __enter__(self): + return self + + def __exit__(self, exc_type, exc_value, traceback): + if exc_type is None: + self.close() + else: + self.abort() + + def abort(self): + # Unclean exit + self._cleantemppacks() + + def writeraw(self, data): + self.packfp.write(data) + self.sha.update(data) + + def close(self, ledger=None): + if self._closed: + return + + try: + sha = self.sha.hexdigest() + self.packfp.close() + self.writeindex() + + if len(self.entries) == 0: + # Empty pack + self._cleantemppacks() + self._closed = True + return None + + self.opener.rename(self.packpath, sha + self.PACKSUFFIX) + try: + self.opener.rename(self.idxpath, sha + self.INDEXSUFFIX) + except Exception as ex: + try: + self.opener.unlink(sha + self.PACKSUFFIX) + except Exception: + pass + # Throw exception 'ex' explicitly since a normal 'raise' would + # potentially throw an exception from the unlink cleanup. 
+ raise ex + except Exception: + # Clean up temp packs in all exception cases + self._cleantemppacks() + raise + + self._closed = True + result = self.opener.join(sha) + if ledger: + ledger.addcreated(result) + return result + + def _cleantemppacks(self): + try: + self.opener.unlink(self.packpath) + except Exception: + pass + try: + self.opener.unlink(self.idxpath) + except Exception: + pass + + def writeindex(self): + rawindex = '' + + largefanout = len(self.entries) > SMALLFANOUTCUTOFF + if largefanout: + params = indexparams(LARGEFANOUTPREFIX, self.VERSION) + else: + params = indexparams(SMALLFANOUTPREFIX, self.VERSION) + + fanouttable = [EMPTYFANOUT] * params.fanoutcount + + # Precompute the location of each entry + locations = {} + count = 0 + for node in sorted(self.entries.iterkeys()): + location = count * self.INDEXENTRYLENGTH + locations[node] = location + count += 1 + + # Must use [0] on the unpack result since it's always a tuple. + fanoutkey = struct.unpack(params.fanoutstruct, + node[:params.fanoutprefix])[0] + if fanouttable[fanoutkey] == EMPTYFANOUT: + fanouttable[fanoutkey] = location + + rawfanouttable = '' + last = 0 + for offset in fanouttable: + offset = offset if offset != EMPTYFANOUT else last + last = offset + rawfanouttable += struct.pack('!I', offset) + + rawentrieslength = struct.pack('!Q', len(self.entries)) + + # The index offset is its location in the file. So after the 2 byte + # header and the fanouttable. 
+ rawindex = self.createindex(locations, 2 + len(rawfanouttable)) + + self._writeheader(params) + self.idxfp.write(rawfanouttable) + if self.VERSION == 1: + self.idxfp.write(rawentrieslength) + self.idxfp.write(rawindex) + self.idxfp.close() + + def createindex(self, nodelocations): + raise NotImplementedError() + + def _writeheader(self, indexparams): + # Index header + # + # # 1 means 2^16, 0 means 2^8 + # # future use (compression, delta format, etc) + config = 0 + if indexparams.fanoutprefix == LARGEFANOUTPREFIX: + config = 0b10000000 + self.idxfp.write(struct.pack('!BB', self.VERSION, config)) + +class indexparams(object): + __slots__ = ('fanoutprefix', 'fanoutstruct', 'fanoutcount', 'fanoutsize', + 'indexstart') + + def __init__(self, prefixsize, version): + self.fanoutprefix = prefixsize + + # The struct pack format for fanout table location (i.e. the format that + # converts the node prefix into an integer location in the fanout + # table). + if prefixsize == SMALLFANOUTPREFIX: + self.fanoutstruct = '!B' + elif prefixsize == LARGEFANOUTPREFIX: + self.fanoutstruct = '!H' + else: + raise ValueError("invalid fanout prefix size: %s" % prefixsize) + + # The number of fanout table entries + self.fanoutcount = 2**(prefixsize * 8) + + # The total bytes used by the fanout table + self.fanoutsize = self.fanoutcount * 4 + + self.indexstart = FANOUTSTART + self.fanoutsize + if version == 1: + # Skip the index length + self.indexstart += 8 diff --git a/hgext/remotefilelog/basestore.py b/hgext/remotefilelog/basestore.py new file mode 100644 --- /dev/null +++ b/hgext/remotefilelog/basestore.py @@ -0,0 +1,423 @@ +from __future__ import absolute_import + +import errno +import hashlib +import os +import shutil +import stat +import time + +from mercurial.i18n import _ +from mercurial.node import bin, hex +from mercurial import ( + error, + pycompat, + util, +) +from . 
import ( + constants, + shallowutil, +) + +class basestore(object): + def __init__(self, repo, path, reponame, shared=False): + """Creates a remotefilelog store object for the given repo name. + + `path` - The file path where this store keeps its data + `reponame` - The name of the repo. This is used to partition data from + many repos. + `shared` - True if this store is a shared cache of data from the central + server, for many repos on this machine. False means this store is for + the local data for one repo. + """ + self.repo = repo + self.ui = repo.ui + self._path = path + self._reponame = reponame + self._shared = shared + self._uid = os.getuid() if not pycompat.iswindows else None + + self._validatecachelog = self.ui.config("remotefilelog", + "validatecachelog") + self._validatecache = self.ui.config("remotefilelog", "validatecache", + 'on') + if self._validatecache not in ('on', 'strict', 'off'): + self._validatecache = 'on' + if self._validatecache == 'off': + self._validatecache = False + + if shared: + shallowutil.mkstickygroupdir(self.ui, path) + + def getmissing(self, keys): + missing = [] + for name, node in keys: + filepath = self._getfilepath(name, node) + exists = os.path.exists(filepath) + if (exists and self._validatecache == 'strict' and + not self._validatekey(filepath, 'contains')): + exists = False + if not exists: + missing.append((name, node)) + + return missing + + # BELOW THIS ARE IMPLEMENTATIONS OF REPACK SOURCE + + def markledger(self, ledger, options=None): + if options and options.get(constants.OPTION_PACKSONLY): + return + if self._shared: + for filename, nodes in self._getfiles(): + for node in nodes: + ledger.markdataentry(self, filename, node) + ledger.markhistoryentry(self, filename, node) + + def cleanup(self, ledger): + ui = self.ui + entries = ledger.sources.get(self, []) + count = 0 + for entry in entries: + if entry.gced or (entry.datarepacked and entry.historyrepacked): + ui.progress(_("cleaning up"), count, unit="files", + 
total=len(entries)) + path = self._getfilepath(entry.filename, entry.node) + util.tryunlink(path) + count += 1 + ui.progress(_("cleaning up"), None) + + # Clean up the repo cache directory. + self._cleanupdirectory(self._getrepocachepath()) + + # BELOW THIS ARE NON-STANDARD APIS + + def _cleanupdirectory(self, rootdir): + """Removes the empty directories and unnecessary files within the root + directory recursively. Note that this method does not remove the root + directory itself. """ + + oldfiles = set() + otherfiles = set() + # osutil.listdir returns stat information which saves some rmdir/listdir + # syscalls. + for name, mode in util.osutil.listdir(rootdir): + if stat.S_ISDIR(mode): + dirpath = os.path.join(rootdir, name) + self._cleanupdirectory(dirpath) + + # Now that the directory specified by dirpath is potentially + # empty, try and remove it. + try: + os.rmdir(dirpath) + except OSError: + pass + + elif stat.S_ISREG(mode): + if name.endswith('_old'): + oldfiles.add(name[:-4]) + else: + otherfiles.add(name) + + # Remove the files which end with suffix '_old' and have no + # corresponding file without the suffix '_old'. See addremotefilelognode + # method for the generation/purpose of files with '_old' suffix. + for filename in oldfiles - otherfiles: + filepath = os.path.join(rootdir, filename + '_old') + util.tryunlink(filepath) + + def _getfiles(self): + """Return a list of (filename, [node,...]) for all the revisions that + exist in the store. + + This is useful for obtaining a list of all the contents of the store + when performing a repack to another store, since the store API requires + name+node keys and not namehash+node keys. 
+ """ + existing = {} + for filenamehash, node in self._listkeys(): + existing.setdefault(filenamehash, []).append(node) + + filenamemap = self._resolvefilenames(existing.keys()) + + for filename, sha in filenamemap.iteritems(): + yield (filename, existing[sha]) + + def _resolvefilenames(self, hashes): + """Given a list of filename hashes that are present in the + remotefilelog store, return a mapping from filename->hash. + + This is useful when converting remotefilelog blobs into other storage + formats. + """ + if not hashes: + return {} + + filenames = {} + missingfilename = set(hashes) + + # Start with a full manifest, since it'll cover the majority of files + for filename in self.repo['tip'].manifest(): + sha = hashlib.sha1(filename).digest() + if sha in missingfilename: + filenames[filename] = sha + missingfilename.discard(sha) + + # Scan the changelog until we've found every file name + cl = self.repo.unfiltered().changelog + for rev in pycompat.xrange(len(cl) - 1, -1, -1): + if not missingfilename: + break + files = cl.readfiles(cl.node(rev)) + for filename in files: + sha = hashlib.sha1(filename).digest() + if sha in missingfilename: + filenames[filename] = sha + missingfilename.discard(sha) + + return filenames + + def _getrepocachepath(self): + return os.path.join( + self._path, self._reponame) if self._shared else self._path + + def _listkeys(self): + """List all the remotefilelog keys that exist in the store. + + Returns a iterator of (filename hash, filecontent hash) tuples. 
+ """ + + for root, dirs, files in os.walk(self._getrepocachepath()): + for filename in files: + if len(filename) != 40: + continue + node = filename + if self._shared: + # .../1a/85ffda..be21 + filenamehash = root[-41:-39] + root[-38:] + else: + filenamehash = root[-40:] + yield (bin(filenamehash), bin(node)) + + def _getfilepath(self, name, node): + node = hex(node) + if self._shared: + key = shallowutil.getcachekey(self._reponame, name, node) + else: + key = shallowutil.getlocalkey(name, node) + + return os.path.join(self._path, key) + + def _getdata(self, name, node): + filepath = self._getfilepath(name, node) + try: + data = shallowutil.readfile(filepath) + if self._validatecache and not self._validatedata(data, filepath): + if self._validatecachelog: + with open(self._validatecachelog, 'a+') as f: + f.write("corrupt %s during read\n" % filepath) + os.rename(filepath, filepath + ".corrupt") + raise KeyError("corrupt local cache file %s" % filepath) + except IOError: + raise KeyError("no file found at %s for %s:%s" % (filepath, name, + hex(node))) + + return data + + def addremotefilelognode(self, name, node, data): + filepath = self._getfilepath(name, node) + + oldumask = os.umask(0o002) + try: + # if this node already exists, save the old version for + # recovery/debugging purposes. + if os.path.exists(filepath): + newfilename = filepath + '_old' + # newfilename can be read-only and shutil.copy will fail. 
+ # Delete newfilename to avoid it + if os.path.exists(newfilename): + shallowutil.unlinkfile(newfilename) + shutil.copy(filepath, newfilename) + + shallowutil.mkstickygroupdir(self.ui, os.path.dirname(filepath)) + shallowutil.writefile(filepath, data, readonly=True) + + if self._validatecache: + if not self._validatekey(filepath, 'write'): + raise error.Abort(_("local cache write was corrupted %s") % + filepath) + finally: + os.umask(oldumask) + + def markrepo(self, path): + """Call this to add the given repo path to the store's list of + repositories that are using it. This is useful later when doing garbage + collection, since it allows us to inspect the repos to see what nodes + they want to be kept alive in the store. + """ + repospath = os.path.join(self._path, "repos") + with open(repospath, 'a') as reposfile: + reposfile.write(os.path.dirname(path) + "\n") + + repospathstat = os.stat(repospath) + if repospathstat.st_uid == self._uid: + os.chmod(repospath, 0o0664) + + def _validatekey(self, path, action): + with open(path, 'rb') as f: + data = f.read() + + if self._validatedata(data, path): + return True + + if self._validatecachelog: + with open(self._validatecachelog, 'a+') as f: + f.write("corrupt %s during %s\n" % (path, action)) + + os.rename(path, path + ".corrupt") + return False + + def _validatedata(self, data, path): + try: + if len(data) > 0: + # see remotefilelogserver.createfileblob for the format + offset, size, flags = shallowutil.parsesizeflags(data) + if len(data) <= size: + # it is truncated + return False + + # extract the node from the metadata + offset += size + datanode = data[offset:offset + 20] + + # and compare against the path + if os.path.basename(path) == hex(datanode): + # Content matches the intended path + return True + return False + except (ValueError, RuntimeError): + pass + + return False + + def gc(self, keepkeys): + ui = self.ui + cachepath = self._path + _removing = _("removing unnecessary files") + _truncating = 
_("enforcing cache limit") + + # prune cache + import Queue + queue = Queue.PriorityQueue() + originalsize = 0 + size = 0 + count = 0 + removed = 0 + + # keep files newer than a day even if they aren't needed + limit = time.time() - (60 * 60 * 24) + + ui.progress(_removing, count, unit="files") + for root, dirs, files in os.walk(cachepath): + for file in files: + if file == 'repos': + continue + + # Don't delete pack files + if '/packs/' in root: + continue + + ui.progress(_removing, count, unit="files") + path = os.path.join(root, file) + key = os.path.relpath(path, cachepath) + count += 1 + try: + pathstat = os.stat(path) + except OSError as e: + # errno.ENOENT = no such file or directory + if e.errno != errno.ENOENT: + raise + msg = _("warning: file %s was removed by another process\n") + ui.warn(msg % path) + continue + + originalsize += pathstat.st_size + + if key in keepkeys or pathstat.st_atime > limit: + queue.put((pathstat.st_atime, path, pathstat)) + size += pathstat.st_size + else: + try: + shallowutil.unlinkfile(path) + except OSError as e: + # errno.ENOENT = no such file or directory + if e.errno != errno.ENOENT: + raise + msg = _("warning: file %s was removed by another " + "process\n") + ui.warn(msg % path) + continue + removed += 1 + ui.progress(_removing, None) + + # remove oldest files until under limit + limit = ui.configbytes("remotefilelog", "cachelimit") + if size > limit: + excess = size - limit + removedexcess = 0 + while queue and size > limit and size > 0: + ui.progress(_truncating, removedexcess, unit="bytes", + total=excess) + atime, oldpath, oldpathstat = queue.get() + try: + shallowutil.unlinkfile(oldpath) + except OSError as e: + # errno.ENOENT = no such file or directory + if e.errno != errno.ENOENT: + raise + msg = _("warning: file %s was removed by another process\n") + ui.warn(msg % oldpath) + size -= oldpathstat.st_size + removed += 1 + removedexcess += oldpathstat.st_size + ui.progress(_truncating, None) + + 
ui.status(_("finished: removed %s of %s files (%0.2f GB to %0.2f GB)\n") + % (removed, count, + float(originalsize) / 1024.0 / 1024.0 / 1024.0, + float(size) / 1024.0 / 1024.0 / 1024.0)) + +class baseunionstore(object): + def __init__(self, *args, **kwargs): + # If one of the functions that iterates all of the stores is about to + # throw a KeyError, try this many times with a full refresh between + # attempts. A repack operation may have moved data from one store to + # another while we were running. + self.numattempts = kwargs.get('numretries', 0) + 1 + # If not-None, call this function on every retry and if the attempts are + # exhausted. + self.retrylog = kwargs.get('retrylog', None) + + def markforrefresh(self): + for store in self.stores: + if util.safehasattr(store, 'markforrefresh'): + store.markforrefresh() + + @staticmethod + def retriable(fn): + def noop(*args): + pass + def wrapped(self, *args, **kwargs): + retrylog = self.retrylog or noop + funcname = fn.__name__ + for i in pycompat.xrange(self.numattempts): + if i > 0: + retrylog('re-attempting (n=%d) %s\n' % (i, funcname)) + self.markforrefresh() + try: + return fn(self, *args, **kwargs) + except KeyError: + pass + # retries exhausted + retrylog('retries exhausted in %s, raising KeyError\n' % funcname) + raise + return wrapped diff --git a/hgext/remotefilelog/cacheclient.py b/hgext/remotefilelog/cacheclient.py new file mode 100755 --- /dev/null +++ b/hgext/remotefilelog/cacheclient.py @@ -0,0 +1,213 @@ +#!/usr/bin/env python +# cacheclient.py - example cache client implementation +# +# Copyright 2013 Facebook, Inc. +# +# This software may be used and distributed according to the terms of the +# GNU General Public License version 2 or any later version. + +# The remotefilelog extension can optionally use a caching layer to serve +# file revision requests. 
This is an example implementation that uses +# the python-memcached library: https://pypi.python.org/pypi/python-memcached/ +# A better implementation would make all of the requests non-blocking. +from __future__ import absolute_import + +import os +import sys + +import memcache + +stdin = sys.stdin +stdout = sys.stdout +stderr = sys.stderr + +mc = None +keyprefix = None +cachepath = None + +# Max number of keys per request +batchsize = 1000 + +# Max value size per key (in bytes) +maxsize = 512 * 1024 + +def readfile(path): + f = open(path, "r") + try: + return f.read() + finally: + f.close() + +def writefile(path, content): + dirname = os.path.dirname(path) + if not os.path.exists(dirname): + os.makedirs(dirname) + + f = open(path, "w") + try: + f.write(content) + finally: + f.close() + +def compress(value): + # Real world implementations will want to compress values. + # Insert your favorite compression here, ex: + # return lz4wrapper.lzcompresshc(value) + return value + +def decompress(value): + # Real world implementations will want to compress values. 
+ # Insert your favorite compression here, ex: + # return lz4wrapper.lz4decompress(value) + return value + +def generateKey(id): + return keyprefix + id + +def generateId(key): + return key[len(keyprefix):] + +def getKeys(): + raw = stdin.readline()[:-1] + keycount = int(raw) + + keys = [] + for i in range(keycount): + id = stdin.readline()[:-1] + keys.append(generateKey(id)) + + results = mc.get_multi(keys) + + hits = 0 + for i, key in enumerate(keys): + value = results.get(key) + id = generateId(key) + # On hit, write to disk + if value: + # Integer hit indicates a large file + if isinstance(value, int): + largekeys = list([key + str(i) for i in range(value)]) + largevalues = mc.get_multi(largekeys) + if len(largevalues) == value: + value = "" + for largekey in largekeys: + value += largevalues[largekey] + else: + # A chunk is missing, give up + stdout.write(id + "\n") + stdout.flush() + continue + path = os.path.join(cachepath, id) + value = decompress(value) + writefile(path, value) + hits += 1 + else: + # On miss, report to caller + stdout.write(id + "\n") + stdout.flush() + + if i % 500 == 0: + stdout.write("_hits_%s_\n" % hits) + stdout.flush() + + # done signal + stdout.write("0\n") + stdout.flush() + +def setKeys(): + raw = stdin.readline()[:-1] + keycount = int(raw) + + values = {} + for i in range(keycount): + id = stdin.readline()[:-1] + path = os.path.join(cachepath, id) + + value = readfile(path) + value = compress(value) + + key = generateKey(id) + if len(value) > maxsize: + # split up large files + start = 0 + i = 0 + while start < len(value): + end = min(len(value), start + maxsize) + values[key + str(i)] = value[start:end] + start += maxsize + i += 1 + + # Large files are stored as an integer representing how many + # chunks it's broken into. 
+ value = i + + values[key] = value + + if len(values) == batchsize: + mc.set_multi(values) + values = {} + + if values: + mc.set_multi(values) + +def main(argv=None): + """ + remotefilelog uses this cacheclient by setting it in the repo config: + + [remotefilelog] + cacheprocess = cacheclient <ip address:port> <memcache prefix> + + When memcache requests need to be made, it will execute this process + with the following arguments: + + cacheclient <ip address:port> <memcache prefix> <cachepath> + + Communication happens via stdin and stdout. To make a get request, + the following is written to stdin: + + get\n + <key count>\n + <key1>\n + <key...>\n + <keyN>\n + + The results of any cache hits will be written directly to <cachepath>/<key>. + Any cache misses will be written to stdout in the form <key>\n. Once all + hits and misses are finished 0\n will be written to stdout to signal + completion. + + During the request, progress may be reported via stdout with the format + _hits_###_\n where ### is an integer representing the number of hits so + far. remotefilelog uses this to display a progress bar. + + A single cacheclient process may be used for multiple requests (though + not in parallel), so it stays open until it receives exit\n via stdin. + + """ + if argv is None: + argv = sys.argv + + global cachepath + global keyprefix + global mc + + ip = argv[1] + keyprefix = argv[2] + cachepath = argv[3] + + mc = memcache.Client([ip], debug=0) + + while True: + cmd = stdin.readline()[:-1] + if cmd == "get": + getKeys() + elif cmd == "set": + setKeys() + elif cmd == "exit": + return 0 + else: + stderr.write("Invalid Command %s\n" % cmd) + return 1 + +if __name__ == "__main__": + sys.exit(main()) diff --git a/hgext/remotefilelog/connectionpool.py b/hgext/remotefilelog/connectionpool.py new file mode 100644 --- /dev/null +++ b/hgext/remotefilelog/connectionpool.py @@ -0,0 +1,84 @@ +# connectionpool.py - class for pooling peer connections for reuse +# +# Copyright 2017 Facebook, Inc. 
+# +# This software may be used and distributed according to the terms of the +# GNU General Public License version 2 or any later version. + +from __future__ import absolute_import + +from mercurial import ( + extensions, + hg, + sshpeer, + util, +) + +_sshv1peer = sshpeer.sshv1peer + +class connectionpool(object): + def __init__(self, repo): + self._repo = repo + self._pool = dict() + + def get(self, path): + pathpool = self._pool.get(path) + if pathpool is None: + pathpool = list() + self._pool[path] = pathpool + + conn = None + if len(pathpool) > 0: + try: + conn = pathpool.pop() + peer = conn.peer + # If the connection has died, drop it + if isinstance(peer, _sshv1peer): + if peer._subprocess.poll() is not None: + conn = None + except IndexError: + pass + + if conn is None: + def _cleanup(orig): + # close pipee first so peer.cleanup reading it won't deadlock, + # if there are other processes with pipeo open (i.e. us). + peer = orig.im_self + if util.safehasattr(peer, 'pipee'): + peer.pipee.close() + return orig() + + peer = hg.peer(self._repo.ui, {}, path) + if util.safehasattr(peer, 'cleanup'): + extensions.wrapfunction(peer, 'cleanup', _cleanup) + + conn = connection(pathpool, peer) + + return conn + + def close(self): + for pathpool in self._pool.itervalues(): + for conn in pathpool: + conn.close() + del pathpool[:] + +class connection(object): + def __init__(self, pool, peer): + self._pool = pool + self.peer = peer + + def __enter__(self): + return self + + def __exit__(self, type, value, traceback): + # Only add the connection back to the pool if there was no exception, + # since an exception could mean the connection is not in a reusable + # state. 
+ if type is None: + self._pool.append(self) + else: + self.close() + + def close(self): + if util.safehasattr(self.peer, 'cleanup'): + self.peer.cleanup() diff --git a/hgext/remotefilelog/constants.py b/hgext/remotefilelog/constants.py new file mode 100644 --- /dev/null +++ b/hgext/remotefilelog/constants.py @@ -0,0 +1,37 @@ +from __future__ import absolute_import + +import struct + +from mercurial.i18n import _ + +REQUIREMENT = "remotefilelog" + +FILENAMESTRUCT = '!H' +FILENAMESIZE = struct.calcsize(FILENAMESTRUCT) + +NODESIZE = 20 +PACKREQUESTCOUNTSTRUCT = '!I' + +NODECOUNTSTRUCT = '!I' +NODECOUNTSIZE = struct.calcsize(NODECOUNTSTRUCT) + +PATHCOUNTSTRUCT = '!I' +PATHCOUNTSIZE = struct.calcsize(PATHCOUNTSTRUCT) + +FILEPACK_CATEGORY="" +TREEPACK_CATEGORY="manifests" + +ALL_CATEGORIES = [FILEPACK_CATEGORY, TREEPACK_CATEGORY] + +# revision metadata keys. must be a single character. +METAKEYFLAG = 'f' # revlog flag +METAKEYSIZE = 's' # full rawtext size + +def getunits(category): + if category == FILEPACK_CATEGORY: + return _("files") + if category == TREEPACK_CATEGORY: + return _("trees") + +# Repack options passed to ``markledger``. +OPTION_PACKSONLY = 'packsonly' diff --git a/hgext/remotefilelog/contentstore.py b/hgext/remotefilelog/contentstore.py new file mode 100644 --- /dev/null +++ b/hgext/remotefilelog/contentstore.py @@ -0,0 +1,376 @@ +from __future__ import absolute_import + +import threading + +from mercurial.node import hex, nullid +from mercurial import ( + mdiff, + pycompat, + revlog, +) +from . import ( + basestore, + constants, + shallowutil, +) + +class ChainIndicies(object): + """A static class for easy reference to the delta chain indices. + """ + # The filename of this revision delta + NAME = 0 + # The mercurial file node for this revision delta + NODE = 1 + # The filename of the delta base's revision. This is useful when we delta + # between different files (as in the case of a move or copy, where we can + # delta against the original file content). 
+ BASENAME = 2 + # The mercurial file node for the delta base revision. This is the nullid if + # this delta is a full text. + BASENODE = 3 + # The actual delta or full text data. + DATA = 4 + +class unioncontentstore(basestore.baseunionstore): + def __init__(self, *args, **kwargs): + super(unioncontentstore, self).__init__(*args, **kwargs) + + self.stores = args + self.writestore = kwargs.get('writestore') + + # If allowincomplete==True then the union store can return partial + # delta chains, otherwise it will throw a KeyError if a full + # deltachain can't be found. + self.allowincomplete = kwargs.get('allowincomplete', False) + + def get(self, name, node): + """Fetches the full text revision contents of the given name+node pair. + If the full text doesn't exist, throws a KeyError. + + Under the hood, this uses getdeltachain() across all the stores to build + up a full chain to produce the full text. + """ + chain = self.getdeltachain(name, node) + + if chain[-1][ChainIndicies.BASENODE] != nullid: + # If we didn't receive a full chain, throw + raise KeyError((name, hex(node))) + + # The last entry in the chain is a full text, so we start our delta + # applies with that. + fulltext = chain.pop()[ChainIndicies.DATA] + + text = fulltext + while chain: + delta = chain.pop()[ChainIndicies.DATA] + text = mdiff.patches(text, [delta]) + + return text + + @basestore.baseunionstore.retriable + def getdelta(self, name, node): + """Return the single delta entry for the given name/node pair. + """ + for store in self.stores: + try: + return store.getdelta(name, node) + except KeyError: + pass + + raise KeyError((name, hex(node))) + + def getdeltachain(self, name, node): + """Returns the deltachain for the given name/node pair. + + Returns an ordered list of: + + [(name, node, deltabasename, deltabasenode, deltacontent),...] + + where the chain is terminated by a full text entry with a nullid + deltabasenode. 
+ """ + chain = self._getpartialchain(name, node) + while chain[-1][ChainIndicies.BASENODE] != nullid: + x, x, deltabasename, deltabasenode, x = chain[-1] + try: + morechain = self._getpartialchain(deltabasename, deltabasenode) + chain.extend(morechain) + except KeyError: + # If we allow incomplete chains, don't throw. + if not self.allowincomplete: + raise + break + + return chain + + @basestore.baseunionstore.retriable + def getmeta(self, name, node): + """Returns the metadata dict for given node.""" + for store in self.stores: + try: + return store.getmeta(name, node) + except KeyError: + pass + raise KeyError((name, hex(node))) + + def getmetrics(self): + metrics = [s.getmetrics() for s in self.stores] + return shallowutil.sumdicts(*metrics) + + @basestore.baseunionstore.retriable + def _getpartialchain(self, name, node): + """Returns a partial delta chain for the given name/node pair. + + A partial chain is a chain that may not be terminated in a full-text. + """ + for store in self.stores: + try: + return store.getdeltachain(name, node) + except KeyError: + pass + + raise KeyError((name, hex(node))) + + def add(self, name, node, data): + raise RuntimeError("cannot add content only to remotefilelog " + "contentstore") + + def getmissing(self, keys): + missing = keys + for store in self.stores: + if missing: + missing = store.getmissing(missing) + return missing + + def addremotefilelognode(self, name, node, data): + if self.writestore: + self.writestore.addremotefilelognode(name, node, data) + else: + raise RuntimeError("no writable store configured") + + def markledger(self, ledger, options=None): + for store in self.stores: + store.markledger(ledger, options) + +class remotefilelogcontentstore(basestore.basestore): + def __init__(self, *args, **kwargs): + super(remotefilelogcontentstore, self).__init__(*args, **kwargs) + self._threaddata = threading.local() + + def get(self, name, node): + # return raw revision text + data = self._getdata(name, node) + + 
offset, size, flags = shallowutil.parsesizeflags(data) + content = data[offset:offset + size] + + ancestormap = shallowutil.ancestormap(data) + p1, p2, linknode, copyfrom = ancestormap[node] + copyrev = None + if copyfrom: + copyrev = hex(p1) + + self._updatemetacache(node, size, flags) + + # lfs tracks renames in its own metadata, remove hg copy metadata, + # because copy metadata will be re-added by lfs flag processor. + if flags & revlog.REVIDX_EXTSTORED: + copyrev = copyfrom = None + revision = shallowutil.createrevlogtext(content, copyfrom, copyrev) + return revision + + def getdelta(self, name, node): + # Since remotefilelog content stores only contain full texts, just + # return that. + revision = self.get(name, node) + return revision, name, nullid, self.getmeta(name, node) + + def getdeltachain(self, name, node): + # Since remotefilelog content stores just contain full texts, we return + # a fake delta chain that just consists of a single full text revision. + # The nullid in the deltabasenode slot indicates that the revision is a + # fulltext. 
+ revision = self.get(name, node) + return [(name, node, None, nullid, revision)] + + def getmeta(self, name, node): + self._sanitizemetacache() + if node != self._threaddata.metacache[0]: + data = self._getdata(name, node) + offset, size, flags = shallowutil.parsesizeflags(data) + self._updatemetacache(node, size, flags) + return self._threaddata.metacache[1] + + def add(self, name, node, data): + raise RuntimeError("cannot add content only to remotefilelog " + "contentstore") + + def _sanitizemetacache(self): + metacache = getattr(self._threaddata, 'metacache', None) + if metacache is None: + self._threaddata.metacache = (None, None) # (node, meta) + + def _updatemetacache(self, node, size, flags): + self._sanitizemetacache() + if node == self._threaddata.metacache[0]: + return + meta = {constants.METAKEYFLAG: flags, + constants.METAKEYSIZE: size} + self._threaddata.metacache = (node, meta) + +class remotecontentstore(object): + def __init__(self, ui, fileservice, shared): + self._fileservice = fileservice + # type(shared) is usually remotefilelogcontentstore + self._shared = shared + + def get(self, name, node): + self._fileservice.prefetch([(name, hex(node))], force=True, + fetchdata=True) + return self._shared.get(name, node) + + def getdelta(self, name, node): + revision = self.get(name, node) + return revision, name, nullid, self._shared.getmeta(name, node) + + def getdeltachain(self, name, node): + # Since our remote content stores just contain full texts, we return a + # fake delta chain that just consists of a single full text revision. + # The nullid in the deltabasenode slot indicates that the revision is a + # fulltext. 
+ revision = self.get(name, node) + return [(name, node, None, nullid, revision)] + + def getmeta(self, name, node): + self._fileservice.prefetch([(name, hex(node))], force=True, + fetchdata=True) + return self._shared.getmeta(name, node) + + def add(self, name, node, data): + raise RuntimeError("cannot add to a remote store") + + def getmissing(self, keys): + return keys + + def markledger(self, ledger, options=None): + pass + +class manifestrevlogstore(object): + def __init__(self, repo): + self._store = repo.store + self._svfs = repo.svfs + self._revlogs = dict() + self._cl = revlog.revlog(self._svfs, '00changelog.i') + self._repackstartlinkrev = 0 + + def get(self, name, node): + return self._revlog(name).revision(node, raw=True) + + def getdelta(self, name, node): + revision = self.get(name, node) + return revision, name, nullid, self.getmeta(name, node) + + def getdeltachain(self, name, node): + revision = self.get(name, node) + return [(name, node, None, nullid, revision)] + + def getmeta(self, name, node): + rl = self._revlog(name) + rev = rl.rev(node) + return {constants.METAKEYFLAG: rl.flags(rev), + constants.METAKEYSIZE: rl.rawsize(rev)} + + def getancestors(self, name, node, known=None): + if known is None: + known = set() + if node in known: + return [] + + rl = self._revlog(name) + ancestors = {} + missing = set((node,)) + for ancrev in rl.ancestors([rl.rev(node)], inclusive=True): + ancnode = rl.node(ancrev) + missing.discard(ancnode) + + p1, p2 = rl.parents(ancnode) + if p1 != nullid and p1 not in known: + missing.add(p1) + if p2 != nullid and p2 not in known: + missing.add(p2) + + linknode = self._cl.node(rl.linkrev(ancrev)) + ancestors[rl.node(ancrev)] = (p1, p2, linknode, '') + if not missing: + break + return ancestors + + def getnodeinfo(self, name, node): + cl = self._cl + rl = self._revlog(name) + parents = rl.parents(node) + linkrev = rl.linkrev(rl.rev(node)) + return (parents[0], parents[1], cl.node(linkrev), None) + + def add(self, *args): 
+ raise RuntimeError("cannot add to a revlog store") + + def _revlog(self, name): + rl = self._revlogs.get(name) + if rl is None: + revlogname = '00manifesttree.i' + if name != '': + revlogname = 'meta/%s/00manifest.i' % name + rl = revlog.revlog(self._svfs, revlogname) + self._revlogs[name] = rl + return rl + + def getmissing(self, keys): + missing = [] + for name, node in keys: + mfrevlog = self._revlog(name) + if node not in mfrevlog.nodemap: + missing.append((name, node)) + + return missing + + def setrepacklinkrevrange(self, startrev, endrev): + self._repackstartlinkrev = startrev + self._repackendlinkrev = endrev + + def markledger(self, ledger, options=None): + if options and options.get(constants.OPTION_PACKSONLY): + return + treename = '' + rl = revlog.revlog(self._svfs, '00manifesttree.i') + startlinkrev = self._repackstartlinkrev + endlinkrev = self._repackendlinkrev + for rev in pycompat.xrange(len(rl) - 1, -1, -1): + linkrev = rl.linkrev(rev) + if linkrev < startlinkrev: + break + if linkrev > endlinkrev: + continue + node = rl.node(rev) + ledger.markdataentry(self, treename, node) + ledger.markhistoryentry(self, treename, node) + + for path, encoded, size in self._store.datafiles(): + if path[:5] != 'meta/' or path[-2:] != '.i': + continue + + treename = path[5:-len('/00manifest.i')] + + rl = revlog.revlog(self._svfs, path) + for rev in pycompat.xrange(len(rl) - 1, -1, -1): + linkrev = rl.linkrev(rev) + if linkrev < startlinkrev: + break + if linkrev > endlinkrev: + continue + node = rl.node(rev) + ledger.markdataentry(self, treename, node) + ledger.markhistoryentry(self, treename, node) + + def cleanup(self, ledger): + pass diff --git a/hgext/remotefilelog/datapack.py b/hgext/remotefilelog/datapack.py new file mode 100644 --- /dev/null +++ b/hgext/remotefilelog/datapack.py @@ -0,0 +1,470 @@ +from __future__ import absolute_import + +import struct + +from mercurial.node import hex, nullid +from mercurial.i18n import _ +from mercurial import ( + error, 
+ pycompat, + util, +) +from . import ( + basepack, + constants, + lz4wrapper, + shallowutil, +) + +NODELENGTH = 20 + +# The indicator value in the index for a fulltext entry. +FULLTEXTINDEXMARK = -1 +NOBASEINDEXMARK = -2 + +INDEXSUFFIX = '.dataidx' +PACKSUFFIX = '.datapack' + +class datapackstore(basepack.basepackstore): + INDEXSUFFIX = INDEXSUFFIX + PACKSUFFIX = PACKSUFFIX + + def __init__(self, ui, path): + super(datapackstore, self).__init__(ui, path) + + def getpack(self, path): + return datapack(path) + + def get(self, name, node): + raise RuntimeError("must use getdeltachain with datapackstore") + + def getmeta(self, name, node): + for pack in self.packs: + try: + return pack.getmeta(name, node) + except KeyError: + pass + + for pack in self.refresh(): + try: + return pack.getmeta(name, node) + except KeyError: + pass + + raise KeyError((name, hex(node))) + + def getdelta(self, name, node): + for pack in self.packs: + try: + return pack.getdelta(name, node) + except KeyError: + pass + + for pack in self.refresh(): + try: + return pack.getdelta(name, node) + except KeyError: + pass + + raise KeyError((name, hex(node))) + + def getdeltachain(self, name, node): + for pack in self.packs: + try: + return pack.getdeltachain(name, node) + except KeyError: + pass + + for pack in self.refresh(): + try: + return pack.getdeltachain(name, node) + except KeyError: + pass + + raise KeyError((name, hex(node))) + + def add(self, name, node, data): + raise RuntimeError("cannot add to datapackstore") + +class datapack(basepack.basepack): + INDEXSUFFIX = INDEXSUFFIX + PACKSUFFIX = PACKSUFFIX + + # Format is <node><delta offset><pack data offset><pack data size> + # See the mutabledatapack doccomment for more details. 
+ INDEXFORMAT = '!20siQQ' + INDEXENTRYLENGTH = 40 + + SUPPORTED_VERSIONS = [0, 1] + + def getmissing(self, keys): + missing = [] + for name, node in keys: + value = self._find(node) + if not value: + missing.append((name, node)) + + return missing + + def get(self, name, node): + raise RuntimeError("must use getdeltachain with datapack (%s:%s)" + % (name, hex(node))) + + def getmeta(self, name, node): + value = self._find(node) + if value is None: + raise KeyError((name, hex(node))) + + # version 0 does not support metadata + if self.VERSION == 0: + return {} + + node, deltabaseoffset, offset, size = value + rawentry = self._data[offset:offset + size] + + # see docstring of mutabledatapack for the format + offset = 0 + offset += struct.unpack_from('!H', rawentry, offset)[0] + 2 # filename + offset += 40 # node, deltabase node + offset += struct.unpack_from('!Q', rawentry, offset)[0] + 8 # delta + + metalen = struct.unpack_from('!I', rawentry, offset)[0] + offset += 4 + + meta = shallowutil.parsepackmeta(rawentry[offset:offset + metalen]) + + return meta + + def getdelta(self, name, node): + value = self._find(node) + if value is None: + raise KeyError((name, hex(node))) + + node, deltabaseoffset, offset, size = value + entry = self._readentry(offset, size, getmeta=True) + filename, node, deltabasenode, delta, meta = entry + + # If we've read a lot of data from the mmap, free some memory. 
+ self.freememory() + + return delta, filename, deltabasenode, meta + + def getdeltachain(self, name, node): + value = self._find(node) + if value is None: + raise KeyError((name, hex(node))) + + params = self.params + + # Precompute chains + chain = [value] + deltabaseoffset = value[1] + entrylen = self.INDEXENTRYLENGTH + while (deltabaseoffset != FULLTEXTINDEXMARK + and deltabaseoffset != NOBASEINDEXMARK): + loc = params.indexstart + deltabaseoffset + value = struct.unpack(self.INDEXFORMAT, + self._index[loc:loc + entrylen]) + deltabaseoffset = value[1] + chain.append(value) + + # Read chain data + deltachain = [] + for node, deltabaseoffset, offset, size in chain: + filename, node, deltabasenode, delta = self._readentry(offset, size) + deltachain.append((filename, node, filename, deltabasenode, delta)) + + # If we've read a lot of data from the mmap, free some memory. + self.freememory() + + return deltachain + + def _readentry(self, offset, size, getmeta=False): + rawentry = self._data[offset:offset + size] + self._pagedin += len(rawentry) + + # <2 byte len> + <filename> + lengthsize = 2 + filenamelen = struct.unpack('!H', rawentry[:2])[0] + filename = rawentry[lengthsize:lengthsize + filenamelen] + + # <20 byte node> + <20 byte deltabase> + nodestart = lengthsize + filenamelen + deltabasestart = nodestart + NODELENGTH + node = rawentry[nodestart:deltabasestart] + deltabasenode = rawentry[deltabasestart:deltabasestart + NODELENGTH] + + # <8 byte len> + <delta> + deltastart = deltabasestart + NODELENGTH + rawdeltalen = rawentry[deltastart:deltastart + 8] + deltalen = struct.unpack('!Q', rawdeltalen)[0] + + delta = rawentry[deltastart + 8:deltastart + 8 + deltalen] + delta = lz4wrapper.lz4decompress(delta) + + if getmeta: + if self.VERSION == 0: + meta = {} + else: + metastart = deltastart + 8 + deltalen + metalen = struct.unpack_from('!I', rawentry, metastart)[0] + + rawmeta = rawentry[metastart + 4:metastart + 4 + metalen] + meta = shallowutil.parsepackmeta(rawmeta) + return 
filename, node, deltabasenode, delta, meta + else: + return filename, node, deltabasenode, delta + + def add(self, name, node, data): + raise RuntimeError("cannot add to datapack (%s:%s)" % (name, node)) + + def _find(self, node): + params = self.params + fanoutkey = struct.unpack(params.fanoutstruct, + node[:params.fanoutprefix])[0] + fanout = self._fanouttable + + start = fanout[fanoutkey] + params.indexstart + indexend = self._indexend + + # Scan forward to find the first non-same entry, which is the upper + # bound. + for i in pycompat.xrange(fanoutkey + 1, params.fanoutcount): + end = fanout[i] + params.indexstart + if end != start: + break + else: + end = indexend + + # Bisect between start and end to find node + index = self._index + startnode = index[start:start + NODELENGTH] + endnode = index[end:end + NODELENGTH] + entrylen = self.INDEXENTRYLENGTH + if startnode == node: + entry = index[start:start + entrylen] + elif endnode == node: + entry = index[end:end + entrylen] + else: + while start < end - entrylen: + mid = start + (end - start) / 2 + mid = mid - ((mid - params.indexstart) % entrylen) + midnode = index[mid:mid + NODELENGTH] + if midnode == node: + entry = index[mid:mid + entrylen] + break + if node > midnode: + start = mid + startnode = midnode + elif node < midnode: + end = mid + endnode = midnode + else: + return None + + return struct.unpack(self.INDEXFORMAT, entry) + + def markledger(self, ledger, options=None): + for filename, node in self: + ledger.markdataentry(self, filename, node) + + def cleanup(self, ledger): + entries = ledger.sources.get(self, []) + allkeys = set(self) + repackedkeys = set((e.filename, e.node) for e in entries if + e.datarepacked or e.gced) + + if len(allkeys - repackedkeys) == 0: + if self.path not in ledger.created: + util.unlinkpath(self.indexpath, ignoremissing=True) + util.unlinkpath(self.packpath, ignoremissing=True) + + def __iter__(self): + for f, n, deltabase, deltalen in self.iterentries(): + yield f, n + + 
def iterentries(self): + # Start at 1 to skip the header + offset = 1 + data = self._data + while offset < self.datasize: + oldoffset = offset + + # <2 byte len> + <filename> + filenamelen = struct.unpack('!H', data[offset:offset + 2])[0] + offset += 2 + filename = data[offset:offset + filenamelen] + offset += filenamelen + + # <20 byte node> + node = data[offset:offset + constants.NODESIZE] + offset += constants.NODESIZE + # <20 byte deltabase> + deltabase = data[offset:offset + constants.NODESIZE] + offset += constants.NODESIZE + + # <8 byte len> + <delta> + rawdeltalen = data[offset:offset + 8] + deltalen = struct.unpack('!Q', rawdeltalen)[0] + offset += 8 + + # it has to be at least long enough for the lz4 header. + assert deltalen >= 4 + + # python-lz4 stores the length of the uncompressed field as a + # little-endian 32-bit integer at the start of the data. + uncompressedlen = struct.unpack('<I', data[offset:offset + 4])[0] + offset += deltalen + + if self.VERSION == 1: + # <4 byte len> + <metadata-list> + metalen = struct.unpack_from('!I', data, offset)[0] + offset += 4 + metalen + + yield (filename, node, deltabase, uncompressedlen) + + # If we've read a lot of data from the mmap, free some memory. + self._pagedin += offset - oldoffset + if self.freememory(): + data = self._data + +class mutabledatapack(basepack.mutablebasepack): + """A class for constructing and serializing a datapack file and index. + + A datapack is a pair of files that contain the revision contents for various + file revisions in Mercurial. It contains only revision contents (like file + contents), not any history information. + + It consists of two files, with the following format. All bytes are in + network byte order (big endian). + + .datapack + The pack itself is a series of revision deltas with some basic header + information on each. A revision delta may be a fulltext, represented by + a deltabasenode equal to the nullid. + + datapack = <version: 1 byte> + [<revision>,...] + revision = <filename len: 2 byte unsigned int> + <filename> + <node: 20 byte> + <deltabasenode: 20 byte> + <delta len: 8 byte unsigned int> + <delta> + <metadata-list len: 4 byte unsigned int> [1] + <metadata-list> [1] + metadata-list = [<metadata-item>, ...] 
+ metadata-item = <metadata-key: 1 byte> + <metadata-value len: 2 byte unsigned> + <metadata-value> + + metadata-key could be METAKEYFLAG or METAKEYSIZE or other single byte + value in the future. + + .dataidx + The index file consists of two parts, the fanout and the index. + + The index is a list of index entries, sorted by node (one per revision + in the pack). Each entry has: + + - node (The 20 byte node of the entry; i.e. the commit hash, file node + hash, etc) + - deltabase index offset (The location in the index of the deltabase for + this entry. The deltabase is the next delta in + the chain, with the chain eventually + terminating in a full-text, represented by a + deltabase offset of -1. This lets us compute + delta chains from the index, then do + sequential reads from the pack if the revisions + are nearby on disk.) + - pack entry offset (The location of this entry in the datapack) + - pack content size (The on-disk length of this entry's pack data) + + The fanout is a quick lookup table to reduce the number of steps for + bisecting the index. It is a series of 4 byte pointers to positions + within the index. It has 2^16 entries, which corresponds to hash + prefixes [0000, 0001,..., FFFE, FFFF]. Example: the pointer in slot + 4F0A points to the index position of the first revision whose node + starts with 4F0A. This saves log(2^16)=16 bisect steps. + + dataidx = <fanouttable> + <index> + fanouttable = [<index offset: 4 byte unsigned int>,...] (2^16 entries) + index = [<index entry>,...] + indexentry = <node: 20 byte> + <deltabase location: 4 byte signed int> + <pack entry offset: 8 byte unsigned int> + <pack entry size: 8 byte unsigned int> + + [1]: new in version 1. + """ + INDEXSUFFIX = INDEXSUFFIX + PACKSUFFIX = PACKSUFFIX + + # v[01] index format: <node><delta offset><pack data offset><pack data size> + INDEXFORMAT = datapack.INDEXFORMAT + INDEXENTRYLENGTH = datapack.INDEXENTRYLENGTH + + # v1 has metadata support + SUPPORTED_VERSIONS = [0, 1] + + def add(self, name, node, deltabasenode, delta, metadata=None): + # metadata is a dict, ex. 
{METAKEYFLAG: flag} + if len(name) > 2**16: + raise RuntimeError(_("name too long %s") % name) + if len(node) != 20: + raise RuntimeError(_("node should be 20 bytes %s") % node) + + if node in self.entries: + # The revision has already been added + return + + # TODO: allow configurable compression + delta = lz4wrapper.lz4compress(delta) + + rawdata = ''.join(( + struct.pack('!H', len(name)), # unsigned 2 byte int + name, + node, + deltabasenode, + struct.pack('!Q', len(delta)), # unsigned 8 byte int + delta, + )) + + if self.VERSION == 1: + # v1 support metadata + rawmeta = shallowutil.buildpackmeta(metadata) + rawdata += struct.pack('!I', len(rawmeta)) # unsigned 4 byte + rawdata += rawmeta + else: + # v0 cannot store metadata, raise if metadata contains flag + if metadata and metadata.get(constants.METAKEYFLAG, 0) != 0: + raise error.ProgrammingError('v0 pack cannot store flags') + + offset = self.packfp.tell() + + size = len(rawdata) + + self.entries[node] = (deltabasenode, offset, size) + + self.writeraw(rawdata) + + def createindex(self, nodelocations, indexoffset): + entries = sorted((n, db, o, s) for n, (db, o, s) + in self.entries.iteritems()) + + rawindex = '' + fmt = self.INDEXFORMAT + for node, deltabase, offset, size in entries: + if deltabase == nullid: + deltabaselocation = FULLTEXTINDEXMARK + else: + # Instead of storing the deltabase node in the index, let's + # store a pointer directly to the index entry for the deltabase. + deltabaselocation = nodelocations.get(deltabase, + NOBASEINDEXMARK) + + entry = struct.pack(fmt, node, deltabaselocation, offset, size) + rawindex += entry + + return rawindex diff --git a/hgext/remotefilelog/debugcommands.py b/hgext/remotefilelog/debugcommands.py new file mode 100644 --- /dev/null +++ b/hgext/remotefilelog/debugcommands.py @@ -0,0 +1,375 @@ +# debugcommands.py - debug logic for remotefilelog +# +# Copyright 2013 Facebook, Inc. 
+# +# This software may be used and distributed according to the terms of the +# GNU General Public License version 2 or any later version. +from __future__ import absolute_import + +import hashlib +import os + +from mercurial.node import bin, hex, nullid, short +from mercurial.i18n import _ +from mercurial import ( + error, + filelog, + revlog, +) +from . import ( + constants, + datapack, + extutil, + fileserverclient, + historypack, + lz4wrapper, + repack, + shallowrepo, + shallowutil, +) + +def debugremotefilelog(ui, path, **opts): + decompress = opts.get('decompress') + + size, firstnode, mapping = parsefileblob(path, decompress) + + ui.status(_("size: %s bytes\n") % (size)) + ui.status(_("path: %s \n") % (path)) + ui.status(_("key: %s \n") % (short(firstnode))) + ui.status(_("\n")) + ui.status(_("%12s => %12s %13s %13s %12s\n") % + ("node", "p1", "p2", "linknode", "copyfrom")) + + queue = [firstnode] + while queue: + node = queue.pop(0) + p1, p2, linknode, copyfrom = mapping[node] + ui.status(_("%s => %s %s %s %s\n") % + (short(node), short(p1), short(p2), short(linknode), copyfrom)) + if p1 != nullid: + queue.append(p1) + if p2 != nullid: + queue.append(p2) + +def buildtemprevlog(repo, file): + # get filename key + filekey = hashlib.sha1(file).hexdigest() + filedir = os.path.join(repo.path, 'store/data', filekey) + + # sort all entries based on linkrev + fctxs = [] + for filenode in os.listdir(filedir): + if '_old' not in filenode: + fctxs.append(repo.filectx(file, fileid=bin(filenode))) + + fctxs = sorted(fctxs, key=lambda x: x.linkrev()) + + # add to revlog + temppath = repo.sjoin('data/temprevlog.i') + if os.path.exists(temppath): + os.remove(temppath) + r = filelog.filelog(repo.svfs, 'temprevlog') + + class faket(object): + def add(self, a, b, c): + pass + t = faket() + for fctx in fctxs: + if fctx.node() not in repo: + continue + + p = fctx.filelog().parents(fctx.filenode()) + meta = {} + if fctx.renamed(): + meta['copy'] = fctx.renamed()[0] + 
meta['copyrev'] = hex(fctx.renamed()[1]) + + r.add(fctx.data(), meta, t, fctx.linkrev(), p[0], p[1]) + + return r + +def debugindex(orig, ui, repo, file_=None, **opts): + """dump the contents of an index file""" + if (opts.get('changelog') or + opts.get('manifest') or + opts.get('dir') or + not shallowrepo.requirement in repo.requirements or + not repo.shallowmatch(file_)): + return orig(ui, repo, file_, **opts) + + r = buildtemprevlog(repo, file_) + + # debugindex like normal + format = opts.get('format', 0) + if format not in (0, 1): + raise error.Abort(_("unknown format %d") % format) + + generaldelta = r.version & revlog.FLAG_GENERALDELTA + if generaldelta: + basehdr = ' delta' + else: + basehdr = ' base' + + if format == 0: + ui.write((" rev offset length " + basehdr + " linkrev" + " nodeid p1 p2\n")) + elif format == 1: + ui.write((" rev flag offset length" + " size " + basehdr + " link p1 p2" + " nodeid\n")) + + for i in r: + node = r.node(i) + if generaldelta: + base = r.deltaparent(i) + else: + base = r.chainbase(i) + if format == 0: + try: + pp = r.parents(node) + except Exception: + pp = [nullid, nullid] + ui.write("% 6d % 9d % 7d % 6d % 7d %s %s %s\n" % ( + i, r.start(i), r.length(i), base, r.linkrev(i), + short(node), short(pp[0]), short(pp[1]))) + elif format == 1: + pr = r.parentrevs(i) + ui.write("% 6d %04x % 8d % 8d % 8d % 6d % 6d % 6d % 6d %s\n" % ( + i, r.flags(i), r.start(i), r.length(i), r.rawsize(i), + base, r.linkrev(i), pr[0], pr[1], short(node))) + +def debugindexdot(orig, ui, repo, file_): + """dump an index DAG as a graphviz dot file""" + if not shallowrepo.requirement in repo.requirements: + return orig(ui, repo, file_) + + r = buildtemprevlog(repo, os.path.basename(file_)[:-2]) + + ui.write(("digraph G {\n")) + for i in r: + node = r.node(i) + pp = r.parents(node) + ui.write("\t%d -> %d\n" % (r.rev(pp[0]), i)) + if pp[1] != nullid: + ui.write("\t%d -> %d\n" % (r.rev(pp[1]), i)) + ui.write("}\n") + +def verifyremotefilelog(ui, path, 
**opts): + decompress = opts.get('decompress') + + for root, dirs, files in os.walk(path): + for file in files: + if file == "repos": + continue + filepath = os.path.join(root, file) + size, firstnode, mapping = parsefileblob(filepath, decompress) + for p1, p2, linknode, copyfrom in mapping.itervalues(): + if linknode == nullid: + actualpath = os.path.relpath(root, path) + key = fileserverclient.getcachekey("reponame", actualpath, + file) + ui.status("%s %s\n" % (key, os.path.relpath(filepath, + path))) + +def parsefileblob(path, decompress): + raw = None + f = open(path, "rb") + try: + raw = f.read() + finally: + f.close() + + if decompress: + raw = lz4wrapper.lz4decompress(raw) + + offset, size, flags = shallowutil.parsesizeflags(raw) + start = offset + size + + firstnode = None + + mapping = {} + while start < len(raw): + divider = raw.index('\0', start + 80) + + currentnode = raw[start:(start + 20)] + if not firstnode: + firstnode = currentnode + + p1 = raw[(start + 20):(start + 40)] + p2 = raw[(start + 40):(start + 60)] + linknode = raw[(start + 60):(start + 80)] + copyfrom = raw[(start + 80):divider] + + mapping[currentnode] = (p1, p2, linknode, copyfrom) + start = divider + 1 + + return size, firstnode, mapping + +def debugdatapack(ui, *paths, **opts): + for path in paths: + if '.data' in path: + path = path[:path.index('.data')] + ui.write("%s:\n" % path) + dpack = datapack.datapack(path) + node = opts.get('node') + if node: + deltachain = dpack.getdeltachain('', bin(node)) + dumpdeltachain(ui, deltachain, **opts) + return + + if opts.get('long'): + hashformatter = hex + hashlen = 42 + else: + hashformatter = short + hashlen = 14 + + lastfilename = None + totaldeltasize = 0 + totalblobsize = 0 + def printtotals(): + if lastfilename is not None: + ui.write("\n") + if not totaldeltasize or not totalblobsize: + return + difference = totalblobsize - totaldeltasize + deltastr = "%0.1f%% %s" % ( + (100.0 * abs(difference) / totalblobsize), + ("smaller" if 
difference > 0 else "bigger")) + + ui.write(("Total:%s%s %s (%s)\n") % ( + "".ljust(2 * hashlen - len("Total:")), + str(totaldeltasize).ljust(12), + str(totalblobsize).ljust(9), + deltastr + )) + + bases = {} + nodes = set() + failures = 0 + for filename, node, deltabase, deltalen in dpack.iterentries(): + bases[node] = deltabase + if node in nodes: + ui.write(("Bad entry: %s appears twice\n" % short(node))) + failures += 1 + nodes.add(node) + if filename != lastfilename: + printtotals() + name = '(empty name)' if filename == '' else filename + ui.write("%s:\n" % name) + ui.write("%s%s%s%s\n" % ( + "Node".ljust(hashlen), + "Delta Base".ljust(hashlen), + "Delta Length".ljust(14), + "Blob Size".ljust(9))) + lastfilename = filename + totalblobsize = 0 + totaldeltasize = 0 + + # Metadata could be missing, in which case it will be an empty dict. + meta = dpack.getmeta(filename, node) + if constants.METAKEYSIZE in meta: + blobsize = meta[constants.METAKEYSIZE] + totaldeltasize += deltalen + totalblobsize += blobsize + else: + blobsize = "(missing)" + ui.write("%s %s %s%s\n" % ( + hashformatter(node), + hashformatter(deltabase), + str(deltalen).ljust(14), + blobsize)) + + if filename is not None: + printtotals() + + failures += _sanitycheck(ui, set(nodes), bases) + if failures > 0: + ui.warn(("%d failures\n" % failures)) + return 1 + +def _sanitycheck(ui, nodes, bases): + """ + Does some basic sanity checking on a packfile with ``nodes`` and ``bases`` + (a mapping of node->base): + + - Each deltabase must itself be a node elsewhere in the pack + - There must be no cycles + """ + failures = 0 + for node in nodes: + seen = set() + current = node + deltabase = bases[current] + + while deltabase != nullid: + if deltabase not in nodes: + ui.warn(("Bad entry: %s has an unknown deltabase (%s)\n" % + (short(node), short(deltabase)))) + failures += 1 + break + + if deltabase in seen: + ui.warn(("Bad entry: %s has a cycle (at %s)\n" % + (short(node), short(deltabase)))) + failures += 
1 + break + + current = deltabase + seen.add(current) + deltabase = bases[current] + # Since ``node`` begins a valid chain, reset/memoize its base to nullid + # so we don't traverse it again. + bases[node] = nullid + return failures + +def dumpdeltachain(ui, deltachain, **opts): + hashformatter = hex + hashlen = 40 + + lastfilename = None + for filename, node, filename, deltabasenode, delta in deltachain: + if filename != lastfilename: + ui.write("\n%s\n" % filename) + lastfilename = filename + ui.write("%s %s %s %s\n" % ( + "Node".ljust(hashlen), + "Delta Base".ljust(hashlen), + "Delta SHA1".ljust(hashlen), + "Delta Length".ljust(6), + )) + + ui.write("%s %s %s %s\n" % ( + hashformatter(node), + hashformatter(deltabasenode), + hashlib.sha1(delta).hexdigest(), + len(delta))) + +def debughistorypack(ui, path): + if '.hist' in path: + path = path[:path.index('.hist')] + hpack = historypack.historypack(path) + + lastfilename = None + for entry in hpack.iterentries(): + filename, node, p1node, p2node, linknode, copyfrom = entry + if filename != lastfilename: + ui.write("\n%s\n" % filename) + ui.write("%s%s%s%s%s\n" % ( + "Node".ljust(14), + "P1 Node".ljust(14), + "P2 Node".ljust(14), + "Link Node".ljust(14), + "Copy From")) + lastfilename = filename + ui.write("%s %s %s %s %s\n" % (short(node), short(p1node), + short(p2node), short(linknode), copyfrom)) + +def debugwaitonrepack(repo): + with extutil.flock(repack.repacklockvfs(repo).join('repacklock'), ''): + return + +def debugwaitonprefetch(repo): + with repo._lock(repo.svfs, "prefetchlock", True, None, + None, _('prefetching in %s') % repo.origroot): + pass diff --git a/hgext/remotefilelog/extutil.py b/hgext/remotefilelog/extutil.py new file mode 100644 --- /dev/null +++ b/hgext/remotefilelog/extutil.py @@ -0,0 +1,151 @@ +# extutil.py - useful utility methods for extensions +# +# Copyright 2016 Facebook +# +# This software may be used and distributed according to the terms of the +# GNU General Public License version 
2 or any later version. + +from __future__ import absolute_import + +import contextlib +import errno +import os +import subprocess +import time + +from mercurial import ( + error, + lock as lockmod, + pycompat, + util, + vfs as vfsmod, +) + +if pycompat.iswindows: + # no fork on Windows, but we can create a detached process + # https://msdn.microsoft.com/en-us/library/windows/desktop/ms684863.aspx + # No stdlib constant exists for this value + DETACHED_PROCESS = 0x00000008 + _creationflags = DETACHED_PROCESS | subprocess.CREATE_NEW_PROCESS_GROUP + + def runbgcommand(script, env, shell=False, stdout=None, stderr=None): + '''Spawn a command without waiting for it to finish.''' + # we can't use close_fds *and* redirect stdin. I'm not sure that we + # need to because the detached process has no console connection. + subprocess.Popen( + script, shell=shell, env=env, close_fds=True, + creationflags=_creationflags, stdout=stdout, stderr=stderr) +else: + def runbgcommand(cmd, env, shell=False, stdout=None, stderr=None): + '''Spawn a command without waiting for it to finish.''' + # double-fork to completely detach from the parent process + # based on http://code.activestate.com/recipes/278731 + pid = os.fork() + if pid: + # Parent process + (_pid, status) = os.waitpid(pid, 0) + if os.WIFEXITED(status): + returncode = os.WEXITSTATUS(status) + else: + returncode = -os.WTERMSIG(status) + if returncode != 0: + # The child process's return code is 0 on success, an errno + # value on failure, or 255 if we don't have a valid errno + # value. + # + # (It would be slightly nicer to return the full exception info + # over a pipe as the subprocess module does. For now it + # doesn't seem worth adding that complexity here, though.) 
+ if returncode == 255: + returncode = errno.EINVAL + raise OSError(returncode, 'error running %r: %s' % + (cmd, os.strerror(returncode))) + return + + returncode = 255 + try: + # Start a new session + os.setsid() + + stdin = open(os.devnull, 'r') + if stdout is None: + stdout = open(os.devnull, 'w') + if stderr is None: + stderr = open(os.devnull, 'w') + + # connect stdin to devnull to make sure the subprocess can't + # muck up that stream for mercurial. + subprocess.Popen( + cmd, shell=shell, env=env, close_fds=True, + stdin=stdin, stdout=stdout, stderr=stderr) + returncode = 0 + except EnvironmentError as ex: + returncode = (ex.errno & 0xff) + if returncode == 0: + # This shouldn't happen, but just in case make sure the + # return code is never 0 here. + returncode = 255 + except Exception: + returncode = 255 + finally: + # mission accomplished, this child needs to exit and not + # continue the hg process here. + os._exit(returncode) + +def runshellcommand(script, env): + ''' + Run a shell command in the background. + This spawns the command and returns before it completes. + + Prefer using runbgcommand() instead of this function. This function should + be discouraged in new code. Running commands through a subshell requires + you to be very careful about correctly escaping arguments, and you need to + make sure your command works with both Windows and Unix shells. + ''' + runbgcommand(script, env=env, shell=True) + +@contextlib.contextmanager +def flock(lockpath, description, timeout=-1): + """A flock based lock object. Currently it is always non-blocking. + + Note that since it is flock based, you can accidentally take it multiple + times within one process and the first one to be released will release all + of them. So the caller needs to be careful to not create more than one + instance per lock. 
+ """ + + # best effort lightweight lock + try: + import fcntl + fcntl.flock + except ImportError: + # fallback to Mercurial lock + vfs = vfsmod.vfs(os.path.dirname(lockpath)) + with lockmod.lock(vfs, os.path.basename(lockpath), timeout=timeout): + yield + return + # make sure lock file exists + util.makedirs(os.path.dirname(lockpath)) + with open(lockpath, 'a'): + pass + lockfd = os.open(lockpath, os.O_RDONLY, 0o664) + start = time.time() + while True: + try: + fcntl.flock(lockfd, fcntl.LOCK_EX | fcntl.LOCK_NB) + break + except IOError as ex: + if ex.errno == errno.EAGAIN: + if timeout != -1 and time.time() - start > timeout: + raise error.LockHeld(errno.EAGAIN, lockpath, description, + '') + else: + time.sleep(0.05) + continue + raise + + try: + yield + finally: + fcntl.flock(lockfd, fcntl.LOCK_UN) + os.close(lockfd) diff --git a/hgext/remotefilelog/fileserverclient.py b/hgext/remotefilelog/fileserverclient.py new file mode 100644 --- /dev/null +++ b/hgext/remotefilelog/fileserverclient.py @@ -0,0 +1,648 @@ +# fileserverclient.py - client for communicating with the cache process +# +# Copyright 2013 Facebook, Inc. +# +# This software may be used and distributed according to the terms of the +# GNU General Public License version 2 or any later version. + +from __future__ import absolute_import + +import hashlib +import io +import os +import struct +import threading +import time + +from mercurial.i18n import _ +from mercurial.node import bin, hex, nullid +from mercurial import ( + error, + revlog, + sshpeer, + util, + wireprotov1peer, +) +from mercurial.utils import procutil + +from . 
import ( + constants, + contentstore, + lz4wrapper, + metadatastore, + shallowutil, + wirepack, +) + +_sshv1peer = sshpeer.sshv1peer + +# Statistics for debugging +fetchcost = 0 +fetches = 0 +fetched = 0 +fetchmisses = 0 + +_lfsmod = None +_downloading = _('downloading') + +def getcachekey(reponame, file, id): + pathhash = hashlib.sha1(file).hexdigest() + return os.path.join(reponame, pathhash[:2], pathhash[2:], id) + +def getlocalkey(file, id): + pathhash = hashlib.sha1(file).hexdigest() + return os.path.join(pathhash, id) + +def peersetup(ui, peer): + + class remotefilepeer(peer.__class__): + @wireprotov1peer.batchable + def getfile(self, file, node): + if not self.capable('getfile'): + raise error.Abort( + 'configured remotefile server does not support getfile') + f = wireprotov1peer.future() + yield {'file': file, 'node': node}, f + code, data = f.value.split('\0', 1) + if int(code): + raise error.LookupError(file, node, data) + yield data + + @wireprotov1peer.batchable + def getflogheads(self, path): + if not self.capable('getflogheads'): + raise error.Abort('configured remotefile server does not ' + 'support getflogheads') + f = wireprotov1peer.future() + yield {'path': path}, f + heads = f.value.split('\n') if f.value else [] + yield heads + + def _updatecallstreamopts(self, command, opts): + if command != 'getbundle': + return + if 'remotefilelog' not in self.capabilities(): + return + if not util.safehasattr(self, '_localrepo'): + return + if constants.REQUIREMENT not in self._localrepo.requirements: + return + + bundlecaps = opts.get('bundlecaps') + if bundlecaps: + bundlecaps = [bundlecaps] + else: + bundlecaps = [] + + # shallow, includepattern, and excludepattern are a hacky way of + # carrying over data from the local repo to this getbundle + # command. We need to do it this way because bundle1 getbundle + # doesn't provide any other place we can hook in to manipulate + # getbundle args before it goes across the wire. 
Once we get rid + # of bundle1, we can use bundle2's _pullbundle2extraprepare to + # do this more cleanly. + bundlecaps.append('remotefilelog') + if self._localrepo.includepattern: + patterns = '\0'.join(self._localrepo.includepattern) + includecap = "includepattern=" + patterns + bundlecaps.append(includecap) + if self._localrepo.excludepattern: + patterns = '\0'.join(self._localrepo.excludepattern) + excludecap = "excludepattern=" + patterns + bundlecaps.append(excludecap) + opts['bundlecaps'] = ','.join(bundlecaps) + + def _sendrequest(self, command, args, **opts): + self._updatecallstreamopts(command, args) + return super(remotefilepeer, self)._sendrequest(command, args, + **opts) + + def _callstream(self, command, **opts): + supertype = super(remotefilepeer, self) + if not util.safehasattr(supertype, '_sendrequest'): + self._updatecallstreamopts(command, opts) + return super(remotefilepeer, self)._callstream(command, **opts) + + peer.__class__ = remotefilepeer + +class cacheconnection(object): + """The connection for communicating with the remote cache. Performs + gets and sets by communicating with an external process that has the + cache-specific implementation. + """ + def __init__(self): + self.pipeo = self.pipei = self.pipee = None + self.subprocess = None + self.connected = False + + def connect(self, cachecommand): + if self.pipeo: + raise error.Abort(_("cache connection already open")) + self.pipei, self.pipeo, self.pipee, self.subprocess = \ + procutil.popen4(cachecommand) + self.connected = True + + def close(self): + def tryclose(pipe): + try: + pipe.close() + except Exception: + pass + if self.connected: + try: + self.pipei.write("exit\n") + except Exception: + pass + tryclose(self.pipei) + self.pipei = None + tryclose(self.pipeo) + self.pipeo = None + tryclose(self.pipee) + self.pipee = None + try: + # Wait for process to terminate, making sure to avoid deadlock. 
+ # See https://docs.python.org/2/library/subprocess.html for + # warnings about wait() and deadlocking. + self.subprocess.communicate() + except Exception: + pass + self.subprocess = None + self.connected = False + + def request(self, request, flush=True): + if self.connected: + try: + self.pipei.write(request) + if flush: + self.pipei.flush() + except IOError: + self.close() + + def receiveline(self): + if not self.connected: + return None + try: + result = self.pipeo.readline()[:-1] + if not result: + self.close() + return result + except IOError: + self.close() + return None + +def _getfilesbatch( + remote, receivemissing, progresstick, missed, idmap, batchsize): + # Over http(s), iterbatch is a streamy method and we can start + # looking at results early. This means we send one (potentially + # large) request, but then we show nice progress as we process + # file results, rather than showing chunks of $batchsize in + # progress. + # + # Over ssh, iterbatch isn't streamy because batch() wasn't + # explicitly designed as a streaming method. In the future we + # should probably introduce a streambatch() method upstream and + # use that for this. 
+ with remote.commandexecutor() as e: + futures = [] + for m in missed: + futures.append(e.callcommand('getfile', { + 'file': idmap[m], + 'node': m[-40:] + })) + + for i, m in enumerate(missed): + r = futures[i].result() + futures[i] = None # release memory + file_ = idmap[m] + node = m[-40:] + receivemissing(io.BytesIO('%d\n%s' % (len(r), r)), file_, node) + progresstick() + +def _getfiles_optimistic( + remote, receivemissing, progresstick, missed, idmap, step): + remote._callstream("getfiles") + i = 0 + pipeo = remote._pipeo + pipei = remote._pipei + while i < len(missed): + # issue a batch of requests + start = i + end = min(len(missed), start + step) + i = end + for missingid in missed[start:end]: + # issue new request + versionid = missingid[-40:] + file = idmap[missingid] + sshrequest = "%s%s\n" % (versionid, file) + pipeo.write(sshrequest) + pipeo.flush() + + # receive batch results + for missingid in missed[start:end]: + versionid = missingid[-40:] + file = idmap[missingid] + receivemissing(pipei, file, versionid) + progresstick() + + # End the command + pipeo.write('\n') + pipeo.flush() + +def _getfiles_threaded( + remote, receivemissing, progresstick, missed, idmap, step): + remote._callstream("getfiles") + pipeo = remote._pipeo + pipei = remote._pipei + + def writer(): + for missingid in missed: + versionid = missingid[-40:] + file = idmap[missingid] + sshrequest = "%s%s\n" % (versionid, file) + pipeo.write(sshrequest) + pipeo.flush() + writerthread = threading.Thread(target=writer) + writerthread.daemon = True + writerthread.start() + + for missingid in missed: + versionid = missingid[-40:] + file = idmap[missingid] + receivemissing(pipei, file, versionid) + progresstick() + + writerthread.join() + # End the command + pipeo.write('\n') + pipeo.flush() + +class fileserverclient(object): + """A client for requesting files from the remote file server. 
+ """ + def __init__(self, repo): + ui = repo.ui + self.repo = repo + self.ui = ui + self.cacheprocess = ui.config("remotefilelog", "cacheprocess") + if self.cacheprocess: + self.cacheprocess = util.expandpath(self.cacheprocess) + + # This option causes remotefilelog to pass the full file path to the + # cacheprocess instead of a hashed key. + self.cacheprocesspasspath = ui.configbool( + "remotefilelog", "cacheprocess.includepath") + + self.debugoutput = ui.configbool("remotefilelog", "debug") + + self.remotecache = cacheconnection() + + def setstore(self, datastore, historystore, writedata, writehistory): + self.datastore = datastore + self.historystore = historystore + self.writedata = writedata + self.writehistory = writehistory + + def _connect(self): + return self.repo.connectionpool.get(self.repo.fallbackpath) + + def request(self, fileids): + """Takes a list of filename/node pairs and fetches them from the + server. Files are stored in the local cache. + A list of nodes that the server couldn't find is returned. + If the connection fails, an exception is raised. 
+ """ + if not self.remotecache.connected: + self.connect() + cache = self.remotecache + writedata = self.writedata + + if self.ui.configbool('remotefilelog', 'fetchpacks'): + self.requestpack(fileids) + return + + repo = self.repo + count = len(fileids) + request = "get\n%d\n" % count + idmap = {} + reponame = repo.name + for file, id in fileids: + fullid = getcachekey(reponame, file, id) + if self.cacheprocesspasspath: + request += file + '\0' + request += fullid + "\n" + idmap[fullid] = file + + cache.request(request) + + total = count + self.ui.progress(_downloading, 0, total=count) + + missed = [] + count = 0 + while True: + missingid = cache.receiveline() + if not missingid: + missedset = set(missed) + for missingid in idmap.iterkeys(): + if not missingid in missedset: + missed.append(missingid) + self.ui.warn(_("warning: cache connection closed early - " + + "falling back to server\n")) + break + if missingid == "0": + break + if missingid.startswith("_hits_"): + # receive progress reports + parts = missingid.split("_") + count += int(parts[2]) + self.ui.progress(_downloading, count, total=total) + continue + + missed.append(missingid) + + global fetchmisses + fetchmisses += len(missed) + + count = [total - len(missed)] + fromcache = count[0] + self.ui.progress(_downloading, count[0], total=total) + self.ui.log("remotefilelog", "remote cache hit rate is %r of %r\n", + count[0], total, hit=count[0], total=total) + + oldumask = os.umask(0o002) + try: + # receive cache misses from master + if missed: + def progresstick(): + count[0] += 1 + self.ui.progress(_downloading, count[0], total=total) + # When verbose is true, sshpeer prints 'running ssh...' 
+ # to stdout, which can interfere with some command + # outputs + verbose = self.ui.verbose + self.ui.verbose = False + try: + with self._connect() as conn: + remote = conn.peer + # TODO: deduplicate this with the constant in + # shallowrepo + if remote.capable("remotefilelog"): + if not isinstance(remote, _sshv1peer): + raise error.Abort('remotefilelog requires ssh ' + 'servers') + step = self.ui.configint('remotefilelog', + 'getfilesstep') + getfilestype = self.ui.config('remotefilelog', + 'getfilestype') + if getfilestype == 'threaded': + _getfiles = _getfiles_threaded + else: + _getfiles = _getfiles_optimistic + _getfiles(remote, self.receivemissing, progresstick, + missed, idmap, step) + elif remote.capable("getfile"): + if remote.capable('batch'): + batchdefault = 100 + else: + batchdefault = 10 + batchsize = self.ui.configint( + 'remotefilelog', 'batchsize', batchdefault) + _getfilesbatch( + remote, self.receivemissing, progresstick, + missed, idmap, batchsize) + else: + raise error.Abort("configured remotefilelog server" + " does not support remotefilelog") + + self.ui.log("remotefilefetchlog", + "Success\n", + fetched_files = count[0] - fromcache, + total_to_fetch = total - fromcache) + except Exception: + self.ui.log("remotefilefetchlog", + "Fail\n", + fetched_files = count[0] - fromcache, + total_to_fetch = total - fromcache) + raise + finally: + self.ui.verbose = verbose + # send to memcache + count[0] = len(missed) + request = "set\n%d\n%s\n" % (count[0], "\n".join(missed)) + cache.request(request) + + self.ui.progress(_downloading, None) + + # mark ourselves as a user of this cache + writedata.markrepo(self.repo.path) + finally: + os.umask(oldumask) + + def receivemissing(self, pipe, filename, node): + line = pipe.readline()[:-1] + if not line: + raise error.ResponseError(_("error downloading file contents:"), + _("connection closed early")) + size = int(line) + data = pipe.read(size) + if len(data) != size: + raise error.ResponseError(_("error 
downloading file contents:"), + _("only received %s of %s bytes") + % (len(data), size)) + + self.writedata.addremotefilelognode(filename, bin(node), + lz4wrapper.lz4decompress(data)) + + def requestpack(self, fileids): + """Requests the given file revisions from the server in a pack format. + + See `remotefilelogserver.getpack` for the file format. + """ + try: + with self._connect() as conn: + total = len(fileids) + rcvd = 0 + + remote = conn.peer + remote._callstream("getpackv1") + + self._sendpackrequest(remote, fileids) + + packpath = shallowutil.getcachepackpath( + self.repo, constants.FILEPACK_CATEGORY) + pipei = remote._pipei + receiveddata, receivedhistory = wirepack.receivepack( + self.repo.ui, pipei, packpath) + rcvd = len(receiveddata) + + self.ui.log("remotefilefetchlog", + "Success(pack)\n" if (rcvd==total) else "Fail(pack)\n", + fetched_files = rcvd, + total_to_fetch = total) + except Exception: + self.ui.log("remotefilefetchlog", + "Fail(pack)\n", + fetched_files = rcvd, + total_to_fetch = total) + raise + + def _sendpackrequest(self, remote, fileids): + """Formats and writes the given fileids to the remote as part of a + getpackv1 call. 
+ """ + # Sort the requests by name, so we receive requests in batches by name + grouped = {} + for filename, node in fileids: + grouped.setdefault(filename, set()).add(node) + + # Issue request + pipeo = remote._pipeo + for filename, nodes in grouped.iteritems(): + filenamelen = struct.pack(constants.FILENAMESTRUCT, len(filename)) + countlen = struct.pack(constants.PACKREQUESTCOUNTSTRUCT, len(nodes)) + rawnodes = ''.join(bin(n) for n in nodes) + + pipeo.write('%s%s%s%s' % (filenamelen, filename, countlen, + rawnodes)) + pipeo.flush() + pipeo.write(struct.pack(constants.FILENAMESTRUCT, 0)) + pipeo.flush() + + def connect(self): + if self.cacheprocess: + cmd = "%s %s" % (self.cacheprocess, self.writedata._path) + self.remotecache.connect(cmd) + else: + # If no cache process is specified, we fake one that always + # returns cache misses. This enables tests to run easily + # and may eventually allow us to be a drop in replacement + # for the largefiles extension. + class simplecache(object): + def __init__(self): + self.missingids = [] + self.connected = True + + def close(self): + pass + + def request(self, value, flush=True): + lines = value.split("\n") + if lines[0] != "get": + return + self.missingids = lines[2:-1] + self.missingids.append('0') + + def receiveline(self): + if len(self.missingids) > 0: + return self.missingids.pop(0) + return None + + self.remotecache = simplecache() + + def close(self): + if fetches: + msg = ("%s files fetched over %d fetches - " + + "(%d misses, %0.2f%% hit ratio) over %0.2fs\n") % ( + fetched, + fetches, + fetchmisses, + float(fetched - fetchmisses) / float(fetched) * 100.0, + fetchcost) + if self.debugoutput: + self.ui.warn(msg) + self.ui.log("remotefilelog.prefetch", msg.replace("%", "%%"), + remotefilelogfetched=fetched, + remotefilelogfetches=fetches, + remotefilelogfetchmisses=fetchmisses, + remotefilelogfetchtime=fetchcost * 1000) + + if self.remotecache.connected: + self.remotecache.close() + + def prefetch(self, fileids, 
force=False, fetchdata=True, + fetchhistory=False): + """downloads the given file versions to the cache + """ + repo = self.repo + idstocheck = [] + for file, id in fileids: + # hack + # - we don't use .hgtags + # - workingctx produces ids with length 42, + # which we skip since they aren't in any cache + if (file == '.hgtags' or len(id) == 42 + or not repo.shallowmatch(file)): + continue + + idstocheck.append((file, bin(id))) + + datastore = self.datastore + historystore = self.historystore + if force: + datastore = contentstore.unioncontentstore(*repo.shareddatastores) + historystore = metadatastore.unionmetadatastore( + *repo.sharedhistorystores) + + missingids = set() + if fetchdata: + missingids.update(datastore.getmissing(idstocheck)) + if fetchhistory: + missingids.update(historystore.getmissing(idstocheck)) + + # partition missing nodes into nullid and not-nullid so we can + # warn about this filtering potentially shadowing bugs. + nullids = len([None for unused, id in missingids if id == nullid]) + if nullids: + missingids = [(f, id) for f, id in missingids if id != nullid] + repo.ui.develwarn( + ('remotefilelog not fetching %d null revs' + ' - this is likely hiding bugs' % nullids), + config='remotefilelog-ext') + if missingids: + global fetches, fetched, fetchcost + fetches += 1 + + # We want to be able to detect excess individual file downloads, so + # let's log that information for debugging. 
+ if fetches >= 15 and fetches < 18: + if fetches == 15: + fetchwarning = self.ui.config('remotefilelog', + 'fetchwarning') + if fetchwarning: + self.ui.warn(fetchwarning + '\n') + self.logstacktrace() + missingids = [(file, hex(id)) for file, id in missingids] + fetched += len(missingids) + start = time.time() + missingids = self.request(missingids) + if missingids: + raise error.Abort(_("unable to download %d files") % + len(missingids)) + fetchcost += time.time() - start + self._lfsprefetch(fileids) + + def _lfsprefetch(self, fileids): + if not _lfsmod or not util.safehasattr( + self.repo.svfs, 'lfslocalblobstore'): + return + if not _lfsmod.wrapper.candownload(self.repo): + return + pointers = [] + store = self.repo.svfs.lfslocalblobstore + for file, id in fileids: + node = bin(id) + rlog = self.repo.file(file) + if rlog.flags(node) & revlog.REVIDX_EXTSTORED: + text = rlog.revision(node, raw=True) + p = _lfsmod.pointer.deserialize(text) + oid = p.oid() + if not store.has(oid): + pointers.append(p) + if len(pointers) > 0: + self.repo.svfs.lfsremoteblobstore.readbatch(pointers, store) + assert all(store.has(p.oid()) for p in pointers) + + def logstacktrace(self): + import traceback + self.ui.log('remotefilelog', 'excess remotefilelog fetching:\n%s\n', + ''.join(traceback.format_stack())) diff --git a/hgext/remotefilelog/historypack.py b/hgext/remotefilelog/historypack.py new file mode 100644 --- /dev/null +++ b/hgext/remotefilelog/historypack.py @@ -0,0 +1,545 @@ +from __future__ import absolute_import + +import hashlib +import struct + +from mercurial.node import hex, nullid +from mercurial import ( + pycompat, + util, +) +from . 
import ( + basepack, + constants, + shallowutil, +) + +# (filename hash, offset, size) +INDEXFORMAT0 = '!20sQQ' +INDEXENTRYLENGTH0 = struct.calcsize(INDEXFORMAT0) +INDEXFORMAT1 = '!20sQQII' +INDEXENTRYLENGTH1 = struct.calcsize(INDEXFORMAT1) +NODELENGTH = 20 + +NODEINDEXFORMAT = '!20sQ' +NODEINDEXENTRYLENGTH = struct.calcsize(NODEINDEXFORMAT) + +# (node, p1, p2, linknode) +PACKFORMAT = "!20s20s20s20sH" +PACKENTRYLENGTH = 82 + +ENTRYCOUNTSIZE = 4 + +INDEXSUFFIX = '.histidx' +PACKSUFFIX = '.histpack' + +ANC_NODE = 0 +ANC_P1NODE = 1 +ANC_P2NODE = 2 +ANC_LINKNODE = 3 +ANC_COPYFROM = 4 + +class historypackstore(basepack.basepackstore): + INDEXSUFFIX = INDEXSUFFIX + PACKSUFFIX = PACKSUFFIX + + def getpack(self, path): + return historypack(path) + + def getancestors(self, name, node, known=None): + for pack in self.packs: + try: + return pack.getancestors(name, node, known=known) + except KeyError: + pass + + for pack in self.refresh(): + try: + return pack.getancestors(name, node, known=known) + except KeyError: + pass + + raise KeyError((name, node)) + + def getnodeinfo(self, name, node): + for pack in self.packs: + try: + return pack.getnodeinfo(name, node) + except KeyError: + pass + + for pack in self.refresh(): + try: + return pack.getnodeinfo(name, node) + except KeyError: + pass + + raise KeyError((name, node)) + + def add(self, filename, node, p1, p2, linknode, copyfrom): + raise RuntimeError("cannot add to historypackstore (%s:%s)" + % (filename, hex(node))) + +class historypack(basepack.basepack): + INDEXSUFFIX = INDEXSUFFIX + PACKSUFFIX = PACKSUFFIX + + SUPPORTED_VERSIONS = [0, 1] + + def __init__(self, path): + super(historypack, self).__init__(path) + + if self.VERSION == 0: + self.INDEXFORMAT = INDEXFORMAT0 + self.INDEXENTRYLENGTH = INDEXENTRYLENGTH0 + else: + self.INDEXFORMAT = INDEXFORMAT1 + self.INDEXENTRYLENGTH = INDEXENTRYLENGTH1 + + def getmissing(self, keys): + missing = [] + for name, node in keys: + try: + self._findnode(name, node) + except 
KeyError: + missing.append((name, node)) + + return missing + + def getancestors(self, name, node, known=None): + """Returns as many ancestors as we're aware of. + + return value: { + node: (p1, p2, linknode, copyfrom), + ... + } + """ + if known and node in known: + return [] + + ancestors = self._getancestors(name, node, known=known) + results = {} + for ancnode, p1, p2, linknode, copyfrom in ancestors: + results[ancnode] = (p1, p2, linknode, copyfrom) + + if not results: + raise KeyError((name, node)) + return results + + def getnodeinfo(self, name, node): + # Drop the node from the tuple before returning, since the result should + # just be (p1, p2, linknode, copyfrom) + return self._findnode(name, node)[1:] + + def _getancestors(self, name, node, known=None): + if known is None: + known = set() + section = self._findsection(name) + filename, offset, size, nodeindexoffset, nodeindexsize = section + pending = set((node,)) + o = 0 + while o < size: + if not pending: + break + entry, copyfrom = self._readentry(offset + o) + o += PACKENTRYLENGTH + if copyfrom: + o += len(copyfrom) + + ancnode = entry[ANC_NODE] + if ancnode in pending: + pending.remove(ancnode) + p1node = entry[ANC_P1NODE] + p2node = entry[ANC_P2NODE] + if p1node != nullid and p1node not in known: + pending.add(p1node) + if p2node != nullid and p2node not in known: + pending.add(p2node) + + yield (ancnode, p1node, p2node, entry[ANC_LINKNODE], copyfrom) + + def _readentry(self, offset): + data = self._data + entry = struct.unpack(PACKFORMAT, data[offset:offset + PACKENTRYLENGTH]) + copyfrom = None + copyfromlen = entry[ANC_COPYFROM] + if copyfromlen != 0: + offset += PACKENTRYLENGTH + copyfrom = data[offset:offset + copyfromlen] + return entry, copyfrom + + def add(self, filename, node, p1, p2, linknode, copyfrom): + raise RuntimeError("cannot add to historypack (%s:%s)" % + (filename, hex(node))) + + def _findnode(self, name, node): + if self.VERSION == 0: + ancestors = self._getancestors(name, 
node) + for ancnode, p1node, p2node, linknode, copyfrom in ancestors: + if ancnode == node: + return (ancnode, p1node, p2node, linknode, copyfrom) + else: + section = self._findsection(name) + nodeindexoffset, nodeindexsize = section[3:] + entry = self._bisect(node, nodeindexoffset, + nodeindexoffset + nodeindexsize, + NODEINDEXENTRYLENGTH) + if entry is not None: + node, offset = struct.unpack(NODEINDEXFORMAT, entry) + entry, copyfrom = self._readentry(offset) + # Drop the copyfromlen from the end of entry, and replace it + # with the copyfrom string. + return entry[:4] + (copyfrom,) + + raise KeyError("unable to find history for %s:%s" % (name, hex(node))) + + def _findsection(self, name): + params = self.params + namehash = hashlib.sha1(name).digest() + fanoutkey = struct.unpack(params.fanoutstruct, + namehash[:params.fanoutprefix])[0] + fanout = self._fanouttable + + start = fanout[fanoutkey] + params.indexstart + indexend = self._indexend + + for i in pycompat.xrange(fanoutkey + 1, params.fanoutcount): + end = fanout[i] + params.indexstart + if end != start: + break + else: + end = indexend + + entry = self._bisect(namehash, start, end, self.INDEXENTRYLENGTH) + if not entry: + raise KeyError(name) + + rawentry = struct.unpack(self.INDEXFORMAT, entry) + if self.VERSION == 0: + x, offset, size = rawentry + nodeindexoffset = None + nodeindexsize = None + else: + x, offset, size, nodeindexoffset, nodeindexsize = rawentry + rawnamelen = self._index[nodeindexoffset:nodeindexoffset + + constants.FILENAMESIZE] + actualnamelen = struct.unpack('!H', rawnamelen)[0] + nodeindexoffset += constants.FILENAMESIZE + actualname = self._index[nodeindexoffset:nodeindexoffset + + actualnamelen] + if actualname != name: + raise KeyError("found file name %s when looking for %s" % + (actualname, name)) + nodeindexoffset += actualnamelen + + filenamelength = struct.unpack('!H', self._data[offset:offset + + constants.FILENAMESIZE])[0] + offset += constants.FILENAMESIZE + + actualname = 
self._data[offset:offset + filenamelength] + offset += filenamelength + + if name != actualname: + raise KeyError("found file name %s when looking for %s" % + (actualname, name)) + + # Skip entry list size + offset += ENTRYCOUNTSIZE + + nodelistoffset = offset + nodelistsize = (size - constants.FILENAMESIZE - filenamelength - + ENTRYCOUNTSIZE) + return (name, nodelistoffset, nodelistsize, + nodeindexoffset, nodeindexsize) + + def _bisect(self, node, start, end, entrylen): + # Bisect between start and end to find node + origstart = start + startnode = self._index[start:start + NODELENGTH] + endnode = self._index[end:end + NODELENGTH] + + if startnode == node: + return self._index[start:start + entrylen] + elif endnode == node: + return self._index[end:end + entrylen] + else: + while start < end - entrylen: + mid = start + (end - start) / 2 + mid = mid - ((mid - origstart) % entrylen) + midnode = self._index[mid:mid + NODELENGTH] + if midnode == node: + return self._index[mid:mid + entrylen] + if node > midnode: + start = mid + startnode = midnode + elif node < midnode: + end = mid + endnode = midnode + return None + + def markledger(self, ledger, options=None): + for filename, node in self: + ledger.markhistoryentry(self, filename, node) + + def cleanup(self, ledger): + entries = ledger.sources.get(self, []) + allkeys = set(self) + repackedkeys = set((e.filename, e.node) for e in entries if + e.historyrepacked) + + if len(allkeys - repackedkeys) == 0: + if self.path not in ledger.created: + util.unlinkpath(self.indexpath, ignoremissing=True) + util.unlinkpath(self.packpath, ignoremissing=True) + + def __iter__(self): + for f, n, x, x, x, x in self.iterentries(): + yield f, n + + def iterentries(self): + # Start at 1 to skip the header + offset = 1 + while offset < self.datasize: + data = self._data + # <2 byte len> + <filename> + filenamelen = struct.unpack('!H', data[offset:offset + + constants.FILENAMESIZE])[0] + offset += constants.FILENAMESIZE + filename =
data[offset:offset + filenamelen] + offset += filenamelen + + revcount = struct.unpack('!I', data[offset:offset + + ENTRYCOUNTSIZE])[0] + offset += ENTRYCOUNTSIZE + + for i in pycompat.xrange(revcount): + entry = struct.unpack(PACKFORMAT, data[offset:offset + + PACKENTRYLENGTH]) + offset += PACKENTRYLENGTH + + copyfrom = data[offset:offset + entry[ANC_COPYFROM]] + offset += entry[ANC_COPYFROM] + + yield (filename, entry[ANC_NODE], entry[ANC_P1NODE], + entry[ANC_P2NODE], entry[ANC_LINKNODE], copyfrom) + + self._pagedin += PACKENTRYLENGTH + + # If we've read a lot of data from the mmap, free some memory. + self.freememory() + +class mutablehistorypack(basepack.mutablebasepack): + """A class for constructing and serializing a histpack file and index. + + A history pack is a pair of files that contain the revision history for + various file revisions in Mercurial. It contains only revision history (like + parent pointers and linknodes), not any revision content information. + + It consists of two files, with the following format: + + .histpack + The pack itself is a series of file revisions with some basic header + information on each. + + datapack = <version: 1 byte> + [<filesection>,...] + filesection = <filename len: 2 byte unsigned int> + <filename> + <revision count: 4 byte unsigned int> + [<revision>,...] + revision = <node: 20 byte> + <p1node: 20 byte> + <p2node: 20 byte> + <linknode: 20 byte> + <copyfromlen: 2 byte> + <copyfrom> + + The revisions within each filesection are stored in topological order + (newest first). If a given entry has a parent from another file (a copy) + then p1node is the node from the other file, and copyfrom is the + filepath of the other file. + + .histidx + The index file provides a mapping from filename to the file section in + the histpack. In V1 it also contains sub-indexes for specific nodes + within each file. It consists of three parts, the fanout, the file index + and the node indexes. + + The file index is a list of index entries, sorted by filename hash (one + per file section in the pack).
Each entry has: + + - node (The 20 byte hash of the filename) + - pack entry offset (The location of this file section in the histpack) + - pack content size (The on-disk length of this file section's pack + data) + - node index offset (The location of the file's node index in the index + file) [1] + - node index size (the on-disk length of this file's node index) [1] + + The fanout is a quick lookup table to reduce the number of steps for + bisecting the index. It is a series of 4 byte pointers to positions + within the index. It has 2^16 entries, which corresponds to hash + prefixes [00, 01, 02,..., FD, FE, FF]. Example: the pointer in slot 4F + points to the index position of the first revision whose node starts + with 4F. This saves log(2^16) bisect steps. + + dataidx = <fanouttable> + <file count: 8 byte unsigned> [1] + <fileindex> + <node count: 8 byte unsigned> [1] + [<nodeindex>,...] [1] + fanouttable = [<index offset: 4 byte unsigned int>,...] (2^16 entries) + + fileindex = [<file index entry>,...] + fileindexentry = <node: 20 byte> + <pack file section offset: 8 byte unsigned int> + <pack file section size: 8 byte unsigned int> + <node index offset: 4 byte unsigned int> [1] + <node index size: 4 byte unsigned int> [1] + nodeindex = <filename>[<node index entry>,...] [1] + filename = <filename len : 2 byte unsigned int><filename value> [1] + nodeindexentry = <node: 20 byte> [1] + <pack file node offset: 8 byte unsigned int> [1] + + [1]: new in version 1.
+ """ + INDEXSUFFIX = INDEXSUFFIX + PACKSUFFIX = PACKSUFFIX + + SUPPORTED_VERSIONS = [0, 1] + + def __init__(self, ui, packpath, version=0): + # internal config: remotefilelog.historypackv1 + if version == 0 and ui.configbool('remotefilelog', 'historypackv1'): + version = 1 + + super(mutablehistorypack, self).__init__(ui, packpath, version=version) + self.files = {} + self.entrylocations = {} + self.fileentries = {} + + if version == 0: + self.INDEXFORMAT = INDEXFORMAT0 + self.INDEXENTRYLENGTH = INDEXENTRYLENGTH0 + else: + self.INDEXFORMAT = INDEXFORMAT1 + self.INDEXENTRYLENGTH = INDEXENTRYLENGTH1 + + self.NODEINDEXFORMAT = NODEINDEXFORMAT + self.NODEINDEXENTRYLENGTH = NODEINDEXENTRYLENGTH + + def add(self, filename, node, p1, p2, linknode, copyfrom): + copyfrom = copyfrom or '' + copyfromlen = struct.pack('!H', len(copyfrom)) + self.fileentries.setdefault(filename, []).append((node, p1, p2, + linknode, + copyfromlen, + copyfrom)) + + def _write(self): + for filename in sorted(self.fileentries): + entries = self.fileentries[filename] + sectionstart = self.packfp.tell() + + # Write the file section content + entrymap = dict((e[0], e) for e in entries) + def parentfunc(node): + x, p1, p2, x, x, x = entrymap[node] + parents = [] + if p1 != nullid: + parents.append(p1) + if p2 != nullid: + parents.append(p2) + return parents + + sortednodes = list(reversed(shallowutil.sortnodes( + (e[0] for e in entries), + parentfunc))) + + # Write the file section header + self.writeraw("%s%s%s" % ( + struct.pack('!H', len(filename)), + filename, + struct.pack('!I', len(sortednodes)), + )) + + sectionlen = constants.FILENAMESIZE + len(filename) + 4 + + rawstrings = [] + + # Record the node locations for the index + locations = self.entrylocations.setdefault(filename, {}) + offset = sectionstart + sectionlen + for node in sortednodes: + locations[node] = offset + raw = '%s%s%s%s%s%s' % entrymap[node] + rawstrings.append(raw) + offset += len(raw) + + rawdata = ''.join(rawstrings) + 
sectionlen += len(rawdata) + + self.writeraw(rawdata) + + # Record metadata for the index + self.files[filename] = (sectionstart, sectionlen) + node = hashlib.sha1(filename).digest() + self.entries[node] = node + + def close(self, ledger=None): + if self._closed: + return + + self._write() + + return super(mutablehistorypack, self).close(ledger=ledger) + + def createindex(self, nodelocations, indexoffset): + fileindexformat = self.INDEXFORMAT + fileindexlength = self.INDEXENTRYLENGTH + nodeindexformat = self.NODEINDEXFORMAT + nodeindexlength = self.NODEINDEXENTRYLENGTH + version = self.VERSION + + files = ((hashlib.sha1(filename).digest(), filename, offset, size) + for filename, (offset, size) in self.files.iteritems()) + files = sorted(files) + + # node index is after file index size, file index, and node index size + indexlensize = struct.calcsize('!Q') + nodeindexoffset = (indexoffset + indexlensize + + (len(files) * fileindexlength) + indexlensize) + + fileindexentries = [] + nodeindexentries = [] + nodecount = 0 + for namehash, filename, offset, size in files: + # File section index + if version == 0: + rawentry = struct.pack(fileindexformat, namehash, offset, size) + else: + nodelocations = self.entrylocations[filename] + + nodeindexsize = len(nodelocations) * nodeindexlength + + rawentry = struct.pack(fileindexformat, namehash, offset, size, + nodeindexoffset, nodeindexsize) + # Node index + nodeindexentries.append(struct.pack(constants.FILENAMESTRUCT, + len(filename)) + filename) + nodeindexoffset += constants.FILENAMESIZE + len(filename) + + for node, location in sorted(nodelocations.iteritems()): + nodeindexentries.append(struct.pack(nodeindexformat, node, + location)) + nodecount += 1 + + nodeindexoffset += len(nodelocations) * nodeindexlength + + fileindexentries.append(rawentry) + + nodecountraw = '' + if version == 1: + nodecountraw = struct.pack('!Q', nodecount) + return (''.join(fileindexentries) + nodecountraw + + ''.join(nodeindexentries)) diff 
--git a/hgext/remotefilelog/lz4wrapper.py b/hgext/remotefilelog/lz4wrapper.py new file mode 100644 --- /dev/null +++ b/hgext/remotefilelog/lz4wrapper.py @@ -0,0 +1,37 @@ +from __future__ import absolute_import + +from mercurial.i18n import _ +from mercurial import ( + demandimport, + error, + util, +) +if util.safehasattr(demandimport, 'IGNORES'): + # Since 670eb4fa1b86 + demandimport.IGNORES.update(['pkgutil', 'pkg_resources', '__main__']) +else: + demandimport.ignore.extend(['pkgutil', 'pkg_resources', '__main__']) + +def missing(*args, **kwargs): + raise error.Abort(_('remotefilelog extension requires lz4 support')) + +lz4compress = lzcompresshc = lz4decompress = missing + +with demandimport.deactivated(): + import lz4 + + try: + # newer python-lz4 has these functions deprecated as top-level ones, + # so we are trying to import from lz4.block first + def _compressHC(*args, **kwargs): + return lz4.block.compress(*args, mode='high_compression', **kwargs) + lzcompresshc = _compressHC + lz4compress = lz4.block.compress + lz4decompress = lz4.block.decompress + except AttributeError: + try: + lzcompresshc = lz4.compressHC + lz4compress = lz4.compress + lz4decompress = lz4.decompress + except AttributeError: + pass diff --git a/hgext/remotefilelog/metadatastore.py b/hgext/remotefilelog/metadatastore.py new file mode 100644 --- /dev/null +++ b/hgext/remotefilelog/metadatastore.py @@ -0,0 +1,156 @@ +from __future__ import absolute_import + +from mercurial.node import hex, nullid +from . import ( + basestore, + shallowutil, +) + +class unionmetadatastore(basestore.baseunionstore): + def __init__(self, *args, **kwargs): + super(unionmetadatastore, self).__init__(*args, **kwargs) + + self.stores = args + self.writestore = kwargs.get('writestore') + + # If allowincomplete==True then the union store can return partial + # ancestor lists, otherwise it will throw a KeyError if a full + # history can't be found. 
+ self.allowincomplete = kwargs.get('allowincomplete', False) + + def getancestors(self, name, node, known=None): + """Returns as many ancestors as we're aware of. + + return value: { + node: (p1, p2, linknode, copyfrom), + ... + } + """ + if known is None: + known = set() + if node in known: + return [] + + ancestors = {} + def traverse(curname, curnode): + # TODO: this algorithm has the potential to traverse parts of + # history twice. Ex: with A->B->C->F and A->B->D->F, both D and C + # may be queued as missing, then B and A are traversed for both. + queue = [(curname, curnode)] + missing = [] + seen = set() + while queue: + name, node = queue.pop() + if (name, node) in seen: + continue + seen.add((name, node)) + value = ancestors.get(node) + if not value: + missing.append((name, node)) + continue + p1, p2, linknode, copyfrom = value + if p1 != nullid and p1 not in known: + queue.append((copyfrom or curname, p1)) + if p2 != nullid and p2 not in known: + queue.append((curname, p2)) + return missing + + missing = [(name, node)] + while missing: + curname, curnode = missing.pop() + try: + ancestors.update(self._getpartialancestors(curname, curnode, + known=known)) + newmissing = traverse(curname, curnode) + missing.extend(newmissing) + except KeyError: + # If we allow incomplete histories, don't throw. + if not self.allowincomplete: + raise + # If the requested name+node doesn't exist, always throw. 
+ if (curname, curnode) == (name, node): + raise + + # TODO: ancestors should probably be (name, node) -> (value) + return ancestors + + @basestore.baseunionstore.retriable + def _getpartialancestors(self, name, node, known=None): + for store in self.stores: + try: + return store.getancestors(name, node, known=known) + except KeyError: + pass + + raise KeyError((name, hex(node))) + + @basestore.baseunionstore.retriable + def getnodeinfo(self, name, node): + for store in self.stores: + try: + return store.getnodeinfo(name, node) + except KeyError: + pass + + raise KeyError((name, hex(node))) + + def add(self, name, node, data): + raise RuntimeError("cannot add content only to remotefilelog " + "contentstore") + + def getmissing(self, keys): + missing = keys + for store in self.stores: + if missing: + missing = store.getmissing(missing) + return missing + + def markledger(self, ledger, options=None): + for store in self.stores: + store.markledger(ledger, options) + + def getmetrics(self): + metrics = [s.getmetrics() for s in self.stores] + return shallowutil.sumdicts(*metrics) + +class remotefilelogmetadatastore(basestore.basestore): + def getancestors(self, name, node, known=None): + """Returns as many ancestors as we're aware of. + + return value: { + node: (p1, p2, linknode, copyfrom), + ... 
+ } + """ + data = self._getdata(name, node) + ancestors = shallowutil.ancestormap(data) + return ancestors + + def getnodeinfo(self, name, node): + return self.getancestors(name, node)[node] + + def add(self, name, node, parents, linknode): + raise RuntimeError("cannot add metadata only to remotefilelog " + "metadatastore") + +class remotemetadatastore(object): + def __init__(self, ui, fileservice, shared): + self._fileservice = fileservice + self._shared = shared + + def getancestors(self, name, node, known=None): + self._fileservice.prefetch([(name, hex(node))], force=True, + fetchdata=False, fetchhistory=True) + return self._shared.getancestors(name, node, known=known) + + def getnodeinfo(self, name, node): + return self.getancestors(name, node)[node] + + def add(self, name, node, data): + raise RuntimeError("cannot add to a remote store") + + def getmissing(self, keys): + return keys + + def markledger(self, ledger, options=None): + pass diff --git a/hgext/remotefilelog/remotefilectx.py b/hgext/remotefilelog/remotefilectx.py new file mode 100644 --- /dev/null +++ b/hgext/remotefilelog/remotefilectx.py @@ -0,0 +1,490 @@ +# remotefilectx.py - filectx/workingfilectx implementations for remotefilelog +# +# Copyright 2013 Facebook, Inc. +# +# This software may be used and distributed according to the terms of the +# GNU General Public License version 2 or any later version. +from __future__ import absolute_import + +import collections +import time + +from mercurial.node import bin, hex, nullid, nullrev +from mercurial import ( + ancestor, + context, + error, + phases, + util, +) +from . 
import shallowutil + +propertycache = util.propertycache +FASTLOG_TIMEOUT_IN_SECS = 0.5 + +class remotefilectx(context.filectx): + def __init__(self, repo, path, changeid=None, fileid=None, + filelog=None, changectx=None, ancestormap=None): + if fileid == nullrev: + fileid = nullid + if fileid and len(fileid) == 40: + fileid = bin(fileid) + super(remotefilectx, self).__init__(repo, path, changeid, + fileid, filelog, changectx) + self._ancestormap = ancestormap + + def size(self): + return self._filelog.size(self._filenode) + + @propertycache + def _changeid(self): + if '_changeid' in self.__dict__: + return self._changeid + elif '_changectx' in self.__dict__: + return self._changectx.rev() + elif '_descendantrev' in self.__dict__: + # this file context was created from a revision with a known + # descendant, we can (lazily) correct for linkrev aliases + linknode = self._adjustlinknode(self._path, self._filelog, + self._filenode, self._descendantrev) + return self._repo.unfiltered().changelog.rev(linknode) + else: + return self.linkrev() + + def filectx(self, fileid, changeid=None): + '''opens an arbitrary revision of the file without + opening a new filelog''' + return remotefilectx(self._repo, self._path, fileid=fileid, + filelog=self._filelog, changeid=changeid) + + def linkrev(self): + return self._linkrev + + @propertycache + def _linkrev(self): + if self._filenode == nullid: + return nullrev + + ancestormap = self.ancestormap() + p1, p2, linknode, copyfrom = ancestormap[self._filenode] + rev = self._repo.changelog.nodemap.get(linknode) + if rev is not None: + return rev + + # Search all commits for the appropriate linkrev (slow, but uncommon) + path = self._path + fileid = self._filenode + cl = self._repo.unfiltered().changelog + mfl = self._repo.manifestlog + + for rev in range(len(cl) - 1, 0, -1): + node = cl.node(rev) + data = cl.read(node) # get changeset data (we avoid object creation) + if path in data[3]: # checking the 'files' field. 
+ # The file has been touched, check if the hash is what we're + # looking for. + if fileid == mfl[data[0]].readfast().get(path): + return rev + + # Couldn't find the linkrev. This should generally not happen, and will + # likely cause a crash. + return None + + def introrev(self): + """return the rev of the changeset which introduced this file revision + + This method is different from linkrev because it takes into account the + changeset the filectx was created from. It ensures the returned + revision is one of its ancestors. This prevents bugs from + 'linkrev-shadowing' when a file revision is used by multiple + changesets. + """ + lkr = self.linkrev() + attrs = vars(self) + noctx = not ('_changeid' in attrs or '_changectx' in attrs) + if noctx or self.rev() == lkr: + return lkr + linknode = self._adjustlinknode(self._path, self._filelog, + self._filenode, self.rev(), + inclusive=True) + return self._repo.changelog.rev(linknode) + + def renamed(self): + """check if file was actually renamed in this changeset revision + + If rename logged in file revision, we report copy for changeset only + if file revisions linkrev points back to the changeset in question + or both changeset parents contain different file revisions.
+ """ + ancestormap = self.ancestormap() + + p1, p2, linknode, copyfrom = ancestormap[self._filenode] + if not copyfrom: + return None + + renamed = (copyfrom, p1) + if self.rev() == self.linkrev(): + return renamed + + name = self.path() + fnode = self._filenode + for p in self._changectx.parents(): + try: + if fnode == p.filenode(name): + return None + except error.LookupError: + pass + return renamed + + def ancestormap(self): + if not self._ancestormap: + self._ancestormap = self.filelog().ancestormap(self._filenode) + + return self._ancestormap + + def parents(self): + repo = self._repo + ancestormap = self.ancestormap() + + p1, p2, linknode, copyfrom = ancestormap[self._filenode] + results = [] + if p1 != nullid: + path = copyfrom or self._path + flog = repo.file(path) + p1ctx = remotefilectx(repo, path, fileid=p1, filelog=flog, + ancestormap=ancestormap) + p1ctx._descendantrev = self.rev() + results.append(p1ctx) + + if p2 != nullid: + path = self._path + flog = repo.file(path) + p2ctx = remotefilectx(repo, path, fileid=p2, filelog=flog, + ancestormap=ancestormap) + p2ctx._descendantrev = self.rev() + results.append(p2ctx) + + return results + + def _nodefromancrev(self, ancrev, cl, mfl, path, fnode): + """returns the node for <path> in <ancrev> if content matches <fnode>""" + ancctx = cl.read(ancrev) # This avoids object creation. + manifestnode, files = ancctx[0], ancctx[3] + # If the file was touched in this ancestor, and the content is similar + # to the one we are searching for. + if path in files and fnode == mfl[manifestnode].readfast().get(path): + return cl.node(ancrev) + return None + + def _adjustlinknode(self, path, filelog, fnode, srcrev, inclusive=False): + """return the first ancestor of <srcrev> introducing <fnode> + + If the linkrev of the file revision does not point to an ancestor of + srcrev, we'll walk down the ancestors until we find one introducing + this file revision.
+ + :repo: a localrepository object (used to access changelog and manifest) + :path: the file path + :fnode: the nodeid of the file revision + :filelog: the filelog of this path + :srcrev: the changeset revision we search ancestors from + :inclusive: if true, the src revision will also be checked + + Note: This is based on adjustlinkrev in core, but it's quite different. + + adjustlinkrev depends on the fact that the linkrev is the bottom most + node, and uses that as a stopping point for the ancestor traversal. We + can't do that here because the linknode is not guaranteed to be the + bottom most one. + + In our code here, we actually know what a bunch of potential ancestor + linknodes are, so instead of stopping the cheap-ancestor-traversal when + we get to a linkrev, we stop when we see any of the known linknodes. + """ + repo = self._repo + cl = repo.unfiltered().changelog + mfl = repo.manifestlog + ancestormap = self.ancestormap() + linknode = ancestormap[fnode][2] + + if srcrev is None: + # wctx case, used by workingfilectx during mergecopy + revs = [p.rev() for p in self._repo[None].parents()] + inclusive = True # we skipped the real (revless) source + else: + revs = [srcrev] + + if self._verifylinknode(revs, linknode): + return linknode + + commonlogkwargs = { + 'revs': ' '.join([hex(cl.node(rev)) for rev in revs]), + 'fnode': hex(fnode), + 'filepath': path, + 'user': shallowutil.getusername(repo.ui), + 'reponame': shallowutil.getreponame(repo.ui), + } + + repo.ui.log('linkrevfixup', 'adjusting linknode', **commonlogkwargs) + + pc = repo._phasecache + seenpublic = False + iteranc = cl.ancestors(revs, inclusive=inclusive) + for ancrev in iteranc: + # First, check locally-available history. + lnode = self._nodefromancrev(ancrev, cl, mfl, path, fnode) + if lnode is not None: + return lnode + + # adjusting linknode can be super-slow. 
To mitigate the issue + # we use two heuristics: calling fastlog and forcing remotefilelog + # prefetch + if not seenpublic and pc.phase(repo, ancrev) == phases.public: + # TODO: there used to be a codepath to fetch linknodes + # from a server as a fast path, but it appeared to + # depend on an API FB added to their phabricator. + lnode = self._forceprefetch(repo, path, fnode, revs, + commonlogkwargs) + if lnode: + return lnode + seenpublic = True + + return linknode + + def _forceprefetch(self, repo, path, fnode, revs, + commonlogkwargs): + # This next part is super non-obvious, so big comment block time! + # + # It is possible to get extremely bad performance here when a fairly + # common set of circumstances occur when this extension is combined + # with a server-side commit rewriting extension like pushrebase. + # + # First, an engineer creates Commit A and pushes it to the server. + # While the server's data structure will have the correct linkrev + # for the files touched in Commit A, the client will have the + # linkrev of the local commit, which is "invalid" because it's not + # an ancestor of the main line of development. + # + # The client will never download the remotefilelog with the correct + # linkrev as long as nobody else touches that file, since the file + # data and history hasn't changed since Commit A. + # + # After a long time (or a short time in a heavily used repo), if the + # same engineer returns to change the same file, some commands -- + # such as amends of commits with file moves, logs, diffs, etc -- + # can trigger this _adjustlinknode code. In those cases, finding + # the correct rev can become quite expensive, as the correct + # revision is far back in history and we need to walk back through + # history to find it. + # + # In order to improve this situation, we force a prefetch of the + # remotefilelog data blob for the file we were called on. 
We do this + # at most once, when we first see a public commit in the history we + # are traversing. + # + # Forcing the prefetch means we will download the remote blob even + # if we have the "correct" blob in the local store. Since the union + # store checks the remote store first, this means we are much more + # likely to get the correct linkrev at this point. + # + # In rare circumstances (such as the server having a suboptimal + # linkrev for our use case), we will fall back to the old slow path. + # + # We may want to add additional heuristics here in the future if + # the slow path is used too much. One promising possibility is using + # obsolescence markers to find a more-likely-correct linkrev. + + logmsg = '' + start = time.time() + try: + repo.fileservice.prefetch([(path, hex(fnode))], force=True) + + # Now that we've downloaded a new blob from the server, + # we need to rebuild the ancestor map to recompute the + # linknodes. + self._ancestormap = None + linknode = self.ancestormap()[fnode][2] # 2 is linknode + if self._verifylinknode(revs, linknode): + logmsg = 'remotefilelog prefetching succeeded' + return linknode + logmsg = 'remotefilelog prefetching not found' + return None + except Exception as e: + logmsg = 'remotefilelog prefetching failed (%s)' % e + return None + finally: + elapsed = time.time() - start + repo.ui.log('linkrevfixup', logmsg, elapsed=elapsed * 1000, + **commonlogkwargs) + + def _verifylinknode(self, revs, linknode): + """ + Check if a linknode is the correct one for the current history. + + That is, return True if the linknode is an ancestor of any of the + passed-in revs, otherwise return False. + + `revs` is a list that usually has one element -- usually the wdir parent + or the user-passed rev we're looking back from. It may contain two revs + when there is a merge going on, or zero revs when a root node with no + parents is being created.
+ """ + if not revs: + return False + try: + # Use the C fastpath to check if the given linknode is correct. + cl = self._repo.unfiltered().changelog + return any(cl.isancestor(linknode, cl.node(r)) for r in revs) + except error.LookupError: + # The linknode read from the blob may have been stripped or + # otherwise not present in the repository anymore. Do not fail hard + # in this case. Instead, return false and continue the search for + # the correct linknode. + return False + + def ancestors(self, followfirst=False): + ancestors = [] + queue = collections.deque((self,)) + seen = set() + while queue: + current = queue.pop() + if current.filenode() in seen: + continue + seen.add(current.filenode()) + + ancestors.append(current) + + parents = current.parents() + first = True + for p in parents: + if first or not followfirst: + queue.append(p) + first = False + + # Remove self + ancestors.pop(0) + + # Sort by linkrev + # The copy tracing algorithm depends on these coming out in order + ancestors = sorted(ancestors, reverse=True, key=lambda x:x.linkrev()) + + for ancestor in ancestors: + yield ancestor + + def ancestor(self, fc2, actx): + # the easy case: no (relevant) renames + if fc2.path() == self.path() and self.path() in actx: + return actx[self.path()] + + # the next easiest cases: unambiguous predecessor (name trumps + # history) + if self.path() in actx and fc2.path() not in actx: + return actx[self.path()] + if fc2.path() in actx and self.path() not in actx: + return actx[fc2.path()] + + # do a full traversal + amap = self.ancestormap() + bmap = fc2.ancestormap() + + def parents(x): + f, n = x + p = amap.get(n) or bmap.get(n) + if not p: + return [] + + return [(p[3] or f, p[0]), (f, p[1])] + + a = (self.path(), self.filenode()) + b = (fc2.path(), fc2.filenode()) + result = ancestor.genericancestor(a, b, parents) + if result: + f, n = result + r = remotefilectx(self._repo, f, fileid=n, + ancestormap=amap) + return r + + return None + + def annotate(self, 
*args, **kwargs): + introctx = self + prefetchskip = kwargs.pop('prefetchskip', None) + if prefetchskip: + # use introrev so prefetchskip can be accurately tested + introrev = self.introrev() + if self.rev() != introrev: + introctx = remotefilectx(self._repo, self._path, + changeid=introrev, + fileid=self._filenode, + filelog=self._filelog, + ancestormap=self._ancestormap) + + # like self.ancestors, but append to "fetch" and skip visiting parents + # of nodes in "prefetchskip". + fetch = [] + seen = set() + queue = collections.deque((introctx,)) + seen.add(introctx.node()) + while queue: + current = queue.pop() + if current.filenode() != self.filenode(): + # this is a "joint point". fastannotate needs contents of + # "joint point"s to calculate diffs for side branches. + fetch.append((current.path(), hex(current.filenode()))) + if prefetchskip and current in prefetchskip: + continue + for parent in current.parents(): + if parent.node() not in seen: + seen.add(parent.node()) + queue.append(parent) + + self._repo.ui.debug('remotefilelog: prefetching %d files ' + 'for annotate\n' % len(fetch)) + if fetch: + self._repo.fileservice.prefetch(fetch) + return super(remotefilectx, self).annotate(*args, **kwargs) + + # Return empty set so that the hg serve and thg don't stack trace + def children(self): + return [] + +class remoteworkingfilectx(context.workingfilectx, remotefilectx): + def __init__(self, repo, path, filelog=None, workingctx=None): + self._ancestormap = None + return super(remoteworkingfilectx, self).__init__(repo, path, + filelog, workingctx) + + def parents(self): + return remotefilectx.parents(self) + + def ancestormap(self): + if not self._ancestormap: + path = self._path + pcl = self._changectx._parents + renamed = self.renamed() + + if renamed: + p1 = renamed + else: + p1 = (path, pcl[0]._manifest.get(path, nullid)) + + p2 = (path, nullid) + if len(pcl) > 1: + p2 = (path, pcl[1]._manifest.get(path, nullid)) + + m = {} + if p1[1] != nullid: + p1ctx = 
self._repo.filectx(p1[0], fileid=p1[1]) + m.update(p1ctx.filelog().ancestormap(p1[1])) + + if p2[1] != nullid: + p2ctx = self._repo.filectx(p2[0], fileid=p2[1]) + m.update(p2ctx.filelog().ancestormap(p2[1])) + + copyfrom = '' + if renamed: + copyfrom = renamed[0] + m[None] = (p1[1], p2[1], nullid, copyfrom) + self._ancestormap = m + + return self._ancestormap diff --git a/hgext/remotefilelog/remotefilelog.py b/hgext/remotefilelog/remotefilelog.py new file mode 100644 --- /dev/null +++ b/hgext/remotefilelog/remotefilelog.py @@ -0,0 +1,481 @@ +# remotefilelog.py - filelog implementation where filelog history is stored +# remotely +# +# Copyright 2013 Facebook, Inc. +# +# This software may be used and distributed according to the terms of the +# GNU General Public License version 2 or any later version. +from __future__ import absolute_import + +import collections +import os + +from mercurial.node import bin, nullid +from mercurial.i18n import _ +from mercurial import ( + ancestor, + error, + mdiff, + revlog, +) +from mercurial.utils import storageutil + +from . 
import ( + constants, + fileserverclient, + shallowutil, +) + +class remotefilelognodemap(object): + def __init__(self, filename, store): + self._filename = filename + self._store = store + + def __contains__(self, node): + missing = self._store.getmissing([(self._filename, node)]) + return not bool(missing) + + def __get__(self, node): + if node not in self: + raise KeyError(node) + return node + +class remotefilelog(object): + + _generaldelta = True + + def __init__(self, opener, path, repo): + self.opener = opener + self.filename = path + self.repo = repo + self.nodemap = remotefilelognodemap(self.filename, repo.contentstore) + + self.version = 1 + + def read(self, node): + """returns the file contents at this node""" + t = self.revision(node) + if not t.startswith('\1\n'): + return t + s = t.index('\1\n', 2) + return t[s + 2:] + + def add(self, text, meta, transaction, linknode, p1=None, p2=None): + hashtext = text + + # hash with the metadata, like in vanilla filelogs + hashtext = shallowutil.createrevlogtext(text, meta.get('copy'), + meta.get('copyrev')) + node = storageutil.hashrevisionsha1(hashtext, p1, p2) + return self.addrevision(hashtext, transaction, linknode, p1, p2, + node=node) + + def _createfileblob(self, text, meta, flags, p1, p2, node, linknode): + # text passed to "_createfileblob" does not include filelog metadata + header = shallowutil.buildfileblobheader(len(text), flags) + data = "%s\0%s" % (header, text) + + realp1 = p1 + copyfrom = "" + if meta and 'copy' in meta: + copyfrom = meta['copy'] + realp1 = bin(meta['copyrev']) + + data += "%s%s%s%s%s\0" % (node, realp1, p2, linknode, copyfrom) + + visited = set() + + pancestors = {} + queue = [] + if realp1 != nullid: + p1flog = self + if copyfrom: + p1flog = remotefilelog(self.opener, copyfrom, self.repo) + + pancestors.update(p1flog.ancestormap(realp1)) + queue.append(realp1) + visited.add(realp1) + if p2 != nullid: + pancestors.update(self.ancestormap(p2)) + queue.append(p2) + 
visited.add(p2) + + ancestortext = "" + + # add the ancestors in topological order + while queue: + c = queue.pop(0) + pa1, pa2, ancestorlinknode, pacopyfrom = pancestors[c] + + pacopyfrom = pacopyfrom or '' + ancestortext += "%s%s%s%s%s\0" % ( + c, pa1, pa2, ancestorlinknode, pacopyfrom) + + if pa1 != nullid and pa1 not in visited: + queue.append(pa1) + visited.add(pa1) + if pa2 != nullid and pa2 not in visited: + queue.append(pa2) + visited.add(pa2) + + data += ancestortext + + return data + + def addrevision(self, text, transaction, linknode, p1, p2, cachedelta=None, + node=None, flags=revlog.REVIDX_DEFAULT_FLAGS): + # text passed to "addrevision" includes hg filelog metadata header + if node is None: + node = storageutil.hashrevisionsha1(text, p1, p2) + + meta, metaoffset = storageutil.parsemeta(text) + rawtext, validatehash = self._processflags(text, flags, 'write') + return self.addrawrevision(rawtext, transaction, linknode, p1, p2, + node, flags, cachedelta, + _metatuple=(meta, metaoffset)) + + def addrawrevision(self, rawtext, transaction, linknode, p1, p2, node, + flags, cachedelta=None, _metatuple=None): + if _metatuple: + # _metatuple: used by "addrevision" internally by remotefilelog + # meta was parsed confidently + meta, metaoffset = _metatuple + else: + # not from self.addrevision, but something else (repo._filecommit) + # calls addrawrevision directly. remotefilelog needs to get and + # strip filelog metadata. + # we don't have confidence about whether rawtext contains filelog + # metadata or not (flag processor could replace it), so we just + # parse it as best-effort. + # in LFS (flags != 0)'s case, the best way is to call LFS code to + # get the meta information, instead of storageutil.parsemeta. + meta, metaoffset = storageutil.parsemeta(rawtext) + if flags != 0: + # when flags != 0, be conservative and do not mangle rawtext, since + # a read flag processor expects the text not being mangled at all. 
+ metaoffset = 0 + if metaoffset: + # remotefilelog fileblob stores copy metadata in its ancestortext, + # not its main blob. so we need to remove filelog metadata + # (containing copy information) from text. + blobtext = rawtext[metaoffset:] + else: + blobtext = rawtext + data = self._createfileblob(blobtext, meta, flags, p1, p2, node, + linknode) + self.repo.contentstore.addremotefilelognode(self.filename, node, data) + + return node + + def renamed(self, node): + ancestors = self.repo.metadatastore.getancestors(self.filename, node) + p1, p2, linknode, copyfrom = ancestors[node] + if copyfrom: + return (copyfrom, p1) + + return False + + def size(self, node): + """return the size of a given revision""" + return len(self.read(node)) + + rawsize = size + + def cmp(self, node, text): + """compare text with a given file revision + + returns True if text is different than what is stored. + """ + + if node == nullid: + return True + + nodetext = self.read(node) + return nodetext != text + + def __nonzero__(self): + return True + + def __len__(self): + if self.filename == '.hgtags': + # The length of .hgtags is used to fast path tag checking. + # remotefilelog doesn't support .hgtags since the entire .hgtags + # history is needed. Use the excludepattern setting to make + # .hgtags a normal filelog. 
+ return 0 + + raise RuntimeError("len not supported") + + def empty(self): + return False + + def flags(self, node): + if isinstance(node, int): + raise error.ProgrammingError( + 'remotefilelog does not accept integer rev for flags') + store = self.repo.contentstore + return store.getmeta(self.filename, node).get(constants.METAKEYFLAG, 0) + + def parents(self, node): + if node == nullid: + return nullid, nullid + + ancestormap = self.repo.metadatastore.getancestors(self.filename, node) + p1, p2, linknode, copyfrom = ancestormap[node] + if copyfrom: + p1 = nullid + + return p1, p2 + + def parentrevs(self, rev): + # TODO(augie): this is a node and should be a rev, but for now + # nothing in core seems to actually break. + return self.parents(rev) + + def linknode(self, node): + ancestormap = self.repo.metadatastore.getancestors(self.filename, node) + p1, p2, linknode, copyfrom = ancestormap[node] + return linknode + + def linkrev(self, node): + return self.repo.unfiltered().changelog.rev(self.linknode(node)) + + def emitrevisions(self, nodes, nodesorder=None, revisiondata=False, + assumehaveparentrevisions=False, deltaprevious=False, + deltamode=None): + # we don't use any of these parameters here + del nodesorder, revisiondata, assumehaveparentrevisions, deltaprevious + del deltamode + prevnode = None + for node in nodes: + p1, p2 = self.parents(node) + if prevnode is None: + basenode = prevnode = p1 + if basenode == node: + basenode = nullid + if basenode != nullid: + revision = None + delta = self.revdiff(basenode, node) + else: + revision = self.revision(node, raw=True) + delta = None + yield revlog.revlogrevisiondelta( + node=node, + p1node=p1, + p2node=p2, + linknode=self.linknode(node), + basenode=basenode, + flags=self.flags(node), + baserevisionsize=None, + revision=revision, + delta=delta, + ) + + def emitrevisiondeltas(self, requests): + prevnode = None + for request in requests: + node = request.node + p1, p2 = self.parents(node) + if prevnode is None: + 
prevnode = p1 + if request.basenode is not None: + basenode = request.basenode + else: + basenode = p1 + if basenode == nullid: + revision = self.revision(node, raw=True) + delta = None + else: + revision = None + delta = self.revdiff(basenode, node) + yield revlog.revlogrevisiondelta( + node=node, + p1node=p1, + p2node=p2, + linknode=self.linknode(node), + basenode=basenode, + flags=self.flags(node), + baserevisionsize=None, + revision=revision, + delta=delta, + ) + + def revdiff(self, node1, node2): + return mdiff.textdiff(self.revision(node1, raw=True), + self.revision(node2, raw=True)) + + def lookup(self, node): + if len(node) == 40: + node = bin(node) + if len(node) != 20: + raise error.LookupError(node, self.filename, + _('invalid lookup input')) + + return node + + def rev(self, node): + # This is a hack to make TortoiseHG work. + return node + + def node(self, rev): + # This is a hack. + if isinstance(rev, int): + raise error.ProgrammingError( + 'remotefilelog does not convert integer rev to node') + return rev + + def revision(self, node, raw=False): + """returns the revlog contents at this node. + this includes the meta data traditionally included in file revlogs. + this is generally only used for bundling and communicating with vanilla + hg clients. 
+ """ + if node == nullid: + return "" + if len(node) != 20: + raise error.LookupError(node, self.filename, + _('invalid revision input')) + + store = self.repo.contentstore + rawtext = store.get(self.filename, node) + if raw: + return rawtext + flags = store.getmeta(self.filename, node).get(constants.METAKEYFLAG, 0) + if flags == 0: + return rawtext + text, verifyhash = self._processflags(rawtext, flags, 'read') + return text + + def _processflags(self, text, flags, operation, raw=False): + # mostly copied from hg/mercurial/revlog.py + validatehash = True + orderedflags = revlog.REVIDX_FLAGS_ORDER + if operation == 'write': + orderedflags = reversed(orderedflags) + for flag in orderedflags: + if flag & flags: + vhash = True + if flag not in revlog._flagprocessors: + message = _("missing processor for flag '%#x'") % (flag) + raise revlog.RevlogError(message) + readfunc, writefunc, rawfunc = revlog._flagprocessors[flag] + if raw: + vhash = rawfunc(self, text) + elif operation == 'read': + text, vhash = readfunc(self, text) + elif operation == 'write': + text, vhash = writefunc(self, text) + validatehash = validatehash and vhash + return text, validatehash + + def _read(self, id): + """reads the raw file blob from disk, cache, or server""" + fileservice = self.repo.fileservice + localcache = fileservice.localcache + cachekey = fileserverclient.getcachekey(self.repo.name, self.filename, + id) + try: + return localcache.read(cachekey) + except KeyError: + pass + + localkey = fileserverclient.getlocalkey(self.filename, id) + localpath = os.path.join(self.localpath, localkey) + try: + return shallowutil.readfile(localpath) + except IOError: + pass + + fileservice.prefetch([(self.filename, id)]) + try: + return localcache.read(cachekey) + except KeyError: + pass + + raise error.LookupError(id, self.filename, _('no node')) + + def ancestormap(self, node): + return self.repo.metadatastore.getancestors(self.filename, node) + + def ancestor(self, a, b): + if a == nullid or b 
== nullid: + return nullid + + revmap, parentfunc = self._buildrevgraph(a, b) + nodemap = dict(((v, k) for (k, v) in revmap.iteritems())) + + ancs = ancestor.ancestors(parentfunc, revmap[a], revmap[b]) + if ancs: + # choose a consistent winner when there's a tie + return min(map(nodemap.__getitem__, ancs)) + return nullid + + def commonancestorsheads(self, a, b): + """calculate all the heads of the common ancestors of nodes a and b""" + + if a == nullid or b == nullid: + return nullid + + revmap, parentfunc = self._buildrevgraph(a, b) + nodemap = dict(((v, k) for (k, v) in revmap.iteritems())) + + ancs = ancestor.commonancestorsheads(parentfunc, revmap[a], revmap[b]) + return map(nodemap.__getitem__, ancs) + + def _buildrevgraph(self, a, b): + """Builds a numeric revision graph for the given two nodes. + Returns a node->rev map and a rev->[revs] parent function. + """ + amap = self.ancestormap(a) + bmap = self.ancestormap(b) + + # Union the two maps + parentsmap = collections.defaultdict(list) + allparents = set() + for mapping in (amap, bmap): + for node, pdata in mapping.iteritems(): + parents = parentsmap[node] + p1, p2, linknode, copyfrom = pdata + # Don't follow renames (copyfrom). + # remotefilectx.ancestor does that. + if p1 != nullid and not copyfrom: + parents.append(p1) + allparents.add(p1) + if p2 != nullid: + parents.append(p2) + allparents.add(p2) + + # Breadth first traversal to build linkrev graph + parentrevs = collections.defaultdict(list) + revmap = {} + queue = collections.deque(((None, n) for n in parentsmap.iterkeys() + if n not in allparents)) + while queue: + prevrev, current = queue.pop() + if current in revmap: + if prevrev: + parentrevs[prevrev].append(revmap[current]) + continue + + # Assign linkrevs in reverse order, so start at + # len(parentsmap) and work backwards. 
+ currentrev = len(parentsmap) - len(revmap) - 1 + revmap[current] = currentrev + + if prevrev: + parentrevs[prevrev].append(currentrev) + + for parent in parentsmap.get(current): + queue.appendleft((currentrev, parent)) + + return revmap, parentrevs.__getitem__ + + def strip(self, minlink, transaction): + pass + + # misc unused things + def files(self): + return [] + + def checksize(self): + return 0, 0 diff --git a/hgext/remotefilelog/remotefilelogserver.py b/hgext/remotefilelog/remotefilelogserver.py new file mode 100644 --- /dev/null +++ b/hgext/remotefilelog/remotefilelogserver.py @@ -0,0 +1,554 @@ +# remotefilelogserver.py - server logic for a remotefilelog server +# +# Copyright 2013 Facebook, Inc. +# +# This software may be used and distributed according to the terms of the +# GNU General Public License version 2 or any later version. +from __future__ import absolute_import + +import errno +import os +import stat +import time + +from mercurial.i18n import _ +from mercurial.node import bin, hex, nullid, nullrev +from mercurial import ( + ancestor, + changegroup, + changelog, + context, + error, + extensions, + match, + pycompat, + store, + streamclone, + util, + wireprotoserver, + wireprototypes, + wireprotov1server, +) +from . import ( + constants, + lz4wrapper, + shallowrepo, + shallowutil, + wirepack, +) + +_sshv1server = wireprotoserver.sshv1protocolhandler + +def setupserver(ui, repo): + """Sets up a normal Mercurial repo so it can serve files to shallow repos. 
+ """ + onetimesetup(ui) + + # don't send files to shallow clients during pulls + def generatefiles(orig, self, changedfiles, linknodes, commonrevs, source, + *args, **kwargs): + caps = self._bundlecaps or [] + if shallowrepo.requirement in caps: + # only send files that don't match the specified patterns + includepattern = None + excludepattern = None + for cap in (self._bundlecaps or []): + if cap.startswith("includepattern="): + includepattern = cap[len("includepattern="):].split('\0') + elif cap.startswith("excludepattern="): + excludepattern = cap[len("excludepattern="):].split('\0') + + m = match.always(repo.root, '') + if includepattern or excludepattern: + m = match.match(repo.root, '', None, + includepattern, excludepattern) + + changedfiles = list([f for f in changedfiles if not m(f)]) + return orig(self, changedfiles, linknodes, commonrevs, source, + *args, **kwargs) + + extensions.wrapfunction( + changegroup.cgpacker, 'generatefiles', generatefiles) + +onetime = False +def onetimesetup(ui): + """Configures the wireprotocol for both clients and servers. 
+ """ + global onetime + if onetime: + return + onetime = True + + # support file content requests + wireprotov1server.wireprotocommand( + 'getflogheads', 'path', permission='pull')(getflogheads) + wireprotov1server.wireprotocommand( + 'getfiles', '', permission='pull')(getfiles) + wireprotov1server.wireprotocommand( + 'getfile', 'file node', permission='pull')(getfile) + wireprotov1server.wireprotocommand( + 'getpackv1', '*', permission='pull')(getpack) + + class streamstate(object): + match = None + shallowremote = False + noflatmf = False + state = streamstate() + + def stream_out_shallow(repo, proto, other): + includepattern = None + excludepattern = None + raw = other.get('includepattern') + if raw: + includepattern = raw.split('\0') + raw = other.get('excludepattern') + if raw: + excludepattern = raw.split('\0') + + oldshallow = state.shallowremote + oldmatch = state.match + oldnoflatmf = state.noflatmf + try: + state.shallowremote = True + state.match = match.always(repo.root, '') + state.noflatmf = other.get('noflatmanifest') == 'True' + if includepattern or excludepattern: + state.match = match.match(repo.root, '', None, + includepattern, excludepattern) + streamres = wireprotov1server.stream(repo, proto) + + # Force the first value to execute, so the file list is computed + # within the try/finally scope + first = next(streamres.gen) + second = next(streamres.gen) + def gen(): + yield first + yield second + for value in streamres.gen: + yield value + return wireprototypes.streamres(gen()) + finally: + state.shallowremote = oldshallow + state.match = oldmatch + state.noflatmf = oldnoflatmf + + wireprotov1server.commands['stream_out_shallow'] = (stream_out_shallow, '*') + + # don't clone filelogs to shallow clients + def _walkstreamfiles(orig, repo): + if state.shallowremote: + # if we are shallow ourselves, stream our local commits + if shallowrepo.requirement in repo.requirements: + striplen = len(repo.store.path) + 1 + readdir = repo.store.rawvfs.readdir 
+ visit = [os.path.join(repo.store.path, 'data')] + while visit: + p = visit.pop() + for f, kind, st in readdir(p, stat=True): + fp = p + '/' + f + if kind == stat.S_IFREG: + if not fp.endswith('.i') and not fp.endswith('.d'): + n = util.pconvert(fp[striplen:]) + yield (store.decodedir(n), n, st.st_size) + if kind == stat.S_IFDIR: + visit.append(fp) + + if 'treemanifest' in repo.requirements: + for (u, e, s) in repo.store.datafiles(): + if (u.startswith('meta/') and + (u.endswith('.i') or u.endswith('.d'))): + yield (u, e, s) + + # Return .d and .i files that do not match the shallow pattern + match = state.match + if match and not match.always(): + for (u, e, s) in repo.store.datafiles(): + f = u[5:-2] # trim data/... and .i/.d + if not state.match(f): + yield (u, e, s) + + for x in repo.store.topfiles(): + if state.noflatmf and x[0][:11] == '00manifest.': + continue + yield x + + elif shallowrepo.requirement in repo.requirements: + # don't allow cloning from a shallow repo to a full repo + # since it would require fetching every version of every + # file in order to create the revlogs. 
+ raise error.Abort(_("Cannot clone from a shallow repo " + "to a full repo.")) + else: + for x in orig(repo): + yield x + + extensions.wrapfunction(streamclone, '_walkstreamfiles', _walkstreamfiles) + + # We no longer use getbundle_shallow commands, but we must still + # support it for migration purposes + def getbundleshallow(repo, proto, others): + bundlecaps = others.get('bundlecaps', '') + bundlecaps = set(bundlecaps.split(',')) + bundlecaps.add('remotefilelog') + others['bundlecaps'] = ','.join(bundlecaps) + + return wireprotov1server.commands["getbundle"][0](repo, proto, others) + + wireprotov1server.commands["getbundle_shallow"] = (getbundleshallow, '*') + + # expose remotefilelog capabilities + def _capabilities(orig, repo, proto): + caps = orig(repo, proto) + if ((shallowrepo.requirement in repo.requirements or + ui.configbool('remotefilelog', 'server'))): + if isinstance(proto, _sshv1server): + # legacy getfiles method which only works over ssh + caps.append(shallowrepo.requirement) + caps.append('getflogheads') + caps.append('getfile') + return caps + extensions.wrapfunction(wireprotov1server, '_capabilities', _capabilities) + + def _adjustlinkrev(orig, self, *args, **kwargs): + # When generating file blobs, taking the real path is too slow on large + # repos, so force it to just return the linkrev directly. 
+ repo = self._repo + if util.safehasattr(repo, 'forcelinkrev') and repo.forcelinkrev: + return self._filelog.linkrev(self._filelog.rev(self._filenode)) + return orig(self, *args, **kwargs) + + extensions.wrapfunction( + context.basefilectx, '_adjustlinkrev', _adjustlinkrev) + + def _iscmd(orig, cmd): + if cmd == 'getfiles': + return False + return orig(cmd) + + extensions.wrapfunction(wireprotoserver, 'iscmd', _iscmd) + +def _loadfileblob(repo, cachepath, path, node): + filecachepath = os.path.join(cachepath, path, hex(node)) + if not os.path.exists(filecachepath) or os.path.getsize(filecachepath) == 0: + filectx = repo.filectx(path, fileid=node) + if filectx.node() == nullid: + repo.changelog = changelog.changelog(repo.svfs) + filectx = repo.filectx(path, fileid=node) + + text = createfileblob(filectx) + text = lz4wrapper.lzcompresshc(text) + + # everything should be user & group read/writable + oldumask = os.umask(0o002) + try: + dirname = os.path.dirname(filecachepath) + if not os.path.exists(dirname): + try: + os.makedirs(dirname) + except OSError as ex: + if ex.errno != errno.EEXIST: + raise + + f = None + try: + f = util.atomictempfile(filecachepath, "w") + f.write(text) + except (IOError, OSError): + # Don't abort if the user only has permission to read, + # and not write. + pass + finally: + if f: + f.close() + finally: + os.umask(oldumask) + else: + with open(filecachepath, "r") as f: + text = f.read() + return text + +def getflogheads(repo, proto, path): + """A server api for requesting a filelog's heads + """ + flog = repo.file(path) + heads = flog.heads() + return '\n'.join((hex(head) for head in heads if head != nullid)) + +def getfile(repo, proto, file, node): + """A server api for requesting a particular version of a file. Can be used + in batches to request many files at once. The return protocol is: + errorcode\0data/errormsg where errorcode is 0 for success or + non-zero for an error. + + data is a compressed blob with revlog flag and ancestors information.
See + createfileblob for its content. + """ + if shallowrepo.requirement in repo.requirements: + return '1\0' + _('cannot fetch remote files from shallow repo') + cachepath = repo.ui.config("remotefilelog", "servercachepath") + if not cachepath: + cachepath = os.path.join(repo.path, "remotefilelogcache") + node = bin(node.strip()) + if node == nullid: + return '0\0' + return '0\0' + _loadfileblob(repo, cachepath, file, node) + +def getfiles(repo, proto): + """A server api for requesting particular versions of particular files. + """ + if shallowrepo.requirement in repo.requirements: + raise error.Abort(_('cannot fetch remote files from shallow repo')) + if not isinstance(proto, _sshv1server): + raise error.Abort(_('cannot fetch remote files over non-ssh protocol')) + + def streamer(): + fin = proto._fin + + cachepath = repo.ui.config("remotefilelog", "servercachepath") + if not cachepath: + cachepath = os.path.join(repo.path, "remotefilelogcache") + + while True: + request = fin.readline()[:-1] + if not request: + break + + node = bin(request[:40]) + if node == nullid: + yield '0\n' + continue + + path = request[40:] + + text = _loadfileblob(repo, cachepath, path, node) + + yield '%d\n%s' % (len(text), text) + + # it would be better to only flush after processing a whole batch + # but currently we don't know if there are more requests coming + proto._fout.flush() + return wireprototypes.streamres(streamer()) + +def createfileblob(filectx): + """ + format: + v0: + str(len(rawtext)) + '\0' + rawtext + ancestortext + v1: + 'v1' + '\n' + metalist + '\0' + rawtext + ancestortext + metalist := metalist + '\n' + meta | meta + meta := sizemeta | flagmeta + sizemeta := METAKEYSIZE + str(len(rawtext)) + flagmeta := METAKEYFLAG + str(flag) + + note: sizemeta must exist. METAKEYFLAG and METAKEYSIZE must have a + length of 1. 
+ """ + flog = filectx.filelog() + frev = filectx.filerev() + revlogflags = flog._revlog.flags(frev) + if revlogflags == 0: + # normal files + text = filectx.data() + else: + # lfs, read raw revision data + text = flog.revision(frev, raw=True) + + repo = filectx._repo + + ancestors = [filectx] + + try: + repo.forcelinkrev = True + ancestors.extend([f for f in filectx.ancestors()]) + + ancestortext = "" + for ancestorctx in ancestors: + parents = ancestorctx.parents() + p1 = nullid + p2 = nullid + if len(parents) > 0: + p1 = parents[0].filenode() + if len(parents) > 1: + p2 = parents[1].filenode() + + copyname = "" + rename = ancestorctx.renamed() + if rename: + copyname = rename[0] + linknode = ancestorctx.node() + ancestortext += "%s%s%s%s%s\0" % ( + ancestorctx.filenode(), p1, p2, linknode, + copyname) + finally: + repo.forcelinkrev = False + + header = shallowutil.buildfileblobheader(len(text), revlogflags) + + return "%s\0%s%s" % (header, text, ancestortext) + +def gcserver(ui, repo): + if not repo.ui.configbool("remotefilelog", "server"): + return + + neededfiles = set() + heads = repo.revs("heads(tip~25000:) - null") + + cachepath = repo.vfs.join("remotefilelogcache") + for head in heads: + mf = repo[head].manifest() + for filename, filenode in mf.iteritems(): + filecachepath = os.path.join(cachepath, filename, hex(filenode)) + neededfiles.add(filecachepath) + + # delete unneeded older files + days = repo.ui.configint("remotefilelog", "serverexpiration") + expiration = time.time() - (days * 24 * 60 * 60) + + _removing = _("removing old server cache") + count = 0 + ui.progress(_removing, count, unit="files") + for root, dirs, files in os.walk(cachepath): + for file in files: + filepath = os.path.join(root, file) + count += 1 + ui.progress(_removing, count, unit="files") + if filepath in neededfiles: + continue + + stat = os.stat(filepath) + if stat.st_mtime < expiration: + os.remove(filepath) + + ui.progress(_removing, None) + +def getpack(repo, proto, args): 
+ """A server api for requesting a pack of file information. + """ + if shallowrepo.requirement in repo.requirements: + raise error.Abort(_('cannot fetch remote files from shallow repo')) + if not isinstance(proto, _sshv1server): + raise error.Abort(_('cannot fetch remote files over non-ssh protocol')) + + def streamer(): + """Request format: + + [,...]\0\0 + filerequest = + [,...] + + Response format: + [,...]<10 null bytes> + fileresponse = + history = [,...] + historyentry = + + deltas = [,...] + deltaentry = + + """ + fin = proto._fin + files = _receivepackrequest(fin) + + # Sort the files by name, so we provide deterministic results + for filename, nodes in sorted(files.iteritems()): + fl = repo.file(filename) + + # Compute history + history = [] + for rev in ancestor.lazyancestors(fl.parentrevs, + [fl.rev(n) for n in nodes], + inclusive=True): + linkrev = fl.linkrev(rev) + node = fl.node(rev) + p1node, p2node = fl.parents(node) + copyfrom = '' + linknode = repo.changelog.node(linkrev) + if p1node == nullid: + copydata = fl.renamed(node) + if copydata: + copyfrom, copynode = copydata + p1node = copynode + + history.append((node, p1node, p2node, linknode, copyfrom)) + + # Scan and send deltas + chain = _getdeltachain(fl, nodes, -1) + + for chunk in wirepack.sendpackpart(filename, history, chain): + yield chunk + + yield wirepack.closepart() + proto._fout.flush() + + return wireprototypes.streamres(streamer()) + +def _receivepackrequest(stream): + files = {} + while True: + filenamelen = shallowutil.readunpack(stream, + constants.FILENAMESTRUCT)[0] + if filenamelen == 0: + break + + filename = shallowutil.readexactly(stream, filenamelen) + + nodecount = shallowutil.readunpack(stream, + constants.PACKREQUESTCOUNTSTRUCT)[0] + + # Read N nodes + nodes = shallowutil.readexactly(stream, constants.NODESIZE * nodecount) + nodes = set(nodes[i:i + constants.NODESIZE] for i in + pycompat.xrange(0, len(nodes), constants.NODESIZE)) + + files[filename] = nodes + + return 
files + +def _getdeltachain(fl, nodes, stophint): + """Produces a chain of deltas that includes each of the given nodes. + + `stophint` - The changeset rev number to stop at. If it's set to >= 0, we + will return not only the deltas for the requested nodes, but also all + necessary deltas in their delta chains, as long as the deltas have link revs + >= the stophint. This allows us to return an approximately minimal delta + chain when the user performs a pull. If `stophint` is set to -1, all nodes + will return full texts. """ + chain = [] + + seen = set() + for node in nodes: + startrev = fl.rev(node) + cur = startrev + while True: + if cur in seen: + break + base = fl._revlog.deltaparent(cur) + linkrev = fl.linkrev(cur) + node = fl.node(cur) + p1, p2 = fl.parentrevs(cur) + if linkrev < stophint and cur != startrev: + break + + # Return a full text if: + # - the caller requested it (via stophint == -1) + # - the revlog chain has ended (via base==null or base==node) + # - p1 is null. In some situations this can mean it's a copy, so + # we need to use fl.read() to remove the copymetadata. + if (stophint == -1 or base == nullrev or base == cur + or p1 == nullrev): + delta = fl.read(cur) + base = nullrev + else: + delta = fl._chunk(cur) + + basenode = fl.node(base) + chain.append((node, basenode, delta)) + seen.add(cur) + + if base == nullrev: + break + cur = base + + chain.reverse() + return chain diff --git a/hgext/remotefilelog/repack.py b/hgext/remotefilelog/repack.py new file mode 100644 --- /dev/null +++ b/hgext/remotefilelog/repack.py @@ -0,0 +1,786 @@ +from __future__ import absolute_import + +import os +import time + +from mercurial.i18n import _ +from mercurial.node import ( + nullid, + short, +) +from mercurial import ( + encoding, + error, + mdiff, + policy, + pycompat, + scmutil, + util, + vfs, +) +from mercurial.utils import procutil +from . 
import ( + constants, + contentstore, + datapack, + extutil, + historypack, + metadatastore, + shallowutil, +) + +osutil = policy.importmod(r'osutil') + +class RepackAlreadyRunning(error.Abort): + pass + +if util.safehasattr(util, '_hgexecutable'): + # Before 5be286db + _hgexecutable = util.hgexecutable +else: + from mercurial.utils import procutil + _hgexecutable = procutil.hgexecutable + +def backgroundrepack(repo, incremental=True, packsonly=False): + cmd = [_hgexecutable(), '-R', repo.origroot, 'repack'] + msg = _("(running background repack)\n") + if incremental: + cmd.append('--incremental') + msg = _("(running background incremental repack)\n") + if packsonly: + cmd.append('--packsonly') + cmd = ' '.join(map(procutil.shellquote, cmd)) + + repo.ui.warn(msg) + extutil.runshellcommand(cmd, encoding.environ) + +def fullrepack(repo, options=None): + """If ``packsonly`` is True, stores creating only loose objects are skipped. + """ + if util.safehasattr(repo, 'shareddatastores'): + datasource = contentstore.unioncontentstore( + *repo.shareddatastores) + historysource = metadatastore.unionmetadatastore( + *repo.sharedhistorystores, + allowincomplete=True) + + packpath = shallowutil.getcachepackpath( + repo, + constants.FILEPACK_CATEGORY) + _runrepack(repo, datasource, historysource, packpath, + constants.FILEPACK_CATEGORY, options=options) + + if util.safehasattr(repo.manifestlog, 'datastore'): + localdata, shareddata = _getmanifeststores(repo) + lpackpath, ldstores, lhstores = localdata + spackpath, sdstores, shstores = shareddata + + # Repack the shared manifest store + datasource = contentstore.unioncontentstore(*sdstores) + historysource = metadatastore.unionmetadatastore( + *shstores, + allowincomplete=True) + _runrepack(repo, datasource, historysource, spackpath, + constants.TREEPACK_CATEGORY, options=options) + + # Repack the local manifest store + datasource = contentstore.unioncontentstore( + *ldstores, + allowincomplete=True) + historysource = 
metadatastore.unionmetadatastore( + *lhstores, + allowincomplete=True) + _runrepack(repo, datasource, historysource, lpackpath, + constants.TREEPACK_CATEGORY, options=options) + +def incrementalrepack(repo, options=None): + """This repacks the repo by looking at the distribution of pack files in the + repo and performing the most minimal repack to keep the repo in good shape. + """ + if util.safehasattr(repo, 'shareddatastores'): + packpath = shallowutil.getcachepackpath( + repo, + constants.FILEPACK_CATEGORY) + _incrementalrepack(repo, + repo.shareddatastores, + repo.sharedhistorystores, + packpath, + constants.FILEPACK_CATEGORY, + options=options) + + if util.safehasattr(repo.manifestlog, 'datastore'): + localdata, shareddata = _getmanifeststores(repo) + lpackpath, ldstores, lhstores = localdata + spackpath, sdstores, shstores = shareddata + + # Repack the shared manifest store + _incrementalrepack(repo, + sdstores, + shstores, + spackpath, + constants.TREEPACK_CATEGORY, + options=options) + + # Repack the local manifest store + _incrementalrepack(repo, + ldstores, + lhstores, + lpackpath, + constants.TREEPACK_CATEGORY, + allowincompletedata=True, + options=options) + +def _getmanifeststores(repo): + shareddatastores = repo.manifestlog.shareddatastores + localdatastores = repo.manifestlog.localdatastores + sharedhistorystores = repo.manifestlog.sharedhistorystores + localhistorystores = repo.manifestlog.localhistorystores + + sharedpackpath = shallowutil.getcachepackpath(repo, + constants.TREEPACK_CATEGORY) + localpackpath = shallowutil.getlocalpackpath(repo.svfs.vfs.base, + constants.TREEPACK_CATEGORY) + + return ((localpackpath, localdatastores, localhistorystores), + (sharedpackpath, shareddatastores, sharedhistorystores)) + +def _topacks(packpath, files, constructor): + paths = list(os.path.join(packpath, p) for p in files) + packs = list(constructor(p) for p in paths) + return packs + +def _deletebigpacks(repo, folder, files): + """Deletes packfiles that are 
bigger than ``packs.maxpacksize``. + + Returns ``files`` with the removed files omitted.""" + maxsize = repo.ui.configbytes("packs", "maxpacksize") + if maxsize <= 0: + return files + + # This only considers datapacks today, but we could broaden it to include + # historypacks. + VALIDEXTS = [".datapack", ".dataidx"] + + # Either an oversize index or datapack will trigger cleanup of the whole + # pack: + oversized = set([os.path.splitext(path)[0] for path, ftype, stat in files + if (stat.st_size > maxsize and (os.path.splitext(path)[1] + in VALIDEXTS))]) + + for rootfname in oversized: + rootpath = os.path.join(folder, rootfname) + for ext in VALIDEXTS: + path = rootpath + ext + repo.ui.debug('removing oversize packfile %s (%s)\n' % + (path, util.bytecount(os.stat(path).st_size))) + os.unlink(path) + return [row for row in files if os.path.basename(row[0]) not in oversized] + +def _incrementalrepack(repo, datastore, historystore, packpath, category, + allowincompletedata=False, options=None): + shallowutil.mkstickygroupdir(repo.ui, packpath) + + files = osutil.listdir(packpath, stat=True) + files = _deletebigpacks(repo, packpath, files) + datapacks = _topacks(packpath, + _computeincrementaldatapack(repo.ui, files), + datapack.datapack) + datapacks.extend(s for s in datastore + if not isinstance(s, datapack.datapackstore)) + + historypacks = _topacks(packpath, + _computeincrementalhistorypack(repo.ui, files), + historypack.historypack) + historypacks.extend(s for s in historystore + if not isinstance(s, historypack.historypackstore)) + + # ``allhistory{files,packs}`` contains all known history packs, even ones we + # don't plan to repack. They are used during the datapack repack to ensure + # good ordering of nodes.
+ allhistoryfiles = _allpackfileswithsuffix(files, historypack.PACKSUFFIX, + historypack.INDEXSUFFIX) + allhistorypacks = _topacks(packpath, + (f for f, mode, stat in allhistoryfiles), + historypack.historypack) + allhistorypacks.extend(s for s in historystore + if not isinstance(s, historypack.historypackstore)) + _runrepack(repo, + contentstore.unioncontentstore( + *datapacks, + allowincomplete=allowincompletedata), + metadatastore.unionmetadatastore( + *historypacks, + allowincomplete=True), + packpath, category, + fullhistory=metadatastore.unionmetadatastore( + *allhistorypacks, + allowincomplete=True), + options=options) + +def _computeincrementaldatapack(ui, files): + opts = { + 'gencountlimit' : ui.configint( + 'remotefilelog', 'data.gencountlimit'), + 'generations' : ui.configlist( + 'remotefilelog', 'data.generations'), + 'maxrepackpacks' : ui.configint( + 'remotefilelog', 'data.maxrepackpacks'), + 'repackmaxpacksize' : ui.configbytes( + 'remotefilelog', 'data.repackmaxpacksize'), + 'repacksizelimit' : ui.configbytes( + 'remotefilelog', 'data.repacksizelimit'), + } + + packfiles = _allpackfileswithsuffix( + files, datapack.PACKSUFFIX, datapack.INDEXSUFFIX) + return _computeincrementalpack(packfiles, opts) + +def _computeincrementalhistorypack(ui, files): + opts = { + 'gencountlimit' : ui.configint( + 'remotefilelog', 'history.gencountlimit'), + 'generations' : ui.configlist( + 'remotefilelog', 'history.generations', ['100MB']), + 'maxrepackpacks' : ui.configint( + 'remotefilelog', 'history.maxrepackpacks'), + 'repackmaxpacksize' : ui.configbytes( + 'remotefilelog', 'history.repackmaxpacksize', '400MB'), + 'repacksizelimit' : ui.configbytes( + 'remotefilelog', 'history.repacksizelimit'), + } + + packfiles = _allpackfileswithsuffix( + files, historypack.PACKSUFFIX, historypack.INDEXSUFFIX) + return _computeincrementalpack(packfiles, opts) + +def _allpackfileswithsuffix(files, packsuffix, indexsuffix): + result = [] + fileset = set(fn for fn, mode, stat in 
files) + for filename, mode, stat in files: + if not filename.endswith(packsuffix): + continue + + prefix = filename[:-len(packsuffix)] + + # Don't process a pack if it doesn't have an index. + if (prefix + indexsuffix) not in fileset: + continue + result.append((prefix, mode, stat)) + + return result + +def _computeincrementalpack(files, opts): + """Given a set of pack files along with the configuration options, this + function computes the list of files that should be packed as part of an + incremental repack. + + It tries to strike a balance between keeping incremental repacks cheap (i.e. + packing small things when possible, and rolling the packs up to the big ones + over time). + """ + + limits = list(sorted((util.sizetoint(s) for s in opts['generations']), + reverse=True)) + limits.append(0) + + # Group the packs by generation (i.e. by size) + generations = [] + for i in pycompat.xrange(len(limits)): + generations.append([]) + + sizes = {} + for prefix, mode, stat in files: + size = stat.st_size + if size > opts['repackmaxpacksize']: + continue + + sizes[prefix] = size + for i, limit in enumerate(limits): + if size > limit: + generations[i].append(prefix) + break + + # Steps for picking what packs to repack: + # 1. Pick the largest generation with > gencountlimit pack files. + # 2. Take the smallest three packs. + # 3. 
While total-size-of-packs < repacksizelimit: add another pack + + # Find the largest generation with more than gencountlimit packs + genpacks = [] + for i, limit in enumerate(limits): + if len(generations[i]) > opts['gencountlimit']: + # Sort to be smallest last, for easy popping later + genpacks.extend(sorted(generations[i], reverse=True, + key=lambda x: sizes[x])) + break + + # Take as many packs from the generation as we can + chosenpacks = genpacks[-3:] + genpacks = genpacks[:-3] + repacksize = sum(sizes[n] for n in chosenpacks) + while (repacksize < opts['repacksizelimit'] and genpacks and + len(chosenpacks) < opts['maxrepackpacks']): + chosenpacks.append(genpacks.pop()) + repacksize += sizes[chosenpacks[-1]] + + return chosenpacks + +def _runrepack(repo, data, history, packpath, category, fullhistory=None, + options=None): + shallowutil.mkstickygroupdir(repo.ui, packpath) + + def isold(repo, filename, node): + """Check if the file node is older than a limit. + Unless a limit is specified in the config the default limit is taken. 
+ """ + filectx = repo.filectx(filename, fileid=node) + filetime = repo[filectx.linkrev()].date() + + ttl = repo.ui.configint('remotefilelog', 'nodettl') + + limit = time.time() - ttl + return filetime[0] < limit + + garbagecollect = repo.ui.configbool('remotefilelog', 'gcrepack') + if not fullhistory: + fullhistory = history + packer = repacker(repo, data, history, fullhistory, category, + gc=garbagecollect, isold=isold, options=options) + + # internal config: remotefilelog.datapackversion + dv = repo.ui.configint('remotefilelog', 'datapackversion', 0) + + with datapack.mutabledatapack(repo.ui, packpath, version=dv) as dpack: + with historypack.mutablehistorypack(repo.ui, packpath) as hpack: + try: + packer.run(dpack, hpack) + except error.LockHeld: + raise RepackAlreadyRunning(_("skipping repack - another repack " + "is already running")) + +def keepset(repo, keyfn, lastkeepkeys=None): + """Computes a keepset which is not garbage collected. + 'keyfn' is a function that maps filename, node to a unique key. + 'lastkeepkeys' is an optional argument and if provided the keepset + function updates lastkeepkeys with more keys and returns the result. + """ + if not lastkeepkeys: + keepkeys = set() + else: + keepkeys = lastkeepkeys + + # We want to keep: + # 1. Working copy parent + # 2. Draft commits + # 3. Parents of draft commits + # 4. 
Pullprefetch and bgprefetchrevs revsets if specified + revs = ['.', 'draft()', 'parents(draft())'] + prefetchrevs = repo.ui.config('remotefilelog', 'pullprefetch', None) + if prefetchrevs: + revs.append('(%s)' % prefetchrevs) + prefetchrevs = repo.ui.config('remotefilelog', 'bgprefetchrevs', None) + if prefetchrevs: + revs.append('(%s)' % prefetchrevs) + revs = '+'.join(revs) + + revs = ['sort((%s), "topo")' % revs] + keep = scmutil.revrange(repo, revs) + + processed = set() + lastmanifest = None + + # process the commits in toposorted order starting from the oldest + for r in reversed(keep._list): + if repo[r].p1().rev() in processed: + # if the direct parent has already been processed + # then we only need to process the delta + m = repo[r].manifestctx().readdelta() + else: + # otherwise take the manifest and diff it + # with the previous manifest if one exists + if lastmanifest: + m = repo[r].manifest().diff(lastmanifest) + else: + m = repo[r].manifest() + lastmanifest = repo[r].manifest() + processed.add(r) + + # populate keepkeys with keys from the current manifest + if type(m) is dict: + # m is a result of diff of two manifests and is a dictionary that + # maps filename to ((newnode, newflag), (oldnode, oldflag)) tuple + for filename, diff in m.iteritems(): + if diff[0][0] is not None: + keepkeys.add(keyfn(filename, diff[0][0])) + else: + # m is a manifest object + for filename, filenode in m.iteritems(): + keepkeys.add(keyfn(filename, filenode)) + + return keepkeys + +class repacker(object): + """Class for orchestrating the repack of data and history information into a + new format. 
+ """ + def __init__(self, repo, data, history, fullhistory, category, gc=False, + isold=None, options=None): + self.repo = repo + self.data = data + self.history = history + self.fullhistory = fullhistory + self.unit = constants.getunits(category) + self.garbagecollect = gc + self.options = options + if self.garbagecollect: + if not isold: + raise ValueError("Function 'isold' is not properly specified") + # use (filename, node) tuple as a keepset key + self.keepkeys = keepset(repo, lambda f, n : (f, n)) + self.isold = isold + + def run(self, targetdata, targethistory): + ledger = repackledger() + + with extutil.flock(repacklockvfs(self.repo).join("repacklock"), + _('repacking %s') % self.repo.origroot, timeout=0): + self.repo.hook('prerepack') + + # Populate ledger from source + self.data.markledger(ledger, options=self.options) + self.history.markledger(ledger, options=self.options) + + # Run repack + self.repackdata(ledger, targetdata) + self.repackhistory(ledger, targethistory) + + # Call cleanup on each source + for source in ledger.sources: + source.cleanup(ledger) + + def _chainorphans(self, ui, filename, nodes, orphans, deltabases): + """Reorders ``orphans`` into a single chain inside ``nodes`` and + ``deltabases``. + + We often have orphan entries (nodes without a base that aren't + referenced by other nodes -- i.e., part of a chain) due to gaps in + history. Rather than store them as individual fulltexts, we prefer to + insert them as one chain sorted by size. + """ + if not orphans: + return nodes + + def getsize(node, default=0): + meta = self.data.getmeta(filename, node) + if constants.METAKEYSIZE in meta: + return meta[constants.METAKEYSIZE] + else: + return default + + # Sort orphans by size; biggest first is preferred, since it's more + # likely to be the newest version assuming files grow over time. + # (Sort by node first to ensure the sort is stable.)
+ orphans = sorted(orphans) + orphans = list(sorted(orphans, key=getsize, reverse=True)) + if ui.debugflag: + ui.debug("%s: orphan chain: %s\n" % (filename, + ", ".join([short(s) for s in orphans]))) + + # Create one contiguous chain and reassign deltabases. + for i, node in enumerate(orphans): + if i == 0: + deltabases[node] = (nullid, 0) + else: + parent = orphans[i - 1] + deltabases[node] = (parent, deltabases[parent][1] + 1) + nodes = filter(lambda node: node not in orphans, nodes) + nodes += orphans + return nodes + + def repackdata(self, ledger, target): + ui = self.repo.ui + maxchainlen = ui.configint('packs', 'maxchainlen', 1000) + + byfile = {} + for entry in ledger.entries.itervalues(): + if entry.datasource: + byfile.setdefault(entry.filename, {})[entry.node] = entry + + count = 0 + for filename, entries in sorted(byfile.iteritems()): + ui.progress(_("repacking data"), count, unit=self.unit, + total=len(byfile)) + + ancestors = {} + nodes = list(node for node in entries.iterkeys()) + nohistory = [] + for i, node in enumerate(nodes): + if node in ancestors: + continue + ui.progress(_("building history"), i, unit='nodes', + total=len(nodes)) + try: + ancestors.update(self.fullhistory.getancestors(filename, + node, known=ancestors)) + except KeyError: + # Since we're packing data entries, we may not have the + # corresponding history entries for them. It's not a big + # deal, but the entries won't be delta'd perfectly. + nohistory.append(node) + ui.progress(_("building history"), None) + + # Order the nodes children first, so we can produce reverse deltas + orderednodes = list(reversed(self._toposort(ancestors))) + if len(nohistory) > 0: + ui.debug('repackdata: %d nodes without history\n' % + len(nohistory)) + orderednodes.extend(sorted(nohistory)) + + # Filter orderednodes to just the nodes we want to serialize (it + # currently also has the edge nodes' ancestors). 
+ orderednodes = filter(lambda node: node in nodes, orderednodes) + + # Garbage collect old nodes: + if self.garbagecollect: + neworderednodes = [] + for node in orderednodes: + # If the node is old and is not in the keepset, we skip it, + # and mark as garbage collected + if ((filename, node) not in self.keepkeys and + self.isold(self.repo, filename, node)): + entries[node].gced = True + continue + neworderednodes.append(node) + orderednodes = neworderednodes + + # Compute delta bases for nodes: + deltabases = {} + nobase = set() + referenced = set() + nodes = set(nodes) + for i, node in enumerate(orderednodes): + ui.progress(_("processing nodes"), i, unit='nodes', + total=len(orderednodes)) + # Find delta base + # TODO: allow delta'ing against most recent descendant instead + # of immediate child + deltatuple = deltabases.get(node, None) + if deltatuple is None: + deltabase, chainlen = nullid, 0 + deltabases[node] = (nullid, 0) + nobase.add(node) + else: + deltabase, chainlen = deltatuple + referenced.add(deltabase) + + # Use available ancestor information to inform our delta choices + ancestorinfo = ancestors.get(node) + if ancestorinfo: + p1, p2, linknode, copyfrom = ancestorinfo + + # The presence of copyfrom means we're at a point where the + # file was copied from elsewhere. So don't attempt to do any + # deltas with the other file. + if copyfrom: + p1 = nullid + + if chainlen < maxchainlen: + # Record this child as the delta base for its parents. + # This may be non optimal, since the parents may have + # many children, and this will only choose the last one. 
+ # TODO: record all children and try all deltas to find + # best + if p1 != nullid: + deltabases[p1] = (node, chainlen + 1) + if p2 != nullid: + deltabases[p2] = (node, chainlen + 1) + + # experimental config: repack.chainorphansbysize + if ui.configbool('repack', 'chainorphansbysize'): + orphans = nobase - referenced + orderednodes = self._chainorphans(ui, filename, orderednodes, + orphans, deltabases) + + # Compute deltas and write to the pack + for i, node in enumerate(orderednodes): + deltabase, chainlen = deltabases[node] + # Compute delta + # TODO: Optimize the deltachain fetching. Since we're + # iterating over the different version of the file, we may + # be fetching the same deltachain over and over again. + meta = None + if deltabase != nullid: + deltaentry = self.data.getdelta(filename, node) + delta, deltabasename, origdeltabase, meta = deltaentry + size = meta.get(constants.METAKEYSIZE) + if (deltabasename != filename or origdeltabase != deltabase + or size is None): + deltabasetext = self.data.get(filename, deltabase) + original = self.data.get(filename, node) + size = len(original) + delta = mdiff.textdiff(deltabasetext, original) + else: + delta = self.data.get(filename, node) + size = len(delta) + meta = self.data.getmeta(filename, node) + + # TODO: don't use the delta if it's larger than the fulltext + if constants.METAKEYSIZE not in meta: + meta[constants.METAKEYSIZE] = size + target.add(filename, node, deltabase, delta, meta) + + entries[node].datarepacked = True + + ui.progress(_("processing nodes"), None) + count += 1 + + ui.progress(_("repacking data"), None) + target.close(ledger=ledger) + + def repackhistory(self, ledger, target): + ui = self.repo.ui + + byfile = {} + for entry in ledger.entries.itervalues(): + if entry.historysource: + byfile.setdefault(entry.filename, {})[entry.node] = entry + + count = 0 + for filename, entries in sorted(byfile.iteritems()): + ancestors = {} + nodes = list(node for node in entries.iterkeys()) + + for 
node in nodes: + if node in ancestors: + continue + ancestors.update(self.history.getancestors(filename, node, + known=ancestors)) + + # Order the nodes children first + orderednodes = reversed(self._toposort(ancestors)) + + # Write to the pack + dontprocess = set() + for node in orderednodes: + p1, p2, linknode, copyfrom = ancestors[node] + + # If the node is marked dontprocess, but it's also in the + # explicit entries set, that means the node exists both in this + # file and in another file that was copied to this file. + # Usually this happens if the file was copied to another file, + # then the copy was deleted, then reintroduced without copy + # metadata. The original add and the new add have the same hash + # since the content is identical and the parents are null. + if node in dontprocess and node not in entries: + # If copyfrom == filename, it means the copy history + # went to some other file, then came back to this one, so we + # should continue processing it. + if p1 != nullid and copyfrom != filename: + dontprocess.add(p1) + if p2 != nullid: + dontprocess.add(p2) + continue + + if copyfrom: + dontprocess.add(p1) + + target.add(filename, node, p1, p2, linknode, copyfrom) + + if node in entries: + entries[node].historyrepacked = True + + count += 1 + ui.progress(_("repacking history"), count, unit=self.unit, + total=len(byfile)) + + ui.progress(_("repacking history"), None) + target.close(ledger=ledger) + + def _toposort(self, ancestors): + def parentfunc(node): + p1, p2, linknode, copyfrom = ancestors[node] + parents = [] + if p1 != nullid: + parents.append(p1) + if p2 != nullid: + parents.append(p2) + return parents + + sortednodes = shallowutil.sortnodes(ancestors.keys(), parentfunc) + return sortednodes + +class repackledger(object): + """Storage for all the bookkeeping that happens during a repack.
It contains + the list of revisions being repacked, what happened to each revision, and + which source store contained which revision originally (for later cleanup). + """ + def __init__(self): + self.entries = {} + self.sources = {} + self.created = set() + + def markdataentry(self, source, filename, node): + """Mark the given filename+node revision as having a data rev in the + given source. + """ + entry = self._getorcreateentry(filename, node) + entry.datasource = True + entries = self.sources.get(source) + if not entries: + entries = set() + self.sources[source] = entries + entries.add(entry) + + def markhistoryentry(self, source, filename, node): + """Mark the given filename+node revision as having a history rev in the + given source. + """ + entry = self._getorcreateentry(filename, node) + entry.historysource = True + entries = self.sources.get(source) + if not entries: + entries = set() + self.sources[source] = entries + entries.add(entry) + + def _getorcreateentry(self, filename, node): + key = (filename, node) + value = self.entries.get(key) + if not value: + value = repackentry(filename, node) + self.entries[key] = value + + return value + + def addcreated(self, value): + self.created.add(value) + +class repackentry(object): + """Simple class representing a single revision entry in the repackledger. 
+ """ + __slots__ = ['filename', 'node', 'datasource', 'historysource', + 'datarepacked', 'historyrepacked', 'gced'] + def __init__(self, filename, node): + self.filename = filename + self.node = node + # If the revision has a data entry in the source + self.datasource = False + # If the revision has a history entry in the source + self.historysource = False + # If the revision's data entry was repacked into the repack target + self.datarepacked = False + # If the revision's history entry was repacked into the repack target + self.historyrepacked = False + # If garbage collected + self.gced = False + +def repacklockvfs(repo): + if util.safehasattr(repo, 'name'): + # Lock in the shared cache so repacks across multiple copies of the same + # repo are coordinated. + sharedcachepath = shallowutil.getcachepackpath( + repo, + constants.FILEPACK_CATEGORY) + return vfs.vfs(sharedcachepath) + else: + return repo.svfs diff --git a/hgext/remotefilelog/shallowbundle.py b/hgext/remotefilelog/shallowbundle.py new file mode 100644 --- /dev/null +++ b/hgext/remotefilelog/shallowbundle.py @@ -0,0 +1,295 @@ +# shallowbundle.py - bundle10 implementation for use with shallow repositories +# +# Copyright 2013 Facebook, Inc. +# +# This software may be used and distributed according to the terms of the +# GNU General Public License version 2 or any later version. +from __future__ import absolute_import + +from mercurial.i18n import _ +from mercurial.node import bin, hex, nullid +from mercurial import ( + bundlerepo, + changegroup, + error, + match, + mdiff, + pycompat, +) +from . 
import ( + remotefilelog, + shallowutil, +) + +NoFiles = 0 +LocalFiles = 1 +AllFiles = 2 + +requirement = "remotefilelog" + +def shallowgroup(cls, self, nodelist, rlog, lookup, units=None, reorder=None): + if not isinstance(rlog, remotefilelog.remotefilelog): + for c in super(cls, self).group(nodelist, rlog, lookup, + units=units): + yield c + return + + if len(nodelist) == 0: + yield self.close() + return + + nodelist = shallowutil.sortnodes(nodelist, rlog.parents) + + # add the parent of the first rev + p = rlog.parents(nodelist[0])[0] + nodelist.insert(0, p) + + # build deltas + for i in pycompat.xrange(len(nodelist) - 1): + prev, curr = nodelist[i], nodelist[i + 1] + linknode = lookup(curr) + for c in self.nodechunk(rlog, curr, prev, linknode): + yield c + + yield self.close() + +class shallowcg1packer(changegroup.cgpacker): + def generate(self, commonrevs, clnodes, fastpathlinkrev, source): + if "remotefilelog" in self._repo.requirements: + fastpathlinkrev = False + + return super(shallowcg1packer, self).generate(commonrevs, clnodes, + fastpathlinkrev, source) + + def group(self, nodelist, rlog, lookup, units=None, reorder=None): + return shallowgroup(shallowcg1packer, self, nodelist, rlog, lookup, + units=units) + + def generatefiles(self, changedfiles, *args): + try: + linknodes, commonrevs, source = args + except ValueError: + commonrevs, source, mfdicts, fastpathlinkrev, fnodes, clrevs = args + if requirement in self._repo.requirements: + repo = self._repo + if isinstance(repo, bundlerepo.bundlerepository): + # If the bundle contains filelogs, we can't pull from it, since + # bundlerepo is heavily tied to revlogs. Instead require that + # the user use unbundle instead. + # Force load the filelog data. 
+ bundlerepo.bundlerepository.file(repo, 'foo') + if repo._cgfilespos: + raise error.Abort("cannot pull from full bundles", + hint="use `hg unbundle` instead") + return [] + filestosend = self.shouldaddfilegroups(source) + if filestosend == NoFiles: + changedfiles = list([f for f in changedfiles + if not repo.shallowmatch(f)]) + + return super(shallowcg1packer, self).generatefiles( + changedfiles, *args) + + def shouldaddfilegroups(self, source): + repo = self._repo + if not requirement in repo.requirements: + return AllFiles + + if source == "push" or source == "bundle": + return AllFiles + + caps = self._bundlecaps or [] + if source == "serve" or source == "pull": + if 'remotefilelog' in caps: + return LocalFiles + else: + # Serving to a full repo requires us to serve everything + repo.ui.warn(_("pulling from a shallow repo\n")) + return AllFiles + + return NoFiles + + def prune(self, rlog, missing, commonrevs): + if not isinstance(rlog, remotefilelog.remotefilelog): + return super(shallowcg1packer, self).prune(rlog, missing, + commonrevs) + + repo = self._repo + results = [] + for fnode in missing: + fctx = repo.filectx(rlog.filename, fileid=fnode) + if fctx.linkrev() not in commonrevs: + results.append(fnode) + return results + + def nodechunk(self, revlog, node, prevnode, linknode): + prefix = '' + if prevnode == nullid: + delta = revlog.revision(node, raw=True) + prefix = mdiff.trivialdiffheader(len(delta)) + else: + # Actually uses remotefilelog.revdiff which works on nodes, not revs + delta = revlog.revdiff(prevnode, node) + p1, p2 = revlog.parents(node) + flags = revlog.flags(node) + meta = self.builddeltaheader(node, p1, p2, prevnode, linknode, flags) + meta += prefix + l = len(meta) + len(delta) + yield changegroup.chunkheader(l) + yield meta + yield delta + +def makechangegroup(orig, repo, outgoing, version, source, *args, **kwargs): + if not requirement in repo.requirements: + return orig(repo, outgoing, version, source, *args, **kwargs) + + original = 
repo.shallowmatch + try: + # if serving, only send files the client has patterns for + if source == 'serve': + bundlecaps = kwargs.get('bundlecaps') + includepattern = None + excludepattern = None + for cap in (bundlecaps or []): + if cap.startswith("includepattern="): + raw = cap[len("includepattern="):] + if raw: + includepattern = raw.split('\0') + elif cap.startswith("excludepattern="): + raw = cap[len("excludepattern="):] + if raw: + excludepattern = raw.split('\0') + if includepattern or excludepattern: + repo.shallowmatch = match.match(repo.root, '', None, + includepattern, excludepattern) + else: + repo.shallowmatch = match.always(repo.root, '') + return orig(repo, outgoing, version, source, *args, **kwargs) + finally: + repo.shallowmatch = original + +def addchangegroupfiles(orig, repo, source, revmap, trp, expectedfiles, *args): + if not requirement in repo.requirements: + return orig(repo, source, revmap, trp, expectedfiles, *args) + + files = 0 + newfiles = 0 + visited = set() + revisiondatas = {} + queue = [] + + # Normal Mercurial processes each file one at a time, adding all + # the new revisions for that file at once. In remotefilelog a file + # revision may depend on a different file's revision (in the case + # of a rename/copy), so we must lay all revisions down across all + # files in topological order. 
+ + # read all the file chunks but don't add them + while True: + chunkdata = source.filelogheader() + if not chunkdata: + break + files += 1 + f = chunkdata["filename"] + repo.ui.debug("adding %s revisions\n" % f) + repo.ui.progress(_('files'), files, total=expectedfiles) + + if not repo.shallowmatch(f): + fl = repo.file(f) + deltas = source.deltaiter() + fl.addgroup(deltas, revmap, trp) + continue + + chain = None + while True: + # returns: (node, p1, p2, cs, deltabase, delta, flags) or None + revisiondata = source.deltachunk(chain) + if not revisiondata: + break + + chain = revisiondata[0] + + revisiondatas[(f, chain)] = revisiondata + queue.append((f, chain)) + + if f not in visited: + newfiles += 1 + visited.add(f) + + if chain is None: + raise error.Abort(_("received file revlog group is empty")) + + processed = set() + def available(f, node, depf, depnode): + if depnode != nullid and (depf, depnode) not in processed: + if not (depf, depnode) in revisiondatas: + # It's not in the changegroup, assume it's already + # in the repo + return True + # re-add self to queue + queue.insert(0, (f, node)) + # add dependency in front + queue.insert(0, (depf, depnode)) + return False + return True + + skipcount = 0 + + # Prefetch the non-bundled revisions that we will need + prefetchfiles = [] + for f, node in queue: + revisiondata = revisiondatas[(f, node)] + # revisiondata: (node, p1, p2, cs, deltabase, delta, flags) + dependents = [revisiondata[1], revisiondata[2], revisiondata[4]] + + for dependent in dependents: + if dependent == nullid or (f, dependent) in revisiondatas: + continue + prefetchfiles.append((f, hex(dependent))) + + repo.fileservice.prefetch(prefetchfiles) + + # Apply the revisions in topological order such that a revision + # is only written once its deltabase and parents have been written. 
+ while queue: + f, node = queue.pop(0) + if (f, node) in processed: + continue + + skipcount += 1 + if skipcount > len(queue) + 1: + raise error.Abort(_("circular node dependency")) + + fl = repo.file(f) + + revisiondata = revisiondatas[(f, node)] + # revisiondata: (node, p1, p2, cs, deltabase, delta, flags) + node, p1, p2, linknode, deltabase, delta, flags = revisiondata + + if not available(f, node, f, deltabase): + continue + + base = fl.revision(deltabase, raw=True) + text = mdiff.patch(base, delta) + if isinstance(text, buffer): + text = str(text) + + meta, text = shallowutil.parsemeta(text) + if 'copy' in meta: + copyfrom = meta['copy'] + copynode = bin(meta['copyrev']) + if not available(f, node, copyfrom, copynode): + continue + + for p in [p1, p2]: + if p != nullid: + if not available(f, node, f, p): + continue + + fl.add(text, meta, trp, linknode, p1, p2) + processed.add((f, node)) + skipcount = 0 + + repo.ui.progress(_('files'), None) + + return len(revisiondatas), newfiles diff --git a/hgext/remotefilelog/shallowrepo.py b/hgext/remotefilelog/shallowrepo.py new file mode 100644 --- /dev/null +++ b/hgext/remotefilelog/shallowrepo.py @@ -0,0 +1,310 @@ +# shallowrepo.py - shallow repository that uses remote filelogs +# +# Copyright 2013 Facebook, Inc. +# +# This software may be used and distributed according to the terms of the +# GNU General Public License version 2 or any later version. +from __future__ import absolute_import + +import os + +from mercurial.i18n import _ +from mercurial.node import hex, nullid, nullrev +from mercurial import ( + encoding, + error, + localrepo, + match, + scmutil, + sparse, + util, +) +from mercurial.utils import procutil +from . 
import ( + connectionpool, + constants, + contentstore, + datapack, + extutil, + fileserverclient, + historypack, + metadatastore, + remotefilectx, + remotefilelog, + shallowutil, +) + +if util.safehasattr(util, '_hgexecutable'): + # Before 5be286db + _hgexecutable = util.hgexecutable +else: + from mercurial.utils import procutil + _hgexecutable = procutil.hgexecutable + +requirement = "remotefilelog" +_prefetching = _('prefetching') + +# These make*stores functions are global so that other extensions can replace +# them. +def makelocalstores(repo): + """In-repo stores, like .hg/store/data; can not be discarded.""" + localpath = os.path.join(repo.svfs.vfs.base, 'data') + if not os.path.exists(localpath): + os.makedirs(localpath) + + # Instantiate local data stores + localcontent = contentstore.remotefilelogcontentstore( + repo, localpath, repo.name, shared=False) + localmetadata = metadatastore.remotefilelogmetadatastore( + repo, localpath, repo.name, shared=False) + return localcontent, localmetadata + +def makecachestores(repo): + """Typically machine-wide, cache of remote data; can be discarded.""" + # Instantiate shared cache stores + cachepath = shallowutil.getcachepath(repo.ui) + cachecontent = contentstore.remotefilelogcontentstore( + repo, cachepath, repo.name, shared=True) + cachemetadata = metadatastore.remotefilelogmetadatastore( + repo, cachepath, repo.name, shared=True) + + repo.sharedstore = cachecontent + repo.shareddatastores.append(cachecontent) + repo.sharedhistorystores.append(cachemetadata) + + return cachecontent, cachemetadata + +def makeremotestores(repo, cachecontent, cachemetadata): + """These stores fetch data from a remote server.""" + # Instantiate remote stores + repo.fileservice = fileserverclient.fileserverclient(repo) + remotecontent = contentstore.remotecontentstore( + repo.ui, repo.fileservice, cachecontent) + remotemetadata = metadatastore.remotemetadatastore( + repo.ui, repo.fileservice, cachemetadata) + return remotecontent, 
remotemetadata + +def makepackstores(repo): + """Packs are more efficient (to read from) cache stores.""" + # Instantiate pack stores + packpath = shallowutil.getcachepackpath(repo, + constants.FILEPACK_CATEGORY) + packcontentstore = datapack.datapackstore(repo.ui, packpath) + packmetadatastore = historypack.historypackstore(repo.ui, packpath) + + repo.shareddatastores.append(packcontentstore) + repo.sharedhistorystores.append(packmetadatastore) + shallowutil.reportpackmetrics(repo.ui, 'filestore', packcontentstore, + packmetadatastore) + return packcontentstore, packmetadatastore + +def makeunionstores(repo): + """Union stores iterate the other stores and return the first result.""" + repo.shareddatastores = [] + repo.sharedhistorystores = [] + + packcontentstore, packmetadatastore = makepackstores(repo) + cachecontent, cachemetadata = makecachestores(repo) + localcontent, localmetadata = makelocalstores(repo) + remotecontent, remotemetadata = makeremotestores(repo, cachecontent, + cachemetadata) + + # Instantiate union stores + repo.contentstore = contentstore.unioncontentstore( + packcontentstore, cachecontent, + localcontent, remotecontent, writestore=localcontent) + repo.metadatastore = metadatastore.unionmetadatastore( + packmetadatastore, cachemetadata, localmetadata, remotemetadata, + writestore=localmetadata) + + fileservicedatawrite = cachecontent + fileservicehistorywrite = cachemetadata + if repo.ui.configbool('remotefilelog', 'fetchpacks'): + fileservicedatawrite = packcontentstore + fileservicehistorywrite = packmetadatastore + repo.fileservice.setstore(repo.contentstore, repo.metadatastore, + fileservicedatawrite, fileservicehistorywrite) + shallowutil.reportpackmetrics(repo.ui, 'filestore', + packcontentstore, packmetadatastore) + +def wraprepo(repo): + class shallowrepository(repo.__class__): + @util.propertycache + def name(self): + return self.ui.config('remotefilelog', 'reponame') + + @util.propertycache + def fallbackpath(self): + path = 
repo.ui.config("remotefilelog", "fallbackpath", + repo.ui.config('paths', 'default')) + if not path: + raise error.Abort("no remotefilelog server " + "configured - is your .hg/hgrc trusted?") + + return path + + def maybesparsematch(self, *revs, **kwargs): + ''' + A wrapper that allows the remotefilelog to invoke sparsematch() if + this is a sparse repository, or returns None if this is not a + sparse repository. + ''' + if revs: + return sparse.matcher(repo, revs=revs) + return sparse.matcher(repo) + + def file(self, f): + if f[0] == '/': + f = f[1:] + + if self.shallowmatch(f): + return remotefilelog.remotefilelog(self.svfs, f, self) + else: + return super(shallowrepository, self).file(f) + + def filectx(self, path, *args, **kwargs): + if self.shallowmatch(path): + return remotefilectx.remotefilectx(self, path, *args, **kwargs) + else: + return super(shallowrepository, self).filectx(path, *args, + **kwargs) + + @localrepo.unfilteredmethod + def commitctx(self, ctx, error=False): + """Add a new revision to current repository. + Revision information is passed via the context argument. 
+ """ + + # some contexts already have manifest nodes, they don't need any + # prefetching (for example, if we're just editing a commit message + # we can reuse the manifest) + if not ctx.manifestnode(): + # prefetch files that will likely be compared + m1 = ctx.p1().manifest() + files = [] + for f in ctx.modified() + ctx.added(): + fparent1 = m1.get(f, nullid) + if fparent1 != nullid: + files.append((f, hex(fparent1))) + self.fileservice.prefetch(files) + return super(shallowrepository, self).commitctx(ctx, + error=error) + + def backgroundprefetch(self, revs, base=None, repack=False, pats=None, + opts=None): + """Runs prefetch in background with optional repack + """ + cmd = [_hgexecutable(), '-R', repo.origroot, 'prefetch'] + if repack: + cmd.append('--repack') + if revs: + cmd += ['-r', revs] + cmd = ' '.join(map(procutil.shellquote, cmd)) + + extutil.runshellcommand(cmd, encoding.environ) + + def prefetch(self, revs, base=None, pats=None, opts=None): + """Prefetches all the necessary file revisions for the given revs + Optionally runs repack in background + """ + with repo._lock(repo.svfs, 'prefetchlock', True, None, None, + _('prefetching in %s') % repo.origroot): + self._prefetch(revs, base, pats, opts) + + def _prefetch(self, revs, base=None, pats=None, opts=None): + fallbackpath = self.fallbackpath + if fallbackpath: + # If we know a rev is on the server, we should fetch the server + # version of those files, since our local file versions might + # become obsolete if the local commits are stripped. 
+ localrevs = repo.revs('outgoing(%s)', fallbackpath) + if base is not None and base != nullrev: + serverbase = list(repo.revs('first(reverse(::%s) - %ld)', + base, localrevs)) + if serverbase: + base = serverbase[0] + else: + localrevs = repo + + mfl = repo.manifestlog + mfrevlog = mfl.getstorage('') + if base is not None: + mfdict = mfl[repo[base].manifestnode()].read() + skip = set(mfdict.iteritems()) + else: + skip = set() + + # Copy the skip set to start large and avoid constant resizing, + # and since it's likely to be very similar to the prefetch set. + files = skip.copy() + serverfiles = skip.copy() + visited = set() + visited.add(nullrev) + revnum = 0 + revcount = len(revs) + self.ui.progress(_prefetching, revnum, total=revcount) + for rev in sorted(revs): + ctx = repo[rev] + if pats: + m = scmutil.match(ctx, pats, opts) + sparsematch = repo.maybesparsematch(rev) + + mfnode = ctx.manifestnode() + mfrev = mfrevlog.rev(mfnode) + + # Decompressing manifests is expensive. + # When possible, only read the deltas. 
+ p1, p2 = mfrevlog.parentrevs(mfrev) + if p1 in visited and p2 in visited: + mfdict = mfl[mfnode].readfast() + else: + mfdict = mfl[mfnode].read() + + diff = mfdict.iteritems() + if pats: + diff = (pf for pf in diff if m(pf[0])) + if sparsematch: + diff = (pf for pf in diff if sparsematch(pf[0])) + if rev not in localrevs: + serverfiles.update(diff) + else: + files.update(diff) + + visited.add(mfrev) + revnum += 1 + self.ui.progress(_prefetching, revnum, total=revcount) + + files.difference_update(skip) + serverfiles.difference_update(skip) + self.ui.progress(_prefetching, None) + + # Fetch files known to be on the server + if serverfiles: + results = [(path, hex(fnode)) for (path, fnode) in serverfiles] + repo.fileservice.prefetch(results, force=True) + + # Fetch files that may or may not be on the server + if files: + results = [(path, hex(fnode)) for (path, fnode) in files] + repo.fileservice.prefetch(results) + + def close(self): + super(shallowrepository, self).close() + self.connectionpool.close() + + repo.__class__ = shallowrepository + + repo.shallowmatch = match.always(repo.root, '') + + makeunionstores(repo) + + repo.includepattern = repo.ui.configlist("remotefilelog", "includepattern", + None) + repo.excludepattern = repo.ui.configlist("remotefilelog", "excludepattern", + None) + if not util.safehasattr(repo, 'connectionpool'): + repo.connectionpool = connectionpool.connectionpool(repo) + + if repo.includepattern or repo.excludepattern: + repo.shallowmatch = match.match(repo.root, '', None, + repo.includepattern, repo.excludepattern) diff --git a/hgext/remotefilelog/shallowstore.py b/hgext/remotefilelog/shallowstore.py new file mode 100644 --- /dev/null +++ b/hgext/remotefilelog/shallowstore.py @@ -0,0 +1,17 @@ +# shallowstore.py - shallow store for interacting with shallow repos +# +# Copyright 2013 Facebook, Inc. +# +# This software may be used and distributed according to the terms of the +# GNU General Public License version 2 or any later version. 
+from __future__ import absolute_import + +def wrapstore(store): + class shallowstore(store.__class__): + def __contains__(self, path): + # Assume it exists + return True + + store.__class__ = shallowstore + + return store diff --git a/hgext/remotefilelog/shallowutil.py b/hgext/remotefilelog/shallowutil.py new file mode 100644 --- /dev/null +++ b/hgext/remotefilelog/shallowutil.py @@ -0,0 +1,487 @@ +# shallowutil.py -- remotefilelog utilities +# +# Copyright 2014 Facebook, Inc. +# +# This software may be used and distributed according to the terms of the +# GNU General Public License version 2 or any later version. +from __future__ import absolute_import + +import collections +import errno +import hashlib +import os +import stat +import struct +import tempfile + +from mercurial.i18n import _ +from mercurial import ( + error, + pycompat, + revlog, + util, +) +from mercurial.utils import ( + storageutil, + stringutil, +) +from . import constants + +if not pycompat.iswindows: + import grp + +def getcachekey(reponame, file, id): + pathhash = hashlib.sha1(file).hexdigest() + return os.path.join(reponame, pathhash[:2], pathhash[2:], id) + +def getlocalkey(file, id): + pathhash = hashlib.sha1(file).hexdigest() + return os.path.join(pathhash, id) + +def getcachepath(ui, allowempty=False): + cachepath = ui.config("remotefilelog", "cachepath") + if not cachepath: + if allowempty: + return None + else: + raise error.Abort(_("could not find config option " + "remotefilelog.cachepath")) + return util.expandpath(cachepath) + +def getcachepackpath(repo, category): + cachepath = getcachepath(repo.ui) + if category != constants.FILEPACK_CATEGORY: + return os.path.join(cachepath, repo.name, 'packs', category) + else: + return os.path.join(cachepath, repo.name, 'packs') + +def getlocalpackpath(base, category): + return os.path.join(base, 'packs', category) + +def createrevlogtext(text, copyfrom=None, copyrev=None): + """returns a string that matches the revlog contents in a + 
traditional revlog + """ + meta = {} + if copyfrom or text.startswith('\1\n'): + if copyfrom: + meta['copy'] = copyfrom + meta['copyrev'] = copyrev + text = storageutil.packmeta(meta, text) + + return text + +def parsemeta(text): + """parse mercurial filelog metadata""" + meta, size = storageutil.parsemeta(text) + if text.startswith('\1\n'): + s = text.index('\1\n', 2) + text = text[s + 2:] + return meta or {}, text + +def sumdicts(*dicts): + """Adds all the values of *dicts together into one dictionary. This assumes + the values in *dicts are all summable. + + e.g. [{'a': 4, 'b': 2}, {'b': 3, 'c': 1}] -> {'a': 4, 'b': 5, 'c': 1} + """ + result = collections.defaultdict(lambda: 0) + for dict in dicts: + for k, v in dict.iteritems(): + result[k] += v + return result + +def prefixkeys(dict, prefix): + """Returns ``dict`` with ``prefix`` prepended to all its keys.""" + result = {} + for k, v in dict.iteritems(): + result[prefix + k] = v + return result + +def reportpackmetrics(ui, prefix, *stores): + dicts = [s.getmetrics() for s in stores] + dict = prefixkeys(sumdicts(*dicts), prefix + '_') + ui.log(prefix + "_packsizes", "", **dict) + +def _parsepackmeta(metabuf): + """parse datapack meta, bytes () -> dict + + The dict contains raw content - both keys and values are strings. + Upper-level business may want to convert some of them to other types like + integers, on their own. 
+ + raise ValueError if the data is corrupted + """ + metadict = {} + offset = 0 + buflen = len(metabuf) + while buflen - offset >= 3: + key = metabuf[offset] + offset += 1 + metalen = struct.unpack_from('!H', metabuf, offset)[0] + offset += 2 + if offset + metalen > buflen: + raise ValueError('corrupted metadata: incomplete buffer') + value = metabuf[offset:offset + metalen] + metadict[key] = value + offset += metalen + if offset != buflen: + raise ValueError('corrupted metadata: redundant data') + return metadict + +def _buildpackmeta(metadict): + """reverse of _parsepackmeta, dict -> bytes () + + The dict contains raw content - both keys and values are strings. + Upper-level business may want to serialize values of other types (like + integers) to strings before calling this function. + + raise ProgrammingError when metadata key is illegal, or ValueError if + length limit is exceeded + """ + metabuf = '' + for k, v in sorted((metadict or {}).iteritems()): + if len(k) != 1: + raise error.ProgrammingError('packmeta: illegal key: %s' % k) + if len(v) > 0xfffe: + raise ValueError('metadata value is too long: 0x%x > 0xfffe' + % len(v)) + metabuf += k + metabuf += struct.pack('!H', len(v)) + metabuf += v + # len(metabuf) is guaranteed representable in 4 bytes, because there are + # only 256 keys, and for each value, len(value) <= 0xfffe. + return metabuf + +_metaitemtypes = { + constants.METAKEYFLAG: (int, long), + constants.METAKEYSIZE: (int, long), +} + +def buildpackmeta(metadict): + """like _buildpackmeta, but typechecks metadict and normalizes it. + + This means, METAKEYFLAG and METAKEYSIZE should have integers as values, + and METAKEYFLAG will be dropped if its value is 0. 
+ """ + newmeta = {} + for k, v in (metadict or {}).iteritems(): + expectedtype = _metaitemtypes.get(k, (bytes,)) + if not isinstance(v, expectedtype): + raise error.ProgrammingError('packmeta: wrong type of key %s' % k) + # normalize int to binary buffer + if int in expectedtype: + # optimization: remove flag if it's 0 to save space + if k == constants.METAKEYFLAG and v == 0: + continue + v = int2bin(v) + newmeta[k] = v + return _buildpackmeta(newmeta) + +def parsepackmeta(metabuf): + """like _parsepackmeta, but convert fields to desired types automatically. + + This means, METAKEYFLAG and METAKEYSIZE fields will be converted to + integers. + """ + metadict = _parsepackmeta(metabuf) + for k, v in metadict.iteritems(): + if k in _metaitemtypes and int in _metaitemtypes[k]: + metadict[k] = bin2int(v) + return metadict + +def int2bin(n): + """convert a non-negative integer to raw binary buffer""" + buf = bytearray() + while n > 0: + buf.insert(0, n & 0xff) + n >>= 8 + return bytes(buf) + +def bin2int(buf): + """the reverse of int2bin, convert a binary buffer to an integer""" + x = 0 + for b in bytearray(buf): + x <<= 8 + x |= b + return x + +def parsesizeflags(raw): + """given a remotefilelog blob, return (headersize, rawtextsize, flags) + + see remotefilelogserver.createfileblob for the format. + raise RuntimeError if the content is illformed. 
+ """ + flags = revlog.REVIDX_DEFAULT_FLAGS + size = None + try: + index = raw.index('\0') + header = raw[:index] + if header.startswith('v'): + # v1 and above, header starts with 'v' + if header.startswith('v1\n'): + for s in header.split('\n'): + if s.startswith(constants.METAKEYSIZE): + size = int(s[len(constants.METAKEYSIZE):]) + elif s.startswith(constants.METAKEYFLAG): + flags = int(s[len(constants.METAKEYFLAG):]) + else: + raise RuntimeError('unsupported remotefilelog header: %s' + % header) + else: + # v0, str(int(size)) is the header + size = int(header) + except ValueError: + raise RuntimeError("unexpected remotefilelog header: illegal format") + if size is None: + raise RuntimeError("unexpected remotefilelog header: no size found") + return index + 1, size, flags + +def buildfileblobheader(size, flags, version=None): + """return the header of a remotefilelog blob. + + see remotefilelogserver.createfileblob for the format. + approximately the reverse of parsesizeflags. + + version could be 0 or 1, or None (auto decide). 
+ """ + # choose v0 if flags is empty, otherwise v1 + if version is None: + version = int(bool(flags)) + if version == 1: + header = ('v1\n%s%d\n%s%d' + % (constants.METAKEYSIZE, size, + constants.METAKEYFLAG, flags)) + elif version == 0: + if flags: + raise error.ProgrammingError('fileblob v0 does not support flag') + header = '%d' % size + else: + raise error.ProgrammingError('unknown fileblob version %d' % version) + return header + +def ancestormap(raw): + offset, size, flags = parsesizeflags(raw) + start = offset + size + + mapping = {} + while start < len(raw): + divider = raw.index('\0', start + 80) + + currentnode = raw[start:(start + 20)] + p1 = raw[(start + 20):(start + 40)] + p2 = raw[(start + 40):(start + 60)] + linknode = raw[(start + 60):(start + 80)] + copyfrom = raw[(start + 80):divider] + + mapping[currentnode] = (p1, p2, linknode, copyfrom) + start = divider + 1 + + return mapping + +def readfile(path): + f = open(path, 'rb') + try: + result = f.read() + + # we should never have empty files + if not result: + os.remove(path) + raise IOError("empty file: %s" % path) + + return result + finally: + f.close() + +def unlinkfile(filepath): + if pycompat.iswindows: + # On Windows, os.unlink cannot delete readonly files + os.chmod(filepath, stat.S_IWUSR) + os.unlink(filepath) + +def renamefile(source, destination): + if pycompat.iswindows: + # On Windows, os.rename cannot rename readonly files + # and cannot overwrite destination if it exists + os.chmod(source, stat.S_IWUSR) + if os.path.isfile(destination): + os.chmod(destination, stat.S_IWUSR) + os.unlink(destination) + + os.rename(source, destination) + +def writefile(path, content, readonly=False): + dirname, filename = os.path.split(path) + if not os.path.exists(dirname): + try: + os.makedirs(dirname) + except OSError as ex: + if ex.errno != errno.EEXIST: + raise + + fd, temp = tempfile.mkstemp(prefix='.%s-' % filename, dir=dirname) + os.close(fd) + + try: + f = util.posixfile(temp, 'wb') + 
f.write(content) + f.close() + + if readonly: + mode = 0o444 + else: + # tempfiles are created with 0o600, so we need to manually set the + # mode. + oldumask = os.umask(0) + # there's no way to get the umask without modifying it, so set it + # back + os.umask(oldumask) + mode = ~oldumask + + renamefile(temp, path) + os.chmod(path, mode) + except Exception: + try: + unlinkfile(temp) + except OSError: + pass + raise + +def sortnodes(nodes, parentfunc): + """Topologically sorts the nodes, using the parentfunc to find + the parents of nodes.""" + nodes = set(nodes) + childmap = {} + parentmap = {} + roots = [] + + # Build a child and parent map + for n in nodes: + parents = [p for p in parentfunc(n) if p in nodes] + parentmap[n] = set(parents) + for p in parents: + childmap.setdefault(p, set()).add(n) + if not parents: + roots.append(n) + + roots.sort() + # Process roots, adding children to the queue as they become roots + results = [] + while roots: + n = roots.pop(0) + results.append(n) + if n in childmap: + children = childmap[n] + for c in children: + childparents = parentmap[c] + childparents.remove(n) + if len(childparents) == 0: + # insert at the beginning, that way child nodes + # are likely to be output immediately after their + # parents. This gives better compression results. 
+ roots.insert(0, c) + + return results + +def readexactly(stream, n): + '''read n bytes from stream.read and abort if less was available''' + s = stream.read(n) + if len(s) < n: + raise error.Abort(_("stream ended unexpectedly" + " (got %d bytes, expected %d)") + % (len(s), n)) + return s + +def readunpack(stream, fmt): + data = readexactly(stream, struct.calcsize(fmt)) + return struct.unpack(fmt, data) + +def readpath(stream): + rawlen = readexactly(stream, constants.FILENAMESIZE) + pathlen = struct.unpack(constants.FILENAMESTRUCT, rawlen)[0] + return readexactly(stream, pathlen) + +def readnodelist(stream): + rawlen = readexactly(stream, constants.NODECOUNTSIZE) + nodecount = struct.unpack(constants.NODECOUNTSTRUCT, rawlen)[0] + for i in pycompat.xrange(nodecount): + yield readexactly(stream, constants.NODESIZE) + +def readpathlist(stream): + rawlen = readexactly(stream, constants.PATHCOUNTSIZE) + pathcount = struct.unpack(constants.PATHCOUNTSTRUCT, rawlen)[0] + for i in pycompat.xrange(pathcount): + yield readpath(stream) + +def getgid(groupname): + try: + gid = grp.getgrnam(groupname).gr_gid + return gid + except KeyError: + return None + +def setstickygroupdir(path, gid, warn=None): + if gid is None: + return + try: + os.chown(path, -1, gid) + os.chmod(path, 0o2775) + except (IOError, OSError) as ex: + if warn: + warn(_('unable to chown/chmod on %s: %s\n') % (path, ex)) + +def mkstickygroupdir(ui, path): + """Creates the given directory (if it doesn't exist) and give it a + particular group with setgid enabled.""" + gid = None + groupname = ui.config("remotefilelog", "cachegroup") + if groupname: + gid = getgid(groupname) + if gid is None: + ui.warn(_('unable to resolve group name: %s\n') % groupname) + + # we use a single stat syscall to test the existence and mode / group bit + st = None + try: + st = os.stat(path) + except OSError: + pass + + if st: + # exists + if (st.st_mode & 0o2775) != 0o2775 or st.st_gid != gid: + # permission needs to be fixed + 
setstickygroupdir(path, gid, ui.warn) + return + + oldumask = os.umask(0o002) + try: + missingdirs = [path] + path = os.path.dirname(path) + while path and not os.path.exists(path): + missingdirs.append(path) + path = os.path.dirname(path) + + for path in reversed(missingdirs): + try: + os.mkdir(path) + except OSError as ex: + if ex.errno != errno.EEXIST: + raise + + for path in missingdirs: + setstickygroupdir(path, gid, ui.warn) + finally: + os.umask(oldumask) + +def getusername(ui): + try: + return stringutil.shortuser(ui.username()) + except Exception: + return 'unknown' + +def getreponame(ui): + reponame = ui.config('paths', 'default') + if reponame: + return os.path.basename(reponame) + return "unknown" diff --git a/hgext/remotefilelog/shallowverifier.py b/hgext/remotefilelog/shallowverifier.py new file mode 100644 --- /dev/null +++ b/hgext/remotefilelog/shallowverifier.py @@ -0,0 +1,17 @@ +# shallowverifier.py - shallow repository verifier +# +# Copyright 2015 Facebook, Inc. +# +# This software may be used and distributed according to the terms of the +# GNU General Public License version 2 or any later version. +from __future__ import absolute_import + +from mercurial.i18n import _ +from mercurial import verify + +class shallowverifier(verify.verifier): + def _verifyfiles(self, filenodes, filelinkrevs): + """Skips files verification since repo's not guaranteed to have them""" + self.repo.ui.status( + _("skipping filelog check since remotefilelog is used\n")) + return 0, 0 diff --git a/hgext/remotefilelog/wirepack.py b/hgext/remotefilelog/wirepack.py new file mode 100644 --- /dev/null +++ b/hgext/remotefilelog/wirepack.py @@ -0,0 +1,235 @@ +# wirepack.py - wireprotocol for exchanging packs +# +# Copyright 2017 Facebook, Inc. +# +# This software may be used and distributed according to the terms of the +# GNU General Public License version 2 or any later version. 
+from __future__ import absolute_import + +import StringIO +import collections +import struct + +from mercurial.i18n import _ +from mercurial.node import nullid +from mercurial import ( + pycompat, +) +from . import ( + constants, + datapack, + historypack, + shallowutil, +) + +def sendpackpart(filename, history, data): + """A wirepack is formatted as follows: + + wirepack = + [,...] + [,...] + + hist rev = + + + + + + + data rev = + + + + """ + rawfilenamelen = struct.pack(constants.FILENAMESTRUCT, + len(filename)) + yield '%s%s' % (rawfilenamelen, filename) + + # Serialize and send history + historylen = struct.pack('!I', len(history)) + rawhistory = '' + for entry in history: + copyfrom = entry[4] or '' + copyfromlen = len(copyfrom) + tup = entry[:-1] + (copyfromlen,) + rawhistory += struct.pack('!20s20s20s20sH', *tup) + if copyfrom: + rawhistory += copyfrom + + yield '%s%s' % (historylen, rawhistory) + + # Serialize and send data + yield struct.pack('!I', len(data)) + + # TODO: support datapack metadata + for node, deltabase, delta in data: + deltalen = struct.pack('!Q', len(delta)) + yield '%s%s%s%s' % (node, deltabase, deltalen, delta) + +def closepart(): + return '\0' * 10 + +def receivepack(ui, fh, packpath): + receiveddata = [] + receivedhistory = [] + shallowutil.mkstickygroupdir(ui, packpath) + totalcount = 0 + ui.progress(_("receiving pack"), totalcount) + with datapack.mutabledatapack(ui, packpath) as dpack: + with historypack.mutablehistorypack(ui, packpath) as hpack: + pendinghistory = collections.defaultdict(dict) + while True: + filename = shallowutil.readpath(fh) + count = 0 + + # Store the history for later sorting + for value in readhistory(fh): + node = value[0] + pendinghistory[filename][node] = value + receivedhistory.append((filename, node)) + count += 1 + + for node, deltabase, delta in readdeltas(fh): + dpack.add(filename, node, deltabase, delta) + receiveddata.append((filename, node)) + count += 1 + + if count == 0 and filename == '': + 
break + totalcount += 1 + ui.progress(_("receiving pack"), totalcount) + + # Add history to pack in toposorted order + for filename, nodevalues in sorted(pendinghistory.iteritems()): + def _parentfunc(node): + p1, p2 = nodevalues[node][1:3] + parents = [] + if p1 != nullid: + parents.append(p1) + if p2 != nullid: + parents.append(p2) + return parents + sortednodes = reversed(shallowutil.sortnodes( + nodevalues.iterkeys(), + _parentfunc)) + for node in sortednodes: + node, p1, p2, linknode, copyfrom = nodevalues[node] + hpack.add(filename, node, p1, p2, linknode, copyfrom) + ui.progress(_("receiving pack"), None) + + return receiveddata, receivedhistory + +def readhistory(fh): + count = shallowutil.readunpack(fh, '!I')[0] + for i in pycompat.xrange(count): + entry = shallowutil.readunpack(fh,'!20s20s20s20sH') + if entry[4] != 0: + copyfrom = shallowutil.readexactly(fh, entry[4]) + else: + copyfrom = '' + entry = entry[:4] + (copyfrom,) + yield entry + +def readdeltas(fh): + count = shallowutil.readunpack(fh, '!I')[0] + for i in pycompat.xrange(count): + node, deltabase, deltalen = shallowutil.readunpack(fh, '!20s20sQ') + delta = shallowutil.readexactly(fh, deltalen) + yield (node, deltabase, delta) + +class wirepackstore(object): + def __init__(self, wirepack): + self._data = {} + self._history = {} + fh = StringIO.StringIO(wirepack) + self._load(fh) + + def get(self, name, node): + raise RuntimeError("must use getdeltachain with wirepackstore") + + def getdeltachain(self, name, node): + delta, deltabase = self._data[(name, node)] + return [(name, node, name, deltabase, delta)] + + def getmeta(self, name, node): + try: + size = len(self._data[(name, node)]) + except KeyError: + raise KeyError((name, hex(node))) + return {constants.METAKEYFLAG: '', + constants.METAKEYSIZE: size} + + def getancestors(self, name, node, known=None): + if known is None: + known = set() + if node in known: + return [] + + ancestors = {} + seen = set() + missing = [(name, node)] + while 
missing: + curname, curnode = missing.pop() + info = self._history.get((curname, curnode)) + if info is None: + continue + + p1, p2, linknode, copyfrom = info + if p1 != nullid and p1 not in known: + key = (curname if not copyfrom else copyfrom, p1) + if key not in seen: + seen.add(key) + missing.append(key) + if p2 != nullid and p2 not in known: + key = (curname, p2) + if key not in seen: + seen.add(key) + missing.append(key) + + ancestors[curnode] = (p1, p2, linknode, copyfrom) + if not ancestors: + raise KeyError((name, hex(node))) + return ancestors + + def getnodeinfo(self, name, node): + try: + return self._history[(name, node)] + except KeyError: + raise KeyError((name, hex(node))) + + def add(self, *args): + raise RuntimeError("cannot add to a wirepack store") + + def getmissing(self, keys): + missing = [] + for name, node in keys: + if (name, node) not in self._data: + missing.append((name, node)) + + return missing + + def _load(self, fh): + data = self._data + history = self._history + while True: + filename = shallowutil.readpath(fh) + count = 0 + + # Store the history for later sorting + for value in readhistory(fh): + node = value[0] + history[(filename, node)] = value[1:] + count += 1 + + for node, deltabase, delta in readdeltas(fh): + data[(filename, node)] = (delta, deltabase) + count += 1 + + if count == 0 and filename == '': + break + + def markledger(self, ledger, options=None): + pass + + def cleanup(self, ledger): + pass diff --git a/setup.py b/setup.py --- a/setup.py +++ b/setup.py @@ -844,6 +844,7 @@ 'hgext.infinitepush', 'hgext.highlight', 'hgext.largefiles', 'hgext.lfs', 'hgext.narrow', + 'hgext.remotefilelog', 'hgext.zeroconf', 'hgext3rd', 'hgdemandimport'] if sys.version_info[0] == 2: diff --git a/tests/ls-l.py b/tests/ls-l.py new file mode 100755 --- /dev/null +++ b/tests/ls-l.py @@ -0,0 +1,37 @@ +#!/usr/bin/env python + +# like ls -l, but do not print date, user, or non-common mode bit, to avoid +# using globs in tests. 
+from __future__ import absolute_import, print_function + +import os +import stat +import sys + +def modestr(st): + mode = st.st_mode + result = '' + if mode & stat.S_IFDIR: + result += 'd' + else: + result += '-' + for owner in ['USR', 'GRP', 'OTH']: + for action in ['R', 'W', 'X']: + if mode & getattr(stat, 'S_I%s%s' % (action, owner)): + result += action.lower() + else: + result += '-' + return result + +def sizestr(st): + if st.st_mode & stat.S_IFREG: + return '%7d' % st.st_size + else: + # do not show size for non regular files + return ' ' * 7 + +os.chdir((sys.argv[1:] + ['.'])[0]) + +for name in sorted(os.listdir('.')): + st = os.stat(name) + print('%s %s %s' % (modestr(st), sizestr(st), name)) diff --git a/tests/remotefilelog-getflogheads.py b/tests/remotefilelog-getflogheads.py new file mode 100644 --- /dev/null +++ b/tests/remotefilelog-getflogheads.py @@ -0,0 +1,31 @@ +from __future__ import absolute_import + +from mercurial.i18n import _ +from mercurial import ( + hg, + registrar, +) + +cmdtable = {} +command = registrar.command(cmdtable) + +@command('getflogheads', + [], + 'path') +def getflogheads(ui, repo, path): + """ + Extension printing a remotefilelog's heads + + Used for testing purpose + """ + + dest = repo.ui.expandpath('default') + peer = hg.peer(repo, {}, dest) + + flogheads = peer.getflogheads(path) + + if flogheads: + for head in flogheads: + ui.write(head + '\n') + else: + ui.write(_('EMPTY\n')) diff --git a/tests/remotefilelog-library.sh b/tests/remotefilelog-library.sh new file mode 100644 --- /dev/null +++ b/tests/remotefilelog-library.sh @@ -0,0 +1,88 @@ +${PYTHON:-python} -c 'import lz4' || exit 80 + +CACHEDIR=$PWD/hgcache +cat >> $HGRCPATH <> $dest/.hg/hgrc <> $dest/.hg/hgrc < "$1" + hg add "$1" + hg ci -m "$1" +} + +ls_l() { + $PYTHON $TESTDIR/ls-l.py "$@" +} + +identifyrflcaps() { + xargs -n 1 echo | egrep '(remotefilelog|getflogheads|getfile)' | sort +} diff --git a/tests/test-remotefilelog-bad-configs.t 
b/tests/test-remotefilelog-bad-configs.t new file mode 100644 --- /dev/null +++ b/tests/test-remotefilelog-bad-configs.t @@ -0,0 +1,41 @@ + $ PYTHONPATH=$TESTDIR/..:$PYTHONPATH + $ export PYTHONPATH + + $ . "$TESTDIR/remotefilelog-library.sh" + + $ hginit master + $ cd master + $ cat >> .hg/hgrc <<EOF + > [remotefilelog] + > server=True + > EOF + $ echo x > x + $ echo y > y + $ echo z > z + $ hg commit -qAm xy + + $ cd .. + + $ hgcloneshallow ssh://user@dummy/master shallow -q + 3 files fetched over 1 fetches - (3 misses, 0.00% hit ratio) over *s (glob) + $ cd shallow + +Verify error message when no cachepath specified + $ hg up -q null + $ cp $HGRCPATH $HGRCPATH.bak + $ grep -v cachepath < $HGRCPATH.bak > tmp + $ mv tmp $HGRCPATH + $ hg up tip + abort: could not find config option remotefilelog.cachepath + [255] + $ mv $HGRCPATH.bak $HGRCPATH + +Verify error message when no fallback specified + + $ hg up -q null + $ rm .hg/hgrc + $ clearcache + $ hg up tip + 3 files fetched over 1 fetches - (3 misses, 0.00% hit ratio) over *s (glob) + abort: no remotefilelog server configured - is your .hg/hgrc trusted? + [255] diff --git a/tests/test-remotefilelog-bgprefetch.t b/tests/test-remotefilelog-bgprefetch.t new file mode 100644 --- /dev/null +++ b/tests/test-remotefilelog-bgprefetch.t @@ -0,0 +1,370 @@ + $ PYTHONPATH=$TESTDIR/..:$PYTHONPATH + $ export PYTHONPATH + + $ . "$TESTDIR/remotefilelog-library.sh" + + $ hginit master + $ cd master + $ cat >> .hg/hgrc <<EOF + > [remotefilelog] + > server=True + > EOF + $ echo x > x + $ echo z > z + $ hg commit -qAm x + $ echo x2 > x + $ echo y > y + $ hg commit -qAm y + $ echo w > w + $ rm z + $ hg commit -qAm w + $ hg bookmark foo + + $ cd .. 
+ +# clone the repo + + $ hgcloneshallow ssh://user@dummy/master shallow --noupdate + streaming all changes + 2 files to transfer, 776 bytes of data + transferred 776 bytes in * seconds (*/sec) (glob) + searching for changes + no changes found + +# Set the prefetchdays config to zero so that all commits are prefetched +# no matter what their creation date is. Also set prefetchdelay config +# to zero so that there is no delay between prefetches. + $ cd shallow + $ cat >> .hg/hgrc < [remotefilelog] + > prefetchdays=0 + > prefetchdelay=0 + > EOF + $ cd .. + +# prefetch a revision + $ cd shallow + + $ hg prefetch -r 0 + 2 files fetched over 1 fetches - (2 misses, 0.00% hit ratio) over *s (glob) + + $ hg cat -r 0 x + x + +# background prefetch on pull when configured + + $ cat >> .hg/hgrc < [remotefilelog] + > pullprefetch=bookmark() + > backgroundprefetch=True + > EOF + $ hg strip tip + saved backup bundle to $TESTTMP/shallow/.hg/strip-backup/6b4b6f66ef8c-b4b8bdaf-backup.hg (glob) + + $ clearcache + $ hg pull + pulling from ssh://user@dummy/master + searching for changes + adding changesets + adding manifests + adding file changes + added 1 changesets with 0 changes to 0 files + updating bookmark foo + new changesets 6b4b6f66ef8c + (run 'hg update' to get a working copy) + prefetching file contents + $ sleep 0.5 + $ hg debugwaitonprefetch >/dev/null 2>%1 + $ find $CACHEDIR -type f | sort + $TESTTMP/hgcache/master/11/f6ad8ec52a2984abaafd7c3b516503785c2072/ef95c5376f34698742fe34f315fd82136f8f68c0 + $TESTTMP/hgcache/master/95/cb0bfd2977c761298d9624e4b4d4c72a39974a/076f5e2225b3ff0400b98c92aa6cdf403ee24cca + $TESTTMP/hgcache/master/af/f024fe4ab0fece4091de044c58c9ae4233383a/bb6ccd5dceaa5e9dc220e0dad65e051b94f69a2c + $TESTTMP/hgcache/repos + +# background prefetch with repack on pull when configured + + $ cat >> .hg/hgrc < [remotefilelog] + > backgroundrepack=True + > EOF + $ hg strip tip + saved backup bundle to 
$TESTTMP/shallow/.hg/strip-backup/6b4b6f66ef8c-b4b8bdaf-backup.hg (glob) + + $ clearcache + $ hg pull + pulling from ssh://user@dummy/master + searching for changes + adding changesets + adding manifests + adding file changes + added 1 changesets with 0 changes to 0 files + updating bookmark foo + new changesets 6b4b6f66ef8c + (run 'hg update' to get a working copy) + prefetching file contents + $ sleep 0.5 + $ hg debugwaitonprefetch >/dev/null 2>%1 + $ sleep 0.5 + $ hg debugwaitonrepack >/dev/null 2>%1 + $ find $CACHEDIR -type f | sort + $TESTTMP/hgcache/master/packs/94d53eef9e622533aec1fc6d8053cb086e785d21.histidx + $TESTTMP/hgcache/master/packs/94d53eef9e622533aec1fc6d8053cb086e785d21.histpack + $TESTTMP/hgcache/master/packs/f3644bc7773e8289deda7f765138120c838f4e6e.dataidx + $TESTTMP/hgcache/master/packs/f3644bc7773e8289deda7f765138120c838f4e6e.datapack + $TESTTMP/hgcache/master/packs/repacklock + $TESTTMP/hgcache/repos + +# background prefetch with repack on update when wcprevset configured + + $ clearcache + $ hg up -r 0 + 2 files updated, 0 files merged, 0 files removed, 0 files unresolved + 2 files fetched over 1 fetches - (2 misses, 0.00% hit ratio) over *s (glob) + $ find $CACHEDIR -type f | sort + $TESTTMP/hgcache/master/11/f6ad8ec52a2984abaafd7c3b516503785c2072/1406e74118627694268417491f018a4a883152f0 + $TESTTMP/hgcache/master/39/5df8f7c51f007019cb30201c49e884b46b92fa/69a1b67522704ec122181c0890bd16e9d3e7516a + $TESTTMP/hgcache/repos + + $ hg up -r 1 + 2 files updated, 0 files merged, 0 files removed, 0 files unresolved + 2 files fetched over 2 fetches - (2 misses, 0.00% hit ratio) over *s (glob) + + $ cat >> .hg/hgrc < [remotefilelog] + > bgprefetchrevs=.:: + > EOF + + $ clearcache + $ hg up -r 0 + 1 files updated, 0 files merged, 1 files removed, 0 files unresolved + * files fetched over * fetches - (* misses, 0.00% hit ratio) over *s (glob) + $ sleep 1 + $ hg debugwaitonprefetch >/dev/null 2>%1 + $ sleep 1 + $ hg debugwaitonrepack >/dev/null 2>%1 + $ 
find $CACHEDIR -type f | sort + $TESTTMP/hgcache/master/packs/27c52c105a1ddf8c75143a6b279b04c24b1f4bee.histidx + $TESTTMP/hgcache/master/packs/27c52c105a1ddf8c75143a6b279b04c24b1f4bee.histpack + $TESTTMP/hgcache/master/packs/8299d5a1030f073f4adbb3b6bd2ad3bdcc276df0.dataidx + $TESTTMP/hgcache/master/packs/8299d5a1030f073f4adbb3b6bd2ad3bdcc276df0.datapack + $TESTTMP/hgcache/master/packs/repacklock + $TESTTMP/hgcache/repos + +# Ensure that file 'w' was prefetched - it was not part of the update operation and therefore +# could only be downloaded by the background prefetch + + $ hg debugdatapack $TESTTMP/hgcache/master/packs/8299d5a1030f073f4adbb3b6bd2ad3bdcc276df0.datapack + $TESTTMP/hgcache/master/packs/8299d5a1030f073f4adbb3b6bd2ad3bdcc276df0: + w: + Node Delta Base Delta Length Blob Size + bb6ccd5dceaa 000000000000 2 2 + + Total: 2 2 (0.0% bigger) + x: + Node Delta Base Delta Length Blob Size + ef95c5376f34 000000000000 3 3 + 1406e7411862 ef95c5376f34 14 2 + + Total: 17 5 (240.0% bigger) + y: + Node Delta Base Delta Length Blob Size + 076f5e2225b3 000000000000 2 2 + + Total: 2 2 (0.0% bigger) + z: + Node Delta Base Delta Length Blob Size + 69a1b6752270 000000000000 2 2 + + Total: 2 2 (0.0% bigger) + +# background prefetch with repack on commit when wcprevset configured + + $ cat >> .hg/hgrc < [remotefilelog] + > bgprefetchrevs=0:: + > EOF + + $ clearcache + $ find $CACHEDIR -type f | sort + $ echo b > b + $ hg commit -qAm b + * files fetched over 1 fetches - (* misses, 0.00% hit ratio) over *s (glob) + $ hg bookmark temporary + $ sleep 1 + $ hg debugwaitonprefetch >/dev/null 2>%1 + $ sleep 1 + $ hg debugwaitonrepack >/dev/null 2>%1 + $ find $CACHEDIR -type f | sort + $TESTTMP/hgcache/master/packs/27c52c105a1ddf8c75143a6b279b04c24b1f4bee.histidx + $TESTTMP/hgcache/master/packs/27c52c105a1ddf8c75143a6b279b04c24b1f4bee.histpack + $TESTTMP/hgcache/master/packs/8299d5a1030f073f4adbb3b6bd2ad3bdcc276df0.dataidx + 
$TESTTMP/hgcache/master/packs/8299d5a1030f073f4adbb3b6bd2ad3bdcc276df0.datapack + $TESTTMP/hgcache/master/packs/repacklock + $TESTTMP/hgcache/repos + +# Ensure that file 'w' was prefetched - it was not part of the commit operation and therefore +# could only be downloaded by the background prefetch + + $ hg debugdatapack $TESTTMP/hgcache/master/packs/8299d5a1030f073f4adbb3b6bd2ad3bdcc276df0.datapack + $TESTTMP/hgcache/master/packs/8299d5a1030f073f4adbb3b6bd2ad3bdcc276df0: + w: + Node Delta Base Delta Length Blob Size + bb6ccd5dceaa 000000000000 2 2 + + Total: 2 2 (0.0% bigger) + x: + Node Delta Base Delta Length Blob Size + ef95c5376f34 000000000000 3 3 + 1406e7411862 ef95c5376f34 14 2 + + Total: 17 5 (240.0% bigger) + y: + Node Delta Base Delta Length Blob Size + 076f5e2225b3 000000000000 2 2 + + Total: 2 2 (0.0% bigger) + z: + Node Delta Base Delta Length Blob Size + 69a1b6752270 000000000000 2 2 + + Total: 2 2 (0.0% bigger) + +# background prefetch with repack on rebase when wcprevset configured + + $ hg up -r 2 + 3 files updated, 0 files merged, 3 files removed, 0 files unresolved + (leaving bookmark temporary) + $ clearcache + $ find $CACHEDIR -type f | sort + $ hg rebase -s temporary -d foo + rebasing 3:58147a5b5242 "b" (temporary tip) + saved backup bundle to $TESTTMP/shallow/.hg/strip-backup/58147a5b5242-c3678817-rebase.hg (glob) + 3 files fetched over 1 fetches - (3 misses, 0.00% hit ratio) over *s (glob) + $ sleep 1 + $ hg debugwaitonprefetch >/dev/null 2>%1 + $ sleep 1 + $ hg debugwaitonrepack >/dev/null 2>%1 + +# Ensure that file 'y' was prefetched - it was not part of the rebase operation and therefore +# could only be downloaded by the background prefetch + + $ hg debugdatapack $TESTTMP/hgcache/master/packs/8299d5a1030f073f4adbb3b6bd2ad3bdcc276df0.datapack + $TESTTMP/hgcache/master/packs/8299d5a1030f073f4adbb3b6bd2ad3bdcc276df0: + w: + Node Delta Base Delta Length Blob Size + bb6ccd5dceaa 000000000000 2 2 + + Total: 2 2 (0.0% bigger) + x: + Node Delta 
Base Delta Length Blob Size + ef95c5376f34 000000000000 3 3 + 1406e7411862 ef95c5376f34 14 2 + + Total: 17 5 (240.0% bigger) + y: + Node Delta Base Delta Length Blob Size + 076f5e2225b3 000000000000 2 2 + + Total: 2 2 (0.0% bigger) + z: + Node Delta Base Delta Length Blob Size + 69a1b6752270 000000000000 2 2 + + Total: 2 2 (0.0% bigger) + +# Check that foreground prefetch with no arguments blocks until background prefetches finish + + $ hg up -r 3 + 2 files updated, 0 files merged, 0 files removed, 0 files unresolved + $ clearcache + $ hg prefetch --repack + waiting for lock on prefetching in $TESTTMP/shallow held by process * on host * (glob) (?) + got lock after * seconds (glob) (?) + (running background incremental repack) + * files fetched over 1 fetches - (* misses, 0.00% hit ratio) over *s (glob) (?) + + $ sleep 0.5 + $ hg debugwaitonrepack >/dev/null 2>%1 + + $ find $CACHEDIR -type f | sort + $TESTTMP/hgcache/master/packs/27c52c105a1ddf8c75143a6b279b04c24b1f4bee.histidx + $TESTTMP/hgcache/master/packs/27c52c105a1ddf8c75143a6b279b04c24b1f4bee.histpack + $TESTTMP/hgcache/master/packs/8299d5a1030f073f4adbb3b6bd2ad3bdcc276df0.dataidx + $TESTTMP/hgcache/master/packs/8299d5a1030f073f4adbb3b6bd2ad3bdcc276df0.datapack + $TESTTMP/hgcache/master/packs/repacklock + $TESTTMP/hgcache/repos + +# Ensure that files were prefetched + $ hg debugdatapack $TESTTMP/hgcache/master/packs/8299d5a1030f073f4adbb3b6bd2ad3bdcc276df0.datapack + $TESTTMP/hgcache/master/packs/8299d5a1030f073f4adbb3b6bd2ad3bdcc276df0: + w: + Node Delta Base Delta Length Blob Size + bb6ccd5dceaa 000000000000 2 2 + + Total: 2 2 (0.0% bigger) + x: + Node Delta Base Delta Length Blob Size + ef95c5376f34 000000000000 3 3 + 1406e7411862 ef95c5376f34 14 2 + + Total: 17 5 (240.0% bigger) + y: + Node Delta Base Delta Length Blob Size + 076f5e2225b3 000000000000 2 2 + + Total: 2 2 (0.0% bigger) + z: + Node Delta Base Delta Length Blob Size + 69a1b6752270 000000000000 2 2 + + Total: 2 2 (0.0% bigger) + +# Check that 
foreground prefetch fetches revs specified by '. + draft() + bgprefetchrevs + pullprefetch' + + $ clearcache + $ hg prefetch --repack + waiting for lock on prefetching in $TESTTMP/shallow held by process * on host * (glob) (?) + got lock after * seconds (glob) (?) + (running background incremental repack) + * files fetched over 1 fetches - (* misses, 0.00% hit ratio) over *s (glob) (?) + $ sleep 0.5 + $ hg debugwaitonrepack >/dev/null 2>%1 + + $ find $CACHEDIR -type f | sort + $TESTTMP/hgcache/master/packs/27c52c105a1ddf8c75143a6b279b04c24b1f4bee.histidx + $TESTTMP/hgcache/master/packs/27c52c105a1ddf8c75143a6b279b04c24b1f4bee.histpack + $TESTTMP/hgcache/master/packs/8299d5a1030f073f4adbb3b6bd2ad3bdcc276df0.dataidx + $TESTTMP/hgcache/master/packs/8299d5a1030f073f4adbb3b6bd2ad3bdcc276df0.datapack + $TESTTMP/hgcache/master/packs/repacklock + $TESTTMP/hgcache/repos + +# Ensure that files were prefetched + $ hg debugdatapack $TESTTMP/hgcache/master/packs/8299d5a1030f073f4adbb3b6bd2ad3bdcc276df0.datapack + $TESTTMP/hgcache/master/packs/8299d5a1030f073f4adbb3b6bd2ad3bdcc276df0: + w: + Node Delta Base Delta Length Blob Size + bb6ccd5dceaa 000000000000 2 2 + + Total: 2 2 (0.0% bigger) + x: + Node Delta Base Delta Length Blob Size + ef95c5376f34 000000000000 3 3 + 1406e7411862 ef95c5376f34 14 2 + + Total: 17 5 (240.0% bigger) + y: + Node Delta Base Delta Length Blob Size + 076f5e2225b3 000000000000 2 2 + + Total: 2 2 (0.0% bigger) + z: + Node Delta Base Delta Length Blob Size + 69a1b6752270 000000000000 2 2 + + Total: 2 2 (0.0% bigger) + +# Test that if data was prefetched and repacked we don't need to prefetch it again +# It ensures that Mercurial looks not only in loose files but in packs as well + + $ hg prefetch --repack + (running background incremental repack) diff --git a/tests/test-remotefilelog-blame.t b/tests/test-remotefilelog-blame.t new file mode 100644 --- /dev/null +++ b/tests/test-remotefilelog-blame.t @@ -0,0 +1,33 @@ + $ PYTHONPATH=$TESTDIR/..:$PYTHONPATH + 
$ export PYTHONPATH + + $ . "$TESTDIR/remotefilelog-library.sh" + + $ hginit master + $ cd master + $ cat >> .hg/hgrc < [remotefilelog] + > server=True + > EOF + $ echo x > x + $ hg commit -qAm x + $ echo y >> x + $ hg commit -qAm y + $ echo z >> x + $ hg commit -qAm z + $ echo a > a + $ hg commit -qAm a + + $ cd .. + + $ hgcloneshallow ssh://user@dummy/master shallow -q + 2 files fetched over 1 fetches - (2 misses, 0.00% hit ratio) over *s (glob) + $ cd shallow + +Test blame + + $ hg blame x + 0: x + 1: y + 2: z + 2 files fetched over 1 fetches - (2 misses, 0.00% hit ratio) over *s (glob) diff --git a/tests/test-remotefilelog-bundle2-legacy.t b/tests/test-remotefilelog-bundle2-legacy.t new file mode 100644 --- /dev/null +++ b/tests/test-remotefilelog-bundle2-legacy.t @@ -0,0 +1,93 @@ + $ PYTHONPATH=$TESTDIR/..:$PYTHONPATH + $ export PYTHONPATH + + $ . "$TESTDIR/remotefilelog-library.sh" + +generaldelta to generaldelta interactions with bundle2 but legacy clients +without changegroup2 support + $ cat > testcg2.py << EOF + > from mercurial import changegroup, registrar, util + > import sys + > cmdtable = {} + > command = registrar.command(cmdtable) + > @command('testcg2', norepo=True) + > def testcg2(ui): + > if not util.safehasattr(changegroup, 'cg2packer'): + > sys.exit(80) + > EOF + $ cat >> $HGRCPATH << EOF + > [extensions] + > testcg2 = $TESTTMP/testcg2.py + > EOF + $ hg testcg2 || exit 80 + + $ cat > disablecg2.py << EOF + > from mercurial import changegroup, util, error + > deleted = False + > def reposetup(ui, repo): + > global deleted + > if deleted: + > return + > packermap = changegroup._packermap + > # protect against future changes + > if len(packermap) != 3: + > raise error.Abort('packermap has %d versions, expected 3!' % len(packermap)) + > for k in ['01', '02', '03']: + > if not packermap.get(k): + > raise error.Abort("packermap doesn't have key '%s'!" 
% k) + > + > del packermap['02'] + > deleted = True + > EOF + + $ hginit master + $ grep generaldelta master/.hg/requires + generaldelta + $ cd master +preferuncompressed = False so that we can make both generaldelta and non-generaldelta clones + $ cat >> .hg/hgrc < [remotefilelog] + > server=True + > [experimental] + > bundle2-exp = True + > [server] + > preferuncompressed = False + > EOF + $ echo x > x + $ hg commit -qAm x + + $ cd .. + + $ hgcloneshallow ssh://user@dummy/master shallow -q --pull --config experimental.bundle2-exp=True + 1 files fetched over 1 fetches - (1 misses, 0.00% hit ratio) over *s (glob) + $ cd shallow + $ cat >> .hg/hgrc << EOF + > [extensions] + > disablecg2 = $TESTTMP/disablecg2.py + > EOF + + $ cd ../master + $ echo y > y + $ hg commit -qAm y + + $ cd ../shallow + $ hg pull -u + pulling from ssh://user@dummy/master + searching for changes + adding changesets + adding manifests + adding file changes + added 1 changesets with 0 changes to 0 files + new changesets d34c38483be9 + 1 files updated, 0 files merged, 0 files removed, 0 files unresolved + 1 files fetched over 1 fetches - (1 misses, 0.00% hit ratio) over *s (glob) + + $ echo a > a + $ hg commit -qAm a + $ hg push + pushing to ssh://user@dummy/master + searching for changes + remote: adding changesets + remote: adding manifests + remote: adding file changes + remote: added 1 changesets with 1 changes to 1 files diff --git a/tests/test-remotefilelog-bundle2.t b/tests/test-remotefilelog-bundle2.t new file mode 100644 --- /dev/null +++ b/tests/test-remotefilelog-bundle2.t @@ -0,0 +1,79 @@ + $ PYTHONPATH=$TESTDIR/..:$PYTHONPATH + $ export PYTHONPATH + + $ . 
"$TESTDIR/remotefilelog-library.sh" + + $ hginit master + $ grep generaldelta master/.hg/requires + generaldelta + $ cd master +preferuncompressed = False so that we can make both generaldelta and non-generaldelta clones + $ cat >> .hg/hgrc < [remotefilelog] + > server=True + > [experimental] + > bundle2-exp = True + > [server] + > preferuncompressed = False + > EOF + $ echo x > x + $ hg commit -qAm x + + $ cd .. + + $ hgcloneshallow ssh://user@dummy/master shallow-generaldelta -q --pull --config experimental.bundle2-exp=True + 1 files fetched over 1 fetches - (1 misses, 0.00% hit ratio) over *s (glob) + $ grep generaldelta shallow-generaldelta/.hg/requires + generaldelta + $ hgcloneshallow ssh://user@dummy/master shallow-plain -q --pull --config format.usegeneraldelta=False --config format.generaldelta=False --config experimental.bundle2-exp=True + $ grep generaldelta shallow-plain/.hg/requires + [1] + + $ cd master + $ echo a > a + $ hg commit -qAm a + +pull from generaldelta to generaldelta + $ cd ../shallow-generaldelta + $ hg pull -u + pulling from ssh://user@dummy/master + searching for changes + adding changesets + adding manifests + adding file changes + added 1 changesets with 0 changes to 0 files + new changesets 2fbb8bb2b903 + 1 files updated, 0 files merged, 0 files removed, 0 files unresolved + 1 files fetched over 1 fetches - (1 misses, 0.00% hit ratio) over *s (glob) +push from generaldelta to generaldelta + $ echo b > b + $ hg commit -qAm b + $ hg push + pushing to ssh://user@dummy/master + searching for changes + remote: adding changesets + remote: adding manifests + remote: adding file changes + remote: added 1 changesets with 1 changes to 1 files +pull from generaldelta to non-generaldelta + $ cd ../shallow-plain + $ hg pull -u + pulling from ssh://user@dummy/master + searching for changes + adding changesets + adding manifests + adding file changes + added 2 changesets with 0 changes to 0 files + new changesets 2fbb8bb2b903:d6788bd632ca + 2 
files updated, 0 files merged, 0 files removed, 0 files unresolved + 1 files fetched over 1 fetches - (1 misses, 0.00% hit ratio) over *s (glob) +push from non-generaldelta to generaldelta + $ echo c > c + $ hg commit -qAm c + $ hg push + pushing to ssh://user@dummy/master + searching for changes + remote: adding changesets + remote: adding manifests + remote: adding file changes + remote: added 1 changesets with 1 changes to 1 files diff --git a/tests/test-remotefilelog-bundles.t b/tests/test-remotefilelog-bundles.t new file mode 100644 --- /dev/null +++ b/tests/test-remotefilelog-bundles.t @@ -0,0 +1,76 @@ + $ PYTHONPATH=$TESTDIR/..:$PYTHONPATH + $ export PYTHONPATH + + $ . "$TESTDIR/remotefilelog-library.sh" + + $ hginit master + $ cd master + $ cat >> .hg/hgrc < [remotefilelog] + > server=True + > EOF + $ echo x > x + $ hg commit -qAm x + $ echo y >> x + $ hg commit -qAm y + $ echo z >> x + $ hg commit -qAm z + + $ cd .. + + $ hgcloneshallow ssh://user@dummy/master shallow -q + 1 files fetched over 1 fetches - (1 misses, 0.00% hit ratio) over *s (glob) + $ cd shallow + +Unbundling a shallow bundle + + $ hg strip -r 66ee28d0328c + 1 files updated, 0 files merged, 0 files removed, 0 files unresolved + saved backup bundle to $TESTTMP/shallow/.hg/strip-backup/66ee28d0328c-3d7aafd1-backup.hg (glob) + 1 files fetched over 1 fetches - (1 misses, 0.00% hit ratio) over *s (glob) + $ hg unbundle .hg/strip-backup/66ee28d0328c-3d7aafd1-backup.hg + adding changesets + adding manifests + adding file changes + added 2 changesets with 0 changes to 0 files + new changesets 66ee28d0328c:16db62c5946f + (run 'hg update' to get a working copy) + +Unbundling a full bundle + + $ hg -R ../master bundle -r 66ee28d0328c:: --base "66ee28d0328c^" ../fullbundle.hg + 2 changesets found + $ hg strip -r 66ee28d0328c + saved backup bundle to $TESTTMP/shallow/.hg/strip-backup/66ee28d0328c-3d7aafd1-backup.hg (glob) + $ hg unbundle ../fullbundle.hg + adding changesets + adding manifests + adding 
file changes + added 2 changesets with 2 changes to 1 files + new changesets 66ee28d0328c:16db62c5946f (2 drafts) + (run 'hg update' to get a working copy) + +Pulling from a shallow bundle + + $ hg strip -r 66ee28d0328c + saved backup bundle to $TESTTMP/shallow/.hg/strip-backup/66ee28d0328c-3d7aafd1-backup.hg (glob) + $ hg pull -r 66ee28d0328c .hg/strip-backup/66ee28d0328c-3d7aafd1-backup.hg + pulling from .hg/strip-backup/66ee28d0328c-3d7aafd1-backup.hg + searching for changes + adding changesets + adding manifests + adding file changes + added 1 changesets with 0 changes to 0 files + new changesets 66ee28d0328c (1 drafts) + (run 'hg update' to get a working copy) + +Pulling from a full bundle + + $ hg strip -r 66ee28d0328c + saved backup bundle to $TESTTMP/shallow/.hg/strip-backup/66ee28d0328c-b6ee89e7-backup.hg (glob) + $ hg pull -r 66ee28d0328c ../fullbundle.hg + pulling from ../fullbundle.hg + searching for changes + abort: cannot pull from full bundles + (use `hg unbundle` instead) + [255] diff --git a/tests/test-remotefilelog-cacheprocess.t b/tests/test-remotefilelog-cacheprocess.t new file mode 100644 --- /dev/null +++ b/tests/test-remotefilelog-cacheprocess.t @@ -0,0 +1,122 @@ + $ PYTHONPATH=$TESTDIR/..:$PYTHONPATH + $ export PYTHONPATH + + $ . "$TESTDIR/remotefilelog-library.sh" + + $ hg init repo + $ cd repo + $ cat >> .hg/hgrc < [remotefilelog] + > server=True + > EOF + $ echo x > x + $ echo y > y + $ echo z > z + $ hg commit -qAm xy + $ cd .. 
+ + $ cat > cacheprocess-logger.py < import sys, os, shutil + > f = open('$TESTTMP/cachelog.log', 'w') + > srccache = os.path.join('$TESTTMP', 'oldhgcache') + > def log(message): + > f.write(message) + > f.flush() + > destcache = sys.argv[-1] + > try: + > while True: + > cmd = sys.stdin.readline().strip() + > log('got command %r\n' % cmd) + > if cmd == 'exit': + > sys.exit(0) + > elif cmd == 'get': + > count = int(sys.stdin.readline()) + > log('client wants %r blobs\n' % count) + > wants = [] + > for _ in xrange(count): + > key = sys.stdin.readline()[:-1] + > wants.append(key) + > if '\0' in key: + > _, key = key.split('\0') + > srcpath = os.path.join(srccache, key) + > if os.path.exists(srcpath): + > dest = os.path.join(destcache, key) + > destdir = os.path.dirname(dest) + > if not os.path.exists(destdir): + > os.makedirs(destdir) + > shutil.copyfile(srcpath, dest) + > else: + > # report a cache miss + > sys.stdout.write(key + '\n') + > sys.stdout.write('0\n') + > for key in sorted(wants): + > log('requested %r\n' % key) + > sys.stdout.flush() + > elif cmd == 'set': + > assert False, 'todo writing' + > else: + > assert False, 'unknown command! %r' % cmd + > except Exception as e: + > log('Exception! %r\n' % e) + > raise + > EOF + + $ cat >> $HGRCPATH < [remotefilelog] + > cacheprocess = python $TESTTMP/cacheprocess-logger.py + > EOF + +Test cache keys and cache misses. + $ hgcloneshallow ssh://user@dummy/repo clone -q + 3 files fetched over 1 fetches - (3 misses, 0.00% hit ratio) over *s (glob) + $ cat cachelog.log + got command 'get' + client wants 3 blobs + requested 'master/11/f6ad8ec52a2984abaafd7c3b516503785c2072/1406e74118627694268417491f018a4a883152f0' + requested 'master/39/5df8f7c51f007019cb30201c49e884b46b92fa/69a1b67522704ec122181c0890bd16e9d3e7516a' + requested 'master/95/cb0bfd2977c761298d9624e4b4d4c72a39974a/076f5e2225b3ff0400b98c92aa6cdf403ee24cca' + got command 'set' + Exception! AssertionError('todo writing',) + +Test cache hits. 
+ $ mv hgcache oldhgcache + $ rm cachelog.log + $ hgcloneshallow ssh://user@dummy/repo clone-cachehit -q + 3 files fetched over 1 fetches - (0 misses, 100.00% hit ratio) over *s (glob) + $ cat cachelog.log | grep -v exit + got command 'get' + client wants 3 blobs + requested 'master/11/f6ad8ec52a2984abaafd7c3b516503785c2072/1406e74118627694268417491f018a4a883152f0' + requested 'master/39/5df8f7c51f007019cb30201c49e884b46b92fa/69a1b67522704ec122181c0890bd16e9d3e7516a' + requested 'master/95/cb0bfd2977c761298d9624e4b4d4c72a39974a/076f5e2225b3ff0400b98c92aa6cdf403ee24cca' + + $ cat >> $HGRCPATH <<EOF + > [remotefilelog] + > cacheprocess.includepath = yes + > EOF + +Test cache keys and cache misses with includepath. + $ rm -r hgcache oldhgcache + $ rm cachelog.log + $ hgcloneshallow ssh://user@dummy/repo clone-withpath -q + 3 files fetched over 1 fetches - (3 misses, 0.00% hit ratio) over *s (glob) + $ cat cachelog.log + got command 'get' + client wants 3 blobs + requested 'x\x00master/11/f6ad8ec52a2984abaafd7c3b516503785c2072/1406e74118627694268417491f018a4a883152f0' + requested 'y\x00master/95/cb0bfd2977c761298d9624e4b4d4c72a39974a/076f5e2225b3ff0400b98c92aa6cdf403ee24cca' + requested 'z\x00master/39/5df8f7c51f007019cb30201c49e884b46b92fa/69a1b67522704ec122181c0890bd16e9d3e7516a' + got command 'set' + Exception! AssertionError('todo writing',) + +Test cache hits with includepath.
+ $ mv hgcache oldhgcache + $ rm cachelog.log + $ hgcloneshallow ssh://user@dummy/repo clone-withpath-cachehit -q + 3 files fetched over 1 fetches - (0 misses, 100.00% hit ratio) over *s (glob) + $ cat cachelog.log | grep -v exit + got command 'get' + client wants 3 blobs + requested 'x\x00master/11/f6ad8ec52a2984abaafd7c3b516503785c2072/1406e74118627694268417491f018a4a883152f0' + requested 'y\x00master/95/cb0bfd2977c761298d9624e4b4d4c72a39974a/076f5e2225b3ff0400b98c92aa6cdf403ee24cca' + requested 'z\x00master/39/5df8f7c51f007019cb30201c49e884b46b92fa/69a1b67522704ec122181c0890bd16e9d3e7516a' diff --git a/tests/test-remotefilelog-clone-tree.t b/tests/test-remotefilelog-clone-tree.t new file mode 100644 --- /dev/null +++ b/tests/test-remotefilelog-clone-tree.t @@ -0,0 +1,117 @@ + $ PYTHONPATH=$TESTDIR/..:$PYTHONPATH + $ export PYTHONPATH + + $ . "$TESTDIR/remotefilelog-library.sh" + + $ hginit master + $ cd master + $ echo treemanifest >> .hg/requires + $ cat >> .hg/hgrc < [remotefilelog] + > server=True + > EOF +# uppercase directory name to test encoding + $ mkdir -p A/B + $ echo x > A/B/x + $ hg commit -qAm x + + $ cd .. 
+ +# shallow clone from full + + $ hgcloneshallow ssh://user@dummy/master shallow --noupdate + streaming all changes + 4 files to transfer, 449 bytes of data + transferred 449 bytes in * seconds (*/sec) (glob) + searching for changes + no changes found + $ cd shallow + $ cat .hg/requires + dotencode + fncache + generaldelta + remotefilelog + revlogv1 + store + treemanifest + $ find .hg/store/meta | sort + .hg/store/meta + .hg/store/meta/_a + .hg/store/meta/_a/00manifest.i + .hg/store/meta/_a/_b + .hg/store/meta/_a/_b/00manifest.i + + $ hg update + 1 files updated, 0 files merged, 0 files removed, 0 files unresolved + 1 files fetched over 1 fetches - (1 misses, 0.00% hit ratio) over *s (glob) + + $ cat A/B/x + x + + $ ls .hg/store/data + $ echo foo > A/B/F + $ hg add A/B/F + $ hg ci -m 'local content' + $ ls .hg/store/data + ca31988f085bfb945cb8115b78fabdee40f741aa + + $ cd .. + +# shallow clone from shallow + + $ hgcloneshallow ssh://user@dummy/shallow shallow2 --noupdate + streaming all changes + 5 files to transfer, 1008 bytes of data + transferred 1008 bytes in * seconds (*/sec) (glob) + searching for changes + no changes found + $ cd shallow2 + $ cat .hg/requires + dotencode + fncache + generaldelta + remotefilelog + revlogv1 + store + treemanifest + $ ls .hg/store/data + ca31988f085bfb945cb8115b78fabdee40f741aa + + $ hg update + 2 files updated, 0 files merged, 0 files removed, 0 files unresolved + + $ cat A/B/x + x + + $ cd .. + +# full clone from shallow +# - send stderr to /dev/null because the order of stdout/err causes +# flakiness here + $ hg clone --noupdate ssh://user@dummy/shallow full 2>/dev/null + streaming all changes + remote: abort: Cannot clone from a shallow repo to a full repo. 
+ [255] + +# getbundle full clone + + $ printf '[server]\npreferuncompressed=False\n' >> master/.hg/hgrc + $ hgcloneshallow ssh://user@dummy/master shallow3 + requesting all changes + adding changesets + adding manifests + adding file changes + added 1 changesets with 0 changes to 0 files + new changesets 18d955ee7ba0 + updating to branch default + 1 files updated, 0 files merged, 0 files removed, 0 files unresolved + + $ ls shallow3/.hg/store/data + $ cat shallow3/.hg/requires + dotencode + fncache + generaldelta + remotefilelog + revlogv1 + store + treemanifest diff --git a/tests/test-remotefilelog-clone.t b/tests/test-remotefilelog-clone.t new file mode 100644 --- /dev/null +++ b/tests/test-remotefilelog-clone.t @@ -0,0 +1,113 @@ + $ PYTHONPATH=$TESTDIR/..:$PYTHONPATH + $ export PYTHONPATH + + $ . "$TESTDIR/remotefilelog-library.sh" + + $ hginit master + $ cd master + $ cat >> .hg/hgrc < [remotefilelog] + > server=True + > EOF + $ echo x > x + $ hg commit -qAm x + + $ cd .. + +# shallow clone from full + + $ hgcloneshallow ssh://user@dummy/master shallow --noupdate + streaming all changes + 2 files to transfer, 227 bytes of data + transferred 227 bytes in * seconds (*/sec) (glob) + searching for changes + no changes found + $ cd shallow + $ cat .hg/requires + dotencode + fncache + generaldelta + remotefilelog + revlogv1 + store + + $ hg update + 1 files updated, 0 files merged, 0 files removed, 0 files unresolved + 1 files fetched over 1 fetches - (1 misses, 0.00% hit ratio) over *s (glob) + + $ cat x + x + + $ ls .hg/store/data + $ echo foo > f + $ hg add f + $ hg ci -m 'local content' + $ ls .hg/store/data + 4a0a19218e082a343a1b17e5333409af9d98f0f5 + + $ cd .. 
+ +# shallow clone from shallow + + $ hgcloneshallow ssh://user@dummy/shallow shallow2 --noupdate + streaming all changes + 3 files to transfer, 564 bytes of data + transferred 564 bytes in * seconds (*/sec) (glob) + searching for changes + no changes found + $ cd shallow2 + $ cat .hg/requires + dotencode + fncache + generaldelta + remotefilelog + revlogv1 + store + $ ls .hg/store/data + 4a0a19218e082a343a1b17e5333409af9d98f0f5 + + $ hg update + 2 files updated, 0 files merged, 0 files removed, 0 files unresolved + + $ cat x + x + + $ cd .. + +# full clone from shallow + +Note: the output to STDERR comes from a different process to the output on +STDOUT and their relative ordering is not deterministic. As a result, the test +was failing sporadically. To avoid this, we capture STDERR to a file and +check its contents separately. + + $ TEMP_STDERR=full-clone-from-shallow.stderr.tmp + $ hg clone --noupdate ssh://user@dummy/shallow full 2>$TEMP_STDERR + streaming all changes + remote: abort: Cannot clone from a shallow repo to a full repo. + [255] + $ cat $TEMP_STDERR + abort: pull failed on remote + $ rm $TEMP_STDERR + +# getbundle full clone + + $ printf '[server]\npreferuncompressed=False\n' >> master/.hg/hgrc + $ hgcloneshallow ssh://user@dummy/master shallow3 + requesting all changes + adding changesets + adding manifests + adding file changes + added 1 changesets with 0 changes to 0 files + new changesets b292c1e3311f + updating to branch default + 1 files updated, 0 files merged, 0 files removed, 0 files unresolved + + $ ls shallow3/.hg/store/data + $ cat shallow3/.hg/requires + dotencode + fncache + generaldelta + remotefilelog + revlogv1 + store diff --git a/tests/test-remotefilelog-corrupt-cache.t b/tests/test-remotefilelog-corrupt-cache.t new file mode 100644 --- /dev/null +++ b/tests/test-remotefilelog-corrupt-cache.t @@ -0,0 +1,73 @@ + $ PYTHONPATH=$TESTDIR/..:$PYTHONPATH + $ export PYTHONPATH + + $ . 
"$TESTDIR/remotefilelog-library.sh" + + $ hginit master + $ cd master + $ cat >> .hg/hgrc < [remotefilelog] + > server=True + > EOF + $ echo x > x + $ echo y > y + $ echo z > z + $ hg commit -qAm xy + + $ cd .. + + $ hgcloneshallow ssh://user@dummy/master shallow -q + 3 files fetched over 1 fetches - (3 misses, 0.00% hit ratio) over *s (glob) + $ cd shallow + +Verify corrupt cache handling repairs by default + + $ hg up -q null + $ chmod u+w $CACHEDIR/master/11/f6ad8ec52a2984abaafd7c3b516503785c2072/1406e74118627694268417491f018a4a883152f0 + $ echo x > $CACHEDIR/master/11/f6ad8ec52a2984abaafd7c3b516503785c2072/1406e74118627694268417491f018a4a883152f0 + $ hg up tip + 3 files updated, 0 files merged, 0 files removed, 0 files unresolved + 1 files fetched over 1 fetches - (1 misses, 0.00% hit ratio) over *s (glob) + +Verify corrupt cache error message + + $ hg up -q null + $ cat >> .hg/hgrc < [remotefilelog] + > validatecache=off + > EOF + $ chmod u+w $CACHEDIR/master/11/f6ad8ec52a2984abaafd7c3b516503785c2072/1406e74118627694268417491f018a4a883152f0 + $ echo x > $CACHEDIR/master/11/f6ad8ec52a2984abaafd7c3b516503785c2072/1406e74118627694268417491f018a4a883152f0 + $ hg up tip 2>&1 | egrep "^RuntimeError" + RuntimeError: unexpected remotefilelog header: illegal format + +Verify detection and remediation when remotefilelog.validatecachelog is set + + $ cat >> .hg/hgrc < [remotefilelog] + > validatecachelog=$PWD/.hg/remotefilelog_cache.log + > validatecache=strict + > EOF + $ chmod u+w $CACHEDIR/master/11/f6ad8ec52a2984abaafd7c3b516503785c2072/1406e74118627694268417491f018a4a883152f0 + $ echo x > $CACHEDIR/master/11/f6ad8ec52a2984abaafd7c3b516503785c2072/1406e74118627694268417491f018a4a883152f0 + $ hg up tip + 3 files updated, 0 files merged, 0 files removed, 0 files unresolved + 1 files fetched over 1 fetches - (1 misses, 0.00% hit ratio) over *s (glob) + $ cat .hg/remotefilelog_cache.log + corrupt 
$TESTTMP/hgcache/master/11/f6ad8ec52a2984abaafd7c3b516503785c2072/1406e74118627694268417491f018a4a883152f0 during contains + +Verify handling of corrupt server cache + + $ rm -f ../master/.hg/remotefilelogcache/y/076f5e2225b3ff0400b98c92aa6cdf403ee24cca + $ touch ../master/.hg/remotefilelogcache/y/076f5e2225b3ff0400b98c92aa6cdf403ee24cca + $ clearcache + $ hg prefetch -r . + 3 files fetched over 1 fetches - (3 misses, 0.00% hit ratio) over *s (glob) + $ test -s ../master/.hg/remotefilelogcache/y/076f5e2225b3ff0400b98c92aa6cdf403ee24cca + $ hg debugremotefilelog $CACHEDIR/master/95/cb0bfd2977c761298d9624e4b4d4c72a39974a/076f5e2225b3ff0400b98c92aa6cdf403ee24cca + size: 2 bytes + path: $TESTTMP/hgcache/master/95/cb0bfd2977c761298d9624e4b4d4c72a39974a/076f5e2225b3ff0400b98c92aa6cdf403ee24cca + key: 076f5e2225b3 + + node => p1 p2 linknode copyfrom + 076f5e2225b3 => 000000000000 000000000000 f3d0bb0d1e48 diff --git a/tests/test-remotefilelog-datapack.py b/tests/test-remotefilelog-datapack.py new file mode 100755 --- /dev/null +++ b/tests/test-remotefilelog-datapack.py @@ -0,0 +1,388 @@ +#!/usr/bin/env python +from __future__ import absolute_import, print_function + +import hashlib +import os +import random +import shutil +import stat +import struct +import sys +import tempfile +import time +import unittest + +import silenttestrunner + +# Load the local remotefilelog, not the system one +sys.path[0:0] = [os.path.join(os.path.dirname(__file__), '..')] +from mercurial.node import nullid +from mercurial import ( + ui as uimod, +) +from hgext.remotefilelog import ( + basepack, + constants, + datapack, +) + +class datapacktestsbase(object): + def __init__(self, datapackreader, paramsavailable): + self.datapackreader = datapackreader + self.paramsavailable = paramsavailable + + def setUp(self): + self.tempdirs = [] + + def tearDown(self): + for d in self.tempdirs: + shutil.rmtree(d) + + def makeTempDir(self): + tempdir = tempfile.mkdtemp() + self.tempdirs.append(tempdir) + 
return tempdir + + def getHash(self, content): + return hashlib.sha1(content).digest() + + def getFakeHash(self): + return ''.join(chr(random.randint(0, 255)) for _ in range(20)) + + def createPack(self, revisions=None, packdir=None, version=0): + if revisions is None: + revisions = [("filename", self.getFakeHash(), nullid, "content")] + + if packdir is None: + packdir = self.makeTempDir() + + packer = datapack.mutabledatapack( + uimod.ui(), packdir, version=version) + + for args in revisions: + filename, node, base, content = args[0:4] + # meta is optional + meta = None + if len(args) > 4: + meta = args[4] + packer.add(filename, node, base, content, metadata=meta) + + path = packer.close() + return self.datapackreader(path) + + def _testAddSingle(self, content): + """Test putting a simple blob into a pack and reading it out. + """ + filename = "foo" + node = self.getHash(content) + + revisions = [(filename, node, nullid, content)] + pack = self.createPack(revisions) + if self.paramsavailable: + self.assertEquals(pack.params.fanoutprefix, + basepack.SMALLFANOUTPREFIX) + + chain = pack.getdeltachain(filename, node) + self.assertEquals(content, chain[0][4]) + + def testAddSingle(self): + self._testAddSingle('') + + def testAddSingleEmpty(self): + self._testAddSingle('abcdef') + + def testAddMultiple(self): + """Test putting multiple unrelated blobs into a pack and reading them + out. + """ + revisions = [] + for i in range(10): + filename = "foo%s" % i + content = "abcdef%s" % i + node = self.getHash(content) + revisions.append((filename, node, self.getFakeHash(), content)) + + pack = self.createPack(revisions) + + for filename, node, base, content in revisions: + entry = pack.getdelta(filename, node) + self.assertEquals((content, filename, base, {}), entry) + + chain = pack.getdeltachain(filename, node) + self.assertEquals(content, chain[0][4]) + + def testAddDeltas(self): + """Test putting multiple delta blobs into a pack and read the chain. 
+ """ + revisions = [] + filename = "foo" + lastnode = nullid + for i in range(10): + content = "abcdef%s" % i + node = self.getHash(content) + revisions.append((filename, node, lastnode, content)) + lastnode = node + + pack = self.createPack(revisions) + + entry = pack.getdelta(filename, revisions[0][1]) + realvalue = (revisions[0][3], filename, revisions[0][2], {}) + self.assertEquals(entry, realvalue) + + # Test that the chain for the final entry has all the others + chain = pack.getdeltachain(filename, node) + for i in range(10): + content = "abcdef%s" % i + self.assertEquals(content, chain[-i - 1][4]) + + def testPackMany(self): + """Pack many related and unrelated objects. + """ + # Build a random pack file + revisions = [] + blobs = {} + random.seed(0) + for i in range(100): + filename = "filename-%s" % i + filerevs = [] + for j in range(random.randint(1, 100)): + content = "content-%s" % j + node = self.getHash(content) + lastnode = nullid + if len(filerevs) > 0: + lastnode = filerevs[random.randint(0, len(filerevs) - 1)] + filerevs.append(node) + blobs[(filename, node, lastnode)] = content + revisions.append((filename, node, lastnode, content)) + + pack = self.createPack(revisions) + + # Verify the pack contents + for (filename, node, lastnode), content in sorted(blobs.iteritems()): + chain = pack.getdeltachain(filename, node) + for entry in chain: + expectedcontent = blobs[(entry[0], entry[1], entry[3])] + self.assertEquals(entry[4], expectedcontent) + + def testPackMetadata(self): + revisions = [] + for i in range(100): + filename = '%s.txt' % i + content = 'put-something-here \n' * i + node = self.getHash(content) + meta = {constants.METAKEYFLAG: i ** 4, + constants.METAKEYSIZE: len(content), + 'Z': 'random_string', + '_': '\0' * i} + revisions.append((filename, node, nullid, content, meta)) + pack = self.createPack(revisions, version=1) + for name, node, x, content, origmeta in revisions: + parsedmeta = pack.getmeta(name, node) + # flag == 0 should be 
optimized out + if origmeta[constants.METAKEYFLAG] == 0: + del origmeta[constants.METAKEYFLAG] + self.assertEquals(parsedmeta, origmeta) + + def testPackMetadataThrows(self): + filename = '1' + content = '2' + node = self.getHash(content) + meta = {constants.METAKEYFLAG: 3} + revisions = [(filename, node, nullid, content, meta)] + try: + self.createPack(revisions, version=0) + self.assertTrue(False, "should throw if metadata is not supported") + except RuntimeError: + pass + + def testGetMissing(self): + """Test the getmissing() api. + """ + revisions = [] + filename = "foo" + lastnode = nullid + for i in range(10): + content = "abcdef%s" % i + node = self.getHash(content) + revisions.append((filename, node, lastnode, content)) + lastnode = node + + pack = self.createPack(revisions) + + missing = pack.getmissing([("foo", revisions[0][1])]) + self.assertFalse(missing) + + missing = pack.getmissing([("foo", revisions[0][1]), + ("foo", revisions[1][1])]) + self.assertFalse(missing) + + fakenode = self.getFakeHash() + missing = pack.getmissing([("foo", revisions[0][1]), ("foo", fakenode)]) + self.assertEquals(missing, [("foo", fakenode)]) + + def testAddThrows(self): + pack = self.createPack() + + try: + pack.add('filename', nullid, 'contents') + self.assertTrue(False, "datapack.add should throw") + except RuntimeError: + pass + + def testBadVersionThrows(self): + pack = self.createPack() + path = pack.path + '.datapack' + with open(path) as f: + raw = f.read() + raw = struct.pack('!B', 255) + raw[1:] + os.chmod(path, os.stat(path).st_mode | stat.S_IWRITE) + with open(path, 'w+') as f: + f.write(raw) + + try: + pack = self.datapackreader(pack.path) + self.assertTrue(False, "bad version number should have thrown") + except RuntimeError: + pass + + def testMissingDeltabase(self): + fakenode = self.getFakeHash() + revisions = [("filename", fakenode, self.getFakeHash(), "content")] + pack = self.createPack(revisions) + chain = pack.getdeltachain("filename", fakenode) + 
self.assertEquals(len(chain), 1) + + def testLargePack(self): + """Test creating and reading from a large pack with over X entries. + This causes it to use a 2^16 fanout table instead.""" + revisions = [] + blobs = {} + total = basepack.SMALLFANOUTCUTOFF + 1 + for i in xrange(total): + filename = "filename-%s" % i + content = filename + node = self.getHash(content) + blobs[(filename, node)] = content + revisions.append((filename, node, nullid, content)) + + pack = self.createPack(revisions) + if self.paramsavailable: + self.assertEquals(pack.params.fanoutprefix, + basepack.LARGEFANOUTPREFIX) + + for (filename, node), content in blobs.iteritems(): + actualcontent = pack.getdeltachain(filename, node)[0][4] + self.assertEquals(actualcontent, content) + + def testPacksCache(self): + """Test that we remember the most recent packs while fetching the delta + chain.""" + + packdir = self.makeTempDir() + deltachains = [] + + numpacks = 10 + revisionsperpack = 100 + + for i in range(numpacks): + chain = [] + revision = (str(i), self.getFakeHash(), nullid, "content") + + for _ in range(revisionsperpack): + chain.append(revision) + revision = ( + str(i), + self.getFakeHash(), + revision[1], + self.getFakeHash() + ) + + self.createPack(chain, packdir) + deltachains.append(chain) + + class testdatapackstore(datapack.datapackstore): + # Ensures that we are not keeping everything in the cache. 
+ DEFAULTCACHESIZE = numpacks / 2 + + store = testdatapackstore(uimod.ui(), packdir) + + random.shuffle(deltachains) + for randomchain in deltachains: + revision = random.choice(randomchain) + chain = store.getdeltachain(revision[0], revision[1]) + + mostrecentpack = next(iter(store.packs), None) + self.assertEquals( + mostrecentpack.getdeltachain(revision[0], revision[1]), + chain + ) + + self.assertEquals(randomchain.index(revision) + 1, len(chain)) + + # perf test off by default since it's slow + def _testIndexPerf(self): + random.seed(0) + print("Multi-get perf test") + packsizes = [ + 100, + 10000, + 100000, + 500000, + 1000000, + 3000000, + ] + lookupsizes = [ + 10, + 100, + 1000, + 10000, + 100000, + 1000000, + ] + for packsize in packsizes: + revisions = [] + for i in xrange(packsize): + filename = "filename-%s" % i + content = "content-%s" % i + node = self.getHash(content) + revisions.append((filename, node, nullid, content)) + + path = self.createPack(revisions).path + + # Perf of large multi-get + import gc + gc.disable() + pack = self.datapackreader(path) + for lookupsize in lookupsizes: + if lookupsize > packsize: + continue + random.shuffle(revisions) + findnodes = [(rev[0], rev[1]) for rev in revisions] + + start = time.time() + pack.getmissing(findnodes[:lookupsize]) + elapsed = time.time() - start + print ("%s pack %s lookups = %0.04f" % + (('%s' % packsize).rjust(7), + ('%s' % lookupsize).rjust(7), + elapsed)) + + print("") + gc.enable() + + # The perf test is meant to produce output, so we always fail the test + # so the user sees the output. 
+ raise RuntimeError("perf test always fails") + +class datapacktests(datapacktestsbase, unittest.TestCase): + def __init__(self, *args, **kwargs): + datapacktestsbase.__init__(self, datapack.datapack, True) + unittest.TestCase.__init__(self, *args, **kwargs) + +# TODO: +# datapack store: +# - getmissing +# - GC two packs into one + +if __name__ == '__main__': + silenttestrunner.main(__name__) diff --git a/tests/test-remotefilelog-gc.t b/tests/test-remotefilelog-gc.t new file mode 100644 --- /dev/null +++ b/tests/test-remotefilelog-gc.t @@ -0,0 +1,113 @@ + $ PYTHONPATH=$TESTDIR/..:$PYTHONPATH + $ export PYTHONPATH + + $ . "$TESTDIR/remotefilelog-library.sh" + + $ hginit master + $ cd master + $ cat >> .hg/hgrc < [remotefilelog] + > server=True + > serverexpiration=-1 + > EOF + $ echo x > x + $ hg commit -qAm x + $ cd .. + + $ hgcloneshallow ssh://user@dummy/master shallow -q + 1 files fetched over 1 fetches - (1 misses, 0.00% hit ratio) over *s (glob) + +# Set the prefetchdays config to zero so that all commits are prefetched +# no matter what their creation date is. + $ cd shallow + $ cat >> .hg/hgrc < [remotefilelog] + > prefetchdays=0 + > EOF + $ cd .. + +# commit a new version of x so we can gc the old one + + $ cd master + $ echo y > x + $ hg commit -qAm y + $ cd .. + + $ cd shallow + $ hg pull -q + $ hg update -q + 1 files fetched over 1 fetches - (1 misses, 0.00% hit ratio) over *s (glob) + $ cd .. 
+ +# gc client cache + + $ lastweek=`$PYTHON -c 'import datetime,time; print(datetime.datetime.fromtimestamp(time.time() - (86400 * 7)).strftime("%y%m%d%H%M"))'` + $ find $CACHEDIR -type f -exec touch -t $lastweek {} \; + + $ find $CACHEDIR -type f | sort + $TESTTMP/hgcache/master/11/f6ad8ec52a2984abaafd7c3b516503785c2072/1406e74118627694268417491f018a4a883152f0 (glob) + $TESTTMP/hgcache/master/11/f6ad8ec52a2984abaafd7c3b516503785c2072/48023ec064c1d522f0d792a5a912bb1bf7859a4a (glob) + $TESTTMP/hgcache/repos (glob) + $ hg gc + finished: removed 1 of 2 files (0.00 GB to 0.00 GB) + $ find $CACHEDIR -type f | sort + $TESTTMP/hgcache/master/11/f6ad8ec52a2984abaafd7c3b516503785c2072/48023ec064c1d522f0d792a5a912bb1bf7859a4a (glob) + $TESTTMP/hgcache/repos + +# gc server cache + + $ find master/.hg/remotefilelogcache -type f | sort + master/.hg/remotefilelogcache/x/1406e74118627694268417491f018a4a883152f0 (glob) + master/.hg/remotefilelogcache/x/48023ec064c1d522f0d792a5a912bb1bf7859a4a (glob) + $ hg gc master + finished: removed 0 of 1 files (0.00 GB to 0.00 GB) + $ find master/.hg/remotefilelogcache -type f | sort + master/.hg/remotefilelogcache/x/48023ec064c1d522f0d792a5a912bb1bf7859a4a (glob) + +# Test that GC keepset includes pullprefetch revset if it is configured + + $ cd shallow + $ cat >> .hg/hgrc < [remotefilelog] + > pullprefetch=all() + > EOF + $ hg prefetch + 1 files fetched over 1 fetches - (1 misses, 0.00% hit ratio) over *s (glob) + + $ cd .. 
+ $ hg gc + finished: removed 0 of 2 files (0.00 GB to 0.00 GB) + +# Ensure that there are 2 versions of the file in cache + $ find $CACHEDIR -type f | sort + $TESTTMP/hgcache/master/11/f6ad8ec52a2984abaafd7c3b516503785c2072/1406e74118627694268417491f018a4a883152f0 (glob) + $TESTTMP/hgcache/master/11/f6ad8ec52a2984abaafd7c3b516503785c2072/48023ec064c1d522f0d792a5a912bb1bf7859a4a (glob) + $TESTTMP/hgcache/repos (glob) + +# Test that if garbage collection on repack and repack on hg gc flags are set then incremental repack with garbage collector is run + + $ hg gc --config remotefilelog.gcrepack=True --config remotefilelog.repackonhggc=True + +# Ensure that loose files are repacked + $ find $CACHEDIR -type f | sort + $TESTTMP/hgcache/master/packs/8d3499c65d926e4f107cf03c6b0df833222025b4.histidx + $TESTTMP/hgcache/master/packs/8d3499c65d926e4f107cf03c6b0df833222025b4.histpack + $TESTTMP/hgcache/master/packs/9c7046f8cad0417c39aa7c03ce13e0ba991306c2.dataidx + $TESTTMP/hgcache/master/packs/9c7046f8cad0417c39aa7c03ce13e0ba991306c2.datapack + $TESTTMP/hgcache/master/packs/repacklock + $TESTTMP/hgcache/repos + +# Test that warning is displayed when there are no valid repos in repofile + + $ cp $CACHEDIR/repos $CACHEDIR/repos.bak + $ echo " " > $CACHEDIR/repos + $ hg gc + warning: no valid repos in repofile + $ mv $CACHEDIR/repos.bak $CACHEDIR/repos + +# Test that warning is displayed when the repo path is malformed + + $ printf "asdas\0das" >> $CACHEDIR/repos + $ hg gc 2>&1 | head -n2 + warning: malformed path: * (glob) + Traceback (most recent call last): diff --git a/tests/test-remotefilelog-gcrepack.t b/tests/test-remotefilelog-gcrepack.t new file mode 100644 --- /dev/null +++ b/tests/test-remotefilelog-gcrepack.t @@ -0,0 +1,160 @@ + $ PYTHONPATH=$TESTDIR/..:$PYTHONPATH + $ export PYTHONPATH + + $ . 
"$TESTDIR/remotefilelog-library.sh" + + $ hginit master + $ cd master + $ cat >> .hg/hgrc < [remotefilelog] + > server=True + > EOF + $ echo x > x + $ hg commit -qAm x + $ echo y > y + $ rm x + $ hg commit -qAm DxAy + $ echo yy > y + $ hg commit -qAm y + $ cd .. + + $ hgcloneshallow ssh://user@dummy/master shallow -q + 1 files fetched over 1 fetches - (1 misses, 0.00% hit ratio) over *s (glob) + +# Set the prefetchdays config to zero so that all commits are prefetched +# no matter what their creation date is. + $ cd shallow + $ cat >> .hg/hgrc < [remotefilelog] + > prefetchdays=0 + > EOF + $ cd .. + +# Prefetch all data and repack + + $ cd shallow + $ cat >> .hg/hgrc < [remotefilelog] + > bgprefetchrevs=all() + > EOF + + $ hg prefetch + 2 files fetched over 1 fetches - (2 misses, 0.00% hit ratio) over *s (glob) + $ hg repack + $ sleep 0.5 + $ hg debugwaitonrepack >/dev/null 2>%1 + + $ find $CACHEDIR | sort | grep ".datapack\|.histpack" + $TESTTMP/hgcache/master/packs/9a2ea858fe2967db9b6ea4c0ca238881cae9d6eb.histpack + $TESTTMP/hgcache/master/packs/f7a942a6e4673d2c7b697fdd926ca2d153831ca4.datapack + +# Ensure that all file versions were prefetched + + $ hg debugdatapack $TESTTMP/hgcache/master/packs/f7a942a6e4673d2c7b697fdd926ca2d153831ca4.datapack + $TESTTMP/hgcache/master/packs/f7a942a6e4673d2c7b697fdd926ca2d153831ca4: + x: + Node Delta Base Delta Length Blob Size + 1406e7411862 000000000000 2 2 + + Total: 2 2 (0.0% bigger) + y: + Node Delta Base Delta Length Blob Size + 50dbc4572b8e 000000000000 3 3 + 076f5e2225b3 50dbc4572b8e 14 2 + + Total: 17 5 (240.0% bigger) + +# Test garbage collection during repack + + $ cat >> .hg/hgrc < [remotefilelog] + > bgprefetchrevs=tip + > gcrepack=True + > nodettl=86400 + > EOF + + $ hg repack + $ sleep 0.5 + $ hg debugwaitonrepack >/dev/null 2>%1 + + $ find $CACHEDIR | sort | grep ".datapack\|.histpack" + $TESTTMP/hgcache/master/packs/05baa499c6b07f2bf0ea3d2c8151da1cb86f5e33.datapack + 
$TESTTMP/hgcache/master/packs/9a2ea858fe2967db9b6ea4c0ca238881cae9d6eb.histpack + +# Ensure that file 'x' was garbage collected. It should be GCed because it is not in the keepset +# and is old (commit date is 0.0 in tests). Ensure that file 'y' is present as it is in the keepset. + + $ hg debugdatapack $TESTTMP/hgcache/master/packs/05baa499c6b07f2bf0ea3d2c8151da1cb86f5e33.datapack + $TESTTMP/hgcache/master/packs/05baa499c6b07f2bf0ea3d2c8151da1cb86f5e33: + y: + Node Delta Base Delta Length Blob Size + 50dbc4572b8e 000000000000 3 3 + + Total: 3 3 (0.0% bigger) + +# Prefetch all data again and repack for later garbage collection + + $ cat >> .hg/hgrc <<EOF + > [remotefilelog] + > bgprefetchrevs=all() + > EOF + + $ hg prefetch + 2 files fetched over 1 fetches - (2 misses, 0.00% hit ratio) over *s (glob) + $ hg repack + $ sleep 0.5 + $ hg debugwaitonrepack >/dev/null 2>&1 + + $ find $CACHEDIR | sort | grep ".datapack\|.histpack" + $TESTTMP/hgcache/master/packs/9a2ea858fe2967db9b6ea4c0ca238881cae9d6eb.histpack + $TESTTMP/hgcache/master/packs/f7a942a6e4673d2c7b697fdd926ca2d153831ca4.datapack + +# Ensure that all file versions were prefetched + + $ hg debugdatapack $TESTTMP/hgcache/master/packs/f7a942a6e4673d2c7b697fdd926ca2d153831ca4.datapack + $TESTTMP/hgcache/master/packs/f7a942a6e4673d2c7b697fdd926ca2d153831ca4: + x: + Node Delta Base Delta Length Blob Size + 1406e7411862 000000000000 2 2 + + Total: 2 2 (0.0% bigger) + y: + Node Delta Base Delta Length Blob Size + 50dbc4572b8e 000000000000 3 3 + 076f5e2225b3 50dbc4572b8e 14 2 + + Total: 17 5 (240.0% bigger) + +# Test garbage collection during repack. Ensure that new files are not removed even though they are not in the keepset. +# For the purposes of the test the TTL of a file is set to current time + 100 seconds, i.e. all commits in tests have +# a date of 1970 and therefore to prevent garbage collection we have to set nodettl to be farther from 1970 than we are now.
+ + $ cat >> .hg/hgrc <<EOF + > [remotefilelog] + > bgprefetchrevs= + > nodettl=$(($(date +%s) + 100)) + > EOF + + $ hg repack + $ sleep 0.5 + $ hg debugwaitonrepack >/dev/null 2>&1 + + $ find $CACHEDIR | sort | grep ".datapack\|.histpack" + $TESTTMP/hgcache/master/packs/9a2ea858fe2967db9b6ea4c0ca238881cae9d6eb.histpack + $TESTTMP/hgcache/master/packs/f7a942a6e4673d2c7b697fdd926ca2d153831ca4.datapack + +# Ensure that all file versions were prefetched + + $ hg debugdatapack $TESTTMP/hgcache/master/packs/f7a942a6e4673d2c7b697fdd926ca2d153831ca4.datapack + $TESTTMP/hgcache/master/packs/f7a942a6e4673d2c7b697fdd926ca2d153831ca4: + x: + Node Delta Base Delta Length Blob Size + 1406e7411862 000000000000 2 2 + + Total: 2 2 (0.0% bigger) + y: + Node Delta Base Delta Length Blob Size + 50dbc4572b8e 000000000000 3 3 + 076f5e2225b3 50dbc4572b8e 14 2 + + Total: 17 5 (240.0% bigger) diff --git a/tests/test-remotefilelog-histpack.py b/tests/test-remotefilelog-histpack.py new file mode 100755 --- /dev/null +++ b/tests/test-remotefilelog-histpack.py @@ -0,0 +1,274 @@ +#!/usr/bin/env python +from __future__ import absolute_import + +import hashlib +import os +import random +import shutil +import stat +import struct +import sys +import tempfile +import unittest + +import silenttestrunner + +from mercurial.node import nullid +from mercurial import ( + ui as uimod, +) +# Load the local remotefilelog, not the system one +sys.path[0:0] = [os.path.join(os.path.dirname(__file__), '..')] +from hgext.remotefilelog import ( + basepack, + historypack, +) + +class histpacktests(unittest.TestCase): + def setUp(self): + self.tempdirs = [] + + def tearDown(self): + for d in self.tempdirs: + shutil.rmtree(d) + + def makeTempDir(self): + tempdir = tempfile.mkdtemp() + self.tempdirs.append(tempdir) + return tempdir + + def getHash(self, content): + return hashlib.sha1(content).digest() + + def getFakeHash(self): + return ''.join(chr(random.randint(0, 255)) for _ in range(20)) + + def createPack(self,
revisions=None): + """Creates and returns a historypack containing the specified revisions. + + `revisions` is a list of tuples, where each tuple contains a filename, + node, p1node, p2node, linknode, and copyfrom. + """ + if revisions is None: + revisions = [("filename", self.getFakeHash(), nullid, nullid, + self.getFakeHash(), None)] + + packdir = self.makeTempDir() + packer = historypack.mutablehistorypack(uimod.ui(), packdir, + version=1) + + for filename, node, p1, p2, linknode, copyfrom in revisions: + packer.add(filename, node, p1, p2, linknode, copyfrom) + + path = packer.close() + return historypack.historypack(path) + + def testAddSingle(self): + """Test putting a single entry into a pack and reading it out. + """ + filename = "foo" + node = self.getFakeHash() + p1 = self.getFakeHash() + p2 = self.getFakeHash() + linknode = self.getFakeHash() + + revisions = [(filename, node, p1, p2, linknode, None)] + pack = self.createPack(revisions) + + actual = pack.getancestors(filename, node)[node] + self.assertEquals(p1, actual[0]) + self.assertEquals(p2, actual[1]) + self.assertEquals(linknode, actual[2]) + + def testAddMultiple(self): + """Test putting multiple unrelated revisions into a pack and reading + them out. + """ + revisions = [] + for i in range(10): + filename = "foo-%s" % i + node = self.getFakeHash() + p1 = self.getFakeHash() + p2 = self.getFakeHash() + linknode = self.getFakeHash() + revisions.append((filename, node, p1, p2, linknode, None)) + + pack = self.createPack(revisions) + + for filename, node, p1, p2, linknode, copyfrom in revisions: + actual = pack.getancestors(filename, node)[node] + self.assertEquals(p1, actual[0]) + self.assertEquals(p2, actual[1]) + self.assertEquals(linknode, actual[2]) + self.assertEquals(copyfrom, actual[3]) + + def testAddAncestorChain(self): + """Test putting multiple revisions into a pack and reading the ancestor + chain.
+ """ + revisions = [] + filename = "foo" + lastnode = nullid + for i in range(10): + node = self.getFakeHash() + revisions.append((filename, node, lastnode, nullid, nullid, None)) + lastnode = node + + # revisions must be added in topological order, newest first + revisions = list(reversed(revisions)) + pack = self.createPack(revisions) + + # Test that the chain has all the entries + ancestors = pack.getancestors(revisions[0][0], revisions[0][1]) + for filename, node, p1, p2, linknode, copyfrom in revisions: + ap1, ap2, alinknode, acopyfrom = ancestors[node] + self.assertEquals(ap1, p1) + self.assertEquals(ap2, p2) + self.assertEquals(alinknode, linknode) + self.assertEquals(acopyfrom, copyfrom) + + def testPackMany(self): + """Pack many related and unrelated ancestors. + """ + # Build a random pack file + allentries = {} + ancestorcounts = {} + revisions = [] + random.seed(0) + for i in range(100): + filename = "filename-%s" % i + entries = [] + p2 = nullid + linknode = nullid + for j in range(random.randint(1, 100)): + node = self.getFakeHash() + p1 = nullid + if len(entries) > 0: + p1 = entries[random.randint(0, len(entries) - 1)] + entries.append(node) + revisions.append((filename, node, p1, p2, linknode, None)) + allentries[(filename, node)] = (p1, p2, linknode) + if p1 == nullid: + ancestorcounts[(filename, node)] = 1 + else: + newcount = ancestorcounts[(filename, p1)] + 1 + ancestorcounts[(filename, node)] = newcount + + # Must add file entries in reverse topological order + revisions = list(reversed(revisions)) + pack = self.createPack(revisions) + + # Verify the pack contents + for (filename, node), (p1, p2, lastnode) in allentries.iteritems(): + ancestors = pack.getancestors(filename, node) + self.assertEquals(ancestorcounts[(filename, node)], + len(ancestors)) + for anode, (ap1, ap2, alinknode, copyfrom) in ancestors.iteritems(): + ep1, ep2, elinknode = allentries[(filename, anode)] + self.assertEquals(ap1, ep1) + self.assertEquals(ap2, ep2) + 
self.assertEquals(alinknode, elinknode) + self.assertEquals(copyfrom, None) + + def testGetNodeInfo(self): + revisions = [] + filename = "foo" + lastnode = nullid + for i in range(10): + node = self.getFakeHash() + revisions.append((filename, node, lastnode, nullid, nullid, None)) + lastnode = node + + pack = self.createPack(revisions) + + # Test that getnodeinfo returns the expected results + for filename, node, p1, p2, linknode, copyfrom in revisions: + ap1, ap2, alinknode, acopyfrom = pack.getnodeinfo(filename, node) + self.assertEquals(ap1, p1) + self.assertEquals(ap2, p2) + self.assertEquals(alinknode, linknode) + self.assertEquals(acopyfrom, copyfrom) + + def testGetMissing(self): + """Test the getmissing() api. + """ + revisions = [] + filename = "foo" + for i in range(10): + node = self.getFakeHash() + p1 = self.getFakeHash() + p2 = self.getFakeHash() + linknode = self.getFakeHash() + revisions.append((filename, node, p1, p2, linknode, None)) + + pack = self.createPack(revisions) + + missing = pack.getmissing([(filename, revisions[0][1])]) + self.assertFalse(missing) + + missing = pack.getmissing([(filename, revisions[0][1]), + (filename, revisions[1][1])]) + self.assertFalse(missing) + + fakenode = self.getFakeHash() + missing = pack.getmissing([(filename, revisions[0][1]), + (filename, fakenode)]) + self.assertEquals(missing, [(filename, fakenode)]) + + # Test getmissing on a nonexistent filename + missing = pack.getmissing([("bar", fakenode)]) + self.assertEquals(missing, [("bar", fakenode)]) + + def testAddThrows(self): + pack = self.createPack() + + try: + pack.add('filename', nullid, nullid, nullid, nullid, None) + self.assertTrue(False, "historypack.add should throw") + except RuntimeError: + pass + + def testBadVersionThrows(self): + pack = self.createPack() + path = pack.path + '.histpack' + with open(path) as f: + raw = f.read() + raw = struct.pack('!B', 255) + raw[1:] + os.chmod(path, os.stat(path).st_mode | stat.S_IWRITE) + with open(path, 
'w+') as f: + f.write(raw) + + try: + pack = historypack.historypack(pack.path) + self.assertTrue(False, "bad version number should have thrown") + except RuntimeError: + pass + + def testLargePack(self): + """Test creating and reading from a large pack with over X entries. + This causes it to use a 2^16 fanout table instead.""" + total = basepack.SMALLFANOUTCUTOFF + 1 + revisions = [] + for i in xrange(total): + filename = "foo-%s" % i + node = self.getFakeHash() + p1 = self.getFakeHash() + p2 = self.getFakeHash() + linknode = self.getFakeHash() + revisions.append((filename, node, p1, p2, linknode, None)) + + pack = self.createPack(revisions) + self.assertEquals(pack.params.fanoutprefix, basepack.LARGEFANOUTPREFIX) + + for filename, node, p1, p2, linknode, copyfrom in revisions: + actual = pack.getancestors(filename, node)[node] + self.assertEquals(p1, actual[0]) + self.assertEquals(p2, actual[1]) + self.assertEquals(linknode, actual[2]) + self.assertEquals(copyfrom, actual[3]) +# TODO: +# histpack store: +# - repack two packs into one + +if __name__ == '__main__': + silenttestrunner.main(__name__) diff --git a/tests/test-remotefilelog-http.t b/tests/test-remotefilelog-http.t new file mode 100644 --- /dev/null +++ b/tests/test-remotefilelog-http.t @@ -0,0 +1,98 @@ + $ PYTHONPATH=$TESTDIR/..:$PYTHONPATH + $ export PYTHONPATH + + $ . "$TESTDIR/remotefilelog-library.sh" + + $ hginit master + $ cd master + $ cat >> .hg/hgrc <<EOF + > [remotefilelog] + > server=True + > EOF + $ echo x > x + $ echo y > y + $ hg commit -qAm x + $ hg serve -p $HGPORT -d --pid-file=../hg1.pid -E ../error.log -A ../access.log + +Build a query string for later use: + $ GET=`hg debugdata -m 0 | $PYTHON -c \ + > 'import sys ; print [("?cmd=getfile&file=%s&node=%s" % tuple(s.split("\0"))) for s in sys.stdin.read().splitlines()][0]'` + + $ cd .. 
+ $ cat hg1.pid >> $DAEMON_PIDS + + $ hgcloneshallow http://localhost:$HGPORT/ shallow -q + 2 files fetched over 1 fetches - (2 misses, 0.00% hit ratio) over *s (glob) + + $ grep getfile access.log + * "GET /?cmd=batch HTTP/1.1" 200 - x-hgarg-1:cmds=getfile+*node%3D1406e74118627694268417491f018a4a883152f0* (glob) + +Clear filenode cache so we can test fetching with a modified batch size + $ rm -r $TESTTMP/hgcache +Now do a fetch with a large batch size so we're sure it works + $ hgcloneshallow http://localhost:$HGPORT/ shallow-large-batch \ + > --config remotefilelog.batchsize=1000 -q + 2 files fetched over 1 fetches - (2 misses, 0.00% hit ratio) over *s (glob) + +The 'remotefilelog' capability should *not* be exported over http(s), +as the getfile method it offers doesn't work with http. + $ get-with-headers.py localhost:$HGPORT '?cmd=capabilities' | grep lookup | identifyrflcaps + getfile + getflogheads + + $ get-with-headers.py localhost:$HGPORT '?cmd=hello' | grep lookup | identifyrflcaps + getfile + getflogheads + + $ get-with-headers.py localhost:$HGPORT '?cmd=this-command-does-not-exist' | head -n 1 + 400 no such method: this-command-does-not-exist + $ get-with-headers.py localhost:$HGPORT '?cmd=getfiles' | head -n 1 + 400 no such method: getfiles + +Verify serving from a shallow clone doesn't allow for remotefile +fetches. This also serves to test the error handling for our batchable +getfile RPC. + + $ cd shallow + $ hg serve -p $HGPORT1 -d --pid-file=../hg2.pid -E ../error2.log + $ cd .. + $ cat hg2.pid >> $DAEMON_PIDS + +This GET should work, because this server is serving master, which is +a full clone. 
+ + $ get-with-headers.py localhost:$HGPORT "$GET" + 200 Script output follows + + 0\x00U\x00\x00\x00\xff (esc) + 2\x00x (esc) + \x14\x06\xe7A\x18bv\x94&\x84\x17I\x1f\x01\x8aJ\x881R\xf0\x00\x01\x00\x14\xf0\x06T\xd8\xef\x99"\x04\xd01\xe6\xa6i\xf4~\x98\xb3\xe3Dw>T\x00 (no-eol) (esc) + +This GET should fail using the in-band signalling mechanism, because +it's not a full clone. Note that it's also plausible for servers to +refuse to serve file contents for other reasons, like the file +contents not being visible to the current user. + + $ get-with-headers.py localhost:$HGPORT1 "$GET" + 200 Script output follows + + 1\x00cannot fetch remote files from shallow repo (no-eol) (esc) + +Clones should work with httppostargs turned on + + $ cd master + $ hg --config experimental.httppostargs=1 serve -p $HGPORT2 -d --pid-file=../hg3.pid -E ../error3.log + + $ cd .. + $ cat hg3.pid >> $DAEMON_PIDS + +Clear filenode cache so we can test fetching with a modified batch size + $ rm -r $TESTTMP/hgcache + + $ hgcloneshallow http://localhost:$HGPORT2/ shallow-postargs -q + 2 files fetched over 1 fetches - (2 misses, 0.00% hit ratio) over *s (glob) + +All error logs should be empty: + $ cat error.log + $ cat error2.log + $ cat error3.log diff --git a/tests/test-remotefilelog-keepset.t b/tests/test-remotefilelog-keepset.t new file mode 100644 --- /dev/null +++ b/tests/test-remotefilelog-keepset.t @@ -0,0 +1,40 @@ + $ PYTHONPATH=$TESTDIR/..:$PYTHONPATH + $ export PYTHONPATH + + $ . "$TESTDIR/remotefilelog-library.sh" + + $ hginit master + $ cd master + $ cat >> .hg/hgrc <<EOF + > [remotefilelog] + > server=True + > serverexpiration=-1 + > EOF + $ echo x > x + $ hg commit -qAm x + $ echo y > y + $ hg commit -qAm y + $ echo z > z + $ hg commit -qAm z + $ cd .. 
+ + $ hgcloneshallow ssh://user@dummy/master shallow -q + 3 files fetched over 1 fetches - (3 misses, 0.00% hit ratio) over *s (glob) + +# Compute keepset for 0th and 2nd commit, which implies that we do not process +# the 1st commit, therefore we diff 2nd manifest with the 0th manifest and +# populate the keepkeys from the diff + $ cd shallow + $ cat >> .hg/hgrc <<EOF + > [remotefilelog] + > pullprefetch=0+2 + > EOF + $ hg debugkeepset + +# Compute keepset for all commits, which implies that we only process deltas of +# manifests of commits 1 and 2 and therefore populate the keepkeys from deltas + $ cat >> .hg/hgrc <<EOF + > [remotefilelog] + > pullprefetch=all() + > EOF + $ hg debugkeepset diff --git a/tests/test-remotefilelog-linknodes.t b/tests/test-remotefilelog-linknodes.t new file mode 100644 --- /dev/null +++ b/tests/test-remotefilelog-linknodes.t @@ -0,0 +1,195 @@ + $ PYTHONPATH=$TESTDIR/..:$PYTHONPATH + $ export PYTHONPATH + +# Tests for the complicated linknode logic in remotefilelog.py::ancestormap() + + $ . "$TESTDIR/remotefilelog-library.sh" + + $ hginit master + $ cd master + $ cat >> .hg/hgrc <<EOF + > [remotefilelog] + > server=True + > serverexpiration=-1 + > EOF + $ echo x > x + $ hg commit -qAm x + $ cd .. 
+ + $ hgcloneshallow ssh://user@dummy/master shallow -q + 1 files fetched over 1 fetches - (1 misses, 0.00% hit ratio) over *s (glob) + +# Rebase produces correct log -f linknodes + + $ cd shallow + $ echo y > y + $ hg commit -qAm y + $ hg up 0 + 0 files updated, 0 files merged, 1 files removed, 0 files unresolved + $ echo x >> x + $ hg commit -qAm xx + $ hg log -f x --template "{node|short}\n" + 0632994590a8 + b292c1e3311f + + $ hg rebase -d 1 + rebasing 2:0632994590a8 "xx" (tip) + saved backup bundle to $TESTTMP/shallow/.hg/strip-backup/0632994590a8-0bc786d8-rebase.hg (glob) + $ hg log -f x --template "{node|short}\n" + 81deab2073bc + b292c1e3311f + +# Rebase back, log -f still works + + $ hg rebase -d 0 -r 2 + rebasing 2:81deab2073bc "xx" (tip) + saved backup bundle to $TESTTMP/shallow/.hg/strip-backup/81deab2073bc-80cb4fda-rebase.hg (glob) + $ hg log -f x --template "{node|short}\n" + b3fca10fb42d + b292c1e3311f + + $ hg rebase -d 1 -r 2 + rebasing 2:b3fca10fb42d "xx" (tip) + saved backup bundle to $TESTTMP/shallow/.hg/strip-backup/b3fca10fb42d-da73a0c7-rebase.hg (glob) + + $ cd .. + +# Reset repos + $ clearcache + + $ rm -rf master + $ rm -rf shallow + $ hginit master + $ cd master + $ cat >> .hg/hgrc <<EOF + > [remotefilelog] + > server=True + > serverexpiration=-1 + > EOF + $ echo x > x + $ hg commit -qAm x + $ cd .. 
+ + $ hgcloneshallow ssh://user@dummy/master shallow -q + 1 files fetched over 1 fetches - (1 misses, 0.00% hit ratio) over *s (glob) + +# Rebase stack onto landed commit + + $ cd master + $ echo x >> x + $ hg commit -Aqm xx + + $ cd ../shallow + $ echo x >> x + $ hg commit -Aqm xx2 + $ echo y >> x + $ hg commit -Aqm xxy + + $ hg pull -q + $ hg rebase -d tip + rebasing 1:4549721d828f "xx2" + note: rebase of 1:4549721d828f created no changes to commit + rebasing 2:5ef6d97e851c "xxy" + saved backup bundle to $TESTTMP/shallow/.hg/strip-backup/4549721d828f-b084e33c-rebase.hg (glob) + $ hg log -f x --template '{node|short}\n' + 4ae8e31c85ef + 0632994590a8 + b292c1e3311f + + $ cd .. + +# system cache has invalid linknode, but .hg/store/data has valid + + $ cd shallow + $ hg strip -r 1 -q + $ rm -rf .hg/store/data/* + $ echo x >> x + $ hg commit -Aqm xx_local + $ hg log -f x --template '{rev}:{node|short}\n' + 1:21847713771d + 0:b292c1e3311f + + $ cd .. + $ rm -rf shallow + +/* Local linknode is invalid; remote linknode is valid (formerly slow case) */ + + $ hgcloneshallow ssh://user@dummy/master shallow -q + 1 files fetched over 1 fetches - (1 misses, 0.00% hit ratio) over * (glob) + $ cd shallow + $ echo x >> x + $ hg commit -Aqm xx2 + $ cd ../master + $ echo y >> y + $ hg commit -Aqm yy2 + $ echo x >> x + $ hg commit -Aqm xx2-fake-rebased + $ echo y >> y + $ hg commit -Aqm yy3 + $ cd ../shallow + $ hg pull --config remotefilelog.debug=True + pulling from ssh://user@dummy/master + searching for changes + adding changesets + adding manifests + adding file changes + added 3 changesets with 0 changes to 0 files (+1 heads) + new changesets 01979f9404f8:7200df4e0aca + (run 'hg heads' to see heads, 'hg merge' to merge) + $ hg update tip -q + 1 files fetched over 1 fetches - (1 misses, 0.00% hit ratio) over *s (glob) + $ echo x > x + $ hg commit -qAm xx3 + +# At this point, the linknode points to c1254e70bad1 instead of 32e6611f6149 + $ hg log -G -T '{node|short} {desc} 
{phase} {files}\n' + @ a5957b6bf0bd xx3 draft x + | + o 7200df4e0aca yy3 public y + | + o 32e6611f6149 xx2-fake-rebased public x + | + o 01979f9404f8 yy2 public y + | + | o c1254e70bad1 xx2 draft x + |/ + o 0632994590a8 xx public x + | + o b292c1e3311f x public x + +# Check the contents of the local blob for incorrect linknode + $ hg debugremotefilelog .hg/store/data/11f6ad8ec52a2984abaafd7c3b516503785c2072/d4a3ed9310e5bd9887e3bf779da5077efab28216 + size: 6 bytes + path: .hg/store/data/11f6ad8ec52a2984abaafd7c3b516503785c2072/d4a3ed9310e5bd9887e3bf779da5077efab28216 + key: d4a3ed9310e5 + + node => p1 p2 linknode copyfrom + d4a3ed9310e5 => aee31534993a 000000000000 c1254e70bad1 + aee31534993a => 1406e7411862 000000000000 0632994590a8 + 1406e7411862 => 000000000000 000000000000 b292c1e3311f + +# Verify that we do a fetch on the first log (remote blob fetch for linkrev fix) + $ hg log -f x -T '{node|short} {desc} {phase} {files}\n' + a5957b6bf0bd xx3 draft x + 32e6611f6149 xx2-fake-rebased public x + 0632994590a8 xx public x + b292c1e3311f x public x + 1 files fetched over 1 fetches - (1 misses, 0.00% hit ratio) over *s (glob) + +# But not after that + $ hg log -f x -T '{node|short} {desc} {phase} {files}\n' + a5957b6bf0bd xx3 draft x + 32e6611f6149 xx2-fake-rebased public x + 0632994590a8 xx public x + b292c1e3311f x public x + +# Check the contents of the remote blob for correct linknode + $ hg debugremotefilelog $CACHEDIR/master/11/f6ad8ec52a2984abaafd7c3b516503785c2072/d4a3ed9310e5bd9887e3bf779da5077efab28216 + size: 6 bytes + path: $TESTTMP/hgcache/master/11/f6ad8ec52a2984abaafd7c3b516503785c2072/d4a3ed9310e5bd9887e3bf779da5077efab28216 + key: d4a3ed9310e5 + + node => p1 p2 linknode copyfrom + d4a3ed9310e5 => aee31534993a 000000000000 32e6611f6149 + aee31534993a => 1406e7411862 000000000000 0632994590a8 + 1406e7411862 => 000000000000 000000000000 b292c1e3311f diff --git a/tests/test-remotefilelog-local.t b/tests/test-remotefilelog-local.t new file mode 100644 --- 
/dev/null +++ b/tests/test-remotefilelog-local.t @@ -0,0 +1,208 @@ + $ PYTHONPATH=$TESTDIR/..:$PYTHONPATH + $ export PYTHONPATH + + $ . "$TESTDIR/remotefilelog-library.sh" + + $ hginit master + $ cd master + $ cat >> .hg/hgrc <<EOF + > [remotefilelog] + > server=True + > EOF + $ echo x > x + $ echo y > y + $ echo z > z + $ hg commit -qAm xy + + $ cd .. + + $ hgcloneshallow ssh://user@dummy/master shallow -q + 3 files fetched over 1 fetches - (3 misses, 0.00% hit ratio) over *s (glob) + $ cd shallow + +# status + + $ clearcache + $ echo xx > x + $ echo yy > y + $ touch a + $ hg status + M x + M y + ? a + 1 files fetched over 1 fetches - (1 misses, 0.00% hit ratio) over *s (glob) + $ hg add a + $ hg status + M x + M y + A a + +# diff + + $ hg debugrebuilddirstate # fixes dirstate non-determinism + $ hg add a + $ clearcache + $ hg diff + diff -r f3d0bb0d1e48 x + --- a/x* (glob) + +++ b/x* (glob) + @@ -1,1 +1,1 @@ + -x + +xx + diff -r f3d0bb0d1e48 y + --- a/y* (glob) + +++ b/y* (glob) + @@ -1,1 +1,1 @@ + -y + +yy + 3 files fetched over 1 fetches - (3 misses, 0.00% hit ratio) over *s (glob) + +# local commit + + $ clearcache + $ echo a > a + $ echo xxx > x + $ echo yyy > y + $ hg commit -m a + ? files fetched over 1 fetches - (? misses, 0.00% hit ratio) over *s (glob) + +# local commit where the dirstate is clean -- ensure that we do just one fetch +# (update to a commit on the server first) + + $ hg --config debug.dirstate.delaywrite=1 up 0 + 2 files updated, 0 files merged, 1 files removed, 0 files unresolved + $ clearcache + $ hg debugdirstate + n 644 2 * x (glob) + n 644 2 * y (glob) + n 644 2 * z (glob) + $ echo xxxx > x + $ echo yyyy > y + $ hg commit -m x + created new head + 2 files fetched over 1 fetches - (2 misses, 0.00% hit ratio) over *s (glob) + +# restore state for future tests + + $ hg -q strip . 
+ $ hg -q up tip + +# rebase + + $ clearcache + $ cd ../master + $ echo w > w + $ hg commit -qAm w + + $ cd ../shallow + $ hg pull + pulling from ssh://user@dummy/master + searching for changes + adding changesets + adding manifests + adding file changes + added 1 changesets with 0 changes to 0 files (+1 heads) + new changesets fed61014d323 + (run 'hg heads' to see heads, 'hg merge' to merge) + + $ hg rebase -d tip + rebasing 1:9abfe7bca547 "a" + saved backup bundle to $TESTTMP/shallow/.hg/strip-backup/9abfe7bca547-8b11e5ff-rebase.hg (glob) + 3 files fetched over 2 fetches - (3 misses, 0.00% hit ratio) over *s (glob) + +# strip + + $ clearcache + $ hg debugrebuilddirstate # fixes dirstate non-determinism + $ hg strip -r . + 2 files updated, 0 files merged, 1 files removed, 0 files unresolved + saved backup bundle to $TESTTMP/shallow/.hg/strip-backup/19edf50f4de7-df3d0f74-backup.hg (glob) + 4 files fetched over 2 fetches - (4 misses, 0.00% hit ratio) over *s (glob) + +# unbundle + + $ clearcache + $ ls + w + x + y + z + + $ hg debugrebuilddirstate # fixes dirstate non-determinism + $ hg unbundle .hg/strip-backup/19edf50f4de7-df3d0f74-backup.hg + adding changesets + adding manifests + adding file changes + added 1 changesets with 0 changes to 0 files + new changesets 19edf50f4de7 (1 drafts) + (run 'hg update' to get a working copy) + + $ hg up + 3 files updated, 0 files merged, 0 files removed, 0 files unresolved + 4 files fetched over 1 fetches - (4 misses, 0.00% hit ratio) over *s (glob) + $ cat a + a + +# revert + + $ clearcache + $ hg revert -r .~2 y z + no changes needed to z + 2 files fetched over 2 fetches - (2 misses, 0.00% hit ratio) over *s (glob) + $ hg checkout -C -r . -q + +# explicit bundle should produce full bundle file + + $ hg bundle -r 2 --base 1 ../local.bundle + 1 changesets found + 1 files fetched over 1 fetches - (1 misses, 0.00% hit ratio) over *s (glob) + $ cd .. 
+ + $ hgcloneshallow ssh://user@dummy/master shallow2 -q + 1 files fetched over 1 fetches - (1 misses, 0.00% hit ratio) over *s (glob) + $ cd shallow2 + $ hg unbundle ../local.bundle + adding changesets + adding manifests + adding file changes + added 1 changesets with 3 changes to 3 files + new changesets 19edf50f4de7 (1 drafts) + (run 'hg update' to get a working copy) + + $ hg log -r 2 --stat + changeset: 2:19edf50f4de7 + tag: tip + user: test + date: Thu Jan 01 00:00:00 1970 +0000 + summary: a + + a | 1 + + x | 2 +- + y | 2 +- + 3 files changed, 3 insertions(+), 2 deletions(-) + +# Merge + + $ echo merge >> w + $ hg commit -m w + created new head + $ hg merge 2 + 3 files updated, 0 files merged, 0 files removed, 0 files unresolved + (branch merge, don't forget to commit) + $ hg commit -m merge + $ hg strip -q -r ".^" + +# commit without producing new node + + $ cd $TESTTMP + $ hgcloneshallow ssh://user@dummy/master shallow3 -q + $ cd shallow3 + $ echo 1 > A + $ hg commit -m foo -A A + $ hg log -r . -T '{node}\n' + 383ce605500277f879b7460a16ba620eb6930b7f + $ hg update -r '.^' -q + $ echo 1 > A + $ hg commit -m foo -A A + $ hg log -r . -T '{node}\n' + 383ce605500277f879b7460a16ba620eb6930b7f diff --git a/tests/test-remotefilelog-log.t b/tests/test-remotefilelog-log.t new file mode 100644 --- /dev/null +++ b/tests/test-remotefilelog-log.t @@ -0,0 +1,118 @@ + $ PYTHONPATH=$TESTDIR/..:$PYTHONPATH + $ export PYTHONPATH + + $ . "$TESTDIR/remotefilelog-library.sh" + + $ hginit master + $ cd master + $ cat >> .hg/hgrc <<EOF + > [remotefilelog] + > server=True + > EOF + $ echo x > x + $ hg commit -qAm x + $ mkdir dir + $ echo y > dir/y + $ hg commit -qAm y + + $ cd .. 
+ +Shallow clone from full + + $ hgcloneshallow ssh://user@dummy/master shallow --noupdate + streaming all changes + 2 files to transfer, 473 bytes of data + transferred 473 bytes in * seconds (*/sec) (glob) + searching for changes + no changes found + $ cd shallow + $ cat .hg/requires + dotencode + fncache + generaldelta + remotefilelog + revlogv1 + store + + $ hg update + 2 files updated, 0 files merged, 0 files removed, 0 files unresolved + 2 files fetched over 1 fetches - (2 misses, 0.00% hit ratio) over *s (glob) + +Log on a file without -f + + $ hg log dir/y + warning: file log can be slow on large repos - use -f to speed it up + changeset: 1:2e73264fab97 + tag: tip + user: test + date: Thu Jan 01 00:00:00 1970 +0000 + summary: y + +Log on a file with -f + + $ hg log -f dir/y + changeset: 1:2e73264fab97 + tag: tip + user: test + date: Thu Jan 01 00:00:00 1970 +0000 + summary: y + +Log on a file with kind in path + $ hg log -r "filelog('path:dir/y')" + changeset: 1:2e73264fab97 + tag: tip + user: test + date: Thu Jan 01 00:00:00 1970 +0000 + summary: y + +Log on multiple files with -f + + $ hg log -f dir/y x + changeset: 1:2e73264fab97 + tag: tip + user: test + date: Thu Jan 01 00:00:00 1970 +0000 + summary: y + + changeset: 0:b292c1e3311f + user: test + date: Thu Jan 01 00:00:00 1970 +0000 + summary: x + +Log on a directory + + $ hg log dir + changeset: 1:2e73264fab97 + tag: tip + user: test + date: Thu Jan 01 00:00:00 1970 +0000 + summary: y + +Log on a file from inside a directory + + $ cd dir + $ hg log y + warning: file log can be slow on large repos - use -f to speed it up + changeset: 1:2e73264fab97 + tag: tip + user: test + date: Thu Jan 01 00:00:00 1970 +0000 + summary: y + +Log on a file via -fr + $ cd .. 
+ $ hg log -fr tip dir/ --template '{rev}\n' + 1 + +Trace renames + $ hg mv x z + $ hg commit -m move + $ hg log -f z -T '{desc}\n' -G + @ move + : + o x + + +Verify remotefilelog handles rename metadata stripping when comparing file sizes + $ hg debugrebuilddirstate + $ hg status diff --git a/tests/test-remotefilelog-partial-shallow.t b/tests/test-remotefilelog-partial-shallow.t new file mode 100644 --- /dev/null +++ b/tests/test-remotefilelog-partial-shallow.t @@ -0,0 +1,76 @@ + $ PYTHONPATH=$TESTDIR/..:$PYTHONPATH + $ export PYTHONPATH + + $ . "$TESTDIR/remotefilelog-library.sh" + + $ hginit master + $ cd master + $ cat >> .hg/hgrc <<EOF + > [remotefilelog] + > server=True + > EOF + $ echo x > foo + $ echo y > bar + $ hg commit -qAm one + + $ cd .. + +# partial shallow clone + + $ hg clone --shallow ssh://user@dummy/master shallow --noupdate --config remotefilelog.includepattern=foo + streaming all changes + 3 files to transfer, 336 bytes of data + transferred 336 bytes in * seconds (*/sec) (glob) + searching for changes + no changes found + $ cat >> shallow/.hg/hgrc <<EOF + > [remotefilelog] + > cachepath=$PWD/hgcache + > debug=True + > includepattern=foo + > reponame = master + > [extensions] + > remotefilelog= + > EOF + $ ls shallow/.hg/store/data + bar.i + +# update partial clone + + $ cd shallow + $ hg update + 2 files updated, 0 files merged, 0 files removed, 0 files unresolved + 1 files fetched over 1 fetches - (1 misses, 0.00% hit ratio) over *s (glob) + $ cat foo + x + $ cat bar + y + $ cd .. 
+ +# pull partial clone + + $ cd master + $ echo a >> foo + $ echo b >> bar + $ hg commit -qm two + $ cd ../shallow + $ hg pull + pulling from ssh://user@dummy/master + searching for changes + adding changesets + adding manifests + adding file changes + added 1 changesets with 0 changes to 0 files + new changesets a9688f18cb91 + (run 'hg update' to get a working copy) + $ hg update + 2 files updated, 0 files merged, 0 files removed, 0 files unresolved + 1 files fetched over 1 fetches - (1 misses, 0.00% hit ratio) over *s (glob) + $ cat foo + x + a + $ cat bar + y + b + + $ cd .. diff --git a/tests/test-remotefilelog-permissions.t b/tests/test-remotefilelog-permissions.t new file mode 100644 --- /dev/null +++ b/tests/test-remotefilelog-permissions.t @@ -0,0 +1,47 @@ + $ PYTHONPATH=$TESTDIR/..:$PYTHONPATH + $ export PYTHONPATH + + $ . "$TESTDIR/remotefilelog-library.sh" + + $ hginit master + $ cd master + $ cat >> .hg/hgrc <<EOF + > [remotefilelog] + > server=True + > EOF + $ echo x > x + $ hg commit -qAm x + + $ cd .. + + $ hgcloneshallow ssh://user@dummy/master shallow -q + 1 files fetched over 1 fetches - (1 misses, 0.00% hit ratio) over *s (glob) + + $ cd master + $ echo xx > x + $ hg commit -qAm x2 + $ cd .. + +# Test cache misses with read only permissions on server + + $ chmod -R a-w master/.hg/remotefilelogcache + $ cd shallow + $ hg pull -q + $ hg update + 1 files updated, 0 files merged, 0 files removed, 0 files unresolved + 1 files fetched over 1 fetches - (1 misses, 0.00% hit ratio) over *s (glob) + $ cd .. 
+ + $ chmod -R u+w master/.hg/remotefilelogcache + +# Test setting up shared cache with the right permissions +# (this is hard to test in a cross platform way, so we just make sure nothing +# crashes) + + $ rm -rf $CACHEDIR + $ umask 002 + $ mkdir $CACHEDIR + $ hg -q clone --shallow ssh://user@dummy/master shallow2 --config remotefilelog.cachegroup="`id -g -n`" + 1 files fetched over 1 fetches - (1 misses, 0.00% hit ratio) over * (glob) + $ ls -ld $CACHEDIR/11 + drwxrws* $TESTTMP/hgcache/11 (glob) diff --git a/tests/test-remotefilelog-prefetch.t b/tests/test-remotefilelog-prefetch.t new file mode 100644 --- /dev/null +++ b/tests/test-remotefilelog-prefetch.t @@ -0,0 +1,266 @@ + $ PYTHONPATH=$TESTDIR/..:$PYTHONPATH + $ export PYTHONPATH + + $ . "$TESTDIR/remotefilelog-library.sh" + + $ hginit master + $ cd master + $ cat >> .hg/hgrc <<EOF + > [remotefilelog] + > server=True + > EOF + $ echo x > x + $ echo z > z + $ hg commit -qAm x + $ echo x2 > x + $ echo y > y + $ hg commit -qAm y + $ hg bookmark foo + + $ cd .. 
+ +# prefetch a revision + + $ hgcloneshallow ssh://user@dummy/master shallow --noupdate + streaming all changes + 2 files to transfer, 528 bytes of data + transferred 528 bytes in * seconds (*/sec) (glob) + searching for changes + no changes found + $ cd shallow + + $ hg prefetch -r 0 + 2 files fetched over 1 fetches - (2 misses, 0.00% hit ratio) over *s (glob) + + $ hg cat -r 0 x + x + +# prefetch with base + + $ clearcache + $ hg prefetch -r 0::1 -b 0 + 2 files fetched over 1 fetches - (2 misses, 0.00% hit ratio) over *s (glob) + + $ hg cat -r 1 x + x2 + $ hg cat -r 1 y + y + + $ hg cat -r 0 x + x + 1 files fetched over 1 fetches - (1 misses, 0.00% hit ratio) over *s (glob) + + $ hg cat -r 0 z + z + 1 files fetched over 1 fetches - (1 misses, 0.00% hit ratio) over *s (glob) + + $ hg prefetch -r 0::1 --base 0 + $ hg prefetch -r 0::1 -b 1 + $ hg prefetch -r 0::1 + +# prefetch a range of revisions + + $ clearcache + $ hg prefetch -r 0::1 + 4 files fetched over 1 fetches - (4 misses, 0.00% hit ratio) over *s (glob) + + $ hg cat -r 0 x + x + $ hg cat -r 1 x + x2 + +# prefetch certain files + + $ clearcache + $ hg prefetch -r 1 x + 1 files fetched over 1 fetches - (1 misses, 0.00% hit ratio) over *s (glob) + + $ hg cat -r 1 x + x2 + + $ hg cat -r 1 y + y + 1 files fetched over 1 fetches - (1 misses, 0.00% hit ratio) over *s (glob) + +# prefetch on pull when configured + + $ printf "[remotefilelog]\npullprefetch=bookmark()\n" >> .hg/hgrc + $ hg strip tip + saved backup bundle to $TESTTMP/shallow/.hg/strip-backup/109c3a557a73-3f43405e-backup.hg (glob) + + $ clearcache + $ hg pull + pulling from ssh://user@dummy/master + searching for changes + adding changesets + adding manifests + adding file changes + added 1 changesets with 0 changes to 0 files + updating bookmark foo + new changesets 109c3a557a73 + (run 'hg update' to get a working copy) + prefetching file contents + 3 files fetched over 1 fetches - (3 misses, 0.00% hit ratio) over *s (glob) + + $ hg up tip + 3 
files updated, 0 files merged, 0 files removed, 0 files unresolved + +# prefetch only fetches changes not in working copy + + $ hg strip tip + 1 files updated, 0 files merged, 1 files removed, 0 files unresolved + saved backup bundle to $TESTTMP/shallow/.hg/strip-backup/109c3a557a73-3f43405e-backup.hg (glob) + 1 files fetched over 1 fetches - (1 misses, 0.00% hit ratio) over *s (glob) + $ clearcache + + $ hg pull + pulling from ssh://user@dummy/master + searching for changes + adding changesets + adding manifests + adding file changes + added 1 changesets with 0 changes to 0 files + updating bookmark foo + new changesets 109c3a557a73 + (run 'hg update' to get a working copy) + prefetching file contents + 2 files fetched over 1 fetches - (2 misses, 0.00% hit ratio) over *s (glob) + +# Make some local commits that produce the same file versions as are on the +# server. To simulate a situation where we have local commits that were somehow +# pushed, and we will soon pull. + + $ hg prefetch -r 'all()' + 2 files fetched over 1 fetches - (2 misses, 0.00% hit ratio) over *s (glob) + $ hg strip -q -r 0 + $ echo x > x + $ echo z > z + $ hg commit -qAm x + $ echo x2 > x + $ echo y > y + $ hg commit -qAm y + +# prefetch server versions, even if local versions are available + + $ clearcache + $ hg strip -q tip + $ hg pull + pulling from ssh://user@dummy/master + searching for changes + adding changesets + adding manifests + adding file changes + added 1 changesets with 0 changes to 0 files + updating bookmark foo + new changesets 109c3a557a73 + 1 local changesets published (?) + (run 'hg update' to get a working copy) + prefetching file contents + 2 files fetched over 1 fetches - (2 misses, 0.00% hit ratio) over *s (glob) + + $ cd .. 
+ +# Prefetch unknown files during checkout + + $ hgcloneshallow ssh://user@dummy/master shallow2 + streaming all changes + 2 files to transfer, 528 bytes of data + transferred 528 bytes in * seconds * (glob) + searching for changes + no changes found + updating to branch default + 3 files updated, 0 files merged, 0 files removed, 0 files unresolved + 1 files fetched over 1 fetches - (1 misses, 0.00% hit ratio) over * (glob) + $ cd shallow2 + $ hg up -q null + $ echo x > x + $ echo y > y + $ echo z > z + $ clearcache + $ hg up tip + x: untracked file differs + 3 files fetched over 1 fetches - (3 misses, 0.00% hit ratio) over * (glob) + abort: untracked files in working directory differ from files in requested revision + [255] + $ hg revert --all + +# Test batch fetching of lookup files during hg status + $ hg up --clean tip + 3 files updated, 0 files merged, 0 files removed, 0 files unresolved + $ hg debugrebuilddirstate + $ clearcache + $ hg status + 3 files fetched over 1 fetches - (3 misses, 0.00% hit ratio) over * (glob) + +# Prefetch during addrename detection + $ hg up -q --clean tip + $ hg revert --all + $ mv x x2 + $ mv y y2 + $ mv z z2 + $ clearcache + $ hg addremove -s 50 > /dev/null + 3 files fetched over 1 fetches - (3 misses, 0.00% hit ratio) over * (glob) + + $ cd .. + +# Prefetch packs + $ hgcloneshallow ssh://user@dummy/master packprefetch + streaming all changes + 2 files to transfer, 528 bytes of data + transferred 528 bytes in * seconds (*/sec) (glob) + searching for changes + no changes found + updating to branch default + 3 files updated, 0 files merged, 0 files removed, 0 files unresolved + $ cd packprefetch + $ cat >> .hg/hgrc <<EOF + > [remotefilelog] + > fetchpacks=True + > backgroundrepack=True + > EOF + $ clearcache + $ hg prefetch -r . 
+ 3 files fetched over 1 fetches - (0 misses, 100.00% hit ratio) over * (glob) + $ find $TESTTMP/hgcache -type f | sort + $TESTTMP/hgcache/master/packs/47d8f1b90a73af4ff8af19fcd10bdc027b6a881a.histidx + $TESTTMP/hgcache/master/packs/47d8f1b90a73af4ff8af19fcd10bdc027b6a881a.histpack + $TESTTMP/hgcache/master/packs/8c654541e4f20141a894bbfe428e36fc92202e39.dataidx + $TESTTMP/hgcache/master/packs/8c654541e4f20141a894bbfe428e36fc92202e39.datapack + $ hg cat -r . x + x2 + $ hg cat -r . y + y + $ hg cat -r . z + z + +# Prefetch packs that include renames + $ cd ../master + $ hg mv z z2 + $ hg commit -m 'move z -> z2' + $ cd ../packprefetch + $ hg pull -q + (running background incremental repack) + $ hg prefetch -r tip + 1 files fetched over 1 fetches - (0 misses, 100.00% hit ratio) over * (glob) + $ hg up tip -q + $ hg log -f z2 -T '{desc}\n' + move z -> z2 + x + +# Revert across double renames. Note: the scary "abort" error is caused by +# https://bz.mercurial-scm.org/5419. + + $ clearcache + $ hg mv y y2 + $ hg mv x x2 + $ hg mv z2 z3 + $ hg revert -a -r 1 || true + forgetting x2 + forgetting y2 + forgetting z3 + adding z + undeleting x + undeleting y + 3 files fetched over 1 fetches - (0 misses, 100.00% hit ratio) over * (glob) + abort: z2@109c3a557a73: not found in manifest! (?) diff --git a/tests/test-remotefilelog-pull-noshallow.t b/tests/test-remotefilelog-pull-noshallow.t new file mode 100644 --- /dev/null +++ b/tests/test-remotefilelog-pull-noshallow.t @@ -0,0 +1,80 @@ + $ PYTHONPATH=$TESTDIR/..:$PYTHONPATH + $ export PYTHONPATH + + $ . "$TESTDIR/remotefilelog-library.sh" + +Set up an extension to make sure remotefilelog clientsetup() runs +unconditionally even if we have never used a local shallow repo. +This mimics behavior when using remotefilelog with chg. clientsetup() can be +triggered due to a shallow repo, and then the code can later interact with +non-shallow repositories. 
+ + $ cat > setupremotefilelog.py << EOF + > from mercurial import extensions + > def extsetup(ui): + >     remotefilelog = extensions.find('remotefilelog') + >     remotefilelog.onetimeclientsetup(ui) + > EOF + +Set up the master repository to pull from. + + $ hginit master + $ cd master + $ cat >> .hg/hgrc <<EOF + > [remotefilelog] + > server=True + > EOF + $ echo x > x + $ hg commit -qAm x + + $ cd .. + + $ hg clone ssh://user@dummy/master child -q + +We should see the remotefilelog capability here, which advertises that +the server supports our custom getfiles method. + + $ cd master + $ echo 'hello' | hg -R . serve --stdio | grep capa | identifyrflcaps + getfile + getflogheads + remotefilelog + $ echo 'capabilities' | hg -R . serve --stdio | identifyrflcaps ; echo + getfile + getflogheads + remotefilelog + + +Pull to the child repository. Use our custom setupremotefilelog extension +to ensure that remotefilelog.onetimeclientsetup() gets triggered. (Without +using chg it normally would not be run in this case since the local repository +is not shallow.) + + $ echo y > y + $ hg commit -qAm y + + $ cd ../child + $ hg pull --config extensions.setuprfl=$TESTTMP/setupremotefilelog.py + pulling from ssh://user@dummy/master + searching for changes + adding changesets + adding manifests + adding file changes + added 1 changesets with 1 changes to 1 files + new changesets d34c38483be9 + (run 'hg update' to get a working copy) + + $ hg up + 1 files updated, 0 files merged, 0 files removed, 0 files unresolved + + $ cat y + y + +Test that bundle works in a non-remotefilelog repo w/ remotefilelog loaded + + $ echo y >> y + $ hg commit -qAm "modify y" + $ hg bundle --base ".^" --rev . mybundle.hg --config extensions.setuprfl=$TESTTMP/setupremotefilelog.py + 1 changesets found + + $ cd .. 
diff --git a/tests/test-remotefilelog-push-pull.t b/tests/test-remotefilelog-push-pull.t new file mode 100644 --- /dev/null +++ b/tests/test-remotefilelog-push-pull.t @@ -0,0 +1,230 @@ + $ PYTHONPATH=$TESTDIR/..:$PYTHONPATH + $ export PYTHONPATH + + $ . "$TESTDIR/remotefilelog-library.sh" + + $ hginit master + $ cd master + $ cat >> .hg/hgrc <<EOF + > [remotefilelog] + > server=True + > EOF + $ echo x > x + $ hg commit -qAm x + + $ cd .. + + $ hgcloneshallow ssh://user@dummy/master shallow -q + 1 files fetched over 1 fetches - (1 misses, 0.00% hit ratio) over *s (glob) + $ hgcloneshallow ssh://user@dummy/master shallow2 -q + +We should see the remotefilelog capability here, which advertises that +the server supports our custom getfiles method. + + $ cd master + $ echo 'hello' | hg -R . serve --stdio | grep capa | identifyrflcaps + getfile + getflogheads + remotefilelog + $ echo 'capabilities' | hg -R . serve --stdio | identifyrflcaps ; echo + getfile + getflogheads + remotefilelog + +# pull to shallow from full + + $ echo y > y + $ hg commit -qAm y + + $ cd ../shallow + $ hg pull + pulling from ssh://user@dummy/master + searching for changes + adding changesets + adding manifests + adding file changes + added 1 changesets with 0 changes to 0 files + new changesets d34c38483be9 + (run 'hg update' to get a working copy) + + $ hg up + 1 files updated, 0 files merged, 0 files removed, 0 files unresolved + 1 files fetched over 1 fetches - (1 misses, 0.00% hit ratio) over *s (glob) + + $ cat y + y + + $ cd .. 
+ +# pull from shallow to shallow (local) + + $ cd shallow + $ echo z > z + $ hg commit -qAm z + $ echo x >> x + $ echo y >> y + $ hg commit -qAm xxyy + $ cd ../shallow2 + $ clearcache + $ hg pull ../shallow + pulling from ../shallow + searching for changes + adding changesets + adding manifests + adding file changes + added 3 changesets with 4 changes to 3 files + new changesets d34c38483be9:d7373980d475 (2 drafts) + (run 'hg update' to get a working copy) + 2 files fetched over 2 fetches - (2 misses, 0.00% hit ratio) over *s (glob) + +# pull from shallow to shallow (ssh) + + $ hg strip -r 1 + saved backup bundle to $TESTTMP/shallow2/.hg/strip-backup/d34c38483be9-89d325c9-backup.hg (glob) + $ hg pull ssh://user@dummy/$TESTTMP/shallow --config remotefilelog.cachepath=${CACHEDIR}2 + pulling from ssh://user@dummy/$TESTTMP/shallow + searching for changes + adding changesets + adding manifests + adding file changes + added 3 changesets with 4 changes to 3 files + new changesets d34c38483be9:d7373980d475 (2 drafts) + (run 'hg update' to get a working copy) + 1 files fetched over 1 fetches - (1 misses, 0.00% hit ratio) over *s (glob) + + $ hg up + 3 files updated, 0 files merged, 0 files removed, 0 files unresolved + $ cat z + z + + $ hg -R ../shallow strip -qr 3 + $ hg strip -qr 3 + $ cd .. 
+ +# push from shallow to shallow + + $ cd shallow + $ echo a > a + $ hg commit -qAm a + $ hg push ssh://user@dummy/$TESTTMP/shallow2 + pushing to ssh://user@dummy/$TESTTMP/shallow2 + searching for changes + remote: adding changesets + remote: adding manifests + remote: adding file changes + remote: added 1 changesets with 1 changes to 1 files + + $ cd ../shallow2 + $ hg up + 1 files updated, 0 files merged, 0 files removed, 0 files unresolved + $ cat a + a + +# verify files are read-only + + $ ls -l .hg/store/data + total * (glob) + drwxrwxr-x* 11f6ad8ec52a2984abaafd7c3b516503785c2072 (glob) + drwxrwxr-x* 395df8f7c51f007019cb30201c49e884b46b92fa (glob) + drwxrwxr-x* 86f7e437faa5a7fce15d1ddcb9eaeaea377667b8 (glob) + drwxrwxr-x* 95cb0bfd2977c761298d9624e4b4d4c72a39974a (glob) + $ ls -l .hg/store/data/395df8f7c51f007019cb30201c49e884b46b92fa + total * (glob) + -r--r--r--* 69a1b67522704ec122181c0890bd16e9d3e7516a (glob) + -r--r--r--* 69a1b67522704ec122181c0890bd16e9d3e7516a_old (glob) + $ cd .. + +# push from shallow to full + + $ cd shallow + $ hg push + pushing to ssh://user@dummy/master + searching for changes + remote: adding changesets + remote: adding manifests + remote: adding file changes + remote: added 2 changesets with 2 changes to 2 files + + $ cd ../master + $ hg log -l 1 --style compact + 3[tip] 1489bbbc46f0 1970-01-01 00:00 +0000 test + a + + $ hg up + 2 files updated, 0 files merged, 0 files removed, 0 files unresolved + $ cat a + a + +# push public commits + + $ cd ../shallow + $ echo p > p + $ hg commit -qAm p + $ hg phase -f -p -r . + $ echo d > d + $ hg commit -qAm d + + $ cd ../shallow2 + $ hg pull ../shallow + pulling from ../shallow + searching for changes + adding changesets + adding manifests + adding file changes + added 2 changesets with 2 changes to 2 files + new changesets 3a2e32c04641:cedeb4167c1f (1 drafts) + 2 local changesets published (?) + (run 'hg update' to get a working copy) + + $ cd .. 
+ +# Test pushing from shallow to shallow with multiple manifests introducing the +# same filenode. Test this by constructing two separate histories of file 'c' +# that share a file node and verifying that the history works after pushing. + + $ hginit multimf-master + $ hgcloneshallow ssh://user@dummy/multimf-master multimf-shallow -q + $ hgcloneshallow ssh://user@dummy/multimf-master multimf-shallow2 -q + $ cd multimf-shallow + $ echo a > a + $ hg commit -qAm a + $ echo b > b + $ hg commit -qAm b + $ echo c > c + $ hg commit -qAm c1 + $ hg up -q 0 + $ echo c > c + $ hg commit -qAm c2 + $ echo cc > c + $ hg commit -qAm c22 + $ hg log -G -T '{rev} {desc}\n' + @ 4 c22 + | + o 3 c2 + | + | o 2 c1 + | | + | o 1 b + |/ + o 0 a + + + $ cd ../multimf-shallow2 +- initial commit to prevent hg pull from being a clone + $ echo z > z && hg commit -qAm z + $ hg pull -f ssh://user@dummy/$TESTTMP/multimf-shallow + pulling from ssh://user@dummy/$TESTTMP/multimf-shallow + searching for changes + warning: repository is unrelated + requesting all changes + adding changesets + adding manifests + adding file changes + added 5 changesets with 4 changes to 3 files (+2 heads) + new changesets cb9a9f314b8b:d8f06a4c6d38 (5 drafts) + (run 'hg heads' to see heads, 'hg merge' to merge) + + $ hg up -q 5 + $ hg log -f -T '{rev}\n' c + 5 + 4 diff --git a/tests/test-remotefilelog-repack-fast.t b/tests/test-remotefilelog-repack-fast.t new file mode 100644 --- /dev/null +++ b/tests/test-remotefilelog-repack-fast.t @@ -0,0 +1,402 @@ + $ PYTHONPATH=$TESTDIR/..:$PYTHONPATH + $ export PYTHONPATH + + $ . "$TESTDIR/remotefilelog-library.sh" + + $ cat >> $HGRCPATH <<EOF + > [remotefilelog] + > fastdatapack=True + > EOF + + $ hginit master + $ cd master + $ cat >> .hg/hgrc <<EOF + > [remotefilelog] + > server=True + > serverexpiration=-1 + > EOF + $ echo x > x + $ hg commit -qAm x + $ echo x >> x + $ hg commit -qAm x2 + $ cd .. 
+ + $ hgcloneshallow ssh://user@dummy/master shallow -q + 1 files fetched over 1 fetches - (1 misses, 0.00% hit ratio) over *s (glob) + +# Set the prefetchdays config to zero so that all commits are prefetched +# no matter what their creation date is. + $ cd shallow + $ cat >> .hg/hgrc <<EOF + > [remotefilelog] + > prefetchdays=0 + > EOF + $ cd .. + +# Test that repack cleans up the old files and creates new packs + + $ cd shallow + $ find $CACHEDIR | sort + $TESTTMP/hgcache + $TESTTMP/hgcache/master + $TESTTMP/hgcache/master/11 + $TESTTMP/hgcache/master/11/f6ad8ec52a2984abaafd7c3b516503785c2072 + $TESTTMP/hgcache/master/11/f6ad8ec52a2984abaafd7c3b516503785c2072/aee31534993a501858fb6dd96a065671922e7d51 + $TESTTMP/hgcache/repos + + $ hg repack + + $ find $CACHEDIR | sort + $TESTTMP/hgcache + $TESTTMP/hgcache/master + $TESTTMP/hgcache/master/packs + $TESTTMP/hgcache/master/packs/276d308429d0303762befa376788300f0310f90e.histidx + $TESTTMP/hgcache/master/packs/276d308429d0303762befa376788300f0310f90e.histpack + $TESTTMP/hgcache/master/packs/8e25dec685d5e0bb1f1b39df3acebda0e0d75c6e.dataidx + $TESTTMP/hgcache/master/packs/8e25dec685d5e0bb1f1b39df3acebda0e0d75c6e.datapack + $TESTTMP/hgcache/master/packs/repacklock + $TESTTMP/hgcache/repos + +# Test that the packs are readonly + $ ls_l $CACHEDIR/master/packs + -r--r--r-- 1145 276d308429d0303762befa376788300f0310f90e.histidx + -r--r--r-- 172 276d308429d0303762befa376788300f0310f90e.histpack + -r--r--r-- 1074 8e25dec685d5e0bb1f1b39df3acebda0e0d75c6e.dataidx + -r--r--r-- 69 8e25dec685d5e0bb1f1b39df3acebda0e0d75c6e.datapack + -rw-r--r-- 0 repacklock + +# Test that the data in the new packs is accessible + $ hg cat -r . x + x + x + +# Test that adding new data and repacking it results in the loose data and the +# old packs being combined. 
+ + $ cd ../master + $ echo x >> x + $ hg commit -m x3 + $ cd ../shallow + $ hg pull -q + $ hg up -q tip + 1 files fetched over 1 fetches - (1 misses, 0.00% hit ratio) over * (glob) + + $ find $CACHEDIR -type f | sort + $TESTTMP/hgcache/master/11/f6ad8ec52a2984abaafd7c3b516503785c2072/d4a3ed9310e5bd9887e3bf779da5077efab28216 + $TESTTMP/hgcache/master/packs/276d308429d0303762befa376788300f0310f90e.histidx + $TESTTMP/hgcache/master/packs/276d308429d0303762befa376788300f0310f90e.histpack + $TESTTMP/hgcache/master/packs/8e25dec685d5e0bb1f1b39df3acebda0e0d75c6e.dataidx + $TESTTMP/hgcache/master/packs/8e25dec685d5e0bb1f1b39df3acebda0e0d75c6e.datapack + $TESTTMP/hgcache/master/packs/repacklock + $TESTTMP/hgcache/repos + + $ hg repack --traceback + + $ find $CACHEDIR -type f | sort + $TESTTMP/hgcache/master/packs/077e7ce5dfe862dc40cc8f3c9742d96a056865f2.histidx + $TESTTMP/hgcache/master/packs/077e7ce5dfe862dc40cc8f3c9742d96a056865f2.histpack + $TESTTMP/hgcache/master/packs/935861cae0be6ce41a0d47a529e4d097e9e68a69.dataidx + $TESTTMP/hgcache/master/packs/935861cae0be6ce41a0d47a529e4d097e9e68a69.datapack + $TESTTMP/hgcache/master/packs/repacklock + $TESTTMP/hgcache/repos + +# Verify all the file data is still available + $ hg cat -r . 
x + x + x + x + $ hg cat -r '.^' x + x + x + +# Test that repacking again without new data does not delete the pack files +# and did not change the pack names + $ hg repack + $ find $CACHEDIR -type f | sort + $TESTTMP/hgcache/master/packs/077e7ce5dfe862dc40cc8f3c9742d96a056865f2.histidx + $TESTTMP/hgcache/master/packs/077e7ce5dfe862dc40cc8f3c9742d96a056865f2.histpack + $TESTTMP/hgcache/master/packs/935861cae0be6ce41a0d47a529e4d097e9e68a69.dataidx + $TESTTMP/hgcache/master/packs/935861cae0be6ce41a0d47a529e4d097e9e68a69.datapack + $TESTTMP/hgcache/master/packs/repacklock + $TESTTMP/hgcache/repos + +# Run two repacks at once + $ hg repack --config "hooks.prerepack=sleep 3" & + $ sleep 1 + $ hg repack + skipping repack - another repack is already running + $ hg debugwaitonrepack >/dev/null 2>&1 + +# Run repack in the background + $ cd ../master + $ echo x >> x + $ hg commit -m x4 + $ cd ../shallow + $ hg pull -q + $ hg up -q tip + 1 files fetched over 1 fetches - (1 misses, 0.00% hit ratio) over * (glob) + $ find $CACHEDIR -type f | sort + $TESTTMP/hgcache/master/11/f6ad8ec52a2984abaafd7c3b516503785c2072/1bb2e6237e035c8f8ef508e281f1ce075bc6db72 + $TESTTMP/hgcache/master/packs/077e7ce5dfe862dc40cc8f3c9742d96a056865f2.histidx + $TESTTMP/hgcache/master/packs/077e7ce5dfe862dc40cc8f3c9742d96a056865f2.histpack + $TESTTMP/hgcache/master/packs/935861cae0be6ce41a0d47a529e4d097e9e68a69.dataidx + $TESTTMP/hgcache/master/packs/935861cae0be6ce41a0d47a529e4d097e9e68a69.datapack + $TESTTMP/hgcache/master/packs/repacklock + $TESTTMP/hgcache/repos + + $ hg repack --background + (running background repack) + $ sleep 0.5 + $ hg debugwaitonrepack >/dev/null 2>&1 + $ find $CACHEDIR -type f | sort + $TESTTMP/hgcache/master/packs/094b530486dad4427a0faf6bcbc031571b99ca24.histidx + $TESTTMP/hgcache/master/packs/094b530486dad4427a0faf6bcbc031571b99ca24.histpack + $TESTTMP/hgcache/master/packs/8fe685c56f6f7edf550bfcec74eeecc5f3c2ba15.dataidx + 
$TESTTMP/hgcache/master/packs/8fe685c56f6f7edf550bfcec74eeecc5f3c2ba15.datapack + $TESTTMP/hgcache/master/packs/repacklock + $TESTTMP/hgcache/repos + +# Test debug commands + + $ hg debugdatapack $TESTTMP/hgcache/master/packs/*.datapack + $TESTTMP/hgcache/master/packs/8fe685c56f6f7edf550bfcec74eeecc5f3c2ba15: + x: + Node Delta Base Delta Length Blob Size + 1bb2e6237e03 000000000000 8 8 + d4a3ed9310e5 1bb2e6237e03 12 6 + aee31534993a d4a3ed9310e5 12 4 + + Total: 32 18 (77.8% bigger) + $ hg debugdatapack --long $TESTTMP/hgcache/master/packs/*.datapack + $TESTTMP/hgcache/master/packs/8fe685c56f6f7edf550bfcec74eeecc5f3c2ba15: + x: + Node Delta Base Delta Length Blob Size + 1bb2e6237e035c8f8ef508e281f1ce075bc6db72 0000000000000000000000000000000000000000 8 8 + d4a3ed9310e5bd9887e3bf779da5077efab28216 1bb2e6237e035c8f8ef508e281f1ce075bc6db72 12 6 + aee31534993a501858fb6dd96a065671922e7d51 d4a3ed9310e5bd9887e3bf779da5077efab28216 12 4 + + Total: 32 18 (77.8% bigger) + $ hg debugdatapack $TESTTMP/hgcache/master/packs/*.datapack --node d4a3ed9310e5bd9887e3bf779da5077efab28216 + $TESTTMP/hgcache/master/packs/8fe685c56f6f7edf550bfcec74eeecc5f3c2ba15: + + x + Node Delta Base Delta SHA1 Delta Length + d4a3ed9310e5bd9887e3bf779da5077efab28216 1bb2e6237e035c8f8ef508e281f1ce075bc6db72 77029ab56e83ea2115dd53ff87483682abe5d7ca 12 + Node Delta Base Delta SHA1 Delta Length + 1bb2e6237e035c8f8ef508e281f1ce075bc6db72 0000000000000000000000000000000000000000 7ca8c71a64f7b56380e77573da2f7a5fdd2ecdb5 8 + $ hg debughistorypack $TESTTMP/hgcache/master/packs/*.histidx + + x + Node P1 Node P2 Node Link Node Copy From + 1bb2e6237e03 d4a3ed9310e5 000000000000 0b03bbc9e1e7 + d4a3ed9310e5 aee31534993a 000000000000 421535db10b6 + aee31534993a 1406e7411862 000000000000 a89d614e2364 + 1406e7411862 000000000000 000000000000 b292c1e3311f + +# Test copy tracing from a pack + $ cd ../master + $ hg mv x y + $ hg commit -m 'move x to y' + $ cd ../shallow + $ hg pull -q + $ hg up -q tip + 1 files fetched 
over 1 fetches - (1 misses, 0.00% hit ratio) over * (glob) + $ hg repack + $ hg log -f y -T '{desc}\n' + move x to y + x4 + x3 + x2 + x + +# Test copy trace across rename and back + $ cp -R $TESTTMP/hgcache/master/packs $TESTTMP/backuppacks + $ cd ../master + $ hg mv y x + $ hg commit -m 'move y back to x' + $ hg revert -r 0 x + $ mv x y + $ hg add y + $ echo >> y + $ hg revert x + $ hg commit -m 'add y back without metadata' + $ cd ../shallow + $ hg pull -q + $ hg up -q tip + 2 files fetched over 2 fetches - (2 misses, 0.00% hit ratio) over * (glob) + $ hg repack + $ ls $TESTTMP/hgcache/master/packs + e8fdf7ae22b772dcc291f905b9c6e5f381d28739.dataidx + e8fdf7ae22b772dcc291f905b9c6e5f381d28739.datapack + ebbd7411e00456c0eec8d1150a77e2b3ef490f3f.histidx + ebbd7411e00456c0eec8d1150a77e2b3ef490f3f.histpack + repacklock + $ hg debughistorypack $TESTTMP/hgcache/master/packs/*.histidx + + x + Node P1 Node P2 Node Link Node Copy From + cd410a44d584 577959738234 000000000000 609547eda446 y + 1bb2e6237e03 d4a3ed9310e5 000000000000 0b03bbc9e1e7 + d4a3ed9310e5 aee31534993a 000000000000 421535db10b6 + aee31534993a 1406e7411862 000000000000 a89d614e2364 + 1406e7411862 000000000000 000000000000 b292c1e3311f + + y + Node P1 Node P2 Node Link Node Copy From + 577959738234 1bb2e6237e03 000000000000 c7faf2fc439a x + 21f46f2721e7 000000000000 000000000000 d6868642b790 + $ hg strip -r '.^' + 1 files updated, 0 files merged, 1 files removed, 0 files unresolved + saved backup bundle to $TESTTMP/shallow/.hg/strip-backup/609547eda446-b26b56a8-backup.hg (glob) + $ hg -R ../master strip -r '.^' + 1 files updated, 0 files merged, 1 files removed, 0 files unresolved + saved backup bundle to $TESTTMP/master/.hg/strip-backup/609547eda446-b26b56a8-backup.hg (glob) + + $ rm -rf $TESTTMP/hgcache/master/packs + $ cp -R $TESTTMP/backuppacks $TESTTMP/hgcache/master/packs + +# Test repacking datapack without history + $ rm -rf $CACHEDIR/master/packs/*hist* + $ hg repack + $ hg debugdatapack 
$TESTTMP/hgcache/master/packs/*.datapack + $TESTTMP/hgcache/master/packs/a8d86ff8e1a11a77a85f5fea567f56a757583eda: + x: + Node Delta Base Delta Length Blob Size + 1bb2e6237e03 000000000000 8 8 + d4a3ed9310e5 1bb2e6237e03 12 6 + aee31534993a d4a3ed9310e5 12 4 + + Total: 32 18 (77.8% bigger) + y: + Node Delta Base Delta Length Blob Size + 577959738234 000000000000 70 8 + + Total: 70 8 (775.0% bigger) + + $ hg cat -r ".^" x + x + x + x + x + +Incremental repack + $ rm -rf $CACHEDIR/master/packs/* + $ cat >> .hg/hgrc <<EOF + > [remotefilelog] + > data.generations=60 + > 150 + > fetchpacks=True + > EOF + +Single pack - repack does nothing + $ hg prefetch -r 0 + 1 files fetched over 1 fetches - (0 misses, 100.00% hit ratio) over * (glob) + $ ls_l $TESTTMP/hgcache/master/packs/ | grep datapack + -r--r--r-- 59 5b7dec902026f0cddb0ef8acb62f27b5698494d4.datapack + $ ls_l $TESTTMP/hgcache/master/packs/ | grep histpack + -r--r--r-- 90 c3399b56e035f73c3295276ed098235a08a0ed8c.histpack + $ hg repack --incremental + $ ls_l $TESTTMP/hgcache/master/packs/ | grep datapack + -r--r--r-- 59 5b7dec902026f0cddb0ef8acb62f27b5698494d4.datapack + $ ls_l $TESTTMP/hgcache/master/packs/ | grep histpack + -r--r--r-- 90 c3399b56e035f73c3295276ed098235a08a0ed8c.histpack + +3 gen1 packs, 1 gen0 pack - packs 3 gen1 into 1 + $ hg prefetch -r 1 + 1 files fetched over 1 fetches - (0 misses, 100.00% hit ratio) over * (glob) + $ hg prefetch -r 2 + 1 files fetched over 1 fetches - (0 misses, 100.00% hit ratio) over * (glob) + $ hg prefetch -r 3 + 1 files fetched over 1 fetches - (0 misses, 100.00% hit ratio) over * (glob) + $ ls_l $TESTTMP/hgcache/master/packs/ | grep datapack + -r--r--r-- 59 5b7dec902026f0cddb0ef8acb62f27b5698494d4.datapack + -r--r--r-- 65 6c499d21350d79f92fd556b4b7a902569d88e3c9.datapack + -r--r--r-- 61 817d294043bd21a3de01f807721971abe45219ce.datapack + -r--r--r-- 63 ff45add45ab3f59c4f75efc6a087d86c821219d6.datapack + $ ls_l $TESTTMP/hgcache/master/packs/ | grep histpack + -r--r--r-- 254 
077e7ce5dfe862dc40cc8f3c9742d96a056865f2.histpack + -r--r--r-- 336 094b530486dad4427a0faf6bcbc031571b99ca24.histpack + -r--r--r-- 172 276d308429d0303762befa376788300f0310f90e.histpack + -r--r--r-- 90 c3399b56e035f73c3295276ed098235a08a0ed8c.histpack + $ hg repack --incremental + $ ls_l $TESTTMP/hgcache/master/packs/ | grep datapack + -r--r--r-- 59 5b7dec902026f0cddb0ef8acb62f27b5698494d4.datapack + -r--r--r-- 225 8fe685c56f6f7edf550bfcec74eeecc5f3c2ba15.datapack + $ ls_l $TESTTMP/hgcache/master/packs/ | grep histpack + -r--r--r-- 336 094b530486dad4427a0faf6bcbc031571b99ca24.histpack + +1 gen3 pack, 1 gen0 pack - does nothing + $ hg repack --incremental + $ ls_l $TESTTMP/hgcache/master/packs/ | grep datapack + -r--r--r-- 59 5b7dec902026f0cddb0ef8acb62f27b5698494d4.datapack + -r--r--r-- 225 8fe685c56f6f7edf550bfcec74eeecc5f3c2ba15.datapack + $ ls_l $TESTTMP/hgcache/master/packs/ | grep histpack + -r--r--r-- 336 094b530486dad4427a0faf6bcbc031571b99ca24.histpack + +Pull should run background repack + $ cat >> .hg/hgrc <<EOF + > [remotefilelog] + > backgroundrepack=True + > EOF + $ clearcache + $ hg prefetch -r 0 + 1 files fetched over 1 fetches - (0 misses, 100.00% hit ratio) over * (glob) + $ hg prefetch -r 1 + 1 files fetched over 1 fetches - (0 misses, 100.00% hit ratio) over * (glob) + $ hg prefetch -r 2 + 1 files fetched over 1 fetches - (0 misses, 100.00% hit ratio) over * (glob) + $ hg prefetch -r 3 + 1 files fetched over 1 fetches - (0 misses, 100.00% hit ratio) over * (glob) + $ ls_l $TESTTMP/hgcache/master/packs/ | grep datapack + -r--r--r-- 59 5b7dec902026f0cddb0ef8acb62f27b5698494d4.datapack + -r--r--r-- 65 6c499d21350d79f92fd556b4b7a902569d88e3c9.datapack + -r--r--r-- 61 817d294043bd21a3de01f807721971abe45219ce.datapack + -r--r--r-- 63 ff45add45ab3f59c4f75efc6a087d86c821219d6.datapack + $ ls_l $TESTTMP/hgcache/master/packs/ | grep histpack + -r--r--r-- 254 077e7ce5dfe862dc40cc8f3c9742d96a056865f2.histpack + -r--r--r-- 336 
094b530486dad4427a0faf6bcbc031571b99ca24.histpack + -r--r--r-- 172 276d308429d0303762befa376788300f0310f90e.histpack + -r--r--r-- 90 c3399b56e035f73c3295276ed098235a08a0ed8c.histpack + + $ hg pull + pulling from ssh://user@dummy/master + searching for changes + no changes found + (running background incremental repack) + $ sleep 0.5 + $ hg debugwaitonrepack >/dev/null 2>&1 + $ ls_l $TESTTMP/hgcache/master/packs/ | grep datapack + -r--r--r-- 59 5b7dec902026f0cddb0ef8acb62f27b5698494d4.datapack + -r--r--r-- 225 8fe685c56f6f7edf550bfcec74eeecc5f3c2ba15.datapack + $ ls_l $TESTTMP/hgcache/master/packs/ | grep histpack + -r--r--r-- 336 094b530486dad4427a0faf6bcbc031571b99ca24.histpack + +Test environment variable resolution + $ CACHEPATH=$TESTTMP/envcache hg prefetch --config 'remotefilelog.cachepath=$CACHEPATH' + 1 files fetched over 1 fetches - (0 misses, 100.00% hit ratio) over * (glob) + $ find $TESTTMP/envcache | sort + $TESTTMP/envcache + $TESTTMP/envcache/master + $TESTTMP/envcache/master/packs + $TESTTMP/envcache/master/packs/54afbfda203716c1aa2636029ccc0df18165129e.dataidx + $TESTTMP/envcache/master/packs/54afbfda203716c1aa2636029ccc0df18165129e.datapack + $TESTTMP/envcache/master/packs/dcebd8e8d4d97ee88e40dd8f92d8678c10e1a3ad.histidx + $TESTTMP/envcache/master/packs/dcebd8e8d4d97ee88e40dd8f92d8678c10e1a3ad.histpack + +Test local remotefilelog blob is correct when based on a pack + $ hg prefetch -r . 
+ 1 files fetched over 1 fetches - (0 misses, 100.00% hit ratio) over * (glob) + $ echo >> y + $ hg commit -m y2 + $ hg debugremotefilelog .hg/store/data/95cb0bfd2977c761298d9624e4b4d4c72a39974a/b70860edba4f8242a1d52f2a94679dd23cb76808 + size: 9 bytes + path: .hg/store/data/95cb0bfd2977c761298d9624e4b4d4c72a39974a/b70860edba4f8242a1d52f2a94679dd23cb76808 + key: b70860edba4f + + node => p1 p2 linknode copyfrom + b70860edba4f => 577959738234 000000000000 08d3fbc98c48 + 577959738234 => 1bb2e6237e03 000000000000 c7faf2fc439a x + 1bb2e6237e03 => d4a3ed9310e5 000000000000 0b03bbc9e1e7 + d4a3ed9310e5 => aee31534993a 000000000000 421535db10b6 + aee31534993a => 1406e7411862 000000000000 a89d614e2364 + 1406e7411862 => 000000000000 000000000000 b292c1e3311f diff --git a/tests/test-remotefilelog-repack.t b/tests/test-remotefilelog-repack.t new file mode 100644 --- /dev/null +++ b/tests/test-remotefilelog-repack.t @@ -0,0 +1,483 @@ + $ PYTHONPATH=$TESTDIR/..:$PYTHONPATH + $ export PYTHONPATH + + $ . "$TESTDIR/remotefilelog-library.sh" + + $ hginit master + $ cd master + $ cat >> .hg/hgrc <<EOF + > [remotefilelog] + > server=True + > serverexpiration=-1 + > EOF + $ echo x > x + $ hg commit -qAm x + $ echo x >> x + $ hg commit -qAm x2 + $ cd .. + + $ hgcloneshallow ssh://user@dummy/master shallow -q + 1 files fetched over 1 fetches - (1 misses, 0.00% hit ratio) over *s (glob) + +# Set the prefetchdays config to zero so that all commits are prefetched +# no matter what their creation date is. + $ cd shallow + $ cat >> .hg/hgrc <<EOF + > [remotefilelog] + > prefetchdays=0 + > EOF + $ cd .. 
+ +# Test that repack cleans up the old files and creates new packs + + $ cd shallow + $ find $CACHEDIR | sort + $TESTTMP/hgcache + $TESTTMP/hgcache/master + $TESTTMP/hgcache/master/11 + $TESTTMP/hgcache/master/11/f6ad8ec52a2984abaafd7c3b516503785c2072 + $TESTTMP/hgcache/master/11/f6ad8ec52a2984abaafd7c3b516503785c2072/aee31534993a501858fb6dd96a065671922e7d51 + $TESTTMP/hgcache/repos + + $ hg repack + + $ find $CACHEDIR | sort + $TESTTMP/hgcache + $TESTTMP/hgcache/master + $TESTTMP/hgcache/master/packs + $TESTTMP/hgcache/master/packs/276d308429d0303762befa376788300f0310f90e.histidx + $TESTTMP/hgcache/master/packs/276d308429d0303762befa376788300f0310f90e.histpack + $TESTTMP/hgcache/master/packs/8e25dec685d5e0bb1f1b39df3acebda0e0d75c6e.dataidx + $TESTTMP/hgcache/master/packs/8e25dec685d5e0bb1f1b39df3acebda0e0d75c6e.datapack + $TESTTMP/hgcache/master/packs/repacklock + $TESTTMP/hgcache/repos + +# Test that the packs are readonly + $ ls_l $CACHEDIR/master/packs + -r--r--r-- 1145 276d308429d0303762befa376788300f0310f90e.histidx + -r--r--r-- 172 276d308429d0303762befa376788300f0310f90e.histpack + -r--r--r-- 1074 8e25dec685d5e0bb1f1b39df3acebda0e0d75c6e.dataidx + -r--r--r-- 69 8e25dec685d5e0bb1f1b39df3acebda0e0d75c6e.datapack + -rw-r--r-- 0 repacklock + +# Test that the data in the new packs is accessible + $ hg cat -r . x + x + x + +# Test that adding new data and repacking it results in the loose data and the +# old packs being combined. 
+ + $ cd ../master + $ echo x >> x + $ hg commit -m x3 + $ cd ../shallow + $ hg pull -q + $ hg up -q tip + 1 files fetched over 1 fetches - (1 misses, 0.00% hit ratio) over * (glob) + + $ find $CACHEDIR -type f | sort + $TESTTMP/hgcache/master/11/f6ad8ec52a2984abaafd7c3b516503785c2072/d4a3ed9310e5bd9887e3bf779da5077efab28216 + $TESTTMP/hgcache/master/packs/276d308429d0303762befa376788300f0310f90e.histidx + $TESTTMP/hgcache/master/packs/276d308429d0303762befa376788300f0310f90e.histpack + $TESTTMP/hgcache/master/packs/8e25dec685d5e0bb1f1b39df3acebda0e0d75c6e.dataidx + $TESTTMP/hgcache/master/packs/8e25dec685d5e0bb1f1b39df3acebda0e0d75c6e.datapack + $TESTTMP/hgcache/master/packs/repacklock + $TESTTMP/hgcache/repos + +# First assert that with --packsonly, the loose object will be ignored: + + $ hg repack --packsonly + + $ find $CACHEDIR -type f | sort + $TESTTMP/hgcache/master/11/f6ad8ec52a2984abaafd7c3b516503785c2072/d4a3ed9310e5bd9887e3bf779da5077efab28216 + $TESTTMP/hgcache/master/packs/276d308429d0303762befa376788300f0310f90e.histidx + $TESTTMP/hgcache/master/packs/276d308429d0303762befa376788300f0310f90e.histpack + $TESTTMP/hgcache/master/packs/8e25dec685d5e0bb1f1b39df3acebda0e0d75c6e.dataidx + $TESTTMP/hgcache/master/packs/8e25dec685d5e0bb1f1b39df3acebda0e0d75c6e.datapack + $TESTTMP/hgcache/master/packs/repacklock + $TESTTMP/hgcache/repos + + $ hg repack --traceback + + $ find $CACHEDIR -type f | sort + $TESTTMP/hgcache/master/packs/077e7ce5dfe862dc40cc8f3c9742d96a056865f2.histidx + $TESTTMP/hgcache/master/packs/077e7ce5dfe862dc40cc8f3c9742d96a056865f2.histpack + $TESTTMP/hgcache/master/packs/935861cae0be6ce41a0d47a529e4d097e9e68a69.dataidx + $TESTTMP/hgcache/master/packs/935861cae0be6ce41a0d47a529e4d097e9e68a69.datapack + $TESTTMP/hgcache/master/packs/repacklock + $TESTTMP/hgcache/repos + +# Verify all the file data is still available + $ hg cat -r . 
x + x + x + x + $ hg cat -r '.^' x + x + x + +# Test that repacking again without new data does not delete the pack files +# and did not change the pack names + $ hg repack + $ find $CACHEDIR -type f | sort + $TESTTMP/hgcache/master/packs/077e7ce5dfe862dc40cc8f3c9742d96a056865f2.histidx + $TESTTMP/hgcache/master/packs/077e7ce5dfe862dc40cc8f3c9742d96a056865f2.histpack + $TESTTMP/hgcache/master/packs/935861cae0be6ce41a0d47a529e4d097e9e68a69.dataidx + $TESTTMP/hgcache/master/packs/935861cae0be6ce41a0d47a529e4d097e9e68a69.datapack + $TESTTMP/hgcache/master/packs/repacklock + $TESTTMP/hgcache/repos + +# Run two repacks at once + $ hg repack --config "hooks.prerepack=sleep 3" & + $ sleep 1 + $ hg repack + skipping repack - another repack is already running + $ hg debugwaitonrepack >/dev/null 2>&1 + +# Run repack in the background + $ cd ../master + $ echo x >> x + $ hg commit -m x4 + $ cd ../shallow + $ hg pull -q + $ hg up -q tip + 1 files fetched over 1 fetches - (1 misses, 0.00% hit ratio) over * (glob) + $ find $CACHEDIR -type f | sort + $TESTTMP/hgcache/master/11/f6ad8ec52a2984abaafd7c3b516503785c2072/1bb2e6237e035c8f8ef508e281f1ce075bc6db72 + $TESTTMP/hgcache/master/packs/077e7ce5dfe862dc40cc8f3c9742d96a056865f2.histidx + $TESTTMP/hgcache/master/packs/077e7ce5dfe862dc40cc8f3c9742d96a056865f2.histpack + $TESTTMP/hgcache/master/packs/935861cae0be6ce41a0d47a529e4d097e9e68a69.dataidx + $TESTTMP/hgcache/master/packs/935861cae0be6ce41a0d47a529e4d097e9e68a69.datapack + $TESTTMP/hgcache/master/packs/repacklock + $TESTTMP/hgcache/repos + + $ hg repack --background + (running background repack) + $ sleep 0.5 + $ hg debugwaitonrepack >/dev/null 2>&1 + $ find $CACHEDIR -type f | sort + $TESTTMP/hgcache/master/packs/094b530486dad4427a0faf6bcbc031571b99ca24.histidx + $TESTTMP/hgcache/master/packs/094b530486dad4427a0faf6bcbc031571b99ca24.histpack + $TESTTMP/hgcache/master/packs/8fe685c56f6f7edf550bfcec74eeecc5f3c2ba15.dataidx + 
$TESTTMP/hgcache/master/packs/8fe685c56f6f7edf550bfcec74eeecc5f3c2ba15.datapack + $TESTTMP/hgcache/master/packs/repacklock + $TESTTMP/hgcache/repos + +# Test debug commands + + $ hg debugdatapack $TESTTMP/hgcache/master/packs/*.datapack + $TESTTMP/hgcache/master/packs/8fe685c56f6f7edf550bfcec74eeecc5f3c2ba15: + x: + Node Delta Base Delta Length Blob Size + 1bb2e6237e03 000000000000 8 8 + d4a3ed9310e5 1bb2e6237e03 12 6 + aee31534993a d4a3ed9310e5 12 4 + + Total: 32 18 (77.8% bigger) + $ hg debugdatapack --long $TESTTMP/hgcache/master/packs/*.datapack + $TESTTMP/hgcache/master/packs/8fe685c56f6f7edf550bfcec74eeecc5f3c2ba15: + x: + Node Delta Base Delta Length Blob Size + 1bb2e6237e035c8f8ef508e281f1ce075bc6db72 0000000000000000000000000000000000000000 8 8 + d4a3ed9310e5bd9887e3bf779da5077efab28216 1bb2e6237e035c8f8ef508e281f1ce075bc6db72 12 6 + aee31534993a501858fb6dd96a065671922e7d51 d4a3ed9310e5bd9887e3bf779da5077efab28216 12 4 + + Total: 32 18 (77.8% bigger) + $ hg debugdatapack $TESTTMP/hgcache/master/packs/*.datapack --node d4a3ed9310e5bd9887e3bf779da5077efab28216 + $TESTTMP/hgcache/master/packs/8fe685c56f6f7edf550bfcec74eeecc5f3c2ba15: + + x + Node Delta Base Delta SHA1 Delta Length + d4a3ed9310e5bd9887e3bf779da5077efab28216 1bb2e6237e035c8f8ef508e281f1ce075bc6db72 77029ab56e83ea2115dd53ff87483682abe5d7ca 12 + Node Delta Base Delta SHA1 Delta Length + 1bb2e6237e035c8f8ef508e281f1ce075bc6db72 0000000000000000000000000000000000000000 7ca8c71a64f7b56380e77573da2f7a5fdd2ecdb5 8 + $ hg debughistorypack $TESTTMP/hgcache/master/packs/*.histidx + + x + Node P1 Node P2 Node Link Node Copy From + 1bb2e6237e03 d4a3ed9310e5 000000000000 0b03bbc9e1e7 + d4a3ed9310e5 aee31534993a 000000000000 421535db10b6 + aee31534993a 1406e7411862 000000000000 a89d614e2364 + 1406e7411862 000000000000 000000000000 b292c1e3311f + +# Test copy tracing from a pack + $ cd ../master + $ hg mv x y + $ hg commit -m 'move x to y' + $ cd ../shallow + $ hg pull -q + $ hg up -q tip + 1 files fetched 
over 1 fetches - (1 misses, 0.00% hit ratio) over * (glob) + $ hg repack + $ hg log -f y -T '{desc}\n' + move x to y + x4 + x3 + x2 + x + +# Test copy trace across rename and back + $ cp -R $TESTTMP/hgcache/master/packs $TESTTMP/backuppacks + $ cd ../master + $ hg mv y x + $ hg commit -m 'move y back to x' + $ hg revert -r 0 x + $ mv x y + $ hg add y + $ echo >> y + $ hg revert x + $ hg commit -m 'add y back without metadata' + $ cd ../shallow + $ hg pull -q + $ hg up -q tip + 2 files fetched over 2 fetches - (2 misses, 0.00% hit ratio) over * (glob) + $ hg repack + $ ls $TESTTMP/hgcache/master/packs + e8fdf7ae22b772dcc291f905b9c6e5f381d28739.dataidx + e8fdf7ae22b772dcc291f905b9c6e5f381d28739.datapack + ebbd7411e00456c0eec8d1150a77e2b3ef490f3f.histidx + ebbd7411e00456c0eec8d1150a77e2b3ef490f3f.histpack + repacklock + $ hg debughistorypack $TESTTMP/hgcache/master/packs/*.histidx + + x + Node P1 Node P2 Node Link Node Copy From + cd410a44d584 577959738234 000000000000 609547eda446 y + 1bb2e6237e03 d4a3ed9310e5 000000000000 0b03bbc9e1e7 + d4a3ed9310e5 aee31534993a 000000000000 421535db10b6 + aee31534993a 1406e7411862 000000000000 a89d614e2364 + 1406e7411862 000000000000 000000000000 b292c1e3311f + + y + Node P1 Node P2 Node Link Node Copy From + 577959738234 1bb2e6237e03 000000000000 c7faf2fc439a x + 21f46f2721e7 000000000000 000000000000 d6868642b790 + $ hg strip -r '.^' + 1 files updated, 0 files merged, 1 files removed, 0 files unresolved + saved backup bundle to $TESTTMP/shallow/.hg/strip-backup/609547eda446-b26b56a8-backup.hg (glob) + $ hg -R ../master strip -r '.^' + 1 files updated, 0 files merged, 1 files removed, 0 files unresolved + saved backup bundle to $TESTTMP/master/.hg/strip-backup/609547eda446-b26b56a8-backup.hg (glob) + + $ rm -rf $TESTTMP/hgcache/master/packs + $ cp -R $TESTTMP/backuppacks $TESTTMP/hgcache/master/packs + +# Test repacking datapack without history + $ rm -rf $CACHEDIR/master/packs/*hist* + $ hg repack + $ hg debugdatapack 
$TESTTMP/hgcache/master/packs/*.datapack + $TESTTMP/hgcache/master/packs/a8d86ff8e1a11a77a85f5fea567f56a757583eda: + x: + Node Delta Base Delta Length Blob Size + 1bb2e6237e03 000000000000 8 8 + d4a3ed9310e5 1bb2e6237e03 12 6 + aee31534993a d4a3ed9310e5 12 4 + + Total: 32 18 (77.8% bigger) + y: + Node Delta Base Delta Length Blob Size + 577959738234 000000000000 70 8 + + Total: 70 8 (775.0% bigger) + + $ hg cat -r ".^" x + x + x + x + x + +Incremental repack + $ rm -rf $CACHEDIR/master/packs/* + $ cat >> .hg/hgrc <<EOF + > [remotefilelog] + > data.generations=60 + > 150 + > fetchpacks=True + > EOF + +Single pack - repack does nothing + $ hg prefetch -r 0 + 1 files fetched over 1 fetches - (0 misses, 100.00% hit ratio) over * (glob) + $ ls_l $TESTTMP/hgcache/master/packs/ | grep datapack + -r--r--r-- 59 5b7dec902026f0cddb0ef8acb62f27b5698494d4.datapack + $ ls_l $TESTTMP/hgcache/master/packs/ | grep histpack + -r--r--r-- 90 c3399b56e035f73c3295276ed098235a08a0ed8c.histpack + $ hg repack --incremental + $ ls_l $TESTTMP/hgcache/master/packs/ | grep datapack + -r--r--r-- 59 5b7dec902026f0cddb0ef8acb62f27b5698494d4.datapack + $ ls_l $TESTTMP/hgcache/master/packs/ | grep histpack + -r--r--r-- 90 c3399b56e035f73c3295276ed098235a08a0ed8c.histpack + +3 gen1 packs, 1 gen0 pack - packs 3 gen1 into 1 + $ hg prefetch -r 1 + 1 files fetched over 1 fetches - (0 misses, 100.00% hit ratio) over * (glob) + $ hg prefetch -r 2 + 1 files fetched over 1 fetches - (0 misses, 100.00% hit ratio) over * (glob) + $ hg prefetch -r 3 + 1 files fetched over 1 fetches - (0 misses, 100.00% hit ratio) over * (glob) + $ ls_l $TESTTMP/hgcache/master/packs/ | grep datapack + -r--r--r-- 59 5b7dec902026f0cddb0ef8acb62f27b5698494d4.datapack + -r--r--r-- 65 6c499d21350d79f92fd556b4b7a902569d88e3c9.datapack + -r--r--r-- 61 817d294043bd21a3de01f807721971abe45219ce.datapack + -r--r--r-- 63 ff45add45ab3f59c4f75efc6a087d86c821219d6.datapack + $ ls_l $TESTTMP/hgcache/master/packs/ | grep histpack + -r--r--r-- 254
077e7ce5dfe862dc40cc8f3c9742d96a056865f2.histpack + -r--r--r-- 336 094b530486dad4427a0faf6bcbc031571b99ca24.histpack + -r--r--r-- 172 276d308429d0303762befa376788300f0310f90e.histpack + -r--r--r-- 90 c3399b56e035f73c3295276ed098235a08a0ed8c.histpack + +For the data packs, set repackmaxpacksize to 64 so that the data pack of size +65 exceeds the limit. This ensures that no generation has 3 packs, so no +packs are chosen for incremental repacking. +For the history packs, set repackmaxpacksize to 0, which should always result +in no repacking. + $ hg repack --incremental --config remotefilelog.data.repackmaxpacksize=64 \ + > --config remotefilelog.history.repackmaxpacksize=0 + $ ls_l $TESTTMP/hgcache/master/packs/ | grep datapack + -r--r--r-- 59 5b7dec902026f0cddb0ef8acb62f27b5698494d4.datapack + -r--r--r-- 65 6c499d21350d79f92fd556b4b7a902569d88e3c9.datapack + -r--r--r-- 61 817d294043bd21a3de01f807721971abe45219ce.datapack + -r--r--r-- 63 ff45add45ab3f59c4f75efc6a087d86c821219d6.datapack + $ ls_l $TESTTMP/hgcache/master/packs/ | grep histpack + -r--r--r-- 254 077e7ce5dfe862dc40cc8f3c9742d96a056865f2.histpack + -r--r--r-- 336 094b530486dad4427a0faf6bcbc031571b99ca24.histpack + -r--r--r-- 172 276d308429d0303762befa376788300f0310f90e.histpack + -r--r--r-- 90 c3399b56e035f73c3295276ed098235a08a0ed8c.histpack + +Set repackmaxpacksize to the size of the biggest pack file, which ensures that +the limit is effectively ignored during incremental repacking.
+ $ hg repack --incremental --config remotefilelog.data.repackmaxpacksize=65 \ + > --config remotefilelog.history.repackmaxpacksize=336 + $ ls_l $TESTTMP/hgcache/master/packs/ | grep datapack + -r--r--r-- 59 5b7dec902026f0cddb0ef8acb62f27b5698494d4.datapack + -r--r--r-- 225 8fe685c56f6f7edf550bfcec74eeecc5f3c2ba15.datapack + $ ls_l $TESTTMP/hgcache/master/packs/ | grep histpack + -r--r--r-- 336 094b530486dad4427a0faf6bcbc031571b99ca24.histpack + +1 gen3 pack, 1 gen0 pack - does nothing + $ hg repack --incremental + $ ls_l $TESTTMP/hgcache/master/packs/ | grep datapack + -r--r--r-- 59 5b7dec902026f0cddb0ef8acb62f27b5698494d4.datapack + -r--r--r-- 225 8fe685c56f6f7edf550bfcec74eeecc5f3c2ba15.datapack + $ ls_l $TESTTMP/hgcache/master/packs/ | grep histpack + -r--r--r-- 336 094b530486dad4427a0faf6bcbc031571b99ca24.histpack + +Pull should run background repack + $ cat >> .hg/hgrc <<EOF + > [remotefilelog] + > backgroundrepack=True + > EOF + $ clearcache + $ hg prefetch -r 0 + 1 files fetched over 1 fetches - (0 misses, 100.00% hit ratio) over * (glob) + $ hg prefetch -r 1 + 1 files fetched over 1 fetches - (0 misses, 100.00% hit ratio) over * (glob) + $ hg prefetch -r 2 + 1 files fetched over 1 fetches - (0 misses, 100.00% hit ratio) over * (glob) + $ hg prefetch -r 3 + 1 files fetched over 1 fetches - (0 misses, 100.00% hit ratio) over * (glob) + $ ls_l $TESTTMP/hgcache/master/packs/ | grep datapack + -r--r--r-- 59 5b7dec902026f0cddb0ef8acb62f27b5698494d4.datapack + -r--r--r-- 65 6c499d21350d79f92fd556b4b7a902569d88e3c9.datapack + -r--r--r-- 61 817d294043bd21a3de01f807721971abe45219ce.datapack + -r--r--r-- 63 ff45add45ab3f59c4f75efc6a087d86c821219d6.datapack + $ ls_l $TESTTMP/hgcache/master/packs/ | grep histpack + -r--r--r-- 254 077e7ce5dfe862dc40cc8f3c9742d96a056865f2.histpack + -r--r--r-- 336 094b530486dad4427a0faf6bcbc031571b99ca24.histpack + -r--r--r-- 172 276d308429d0303762befa376788300f0310f90e.histpack + -r--r--r-- 90 c3399b56e035f73c3295276ed098235a08a0ed8c.histpack +
+ $ hg pull + pulling from ssh://user@dummy/master + searching for changes + no changes found + (running background incremental repack) + $ sleep 0.5 + $ hg debugwaitonrepack >/dev/null 2>&1 + $ ls_l $TESTTMP/hgcache/master/packs/ | grep datapack + -r--r--r-- 59 5b7dec902026f0cddb0ef8acb62f27b5698494d4.datapack + -r--r--r-- 225 8fe685c56f6f7edf550bfcec74eeecc5f3c2ba15.datapack + $ ls_l $TESTTMP/hgcache/master/packs/ | grep histpack + -r--r--r-- 336 094b530486dad4427a0faf6bcbc031571b99ca24.histpack + +Test environment variable resolution + $ CACHEPATH=$TESTTMP/envcache hg prefetch --config 'remotefilelog.cachepath=$CACHEPATH' + 1 files fetched over 1 fetches - (0 misses, 100.00% hit ratio) over * (glob) + $ find $TESTTMP/envcache | sort + $TESTTMP/envcache + $TESTTMP/envcache/master + $TESTTMP/envcache/master/packs + $TESTTMP/envcache/master/packs/54afbfda203716c1aa2636029ccc0df18165129e.dataidx + $TESTTMP/envcache/master/packs/54afbfda203716c1aa2636029ccc0df18165129e.datapack + $TESTTMP/envcache/master/packs/dcebd8e8d4d97ee88e40dd8f92d8678c10e1a3ad.histidx + $TESTTMP/envcache/master/packs/dcebd8e8d4d97ee88e40dd8f92d8678c10e1a3ad.histpack + +Test local remotefilelog blob is correct when based on a pack + $ hg prefetch -r . 
+ 1 files fetched over 1 fetches - (0 misses, 100.00% hit ratio) over * (glob) + $ echo >> y + $ hg commit -m y2 + $ hg debugremotefilelog .hg/store/data/95cb0bfd2977c761298d9624e4b4d4c72a39974a/b70860edba4f8242a1d52f2a94679dd23cb76808 + size: 9 bytes + path: .hg/store/data/95cb0bfd2977c761298d9624e4b4d4c72a39974a/b70860edba4f8242a1d52f2a94679dd23cb76808 + key: b70860edba4f + + node => p1 p2 linknode copyfrom + b70860edba4f => 577959738234 000000000000 08d3fbc98c48 + 577959738234 => 1bb2e6237e03 000000000000 c7faf2fc439a x + 1bb2e6237e03 => d4a3ed9310e5 000000000000 0b03bbc9e1e7 + d4a3ed9310e5 => aee31534993a 000000000000 421535db10b6 + aee31534993a => 1406e7411862 000000000000 a89d614e2364 + 1406e7411862 => 000000000000 000000000000 b292c1e3311f + +Test limiting the max delta chain length + $ hg repack --config packs.maxchainlen=1 + $ hg debugdatapack $TESTTMP/hgcache/master/packs/*.dataidx + $TESTTMP/hgcache/master/packs/a2731c9a16403457b67337a620931797fce8c821: + x: + Node Delta Base Delta Length Blob Size + 1bb2e6237e03 000000000000 8 8 + d4a3ed9310e5 1bb2e6237e03 12 6 + aee31534993a 000000000000 4 4 + 1406e7411862 aee31534993a 12 2 + + Total: 36 20 (80.0% bigger) + y: + Node Delta Base Delta Length Blob Size + 577959738234 000000000000 8 8 + + Total: 8 8 (0.0% bigger) + +Test huge pack cleanup using different values of packs.maxpacksize: + $ hg repack --incremental --debug + $ hg repack --incremental --debug --config packs.maxpacksize=512 + removing oversize packfile $TESTTMP/hgcache/master/packs/a2731c9a16403457b67337a620931797fce8c821.datapack (365 bytes) + removing oversize packfile $TESTTMP/hgcache/master/packs/a2731c9a16403457b67337a620931797fce8c821.dataidx (1.21 KB) + +Do a repack where the new pack reuses a delta from the old pack + $ clearcache + $ hg prefetch -r '2::3' + 2 files fetched over 1 fetches - (0 misses, 100.00% hit ratio) over * (glob) + $ hg repack + $ hg debugdatapack $CACHEDIR/master/packs/*.datapack + 
$TESTTMP/hgcache/master/packs/abf210f6c3aa4dd0ecc7033633ad73591be16c95: + x: + Node Delta Base Delta Length Blob Size + 1bb2e6237e03 000000000000 8 8 + d4a3ed9310e5 1bb2e6237e03 12 6 + + Total: 20 14 (42.9% bigger) + $ hg prefetch -r '0::1' + 2 files fetched over 1 fetches - (0 misses, 100.00% hit ratio) over * (glob) + $ hg repack + $ hg debugdatapack $CACHEDIR/master/packs/*.datapack + $TESTTMP/hgcache/master/packs/09b8bf49256b3fc2175977ba97d6402e91a9a604: + x: + Node Delta Base Delta Length Blob Size + 1bb2e6237e03 000000000000 8 8 + d4a3ed9310e5 1bb2e6237e03 12 6 + aee31534993a d4a3ed9310e5 12 4 + 1406e7411862 aee31534993a 12 2 + + Total: 44 20 (120.0% bigger) diff --git a/tests/test-remotefilelog-sparse.t b/tests/test-remotefilelog-sparse.t new file mode 100644 --- /dev/null +++ b/tests/test-remotefilelog-sparse.t @@ -0,0 +1,110 @@ + $ PYTHONPATH=$TESTDIR/..:$PYTHONPATH + $ export PYTHONPATH + + $ . "$TESTDIR/remotefilelog-library.sh" + + $ hginit master + $ cd master + $ cat >> .hg/hgrc <<EOF + > [remotefilelog] + > server=True + > EOF + $ echo x > x + $ echo z > z + $ hg commit -qAm x1 + $ echo x2 > x + $ echo z2 > z + $ hg commit -qAm x2 + $ hg bookmark foo + + $ cd ..
+ +# prefetch a revision w/ a sparse checkout + + $ hgcloneshallow ssh://user@dummy/master shallow --noupdate + streaming all changes + 2 files to transfer, 527 bytes of data + transferred 527 bytes in 0.* seconds (*/sec) (glob) + searching for changes + no changes found + $ cd shallow + $ printf "[extensions]\nsparse=\n" >> .hg/hgrc + + $ hg debugsparse -I x + $ hg prefetch -r 0 + 1 files fetched over 1 fetches - (1 misses, 0.00% hit ratio) over *s (glob) + + $ hg cat -r 0 x + x + + $ hg debugsparse -I z + $ hg prefetch -r 0 + 1 files fetched over 1 fetches - (1 misses, 0.00% hit ratio) over *s (glob) + + $ hg cat -r 0 z + z + +# prefetch sparse only on pull when configured + + $ printf "[remotefilelog]\npullprefetch=bookmark()\n" >> .hg/hgrc + $ hg strip tip + saved backup bundle to $TESTTMP/shallow/.hg/strip-backup/876b1317060d-b2e91d8d-backup.hg (glob) + + $ hg debugsparse --delete z + + $ clearcache + $ hg pull + pulling from ssh://user@dummy/master + searching for changes + adding changesets + adding manifests + adding file changes + added 1 changesets with 0 changes to 0 files + updating bookmark foo + new changesets 876b1317060d + (run 'hg update' to get a working copy) + prefetching file contents + 1 files fetched over 1 fetches - (1 misses, 0.00% hit ratio) over *s (glob) + +# Don't consider filtered files when doing copy tracing + +## Push an unrelated commit + $ cd ../ + + $ hgcloneshallow ssh://user@dummy/master shallow2 + streaming all changes + 2 files to transfer, 527 bytes of data + transferred 527 bytes in 0.* seconds (*) (glob) + searching for changes + no changes found + updating to branch default + 2 files updated, 0 files merged, 0 files removed, 0 files unresolved + 1 files fetched over 1 fetches - (1 misses, 0.00% hit ratio) over *s (glob) + $ cd shallow2 + $ printf "[extensions]\nsparse=\n" >> .hg/hgrc + + $ hg up -q 0 + 2 files fetched over 1 fetches - (2 misses, 0.00% hit ratio) over *s (glob) + $ touch a + $ hg ci -Aqm a + $ hg push -q -f
+ +## Pull the unrelated commit and rebase onto it - verify unrelated file was not +pulled + + $ cd ../shallow + $ hg up -q 1 + $ hg pull -q + $ hg debugsparse -I z + $ clearcache + $ hg prefetch -r '. + .^' -I x -I z + 4 files fetched over 1 fetches - (4 misses, 0.00% hit ratio) over * (glob) +Originally this was testing that the rebase doesn't fetch pointless +blobs. Right now it fails because core's sparse can't load a spec from +the working directory. Presumably there's a fix, but I'm not sure what it is. + $ hg rebase -d 2 --keep + rebasing 1:876b1317060d "x2" (foo) + transaction abort! + rollback completed + abort: cannot parse sparse patterns from working directory + [255] diff --git a/tests/test-remotefilelog-tags.t b/tests/test-remotefilelog-tags.t new file mode 100644 --- /dev/null +++ b/tests/test-remotefilelog-tags.t @@ -0,0 +1,79 @@ + $ PYTHONPATH=$TESTDIR/..:$PYTHONPATH + $ export PYTHONPATH + + $ . "$TESTDIR/remotefilelog-library.sh" + + $ hginit master + $ cd master + $ cat >> .hg/hgrc <<EOF + > [remotefilelog] + > server=True + > EOF + $ echo x > foo + $ echo y > bar + $ hg commit -qAm one + $ hg tag tag1 + $ cd ..
+ +# clone with tags + + $ hg clone --shallow ssh://user@dummy/master shallow --noupdate --config remotefilelog.excludepattern=.hgtags + streaming all changes + 3 files to transfer, 662 bytes of data + transferred 662 bytes in * seconds (*/sec) (glob) + searching for changes + no changes found + $ cat >> shallow/.hg/hgrc <<EOF + > [remotefilelog] + > cachepath=$PWD/hgcache + > debug=True + > reponame = master + > excludepattern=.hgtags + > [extensions] + > remotefilelog= + > EOF + + $ cd shallow + $ ls .hg/store/data + ~2ehgtags.i + $ hg tags + tip 1:6ce44dcfda68 + tag1 0:e0360bc0d9e1 + $ hg update + 3 files updated, 0 files merged, 0 files removed, 0 files unresolved + 2 files fetched over 1 fetches - (2 misses, 0.00% hit ratio) over *s (glob) + +# pull with tags + + $ cd ../master + $ hg tag tag2 + $ cd ../shallow + $ hg pull + pulling from ssh://user@dummy/master + searching for changes + adding changesets + adding manifests + adding file changes + added 1 changesets with 0 changes to 0 files + new changesets 6a22dfa4fd34 + (run 'hg update' to get a working copy) + $ hg tags + tip 2:6a22dfa4fd34 + tag2 1:6ce44dcfda68 + tag1 0:e0360bc0d9e1 + $ hg update + 1 files updated, 0 files merged, 0 files removed, 0 files unresolved + + $ ls .hg/store/data + ~2ehgtags.i + + $ hg log -l 1 --stat + changeset: 2:6a22dfa4fd34 + tag: tip + user: test + date: Thu Jan 01 00:00:00 1970 +0000 + summary: Added tag tag2 for changeset 6ce44dcfda68 + + .hgtags | 1 + + 1 files changed, 1 insertions(+), 0 deletions(-) + diff --git a/tests/test-remotefilelog-wireproto.t b/tests/test-remotefilelog-wireproto.t new file mode 100644 --- /dev/null +++ b/tests/test-remotefilelog-wireproto.t @@ -0,0 +1,49 @@ + $ PYTHONPATH=$TESTDIR/..:$PYTHONPATH + $ export PYTHONPATH + + $ .
"$TESTDIR/remotefilelog-library.sh" + + $ hginit master + $ cd master + $ cat >> .hg/hgrc <<EOF + > [remotefilelog] + > server=True + > EOF + $ echo x > x + $ hg commit -qAm x + $ echo y >> x + $ hg commit -qAm y + $ echo z >> x + $ hg commit -qAm z + $ hg update 1 + 1 files updated, 0 files merged, 0 files removed, 0 files unresolved + $ echo w >> x + $ hg commit -qAm w + + $ cd .. + +Shallow clone and activate getflogheads testing extension + + $ hgcloneshallow ssh://user@dummy/master shallow --noupdate + streaming all changes + 2 files to transfer, 908 bytes of data + transferred 908 bytes in * seconds (*/sec) (glob) + searching for changes + no changes found + $ cd shallow + + $ cat >> .hg/hgrc <<EOF + > [extensions] + > getflogheads=$TESTDIR/remotefilelog-getflogheads.py + > EOF + +Get heads of a remotefilelog + + $ hg getflogheads x + 2797809ca5e9c2f307d82b1345e832f655fb99a2 + ca758b402ddc91e37e3113e1a97791b537e1b7bb + +Get heads of a non-existing remotefilelog + + $ hg getflogheads y + EMPTY