store: introduce _matchtrackedpath() and use it to filter store files
Closed, Public

Authored by pulkit on Oct 17 2018, 10:53 AM.

Details

Summary

This patch introduces a function that filters store files based on the paths
they track.

The function assumes that every entry is one of two types, 'meta/*' or 'data/*',
so it only works with revlog-based storage and not with other storage
backends.

For 'data/*' entries, we strip the 'data/' prefix and the '.i' or '.d' suffix,
then pass the resulting path to the matcher.
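A minimal sketch of the 'data/*' case, assuming the entry is an already-decoded store path and that the matcher is any callable taking a repository-relative path. `tracked_file_path`, `keep_data_entry`, and the toy `matcher` are hypothetical names for illustration, not Mercurial's API.

```python
def tracked_file_path(entry):
    """Return the tracked path for a 'data/<path>.i' or 'data/<path>.d' entry.

    Assumes `entry` is an already-decoded store path; returns None otherwise.
    """
    prefix = "data/"
    if not entry.startswith(prefix):
        return None
    for suffix in (".i", ".d"):
        if entry.endswith(suffix):
            # Strip 'data/' from the front and the revlog suffix from the end.
            return entry[len(prefix):-len(suffix)]
    return None


def keep_data_entry(entry, matcher):
    """Keep the entry only if the matcher accepts the path it tracks."""
    path = tracked_file_path(entry)
    return path is not None and bool(matcher(path))


# Hypothetical matcher: accept everything under foo/.
def matcher(path):
    return path.startswith("foo/")


print(keep_data_entry("data/foo/bar.txt.i", matcher))  # True
print(keep_data_entry("data/baz.txt.d", matcher))      # False
```

Because '.i' and '.d' have the same length, the index file and the data file of one revlog always map to the same tracked path and are kept or dropped together.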

For 'meta/*' entries, we strip the 'meta/' prefix and the '/00manifest.i' or
'/00manifest.d' suffix, then call matcher.visitdir() on the resulting directory
so that the manifests of all parent directories are downloaded as well.
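A minimal sketch of the 'meta/*' case. It assumes a hypothetical `PrefixMatcher` whose `visitdir()` returns true for a directory that is included, lies under an included directory, or is an ancestor of one; returning true for ancestors is what keeps the parent directories' tree manifests. `PrefixMatcher`, `tracked_dir`, and `keep_meta_entry` are illustrative names, and Mercurial's real matchers may return richer values than a plain boolean from `visitdir()`.

```python
class PrefixMatcher:
    """Hypothetical narrow matcher that includes whole directories."""

    def __init__(self, includes):
        # e.g. {"a/b"} means: include everything under a/b/
        self.includes = [inc.rstrip("/") for inc in includes]

    def __call__(self, path):
        return any(path.startswith(inc + "/") for inc in self.includes)

    def visitdir(self, directory):
        # Visit a directory that is included, lies under an included
        # directory, or is an ancestor of one (so parent manifests are kept).
        return any(
            directory == inc
            or directory.startswith(inc + "/")
            or inc.startswith(directory + "/")
            for inc in self.includes
        )


def tracked_dir(entry):
    """Return <dir> for a 'meta/<dir>/00manifest.i|d' entry, else None."""
    prefix = "meta/"
    if not entry.startswith(prefix):
        return None
    for suffix in ("/00manifest.i", "/00manifest.d"):
        if entry.endswith(suffix):
            return entry[len(prefix):-len(suffix)]
    return None


def keep_meta_entry(entry, matcher):
    """Keep a tree-manifest revlog if its directory must be visited."""
    directory = tracked_dir(entry)
    return directory is not None and bool(matcher.visitdir(directory))


m = PrefixMatcher({"a/b"})
print(keep_meta_entry("meta/a/00manifest.i", m))      # True: parent of a/b
print(keep_meta_entry("meta/a/b/c/00manifest.d", m))  # True: inside a/b
print(keep_meta_entry("meta/x/00manifest.i", m))      # False
```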

Since this patch implements storage filtering for narrow stream clones, it also
removes the 'not implemented' error message, adds more tests, and extends the
tests to cover the treemanifest case.

The tests demonstrate that it works correctly.

With this patch, narrow stream clones work. Narrow stream clones are a very
important feature for large repositories whose users have a good internet
connection: those users clone via stream clones, and a regular narrow clone
takes more time than a full stream clone. Narrow stream clones should also
speed up clones drastically.

Diff Detail

Repository
rHG Mercurial
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.
pulkit created this revision. (Oct 17 2018, 10:53 AM)
pulkit updated this revision to Diff 12259. (Oct 19 2018, 9:01 AM)
durin42 added inline comments. (Mon, Nov 5, 11:08 AM)
mercurial/store.py, line 39

Please follow up with a patch that raises ProgrammingError at the end of this function. Right now you just magically return False if you don't recognize the path, which feels dangerous.
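One way to apply this suggestion, sketched under assumptions: `ProgrammingError` below is a local stand-in for `mercurial.error.ProgrammingError`, `match_tracked_path` and `IncludeMatcher` are hypothetical, and a `None` matcher is assumed to mean "no narrowing". The point is that any entry outside the 'data/*' and 'meta/*' layouts reaches the final `raise` instead of falling through to `return False`.

```python
class ProgrammingError(RuntimeError):
    """Local stand-in for mercurial.error.ProgrammingError."""


class IncludeMatcher:
    """Hypothetical matcher that includes one directory and everything below it."""

    def __init__(self, root):
        self.root = root

    def __call__(self, path):
        return path.startswith(self.root + "/")

    def visitdir(self, directory):
        # Included directory, something below it, or one of its ancestors.
        return (
            directory == self.root
            or directory.startswith(self.root + "/")
            or self.root.startswith(directory + "/")
        )


def match_tracked_path(entry, matcher):
    """Return whether `entry` tracks a path accepted by `matcher`.

    Raises instead of silently returning False for unrecognized entries.
    """
    if matcher is None:  # assumption: no matcher means "keep everything"
        return True
    if entry.startswith("data/"):
        for suffix in (".i", ".d"):
            if entry.endswith(suffix):
                return bool(matcher(entry[len("data/"):-len(suffix)]))
    elif entry.startswith("meta/"):
        for suffix in ("/00manifest.i", "/00manifest.d"):
            if entry.endswith(suffix):
                return bool(matcher.visitdir(entry[len("meta/"):-len(suffix)]))
    # Not a layout this function understands: fail loudly.
    raise ProgrammingError("cannot match unrecognized store path: %r" % entry)


m = IncludeMatcher("a/b")
print(match_tracked_path("data/a/b/f.txt.i", m))     # True
print(match_tracked_path("meta/a/00manifest.d", m))  # True
```

With this shape, a caller that passes an unexpected entry gets an immediate error, which makes a layout mismatch visible instead of silently dropping files from the clone.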

This revision was automatically updated to reflect the committed changes.
yuja added a subscriber: yuja. (Thu, Nov 8, 7:12 AM)

Can you add the following tests?

  • encoded filename differs from the original name (e.g. uppercase letter)
  • fncache disabled, but encodedstore is used
def datafiles(self, matcher=None):
    for a, b, size in super(encodedstore, self).datafiles():
+       if not _matchtrackedpath(a, matcher):
+           continue
        try:
            a = decodefilename(a)

I'm pretty sure it's wrong to pass an encoded filename to the matcher.
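To illustrate the concern, here is a simplified sketch that models only one rule of Mercurial's store filename encoding: an uppercase letter is stored as '_' plus its lowercase form, and '_' itself is stored as '__'. The '~xx' escapes and the other encoding rules are ignored. `toy_decode`, `keep_data_entry`, and `matcher` are hypothetical. A matcher written against 'Foo/' rejects the encoded entry but accepts the decoded one, which suggests decoding before matching.

```python
def toy_decode(encoded):
    """Undo only the '_x' -> 'X' and '__' -> '_' rule (simplified)."""
    out = []
    i = 0
    while i < len(encoded):
        c = encoded[i]
        if c == "_" and i + 1 < len(encoded):
            nxt = encoded[i + 1]
            # '__' stands for a literal '_'; '_x' stands for 'X'.
            out.append("_" if nxt == "_" else nxt.upper())
            i += 2
        else:
            out.append(c)
            i += 1
    return "".join(out)


def keep_data_entry(entry, matcher):
    """Match the path tracked by a 'data/<path>.i' entry."""
    return bool(matcher(entry[len("data/"):-len(".i")]))


# Hypothetical narrow matcher that wants everything under Foo/.
def matcher(path):
    return path.startswith("Foo/")


encoded = "data/_foo/bar.i"
print(keep_data_entry(encoded, matcher))              # False: sees '_foo/bar'
print(keep_data_entry(toy_decode(encoded), matcher))  # True: sees 'Foo/bar'
```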