This is an archive of the discontinued Mercurial Phabricator instance.

store: pass matcher to store.datafiles()
ClosedPublic

Authored by pulkit on Oct 3 2018, 11:02 AM.

Details

Summary

To get narrow stream clones working, we need a way to filter the storage files
using a matcher. This patch adds matcher as an argument to store.walk() and
store.datafiles() so that we can filter the files returned according to the
matcher.
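
As a rough illustration of the idea (not the patch itself), the filtering could look like the sketch below. The standalone `datafiles()` function, the `tracked_path()` helper, the `(store_path, size)` tuples, and the rule of stripping `data/` plus the `.i`/`.d` suffix are all assumptions made for this example; the real store code also has to deal with encoded paths.

```python
def tracked_path(store_path):
    """Map a store path such as 'data/foo/bar.txt.i' to 'foo/bar.txt'.

    Hypothetical helper: returns None for paths that are not under data/
    or that do not end in a revlog suffix.
    """
    if not store_path.startswith("data/"):
        return None
    for suffix in (".i", ".d"):
        if store_path.endswith(suffix):
            return store_path[len("data/"):-len(suffix)]
    return None


def datafiles(entries, matcher=None):
    """Yield the (store_path, size) entries whose tracked path matches.

    With matcher=None every entry is yielded, so existing callers that
    pass no matcher see no change.
    """
    for store_path, size in entries:
        if matcher is None:
            yield store_path, size
            continue
        path = tracked_path(store_path)
        if path is not None and matcher(path):
            yield store_path, size


# Made-up store contents for the example.
entries = [
    ("data/some/dir/foo.i", 120),
    ("data/some/dir/foo.d", 4096),
    ("data/other/bar.i", 64),
]

# Any callable taking a tracked path works as a matcher in this sketch.
narrow = lambda path: path.startswith("some/dir/")
filtered = list(datafiles(entries, narrow))
```

Here `filtered` keeps only the two `some/dir/foo` entries, while `datafiles(entries)` without a matcher returns everything.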

Diff Detail

Repository
rHG Mercurial
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

pulkit created this revision. Oct 3 2018, 11:02 AM
martinvonz added inline comments.
mercurial/store.py
414–420

This doesn't seem right to me. Let's say the matcher is rootfilesin:some/dir, then matcher('some/dir/foo') will be True, but matcher('some') (the first-level directory) will not be. That seems to mean that the client will not get all the directories it needs.
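
The concern can be reproduced with a toy matcher. The `RootFilesIn` class below is a hypothetical stand-in written for this example, not Mercurial's matcher implementation; it only mimics the "files directly inside a directory" rule described above.

```python
import posixpath


class RootFilesIn:
    """Toy stand-in for a rootfilesin: matcher (hypothetical)."""

    def __init__(self, directory):
        self.directory = directory

    def __call__(self, path):
        # Match only files that sit directly inside the directory.
        return posixpath.dirname(path) == self.directory


matcher = RootFilesIn("some/dir")
results = {p: matcher(p) for p in ("some/dir/foo", "some/dir", "some")}
```

In this sketch only `some/dir/foo` matches. Calling the matcher on `some` or `some/dir` returns False, so a filter that applies the file-level match to directory names would drop the manifests for those directories.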

Maybe this code needs to be made less generic and start walking the directories like our other tree-walking algorithms do. In repos that use tree manifests for all their commits, we should be able to walk the directories in .hg/store/meta and look for files in .hg/store/data only for directories found in that walk. However, that only works if all commits use tree manifests. I think it's good enough for now (and maybe forever) to instead pass all the file names into a util.dirs object and then walk those directories using matcher.visitdir(). For each directory found that way, we would look for the manifest revlog in .hg/store/meta and include it if it's found (and ignore it if it's not).
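
A minimal sketch of that alternative, under stated assumptions: `all_dirs()` stands in for what a util.dirs object would provide, the toy matcher's `visitdir()` is a simplified guess at the semantics (visit the target directory and its ancestors), and the `meta/<dir>/00manifest.i` layout is assumed for illustration, ignoring any path encoding.

```python
import posixpath


def all_dirs(filenames):
    """Return every non-root ancestor directory of the given file names."""
    dirs = set()
    for name in filenames:
        parent = posixpath.dirname(name)
        while parent:
            dirs.add(parent)
            parent = posixpath.dirname(parent)
    return dirs


class RootFilesIn:
    """Toy matcher with a visitdir() method (hypothetical)."""

    def __init__(self, directory):
        self.directory = directory

    def visitdir(self, directory):
        # Visit the target directory itself and each of its ancestors.
        return (self.directory == directory
                or self.directory.startswith(directory + "/"))


def manifest_revlogs(filenames, matcher, meta_files):
    """Pick the manifest revlogs for the directories the matcher visits."""
    wanted = []
    for directory in sorted(all_dirs(filenames)):
        if not matcher.visitdir(directory):
            continue
        revlog = "meta/%s/00manifest.i" % directory
        # Include the revlog only if it exists; ignore it otherwise.
        if revlog in meta_files:
            wanted.append(revlog)
    return wanted


# Made-up repository contents for the example.
filenames = ["some/dir/foo", "some/other/bar", "top.txt"]
meta_files = {
    "meta/some/00manifest.i",
    "meta/some/dir/00manifest.i",
    "meta/some/other/00manifest.i",
}
selected = manifest_revlogs(filenames, RootFilesIn("some/dir"), meta_files)
```

With these inputs `selected` contains the revlogs for `some` and `some/dir` but not `some/other`, so the first-level directory is no longer lost.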

martinvonz requested changes to this revision. Oct 7 2018, 5:48 PM
This revision now requires changes to proceed. Oct 7 2018, 5:48 PM

Another conceptual problem with this is that it assumes data/ and meta/ are used for tracking just filelogs and manifestlogs. In theory, other revlogs / data files could be stored there.

For files / data/ paths, I think we're OK making this assumption. But for manifests / meta/, I would feel better if we built up a set of tree manifest directories and then intersected that with files in meta/ that map to their revlogs.

I discussed this with martinvonz on Friday and we decided to use matcher.visitdir() for the meta/ ones.

pulkit retitled this revision from "store: pass matcher to store.datafiles() and filter files according to it" to "store: pass matcher to store.datafiles()". Oct 17 2018, 10:53 AM
pulkit updated this revision to Diff 12210.
This revision was automatically updated to reflect the committed changes.