To get narrow stream clones working, we need a way to filter the storage files
using a matcher. This patch adds matcher as an argument to store.walk() and
store.datafiles() so that we can filter the files returned according to the
matcher.
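As a rough illustration of the idea (not the code from this patch), filtering store entries with a matcher could look like the sketch below. The function names `tracked_path` and `filter_datafiles`, the `(unencoded, encoded, size)` tuple shape, and the use of `str` paths are assumptions made for the example; only the `data/` prefix and the matcher-as-callable idea come from the description above.

```python
def tracked_path(store_path):
    """Map 'data/<path>.i' or 'data/<path>.d' to '<path>'.

    Hypothetical helper: returns None for anything outside data/.
    """
    if store_path.startswith("data/") and store_path.endswith((".i", ".d")):
        return store_path[len("data/"):-2]
    return None


def filter_datafiles(entries, matcher=None):
    """Yield the store entries whose tracked path is accepted by matcher.

    entries: iterable of (unencoded, encoded, size) tuples (assumed shape).
    matcher: callable taking a tracked file path; None keeps every entry.
    """
    for unencoded, encoded, size in entries:
        if matcher is None:
            yield unencoded, encoded, size
            continue
        path = tracked_path(unencoded)
        # Entries outside data/ (for example meta/) are skipped here; they
        # would need separate handling, as discussed in the review.
        if path is not None and matcher(path):
            yield unencoded, encoded, size
```

Called without a matcher, the generator behaves like an unfiltered walk, which keeps existing callers unaffected.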
Details
- Reviewers: martinvonz
- Group Reviewers: hg-reviewers
- Commits: rHG2d45b549392f: store: pass matcher to store.datafiles()
Diff Detail
- Repository: rHG Mercurial
- Lint: Automatic diff as part of commit; lint not applicable.
- Unit: Automatic diff as part of commit; unit tests not applicable.
Event Timeline
mercurial/store.py, lines 414–420:
This doesn't seem right to me. Let's say the matcher is rootfilesin:some/dir; then matcher('some/dir/foo') will be True, but matcher('some') (the first-level directory) will not be. That seems to mean that the client will not get all the directories it needs.
Maybe this code needs to be made less generic and start walking the directories the way our other tree-walking algorithms do. In repos that use tree manifests for all their commits, we should be able to walk the directories in .hg/store/meta and look for files in .hg/store/data only for directories found in that walk. However, that only works if all commits use treemanifests. I think it's good enough for now (and maybe forever) to instead pass all the file names into a util.dirs object and then walk those directories using matcher.visitdir(). For each directory found that way, we would look for the manifest revlog in .hg/store/meta and include it if it's found (and ignore it if it's not).
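To make the rootfilesin concern concrete, here is a small self-contained sketch. `RootFilesIn` is a hypothetical stand-in for a rootfilesin: matcher, and `ancestor_dirs` stands in for what a util.dirs object would provide; neither is Mercurial's actual implementation. Paths are `str` and the root directory is written as `""` purely for the example.

```python
class RootFilesIn:
    """Hypothetical stand-in for a rootfilesin:<directory> matcher."""

    def __init__(self, directory):
        self._dir = directory

    def __call__(self, path):
        # Matches only files that sit directly inside the directory.
        return path.rpartition("/")[0] == self._dir

    def visitdir(self, directory):
        # A walk must enter the root, the target directory and every
        # ancestor of the target directory.
        if directory == "":
            return True
        return (self._dir == directory
                or self._dir.startswith(directory + "/"))


def ancestor_dirs(files):
    """Collect every directory that contains one of the files."""
    dirs = set()
    for f in files:
        parent = f.rpartition("/")[0]
        while parent:
            dirs.add(parent)
            parent = parent.rpartition("/")[0]
    return dirs


def dirs_to_send(files, matcher):
    """Directories whose manifests a narrow client would need."""
    return sorted(d for d in ancestor_dirs(files) if matcher.visitdir(d))
```

With `RootFilesIn("some/dir")`, calling the matcher on `"some"` is false while `visitdir("some")` is true, which is why the directory walk has to go through `visitdir()` rather than the file-matching call.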
Another conceptual problem with this is that it assumes data/ and meta/ are used for tracking just filelogs and manifestlogs. In theory, other revlogs / data files could be stored there.
For files / data/ paths, I think we're OK making this assumption. But for manifests / meta/, I would feel better if we built up a set of tree manifest directories and then intersected that with files in meta/ that map to their revlogs.
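The intersection suggested here might be sketched as follows. The helper name `tree_manifest_revlogs` is hypothetical, and the `meta/<dir>/00manifest.i` layout is assumed for illustration, with store path encoding ignored.

```python
MANIFEST_NAME = "00manifest.i"  # assumed revlog file name under meta/<dir>/


def tree_manifest_revlogs(wanted_dirs, meta_files):
    """Return the meta/ files that are manifest revlogs of wanted directories.

    wanted_dirs: directories selected by the directory walk.
    meta_files: paths actually present under meta/ in the store.
    Any other file that happens to live under meta/ is left out, and a
    wanted directory with no revlog on disk is silently ignored.
    """
    expected = {"meta/%s/%s" % (d, MANIFEST_NAME) for d in wanted_dirs}
    return sorted(expected & set(meta_files))
```

Building the expected set first means the result never depends on guessing what an unknown file in meta/ is for.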
I discussed this with martinvonz on Friday and we decided to use matcher.visitdir() for the meta/ ones.