This is an archive of the discontinued Mercurial Phabricator instance.

match: add visitchildrenset complement to visitdir
ClosedPublic

Authored by spectral on Aug 6 2018, 2:50 PM.

Details

Summary

visitdir(d) lets a caller query whether the directory is part of the matcher.
It can receive a response of 'all' (yes, and all children, you can stop calling
visitdir now), False (no, and no children either), or True (yes, either
something in this directory or a child is part of the matcher).

visitchildrenset(d) augments that by instead of returning True, it returns a
list of items to actually investigate. With this, code can be modified from:

for f in self.all_items:
    if match.visitdir(self.dir + '/' + f):
        <do stuff>

to be:

for f in self.all_items.intersect(match.visitchildrenset(self.dir)):
    <do stuff>

use of this function can provide significant performance improvements,
especially when using narrow (so that the matcher is much smaller than the stuff
we see on disk) and/or treemanifests (so that we can avoid loading manifests for
trees that aren't part of the matcher).

Diff Detail

Repository
rHG Mercurial
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

spectral created this revision.Aug 6 2018, 2:50 PM
spectral planned changes to this revision.Aug 6 2018, 3:56 PM
spectral updated this revision to Diff 10017.Aug 6 2018, 5:17 PM
yuja added a subscriber: yuja.Aug 8 2018, 11:16 AM

+ def testVisitchildrensetRootfilesin(self):
+ m = matchmod.match('x', '', patterns=['rootfilesin:dir/subdir'])
+ assert isinstance(m, matchmod.patternmatcher)
+ self.assertEqual(m.visitchildrenset('.'), 'this')
+ self.assertEqual(m.visitchildrenset('dir/subdir/x'), set())
+ self.assertEqual(m.visitchildrenset('folder'), set())

+ self.assertEqual(m.visitchildrenset('dir'), set())
+ self.assertEqual(m.visitchildrenset('dir/subdir'), set())

I assumed these two have the same bug as the one mentioned in testVisitdirRootfilesin()

@@ -748,6 +903,14 @@

    return self._matcher.visitdir(dir[len(self._pathprefix):])
return dir in self._pathdirs

+ def visitchildrenset(self, dir):
+ if dir == self._path:
+ return self._matcher.visitchildrenset('.')
+ if dir.startswith(self._pathprefix):
+ return self._matcher.visitchildrenset(dir[len(self._pathprefix):])
+ if dir in self._pathdirs:
+ return 'this'

Maybe missing return set()?

This revision was automatically updated to reflect the committed changes.
In D4130#64345, @yuja wrote:

+ def testVisitchildrensetRootfilesin(self):
+ m = matchmod.match('x', '', patterns=['rootfilesin:dir/subdir'])
+ assert isinstance(m, matchmod.patternmatcher)
+ self.assertEqual(m.visitchildrenset('.'), 'this')
+ self.assertEqual(m.visitchildrenset('dir/subdir/x'), set())
+ self.assertEqual(m.visitchildrenset('folder'), set())
+ self.assertEqual(m.visitchildrenset('dir'), set())
+ self.assertEqual(m.visitchildrenset('dir/subdir'), set())

I assumed these two have the same bug as the one mentioned in testVisitdirRootfilesin()

Indeed, adding a comment.

@@ -748,6 +903,14 @@

    return self._matcher.visitdir(dir[len(self._pathprefix):])
return dir in self._pathdirs

+ def visitchildrenset(self, dir):
+ if dir == self._path:
+ return self._matcher.visitchildrenset('.')
+ if dir.startswith(self._pathprefix):
+ return self._matcher.visitchildrenset(dir[len(self._pathprefix):])
+ if dir in self._pathdirs:
+ return 'this'

Maybe missing return set()?

Yikes, good catch.