This is an archive of the discontinued Mercurial Phabricator instance.

match: add visitchildrenset complement to visitdir
ClosedPublic

Authored by spectral on Aug 6 2018, 2:50 PM.

Details

Summary

visitdir(d) lets a caller query whether the directory is part of the matcher.
It can receive a response of 'all' (yes, and all children, you can stop calling
visitdir now), False (no, and no children either), or True (yes, either
something in this directory or a child is part of the matcher).

visitchildrenset(d) augments that by instead of returning True, it returns a
list of items to actually investigate. With this, code can be modified from:

for f in self.all_items:
    if match.visitdir(self.dir + '/' + f):
        <do stuff>

to be:

for f in self.all_items.intersect(match.visitchildrenset(self.dir)):
    <do stuff>

use of this function can provide significant performance improvements,
especially when using narrow (so that the matcher is much smaller than the stuff
we see on disk) and/or treemanifests (so that we can avoid loading manifests for
trees that aren't part of the matcher).

Diff Detail

Repository
rHG Mercurial
Lint
Lint Skipped
Unit
Unit Tests Skipped

Event Timeline

spectral created this revision.Aug 6 2018, 2:50 PM
spectral planned changes to this revision.Aug 6 2018, 3:56 PM
spectral updated this revision to Diff 10017.Aug 6 2018, 5:17 PM
yuja added a subscriber: yuja.Aug 8 2018, 11:16 AM

+ def testVisitchildrensetRootfilesin(self):
+ m = matchmod.match('x', '', patterns=['rootfilesin:dir/subdir'])
+ assert isinstance(m, matchmod.patternmatcher)
+ self.assertEqual(m.visitchildrenset('.'), 'this')
+ self.assertEqual(m.visitchildrenset('dir/subdir/x'), set())
+ self.assertEqual(m.visitchildrenset('folder'), set())

+ self.assertEqual(m.visitchildrenset('dir'), set())
+ self.assertEqual(m.visitchildrenset('dir/subdir'), set())

I assumed these two have the same bug as the one mentioned in testVisitdirRootfilesin()

@@ -748,6 +903,14 @@

    return self._matcher.visitdir(dir[len(self._pathprefix):])
return dir in self._pathdirs

+ def visitchildrenset(self, dir):
+ if dir == self._path:
+ return self._matcher.visitchildrenset('.')
+ if dir.startswith(self._pathprefix):
+ return self._matcher.visitchildrenset(dir[len(self._pathprefix):])
+ if dir in self._pathdirs:
+ return 'this'

Maybe missing return set()?

This revision was automatically updated to reflect the committed changes.
In D4130#64345, @yuja wrote:

+ def testVisitchildrensetRootfilesin(self):
+ m = matchmod.match('x', '', patterns=['rootfilesin:dir/subdir'])
+ assert isinstance(m, matchmod.patternmatcher)
+ self.assertEqual(m.visitchildrenset('.'), 'this')
+ self.assertEqual(m.visitchildrenset('dir/subdir/x'), set())
+ self.assertEqual(m.visitchildrenset('folder'), set())
+ self.assertEqual(m.visitchildrenset('dir'), set())
+ self.assertEqual(m.visitchildrenset('dir/subdir'), set())

I assumed these two have the same bug as the one mentioned in testVisitdirRootfilesin()

Indeed, adding a comment.

@@ -748,6 +903,14 @@

    return self._matcher.visitdir(dir[len(self._pathprefix):])
return dir in self._pathdirs

+ def visitchildrenset(self, dir):
+ if dir == self._path:
+ return self._matcher.visitchildrenset('.')
+ if dir.startswith(self._pathprefix):
+ return self._matcher.visitchildrenset(dir[len(self._pathprefix):])
+ if dir in self._pathdirs:
+ return 'this'

Maybe missing return set()?

Yikes, good catch.