This is an archive of the discontinued Mercurial Phabricator instance.

hgignore: conversion glob into regex faster
Abandoned, Public

Authored by valentin.gatienbaron on Nov 14 2018, 2:48 PM.

Details

Reviewers
None
Group Reviewers
hg-reviewers
Summary

Measuring this in one of my repos:

import datetime
a = repo.dirstate
before = datetime.datetime.now()
ignore = a._ignore  # first access builds the ignore matcher
after = datetime.datetime.now()
print 'duration:', str(after - before)

base : 0.207s
this change : 0.061s
reescapechar = lambda x:x : 0.056s

So hg status gets faster by roughly 0.15s in this repo (0.207s down to
0.061s for building the ignore matcher), independently of chg and
fsmonitor.

Diff Detail

Repository
rHG Mercurial
Lint
Lint Skipped
Unit
Unit Tests Skipped

Event Timeline

yuja added a subscriber: yuja. Nov 15 2018, 6:33 AM
> I couldn't find documentation on how encoding works for this (user
> data interpreted by hg). This function appears to assume the
> encoding of the input pattern is an extension of ASCII, so I think
> my change should be correct for that.

Correct. It expects an ASCII superset.
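
To illustrate why byte-wise escaping is safe under that assumption, here is a minimal sketch, not Mercurial's code. The metacharacter set and the name `escape_bytes` are assumptions made for the example. In an encoding such as UTF-8, every byte of a multi-byte character is 0x80 or above, so escaping only ASCII metacharacters never splits or alters a non-ASCII character.

```python
# Illustrative set of regex metacharacters (an assumption for this sketch).
# Iterating a bytes literal in Python 3 yields integers.
SPECIALS = frozenset(b'()[]{}?*+-|^$\\.&~# \t\n\r\v\f')


def escape_bytes(pat):
    """Escape ASCII metacharacters byte by byte; leave other bytes alone."""
    out = bytearray()
    for b in bytearray(pat):
        if b in SPECIALS:
            out += b'\\'  # prefix the metacharacter with a backslash
        out.append(b)
    return bytes(out)


# A UTF-8 encoded pattern: the two bytes of the accented letter are both
# >= 0x80, so they pass through untouched.
pattern = u'caf\u00e9*.txt'.encode('utf-8')
escaped = escape_bytes(pattern)
```

Decoding `escaped` as UTF-8 gives back the original text with only `*` and `.` escaped.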

>  def reescape(pat):
>      """Drop-in replacement for re.escape."""
>      # NOTE: it is intentional that this works on unicodes and not
>      # bytes, as it's only possible to do the escaping with
>      # unicode.translate, not bytes.translate. Sigh.
>      wantuni = True
>      if isinstance(pat, bytes):
> +        if len(pat) == 1:
> +            # fast path for hgignore parsing, which calls this on one
> +            # char at a time
> +            return _regexescapemapb.get(pat, pat)

Doh. I think it's better to add a function that escapes exactly one character.
The caller already works around the globals dict lookup, so it really wants
this to be fast.
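
A minimal sketch of what such a single-character escape could look like, assuming a precomputed lookup table. The names `ESCAPE_TABLE`, `escape_char` and `escape_literal` are hypothetical and are not the names used in Mercurial. The table is derived from the standard library's `re.escape` purely for illustration.

```python
import re

# Hypothetical lookup table, built once at import time. Only bytes whose
# escaped form differs from the byte itself are stored.
ESCAPE_TABLE = {}
for code in range(256):
    ch = bytes([code])
    escaped_form = re.escape(ch)
    if escaped_form != ch:
        ESCAPE_TABLE[ch] = escaped_form


def escape_char(ch):
    """Escape exactly one byte; bytes with no special meaning pass through."""
    return ESCAPE_TABLE.get(ch, ch)


def escape_literal(text):
    """Escape a byte string one character at a time, as a glob converter might."""
    lookup = ESCAPE_TABLE.get  # bound once to a local name for the loop
    chars = [text[i:i + 1] for i in range(len(text))]
    return b''.join(lookup(c, c) for c in chars)
```

A caller that escapes one character per loop iteration can bind `ESCAPE_TABLE.get` to a local name, as `escape_literal` does, so each iteration costs one dict lookup and no type checks.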

valentin.gatienbaron edited the summary of this revision. Nov 19 2018, 1:33 PM
valentin.gatienbaron retitled this revision from hgignore: faster conversion from globs to regexp to hgignore: conversion glob into regex faster.
valentin.gatienbaron updated this revision to Diff 12564.

will be subsumed by a more thorough change