match: support rooted globs in hgignore
ClosedPublic

Authored by valentin.gatienbaron on Jan 4 2019, 3:34 PM.

Details

Summary

In a .hgignore, "glob:foo" always means "/foo". This cannot be
avoided because there is no syntax like "^" in regexes to say you
don't want the implied "
/" (of course one can use regexes, but glob
syntax is nice).

When you have a long list of fairly specific globs like
path/to/some/thing, this has two consequences:

  1. unintended files may be ignored (not too common though)
  2. matching performance can suffer significantly Here is vanilla hg status timing on a private repository:

    Using syntax:glob everywhere real 0m2.199s user 0m1.545s sys 0m0.619s

    When rooting the appropriate globs real 0m1.434s user 0m0.847s sys 0m0.565s

    (tangentially, none of this shows up in --profile's output. It seems that C code doesn't play well with profiling)

The code already supports this but there is no syntax to make use of
it, so it seems reasonable to create such syntax. I create a new
hgignore syntax "rootglob".

Diff Detail

Repository
rHG Mercurial
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.
yuja added subscribers: foozy, yuja.Jan 9 2019, 8:50 AM
The code already supports this but there is no syntax to make use of
it, so it seems reasonable to create such syntax. I create a new
hgignore syntax "rootedglob". There might be a better name, but
"rooted" is the terminology in use in user-facing documentation.

Perhaps, it can be called a "rootglob".

https://www.mercurial-scm.org/pipermail/mercurial-devel/2016-October/089604.html

I don't remember how the discussion ended, but since we have new "rootfilesin"
matcher, "root<whatever>" should be the new naming convention.

+ syntaxes = collections.OrderedDict([
+ ('re', 'relre:'),
+ ('regexp', 'relre:'),
+ ('glob', 'relglob:'),
+ ('rootedglob', 'glob:'), # after 'glob' line, so glob:a means relglob:a

Can you add a new matcher kind 'rootglob'? I understand 'glob' can be abused
for .hgignore, but it's the pattern relative to cwd.

valentin.gatienbaron edited the summary of this revision. (Show Details)

Sorry, it took me a while to come back to this. I renamed the syntax to rootglob.

I haven't handled your other remark, however I noticed that globs in .gitignore can be rooted [1]. I had looked for a precedent for syntax for rooting a glob last week without success. Now that I see this, their choice looks sensible enough to me, so I think I'll try that, as I think that'd be nicer overall, and easier to implement (and side-step your remark about abusing the glob syntax).

[1] From https://git-scm.com/docs/gitignore: A leading slash matches the beginning of the pathname. For example, "/*.c" matches "cat-file.c" but not "mozilla-sha1/sha1.c".

yuja added a comment.Jan 16 2019, 8:24 AM

I haven't handled your other remark,

I think rootglob: can be easily added. As we already have cwd-relative
glob:, we'll just need some "if"s to not rewrite a repo.root-relative path
to cwd-relative path.

This patch works, but it's technically incorrect since glob: isn't a
repo.root-relative pattern. That's my concern.

however I noticed that globs in .gitignore can be rooted [1]. I had looked for a precedent for syntax for rooting a glob last week without success. Now that I see this, their choice looks sensible enough to me, so I think I'll try that, as I think that'd be nicer overall, and easier to implement (and side-step your remark about abusing the glob syntax).

[1] From https://git-scm.com/docs/gitignore: A leading slash matches the beginning of the pathname. For example, "/*.c" matches "cat-file.c" but not "mozilla-sha1/sha1.c".

Do you mean you're planning to copy the git's syntax? I'm -1 on it because
it would look quite different from the other hg's pattern syntax.

Alright, I think I did what you asked for.

yuja added a comment.Jan 17 2019, 8:17 AM

Queued this, many thanks.

This revision was automatically updated to reflect the committed changes.