rust-filepatterns: add a Rust implementation of pattern-related utils
ClosedPublic

Authored by Alphare on Apr 18 2019, 8:10 AM.

Details

Summary

This change introduces Rust implementations of two functions related to
pattern handling, all located in match.py:

  • _regex
  • readpatternfile

These utils are useful in the long-term effort to improve hg status's
performance using Rust. Experimental work done by Valentin Gatien-Baron
shows very promising improvements, but is too different from the current
Mercurial core code structure to be used "as-is".
This is the first - albeit very small - step towards the code revamp
needed down the line.

Two dependencies were added: regex and lazy_static. Both of them
will be useful for a majority of the Rust code that will be written,
are well known and maintained either by the Rust core team, or by
very frequent contributors.

Diff Detail

Repository
rHG Mercurial
Lint
Lint Skipped
Unit
Unit Tests Skipped
Alphare created this revision.Apr 18 2019, 8:10 AM
kevincox requested changes to this revision.Apr 20 2019, 4:21 PM
kevincox added inline comments.
rust/hg-core/src/filepatterns.rs
19

Why not use https://docs.rs/regex/1.1.6/regex/fn.escape.html? If you are worried about performance you can use https://docs.rs/regex-syntax/0.6.6/regex_syntax/fn.escape_into.html which is quite fast.

22

It is worth putting a comment that these are matched in order.

30

Put doc comments before the thing that they are documenting and use ///

52

I suspect this .iter() isn't required.

62
input.iter().skip(1).position(|b| *b == b']')
227

This function appears to be way more complicated then it needs to be. I have tried to highlight the changes in comments below but I have a full version at https://rust.godbolt.org/z/08MrDP. I haven't actually run the code but I think my version has the same functionality as this one.

240

If you are worried about performance you can use a buffer outside the loop and replace it with line.replace_range(.., line_str). This saves continuously reallocating the string.

241

This copy is unnecessary and having the lifetime start so early is confusing. The only time this is used is as a temporary in the syntax: case and to set line_syntax where it is always unchanged from the default. This variable should just be removed.

242

This is also unnecessary. Because of the above comment.

250

You can just call str methods on String because they implement Deref<str>.

line = line.as_ref().trim_end().to_string();
260

rel_syntax: &'static str so this can be Cow::Borrowed. Actually current_syntax is only ever set to static lifetime strings so you don't have to use Cow at all.

268

This clone is unnecessary it is always a reference to line from herein so you can just use &str.

283

Slight preference for format!(). In theory it can do clever things like checking argument sizes to only allocate once. Either way I think it looks nice.

287

I'm pretty sure this is redundant because the only time syntax is modified is in the syntax: case which A) Sets current_syntax and B) continues.

300

Mapping to an always-ok result is odd. You should just map to the parse_pattern_file_contents(). Then you don't need to use ? to unwrap the extra layer. However you then loose the error conversion which you can add back with .map_err(|e| e.into()).

https://rust.godbolt.org/z/Mflpga

Even better is just use the try for the read and call parse_pattern_file_contents at the function level.

f.read_to_string(&mut contents)?;
Ok(parse_pattern_file_contents(&contents, &file_path, warn))
rust/hg-core/src/lib.rs
53

This seems like overkill to me. I would either remove it or use pub struct LineNumber(pub usize) if you want to do this for type safety. However I don't think type safety is really needed here so I would just remove the alias.

This revision now requires changes to proceed.Apr 20 2019, 4:21 PM
Alphare updated this revision to Diff 14904.Wed, Apr 24, 5:49 AM
Alphare marked 12 inline comments as done.
Alphare added inline comments.Wed, Apr 24, 5:49 AM
rust/hg-core/src/filepatterns.rs
19

I think I hit the same problem as the person in this issue: https://github.com/rust-lang/regex/issues/451. It's a hassle to escape bytes using the regex crate, and this felt like a good enough replacement.

52

It is required since repl is &&[u8], which is not an iterator.

62

This has a different behavior, since position expects a boolean return value in the closure for each element of the iterator.

227

While your version is just plain better, removing the regex-based comment escape breaks support for comments in test-hgignore.t, so this will still be necessary.

rust/hg-core/src/lib.rs
53

I would argue it's here for readability's sake. I used to have usize, but was advised to put an alias here because it would lessen mental overhead to read the function signatures. I'll remove it If you disagree.

kevincox accepted this revision.Wed, Apr 24, 5:58 AM

Thanks for the changes.

rust/hg-core/src/filepatterns.rs
19

Ah, I understand. This make sense then.

52

It is probably more clear to do *repl then to show that you are dereferencing it.

62

I don't understand. The example I showed would give a boolean. The only difference I see is that the returned value would be 1 less (because .position() wouldn't count the skipped element) but this should be easy to handle in the code and I think would be more clear overall.

227

Oops, I should have pointed out that I just commented that out because I don't have access to the regex crate on the website. It should still be included in the code.

This looks good now.

Alphare updated this revision to Diff 14905.Wed, Apr 24, 8:31 AM
Alphare marked 13 inline comments as done.
Alphare added inline comments.Wed, Apr 24, 8:32 AM
rust/hg-core/src/filepatterns.rs
62

You're right, I was chasing the wrong problem. This is indeed more semantic.

durin42 accepted this revision.Wed, May 15, 2:10 PM

Meta-comment: we should extract an hgignore crate (with minimal other deps) and publish that, because that'd be a step towards automatic hgignore support in ripgrep (burntsushi has previously indicated that if there was a maintained hgignore crate that it'd be a doable feature).

This revision is now accepted and ready to land.Wed, May 15, 2:10 PM

Ugh, but I got conflicts in Cargo.lock - could you rebase and let me know when you're ready?

durin42 requested changes to this revision.Wed, May 15, 2:11 PM

(forgot to tag as needing changes)

This revision now requires changes to proceed.Wed, May 15, 2:11 PM
Alphare updated this revision to Diff 15141.Thu, May 16, 11:21 AM
This revision was automatically updated to reflect the committed changes.
martinvonz added inline comments.
rust/hg-core/src/lib.rs
21

I get the following from HGWITHRUSTEXT= make local:

error[E0583]: file not found for module `filepatterns`
  --> hg-core/src/lib.rs:21:5
   |
21 | mod filepatterns;
   |     ^^^^^^^^^^^^
   |
   = help: name the file either filepatterns.rs or filepatterns/mod.rs inside the directory "hg-core/src"

error: aborting due to previous error

For more information about this error, try `rustc --explain E0583`.
error: Could not compile `hg-core`.

Also, the command modifies the rust/Cargo.lock, which is really annoying. Is there a way to fix that? Or does it only happen when the build fails, so it's not really a problem?

martinvonz added inline comments.Fri, May 17, 1:51 AM
rust/hg-core/src/lib.rs
21

I guess I'll drop this patch until that's fixed. Sorry.

Alphare added inline comments.Fri, May 17, 4:54 AM
rust/hg-core/src/lib.rs
21

It looks like the filepatterns.rs file was dropped. Maybe during a histedit or other command, but it was definitely there when I sent my patch, and I can't reproduce this issue locally, the file exists. Do you know what happened?

martinvonz added inline comments.Fri, May 17, 9:11 AM
rust/hg-core/src/lib.rs
21

Dropped from where? Where did the file use to be? It doesn't seem to be added by this patch here in Phabricator. I get the same compilation error on b05c1041de8f, which is the version before my histedit.

martinvonz added inline comments.Fri, May 17, 9:14 AM
rust/hg-core/src/lib.rs
21

Ah, it was there before it got queued. I just found the History tab here. I'll see if I can convince phabread to get that version somehow.

Alphare added inline comments.Fri, May 17, 9:15 AM
rust/hg-core/src/lib.rs
21

I'm not sure about how Phabricator handles this, but in https://phab.mercurial-scm.org/D6271?id=15141, the filepatterns.rs file existed.

Alphare updated this revision to Diff 15169.Fri, May 17, 10:16 AM