Page MenuHomePhabricator

rust-filepatterns: add a Rust implementation of pattern-related utils

Authored by Alphare on Apr 18 2019, 8:10 AM.



This change introduces Rust implementations of two functions related to
pattern handling, all located in

  • _regex
  • readpatternfile

These utils are useful in the long-term effort to improve hg status's
performance using Rust. Experimental work done by Valentin Gatien-Baron
shows very promising improvements, but is too different from the current
Mercurial core code structure to be used "as-is".
This is the first - albeit very small - step towards the code revamp
needed down the line.

Two dependencies were added: regex and lazy_static. Both of them
will be useful for a majority of the Rust code that will be written,
are well known and maintained either by the Rust core team, or by
very frequent contributors.

Diff Detail

rHG Mercurial
Lint Skipped
Unit Tests Skipped

Event Timeline

Alphare created this revision.Apr 18 2019, 8:10 AM
kevincox requested changes to this revision.Apr 20 2019, 4:21 PM
kevincox added inline comments.

Why not use If you are worried about performance you can use which is quite fast.


It is worth putting a comment that these are matched in order.


Put doc comments before the thing that they are documenting and use ///


I suspect this .iter() isn't required.

input.iter().skip(1).position(|b| *b == b']')

This function appears to be way more complicated then it needs to be. I have tried to highlight the changes in comments below but I have a full version at I haven't actually run the code but I think my version has the same functionality as this one.


If you are worried about performance you can use a buffer outside the loop and replace it with line.replace_range(.., line_str). This saves continuously reallocating the string.


This copy is unnecessary and having the lifetime start so early is confusing. The only time this is used is as a temporary in the syntax: case and to set line_syntax where it is always unchanged from the default. This variable should just be removed.


This is also unnecessary. Because of the above comment.


You can just call str methods on String because they implement Deref<str>.

line = line.as_ref().trim_end().to_string();

rel_syntax: &'static str so this can be Cow::Borrowed. Actually current_syntax is only ever set to static lifetime strings so you don't have to use Cow at all.


This clone is unnecessary it is always a reference to line from herein so you can just use &str.


Slight preference for format!(). In theory it can do clever things like checking argument sizes to only allocate once. Either way I think it looks nice.


I'm pretty sure this is redundant because the only time syntax is modified is in the syntax: case which A) Sets current_syntax and B) continues.


Mapping to an always-ok result is odd. You should just map to the parse_pattern_file_contents(). Then you don't need to use ? to unwrap the extra layer. However you then loose the error conversion which you can add back with .map_err(|e| e.into()).

Even better is just use the try for the read and call parse_pattern_file_contents at the function level.

f.read_to_string(&mut contents)?;
Ok(parse_pattern_file_contents(&contents, &file_path, warn))

This seems like overkill to me. I would either remove it or use pub struct LineNumber(pub usize) if you want to do this for type safety. However I don't think type safety is really needed here so I would just remove the alias.

This revision now requires changes to proceed.Apr 20 2019, 4:21 PM
Alphare marked 12 inline comments as done.Apr 24 2019, 5:49 AM
Alphare updated this revision to Diff 14904.
Alphare added inline comments.Apr 24 2019, 5:49 AM

I think I hit the same problem as the person in this issue: It's a hassle to escape bytes using the regex crate, and this felt like a good enough replacement.


It is required since repl is &&[u8], which is not an iterator.


This has a different behavior, since position expects a boolean return value in the closure for each element of the iterator.


While your version is just plain better, removing the regex-based comment escape breaks support for comments in test-hgignore.t, so this will still be necessary.


I would argue it's here for readability's sake. I used to have usize, but was advised to put an alias here because it would lessen mental overhead to read the function signatures. I'll remove it If you disagree.

kevincox accepted this revision.Apr 24 2019, 5:58 AM

Thanks for the changes.


Ah, I understand. This make sense then.


It is probably more clear to do *repl then to show that you are dereferencing it.


I don't understand. The example I showed would give a boolean. The only difference I see is that the returned value would be 1 less (because .position() wouldn't count the skipped element) but this should be easy to handle in the code and I think would be more clear overall.


Oops, I should have pointed out that I just commented that out because I don't have access to the regex crate on the website. It should still be included in the code.

This looks good now.

Alphare marked 13 inline comments as done.Apr 24 2019, 8:31 AM
Alphare updated this revision to Diff 14905.
Alphare added inline comments.Apr 24 2019, 8:32 AM

You're right, I was chasing the wrong problem. This is indeed more semantic.

durin42 accepted this revision.May 15 2019, 2:10 PM

Meta-comment: we should extract an hgignore crate (with minimal other deps) and publish that, because that'd be a step towards automatic hgignore support in ripgrep (burntsushi has previously indicated that if there was a maintained hgignore crate that it'd be a doable feature).

This revision is now accepted and ready to land.May 15 2019, 2:10 PM

Ugh, but I got conflicts in Cargo.lock - could you rebase and let me know when you're ready?

durin42 requested changes to this revision.May 15 2019, 2:11 PM

(forgot to tag as needing changes)

This revision now requires changes to proceed.May 15 2019, 2:11 PM
Alphare updated this revision to Diff 15141.May 16 2019, 11:21 AM
This revision was automatically updated to reflect the committed changes.
martinvonz added inline comments.

I get the following from HGWITHRUSTEXT= make local:

error[E0583]: file not found for module `filepatterns`
  --> hg-core/src/
21 | mod filepatterns;
   |     ^^^^^^^^^^^^
   = help: name the file either or filepatterns/ inside the directory "hg-core/src"

error: aborting due to previous error

For more information about this error, try `rustc --explain E0583`.
error: Could not compile `hg-core`.

Also, the command modifies the rust/Cargo.lock, which is really annoying. Is there a way to fix that? Or does it only happen when the build fails, so it's not really a problem?

martinvonz added inline comments.May 17 2019, 1:51 AM

I guess I'll drop this patch until that's fixed. Sorry.

Alphare added inline comments.May 17 2019, 4:54 AM

It looks like the file was dropped. Maybe during a histedit or other command, but it was definitely there when I sent my patch, and I can't reproduce this issue locally, the file exists. Do you know what happened?

martinvonz added inline comments.May 17 2019, 9:11 AM

Dropped from where? Where did the file use to be? It doesn't seem to be added by this patch here in Phabricator. I get the same compilation error on b05c1041de8f, which is the version before my histedit.

martinvonz added inline comments.May 17 2019, 9:14 AM

Ah, it was there before it got queued. I just found the History tab here. I'll see if I can convince phabread to get that version somehow.

Alphare added inline comments.May 17 2019, 9:15 AM

I'm not sure about how Phabricator handles this, but in, the file existed.

Alphare updated this revision to Diff 15169.May 17 2019, 10:16 AM