This is an archive of the discontinued Mercurial Phabricator instance.

[RFC] rust-matchers: add `Matcher` trait and implement `AlwaysMatcher`
ClosedPublic

Authored by Alphare on Oct 29 2019, 12:27 PM.

Details

Summary

In our quest of a faster Mercurial, we have arrived at the point where we need
to implement the matchers in Rust.
This RFC mainly for the Matcher trait to see if the changes proposed feel
fine to people with more experience on the matter. While the AlwaysMatcher
implementation is here as a trivial example, it should be the first step
towards matchers use in Rust as it is currently the only supported one.

Notable changes:

  • exact is renamed to exact_match
  • enums for visit* methods with Recursive instead of 'all', etc.
  • a new roots, separate from file_set
  • no bad, explicitdir or traversedir functions as they can be passed to the high functions instead of the matchers

Thanks to Martin for suggesting the last two (most important) changes and for
reaching out to help a few weeks ago.

Diff Detail

Repository
rHG Mercurial
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Alphare created this revision.Oct 29 2019, 12:27 PM
martinvonz added inline comments.
rust/hg-core/src/matchers.rs
20–23

Could you add a comment explaining what each value means? (I *think* Recursive means to match all files recursively here, This means to visit this directory and all subdirectories, Empty means to not visit any subdirectories, and Set means to visit exactly that set.)

21–22

Is Empty needed as an optimization? Or could we just use an empty set?

26

It may be better to remove everything we're not sure we'll need to start with. That way it's easier for you to add new matchers and we won't have to clean up later if we never end up needing some of these things. Things I think we can remove to start with:

roots()
prefix()
anypats()

28

Drop the _ prefix from _filename?

29

Option<CollectionType<T>> is usually suspicious, IMO. Can we just return an empty set instead of None?

33

I think {''} (in Python syntax) makes more sense here.

36

Drop the _ prefix here too. Does the compiler really warn about unused arguments even in trait definitions?

36

Call it matches() instead?

37

Default to false like the Python version does?

40–50

Maybe we can try without this one? It doesn't seem like we need visit_dir() now that we have visit_children_set() and it seems to have a superset of the functionality. I asked @spectral about that and they thought we should only need the latter, except that it may be slower in some cases, since it needs to return a set of children to visit. However, it seems (to me) that that cost should be made up for by the speed gained by not visiting uninteresting subdirectories. So I would suggest not adding visit_dir() to start with and we can add it later if it seems useful in some case. We should also see if we can switch over existing uses of visit_dir() to use visit_children_set() in the Python code.

105

Should this be is_always() per Rust naming conventions?

113–123

I think is_always() and is_exact() are very often useful for optimizations, but I'm not sure about these two.

115

Similarly, should this be is_prefix()?

120

And here

131–133

Seems like a good default implementation. Put this in the trait instead?

I see a lot of the functions are here to give optimization hints. In order to make someone non-familiar with the code able to understand it each method should state the contracts that it is making. I am having a really hard time reconciling how the different functions interact and which methods have precedence over each other.

rust/hg-core/src/matchers.rs
29

Please document.

32

Please document.

105

always isn't an adjective so I don't think that really makes sense. How about matches_everything.

105

It is probably worth noting that false-negatives are fine but false-positives will lead to incorrect behaviour.

Alphare marked 6 inline comments as done.Oct 31 2019, 7:34 AM
Alphare updated this revision to Diff 17417.
Alphare added inline comments.Oct 31 2019, 9:39 AM
rust/hg-core/src/matchers.rs
21–22

I feel like Rust enums are really cheap to make and maintain, and since the original code hints at possible optimizations for this case I feel like it's worth it for readability. We can always back out of this if it turns out to be a pain

36

It does when it's implemented. matches() sounds good

From test-check-commit.t:

@@ -25,3 +25,10 @@
   >     fi
   >   done
   > fi
+  Revision e4f5ab3faa27 does not comply with rules
+  ------------------------------------------------------
+  6: summary line doesn't start with 'topic: '
+   [RFC] rust-matchers: add `Matcher` trait and implement `AlwaysMatcher`
+  6: summary keyword should be most user-relevant one-word command or topic
+   [RFC] rust-matchers: add `Matcher` trait and implement `AlwaysMatcher`
+

From test-check-commit.t:

@@ -25,3 +25,10 @@
   >     fi
   >   done
   > fi
+  Revision e4f5ab3faa27 does not comply with rules
+  ------------------------------------------------------
+  6: summary line doesn't start with 'topic: '
+   [RFC] rust-matchers: add `Matcher` trait and implement `AlwaysMatcher`
+  6: summary keyword should be most user-relevant one-word command or topic
+   [RFC] rust-matchers: add `Matcher` trait and implement `AlwaysMatcher`
+

Sorry, I didn't even see the [RFC] prefix, which is what the test was complaining about. I thought it didn't like the hyphen. I'll just drop the [RFC] in flight then.

This revision was not accepted when it landed; it landed in state Needs Review.
This revision was automatically updated to reflect the committed changes.
yuja added a subscriber: yuja.Nov 6 2019, 7:16 AM

+pub trait Matcher {
+ /// Explicitly listed files
+ fn file_set(&self) -> HashSet<&HgPath>;

[...]

+ / Matcher will match everything and files_set() will be empty:
+
/ optimization might be possible.
+ fn matches_everything(&self) -> bool {
+ false
+ }

Maybe better to not provide the default implementations since not a few
matchers will have to reimplement them.

+impl Matcher for AlwaysMatcher {
+ fn file_set(&self) -> HashSet<&HgPath> {
+ HashSet::new()
+ }
+
+ fn visit_children_set(
+ &self,
+ _directory: impl AsRef<HgPath>,
+ ) -> VisitChildrenSet {
+ VisitChildrenSet::Recursive
+ }

Need to implement matches() and matches_everything()?