Page MenuHomePhabricator

treedirstate: add vecmap implementation

Authored by mbthomas on Nov 14 2017, 12:39 PM.


Group Reviewers
Restricted Project
rFBHGX3d6ea6138e86: treedirstate: add vecmap implementation

This adds an implementation of an ordered map that uses a vector pairs, sorted
by the key.

This is largely compatibly with std::collections::BTreeMap, but has performance
characteristics more suited for use in treedirstate.

Diff Detail

rFBHGX Facebook Mercurial Extensions
Automatic diff as part of commit; lint not applicable.
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

mbthomas created this revision.Nov 14 2017, 12:39 PM
Herald added a reviewer: Restricted Project. · View Herald TranscriptNov 14 2017, 12:39 PM
durham added a subscriber: durham.Nov 15 2017, 12:24 PM

Mostly noob questions


Maybe a rust-noob question, but why is '?Sized' needed here?


Is this a standard function? If not, maybe a doc comment explaining what 'start' means. Took me a couple reads of the code to realize.


Could we have returned if start { 0 } else { self.vec.len() } for this instead? Seems like that would hide the special case from the caller. Or do you intentionally return None because it allows us to detect the special case and do Iter(self.vec.iter()) without slicing later?


Since self.vec.binary_search_by(|e| e.0.borrow().cmp(q)) is a common pattern (I see it at least 3 times above), should there be a function to do that for us? I think it'd clean up the one dense line in each of these functions.


rust-noob question: What is self.0 here? I thought self was an Iter instance, and I thought self.0 was something used on a tuple instance. So I'm a little confused.

Also, I thought .next() would return the next value. But I thought .map() would only operate on some sort of generator or collection?

mbthomas added inline comments.Nov 15 2017, 12:55 PM

Q is allowed to be a reference to a generic type for which the size is not known (e.g. a trait type). The important thing is that it's a thing that can be compared with a borrowed Key. This allows storing String or Vec[u8] keys, but then accessing/removing the entries using a &str or &[u8] borrowed value from elsewhere.

Rust defaults to assuming Sized for type parameters, so you have to remove it when you don't need it.


No, this is a private function to make implementing range and range_mut simpler.


That would probably make sense and make the other function simpler. I'll rework it.


I'm not sure how easy it will be to extract (there's a lot of type inferencing going on here), but I'll give it a shot.


vecmap::Iter is a wrapper around std::slice::Iter (see line 15) that convert its API to one that works in a map-like way. The difference is subtle:

std::slice::Iter for a Vec<Item> yields Option<&Item>, which is Option<&(K, V)> for this module. To be able to iterate over the key/value pairs, we need to convert this to Option<(&K, &V)>. This is more important for IterMut where we want to map Option<&mut (K, V)> to Option<(&K, &mut V)> to prevent the caller mutating the key.

To answer the first question, vecmap::Iter is a 1-tuple containing a std::slice::Iter, so we access the inner iter through self.0. I think this is a common Rust pattern.

For the second question, .next() returns Option<&(K, V)>, which we want to map to Option<(&K, &V)>. The map call here is over the Option type.

mbthomas updated this revision to Diff 3566.Nov 16 2017, 11:42 AM
mbthomas marked 10 inline comments as done.Nov 16 2017, 11:43 AM
mbthomas updated this revision to Diff 3593.Nov 17 2017, 9:44 AM
jsgf requested changes to this revision.Nov 17 2017, 12:30 PM
jsgf added a subscriber: jsgf.

Nits mostly.


"that value is >replaced and< returned" for maximum clarity.


I prefer to avoid putting mut v: V into signatures, because its a little confusing for readers - it's not obvious that mut has local effect and doesn't affect the function's interface.


pub fn insert(&mut self, k: K, v: V) -> Option<V> {
    let mut v = v;

Is this common enough to just always do as part of insert()?


How does this look if you just split it into separate range_index_start/range_index_end rather than toggling on start? It would still have some amount of duplication of the overall function and the calls to find_index, but it might be clearer in the end.

Or alternatively take an instance of RangeArgument and handle both at once, returning (usize, usize).


RangeArgument? It's already implemented for (Bound<T>, Bound<T>), but also allows using ../..= notation.


I prefer:

fn next(&mut self) -> Option<Self::Item> {

to be very explicit.


Pattern matching might be a little clearer: .map(|&(ref k, ref v)| (k, v))




ditto pattern match

This revision now requires changes to proceed.Nov 17 2017, 12:30 PM
mbthomas added inline comments.Nov 20 2017, 12:19 PM

RangeArgument is nightly-only experimental still. I'm trying to keep to stable rust features.

mbthomas updated this revision to Diff 3648.Nov 20 2017, 12:20 PM
mbthomas marked 9 inline comments as done.Nov 20 2017, 12:22 PM
mbthomas added inline comments.

Possibly. I'll see if I can think of a way to measure it.

stash accepted this revision.Nov 21 2017, 5:12 AM
jsgf accepted this revision.Nov 22 2017, 5:47 PM
This revision is now accepted and ready to land.Nov 22 2017, 5:47 PM
mbthomas updated this revision to Diff 3818.Nov 24 2017, 11:52 AM
durham accepted this revision.Nov 27 2017, 12:23 PM
This revision was automatically updated to reflect the committed changes.