rust: core implementation of missingancestors (no bindings)
Needs ReviewPublic

Authored by gracinet on Tue, Dec 4, 2:05 PM.

Details

Reviewers
None
Group Reviewers
hg-reviewers
Summary

rust: iterator version of Graph.parents

This is handy for callers that want to simply do:

for p in graph.parents_iter(rev)

although one could argue that actually parents() should return an
array instead of a tuple, giving us a similar iterator for free (but on
references instead of values, unless we also use the arrayvec crate
could help). Notably, the current C-backed parents() internally uses an
array for communication with C code, so that currently, we'd get less memory
copy and less code using an array.

rust: translation of missingancestors

This is as direct as possible a translation of the ancestor.missingancestors
Python class in pure Rust. The goal for this changeset is to make it easy
to compare with the Python version.

We also add to Python tests the cases that helped us develop and debug
this implementation.

Some possible optimizations are marked along the way as TODO comments

rust: translated random test of missingancestors

An alternative would have been to expose to Python
MissingAncestors<VecGraphs> but that would have meant

  • pollution of the release build used from Python, whereas we do it in this changeset within the tests submodule
  • waiting on rust-cpython bindings to be ready or doing the cumbersome direct-ffi (more pollution with unsafe C code)

Diff Detail

Repository
rHG Mercurial
Branch
default
Lint
No Linters Available
Unit
No Unit Test Coverage
gracinet created this revision.Tue, Dec 4, 2:05 PM

Overall looks good to me. Just a couple of points.

  • Using random numbers for tests without a seed that is logged will create failures which are basically impossible to reproduce.
  • A lot of the comments are comparing to the python, I don't know how useful this is to most readers.
rust/hg-core/src/ancestors.rs
155

I prefer doing bases: impl IntoIterator<Item = Revision> instead of using a where clause.

157

If this is going to be stored in the bases field I would just call it bases.

This will also allow you to do MissingAncestors { graph, bases } if you prefer.

159

Is always having an item in the set actually saving many corner cases? It seems like you usually check for empty sets anyways.

170

s/-1/NULL_REVISISON

195

I don't think this comment is helpful to the reader.

201

Isn't this loop redundant with the retain call above?

219

I find the pattern match and deref slightly confusing. I would prefer a ** to make it obvious you are doing a double deref.

235

I would prefer let (p0, p1) = .... This makes it obvious exactly what data you are getting.

262

I would simply call it revs.

281

Replace the match with .unwrap_or(NULL_REVISION)

286

If an inclusive range use 0..=start.

293

You may as well do this now. Just move the remove into the if.

309

Do this remove in the if as well.

484

Since you have used IntoIter you shouldn't need the vec!s.

486

Do this assignment on the same line as the declaration.

rust/hg-core/src/lib.rs
20

I think the better solution here is to make parents return [Revision; 2] (or &[Revision] and not return nulls.

26

Would something like the following be simpler?

fn parents_iter(
    &self,
    r: Revision,
) -> Result<impl Iterator<Item=Revision>, GraphError> {
    let (p0, p1) = self.parents(r)?;
    let iter = [p0, p1].into_iter()
        .map(|r| if r == NULL_REVISION { None} else { Some(r) });
    Ok(iter)
}
36

I don't think this comment is necessary. Also this should be quite transparent to the optimizer.

52

I would prefer putting the result in a variable and then doing the second match later.

I'll include your remark in the upcoming v2, thanks again !

rust/hg-core/src/ancestors.rs
159

Not sure about this, so for now, I've been playing safe, ie, exactly as the Python version.

rust/hg-core/src/lib.rs
20

I agree, it was a pythonism to use a tuple, but it has impact on AncestorsIterators, as well as the implementation in hg-direct-ffi, and in more code I've not submitted yet, so I'd prefer to do that change independently in a follow-up.

26

The reason I've preferred to implement it directly is that into_iter() iterates on references, which I found an unnecessary overhead,.

But maybe that's a case of premature optimization ? In the same vein, I don't think HashSet is the best choice for a set of i32, unless it becomes really big, but I don't know where the threshold would be, compared to, say, a bit array.

I think sticking close to the python version makes sense for the initial version. Then improvements can be made in follow-ups.

rust/hg-core/src/lib.rs
26

You can add .cloned() to remove the references. I would be surprised if this has poor code generation.

As for HashSet vs bit array I would stick to HashSet for now. HashSet also has the advantage of being sparse. I guess after the initial implementation we could benchmark the two to see what is better.

yuja added a subscriber: yuja.Thu, Dec 6, 8:57 AM

Quickly scanned, and looks generally good to me.

rust: iterator version of Graph.parents
rust: translation of missingancestors
rust: translated random test of missingancestors

Can you send these as separate patches?

An alternative would have been to expose to Python
MissingAncestors<VecGraphs> but that would have meant

- pollution of the release build used from Python,  whereas we do it in this changeset within the tests submodule
- waiting on rust-cpython bindings to be ready or doing the cumbersome direct-ffi (more pollution with unsafe C code)

Still I want some CPython interface to measure the perf win. Are there
lots of work remaining to bring rust-cpython to us?

although one could argue that actually parents() should return an
array instead of a tuple, giving us a similar iterator for free (but on
references instead of values, unless we also use the arrayvec crate
could help). Notably, the current C-backed parents() internally uses an
array for communication with C code, so that currently, we'd get less memory
copy and less code using an array.

I prefer changing parents() to return [Revision; 2]. Then, we can write a
simple utility function that drops NULL_REVISION, &[Revision; 2] -> &[Revision].

@yuja: do you mean one of those Differential Revisions of this system for each commit, sure I can do.

With respect to rust-cpython bindings, I'm currently waiting for feedback on https://github.com/dgrunwald/rust-cpython/issues/164
Perhaps you'd have an idea ?
Short summary: sporadic segfaults that I can for now reproduce only by running the whole test suite.

That being said, I do have rust-cpython bindings for AncestorsIterator and MissingAncestors (plus the full lazyancestors class). Unless I'm mistaken, that means that together with the existing C implementations, there's a potential for a full native version of mercurial/ancestor.py.
I can send them to the mailing-list or here. What would be the correct flag for that ? RFC ?

I'm also working on a perfmissingancestors (started a few hours ago). For now it confirms measurements I'd observed earlier: about x5 performance boost.

yuja added a comment.Thu, Dec 6, 10:06 AM
@yuja: do you mean one of those Differential Revisions of this system for each commit, sure I can do.

I don't know how Phabricator is working, but probably yes. I meant we want
D5370, D5371, and D5372 instead of a folded D5370. I think phabsend will
do that way.

With respect to rust-cpython bindings, I'm currently waiting for feedback on https://github.com/dgrunwald/rust-cpython/issues/164
Perhaps you'd have an idea ?

No, but it looks interesting. I'll take a look this weekend.