This is an archive of the discontinued Mercurial Phabricator instance.

rust-node: handling binary Node prefix
ClosedPublic

Authored by gracinet on Jan 6 2020, 2:25 PM.

Details

Summary

Parallel to the inner signatures of the nodetree functions in
revlog.c, we'll have to handle prefixes of Node in binary
form.

Another motivation is that it allows to convert from full Node
references to NodePrefixRef without copy. This is expected to
be by far the most common case in practice.

There's a slight complication due to the fact that we'll be sometimes
interested in prefixes with an odd number of hexadecimal digits,
which translates in binary form by a last byte in which only the
highest weight 4 bits are considered. This is totally transparent for
callers and could be revised once we have proper means to measure
performance.

The C implementation does the same, passing the length in nybbles as
function arguments. Because Rust byte slices already have a length, we carry
the even/odd informaton as a boolean, to avoid introducing logical
redundancies and the related potential inconsistency bugs.

There are a few candidates for inlining here, but we refrain from
such premature optimizations, letting the compiler decide.

Diff Detail

Repository
rHG Mercurial
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

gracinet created this revision.Jan 6 2020, 2:25 PM
martinvonz added inline comments.
rust/hg-core/src/revlog/node.rs
242–244

Did you consider using a full u8 for each nibble instead? It seems like that would be simpler. I'd appreciate it if you could spend a few minutes to try that (if you haven't already).

@martinvonz in this code, we're in competition with the C implementation, which does something similar.

Switching to a full u8 per nybble is more than a few minutes, it means changing the ownership model completely. Also, it introduces a new allocation and a copy instead of merely taking a reference.

So, it's more work and has performance impact that we have no means to measure.

The odd/even thing is not *that* complicated. It's very bad for readability in the C code, but it's nicely encapsulated in the Rust case and fully tested. I'd be more comfortable trying what you're suggesting once we have real-life measurements.

martinvonz added inline comments.Jan 21 2020, 12:55 PM
rust/hg-core/src/revlog/node.rs
259

Why not use (len + 1) / 2 as capacity?

269

Is this lifetime parameter needed?

316

What does the &* do? Specifically, what's different if you drop that part?

@martinvonz in this code, we're in competition with the C implementation, which does something similar.
Switching to a full u8 per nybble is more than a few minutes, it means changing the ownership model completely. Also, it introduces a new allocation and a copy instead of merely taking a reference.

Depends on the definition of NodePrefixRef. I don't think there would be any extra allocation if you define it like this:

pub enum NodePrefixRef<'a> {
    Full(&'a Node),
    Prefix(&'a NodePrefix),
}

So, it's more work and has performance impact that we have no means to measure.
The odd/even thing is not *that* complicated. It's very bad for readability in the C code, but it's nicely encapsulated in the Rust case and fully tested. I'd be more comfortable trying what you're suggesting once we have real-life measurements.

Sure, please revisit it when you can do measurements.

gracinet edited the summary of this revision. (Show Details)Jan 27 2020, 10:49 AM
gracinet updated this revision to Diff 19630.

Depends on the definition of NodePrefixRef. I don't think there would be any extra allocation if you define it like this:

pub enum NodePrefixRef<'a> {
    Full(&'a Node),
    Prefix(&'a NodePrefix),
}

That's an interesting idea, another possibility would be to define a trait (essentially for get_nybble) and implement it for &Node and &NodePrefix. We'll see when we're pass preliminary tests, thanks.

rust/hg-core/src/revlog/node.rs
259

Just thought 20 would be the simplest one-size-fit-all. With the preparations for different hash length (and potentially even somewhat dynamic), it's really obsolete now (switched to actual len based, yes).

269

ah yes indeed the compiler seems to be able to infer that &self and Nodeprefix.buf have the same lifteime, and that it must then be equal to the lifetime parameter of NodePrefix itself.

316

These are conversions that need to be explicit when the compiler doesn't insert them magically. Usually, I try to avoid them, but it can happen that they become non necessary in the final form after some changes.

Not needed in the new form with a real struct.

martinvonz accepted this revision.Jan 27 2020, 1:24 PM
martinvonz added inline comments.
rust/hg-core/src/revlog/node.rs
226

Should we check here that i < self.len()? I'm especially thinking of the case of an odd-length prefix where we would otherwise silently return 0.

283

The lifetime parameter doesn't seem to be used, so make it anonymous? (I.e., remove it from impl and use <'_> on the type.

This revision is now accepted and ready to land.Jan 27 2020, 1:24 PM
This revision was automatically updated to reflect the committed changes.
gracinet added inline comments.Jan 27 2020, 2:04 PM
rust/hg-core/src/revlog/node.rs
226

yes, indeed.

current callers from nodemap work either on full Nodes or (the visitor) by explicitely bounding with prefix.len(), but it's safer to protect it inconditonally.

I think a simple assert! is enough though: that's already what slicing does anyway.

martinvonz added inline comments.Jan 27 2020, 2:09 PM
rust/hg-core/src/revlog/node.rs
226

Yes, an assert! is probably what I had in mind. I've already queued this, so please send in a follow-up.