This is an archive of the discontinued Mercurial Phabricator instance.

rust-node: binary Node ID and conversion utilities
ClosedPublic

Authored by gracinet on Jan 6 2020, 2:25 PM.

Details

Summary

The choice of type makes sure that a Node has the exact
wanted size. We'll use a different type for prefixes.

Added dependency: hexadecimal conversion relies on the
hex crate.

The fact that sooner or later Mercurial is going to need
to change its hash sizes has been taken strongly in
consideration:

  • the hash length is a constant, but that is not directly exposed to callers. Changing the value of that constant is the only thing to do to change the hash length (even in unit tests)
  • the code could be adapted to support several sizes of hashes, if that turned out to be useful. To that effect, only the size of a given Node is exposed in the public API.
  • callers not involved in initial computation, I/O and FFI are able to operate without a priori assumptions on the hash size. The traits FromHex and ToHex have not been directly implemented, so that the doc-comments explaining these restrictions would stay really visible in cargo doc

Diff Detail

Repository
rHG Mercurial
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

gracinet created this revision.Jan 6 2020, 2:25 PM
gracinet updated this revision to Diff 19323.Jan 15 2020, 6:47 PM
kevincox accepted this revision.Jan 16 2020, 5:50 AM
kevincox added inline comments.
rust/hg-core/src/revlog/node.rs
15

I would recommend making this a proper type because almost no one should be poking at the bytes of the hash.

struct Node([u8; 20]);

You can make the field pub if necessary.

32

I find progressing through the string easier to understand than this slicing. WDYT.

for byte in node.iter_mut() {
    *byte = u8::from_str_radix(&hex[..2], 16)?;
    hex = &hex[2..];
}
40

If we want to avoid a number of allocations we can do:

pub fn node_to_hex(n: &Node) -> String {
    let mut r = String::with_capacity(n.len() * 2);
    for byte in n {
        write!(r, "{:02x}", byte).unwrap();
    }
    debug_assert_eq!(r.len(), n.len() * 2);
    r
}

The generated code for write!() doesn't look great but if hex printing shows up in benchmarks it would be trivial to write a custom hex-formatter.

martinvonz requested changes to this revision.Jan 17 2020, 8:24 PM
martinvonz added a subscriber: martinvonz.
martinvonz added inline comments.
rust/hg-core/src/dirstate/dirs_multiset.rs
111 ↗(On Diff #19323)

please revert unrelated change

rust/hg-core/src/revlog/node.rs
32

Or even use hex::decode() from the hex crate? Can also use hex::encode() in node_to_hex if we're okay with that.

42–52

This feels like something that will only be used in nodemap.rs, so put it there instead? Or do you think (or know) that it will be used elsewhere soon?

This revision now requires changes to proceed.Jan 17 2020, 8:24 PM
martinvonz added inline comments.Jan 18 2020, 5:57 PM
rust/hg-core/src/revlog/node.rs
15

Oh, I think @durin42 and others were trying to remove the assumption that nodeids are 20 bytes (so we can use other hashes and hash lengths).

Yes, I'd appreciate not adding any assumptions that nodes are 20 bytes.

@durin42 @martinvonz we might have a nice path forward with this, with the suggestion of @kevincox to make Node a truly defined type, not just an alias, with a private member for the bytes.

Indeed I find it important to know if a node information is complete or merely a prefix. With a truly defined type, the assumption that it's 20 bytes long would be completely encapsulated in the Node type. It would then be very easy (much easier than with Python code) to change it to something else, whether that'd be a longer bag of bytes, or a struct with two values, an enum with alternatives for sha-1 or sha-2 etc.

@durin42 @martinvonz we might have a nice path forward with this, with the suggestion of @kevincox to make Node a truly defined type, not just an alias, with a private member for the bytes.

Sounds good. I think I would have preferred that anyway.

gracinet retitled this revision from rust-node: binary Node and conversion utilities to rust-node: binary Node ID and conversion utilities.Jan 22 2020, 3:11 PM
gracinet edited the summary of this revision. (Show Details)
gracinet updated this revision to Diff 19521.
gracinet marked 6 inline comments as done.Jan 22 2020, 3:27 PM

As the new commit description explains, I've done all I could to make the change of hash format easier

I should add that my other colleagues at Octobus seem also to be involved in the improvement of hashing, there's no risk we would forget this one.

rust/hg-core/src/dirstate/dirs_multiset.rs
111 ↗(On Diff #19323)

ah yes, sorry must have been a rebase artifact

rust/hg-core/src/revlog/node.rs
32

Yes, hex is nice, it's able to produce arrays without further copies and does length checks. Its error type does not repeat the input string, though, which in my experience is really very useful to understand problems when they occur.

hex does not support hexadecimal string representations with odd numbers of
digits, which will be easy to work around in the next patch.

40

Finally used the hex crate here too.

42–52

Well it's become a public method in the new Node struct.
NodePrefix will be another user.

I tend to prefer encapsulation over where it's used for these choices. This way, there's no tempation for nodemap.rs to play with the binary data directly.

gracinet marked 3 inline comments as done.Jan 22 2020, 3:52 PM
This revision was not accepted when it landed; it landed in state Needs Review.
This revision was automatically updated to reflect the committed changes.