This is an archive of the discontinued Mercurial Phabricator instance.

nodemap: implement nodemap in rust
ClosedPublic

Authored by quark on Nov 20 2017, 7:11 PM.
Tags
None
Subscribers

Details

Reviewers
durham
Group Reviewers
Restricted Project
Commits
rFBHGX5cb0d9198ff9: nodemap: implement nodemap in rust
Summary

A simple nodemap index (node -> rev, and prefix -> node). See comment in
the code for the actual format.

Note this does not cover every features of a Mecurial nodemap. For example,
it does not have __setitem__, __delitem__, and does not treat rev -1
specially. Those could be implemented in a higher level.

Diff Detail

Repository
rFBHGX Facebook Mercurial Extensions
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

quark created this revision.Nov 20 2017, 7:11 PM
Herald added a reviewer: Restricted Project. · View Herald TranscriptNov 20 2017, 7:11 PM
quark edited the summary of this revision. (Show Details)Nov 22 2017, 9:31 PM
quark updated this revision to Diff 3792.
quark updated this revision to Diff 3795.Nov 22 2017, 9:42 PM
durham requested changes to this revision.Nov 30 2017, 7:41 PM
durham added a subscriber: durham.

Mostly questions.

rust/indexes/src/nodemap.rs
41

Maybe mention that the side index is in-memory only. It says it in the diagram above, but not here.

70

So this code doesn't currently support inlined revlogs? Are are we just relying on the caller to turn an inlined revlog into a inmemory non-inlined [u8]? If it's just not supported anywhere, we should make that restriction obvious somewhere.

Also, s/NodeMap/NodeRevMap/ ?

81

from_le here?

118

Why 'cloned()' here?

136

Might be worth making the root node position a constant, instead of having magic '1' variables all over. Same for 0 for the side index root.

147

from_le, in line 156 too

167

Probably use a constant for the revlog entry width

172

I wonder if it's worth it to start putting these revlog accessor functions in their own file. To start building our revlog handling rust code for reuse outside of individual cases like this. Maybe by defining a RevlogIndex type as just a [u8] for now and putting a bunch of these function on it?

175

What does "forked" mean in this case?

196

more magic numbers

201

noob question: I thought this would return a reference to the slice of the buffer. But the lifetime isn't tied to the buffer, so that doesn't seem right. What am I missing?

215

Should the return None one be a panic or something? Since it should never happen right?

This revision now requires changes to proceed.Nov 30 2017, 7:41 PM
quark added inline comments.Nov 30 2017, 8:04 PM
rust/indexes/src/nodemap.rs
70

clindex does the inline check. It falls back to the old index if the revlog is inlined.

In theory we can convert inline revlog to non-inlined on the fly but I think it's cleaner to just skip supporting them.

81

Will do a scan for the entire series switching to be (for catching errors) and use from correctly.

118

The Rust iter() by default return references (&u8) for its elements. .cloned() converts those references to actual values (u8). It does not clone the iterator. It's like:

.map(|&x| x)
136

Good point.

172

A later patch does that. But the code here is in theory somehow "de-coupled" from the actual revlog format. i.e. other append-only format should also work.

175

Since main_index is read-only. It's copied to a Vec and then modifications are done to it. Maybe "copied on write".

201

The lifetime is implicitly tied. It's equivalent to:

fn rev_to_node<'a, K: AsRef<[u8]>>(changelog: &'a K, rev: KeyId) -> rerrors::Result<&'a [u8]>

The compiler knows that because there is only one reference type (changelog) in input arguments. If there are many, the compiler will complain.

215

IIRC, the Python code might feed all kinds of strange things like a revset string to paritalmatch. So it's also nicer to support that.

I checked the revlog.c behavior, it returns None for partialmatch('xxxx'). I also think it makes sense to return None (no match).

durham added inline comments.Dec 1 2017, 12:03 PM
rust/indexes/src/nodemap.rs
175

Hmm, just seems odd to mention that in the function description, since the function doesn't care if it's a fork or not. It just puts new entries in the index it was given.

201

So it will associate any return reference with whichever argument is a reference? That doesn't seem safe. Or is only because this slice is a slice of the changelog as_ref that it knows its related to the changelog?

quark added inline comments.Dec 1 2017, 1:29 PM
rust/indexes/src/nodemap.rs
175

This is O(len(index)) time complexity in theory, although memcpy without parsing is relatively fast. But there are no other places with that time complexity so I'd like to label that.

Shall we have more advanced ways to deal with filesystem confidently, that time complexity can be reduced.

201

It does both. First, desugar the code to the explicit lifetime version. Then run the borrow checker which knows the returned slice comes from changelog.

quark marked 4 inline comments as done.Dec 8 2017, 9:32 PM
quark updated this revision to Diff 4283.
quark added inline comments.Dec 8 2017, 9:54 PM
rust/indexes/src/nodemap.rs
175

I made a mistake. I thought the comment was at build_incrementally. Will change.

quark updated this revision to Diff 4369.Dec 11 2017, 7:20 PM
durham accepted this revision.Dec 14 2017, 7:21 PM
durham added inline comments.
rust/indexes/src/nodemap.rs
222

s/64/CHANGELOG_ENTRY_SIZE/

This revision is now accepted and ready to land.Dec 14 2017, 7:21 PM
quark updated this revision to Diff 4451.Dec 14 2017, 9:21 PM
quark updated this revision to Diff 4460.Dec 14 2017, 10:15 PM
This revision was automatically updated to reflect the committed changes.