Page MenuHomePhabricator

treedirstate: add Tree

Authored by mbthomas on Nov 14 2017, 12:39 PM.


Group Reviewers
Restricted Project
rFBHGX95cb381f49cd: treedirstate: add Tree

Adds Tree, an implementation of a dirstate tree.

Diff Detail

rFBHGX Facebook Mercurial Extensions
Automatic diff as part of commit; lint not applicable.
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

mbthomas created this revision.Nov 14 2017, 12:39 PM
Herald added a reviewer: Restricted Project. · View Herald TranscriptNov 14 2017, 12:39 PM
mbthomas updated this revision to Diff 3596.Nov 17 2017, 9:44 AM
jsgf added a subscriber: jsgf.Nov 17 2017, 2:31 PM
jsgf added inline comments.

This could be Copy too.


This is a unicode character. Do you mean u8? What are the actual meaningful values this can have?

Edit: looks like this should be its own enum type that implements Storable.


What does -ve mean for these? Is 31 bits enough for mtime?


Could you use a serde-derived implementation? Or do you not want to take a dependency on serde?


This is imported by the prelude.


I found this a little confusing. Would it be more accurate to say that implementations of this trait do no storing directly, but they're responsible for serializing/deserializing to/from a byte stream, and its the Store that's doing the actual storing?


List seems misleading for a map.


Does this need two independent lifetimes? If one is always the same as or shorter than the other, then you can just use one.


This is equiv aside from handling the last key byte being '/', which would need a second check to return (key, None) if the result is (_, []).

  let mut bits = key.splitn(2, |c| *c == b'/');
  ("empty bits!?"),

&key[..index + 1] is fine. rustfmt might want to put spaces around ...


Looks like a name should have its own distinct type and implement Storable.


This would be cleaner if BlockId were a newtype and had its own implementation of Storable.


I'd push this up to the top:

let id ="Node must have valid ID...");

then just put the rest of the code at the function level.


I'd still .expect().


Same comment as above - push the non-populated check to the top, then make the rest of the code at function level.


I'd flip this around by using .is_none(), since that's the interesting case here.




Rather than duplicating the Ok(...) everywhere, which I think obscures the real logic, it would be better to have something like:

let res = if let Some(path) = path {

jsgf requested changes to this revision.Nov 17 2017, 2:32 PM
This revision now requires changes to proceed.Nov 17 2017, 2:32 PM
jsgf added a comment.Nov 17 2017, 2:33 PM

I stopped following the code around PathRecurse handling - I assume its implementing a chunk of Mercurial-defined logic.

mbthomas added inline comments.Nov 20 2017, 12:19 PM

Mercurial uses a character to represent the state of the file (actually it uses a string, but it only uses one character in those strings). Current values are 'r', 'n', 'a' and 'm', but future changes to Mercurial may add new characters, so I don't want to create an enum here that might cause compatibility problems with those changes.

I'm limiting it to a single ASCII character, so maybe u8 would be better here.


Mercurial uses -ve sizes as sentinel values for special meanings. Unix time is signed, so -ve mtimes just mean before 1970, which is valid even if it is unusual. I switch to VLQ values in a later diff, so the 32-bit limitation won't be a problem.


I looked at serde, but it seemed like it did way more than I needed, and then didn't really support this use-case (small chunks of binary data, lazily loaded). I don't think it's really needed here, as this isn't an interchange format.



Really the trait is supposed to represent the constraints on things that can be the Value type in the tree. The name is supposed to suggest "something that can be put in a Store".

Would TreeValue be a better name?


They do need to be separate lifetimes. One is the lifetime of the nodes within the tree, the other is the lifetime of the KeyRef we get passed in by various functions, which are usually different objects with competing requirements on their lifetime constraints. If I try to make them the same lifetime, the compiler complains that there are mutually exclusive conditions that can't be satisfied.


It's not quite right as I need to include the '/' in the first return value. The reason being that we include the trailing / in directory names so that file/directory clashes are allowed to exist (the dirstate, particularly during merges, can include information about both a file called foo and a file called foo/bar, and I need to handle this).


Name's type is always Vec[u8], as we're interested in filesystem paths, and functions like path_recurse expect and require this. I could type parameterise it, but it seemed like excessive genericness to me.


Storable is meant to encapsulate "can be the right hand side of the map type", rather than just a generic "can be serialized" trait. I'll try to make that clearer in the comments.

I've made BlockId a newtype, but arguably what I *should* be doing here is extending Read and Write to add new methods write_block_id, read_block_id. I've not done that, as it seems like lots of work for what is largely an internal implementation detail. Let me know if you think it would be valuable.

mbthomas updated this revision to Diff 3651.Nov 20 2017, 12:20 PM
mbthomas marked 18 inline comments as done.Nov 20 2017, 12:28 PM
durham added a subscriber: durham.Nov 20 2017, 6:22 PM
durham added inline comments.

If we load a node, then immediately write it, are we rewriting it's id in place? Does that mean that node instance is no longer useful for reading against the old store?

mbthomas added inline comments.Nov 21 2017, 6:10 AM

Yes, a node is only valid in one store at a time. The write_full method has the side-effect of moving all entries in the tree from the old store to the new store. This is how we repack.

I shall beef up the comments on Tree::write_full to make it clear that it has this effect.

stash added a subscriber: stash.Nov 22 2017, 5:58 AM
stash added inline comments.

But all the compatibility problems should be caught by unittests, shouldn't they?


What -ve are you talking about :) ?


Why not i64?

mbthomas added inline comments.Nov 22 2017, 12:00 PM

-ve is "negative". Sorry for the obscure abbreviation.


Memory usage, mostly. Mercurial currently uses an i32, so we're no worse with this limit. The file format will later be changed to VLQ, so the on-disk format is safe for future expansion should we need to store >2GB source files.

mbthomas added inline comments.Nov 22 2017, 12:03 PM

Possibly not. If someone adds an obscure state in core Hg, there may not be a test with treedirstate enabled that triggers it. I think it's safer to stick to the established dirstate API and implement it fully.

jsgf accepted this revision.Nov 22 2017, 6:33 PM

I think the encode/decode should be factored out more, but I'll leave that up to you.


My main concern is mixing the serialization format details with other logic. If you're hand-coding what the on-disk format looks like in the first place, it would be best to have that in completely separate functions from the other logic that operates on those types/fields.

In practice I think using a trait to represent the property of "can be serialized/stored/etc" and hiding those details in the implementations of its methods is a nice way to do that, which composes well if you have well-defined types for each part of your structure which can implement that trait (even if they're just very thin newtype wrappers around more mundane types).

Since you already have the Storable trait which seems to be the right thing, I think you may as well take advantage of it.

This revision is now accepted and ready to land.Nov 22 2017, 6:33 PM
mbthomas updated this revision to Diff 3821.Nov 24 2017, 11:52 AM
mbthomas added inline comments.Nov 24 2017, 12:07 PM

I've refactored this out in D1510. Can you take a look at that and see if it addresses your concerns?

mbthomas updated this revision to Diff 3840.Nov 24 2017, 3:18 PM
mbthomas updated this revision to Diff 3851.Nov 24 2017, 3:35 PM
This revision was automatically updated to reflect the committed changes.