Page MenuHomePhabricator

treedirstate: add FileStore

Authored by mbthomas on Nov 14 2017, 12:39 PM.


Group Reviewers
Restricted Project
rFBHGXbbb86cb7c3c5: treedirstate: add FileStore

Adds FileStore, an implementation of the Store and StoreView traits that uses a
file on disk to store the data, and reads and writes blocks using file I/O.

Diff Detail

rFBHGX Facebook Mercurial Extensions
Automatic diff as part of commit; lint not applicable.
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

mbthomas created this revision.Nov 14 2017, 12:39 PM
Herald added a reviewer: Restricted Project. · View Herald TranscriptNov 14 2017, 12:39 PM
mbthomas updated this revision to Diff 3595.Nov 17 2017, 9:44 AM
jsgf requested changes to this revision.Nov 17 2017, 1:15 PM
jsgf added a subscriber: jsgf.
jsgf added inline comments.

Ditto comment from D1397 that usize isn't the right type for non-memory things.


This is not safe for multiple current writers, is it?


It's a pity there doesn't seem to be a standard equiv of pread/pwrite.


Does BufWriter gain you much here? I could imagine that if you have a workload of lots of back-to-back appends of small objects it could help, but otherwise it seems to just complicate things with the seeking.


I'm not very keen on keeping separate redundant state - this can always be computed fetching the file size, right?

At the very least you should access it via a method so that you can change the underlying implementation, and/or debug_assert that its always consistent with the file size.


Would it make sense to put this in the same refcell as the file?


It might be nice to have a way to explicitly open it RO, rather than just make it a fallback. (open_ro/open_rw/open)


Is .append(true) useful here? Probably not because you still need to track the filelen to get the keys and explicitly seek around for reads.


Check for truncated write?


Because you're doing things with set_len, I think it would be good to be explicit that it's Vec<u8>:

let mut buffer: Vec<u8> = ...

I tend to go with &mut buffer[..]

This revision now requires changes to proceed.Nov 17 2017, 1:15 PM
jsgf added inline comments.Nov 17 2017, 1:59 PM

Assert that data.len() fits into u32?

mbthomas added inline comments.Nov 20 2017, 12:15 PM

It makes writes a *lot* faster, particularly initial writes of the full dirstate which is a lots of back-to-back appends. I benchmarked it both with and without, and it's definitely worth the extra compexity.


Fetching the file size is a syscall, which is actually quite slow, and involves either seeking to the end, which messes up the caching, or stating the file, which is a bunch of extra work. The posix file I/O interface is actually pretty terrible.


I'm not sure. I assume that would make the definition:

file_data: RefCell<(BufWriter<File>, bool)>,

and then I'd need to unpack it each time as:

let (ref mut file, ref mut at_end) = self.file_data.get_mut();

I'm not sure that's any clearer, and it loses the name-based documentation of what at_end is.


Nothing needs that right now, so I don't think we should do it. It should be fairly simple to add if/when the need arises.


Yes, I decided to go for normal random-access I/O and control appending myself for this reason.


Actually, I should be using write_all to make sure it all gets written.

mbthomas updated this revision to Diff 3650.Nov 20 2017, 12:20 PM
mbthomas marked 15 inline comments as done.Nov 20 2017, 12:26 PM
mbthomas added inline comments.

No, but at the moment it's only accessed by Python code with the GIL, so it's single-threaded anyway.

durham added a subscriber: durham.Nov 20 2017, 3:16 PM
durham added inline comments.

Does rust have a way of making this a function of MAGIC? So we don't have to hard code the length.


Why pass open(&path) here, but open(path) in create?

stash added a subscriber: stash.Nov 21 2017, 5:21 AM
stash added inline comments.

Why not use Vec.resize() method and get rid of unsafe block?

mbthomas added inline comments.Nov 21 2017, 6:05 AM

Someone more familiar with Rust might have a better idea, but I think this won't be possible until const fn is stabilized (and as long as slice::len is marked as const).


Good spot. It should be &path in both places. It's required here because I use path twice. In create I only use it once so I get away with passing ownership.


Is the compiler smart enough to optimize the implicit (and unnecessary) memset away?

mbthomas updated this revision to Diff 3726.Nov 21 2017, 1:35 PM
stash added inline comments.Nov 22 2017, 4:49 AM

Just curious: what's the difference between description() and display() methods?


Have you decided not to add debug_asserts?


What surprises me is that read_only is set inside the open() function. I thought that user should set this as a parameter to open() function. And if this parameter read_only is false, then open method should fail if it can't open it for writing.

Or even have two separate methods - open_read_only() and open_read_write().

What do you think?


Not sure

mbthomas added inline comments.Nov 22 2017, 5:19 AM

The behaviour for open is to open read-write if possible, and read-only if the file is read-only. This gives the right behaviour for treedirstate, where it's possible to run commands that only read the dirstate in repos that you don't have write permission for. (There is a test for this, which is how I found it was necessary).

I'm not opposed to open_rw and open_ro methods, but I don't need them for the treedirstate implementation, so I haven't added them.

I'll make some notes for this in the doc comment for the function.

mbthomas added inline comments.Nov 22 2017, 5:25 AM

I will add one in append like this:

debug_assert!(self.position ==;
mbthomas updated this revision to Diff 3768.Nov 22 2017, 1:14 PM
jsgf accepted this revision.Nov 22 2017, 6:05 PM
jsgf added inline comments.

description() can only return &'static str, but display() returns String so can have variable bits in it.

It's pretty confusing and not at all clear that both are useful. It's one of the things that the new error handling package failure does away with (it only has Display).


You could go the other way and do something like:

const MAGIC_LEN: usize = 12;
const MAGIC: [u8; MAGIC_LEN] = *b"appendstore\n";

which will get a compilation error if they get out of sync.


You could define a little FileState structure.


Hm, I'd implement the "open tries RW then falls back to RO" explicitly as:

match open_rw(...) {
Ok(file) =>...,
Err(err) if err == EACCESS => open_ro(...),
Err(err) => return Err(err),

even if you don't need it externally.

This revision is now accepted and ready to land.Nov 22 2017, 6:05 PM
mbthomas updated this revision to Diff 3820.Nov 24 2017, 11:52 AM
mbthomas updated this revision to Diff 3839.Nov 24 2017, 2:50 PM
mbthomas updated this revision to Diff 3850.Nov 24 2017, 3:35 PM
durham accepted this revision.Nov 27 2017, 12:27 PM
This revision was automatically updated to reflect the committed changes.