Page MenuHomePhabricator

treedirstate: cache dirstate data when iterating all files
ClosedPublic

Authored by mbthomas on Nov 14 2017, 12:39 PM.

Details

Summary

Iterations over all files will cause most of the file to be read in a piecemeal
fashion. This will be inefficient on disks with slow seek times. Instead,
read the whole file into memory before iterating.

Diff Detail

Repository
rFBHGX Facebook Mercurial Extensions
Lint
Lint Skipped
Unit
Unit Tests Skipped

Event Timeline

mbthomas created this revision.Nov 14 2017, 12:39 PM
Herald added a reviewer: Restricted Project. · View Herald TranscriptNov 14 2017, 12:39 PM
mbthomas updated this revision to Diff 3599.Nov 17 2017, 9:45 AM
mbthomas updated this revision to Diff 3654.Nov 20 2017, 12:21 PM
durham added a subscriber: durham.Nov 21 2017, 12:44 AM
durham added inline comments.
rust/treedirstate/src/filestore.rs
105

Do we need to set at_end = false right after the seek to 0? So that if any of the lines below (in particular line 118) exit earlier, we aren't left with seek at 0 and at_end == true?

Or do we consider the whole store invalid if this returns error?

mbthomas added inline comments.Nov 21 2017, 6:38 AM
rust/treedirstate/src/filestore.rs
105

I think in practice if any of this fails we end up with an exception in Python and hg will exit, but we should keep it consistent in case someone later on decides to handle these errors.

durham added inline comments.Nov 21 2017, 10:45 AM
rust/treedirstate/src/filestore.rs
105

I think we want to explicitly not depend on process exit for clean up in the design of these new structures. We want daemon support as a first class citizen. Perhaps in this case it means if a caller ever receives an error, it should consider the structure invalid.

Is there any convenient short hand for something that can set self.is_invalid = true if a function returns an error? Kinda like what try/catch would allow.

mbthomas updated this revision to Diff 3729.Nov 21 2017, 1:35 PM
mbthomas added inline comments.Nov 21 2017, 1:50 PM
rust/treedirstate/src/filestore.rs
105

Yes, I think errors most likely invalidate the structure. If we're in the middle of doing a write_full for example, half the in-memory nodes will have IDs from the old store, and half will have IDs from the new store.

Result::map_err and Result::or_else can be used to perform clean-ups like this. I think at a higher level (maybe in Dirstate?) we'll eventually want to have a map_err(|e| { self.invalidate(); e } on all the public APIs.

mbthomas updated this revision to Diff 3770.Nov 22 2017, 1:14 PM
jsgf requested changes to this revision.Nov 22 2017, 6:45 PM
jsgf added a subscriber: jsgf.
jsgf added inline comments.
rust/treedirstate/src/filestore.rs
28–46

What about using BufReader to do the buffering. If you only want to do buffering sometimes you could switch file between being BufReader<BufWriter<File>> and BufWriter<File> with an enum.

That way you could be reasonably sure that the read and write sides are in sync all the time, rather than having to maintain a separate cache.

121–136

Should this invalidate the cache?

This revision now requires changes to proceed.Nov 22 2017, 6:45 PM
jsgf added a comment.Nov 22 2017, 6:48 PM

Hm, on second thoughts, does BufReader<BufWriter<File>> actually work? Suspect not.

In D1402#25140, @jsgf wrote:

Hm, on second thoughts, does BufReader<BufWriter<File>> actually work? Suspect not.

No, it doesn't. BufReader and BufWriter seem mutually exclusive (and are more about streaming I/O than files), which is a bit of a shame.

mbthomas added inline comments.Nov 23 2017, 7:20 AM
rust/treedirstate/src/filestore.rs
121–136

No need. The cache is still valid for all previous blocks (they're immutable), so we can keep it around.

mbthomas updated this revision to Diff 3823.Nov 24 2017, 11:53 AM
mbthomas updated this revision to Diff 3841.Nov 24 2017, 3:18 PM
durham accepted this revision.Nov 27 2017, 12:39 PM
This revision was automatically updated to reflect the committed changes.