Iterations over all files will cause most of the file to be read in a piecemeal
fashion. This will be inefficient on disks with slow seek times. Instead,
read the whole file into memory before iterating.
Details
- Reviewers
jsgf durham - Group Reviewers
Restricted Project - Commits
- rFBHGX3d30572a3984: treedirstate: cache dirstate data when iterating all files
Diff Detail
- Repository
- rFBHGX Facebook Mercurial Extensions
- Lint
Automatic diff as part of commit; lint not applicable. - Unit
Automatic diff as part of commit; unit tests not applicable.
Event Timeline
rust/treedirstate/src/filestore.rs | ||
---|---|---|
114 | Do we need to set at_end = false right after the seek to 0? So that if any of the lines below (in particular line 118) exit earlier, we aren't left with seek at 0 and at_end == true? Or do we consider the whole store invalid if this returns error? |
rust/treedirstate/src/filestore.rs | ||
---|---|---|
114 | I think in practice if any of this fails we end up with an exception in Python and hg will exit, but we should keep it consistent in case someone later on decides to handle these errors. |
rust/treedirstate/src/filestore.rs | ||
---|---|---|
114 | I think we want to explicitly not depend on process exit for clean up in the design of these new structures. We want daemon support as a first class citizen. Perhaps in this case it means if a caller ever receives an error, it should consider the structure invalid. Is there any convenient short hand for something that can set self.is_invalid = true if a function returns an error? Kinda like what try/catch would allow. |
rust/treedirstate/src/filestore.rs | ||
---|---|---|
114 | Yes, I think errors most likely invalidate the structure. If we're in the middle of doing a write_full for example, half the in-memory nodes will have IDs from the old store, and half will have IDs from the new store. Result::map_err and Result::or_else can be used to perform clean-ups like this. I think at a higher level (maybe in Dirstate?) we'll eventually want to have a map_err(|e| { self.invalidate(); e } on all the public APIs. |
rust/treedirstate/src/filestore.rs | ||
---|---|---|
29–47 | What about using BufReader to do the buffering. If you only want to do buffering sometimes you could switch file between being BufReader<BufWriter<File>> and BufWriter<File> with an enum. That way you could be reasonably sure that the read and write sides are in sync all the time, rather than having to maintain a separate cache. | |
131–148 | Should this invalidate the cache? |
No, it doesn't. BufReader and BufWriter seem mutually exclusive (and are more about streaming I/O than files), which is a bit of a shame.
rust/treedirstate/src/filestore.rs | ||
---|---|---|
131–148 | No need. The cache is still valid for all previous blocks (they're immutable), so we can keep it around. |
What about using BufReader to do the buffering. If you only want to do buffering sometimes you could switch file between being BufReader<BufWriter<File>> and BufWriter<File> with an enum.
That way you could be reasonably sure that the read and write sides are in sync all the time, rather than having to maintain a separate cache.