D11097 dirstate-v2: Add heuristic for when to create a new data file

This is an archive of the discontinued Mercurial Phabricator instance.

dirstate-v2: Add heuristic for when to create a new data file
ClosedPublic

Authored by SimonSapin on Jul 15 2021, 11:28 AM.

Details

Summary

… instead of appending to the existing one. This is based on keeping track
of how much of the existing data is not used anymore.
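
As a rough illustration of the kind of heuristic described above (not the code from this revision), the decision can be sketched as a comparison between the number of unused bytes and the size of the existing data file. The names `MAX_UNREACHABLE_RATIO` and `should_append`, and the threshold value of 0.5, are assumptions made for this sketch only.

```rust
/// Hypothetical threshold: the largest fraction of the data file that may be
/// unused before a fresh file is written. The value 0.5 is an assumption for
/// this sketch, not a value taken from the revision.
const MAX_UNREACHABLE_RATIO: f32 = 0.5;

/// Returns `true` if the next write should append to the existing data file,
/// and `false` if a new data file should be created instead.
///
/// `unreachable_bytes` is the number of bytes in the existing file that the
/// current tree no longer references; `data_file_len` is the file's length.
fn should_append(unreachable_bytes: u64, data_file_len: u64) -> bool {
    if data_file_len == 0 {
        // There is no existing data worth appending to.
        return false;
    }
    let ratio = unreachable_bytes as f32 / data_file_len as f32;
    ratio < MAX_UNREACHABLE_RATIO
}
```

With these assumed values, a 100-byte file with 10 unused bytes would be appended to, while the same file with 60 unused bytes would be replaced by a new one.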

Diff Detail

Repository
rHG Mercurial
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

SimonSapin created this revision. Jul 15 2021, 11:28 AM
SimonSapin retitled this revision from dirstate-v2: Add heuristic for when to a new data file to dirstate-v2: Add heuristic for when to create a new data file. Jul 15 2021, 1:11 PM
SimonSapin updated this revision to Diff 29292.
baymax updated this revision to Diff 29322. Jul 16 2021, 5:39 AM

✅ refresh by Heptapod after a successful CI run (🐙 💚)

marmoute requested changes to this revision. Jul 16 2021, 6:19 AM
marmoute added a subscriber: marmoute.
marmoute added inline comments.
rust/hg-core/src/dirstate_tree/dirstate_map.rs
52–53

Maybe point out in the comment that this is an estimate, not an exact value?

131

We should document this (and point out that this is an estimate).

488

We should put this ratio in some constant (at least) to make its value clearer and easier to configure.

rust/hg-core/src/dirstate_tree/on_disk.rs
76

Same feedback about being clear that this is an estimate.

This revision now requires changes to proceed. Jul 16 2021, 6:19 AM

At the moment I *think* unreachable_bytes should be exact. This might change if we change the serializer to reuse paths in more than one node each (within the same "version" of the tree): then when removing a node we wouldn’t know if that path is still reachable from somewhere else. (Unless we add some kind of reference-counting for paths, but I think it’s not worth it.)
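
To make the exactness argument concrete, here is a minimal sketch of how such a counter could be maintained. The struct `DataFileStats` and its methods are hypothetical stand-ins, not the types used in `dirstate_map.rs`. The count stays exact only under the assumption stated above, that each path on disk is referenced by a single node.

```rust
/// Hypothetical, simplified bookkeeping for one data file.
struct DataFileStats {
    /// Total length in bytes of the data file on disk.
    on_disk_len: u64,
    /// Bytes in the data file that the current tree no longer references.
    unreachable_bytes: u64,
}

impl DataFileStats {
    /// Record that a node stored on disk was removed or replaced.
    ///
    /// Counting `path_len` here is exact only while each path is referenced
    /// by exactly one node. If paths were shared between nodes, the path
    /// might still be reachable and this would become an overestimate.
    fn mark_unreachable(&mut self, node_len: u64, path_len: u64) {
        self.unreachable_bytes += node_len + path_len;
    }

    /// Record that `new_bytes` were appended to the existing data file.
    fn append(&mut self, new_bytes: u64) {
        self.on_disk_len += new_bytes;
    }

    /// Record that a fresh data file holding only live data was written.
    fn rewrite(&mut self, live_bytes: u64) {
        self.on_disk_len = live_bytes;
        self.unreachable_bytes = 0;
    }
}
```

Appending leaves the unreachable count untouched, so it can only grow until a new data file is written, at which point it resets to zero.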

SimonSapin updated this revision to Diff 29337. Jul 16 2021, 8:37 AM

> At the moment I *think* unreachable_bytes should be exact. This might change if we change the serializer to reuse paths in more than one node each (within the same "version" of the tree): then when removing a node we wouldn’t know if that path is still reachable from somewhere else. (Unless we add some kind of reference-counting for paths, but I think it’s not worth it.)

Let's probably just say that it might become an estimate later, and that it's not that important anyway. The change looks good to me otherwise.

baymax updated this revision to Diff 29354. Jul 16 2021, 5:45 PM

✅ refresh by Heptapod after a successful CI run (🐙 💚)

baymax updated this revision to Diff 29613. Jul 20 2021, 3:42 PM

✅ refresh by Heptapod after a successful CI run (🐙 💚)

This revision was not accepted when it landed; it landed in state Needs Review.
This revision was automatically updated to reflect the committed changes.