( )⚙ D11094 dirstate-v2: Support appending to the same data file

This is an archive of the discontinued Mercurial Phabricator instance.

dirstate-v2: Support appending to the same data file
ClosedPublic

Authored by SimonSapin on Jul 15 2021, 11:28 AM.

Details

Summary

For now we’re still writing the entire data every time, so appending is not
useful yet. Later we’ll have new nodes pointing to some existing data for
nodes and paths that haven’t changed.

The decision whether to append is pseudo-random in order to make tests exercise
both code paths. This will be replaced by a heuristic based on the amount
of unused existing data.

Diff Detail

Repository
rHG Mercurial
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

SimonSapin created this revision.Jul 15 2021, 11:28 AM
marmoute added inline comments.
mercurial/dirstatemap.py
670

I don't think we should use 'a' here:

  1. it is buggy on some version of python // os and still requires a seel
  2. if the file is too long, we should care seek at the right location and simply overwrite existing data.
SimonSapin added inline comments.Jul 16 2021, 3:52 AM
mercurial/dirstatemap.py
670

I inlined append from vfs.py in order to be able to add the assert with .tell(). Is vfs.py also buggy?

Silently overwriting extra data seems very wrong to me. If someone else wrote it, overwriting it would interfere with them. However due to the combination of repofilecache + wlock, my understanding is that there should never be extra data.

baymax updated this revision to Diff 29319.Jul 16 2021, 5:38 AM

✅ refresh by Heptapod after a successful CI run (🐙 💚)

marmoute added inline comments.Jul 16 2021, 6:10 AM
mercurial/dirstatemap.py
670

I am refering to the issue pointed here: https://www.mercurial-scm.org/repo/hg/file/stable/mercurial/revlog.py#l2427

So vfs.append is possibly buggy :-/

There should never be extra data, except if we decide this is a good idea in some case to being friendlier to mmap user. For example, we might decide to do so in hg rollback or decide that truncate less transaction rollback is useful.
So, even if this is corner case, having code robust in this case seems better.

rust/hg-core/src/dirstate_tree/on_disk.rs
666

spotted on the go "curent_offset" instead of "current_offset"

SimonSapin updated this revision to Diff 29334.Jul 16 2021, 8:37 AM

On top of what @marmoute said and just so it's said somewhere, I think it'd be nice to be explicit in documenting the new format by saying that the "append-only" nature of the format, even when looking at a single file, must not be relied on by code outside of the read/write core dirstate, since it's not always append-only.

baymax updated this revision to Diff 29351.Jul 16 2021, 5:45 PM

✅ refresh by Heptapod after a successful CI run (🐙 💚)

baymax updated this revision to Diff 29610.Jul 20 2021, 3:41 PM

✅ refresh by Heptapod after a successful CI run (🐙 💚)

This revision was not accepted when it landed; it landed in state Needs Review.
This revision was automatically updated to reflect the committed changes.