This is an archive of the discontinued Mercurial Phabricator instance.

merge: fix race that could cause wrong size in dirstate
ClosedPublic

Authored by valentin.gatienbaron on Jun 3 2019, 9:00 AM.

Details

Summary

The problem is that hg merge/update/etc work the following way:

  1. figure out what files to update
  2. apply the update to disk
  3. apply the update to in-memory dirstate
  4. write dirstate

where step3 looks at the filesystem and assumes it sees the result of
step2. If a file is changed between step2 and step3, step3 will record
incorrect information in the dirstate.

I avoid this by passing the size step3 needs directly from step2, for
the common path (not implemented for change/delete conflicts for
instance).

I didn't fix the same race for the exec bit for now, because it's less
likely to be problematic and I had trouble due to the fact that the
dirstate stores the permissions differently from the manifest (st_mode
vs '' 'l' 'x'), in combination with tests that pretend that symlinks
are not supported.

However, I moved the lstat from step3 to step2, which should tighten
the race window markedly, both for the exec bit and for the mtime.

Diff Detail

Repository
rHG Mercurial
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

I'm broadly happy with this, but had one point where I need clarification and I'd appreciate splitting the worker.py bit into its own change.

mercurial/merge.py
1805

I'm a little confused by this - it looks like we unconditionally set None for some filedatas? why?

mercurial/worker.py
86 ↗(On Diff #15327)

Could we make the worker.py change discrete? I think it merits its own commit even if it's in service of this change.

valentin.gatienbaron edited the summary of this revision. (Show Details)Jun 12 2019, 1:25 PM
valentin.gatienbaron updated this revision to Diff 15463.
martinvonz added inline comments.Jun 12 2019, 2:05 PM
mercurial/context.py
1769

Does overlayworkingfilectx also need to implement this function?

1806–1808

Was this necessary for this patch? Why?

Do you need to update the metadata in Phabricator to tell it that D6515 is its new parent?

Do you need to update the metadata in Phabricator to tell it that D6515 is its new parent?

Looks like that is needed. Note that as of today, @ on hg-committed's phabricator extension fixes this up for you when you phabsend ;)

mercurial/context.py
1806–1808

This change is needed to get the size of the written data in merge.batchget.

mercurial/merge.py
1805

It's simpler to have the invariant that every action has a filedata, so that's why I set something. Now the actual filedata information is not available here, because it's not that easy to make it available. These ACTION_GET can come from:

  • delete/change conflicts. In this case, the file contents is computed in filemerge, so one would need to pass around the filedata until the mergestate, or perhaps recompute it in the mergestate class
  • an extension calling the mergestate api to declare a file as clean

Putting None in the list makes the filedata be determined from disk as it always was at the base. So this is simply not fixing the race for these cases.

mercurial/context.py
1769

This is used only by rebase, and given that rebase seems to be very well tested and that the tests pass, empirically, the answer seems to be no. I think what's happening is that this method is called only when isinmemory is false (otherwise step 3 from the description doesn't happen and so we'd be adding calls to lstat where before there were none). The size returned by write is also used only when isinmemory is false.

This revision was not accepted when it landed; it landed in state Needs Review.
This revision was automatically updated to reflect the committed changes.

Try this:

hg co -C 4.9; hg debugrebuilddirstate; hg co 5.0; hg st

Before this patch, that produced an empty output as it should. After this patch, it lists hundreds of modified files (536 to be specific). A workaround is to add --config worker.enabled=no. Can you look into it? It's obviously a pretty serious bug that needs to be fixed before we release a new version.

Try this:
hg co -C 4.9; hg debugrebuilddirstate; hg co 5.0; hg st
Before this patch, that produced an empty output as it should. After this patch, it lists hundreds of modified files (536 to be specific). A workaround is to add --config worker.enabled=no. Can you look into it? It's obviously a pretty serious bug that needs to be fixed before we release a new version.

I forgot to say that what seems wrong in the dirstate is the size (probably other fields too, but size is probably the important one).

Try this:
hg co -C 4.9; hg debugrebuilddirstate; hg co 5.0; hg st
Before this patch, that produced an empty output as it should. After this patch, it lists hundreds of modified files (536 to be specific). A workaround is to add --config worker.enabled=no. Can you look into it? It's obviously a pretty serious bug that needs to be fixed before we release a new version.

I forgot to say that what seems wrong in the dirstate is the size (probably other fields too, but size is probably the important one).

Oops. I looked and the problem is clearly that the size/lstat for the dirstate are getting swapped between files, when over the parallelism threshold. I thought worker.py would split input arguments [0,1,2,3,4,5] into [[0,1,2], [3,4,5]] but it's actually [[0,2,4], [1,3,5]], so the lists of results are jumbled with parallelization.