This is an archive of the discontinued Mercurial Phabricator instance.

sparse: lock the store when updating requirements config
ClosedPublic

Authored by aalekseyev on Nov 29 2021, 7:27 AM.

Diff Detail

Repository
rHG Mercurial
Branch
stable
Lint
No Linters Available
Unit
No Unit Test Coverage

Event Timeline

aalekseyev created this revision.Nov 29 2021, 7:27 AM
This revision is now accepted and ready to land.Nov 29 2021, 7:47 AM
Alphare requested changes to this revision.Nov 30 2021, 10:06 AM
This revision now requires changes to proceed.Nov 30 2021, 10:06 AM

Wow, looks I just didn't run all tests here. Sorry!
Investigating this, I went pretty deep into a rabbit hole and may need rescuing.
The bug seems to go like this:

When cloning:

  • hg.py:868: lock is obtained and stored into destlock variable
  • a few lines later, the repo is actually created and the python object representing it is made by destpeer = peer(srcrepo, peeropts, dest) (which makes it unaware of the lock being held)
  • hg.py:1038: _update is called on a repo that still doesn't know that it's locked; since _update is where sparse hooks in, updateconfig runs and deadlocks

Note that a few lines later (hg.py:1041-1047) the repo actually "learns" that it's locked (in a racy way).

Intuitively it seems good to "tell" the repo that it's locked from the very start, when calling peer or immediately after. But since this is the first time I see all of this code I can't make that call.

Another confusing thing is that during the clone we don't take wlock at all (other than in sparse), even though it seems reasonable to expect everything to be locked during the clone.
Taking wlock after the lock is supposed to be wrong (deadlock-prone), and yet here we are effectively doing it (clone takes the lock from the very start, and then sparse does it during clone).

Any advice would be appreciated.

marmoute added a subscriber: marmoute.EditedDec 3 2021, 11:58 AM
  1. I agree that we should take the wlock in addition to the lock as we will be running an upgrade. (this should be a different commit, prior to the other one)
  1. the best way to get out of your trouble is to make the "new" peer (created after status) aware of the locking. Something like the logic below seems "suitable", Since this only happens in this specific cases, it seems like a bad idea to alter too much of the main peer API (we could still have a dedicated method on the repository for that). Same would have to be done for the wlock too

(I did not run this code, and it is not intended to be the final state.)

diff --git a/mercurial/hg.py b/mercurial/hg.py
--- a/mercurial/hg.py
+++ b/mercurial/hg.py
@@ -873,6 +873,13 @@ def clone(
             # we need to re-init the repo after manually copying the data
             # into it
             destpeer = peer(srcrepo, peeropts, dest)
+
+            # make the peer aware that is it already locked
+            #
+            # important:
+            #
+            #    We still need to release that lock at the end of the function
+            destpeer.local()._lockref = weakref.ref(destlock)
             srcrepo.hook(
                 b'outgoing', source=b'clone', node=srcrepo.nodeconstants.nullhex
             )
aalekseyev updated this revision to Diff 31323.Dec 6 2021, 2:05 PM
aalekseyev added a comment.EditedDec 6 2021, 2:07 PM

I tried this, and ran into another problem: dirstate is linked to wlock, so if you share a wlock then you must also share the dirstate.
I wrote a patch that does this, but I can't tell how safe that is. All tests pass, at least.

@marmoute, please have a look at the patch.

aalekseyev updated this revision to Diff 31324.Dec 6 2021, 2:09 PM
marmoute accepted this revision.Dec 16 2021, 8:38 AM

I tried this, and ran into another problem: dirstate is linked to wlock, so if you share a wlock then you must also share the dirstate.

I see, the wlock's releasefn refer to the original dirstate.

I wrote a patch that does this, but I can't tell how safe that is. All tests pass, at least.

That seems "fine", it test are happy, lets try this route.

@marmoute, please have a look at the patch.

Alphare accepted this revision.Dec 17 2021, 7:59 AM

We all came to the same conclusion then. This is a hack, but it's very localized and clearly documented.

This revision is now accepted and ready to land.Dec 17 2021, 7:59 AM