This is an archive of the discontinued Mercurial Phabricator instance.

nodemap: test that concurrent process don't see the pending transaction
ClosedPublic

Authored by marmoute on Feb 28 2020, 1:55 PM.

Details

Summary

We don't want other client to read uncommitted data, until the transaction is
really committed.

Diff Detail

Repository
rHG Mercurial
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

marmoute created this revision.Feb 28 2020, 1:55 PM

The parent patch (D8189) was a bit controversial. We don't need that patch if we apply the following patch on top of this one (making this test sleep-free):

diff --git a/tests/test-persistent-nodemap.t b/tests/test-persistent-nodemap.t
--- a/tests/test-persistent-nodemap.t
+++ b/tests/test-persistent-nodemap.t
@@ -320,17 +320,20 @@ An up to date nodemap should be availabl

 Another process does not see the pending nodemap content during run.

-  $ PATH=$RUNTESTDIR/testlib/:$PATH
+  $ PATH=$PWD:$RUNTESTDIR/testlib/:$PATH
+  $ cat >> log_nodemap <<EOS
+  > unset HG_PENDING
+  > hg debugnodemap --metadata > "\$1"
+  > EOS
+  $ chmod +x log_nodemap
   $ echo qpoasp > a
   $ hg ci -m a2 \
-  > --config "hooks.pretxnclose=wait-on-file 20 sync-repo-read sync-txn-pending" \
-  > --config "hooks.txnclose=touch sync-txn-close" > output.txt 2>&1 &
+  > --config "hooks.pretxnclose=log_nodemap pretxn-output" \
+  > --config "hooks.txnclose=log_nodemap posttxn-output"

 (read the repository while the commit transaction is pending)

-  $ wait-on-file 20 sync-txn-pending && \
-  > hg debugnodemap --metadata && \
-  > wait-on-file 20 sync-txn-close sync-repo-read
+  $ cat pretxn-output
   uid: ???????????????? (glob)
   tip-rev: 5004
   tip-node: ba87cd9559559e4b91b28cb140d003985315e031
@@ -340,7 +343,7 @@ Another process does not see the pending
   data-unused: 192 (pure !)
   data-unused: 192 (rust !)
   data-unused: 0 (no-pure no-rust !)
-  $ hg debugnodemap --metadata
+  $ cat posttxn-output
   uid: ???????????????? (glob)
   tip-rev: 5005
   tip-node: bae4d45c759e30f1cb1a40e1382cf0e0414154db
@@ -350,6 +353,3 @@ Another process does not see the pending
   data-unused: 448 (pure !)
   data-unused: 448 (rust !)
   data-unused: 0 (no-pure no-rust !)
-
-  $ cat output.txt
-

@durin42, what do you think?

@durin42, what do you think?

That's *inspired* and looks like an awesome solution to me.

The parent patch (D8189) was a bit controversial. We don't need that patch if we apply the following patch on top of this one (making this test sleep-free):

diff --git a/tests/test-persistent-nodemap.t b/tests/test-persistent-nodemap.t
--- a/tests/test-persistent-nodemap.t
+++ b/tests/test-persistent-nodemap.t
@@ -320,17 +320,20 @@ An up to date nodemap should be availabl
 Another process does not see the pending nodemap content during run.
-  $ PATH=$RUNTESTDIR/testlib/:$PATH
+  $ PATH=$PWD:$RUNTESTDIR/testlib/:$PATH
+  $ cat >> log_nodemap <<EOS
+  > unset HG_PENDING
+  > hg debugnodemap --metadata > "\$1"
+  > EOS
+  $ chmod +x log_nodemap
   $ echo qpoasp > a
   $ hg ci -m a2 \
-  > --config "hooks.pretxnclose=wait-on-file 20 sync-repo-read sync-txn-pending" \
-  > --config "hooks.txnclose=touch sync-txn-close" > output.txt 2>&1 &
+  > --config "hooks.pretxnclose=log_nodemap pretxn-output" \
+  > --config "hooks.txnclose=log_nodemap posttxn-output"
 (read the repository while the commit transaction is pending)
-  $ wait-on-file 20 sync-txn-pending && \
-  > hg debugnodemap --metadata && \
-  > wait-on-file 20 sync-txn-close sync-repo-read
+  $ cat pretxn-output
   uid: ???????????????? (glob)
   tip-rev: 5004
   tip-node: ba87cd9559559e4b91b28cb140d003985315e031
@@ -340,7 +343,7 @@ Another process does not see the pending
   data-unused: 192 (pure !)
   data-unused: 192 (rust !)
   data-unused: 0 (no-pure no-rust !)
-  $ hg debugnodemap --metadata
+  $ cat posttxn-output
   uid: ???????????????? (glob)
   tip-rev: 5005
   tip-node: bae4d45c759e30f1cb1a40e1382cf0e0414154db
@@ -350,6 +353,3 @@ Another process does not see the pending
   data-unused: 448 (pure !)
   data-unused: 448 (rust !)
   data-unused: 0 (no-pure no-rust !)
-
-  $ cat output.txt
-

This new patch works, but I don't like the idea of not testing two fully independant process. Running a process through hooks is quite different and who know how that semantic could evolve.

The feature provided by the wait-on-file script is useful and the approach have been sucessfully used for a while in the Mercurial codebase (eg: test-push-race.t, test-bookmarks-corner-case.t, …). and extensions. The need for such fine grained synchronisation will appears again, and that script will be needed again. So I would rather use that script with the current semantic (and whaterver reasonable update are requested to the implementation) and keep two fully independant processes in that tests.

As a note, I have been fixing a lot of test flackyness in the past years, maybe of them create by sleep only based synchronisation (so not the current approach). The existing example of "synchronisation of file with much longer timeout" (see previous list) have never been an issue.

If anyone has a more elegant solution to this *general* problem, I would be happy to use it instead, but so far, this it the best reasonable option I have up my sleeve.

This new patch works, but I don't like the idea of not testing two fully independant process. Running a process through hooks is quite different and who know how that semantic could evolve.

Can we wrap the hg invocation in the script in enough env -u VAR business that we can know it won't see the pending transaction? That seems like it ought to be eminently doable. In fact, it might even merit an environment variable of its own, eg HG_IGNORE_PENDING_TXN=1 hg ... for use in other scripts.

The feature provided by the wait-on-file script is useful and the approach have been sucessfully used for a while in the Mercurial codebase (eg: test-push-race.t, test-bookmarks-corner-case.t, …). and extensions. The need for such fine grained synchronisation will appears again, and that script will be needed again. So I would rather use that script with the current semantic (and whaterver reasonable update are requested to the implementation) and keep two fully independant processes in that tests.

I'm not terribly swayed by this argument: I've invested nonzero time in *fixing* flakes that use this (anti-)pattern, so I'd like to avoid proliferating it. Encoding it as a helper script pushes it towards being an acceptable pattern that folks will use without critical thought. :(

As a note, I have been fixing a lot of test flackyness in the past years, maybe of them create by sleep only based synchronisation (so not the current approach). The existing example of "synchronisation of file with much longer timeout" (see previous list) have never been an issue.

I'm pretty sure I've spent time (see above) in fixing flakes of this nature.

If anyone has a more elegant solution to this *general* problem, I would be happy to use it instead, but so far, this it the best reasonable option I have up my sleeve.

That's kind of the counter-argument though: rather than a one-size-fits-all approach, I'd rather we thought more about getting the right synchronization primitives in place on a case-by-case basis until a non-sleepy pattern emerges that we can really live with.

This new patch works, but I don't like the idea of not testing two fully independant process. Running a process through hooks is quite different and who know how that semantic could evolve.

Can we wrap the hg invocation in the script in enough env -u VAR business that we can know it won't see the pending transaction? That seems like it ought to be eminently doable. In fact, it might even merit an environment variable of its own, eg HG_IGNORE_PENDING_TXN=1 hg ... for use in other scripts.

Invocation of independant process will always be easier to trust dans something spawn by the process we are racing with. In addition, hooks are not as fine grained as we needs for some of the race condition we have to checks. So inter-process signaling will be needed and so far doing this through the file system has proven a solid and reliable approach. So having a generic solution for this kind of synchronisation and getting an official helper for it seems valuable. Since the approach is suitable here, I don't see a reason not to use it.

The feature provided by the wait-on-file script is useful and the approach have been sucessfully used for a while in the Mercurial codebase (eg: test-push-race.t, test-bookmarks-corner-case.t, …). and extensions. The need for such fine grained synchronisation will appears again, and that script will be needed again. So I would rather use that script with the current semantic (and whaterver reasonable update are requested to the implementation) and keep two fully independant processes in that tests.

I'm not terribly swayed by this argument: I've invested nonzero time in *fixing* flakes that use this (anti-)pattern, so I'd like to avoid proliferating it. Encoding it as a helper script pushes it towards being an acceptable pattern that folks will use without critical thought. :(

I am not aware of case where this pattern (communication between concurrent process using the file system) was problematic. Do you have any example to point to?

As far as I understand, you main concern with the current script was the use of some (quite long) timeout to catch up stall situation. You suggested to use the global test timeout, a solution currently unsuitable because of issue6125 (and probably less developper friendly overall). Having a standard script to do process synchronisation through the FS (An approach perfectly fine as far as I know) mean we could adjust all users in one go (eg: timeout adjustement depending on the global test timeout, removal of the timeout, etc…).

I think I'm coming around on this. I've poked a few reviewers in private to look at the first three patches in this series to see how they feel, but if I don't hear back in a day or two I think I'll just push the whole stack...

This revision was not accepted when it landed; it landed in state Needs Review.
This revision was automatically updated to reflect the committed changes.