This is an archive of the discontinued Mercurial Phabricator instance.

pull: fix inconsistent view of bookmarks during pull (issue4700)
ClosedPublic

Authored by valentin.gatienbaron on Dec 18 2018, 8:58 AM.

Details

Summary

I had a share where a pull apparently pulled a bookmark but not the
revision pointed to by the bookmark, which I suspect is due to this
(and if not, we might as well remove known issues in this area).

I fix this by doing all the queries that could read the bookmarks in one
round trip.

I had to change the handling of the case where the server doesn't
support the lookup query, because if it fails, it would otherwise make
fremotebookmark.result() block forever. This is due to
wireprotov1peer.peerexecutor.sendcommands's behavior (it fills a
single future if any query fails synchronously and leaves all other
futures unchanged), but I don't know if the fix is to cancel all other
futures, or to keep going with the other queries.
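
As a minimal sketch of that failure mode (not the actual patch; 'other' is an assumed v1 peer and 'rev' an assumed revision name, shown only for illustration):

  # The unsupported 'lookup' fails synchronously; sendcommands() fills only
  # that future with the error and leaves the listkeys future unresolved.
  with other.commandexecutor() as e:
      fremotebookmarks = e.callcommand('listkeys', {'namespace': 'bookmarks'})
      fnode = e.callcommand('lookup', {'key': rev})  # server lacks 'lookup'
  fremotebookmarks.result()  # would block forever

The updated diff sidesteps this by checking other.capable('lookup') and aborting before any command is issued (see the diff fragment quoted later in this thread).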

Diff Detail

Repository
rHG Mercurial
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

revs = [] # actually, nodes

+ if other.capable('lookupns'):
+     def lookupns(e, r):
+         return e.callcommand('lookupns', {'key': r}).result()
+ else:
+     def lookupns(e, r):
+         return e.callcommand('lookup', {'key': r}).result(), ''

for r in oldrevs:
    with other.commandexecutor() as e:
-       node = e.callcommand('lookup', {'key': r}).result()
+       node, ns = lookupns(e, r)
+       if ns == 'bookmarks':
+           if r in remotebookmarks():
+               node = remotebookmarks()[r]

I'm not an expert, but I feel it's wrong to rely on the client for data
consistency. Can't we somehow make the peer serve a "snapshot" of the
repository for the entire session? @indygreg Any updates in the v2 protocol
regarding this?

I went this way in part because it's a fairly simple change, and it might be useful in other circumstances (perhaps to interpret pulling a tag as actually pulling the tag itself, though I don't know concretely how that would work).

AFAIK, the protocol is never stateful (not sure how that'd work with http), so you can't rely on a notion of session-level state on the server to guarantee consistency.
I can see two plausible alternatives:

  1. doing all the lookup queries and the bookmarks query in a single roundtrip (see the sketch below). Is it guaranteed that a group of batchable queries will be sent in one go and won't be split up (if there are many of them, for instance)?
  2. tweak the new lookup query to be instead lookup(name, bookmark=node), which would mean: lookup(name) as if there existed a bookmark called name pointing to node. So kind of the snapshot idea, where you give the server the relevant part of the snapshot.
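
To make alternative 1 concrete, here is a rough sketch (assuming the v1 commandexecutor batches everything issued inside one block; 'other' and 'revs' are placeholders), which is roughly the shape the final diff quoted below takes:

  with other.commandexecutor() as e:
      # Both queries join the same batch, so they travel in one round trip
      # and are answered from the same server state.
      fbookmarks = e.callcommand('listkeys', {'namespace': 'bookmarks'})
      flookups = [e.callcommand('lookup', {'key': r}) for r in revs]
  # Leaving the block sends the batch; the futures can then be resolved.
  remotebookmarks = fbookmarks.result()
  nodes = [f.result() for f in flookups]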
yuja added a comment. Dec 20 2018, 9:12 AM
AFAIK, the protocol is never stateful (not sure how that'd work with http), so you can't rely on a notion of session-level state on the server to guarantee consistency.

True. I think the new v2 protocol will eventually get around the issue
since it will be more properly batched. @indygreg ?

I can see two plausible alternatives:
1. doing all the lookup queries and the bookmarks query in a single roundtrip. Is it guaranteed that a group of batchable queries will be sent in one go and won't be split up (if there are many of them for instance)?
2. tweak the new lookup query to be instead `lookup(name, bookmark=node)`, which would mean: `lookup(name)` as if there exist a bookmark called `name` pointing to `node`. So kind of the snapshot idea, where you give the server the relevant part of the snapshot.
  3. do lookup() again at the end of the transaction to detect a race, and abort

If the race doesn't occur frequently, I think it's okay to discard the pulled
data.

valentin.gatienbaron edited the summary of this revision. Dec 20 2018, 10:49 PM
valentin.gatienbaron updated this revision to Diff 12938.

I'd prefer not to have unpredictable aborts where, when they happen, the solution is "try again". A year or two ago, the "remote heads changed during push" errors were like that, and they were annoying.

I tried the batching approach; it seems to work fine and doesn't require protocol changes. Based on reading the code, batches of RPCs never get split up. Perhaps one concern is whether this limits the number of -r arguments that can be passed when using http.
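
For a sense of why the batch isn't split and where a size limit could bite, here is a standalone illustration of the idea behind the v1 'batch' command (not the real hg encoder, which also escapes special characters): every call is packed into one 'cmds' argument, so over HTTP more -r values grow that single argument rather than adding round trips.

  def encode_batch(calls):
      # calls: list of (command_name, {arg_name: arg_value}) pairs
      parts = []
      for name, args in calls:
          encoded = ','.join('%s=%s' % (k, v) for k, v in sorted(args.items()))
          parts.append('%s %s' % (name, encoded))
      return ';'.join(parts)

  print(encode_batch([
      ('listkeys', {'namespace': 'bookmarks'}),
      ('lookup', {'key': 'stable'}),
      ('lookup', {'key': '@'}),
  ]))
  # listkeys namespace=bookmarks;lookup key=stable;lookup key=@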

yuja added a comment. Dec 23 2018, 10:08 PM

Queued, thanks.

I have a concern about compatibility with ancient Mercurial. Can you check it
and send a follow-up as needed?

I had to change the handling of the case where the server doesn't
support the lookup query, because if it fails, it would otherwise make
fremotebookmark.result() block forever. This is due to
wireprotov1peer.peerexecutor.sendcommands's behavior (it fills a
single future if any query fails synchronously and leaves all other
futures unchanged), but I don't know if the fix is to cancel all other
futures, or to keep going with the other queries.

@indygreg

+ if opts['bookmark'] or revs:
+     # The list of bookmark used here is the same used to actually update
+     # the bookmark names, to avoid the race from issue 4689 and we do
+     # all lookup and bookmark queries in one go so they see the same
+     # version of the server state (issue 4700).
+     nodes = []
+     fnodes = []
+     revs = revs or []
+     if revs and not other.capable('lookup'):
+         err = _("other repository doesn't support revision lookup, "
+                 "so a rev cannot be specified.")
+         raise error.Abort(err)
+     with other.commandexecutor() as e:
+         fremotebookmarks = e.callcommand('listkeys', {
+             'namespace': 'bookmarks'
+         })
+         for r in revs:
+             fnodes.append(e.callcommand('lookup', {'key': r}))

IIRC, listkeys is a newer command than lookup. If the peer doesn't support
listkeys, I suspect this batch query would fail. In that case, maybe listkeys
has to be skipped if the peer doesn't support it and if --bookmark is not
specified.

This revision was automatically updated to reflect the committed changes.

Thanks!

IIRC, listkeys is a newer command than lookup. If the peer doesn't support listkeys, I suspect this batch query would fail. In that case, maybe listkeys has to be skipped if the peer doesn't support it and if --bookmark is not specified.

listkeys shouldn't need a compatibility check in the caller, because it's defined like this in wireprotov1peer.py (https://www.mercurial-scm.org/repo/hg-committed/file/tip/mercurial/wireprotov1peer.py#l381):

 @batchable
 def listkeys(self, namespace):
     if not self.capable('pushkey'):
         yield {}, None
...
yuja added a comment. Dec 24 2018, 3:54 AM
> IIRC, listkeys is a newer command than lookup. If the peer doesn't support listkeys, I suspect this batch query would fail. In that case, maybe listkeys has to be skipped if the peer doesn't support it and if --bookmark is not specified.
listkeys shouldn't need compatibility check in the caller, because it's defined like this in wireprotov1peer.py (https://www.mercurial-scm.org/repo/hg-committed/file/tip/mercurial/wireprotov1peer.py#l381):
   @batchable
   def listkeys(self, namespace):
       if not self.capable('pushkey'):
           yield {}, None

Yeah, but the batch executor appears not to handle such cases.

https://www.mercurial-scm.org/repo/hg-committed/file/tip/mercurial/wireprotov1peer.py#l241