( )⚙ D1351 changegroup: use any node, not min(), in treemanifest's generatemanifests

This is an archive of the discontinued Mercurial Phabricator instance.

changegroup: use any node, not min(), in treemanifest's generatemanifests
ClosedPublic

Authored by spectral on Nov 8 2017, 10:17 PM.

Details

Summary

This is fixing quadratic behavior, which is probably not noticeable in the
common case, but if a very large directory gets added here, it can get pretty
bad. This was noticed because we had some pushes that spent >25s in changegroup
generation calling min() here, according to profiling.

The original reasoning for min() being used in 829d369fc5a8 was that, at that
point in the series, we were adding almost everything to tmfnodes during the
first iteration through the loop , so we needed to avoid sending child
directories before parents. Later changes made it so that the child directories
were added only when we visited the parent directory (not all of them on the
first iteration), so this is no longer necessary - there won't be any child
directories in tmfnodes before the parents have been sent.

This does mean that the manifests are now exchanged unordered, whereas
previously we would essentially do [a, b, b/c, b/c/d, e], we now can send a, b,
and e in any order; b/c must still follow b, and b/c/d must still follow b/c.

Diff Detail

Repository
rHG Mercurial
Lint
Lint Skipped
Unit
Unit Tests Skipped

Event Timeline

spectral created this revision.Nov 8 2017, 10:17 PM

As the author of that line of code, this looks good to me. I'll let someone else queue it since I was involved in the internal discussion and I don't want to feel like I'm queuing my own code (even though I wasn't involved in the actual coding).

Should I be concerned about the lack of test fallout? This new behavior is non-deterministic. Do we not have testing for this code or is existing testing not low-level enough to uncover behavior changes resulting from this change?

mercurial/changegroup.py
737

Can we use `dict.popitem()` instead? That will pop a random key-value pair. I just don't know if the key needs to remain in the dict until later in the function...

Should I be concerned about the lack of test fallout? This new behavior is non-deterministic. Do we not have testing for this code or is existing testing not low-level enough to uncover behavior changes resulting from this change?

I'm not worried about it. It only makes a difference for treemanifests, and the change is just about the order in changegroup and (therefore) the order in which we write revlogs. We don't have much testing of treemanifests and we rarely check the exact format in a changegroup. We do print out a debug message on line 482 that could be used to see a difference before and after this patch, but I just checked that test-treemanifest.t doesn't pass --verbose. Still, I wouldn't mind if some Facebooker tried to run their treemanifest tests (in hgexperimental) against a version with this patch applied.

mercurial/changegroup.py
737

Good idea. I'm pretty sure that should be safe (but perhaps tests will prove me wrong).

spectral added inline comments.Nov 10 2017, 12:47 AM
mercurial/changegroup.py
737

I think that it needs to remain, makelookupmflinknode(dir) relies on it (L716). I haven't attempted to popitem and pass that to makelookupmflinknode instead, let me try that out now...

martinvonz added inline comments.Nov 10 2017, 12:51 AM
mercurial/changegroup.py
737

Oh, yuck. Can perhaps pass "nodes" into makelookupmflinknode() along with "dir"? It will still update it on line 721, but at least one of those unclear sideeffects go away.

spectral updated this revision to Diff 3388.Nov 10 2017, 1:32 AM
spectral marked 4 inline comments as done.Nov 10 2017, 1:41 AM

run-tests.py found no issues with this version, my manual testing (using PYTHONHASHSEED to adjust the ordering) also encountered no issues

mercurial/changegroup.py
737

profiling seems to indicate this is a little faster as well as being a bit cleaner, so thanks for making me check :)

indygreg accepted this revision.Nov 10 2017, 1:46 AM
This revision is now accepted and ready to land.Nov 10 2017, 1:46 AM
This revision was automatically updated to reflect the committed changes.