
branchmap: avoid ancestor computations in absence of non-continuous branches

Authored by joerg.sonnenberger on Dec 18 2020, 8:45 AM.



The branchhead computation is one of the heavier operations for
bigger repositories as it has to scan all changesets and potentially
involves the expensive computation of the ancestor sets. Redo the
computation to handle the common cases directly and use tighter
conditions for when the ancestor scan is necessary. Most importantly,
avoid it completely if the non-continuous branches are processed in one
update as seen in the initial computation after a clone.
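A minimal sketch of the fast path the message describes (names and structure are mine, not Mercurial's actual branchmap code): as long as every new revision attaches to a revision already seen in this update, the head set can be maintained in place, and only a revision attaching below the known revisions forces the expensive ancestor scan.

```python
def update_branch_heads(heads, newrevs, parentsof):
    """Toy model: update the head set of one branch with new revisions.

    heads: current head revisions of the branch.
    newrevs: new revisions on the branch, in topological order.
    parentsof: rev -> list of parent revs on the same branch.

    Returns the updated head set, or None when a new revision attaches
    below the known heads, i.e. the branch is non-continuous and a full
    ancestor computation would be required.
    """
    heads = set(heads)
    seen = set(heads)  # everything we know about in this update
    for rev in newrevs:
        for p in parentsof(rev):
            if p not in seen:
                return None  # discontinuity: fall back to ancestor scan
            heads.discard(p)  # p now has a child, so it is not a head
        heads.add(rev)
        seen.add(rev)
    return heads
```

In the initial computation after a clone, all revisions of a branch are processed in one batch, so every parent is in the seen set and the ancestor scan is skipped entirely, matching the NetBSD speedup described above.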

For the Mercurial repository, it gives a small 2-3% performance boost.
For the NetBSD test repository, it cuts the time in half.

Diff Detail

rHG Mercurial
Automatic diff as part of commit; lint not applicable.
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

joerg.sonnenberger retitled this revision from branchmap: avoid ancestor computations for absent non-continous branches to branchmap: avoid ancestor computations in absence of non-continous branches.Dec 18 2020, 8:52 AM
joerg.sonnenberger updated this revision to Diff 24393.
marmoute requested changes to this revision.Dec 18 2020, 9:01 AM
marmoute added a subscriber: marmoute.

I did not have time to actually look at the code yet, but from an IRC discussion with Joerg I suspect this code breaks on the following graph:

| \
|  B1
|  |
A2 A3
| /

We need to make sure such cases are properly tested (which may already be the case).

This revision now requires changes to proceed.Dec 18 2020, 9:01 AM

The logic looks sound.

I think we can push this optimization further around topological heads.

  • Topological heads do not need to be put in the uncertain set (alternatively, they can be filtered out before we try to resolve the uncertain set).
  • Topological heads are "certain" to be heads, so any topo-head below the first uncertain head can be "skipped", as it won't contribute to the filtering. That can significantly raise the "floor" we use for the ancestors computation, speeding it up.
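A hedged sketch of the first suggestion (helper name and data layout are mine, purely illustrative): a revision without any children is a topological head and therefore certainly a branch head, so it can be pulled out of the uncertain set before any ancestor computation; whatever remains determines a higher floor for the walk.

```python
def prune_topo_heads(uncertain, children):
    """Split the uncertain set into topological heads (no children at
    all, hence certainly branch heads) and the rest.

    uncertain: set of candidate head revisions.
    children: rev -> list of child revs (illustrative representation).

    Returns (certain, remaining, floor), where floor is the lowest
    revision the ancestor walk still has to reach, or None if nothing
    remains uncertain.
    """
    certain = {r for r in uncertain if not children.get(r)}
    remaining = uncertain - certain
    # A smaller uncertain set can raise the floor and shorten the walk.
    floor = min(remaining) if remaining else None
    return certain, remaining, floor
```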

I am fine with such an optimisation happening as a follow-up.


Note that in theory, turning cl.ancestors into a set can be quite expensive. The return value of ancestors is expected to be a lazy-ancestors object that is good at membership tests. So I would expect something like this to be faster:

ancestors = cl.ancestors(newheadrevs, floorrev)  # floorrev is not strictly necessary
certain = {r for r in uncertain if r not in ancestors}

(and if it is not, we should fix lazy ancestors)
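As a toy stand-in for what such a lazy-ancestors object might look like (Mercurial's real one lives in mercurial/ancestor.py and is more involved): it walks parents incrementally and stops early on a hit, so membership tests do not force the full ancestor set to be materialized the way set(cl.ancestors(...)) does.

```python
class ToyLazyAncestors:
    """Incremental ancestor walk supporting `in` tests.

    parentsof: rev -> list of parent revs.
    revs: starting revisions (excluded from the ancestor set).
    floor: revisions below this are never visited.
    """
    def __init__(self, parentsof, revs, floor=0):
        self._parentsof = parentsof
        self._floor = floor
        self._seen = set()
        # Start from the parents, so `revs` themselves are excluded.
        self._pending = [p for r in revs for p in parentsof(r)]

    def __contains__(self, rev):
        if rev in self._seen:
            return True  # found during an earlier partial walk
        while self._pending:
            r = self._pending.pop()
            if r < self._floor or r in self._seen:
                continue
            self._seen.add(r)
            self._pending.extend(self._parentsof(r))
            if r == rev:
                return True  # stop the walk early on a hit
        return False
```

With an object like this, the snippet above only walks as far down the graph as the membership queries actually require.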

marmoute accepted this revision.Wed, Jan 13, 10:35 AM

The meaning of the uncertain set has changed somewhat. It is the set of changesets that might have a redundant branchhead as ancestor. As such, the topological heads don't help to reduce the uncertain set.


At least the Python version doesn't seem to cache the computed ancestors, so it would recompute the set when len(uncertain) > 1?

note: I am seeing weird numbers when checking this against mozilla-unified.

old time: ! wall 1.913970 comb 1.910000 user 1.870000 sys 0.040000 (median of 5)
new time: ! wall 2.269365 comb 2.260000 user 2.240000 sys 0.020000 (median of 5)

I will rerun the tests on a more stable machine and come back with more numbers.

This revision was not accepted when it landed; it landed in state Needs Review.
This revision was automatically updated to reflect the committed changes.