Page MenuHomePhabricator

rhg: stop manifest traversal when no more files are needed
ClosedPublic

Authored by aalekseyev on Tue, Oct 5, 10:16 AM.

Details

Summary

Stopping the traversal early can skip a significant part
of the manifest traversal, to avoid some of its cost.

The worst-case benchmarks are favorable, as well.
Running [hg cat] on the last file in the manifest of
a large repo, I'm seeing a ~4ms improvement (150ms -> 146ms),
so this time is now almost indistinguishable from the
baseline ("brute force") implementation.

Running [hg cat] on ~220 files together with the last file
of the repo is further improved by ~5ms or so.

I suspect the raw performance improvements are caused by splitting
the manifest search and the file data access into separate phases,
instead of interleaving them.

Diff Detail

Repository
rHG Mercurial
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

aalekseyev created this revision.Tue, Oct 5, 10:16 AM
aalekseyev edited the summary of this revision. (Show Details)Tue, Oct 5, 10:44 AM
aalekseyev updated this revision to Diff 30679.Tue, Oct 5, 10:50 AM
martinvonz requested changes to this revision.Fri, Oct 8, 8:17 PM
martinvonz added a subscriber: martinvonz.

I think this can be done more simply. I've sent D11622 to show what I mean. Does that make sense? How does it perform in your test repo?

This revision now requires changes to proceed.Fri, Oct 8, 8:17 PM
aalekseyev updated this revision to Diff 30700.Mon, Oct 11, 6:51 AM

I think this can be done more simply. I've sent D11622 to show what I mean. Does that make sense? How does it perform in your test repo?

Yeah, using a hashtable was the first thing I tried, but hashtable lookup is slow compared to plain comparison, and since it's done in a tight loop it really does add up.
I've run some benchmarks in different scenarois, ordered roughly in the order of complexity.

orig: 1b0f8aafedea
hashtbl: 8143112baaa4 (your patch, minus the early termination)
merge-itertools: 030cbc80308a
hashtbl-stop-early: e0e7ca2b9334 (your patch)
merge-manual d83c7db2f91f (my patch)

one:
                orig    139 (+ 0.0%) (err 0.7%)
             hashtbl    148 (+ 6.6%) (err 0.4%)
     merge-itertools    143 (+ 3.2%) (err 0.5%)
  hashtbl-stop-early    144 (+ 3.8%) (err 0.6%)
        merge-manual    135 (- 3.0%) (err 1.7%)

two:
                orig    141 (+ 0.0%) (err 1.8%)
             hashtbl    150 (+ 6.0%) (err 0.5%)
     merge-itertools    145 (+ 2.6%) (err 0.4%)
  hashtbl-stop-early    146 (+ 3.4%) (err 1.7%)
        merge-manual    136 (- 4.0%) (err 1.0%)

last:
                orig    140 (+ 0.0%) (err 0.7%)
             hashtbl    149 (+ 6.7%) (err 0.8%)
     merge-itertools    145 (+ 3.8%) (err 1.4%)
  hashtbl-stop-early    156 (+11.8%) (err 0.6%)
        merge-manual    142 (+ 1.4%) (err 0.6%)

many:
                orig    213 (+ 0.0%) (err 0.6%)
             hashtbl    183 (-14.5%) (err 0.4%)
     merge-itertools    175 (-17.9%) (err 0.4%)
  hashtbl-stop-early    178 (-16.6%) (err 0.6%)
        merge-manual    166 (-22.2%) (err 0.9%)

many+last:
                orig    215 (+ 0.0%) (err 0.8%)
             hashtbl    184 (-14.4%) (err 0.9%)
     merge-itertools    177 (-17.7%) (err 0.8%)
  hashtbl-stop-early    191 (-10.8%) (err 1.0%)
        merge-manual    174 (-18.9%) (err 1.8%)

large:
                orig    149 (+ 0.0%) (err 0.7%)
             hashtbl    158 (+ 6.5%) (err 0.9%)
     merge-itertools    156 (+ 4.7%) (err 0.7%)
  hashtbl-stop-early    157 (+ 5.4%) (err 0.6%)
        merge-manual    148 (- 0.8%) (err 1.2%)

These show that hashtbl loses to merge-itertools in pretty much all cases, with or without the early termination.
I can hand-craft a case where early termination gives it some extra advantage, but the existing one and many benchmarks are already terminating at ~5% of the traversal, so we're already in an optimistic case.

I think this can be done more simply. I've sent D11622 to show what I mean. Does that make sense? How does it perform in your test repo?

Yeah, using a hashtable was the first thing I tried, but hashtable lookup is slow compared to plain comparison, and since it's done in a tight loop it really does add up.
I've run some benchmarks in different scenarois, ordered roughly in the order of complexity.
orig: 1b0f8aafedea
hashtbl: 8143112baaa4 (your patch, minus the early termination)
merge-itertools: 030cbc80308a
hashtbl-stop-early: e0e7ca2b9334 (your patch)
merge-manual d83c7db2f91f (my patch)

one:
                orig    139 (+ 0.0%) (err 0.7%)
             hashtbl    148 (+ 6.6%) (err 0.4%)
     merge-itertools    143 (+ 3.2%) (err 0.5%)
  hashtbl-stop-early    144 (+ 3.8%) (err 0.6%)
        merge-manual    135 (- 3.0%) (err 1.7%)
two:
                orig    141 (+ 0.0%) (err 1.8%)
             hashtbl    150 (+ 6.0%) (err 0.5%)
     merge-itertools    145 (+ 2.6%) (err 0.4%)
  hashtbl-stop-early    146 (+ 3.4%) (err 1.7%)
        merge-manual    136 (- 4.0%) (err 1.0%)
last:
                orig    140 (+ 0.0%) (err 0.7%)
             hashtbl    149 (+ 6.7%) (err 0.8%)
     merge-itertools    145 (+ 3.8%) (err 1.4%)
  hashtbl-stop-early    156 (+11.8%) (err 0.6%)
        merge-manual    142 (+ 1.4%) (err 0.6%)
many:
                orig    213 (+ 0.0%) (err 0.6%)
             hashtbl    183 (-14.5%) (err 0.4%)
     merge-itertools    175 (-17.9%) (err 0.4%)
  hashtbl-stop-early    178 (-16.6%) (err 0.6%)
        merge-manual    166 (-22.2%) (err 0.9%)
many+last:
                orig    215 (+ 0.0%) (err 0.8%)
             hashtbl    184 (-14.4%) (err 0.9%)
     merge-itertools    177 (-17.7%) (err 0.8%)
  hashtbl-stop-early    191 (-10.8%) (err 1.0%)
        merge-manual    174 (-18.9%) (err 1.8%)
large:
                orig    149 (+ 0.0%) (err 0.7%)
             hashtbl    158 (+ 6.5%) (err 0.9%)
     merge-itertools    156 (+ 4.7%) (err 0.7%)
  hashtbl-stop-early    157 (+ 5.4%) (err 0.6%)
        merge-manual    148 (- 0.8%) (err 1.2%)

These show that hashtbl loses to merge-itertools in pretty much all cases, with or without the early termination.
I can hand-craft a case where early termination gives it some extra advantage, but the existing one and many benchmarks are already terminating at ~5% of the traversal, so we're already in an optimistic case.

Wow, thanks for taking the time to check so carefully! I had only tried it in the hg repo and I couldn't measure any difference at all there. I tried just now in the mozilla-unified repo. The differences are not measurable there either (see below). I built with HGWITHRUSTEXT=cpython make local and benchmarked with hyperfine. Do you think your repo is different or did I not test correctly?

browser/moz.build:
yours: 204.8 ms ±   5.5 ms
mine:  203.2 ms ±   3.5 ms


xpfe/appshell/moz.build:
yours: 204.6 ms ±   5.3 ms
mine:  203.1 ms ±   2.6 ms

set:**/moz.build
yours: 649.9 ms ±   8.0 ms
mine:  647.5 ms ±   7.4 ms

I tried just now in the mozilla-unified repo. The differences are not measurable there either (see below).

Interesting. I'll try to reproduce that.

I built with HGWITHRUSTEXT=cpython make local

I was building with [make build-rhg], but I'm guessing that's the same.

Do you think your repo is different?

Mine has ~260k files. I don't know how many mozilla repo has, but I'll re-run my benchmarks on that tomorrow to see if I'm getting the same results.

set:**/moz.build

I think rhg does not support the set language, so it must be falling back to python in this case?

set:**/moz.build

I think rhg does not support the set language, so it must be falling back to python in this case?

That was just a lazy description by me. What I actually did was files=$(hg files 'set:**/moz.build'); hyperfine "rhg cat $files".

...
Mine has ~260k files. I don't know how many mozilla repo has, but I'll re-run my benchmarks on that tomorrow to see if I'm getting the same results.

Actually the clone finished before I left, so posting the results now:

all-moz.build:
                orig    152 (+ 0.0%) (err 10.2%)
             hashtbl    156 (+ 2.4%) (err 2.8%)
     merge-itertools    148 (- 2.6%) (err 1.0%)
  hashtbl-stop-early    163 (+ 7.2%) (err 2.8%)
        merge-manual    143 (- 5.8%) (err 1.1%)

browser:
                orig    140 (+ 0.0%) (err 1.0%)
             hashtbl    154 (+10.1%) (err 1.1%)
     merge-itertools    145 (+ 4.0%) (err 0.9%)
  hashtbl-stop-early    112 (-20.0%) (err 1.6%)
        merge-manual    111 (-20.5%) (err 1.1%)

xpfe:
                orig    139 (+ 0.0%) (err 1.9%)
             hashtbl    153 (+ 9.6%) (err 0.8%)
     merge-itertools    146 (+ 4.5%) (err 1.0%)
  hashtbl-stop-early    161 (+15.4%) (err 2.1%)
        merge-manual    142 (+ 1.8%) (err 0.6%)

...
Mine has ~260k files. I don't know how many mozilla repo has, but I'll re-run my benchmarks on that tomorrow to see if I'm getting the same results.

Actually the clone finished before I left, so posting the results now:

all-moz.build:
                orig    152 (+ 0.0%) (err 10.2%)
             hashtbl    156 (+ 2.4%) (err 2.8%)
     merge-itertools    148 (- 2.6%) (err 1.0%)
  hashtbl-stop-early    163 (+ 7.2%) (err 2.8%)
        merge-manual    143 (- 5.8%) (err 1.1%)
browser:
                orig    140 (+ 0.0%) (err 1.0%)
             hashtbl    154 (+10.1%) (err 1.1%)
     merge-itertools    145 (+ 4.0%) (err 0.9%)
  hashtbl-stop-early    112 (-20.0%) (err 1.6%)
        merge-manual    111 (-20.5%) (err 1.1%)
xpfe:
                orig    139 (+ 0.0%) (err 1.9%)
             hashtbl    153 (+ 9.6%) (err 0.8%)
     merge-itertools    146 (+ 4.5%) (err 1.0%)
  hashtbl-stop-early    161 (+15.4%) (err 2.1%)
        merge-manual    142 (+ 1.8%) (err 0.6%)

I wonder why we see so different results. Are those the times reported by hyperfine in your case?

Possibly relevant info about my setup:
rustc version: rustc 1.57.0-nightly (8f8092cc3 2021-09-28)
CPU: Intel(R) Xeon(R) Gold 6154 CPU @ 3.00GHz
OS: Debian

I guess we can also ask @Alphare and @SimonSapin which version they prefer. (I'm personally unlikely to have to maintain this code.)

I wonder why we see so different results. Are those the times reported by hyperfine in your case?

No, I collected the times with a custom OCaml script, but I've been seeing similar results when measuring by hand with time.
I haven't tried using hyperfine.

rustc version: rustc 1.57.0-nightly (8f8092cc3 2021-09-28)
CPU: Intel(R) Xeon(R) Gold 6154 CPU @ 3.00GHz
OS: Debian

I've got:

rustc 1.55.0 (Red Hat 1.55.0-1.el7)
CPU: AMD EPYC 7702P
OS: CentOS 7

I found one more difference: my all-moz.build benchmark is only concatenating 42 files, instead of 1897, because my glob was not recursive. Whoops.
Correcting for that, I'm still seeing a 15-20ms advantage of merge-based algorithm over a hashtable-based.
The absolute time is 220ms vs 240ms, by the way, which is surprisingly far off from 600+ms you're seeing.

I'm getting ~630ms on **/moz.build when I run the original version (1b0f8aafedea), by the way. Could it be that your ~650ms measurement corresponds to that?

I'm getting ~630ms on **/moz.build when I run the original version (1b0f8aafedea), by the way. Could it be that your ~650ms measurement corresponds to that?

Yeah, it's very possible I messed up my measurements somehow. I'll check again tomorrow, then I'll probably queue your version.

martinvonz accepted this revision.Thu, Oct 14, 11:08 AM

I'm getting ~630ms on **/moz.build when I run the original version (1b0f8aafedea), by the way. Could it be that your ~650ms measurement corresponds to that?

Yeah, it's very possible I messed up my measurements somehow. I'll check again tomorrow, then I'll probably queue your version.

Actually, I'll skip checking that. The important things is that you and others working on this are happy.

This revision is now accepted and ready to land.Thu, Oct 14, 11:08 AM