This is an archive of the discontinued Mercurial Phabricator instance.

revlog: don't cache parsed tuples in the C module
ClosedPublic

Authored by joerg.sonnenberger on Oct 6 2020, 7:35 AM.

Details

Summary

A cached entry creates ~8 Python objects per cached changeset, which
comes to around 200 bytes per cached changeset on AMD64. Especially for
operations that touch a lot of changesets, that can easily add up to
more than 100MB of memory. Simple tests on large repositories show a
runtime penalty of less than 2% for ripping out the cache, even for
cache-heavy operations like "hg log" over all revisions.

Diff Detail

Repository
rHG Mercurial
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

pulkit added a subscriber: pulkit. Oct 9 2020, 5:07 AM

Needs some explanation in the commit description about why not to cache.

Basically, because the cache in the current form is very expensive in terms of memory use. For a large repository, it can easily consume 100MB+ for something like 2% runtime gain, which doesn't seem a good trade-off.

pulkit added a comment. Oct 9 2020, 9:09 AM

Basically, because the cache in the current form is very expensive in terms of memory use. For a large repository, it can easily consume 100MB+ for something like 2% runtime gain, which doesn't seem a good trade-off.

Okay. I mean that this explanation should be in the commit message.

joerg.sonnenberger retitled this revision from [WIP] revlog: don't cache parsed tuples in the C module to revlog: don't cache parsed tuples in the C module. Oct 14 2020, 4:28 PM
joerg.sonnenberger edited the summary of this revision.
martinvonz accepted this revision. Oct 27 2020, 7:46 PM
martinvonz added a subscriber: martinvonz.

Seems fine to me. We can always roll it back if we notice unacceptable slowdown. Is it safe to apply this patch without the two before it?

This revision is now accepted and ready to land. Oct 27 2020, 7:46 PM

No, the two parents are necessary because they store invalid data in the index, i.e. they violate the interface contract, which just hasn't been enforced so far.

Sorry, let me retract that. Wrong review, this one can be pulled out in isolation.

I was skeptical when I saw this patch because complexity in C code in the Mercurial codebase tends to exist for good reasons since we tend to treat C code as toxic and a CVE waiting to happen. Anyway, I traced this cache to https://www.mercurial-scm.org/repo/hg-committed/log/a6fde9d789d9/mercurial/cext/revlog.c?patch=&linerange=349:358 / https://www.mercurial-scm.org/repo/hg-committed/rev/2cdd7e63211b.
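
In Python terms, the pattern that cache implements is roughly the following. This is only an illustrative sketch with invented names (parse_entry, CachingIndex, UncachedIndex), not the actual cext code:

def parse_entry(raw):
    # Stand-in for the tuple construction (Py_BuildValue) in cext/revlog.c
    # that turns one on-disk index record into a Python tuple.
    return tuple(raw)

class CachingIndex:
    # Analog of the old behavior: build each entry tuple at most once
    # and keep it alive, which is where the per-changeset memory goes.
    def __init__(self, raw_entries):
        self._raw = raw_entries
        self._cache = [None] * len(raw_entries)

    def __getitem__(self, rev):
        entry = self._cache[rev]
        if entry is None:
            entry = self._cache[rev] = parse_entry(self._raw[rev])
        return entry

class UncachedIndex:
    # Analog of the behavior after this patch: re-parse on every access,
    # paying a small construction cost instead of the memory.
    def __init__(self, raw_entries):
        self._raw = raw_entries

    def __getitem__(self, rev):
        return parse_entry(self._raw[rev])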

The cache is essentially preventing a redundant Py_BuildValue(). Rapid construction of PyObjects in the revlog index or obsolescence marker C code has been identified as a bottleneck. So I decided to benchmark this against the Firefox repo's changelog; the first line of each pair is before, the second is after. Using CPython 3.8.6.

$ ./hg perfrevlogindex -R ~/src/firefox -c
! revlog constructor
! wall 0.011219 comb 0.010000 user 0.000000 sys 0.010000 (best of 256)
! wall 0.014012 comb 0.020000 user 0.000000 sys 0.020000 (best of 210)
! read
! wall 0.011319 comb 0.010000 user 0.000000 sys 0.010000 (best of 253)
! wall 0.013859 comb 0.010000 user 0.000000 sys 0.010000 (best of 212)
! create index object
! wall 0.000001 comb 0.000000 user 0.000000 sys 0.000000 (best of 716721)
! wall 0.000001 comb 0.000000 user 0.000000 sys 0.000000 (best of 713905)
! retrieve index entry for rev 0
! wall 0.000282 comb 0.000000 user 0.000000 sys 0.000000 (best of 9580)
! wall 0.000001 comb 0.000000 user 0.000000 sys 0.000000 (best of 678673)
! look up missing node
! wall 0.003180 comb 0.000000 user 0.000000 sys 0.000000 (best of 712)
! wall 0.002469 comb 0.000000 user 0.000000 sys 0.000000 (best of 768)
! look up node at rev 0
! wall 0.004028 comb 0.000000 user 0.000000 sys 0.000000 (best of 713)
! wall 0.003714 comb 0.000000 user 0.000000 sys 0.000000 (best of 780)
! look up node at 1/4 len
! wall 0.003274 comb 0.000000 user 0.000000 sys 0.000000 (best of 874)
! wall 0.003121 comb 0.000000 user 0.000000 sys 0.000000 (best of 918)
! look up node at 1/2 len
! wall 0.002540 comb 0.010000 user 0.010000 sys 0.000000 (best of 1125)
! wall 0.002539 comb 0.000000 user 0.000000 sys 0.000000 (best of 1121)
! look up node at 3/4 len
! wall 0.001798 comb 0.000000 user 0.000000 sys 0.000000 (best of 1547)
! wall 0.001916 comb 0.000000 user 0.000000 sys 0.000000 (best of 1467)
! look up node at tip
! wall 0.000873 comb 0.000000 user 0.000000 sys 0.000000 (best of 2898)
! wall 0.001107 comb 0.000000 user 0.000000 sys 0.000000 (best of 2362)
! look up all nodes (forward)
! wall 0.154415 comb 0.150000 user 0.150000 sys 0.000000 (best of 63)
! wall 0.157524 comb 0.160000 user 0.160000 sys 0.000000 (best of 62)
! look up all nodes 2x (forward)
! wall 0.267256 comb 0.260000 user 0.260000 sys 0.000000 (best of 37)
! wall 0.271308 comb 0.270000 user 0.270000 sys 0.000000 (best of 37)
! look up all nodes (reverse)
! wall 0.080962 comb 0.080000 user 0.080000 sys 0.000000 (best of 100)
! wall 0.081913 comb 0.080000 user 0.080000 sys 0.000000 (best of 100)
! look up all nodes 2x (reverse)
! wall 0.191335 comb 0.190000 user 0.190000 sys 0.000000 (best of 51)
! wall 0.195113 comb 0.190000 user 0.190000 sys 0.000000 (best of 51)
! retrieve all index entries (forward)
! wall 0.173264 comb 0.170000 user 0.120000 sys 0.050000 (best of 56)
! wall 0.110747 comb 0.110000 user 0.110000 sys 0.000000 (best of 89)
! retrieve all index entries 2x (forward)
! wall 0.198171 comb 0.200000 user 0.160000 sys 0.040000 (best of 50)
! wall 0.221631 comb 0.220000 user 0.220000 sys 0.000000 (best of 45)
! retrieve all index entries (reverse)
! wall 0.165797 comb 0.170000 user 0.140000 sys 0.030000 (best of 59)
! wall 0.102428 comb 0.100000 user 0.100000 sys 0.000000 (best of 97)
! retrieve all index entries 2x (reverse)
! wall 0.179876 comb 0.170000 user 0.130000 sys 0.040000 (best of 55)
! wall 0.206158 comb 0.200000 user 0.200000 sys 0.000000 (best of 49)

There's some odd behavior in there, and I suspect CPU power throttling is at play. But I don't see any obvious regressions that would raise objections from me.