[POC] revset: on-disk cache for children queries
Needs Review · Public

Authored by joerg.sonnenberger on Apr 16 2019, 6:43 PM.

Details

Reviewers
marmoute
Group Reviewers
hg-reviewers
Summary

This is a proof of concept for further discussion.

Diff Detail

Repository
rHG Mercurial
Branch
default
Lint
No Linters Available
Unit
No Unit Test Coverage

Event Timeline

joerg.sonnenberger retitled this revision from revset: on-disk cache for children queries to [POC] revset: on-disk cache for children queries. Apr 16 2019, 6:49 PM

Do you have performance numbers to share? Substantial wins would definitely pique my interest :)

For the NetBSD repository, a trivial test with the new cache:

time hg log -r '1000~-400' -T {node} > /dev/null
real	0m1.898s
time hg log -r '1000~-400' -T {node} > /dev/null
real	0m0.170s
time hg log -r '1000~-1' -T {node} > /dev/null
real	0m0.166s
time hg log -r '440000~-1' -T {node} > /dev/null
real	0m0.170s

The first run includes the time to initially warm up the cache.

Without the cache:

time hg log -r '1000~0' -T {node} > /dev/null
real	0m0.196s
time hg log -r '1000~-1' -T {node} > /dev/null
real	0m0.825s
time hg log -r '1000~-2' -T {node} > /dev/null
real	0m1.288s
time hg log -r '1000~-400' -T {node} > /dev/null
real	3m23.201s
time hg log -r '440000~-1' -T {node} > /dev/null
real	0m0.182s

In other words, the cost of building the cache is amortised after one or two children queries on early revisions. The cache still helps nearer to tip, though by less (0.170s versus 0.182s for '440000~-1').
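To illustrate why the uncached numbers grow so quickly, here is a minimal sketch, assuming that revisions only record their parents, so that finding children means scanning every later revision. The names `children_by_scan`, `build_children_cache` and `NULLREV` are hypothetical and are not taken from the patch; the toy history stands in for a real changelog.

```python
from collections import defaultdict

NULLREV = -1  # hypothetical sentinel meaning "no parent"


def children_by_scan(parents, rev):
    """Uncached lookup: inspect every revision after `rev`."""
    return [r for r in range(rev + 1, len(parents)) if rev in parents[r]]


def build_children_cache(parents):
    """One pass over all revisions; later lookups are a dict access."""
    cache = defaultdict(list)
    for r, ps in enumerate(parents):
        for p in ps:
            if p != NULLREV:
                cache[p].append(r)
    return cache


# Toy history: 0 <- 1 <- 2, 3 also branches off 1, and 4 merges 2 and 3.
parents = [(-1, -1), (0, -1), (1, -1), (1, -1), (2, 3)]
cache = build_children_cache(parents)
print(children_by_scan(parents, 1), cache[1])
```

With a scan, a query such as '1000~-400' would repeat the forward walk for each of the 400 steps, whereas the cache pays for a single pass up front. This is only one plausible reading of the timings above.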

Perhaps we'd want this to not be specifically for revsets? We could also write to it when we write to the changelog to keep it up to date.

If we have one, we'll want it to be generic (so also for the changelog). Do we have performance numbers for how long it takes to build?

marmoute requested changes to this revision. Apr 22 2020, 12:05 PM

This is a POC, so moving it out of review.

This revision now requires changes to proceed. Apr 22 2020, 12:05 PM
joerg.sonnenberger retitled this revision from [POC] revset: on-disk cache for children queries to revset: on-disk cache for children queries. Jul 12 2020, 9:52 PM
joerg.sonnenberger updated this revision to Diff 21871.
joerg.sonnenberger retitled this revision from revset: on-disk cache for children queries to [POC] revset: on-disk cache for children queries. Jul 12 2020, 10:07 PM
pulkit added a subscriber: pulkit. Jul 20 2020, 8:58 AM

@joerg.sonnenberger Is this still a POC? I see that you updated the code.

Yes, it's still a POC. I wanted to make sure that it works in the modern world, but I am still considering the idea in the context of larger changes for transactional caches.

The approach used by the nodemap is probably a good way forward for the transactional case: some kind of append-only index that gets recompacted on a regular basis, combined with an append-only data file. For example, children can be stored as a linked list (in the data file), with a pointer to the first entry in an index file.
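A rough sketch of that layout, under stated assumptions: the data "file" is an in-memory bytearray that is only ever appended to, each entry holds a child revision plus the offset of the next entry, and the index maps each revision to the head of its list. The class name `ChildrenCache`, the entry format and the header are all hypothetical; persisting and recompacting the index the way the nodemap does is left out.

```python
import struct

# Hypothetical entry layout: (child revision, offset of next entry; 0 = end).
ENTRY = struct.Struct(">iI")
# Hypothetical 8-byte header, so that offset 0 can mean "no entry".
HEADER = b"CHLD\x00\x00\x00\x00"


class ChildrenCache:
    def __init__(self):
        self.data = bytearray(HEADER)  # stands in for the append-only data file
        self.index = []                # stands in for the index: rev -> head offset

    def add_revision(self, rev, parents):
        """Record `rev` as a child of each of its parents."""
        assert rev == len(self.index), "revisions must be added in order"
        self.index.append(0)
        for p in set(parents):
            if p < 0:  # skip the null parent
                continue
            offset = len(self.data)
            # The new entry points at the previous head of the parent's list.
            self.data += ENTRY.pack(rev, self.index[p])
            self.index[p] = offset

    def children(self, rev):
        """Walk the linked list for `rev`, returning children in ascending order."""
        out = []
        offset = self.index[rev]
        while offset:
            child, offset = ENTRY.unpack_from(self.data, offset)
            out.append(child)
        out.reverse()  # entries are linked newest-first
        return out


cc = ChildrenCache()
for rev, ps in enumerate([(-1, -1), (0, -1), (1, -1), (1, -1), (2, 3)]):
    cc.add_revision(rev, ps)
print(cc.children(1))
```

Existing data entries are never rewritten in this sketch; only the per-revision head pointer in the index changes, which is the part an append-only, periodically recompacted index would have to cover.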