( )⚙ D1208 basepack: add a lru cache for the pack files

This is an archive of the discontinued Mercurial Phabricator instance.

basepack: add a lru cache for the pack files
ClosedPublic

Authored by singhsrb on Oct 23 2017, 3:15 PM.
Tags
None
Subscribers

Details

Reviewers
durham
Group Reviewers
Restricted Project
Commits
rFBHGXf5784ad7d9e9: basepack: add a lru cache for the pack files
Summary

To speed up pack lookups (especially when there are lots of packs), we
should maintain an lru ordering of the packs and perform searches in that
order, since it's likely the next entry we search for will be in the same pack
file as the last entry we searched for. This commit achieves the same.

Test Plan
  • Ran all the tests.
  • Created ~2k pack files in a large repo.
    • Time taken without the cache:
      • hg update b while at a: ~18 minutes.
      • hg update a while at b: ~23 seconds.
    • Time taken with the cache:
      • hg update b while at a: ~14 seconds.
      • hg update a while at b: ~9 seconds.

Diff Detail

Repository
rFBHGX Facebook Mercurial Extensions
Lint
Lint Skipped
Unit
Unit Tests Skipped

Event Timeline

singhsrb created this revision.Oct 23 2017, 3:15 PM
Herald added a reviewer: Restricted Project. · View Herald TranscriptOct 23 2017, 3:15 PM
singhsrb added a comment.EditedOct 23 2017, 3:18 PM

I have verified that the cache behaves as expected. I will add more tests for the caching anyways. I wanted to put this out so that there are some eyes on it before I go too far.

Please see D1220 for some idea of how I tested the improvements due to the cache.

singhsrb edited the test plan for this revision. (Show Details)Oct 23 2017, 3:20 PM
durham requested changes to this revision.Oct 23 2017, 10:44 PM
durham added a subscriber: durham.
durham added inline comments.
remotefilelog/basepack.py
105

Why not put the default value on the basepackstore instead of putting the same value on both derived types?

129

Why do we need the notion of a path in this class? I would kinda expect this class to just have an add(pack) and __iter__ functions? No need to implement the rest of the apis if we never use them. Then this could just be named self._packs and the member on basepackstore could go back to being self.packs?

130

If the reason for making _pathtopack a dictionary was because lrucachedict is a dictionary, you can just make lrucachedict a dictionary of pack->True

167

This sort could still save on the initial lookup right? Seems like a reasonable win for such a small amount of code.

This revision now requires changes to proceed.Oct 23 2017, 10:44 PM
singhsrb planned changes to this revision.Oct 23 2017, 11:41 PM

@durham I am assuming you are good with the overall approach. I will make the appropriate fixes.

remotefilelog/basepack.py
105

The value happens to be the same but in general, I wanted the child classes to decide on the default size. Or, at least have the option of overriding the default in the child classes.

129

I considered this and then was pretty much lazy because I had already written the code based on the dict. Wanted to get the idea out there more than get it out perfectly :). I will fix this!

130

Sure!

167

I remember I put this TODO here because the caching I added doesn't use the most recent files anyways (and I wanted to bring that to attention of the reviewers) and it seemed to not make too much of a difference. I have a better idea to actually load the cache with the most recent entries instead of a cold start.

singhsrb updated this revision to Diff 3077.Oct 24 2017, 2:04 AM
singhsrb added inline comments.Oct 24 2017, 2:06 AM
remotefilelog/basepack.py
243–244

Hmm, the _getavailablepackfiles returns the most recent packs but conversion to set here can possibly throw away that information. We can probably also refactor/fix this code.

durham requested changes to this revision.Oct 26 2017, 4:12 PM

Sorry for the late review. I think it looks good, but it just needs a test.

remotefilelog/basepack.py
47

You can still put this on the basepackstore class, so subclasses can override it if they want.

129

I'd probably avoid nesting the classes. Just make this a top level class. Nested classes introduce some possible confusion around scope and what the inner class has access to.

141

Generally I would check if self._lastpack is not None: instead of relying on the default handler. A pack implementation might implement something like len which python would then use to check if the pack was empty here, which could be expensive.

170

This pattern is pretty clever!

This revision now requires changes to proceed.Oct 26 2017, 4:12 PM
durham accepted this revision.Oct 26 2017, 4:13 PM

Ah I see the test is another diff. Probably worth fixing some of the comments, but accepting so you're not blocked after.

This revision is now accepted and ready to land.Oct 26 2017, 4:13 PM
singhsrb updated this revision to Diff 3157.Oct 30 2017, 2:09 PM
singhsrb marked 2 inline comments as done.Oct 30 2017, 2:13 PM

The test is still pending but I guess this is in a good shape now.

remotefilelog/basepack.py
141

Neat! I wasn't aware so much can happen under the hood. Will keep this in mind for future!

singhsrb updated this revision to Diff 3300.Nov 6 2017, 6:35 PM
singhsrb planned changes to this revision.Nov 7 2017, 12:13 PM
singhsrb added inline comments.
tests/remotefilelog-datapack.py
305

This loop is unnecessary. I might have a better way to test this.

singhsrb updated this revision to Diff 3322.Nov 7 2017, 2:10 PM
This revision is now accepted and ready to land.Nov 7 2017, 2:10 PM
singhsrb requested review of this revision.Nov 7 2017, 2:12 PM

The test still needs to be reviewed.

singhsrb updated this revision to Diff 3323.Nov 7 2017, 2:15 PM
singhsrb updated this revision to Diff 3324.
durham added inline comments.Nov 7 2017, 7:59 PM
tests/remotefilelog-datapack.py
296

It's a bit confusing to call this packs since it's actually a bunch of revisions. I was pretty confused about why 'pack.index' worked below. Maybe rename this 'deltachains' or something?

durham accepted this revision.Nov 7 2017, 8:09 PM
This revision is now accepted and ready to land.Nov 7 2017, 8:09 PM
singhsrb updated this revision to Diff 3329.Nov 7 2017, 8:24 PM
singhsrb marked an inline comment as done.Nov 7 2017, 8:27 PM
This revision was automatically updated to reflect the committed changes.