This is an archive of the discontinued Mercurial Phabricator instance.

transaction: split new files into a separate set
ClosedPublic

Authored by joerg.sonnenberger on Nov 7 2020, 4:32 PM.

Details

Summary

Journal entries with size 0 are common as they represent new revlog
files. Move them from the dictionary into a set, as the latter is more
compact. This reduces peak RSS by 70MB for the NetBSD test repository with
around 450k files under .hg/store.
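
A minimal sketch of the idea in the summary, not the committed diff: names with offset 0 go into a set, and only non-zero offsets stay in the dictionary. The class name `JournalIndex` and the attribute names `_newfiles` and `_offsetmap` are assumptions made for illustration.

```python
class JournalIndex:
    """Toy stand-in for a transaction's in-memory journal records."""

    def __init__(self):
        # Hypothetical attribute names; the real transaction class may differ.
        self._newfiles = set()   # files whose journal offset is 0 (new files)
        self._offsetmap = {}     # file name -> non-zero journal offset

    def add(self, name, offset):
        # Keep only the first record for a file.
        if name in self._newfiles or name in self._offsetmap:
            return
        if offset:
            self._offsetmap[name] = offset
        else:
            # A set stores just the name, with no value slot per entry.
            self._newfiles.add(name)

    def findoffset(self, name):
        # New files are implicitly at offset 0.
        if name in self._newfiles:
            return 0
        return self._offsetmap.get(name)
```

With this layout a lookup checks the set first and falls back to the dictionary, so callers still see offset 0 for new files and `None` for files that were never journaled.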

Diff Detail

Repository
rHG Mercurial
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Why do we have to maintain the in-memory records at all? Don't we write these to the journal and journal.backupfiles files? What's wrong with reading this file handle? We only need to perform a findoffset() when incurring a split revlog or during rollback, right? Is there a noticeable performance overhead to performing that linear scan?

I've been wondering about that question as well. In theory, we could run into O(n^2) behavior when there are many files that end up being split. I've decided to step back from the full dropping in D9237, as most transactions are much smaller than the initial clone and are therefore not as sensitive to memory use.
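
To make the trade-off in this exchange concrete, here is a rough sketch of a lookup that scans the journal instead of keeping records in memory. It assumes one record per line with the file name and offset separated by a NUL byte; that layout and the helper name `scan_offset` are assumptions for illustration, not taken from this revision.

```python
import io


def scan_offset(journal, name):
    """Linearly scan a journal-like file object for name's recorded offset."""
    journal.seek(0)
    for line in journal:
        entry, sep, offset = line.rstrip(b"\n").partition(b"\0")
        # Return the first matching record; skip lines without a separator.
        if sep and entry == name:
            return int(offset)
    return None


# Hypothetical journal contents: two records in the assumed format.
journal = io.BytesIO(b"data/a.i\x000\ndata/b.i\x00128\n")
offset = scan_offset(journal, b"data/b.i")
```

Each call reads up to the whole journal, so k lookups over n records cost O(n * k). That is the quadratic case mentioned above when a large share of the files end up being split.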

This revision was not accepted when it landed; it landed in state Needs Review.
This revision was automatically updated to reflect the committed changes.