Journal entries with size 0 are common because they represent new revlog
files. Move them from the dictionary into a set, as the latter is more
dense. This reduces peak RSS by 70MB for the NetBSD test repository, which
has around 450k files under .hg/store.
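The idea described above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the class name `JournalEntries` and its attributes are hypothetical, and the only assumption taken from the description is that entries with size 0 go into a set while all other entries stay in a dictionary. How much memory this saves depends on the interpreter and on what the dictionary stores as values.

```python
class JournalEntries:
    """Hypothetical stand-in for a transaction's in-memory journal records."""

    def __init__(self):
        self._offsets = {}      # path -> non-zero offset at first touch
        self._newfiles = set()  # paths first recorded with offset 0

    def add(self, path, offset):
        # Keep only the first record for each path.
        if path in self._newfiles or path in self._offsets:
            return
        if offset:
            self._offsets[path] = offset
        else:
            # New files need no stored value: membership implies offset 0.
            self._newfiles.add(path)

    def findoffset(self, path):
        # Returns None when the path was never recorded.
        if path in self._newfiles:
            return 0
        return self._offsets.get(path)

    def __len__(self):
        return len(self._offsets) + len(self._newfiles)


entries = JournalEntries()
entries.add(b"data/new.i", 0)        # new revlog, lands in the set
entries.add(b"00changelog.i", 4096)  # existing file, lands in the dict
entries.add(b"00changelog.i", 8192)  # ignored, first record wins
```

Lookups behave the same for callers whichever container holds the path; only the storage differs.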
Details
- Reviewers: None
- Group Reviewers: hg-reviewers
- Commits:
  - rHGec73a6a75985: transaction: split new files into a separate set
  - rHGfae02ffcbae8: transaction: split new files into a separate set
Diff Detail
- Repository: rHG Mercurial
- Lint: Automatic diff as part of commit; lint not applicable.
- Unit: Automatic diff as part of commit; unit tests not applicable.
Event Timeline
Why do we have to maintain the in-memory records at all? Don't we write these to the journal and journal.backupfiles files? What's wrong with reading from that file handle? We only need to perform a findoffset() when a revlog is split or during rollback, right? Is there a noticeable performance overhead to performing that linear scan?
I've been wondering about that question as well. In theory, we could run into O(n^2) behavior when many files end up being split, since each split would trigger its own scan of the journal. I decided to step back from dropping the records entirely in D9237, as most transactions are much smaller than an initial clone and are therefore not as sensitive to memory use.
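To make the trade-off in this exchange concrete, here is a minimal sketch of a lookup that scans the journal instead of consulting in-memory records. It is not Mercurial's implementation: the record layout (path, NUL byte, decimal offset, newline) is an assumption made for the example, and this `findoffset` is a hypothetical free function operating on any seekable binary file object.

```python
import io


def findoffset(journal, path):
    """Return the first offset recorded for path, or None if absent.

    Assumes one record per line in the form b"<path>\\0<offset>\\n".
    """
    journal.seek(0)
    for line in journal:
        name, _, offset = line.rstrip(b"\n").partition(b"\0")
        if name == path:
            return int(offset)
    return None


# Build a small in-memory journal using the assumed record layout.
records = [(b"data/a.i", 0), (b"00changelog.i", 4096), (b"data/b.i", 0)]
journal = io.BytesIO(b"".join(b"%s\0%d\n" % r for r in records))

changelog_offset = findoffset(journal, b"00changelog.i")
```

Each call reads up to the whole journal, so a transaction in which k files are split costs on the order of k scans over n records. That is the quadratic case mentioned above when k grows with n.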