This is an archive of the discontinued Mercurial Phabricator instance.

rust: Refactor parse_dirstate() to take a callback
AbandonedPublic

Authored by SimonSapin on Apr 12 2021, 8:23 AM.

Details

Reviewers
None
Group Reviewers
hg-reviewers
Summary

The callback is called each entry, with optional copy source,
instead of having parse_dirstate() return separate Vecs for copy mappings
and the rest of the data.

This avoids an intermediate heap allocation, but more importantly will allow
the dirstate tree experiment to store copy data next to the rest of data
instead of a separate mapping.

Diff Detail

Repository
rHG Mercurial
Branch
default
Lint
No Linters Available
Unit
No Unit Test Coverage

Event Timeline

SimonSapin created this revision.Apr 12 2021, 8:23 AM
SimonSapin abandoned this revision.Apr 19 2021, 10:54 AM

In https://phab.mercurial-scm.org/D10362#159004 I was worried about potential perf impact, measured, and found none.

However when measuring the entire stack instead of just the first patch I found some regressions (7~12% more wall clock time for hg status or hg status -mard) and bisected them to this patch. This patch changes parse_dirstate from returning a pair of Vec that are then turned into HashMaps with Iterator::collect, to taking a callback that is called for each entry to directly insert into HashMaps. I expected that skipping the intermediate Vecs would be a slight improvement (or negligible), but this turns out slower because this starts with small maps and reallocate several times as they grow. Reallocating a HashMap is more expensive than reallocating a Vec because it involves re-hashing. Profiling one case shows ~10% time spend in RawTable::reserve_rehash in the hashbrown crate. Iterator::collect however can rely on Iterator::size_hint (which is accurate when starting from a Vec) and make a correctly-sized HashMap from the start, without re-hashing.

Therefore I’m rescinding most of this patch, in order to keep the intermediate Vec for the current DirstateMap and avoid the perf regression. The experimental tree-based DirstateMap still uses a callback, but that is small patch so I’ve folded it into D10367.

It might be possible to skip the intermediate Vec without adding re-hashing if we can pre-allocate a correctly-sized HashMap. Finding the accurate size would require parsing, but maybe an estimate would be good enough. The equivalent Python code does this: https://www.mercurial-scm.org/repo/hg-committed/file/5.7.1/mercurial/dirstate.py#l1660. However this unrelated to the initial goal of this patch and can be pursued separately.