( )⚙ D7116 rust-performance: introduce FastHashMap type alias for HashMap

This is an archive of the discontinued Mercurial Phabricator instance.

rust-performance: introduce FastHashMap type alias for HashMap
ClosedPublic

Authored by Alphare on Oct 16 2019, 9:44 AM.

Details

Summary

Rust's default hashing is slow, because it is meant for preventing collision
attacks.
For all of the current Rust code, we don't care about those attacks, because
if an person with bad intentions has write access to your repo, you have other
issues.

I've chosen to use the TwoXHash crate because it was made by a reputable member
of the Rust community and has very good benchmarks.

For now it does not seem to improve performance by much for the current code,
but it's something else to not worry about when benchmarking code: in a
previous experiment with copytracing in Rust, it accounted for more than 10%
of the time of the entire script.

Diff Detail

Repository
rHG Mercurial
Branch
default
Lint
No Linters Available
Unit
No Unit Test Coverage

Event Timeline

Alphare created this revision.Oct 16 2019, 9:44 AM

OOC, have you compared this with the hashbrown crate for perf?

OOC, have you compared this with the hashbrown crate for perf?

std::collections::HashMap now uses hashbrown https://blog.rust-lang.org/2019/07/04/Rust-1.36.0.html

kevincox accepted this revision.Oct 16 2019, 7:03 PM

I've seen very good results with https://github.com/servo/rust-fnv in the past so it is probably worth including that in the comparison and possibly using it. It is especially good for small keys which seems like a common case in hg.

I've seen very good results with https://github.com/servo/rust-fnv in the past so it is probably worth including that in the comparison and possibly using it. It is especially good for small keys which seems like a common case in hg.

The following comparison shows that for input > 20 bytes, fnv has worse overall performance than xx: https://cglab.ca/~abeinges/blah/hash-rs/.
I think that keys bigger than 20 bytes are pretty common: mercurial/dirstate.py is nested once and is already over 20 bytes. Maybe we could use fnv for cases other than the dirstate where we have mostly small keys, but that seems pretty cumbersome to me.

The following comparison shows that for input > 20 bytes, fnv has worse overall performance than xx

Sounds good. We can always do benchmarks in the future and swap it.

I'll rebase this series once my other patches land, to make it easier to get all instances of HashMap in the codebase.

durin42 requested changes to this revision.Nov 11 2019, 4:18 PM

It sounds like I expect this to be rebased now that other patches landed. Let me know if that's wrong?

This revision now requires changes to proceed.Nov 11 2019, 4:18 PM

Any update on this ? I'll need it for my work on copy tracing. Reading the review thread, there is no remaining blocker to land this.

pulkit accepted this revision.Dec 10 2019, 10:08 AM
This revision was not accepted when it landed; it landed in state Needs Review.
This revision was automatically updated to reflect the committed changes.