This is an archive of the discontinued Mercurial Phabricator instance.

transaction: encodes tuples in changes['phases'] as 4 bit ints
AbandonedPublic

Authored by joerg.sonnenberger on Dec 8 2017, 7:06 PM.

Details

Reviewers
baymax
quark
Group Reviewers
hg-reviewers
Summary

changes['phases'] contains tuples for the phase transitions, which
provides a non-trivial overhead in Python. Replace this with a Python
dictionary containing 4 bit integers. Provide accessors to centralize
changes['phases'] procsessing in one place.

On average, this improves user time for issue5691 by 1% and peak RSS by
3MB.

Diff Detail

Repository
rHG Mercurial
Lint
Lint Skipped
Unit
Unit Tests Skipped

Event Timeline

This is a Proof-of-Concept and with a follow-up in D1622 to use a C implementation for the bitset.

Original code: 485s / 584MB
Set version: 501s / 582MB
intbitset: 478s / 563MB

quark added a subscriber: quark.Dec 8 2017, 7:22 PM
quark added inline comments.
mercurial/localrepo.py
1243

Since this hook is the only user of tr.changes['phases']. I'd suggest disabling calculating tr.changes['phases'] if the hook does not exist.

quark accepted this revision.Dec 8 2017, 7:25 PM

That said, I think the usage of list of sets is smart and this change looks good to me.

yuja added a subscriber: yuja.Dec 9 2017, 8:11 AM

Original code: 485s / 584MB
Set version: 501s / 582MB
intbitset: 478s / 563MB

I'm not sure how to read this. Do we only get -2/584MB better space
consumption against the original simplest implementation?

yuja retitled this revision from transaction: split changes['phases'] into sets for src and target phase to [PoC] transaction: split changes['phases'] into sets for src and target phase.Dec 10 2017, 6:09 AM
joerg.sonnenberger edited the summary of this revision. (Show Details)Dec 10 2017, 4:11 PM
joerg.sonnenberger retitled this revision from [PoC] transaction: split changes['phases'] into sets for src and target phase to transaction: encodes tuples in changes['phases'] as 4 bit ints.
joerg.sonnenberger updated this revision to Diff 4341.

This version should be committable. It introduces the necessary API for isolating further changes and gives a small improvement already. I'll be looking at provider a memory and time efficient sparce vector next, that's where the real benefits will be.

yuja added a comment.Dec 11 2017, 9:25 AM

This version should be committable. It introduces the necessary API for isolating further changes and gives a small improvement already. I'll be looking at provider a memory and time efficient sparce vector next, that's where the real benefits will be.

I think it's better to hold this change off until we can get a real performance or
memory-space win. The current number, "peak RSS by 3MB", won't buy much
compared to the API complexity this patch will introduce.

Regarding the API, I think tr.changes[key] is the source of the problem. It allows
free read/write access to arbitrary data, which is hard to gradually improve the
underlying data structure or algorithm.

Noob quesiton. Can't we completely switch of tr.changes['phases'] when it's
unnecessary?

The bitset version has shown already that optimizing this is worthwhile and can eliminate up to 10% of the total size of the transaction object.

The change as is primarily gets the interfaces in place, so that outside code can be adjusted.

yuja added a comment.Dec 12 2017, 8:21 AM

The bitset version has shown already that optimizing this is worthwhile and can eliminate up to 10% of the total size of the transaction object.

IIRC, Jun said a plain intbitset wouldn't be optimal for all scenarios, and I agree
with that. Also, it isn't a standard Python library.

I think the current way of recording phase change is too ad-hoc if it does iterate
all incoming nodes to build (old, new) phases pairs. Instead, can't we reconstruct
them from DAG + old/new phase roots, for example?

If the current tr.changes['phases'] is only used by hooks and third-party
extensions, it can be opt-in feature.

quark resigned from this revision.Dec 16 2017, 1:01 AM

When thinking about it again, I believe providing a way to disable tr.chanages['phases'] is a better way to solve this issue. If we cannot disable that automatically from what hooks are used, we can still provide a config option to disable that.

baymax requested changes to this revision.Jan 24 2020, 12:33 PM

There seems to have been no activities on this Diff for the past 3 Months.

By policy, we are automatically moving it out of the need-review state.

Please, move it back to need-review without hesitation if this diff should still be discussed.

:baymax:need-review-idle:

This revision now requires changes to proceed.Jan 24 2020, 12:33 PM

Obsoleted by storing the phase changes as range, which is normally even more compact.