This is an archive of the discontinued Mercurial Phabricator instance.

phases: drop the list with phase of each rev, always comput phase sets
ClosedPublic

Authored by joerg.sonnenberger on Dec 6 2017, 9:47 AM.

Details

Summary

Change the C implementation of phasecache.loadphaserevs to provide only
the sets for draft and secret phase as well as the number of revisions
seen.

Change the pure Python implementation of the same functino to compute
the sets instead of the list of phases for each revision.

Change phasecache.phase to check the phase sets and assume public if the
revision is in neither draft nor secret set. This is computationally
slightly more expensive.

Change phasecache.getrevset for public() based queries to compute the
set of non-matching revisions and return the result as filtered
fullreposet. A shortcut is taken when no draft or secret revision
exists.

Bump the module version for the changed interface contract.

Overall, this saves around 16 Bytes per revision whenever the phasecache
is used, for the test case in issue5691 it is around 3MB. getrevset()
for a large repository is around 13% slower here, that seems an
acceptable trade off. Performance impact for phase() should be similar.

Diff Detail

Repository
rHG Mercurial
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

quark requested changes to this revision.EditedDec 6 2017, 9:02 PM
quark added a subscriber: quark.

I have been also wanting to remove this O(changelog) space usage for long. I noticed phasecache was the biggest memory consumer among all repo. components (dirstate, changelog, etc) when prototyping stateful chg early this year. So I'm excited about this change.

Requesting changes mainly because the if public in phases branch in getrevset has issues. Otherwise it looks good to me.

mercurial/cext/revlog.c
627

I guess the map here means a rev -> phase map, which was the list. Since we now return len(size). Maybe rename the function to compute_phases_len_sets.

1963

Also here, computephaseslensets

mercurial/phases.py
227–237

The assert is incorrect - repo could be "filtered". The old code support filtered repo implicitly by using for r in repo, the iterator of repo will remove filtered revisions.

227–238

Sorry - I made a mistake using repo.revs('public()') to test the logic here. Actually that exercises a different code path.

There are practically no users exercising this public in phases code path. So issues like fullreposet requires a repo argument is not exposed.

I think we can either:

  • Pass an optional subset here then replace the revset implementation to use it:
def getrevset(self, repo, phases, subset=None):
    if subset is None:
        subset = smartset.fullreposet(repo)
    if ...
        # fast path
        ...
        return subset & smartset.baseset(revs)
    else:
        ...
  • Remove support for calculating public() here by just raise error.ProgrammingError('public() is unsupported')

I guess the former is preferred eventually. But if you want to land this faster, the latter is easier. I can help writing a follow-up.

This revision now requires changes to proceed.Dec 6 2017, 9:02 PM
durin42 added a subscriber: durin42.Dec 6 2017, 9:24 PM
mercurial/phases.py
227–238

I think this can just follow the logic of the other branch, i.e. short cut if len(phases) == 1 and repo.changelog.filteredrevs, otherwise, add repo.changelog.filteredrevs to the union as well.
I added the assert to make sure that the check wasn't necessary, missed the implicit dependency on repo here.

quark added inline comments.Dec 7 2017, 3:43 PM
mercurial/phases.py
227–238

Agree. The only real difference with revset.py:public is the subset argument.

quark accepted this revision.Dec 7 2017, 10:00 PM

I think this is good enough. We can migrate revset.py public() implementation later.

yuja accepted this revision.Dec 8 2017, 9:54 AM
yuja added a subscriber: yuja.

Queued, thanks. Can you send follow-up patches to address trivial issues?

mercurial/cext/revlog.c
691–692

Removed unused variable in flight.

mercurial/phases.py
206

Nit: this isn't the "max" revision, but the size. Perhaps the initial
value should be 0.

237

Nit: could be written as fullrepoest(repo) - baseset(revs).

265–280

Nice, no dependency on repo. As a follow-up, maybe we
can extract a pure function as truly drop-in replacement for
the native function.

267

s/map/pycompat.maplist/ for Python 3 compatibility. Fixed in flight.

296

s/_phaserevs/_phasesets/. Fixed in flight.

This revision is now accepted and ready to land.Dec 8 2017, 9:54 AM
This revision was automatically updated to reflect the committed changes.
quark added inline comments.Dec 8 2017, 5:57 PM
mercurial/phases.py
237

baseset.__contains__ is slower than set.__contains__ so the current code is faster.

yuja added inline comments.Dec 8 2017, 7:58 PM
mercurial/phases.py
237

Really? It's aliased to self._set.__contains__.