This is an archive of the discontinued Mercurial Phabricator instance.

clindex: integrate Rust nodemap
ClosedPublic

Authored by quark on Nov 20 2017, 7:11 PM.
Tags
None
Subscribers

Details

Reviewers
durham
Group Reviewers
Restricted Project
Commits
rFBHGX2c843b93364b: clindex: integrate Rust nodemap
Summary

Change clindex so it can use the rust nodemap for node lookups and partial
match. By default, the code runs in a "verify" mode that compares the rust
nodemap against the original nodemap. Once we gain more confident, we can
drop the support of the original nodemap (so it does not build a radix
tree).

After this, there is no need to enable perftweaks.cachenoderevs or
fastpartialmatch.

Performance wise, the nodemap seems to be a bit faster than revlog.c trie.
The former takes 91% (+/-5%) of the time needed for the latter. That was
tested by running the following script 20 times on fbsource:

from mercurial.cext import parsers
from hgext3rd.rust import indexes
import contextlib, os
m = {}
@contextlib.contextmanager
def measure(name):
    utime, stime, cutime, cstime, elapsed = os.times()
    try:
        yield
    finally:
        utime2, stime2, cutime, cstime, elapsed2 = os.times()
        print('%14s: User %.3f  Sys %.3f  Real %.3f'
              % (name, utime2 - utime, stime2 - stime, elapsed2 - elapsed))
        m[name] = elapsed2 - elapsed
cl = open('00changelog.i').read()
with measure('nodemap'):
    nm = indexes.nodemap(cl, indexes.nodemap.emptyindexbuffer())
    nm.partialmatch('da0443cd533f4073e8d0e324d55')
with measure('revlog index'):
    idx = parsers.index(cl, False)
    idx.partialmatch('da0443cd533f4073e8d0e324d55')
print('nodemap: %.2f revlog' % (m['nodemap'] / m['revlog index']))
Test Plan

Checks clindex with rust nodemap can be a drop-in replacement

Change revlog.py temporarily to disable its inline feature so clindex will be used:

-_maxinline = 131072
+_maxinline = 0

Then run tests with the extension enabled:

./run-tests.py -l --extra-config-opt=extensions.clindex= --extra-config-opt=clindex.lagthreshold=1 --extra-config-opt=clindex.logpath=/tmp/l  -j `nproc`

Check the test output change makes sense (ex. extension list change, .hg
content change, help text change) and /tmp/l does not have contents like
"inconsistent" or "corrupted".

Check memory safety

Rebuild Python to be valgrind friendly (--without-pymalloc --with-valgrind),

Disable re2 by changing import re2 in util.py to import not_exist since it
will trigger a lot of "read uninitialized memory" reports.

Use valgrind to run a few hg commands:

valgrind python2 hg commit -m 1 --config extensions.clindex= --config clindex.lagthreshold=1

Especially, do a rebase with obsstore disabled to exercise the "strip" code
path and make sure valgrind does not report anything except for memory leak
(normal for Python).

Diff Detail

Repository
rFBHGX Facebook Mercurial Extensions
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

quark created this revision.Nov 20 2017, 7:11 PM
Herald added a reviewer: Restricted Project. · View Herald TranscriptNov 20 2017, 7:11 PM
quark added a comment.EditedNov 21 2017, 1:32 AM

I will split the diff.

EDIT: does not look that easy to split.

quark planned changes to this revision.Nov 21 2017, 12:14 PM
quark added inline comments.
hgext3rd/clindex.pyx
424

This method got missing when rebasing.

quark updated this revision to Diff 3742.Nov 21 2017, 3:25 PM
quark edited the test plan for this revision. (Show Details)Nov 22 2017, 8:45 PM
quark updated this revision to Diff 3791.
quark edited the test plan for this revision. (Show Details)Nov 22 2017, 8:52 PM
quark edited the summary of this revision. (Show Details)Nov 22 2017, 9:31 PM
durham requested changes to this revision.Dec 6 2017, 7:48 PM
durham added a subscriber: durham.

Small comments

hgext3rd/clindex.pyx
84

And we're relying on the mmap config to make this fast right?

89

I think this was covered in one of your other diffs, but what happens if the nodemap index is out of date? It builds an inmemory index of the new stuff at the end of the origindex file?

185

If config.nodemap is False, rustnodemap is undefined here right? Is that a problem?

334

Hmm, so this means any command that accesses to repositories changelogs in one process will write logs to the wrong place? And they'll probably read the wrong config as well?

Since the clindex class has access to a vfs, which I assume is the svfs, could we have these global objects at least be in a dictionary indexed by svfs root path? So multiple repositories don't collide?

393

Should we set _logpath equal to False here or something? So we don't repeatedly try to open the bad logpath?

This revision now requires changes to proceed.Dec 6 2017, 7:48 PM
quark added inline comments.Dec 7 2017, 3:38 PM
hgext3rd/clindex.pyx
84

The mmap config removes cost reading .i happening at revlog.__init__, not here. Constructing origindextype is fast regardless of the mmap option.

89

It builds the "side index" as a separate index. It's not mutating the main index since there is no way to copy-on-write plus appending the main index buffer. It's possible to implement such copy-on-write in user-space. But that might introduce too much overhead.

185

It's a problem introduced by multiple amends. Will change.

334

clindex uses cachevfs.

I think config and cachevfs will be right since when _wrapchangelog re-assign them, the re-assignment won't affect existing clindex objects.

Log paths could be wrong if repo changed, yes. But blackbox and commandserver.logpath have similar issues. Maybe we should just use a single global path like /var/log/mercurial/.... I noticed infinitepush is already using that.

quark updated this revision to Diff 4372.Dec 11 2017, 7:20 PM
quark added inline comments.Dec 11 2017, 7:22 PM
hgext3rd/clindex.pyx
393

Should be fine? blackbox does not have that logic and there were no complaints yet.

durham accepted this revision.Dec 14 2017, 7:38 PM
This revision is now accepted and ready to land.Dec 14 2017, 7:38 PM
quark updated this revision to Diff 4454.Dec 14 2017, 9:21 PM
quark edited the summary of this revision. (Show Details)Dec 14 2017, 10:15 PM
This revision was automatically updated to reflect the committed changes.