Page MenuHomePhabricator

branchmap: define a hasbranch() to find whether a branch exists or not
Needs RevisionPublic

Authored by pulkit on Nov 22 2018, 6:49 AM.

Details

Reviewers
baymax
Group Reviewers
hg-reviewers
Summary

This patch introduces a branchmap.hasbranch() function to find whether a branch
exists or not. I was going through profile of one of most used command
internally on server side and found that we are spending ~50% of time finding
whether a branch exists or not. We are reading the branchmap cache, building the
related object and then finding whether a branch exists or not which is quite
slow on repositories which have more than 10k named branches.

In the new function, we just iterate over the whole branchmap cache and find
that whether a branch name is present or not. This is essentially
branchmap.read() but without any hash validation, creation of the branchcache()
object.

Following were the improvements observed:

Before:

hg log -b . -l 1: 0.53 sec
hg log -b <last_branch_in_the_cache> -l 1: 1.12 sec

After:

hg log -b . -l 1: 0.32 sec
hg log -b <last_branch_in_the_cache> -l 1: 0.95 sec

There we no improvements observed in hg log -r <branch_name>, if the log limit
is greater than 1, and while using revsets.

I have added a TODO here that we can use binary search instead of plain
iteration for finding whether a branch name exists or not. I got to know that
Martijn Pieters is working on some related stuff to load branchmap in memory as
raw data and decided to implement binary search on top of that.

Diff Detail

Repository
rHG Mercurial
Lint
Lint Skipped
Unit
Unit Tests Skipped

Event Timeline

pulkit created this revision.Nov 22 2018, 6:49 AM
yuja added a subscriber: yuja.Nov 24 2018, 11:04 PM

I don't think it's good idea to re-scan the cache file per hasbranch() call.
Instead, we'll probably need a lazy parser backed by a in-memory cache data.
The current cache file format is text-based, which wouldn't be easily bisected
without loading (or memmap) the whole content.

+def hasbranch(repo, branchname):
+ """check whether a branchname exists in the repo or not by reading the
+ branchmap cache"""
+
+ if not repo.cachevfs.exists(_filename(repo)):
+ # branchmap file is not present, let's go repo.branchmap() route which
+ # will create that file
+ return branchname in repo.branchmap()
+
+ # TODO: implement binary-search here for faster search
+ with repo.cachevfs(_filename(repo)) as f:
+ f = repo.cachevfs(_filename(repo))
+ lineiter = iter(f)
+ next(lineiter).rstrip('\n').split(" ", 2)

Need to check if the cache file is valid.

+ for l in lineiter:
+ l = l.rstrip('\n')
+ if not l:
+ continue
+ label = l.split(" ", 2)[2]
+ label = encoding.tolocal(label.strip())
+ if label == branchname:

And maybe need to check if the node exists.

+ return True
+ return False

baymax requested changes to this revision.Jan 24 2020, 12:32 PM

There seems to have been no activities on this Diff for the past 3 Months.

By policy, we are automatically moving it out of the need-review state.

Please, move it back to need-review without hesitation if this diff should still be discussed.

:baymax:need-review-idle:

This revision now requires changes to proceed.Jan 24 2020, 12:32 PM