This is an archive of the discontinued Mercurial Phabricator instance.

narrow: don't include the manifests while widening a narrow clone
AbandonedPublic

Authored by pulkit on Oct 5 2018, 11:40 AM.

Details

Reviewers
durin42
martinvonz
Group Reviewers
hg-reviewers
Summary

When a narrow clone is done initially the whole manifest is send. In case of
treemanifests, the manifest and the dirlogs of the paths matching the
narrowmatcher are send while clone.

So while widening, we don't need the manifests again. In case of treemanifests,
we just need the dirlogs.

This patch adds logic in cgpacker.generate() to skip adding the manifests if
they are not dirlogs and omit manifest from the required parts in call while
widening in narrow extension.

This saves a lot of time on big repos. It prevents downloading manifest which
can be in GB's on big repos and also since we don't get it, we don't try to
unpack and apply it saving more time. It saves around 3-4 minutes on our
internal repository while extending on the treemanifest repo. On flat manifest
repo, since the manifest is much much larger, I can guess this saves around ~10
mins.

I also checked the actual payload size of changegroup in bundle2 in
test-narrow-widen-no-ellipsis.t.

Before this patch

flat: 662
tree: 835

After this patch

flat: 157
tree: 333

These are the numbers from the test which has 7-8 commits and few dirs. On big
repos the improvement is *huge*.

The above all is the case when we are widening without ellipses.

Diff Detail

Repository
rHG Mercurial
Lint
Lint Skipped
Unit
Unit Tests Skipped

Event Timeline

pulkit created this revision.Oct 5 2018, 11:40 AM

I don't think this is quite the right approach. If we're using treemanifest and widening from {foo/, bar/} to {foo/, bar/, baz/}, then we'd still be sending manifests for all those directories (and their subdirectories, right)? We should only have to send manifests for baz/ (and subdirectories). cgpacker current has a _filematcher. Perhaps it needs a _oldfilematcher as well? Note that a differencematcher won't be enough because it would (correctly) include the root manifest in this example (and pretty much always). Here we need to call visitdir() once on each matcher instead of calling it once on a differencematcher.

I'm still fine with queuing this for now if you need this patch somewhat urgently even though I think the code in this patch and in D4886 should eventually go away.

pulkit added a comment.Oct 5 2018, 1:39 PM

I'm still fine with queuing this for now if you need this patch somewhat urgently even though I think the code in this patch and in D4886 should eventually go away.

Well, we do want to optimize this manifest thing before the upcoming release because it adds 3-4 minutes to some narrow-copy extending. I also agree that all these hacks should go away. I was thinking to implement something like widen_changegroup() which will generate the changegroup which we want instead of hacking into the original changegroup code.

mercurial/changegroup.py
1051

I don't think this is quite the right approach. If we're using treemanifest and widening from {foo/, bar/} to {foo/, bar/, baz/}, then we'd still be
sending manifests for all those directories (and their subdirectories, right)? We should only have to send manifests for baz/ (and subdirectories).

@martinvonz IIUC because of this and condition in line 1041 above, we won't send the manifest for all those directories and we will be sending manifest for baz/. The filematcher here is the differencematcher so foo/ and bar/ does not match it.

pulkit updated this revision to Diff 11706.Oct 5 2018, 1:40 PM
pulkit abandoned this revision.Nov 13 2018, 9:06 AM

Abandoing this one and D4886 as @martinvonz's better fix landed as D4895.