This is an archive of the discontinued Mercurial Phabricator instance.

copytrace: move fb extension to core under flag experimental.fastcopytrace
AbandonedPublic

Authored by pulkit on Aug 11 2017, 7:26 PM.

Details

Reviewers
None
Group Reviewers
hg-reviewers
Summary

The default copytrace implementation is very slow because it finds all the new
files that were added from the merge base up to the head commit and, for each
file, checks whether it is a copied or moved version of a different file.

The copytrace extension in fb-hgext has a heuristic implementation of copy
tracing which is faster than the current copy tracing. The heuristic limits the
search for copies to files that are either:

  1. Renamed within the same directory
  2. Moved to another directory with the same name
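
The two heuristics above can be sketched as a filter over newly added paths. This is only an illustration under stated assumptions, not the fb-hgext code: the function name `heuristic_candidates` and its arguments are hypothetical.

```python
import posixpath


def heuristic_candidates(missing_path, added_paths):
    """Return added paths that could be a move/copy of missing_path.

    Hypothetical helper: a path qualifies if it is in the same directory
    as missing_path (heuristic 1) or has the same basename (heuristic 2).
    """
    directory = posixpath.dirname(missing_path)
    basename = posixpath.basename(missing_path)
    candidates = []
    for path in added_paths:
        if path == missing_path:
            continue  # a file is not a copy of itself
        same_dir = posixpath.dirname(path) == directory
        same_name = posixpath.basename(path) == basename
        if same_dir or same_name:
            candidates.append(path)
    return candidates
```

For example, with `src/a.py` missing and `src/b.py`, `lib/a.py`, `lib/c.py` added, only the first two would be checked; `lib/c.py` matches neither heuristic and is skipped.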

Stash@fb analyzed the above heuristics and found that among 2,443,768
moves/copies, only 32,234 do not fall under them, which is approx. 1.3% of the
total.

If experimental.disablecopytrace = yes, then the experimental.fastcopytrace flag
is ignored, as the user has explicitly disabled copytracing.

If experimental.disablecopytrace = no, then the experimental.fastcopytrace flag
is considered, and if it is set to true, the fastcopytrace heuristic
implementation is used.
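
The precedence between the two flags can be sketched as follows. This is a minimal illustration: the function `copytrace_mode` and the returned mode names are hypothetical, and the booleans stand in for the already-parsed config values.

```python
def copytrace_mode(disablecopytrace, fastcopytrace):
    """Pick the copytrace behaviour from the two experimental flags.

    Hypothetical helper; the mode names are labels for this sketch only.
    """
    if disablecopytrace:
        # Explicit disable wins; fastcopytrace is not consulted.
        return "off"
    if fastcopytrace:
        return "fast"
    return "default"
```

Note that `copytrace_mode(True, True)` yields `"off"`: setting fastcopytrace does not re-enable copytracing once it has been disabled.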

There are two more flags added by the implementation:

  1. experimental.fastcopytrace.sourcecommitlimit

This flag limits the number of commits to be traversed for the heuristics in
the source branch, i.e. the branch that is rebased or merged. Copytracing can be
slow if there are too many commits in the source branch, so this flag helps by
limiting the number of commits.

  2. experimental.fastcopytrace.maxmovescandidatestocheck

This flag limits the number of heuristically found move candidates to check.
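
A rough sketch of how two such limits might be applied. The summary does not say what happens when a limit is reached, so simple truncation is assumed here; the helper `take_at_most`, the sample data, and the limit values are all hypothetical.

```python
from itertools import islice


def take_at_most(items, limit):
    """Return at most `limit` items from an iterable, preserving order."""
    return list(islice(items, limit))


# Hypothetical inputs: commits on the source branch and move candidates.
source_commits = ["c1", "c2", "c3", "c4", "c5"]
move_candidates = ["lib/a.py", "lib/b.py", "lib/c.py"]

# Arbitrary example values for the two limits.
sourcecommitlimit = 3
maxmovescandidatestocheck = 2

# Assumption: both limits simply cap how much work is done.
commits_to_scan = take_at_most(source_commits, sourcecommitlimit)
candidates_to_check = take_at_most(move_candidates, maxmovescandidatestocheck)
```

An implementation could equally fall back to another strategy once a limit is exceeded; that choice is not settled by the description above.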

The extension also supports fast copytracing during amends, which will be moved
to core in further patches.

Diff Detail

Repository
rHG Mercurial
Lint
Lint Skipped
Unit
Unit Tests Skipped

Event Timeline

pulkit created this revision. Aug 11 2017, 7:26 PM

Again I am not sure whether the flag names are good. Since disablecopytrace was in experimental section, I went with the same for these ones.

pulkit edited the summary of this revision. Aug 15 2017, 3:44 PM
pulkit updated this revision to Diff 943.

Any updates on this? The function _fastmergecopies() is simply a port of _domergecopies() from fbhgext/copytrace.py. https://phab.mercurial-scm.org/diffusion/FBHGX/browse/default/hgext3rd/copytrace.py;75cfcc6fc62a4f172857beebda6c0e43f318ea87$290

For config options, I think it's cleaner if we can unify them:

experimental.copytrace = disable | default | fast
  • disable: no copy tracing
  • default: old copy tracing algorithm
  • fast: this algorithm

I'm fine with leaving the fastcopytrace.sourcecommitlimit config options. But if we would rather not introduce them, they could also be folded into the experimental.copytrace option, like fast:sourcecommitlimit=100,maxmovescandidatestocheck=200.
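
A minimal sketch of how a combined value of that shape could be parsed, assuming the `mode:key=value,key=value` form shown in the example and integer option values. The function `parse_copytrace` is hypothetical and is not part of Mercurial.

```python
VALID_MODES = ("disable", "default", "fast")


def parse_copytrace(value):
    """Split 'mode[:key=int,key=int,...]' into (mode, options)."""
    mode, _, rest = value.partition(":")
    mode = mode.strip()
    if mode not in VALID_MODES:
        raise ValueError("unknown copytrace mode: %r" % mode)
    options = {}
    if rest:
        for item in rest.split(","):
            key, sep, raw = item.partition("=")
            if not sep:
                raise ValueError("expected key=value, got %r" % item)
            # Assumption: every option value is an integer limit.
            options[key.strip()] = int(raw)
    return mode, options
```

With this sketch, `parse_copytrace("fast:sourcecommitlimit=100,maxmovescandidatestocheck=200")` gives the mode `"fast"` plus a dictionary holding both limits, while a bare `"default"` gives an empty dictionary.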

mercurial/copies.py
711–716

This is also used elsewhere. I think it's worth defining in the localrepository class.

cc @stash Maybe it's more accurate to check paths.default first and fall back to origroot. If a user clones ssh://x/repo-x to repo-x-2, we probably want repo-x as the repo name. What do you think?

quark added inline comments. Aug 20 2017, 4:51 PM
tests/test-fastcopytrace.t
1–9

nit: could execute the script directly

$ initclient() {
>   cat >> $1/.hg/hgrc << EOF
> ...
> EOF
> }
In D358#7173, @quark wrote:

For config options, I think it's cleaner if we can unify them:

experimental.copytrace = disable | default | fast
  • disable: no copy tracing
  • default: old copy tracing algorithm
  • fast: this algorithm

I'm fine with leaving the fastcopytrace.sourcecommitlimit config options. But if we would rather not introduce them, they could also be folded into the experimental.copytrace option, like fast:sourcecommitlimit=100,maxmovescandidatestocheck=200.

I like the idea of unifying the config, but I was thinking of adding a copytrace section, as there are a couple more flags introduced by copytracing during amend, namely amendcopytrace and amendcopytracelimit. (That is, if we want to keep the sourcecommitlimit and maxmovescandidatestocheck options.)
If we don't want to introduce these limiting flags, then we can plug the existing things into the experimental section only.

Also, after this and the copytracing-during-amend patches (not sent yet) get in, I am thinking of having hg help -k copytrace document these things so that users can understand them.

stash added a comment. Aug 21 2017, 1:34 PM

Quick heads-up: one of our bootcampers is working on a task that does the following: if rebase/merge/graft happens only on the local stack (i.e. the base commit is draft), then we fall back to the normal copytrace, because we are sure it's going to be fast enough.

pulkit abandoned this revision. Aug 23 2017, 2:30 PM

There is work ongoing on improving copytrace in fb-hgext. I will resend patches with updated copytrace logic once the work is done.