Page MenuHomePhabricator

copies: add a config to limit the number of candidates to check in heuristics
ClosedPublic

Authored by pulkit on Oct 7 2017, 9:00 PM.

Details

Summary

The heuristics algorithm find possible candidates for move/copy and then check
whether they are actually a copy or move. In some cases, there can be lot of
candidates possible which can actually slow down the algorithm.

This patch introduces a config option
experimental.copytrace.movecandidateslimit using which one can limit the
candidates to check. The limit defaults to 100.

Thanks to Yuya for suggesting to skip copytracing for that file with a
warning.

Diff Detail

Repository
rHG Mercurial
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

pulkit created this revision.Oct 7 2017, 9:00 PM
yuja requested changes to this revision.Oct 9 2017, 9:26 AM
yuja added a subscriber: yuja.
  • ui.configint() ?
  • I don't think taking the first 5 candidates would do something sensible. Is it reasonable?
  • tests?
This revision now requires changes to proceed.Oct 9 2017, 9:26 AM
pulkit edited the summary of this revision. (Show Details)Oct 9 2017, 5:09 PM
pulkit updated this revision to Diff 2539.
yuja requested changes to this revision.Oct 11 2017, 10:55 AM

Oops, the default value (5 vs 100) wasn't my point. I doubt if taking the first n
candidates would be a good "heuristic."

If I understand correctly, it may miss the copy information only if the file is
out of the n-candidates list. I think it would be hard for users to understand
why the copy wasn't detected. That's my concern.

(marked as "request changes" to flag discussion is ongoing. no new version is requested.)

This revision now requires changes to proceed.Oct 11 2017, 10:55 AM
In D987#16792, @yuja wrote:

Oops, the default value (5 vs 100) wasn't my point. I doubt if taking the first n
candidates would be a good "heuristic."

Oh, sorry. I changed it to 100 as I think that will be a good maximum and a good default rather than 5.

If I understand correctly, it may miss the copy information only if the file is
out of the n-candidates list. I think it would be hard for users to understand
why the copy wasn't detected. That's my concern.

I have added a debug message saying more candidates than the limit which can give user a hint in this case. Shall I make that a ui.status() thing?

(marked as "request changes" to flag discussion is ongoing. no new version is requested.)

yuja added a comment.Oct 12 2017, 10:14 AM

I have added a debug message saying more candidates than the limit which can give user a hint in this case. Shall I make that a ui.status() thing?

Perhaps.

And I think the status/warning message should be displayed in a stable manner,
which is:

  • do not try to find copy from the first n candidates, and show the message saying copy tracing is disabled due to too many candidates,
  • or show the message only when copy cannot be detected.
In D987#17141, @yuja wrote:

I have added a debug message saying more candidates than the limit which can give user a hint in this case. Shall I make that a ui.status() thing?

Perhaps.
And I think the status/warning message should be displayed in a stable manner,
which is:

  • do not try to find copy from the first n candidates, and show the message saying copy tracing is disabled due to too many candidates,

Should we just disable copy trace or fallback to the full copytracing algorithm? I like falling back to full copytracing rather than disabling straight away.

  • or show the message only when copy cannot be detected.

Did you mean, "show a message only when copy cannot be detected and we have more candidates than the limit"?

yuja added a comment.Oct 13 2017, 10:00 AM
In D987#17308, @pulkit wrote:
In D987#17141, @yuja wrote:
  • do not try to find copy from the first n candidates, and show the message saying copy tracing is disabled due to too many candidates,

Should we just disable copy trace or fallback to the full copytracing algorithm? I like falling back to full copytracing rather than disabling straight away.

Is the full tracing faster than the heuristics in that case? If not, falling back
wouldn't be an option.

  • or show the message only when copy cannot be detected.

Did you mean, "show a message only when copy cannot be detected and we have more candidates than the limit"?

Yes, but I prefer the former option, "don't search copy source because there
are too many candidates", which seems easier to follow.

stash added a subscriber: stash.Oct 16 2017, 11:27 AM

@pulkit sorry for being late with the comments.
I agree with @yuja - falling back is not an option, because it's probably going to be slow.
As for these two options

  • do not try to find copy from the first n candidates, and show the message saying copy tracing is disabled due to too many candidates,
  • or show the message only when copy cannot be detected.

Do I understand correctly that

copy tracing is disabled

means that it's disabled only for this particular file and not for all files?

If yes, then I think that either option is fine.

yuja added a comment.Oct 16 2017, 11:38 AM
In D987#18592, @stash wrote:

copy tracing is disabled

means that it's disabled only for this particular file and not for all files?

Yes, this particular file which has an excessive number of candidates.

pulkit edited the summary of this revision. (Show Details)Oct 17 2017, 5:30 PM
pulkit updated this revision to Diff 2947.
yuja accepted this revision.Oct 18 2017, 8:35 AM
This revision is now accepted and ready to land.Oct 18 2017, 8:35 AM
This revision was automatically updated to reflect the committed changes.