This is an archive of the discontinued Mercurial Phabricator instance.

merge: cache unknown dir checks (issue5716)
ClosedPublic

Authored by mbthomas on Oct 24 2017, 11:02 AM.

Details

Summary

As mentioned in D1222, the recent pathconflicts change regresses update
performance in large repositories when many files are being updated.

To mitigate this, we introduce two caches of directories that have
already found to be either:

  • unknown directories, but which are not aliased by files and so don't need to be checked if they are files again; and
  • missing directores, which cannot cause path conflicts, and cannot contain a file that causes a path conflict.

When checking the paths of a file, testing against this caches means we can
skip tests that involve touching the filesystem.

Diff Detail

Repository
rHG Mercurial
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

mbthomas created this revision.Oct 24 2017, 11:02 AM

If this mitigates or fixes issue5716, please mention it in the summary line.

Code LGTM, but I haven't done any performance testing on this.

mbthomas retitled this revision from merge: cache unknown dir checks to merge: cache unknown dir checks (issue5716).Oct 24 2017, 1:59 PM

There is still some cost to doing the check, so it's not a complete reversal of issue5716, but it helps a lot.

sid0 added a subscriber: sid0.EditedOct 24 2017, 2:04 PM

Could you make this a class with unknowndircache and missingdircache being private fields?

How much do you think writing some of this code in native code (e.g. Rust) would help?

mbthomas updated this revision to Diff 3083.Oct 25 2017, 5:28 AM

Updated as requested.

The whole of _checkunknownfiles is probably a reasonable candidate for native code if we want to speed up update. It's the second most expensive part, after actually updating the files. The function is also pretty isolated - it accesses the dirstate (which may also be native code), and in some cases it checks the file contents on disk against the destination revision, but that's pretty rare. It might even be possible to do some of it in parallel.

Is it okay to punt on this until after 4.4?

Yes, fine to punt to after 4.4. With D1223 the checks are disabled, so we're back to the old behaviour for path conflicts, but the perf is ok.

durin42 requested changes to this revision.Nov 13 2017, 5:53 PM

needs rebased, otherwise looks fine

This revision now requires changes to proceed.Nov 13 2017, 5:53 PM
mbthomas updated this revision to Diff 3473.Nov 14 2017, 11:37 AM
krbullock accepted this revision.Dec 1 2017, 12:57 PM
This revision was automatically updated to reflect the committed changes.