
copies: avoid calling matcher if matcher.always()
ClosedPublic

Authored by martinvonz on May 21 2019, 8:32 PM.

Details

Summary

When storing copy information in the changesets
(experimental.copies.read-from=changeset-only), this patch speeds up

hg debugpathcopies FENNEC_58_0_2_BUILD1 FIREFOX_59_0b8_BUILD2

from 5.9s to 4.7s. At the start of this series (b162229e), that
command took 18min.
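
The idea named in the title can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual patch: `AlwaysMatcher`, `PrefixMatcher`, and `filter_copies` are hypothetical names invented for this example, and only the `always()` check mirrors what the title describes.

```python
class AlwaysMatcher:
    """Hypothetical matcher that accepts every path."""

    def always(self):
        return True

    def __call__(self, path):
        return True


class PrefixMatcher:
    """Hypothetical matcher that accepts paths under a given prefix."""

    def __init__(self, prefix):
        self.prefix = prefix

    def always(self):
        return False

    def __call__(self, path):
        return path.startswith(self.prefix)


def filter_copies(copies, match):
    """Return the {destination: source} entries whose destination matches."""
    if match.always():
        # Fast path: the matcher accepts everything, so skip the
        # per-file call entirely and just copy the mapping.
        return dict(copies)
    # Slow path: one matcher call per destination file.
    return {dst: src for dst, src in copies.items() if match(dst)}


copies = {"b/new.txt": "a/old.txt", "c/x.py": "c/y.py"}
everything = filter_copies(copies, AlwaysMatcher())
only_b = filter_copies(copies, PrefixMatcher("b/"))
```

When many files are involved, skipping one function call per file is where a saving of this kind would come from.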

Diff Detail

Repository
rHG Mercurial
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

martinvonz created this revision. May 21 2019, 8:32 PM

Can you give a summary of the total speedup of the series (from the base to the last changeset)? Also, I am not sure which case these numbers apply to. Is this the compatibility mode or after repository conversion? Can we have numbers for both?

To get a more diverse picture of the performance of these changes, can you provide timing data for the following cases?

mozilla-central: hg perfpathcopies 76caed42cf7cb7098aa0eb58242dd36054d06865 1daa622bbe42f8a85e0b4880c5c25df8ea60e95f
pypy:            hg perfpathcopies 3c8ac35c653afe108127ca75688e2f8278192512 d7746d32bf9d785bbc0c6afc9aa6015410a38c8f
mercurial:       hg perfpathcopies 7adb1274a4f930e13b35545ef23914ccae7d5534 0c6c600c03fddabcc45f1046e869f84b276fb467
netbeans:        hg perfpathcopies 588c2d1ced709885eb0bc6b88137efbadbb35b76 1aad62e59ddde2ce37882af12fbb202d3b7961dc
martinvonz edited the summary of this revision. May 22 2019, 3:15 PM

Can you give a summary of the total speedup of the series (from the base to the last changeset)?

Sure, done.

Also, I am not sure which case these numbers apply to. Is this the compatibility mode or after repository conversion?

After repo conversion.

Can we have numbers for both?

The compatibility-mode number is going to be similar to what it was before this series, since that mode won't benefit from having the set of removed files available cheaply. A follow-up that speeds up compatibility mode by not filtering out removed files would make sense. I'm not sure whether that should be a separate option.

To get a more diverse picture of the performance of these changes, can you provide timing data for the following cases?

mozilla-central: hg perfpathcopies 76caed42cf7cb7098aa0eb58242dd36054d06865 1daa622bbe42f8a85e0b4880c5c25df8ea60e95f
pypy:            hg perfpathcopies 3c8ac35c653afe108127ca75688e2f8278192512 d7746d32bf9d785bbc0c6afc9aa6015410a38c8f
mercurial:       hg perfpathcopies 7adb1274a4f930e13b35545ef23914ccae7d5534 0c6c600c03fddabcc45f1046e869f84b276fb467
netbeans:        hg perfpathcopies 588c2d1ced709885eb0bc6b88137efbadbb35b76 1aad62e59ddde2ce37882af12fbb202d3b7961dc

Can you provide some tags you're curious about instead so it's easier to run the same command in both repos (the hashes are different)? I have the mozilla-unified repo and the hg repo converted.

The nodes in the above example were selected by a script because they have interesting properties. They are not based on tags, so I can't give you one. How did you convert the repo? I think hg convert keeps a map somewhere; otherwise, using the commit message could work.
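
For illustration, translating node IDs between the two repos could look like the sketch below. It assumes the conversion was done with hg convert and that the revision map is a plain text file with one `<source ID> <destination ID>` pair per line; `load_revmap` is a hypothetical helper, and the IDs in `sample` are placeholders, not real changesets.

```python
def load_revmap(lines):
    """Build a {source ID: destination ID} dict from revision-map lines."""
    revmap = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        parts = line.rsplit(" ", 1)
        if len(parts) != 2:
            continue  # skip malformed lines with no separator
        source, dest = parts
        revmap[source] = dest
    return revmap


# Placeholder IDs, made up for this example.
sample = [
    "a" * 40 + " " + "b" * 40,
    "",
    "c" * 40 + " " + "d" * 40,
]
revmap = load_revmap(sample)
```

With a real map, the lines would come from the map file, for example `load_revmap(open(path))` where `path` points at wherever the conversion stored it.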

Fair enough. For the mozilla repo, it takes 25s with copies in filelogs and 1m40s with copies in changesets (after this patch). For the mercurial repo, it takes 180ms with either format.

(I did some experiments; this seems like a good spot to report them.)

I built a crude cache (CBOR-based storage) for the data that needs caching after this series and tested it against my pypy test case.

filelog-based: 15s
compatibility mode; without cache: 75s
compatibility mode; caching copies without this series: 60s
compatibility mode; caching copies with this series: 40s
compatibility mode; caching all data with this series: 7s (65% spent parsing CBOR cache data)

This is very promising, even if I need to check more diverse cases (various factors can influence performance: number of files considered, number of changesets traversed, number of intermediate versions, etc.).

The timings above are enough motivation for me to look seriously into a caching/alternative storage plan.
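
As a rough sketch of what a cache of this kind might look like (not the cache described above): `CopyCache` and its methods are hypothetical names, the per-changeset payload is an assumption, and JSON from the standard library stands in for CBOR.

```python
import json


class CopyCache:
    """Hypothetical on-disk cache of per-changeset copy data."""

    def __init__(self, path):
        self.path = path
        self._data = None  # loaded lazily on first access

    def _load(self):
        if self._data is None:
            try:
                with open(self.path, "r", encoding="utf-8") as fp:
                    self._data = json.load(fp)
            except FileNotFoundError:
                self._data = {}  # no cache file yet
        return self._data

    def get(self, node, compute):
        """Return cached data for node, calling compute(node) on a miss."""
        data = self._load()
        if node not in data:
            data[node] = compute(node)
        return data[node]

    def save(self):
        """Write the whole cache back to disk."""
        with open(self.path, "w", encoding="utf-8") as fp:
            json.dump(self._load(), fp)
```

Note that this sketch parses the entire file on first access, so parse time grows with cache size; a format that allows reading individual entries would avoid that cost.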

Nice :) Thanks for working on a way to get this stuff out to existing repos (which has not been a priority for me, since that is not Google's use case).

This revision was not accepted when it landed; it landed in state Needs Review.
This revision was automatically updated to reflect the committed changes.