This is an archive of the discontinued Mercurial Phabricator instance.

manifest: proxy to revlog instance instead of inheriting
Closed, Public

Authored by indygreg on Aug 27 2018, 12:00 PM.

Details

Summary

Previously, manifestrevlog inherited revlog.revlog and therefore
exposed all its APIs. This inevitably resulted in consumers calling
low-level revlog APIs.

As part of abstracting storage, we want to formalize the interface
for manifest storage. The revlog API is much too large to define as
the interface.

Like we did for filelog, this commit divorces the manifest class
from revlog so that we can standardize on a smaller API surface.

To produce this commit, I broke the inheritance, ran tests, and added
proxies until all tests passed. Like filelog, there are a handful of
attributes that don't belong on the interface. And like filelog, we'll
tease these out in the future.
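
As a rough illustration of the approach described above (wrapping a
storage object and forwarding only selected calls), here is a minimal
sketch. ToyRevlog, ToyManifestLog, and lowlevelhelper are hypothetical
names invented for this example; they are not Mercurial's classes and
the real proxy list in this commit is not reproduced here.

```python
class ToyRevlog:
    """Hypothetical stand-in for a large, low-level storage class."""

    def __init__(self):
        self._nodes = []

    def __len__(self):
        return len(self._nodes)

    def addrevision(self, node):
        # Append a node and return its revision number.
        self._nodes.append(node)
        return len(self._nodes) - 1

    def rev(self, node):
        return self._nodes.index(node)

    def node(self, rev):
        return self._nodes[rev]

    def lowlevelhelper(self):
        # Hypothetical low-level detail that consumers should not rely on.
        return "internal"


class ToyManifestLog:
    """Wraps a ToyRevlog instead of inheriting from it."""

    def __init__(self):
        self._revlog = ToyRevlog()

    # Explicit proxies: only these names are reachable by consumers.
    def __len__(self):
        return len(self._revlog)

    def addrevision(self, node):
        return self._revlog.addrevision(node)

    def rev(self, node):
        return self._revlog.rev(node)

    def node(self, rev):
        return self._revlog.node(rev)


store = ToyManifestLog()
r = store.addrevision(b"abc")
print(store.node(r), len(store), hasattr(store, "lowlevelhelper"))
```

Because the wrapper no longer inherits, anything not explicitly proxied
(lowlevelhelper in this sketch) simply is not available on it.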

As part of this, we formalize an interface for manifest storage and
add checks that manifestrevlog conforms to the interface.
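
A conformance check of this kind can be sketched as follows. This is
not the mechanism used by this commit; it assumes Python 3.8+ and uses
typing.Protocol purely for illustration. ManifestStorageLike,
ToyManifestStore, and Incomplete are hypothetical names.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class ManifestStorageLike(Protocol):
    """Hypothetical, deliberately small storage interface."""

    def __len__(self) -> int: ...

    def rev(self, node: bytes) -> int: ...

    def node(self, rev: int) -> bytes: ...


class ToyManifestStore:
    """Provides every method named by the interface."""

    def __init__(self):
        self._nodes = []

    def __len__(self):
        return len(self._nodes)

    def rev(self, node):
        return self._nodes.index(node)

    def node(self, rev):
        return self._nodes[rev]


class Incomplete:
    """Lacks rev() and node(), so it should fail the check."""

    def __len__(self):
        return 0


# isinstance() against a runtime-checkable Protocol only verifies that the
# named methods exist; it does not check signatures or behavior.
conforms = isinstance(ToyManifestStore(), ManifestStorageLike)
rejected = isinstance(Incomplete(), ManifestStorageLike)
print(conforms, rejected)
```

A check like this only helps if it runs in the test suite, so that a
missing method shows up as a test failure rather than at a call site.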

Adding proxies will introduce some overhead due to extra attribute
lookups and function calls. On the mozilla-unified repository:

$ hg verify
before: real 627.220 secs (user 525.870+0.000 sys 18.800+0.000)
after: real 628.930 secs (user 532.050+0.000 sys 18.320+0.000)

$ hg serve (for a clone)
before: user 223.580+0.000 sys 14.270+0.000
after: user 227.720+0.000 sys 13.920+0.000

$ hg clone
before: user 506.390+0.000 sys 29.720+0.000
after: user 513.080+0.000 sys 28.280+0.000
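
For reference, the relative change in user CPU time can be derived
from the figures above. The helper name overhead_percent is made up
for this sketch; the numbers are copied from the measurements.

```python
# (before, after) user CPU seconds, copied from the measurements above.
timings = {
    "hg verify": (525.870, 532.050),
    "hg serve": (223.580, 227.720),
    "hg clone": (506.390, 513.080),
}


def overhead_percent(before, after):
    """Relative increase of `after` over `before`, in percent."""
    return (after - before) / before * 100.0


results = {cmd: overhead_percent(b, a) for cmd, (b, a) in timings.items()}
for cmd, pct in results.items():
    print(f"{cmd}: {pct:+.2f}% user CPU")
```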

There appears to be some overhead here, but only about 1-2% of user CPU time.
I think that is an appropriate price to pay for storage abstraction,
which will eventually let us have much nicer things. If the overhead
is noticed in other operations (whose CPU time isn't likely dwarfed by
fulltext resolution) or if we want to cut down on the overhead, we
could dynamically build up a type whose methods are effectively
aliased to a revlog instance's. I'm inclined to punt on that problem
for now. We may have to do it for the changelog, at which point it
could be implemented in a generic way and ported to filelog and
manifestrevlog easily enough, I would think.
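
One possible reading of the "dynamically build up a type" idea is
sketched below. It is not implemented by this commit. ToyRevlog and
makealiasedstore are hypothetical names, and the set of aliased
methods is an assumption chosen for the example.

```python
class ToyRevlog:
    """Hypothetical stand-in for a revlog instance."""

    def __init__(self, nodes):
        self._nodes = list(nodes)

    def rev(self, node):
        return self._nodes.index(node)

    def node(self, rev):
        return self._nodes[rev]

    def lowlevelhelper(self):
        return "internal"


def makealiasedstore(revlog, names):
    """Build a one-off type whose attributes are revlog's bound methods.

    Wrapping each bound method in staticmethod() means attribute access
    on the new object returns the bound method itself, so a call goes
    straight to the revlog with no intermediate Python wrapper function.
    """
    attrs = {name: staticmethod(getattr(revlog, name)) for name in names}
    cls = type("aliasedstore", (object,), attrs)
    return cls()


rl = ToyRevlog([b"a", b"b", b"c"])
store = makealiasedstore(rl, ("rev", "node"))
print(store.rev(b"b"), store.node(2), hasattr(store, "lowlevelhelper"))
```

The trade-off in this sketch is that a new type is created per wrapped
instance, in exchange for removing the extra function call that a
hand-written proxy method adds.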

.. api:: manifest.manifestrevlog no longer inherits from revlog

The manifestrevlog class now wraps a revlog instance instead of
inheriting from revlog. Various attributes and methods on instances
are no longer available.

Diff Detail

Repository
rHG Mercurial

Event Timeline

indygreg created this revision. Aug 27 2018, 12:00 PM

Adding proxies can be costly, but maybe performance only matters for the changelog? I can imagine hg verify being the command most affected by this change. Does that get even measurably slower? Of course, it's not very important that hg verify is fast; the point is that if there's no measurable impact on that command, then I can't think of anything else where it would be measurable.

indygreg planned changes to this revision. Aug 27 2018, 12:36 PM

Yes, this will add overhead. Whether that overhead is significant for any operations we care about is up in the air. I've been testing CPU utilization for hg clone on both server and client for the mozilla-unified repo. That's a good proxy for hg bundle and hg unbundle performance. I can add hg verify to my tests as well.

Honestly, for manifests I expect I/O, decompression, and patch application to dwarf the overhead of an additional Python function call. There might be some tight loops calling into, say, rev() or node() that will be impacted. But many of those are later followed by something that does revision resolving, delta generation, etc., and thus the regression is not noticed.

Anyway, I'll revise this commit message to include some performance numbers.

I agree. That's why I said hg verify is the only command I can imagine where the cost could possibly be measurable.

Thanks.

indygreg edited the summary of this revision. Aug 27 2018, 1:21 PM
indygreg requested review of this revision. Aug 27 2018, 1:37 PM

I guess hg phabsend doesn't reset the Changes Planned state when a summary-only update occurs.

This revision was automatically updated to reflect the committed changes.