D2883 revlogstore: create and implement an interface for repo files storage

This is an archive of the discontinued Mercurial Phabricator instance.

Abandoned, Public

Authored by indygreg on Mar 16 2018, 7:07 PM.

Details

Reviewers
None
Group Reviewers
hg-reviewers
Summary

In order to better support partial clones, we will need to overhaul
local repository storage. This will be a major effort, as many parts
of the code assume things like the existence of revlogs for storing
data.

To help support alternate storage implementations, we will create
interfaces for accessing storage. The idea is that all consumers will
code against an interface, so that any new interface-conforming
implementation can come along and be swapped in to provide new and
novel storage mechanisms.

This commit starts the process of defining those interfaces.

We define an interface for accessing files data. It has a single
method for resolving the fulltext of an iterable of inputs.

The interface is specifically defined to allow out-of-order responses.
It also provides a mechanism for declaring that files data is censored.
We *may* also want a mechanism to declare LFS or largefiles data.
But I'm not sure how that mechanism works or what the best way to
handle that would be, if any.
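
As an illustration of the shape described above, here is a minimal sketch of a files storage interface with a single method that resolves fulltexts for an iterable of inputs, may answer out of order, and can flag censored data. Every name in it (FilesStore, FileResult, resolvefulltexts, DictFilesStore) is hypothetical and is not taken from the actual patch; the dict-backed class merely stands in for a revlog-backed implementation.

```python
import abc
from typing import Iterable, Iterator, NamedTuple, Optional, Tuple

# Hypothetical key type: a (path, node) pair identifying one file revision.
FileKey = Tuple[bytes, bytes]


class FileResult(NamedTuple):
    """One response. Carries its own key so results can arrive in any order."""
    path: bytes
    node: bytes
    fulltext: Optional[bytes]  # None when the revision is censored
    censored: bool


class FilesStore(abc.ABC):
    """Hypothetical interface; not the API defined by this revision."""

    @abc.abstractmethod
    def resolvefulltexts(
        self, entries: Iterable[FileKey]
    ) -> Iterator[FileResult]:
        """Yield one result per distinct requested entry, in any order."""


class DictFilesStore(FilesStore):
    """Toy in-memory implementation standing in for revlog-backed storage."""

    def __init__(self, data, censored=()):
        self._data = dict(data)
        self._censored = set(censored)

    def resolvefulltexts(self, entries):
        # Sorting the keys makes it visible that response order is
        # independent of request order. Unknown keys raise KeyError.
        for path, node in sorted(set(entries)):
            if (path, node) in self._censored:
                yield FileResult(path, node, None, True)
            else:
                yield FileResult(path, node, self._data[(path, node)], False)


store = DictFilesStore(
    {(b"a.txt", b"n1"): b"hello\n", (b"b.txt", b"n2"): b"secret\n"},
    censored=[(b"b.txt", b"n2")],
)
# Consumers index results by key instead of relying on ordering.
results = {
    (r.path, r.node): r
    for r in store.resolvefulltexts([(b"b.txt", b"n2"), (b"a.txt", b"n1")])
}
```

Because each result names the entry it answers, a consumer written this way would keep working if the toy class were swapped for a different conforming implementation.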

We introduce a new "revlogstore" module to hold the definitions of
these interfaces that use our existing revlog-based storage
mechanism.

An attribute pointing to the "files store" has been added to
localrepository.

No consumers of the new interface have been added. The interface
should still be considered highly experimental and details are
expected to change.

It was tempting to define the interface as one level higher than
file storage - in such a way as to facilitate accessing changeset
and manifest data as well. However, I believe these 3 primitives -
changesets, manifests, and files - each have unique requirements
that will dictate special, one-off methods on their storage
interfaces. I'd rather we define our interfaces so they are
tailored to each type initially. If an implementation wants to
shoehorn all data into a generic key-value blob store, they can
still do that. And we also reserve the right to combine interfaces
in the future. I just think that attempting to have the initial
versions of the interfaces deviate too far from current reality will
make it very challenging to define and implement them.

The reason I'm defining and implementing this interface now is to
support new (experimental) wire protocol commands to be used to
support partial clone. Some of these commands will benefit from
aggressive caching. I want to prove out the efficacy of the interfaces
approach by implementing cache-based speedups in the interface layer.
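
A rough sketch of what a cache-based speedup in the interface layer could look like follows. It assumes a hypothetical resolver callable that maps an iterable of (path, node) keys to (key, fulltext) pairs; CachingFilesStore and resolvefulltexts are illustrative names, not code from this revision. The cache is an unbounded dict purely for brevity.

```python
from typing import Callable, Dict, Iterable, Iterator, Tuple

# Hypothetical types: a (path, node) key, and a resolver that yields
# (key, fulltext) pairs for the keys it is given.
FileKey = Tuple[bytes, bytes]
Resolver = Callable[[Iterable[FileKey]], Iterator[Tuple[FileKey, bytes]]]


class CachingFilesStore:
    """Hypothetical wrapper that caches fulltexts in front of any resolver."""

    def __init__(self, resolve: Resolver):
        self._resolve = resolve
        self._cache: Dict[FileKey, bytes] = {}

    def resolvefulltexts(self, keys: Iterable[FileKey]):
        misses = []
        for key in keys:
            if key in self._cache:
                # Cache hits are emitted immediately, which is only valid
                # because the interface permits out-of-order responses.
                yield key, self._cache[key]
            else:
                misses.append(key)
        if misses:
            # Only the misses reach the underlying storage.
            for key, text in self._resolve(misses):
                self._cache[key] = text
                yield key, text


calls = []


def backend(keys):
    """Stand-in for real storage; records which keys it was asked for."""
    keys = list(keys)
    calls.append(keys)
    for key in keys:
        yield key, b"text:" + key[0]


store = CachingFilesStore(backend)
first = dict(store.resolvefulltexts([(b"a", b"1"), (b"b", b"2")]))
second = dict(store.resolvefulltexts([(b"a", b"1"), (b"c", b"3")]))
```

On the second request only the key for `c` reaches the backend; the wrapper needs no knowledge of how the backend stores data.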

Diff Detail

Repository
rHG Mercurial
Lint
Lint Skipped
Unit
Unit Tests Skipped

Event Timeline

indygreg created this revision. Mar 16 2018, 7:07 PM
indygreg updated this revision to Diff 7149. Mar 19 2018, 7:59 PM

It's probably too early to worry about for the experimenting that you're doing, but at some point, maybe this should also allow yielding the full text in chunks? As it stands now, there are a couple places where LFS has to read in the full file, and one of those places is the filelog/revlog. IIRC, largefiles manages to avoid that completely.

This dated page is the only thing that I could find talking about the issues with that approach:

https://www.mercurial-scm.org/wiki/HandlingLargeFiles

I'm not sure what this should look like either, but it seemed worthwhile to point out that page, with the accompanying discussion of revlog limitations.

indygreg planned changes to this revision. Mar 22 2018, 11:21 AM

I'll rebase this on top of zope.interface (D2928 and friends). Please defer reviewing for now.

> It's probably too early to worry about for the experimenting that you're doing, but at some point, maybe this should also allow yielding the full text in chunks? As it stands now, there are a couple places where LFS has to read in the full file, and one of those places is the filelog/revlog. IIRC, largefiles manages to avoid that completely.

Yes, we should definitely design the interface such that file fulltexts can be expressed as chunks. That doesn't mean we have to implement things to actually use chunks. But it will at least give us an escape hatch so we can do more reasonable things for large files in the future.
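
To make the chunking idea concrete, here is a small sketch of a fulltext expressed as an iterator of byte chunks. The helper names iterchunks and sha256ofchunks are hypothetical and unrelated to Mercurial's code; the point is only that a consumer can either join the chunks (the behavior today) or process them one at a time without holding the whole file in memory.

```python
import hashlib
import io
from typing import BinaryIO, Iterable, Iterator


def iterchunks(fh: BinaryIO, chunksize: int = 65536) -> Iterator[bytes]:
    """Yield the contents of a file object as successive byte chunks."""
    while True:
        chunk = fh.read(chunksize)
        if not chunk:
            break
        yield chunk


def sha256ofchunks(chunks: Iterable[bytes]) -> str:
    """Hash a chunked fulltext without materializing it in one buffer."""
    hasher = hashlib.sha256()
    for chunk in chunks:
        hasher.update(chunk)
    return hasher.hexdigest()


data = b"x" * 10
# A consumer that needs the whole text can simply join the chunks.
joined = b"".join(iterchunks(io.BytesIO(data), chunksize=4))
# A consumer that can stream never sees more than one chunk at a time.
digest = sha256ofchunks(iterchunks(io.BytesIO(data), chunksize=4))
```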

indygreg abandoned this revision. Sep 12 2018, 9:44 PM

I'm not actively working on this.