This is an archive of the discontinued Mercurial Phabricator instance.

verify: start to abstract file verification
ClosedPublic

Authored by indygreg on Sep 24 2018, 12:12 PM.

Details

Summary

Currently, the file storage interface has a handful of attributes
that are exclusively or near-exclusively used by repo verification
code. In order to support verification on non-revlog/alternate
storage backends, we'll need to abstract verification so it can
be performed in a storage-agnostic way.

This commit starts that process.

We establish a new verifyintegrity() method on revlogs and expose
it to the file storage interface. Most of verify.verifier.checklog()
has been ported to this new method.

We need a way to represent verification problems. So we invent an
interface to represent a verification problem, invent a revlog type
to implement that interface, and use it.

The arguments to verifyintegrity() will almost certainly change in
the future, once more functionality is ported from the verify code.
And the "revlogv1" version check is very hacky. (The code in verify
is actually buggy because it is comparing the full 32-bit header
integer instead of just the revlog version short. I'll likely fix
this in a subsequent commit.)

Diff Detail

Repository
rHG Mercurial
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

indygreg created this revision.Sep 24 2018, 12:12 PM

Not related to this patch specifically, but if you're redesigning verify APIs...

One of the unresolved issues in LFS is how to handle verification. Right now, verify wants the actual blob data, so it downloads it one at a time, as needed, as part of reading the filelog. I submitted a patch earlier this year to batch prefetch them[1], and pondered if we can just ask the server (assuming it's our hg code) to verify the missing blobs remotely without actually transferring them. Yuya rightly objected to trying to operate on the raw data and JSON en-mass before it's been verified.

I guess we could do a two pass process, where files marked for external storage simply have their JSON parsed in a try/catch (without the rawtext verification), and then once the filelogs are verified, try to process the blob data in the next phase. So I wonder if more of this needs to be moved to a wrappable function, so that the extension can customize the filelog verification.

[1] https://www.mercurial-scm.org/pipermail/mercurial-devel/2018-April/116091.html

Not related to this patch specifically, but if you're redesigning verify APIs...

That's good context - and context I hadn't considered!

My immediate goal with reworking verify is to move low-level code out of verify and into storage. Getting the interface "correct" will be a later step.

From what you said, I could easily imagine us wanting to add different verify "modes." e.g. controls to specify whether fulltext data should be verified. And maybe something so LFS can tell revlog's verify method to ignore revisions with the extstored flag so LFS can do something reasonable. Tons of possibilities here. If you want to have a go at refactoring things after the patches in this stack land, I'll happily review those changes!

This revision was automatically updated to reflect the committed changes.