This is an archive of the discontinued Mercurial Phabricator instance.

changegroup: do not delta lfs revisions
ClosedPublic

Authored by quark on Feb 6 2018, 8:14 PM.

Details

Summary

There is no way to distinguish whether a delta base is LFS or non-LFS.

If the delta is against LFS rawtext, and the client trying to apply it has
the base revision stored as fulltext, the delta (aka. bundle) will fail to
apply.

This patch forbids using delta for LFS revisions in changegroup so bad
deltas won't be transmitted.

Note: this does not solve the problem entirely. It solves LFS delta applying
to non-LFS base. But the other direction: non-LFS delta applying to LFS base
is not solved yet.

Diff Detail

Repository
rHG Mercurial
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

quark created this revision.Feb 6 2018, 8:14 PM
quark updated this revision to Diff 5269.Feb 6 2018, 9:13 PM
quark updated this revision to Diff 5273.Feb 6 2018, 10:18 PM
indygreg requested changes to this revision.Feb 7 2018, 5:33 PM

This looks mostly good. I would like a change to address a future footgun though.

I would also appreciate someone familiar with censor and narrow to weigh in on the implications of disabling delta generation for revisions that have the censor and ellipsis flags set. I'm pretty sure narrow will cope since it reimplements changegroup generation. Not sure how censor will react. (But I know we already have random code for detecting censored nodes during changegroup generation.)

mercurial/revlog.py
722

This logic assumes that revision flags will only ever be used to influence the presence of content. That is true today because our flags are for REVIDX_ISCENSORED, REVIDX_ELLIPSIS, and REVIDX_EXTSTORED. But if we ever introduced a new revision flag for e.g. compression strategy, then testing for non-empty revision flags would be wrong.

I think we want to capture the bitmask of revision flags influencing delta generation explicitly. Or we want a big comment by the revision flags constants at the top of the file telling people to audit candelta() when adding new revision flags.

This revision now requires changes to proceed.Feb 7 2018, 5:33 PM
quark updated this revision to Diff 5327.Feb 7 2018, 7:35 PM

This looks mostly good. I would like a change to address a future footgun though.
I would also appreciate someone familiar with censor and narrow to weigh in on the implications of disabling delta generation for revisions that have the censor and ellipsis flags set. I'm pretty sure narrow will cope since it reimplements changegroup generation. Not sure how censor will react. (But I know we already have random code for detecting censored nodes during changegroup generation.)

Hmm, I could swear we already don't generate deltas for ellipsis nodes, and I'm having trouble defending the idea that maybe we should. Censor I'm 99% sure disables deltas. I'll ask adgar, but if you don't hear from me assume I was right.

This looks mostly good. I would like a change to address a future footgun though.
I would also appreciate someone familiar with censor and narrow to weigh in on the implications of disabling delta generation for revisions that have the censor and ellipsis flags set. I'm pretty sure narrow will cope since it reimplements changegroup generation. Not sure how censor will react. (But I know we already have random code for detecting censored nodes during changegroup generation.)

I think narrow would be functionally fine with it disabled, but it would be terrible for storage space. We don't set the ellipsis flag on files, but we do on manifests (and we use tree manifests). Some directories have tens of thousands of entries and tens of thousands of entries and tens of thousands of revisions. Storing them all in full would be bad. We get a compression of close to 2000x (not a typo) on these directories. I just created a pretty large clone for testing this and one directory's revlog there ended up being 9MB instead of 15GB.

This looks mostly good. I would like a change to address a future footgun though.
I would also appreciate someone familiar with censor and narrow to weigh in on the implications of disabling delta generation for revisions that have the censor and ellipsis flags set. I'm pretty sure narrow will cope since it reimplements changegroup generation. Not sure how censor will react. (But I know we already have random code for detecting censored nodes during changegroup generation.)

Hmm, I could swear we already don't generate deltas for ellipsis nodes, and I'm having trouble defending the idea that maybe we should. Censor I'm 99% sure disables deltas. I'll ask adgar, but if you don't hear from me assume I was right.

quark updated this revision to Diff 5650.Feb 13 2018, 2:41 PM
ryanmce accepted this revision.Mar 3 2018, 2:36 PM
indygreg accepted this revision.Mar 3 2018, 2:40 PM
This revision is now accepted and ready to land.Mar 3 2018, 2:40 PM
ryanmce added inline comments.Mar 3 2018, 2:50 PM
mercurial/revlog.py
720

prev should be baserev?

quark added inline comments.Mar 3 2018, 2:53 PM
mercurial/revlog.py
720

Should be baserev.

quark updated this revision to Diff 6469.Mar 3 2018, 3:27 PM
indygreg accepted this revision.Mar 6 2018, 1:01 PM
indygreg added inline comments.
tests/test-lfs.t
352

Where did this change come from?

quark added inline comments.Mar 6 2018, 1:10 PM
tests/test-lfs.t
352

The number grows from 3 digits to 4 digits because of increased size, and one of the space becomes a digit.

This revision was automatically updated to reflect the committed changes.