This is an archive of the discontinued Mercurial Phabricator instance.

commands: implement `export --format=X` with support for CBOR
Abandoned, Public

Authored by indygreg on Apr 6 2018, 6:10 PM.

Details

Reviewers
yuja
Group Reviewers
hg-reviewers
Summary

What's better than having to parse patch files? Not having to parse
them.

The current text-based patch file format used by hg export is good
for humans to exchange. But for machines, it is better to use a
data format that is more structured.

We recently introduced support for CBOR. CBOR is a great, binary-preserving
data format (unlike JSON, which has no way to represent raw bytes), and I
think we should use it heavily for data interchange.
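Here is a concrete illustration of the "binary preserving" point: a minimal sketch of a CBOR encoder that handles only a small subset of types (integers, byte strings, text strings, arrays, maps, booleans, None). It follows the major-type layout of RFC 7049, which has since been superseded by RFC 8949. This is not Mercurial's or cbor2's encoder. The `cbor_encode` and `_head` names and the sample record are hypothetical. The key difference from JSON is that `bytes` values get their own major type (2), so non-UTF-8 content is stored verbatim. `json.dumps`, by contrast, rejects `bytes` outright.

```python
import json
import struct


def _head(major: int, n: int) -> bytes:
    """Initial byte (major type in the top 3 bits) plus the length/value argument."""
    if n < 24:
        return bytes([(major << 5) | n])
    if n < 0x100:
        return bytes([(major << 5) | 24]) + struct.pack(">B", n)
    if n < 0x10000:
        return bytes([(major << 5) | 25]) + struct.pack(">H", n)
    if n < 0x100000000:
        return bytes([(major << 5) | 26]) + struct.pack(">I", n)
    if n < 0x10000000000000000:
        return bytes([(major << 5) | 27]) + struct.pack(">Q", n)
    raise ValueError("argument too large for this sketch")


def cbor_encode(obj) -> bytes:
    """Encode a small subset of Python values as CBOR (sketch, not a full encoder)."""
    if obj is None:
        return b"\xf6"                      # simple value: null
    if isinstance(obj, bool):               # check before int: bool subclasses int
        return b"\xf5" if obj else b"\xf4"
    if isinstance(obj, int):
        # major type 0 stores n >= 0; major type 1 stores -1 - n for negatives
        return _head(0, obj) if obj >= 0 else _head(1, -1 - obj)
    if isinstance(obj, bytes):
        return _head(2, len(obj)) + obj     # raw bytes, no escaping needed
    if isinstance(obj, str):
        data = obj.encode("utf-8")
        return _head(3, len(data)) + data   # text strings are UTF-8
    if isinstance(obj, (list, tuple)):
        return _head(4, len(obj)) + b"".join(cbor_encode(x) for x in obj)
    if isinstance(obj, dict):
        return _head(5, len(obj)) + b"".join(
            cbor_encode(k) + cbor_encode(v) for k, v in obj.items()
        )
    raise TypeError(f"unsupported type: {type(obj).__name__}")


if __name__ == "__main__":
    blob = b"\x89PNG\r\n\x1a\n\xff"         # not valid UTF-8
    try:
        json.dumps({"data": blob})
    except TypeError as exc:
        print("JSON refuses bytes:", exc)
    print("CBOR:", cbor_encode({"data": blob}).hex())
```

Run as a script, it prints JSON's `TypeError` for the record and then the CBOR encoding of the same record as hex.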

This commit teaches hg export to write data as CBOR. It adds a
--format argument, which is probably the most controversial part of
this patch. I thought about using --template/-T instead. We already have
-Tjson, and -Tcbor seems like a reasonable feature addition. That might
be the right way forward; I'm not sure. I didn't want to bloat the
scope of the patch, at least not initially. (The code for exporting a
patch is a bit wonky and I'm not very comfortable with the templating
layer.)

Diff Detail

Repository
rHG Mercurial
Lint
Lint Skipped
Unit
Unit Tests Skipped

Event Timeline

indygreg created this revision. Apr 6 2018, 6:10 PM

This is more of an RFC patch. My actual goal is support for ingesting CBOR patches via hg import. I figured it would be easier to test that if we had CBOR support in hg export. The reason I want CBOR support for patch ingestion is that it is safer: it reduces the attack surface for injecting malicious input via patch parsing. See e.g. http://rachelbythebay.com/w/2018/04/05/bangpatch/. This implementation is still only partway there: I think a better approach would be to have structured data for the diffs as well, so we don't need to parse those either. That would let us use binary data without escaping, avoid inline text metadata for copies/renames, not worry about the encoding of filenames, etc. All the problems with "parse a patch" go away and you are left with commit metadata and a series of splicing instructions, which should be pretty generic.
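The "series of splicing instructions" idea can be sketched as follows. This is a hypothetical representation, not a format defined in this revision. Each `Splice` replaces the byte range `original[start:end]` with `data`, and every offset refers to the unmodified file. Applying splices involves no text parsing, so binary content needs no escaping. The names `Splice` and `apply_splices` are illustrative only.

```python
from typing import List, NamedTuple


class Splice(NamedTuple):
    """Hypothetical instruction: replace original[start:end] with data."""
    start: int
    end: int
    data: bytes


def apply_splices(original: bytes, splices: List[Splice]) -> bytes:
    """Apply non-overlapping splices whose offsets refer to the original bytes."""
    out = []
    pos = 0  # first byte of `original` not yet copied to the output
    for sp in sorted(splices, key=lambda s: s.start):
        if sp.start < pos or sp.end < sp.start or sp.end > len(original):
            raise ValueError(f"invalid or overlapping splice: {sp}")
        out.append(original[pos:sp.start])  # unchanged bytes before the splice
        out.append(sp.data)                 # replacement bytes, used verbatim
        pos = sp.end                        # skip the replaced range
    out.append(original[pos:])              # unchanged tail
    return b"".join(out)


if __name__ == "__main__":
    old = b"line one\nline two\n\x00\xff binary tail"
    new = apply_splices(old, [Splice(5, 8, b"ONE"), Splice(18, 18, b"inserted\n")])
    print(new)  # b'line ONE\nline two\ninserted\n\x00\xff binary tail'
```

Records like these, together with per-changeset metadata, could be serialized to CBOR directly. Nothing would then need to be parsed out of patch text on import.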

The I/O in the export code is pretty wonky. I'm kinda sad that cbor2 insists on binding an encoder/decoder to a file object. I *really* wish you could get straight bytes out of it without having to use io.BytesIO().
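The complaint above is about API shape. An encoder object wraps a writable file object, and `encode()` writes into it, so getting plain `bytes` means routing through `io.BytesIO`. The sketch below reproduces that shape with a hypothetical `StreamEncoder` class plus a `dumps()` helper that hides the buffer. It is not cbor2's API, and it handles only byte strings and lists of them to stay short.

```python
import io
import struct


class StreamEncoder:
    """Hypothetical CBOR encoder bound to a writable binary file object.

    Handles only byte strings (major type 2) and lists (major type 4).
    """

    def __init__(self, fp):
        self._fp = fp

    def _write_head(self, major: int, n: int) -> None:
        # Initial byte carries the major type; larger arguments follow big-endian.
        if n < 24:
            self._fp.write(bytes([(major << 5) | n]))
        elif n < 0x100:
            self._fp.write(bytes([(major << 5) | 24]) + struct.pack(">B", n))
        elif n < 0x10000:
            self._fp.write(bytes([(major << 5) | 25]) + struct.pack(">H", n))
        elif n < 0x100000000:
            self._fp.write(bytes([(major << 5) | 26]) + struct.pack(">I", n))
        else:
            self._fp.write(bytes([(major << 5) | 27]) + struct.pack(">Q", n))

    def encode(self, obj) -> None:
        if isinstance(obj, bytes):
            self._write_head(2, len(obj))
            self._fp.write(obj)
        elif isinstance(obj, list):
            self._write_head(4, len(obj))
            for item in obj:
                self.encode(item)
        else:
            raise TypeError(f"unsupported type: {type(obj).__name__}")


def dumps(obj) -> bytes:
    """Get bytes out of a stream-bound encoder by writing into an in-memory buffer."""
    buf = io.BytesIO()
    StreamEncoder(buf).encode(obj)
    return buf.getvalue()


if __name__ == "__main__":
    print(dumps([b"a", b""]).hex())  # prints 82416140
```

The stream-bound shape does have one upside for an export command. Each changeset's record could be written to the output as it is produced, rather than the whole export being built in memory first.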

I suspect someone is going to tell me that hg export should use the templating layer. If someone does, I would appreciate help implementing that. Of course, to get CBOR through the templating layer, we'd have to teach it to emit CBOR. Since it can already emit JSON, adding CBOR seems reasonable. I'm just not too familiar with that code.

durin42 added a subscriber: durin42. Apr 9 2018, 6:32 PM

I'm broadly in favor. Agree it might make sense to just add generic templating to hg export.

yuja requested changes to this revision. Apr 12 2018, 10:52 AM
yuja added a subscriber: yuja.

OK, hg export -Tjson appears to mostly work. Perhaps we can add
cborformatter for -Tcbor.

This revision now requires changes to proceed. Apr 12 2018, 10:52 AM
indygreg abandoned this revision. Apr 12 2018, 9:41 PM