( )⚙ D205 crdump: introduce extension to dump data for code review tools

This is an archive of the discontinued Mercurial Phabricator instance.

crdump: introduce extension to dump data for code review tools
ClosedPublic

Authored by mitrandir on Aug 1 2017, 10:49 AM.
Tags
None
Subscribers

Details

Summary

We've spent a lot of time hacking around the outputs of hg in jf (a new
tool we use at FB to interact with our code-review system). We've figured out that it
could be all avoided and made cleaner with simple hg extension that dumps all
the data we need in a format that we can easily consume.

Test Plan

see attached test

Diff Detail

Repository
rFBHGX Facebook Mercurial Extensions
Branch
default
Lint
Lint OK
Unit
Unit Test Errors

Event Timeline

mitrandir created this revision.Aug 1 2017, 10:49 AM
Herald added a reviewer: Restricted Project. · View Herald TranscriptAug 1 2017, 10:49 AM
mitrandir updated this revision to Diff 474.Aug 1 2017, 10:55 AM

refine help

mitrandir updated this revision to Diff 478.Aug 1 2017, 2:20 PM

tested and fixed the case of binary file removal

quark added a subscriber: quark.Aug 1 2017, 2:35 PM

If it's not end-user facing, maybe make it a debug command? Note some fields (ex. desc) would have encoding issues if it's not UTF-8.

ryanmce requested changes to this revision.Aug 1 2017, 2:51 PM

Looks super solid overall. Just a few issues and nitpicks. Nice work!

hgext3rd/crdump.py
18–21

Shouldn't this use extensions.find()?

26

this could be a revset, so it might be worth making "revision" plural: "revisions to dump"

43

full path or relative to output_directory?

48

base_tmp_dir isn't a thing according to this doc

77

nit: prefer cdata: it's clearly data about commits, not the commits themselves.

92

I'd feel happier if we used phases.CONSTANT rather than hardcoding the "public" name here.

104–105

Comment about the format of the data here and why splitting on @ is the right thing please.

115

What does except None do?

Seems weird and not what you want: https://stackoverflow.com/questions/19327320/python-except-none

120

We want full context -- as much as possible at any rate, right?

128

So this would create paths named like foo/bar/baz_HASH, right? That looks pretty weird to me. I'd rather do something like: HASH.bin/foo/bar/baz (I acknowledge that this is bike-shedding so feel free to ignore)

138

you're mixing styles here. Prefer binaryfiles = []

140–156

What happens when a binary file becomes non-binary, or visa-versa? What does the diff look like? Is this tested?

What happens on the phabricator side?

160–163

No, let's not replicate this logic please. Let's reuse what we already have in this repo.

Also, I don't think it will always necessarily have facebook.com -- what about our inevitable switch to fb.com again?

166

Is this the same revset we use in jf?

Should this be passed in in case we need to change it in jf?

This revision now requires changes to proceed.Aug 1 2017, 2:51 PM
mitrandir added inline comments.Aug 2 2017, 8:40 AM
hgext3rd/crdump.py
18–21

That's how phrevset is doing it.

115

My bad.

120

It's using the diff.unified from the config. But I suppose we can hardcode it here. I'll add also git=True and binary=False.

140–156

no idea what happens on phabricator side, we need to figure it out. I'll add a test.

160–163

I don't want this extension to have to rely on phrevset extension being enabled just to use this little regex from it. I'll copy it to a constant.

166

In jf we don't have "public base" revset. And I think for the function name it's the only revset that's correct :)

mitrandir edited edge metadata.Aug 2 2017, 8:42 AM
mitrandir updated this revision to Diff 500.

responded to all the issues except enconding issue mentioned by @quark (will do it later)

https://www.mercurial-scm.org/wiki/EncodingStrategy says that commit messages and usernames are stored in UTF8 encoding.

ryanmce accepted this revision.Aug 2 2017, 12:30 PM

Just a few nitpicks, land when responded to or addressed.

hgext3rd/crdump.py
18–21

Does that mean it's right?

29

Comment about this number?

Or pass it in via -U?

(we can do this later though)

51

for clarity and consistency with below, let's say "path to file containing..."

56

for clarity and consistency with the below, let's use "path to file relative to repo root"

120

That sounds good for now.

166

We do have a public base revset for the bundle upload, actually. Can you check what we use there? I optimized it to be fast and correct iirc.

183

jf uses:

const base = `last(::ancestor(${revs}) & public())`;

I believe this may be faster than what you have; can you test?

Note: you don't need the ancestor bit because there's only one rev at this point.

tests/test-crdump.t
220

What?

This revision is now accepted and ready to land.Aug 2 2017, 12:30 PM
mitrandir marked 20 inline comments as done.Aug 2 2017, 3:20 PM
mitrandir added inline comments.
hgext3rd/crdump.py
128

it's hash of the filenode

One final thing: I think it still might make sense to split up the information collection calls into two: one for the "heavy" data (diff, binary files, list of filenames), and one for the changelog-only data (so we can reserve revisions earlier, especially if the diff generation is slow). It doesn't prevent us from keeping crdump this way though.

Regardless, let's ship this and start testing it out!

hgext3rd/crdump.py
128

Ah, in that case let's just drop the path part.

ryanmce retitled this revision from crdump: introduce extension to crdump: introduce extension to dump data for code review tools.Aug 3 2017, 4:50 AM
ryanmce edited the summary of this revision. (Show Details)
mitrandir marked 10 inline comments as done.Aug 3 2017, 7:40 AM
mitrandir added inline comments.
hgext3rd/crdump.py
18–21

fixed.

tests/test-crdump.t
220

facepalm

mitrandir marked 2 inline comments as done.Aug 3 2017, 7:47 AM
mitrandir updated this revision to Diff 522.

fixes for the rest of the comments

This revision was automatically updated to reflect the committed changes.