This is an archive of the discontinued Mercurial Phabricator instance.

fastexport: committer name should not be quoted
Needs RevisionPublic

Authored by roy on Jan 21 2021, 9:28 PM.

Details

Reviewers
baymax
Group Reviewers
hg-reviewers

Diff Detail

Repository
rHG Mercurial
Branch
default
Lint
No Linters Available
Unit
No Unit Test Coverage

Event Timeline

roy created this revision.Jan 21 2021, 9:28 PM

@joerg.sonnenberger is there any way that this is not right?

The output is entirely valid according to the git format specification. It is also entirely valid under the formatting rules for email addresses, which was supposed to be the origin of this identifiers. As such, I see no reason to change anything.

roy added a comment.Jan 22 2021, 9:50 AM

Using git to create a fast-import, the comitter is never quoted.
Again, using git to import a quoted comitter name, preserves the quotes.
Using git log to display the import, the comitters name appears as "Roy Marples" <roy@marples.name> which just looks wrong.

Other systems that creat a fast-import file such as Fossil don't quote either.

While you might not see any reason to change this as per the specification which I can't argue with, no existing tooling other than hg fastexport actually quotes or understands quotes.

So @joerg.sonnenberger I respectfully ask why we need to quote here? I see no reason to.

roy added a comment.Jan 22 2021, 10:06 AM

So just for kicks I did hg fastexport on my dhcpcd repo where the author names look fine.
I then did hg fastimport (3rd party plugin) on the resultant fast-import file generated from the above.

And hg also displayed "Roy Marples" <roy@marples.name> which again looks wrong.

Should we add a test that, if git-fast-import is available, actually tries the import and verifies that things look reasonable?

The basic concern I have here is that I have stepped on git's toes here before and the field is nowhere near as free text as it is supposed to be according to the specification. So any change should at least double check what the git code is actually doing, because it doesn't do the same as the documentation from what I have seen. Case in point is the data handling, where git does distingiush between +0 and -0.

roy added a comment.EditedJan 22 2021, 3:45 PM

Finds the author command
https://github.com/git/git/blob/7f7ebe054af6d831b999d6c2241b9227c4e4e08d/builtin/fast-import.c#L2689

Which sends the text after the command to parse_ident() here.
https://github.com/git/git/blob/7f7ebe054af6d831b999d6c2241b9227c4e4e08d/builtin/fast-import.c#L1938

First, it verifies there is an email address in angled brackets.
Then it creates a string buffer based structure where it places the exact text from the start upto and including closing angled bracket of the email address.
It then verifies the date and appends it's interpretation of that as well.

Now we go back to continue in the first function to actual use this data in the commit:
https://github.com/git/git/blob/7f7ebe054af6d831b999d6c2241b9227c4e4e08d/builtin/fast-import.c#L2764

So basically it just stores what it's given from the parse_ident function.
There is no attempt to parse the users name to remove any quotation added.

As to the specification - where do you see the need to quote?
All I can find is this

Here <name> is the person’s display name (for example “Com M Itter”) and <email> is the person’s email address (“cm@example.com”). LT and GT are the literal less-than (\x3c) and greater-than (\x3e) symbols. These are required to delimit the email address from the other fields in the line. Note that <name> and <email> are free-form and may contain any sequence of bytes, except LT, GT and LF. <name> is typically UTF-8 encoded.

The quotes here show the value, not that it's quoted - after all, email address uses <>, not "".

Horrible parsing code, but not relevant for here. I gave the reasons why the original code used "", but I don't care either way if git doesn't fall apart anymore than already does when an address is not pretending to be a mail.

roy added a comment.Jan 24 2021, 6:14 PM

Horrible parsing code, but not relevant for here. I gave the reasons why the original code used "", but I don't care either way if git doesn't fall apart anymore than already does when an address is not pretending to be a mail.

You gave your reasons, but I still cannot find any reference to git fast-import spec respecting any RFC formatting as per email address quoting.
I quoted the spec where it says it takes it free-form, which describes what the code does.

The output is entirely valid according to the git format specification.

Yes, the output is valid in that it's free form, but you have forced quoted where none should be.
You might as well put more garbage in there and it's still valid, but it's not correct.

roy added a comment.Jan 24 2021, 6:16 PM

So if I send an email, using thunderbird, 9 folders or gmail and look at the email source the email name is not quoted.
All I ask is that hg behaves the same way. That should not be took much to ask?

baymax requested changes to this revision.May 25 2021, 5:32 AM

There seems to have been no activities on this Diff for the past 3 Months.

By policy, we are automatically moving it out of the need-review state.

Please, move it back to need-review without hesitation if this diff should still be discussed.

:baymax:need-review-idle:

This revision now requires changes to proceed.May 25 2021, 5:32 AM