phabricator: convert conduit response JSON unicode to bytes inside callconduit
ClosedPublic

Authored by Kwan on Sat, Mar 2, 1:51 PM.

Details

Summary

Previously the byte conversion was happening piecemeal in callers, and in the
case of createdifferentialrevision not at all, leading to UnicodeEncodeErrors
when trying to phabsend a commit with a description containing characters not
representable in ascii. (issue6040)

Remove all the scattered encoding.unitolocal calls and perform it once, inside
callconduit, on the entire response dict recursively before returning it, in
keeping with the strategy of converting at the earliest opportunity. Convert all
keys used on returned object to bytes.

Modify the phabsend tests to test this by adding a € to the commit message of
alpha.

Diff Detail

Repository
rHG Mercurial
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.
Kwan created this revision.Sat, Mar 2, 1:51 PM
Kwan added a comment.Sat, Mar 2, 1:55 PM

Apologies for the lack of a test, but I can't seem to write one that works. Either the test harness isn't capable of tests with unicode in them, or I don't know how to represent it in the test files in such a way that works.

Kwan added a comment.Sat, Mar 2, 6:08 PM

Okay, making progress on the test now, just need to figure out how to update the vcr files cleanly.
Slightly complicated by the fact that there was a small error in the original creation of the phabsend-update-alpha-create-beta.json recording. It must have been made after the updated diff for alpha was already on here, because it hasn't recorded the alpha update, since the query reported the diff to already be the updated version.

Kwan updated this revision to Diff 14306.Sat, Mar 2, 7:25 PM
Kwan edited the summary of this revision. (Show Details)
Kwan updated this revision to Diff 14307.Sat, Mar 2, 8:08 PM
yuja added a subscriber: yuja.Sun, Mar 3, 4:47 AM
  • parsed = json.loads(body) + parsed = pycompat.rapply(lambda x: encoding.unitolocal(x) + if isinstance(x, unicode) else x, json.loads(body))

s/unicode/pycompat.unicode/

Perhaps some of r'' would have to be changed to b'' since dict keys
are now byte strings. See the wiki page for our Py3 hacks.

https://www.mercurial-scm.org/wiki/Python3

Kwan added a comment.Mon, Mar 4, 12:12 PM
In D6044#88262, @yuja wrote:
  • parsed = json.loads(body) + parsed = pycompat.rapply(lambda x: encoding.unitolocal(x) + if isinstance(x, unicode) else x, json.loads(body))

s/unicode/pycompat.unicode/

Thanks, done.

Perhaps some of r'' would have to be changed to b'' since dict keys
are now byte strings. See the wiki page for our Py3 hacks.

https://www.mercurial-scm.org/wiki/Python3

Ah, good point, thanks. Would it be worth keeping the keys as r'' strings? rapply can fairly easily be extended with an optional notkeys boolean that would allow doing so.

yuja added a comment.Tue, Mar 5, 8:45 AM
> Perhaps some of `r''` would have to be changed to `b''` since dict keys
>  are now byte strings. See the wiki page for our Py3 hacks.
> 
> https://www.mercurial-scm.org/wiki/Python3

Ah, good point, thanks.  Would it be worth keeping the keys as r'' strings?  rapply can fairly easily be extended with an optional notkeys boolean that would allow doing so.

I prefer converting everything to bytes so we won't have to handle both
unicode and byte strings.

This revision was automatically updated to reflect the committed changes.