This is an archive of the discontinued Mercurial Phabricator instance.

pycompat: prevent encoding or decoding values if not required
ClosedPublic

Authored by pulkit on Feb 27 2018, 4:36 AM.

Details

Summary

pycompat.py has functions strurl and bytesurl which decodes and encodes the url
passed on Python 3 respectively. In some cases, strurl gets a url which is
already str and bytesurl gets a url which is already bytes. Let's prevent
encoding or decoding the values again if not required.

Diff Detail

Repository
rHG Mercurial
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

pulkit created this revision.Feb 27 2018, 4:36 AM
yuja requested changes to this revision.Feb 27 2018, 7:51 AM
yuja added a subscriber: yuja.

I'm 0 for this move.

I'm not a fan of making function inputs and outputs obscure because we'll
be likely to have a hard time to fix it in future. Mixing bytes and unicodes is
horrible. But if it's painful to be strict on url type, I'll say go for it.

mercurial/pycompat.py
198

Need to return an unmodified value.

This revision now requires changes to proceed.Feb 27 2018, 7:51 AM
pulkit updated this revision to Diff 6176.Feb 27 2018, 12:09 PM

I agree with @yuja here. I think we should strive to convert URLs to bytes (like most other internal types). Although I do realize that lots of stdlib code has... expectations. Is there a particular test failure you are tracking down with this fix? I've kind of got a lot of the wire protocol code paged in right now. I could potentially offer some advice.

I agree with @yuja here. I think we should strive to convert URLs to bytes (like most other internal types). Although I do realize that lots of stdlib code has... expectations. Is there a particular test failure you are tracking down with this fix? I've kind of got a lot of the wire protocol code paged in right now. I could potentially offer some advice.

test-http-bundle1.t failures. I think this current patch is probably the "least bad" option (barring getting pytype working, which I think would let us try and constrain some types in more principled ways) since some of these URL-relevant functions are called by both hg and urllib guts. The former wants to use bytes, and the latter can only use unicodes.

I've poked at this on and off a few times and something like this fix is all I've come up with. I'm not happy about it either.

The other option is probably to always treat URLs as unicodes, though that seems like it'll probably come with its own bag of hurt throughout the codebase.

durin42 accepted this revision.Mar 3 2018, 1:53 PM
This revision was automatically updated to reflect the committed changes.