This is an archive of the discontinued Mercurial Phabricator instance.

global: use raw string for setlocale() argument
ClosedPublic

Authored by indygreg on Mar 2 2019, 4:26 PM.

Details

Summary

Otherwise Python 2 will coerce a unicode to str, which fails
on HGUNICODEPEDANTRY=1.

Diff Detail

Repository
rHG Mercurial
Lint
Lint Skipped
Unit
Unit Tests Skipped

Event Timeline

indygreg created this revision.Mar 2 2019, 4:26 PM
indygreg added a subscriber: yuja.Mar 2 2019, 4:32 PM

The context for this series is that HGUNICODEPEDANTRY is completely broken without it.

For those not aware, on Python 2, if you call sys.setdefaultencoding('undefined'), it loads a special codec (https://docs.python.org/2/library/codecs.html#python-specific-encodings) which makes automatic coercion raise an exception instead of silently proceeding. It essentially finds all the places where you are using str/unicode improperly.

After this series, the number of test failures in that mode is ~70. Most of the remaining failures are in stringutil.wrap(). Essentially textwrap from the standard library wants to operate on native str. Our version is feeding in unicode. We get into trouble when the standard library code does a ''.join() and some unicode is coerced into str. TBH I'm not sure how we're not seeing bugs due to this on Python 2. I'm guessing we're lacking test coverage for something passing non-ASCII into stringutil.wrap()?

@yuja you may be interested in the text wrapping issue.

This revision was automatically updated to reflect the committed changes.
yuja added a comment.Mar 3 2019, 5:01 AM
After this series, the number of test failures in that mode is ~70. Most of the remaining failures are in `stringutil.wrap()`. Essentially `textwrap` from the standard library wants to operate on native `str`. Our version is feeding in `unicode`. We get into trouble when the standard library code does a `''.join()` and some `unicode` is coerced into `str`. TBH I'm not sure how we're not seeing bugs due to this on Python 2. I'm guessing we're lacking test coverage for something passing non-ASCII into `stringutil.wrap()`?

Can you share some example of the test failure?

I'm not sure what's wrong with the textwrap. AFAIK, it's string-neutral.
''.join() would return a unicode string if one of the inputs is a unicode,
and the others are all ASCII bytes.

Yeah, ''.join([u'']) will coerce the str to unicode, so we're fine on that front. The problem is only with HGUNICODEPEDANTRY=1, which will raise during implicit str <-> unicode coercion.

One can reproduce a failure with HGUNICODEPEDANTRY=1 ./run-tests.py -l test-record.t.

I doubt the test suite has been remotely close to passing with HGUNICODEPEDANTRY for possibly years. I'm not sure it is even worth keeping around, since Python 3 has very similar functionality for free. I dunno.