This is an archive of the discontinued Mercurial Phabricator instance.

chg: fix chg to work with py3.7+ "coercing" the locale
ClosedPublic

Authored by spectral on Dec 5 2019, 6:46 PM.

Details

Summary

When the environment has no locale settings (specifically: it doesn't contain
LC_ALL, LC_CTYPE, or LANG), Python 3.7+ will "coerce" the locale to a
UTF-8-capable one. It does so by setting LC_CTYPE in the environment, and this
breaks chg, since chg operates as follows:

  • chg starts hg, using whatever environment the user has when chg starts
  • hg stores a hash of this "original" environment, but Python has already set LC_CTYPE even though the user doesn't have it in their environment
  • chg calls setenv over the commandserver. This clears the environment inside of hg and sets it to be exactly what the environment in chg is (without LC_CTYPE).
  • chg calls validate to ensure that the environment hg is using (after the setenv call) is the one that the chg process has - if not, it is assumed the user changed their environment and we should use a different server. The environments will *never* match in this situation because LC_CTYPE was removed.
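
The mismatch described in the steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `hash_env` and `coerce_locale` are hypothetical helpers invented for this sketch, not functions from Mercurial or CPython, and the coerced value `C.UTF-8` is just one plausible target locale.

```python
import hashlib

LOCALE_VARS = ("LC_ALL", "LC_CTYPE", "LANG")


def hash_env(env):
    """Hash a sorted view of an environment dict (hypothetical stand-in for hg's check)."""
    items = sorted(env.items())
    return hashlib.sha1(repr(items).encode("utf-8")).hexdigest()


def coerce_locale(env):
    """Mimic locale coercion: add LC_CTYPE when no locale variable is set (assumed value)."""
    coerced = dict(env)
    if not any(name in env for name in LOCALE_VARS):
        coerced["LC_CTYPE"] = "C.UTF-8"
    return coerced


# The user's environment as chg sees it: no locale variables at all.
user_env = {"PATH": "/usr/bin", "HOME": "/home/user"}

# hg starts, Python coerces the locale, and hg hashes what it now sees.
original_hash = hash_env(coerce_locale(user_env))

# chg's setenv replaces hg's environment with chg's own (no LC_CTYPE).
after_setenv_hash = hash_env(user_env)

# validate compares the two; with no locale variables set they cannot match.
env_matches = original_hash == after_setenv_hash
print(env_matches)  # prints False
```

If the user's environment already contains one of the locale variables, the sketch leaves it alone and the two hashes agree, which is consistent with the problem only showing up in locale-free environments.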

Diff Detail

Repository
rHG Mercurial
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

spectral created this revision. Dec 5 2019, 6:46 PM
yuja added a subscriber: yuja. Dec 6 2019, 10:57 PM
> When the environment is empty (specifically: it doesn't contain LC_ALL,
> LC_CTYPE, or LANG), Python will "coerce" the locale environment variables to be
> a UTF-8 capable one. It sets LC_CTYPE in the environment, and this breaks chg,
> since chg operates by:
> - start hg, using whatever environment the user has when chg starts
> - hg stores a hash of this "original" environment, but python has already set LC_CTYPE even though the user doesn't have it in their environment
> - chg calls setenv over the commandserver. This clears the environment inside of hg and sets it to be exactly what the environment in chg is (without LC_CTYPE).
> - chg calls validate to ensure that the environment hg is using (after the setenv call) is the one that the chg process has - if not, it is assumed the user changed their environment and we should use a different server. This will *never* be true in this situation because LC_CTYPE was removed.

Sigh. Can we work around this weird behavior by making chg do
putenv("PYTHONCOERCECLOCALE=0")? I think it's simple and more desired
behavior than the default of Python 3.
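
A rough way to see what this workaround would do to the server's environment is to start a fresh interpreter with `PYTHONCOERCECLOCALE=0` and no locale variables, then list what the child sees. This is a sketch, not chg code: chg itself is written in C and would use `putenv`, and `child_environ_keys` is a hypothetical helper made up for this example.

```python
import os
import subprocess
import sys

LOCALE_VARS = ("LC_ALL", "LC_CTYPE", "LANG")


def child_environ_keys(extra_env):
    """Run a fresh Python without locale variables and return the env names it sees."""
    # Start from the current environment, minus anything that affects coercion.
    env = {
        name: value
        for name, value in os.environ.items()
        if name not in LOCALE_VARS and name != "PYTHONCOERCECLOCALE"
    }
    env.update(extra_env)
    result = subprocess.run(
        [sys.executable, "-c", "import os; print('\\n'.join(sorted(os.environ)))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return set(result.stdout.split())


# With coercion disabled, the child should not gain an LC_CTYPE entry,
# but it does carry PYTHONCOERCECLOCALE itself.
keys = child_environ_keys({"PYTHONCOERCECLOCALE": "0"})
print("LC_CTYPE" in keys, "PYTHONCOERCECLOCALE" in keys)
```

Calling `child_environ_keys({})` instead shows the default behavior; whether `LC_CTYPE` appears there depends on the Python version and on which UTF-8 locales the platform provides.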

In D7550#111235, @yuja wrote:
> > When the environment is empty (specifically: it doesn't contain LC_ALL,
> > LC_CTYPE, or LANG), Python will "coerce" the locale environment variables to be
> > a UTF-8 capable one. It sets LC_CTYPE in the environment, and this breaks chg,
> > since chg operates by:
> > - start hg, using whatever environment the user has when chg starts
> > - hg stores a hash of this "original" environment, but python has already set LC_CTYPE even though the user doesn't have it in their environment
> > - chg calls setenv over the commandserver. This clears the environment inside of hg and sets it to be exactly what the environment in chg is (without LC_CTYPE).
> > - chg calls validate to ensure that the environment hg is using (after the setenv call) is the one that the chg process has - if not, it is assumed the user changed their environment and we should use a different server. This will *never* be true in this situation because LC_CTYPE was removed.
>
> Sigh. Can we work around this weird behavior by making chg do
> putenv("PYTHONCOERCECLOCALE=0")? I think it's simple and more desired
> behavior than the default of Python 3.

I had considered that and was concerned it would create an observable, surprising/confusing difference between chg and non-chg: if chg sets PYTHONCOERCECLOCALE=0, hg won't have LC_CTYPE in the environment, and it WILL have PYTHONCOERCECLOCALE in the environment. When it starts external tools (like merge tools), this may change behavior in some observable fashion, and if the user stops using chg and uses just plain hg, it will have LC_CTYPE in the environment. This would probably be difficult to debug - users (at least the ones I interact with) often don't tell us they're using chg, if they even know that they are. (Sometimes users don't even know they're using chg, such as via their IDE's Mercurial integration, but that's probably not actually a problem here - the IDE would be responsible for making this work, not end users).

I don't know the reason why Python is doing this at all, so maybe my concern is purely hypothetical and not really a problem?

yuja added a comment. Dec 12 2019, 9:07 AM
> > Sigh. Can we work around this weird behavior by making chg do
> > `putenv("PYTHONCOERCECLOCALE=0")`? I think it's simple and more desired
> > behavior than the default of Python 3.
>
> I had considered that and was concerned it would create an observable, surprising/confusing difference between chg and non-chg: if chg sets PYTHONCOERCECLOCALE=0, hg won't have LC_CTYPE in the environment, and it WILL have PYTHONCOERCECLOCALE in the environment. When it starts external tools (like merge tools), this may change behavior in some observable fashion, and if the user stops using chg and uses just plain hg, it will have LC_CTYPE in the environment.

Yeah, that could happen. I checked the CPython code, but there's no easy way
to disable PYTHONCOERCECLOCALE at all without writing a C wrapper or rebuilding
Python itself with --without-c-locale-coercion.

I don't care much about the pollution of subprocess environments since Python
does pollute LC_CTYPE by default, which is IMHO worse, but I agree it isn't
nice to introduce behavior change between pure hg and chg. So we'll have to
take this patch, sigh.

Queued, thanks.

This revision was not accepted when it landed; it landed in state Needs Review.
This revision was automatically updated to reflect the committed changes.