This is an archive of the discontinued Mercurial Phabricator instance.

chg: read CHGORIG_ values from env and handle these appropriately
AbandonedPublic

Authored by spectral on Jan 27 2020, 7:57 PM.

Details

Reviewers
None
Group Reviewers
hg-reviewers
Summary

Python 3.7+ will "coerce" the locale (specifically LC_CTYPE) in many situations,
which can prevent chg from starting. My previous fix in D7550 did not cover all
situations correctly, but hopefully this fix does.

The C side of chg will set CHGORIG_LC_CTYPE in its environment before starting
the command server and before calling setenv on the command server.
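
As a rough illustration of the client-side step, here is a sketch in Python
rather than C. The function and constant names are hypothetical and this is not
the code in chg.c. It assumes the value is encoded as a flag character followed
by the original value ("1" + value when the variable is set, as the
"1invalid_1" example in this summary suggests); the "0" marker for an unset
variable is an assumption.

```python
PREFIX = "CHGORIG_"  # prefix named in this summary

# Hypothetical list of variables to preserve; the summary only names LC_CTYPE.
PRESERVED_NAMES = ("LC_CTYPE",)


def add_chgorig(env, names=PRESERVED_NAMES):
    """Return a copy of env with a CHGORIG_<name> entry for each name.

    Assumed encoding: "1" + value if the variable is set, "0" if it is unset.
    """
    out = dict(env)
    for name in names:
        if name in env:
            out[PREFIX + name] = "1" + env[name]
        else:
            out[PREFIX + name] = "0"
    return out
```

Under this sketch, preserving another variable only means adding its name to
PRESERVED_NAMES.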

When calculating the environment hash, we use the value from CHGORIG_LC_CTYPE,
intentionally ignoring any modifications that Python may have made during
command server startup.
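
A minimal sketch of that idea follows. It is not the actual chgserver.py code:
the helper names, the use of SHA-1, and the truncation to 12 hex digits are all
illustrative assumptions, as is the reading that a leading "1" in a CHGORIG_
value means "the variable was set" and anything else means "it was unset".

```python
import hashlib

PREFIX = "CHGORIG_"  # prefix named in this summary


def effective_env(env):
    """Return env with each CHGORIG_<name> entry substituted for <name>."""
    # Start from everything except the CHGORIG_ bookkeeping entries.
    result = {k: v for k, v in env.items() if not k.startswith(PREFIX)}
    for key, value in env.items():
        if not key.startswith(PREFIX):
            continue
        name = key[len(PREFIX):]
        if value.startswith("1"):
            # Assumed encoding: "1" + original value.
            result[name] = value[1:]
        else:
            # Assumed encoding: the variable was originally unset.
            result.pop(name, None)
    return result


def env_hash(env):
    """Hash the environment as the user originally provided it."""
    items = sorted(effective_env(env).items())
    return hashlib.sha1(repr(items).encode("utf-8")).hexdigest()[:12]
```

With this sketch, an environment containing LC_CTYPE=C.UTF-8 and
CHGORIG_LC_CTYPE=1invalid_1 hashes the same as one containing only
LC_CTYPE=invalid_1.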

When chg calls setenv on the command server, the command server will see
CHGORIG_LC_CTYPE in the environment-to-set, and will NOT modify LC_CTYPE to
match the environment-to-set. This preserves the modifications that Python made
during startup. We'll still calculate the hash from the CHGORIG_LC_CTYPE value,
so we'll detect environment changes (even if they don't cause a change in the
actual value). Example:

  • LC_CTYPE=invalid_1 chg cmd
    • Py3.7 sets LC_CTYPE=C.UTF-8 on Linux
    • CHGORIG_LC_CTYPE=1invalid_1
    • Environment hash is as-if 'LC_CTYPE=invalid_1', even though it really is LC_CTYPE=C.UTF-8
  • LC_CTYPE=invalid_2 chg cmd
    • Connect to the existing server, call setenv
    • Calculate hash as-if 'LC_CTYPE=invalid_2', even though it is identical to the other command server (C.UTF-8)
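
The setenv handling in the example above can be sketched as follows. This is an
illustration under assumptions, not the actual chgserver.py implementation: the
function name is hypothetical, environments are modeled as plain dicts, and
dropping the CHGORIG_ entries from the resulting environment is a choice made
for the sketch.

```python
def apply_setenv(current, incoming, prefix="CHGORIG_"):
    """Return the server environment after a setenv request.

    Variables that have a CHGORIG_ companion in `incoming` keep whatever value
    the server process currently has; every other variable is taken from
    `incoming`.
    """
    preserved = {k[len(prefix):] for k in incoming if k.startswith(prefix)}
    new_env = {
        k: v
        for k, v in incoming.items()
        if k not in preserved and not k.startswith(prefix)
    }
    for name in preserved:
        if name in current:
            # Keep the value Python (or anything else) set inside the server.
            new_env[name] = current[name]
    return new_env


# Second invocation from the example: the server already has C.UTF-8.
server_env = {"LC_CTYPE": "C.UTF-8", "PATH": "/bin"}
request = {
    "LC_CTYPE": "invalid_2",
    "CHGORIG_LC_CTYPE": "1invalid_2",
    "PATH": "/usr/bin",
}
updated = apply_setenv(server_env, request)
```

Here `updated` keeps LC_CTYPE=C.UTF-8 while PATH follows the request.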

This isn't a huge issue in practice: the only cost is that two functionally
identical command servers may end up running. This should not be considered an
observable/intentional effect, and it may change in the future.

This is hopefully a more future-proof fix than the original one in D7550: we
won't have to worry about behavior changes (or incorrect readings of the current
behavior) in future versions of Python. If more environment variables end up
being modified, it's a simple one-line fix in chg.c to preserve those as well.

Important Caveat: if something causes one of these variables to change *inside*
the hg serve process, we're going to end up persisting that value. Example:

  • Command server starts up, Python sets LC_CTYPE=C.UTF-8
  • Some extension sets LC_CTYPE=en_US.UTF-8 in the environment
  • The next invocation of chg will call setenv, saying via CHGORIG_LC_CTYPE that the variable should not be in the environment
  • chgserver.py will preserve LC_CTYPE=en_US.UTF-8

This is quite unlikely and would previously have caused a different problem:

  • Command server starts up, let's assume py2 and so LC_CTYPE is unmodified
  • Some extension sets LC_CTYPE=en_US.UTF-8 in the environment
  • The next invocation of chg will call setenv, saying LC_CTYPE shouldn't be in the environment
  • chgserver.py will say that the environment hash doesn't match and redirect chg to a new server
  • chg will create that server and use it, but it'll have an identical hash to the previous one (since at startup LC_CTYPE isn't modified by the extension yet). This should be fine; it'll then run the command as normal.
  • Every time chg is run, it restarts the command server due to this issue, slowing everything down :)

Diff Detail

Repository
rHG Mercurial
Branch
default
Lint
No Linters Available
Unit
No Unit Test Coverage