This looks like it also fixes the phabricator test on Windows, which diverged because it created a different hash for the `create alpha for phabricator test €` commit.
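One plausible way a single `€` can make hashes diverge is that the same text produces different bytes depending on the encoding used. The sketch below assumes cp1252 as a stand-in for a Western-European Windows ANSI code page (the real code page depends on the system locale). It uses SHA-1 over the bare message purely for illustration, which is not how Mercurial computes a changeset hash.

```python
import hashlib

MESSAGE = "create alpha for phabricator test \u20ac"


def encoded_variants(text):
    # Encode the same text under UTF-8 and under cp1252. cp1252 is only an
    # assumed stand-in for the Windows ANSI code page on the test machine.
    return {
        "utf-8": text.encode("utf-8"),    # U+20AC -> b"\xe2\x82\xac"
        "cp1252": text.encode("cp1252"),  # U+20AC -> b"\x80"
    }


def sha1_hex(data):
    # Illustrative digest over the raw message bytes only.
    return hashlib.sha1(data).hexdigest()


if __name__ == "__main__":
    for name, data in encoded_variants(MESSAGE).items():
        print(name, data[-3:], sha1_hex(data))
```

Because the byte strings differ, any hash computed over them differs too, even though the decoded text is identical.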
The test harness *should* match existing `\n` output as a fallback, which got me wondering whether the `(esc)` at the end was screwing it up. I tried this patch:
This is a dupe of D8339.
As scary as this patch sounds, I'm pretty sure it is safe: I believe it restores compatibility with Python 2. Changing sys.std* to be binary streams instead of text streams would be a bigger BC (backwards-compatibility) break, and not one I want to make, since it would invalidate assumptions that 3rd party code makes about the behavior of these streams on Python 3!
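For context on the text-vs-binary distinction: on Python 3, `sys.stdout` and friends are normally `io.TextIOWrapper` objects that accept only `str`, with the underlying binary stream exposed as `.buffer`. The sketch below reproduces that layering in memory with `io.BytesIO` so it runs without touching real stdio. It illustrates the stream semantics only and is not code from the patch; `demo_text_vs_binary` is a hypothetical name.

```python
import io


def demo_text_vs_binary():
    raw = io.BytesIO()
    # newline="" disables newline translation so the output bytes are the
    # same on every platform; UTF-8 is fixed for the demo.
    text = io.TextIOWrapper(raw, encoding="utf-8", newline="")

    text.write("caf\u00e9\n")  # a text stream accepts str ...
    text.flush()

    try:
        text.write(b"bytes\n")  # ... and rejects bytes with TypeError
        rejected = False
    except TypeError:
        rejected = True

    text.buffer.write(b"raw\n")  # bytes go to the binary layer instead
    text.flush()
    data = raw.getvalue()
    text.detach()  # release raw so the wrapper does not close it
    return data, rejected
```

Code that writes `str` to `sys.stdout` relies on exactly this behavior, which is why swapping the streams for binary ones would break it.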
This commit isn't strictly required. I did this refactoring anticipating that I'd need to add sys.std* fixups to this function, but it turns out that the SSH protocol server handles I/O redirection via a different mechanism. In fact, there appear to be redundant mechanisms for intercepting stdio as part of the wire protocol. This is potentially an area we could clean up, but I'm not inclined to do so at this time.
```diff
-# Since Python 3 converts argv to wchar_t type by Py_DecodeLocale() on Unix,
-# we can use os.fsencode() to get back bytes argv.
-# https://hg.python.org/cpython/file/v3.5.1/Programs/python.c#l55
-# On Windows, the native argv is unicode and is converted to MBCS bytes
-# since we do enable the legacy filesystem encoding.
 if getattr(sys, 'argv', None) is not None:
-    sysargv = list(map(os.fsencode, sys.argv))
+    # On POSIX, the char argv array is converted to Python str using
+    # Py_DecodeLocale(). The inverse of this is Py_EncodeLocale(), which isn't
+    # directly callable from Python code. So, we need to emulate it.
+    # Py_DecodeLocale() calls mbstowcs() and falls back to mbrtowc() with
+    # surrogateescape error handling on failure. These functions take the
+    # current system locale into account. So, the inverse operation is to
+    # .encode() using the system locale's encoding and using the
+    # surrogateescape error handler. The only tricky part here is getting
+    # the system encoding correct, since locale.getlocale() can return
+    # None. We fall back to the filesystem encoding if lookups via locale
+    # fail, as this seems like a reasonable thing to do.
+    # On Windows, the wchar_t argv is passed into the interpreter as-is.
+    # Like POSIX, we need to emulate what Py_EncodeLocale() would do. But
+    # there's an additional wrinkle. What we really want to access is the
+    # ANSI codepage representation of the arguments, as this is what
+    # int main() would receive if Python 3 didn't define int wmain()
+    # (this is how Python 2 worked). To get that, we encode with the mbcs
+    # encoding, which will pass CP_ACP to the underlying Windows API to
+    # produce bytes.
+    if os.name == r'nt':
+        sysargv = [a.encode("mbcs", "ignore") for a in sys.argv]
```
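The POSIX branch described in the new comment can be approximated as follows. This is a minimal sketch, not the patch's code: `encode_argv` and `_locale_encoding` are hypothetical names, and the fallback order (locale encoding first, then `sys.getfilesystemencoding()`) simply follows the comment's description. The `mbcs` codec used by the Windows branch only exists on Windows, so the sketch covers POSIX only.

```python
import codecs
import locale
import sys


def _locale_encoding():
    # Ask the locale module first; fall back to the filesystem encoding if
    # it returns None, cannot parse the locale, or names an unknown codec.
    try:
        encoding = locale.getlocale()[1]
    except ValueError:
        encoding = None
    if encoding:
        try:
            codecs.lookup(encoding)
            return encoding
        except LookupError:
            pass
    return sys.getfilesystemencoding()


def encode_argv(argv, encoding=None):
    # Approximate Py_EncodeLocale(): encode with the locale encoding and the
    # surrogateescape handler, so bytes that Py_DecodeLocale() could not
    # decode (stored as lone surrogates) round-trip unchanged.
    if encoding is None:
        encoding = _locale_encoding()
    return [a.encode(encoding, "surrogateescape") for a in argv]
```

Passing `encoding` explicitly makes the round trip easy to check: a byte that is invalid UTF-8, such as `0xe9`, decodes to the lone surrogate `\udce9` under `surrogateescape` and encodes back to the same byte.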
Could you add a comment about what happens if you `hg split` either the base or the tip of the range?
I think each newnode key maps to a single precursor, so it's like a regular amend. It just happens that multiple newnode keys have the same oldnode in their value tuple, but all we care about is the first and last newnode in the range.
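A toy model of that reasoning, using hypothetical names and short strings as stand-ins for nodes: the replacement mapping goes from each new node to a tuple of old nodes, a split shows up as several new nodes sharing one old node, and only the first and last new node of the range are used. It assumes the new nodes are already listed in range order.

```python
from collections import defaultdict


def successors_by_oldnode(mapping):
    # Invert newnode -> (oldnode, ...) into oldnode -> [newnodes]; a split
    # appears as one oldnode with several successors.
    inverted = defaultdict(list)
    for new, olds in mapping.items():
        for old in olds:
            inverted[old].append(new)
    return dict(inverted)


def range_endpoints(newnodes_in_order):
    # Only the first and last new node of the range matter.
    return newnodes_in_order[0], newnodes_in_order[-1]


# Tip "C" of the range was split into two commits.
mapping = {"A2": ("A",), "B2": ("B",), "C2a": ("C",), "C2b": ("C",)}
```

Under this model, splitting the tip only changes which new node is last; each key still has a single precursor.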
That said, I tried creating a simple test where I amended a new file into the last commit of the last test and split it with the internal extension, and... somehow I ended up in a weird state where the first half of the split is pruned...
Sat, Mar 28
@yuja I'd appreciate your eyes on this since you have a firm grasp on Windows/Unicode matters...
Fri, Mar 27
Looks good to me, thanks for the update. Sorry for the delay; it totally fell through the cracks.