The pickle module expects the input to be buffered and a whole
object to be available when pickle.load() is called, which is not
necessarily true when we send data from workers back to the parent
process (i.e., it seems like a bad assumption for the pickle module
to make). We added a workaround for that in
https://phab.mercurial-scm.org/D8076, which made read() continue
until all the requested bytes have been read.
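The workaround's read() loop can be sketched roughly like this (a
minimal illustration, not the actual _blockingreader code; the
function and class names here are made up):

```python
import io


def blocking_read(raw, size):
    # Keep reading from an unbuffered stream until `size` bytes have
    # arrived, or EOF is hit. Sketch of the D8076-style workaround.
    buf = bytearray()
    while len(buf) < size:
        chunk = raw.read(size - len(buf))
        if not chunk:  # EOF before all requested bytes arrived
            break
        buf += chunk
    return bytes(buf)


class OneByteStream:
    # Simulates an unbuffered pipe that returns short reads:
    # each read() call yields at most one byte.
    def __init__(self, data):
        self._buf = io.BytesIO(data)

    def read(self, size=-1):
        if size == 0:
            return b''
        return self._buf.read(1)


# A plain read(11) on the stream would return one byte; the loop
# keeps going until the full request is satisfied.
assert blocking_read(OneByteStream(b'hello world'), 11) == b'hello world'
```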
As we found out at work after a lot of investigation (I've spent the
last two days on this), the C implementation of pickle.load() started
calling readinto() on the input in Python 3.8. That behavior was
introduced in
https://github.com/python/cpython/commit/91f4380cedbae32b49adbea2518014a5624c6523
(and only applies to the C version of pickle.load()). Before that,
only read() and readline() were called. The problem was that
readinto() on our _blockingreader simply delegated to the underlying,
*unbuffered* object, so it could return fewer bytes than requested.
The symptom we saw was that hg fix started failing sometimes on
Python 3.8 on Mac. It failed very reliably in some cases.
circumstances it fails and I've been unable to reproduce it in test
cases (I've tried writing larger amounts of data, using different
numbers of workers, and making the formatters sleep). I have, however,
been able to reproduce it 3-4 times on Linux, but then it stopped
reproducing on the following few hundred attempts.
To fix the problem, we can simply remove the implementation of
readinto(); the unpickler will then fall back to calling read(). That
fallback was added a bit later, in
https://github.com/python/cpython/commit/b19f7ecfa3adc6ba1544225317b9473649815b38.
However, that commit also added a check that read() returns a bytes
object, so we also need to convert the bytearray we use into bytes. I
was able to add a test for that failure at least.
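The shape of the fix can be illustrated like this (a hedged sketch
with made-up names, not the actual worker code): an input object that
deliberately defines no readinto(), so the C unpickler takes the
fallback path through read(), which blocks until the requested bytes
arrive and returns bytes rather than bytearray:

```python
import io
import pickle


class BlockingReader:
    # Deliberately no readinto(): the C unpickler then falls back to
    # read(), which we make block until all requested bytes arrive.
    def __init__(self, wrapped):
        self._wrapped = wrapped

    def read(self, size=-1):
        if size < 0:
            return self._wrapped.read(size)
        buf = bytearray()
        while len(buf) < size:
            chunk = self._wrapped.read(size - len(buf))
            if not chunk:  # EOF
                break
            buf += chunk
        return bytes(buf)  # the fallback requires bytes, not bytearray

    def readline(self):
        # Read byte by byte up to (and including) a newline.
        buf = bytearray()
        while True:
            c = self._wrapped.read(1)
            if c:
                buf += c
            if not c or c == b'\n':
                return bytes(buf)


class OneByteStream:
    # Simulates an unbuffered pipe that returns short reads.
    def __init__(self, data):
        self._buf = io.BytesIO(data)

    def read(self, size=-1):
        if size == 0:
            return b''
        return self._buf.read(1)


# Unpickling through the wrapper succeeds even though the underlying
# stream never returns more than one byte per read.
payload = pickle.dumps({'status': 'ok', 'count': 3})
obj = pickle.load(BlockingReader(OneByteStream(payload)))
assert obj == {'status': 'ok', 'count': 3}
```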
Can we add a comment that explains why readinto() is missing from the
object's interface? That would keep people from carelessly
re-introducing it in the future.