This is an archive of the discontinued Mercurial Phabricator instance.

worker: don't expose readinto() on _blockingreader since pickle is picky
ClosedPublic

Authored by martinvonz on Aug 15 2020, 1:40 AM.

Details

Summary

The pickle module expects the input to be buffered and a whole
object to be available when pickle.load() is called, which is not
necessarily true when we send data from workers back to the parent
process (i.e., it seems like a bad assumption for the pickle module
to make). We added a workaround for that in
https://phab.mercurial-scm.org/D8076, which made read() continue
until all the requested bytes have been read.

As we found out at work after a lot of investigation (I've spent the
last two days on this), the native version of pickle.load() has
started calling readinto() on the input since Python 3.8. That
started being called in
https://github.com/python/cpython/commit/91f4380cedbae32b49adbea2518014a5624c6523
(and only by the C version of pickle.load())). Before that, it was
only read() and readline() that were called. The problem with that
was that readinto() on our _blockingreader was simply delegating
to the underlying, *unbuffered* object. The symptom we saw was that
hg fix started failing sometimes on Python 3.8 on Mac. It failed
very relyable in some cases. I still haven't figured out under what
circumstances it fails and I've been unable to reproduce it in test
cases (I've tried writing larger amounts of data, using different
numbers of workers, and making the formatters sleep). I have, however,
been able to reproduce it 3-4 times on Linux, but then it stopped
reproducing on the following few hundred attempts.

To fix the problem, we can simply remove the implementation of
readinto(), since the unpickler will then fall back to calling
read(). The fallback was added a bit later, in
https://github.com/python/cpython/commit/b19f7ecfa3adc6ba1544225317b9473649815b38. However,
that commit also added checking that what read() returns is a
bytes, so we also need to convert the bytearray we use into
that. I was able to add a test for that failure at least.

Diff Detail

Repository
rHG Mercurial
Branch
default
Lint
No Linters Available
Unit
No Unit Test Coverage

Event Timeline

martinvonz created this revision.Aug 15 2020, 1:40 AM
martinvonz planned changes to this revision.Aug 17 2020, 1:00 PM

I *may* have a lead on how to add a test case for this. I'll update this once I know more.

martinvonz edited the summary of this revision. (Show Details)Aug 17 2020, 1:53 PM
martinvonz updated this revision to Diff 22406.

I *may* have a lead on how to add a test case for this. I'll update this once I know more.

Nope, still wasn't able to test he readinto() issue, but I was able to test another case we ran into and I've added a test for that, along with the fix.

martinvonz updated this revision to Diff 22407.Aug 17 2020, 2:04 PM
marmoute accepted this revision.Aug 26 2020, 12:12 PM
marmoute added a subscriber: marmoute.

This looks overall good. However I recommend adding a comment to make this does not regress (see inline comment)

mercurial/worker.py
75

Can we add a comment the explain why readinto is missing from the object signature? That would avoid people re-introducing it carelessly in the future.

94

This seems like a correct, but unrelated change. Should we extract it in its own changesets?

martinvonz edited the summary of this revision. (Show Details)Aug 26 2020, 12:32 PM
martinvonz updated this revision to Diff 22463.
martinvonz marked an inline comment as done.Aug 26 2020, 12:35 PM
martinvonz added inline comments.
mercurial/worker.py
94

It's not needed without this patch (see commit message), and justifying without this patch is harder, so I prefer to leave it in this patch.

pulkit accepted this revision.Aug 27 2020, 4:01 AM
This revision is now accepted and ready to land.Aug 27 2020, 4:01 AM