The pickle module expects the input to be buffered and a whole
object to be available when pickle.load() is called, which is not
necessarily true when we send data from workers back to the parent
process (i.e., it seems like a bad assumption for the pickle module
to make). We added a workaround for that in
https://phab.mercurial-scm.org/D8076, which made read() continue
until all the requested bytes have been read.
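The workaround's read() loop can be sketched roughly like this (a
minimal illustration, not the actual _blockingreader code; the
function and class names here are made up):

```python
import io


def blocking_read(raw, size):
    # Keep reading from an unbuffered stream until `size` bytes have
    # arrived, or EOF is hit. Sketch of the D8076-style workaround.
    buf = bytearray()
    while len(buf) < size:
        chunk = raw.read(size - len(buf))
        if not chunk:  # EOF before all requested bytes arrived
            break
        buf += chunk
    return bytes(buf)


class OneByteStream:
    # Simulates an unbuffered pipe that returns short reads:
    # each read() call yields at most one byte.
    def __init__(self, data):
        self._buf = io.BytesIO(data)

    def read(self, size=-1):
        if size == 0:
            return b''
        return self._buf.read(1)


# A plain read(11) on the stream would return one byte; the loop
# keeps going until the full request is satisfied.
assert blocking_read(OneByteStream(b'hello world'), 11) == b'hello world'
```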
As we found out at work after a lot of investigation (I've spent the
last two days on this), the C implementation of pickle.load() started
calling readinto() on the input in Python 3.8. That behavior was
introduced in
https://github.com/python/cpython/commit/91f4380cedbae32b49adbea2518014a5624c6523
(and only applies to the C version of pickle.load()). Before that,
only read() and readline() were called. The problem was that
readinto() on our _blockingreader simply delegated to the underlying,
*unbuffered* object, so it could return fewer bytes than requested.
The symptom we saw was that hg fix started failing sometimes on
Python 3.8 on Mac. It failed very reliably in some cases.
circumstances it fails and I've been unable to reproduce it in test
cases (I've tried writing larger amounts of data, using different
numbers of workers, and making the formatters sleep). I have, however,
been able to reproduce it 3-4 times on Linux, but then it stopped
reproducing on the following few hundred attempts.
To fix the problem, we can simply remove the implementation of
readinto(); the unpickler will then fall back to calling read(). That
fallback was added a bit later, in
https://github.com/python/cpython/commit/b19f7ecfa3adc6ba1544225317b9473649815b38.
However, that commit also added a check that read() returns a bytes
object, so we also need to convert the bytearray we use into bytes. I
was able to add a test for that failure at least.
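The shape of the fix can be illustrated like this (a hedged sketch
with made-up names, not the actual worker code): an input object that
deliberately defines no readinto(), so the C unpickler takes the
fallback path through read(), which blocks until the requested bytes
arrive and returns bytes rather than bytearray:

```python
import io
import pickle


class BlockingReader:
    # Deliberately no readinto(): the C unpickler then falls back to
    # read(), which we make block until all requested bytes arrive.
    def __init__(self, wrapped):
        self._wrapped = wrapped

    def read(self, size=-1):
        if size < 0:
            return self._wrapped.read(size)
        buf = bytearray()
        while len(buf) < size:
            chunk = self._wrapped.read(size - len(buf))
            if not chunk:  # EOF
                break
            buf += chunk
        return bytes(buf)  # the fallback requires bytes, not bytearray

    def readline(self):
        # Read byte by byte up to (and including) a newline.
        buf = bytearray()
        while True:
            c = self._wrapped.read(1)
            if c:
                buf += c
            if not c or c == b'\n':
                return bytes(buf)


class OneByteStream:
    # Simulates an unbuffered pipe that returns short reads.
    def __init__(self, data):
        self._buf = io.BytesIO(data)

    def read(self, size=-1):
        if size == 0:
            return b''
        return self._buf.read(1)


# Unpickling through the wrapper succeeds even though the underlying
# stream never returns more than one byte per read.
payload = pickle.dumps({'status': 'ok', 'count': 3})
obj = pickle.load(BlockingReader(OneByteStream(payload)))
assert obj == {'status': 'ok', 'count': 3}
```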
Can we add a comment that explains why readinto() is missing from the
object's interface? That would keep people from carelessly
re-introducing it in the future.