Page MenuHomePhabricator

pybuf: add a simple abstraction around Py_buffer interface
ClosedPublic

Authored by quark on Nov 22 2017, 8:45 PM.

Details

Summary

This makes it possible to do zero-copy reading the changelog.i data from
Python to Rust because eventually changelog.i does not have to be fully
accessed and together with mmap, that's at least 100ms perf win.

There is a much more general purposed (support writable, non-contiguous
buffer) PyBuffer implementation in rust-cpython github version [1].
However, what NodeMap really wants is AsRef<[u8]> and AsRef<[u32]>,
which are not provided by that.

Since the code is short, let's just add our own wrapper. It also allows us
to use the published version of rust-cpython instead of pulling from github.

[1]: https://github.com/dgrunwald/rust-cpython/blob/master/src/buffer.rs

Test Plan

cargo build

Diff Detail

Repository
rFBHGX Facebook Mercurial Extensions
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

quark created this revision.Nov 22 2017, 8:45 PM
Herald added a reviewer: Restricted Project. · View Herald TranscriptNov 22 2017, 8:45 PM
quark edited the summary of this revision. (Show Details)Nov 22 2017, 9:31 PM
quark updated this revision to Diff 3793.
quark updated this revision to Diff 3796.Nov 22 2017, 9:42 PM
quark updated this revision to Diff 3799.Nov 22 2017, 10:40 PM
durham added a subscriber: durham.Dec 1 2017, 2:08 PM

Probably want Jeremy to look at this one. I'm not familiar with some of the unsafe bits.

rust/indexes/Cargo.toml
15

Should we keep the stuff with a python dependency in a separate library? To force us to not accidentally mix dependencies in the pure-rust stuff?

quark added a subscriber: mbthomas.Dec 1 2017, 4:37 PM

Maybe @mbthomas can have a look? He definitely knows (or was forced to learn about) CPython internals per the life-saving https://github.com/dgrunwald/rust-cpython/pull/115

rust/indexes/Cargo.toml
15

I think file-level separation might be good enough. The very low-level functions are already in a separate radixbuf.

jsgf requested changes to this revision.Dec 7 2017, 11:54 AM
jsgf added a subscriber: jsgf.
jsgf added inline comments.
rust/indexes/src/pybuf.rs
42

Why does this need to be #[repr(C)]? I'm not sure it really can be, technically, since PhantomData isn't really a thing in C.

48–49

I think for this to be safe, SimplePyBuf should own the GIL for its entire life. Otherwise we run the risk of the Python mutating the memory under Rust's read-only references, which is definitely undefined behaviour for Rust.

(or see below)

51

I think this should be constrained to T: Copy to make sure its a simple type - it definitely won't work if T has complex state or a Drop implementation.

77

Likewise, T: Copy

As an alternative to making SimplePyBuf always own the GIL, it could have an auxillary guard type which owns the GIL (analogous to MutexGuard), and that guard type is what implements AsRef<[T]>, so that at least while Rust is using the slice it definitely can't change.

So something like:

// assuming an existing SimplePyBuf in spb

let g = spb.lock(); // take GIL owned by SimplePyBufGuard
let s = g.as_ref(); //  get a slice view of the PyBuf
//...
mem::drop(g); // release GIL (alternatively g just goes out of scope)

SimplePyBufGuard could also implement ops::Index (index and slice) for fairly ergonomic direct access to the underlying memory.

This revision now requires changes to proceed.Dec 7 2017, 11:54 AM
quark added inline comments.Dec 7 2017, 3:12 PM
rust/indexes/src/pybuf.rs
42

Will remove. I thought an inner #[repr(C)] may be ineffective without an outer one.

48–49

Having a GIL taken is in theory more correct but practically undesirable - the SimplePyBuf object is part of a long-living struct that cannot block Python from running.

Python will make sure the python object cannot be resized so the pointer and length are always valid.

In [1]: b=bytearray([1,2,3])
In [2]: m=memoryview(b)
In [3]: b.append(4)
BufferError: Existing exports of data: object cannot be re-sized

With that, and only if the Python object is read-only, we can use the buffer confidently without gil.

Testing whether the Python object is read-only does not seem feasible for all cases. It's easy for builtin type bytes. But buffer(mmap_obj) is hard - the buffer object is read-only while its "inner" source of truth might not be. There is no way to test read-only from the inner mmap object, and no way to "unwrap" a buffer object.

51

Good idea. Will do.

77

Considering the fact that

  1. The current code is correct without gil since hg uses read-only python objects.
  2. We might migrate to a Rust-provided AsRef<[u8]> later. The lock would be unnecessary if that happens.

Since we also control the hg code path and this is not a public library, I'd prefer not adding gil for simplicity right now.

quark updated this revision to Diff 4370.Dec 11 2017, 7:20 PM
durham requested changes to this revision.Dec 14 2017, 7:32 PM
durham added inline comments.
rust/indexes/src/pybuf.rs
42

Your comment responding to Jeremy said this would be removed?

89

Jeremy mentioned using T: Copy here. You made the change above, but not here? Your response is seems to be able the GIL comment, so I just wanted to double check that the T: Copy bit wasn't lost accidentally.

This revision now requires changes to proceed.Dec 14 2017, 7:32 PM
quark added inline comments.Dec 14 2017, 7:43 PM
rust/indexes/src/pybuf.rs
42

Sorry, missed this change.

89

This is intentional. The AsRef and Drop implementation does not care about whether T is Copy or not.

quark updated this revision to Diff 4452.Dec 14 2017, 9:21 PM
durham accepted this revision.Dec 15 2017, 1:01 PM
This revision was automatically updated to reflect the committed changes.