This is an archive of the discontinued Mercurial Phabricator instance.

rust-cpython: add macro for sharing references
ClosedPublic

Authored by Alphare on Jul 10 2019, 7:46 AM.

Details

Summary

Following an experiment done by Georges Racinet, we now have a working way of
sharing references between Python and Rust. This is needed in many points of
the codebase, for example every time we need to expose an iterator to a
Rust-backed Python class.

In a few words, references are (unsafely) marked as 'static and coupled
with manual reference counting; we are doing manual borrow-checking.

This changes introduces two declarative macro to help reduce boilerplate.
While it is better than not using macros, they are not perfect. They need to:

  • Integrate with the garbage collector for container types (not needed
	  as of yet), as stated in the docstring
  • Allow for leaking multiple attributes at the same time
  • Inject the py_shared_state data attribute in py_class-generated
	  structs
  • Automatically namespace the functions and attributes they generate

For at least the last two points, we will need to write a procedural macro
instead of a declarative one.
While this reference-sharing mechanism is being ironed out I thought it best
not to implement it yet.

Lastly, and implementation detail renders our Rust-backed Python iterators too
strict to be proper drop-in replacements, as will be illustrated in a future
patch: if the data structure referenced by a non-depleted iterator is mutated,
an AlreadyBorrowed exception is raised, whereas Python would allow it, only
to raise a RuntimeError if next is called on said iterator. This will have
to be addressed at some point.

Diff Detail

Repository
rHG Mercurial
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Alphare created this revision.Jul 10 2019, 7:46 AM

Is there any reason this can't be done using Rc<RefCell<DirsMultiset>>? I have an example here: https://rust.godbolt.org/z/MNNR_F

It uses Rc and RefCell however it does need to do some unsafe work to keep the RefCell borrowed for longer than would otherwise be easy to do. The advantages here are

  1. Lifetime is automatically managed by the Rc. This ensures that our parent data doesn't get deleted while the iterator is still alive.
  2. The linked implementation uses a mutable borrow. However it would be easy to make an immutable version which would allow multiple iterators over the same "container".
  3. Mutable access is managed by the RefCell so we don't need to do our own.
rust/hg-cpython/src/macros.rs
45 ↗(On Diff #15844)

You aren't really using the RefCell type here. It might be better to just use Cell for the count and UnsafeCell for the data.

57 ↗(On Diff #15844)

Instead of adding methods to the interface type I would just create a templated struct which has these methods as well as the data and count. It doesn't seem like these actually need to be in the parent type.

61 ↗(On Diff #15844)

I'm failing to see what actually prevents the "container" from being dropped while the "iterator" is alive. Is this somehow tied to the Python GC?

For example:

c = RustObject()
i = iter(i)
del c
# What is keeping the backing rust memory alive at this point.
gracinet added inline comments.
rust/hg-cpython/src/macros.rs
61 ↗(On Diff #15844)

Hi Kevin,

all that CPython's del does is decrement the reference count, and if it drops to 0, then the memory will be freed immediately. On the other hand, the clone_ref() at line 106 does increment it.

So, in your example, after del c the refcount of c is 1 and the memory shouldn't be freed.

Is there any reason this can't be done using Rc<RefCell<DirsMultiset>>? I have an example here: https://rust.godbolt.org/z/MNNR_F

I see now that since python is managing the reference it does make some sense to do it this way.

rust/hg-cpython/src/macros.rs
55 ↗(On Diff #15844)

The terminology is weird here. You don't appear to be tracking leaks, just borrows.

61 ↗(On Diff #15844)

Thanks. I missed that.

85 ↗(On Diff #15844)

This catches a mutable borrow after other borrows, however I believe it fails to catch a mutable borrow followed by immutable borrows.

Alphare edited the summary of this revision. (Show Details)Jul 24 2019, 11:23 AM
Alphare updated this revision to Diff 16050.

@kevincox I've renamed the file since it no longer just contains macros.

I've moved whatever I could move to separate structs and used Cell instead of RefCell on those scalar types. I'm not so keen on using UnsafeCell, espacially since the target py_class!-resulting struct could be using RefCell-specific APIs.

I'm not too sure on the terminology, so tell me if you have better ideas than mine.

Later down the line, I could see a trait to help define the interface for shared structs.

kevincox accepted this revision.Aug 12 2019, 12:11 PM

@kevincox I've renamed the file since it no longer just contains macros.
I've moved whatever I could move to separate structs and used Cell instead of RefCell on those scalar types. I'm not so keen on using UnsafeCell, espacially since the target py_class!-resulting struct could be using RefCell-specific APIs.

Ack. it still feels like we are kinda implementing a second RefCell anyways but I think this code is quite clean.

it still feels like we are kinda implementing a second RefCell anyways

We all share that feeling, I think. Let's hope we'll be able to circumvent that in some future.

but I think this code is quite clean.

Thanks, @Alphare is currently on leave. I'm sure he'll appreciate when he comes back.

This revision was not accepted when it landed; it landed in state Needs Review.
This revision was automatically updated to reflect the committed changes.
yuja added a subscriber: yuja.Aug 17 2019, 4:43 AM

+impl PySharedState {
+ pub fn borrow_mut<'a, T>(
+ &'a self,
+ py: Python<'a>,
+ pyrefmut: RefMut<'a, T>,
+ ) -> PyResult<PyRefMut<'a, T>> {
+ if self.mutably_borrowed.get() {
+ return Err(AlreadyBorrowed::new(
+ py,
+ "Cannot borrow mutably while there exists another \
+ mutable reference in a Python object",
+ ));
+ }
+ match self.leak_count.get() {
+ 0 => {
+ self.mutably_borrowed.replace(true);
+ Ok(PyRefMut::new(py, pyrefmut, self))
+ }
+ TODO
+
For now, this works differently than Python references
+ in the case of iterators.
+
Python does not complain when the data an iterator
+ points to is modified if the iterator is never used
+
afterwards.
+ Here, we are stricter than this by refusing to give a
+
mutable reference if it is already borrowed.
+ While the additional safety might be argued for, it
+
breaks valid programming patterns in Python and we need
+ to fix this issue down the line.
+ _ => Err(AlreadyBorrowed::new(
+ py,
+ "Cannot borrow mutably while there are \
+ immutable references in Python objects",
+ )),
+ }
+ }
+
+
/ Return a reference to the wrapped data with an artificial static
+ / lifetime.
+
/ We need to be protected by the GIL for thread-safety.
+ pub fn leak_immutable<T>(
+ &self,
+ py: Python,
+ data: &RefCell<T>,
+ ) -> PyResult<&'static T> {
+ if self.mutably_borrowed.get() {
+ return Err(AlreadyBorrowed::new(
+ py,
+ "Cannot borrow immutably while there is a \
+ mutable reference in Python objects",
+ ));
+ }
+ let ptr = data.as_ptr();
+ self.leak_count.replace(self.leak_count.get() + 1);
+ unsafe { Ok(&*ptr) }
+ }

This leak_immutable() looks potentially unsafe since it can extend the
lifetime of an arbitrary object, data.

+macro_rules! py_shared_ref {
+ (
+ $name: ident,
+ $inner_struct: ident,
+ $data_member: ident,
+ $leaked: ident,
+ ) => {
+ impl $name {
+ fn borrow_mut<'a>(
+ &'a self,
+ py: Python<'a>,
+ ) -> PyResult<crate::ref_sharing::PyRefMut<'a, $inner_struct>>
+ {
+ self.py_shared_state(py)
+ .borrow_mut(py, self.$data_member(py).borrow_mut())
+ }
+
+ fn leak_immutable<'a>(
+ &'a self,
+ py: Python<'a>,
+ ) -> PyResult<&'static $inner_struct> {
+ self.py_shared_state(py)
+ .leak_immutable(py, self.$data_member(py))
+ }
+ }

Is it okay to directly borrow/borrow_mut() the inner object?

For example, we can still do self.inner(py).borrow_mut() (in place of
self.borrow_mut(py)) while the inner object is leaked, which I think is
unsafe.

If I understand the py-shared logic correctly, PySharedState needs to own
the inner object to guarantee that any public operations are safe.