This is an archive of the discontinued Mercurial Phabricator instance.

rust-cpython: implementing Graph using C parents function
ClosedPublic

Authored by gracinet on Dec 15 2018, 6:42 AM.

Details

Summary

We introduce the Index struct that wraps the C index. It is
not intrinsically protected by the GIL (see the lengthy
discussion in its docstring). Improving on this seems
prematurate at this point.

A pointer to the parents function is stored on the parsers
C extension module as a capsule object.

This is the recommended way to export a C API for consumption
from other extensions.
See also: https://docs.python.org/2.7/c-api/capsule.html

In our case, we use it in cindex.rs, retrieving function
pointer from the capsule and storing it within the CIndex
struct, alongside with a pointer to the index. From there,
the implementation is very close to the one from hg-direct-ffi.

The naming convention for the capsule is inspired from the
one in datetime:

>>> import datetime
>>> datetime.datetime_CAPI
<capsule object "datetime.datetime_CAPI" at 0x7fb51201ecf0>

although in datetime's case, the capsule points to a struct holding
several type objects and methods.

Diff Detail

Repository
rHG Mercurial
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

gracinet created this revision.Dec 15 2018, 6:42 AM

Here, I'd be tempted to submit a py_capsule_fn macro to rust-cpython, but I guess it can wait.

yuja added a subscriber: yuja.Dec 15 2018, 10:42 PM

+type IndexParentsFn = unsafe extern "C" fn(
+ index: *mut python_sys::PyObject,
+ rev: ssize_t,
+ ps: *mut [c_int; 2],
+ max_rev: c_int,
+) -> c_int;

The type differs from the private function. It's
int HgRevlogIndex_GetParents(PyObject *op, int rev, int *ps).

+impl Graph for Index {
+ /// wrap a call to the C extern parents function
+ fn parents(&self, rev: Revision) -> Result<[Revision; 2], GraphError> {
+ let mut res: [c_int; 2] = [0; 2];
+ let code = unsafe {
+ (self.parents)(
+ self.index.as_ptr(),
+ rev as ssize_t,
+ &mut res as *mut [c_int; 2],
+ rev,
+ )
+ };
+ match code {
+ 0 => Ok(res),
+ _ => Err(GraphError::ParentOutOfRange(rev)),
+ }

I think Python<'p> is needed here to make it safe, but I have no idea how
to express that clearly. Do we need a wrapper that implements Graph and
translates Graph::parents(rev) to Index::parents(py, rev)?

  • a/mercurial/cext/revlog.c

+++ b/mercurial/cext/revlog.c
@@ -2878,6 +2878,12 @@

	if (nullentry)
		PyObject_GC_UnTrack(nullentry);

+ void *caps = PyCapsule_New(

Please move void *caps to top. We still need to conform to C89.

@yuja yes sorry, forgot to adapt to the new signature. That'll be fixed in next version, and in revlog.c, the capsule pointer declaration is now at the top.

About protecting parents() with the GIL Python<'p>, I'll have to think more about it, but the solution that comes to mind is as you suggest a two-step wrapping: the long lived Index would wrap the actual index object and the function pointer retrieved from the capsule, but would not implement the Graph trait directly. A shorter term GILProtectedIndex would then implement the Graph trait by wrapping the Index together with the Python<'p> marker coming from a call to the Python interface (methods of the py_class! of this series), hence forbidding to release the GIL from any code that calls Graph::parents(). I think the current usage is safe, though, because our Python interface methods don't release the GIL, but it's true the Index isn't inherentrly safe.

A simpler possibility would be to go the other way: release the GIL from the Python interface methods, and reacquire it at each call of Graph::parents. I don't know what the overhead would be, especially with Mercurial being currently monothreaded (unless I'm mistaken), meaning that we don't have benefits to reap. Maybe I'll try that one, since it's so simple, and if I measure negligible overhead, I'll go for it.

As a side note, the same problem could arise with the direct-ffi bindings: the safety promise is held by the caller, this time from C code.

I gave the solution to reacquire the GIL explicitely from Index a quick try, without even releasing it from the callers (which is ok according to https://docs.python.org/2.7/c-api/init.html#c.PyGILState_Ensure), and it is more than a 20% penalty.

I'm measuring with hg perfancestors on mozilla-central, here are the medians of wall time

pure Python: 0.295
Rust cpython (unmodified): 0.101
Rust direct-ffi: 0.092
Rust cpython reacquiring the GIL: 0.124

yuja added a comment.Dec 17 2018, 7:23 AM
About protecting parents() with the GIL `Python<'p>`, I'll have to think more about it, but the solution that comes to mind is as you suggest a two-step wrapping: the long lived `Index` would wrap the actual index object and the function pointer retrieved from the capsule, but would not implement the `Graph` trait directly. A shorter term `GILProtectedIndex` would then implement the `Graph` trait by wrapping the `Index` together with the `Python<'p>` marker coming from a call to the Python interface (methods of the `py_class!` of this series), hence forbidding to release the GIL from any code that calls `Graph::parents()`. I think the current usage is safe, though, because our Python interface methods don't release the GIL, but it's true the `Index` isn't inherentrly safe.

Actually I like this idea since it will be a step forward to wrap the Index
object properly as a PyObject.

A simpler possibility would be to go the other way: release the GIL from the Python interface methods, and reacquire it at each call of `Graph::parents`. I don't know what the overhead would be, especially with Mercurial being currently monothreaded (unless I'm mistaken), meaning that we don't have benefits to reap. Maybe I'll try that one, since it's so simple, and if I measure negligible overhead, I'll go for it.
As a side note, the same problem could arise with the direct-ffi bindings: the safety promise is held by the caller, this time from C code.

Yes. The current code is safe unless we do invoke it with e.g.
py.allow_threads(), and I'm not so afraid of leaving the issue as TODO.

gracinet edited the summary of this revision. (Show Details)Dec 22 2018, 11:23 AM
gracinet updated this revision to Diff 12949.
This revision was automatically updated to reflect the committed changes.
yuja added a comment.Dec 24 2018, 5:31 AM

+/ # TODO find a solution to make it GIL safe again.
+
/
+/ This is non trivial, and can wait until we have a clearer picture with
+
/ more Rust Mercurial constructs.
+/
+
/ One possibility would be to a GILProtectedIndex wrapper enclosing
+/ a Python<'p> marker and have it be the one implementing the
+
/ Graph trait, but this would mean the Graph implementor would become
+/ likely to change between subsequent method invocations of the hg-core
+
/ objects (a serious change of the hg-core API):
+/ either exposing ways to mutate the Graph, or making it a non persistent
+
/ parameter in the relevant methods that need one.

Thinking this a bit further, I'm getting to feel that a "non persistent
parameter" will be an easier choice.

If an Index object were implemented in pure Rust, it would hold the entire
index data in memory. As we wouldn't want to memcpy such large object, there
would be some reference types (e.g. &Index, Rc<Index>, etc.) involved
somewhere. For instance, AncestorsIterator<G> might have to be
AncestorsIterator<G: 'g>, and holding a reference would slightly complicate
things in a similar way to holding Python<'p>.

Index could be backed by e.g. Rc<RefCell<_>> to allow any objects to own
<G: Index> copies, but I don't feel like this is a good design.

@yuja, thanks for the queueing !

Index could be backed by e.g. Rc<RefCell<_>> to allow any objects to own
<G: Index> copies, but I don't feel like this is a good design.

If we want one day to have Mercurial work in a multi-threaded way, I guess that something like Arc<RWLock<_>> would be about necessary for an index implemented in Rust. But anyway, we are thinking alike.