This is an archive of the discontinued Mercurial Phabricator instance.

Differential D5550

rust-cpython: bindings for MissingAncestors
ClosedPublic

Authored by gracinet on Jan 10 2019, 4:56 AM.


Details

Reviewers

None

Group Reviewers

hg-reviewers

Commits

rHG006c9ce486fa: rust-cpython: bindings for MissingAncestors

Summary

Exposing this API is rather straightforward, except for the
remove_ancestors_from() method, which forces an inefficient round-trip
conversion between Python sets and Rust HashSets.
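A rough, std-only sketch of the round trip described above. It assumes `Revision` is a plain `i32` (as in hg-core), uses a Rust `HashSet` to stand in for the Python set, and replaces the real ancestry computation with a hypothetical `is_ancestor_of_bases` predicate. None of the function names below belong to hg-core or rust-cpython.

```rust
use std::collections::HashSet;

/// Assumed to match hg-core, where revisions are plain integers.
pub type Revision = i32;

/// Stand-in for the core `remove_ancestors_from()`: prunes a Rust
/// `HashSet` in place. `is_ancestor_of_bases` is a hypothetical
/// placeholder for the real ancestry computation.
fn core_remove_ancestors_from<F>(revs: &mut HashSet<Revision>, is_ancestor_of_bases: F)
where
    F: Fn(Revision) -> bool,
{
    revs.retain(|&rev| !is_ancestor_of_bases(rev));
}

/// The binding's round trip, with `py_set` playing the caller's Python set.
pub fn remove_ancestors_round_trip<F>(py_set: &mut HashSet<Revision>, is_ancestor_of_bases: F)
where
    F: Fn(Revision) -> bool,
{
    // 1. "Python set" -> Rust HashSet: a first full copy.
    let mut rust_set: HashSet<Revision> = py_set.iter().cloned().collect();
    // 2. The core prunes its private copy.
    core_remove_ancestors_from(&mut rust_set, is_ancestor_of_bases);
    // 3. Rust HashSet -> list: a second full copy.
    let remaining: Vec<Revision> = rust_set.into_iter().collect();
    // 4. Same effect as `py_set.intersection_update(remaining)`.
    let keep: HashSet<Revision> = remaining.into_iter().collect();
    py_set.retain(|rev| keep.contains(rev));
}
```

With bases `[1]`, the revisions `{0, 1}` count as ancestors, so `{0, 1, 2, 3}` shrinks to `{2, 3}`, mirroring `testmissingancestorsremove` in the diff.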

Two alternatives are proposed in code comments:

- changing the inner API to "emit" the revision numbers to discard: this would be a substantial change, and it would only be better in cases where there are more revisions to retain than to discard;
- mutating the Python set directly: this would force us to define an abstract RevisionSet trait and implement it both for a plain HashSet and for a struct enclosing a Python set with the GIL marker Python<'p>, which is also a non-trivial effort.
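Both alternatives could be sketched roughly as follows. Everything here is hypothetical: `revisions_to_discard` illustrates the "emit" approach and `RevisionSet` the trait approach, with `BTreeSet` standing in for the Python-set wrapper (a real one would need rust-cpython and the `Python<'p>` marker). A placeholder predicate again replaces the ancestry computation.

```rust
use std::collections::{BTreeSet, HashSet};

pub type Revision = i32;

/// "Emit" alternative (hypothetical): the core only reports what to
/// discard, and the caller removes those revisions from its own set.
pub fn revisions_to_discard<F>(revs: &[Revision], is_ancestor_of_bases: F) -> Vec<Revision>
where
    F: Fn(Revision) -> bool,
{
    revs.iter().copied().filter(|&rev| is_ancestor_of_bases(rev)).collect()
}

/// Trait alternative (hypothetical): an abstract mutable set of revisions,
/// so the core can prune the caller's set without any conversion.
pub trait RevisionSet {
    fn revisions(&self) -> Vec<Revision>;
    fn discard(&mut self, rev: Revision);
}

impl RevisionSet for HashSet<Revision> {
    fn revisions(&self) -> Vec<Revision> {
        self.iter().copied().collect()
    }
    fn discard(&mut self, rev: Revision) {
        self.remove(&rev);
    }
}

// Second implementor, standing in for a struct wrapping a Python set.
impl RevisionSet for BTreeSet<Revision> {
    fn revisions(&self) -> Vec<Revision> {
        self.iter().copied().collect()
    }
    fn discard(&mut self, rev: Revision) {
        self.remove(&rev);
    }
}

/// Generic core routine working on any `RevisionSet` implementor.
pub fn discard_ancestors<S, F>(set: &mut S, is_ancestor_of_bases: F)
where
    S: RevisionSet,
    F: Fn(Revision) -> bool,
{
    // `revisions()` returns an owned snapshot, so mutating `set` is fine.
    for rev in set.revisions() {
        if is_ancestor_of_bases(rev) {
            set.discard(rev);
        }
    }
}
```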

Since the main (and seemingly only) caller of this method is
mercurial.setdiscovery, which is currently undergoing serious refactoring,
it's not clear whether these improvements would be worth the effort right now,
so we're leaving it as-is.

Also, in get_bases() (which will also be used by setdiscovery), we'd prefer
to build a Python set directly, but we resort to returning a tuple while
waiting to hear back about our PR to rust-cpython.

Diff Detail

Repository

rHG Mercurial

Lint

Automatic diff as part of commit; lint not applicable.

Unit

Automatic diff as part of commit; unit tests not applicable.

Event Timeline

gracinet created this revision. Jan 10 2019, 4:56 AM

Herald added a reviewer: hg-reviewers. Jan 10 2019, 4:56 AM

Herald added subscribers: mercurial-devel, kevincox, durin42.

gracinet added a child revision: D5551: rust-cpython: using MissingAncestors from Python code. Jan 10 2019, 4:56 AM

Queued up to this patch, thanks.

+    def __new__(_cls, index: PyObject, bases: PyObject) -> PyResult<MissingAncestors> {
+        let bases_vec: Vec<Revision> = rev_pyiter_collect(py, &bases)?;
+        let inner = CoreMissing::new(Index::new(py, index)?, bases_vec);

We might want to directly build HashSet<Revision> here if that matters.
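Since `rev_pyiter_collect` is generic over its output collection, collecting straight into a `HashSet<Revision>` should only require changing the annotated type. The std-only sketch below mimics that shape with a hypothetical `collect_revs` helper and an `IterError` type standing in for `PyErr`. It relies on `Result<C, E>` implementing `FromIterator<Result<T, E>>`, which stops at the first error.

```rust
use std::collections::HashSet;
use std::iter::FromIterator;

pub type Revision = i32;

/// Hypothetical error raised during iteration (stand-in for `PyErr`).
#[derive(Debug, PartialEq)]
pub struct IterError(pub String);

/// Collect fallible items into any `FromIterator<Revision>` collection.
/// The target type (`Vec`, `HashSet`, ...) is chosen by the caller.
pub fn collect_revs<C, I>(items: I) -> Result<C, IterError>
where
    C: FromIterator<Revision>,
    I: IntoIterator<Item = Result<Revision, IterError>>,
{
    items.into_iter().collect()
}
```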

+    def missingancestors(&self, revs: PyObject) -> PyResult<PyList> {
+        let mut inner = self.inner(py).borrow_mut();
+        let revs_vec: Vec<Revision> = rev_pyiter_collect(py, &revs)?;
+        let missing_vec = match inner.missing_ancestors(revs_vec) {
+            Ok(missing) => missing,
+            Err(e) => {
+                return Err(GraphError::pynew(py, e));
+            }
+        };

+        // convert as Python list
+        let mut missing_pyint_vec: Vec<PyObject> = Vec::with_capacity(
+            missing_vec.len());
+        for rev in missing_vec {
+            missing_pyint_vec.push(rev.to_py_object(py).into_object());
+        }
+        Ok(PyList::new(py, missing_pyint_vec.as_slice()))

Maybe this can be extracted to a helper function so that we can .map()
the result.
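One possible reading of this suggestion, sketched with std types only: a hypothetical `revs_to_objects` helper accepts any `IntoIterator<Item = Revision>`, so it can serve both the `Vec` returned by `missing_ancestors()` and the set returned by `get_bases()`, and the fallible core call is then post-processed with `Result::map`. `String` stands in for `PyObject`; `core_missing_ancestors` and `InvalidRevision` are placeholders, not hg-core APIs.

```rust
use std::collections::HashSet;

pub type Revision = i32;

/// Hypothetical error type standing in for the core graph error.
#[derive(Debug, PartialEq)]
pub struct InvalidRevision(pub Revision);

/// Hypothetical shared helper: converts any collection of revisions into
/// "Python objects" (`String` is a stand-in for `PyObject`).
fn revs_to_objects<I>(revs: I) -> Vec<String>
where
    I: IntoIterator<Item = Revision>,
{
    revs.into_iter().map(|rev| rev.to_string()).collect()
}

/// Placeholder for the core call; rejects negative revisions.
fn core_missing_ancestors(revs: Vec<Revision>) -> Result<Vec<Revision>, InvalidRevision> {
    let bad = revs.iter().cloned().find(|&r| r < 0);
    match bad {
        Some(rev) => Err(InvalidRevision(rev)),
        None => Ok(revs),
    }
}

/// With the helper, the conversion of the result becomes a single `.map()`.
pub fn missingancestors(revs: Vec<Revision>) -> Result<Vec<String>, InvalidRevision> {
    core_missing_ancestors(revs).map(revs_to_objects)
}

/// The same helper serves a set-valued result such as `get_bases()`.
pub fn bases(bases_set: &HashSet<Revision>) -> Vec<String> {
    let mut revs: Vec<Revision> = bases_set.iter().cloned().collect();
    revs.sort(); // HashSet order is unspecified; sort for stable output
    revs_to_objects(revs)
}
```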

Closed by commit rHG006c9ce486fa: rust-cpython: bindings for MissingAncestors (authored by gracinet). Jan 11 2019, 9:16 AM

This revision was automatically updated to reflect the committed changes.

kevincox added inline comments. Jan 12 2019, 6:30 AM

rust/hg-cpython/src/ancestors.rs
Line 170: This could be a `.collect()` call. Something like: `let bases_vec: Vec<PyObject> = bases_set.into_iter().map(|rev| rev.to_py_object(py).into_object()).collect();`
Line 193: This could also be a `collect()`.
Line 204: This match can become a `.map_err(|e| GraphError::pynew(py, e))?`
Line 212: This can also be a `.collect()`.
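These suggestions amount to the following before/after pattern, shown without rust-cpython. `CoreError`, `BindingError`, `from_core` and `core_call` are hypothetical stand-ins for `hg::GraphError`, the Python exception, `GraphError::pynew` and `inner.missing_ancestors()`; `String` again replaces `PyObject`.

```rust
pub type Revision = i32;

/// Hypothetical core error (stand-in for `hg::GraphError`).
#[derive(Debug, PartialEq)]
pub struct CoreError(pub Revision);

/// Hypothetical binding-level error (stand-in for the Python exception).
#[derive(Debug, PartialEq)]
pub struct BindingError(pub String);

impl BindingError {
    /// Plays the role of `GraphError::pynew(py, e)`.
    fn from_core(e: CoreError) -> Self {
        BindingError(format!("invalid revision {}", e.0))
    }
}

/// Placeholder for `inner.missing_ancestors()`; rejects negative revisions.
fn core_call(revs: Vec<Revision>) -> Result<Vec<Revision>, CoreError> {
    let bad = revs.iter().cloned().find(|&r| r < 0);
    match bad {
        Some(rev) => Err(CoreError(rev)),
        None => Ok(revs),
    }
}

/// Before: explicit `match`, then a push loop into a preallocated Vec.
pub fn convert_with_loop(revs: Vec<Revision>) -> Result<Vec<String>, BindingError> {
    let missing = match core_call(revs) {
        Ok(missing) => missing,
        Err(e) => return Err(BindingError::from_core(e)),
    };
    let mut out: Vec<String> = Vec::with_capacity(missing.len());
    for rev in missing {
        out.push(rev.to_string());
    }
    Ok(out)
}

/// After: `map_err(...)?` for the error, `map(...).collect()` for the list.
pub fn convert_with_collect(revs: Vec<Revision>) -> Result<Vec<String>, BindingError> {
    let missing = core_call(revs).map_err(BindingError::from_core)?;
    Ok(missing.into_iter().map(|rev| rev.to_string()).collect())
}
```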

gracinet added inline comments. Jan 14 2019, 1:16 PM

rust/hg-cpython/src/ancestors.rs
Line 170: Hi @kevincox. I have a general question about these; please correct me if I'm making wrong assumptions. Given that we know in advance exactly how many elements are required, and that this number can be large, doesn't it add overhead to collect everything from iterators? I suppose the `Vec` would have to grow several times, which means as many calls to `malloc()` internally. I'm not sure at this point that it would be very costly, but in this case, where we consume the `Vec` immediately, is there a reason to believe that the `collect()` way could have performance benefits of its own? Note: this code has landed, but I'm about to submit some related refactors, and in this specific case it's going to be replaced, I hope very soon, by a call to `PySet`; I'm asking in general. Cheers,

kevincox added inline comments. Jan 14 2019, 6:16 PM

rust/hg-cpython/src/ancestors.rs
Line 170: `Iterator` has a `size_hint` method which can provide a clue as to the size of the iterator, if known. There is also the `ExactSizeIterator` trait, but it is sufficient to say that most simple operations on slice (or `Vec`) iterators maintain the size hint, and that collecting into a vector is the most efficient way to construct a vector. In theory the performance could be slightly better with the `collect()` approach, as it can avoid some bounds checking and incrementing the size each time, but in practice I would expect similar performance. So the TL;DR is: don't worry about `collect()` performance, especially in simple situations. If there are more allocations than necessary, then the bug is in the Rust `std` crate.
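The size-hint behaviour described in this answer can be observed with std alone: `map` over a slice iterator keeps an exact hint, while `filter` can only promise an upper bound. How much `collect()` preallocates from the hint is a `std` implementation detail, so the check below only asserts a lower bound on the resulting capacity.

```rust
/// Size hint of a `map` over a slice iterator, and the capacity of the Vec
/// that `collect()` builds from that same pipeline.
pub fn hint_and_capacity(revs: &[i32]) -> ((usize, Option<usize>), usize) {
    let mapped = revs.iter().map(|rev| rev.to_string());
    let hint = mapped.size_hint();
    let collected: Vec<String> = mapped.collect();
    (hint, collected.capacity())
}

/// A `filter` cannot know how many items survive: its lower bound is 0.
pub fn filtered_hint(revs: &[i32]) -> (usize, Option<usize>) {
    revs.iter().filter(|&&r| r % 2 == 0).size_hint()
}
```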

gracinet added inline comments. Jan 14 2019, 7:45 PM

rust/hg-cpython/src/ancestors.rs
Line 170: Thanks for the detailed answer. Then the problem, for iterators coming directly from Python, is that they don't currently implement `size_hint()`, so there's an improvement to be made at this level (many Python iterators do have a `__length_hint__()` method, so it shouldn't be a problem to reuse it). I currently suspect that the conversions between Rust and Python are the bottleneck in some important cases, so I'll have to measure these things anyway. I won't touch that immediately, but I'll get back to it soon.
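If Python-side iterators were to report a length hint, one hypothetical way to surface it to `collect()` is an adapter like `HintedIter` below. Where the estimate comes from (for instance `operator.length_hint()` on the Python object) is an assumption here. Because such a hint may be wrong, the sketch reports it only as a lower bound with no upper bound; an overestimate merely over-reserves, since `size_hint()` must not be relied on for memory safety.

```rust
/// Hypothetical adapter: forwards items from `inner` and reports a size
/// hint taken from an externally supplied estimate.
pub struct HintedIter<I> {
    inner: I,
    remaining_hint: usize,
}

impl<I: Iterator> HintedIter<I> {
    pub fn new(inner: I, hint: usize) -> Self {
        HintedIter { inner, remaining_hint: hint }
    }
}

impl<I: Iterator> Iterator for HintedIter<I> {
    type Item = I::Item;

    fn next(&mut self) -> Option<Self::Item> {
        let item = self.inner.next();
        if item.is_some() {
            // Saturate so an underestimated hint never underflows.
            self.remaining_hint = self.remaining_hint.saturating_sub(1);
        }
        item
    }

    fn size_hint(&self) -> (usize, Option<usize>) {
        // An estimate, not a guarantee: lower bound only, no upper bound.
        (self.remaining_hint, None)
    }
}
```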

gracinet marked 2 inline comments as done. Jan 14 2019, 8:57 PM

gracinet added inline comments.

rust/hg-cpython/src/ancestors.rs
Line 170: Got this one in the wrong direction (sorry about that!): in this case, this is a conversion from Rust, so we do have a proper `size_hint()`. Still, I'll keep the question in mind for the other direction (in which we are already using `collect()`).

kevincox added inline comments. Jan 15 2019, 3:02 AM

rust/hg-cpython/src/ancestors.rs
Line 170: That makes sense. Also, if you still want to use iterators without wiring up size hints properly, you could do something like: `let mut v = Vec::with_capacity(hint); v.extend(iter);`
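A minimal version of that workaround, assuming a `hint` is available from elsewhere (the parameter is hypothetical): reserve once with `Vec::with_capacity`, then `extend` from the iterator, which only needs to grow again if the hint was too small.

```rust
/// Build a Vec from an iterator whose own size hint is uninformative,
/// reserving `hint` slots up front instead.
pub fn collect_with_hint<I>(items: I, hint: usize) -> Vec<I::Item>
where
    I: Iterator,
{
    let mut v = Vec::with_capacity(hint);
    v.extend(items);
    v
}
```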

Revision Contents
Changeset List

Change	Path
M	rust/hg-cpython/src/ancestors.rs (102 lines)
M	tests/test-rust-ancestor.py (19 lines)

Status	Author	Revision
Closed	gracinet	D5551 rust-cpython: using MissingAncestors from Python code
Closed	gracinet	D5550 rust-cpython: bindings for MissingAncestors
Closed	gracinet	D5549 rust-cpython: generalised conversion function
Closed	gracinet	D5548 rust-cpython: style consistency leftovers
Closed	gracinet	D5547 rust-cpython: consistency in use of hg-core constructs
Closed	gracinet	D5546 rust-cpython: rustdoc improvements

Diff 13174

rust/hg-cpython/src/ancestors.rs

 //! and can be used as replacement for the the pure `ancestor` Python module.
 //!
 //! # Classes visible from Python:
 //! - [`LazyAncestors`] is the Rust implementation of
 //! `mercurial.ancestor.lazyancestors`.
 //! The only difference is that it is instantiated with a C `parsers.index`
 //! instance instead of a parents function.
 //!
+//! - [`MissingAncestors`] is the Rust implementation of
+//! `mercurial.ancestor.incrementalmissingancestors`.
+//!
+//! API differences:
+//! + it is instantiated with a C `parsers.index`
+//! instance instead of a parents function.
+//! + `MissingAncestors.bases` is a method returning a tuple instead of
+//! a set-valued attribute. We could return a Python set easily if our
+//! [PySet PR](https://github.com/dgrunwald/rust-cpython/pull/165)
+//! is accepted.
+//!
 //! - [`AncestorsIterator`] is the Rust counterpart of the
 //! `ancestor._lazyancestorsiter` Python generator.
 //! From Python, instances of this should be mainly obtained by calling
 //! `iter()` on a [`LazyAncestors`] instance.
 //!
 //! [`LazyAncestors`]: struct.LazyAncestors.html
+//! [`MissingAncestors`]: struct.MissingAncestors.html
 //! [`AncestorsIterator`]: struct.AncestorsIterator.html
 use cindex::Index;
 use cpython::{
-    ObjectProtocol, PyClone, PyDict, PyModule, PyObject, PyResult, Python,
+    ObjectProtocol, PyClone, PyDict, PyList, PyModule, PyObject,
+    PyResult, PyTuple, Python, PythonObject, ToPyObject,
 };
 use exceptions::GraphError;
 use hg::Revision;
-use hg::{AncestorsIterator as CoreIterator, LazyAncestors as CoreLazy};
+use hg::{
+    AncestorsIterator as CoreIterator, LazyAncestors as CoreLazy,
+    MissingAncestors as CoreMissing,
+};
 use std::cell::RefCell;
 use std::iter::FromIterator;
+use std::collections::HashSet;

 /// Utility function to convert a Python iterable into various collections
 ///
 /// We need this in particular to feed to various methods of inner objects
 /// with `impl IntoIterator<Item=Revision>` arguments, because
 /// a `PyErr` can arise at each step of iteration, whereas these methods
 /// expect iterables over `Revision`, not over some `Result<Revision, PyErr>`
 fn rev_pyiter_collect<C>(py: Python, revs: &PyObject) -> PyResult<C>
             CoreLazy::new(Index::new(py, index)?, initvec, stoprev, inclusive)
                 .map_err(|e| GraphError::pynew(py, e))?;

         Self::create_instance(py, RefCell::new(Box::new(lazy)))
     }

 });

+py_class!(pub class MissingAncestors |py| {
+    data inner: RefCell<Box<CoreMissing<Index>>>;
+
+    def __new__(_cls, index: PyObject, bases: PyObject) -> PyResult<MissingAncestors> {
+        let bases_vec: Vec<Revision> = rev_pyiter_collect(py, &bases)?;
+        let inner = CoreMissing::new(Index::new(py, index)?, bases_vec);
+        MissingAncestors::create_instance(py, RefCell::new(Box::new(inner)))
+    }
+
+    def hasbases(&self) -> PyResult<bool> {
+        Ok(self.inner(py).borrow().has_bases())
+    }
+
+    def addbases(&self, bases: PyObject) -> PyResult<PyObject> {
+        let mut inner = self.inner(py).borrow_mut();
+        let bases_vec: Vec<Revision> = rev_pyiter_collect(py, &bases)?;
+        inner.add_bases(bases_vec);
+        // cpython doc has examples with PyResult<()> but this gives me
+        // the trait `cpython::ToPyObject` is not implemented for `()`
+        // so let's return an explicit None
+        Ok(py.None())
+    }
+
+    def bases(&self) -> PyResult<PyTuple> {
+        let inner = self.inner(py).borrow();
+        let bases_set = inner.get_bases();
+        // convert as Python tuple TODO how to return a proper Python set?
+        let mut bases_vec: Vec<PyObject> = Vec::with_capacity(
+            bases_set.len());
+        for rev in bases_set {
+            bases_vec.push(rev.to_py_object(py).into_object());
+        }
+        Ok(PyTuple::new(py, bases_vec.as_slice()))
+    }
+
+    def removeancestorsfrom(&self, revs: PyObject) -> PyResult<PyObject> {
+        let mut inner = self.inner(py).borrow_mut();
+        // this is very lame: we convert to a Rust set, update it in place
+        // and then convert back to Python, only to have Python remove the
+        // excess (thankfully, Python is happy with a list or even an iterator)
+        // Leads to improve this:
+        // - have the CoreMissing instead do something emit revisions to
+        // discard
+        // - define a trait for sets of revisions in the core and implement
+        // it for a Python set rewrapped with the GIL marker
+        let mut revs_pyset: HashSet<Revision> = rev_pyiter_collect(py, &revs)?;
+        inner.remove_ancestors_from(&mut revs_pyset)
+            .map_err(|e| GraphError::pynew(py, e))?;
+
+        // convert as Python list
+        let mut remaining_pyint_vec: Vec<PyObject> = Vec::with_capacity(
+            revs_pyset.len());
+        for rev in revs_pyset {
+            remaining_pyint_vec.push(rev.to_py_object(py).into_object());
+        }
+        let remaining_pylist = PyList::new(py, remaining_pyint_vec.as_slice());
+        revs.call_method(py, "intersection_update", (remaining_pylist, ), None)
+    }
+
+    def missingancestors(&self, revs: PyObject) -> PyResult<PyList> {
+        let mut inner = self.inner(py).borrow_mut();
+        let revs_vec: Vec<Revision> = rev_pyiter_collect(py, &revs)?;
+        let missing_vec = match inner.missing_ancestors(revs_vec) {
+            Ok(missing) => missing,
+            Err(e) => {
+                return Err(GraphError::pynew(py, e));
+            }
+        };
+        // convert as Python list
+        let mut missing_pyint_vec: Vec<PyObject> = Vec::with_capacity(
+            missing_vec.len());
+        for rev in missing_vec {
+            missing_pyint_vec.push(rev.to_py_object(py).into_object());
+        }
+        Ok(PyList::new(py, missing_pyint_vec.as_slice()))
+    }
+});
+
 /// Create the module, with `__package__` given from parent
 pub fn init_module(py: Python, package: &str) -> PyResult<PyModule> {
     let dotted_name = &format!("{}.ancestor", package);
     let m = PyModule::new(py, dotted_name)?;
     m.add(py, "__package__", package)?;
     m.add(
         py,
         "__doc__",
         "Generic DAG ancestor algorithms - Rust implementation",
     )?;
     m.add_class::<AncestorsIterator>(py)?;
     m.add_class::<LazyAncestors>(py)?;
+    m.add_class::<MissingAncestors>(py)?;

     let sys = PyModule::import(py, "sys")?;
     let sys_modules: PyDict = sys.get(py, "modules")?.extract(py)?;
     sys_modules.set_item(py, dotted_name, &m)?;
     // Example C code (see pyexpat.c and import.c) will "give away the
     // reference", but we won't because it will be consumed once the
     // Rust PyObject is dropped.
     Ok(m)
 }

tests/test-rust-ancestor.py

 from __future__ import absolute_import
 import sys
 import unittest

 try:
     from mercurial import rustext
     rustext.__name__ # trigger immediate actual import
 except ImportError:
     rustext = None
 else:
     # this would fail already without appropriate ancestor.__package__
     from mercurial.rustext.ancestor import (
         AncestorsIterator,
-        LazyAncestors
+        LazyAncestors,
+        MissingAncestors,
     )

 try:
     from mercurial.cext import parsers as cparsers
 except ImportError:
     cparsers = None

 # picked from test-parse-index2, copied rather than imported
         del ait
         self.assertEqual(sys.getrefcount(idx), start_count + 2)
         del lazy
         self.assertEqual(sys.getrefcount(idx), start_count)

         # let's check bool for an empty one
         self.assertFalse(LazyAncestors(idx, [0], 0, False))

+    def testmissingancestors(self):
+        idx = self.parseindex()
+        missanc = MissingAncestors(idx, [1])
+        self.assertTrue(missanc.hasbases())
+        self.assertEqual(missanc.missingancestors([3]), [2, 3])
+        missanc.addbases({2})
+        self.assertEqual(set(missanc.bases()), {1, 2})
+        self.assertEqual(missanc.missingancestors([3]), [3])
+
+    def testmissingancestorsremove(self):
+        idx = self.parseindex()
+        missanc = MissingAncestors(idx, [1])
+        revs = {0, 1, 2, 3}
+        missanc.removeancestorsfrom(revs)
+        self.assertEqual(revs, {2, 3})
+
     def testrefcount(self):
         idx = self.parseindex()
         start_count = sys.getrefcount(idx)

         # refcount increases upon iterator init...
         ait = AncestorsIterator(idx, [3], 0, True)
         self.assertEqual(sys.getrefcount(idx), start_count + 1)
         self.assertEqual(next(ait), 3)

Diff	ID	Description	Created	Lint	Unit
Base		Base
Diff 1	13128		Jan 10 2019, 4:56 AM	★	★
Diff 2	13174	rHG006c9ce486fa492ca043e3a85dd8c9a6cb714ab3	Nov 30 2018, 2:05 PM	★	★