Details
- Reviewers: None
- Group Reviewers: hg-reviewers
- Commits: rHGb85b377e7fc2: index: make node tree a Python object

Diff Detail
- Repository: rHG Mercurial
- Lint: Automatic diff as part of commit; lint not applicable.
- Unit: Automatic diff as part of commit; unit tests not applicable.

Event Timeline
Can you bump the cext version as this patch introduces new API?
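(For reference, the cext API version is a module-level constant in mercurial/cext/parsers.c that mercurial/policy.py compares against its expected value at import time, so the bump is a one-line change along these lines; the numbers below are purely illustrative, not the real values:)

-static const int version = 6;
+static const int version = 7;

The constant is exposed to Python via PyModule_AddIntConstant(m, "version", version), which is what policy.py reads.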
+static int nt_init_py(nodetree *self, PyObject *args)
+{
+ PyObject *index;
+ unsigned capacity;
+ if (!PyArg_ParseTuple(args, "OI", &index, &capacity))
+ return -1;
It leaves self->nodes uninitialized on error, and nt_dealloc() would fail if
self->nodes wasn't luckily 0. Strictly speaking, it's too late to initialize
pointers in tp_init because init() may be called more than once, but our
C types don't handle such cases.
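(One way to make that safe, and roughly what a later revision of the patch does, is to clear the pointer before any check that can fail, so nt_dealloc()'s PyMem_Free(self->nodes) only ever sees NULL or a real allocation. A minimal sketch of the idea; the function names follow the patch, but the bodies are simplified assumptions, not the verbatim code:)

static int nt_init(nodetree *self, indexObject *index, unsigned capacity)
{
	/* Cleared first: if anything below fails, nt_dealloc() will just do
	 * PyMem_Free(NULL), which is a harmless no-op. */
	self->nodes = NULL;
	if (capacity == 0)
		return -1; /* hypothetical failing argument check */
	self->nodes = PyMem_Malloc(capacity * sizeof(*self->nodes));
	if (self->nodes == NULL) {
		PyErr_NoMemory();
		return -1;
	}
	/* ... record the index, set self->capacity, etc. ... */
	return 0;
}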
+ return nt_init(self, (indexObject*)index, capacity);
We'll probably need INCREF/DECREF business for the index object.
I didn't review the other refcounting thingy carefully. Since it's painful
to do refcounting right, an internal nodetree could be embedded in the
indexObject, and a thin PyObject wrapper could be added. Just an idea.
struct nodetree { };
struct nodetreeObject { PyObject_HEAD nodetree nt; };
struct indexObject { ... nodetree nt; ... };
+ PyObject *index;
+ unsigned capacity;
+ if (!PyArg_ParseTuple(args, "OI", &index, &capacity))
+ return -1;
+ return nt_init(self, (indexObject*)index, capacity);
One more thing: we'll probably need to enforce the index type, by using "O!".
And a lookup of a 0xxx... hash might not work well due to the nullid entry.
I recalled it while I was making breakfast. I might be wrong.
Done. Thanks for the reminder.
+static int nt_init_py(nodetree *self, PyObject *args)
+{
+ PyObject *index;
+ unsigned capacity;
+ if (!PyArg_ParseTuple(args, "OI", &index, &capacity))
+ return -1;
It leaves self->nodes uninitialized on error, and nt_dealloc() would fail if
self->nodes wasn't luckily 0. Strictly speaking, it's too late to initialize
pointers in tp_init because init() may be called more than once, but our
C types don't handle such cases.
I think the default tp_new clears the memory of the object, so it won't be uninitialized (maybe that's what you mean by "luckily", but I first thought you meant it would require luck for it to be set to 0). The only reason I think that might be the case is that we haven't seen any failures because of uninitialized memory in index->nt before (which used to be index->nodes). So I don't think we need to do anything about this. Let me know if you think we should change something.
+ return nt_init(self, (indexObject*)index, capacity);
We'll probably need INCREF/DECREF business for the index object.
Yes, I had meant to do that but then forgot, so thanks for reminding me. Done.
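(The other half of that "business" lives in the destructor: the reference taken when the nodetree stores the index has to be dropped again when the nodetree goes away. A minimal sketch, assuming the nodetree keeps the index in self->index and was allocated with PyObject_New(); the exact teardown in the patch may differ:)

static void nt_dealloc(nodetree *self)
{
	Py_XDECREF(self->index); /* release the reference taken in nt_init() */
	self->index = NULL;
	PyMem_Free(self->nodes); /* PyMem_Free(NULL) is a no-op */
	self->nodes = NULL;
	PyObject_Del(self);
}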
I didn't review the other refcounting thingy carefully. Since it's painful
to do refcounting right, an internal nodetree could be embedded in the
indexObject, and a thin PyObject wrapper could be added. Just an idea.
Do you mean the refcounting of the nodetree? Assuming that the previous calls to free() (which I changed to PyMem_Free()) were correct, shouldn't it still be correct after replacing those by nt_dealloc()?
I think the default tp_new clears the memory of the object, so it won't be uninitialized (maybe that's what you mean by "luckily", but I first thought you meant it would require luck for it to be set to 0).
Interesting. tp_alloc is documented to initialize memory to zeros.
https://docs.python.org/2/c-api/typeobj.html#c.PyTypeObject.tp_alloc
So e57c532c3835 was moot?
The only reason I think that might be the case is that we haven't seen any failures because of uninitialized memory in index->nt before (which used to be index->nodes).
self->nt was initialized right at the start of index_init().
> I didn't review the other refcounting thingy carefully. Since it's painful
> to do refcounting right, an internal nodetree could be embedded in the
> indexObject, and a thin PyObject wrapper could be added. Just an idea.

Do you mean the refcounting of the nodetree? Assuming that the previous calls to `free()` (which I changed to `PyMem_Free()`) were correct, shouldn't it still be correct after replacing those by `nt_dealloc()`?
Refcounting of self->nt. I just meant I didn't review the code carefully.
+static int nt_init_py(nodetree *self, PyObject *args)
+{
+ PyObject *index;
+ unsigned capacity;
+ if (!PyArg_ParseTuple(args, "OI", &index, &capacity))
"O!I" to make sure index is an index object.
> I think the default tp_new clears the memory of the object, so it won't be uninitialized (maybe that's what you mean by "luckily", but I first thought you meant it would require luck for it to be set to 0).

Interesting. tp_alloc is documented to initialize memory to zeros.
https://docs.python.org/2/c-api/typeobj.html#c.PyTypeObject.tp_alloc
So https://phab.mercurial-scm.org/rHGe57c532c3835f6b244f21815cafcce0df1d272ce was moot?
Nah, it crashed. Perhaps tp_new and tp_alloc aren't called by
PyObject_New(), whose documentation says that "fields not defined by the Python
object header are not initialized."
https://docs.python.org/2/c-api/allocation.html#c.PyObject_New
In e57c532c3835^, parsers.parse_index2(0, 0) crashed but parsers.index()
didn't, once I fixed the Py_DECREF(self->data) in index_dealloc().
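(That matches the documentation: type->tp_alloc, which the default tp_new uses, zero-fills the object, while PyObject_New() only fills in the object header. A sketch of the two paths; indexType stands in for the cext's index type object, and it assumes PyType_Ready() has already filled in tp_alloc:)

static void allocation_paths_sketch(void)
{
	/* Python-level instantiation (e.g. parsers.index(...)) goes through
	 * the default tp_new, which calls type->tp_alloc and zero-fills the
	 * whole object, so fields like self->nt start out as NULL. */
	indexObject *a = (indexObject *)indexType.tp_alloc(&indexType, 0);

	/* PyObject_New() (apparently the path parse_index2() takes, given
	 * the crash described above) only initializes the PyObject header;
	 * every other field keeps whatever garbage was in the allocation. */
	indexObject *b = PyObject_New(indexObject, &indexType);

	(void)a;
	(void)b; /* cleanup omitted; this only illustrates the difference */
}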
Thanks for the pointer. I had not noticed that comment even though I had spent quite a lot of time in this code. And I'm sorry you had to do the troubleshooting :(
I think I've fixed it by making nt_init() set self->nodes = NULL; early. I tried to define a tp_new function, but it turned out that PyObject_New doesn't call it and the workaround that Stack Overflow suggested didn't seem worth it (use PyObject_CallObject instead).
I won't have access to a computer for a week now, just so you know in case this patch needs more work.
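(For what it's worth, the PyObject_CallObject() workaround amounts to instantiating the type by calling it, so tp_new, tp_alloc, and tp_init all run and the object is zero-filled before tp_init ever sees it. A sketch only, with nodetreeType as a placeholder for the nodetree's PyTypeObject:)

static PyObject *make_nodetree(indexObject *index, unsigned capacity)
{
	PyObject *args = Py_BuildValue("(OI)", (PyObject *)index, capacity);
	PyObject *nt;
	if (args == NULL)
		return NULL;
	/* Calling the type object goes through tp_new -> tp_alloc -> tp_init,
	 * unlike PyObject_New(), which bypasses all three. */
	nt = PyObject_CallObject((PyObject *)&nodetreeType, args);
	Py_DECREF(args);
	return nt;
}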
static int nt_init(nodetree *self, indexObject *index, unsigned capacity)
{
+ /* Initialize before argument-checking to avoid nt_dealloc() crash. */
+ self->nodes = NULL;
This comment seems a bit confusing since another argument-checking is done
before nt_init_py().
self->index = index;
+ Py_INCREF(index);
/* The input capacity is in terms of revisions, while the field is in
 * terms of nodetree nodes. */
self->capacity = (capacity < 4 ? 4 : capacity / 2);
@@ -1083,6 +1088,15 @@
return 0;
}
+static int nt_init_py(nodetree *self, PyObject *args)
+{
+ PyObject *index;
+ unsigned capacity;
+ if (!PyArg_ParseTuple(args, "O!I", &index, &capacity))
+ return -1;
+ return nt_init(self, (indexObject*)index, capacity);
+}
static int nt_init(nodetree *self, indexObject *index, unsigned capacity)
{
+ /* Initialize before argument-checking to avoid nt_dealloc() crash. */
+ self->nodes = NULL;
self->index = index;
+ Py_INCREF(index);
While thinking about a pure nodetree, I noticed this creates a situation similar
to a reference cycle between two C objects, index and index->nt. The index can't
be freed until index->nt gets deleted.
Perhaps the easiest way around is to convert an internal nodetree back to
a plain C struct.
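(To picture the cycle being described: the index owns a reference to its nodetree through index->nt, and with the Py_INCREF above the nodetree now owns a reference back to the index.)

/* Ownership sketch of the cycle described above:
 *
 *   indexObject --(self->nt)--------------> nodetree object
 *        ^                                        |
 *        +-----------(Py_INCREF(index))----------+
 *
 * Neither object (presumably) implements tp_traverse/tp_clear, so once
 * both references exist the pair can only leak. */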
Good point.
Perhaps the easiest way around is to convert an internal nodetree back to
a plain C struct.
You mean to embed that plain C struct in the parsers.index Python type and in the parsers.nodetree type as you suggested earlier? Yes, that's probably our best option. I'll start working on that in a bit.
> Perhaps the easiest way around is to convert an internal nodetree back to
> a plain C struct.

You mean to embed that plain C struct in the `parsers.index` Python type and in the `parsers.nodetree` type as you suggested earlier? Yes, that's probably our best option. I'll start working on that in a bit.
Yes, that's one way, and I think it's the simplest option.
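(Concretely, that would mean keeping nodetree as a plain C struct with no PyObject_HEAD, embedding one by value in indexObject, and adding a thin nodetreeObject wrapper for the Python-level type. A sketch of the layout; the field names and the forward declaration are assumptions, not the final code:)

/* forward declaration so the plain struct can point back at its index */
typedef struct indexObjectStruct indexObject;

typedef struct {
	indexObject *index;  /* plain pointer, no refcounting at this level */
	void *nodes;         /* radix-tree storage (element type elided) */
	unsigned capacity;
} nodetree;

typedef struct {
	PyObject_HEAD        /* thin Python wrapper around the plain struct; */
	nodetree nt;         /* this is where Py_INCREF(index) would live    */
} nodetreeObject;

struct indexObjectStruct {
	PyObject_HEAD
	/* ... existing index fields ... */
	nodetree nt;         /* embedded by value: it cannot outlive the
	                      * index, so no reference cycle can form */
};

The wrapper object would take the strong reference to the index, since a Python-level nodetree can outlive the code that created it, while the copy embedded in the index needs none because it is freed together with its owner.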