This is an archive of the discontinued Mercurial Phabricator instance.

index: make node tree a Python object
ClosedPublic

Authored by martinvonz on Aug 5 2018, 3:58 AM.

Details

Summary

Diff Detail

Repository
rHG Mercurial
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

martinvonz created this revision.Aug 5 2018, 3:58 AM
martinvonz updated this revision to Diff 9924.Aug 5 2018, 4:03 AM
martinvonz updated this revision to Diff 9983.Aug 6 2018, 9:24 AM
martinvonz updated this revision to Diff 10042.Aug 7 2018, 2:27 AM
yuja added a subscriber: yuja.Aug 7 2018, 10:03 AM

Can you bump the cext version as this patch introduces new API?

+static int nt_init_py(nodetree *self, PyObject *args)
+{
+ PyObject *index;
+ unsigned capacity;
+ if (!PyArg_ParseTuple(args, "OI", &index, &capacity))
+ return -1;

It leaves self->nodes uninitialized on error, and nt_dealloc() would fail if
self->nodes wasn't luckily 0. Strictly speaking, it's too late to initialize
pointers in tp_init because init() may be called more than once, but our
C types don't handle such cases.

+ return nt_init(self, (indexObject*)index, capacity);

We'll probably need INCREF/DECREF business for the index object.

I didn't review the other refcounting thingy carefully. Since it's painful
to do refcounting right, an internal nodetree could be embedded in the
indexObject, and a thin PyObject wrapper could be added. Just an idea.

struct nodetree {
};

struct nodetreeObject {
    PyObject_HEAD
    nodetree nt;
};

struct indexObject {
    ...
    nodetree nt;
    ...
};
yuja added a comment.Aug 7 2018, 7:05 PM

+ PyObject *index;
+ unsigned capacity;
+ if (!PyArg_ParseTuple(args, "OI", &index, &capacity))
+ return -1;
+ return nt_init(self, (indexObject*)index, capacity);

One more thing, we'll probably need to enforce the index type, by "O!".

And a lookup of 0xxx... hash might not work well due to the nullid entry.
I recalled it while I was making a breakfast. I might be wrong.

In D4118#64261, @yuja wrote:

Can you bump the cext version as this patch introduces new API?

Done. Thanks for the reminder.

+static int nt_init_py(nodetree *self, PyObject *args)
+{
+ PyObject *index;
+ unsigned capacity;
+ if (!PyArg_ParseTuple(args, "OI", &index, &capacity))
+ return -1;

It leaves self->nodes uninitialized on error, and nt_dealloc() would fail if
self->nodes wasn't luckily 0. Strictly speaking, it's too late to initialize
pointers in tp_init because init() may be called more than once, but our
C types don't handle such cases.

I think the default tp_new clears the memory of the object, so it won't be uninitialized (maybe that's what you mean by "luckily", but I first thought your meant it would require luck for it to be set to 0). The only reason I think that might be the case is that we haven't seen any failures because of uninitialized memory in index->nt before (which used to be index->nodes). So I don't think we need to do anything about this. Let me know if you think we should change something.

+ return nt_init(self, (indexObject*)index, capacity);

We'll probably need INCREF/DECREF business for the index object.

Yes, I had meant to do that but then forgot, so thanks for reminding me. Done.

I didn't review the other refcounting thingy carefully. Since it's painful
to do refcounting right, an internal nodetree could be embedded in the
indexObject, and a thin PyObject wrapper could be added. Just an idea.

Do you mean the refcounting of the nodetree? Assuming that the previous calls to free() (which I changed to PyMem_Free()) were correct, shouldn't it still be correct after replacing those by nt_dealloc()?

martinvonz updated this revision to Diff 10082.Aug 8 2018, 8:09 PM
yuja added a comment.Aug 9 2018, 8:42 AM
I think the default tp_new clears the memory of the object, so it won't be uninitialized (maybe that's what you mean by "luckily", but I first thought your meant it would require luck for it to be set to 0).

Interesting. tp_alloc is documented to initialize memory to zeros.

https://docs.python.org/2/c-api/typeobj.html#c.PyTypeObject.tp_alloc

So e57c532c3835 was a moot?

The only reason I think that might be the case is that we haven't seen any failures because of uninitialized memory in index->nt before (which used to be index->nodes).

self->nt was initialized very first at index_init().

> I didn't review the other refcounting thingy carefully. Since it's painful
>  to do refcounting right, an internal nodetree could be embedded in the
>  indexObject, and a thin PyObject wrapper could be added. Just an idea.
Do you mean the refcounting of the nodetree? Assuming that the previous calls to `free()` (which I changed to `PyMem_Free()`) were correct, shouldn't it still be correct after replacing those by `nt_dealloc()`?

Refcounting of self->nt. I just meant I didn't review the code carefully.

yuja added a comment.Aug 9 2018, 8:43 AM

+static int nt_init_py(nodetree *self, PyObject *args)
+{
+ PyObject *index;
+ unsigned capacity;
+ if (!PyArg_ParseTuple(args, "OI", &index, &capacity))

"O!I" to make sure index is an index object.

yuja added a comment.Aug 9 2018, 9:24 AM
>   I think the default tp_new clears the memory of the object, so it won't be uninitialized (maybe that's what you mean by "luckily", but I first thought your meant it would require luck for it to be set to 0).
Interesting. tp_alloc is documented to initialize memory to zeros.
https://docs.python.org/2/c-api/typeobj.html#c.PyTypeObject.tp_alloc
So https://phab.mercurial-scm.org/rHGe57c532c3835f6b244f21815cafcce0df1d272ce was a moot?

Nah, it crashed. Perhaps, tp_new and tp_alloc wouldn't be called by
PyObject_New(), which is said that "fields not defined by the Python
object header are not initialized."

https://docs.python.org/2/c-api/allocation.html#c.PyObject_New

In e57c532c3835^, parsers.parse_index2(0, 0) crashed but parsers.index()
didn't if I fixed Py_DECREF(self->data) in index_dealloc().

In D4118#64505, @yuja wrote:
>   I think the default tp_new clears the memory of the object, so it won't be uninitialized (maybe that's what you mean by "luckily", but I first thought your meant it would require luck for it to be set to 0).
Interesting. tp_alloc is documented to initialize memory to zeros.
https://docs.python.org/2/c-api/typeobj.html#c.PyTypeObject.tp_alloc
So https://phab.mercurial-scm.org/rHGe57c532c3835f6b244f21815cafcce0df1d272ce was a moot?

Nah, it crashed. Perhaps, tp_new and tp_alloc wouldn't be called by
PyObject_New(), which is said that "fields not defined by the Python
object header are not initialized."
https://docs.python.org/2/c-api/allocation.html#c.PyObject_New
In e57c532c3835^, parsers.parse_index2(0, 0) crashed but parsers.index()
didn't if I fixed Py_DECREF(self->data) in index_dealloc().

Thanks for the pointer. I had not noticed that comment even though I had spent quite a lot of time in this code. And I'm sorry you had to do the troubleshooting :(

I think I've fixed it by making nt_init() set self->nodes = NULL; early. I tried to define a tp_new function, but it turned out that PyObject_New doesn't call it and the workaround that Stack Overflow suggested didn't seem worth it (use PyObject_CallObject instead).

I won't have access to a computer for a week now, just so you know in case this patch needs more work.

martinvonz updated this revision to Diff 10456.Aug 20 2018, 3:35 AM
yuja added a comment.Aug 20 2018, 6:51 PM

static int nt_init(nodetree *self, indexObject *index, unsigned capacity)
{
+ /* Initialize before argument-checking to avoid nt_dealloc() crash. */
+ self->nodes = NULL;

This comment seems a bit confusing since another argument-checking is done
before nt_init_py().

	self->index = index;

+ Py_INCREF(index);

	/* The input capacity is in terms of revisions, while the field is in
	 * terms of nodetree nodes. */
	self->capacity = (capacity < 4 ? 4 : capacity / 2);

@@ -1083,6 +1088,15 @@

	return 0;

}
+static int nt_init_py(nodetree *self, PyObject *args)
+{
+ PyObject *index;
+ unsigned capacity;
+ if (!PyArg_ParseTuple(args, "O!I", &index, &capacity))
+ return -1;
+ return nt_init(self, (indexObject*)index, capacity);
+}

This revision was automatically updated to reflect the committed changes.
yuja added a comment.Aug 22 2018, 8:12 AM

static int nt_init(nodetree *self, indexObject *index, unsigned capacity)
{
+ /* Initialize before argument-checking to avoid nt_dealloc() crash. */
+ self->nodes = NULL;
+

	self->index = index;

+ Py_INCREF(index);

While thinking about a pure nodetree, I noticed this makes a similar situation
to reference cycle between two C objects, index and index->nt. The index can't
be freed until index->nt gets deleted.

Perhaps the easiest way around is to convert an internal nodetree back to
a plain C struct.

In D4118#66834, @yuja wrote:

static int nt_init(nodetree *self, indexObject *index, unsigned capacity)
{
+ /* Initialize before argument-checking to avoid nt_dealloc() crash. */
+ self->nodes = NULL;
+

	self->index = index;

+ Py_INCREF(index);

While thinking about a pure nodetree, I noticed this makes a similar situation
to reference cycle between two C objects, index and index->nt. The index can't
be freed until index->nt gets deleted.

Good point.

Perhaps the easiest way around is to convert an internal nodetree back to
a plain C struct.

You mean to embed that plain C struct in the parsers.index Python type and in the parsers.nodetree type as you suggested earlier? Yes, that's probably our best option. I'll start working on that in a bit.

yuja added a comment.Aug 23 2018, 9:10 AM
> Perhaps the easiest way around is to convert an internal nodetree back to
>  a plain C struct.
You mean to embed that plain C struct in the `parsers.index` Python type and in the `parsers.nodetree` type as you suggested earlier? Yes, that's probably our best option. I'll start working on that in a bit.

Yes, that's one way, and I think is the simplest option.