Details
- Reviewers: None
- Group Reviewers: hg-reviewers
- Commits: rHGb85b377e7fc2: index: make node tree a Python object

Diff Detail
- Repository: rHG Mercurial
- Lint: Automatic diff as part of commit; lint not applicable.
- Unit: Automatic diff as part of commit; unit tests not applicable.

Event Timeline
Can you bump the cext version as this patch introduces new API?
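(For reference, the cext API version is a module-level constant in mercurial/cext/parsers.c that mercurial/policy.py compares against its expected value at import time, so the bump is a one-line change along these lines; the numbers below are purely illustrative, not the real values:)

-static const int version = 6;
+static const int version = 7;

The constant is exposed to Python via PyModule_AddIntConstant(m, "version", version), which is what policy.py reads.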
+static int nt_init_py(nodetree *self, PyObject *args)
+{
+ PyObject *index;
+ unsigned capacity;
+ if (!PyArg_ParseTuple(args, "OI", &index, &capacity))
+ return -1;
It leaves self->nodes uninitialized on error, and nt_dealloc() would fail if
self->nodes wasn't luckily 0. Strictly speaking, it's too late to initialize
pointers in tp_init because init() may be called more than once, but our
C types don't handle such cases.
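(One way to make that safe, and roughly what a later revision of the patch does, is to clear the pointer before any check that can fail, so nt_dealloc()'s PyMem_Free(self->nodes) only ever sees NULL or a real allocation. A minimal sketch of the idea; the function names follow the patch, but the bodies are simplified assumptions, not the verbatim code:)

static int nt_init(nodetree *self, indexObject *index, unsigned capacity)
{
	/* Cleared first: if anything below fails, nt_dealloc() will just do
	 * PyMem_Free(NULL), which is a harmless no-op. */
	self->nodes = NULL;
	if (capacity == 0)
		return -1; /* hypothetical failing argument check */
	self->nodes = PyMem_Malloc(capacity * sizeof(*self->nodes));
	if (self->nodes == NULL) {
		PyErr_NoMemory();
		return -1;
	}
	/* ... record the index, set self->capacity, etc. ... */
	return 0;
}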
+ return nt_init(self, (indexObject*)index, capacity);
We'll probably need INCREF/DECREF business for the index object.
I didn't review the other refcounting thingy carefully. Since it's painful
to do refcounting right, an internal nodetree could be embedded in the
indexObject, and a thin PyObject wrapper could be added. Just an idea.
struct nodetree { };
struct nodetreeObject { PyObject_HEAD nodetree nt; };
struct indexObject { ... nodetree nt; ... };
+ PyObject *index;
+ unsigned capacity;
+ if (!PyArg_ParseTuple(args, "OI", &index, &capacity))
+ return -1;
+ return nt_init(self, (indexObject*)index, capacity);
One more thing: we'll probably need to enforce the index type, by using "O!".
And a lookup of a 0xxx... hash might not work well due to the nullid entry.
I recalled it while I was making breakfast. I might be wrong.
Done. Thanks for the reminder.
+static int nt_init_py(nodetree *self, PyObject *args)
+{
+ PyObject *index;
+ unsigned capacity;
+ if (!PyArg_ParseTuple(args, "OI", &index, &capacity))
+ return -1;
It leaves self->nodes uninitialized on error, and nt_dealloc() would fail if
self->nodes wasn't luckily 0. Strictly speaking, it's too late to initialize
pointers in tp_init because init() may be called more than once, but our
C types don't handle such cases.
I think the default tp_new clears the memory of the object, so it won't be uninitialized (maybe that's what you mean by "luckily", but I first thought you meant it would require luck for it to be set to 0). The only reason I think that might be the case is that we haven't seen any failures because of uninitialized memory in index->nt before (which used to be index->nodes). So I don't think we need to do anything about this. Let me know if you think we should change something.
+ return nt_init(self, (indexObject*)index, capacity);
We'll probably need INCREF/DECREF business for the index object.
Yes, I had meant to do that but then forgot, so thanks for reminding me. Done.
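(The other half of that "business" lives in the destructor: the reference taken when the nodetree stores the index has to be dropped again when the nodetree goes away. A minimal sketch, assuming the nodetree keeps the index in self->index and was allocated with PyObject_New(); the exact teardown in the patch may differ:)

static void nt_dealloc(nodetree *self)
{
	Py_XDECREF(self->index); /* release the reference taken in nt_init() */
	self->index = NULL;
	PyMem_Free(self->nodes); /* PyMem_Free(NULL) is a no-op */
	self->nodes = NULL;
	PyObject_Del(self);
}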
I didn't review the other refcounting thingy carefully. Since it's painful
to do refcounting right, an internal nodetree could be embedded in the
indexObject, and a thin PyObject wrapper could be added. Just an idea.
Do you mean the refcounting of the nodetree? Assuming that the previous calls to free() (which I changed to PyMem_Free()) were correct, shouldn't it still be correct after replacing those by nt_dealloc()?
I think the default tp_new clears the memory of the object, so it won't be uninitialized (maybe that's what you mean by "luckily", but I first thought you meant it would require luck for it to be set to 0).
Interesting. tp_alloc is documented to initialize memory to zeros.
https://docs.python.org/2/c-api/typeobj.html#c.PyTypeObject.tp_alloc
So e57c532c3835 was moot?
The only reason I think that might be the case is that we haven't seen any failures because of uninitialized memory in index->nt before (which used to be index->nodes).
self->nt was initialized right at the start of index_init().
> I didn't review the other refcounting thingy carefully. Since it's painful
> to do refcounting right, an internal nodetree could be embedded in the
> indexObject, and a thin PyObject wrapper could be added. Just an idea.

Do you mean the refcounting of the nodetree? Assuming that the previous calls to `free()` (which I changed to `PyMem_Free()`) were correct, shouldn't it still be correct after replacing those by `nt_dealloc()`?
Refcounting of self->nt. I just meant I didn't review the code carefully.
+static int nt_init_py(nodetree *self, PyObject *args)
+{
+ PyObject *index;
+ unsigned capacity;
+ if (!PyArg_ParseTuple(args, "OI", &index, &capacity))
"O!I" to make sure index is an index object.
> I think the default tp_new clears the memory of the object, so it won't be uninitialized (maybe that's what you mean by "luckily", but I first thought you meant it would require luck for it to be set to 0).

Interesting. tp_alloc is documented to initialize memory to zeros.
https://docs.python.org/2/c-api/typeobj.html#c.PyTypeObject.tp_alloc
So https://phab.mercurial-scm.org/rHGe57c532c3835f6b244f21815cafcce0df1d272ce was moot?
Nah, it crashed. Perhaps tp_new and tp_alloc aren't called by
PyObject_New(), whose documentation says that "fields not defined by the Python
object header are not initialized."
https://docs.python.org/2/c-api/allocation.html#c.PyObject_New
In e57c532c3835^, parsers.parse_index2(0, 0) crashed but parsers.index()
didn't, once I fixed the Py_DECREF(self->data) in index_dealloc().
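(That matches the documentation: type->tp_alloc, which the default tp_new uses, zero-fills the object, while PyObject_New() only fills in the object header. A sketch of the two paths; indexType stands in for the cext's index type object, and it assumes PyType_Ready() has already filled in tp_alloc:)

static void allocation_paths_sketch(void)
{
	/* Python-level instantiation (e.g. parsers.index(...)) goes through
	 * the default tp_new, which calls type->tp_alloc and zero-fills the
	 * whole object, so fields like self->nt start out as NULL. */
	indexObject *a = (indexObject *)indexType.tp_alloc(&indexType, 0);

	/* PyObject_New() (apparently the path parse_index2() takes, given
	 * the crash described above) only initializes the PyObject header;
	 * every other field keeps whatever garbage was in the allocation. */
	indexObject *b = PyObject_New(indexObject, &indexType);

	(void)a;
	(void)b; /* cleanup omitted; this only illustrates the difference */
}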
Thanks for the pointer. I had not noticed that comment even though I had spent quite a lot of time in this code. And I'm sorry you had to do the troubleshooting :(
I think I've fixed it by making nt_init() set self->nodes = NULL; early. I tried to define a tp_new function, but it turned out that PyObject_New doesn't call it and the workaround that Stack Overflow suggested didn't seem worth it (use PyObject_CallObject instead).
I won't have access to a computer for a week now, just so you know in case this patch needs more work.
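(For what it's worth, the PyObject_CallObject() workaround amounts to instantiating the type by calling it, so tp_new, tp_alloc, and tp_init all run and the object is zero-filled before tp_init ever sees it. A sketch only, with nodetreeType as a placeholder for the nodetree's PyTypeObject:)

static PyObject *make_nodetree(indexObject *index, unsigned capacity)
{
	PyObject *args = Py_BuildValue("(OI)", (PyObject *)index, capacity);
	PyObject *nt;
	if (args == NULL)
		return NULL;
	/* Calling the type object goes through tp_new -> tp_alloc -> tp_init,
	 * unlike PyObject_New(), which bypasses all three. */
	nt = PyObject_CallObject((PyObject *)&nodetreeType, args);
	Py_DECREF(args);
	return nt;
}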
static int nt_init(nodetree *self, indexObject *index, unsigned capacity)
{
+ /* Initialize before argument-checking to avoid nt_dealloc() crash. */
+ self->nodes = NULL;
This comment seems a bit confusing since another argument-checking is done
before nt_init_py().
self->index = index;
+ Py_INCREF(index);
/* The input capacity is in terms of revisions, while the field is in
 * terms of nodetree nodes. */
self->capacity = (capacity < 4 ? 4 : capacity / 2);
@@ -1083,6 +1088,15 @@
return 0;
}
+static int nt_init_py(nodetree *self, PyObject *args)
+{
+ PyObject *index;
+ unsigned capacity;
+ if (!PyArg_ParseTuple(args, "O!I", &index, &capacity))
+ return -1;
+ return nt_init(self, (indexObject*)index, capacity);
+}
static int nt_init(nodetree *self, indexObject *index, unsigned capacity)
{
+ /* Initialize before argument-checking to avoid nt_dealloc() crash. */
+ self->nodes = NULL;
self->index = index;
+ Py_INCREF(index);
While thinking about a pure nodetree, I noticed this creates a situation similar
to a reference cycle between two C objects, index and index->nt. The index can't
be freed until index->nt gets deleted.
Perhaps the easiest way around is to convert an internal nodetree back to
a plain C struct.
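(To picture the cycle being described: the index owns a reference to its nodetree through index->nt, and with the Py_INCREF above the nodetree now owns a reference back to the index.)

/* Ownership sketch of the cycle described above:
 *
 *   indexObject --(self->nt)--------------> nodetree object
 *        ^                                        |
 *        +-----------(Py_INCREF(index))----------+
 *
 * Neither object (presumably) implements tp_traverse/tp_clear, so once
 * both references exist the pair can only leak. */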
Good point.
Perhaps the easiest way around is to convert an internal nodetree back to
a plain C struct.
You mean to embed that plain C struct in the parsers.index Python type and in the parsers.nodetree type as you suggested earlier? Yes, that's probably our best option. I'll start working on that in a bit.
> Perhaps the easiest way around is to convert an internal nodetree back to
> a plain C struct.

You mean to embed that plain C struct in the `parsers.index` Python type and in the `parsers.nodetree` type as you suggested earlier? Yes, that's probably our best option. I'll start working on that in a bit.
Yes, that's one way, and I think it's the simplest option.
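(Concretely, that would mean keeping nodetree as a plain C struct with no PyObject_HEAD, embedding one by value in indexObject, and adding a thin nodetreeObject wrapper for the Python-level type. A sketch of the layout; the field names and the forward declaration are assumptions, not the final code:)

/* forward declaration so the plain struct can point back at its index */
typedef struct indexObjectStruct indexObject;

typedef struct {
	indexObject *index;  /* plain pointer, no refcounting at this level */
	void *nodes;         /* radix-tree storage (element type elided) */
	unsigned capacity;
} nodetree;

typedef struct {
	PyObject_HEAD        /* thin Python wrapper around the plain struct; */
	nodetree nt;         /* this is where Py_INCREF(index) would live    */
} nodetreeObject;

struct indexObjectStruct {
	PyObject_HEAD
	/* ... existing index fields ... */
	nodetree nt;         /* embedded by value: it cannot outlive the
	                      * index, so no reference cycle can form */
};

The wrapper object would take the strong reference to the index, since a Python-level nodetree can outlive the code that created it, while the copy embedded in the index needs none because it is freed together with its owner.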