Now uniondatapackstore can also hold python data stores. PythonDataStore
wrapper simply passes function calls to underlying python objects and marshals
the output.
Details
- Reviewers
durham simonfar ryanmce - Group Reviewers
Restricted Project - Commits
- rFBHGX2e4f9ab6b9a0: cstore: uniondatapackstore support for python stores
- Added test case
- Tested on fbsource with treemanifest.usecunionstore=True
Diff Detail
- Repository
- rFBHGX Facebook Mercurial Extensions
- Branch
- pythonstore (bookmark) on default (branch)
- Lint
Lint OK - Unit
Unit Tests OK
Event Timeline
I don't think any of the current tests add a python store to the cunionstore. Could we get some tests to ensure this is working? cstore-uniondatapackstore.py is a unit test that makes it easy to write a new test for this.
Reviewed half of this. Will look at the rest later.
cstore/py-datapackstore.h | ||
---|---|---|
31 | Not quite alphabetical | |
225 | This requires that the caller create a pythondatastore type instead of passing in the actual python store (like a DatapackStore) they made right? Why not allow them to pass in an arbitrary store and wrap it in a PythonDataStore in here? |
cstore/py-pythondatastore.h | ||
---|---|---|
24 ↗ | (On Diff #1612) | Why void*? Couldn't it be PyObject* since this is always going to be called from python code and therefore all objects are PyObject's? |
26 ↗ | (On Diff #1612) | When you do this, you are telling the PythonObj that it now own's this PyObject, and that it should call decrement when it goes out of scope. In this case, it would mean it decrements the store ref count when this function is over, which isn't what we want. I think if you change the _store to be a PythonObj from the beginning, then you never have to cast it to anything, and the ref count will just be handled for you. |
40 ↗ | (On Diff #1612) | PyIter_Next returns a new reference, which means we need to decrement the ref count when we're done with it. So I'd change tuples type to be PythonObj so the ref count is handled for you. You'll then have to convert it to PyObject* to pass to PyArg_ParseTuple below, but that's ok. |
46 ↗ | (On Diff #1612) | When using s or z without # it means it can't contain a null character. It looks like you are using: s# for filename, filesize |
52 ↗ | (On Diff #1612) | If PyArg_ParseTuple fails, we need to raise the exception. PyArg_ParseTuple will set the python exception flag when it fails, so you just need to throw a pyexception (defined in our pythonutil.h), and then you can have a catch in the top level python-to-c interface function that catches pyexception and returns the appropriate error code to python. Search for pyexception in py-datapackstore.h to see how it's done elsewhere. |
55 ↗ | (On Diff #1612) | DeltaChainLink doesn't take ownership of the strings you pass in right? If these strings are temporarily created for the duration of the iterator, then when tuple goes out of scope and is freed, I think these pointers will become invalid? So we may need to either A) copy the strings when they are passed in to DeltaChainLink(), or B) attach all the returned tuple objects to PyDeltaChain as well, so they have the same lifetime as their associated DeltaChainLink objects. Option B would be more efficient |
62 ↗ | (On Diff #1612) | Same void* issue as above |
cstore/py-structs.h | ||
30 | The existence of this means we'll have python code constructing these. I think we want the PythonDataStore concept to be internal to the C++ implementation, and actual python code should never construct it. | |
cstore/pythondatastore.cpp | ||
22 | Why isn't the implementation of getdeltachani, getdeltachainraw, and getmessing just put inside this class? Instead of calling out to a global function? | |
45 | Why copy this? | |
cstore/pythondatastore.h | ||
21 | If this is a python object, this should be of type PythonObj, so all the ref counting is handled for you. |
cstore/pythondatastore.cpp | ||
---|---|---|
22 | I wanted to separate python specific from non python specific code because of 'Eden related stuff' as you told me | |
45 | Because key is a reference to const Key. In which case Single has to accept pointer to const Key in cosntructor and also return pointer to const Key. And we cannot simply pass a const pointer to Single class because it extends KeyIterator. So the simples solution was to make a copy. | |
cstore/pythondatastore.h | ||
21 | Yes, it should be PythonObj. However, I tried to separate python specific code from non python specific code and decided that PythonDataStore could be a wrapper which is not python specific (ironically) so that we dont iclude Python.h (bc it is needed only for PythonObj here) |
cstore/py-pythondatastore.h | ||
---|---|---|
24 ↗ | (On Diff #1612) | I think including python.h in PythonDataStore will be ok since PythonDataStore will only ever be included by a top level python-c file. |
cstore/pythondatastore.cpp | ||
22 | Since this is specific to the PythonDataStore, which will only ever be linked into Python specific implementations, I think it's ok to include python stuff here. | |
45 | You could probably use const_cast or static_cast or something to cast it to non-const at compile time, since we know the Key isn't being modified. Alternatively, it might make sense to change KeyIterator to return (const key)*, but that's probably more code churn that necessary right now. I'd just do the cast, since copying Key's requires allocations that we'd rather avoid. | |
cstore/pythondatastore.h | ||
21 | I think since this whole class is specific to python usage of the data store, it's ok for it to #include python.h. That won't prevent us from linking the other code into non-python implementations (like Eden) since they wouldn't be using pythondatastore.h anyway. |
cstore/py-pythondatastore.h | ||
---|---|---|
55 ↗ | (On Diff #1612) | To be honest, I am not sure about option B, even though I like it more. As I am not completely sure what PyArg_ParseTuple does, but my guess is that it simply allocates memory for filename, node ... arrays and puts the contents of the tuple in there. Which means that we will need to copy these arrays because on the next iteration of the cycle PyArg_ParseTuple can (potentially) override existing arrays, because there is no memory management for the filename, node.. allocated variables. |
cstore/py-datapackstore.h | ||
---|---|---|
225 | Tbh, now I am a little bit confused about how we deal with memory management of PythonDataStores. | |
225 | I essentially followed the pattern of py_datapackstore, because my previous implementation was ugly(in uniondatapackstore destructor I iterated over the vector of stores and explicitly freed the memory of pythondatastores only). Again, If I push_back pointers to PythonDataStore into stores vector then memory management has to be done by uniondatapackstore. Or if we make stores the vector of shared_ptr then the memory management of cdatapacks will be performed by uniondatapackstores which is not what we want. So I am a little bit confused about this part. |
Moved py-pythondatastore functions to pythondatastore.cpp where they belong.
Addressed memory management of PythonDataStores and memory management of
DeltaChainLink's when passed from python.
This diff does not include unit tests for PythonDataStore's yet.
Looking good! I think one more round will do it. The only real blocker now is getting the python.h references out of the core deltachain files
cstore/deltachain.cpp | ||
---|---|---|
10 | Why does this import python.h and pythonutil.h? I don't see it used in the file, and this would prevent Eden from linking in non-python deltachain code. | |
cstore/deltachain.h | ||
163 | Hmm, since PyDeltaChain is specific to python implementations, it should probably be moved out to another file. Then you could import the python util headers and avoid the forward declaration here. | |
163 | I'd name this _pythonChainLinks, so it refers to the semantic use instead of the object type. | |
cstore/py-datapackstore.h | ||
227 | I'd use make_shared here. It saves you one allocation (the shared_pointer ref count and the PythonDataStore are allocated together), and avoiding 'new' helps ensure maintainability of memory safety (i.e. if someone later added lines after this new and before the shared_ptr constructor, an exception could cause a memory leak). Then for the stores.push_back below you can ask the shared_ptr for the raw ptr | |
cstore/pythondatastore.cpp | ||
73 | We should probably throw a real exception. Probably pyexception if PyList_Append sets the exception bit for us. |
There is a test failure. I am not sure that fastdatapacks have to be in the uniondatapackstore.
tests/cstore-uniondatapackstore.py | ||
---|---|---|
160 | + ERROR: testGetDeltaChainMultiPackPyStore (__main__.uniondatapackstoretests) |
Code looks good. Need to fix up some test stuff then we're good to go.
tests/cstore-uniondatapackstore.py | ||
---|---|---|
31 | Instead of copying the tests like this, I think you can define a new class and just override createpack: class uniondatastorepythonpacktests(uniondatapackstoretests): def createPack(self, *args, **kwargs): fastpack = super(uniondatapackstorepythonpacktests).createPack(*args, **kwargs) return datapack(fastpack.path) # you will need to import datapack from remotefilelog.datapack Then it will run all the same tests, but with python packs instead of fastdatapacks. Technically a fastdatapack is python store, and it just happens to wrap a cdatapack | |
160 | I think this is because UnionDeltaChainIterator::getNextChain tries to combine chains from multiple stores at once, and it's not handling the case where one store doesn't have the Key at all. It looks like the expected error path is for getDeltaChainRaw to return a DeltaChain with a code() value that is not GET_DELTA_CHAIN_OK. I think we need to change PythonDataStore::getDeltaChainRaw to catch KeyError exceptions when it calls PythonObj list = _store.callmethod("getdeltachain", pyKey); and turn those into return values. |
tests/cstore-uniondatapackstore.py | ||
---|---|---|
31 | I dont think we can do that. Because in the first case uniondatapackstore accepts datapackstore(packdir), but when testing datapacks we want it to accepts multiple datapack(packpath) i.e. not packdir |
Updated tests and got rid of a bug. However, been stuck with anothoer for quite a while now.
cstore/pythondatastore.cpp | ||
---|---|---|
40 | Took me quite a while to figure out that we need to clear python exception, otherwise after entering once again we will exit immediately | |
112–114 | I have been stuck with this thing for quite a while now. it->next() is giving me segmentation fault. Is this because variable 'it' has a pointer to KeyIterator but KeyIterator is an abstract class and doesnt have implementation of next? I.e. at runtime it is not known that it is actually a pointer to PythonKeyIterator, right? |
cstore/py-datapackstore.h | ||
---|---|---|
346–347 | for debugging purposes only |
cstore/py-datapackstore.h | ||
---|---|---|
348 | I'd expand this to the full 3 lines, with braces. Generally we've avoided single line if's like this because they are error prone during later refactors. Also, why change away from the normal while loop pattern? | |
cstore/pythondatastore.cpp | ||
38 | It'd be nice if we could verify that the current exception is a KeyError before we eat it. | |
42 | If the store doesn't contain the key, we need to return an error code. UnionDeltaChainIterator.getNextChain expects it to return a chain->status() != GET_DELTA_CHAIN_OK if there's an error. Returning an empty chain just causes the union getNextChain to return the empty chain and stop searching other stores. | |
113 | Was this a debug line? | |
tests/cstore-uniondatapackstore.py | ||
60–61 | We should probably rename this function to createPackStore instead of createPack, since it doesn't actually always return a pack. |
cstore/pythondatastore.cpp | ||
---|---|---|
42 | chain->status() of the empty delta chain returns GET_DELTA_CHAIN_NOT_FOUND. |
cstore/pythondatastore.cpp | ||
---|---|---|
113 | Yes. It is a debug line. |
cstore/pythondatastore.cpp | ||
---|---|---|
42 | Eh, that seems pretty subtle. But ok for now I guess. |
cstore/py-datapackstore.h | ||
---|---|---|
220–234 | I'd find this easier to read as a switch block: switch(is_cdatapack) { case 1: // Do things break; case 0: // Do things break; default: // Error return -1; } |
Hmm, since PyDeltaChain is specific to python implementations, it should probably be moved out to another file. Then you could import the python util headers and avoid the forward declaration here.