This is an archive of the discontinued Mercurial Phabricator instance.

RFC cstore: gerenic uniondatapackstore
AbandonedPublic

Authored by ms2316 on Aug 23 2017, 9:18 AM.
Tags
None
Subscribers

Details

Reviewers
simonfar
durham
Group Reviewers
Restricted Project
Summary

Implementation of uniondatapacktore that can hold both cdatapacks and python
packs.

Test Plan

This is not a working version and the diff is for my own use

Diff Detail

Repository
rFBHGX Facebook Mercurial Extensions
Branch
genericstore (bookmark) on default (branch)
Lint
Lint ErrorsExcuse: RFC
SeverityLocationCodeMessage
Errorcstore/py-pythondatastore.h:1S&RXCheckCode
Errorcstore/pythondatastore.cpp:1S&RXCheckCode
Errorcstore/pythondatastore.h:1S&RXCheckCode
Unit
Unit Test Errors

Event Timeline

ms2316 created this revision.Aug 23 2017, 9:18 AM
Herald added a reviewer: Restricted Project. · View Herald TranscriptAug 23 2017, 9:18 AM

This diff is not for review!

simonfar requested changes to this revision.Aug 23 2017, 9:21 AM
simonfar added a subscriber: simonfar.

To push it back into your queue

This revision now requires changes to proceed.Aug 23 2017, 9:21 AM
ms2316 edited edge metadata.Aug 23 2017, 10:38 AM
ms2316 updated this revision to Diff 1199.

Fixed bugs. Implemented abstraction around delta chains and delta chain links.

ms2316 added inline comments.Aug 23 2017, 5:35 PM
cstore/datastore.h
58

The methods will be changed later to cope with other types of DeltaChain (i.e. deltachain from python objects)

durham added a subscriber: durham.Aug 23 2017, 8:00 PM
durham added inline comments.
cstore/datapackstore.cpp
106–107

Instead of this pattern, you could have getDeltaChainRaw return a shared_ptr<DeltaChain>, then have the freedeltachain call happen in the DeltaChain destructor. That way it's automatically freed whenever it goes out of scope (i.e. here, or when the above DeltaChainIterator is destructed)

194

We should almost never use new. Instead use make_shared and return a shared pointer, so the memory ownership is not ambiguous. For instance, if we return a raw pointer like this, it's not clear to the caller if it is their responsibility to destruct the object or not.

ms2316 marked an inline comment as done.Aug 24 2017, 9:31 AM
ms2316 updated this revision to Diff 1244.

Changed native pointers to shared pointers, addressed memory management issues

ms2316 marked an inline comment as done.Aug 24 2017, 9:31 AM
ms2316 updated this revision to Diff 1261.Aug 24 2017, 5:42 PM

This is not a working version. Added PythonDataStore to wrap around python stores. At this point pythondatastore.cpp was not added to the makefile.

ms2316 updated this revision to Diff 1262.Aug 24 2017, 5:50 PM

Removed some redundant code

durham requested changes to this revision.Aug 24 2017, 8:30 PM

I'm really liking how all the nasty delta_* C structs are getting removed from the code!

Before landing this, we'll want to make sure it also compiles on OSX. We've had some issues with gcc vs. clang differences in the past.

I think this diff is getting big enough that we may want to break it in two. First migrate the existing code to use your new Delta* and DataStore* classes, then a second diff implements the Python version of those classes.

cstore/datapackstore.h
40

Why change this to be a shared_ptr? I think the lifetime of this object is controlled by this class, so it's just handing a reference to the caller.

cstore/datastore.h
29

Maybe this can be something like:

DeltaChainLink(delta_chain_link_t *ptr) :
    _filename1(link->filename),
    _node(link->node),
    ...

So we could get rid of link completely, and we wouldn't need any if (link) return link->... else return _foo code?

56

Is isnull the normal name for this for a c++ iterator? From an iterator perspective, isfinished() or isdone() or something would read nicer in the for loops below.

75

Why the two constructors? This second one in particular seems funky. This constructor does a shallow copy, but I assume freedeltachain does a deep delete, which I could imagine could cause double-free's.

cstore/py-datapackstore.h
113

Why shared_ptr this? Key objects are pretty small and I don't think we're going to have derived classes form them, so I think we can pass them around by value.

251

What is the lifetime of this PythonDataStore? The stores vector gets passed to the UnionDatapackStore constructor, but I don't think that class assumes ownership. The non-python stores that are passed in have their lifetime maintained by their python wrapper (py_datapackstore), so when UnionDatapackStore is deallocated, it releases it's substore reference, which allows the py_datapackstore python wrappers to be deallocated, which then deallocates the native store.

In your case, the store itself will be deallocated that way, but this PythonDataStore will leak I think.

cstore/py-pythondatastore.h
28

Why is this in the .h instead of in the c++ where it's used? Do we expect this to be used by other c++ files? Same for the other functions in this file.

This revision now requires changes to proceed.Aug 24 2017, 8:30 PM

Thanks for the comment. I will break this code into two diffs. But I have some questions that I left below.

cstore/datapackstore.h
40

Well. The idea behind this is the following: DatapackStoreKeyIterator and PythonDataStoreKeyIterator both inherit from KeyIterator. However, DatapackStoreKeyIterator and PythonDataStoreKeyIterator have different implementations.

  • That is, DatapackStoreKeyIterator simply has a reference to a store and therefore can return only pointers to the keys, because memory management is actually done by DatapackStore.
  • On the other hand PythonDataStoreKeyIterator is simply a wrapper around a vector of Keys and on next() method it removes the next key from the vector, puts that Key on heap(or we can in the first place put the keys on the heap) and returns the shared_ptr to that key so that the key is available while we have pointer to it and is destroyed automatically.

Ahhh, after writing this message I actually noticed that PythonKeyIterator (which is already implemented) does approximately the same what I need(does memory management for current key) and therefore I will hopefully copy that design pattern.

cstore/datastore.h
29

Yeah. I will do that definitely. However, why are there two filenames passed in a 5tuple from python getdeltachain method? This is in remotefilelog/datapack.py:: datapack.getdeltachain()

56

Well, actually DeltaChainLink is not an iterator. DeltaChainIterator is the actual iterator, which returns DeltaChainLinks. And if there are not any links left it simply returns a nullLink. Therefore I have that isnull method to check if the link is null, i.e. last. I decided to name it that way but I can of course rename it.

75

Yes, I thought I am doing deep copy, bet since the delta_chain_t contains pointers to objects then probably not. Now this constructor doesn't break anything, because the only use of it is on lines 130 and 151 of datapackstore.cpp. There the freedeltachain is not called, therefore the memory is not freed and the ownership is implicitly passed to this DeltaChain object. But I guess I need to change it because it's not a good practice, right?

cstore/py-datapackstore.h
251

Yes, Agree.

cstore/py-pythondatastore.h
28

This .h file is included by pythondatastore.cpp. Shouldnt it be .h?

durham added inline comments.Aug 25 2017, 7:26 PM
cstore/datapackstore.h
40

The only reason's I can think of for returning a shared_ptr are:

  • if we need to return instances of a derived type of Key
  • if we want the receiver to also manage the lifetime of the Key instance

In this case, we aren't making derived versions of the Key class, and generally the receiver won't need to own this type, and if it does need to, a Key instance is pretty cheap to copy. So yea, the PythonKeyIterator pattern will be cheaper because you avoid all the allocations and ref counting.

cstore/datastore.h
29

In theory the deltabasenode could be a node from a completely different file. In practice it's always from the same file for now though. Maybe we should rename filename1 and filename2 to filename and deltabasefilename

75

Hmm. Maybe just add comments explaining that this class takes ownership of the chain, regardless of if you pass it by pointed or by value. The interaction between C code and C++ is always a bit funky like this.

cstore/py-pythondatastore.h
28

.h files are usually used to declare function and structure definitions that need to be reused between C++ files. In this case, since they're only used on the pythondatastore.cpp file, you could just declare the functions directly in there (even though they aren't part of the PythonDataStore class).

ms2316 abandoned this revision.Sep 7 2017, 11:48 AM

D631 replaces this diff