The previous solution was incomplete. This solution logs once per run, with two
separate metrics (filestore_ and treestore_), each logging the number of packs
and bytes. I also did some refactoring.
Details
- Reviewers
singhsrb durham - Group Reviewers
Restricted Project - Commits
- rFBHGX15e22474bcec: packs: improve packfile metrics
Diff Detail
- Repository
- rFBHGX Facebook Mercurial Extensions
- Lint
Lint Skipped - Unit
Unit Tests Skipped
Event Timeline
remotefilelog/basepack.py | ||
---|---|---|
92–109 | *more |
Thanks for doing this refactoring and saving developer time in the future :). I just have a few suggestions/comments.
remotefilelog/basepack.py | ||
---|---|---|
61 | I don't see what the general convention is but I would prefer to see for filepath, _mtime, _size in self._getavailablepackfilessorted(): | |
105 | I removed the id from ids at this point in my refactor. I did not confirm if it actually helps or not but maybe something worth considering. | |
121 | size -> _size | |
136 | file->_file, mtime->_mtime | |
143 | hmm, number of packs in self.packs can be different from ones that gettotalsize perceives because self.packs may not have the most recent packs.. |
remotefilelog/basepack.py | ||
---|---|---|
61 | Why prefix them? |
remotefilelog/basepack.py | ||
---|---|---|
61 | We talked offline -- it's to mark it as unused. I like the idea marking unused variables, but _size looks too much to me like an instance variable (and the next person might not rename it). Since _ is used by the localization framework, I humbly propose __. It doesn't look _too_ weird, although it's not been done before. | |
105 | My hunch is the combined time of removing the items is greater than the time saved in subsequent lookups -- but I could be wrong. It's probably still a good idea to reduce memory usage. | |
143 | This is a great point, the two could be inconsistent. |
remotefilelog/basepack.py | ||
---|---|---|
61 | Does it make more sense to actually get the tuple and extract stuff out of it instead of expanding in the loop in this case? |
remotefilelog/basepack.py | ||
---|---|---|
61 | Would that be to eliminate the unused variables? |
remotefilelog/basepack.py | ||
---|---|---|
143 | I ended up combining the calculation of both into a gettotalsizeandcount (ick) to ensutre consistency. |
Looks good to me mostly. But I don't have much context here. So, I will be more comfortable with another pair of eyes on this!
remotefilelog/basepack.py | ||
---|---|---|
61 | I was thinking of something like: for pack in self._getavailablepackfilessorted(): filepath = pack.filepath . . | |
143 | That's what I was mainly concerned about. Thanks for taking care of this! | |
remotefilelog/shallowutil.py | ||
98 | I am not sure sum is the right word here mainly because of its use in the string case. Perhaps, add or aggregate is better. |
remotefilelog/basepack.py | ||
---|---|---|
61 | I misunderstood the discussion last week (I thought it was choosing between _name vs __). But if it's between _name and name, I don't think it has to be that strict as long as pyflakes does not complain. The unprefixed version will causes leas code churn if the variables are used in a later patch, which makes the blame output tidier. I remembered that the Mercurial community strongly preferred less code churn when mpm was still there - for example, there was an AST transformer to convert 'str' to b'str' for Py3 compatibility, instead of doing a codemod. |
The 'count is not incremented' comment is the only important thing. All my other comments are nits or suggestions.
remotefilelog/basepack.py | ||
---|---|---|
61 | It looks like the 2nd and 3rd parameter from _getavailablepackfilessorted is never used. Can we just not return them? | |
105 | If we're going to remove it, I'd add a comment explaining why. I don't think we really need that optimization, since most of the time this loop will only be over 10's of items. | |
139 | Count is not incremented | |
remotefilelog/shallowrepo.py | ||
80–81 | I'd drop the underscore and have the helper function add it. So the caller doesn't have to be aware of formatting. I'd also not use *[] in this case and just pass the list as a list, since that's how it's consumed later. I know we use the * pattern in other places in remotefilelog, but I've come to regret it. If you did want to use *stores in the function, then you'd just pass , 'filestore_', packcontentstore, packmetadatastore) here. Same for sumdicts, where we pass in *metrics, and receive out *dicts. Just drop the * | |
remotefilelog/shallowutil.py | ||
106 | The default of 0 here means it won't work with anything other than numbers. >>> from collections import defaultdict >>> foo = defaultdict(lambda: 0) >>> foo['bar'] += ['a', 'b', 'c'] Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: unsupported operand type(s) for +=: 'int' and 'list' |
remotefilelog/basepack.py | ||
---|---|---|
139 | Well definitely needed another pair of eyes :). |
remotefilelog/basepack.py | ||
---|---|---|
61 | I think it's more confusing if we don't, because that function claims to otherwise be identical to _getavailablepackfilessorted(). | |
139 | whoops, thanks | |
remotefilelog/shallowutil.py | ||
106 | hm, derp. I'll just drop the strings array comment for now. We could always set the default conditionally based on the types of the values later on, if we really want that use case. |
remotefilelog/basepack.py | ||
---|---|---|
61 | *_getavailablepackfiles() |
I don't see what the general convention is but I would prefer to see