The cleanup method right now does not clean up all the directories that are
empty. This leads to a situation where we can have a large number of empty
directories for the next repack which is unnecessary. This commit removes all
the empty directories in the cache directory during the clean up.
Details
Details
- Reviewers
quark - Group Reviewers
Restricted Project - Commits
- rFBHGX545ee92bfc45: basestore: remove empty directories recursively during clean up
- Ran all the tests.
- Removed about 220k empty directories from the local cache. Running time of hg repack --incremental went from 3-4s every time to around 5s for the first time (because of deleting the empty directories) and ~200 milliseconds every subsequent time. Note that there was nothing to repack during the repack operation.
Diff Detail
Diff Detail
- Repository
- rFBHGX Facebook Mercurial Extensions
- Lint
Automatic diff as part of commit; lint not applicable. - Unit
Automatic diff as part of commit; unit tests not applicable.
Event Timeline
remotefilelog/repack.py | ||
---|---|---|
687–694 ↗ | (On Diff #3986) | isdir is a syscall. Instead of testing if root is a directory or not first (which is also subject to file system race condition - someone could replace that directory with a file really quickly), just call rmdir and handle its exception. |
remotefilelog/repack.py | ||
---|---|---|
687–694 ↗ | (On Diff #3986) | Nice observation! I have fixed that. |
Comment Actions
It still feels weird that both cleanup and _removeemptydirectories have os.listdir. Could it be recursive? Besides, I think there is no need to test true/false before calling rmdir since filesystem is racy.
osutil = mercurial.policy.importmod('osutil') def tryrmdir(path): try: os.rmdir(path) except OSError: pass def removeemptydirs(path): # osutil.listdir returns stat information which saves some rmdir/listdir syscalls. for name, mode in osutil.listdir(path): if stat.S_ISDIR(mode): removeemptydirs(os.path.join(path, name)) tryrmdir(path)
Comment Actions
@quark I think I have addressed most of your concerns. Also, the earlier code was potentially trying to remove empty directories from the whole cache. So, I have changed it a bit to only care for the repository in consideration.