Page MenuHomePhabricator

basestore: remove empty directories recursively during clean up
ClosedPublic

Authored by singhsrb on Nov 29 2017, 7:11 PM.

Details

Summary

The cleanup method right now does not clean up all the directories that are
empty. This leads to a situation where we can have a large number of empty
directories for the next repack which is unnecessary. This commit removes all
the empty directories in the cache directory during the clean up.

Test Plan
  • Ran all the tests.
  • Removed about 220k empty directories from the local cache. Running time of hg repack --incremental went from 3-4s every time to around 5s for the first time (because of deleting the empty directories) and ~200 milliseconds every subsequent time. Note that there was nothing to repack during the repack operation.

Diff Detail

Repository
rFBHGX Facebook Mercurial Extensions
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

singhsrb created this revision.Nov 29 2017, 7:11 PM
Herald added a reviewer: Restricted Project. · View Herald TranscriptNov 29 2017, 7:11 PM
singhsrb edited the summary of this revision. (Show Details)Nov 29 2017, 7:15 PM
singhsrb edited the test plan for this revision. (Show Details)Nov 29 2017, 7:25 PM
singhsrb edited the test plan for this revision. (Show Details)Nov 29 2017, 7:28 PM
quark requested changes to this revision.Nov 29 2017, 8:33 PM
quark added a subscriber: quark.
quark added inline comments.
remotefilelog/repack.py
687–694 ↗(On Diff #3986)

isdir is a syscall. Instead of testing if root is a directory or not first (which is also subject to file system race condition - someone could replace that directory with a file really quickly), just call rmdir and handle its exception.

This revision now requires changes to proceed.Nov 29 2017, 8:33 PM
singhsrb updated this revision to Diff 3994.Nov 29 2017, 8:54 PM
singhsrb marked an inline comment as done.Nov 29 2017, 8:54 PM
singhsrb added inline comments.
remotefilelog/repack.py
687–694 ↗(On Diff #3986)

Nice observation! I have fixed that.

quark added a comment.EditedNov 29 2017, 9:27 PM

It still feels weird that both cleanup and _removeemptydirectories have os.listdir. Could it be recursive? Besides, I think there is no need to test true/false before calling rmdir since filesystem is racy.

osutil = mercurial.policy.importmod('osutil')

def tryrmdir(path):
    try:
        os.rmdir(path)
    except OSError:
        pass

def removeemptydirs(path):
    # osutil.listdir returns stat information which saves some rmdir/listdir syscalls.
    for name, mode in osutil.listdir(path):
        if stat.S_ISDIR(mode):
            removeemptydirs(os.path.join(path, name))
    tryrmdir(path)
singhsrb marked an inline comment as done.Nov 30 2017, 1:57 PM
singhsrb retitled this revision from repack: remove empty directories recursively during clean up to basestore: remove empty directories recursively during clean up.
singhsrb updated this revision to Diff 4014.

@quark I think I have addressed most of your concerns. Also, the earlier code was potentially trying to remove empty directories from the whole cache. So, I have changed it a bit to only care for the repository in consideration.

quark accepted this revision.Nov 30 2017, 6:20 PM
This revision is now accepted and ready to land.Nov 30 2017, 6:20 PM
This revision was automatically updated to reflect the committed changes.