This is an archive of the discontinued Mercurial Phabricator instance.

basestore: remove empty directories recursively during clean up
ClosedPublic

Authored by singhsrb on Nov 29 2017, 7:11 PM.
Tags
None
Subscribers

Details

Summary

The cleanup method right now does not clean up all the directories that are
empty. This leads to a situation where we can have a large number of empty
directories for the next repack which is unnecessary. This commit removes all
the empty directories in the cache directory during the clean up.

Test Plan
  • Ran all the tests.
  • Removed about 220k empty directories from the local cache. Running time of hg repack --incremental went from 3-4s every time to around 5s for the first time (because of deleting the empty directories) and ~200 milliseconds every subsequent time. Note that there was nothing to repack during the repack operation.

Diff Detail

Repository
rFBHGX Facebook Mercurial Extensions
Lint
Lint Skipped
Unit
Unit Tests Skipped

Event Timeline

singhsrb created this revision.Nov 29 2017, 7:11 PM
Herald added a reviewer: Restricted Project. · View Herald TranscriptNov 29 2017, 7:11 PM
singhsrb edited the summary of this revision. (Show Details)Nov 29 2017, 7:15 PM
singhsrb edited the test plan for this revision. (Show Details)Nov 29 2017, 7:25 PM
singhsrb edited the test plan for this revision. (Show Details)Nov 29 2017, 7:28 PM
quark requested changes to this revision.Nov 29 2017, 8:33 PM
quark added a subscriber: quark.
quark added inline comments.
remotefilelog/repack.py
687–694

isdir is a syscall. Instead of testing if root is a directory or not first (which is also subject to file system race condition - someone could replace that directory with a file really quickly), just call rmdir and handle its exception.

This revision now requires changes to proceed.Nov 29 2017, 8:33 PM
singhsrb updated this revision to Diff 3994.Nov 29 2017, 8:54 PM
singhsrb marked an inline comment as done.Nov 29 2017, 8:54 PM
singhsrb added inline comments.
remotefilelog/repack.py
687–694

Nice observation! I have fixed that.

quark added a comment.EditedNov 29 2017, 9:27 PM

It still feels weird that both cleanup and _removeemptydirectories have os.listdir. Could it be recursive? Besides, I think there is no need to test true/false before calling rmdir since filesystem is racy.

osutil = mercurial.policy.importmod('osutil')

def tryrmdir(path):
    try:
        os.rmdir(path)
    except OSError:
        pass

def removeemptydirs(path):
    # osutil.listdir returns stat information which saves some rmdir/listdir syscalls.
    for name, mode in osutil.listdir(path):
        if stat.S_ISDIR(mode):
            removeemptydirs(os.path.join(path, name))
    tryrmdir(path)
singhsrb marked an inline comment as done.Nov 30 2017, 1:57 PM
singhsrb retitled this revision from repack: remove empty directories recursively during clean up to basestore: remove empty directories recursively during clean up.
singhsrb updated this revision to Diff 4014.

@quark I think I have addressed most of your concerns. Also, the earlier code was potentially trying to remove empty directories from the whole cache. So, I have changed it a bit to only care for the repository in consideration.

quark accepted this revision.Nov 30 2017, 6:20 PM
This revision is now accepted and ready to land.Nov 30 2017, 6:20 PM
This revision was automatically updated to reflect the committed changes.