This is an archive of the discontinued Mercurial Phabricator instance.

win32: use fewer system calls for unlink()
AbandonedPublic

Authored by durin42 on Aug 31 2017, 6:32 PM.

Details

Reviewers
quark
indygreg
Group Reviewers
hg-reviewers
Summary

unlink() on Windows is complicated by the fact that Windows doesn't
support removing an open file. Our unlink wrapper currently goes
through a dance where it generates a random filename, renames the
file, then attempts to unlink.

This is a lot of extra work for what is likely the special case of
attempting to unlink a file that is open or can't be simply unlinked.

In this commit, we refactor the code to use fewer system calls in
the common cases of 1) unlink just works 2) file doesn't exist.

For the file doesn't exist case (before)

  1. stat() to determine if path is directory
  2. generate random filename
  3. rename()

after:

  1. stat() to get path info

For the file not open case (before)

  1. stat() to determine if path is directory
  2. generate random filename
  3. rename()
  4. unlink()

after:

  1. stat()
  2. unlink()

For the file open case (before)

  1. stat() to determine if path is directory
  2. generate random filename
  3. rename()
  4. unlink()

after:

  1. stat()
  2. unlink()
  3. generate random filename
  4. rename()
  5. unlink()

There is also a scenario where the unlink fails due to the file being
marked as read-only. In this case we also introduce an extra unlink()
call. However, I reckon the common case is the file isn't marked as
read-only and skipping the random generation and rename is worth it.

I /think/ this change makes bulk file writing on Windows faster. This
is because vfs.call calls unlink() when a file is opened for
writing. When writing a few hundred files files as part of a clone or
working directory update, the extra system calls can matter.
win32.unlink() did show up in performance profiles, which is what
caused me to look at this code. But I/O performance is hard to measure
and I'm not sure if the ~30s off of ~620s for a stream clone unbundle
on the Firefox repo is indicative of real world performance. I do know
the new code uses fewer system calls and shouldn't be slower.

Diff Detail

Repository
rHG Mercurial
Lint
Lint Skipped
Unit
Unit Tests Skipped

Event Timeline

indygreg created this revision.Aug 31 2017, 6:32 PM

On my Windows desktop at home, this reduced hg debugapplystreamclonebundle for mozilla-unified from ~185s to ~160s. So that's 2 machines showing a wall time reduction for this operation.

quark accepted this revision.Sep 1 2017, 1:55 PM
quark added a subscriber: quark.

LGTM. Out of curious, which profiler did you use?

durin42 added a subscriber: durin42.Sep 1 2017, 2:12 PM

From the list:

On Sep 1, 2017, at 02:16, Adrian Buehlmann <adrian@cadifra.com> wrote:

Ugh. Can I reply to a phabricator notification by email?
Adding gregory.szorc@gmail.com manually, as I'm not sure replaying to
those nasty phabricator emails is going to work...

On 2017-09-01 00:32, indygreg (Gregory Szorc) wrote:
indygreg created this revision.
Herald added a subscriber: mercurial-devel.
Herald added a reviewer: hg-reviewers.

[...]

  • if os.path.isdir(f):
  • # use EPERM because it is POSIX prescribed value, even though
  • # unlink(2) on directories returns EISDIR on Linux
  • raise IOError(errno.EPERM,
  • "Unlinking directory not permitted: '%s'" % f)

+ # If the path doesn't exist, raise that exception.
+ # If it is a directory, emulate POSIX behavior.
+ try:
+ st = os.stat(f)
+ if stat.S_ISDIR(st.st_mode):
+ # use EPERM because it is POSIX prescribed value, even though
+ # unlink(2) on directories returns EISDIR on Linux
+ raise IOError(errno.EPERM,
+ "Unlinking directory not permitted: '%s'" % f)
+ except OSError as e:
+ if e.errno == errno.ENOENT:
+ raise
+
+ # In the common case, a normal unlink will work. Try that first and fall
+ # back to more complexity if and only if we need to.
+ try:
+ os.unlink(f)
+ return
+ except (IOError, OSError) as e:
+ pass

  1. POSIX allows to unlink and rename open files. Windows has serious
  2. problems with doing that:

Do you get an error at all, if a file, which is in open state, is unlinked?
My fear is: You won't get an error, but instead the filename is blocked
by the file being held in place by the other process, until the other
process closes it. Which means: You already lost the game.
Which would explain why we didn't do things like you propose here.
See also

https://www.mercurial-scm.org/wiki/UnlinkingFilesOnWindows

(specifically, heading 2)

quark requested changes to this revision.EditedSep 1 2017, 7:19 PM

I can confirm os.unlink may succeed without actually removing the file:

>>> from mercurial import util
>>> f=util.posixfile('c:\\users\\quark\\a.txt', 'w')
>>> os.unlink('c:\\users\\quark\\a.txt') # success, but file is still there
>>> f.close() # removes the file

There is a proposed boost.afio that seems to take a same approach: https://boostgsoc13.github.io/boost.afio/doc/html/afio/FAQ/deleting_open_files.html

mercurial/win32.py
542

This succeeded, but the file still exists and cannot be renamed.

This revision now requires changes to proceed.Sep 1 2017, 7:19 PM
abuehl added a subscriber: abuehl.Sep 2 2017, 2:49 AM
In D588#9925, @quark wrote:

I can confirm os.unlink may succeed without actually removing the file:

>>> from mercurial import util
>>> f=util.posixfile('c:\\users\\quark\\a.txt', 'w')
>>> os.unlink('c:\\users\\quark\\a.txt') # success, but file is still there
>>> f.close() # removes the file

There is a proposed boost.afio that seems to take a same approach: https://boostgsoc13.github.io/boost.afio/doc/html/afio/FAQ/deleting_open_files.html

(finally created a login to this phabricator madness.. argh).

Yep. You just redescovered what is already known.

The bad thing is, the "zombie file" (can be held open by another process, even by opening any hardlinked name to the same file) makes it impossible to create a new file under the same name after "unlinking" the previous one. For as long as the other process holds the "unlinked" file open.

Awesome pointer, BTW!

I'll quote the interesting part found on that link you posted:

AFIO works around these un-POSIX file semantics by taking a dual step to deleting files. Firstly, it renames the file to a 128 bit cryptographically strong random name prefixed by “.afiod” into as high up the directory hierarchy as it is able to, and only then does it request the deletion of the file. As AFIO always opens files with FILE_SHARE_DELETE permission enabled, with that flag Windows permits renaming and deletion, and because the name was changed to a very random name somewhere not in its origin directory before deletion, you don't see those unexpected and random errors when creating files with the same name as a recently deleted file as you do anywhere else on Windows. Because the file is probably not in its original containing directory any more, deletions of that directory will not fail with “directory not empty” as they otherwise normally would and indeed do in most Windows programs. Handy eh?

Very handy indeed and we already do something similar on Mercurial. We don't rename the "zombie file" to a different directory though, which sounds pretty clever. We could steal that trick from AFIO for Mercurial.

For manual tests: I usually opened the file with an other process to match the scenario we were facing back then when we introduced the "rename to a random name before unlinking" dance.

OK. I failed to grok the unlink semantics on Windows. I'll need to read up on MSDN.

Would it be safe to keep the `os.stat()` code and return if the file doesn't exist? That at least allows us to do the "is directory" and "file missing" check with a single system call. That will avoid the random number generation and the 2nd system call for the rename when the file doesn't exist.

abuehl added a comment.Sep 2 2017, 1:09 PM
In D588#9989, @indygreg wrote:

OK. I failed to grok the unlink semantics on Windows. I'll need to read up on MSDN.

I doubt this is documented on MSDN, but in case you find it there, please share the link (perhaps add it on the UnlinkingFilesOnWindows wiki page).

The code has pretty big comment right underneath where you inserted your change. The important part is this one:

# - Calling os.unlink on a file that has been opened with Mercurial's
#   posixfile (or comparable methods) will delay the actual deletion of
#   the file for as long as the file is held open. The filename is blocked
#   during that time and cannot be used for recreating a new file under
#   that same name ("zombie file"). Directories containing such zombie files
#   cannot be removed or moved.
abuehl added inline comments.Sep 3 2017, 3:00 AM
mercurial/win32.py
537

why continue on errors != ENOENT ?

In D588#9989, @indygreg wrote:

Would it be safe to keep the `os.stat()` code and return if the file doesn't exist? That at least allows us to do the "is directory" and "file missing" check with a single system call. That will avoid the random number generation and the 2nd system call for the rename when the file doesn't exist.

I guess something like

diff --git a/mercurial/win32.py b/mercurial/win32.py
--- a/mercurial/win32.py
+++ b/mercurial/win32.py
@@ -12,6 +12,7 @@
 import msvcrt
 import os
 import random
+import stat
 import subprocess

 from . import (
@@ -522,7 +523,8 @@
 def unlink(f):
     '''try to implement POSIX' unlink semantics on Windows'''

-    if os.path.isdir(f):
+    st = os.stat(f)
+    if stat.S_ISDIR(st.st_mode):
         # use EPERM because it is POSIX prescribed value, even though
         # unlink(2) on directories returns EISDIR on Linux
         raise IOError(errno.EPERM,

might work (not tested).

If f does not exist, the os.stat call will raise directly, which would spare us another system call in that case.

durin42 commandeered this revision.
durin42 abandoned this revision.

Let's reopen this if/when we return to it.