This is an archive of the discontinued Mercurial Phabricator instance.

filemerge: use arbitraryfilectx for backup files

Authored by phillco on Sep 11 2017, 12:03 AM.


Group Reviewers

With in-memory merge, backup files might be overlayworkingfilectxs stored
in memory. But they could also be real files if the user's backup directory is
outside the working dir.

Rather than have two code paths everywhere, let's use arbitraryfilectx so they
can be consistent.

Diff Detail

rHG Mercurial
Lint Skipped
Unit Tests Skipped

Event Timeline

phillco created this revision.Sep 11 2017, 12:03 AM
phillco updated this revision to Diff 1724.Sep 11 2017, 4:05 PM
martinvonz added inline comments.

Feel like checking out the "abc" module and see if that's helpful here?


Looks like either _customcmp should be part of abstractfilectx or checked for here (util.safehasattr())


The documentation for filecmp.cmp() says

Unless shallow is given and is false, files with identical os.stat() signatures are taken to be equal.

Are we losing out on that optimization with this patch (when not doing in-memory merge). Do you have any sense of how relevant that optimization is?

85 ↗(On Diff #1724)

Why did this change?

phillco added a subscriber: sid0.Sep 13 2017, 1:42 AM
phillco added inline comments.
85 ↗(On Diff #1724)

It's caused by the addition of this block into abstractfilectx.

if isinstance(fctx, abstractfilectx):
    return !=

When fctx is a workingfilectx, and has been replaced by a directory, calling data() on it raises an IOError instead of returning True like this function used to do.

Interestingly, the test behavior is caused by this error being caught by an existing block inside workingctx._checklookup():

except (IOError, OSError):
        # A file become inaccessible in between? Mark it as deleted,
        # matching dirstate behavior (issue5584).
        # The dirstate has more complex behavior around whether a
        # missing file matches a directory, etc, but we don't need to
        # bother with that: if f has made it to this point, we're sure
        # it's in the dirstate.

Which seems to somewhat predict the case of files being unavailable and raising raw IOErrors, although not this case specifically.

Also, @sid0's comment around the test case says:

XXX Note that this returns M for files that got replaced by directories. This is
definitely a bug, but the fix for that is hard and the next status run is fine

Which is in fact the case for e. So maybe continuing to throw is actually the better behavior here, given that it would also throw if fctx was missing so the idea isn't unprecedented.

On the other hand, just catching the error and returning True is the most conservative path forward, so I might just do that here.

phillco added inline comments.Sep 13 2017, 1:49 AM

Yes, and it's a bigger impact than past changes of this nature, since each merged file, and its backup file, will be read again. So I think we need some way to reintroduce this fast-path for two disk-backed files inside cmp.

I'd propose adding something like this to filectx:

def ondisk():
   """Returns True iff this filectx is directly backed by a file in the filesystem and not some other abstraction. If so, callers can run system file functions on it for better performance.

It'd be True only for workingfilectxs and abstractfilectxs.

Then, inside abstractfilectx.cmp(), check if both the caller and other are on-disk and use filecmp in that case.

It's a naive first take though, so improvements are appreciated.

martinvonz added inline comments.Sep 13 2017, 12:02 PM

Can we not simply make arbitraryfilectx.cmp() something like

def cmp(self, fctx):
    if isinstance(fctx, arbitraryfilectx):
        return filecmp.cmp(self.path(), fctx.path())
    return !=
phillco added inline comments.Sep 13 2017, 2:05 PM

fcd is a workingfilectx, so it'd need to need to be something like:

return filecmp.cmp(self.path(), self._repo.wjoin(fctx.path()))

and arbitraryfilectx doesn't have a _repo because we use it in contrib/simplemerge. Maybe we could make it an optional property and raise in this case if it's missing? contrib/simplemerge doesn't need it.

phillco updated this revision to Diff 1830.Sep 14 2017, 4:13 PM
phillco planned changes to this revision.Sep 15 2017, 3:38 PM

Putting back in my queue -- still have two comments of Martin's to respond to.

phillco updated this revision to Diff 1871.Sep 18 2017, 4:35 PM
phillco updated this revision to Diff 1874.Sep 18 2017, 4:40 PM

One extra comment: maybe include some "why" as well as "what" in your commit message. :)

durin42 removed a subscriber: durin42.Sep 20 2017, 12:05 PM
martinvonz added inline comments.Sep 21 2017, 12:45 AM

abstractfilectx doesn't seem referenced anywhere else, so just revert this part? Maybe you meant to make arbitraryfilectx extend it? That would make sense. If we do, I feel like we should implement the smart cmp() in this abstract base class to avoid the ugly asymmetry in the implementation otherwise (a.cmp(b) might be faster or slower than b.cmp(a) and one might perhaps even crash?).

Perhaps we should even make it a top-level function so it will be easier to override by extensions that want to add their own subclasses? Something like:

def filectxcmp(fctx1, fctx2):
class abstractfilectx(object):
    def cmp(self, otherfctx): # don't override in subclasses, wrap filefctxcmp() instead
        return filectxcmp(fctx1, fctx2)

Maybe there's precedence for this kind of thing elsewhere in Mercurial? Surely at least elsewhere in Python.

Or maybe I'm just overthinking this and we're pretty sure all call sites will pass it as arbitraryfilectx.cmp(filectx) and not the other way around, so it won't be a problem in practice. I'm not even sure I got that right (and I don't know where overlayfilectx fits in), which seems like a sign that it's best to have a single cmp() method.

phillco updated this revision to Diff 1989.Sep 21 2017, 6:35 PM
martinvonz added inline comments.Sep 22 2017, 2:04 PM

Isn't repo.wjoin(fcd.path()) the same thing as _workingpath(repo, fcd) inlined? You call the same thing further down, so why not keep the "a" variable (possibly renamed to something better)?


Why care whether it's in the working directory when doing in-memory merge? Why not instead always (when doing in-memory merge) either redirect the backup to memory or put it outside of the working directory? Is it so if a there's a conflict, it will get flushed to the place the user expects?

One extra comment: maybe include some "why" as well as "what" in your commit message. :)

Sorry I missed this, the comment is very valid. Will send a new version.


Yeah, not sure why I reverted to wjoin. I'll switch it back.


Is it so if a there's a conflict, it will get flushed to the place the user expects?

Yes, basically.

phillco edited the summary of this revision. (Show Details)Sep 26 2017, 9:38 AM
phillco updated this revision to Diff 2088.Sep 26 2017, 10:06 AM
martinvonz requested changes to this revision.Oct 3 2017, 5:08 PM
martinvonz added inline comments.

I mentioned on IRC the other day that this *could* be left for another patch and that I didn't expect it to be here after reading the commit message. I didn't insist that had to be done, but Phil liked the idea, so he said he'd do it. I just thought I'd point that out here too to get the status right on the dashboard.

This revision now requires changes to proceed.Oct 3 2017, 5:08 PM
phillco abandoned this revision.Oct 13 2017, 3:48 PM

I've split this into D1056, D1057, D1058, D1059, D1060, so abandoning this version.