remotefilelog: import pruned-down remotefilelog extension from hg-experimental
ClosedPublic

Authored by durin42 on Sep 27 2018, 4:10 PM.

Details

Summary

This is remotefilelog as of my recent patches for compatibility with
current tip of hg, minus support for old versions of Mercurial and
some FB-specific features like their treemanifest extension and
fetching linkrev data from a patched phabricator. The file extutil.py
moved from hgext3rd to remotefilelog.

This is not yet ready to be landed, consider it a preview for
now. Planned changes include:

  • replace lz4 with zstd
  • rename some capabilities, requirements and wireproto commands to mark them as experimental
  • consolidate bits of shallowutil with related functions (eg readfile)

I'm certainly open to other (small) changes, but my rough mission is
to land this largely as-is so we can use it as a model of the
functionality we need going forward for lazy-fetching of file contents
from a server.

  1. no-check-commit because of a few foo_bar functions

Diff Detail

Repository
rHG Mercurial
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.
durin42 created this revision.Sep 27 2018, 4:10 PM

My primary concern is migrating from hg-experimental remotefilelog to this. Making one of them compatible with the other at a wire protocol level would assist with the migration immensely, so I think that avoiding complex changes to the wire protocol would make sense. I think we should make changes to hg-experimental's remotefilelog to support any capabilities/command we implement here, so that the migration path is:

  1. upgrade client hg-experimental remotefilelog, it can now talk to a server using either hg-experimental or hgext/remotefilelog
  2. upgrade server - only supports hgext/remotefilelog protocol
  3. switch clients to hgext/remotefilelog

(This doesn't address any migration of on-disk data structures/configuration/caches)

The problem is that unless the two versions of the extension can play nicely together both being enabled in the repo at the same time, we can't just have a flag day where we change the wire protocol without a LOT of pain.

hgext/remotefilelog/remotefilelogserver.py
209

Per offline discussion, and I know this was in your TODOs in the commit description, I think we should do something like the following, but let's use the same format specifier/terminology that we're already using in other places in the wire protocol. This list is ordered in descending preference, in case that matters to the client?

x-getfile-formats=zstd,unc,gzip,bz2

The client then issues a command like, where this formats list is unordered; the client SHOULD NOT send items not in the capabilities, but the server MUST ignore any it doesn't understand (this allows rolling upgrades where the capabilities response may be from a different, newer server).

x-getfile path=foo/file.txt node=abc123 formats=unc,gzip

The server gets to decide what it wants to send the client: uncompressed or gzip.

Thoughts?

durin42 updated this revision to Diff 11467.Sep 28 2018, 10:31 AM
durin42 edited the summary of this revision. (Show Details)
durin42 updated this revision to Diff 11468.Sep 28 2018, 10:35 AM

So much code.

I would prefer we not vendor RFL because by the time we adapt it to work properly with recent wire protocol and storage refactorings, we'll hardly have any of the original code remaining. But, you've made your point that having it in the official repo will make it easier to keep running while we rewrite the world around it. And having the code readily available will give us easier insight into future areas where we need to teach core about shallow stores. I'm specifically thinking about all the filectx/filelog hacks, including the need for prefetching. These are problems that we'll need to solve in core pretty soon now.

I'd strongly prefer to remove some non-essential features from RFL so the code base is more manageable. This includes background prefetching and standalone cache server support. Those are both useful features don't get me wrong. But I don't see an obvious path for them into core at this time. Not having their code lingering around would make the RFL extension much easier to approach.

Anyway, I'm willing to rubber stamp a review. But I figure @durin42 will want to comment on my comments first.

hgext/remotefilelog/__init__.py
279–325

We probably want to rip out this narrow stream clone bit, since Pulkit is implementing this properly in core.

726–727

Maybe rename this to debuggc or something else?

1058

Add debug prefix?

1084

Add debug prefix?

hgext/remotefilelog/basepack.py
152

This should be using a vfs instance for I/O.

375–378

I thought there was an API on vfs instances for using temp files and doing an atomic rename on close?

hgext/remotefilelog/basestore.py
253–257

Oh hey - it is hg share done righter. But it should be doing this file write with a lock. If we were using the vfs layer, we'd get a devel warning about this.

327

This is assuming POSIX paths.

hgext/remotefilelog/cacheclient.py
3

We might want to nuke this direct-to-memcache functionality. At least as currently implemented. D4774 is a much more robust solution to this general problem and we could shoehorn memcache support into that.

hgext/remotefilelog/connectionpool.py
2

This is also functionality we should consider ripping out.

I mean it is potentially useful. But this type of thing should exist at the peer/ui level in core.

hgext/remotefilelog/contentstore.py
322

???

Is Facebook storing a flat and tree manifest in the same store?

This isn't functionality we'll want in core. At least not in the RFL extension. (The ability to store both versions of a manifest in the same repo is useful and I would like to create an extension to do that.)

hgext/remotefilelog/debugcommands.py
92–93

This looks to be a copy of hg debugindex from some point in time.

I had to split out revlog index dumping from hg debugindex a few weeks back in order to get things happy with the new file storage interface. RFL should probably define its own debug* command for dumping a pack's index.

142–143

This also looks like a copy.

hgext/remotefilelog/fileserverclient.py
2

This may be a candidate for removal, since we may not want the standalone cache process feature in core.

82

This can be nuked if removing the stream clone feature.

hgext/remotefilelog/lz4wrapper.py
10–14

This compatibility code isn't needed in core.

hgext/remotefilelog/remotefilectx.py
26

The amount of changes to filectx to get this to work scares me a little. I feel like these things need to be on the fast path to moving to core because some feel somewhat generic for all shallow storage backends.

81

This should use cl.revs().

hgext/remotefilelog/remotefilelog.py
44

We should declare that this implements repository.filestorage and hook things up to test-check-interfaces.py.

46

This is no longer part of the storage interface.

183

This is no longer part of the storage interface.

267–294

D4803 should allow us to use a more correct implementation.

hgext/remotefilelog/remotefilelogserver.py
205–207

Maybe we can drop this?

213–222

Eek. So just activating this extensions on the server can do bad things to linkrev resolution?

hgext/remotefilelog/shallowbundle.py
164–165

This is allowing any pattern type. We recently decided to limit the types of patterns to allow. This is potentially a security issue.

hgext/remotefilelog/shallowrepo.py
157–164

This should be done by wrapping localrepo.makefilestorage().

194–205

Do we want to rip out the background prefetching feature?

302–305

I feel like the narrow matcher already covers this functionality?

tests/remotefilelog-getflogheads.py
13–21

hg debugwireproto can be used for testing whatever uses this.

tests/remotefilelog-library.sh
20

I don't recall seeing any extension code for validating cg3 before using RFL. Should there be?

durin42 updated this revision to Diff 12176.Oct 16 2018, 3:17 PM

I need to give the final state of the code a final one-over. I will do that after breakfast.

One thing that is missing is a detailed docstring for the extension. That can be added as a follow-up. I think said docstring should discourage use of the extension and explicitly state that there is a possibility much of the functionality in the extension never stabilizes without major BC breakage. The sqlitestore extension may have some language worth borrowing.

durin42 updated this revision to Diff 12401.Nov 5 2018, 1:09 PM
durin42 updated this revision to Diff 12403.Nov 5 2018, 1:49 PM
durin42 updated this revision to Diff 12405.Nov 5 2018, 2:44 PM
This revision was automatically updated to reflect the committed changes.