This is an archive of the discontinued Mercurial Phabricator instance.

wireprotov2: define and implement "filedata" command
ClosedPublic

Authored by indygreg on Sep 5 2018, 12:20 PM.

Details

Summary

Continuing our trend of implementing *data commands for retrieving
information about specific repository data primitives, this commit
implements a command for retrieving data about an individual tracked
file.

The command is very similar to "manifestdata." The only significant
difference is that we have a standalone function for obtaining
storage for a tracked file. This is to provide a monkeypatch point
for extensions to implement path-based access control.

With this API available, wire protocol version 2 now exposes all
data primitives necessary to implement a full clone. Of course,
since "filedata" can only resolve data for a single path at a time,
clients would need to issue N commands to perform a full clone. On
the Firefox repository, this would be ~461k commands. We'll likely
need to implement a file data retrieval command that supports
multiple paths. But that can be implemented later.

Diff Detail

Repository
rHG Mercurial
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

indygreg created this revision.Sep 5 2018, 12:20 PM
durin42 accepted this revision.Sep 12 2018, 11:11 AM
durin42 added a subscriber: durin42.

In general I'm happy with where this is going, but I'm extremely uncomfortable with the manifest and filelog commands requiring so much client-server transfer to get everything, especially the rpc-per-file nature of this. It's elegant in terms of how easy it'll make remotefilelog et al, but it makes me worried about performance in the small-repo case where today we can just issue one big hoss RPC for everything and we get it all in one go.

This revision is now accepted and ready to land.Sep 12 2018, 11:11 AM

Oh, one more thing: I think this, as-stated, will make it at best challenging to have ACLed branches: we'd have to, on the server, walk the linknodes and see if the client was authorized to see any branch(es) that contain the node.

That's probably okayer for us than it was for git (since we have linkrevs at least), but it's still worth considering.

Oh, one more thing: I think this, as-stated, will make it at best challenging to have ACLed branches: we'd have to, on the server, walk the linknodes and see if the client was authorized to see any branch(es) that contain the node.
That's probably okayer for us than it was for git (since we have linkrevs at least), but it's still worth considering.

First, totally agree with you about wire protocol / command overhead. I plan to add bulk querying APIs later.

Also, I wasn't aware of the ACL'd branches requirement. I'll have to think of ways to handle that. It may involve the server advertising the available mechanisms for requesting file data (e.g. "request by file node" and "request by changeset node") and the server not allowing certain modes. But that example would undercut raw data access. So maybe it gets advertised as a server preference. Or maybe we end up not doing the "request by file node" path in the common case and the server overhead isn't a problem in practice.

I'll keep this use case in mind when writing future commits.

indygreg updated this revision to Diff 10969.Sep 12 2018, 1:11 PM
This revision was automatically updated to reflect the committed changes.