wireproto: support for pullbundles
Closed, Public

Authored by joerg.sonnenberger on Jan 12 2018, 12:32 PM.

Details

Summary

Pullbundles are similar to clonebundles, but served as normal inline
bundle streams. They are almost transparent to the client -- the only
visible effect is that the client might get fewer changes than it
asked for, i.e. not all requested head revisions are provided.

The client announces support for the necessary retries with the
partial-pull capability. After receiving a partial bundle, it updates
the set of revisions shared with the server and drops all now-known
heads from the request list. It will then rerun getbundle until
no changes are received or all remote heads are present.
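
The retry loop described above can be sketched, in heavily simplified form, as follows. This is an illustration only: history is modeled as linear integer revisions, and FakeServer/FakeClient and their methods are hypothetical stand-ins, not Mercurial APIs.

```python
# Illustrative simulation of the partial-pull retry loop. Revisions are
# integers 1..tip with linear history; the server serves pre-generated
# fixed-size "bundles", which may stop short of the requested heads.

class FakeServer:
    def __init__(self, tip, bundle_size):
        self.tip = tip
        self.bundle_size = bundle_size

    def getbundle(self, common, heads):
        # Serve the pre-generated bundle starting right after `common`,
        # even if it stops short of the requested heads (a partial reply).
        start = max(common, default=0) + 1
        end = min(start + self.bundle_size - 1, max(heads))
        return list(range(start, end + 1))

class FakeClient:
    def __init__(self):
        self.revs = set()

    def pull(self, server, heads):
        rounds = 0
        while not set(heads) <= self.revs:
            chunk = server.getbundle(self.revs, heads)
            if not chunk:
                break  # nothing new received; stop instead of looping forever
            self.revs.update(chunk)  # update the set of shared revisions
            rounds += 1
        return rounds

server = FakeServer(tip=100, bundle_size=30)
client = FakeClient()
rounds = client.pull(server, heads=[100])
print(rounds, len(client.revs))  # → 4 100
```

The termination conditions mirror the description above: the loop ends either when all requested heads are present locally or when a round yields no new changes.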

Extend badserverext to support per-socket limit, i.e. don't assume that
the same limits should be applied to all sockets.

Diff Detail

Repository
rHG Mercurial

For my test case (a bundle of all changes in the NetBSD repo before 2014, a yearly bundle for each year until 2018/1/1, and a normal pull for the rest), find_pullbundle needs less than 0.5s of CPU time in this iteration when it matches.
After the initial clone and with additional available changes, pull time is:

With pullbundles.manifest: 0.42s
Without pullbundles.manifest: 0.41s

i.e. the difference is at the noise level. Further benchmarks by others would be appreciated.

The biggest remaining question for me is whether we want to introduce a capability for the client to mark that it is willing to do multiple pull rounds.

indygreg requested changes to this revision. Jan 14 2018, 5:42 PM
indygreg added subscribers: durin42, indygreg.

This patch needs a lot of work. But I'm very supportive of the feature and the preliminary implementation!

I wish we could find a way to send multiple, inline, pre-generated bundles in one response. However, the existing design of compression within the bundle2 format doesn't easily facilitate this. We should think long and hard about whether to implement this feature as partial pull or extend bundle2 to allow us to do nice things and avoid the extra client-server roundtrips.

Speaking of compression, we probably need some more code around e.g. the mismatch between on-disk compression and wire compression. There is a potential for local bundles to be stored uncompressed and for the server to compress this data as part of streaming it out. That involves CPU. We still come out ahead because it avoids having to generate the bundle content from revlogs. But it isn't as ideal as streaming an already compressed bundle over the wire. We may want to open the bundle2 files and look at their compression info and make sure we don't do stupid things. It should be fast to parse the bundle2 header to obtain this knowledge.
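
Peeking at a bundle2 file's compression should indeed be cheap. A sketch of what that might look like, assuming the bundle2 stream layout (the "HG20" magic, a big-endian uint32 parameter-block length, then space-separated URL-quoted key=value stream parameters such as Compression=ZS); this is an illustration, not a replacement for the real mercurial.bundle2 parser:

```python
import io
import struct
import urllib.parse

def bundle2_compression(fh):
    # Read the bundle2 stream header and return the value of the
    # Compression stream parameter, or 'UN' if none is present
    # (bundle2 streams without the parameter are uncompressed).
    magic = fh.read(4)
    if magic != b'HG20':
        raise ValueError('not a bundle2 file: %r' % magic)
    (paramssize,) = struct.unpack('>i', fh.read(4))
    params = fh.read(paramssize).decode('utf-8')
    for field in (params.split(' ') if params else []):
        key, _, value = field.partition('=')
        if urllib.parse.unquote(key) == 'Compression':
            return urllib.parse.unquote(value)
    return 'UN'

# Demonstrate on a minimal fake header (not a complete bundle).
params = b'Compression=ZS'
fake = io.BytesIO(b'HG20' + struct.pack('>i', len(params)) + params)
print(bundle2_compression(fake))  # → ZS
```

Since only the fixed-size header and a short parameter block are read, this check costs a single small read per manifest entry.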

My biggest concern with the architecture of this feature is the multiple roundtrips. I really wish we could stream multiple bundles off disk to the wire with no decompression/compression involved. That would mean storing compressed bundles on disk. But this would require some additional bundle2 magic. The existing solution is simple and elegant. I do like that. I'd very much like to get the opinion of someone like @durin42 (who also likes designing protocols).

mercurial/wireproto.py
941

This function needs a docstring. And the expected behavior of this function needs to be documented in the docstring and/or inline as comments.

943

Using ":" as a delimiter for nodes doesn't feel right because ":" has meaning for revsets. I would use "," or ";".

That being said, it may make the manifest format slightly less human friendly because URI encoding may come into play. That should be fine though: we already escape values for BUNDLESPEC when using packed1 bundles.

945

I think there should be an if not manifest: return here to handle the common case.

955

We'll need documentation of the fields in the manifest somewhere. If we combine this feature with the clonebundles.py extension, that seems like the logical place to document things.
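
For illustration, a manifest following the clonebundles "URL KEY=VALUE ..." line convention might look like the sample below; the heads=/bases= attribute names, node values, and file names here are assumptions for the sketch, not the finalized format, and the parser mirrors the behavior discussed later where the URL field is synthesized from the first entry on each line.

```python
# Hypothetical pullbundles.manifest entries (attribute names are
# illustrative assumptions, not the finalized format).
SAMPLE = b"""\
pre-2014.hg BUNDLESPEC=zstd-v2 heads=1ca5cd9a6f27 bases=0000000000000000
2014.hg BUNDLESPEC=zstd-v2 heads=9f3c6d8a2b11 bases=1ca5cd9a6f27
"""

def parse_manifest(data):
    # One entry per line: the first field becomes the synthesized URL,
    # remaining fields are KEY=VALUE attributes.
    entries = []
    for line in data.splitlines():
        fields = line.split()
        if not fields:
            continue
        entry = {b'URL': fields[0]}
        for attr in fields[1:]:
            key, _, value = attr.partition(b'=')
            entry[key] = value
        entries.append(entry)
    return entries

entries = parse_manifest(SAMPLE)
print(len(entries), entries[0][b'BUNDLESPEC'].decode())  # → 2 zstd-v2
```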

961–969

I worry about performance issues here. You were making some noise about this on IRC. So it sounds like it is on your radar.

I'm just not sure how to make things more efficient. I just know that doing so many DAG tests feels like it will lead to performance problems for repos with hundreds of thousands or millions of changesets.

But we shouldn't let performance issues derail this feature. Pull bundles have the potential to offload tens of seconds of CPU time from the server. So even if DAG operations consume a few seconds of CPU, we come out ahead. It would be nice to get that down to 100's of milliseconds at most though. But this feels like the territory of follow-up work, possibly involving caching or more efficient stores of which revisions are in which bundles.

981–984

Using a URL field for a vfs path (which neither has nor handles protocol schemes, e.g. file:// IIRC) feels wrong. I think the entry in the manifest should be PATH.

1042

I'd consider enabling this feature by default. Checking for the presence of a pullbundles.manifest should be a cheap operation. Especially when you consider that serving a bundle will open likely dozens of revlog files.

Another idea to consider is to tie this feature into the clonebundles.py extension. We already have extensive documentation in that extension for offloading server load via pre-generated bundles. I view this feature as a close sibling. I think they could live under the same umbrella. But because "pull bundles" are part of the getbundle wire protocol command, the bulk of the server code needs to remain in core.

1043

AFAICT this capability isn't defined anywhere. Is there a missing patch in this series?

I do agree with the use of a partial-pull capability rather than exposing pull bundles functionality in the capabilities layer. Pull bundles should be purely a server-side implementation detail. It's partial pull that is the generic feature that should be exposed as part of the wire protocol.

1048

prefer_uncompressed doesn't exist.

I know you were talking about compression behavior on IRC. Please do invent whatever APIs you need to communicate to the wire protocol layer that you want to stream data with whatever compression behavior that you need.

This revision now requires changes to proceed. Jan 14 2018, 5:42 PM

Another idea to consider is storing changegroup data or bundle2 parts on disk instead of full bundles. Then, we could stream multiple changegroup parts into a larger bundle2 payload.

That does mean we'd have to compress outgoing data over the wire. So it's not ideal if minimizing CPU usage is your goal. But we still come out ahead by not having to generate the changegroup data.

This would require a mechanism to generate files of that format, however. Not a huge amount of work. But still work.

I'm just not sure what the sweet spot for this feature is. Is it enough to eliminate changegroup generation overhead? Or are we striving for ~0 CPU usage to service hg pull operations (like what clone bundles achieve)?

I wish we could find a way to send multiple, inline, pre-generated bundles in one response. However, the existing design of compression within the bundle2 format doesn't easily facilitate this. We should think long and hard about whether to implement this feature as partial pull or extend bundle2 to allow us to do nice things and avoid the extra client-server roundtrips.

I was looking at both parts initially. The existing bundle2 compression doesn't work very well for this purpose as it doesn't allow switching compression in the middle. It might be possible to hack something together with a flag to hg bundle to skip the final zero record and depend on the ability of all currently supported compression engines to handle concatenated streams, but that feels like a crude hack. Doing extra client-server roundtrips is a bit annoying and costly, but it also makes the functionality a bit more forgiving of mistakes in the bundle specification. Unlike clonebundles, it requires the records to be much more precise, and we certainly don't want to parse the bundles themselves over and over again. It might be nice for some future bundle version to allow easy access to the roots and leaves of the DAG.
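
The concatenated-streams property mentioned above can be demonstrated with zlib from the Python standard library (zstd and bzip2 behave analogously at the frame/stream level): independently compressed streams, joined back to back, decompress to the joined payload as long as the consumer restarts a decompressor at each stream boundary.

```python
import zlib

def compress_concat(chunks):
    # Each chunk becomes an independent, complete zlib stream.
    return b"".join(zlib.compress(c) for c in chunks)

def decompress_concat(data):
    # Walk the concatenated streams: decompressobj stops at the end of
    # one stream and exposes the remaining bytes via unused_data.
    out = []
    while data:
        d = zlib.decompressobj()
        out.append(d.decompress(data))
        out.append(d.flush())
        data = d.unused_data
    return b"".join(out)

blob = compress_concat([b"bundle one ", b"bundle two"])
print(decompress_concat(blob))  # → b'bundle one bundle two'
```

This is exactly the behavior a skip-the-final-zero-record hack would rely on; the crude part is that nothing in the byte stream itself marks where one logical bundle ends and the next begins.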

Speaking of compression, we probably need some more code around e.g. the mismatch between on-disk compression and wire compression. There is a potential for local bundles to be stored uncompressed and for the server to compress this data as part of streaming it out. That involves CPU. We still come out ahead because it avoids having to generate the bundle content from revlogs. But it isn't as ideal as streaming an already compressed bundle over the wire. We may want to open the bundle2 files and look at their compression info and make sure we don't do stupid things. It should be fast to parse the bundle2 header to obtain this knowledge.

The bundle will be sent without wire compression if supported. A v1 HTTP client will still get the basic zlib framing, but that's not really avoidable. Otherwise, BUNDLESPEC is supposed to make sure that the client knows how to deal with the bundle. The idea is explicitly that the administrator has a free choice of compression format and the data is sent as-is. Only crypto overhead should apply.

mercurial/wireproto.py
961–969

As I wrote in the comments on the client side, it seems to be OK now. The heavy lifting is the initial computation of the ancestor sets outside the loop, which is in the worst case an enumeration of all local revisions. That by itself doesn't seem to result in a big performance difference. The rest of the loop depends mostly on the number of roots and leaves of the bundle DAGs. My test case had a total of ~800 and the impact on normal pull operations was in the noise.

981–984

The URL field is synthesized by parseclonebundlesmanifest for the first entry. It isn't tagged in the actual file. As long as the format of the files is otherwise the same, I'd just keep it identical.

1042

The bigger question is whether it should have an associated bundle or protocol capability. The client needs to be able to deal with the partial replies. If we go that way, I agree that it can and should be enabled by default.

The clonebundles extension is mostly for show nowadays, the interesting stuff is all in core.

1048

Sorry, this was a rebase leak.

joerg.sonnenberger marked 5 inline comments as done. Jan 14 2018, 7:03 PM

My biggest concern with the architecture of this feature is the multiple roundtrips. I really wish we could stream multiple bundles off disk to the wire with no decompression/compression involved. That would mean storing compressed bundles on disk. But this would require some additional bundle2 magic. The existing solution is simple and elegant. I do like that. I'd very much like to get the opinion of someone like @durin42 (who also likes designing protocols).

I don't disagree with this goal (fewer roundtrips), but doing it with multiple roundtrips is still better than nothing.

I'm too rusty on bundle2 at the moment to grok what magic would be required to pre-compress payloads.

joerg.sonnenberger edited the summary of this revision. (Show Details)
joerg.sonnenberger retitled this revision from wireproto: server-side support for pullbundles to wireproto: support for pullbundles.

I'm too rusty on bundle2 at the moment to grok what magic would be required to pre-compress payloads.

The ideal solution would be a way to reset the context for the byte stream. Essentially we'd add a marker telling consumers they've reached EOF of either a bundle2 stream or a compression context. The next byte should be interpreted as a new bundle2 stream or a new compression context.

In the case of zstandard, we can force the end of a zstandard frame and start sending a new frame. As long as the consumer reads all frames as one coherent stream, the output of the decompressor will look as if nothing special happened. I'm not actually sure if python-zstandard supports this though. But it could. For other compression formats, the answer isn't as clear cut.

It's probably best to have an explicit marker in the bundle2 or protocol stream identifying the end of a compression context: either a "reset at end of part" flag in the part header, or an explicit bundle2 part that says "reset after me." Either way, we'd need to invent an API on the compression engine that allows the compression context to be reset. This code could get gnarly, as there are various buffers in play. It's possible, but a bit of work. Since we're at code freeze and this feature will have to wait for 4.6, we'll have time to implement this if we want to go that route. I don't like scope bloating though.

I'm too rusty on bundle2 at the moment to grok what magic would be required to pre-compress payloads.

The ideal solution would be a way to reset the context for the byte stream. Essentially we'd add a marker telling consumers they've reached EOF of either a bundle2 stream or a compression context. The next byte should be interpreted as a new bundle2 stream or a new compression context.

I should add that if clients automatically consumed all bundle2 payloads sent by the server and the compression context was automatically reset after end of bundle2 payload, then the easiest path forward is likely having the client advertise that it can receive multiple bundle2 payloads and then we implement this feature in terms of sending multiple bundle2 payloads. As long as the compression context is flushed, we should be OK. But we model the response stream as a single stream and feed that whole thing into a compressor/decompressor. We'd need to hack up the compression code a bit no matter what path we use.

joerg.sonnenberger edited the summary of this revision. (Show Details)
indygreg requested changes to this revision. Jan 17 2018, 2:22 AM

This is looking much better!

There are some places where the code doesn't abide by our (IMO lousy) mergeallthewords naming convention.

More importantly, I think there are some bugs around hidden changesets. We definitely need some test coverage involving hidden changesets. We should also have some basic tests for phases and bookmarks transfer. We want to be sure that all that data makes it across the wire and mirrors the remote's state. Specifically, I think there are still some bundle2 parts we only emit over the wire and not as part of local bundles generated via hg bundle. If so, then a pull serviced completely by pullbundles may not transfer all metadata. I'm not 100% sure about this. That's what tests will uncover. One thing we could do is ensure the client always sends a non-partial getbundle request to ensure the Mercurial server itself sends over any non-changegroup data. That shouldn't be necessary. And it complicates the wire protocol to communicate if the reply came from a pre-generated bundle. I'm not a huge fan of this... mostly thinking aloud here.

Something else to think about (which is an outstanding problem with clone bundles) is generating the manifests. It is very much an exercise left to the reader. That kinda sucks. At some point we need to make hg bundle emit manifest lines or something. And the bundlespec format needs to be taught about individual bundle2 parts. We're still missing a way to define which parts/data to put in the generated bundle via hg bundle invocation (e.g. should we include obsolescence or phases parts) and how to express the presence of these parts in the bundlespec. I think we can implement this feature without solving those problems. But if hg bundle is producing bundles with required part processing, we won't be. (I'm not sure what hg bundle is doing these days.)

Writing backwards and future compatible data storage and exchange protocols is hard...

hgext/clonebundles.py
61

Nit: "the" instead of "and"

mercurial/exchange.py
1488–1490

I think there are some subtle bugs here around the use of unfiltered heads creeping into the wire protocol because old_heads can contained hidden changesets. The intent behind this code seems sound. But I think it needs to respect node visibility more strictly.

mercurial/wireproto.py
963

This filtering takes the current process's capabilities into consideration. Since this code runs on the server, it is filtering based on the server's abilities, not the client's. That's bad because it means the server can send a bundle that the client can't read.

There will need to be a bit of code written to pass the client's capabilities into this filtering. That will likely require a significant amount of plumbing in exchange.py.

It /might/ be easier to take a different approach completely: choosing the same architecture as clonebundles and having the client download the manifest and fetching the relevant bundles itself. If you squint hard enough, this seems possible. The hard part is discovery. The server could potentially order the bundles based on DAG relationship / define dependencies between bundles based on DAG relationship. I think you've thought about this design already. But just want to check.

tests/test-pull-r.t
149 (On Diff #4846)

I'd separate this into its own test file, since it is highly domain specific.

167–168 (On Diff #4846)

I don't think I've seen this pattern in our .t tests before. I think you want to use hg serve -d --accesslog <path> --errorlog <path> like other tests do. There are no race conditions in daemon mode AFAIK, otherwise many tests would fall into this trap.

266 (On Diff #4846)

That's a pretty bad error message for something the user has no control over (a misconfigured server).

First, the error message should be better. Something like "the server didn't send expected data."

Second, the error message should offer the user a way to disable the feature and fall back to a traditional clone/pull. The clonebundles error messages do this. Read through test-clonebundles.t.

This revision now requires changes to proceed. Jan 17 2018, 2:22 AM
joerg.sonnenberger marked an inline comment as done. Jan 17 2018, 9:15 AM

This is looking much better!

There are some places where the code doesn't abide by our (IMO lousy) mergeallthewords naming convention.

I can rename the intermediate variables, if that is desirable.

More importantly, I think there are some bugs around hidden changesets. We definitely need some test coverage involving hidden changesets.

There is some test coverage from the existing obsolete test cases.

We should also have some basic tests for phases and bookmarks transfer. We want to be sure that all that data makes it across the wire and mirrors the remote's state. Specifically, I think there are still some bundle2 parts we only emit over the wire and not as part of local bundles generated via hg bundle. If so, then a pull serviced completely by pullbundles may not transfer all metadata. I'm not 100% sure about this. That's what tests will uncover. One thing we could do is ensure the client always sends a non-partial getbundle request to ensure the Mercurial server itself sends over any non-changegroup data. That shouldn't be necessary. And it complicates the wire protocol to communicate if the reply came from a pre-generated bundle. I'm not a huge fan of this... mostly thinking aloud here.

This would seem to be a bug in bundles, and quite unrelated to this functionality. It would affect clonebundles as well. I'm somewhat reluctant to increase this patch set even more.

Something else to think about (which is an outstanding problem with clone bundles) is generating the manifests. It is very much an exercise left to the reader. That kinda sucks. At some point we need to make hg bundle emit manifest lines or something. And the bundlespec format needs to be taught about individual bundle2 parts. We're still missing a way to define which parts/data to put in the generated bundle via hg bundle invocation (e.g. should we include obsolescence or phases parts) and how to express the presence of these parts in the bundlespec. I think we can implement this feature without solving those problems. But if hg bundle is producing bundles with required part processing, we won't be. (I'm not sure what hg bundle is doing these days.)

I'm preparing a set of helper scripts separately, and there were some separate discussions about stable history partitioning that could also help. The goal is still to make this fully transparent to the client beyond the need for fetching multiple bundles.

mercurial/exchange.py
1488–1490

I expect the original discovery to handle the visibility question. The updates here deal strictly with the results of the exchange, i.e. any new head is a result of the pulled bundle and as such visible on the remote. Same for the reduction of rheads. The unfiltered access is necessary, as seen in the obs test cases.

mercurial/wireproto.py
963

I have to double check the feature checks. The client has notified the server of the supported bundle formats at this point, so it shouldn't be a problem. I'll look into a test case.

I don't think the client is in a reasonable position to make this decision as it doesn't have the actual DAG at hand. In principle, the server could in a later iteration also estimate the bundle size or similar aspects, and that's something the client fundamentally can't do.

tests/test-pull-r.t
266 ↗(On Diff #4846)

I didn't add the error message; it existed already. It's also specific to the update case. I'm somewhat reluctant to expose pullbundles to the user any further, as it is meant to be even more transparent than clonebundles.

joerg.sonnenberger retitled this revision from wireproto: support for pullbundles to * wireproto: support for pullbundles.

I like where this is headed, but we'll have to plan to land this early in the next cycle, as I'm too chicken to land a feature with protocol implications on the last day of work before we freeze.

joerg.sonnenberger edited the summary of this revision. (Show Details)
joerg.sonnenberger retitled this revision from * wireproto: support for pullbundles to wireproto: support for pullbundles.

@joerg.sonnenberger: if you are willing to wait just a week or two more, I think the next version of the wire protocol that I'm writing will make the "pullbundles" feature significantly better. Specifically, servers will be able to send multiple bundle2 bundles or parts with independent compression settings. In other words, a pullbundles aware server-side handler could "stream" pre-generated bundles from disk and then dynamically emit the data not found in any pre-generated bundles, applying compression for just the dynamic bit.

I don't wish to create stop energy and discourage you from working on this feature. But at the same time, I'm reluctant to add new features to the existing wire protocol because the new mechanism will be much, much better and will make features like pullbundles even better.

Updated to the current tree. One open question is an interaction with the narrow extension. Running the narrow tests will show an additional round trip.

The discovery phase currently doesn't know about the narrowspec, so all heads are discovered by the client, but the subsequent narrowbundle only covers the matching heads. This in turn triggers the partial-pull logic from this change, i.e. the client assumes that the server sent a partial reply and asks again. IMO this is a design flaw in the narrow extension and should be addressed properly on that side.

This revision was automatically updated to reflect the committed changes.