This is an archive of the discontinued Mercurial Phabricator instance.

wireprotov2: add an extension to cache wireproto v2 responses in S3
AbandonedPublic

Authored by sheehan on Oct 26 2018, 5:16 PM.

Details

Reviewers
None
Group Reviewers
hg-reviewers
Summary

Wire protocol version two introduces command response caching
and content redirect responses, making it possible to store
response objects in an arbitrary blob store and redirect clients
to that store to retrieve large responses. This commit adds an
extension which implements such wire protocol caching in Amazon S3.

Servers add their AWS secret access key and access key ID to an
hgrc config, and specify the name of the S3 bucket that holds the
objects. When a cache lookup request comes in, the cacher sends a
HEAD request to S3, which returns a 404 if the object does not
exist (i.e. a cache miss). On a cache hit, a presigned URL for the
object is generated and used to issue a content redirect response,
which is sent to the client. On a cache miss, the server generates
the response and buffers it in the cacher until onfinished is
called. During onfinished, we calculate the size of the response
and can optionally avoid caching if the response is below a
configured minimum threshold. Otherwise we insert the object into
the cache bucket using the put_object API.
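The lookup/onfinished flow described above can be sketched as follows. This is a minimal illustration, not the extension's actual code: the class and method names are hypothetical, and an in-memory dict stands in for the S3 bucket. The real extension would perform the existence check with a HEAD request (boto3's head_object), generate a presigned URL with generate_presigned_url, and store objects with put_object.

```python
class S3LikeResponseCacher(object):
    """Hypothetical sketch of the wireproto v2 response cacher flow."""

    def __init__(self, bucket, minsize=0):
        self.bucket = bucket      # stand-in for the S3 bucket
        self.minsize = minsize    # skip caching responses smaller than this
        self._buffered = {}

    def lookup(self, key):
        """Return a redirect URL on a cache hit, or None on a miss."""
        if key in self.bucket:    # real code: HEAD request; 404 means miss
            # real code: s3.generate_presigned_url('get_object', ...)
            return 'https://example-bucket.s3.amazonaws.com/%s?presigned' % key
        return None

    def onobject(self, key, chunk):
        """Buffer response chunks generated by the server on a miss."""
        self._buffered.setdefault(key, []).append(chunk)

    def onfinished(self, key):
        """Store the buffered response unless it is below the threshold."""
        data = b''.join(self._buffered.pop(key, []))
        if len(data) >= self.minsize:
            self.bucket[key] = data   # real code: s3.put_object(...)
        return data
```

With minsize set, a small response passes through without being cached, while a large one becomes a hit on the next lookup.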

To test this extension, we require the moto mock AWS library.
Specifically, we use the "standalone server" functionality,
which creates a Flask application that imitates S3. A new hghave
predicate is added to check for this functionality before
testing.
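An hghave predicate along these lines could gate the tests. This is a guess at the shape of such a check, not the actual predicate: moto's standalone server mode is backed by Flask, so both packages must be importable.

```python
import importlib.util

def has_moto_server():
    # Hypothetical hghave-style check: moto's standalone server mode
    # requires the moto package itself plus flask, which runs the
    # mock S3 HTTP endpoint.
    return (importlib.util.find_spec('moto') is not None
            and importlib.util.find_spec('flask') is not None)
```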

Diff Detail

Repository
rHG Mercurial
Lint
Lint Skipped
Unit
Unit Tests Skipped

Event Timeline

sheehan created this revision. Oct 26 2018, 5:16 PM

Throwing this up for review now, but there are a few things that could be done to improve this. A cache expiration policy might be useful, but is difficult to test with the S3 bucket expiration rules. It may also be desirable to be able to specify more than one S3 bucket/region/account in the future.

@indygreg will have more thoughts when he returns, I'm sure. :)

hgext/s3wireprotocache.py
185

This is needed for determinism in testing, but there is likely a better way to avoid it than checking for an alternative endpoint URL.

Is this useful enough to others that it should live in the hg core repo? It doesn't seem like it to me, but maybe I'm wrong.

> Is this useful enough to others that it should live in the hg core repo? It doesn't seem like it to me, but maybe I'm wrong.

My thought process was that since the new wire protocol supports caching command responses but does not actually provide any cache implementations, including some optional OOB support for something as common as S3 would be useful for anyone considering use of that feature.

Maybe that's not enough reason to justify an extension in the core repo, I'm not certain. Either way, we'll be deploying this to Mozilla's hg servers in the next few months and testing it out. Perhaps after it's been in production for some time we will have a stronger case for inclusion in core. :)

> Is this useful enough to others that it should live in the hg core repo? It doesn't seem like it to me, but maybe I'm wrong.

I think having plug-and-play caching solutions in the official Mercurial distribution would be an extremely compelling product feature. We could tell people "just install Mercurial and add these config options to make your server scale nearly effortlessly." That's a killer feature IMO.

S3 is pretty popular as a key-value store and I think there is a market for it.

Obviously other cache backends would be useful too. And if we move forward with cache backends in core, we should be prepared to support GCP, Redis, other backends. Whether those are supported in the same extension or in separate extensions, I'm not sure. Time will tell.

sheehan abandoned this revision. Nov 2 2018, 11:00 AM

> Either way, we'll be deploying this to Mozilla's hg servers in the next few months and testing it out. Perhaps after it's been in production for some time we will have a stronger case for inclusion in core. :)

Going to deploy and maintain this at Mozilla for the time being and consider moving into core at a later time.