This is an archive of the discontinued Mercurial Phabricator instance.

Differential D2885

RFC: use Redis to cache file data
Abandoned · Public

Authored by indygreg on Mar 16 2018, 7:07 PM.

Details

Reviewers

None

Group Reviewers

hg-reviewers

Summary

This [very hacky] commit implements a new files store that uses
Redis to cache files data. It services requests from Redis (if
available) and falls back to a base store / interface (revlogs in
our case) on misses.

The purpose of this commit is to first demonstrate the value in having
interfaces for storage. If we code to the interface, then another
interface can come along and do useful things - like caching.
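
To make the idea concrete, here is a minimal sketch of a read-through cache layered over a store interface. It is not the code from this revision: DictFilesStore, CachingFilesStore, the dict-based cache and the 'missing' status are all hypothetical stand-ins, and only the resolvefilesdata() method name and the 'ok' status are taken from the diff.

```python
class DictFilesStore(object):
    """Hypothetical base store backed by a dict (stands in for revlogs)."""
    def __init__(self, data):
        self._data = data
        self.lookups = 0  # counts how often the slow path is hit

    def resolvefilesdata(self, entries):
        for path, node in entries:
            self.lookups += 1
            key = (path, node)
            if key in self._data:
                yield 'ok', path, node, self._data[key]
            else:
                # 'missing' is an assumed status name for illustration.
                yield 'missing', path, node, None


class CachingFilesStore(object):
    """Read-through cache exposing the same resolvefilesdata() interface."""
    def __init__(self, cache, basestore):
        self._cache = cache  # any dict-like object
        self._basestore = basestore

    def resolvefilesdata(self, entries):
        misses = []
        for path, node in entries:
            data = self._cache.get((path, node))
            if data is None:
                misses.append((path, node))
            else:
                yield 'ok', path, node, data

        # Fall back to the base store for misses and remember the results.
        for res, path, node, data in self._basestore.resolvefilesdata(misses):
            if res == 'ok':
                self._cache[(path, node)] = data
            yield res, path, node, data


base = DictFilesStore({(b'a.txt', b'n1'): b'hello'})
store = CachingFilesStore({}, base)
entries = [(b'a.txt', b'n1'), (b'b.txt', b'n2')]
first = list(store.resolvefilesdata(entries))   # both entries hit the base store
second = list(store.resolvefilesdata(entries))  # a.txt is now served from cache
```

Callers only see resolvefilesdata(), so they should not need to know whether a cache sits in front of the base store.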

The other purpose was to investigate performance. Would a memory-backed
key-value store have a significant impact on performance of our
experimental wire protocol command to serve file data fulltexts for
a specific revision? The answer is a very resounding yes!

Using the same mozilla-unified revision from the previous commit:

no compression:  1478MB;  ~94s wall;   ~56s CPU
  w/ hot redis:  1478MB;  ~9.6s wall;  ~8.6s CPU
zstd level 3:    343MB;   ~97s wall;   ~57s CPU
  w/ hot redis:  343MB;   ~8.5s wall;  ~8.3s CPU
zstd level 1
  w/ hot redis:  377MB;   ~6.8s wall;  ~6.6s CPU
zlib level 6:    367MB;   ~116s wall;  ~74s CPU
  w/ hot redis:  367MB;   ~36.7s wall; ~36s CPU
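
As a quick piece of arithmetic on the figures above, the wall-time speedups can be computed as below. The numbers are copied from the measurements in this summary, which are approximate; zstd level 1 is left out because no figure without Redis is given for it.

```python
# (label, wall seconds without Redis, wall seconds with hot Redis)
runs = [
    ('no compression', 94.0, 9.6),
    ('zstd level 3', 97.0, 8.5),
    ('zlib level 6', 116.0, 36.7),
]

# Ratio of cold wall time to hot-cache wall time for each configuration.
speedups = {label: cold / hot for label, cold, hot in runs}

for label, cold, hot in runs:
    print('%-15s %.1fx faster wall time' % (label, speedups[label]))
```

That works out to roughly 9.8x, 11.4x and 3.2x respectively, so zlib benefits the least from the cache.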

For the curious, the ls profiler says that our hotspot without
compression is in socket I/O. With zstd compression, the hotspot is
compression.

I reckon the socket I/O overhead comes from writing tons more chunks
on the wire when uncompressed (compression effectively ensures each
output chunk is a similar, largish size). All those extra Python
function calls and system calls do add up!
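
One way to test that guess would be to coalesce small chunks into larger buffers before they reach the socket. The sketch below is an assumption-laden illustration, not code from this revision: coalesce() and the 32768-byte threshold are hypothetical, and whether it actually helps the server was not measured here.

```python
def coalesce(chunks, minsize=32768):
    """Yield buffers of at least ``minsize`` bytes built from smaller chunks.

    The final buffer may be smaller than ``minsize``.
    """
    pending = []
    size = 0
    for chunk in chunks:
        pending.append(chunk)
        size += len(chunk)
        if size >= minsize:
            yield b''.join(pending)
            pending = []
            size = 0
    if pending:
        yield b''.join(pending)


small = [b'x' * 100] * 10000     # 10,000 tiny chunks, 1,000,000 bytes total
large = list(coalesce(small))    # same bytes, far fewer writes
```

With these inputs the 10,000 small chunks collapse into 31 buffers, which would mean 31 write calls instead of 10,000.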

Anyway, I'm definitely happy with the performance improvements. I'd
say this was a useful experiment!

Diff Detail

Repository

rHG Mercurial

Lint

Lint Skipped

Unit

Unit Tests Skipped

Event Timeline

indygreg created this revision. Mar 16 2018, 7:07 PM

Herald added a reviewer: hg-reviewers. Mar 16 2018, 7:07 PM

Herald added a subscriber: mercurial-devel.

This was just an RFC. It doesn't need to be an open review.

Revision Contents
Changeset List

			Path	Packages
M			mercurial/localrepo.py (6 lines)
M			mercurial/revlogstore.py (48 lines)

Status	Author	Revision
Closed	indygreg	D2987 stringutil: add function to pretty print an object
Closed	indygreg	D2986 wireproto: add frame flag to denote payloads as CBOR
Closed	indygreg	D2985 wireproto: implement custom __repr__ for frame
Closed	indygreg	D2984 keepalive: implement readinto()
Closed	indygreg	D2983 wireproto: port protocol handler to zope.interface
Closed	indygreg	D2982 wireproto: separate commands tables for version 1 and 2 commands
Closed	indygreg	D2981 wireproto: mark SSHv2 as a version 1 transport
Closed	indygreg	D2979 wireproto: stop aliasing wire protocol types (API)
Closed	indygreg	D2951 wireproto: use CBOR for command requests
Closed	indygreg	D2902 wireproto: define frame to represent progress updates
Closed	indygreg	D2948 wireproto: syntax for encoding CBOR into frames
Closed	indygreg	D2947 wireproto: explicit API to create outgoing streams
Closed	indygreg	D2907 wireproto: add streams to frame-based protocol
Closed	indygreg	D2906 wireproto: start to associate frame generation with a stream
Closed	indygreg	D2950 tests: fix duplicate and failing test
Closed	indygreg	D2978 cbor: import CBORDecoder and CBOREncoder
Closed	indygreg	D2949 setup: install cbor packages
Abandoned	indygreg	D2885 RFC: use Redis to cache file data
Changes Planned	indygreg	D2884 wireproto: experimental command to emit file data
Abandoned	indygreg	D2883 revlogstore: create and implement an interface for repo files storage
Closed	indygreg	D2901 wireproto: explicitly track which requests are active
Closed	indygreg	D2900 wireproto: use named arguments when passing around frame data
Closed	indygreg	D2899 wireproto: define attr-based classes for representing frames
Closed	indygreg	D2872 wireproto: define human output side channel frame
Closed	indygreg	D2871 wireproto: service multiple command requests per HTTP request
Closed	indygreg	D2870 wireproto: support for receiving multiple requests
Closed	indygreg	D2869 wireproto: add request IDs to frames
Closed	indygreg	D2860 wireproto: buffer output frames when in half duplex mode
Closed	indygreg	D2858 wireproto: define and implement responses in framing protocol
Closed	indygreg	D2857 wireproto: implement basic command dispatching for HTTPv2
Closed	indygreg	D2856 wireproto: nominally don't expose "batch" to version 2 wire transports
Closed	indygreg	D2852 wireproto: implement basic frame reading and processing
Closed	indygreg	D2851 wireproto: define and implement protocol for issuing requests
Closed	indygreg	D2868 util: prefer "bytesio" to "stringio"
Closed	indygreg	D2850 wireproto: define content negotiation for HTTPv2
Closed	indygreg	D2849 hgweb: also set Content-Type header
Closed	indygreg	D2837 wireproto: require POST for all HTTPv2 requests
Closed	indygreg	D2836 wireproto: define permissions-based routing of HTTPv2 wire protocol
Closed	indygreg	D2834 wireproto: support /api/* URL space for exposing APIs
Closed	indygreg	D2843 url: support suppressing Accept header
Closed	indygreg	D2842 util: don't log low-level I/O calls for HTTP peer
Closed	indygreg	D2841 debugcommands: support sending HTTP requests with debugwireproto
Closed	indygreg	D2726 debugcommands: support connecting to HTTP peers
Closed	indygreg	D2722 url: add HTTP handler that uses a proxied socket
Closed	indygreg	D2721 util: observable proxy objects for sockets
Closed	indygreg	D2840 hgweb: allow defining Server response header for HTTP server
Closed	indygreg	D2839 tests: use $HTTP_DATE$ for Date header
Closed	indygreg	D2720 debugcommands: introduce actions to perform deterministic reads
Closed	indygreg	D2725 httppeer: refactor how httppeer is created (API)
Closed	indygreg	D2724 httppeer: alias url as urlmod
Closed	indygreg	D2723 httppeer: consolidate _requestbuilder assignments and document
Closed	indygreg	D2832 hgweb: remove wsgirequest (API)
Closed	indygreg	D2831 hgweb: store the raw WSGI environment dict
Closed	indygreg	D2830 hgweb: remove dead wsgirequest code
Closed	indygreg	D2829 hgweb: port to new response API
Closed	indygreg	D2828 hgweb: pass modern request type into templater()
Closed	indygreg	D2827 hgweb: use modern response type for index generation
Closed	indygreg	D2826 hgweb: don't pass wsgireq to makeindex and other functions
Closed	indygreg	D2825 hgweb: replace PATH_INFO with dispatchpath
Closed	indygreg	D2824 hgweb: rewrite path generation for index entries
Closed	indygreg	D2823 hgweb: construct {url} with req.apppath
Closed	indygreg	D2822 hgweb: support constructing URLs from an alternate base URL
Closed	indygreg	D2821 hgweb: clarify that apppath begins with a forward slash
Closed	indygreg	D2820 hgweb: change how dispatch path is reported
Closed	indygreg	D2819 hgweb: refactor repository name URL parsing
Closed	indygreg	D2818 tests: add test coverage for parsing WSGI requests
Closed	indygreg	D2817 hgweb: construct static URL like hgweb does
Closed	indygreg	D2816 hgweb: remove unused **map argument
Closed	indygreg	D2815 hgweb: extract entries() to standalone function
Closed	indygreg	D2814 hgweb: move rawentries() to a standalone function
Closed	indygreg	D2813 hgweb: move archivelist to standalone function
Closed	indygreg	D2812 hgweb: move readallowed to a standalone function
Closed	indygreg	D2805 hgweb: remove some use of wsgireq in hgwebdir
Closed	indygreg	D2804 hgweb: fix a bug due to variable name typo
Closed	indygreg	D2803 hgweb: stop passing req and tmpl into @webcommand functions (API)
Closed	indygreg	D2802 hgweb: pass modern request type into various webutil functions (API)
Closed	indygreg	D2801 hgweb: don't redundantly pass templater with requestcontext (API)
Closed	indygreg	D2800 hgweb: use templater on requestcontext instance
Closed	indygreg	D2799 hgweb: add a sendtemplate() helper function
Closed	indygreg	D2798 hgweb: use web.req instead of req.req
Closed	indygreg	D2797 hgweb: stop setting headers on wsgirequest
Closed	indygreg	D2796 hgweb: always return iterable from @webcommand functions (API)
Closed	indygreg	D2795 hgweb: send errors using new response API
Closed	indygreg	D2794 hgweb: refactor 304 handling code
Closed	indygreg	D2793 hgweb: transition permissions hooks to modern request type (API)
Closed	indygreg	D2792 hgweb: port archive command to modern response API
Closed	indygreg	D2791 hgweb: refactor fake file object proxy for archiving
Closed	indygreg	D2790 tests: additional test coverage of archive web command
Closed	indygreg	D2789 hgweb: port static file handling to new response API
Closed	indygreg	D2788 hgweb: remove one-off routing for file?style=raw
Closed	indygreg	D2787 hgweb: port most @webcommand to use modern response type
Closed	indygreg	D2786 hgweb: support using new response object for web commands
Closed	indygreg	D2785 hgweb: inline caching() and port to modern mechanisms
Closed	indygreg	D2784 hgweb: expose repo name on parsedrequest
Closed	indygreg	D2783 hgweb: expose URL scheme and REMOTE_* attributes
Closed	indygreg	D2782 hgweb: remove wsgirequest.form (API)
Closed	indygreg	D2781 hgweb: perform all parameter lookup via qsparams
Closed	indygreg	D2780 hgweb: set variables in qsparams
Closed	indygreg	D2779 hgweb: use our new request object for "style" parameter
Closed	indygreg	D2776 hgweb: use a multidict for holding query string parameters
Closed	indygreg	D2775 hgweb: create dedicated type for WSGI responses
Closed	indygreg	D2778 tests: add test for a wire protocol request to wrong base URL
Closed	indygreg	D2773 hgweb: remove support for short query string based aliases (BC)
Closed	indygreg	D2774 hgweb: remove support for POST form data (BC)
Closed	indygreg	D2771 hgweb: expose input stream on parsed WSGI request object
Closed	indygreg	D2770 hgweb: make parsedrequest part of wsgirequest
Closed	indygreg	D2769 hgweb: refactor the request draining code
Closed	indygreg	D2768 hgweb: use a capped reader for WSGI input stream
Closed	indygreg	D2767 hgweb: document continuereader
Closed	indygreg	D2749 hgweb: remove wsgirequest.__iter__
Closed	indygreg	D2748 hgweb: remove wsgirequest.read()
Closed	indygreg	D2747 hgweb: remove unused methods on wsgirequest
Closed	indygreg	D2746 wireprotoserver: remove unused argument from _handlehttperror()
Closed	indygreg	D2745 hgweb: store and use request method on parsed request
Closed	indygreg	D2744 hgweb: handle CONTENT_LENGTH
Closed	indygreg	D2743 wireprotoserver: access headers through parsed request
Closed	indygreg	D2742 hgweb: parse and store HTTP request headers
Closed	indygreg	D2741 wireprotoserver: remove broken optimization for non-httplib client
Closed	indygreg	D2740 wireprotoserver: move all wire protocol handling logic out of hgweb
Closed	indygreg	D2739 hgweb: use parsed request to construct query parameters
Closed	indygreg	D2738 hgweb: only recognize wire protocol commands from query string (BC)
Closed	indygreg	D2737 hgweb: teach WSGI parser about query strings
Closed	indygreg	D2736 hgweb: use the parsed application path directly
Closed	indygreg	D2735 hgweb: use computed base URL from parsed request
Closed	indygreg	D2734 hgweb: parse WSGI request into a data structure
Closed	indygreg	D2733 hgweb: always use "?" when writing session vars
Closed	indygreg	D2732 hgweb: rename req to wsgireq
Closed	indygreg	D2731 hgweb: validate WSGI environment dict
Closed	indygreg	D2730 hgweb: ensure all wsgi environment values are str

Diff 7082

mercurial/localrepo.py

         self.cachevfs.createmode = self.store.createmode
         if (self.ui.configbool('devel', 'all-warnings') or
             self.ui.configbool('devel', 'check-locks')):
             if util.safehasattr(self.svfs, 'vfs'): # this is filtervfs
                 self.svfs.vfs.audit = self._getsvfsward(self.svfs.vfs.audit)
             else: # standard vfs
                 self.svfs.audit = self._getsvfsward(self.svfs.audit)
         self._applyopenerreqs()
-        self.filesstore = revlogstore.revlogfilesstore(self.svfs)
+        import redis
+        basefilesstore = revlogstore.revlogfilesstore(self.svfs)
+        redisconn = redis.StrictRedis(host='localhost', port=6379, db=0)
+        self.filesstore = revlogstore.redisacceleratedrevlogfilesstore(
+            redisconn, basefilesstore)

         if create:
             self._writerequirements()

         self._dirstatevalidatewarned = False

         self._branchcaches = {}
         self._revbranchcache = None

mercurial/revlogstore.py

                 continue

             if fl.iscensored(rev):
                 yield 'censored', path, node, None
                 continue

             data = fl.read(node)
             yield 'ok', path, node, data
+
+def redisfiledatakey(path, node):
+    return b'filedata:%s:%s' % (path, node)
+
+class redisacceleratedrevlogfilesstore(repository.basefilesstore):
+    """A filesstore that can use a redis server to speed up operations."""
+    def __init__(self, redis, basestore):
+        self._redis = redis
+        self._basestore = basestore
+
+    def resolvefilesdata(self, entries):
+        # Our strategy is to batch requests to redis because this is faster
+        # than a command for every entry.
+
+        batch = []
+        for i, entry in enumerate(entries):
+            batch.append(entry)
+
+            if i and not i % 1000:
+                for res in self._processfiledatabatch(batch):
+                    yield res
+
+                batch = []
+
+        if batch:
+            for res in self._processfiledatabatch(batch):
+                yield res
+
+    def _processfiledatabatch(self, batch):
+        keys = [redisfiledatakey(path, node) for path, node in batch]
+
+        missing = []
+
+        for i, redisdata in enumerate(self._redis.mget(keys)):
+            path, node = batch[i]
+
+            if redisdata is None:
+                missing.append((path, node))
+            else:
+                yield 'ok', path, node, redisdata
+
+        # Now resolve all the missing data from the base store.
+        for res, path, node, data in self._basestore.resolvefilesdata(missing):
+            yield res, path, node, data
+
+            # Don't forget to cache it!
+            if res == 'ok':
+                self._redis.set(redisfiledatakey(path, node), data)
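
A side note on the batching above: because the flush check is "if i and not i % 1000", the first batch ends up holding 1001 entries (indexes 0 through 1000) and later ones 1000. That is harmless for an experiment. If uniform batch sizes were wanted, one possible alternative is sketched below; batched() is a hypothetical helper and not part of this revision.

```python
import itertools


def batched(entries, size=1000):
    """Yield lists of at most ``size`` items from any iterable."""
    it = iter(entries)
    while True:
        batch = list(itertools.islice(it, size))
        if not batch:
            return
        yield batch


# 2500 entries split into uniform batches plus a shorter final batch.
sizes = [len(b) for b in batched(range(2500), size=1000)]
```

Each batch could then be handed to something like _processfiledatabatch() unchanged.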

Diff	ID	Base	Description	Created	Lint	Unit
Base			Base
Diff 1	7082			Mar 16 2018, 7:07 PM	★	★