This is an archive of the discontinued Mercurial Phabricator instance.

revlog: don't cache parsed tuples in the C module
ClosedPublic

Authored by joerg.sonnenberger on Oct 6 2020, 7:35 AM.

Details

Summary

A cached entry creates ~8 Python objects per cached changeset, which
comes to around 200 bytes per cached changeset on AMD64. Especially for
operations that touch a lot of changesets, that can easily add up to
more than 100MB of memory. Simple tests on large repositories show a
runtime penalty of less than 2% for ripping out the cache, even for
cache-heavy operations like "hg log" over all revisions.

Diff Detail

Repository
rHG Mercurial
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

pulkit added a subscriber: pulkit. Oct 9 2020, 5:07 AM

Needs some explanation in the commit description about why not to cache.

Basically, because the cache in the current form is very expensive in terms of memory use. For a large repository, it can easily consume 100MB+ for something like 2% runtime gain, which doesn't seem a good trade-off.

pulkit added a comment. Oct 9 2020, 9:09 AM

Basically, because the cache in the current form is very expensive in terms of memory use. For a large repository, it can easily consume 100MB+ for something like 2% runtime gain, which doesn't seem a good trade-off.

Okay. I mean that this explanation should be in the commit message.

joerg.sonnenberger retitled this revision from [WIP] revlog: don't cache parsed tuples in the C module to revlog: don't cache parsed tuples in the C module. Oct 14 2020, 4:28 PM
joerg.sonnenberger edited the summary of this revision.
martinvonz accepted this revision. Oct 27 2020, 7:46 PM
martinvonz added a subscriber: martinvonz.

Seems fine to me. We can always roll it back if we notice unacceptable slowdown. Is it safe to apply this patch without the two before it?

This revision is now accepted and ready to land. Oct 27 2020, 7:46 PM

No, the two parents are necessary because they store invalid data in the index, i.e. they violate the interface contract, which just hasn't been enforced so far.

Sorry, let me retract that. Wrong review, this one can be pulled out in isolation.

I was skeptical when I saw this patch because complexity in C code in the Mercurial codebase tends to exist for good reasons since we tend to treat C code as toxic and a CVE waiting to happen. Anyway, I traced this cache to https://www.mercurial-scm.org/repo/hg-committed/log/a6fde9d789d9/mercurial/cext/revlog.c?patch=&linerange=349:358 / https://www.mercurial-scm.org/repo/hg-committed/rev/2cdd7e63211b.
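
In Python terms, the pattern that cache implements is roughly the following. This is only an illustrative sketch with invented names (parse_entry, CachingIndex, UncachedIndex), not the actual cext code:

def parse_entry(raw):
    # Stand-in for the tuple construction (Py_BuildValue) in cext/revlog.c
    # that turns one on-disk index record into a Python tuple.
    return tuple(raw)

class CachingIndex:
    # Analog of the old behavior: build each entry tuple at most once
    # and keep it alive, which is where the per-changeset memory goes.
    def __init__(self, raw_entries):
        self._raw = raw_entries
        self._cache = [None] * len(raw_entries)

    def __getitem__(self, rev):
        entry = self._cache[rev]
        if entry is None:
            entry = self._cache[rev] = parse_entry(self._raw[rev])
        return entry

class UncachedIndex:
    # Analog of the behavior after this patch: re-parse on every access,
    # paying a small construction cost instead of the memory.
    def __init__(self, raw_entries):
        self._raw = raw_entries

    def __getitem__(self, rev):
        return parse_entry(self._raw[rev])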

The cache is essentially preventing a redundant Py_BuildValue(). Rapid construction of PyObjects in the revlog index or obsolescence marker C code has been identified as a bottleneck. So I decided to benchmark this against the Firefox repo's changelog; the first line of each pair is before, the second is after. Using CPython 3.8.6.

$ ./hg perfrevlogindex -R ~/src/firefox -c
! revlog constructor
! wall 0.011219 comb 0.010000 user 0.000000 sys 0.010000 (best of 256)
! wall 0.014012 comb 0.020000 user 0.000000 sys 0.020000 (best of 210)
! read
! wall 0.011319 comb 0.010000 user 0.000000 sys 0.010000 (best of 253)
! wall 0.013859 comb 0.010000 user 0.000000 sys 0.010000 (best of 212)
! create index object
! wall 0.000001 comb 0.000000 user 0.000000 sys 0.000000 (best of 716721)
! wall 0.000001 comb 0.000000 user 0.000000 sys 0.000000 (best of 713905)
! retrieve index entry for rev 0
! wall 0.000282 comb 0.000000 user 0.000000 sys 0.000000 (best of 9580)
! wall 0.000001 comb 0.000000 user 0.000000 sys 0.000000 (best of 678673)
! look up missing node
! wall 0.003180 comb 0.000000 user 0.000000 sys 0.000000 (best of 712)
! wall 0.002469 comb 0.000000 user 0.000000 sys 0.000000 (best of 768)
! look up node at rev 0
! wall 0.004028 comb 0.000000 user 0.000000 sys 0.000000 (best of 713)
! wall 0.003714 comb 0.000000 user 0.000000 sys 0.000000 (best of 780)
! look up node at 1/4 len
! wall 0.003274 comb 0.000000 user 0.000000 sys 0.000000 (best of 874)
! wall 0.003121 comb 0.000000 user 0.000000 sys 0.000000 (best of 918)
! look up node at 1/2 len
! wall 0.002540 comb 0.010000 user 0.010000 sys 0.000000 (best of 1125)
! wall 0.002539 comb 0.000000 user 0.000000 sys 0.000000 (best of 1121)
! look up node at 3/4 len
! wall 0.001798 comb 0.000000 user 0.000000 sys 0.000000 (best of 1547)
! wall 0.001916 comb 0.000000 user 0.000000 sys 0.000000 (best of 1467)
! look up node at tip
! wall 0.000873 comb 0.000000 user 0.000000 sys 0.000000 (best of 2898)
! wall 0.001107 comb 0.000000 user 0.000000 sys 0.000000 (best of 2362)
! look up all nodes (forward)
! wall 0.154415 comb 0.150000 user 0.150000 sys 0.000000 (best of 63)
! wall 0.157524 comb 0.160000 user 0.160000 sys 0.000000 (best of 62)
! look up all nodes 2x (forward)
! wall 0.267256 comb 0.260000 user 0.260000 sys 0.000000 (best of 37)
! wall 0.271308 comb 0.270000 user 0.270000 sys 0.000000 (best of 37)
! look up all nodes (reverse)
! wall 0.080962 comb 0.080000 user 0.080000 sys 0.000000 (best of 100)
! wall 0.081913 comb 0.080000 user 0.080000 sys 0.000000 (best of 100)
! look up all nodes 2x (reverse)
! wall 0.191335 comb 0.190000 user 0.190000 sys 0.000000 (best of 51)
! wall 0.195113 comb 0.190000 user 0.190000 sys 0.000000 (best of 51)
! retrieve all index entries (forward)
! wall 0.173264 comb 0.170000 user 0.120000 sys 0.050000 (best of 56)
! wall 0.110747 comb 0.110000 user 0.110000 sys 0.000000 (best of 89)
! retrieve all index entries 2x (forward)
! wall 0.198171 comb 0.200000 user 0.160000 sys 0.040000 (best of 50)
! wall 0.221631 comb 0.220000 user 0.220000 sys 0.000000 (best of 45)
! retrieve all index entries (reverse)
! wall 0.165797 comb 0.170000 user 0.140000 sys 0.030000 (best of 59)
! wall 0.102428 comb 0.100000 user 0.100000 sys 0.000000 (best of 97)
! retrieve all index entries 2x (reverse)
! wall 0.179876 comb 0.170000 user 0.130000 sys 0.040000 (best of 55)
! wall 0.206158 comb 0.200000 user 0.200000 sys 0.000000 (best of 49)

There's some odd behavior in there, and I suspect CPU power throttling is at play. But I don't see any obvious regressions that would raise objections from me.