D11204 hgwebdir: avoid systematic full garbage collection

This is an archive of the discontinued Mercurial Phabricator instance.

hgwebdir: avoid systematic full garbage collection
ClosedPublic

Authored by gracinet on Jul 20 2021, 12:19 PM.

Details

Summary

Forcing a systematic full garbage collection upon each request
can seriously harm performance. This is reported as
https://bz.mercurial-scm.org/show_bug.cgi?id=6075

With this change we're performing the full collection according
to a new setting, experimental.web.full-garbage-collection-rate.
The default value is 1, which doesn't change the behavior and will
allow us to test on real use cases. If the value is 0, no full garbage
collection occurs.

Regardless of the value of the setting, a partial garbage collection
still occurs upon each request (not attempting to collect objects from
the oldest generation). This should be enough to take care of
reference cycles that have been created by the last request
(assessing this requires changing the setting to a value other than 1).
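As an illustration, the intended per-request behaviour amounts to something
like the sketch below (the class and attribute names are made up for the
example, not the actual hgweb code):

import gc

class GcPolicy:
    """Illustrative sketch: full collection every `rate` requests,
    partial collection (young generations only) otherwise."""

    def __init__(self, rate):
        # rate corresponds to experimental.web.full-garbage-collection-rate
        self.rate = rate
        self.requests = 0

    def after_request(self):
        self.requests += 1
        if self.rate and self.requests % self.rate == 0:
            gc.collect()  # full collection, including the oldest generation
        else:
            gc.collect(generation=1)  # leaves the oldest generation alone

policy = GcPolicy(rate=1)  # default: full collection on every request
policy.after_request()
policy = GcPolicy(rate=0)  # rate 0: never a full collection
policy.after_request()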

In my experience chasing memory leaks in Mercurial servers,
the full collection never reclaimed any memory, but this is with
Python 3 and biased towards small repositories.

On the other hand, as explained in the Python developer docs [1],
frequent full collections are very harmful in terms of performance if
lots of objects survive the collection, and hence stay in the
oldest generation. Note that gc.collect() is indeed trying to
collect the oldest generation [2]. This happens usually in two cases:

  • unwanted lingering objects (i.e., an actual memory leak that the GC cannot do anything about). Sadly, we have lots of those these days.
  • desirable long-term objects, typically in caches (not inner caches carried by repositories, which should be collected with them). This is a subject of interest for the Heptapod project.

In short, the flat rate that this change still permits is
probably a bad idea in most cases, and the default value can
be tweaked later on (or even be set to 0) according to experiments
in the wild.

The test is inspired by test-hgwebdir-paths.py.

[1] https://devguide.python.org/garbage_collector/#collecting-the-oldest-generation
[2] https://docs.python.org/3/library/gc.html#gc.collect

Diff Detail

Repository
rHG Mercurial
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

gracinet created this revision. Jul 20 2021, 12:19 PM
pulkit accepted this revision. Jul 20 2021, 3:19 PM
This revision is now accepted and ready to land. Jul 20 2021, 3:19 PM
av6 added a subscriber: av6. Jul 20 2021, 3:47 PM

Thank you for caring about hgweb, it doesn't get this treatment often.

It is possible, though, that a value like 100 or 1000 could be a good trade-off if someone has a repository for which the GC can actually mitigate excessive memory usage.

I feel that you're downplaying the problem. The original ff2370a70fe8 states that every raw-file request to e.g. the firefox repo leaks ~100 MB, and I don't think people would like to have *each* hgwebdir process get to 10-100 GB before it gets gc'd.

Here's how to check if the issue is still present in the current code on python3:

hg serve --web-conf=foo.conf

where foo.conf contains:

[paths]
/ = /path/to/repos/*
I'm going to use hg-committed as an example repo because it's reasonably sized and readily available. Just browsing around in an hg-committed clone locally makes the hg serve process quickly grow to 1 GB rss over practically nothing (first page of log, directory listing, tags, branches, etc). It grows by 100 to 300 MB per request.

Now, I know hgweb is supposed to serve only actual generated content from the repo and we're here making it serve static files as well, but this memory-leaking behavior depends on the way hgweb is deployed, and even in perfect setups this problem can manifest itself if e.g. the WSGI runner decides to use multiple threads or adjust gc frequency (or if a random spider starts hitting all the URLs on the server). I haven't actually figured out what makes hgweb in gunicorn leak, even though it shouldn't be multithreaded, I think? It was a long time ago, but I remember that gc.collect() at least made hgweb processes manageable for a small vps.
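A rough way to reproduce this kind of measurement (the PID argument, port and
URLs are placeholders for whatever hg serve --web-conf=foo.conf is running;
it reads /proc, so Linux only):

import sys
import urllib.request

def rss_kb(pid):
    # VmRSS of the server process, in kilobytes (Linux /proc).
    with open('/proc/%d/status' % pid) as f:
        for line in f:
            if line.startswith('VmRSS:'):
                return int(line.split()[1])
    return 0

pid = int(sys.argv[1])  # pid of the running `hg serve` process
urls = [
    'http://localhost:8000/hg-committed/',
    'http://localhost:8000/hg-committed/tags',
    'http://localhost:8000/hg-committed/branches',
]
for url in urls:
    urllib.request.urlopen(url).read()
    print('%8d kB after %s' % (rss_kb(pid), url))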

desirable long-term objects, typically in caches. This is an area of interest of mine.

When I looked at why hg-committed repo takes so much memory, the biggest consumer by far was obsstore and its cached attributes. obsstore is not only fitting the entirety of .hg/store/obsstore (hundreds of thousands of obsmarkers) into memory in a not very memory-friendly format, it's doing it multiple times. obsstore.successors and obsstore.predecessors basically contain the same obsmarkers, just reorganized for different uses. They all use basic python structures. This takes about 300 MB of memory for every instance of hg-committed repo. And if you create an instance of that for every request, you get crazy memory consumption way before python figures out that maybe it should collect some unused objects.
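A rough way to check that figure for a given clone (the repository path is a
placeholder, and tracemalloc only counts allocations made after start(), which
is enough for a comparison like this):

import tracemalloc
from mercurial import hg, ui as uimod

tracemalloc.start()
repo = hg.repository(uimod.ui.load(), b'/path/to/hg-committed')
# Touch the cached attributes discussed above so they actually get built.
nsucc = len(repo.obsstore.successors)
npred = len(repo.obsstore.predecessors)
current, peak = tracemalloc.get_traced_memory()
print('successors=%d predecessors=%d peak=%.1f MB' % (nsucc, npred, peak / 1e6))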

In fact, python does eventually collect garbage on its own, but it takes like 15 requests. So full-garbage-collection-rate of 100 (let alone 1000) doesn't change anything, since the process will either collect on its own, or it'll grow in size so much that it gets a visit from OOM killer.

@av6 thanks for the detailed perspective. I will do some testing along the lines you suggest tomorrow (it's late here now), but this here is worrying:

I'm going to use hg-committed as an example repo because it's reasonably sized and readily available. Just browsing around in an hg-committed clone locally makes the hg serve process quickly grow to 1 GB rss over practically nothing (first page of log, directory listing, tags, branches, etc). It grows by 100 to 300 MB per request.

The thing is, unless I'm badly mistaken, hgwebdir creates a new localrepo object for each request (that's why I'd like it eventually to use a repo cache). So if those figures you're quoting are without this patch, it means that:

  1. the GC call per request is useless
  2. the leak is very worrying

So, was it with or without the patch applied? Was it with hgwebdir or just hg serve, by the way?

In fact, python does eventually collect garbage on its own, but it takes like 15 requests. So full-garbage-collection-rate of 100 (let alone 1000) doesn't change anything, since the process will either collect on its own, or it'll grow in size so much that it gets a visit from OOM killer.

Perhaps a `gc.collect(generation=1)` would be in order. That one would be more acceptable on each request.
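For reference, the difference is easy to see with the stdlib gc module alone
(a quick interactive illustration, nothing Mercurial-specific):

import gc

print(gc.get_threshold())  # e.g. (700, 10, 10): allocation thresholds per generation
print(gc.get_count())      # objects currently pending in each generation

# gc.collect() with no argument is a full collection of the oldest
# generation (2); gc.collect(generation=1) only examines generations 0
# and 1, so long-lived objects are not repeatedly re-traversed.
print(gc.collect(generation=1))
print(gc.collect())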

baymax edited the summary of this revision. Jul 21 2021, 8:07 AM
baymax updated this revision to Diff 29703.

✅ refresh by Heptapod after a successful CI run (🙌 💚)

av6 added a comment. Jul 21 2021, 8:26 AM

So, was it with or without the patch applied?

These were the figures to show what happens without ff2370a70fe8. More specifically, it's current hg built from public default of hg-committed, but with the gc.collect() line commented out. The leak is indeed very worrying, and currently this gc.collect() on every request is the only solution that we have.

Was it with hgwebdir or just hg serve, by the way ?

hgwebdir, which is used if you provide --web-conf to hg serve.

Perhaps a `gc.collect(generation=1)` would be in order.

I tried it and it was kinda middle of the road. Here's what hgwebdir's behavior with the previously described setup looks like (memory sizes approximate):
  • no gc: grows in size rapidly, shrinks rarely; growth happens on every request, easily gets to 2-2.5 GB, then drops to 390 MB
  • gc.collect(generation=1): grows to 390 MB, stays there half of the time, but can easily grow to 1 GB; drops to 70 MB every so often
  • gc.collect(): grows to 390 MB and usually stays there, occasionally spikes to 750 MB or drops to 70 MB

One thing to keep in mind is that this is not a production setup: a simple hg serve will serve static files as well, and this, I believe, gives python more chances to start a gc after a request. For production the memory leak problem is worse, because there are no static file requests (less possibility of a "natural" gc), but every request creates a repo (guaranteed memory consumption).

Bottom line: this looks riskier than I thought, and we don't have time to investigate before 5.9rc, so I've changed the patch to:

  • not change the default behavior (new setting default is 1)
  • have gc.collect(generation=1) on each request.

Therefore, this can easily be used to experiment with various repos, and it still solves bug 6075, whose reporter can now use the setting instead of patching out the gc.collect().

Also the setting is now in the experimental namespace.

av6 accepted this revision. Jul 21 2021, 9:15 AM

Okay, let's compromise.

pulkit accepted this revision. Jul 21 2021, 4:55 PM

Should this go to stable or should it wait for the end of the freeze? I'm not too sure.

This revision was automatically updated to reflect the committed changes.