This is an archive of the discontinued Mercurial Phabricator instance.

Differential D9278

transaction: split new files into a separate set
Closed, Public

Authored by joerg.sonnenberger on Nov 7 2020, 4:32 PM.


Details

Reviewers

None

Group Reviewers

hg-reviewers

Commits

rHGec73a6a75985: transaction: split new files into a separate set
rHGfae02ffcbae8: transaction: split new files into a separate set

Summary

Journal entries with size 0 are common, as they represent new revlog
files. Move them from the dictionary into a separate set, as the latter is
more compact. This reduces peak RSS by 70 MB for the NetBSD test repository,
which has around 450k files under .hg/store.
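A minimal sketch of the bookkeeping described in the summary, assuming only what the diff of this revision shows: non-zero starting offsets stay in a dict, files recorded at offset 0 (new, empty files) go into a set, and `findoffset()` reports 0 for them. The class name `OffsetJournal` is hypothetical; it leaves out the queue, backups and on-disk journal that the real `transaction` class also manages.

```python
from typing import Dict, Optional, Set


class OffsetJournal:
    """Hypothetical stand-in for the transaction's in-memory journal state.

    Files that were empty when first recorded (offset 0) go into a set;
    only files with a non-zero starting size get a dict slot for the offset.
    """

    def __init__(self) -> None:
        self._offsetmap: Dict[bytes, int] = {}
        self._newfiles: Set[bytes] = set()

    def add(self, file: bytes, offset: int) -> None:
        # The first record for a file wins; later calls are ignored.
        if file in self._newfiles or file in self._offsetmap:
            return
        if offset:
            self._offsetmap[file] = offset
        else:
            self._newfiles.add(file)

    def findoffset(self, file: bytes) -> Optional[int]:
        # New files report 0; unknown files report None, like dict.get().
        if file in self._newfiles:
            return 0
        return self._offsetmap.get(file)


j = OffsetJournal()
j.add(b"data/new.i", 0)
j.add(b"00changelog.i", 4096)
j.add(b"00changelog.i", 8192)  # ignored: already recorded
print(
    j.findoffset(b"data/new.i"),
    j.findoffset(b"00changelog.i"),
    j.findoffset(b"missing"),
)
# prints: 0 4096 None
```

The point of the split is that a set entry stores no value, so zero-offset files no longer occupy the dict's value slot; how much that saves in practice depends on the Python version and table sizes.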

Diff Detail

Repository

rHG Mercurial

Branch

default

Lint

No Linters Available

Unit

No Unit Test Coverage

Event Timeline

joerg.sonnenberger created this revision. Nov 7 2020, 4:32 PM

Herald added a reviewer: hg-reviewers. Nov 7 2020, 4:32 PM

Herald added a subscriber: mercurial-patches.

Why do we have to maintain the in-memory records at all? Don't we write these to the journal and journal.backupfiles files? What's wrong with reading this file handle? We only need to perform a findoffset() when incurring a split revlog or during rollback, right? Is there a noticeable performance overhead to performing that linear scan?

I've been wondering about that question as well. In theory, we could run into O(n^2) behavior when many files end up being split. I decided against dropping the in-memory records entirely, as proposed in D9237, because most transactions are much smaller than the initial clone and are therefore much less sensitive to memory use.
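The alternative raised in the review, reading offsets back from the journal instead of keeping them in memory, amounts to a linear scan over the records that `_addentry()` and `replace()` write with `b"%s\0%d\n" % (file, offset)`. The sketch below assumes that a later record for the same file supersedes an earlier one (since `replace()` appends a new record rather than rewriting the old one); the function name `scan_offset` is hypothetical.

```python
import io
from typing import BinaryIO, Optional


def scan_offset(journal: BinaryIO, file: bytes) -> Optional[int]:
    """Hypothetical linear scan of journal records "name NUL offset LF".

    Assumes the last record for a file is authoritative, because replace()
    appends a new record instead of editing the earlier one.
    """
    journal.seek(0)
    found = None
    for line in journal:
        name, _, offset = line.rstrip(b"\n").partition(b"\0")
        if name == file:
            found = int(offset)
    return found


journal = io.BytesIO()
for name, offset in [(b"data/a.i", 0), (b"data/b.i", 120), (b"data/a.i", 64)]:
    journal.write(b"%s\0%d\n" % (name, offset))

print(scan_offset(journal, b"data/a.i"), scan_offset(journal, b"data/b.i"))
# prints: 64 120
```

Each lookup reads the whole journal, so doing one per split file costs O(k·n) for k splits over an n-record journal, which becomes the quadratic case mentioned in the reply when k grows with n.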

joerg.sonnenberger updated this revision to Diff 23433. Nov 8 2020, 10:00 AM

joerg.sonnenberger added a commit: rHGfae02ffcbae8: transaction: split new files into a separate set. Nov 18 2020, 6:07 PM

This revision was not accepted when it landed; it landed in state Needs Review.

Closed by commit rHGfae02ffcbae8: transaction: split new files into a separate set (authored by joerg.sonnenberger).

This revision was automatically updated to reflect the committed changes.

joerg.sonnenberger added a commit: rHGec73a6a75985: transaction: split new files into a separate set. Nov 19 2020, 6:40 PM

Revision Contents
Changeset List

			Path	Packages
M			mercurial/repair.py (2 lines)
M			mercurial/transaction.py (46 lines)

Diff	ID	Description	Created	Lint	Unit
Base		Base
Diff 1	23428		Nov 7 2020, 4:32 PM	★	★
Diff 2	23433		Nov 8 2020, 10:00 AM	★	★
Diff 3	23556	rHGfae02ffcbae8638236cb30e5b469c0baba279129	Nov 7 2020, 4:31 PM	★	★

Commit	Parents	Author	Summary	Date
cb3cc592178c	1ffe1568b0ff	Joerg Sonnenberger		Nov 7 2020, 4:31 PM

Status	Author	Revision
Closed	joerg.sonnenberger	D9278 transaction: split new files into a separate set
Closed	joerg.sonnenberger	D9277 transaction: change list of journal entries into a dictionary
Closed	joerg.sonnenberger	D9276 transaction: rename find to findoffset and drop backup file support
Closed	joerg.sonnenberger	D9275 transaction: drop per-file extra data support

Diff 23428

mercurial/repair.py

     with ui.uninterruptible():
         try:
             with repo.transaction(b"strip") as tr:
                 # TODO this code violates the interface abstraction of the
                 # transaction and makes assumptions that file storage is
                 # using append-only files. We'll need some kind of storage
                 # API to handle stripping for us.
                 oldfiles = set(tr._offsetmap.keys())
+                oldfiles.update(tr._newfiles)

                 tr.startgroup()
                 cl.strip(striprev, tr)
                 stripmanifest(repo, striprev, tr, files)

                 for fn in files:
                     repo.file(fn).strip(striprev, tr)
                 tr.endgroup()

                 newfiles = set(tr._offsetmap.keys())
+                newfiles.update(tr._newfiles)
                 newfiles.difference_update(oldfiles)

                 # The processing order doesn't matter during normal
                 # operation, but the test-repair-strip.t test case
                 # inserts faults and it benefits from the sorting.
                 for file in sorted(newfiles):
                     troffset = tr.findoffset(file)
                     with repo.svfs(file, b'a', checkambig=True) as fp:
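The `repair.py` hunk snapshots every file known to the transaction before stripping, snapshots again afterwards, and truncates only the files that appeared in between, each to its recorded offset. Once zero-offset files live in `_newfiles`, both snapshots must union the dict keys with the set, or files recorded at offset 0 during the strip would be missed. The sketch below reproduces that pattern with plain files in a temporary directory; `offsetmap`, `newfiles_set`, `record`, `known_files` and `findoffset` are hypothetical stand-ins for the transaction's private state, and `os.truncate` replaces the store-vfs `fp.truncate()` that the real code uses.

```python
import os
import tempfile
from typing import Dict, Set

# Hypothetical stand-ins for tr._offsetmap and tr._newfiles.
offsetmap: Dict[str, int] = {}
newfiles_set: Set[str] = set()


def record(name: str, offset: int) -> None:
    # Same split as _addentry(): zero offsets go to the set.
    if name in offsetmap or name in newfiles_set:
        return
    if offset:
        offsetmap[name] = offset
    else:
        newfiles_set.add(name)


def known_files() -> Set[str]:
    # Both containers must be consulted, as in the patched repair.py.
    files = set(offsetmap)
    files.update(newfiles_set)
    return files


def findoffset(name: str) -> int:
    return 0 if name in newfiles_set else offsetmap[name]


with tempfile.TemporaryDirectory() as root:
    record("before.i", 10)  # already part of the transaction
    oldfiles = known_files()

    # Simulated strip: create two files and record their truncation points.
    for name, size, keep in [("a.i", 30, 12), ("b.i", 20, 0)]:
        with open(os.path.join(root, name), "wb") as fp:
            fp.write(b"x" * size)
        record(name, keep)

    touched = known_files()
    touched.difference_update(oldfiles)

    for name in sorted(touched):
        os.truncate(os.path.join(root, name), findoffset(name))

    sizes = {n: os.path.getsize(os.path.join(root, n)) for n in sorted(touched)}
    print(sorted(touched), sizes)
    # prints: ['a.i', 'b.i'] {'a.i': 12, 'b.i': 0}
```

Without the `newfiles_set` union in `known_files()`, `b.i` would never be truncated, which is why the patch adds `update(tr._newfiles)` to both snapshots.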

mercurial/transaction.py

         # a vfs to the store content
         self._opener = opener
         # a map to access file in various {location -> vfs}
         vfsmap = vfsmap.copy()
         vfsmap[b''] = opener  # set default value
         self._vfsmap = vfsmap
         self._after = after
         self._offsetmap = {}
+        self._newfiles = set()
         self._journal = journalname
         self._undoname = undoname
         self._queue = []
         # A callback to do something just after releasing transaction.
         if releasefn is None:
             releasefn = lambda tr, success: None
         self._releasefn = releasefn
@@ ... @@
         sees either none or all of the strip actions to be done."""
         q = self._queue.pop()
         for f, o in q:
             self._addentry(f, o)

     @active
     def add(self, file, offset):
         """record the state of an append-only file before update"""
-        if file in self._offsetmap or file in self._backupmap:
+        if (
+            file in self._newfiles
+            or file in self._offsetmap
+            or file in self._backupmap
+        ):
             return
         if self._queue:
             self._queue[-1].append((file, offset))
             return

         self._addentry(file, offset)

     def _addentry(self, file, offset):
         """add a append-only entry to memory and on-disk state"""
-        if file in self._offsetmap or file in self._backupmap:
+        if (
+            file in self._newfiles
+            or file in self._offsetmap
+            or file in self._backupmap
+        ):
             return
-        self._offsetmap[file] = offset
+        if offset:
+            self._offsetmap[file] = offset
+        else:
+            self._newfiles.add(file)
         # add enough data to the journal to do the truncate
         self._file.write(b"%s\0%d\n" % (file, offset))
         self._file.flush()

     @active
     def addbackup(self, file, hardlink=True, location=b''):
         """Adds a backup of the file to the transaction

         Calling addbackup() creates a hardlink backup of the specified file
         that is used to recover the file in the event of the transaction
         aborting.

         * `file`: the file path, relative to .hg/store
         * `hardlink`: use a hardlink to quickly create the backup
         """
         if self._queue:
             msg = b'cannot use transaction.addbackup inside "group"'
             raise error.ProgrammingError(msg)

-        if file in self._offsetmap or file in self._backupmap:
+        if (
+            file in self._newfiles
+            or file in self._offsetmap
+            or file in self._backupmap
+        ):
             return
         vfs = self._vfsmap[location]
         dirname, filename = vfs.split(file)
         backupfilename = b"%s.backup.%s" % (self._journal, filename)
         backupfile = vfs.reljoin(dirname, backupfilename)
         if vfs.exists(file):
             filepath = vfs.join(file)
             backuppath = vfs.join(backupfile)
@@ ... @@
                 del files[:]
             finally:
                 for f in files:
                     f.discard()
         return any

     @active
     def findoffset(self, file):
+        if file in self._newfiles:
+            return 0
         return self._offsetmap.get(file)

     @active
     def replace(self, file, offset):
         '''
         replace can only replace already committed entries
         that are not pending in the queue
         '''
-        if file not in self._offsetmap:
-            raise KeyError(file)
-        self._offsetmap[file] = offset
+        if file in self._newfiles:
+            if not offset:
+                return
+            self._newfiles.remove(file)
+            self._offsetmap[file] = offset
+        elif file in self._offsetmap:
+            if not offset:
+                del self._offsetmap[file]
+                self._newfiles.add(file)
+            else:
+                self._offsetmap[file] = offset
+        else:
+            raise KeyError(file)
         self._file.write(b"%s\0%d\n" % (file, offset))
         self._file.flush()

     @active
     def nest(self, name='<unnamed>'):
         self._count += 1
         self._usages += 1
         self._names.append(name)
@@ ... @@
                 except (IOError, OSError, error.Abort) as inst:
                     if not c:
                         raise
                     # Abort may be raise by read only opener
                     self._report(
                         b"couldn't remove %s: %s\n" % (vfs.join(b), inst)
                     )
         self._offsetmap = {}
+        self._newfiles = set()
         self._writeundo()
         if self._after:
             self._after()
             self._after = None  # Help prevent cycles.
         if self._opener.isfile(self._backupjournal):
             self._opener.unlink(self._backupjournal)
         if self._opener.isfile(self._journal):
             self._opener.unlink(self._journal)
@@ ... @@
         undobackupfile.close()

     def _abort(self):
         self._count = 0
         self._usages = 0
         self._file.close()
         self._backupsfile.close()
         entries = list(pycompat.iteritems(self._offsetmap))
+        for file in self._newfiles:
+            entries.append((file, 0))
         entries.sort()

         try:
-            if not self._offsetmap and not self._backupentries:
+            if not entries and not self._backupentries:
                 if self._backupjournal:
                     self._opener.unlink(self._backupjournal)
                 if self._journal:
                     self._opener.unlink(self._journal)
                 return

             self._report(_(b"transaction abort!\n"))
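The new `replace()` has to move a file between the two containers when its offset changes to or from zero, and `_abort()` has to rebuild one sorted entry list from both. The following self-contained sketch mirrors those two pieces of logic from the diff on a hypothetical `SplitOffsets` class; it omits the journal writes, the `@active` decorator and everything else the real transaction does.

```python
from typing import Dict, List, Set, Tuple


class SplitOffsets:
    """Hypothetical model of the _offsetmap/_newfiles pair after this patch."""

    def __init__(self) -> None:
        self.offsetmap: Dict[bytes, int] = {}
        self.newfiles: Set[bytes] = set()

    def replace(self, file: bytes, offset: int) -> None:
        # Mirrors transaction.replace(): only already-recorded files may change.
        if file in self.newfiles:
            if not offset:
                # Still zero: nothing to move (the real method also skips
                # the journal write in this case).
                return
            self.newfiles.remove(file)
            self.offsetmap[file] = offset
        elif file in self.offsetmap:
            if not offset:
                del self.offsetmap[file]
                self.newfiles.add(file)
            else:
                self.offsetmap[file] = offset
        else:
            raise KeyError(file)

    def abort_entries(self) -> List[Tuple[bytes, int]]:
        # Mirrors _abort(): merge both containers into one sorted list,
        # reporting new files with offset 0.
        entries = list(self.offsetmap.items())
        entries.extend((file, 0) for file in self.newfiles)
        entries.sort()
        return entries


s = SplitOffsets()
s.offsetmap[b"a.i"] = 100
s.newfiles.add(b"b.i")
s.replace(b"b.i", 42)  # new file gains a non-zero offset
s.replace(b"a.i", 0)   # existing entry drops to zero
print(s.abort_entries())
# prints: [(b'a.i', 0), (b'b.i', 42)]
```

Because `_abort()` reconstructs the `(file, 0)` pairs, the rollback path still sees every file it saw before the split, and the emptiness check now has to look at the merged `entries` rather than `_offsetmap` alone.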