mercurial/revlog.py
2338–2341	Why don't we simply store the absolute offset for the sidedata instead of needing arithmetic?

mercurial/revlog.py
2338–2341	We do store the absolute offset; the point here is to find the index after the last entry's (potential) sidedata. Perhaps this variable should be named `prev_sidedata_offset` to make that clearer?

mercurial/revlog.py
2338–2341	So we are looking for the offset of the first byte that does not contain data? If so, why are we looking at sidedata in particular and not at the actual data too? If not, what is going on here?

mercurial/revlog.py
2338–2341	Revlog data files are still append-only (even with the re-writing mechanism of the next patches in the stack), and the sidedata is the last thing that is written for any given entry. So if you want to find the offset of all data (in the general sense of the term), then you look for the end of the sidedata of the last entry. Is there something I missed about there being a simpler solution, or should I add a comment explaining this?

mercurial/revlog.py
2338–2341	Is it? If the last entry did not have any sidedata, we need to look at its actual data instead, don't we? And if the previous entry had sidedata that we needed to compute after the fact, its sidedata will be written after the last entry's data… etc. So the current approach seems quite fragile to me. It needs at least a clear comment explaining what it is trying to do, and it should probably be moved into a function with clear semantics and a docstring. The "last offset" information could be stored in the docket to make all this simpler soonish.

mercurial/revlog.py
2338–2341	> if the last entry did not have any sidedata, we need to look at its actual data instead
	Right, hence the check for the sidedata offset being 0, which falls back to the regular end.
	> if the previous entry had sidedata that we needed to compute after the fact, its sidedata will be written after the last entry's data
	If the sidedata needs to be computed after the fact, it means (for now) that the entry doesn't currently have sidedata, so we'll write (correctly) after the end of the data, do so for however many revisions are in the group, then compute their potential respective sidedata and append that to the data file, one right after the other (so, separate from their data, but always after it). Is that clearer? If so, I'll explain it better in the code.
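The write order described in this reply can be pictured with a small model. This is only a sketch under simplifying assumptions: `Entry` and `append_group` are hypothetical names, not Mercurial APIs, and the data file is reduced to a running byte count.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    # Hypothetical, simplified stand-in for a revlog v2 index entry.
    data_offset: int
    data_length: int
    sidedata_offset: int = 0
    sidedata_length: int = 0


def append_group(index, file_size, data_sizes, sidedata_sizes):
    """Append a group: every revision's data first, then each sidedata."""
    first = len(index)
    for size in data_sizes:
        index.append(Entry(file_size, size))
        file_size += size
    # Sidedata is computed after the fact and appended after all the data.
    for entry, size in zip(index[first:], sidedata_sizes):
        if size:
            entry.sidedata_offset = file_size
            entry.sidedata_length = size
            file_size += size
    return file_size


index = []
file_size = append_group(index, 0, data_sizes=[10, 20], sidedata_sizes=[4, 0])
last = index[-1]
print(file_size)                            # real end of the data file
print(last.data_offset + last.data_length)  # end of the last entry's data
```

In this toy layout the first revision's sidedata lands after the second revision's data, so the end of the last entry is no longer the end of the file once the transaction is committed.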

mercurial/revlog.py
2338–2341	What if the case I described (last revision with no sidedata, previous one with post-computed sidedata) happens for data committed before we start this write?

mercurial/revlog.py
2338–2341	Right, sorry, I misread your previous message. I'm sending a change with a linear scan of the index as a first step, pending the use of a docket file, so at least it's correct in all cases.
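To make the disagreement concrete, here is a minimal sketch comparing the "look at the last entry only" approach with a linear scan in the spirit of the one announced above. The tuple layout and both function names are assumptions made for illustration; they are not the revlog's actual index format or API.

```python
# Each tuple is a hypothetical, simplified index entry:
# (data_offset, data_length, sidedata_offset, sidedata_length)
COMMITTED = [
    (0, 10, 30, 4),  # rev 0: sidedata computed later, written after rev 1
    (10, 20, 0, 0),  # rev 1: no sidedata at all
]


def last_entry_offset(index):
    """Only inspect the last entry, falling back to its data end."""
    data_offset, data_length, sd_offset, sd_length = index[-1]
    if sd_offset + sd_length == 0:
        return data_offset + data_length
    return sd_offset + sd_length


def scanned_offset(index):
    """Walk the whole index, mirroring the branch structure of the patch."""
    offset = 0
    for data_offset, data_length, sd_offset, sd_length in index:
        sidedata_end = sd_offset + sd_length
        if sidedata_end == 0:
            offset = max(data_offset + data_length, offset)
        else:
            offset = sidedata_end
    return offset


print(last_entry_offset(COMMITTED))  # 30: would overwrite rev 0's sidedata
print(scanned_offset(COMMITTED))     # 34: true end of the data file
```

The scan costs O(n) in the number of revisions, which is the price paid until the offset can be cached in a docket file.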

mercurial/revlog.py
2429	Before getting out of experimental, actually. We should add the TODO next to the associated config in config item too.
2441	We should take the max of both no matter what, should we not? `offset = max(offset, self.end(rev), self.end_sidedata(rev))` or something similar.
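A sketch of why the unconditional `max` is the safer form. The layout below is contrived for illustration (it assumes a later revision's sidedata can sit before an earlier revision's rewritten sidedata), and the tuple format and function names are hypothetical rather than Mercurial's.

```python
# Each tuple is a hypothetical, simplified index entry:
# (data_offset, data_length, sidedata_offset, sidedata_length)
LAYOUT = [
    (0, 10, 25, 5),   # rev 0: sidedata rewritten at the very end (25..30)
    (10, 10, 20, 5),  # rev 1: sidedata right after its data (20..25)
]


def conditional_scan(index):
    """Branching variant: a non-zero sidedata end overrides the running max."""
    offset = 0
    for data_offset, data_length, sd_offset, sd_length in index:
        sidedata_end = sd_offset + sd_length
        if sidedata_end == 0:
            offset = max(data_offset + data_length, offset)
        else:
            offset = sidedata_end
    return offset


def max_scan(index):
    """Suggested variant: always keep the highest end seen so far."""
    offset = 0
    for data_offset, data_length, sd_offset, sd_length in index:
        offset = max(offset, data_offset + data_length, sd_offset + sd_length)
    return offset


print(conditional_scan(LAYOUT))  # 25: lands inside rev 0's sidedata
print(max_scan(LAYOUT))          # 30: first free byte
```

Taking the maximum in every iteration makes the result independent of the order in which sidedata was appended.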

			Path	Packages
M			mercurial/revlog.py (28 lines)

Diff	ID	Description	Created	Lint	Unit
Base		Base
Diff 1	25670		Feb 19 2021, 6:15 AM	★	★
Diff 2	25992		Mar 1 2021, 11:51 AM	★	★
Diff 3	26088		Mar 4 2021, 10:18 AM	★	★
Diff 4	26253		Mar 12 2021, 6:33 AM	★	★
Diff 5	26331		Mar 15 2021, 6:24 AM	★	★
Diff 6	26376	rHG4cd214c9948d40d006fc09df0c20f090f387d426	Feb 19 2021, 5:07 AM	★	★

Commit	Parents	Author	Summary	Date
7de247375b2a	b70c8c209dab	Raphaël Gomès		Feb 19 2021, 5:07 AM

Status	Author	Revision
Closed	Alphare	D10106 requirements: also add a generaldelta constant
Closed	Alphare	D10105 requirements: add constant for revlog v1 requirement
Closed	Alphare	D10024 error: add `hint` attribute to `SidedataHashError`
Closed	Alphare	D10023 changegroup: use the local variable instead of reaching through self
Closed	Alphare	D10216 configitems: add TODOs blocking the move out of experimental for revlogv2
Closed	Alphare	D10032 sidedata-exchange: rewrite sidedata on-the-fly whenever possible
Closed	Alphare	D10031 revlog-index: add `replace_sidedata_info` method
Closed	Alphare	D10030 revlogv2: temporarily forbid inline revlogs
Closed	Alphare	D10029 changegroupv4: add sidedata helpers
Closed	Alphare	D10151 revlog: add attribute on revlogs that specifies its kind
Closed	Alphare	D10028 sidedata-exchange: add `wanted_sidedata` and `sidedata_computers` to repos
Closed	Alphare	D10027 delta: add sidedata field to revision delta
Closed	Alphare	D10026 changegroup: add v4 changegroup for revlog v2 exchange
Closed	Alphare	D10025 revlogv2: don't assume that the sidedata of the last rev is right after data
Closed	Alphare	D9993 sidedata: move to new sidedata storage in revlogv2
Closed	Alphare	D9846 cext: add support for revlogv2
Closed	Alphare	D9845 bitmanipulation: add utils to read/write bigendian 64bit integers
Closed	Alphare	D10113 format: remove sidedata format variant
Closed	Alphare	D9844 revlogv2: allow upgrading to v2
Closed	Alphare	D9843 revlog: introduce v2 format
Closed	Alphare	D10109 requirements: also add a fncache constant
Closed	Alphare	D10108 requirements: also add a store constant
Closed	Alphare	D10107 requirements: also add a dotencode constant
Closed	Alphare	D10104 pure-parsers: document index class constants

Diff 26253

mercurial/revlog.py

	except IndexError:			except IndexError:
	if rev == wdirrev:			if rev == wdirrev:
	raise error.WdirUnsupported			raise error.WdirUnsupported
	raise			raise

	# Derived from index values.			# Derived from index values.

	def end(self, rev):			def end(self, rev):
	return self.start(rev) + self.length(rev) + self.sidedata_length(rev)			return self.start(rev) + self.length(rev)

	def parents(self, node):			def parents(self, node):
	i = self.index			i = self.index
	d = i[self.rev(node)]			d = i[self.rev(node)]
	return i[d[5]][7], i[d[6]][7] # map revisions to nodes inline			return i[d[5]][7], i[d[6]][7] # map revisions to nodes inline

	def chainlen(self, rev):			def chainlen(self, rev):
	return self._chaininfo(rev)[0]			return self._chaininfo(rev)[0]
	fh = ifh			fh = ifh
	else:			else:
	fh = dfh			fh = dfh

	btext = [rawtext]			btext = [rawtext]

	curr = len(self)			curr = len(self)
	prev = curr - 1			prev = curr - 1
	offset = self.end(prev)
				offset = self._get_data_offset(prev)

	if self._concurrencychecker:			if self._concurrencychecker:
	if self._inline:			if self._inline:
	# offset is "as if" it were in the .d file, so we need to add on			# offset is "as if" it were in the .d file, so we need to add on
	# the size of the entry metadata.			# the size of the entry metadata.
	self._concurrencychecker(			self._concurrencychecker(
	ifh, self.indexfile, offset + curr * self._io.size			ifh, self.indexfile, offset + curr * self._io.size
	)			)
	else:			else:
	# Entries in the .i are a consistent size.			# Entries in the .i are a consistent size.
	self._concurrencychecker(			self._concurrencychecker(
	ifh, self.indexfile, curr * self._io.size			ifh, self.indexfile, curr * self._io.size
	)			)
	self._concurrencychecker(dfh, self.datafile, offset)			self._concurrencychecker(dfh, self.datafile, offset)
	if alwayscache and rawtext is None:			if alwayscache and rawtext is None:
	rawtext = deltacomputer.buildtext(revinfo, fh)			rawtext = deltacomputer.buildtext(revinfo, fh)

	if type(rawtext) == bytes: # only accept immutable objects			if type(rawtext) == bytes: # only accept immutable objects
	self._revisioncache = (node, curr, rawtext)			self._revisioncache = (node, curr, rawtext)
	self._chainbasecache[curr] = deltainfo.chainbase			self._chainbasecache[curr] = deltainfo.chainbase
	return curr			return curr

				    def _get_data_offset(self, prev):
				        """Returns the current offset in the (in-transaction) data file.
				        Versions < 2 of the revlog can get this O(1), revlog v2 needs a docket
				        file to store that information: since sidedata can be rewritten to the
				        end of the data file within a transaction, you can have cases where, for
				        example, rev `n` does not have sidedata while rev `n - 1` does, leading
				        to `n - 1`'s sidedata being written after `n`'s data.

				        TODO cache this in a docket file before 5.8."""
				        if self.version & 0xFFFF != REVLOGV2:
				            return self.end(prev)

				        offset = 0
				        for rev, entry in enumerate(self.index):
				            sidedata_end = entry[8] + entry[9]
				            if sidedata_end == 0:
				                # Sidedata for a previous rev has potentially been written after
				                # this rev's end, so take the max.
				                offset = max(self.end(rev), offset)
				            else:
				                offset = sidedata_end
				        return offset

	def _writeentry(			def _writeentry(
	self, transaction, ifh, dfh, entry, data, link, offset, sidedata			self, transaction, ifh, dfh, entry, data, link, offset, sidedata
	):			):
	# Files opened in a+ mode have inconsistent behavior on various			# Files opened in a+ mode have inconsistent behavior on various
	# platforms. Windows requires that a file positioning call be made			# platforms. Windows requires that a file positioning call be made
	# when the file handle transitions between reads and writes. See			# when the file handle transitions between reads and writes. See
	# 3686fa2b8eee and the mixedfilemodewrapper in windows.py. On other			# 3686fa2b8eee and the mixedfilemodewrapper in windows.py. On other
	# platforms, Python or the platform itself can be buggy. Some versions			# platforms, Python or the platform itself can be buggy. Some versions

This is an archive of the discontinued Mercurial Phabricator instance.

revlogv2: don't assume that the sidedata of the last rev is right after data
ClosedPublic
