mercurial/revlog.py
2352–2355	Why don't we simply store the absolute offset for the sidedata instead of needing arithmetic?

mercurial/revlog.py
2352–2355	We do store the absolute offset; the point here is to find the index after the last entry's (potential) sidedata. Perhaps this variable should be named `prev_sidedata_offset` to make that clearer?

mercurial/revlog.py
2352–2355	So we are looking for the offset of the first byte that does not contain data? If so, why are we looking at sidedata in particular and not at the actual data too? If not, what is going on here?

mercurial/revlog.py
2352–2355	Revlog data files are still append-only (even with the re-writing mechanism of the next patches in the stack), and the sidedata is the last thing that is written for any given entry. So if you want to find the offset of all data (in the general sense of the term), then you look for the end of the sidedata of the last entry. Is there something I missed about there being a simpler solution, or should I add a comment explaining this?

mercurial/revlog.py
2352–2355	Is it? If the last entry did not have any sidedata, we need to look at its actual data instead, don't we? And actually, if the previous entry had sidedata that we needed to compute after the fact, its sidedata will be written after the last entry's data… etc. So the current approach seems quite fragile to me. It needs at least a clear comment explaining what it is trying to do, and it should probably be moved into a function with clear semantics and a docstring. The "last offset" information could be stored in the docket to make all this simpler soonish.

mercurial/revlog.py
2352–2355	"If the last entry did not have any sidedata, we need to look at its actual data instead": right, hence the check for the sidedata offset being 0, which falls back to the regular end. "If the previous entry had sidedata that we needed to compute after the fact, its sidedata will be written after the last entry's data": if the sidedata needs to be computed after the fact, it means - for now - that it doesn't currently have sidedata, so we'll write (correctly) after the end of the data, do so for however many revisions are in the group, then compute their potential respective sidedata and append that to the data file one right after the other (so, separate from their data, but always after). Is that clearer? If so, I'll explain better in code.

mercurial/revlog.py
2352–2355	What if the case I described (last revision with no sidedata, previous one with sidedata post-computed) happens for data committed before we start this write?

mercurial/revlog.py
2352–2355	Right, sorry, I misread your previous message. I'm sending a change with a linear scan of the index as a first step, pending the use of a docket file, so at least it's correct in all cases.

mercurial/revlog.py
2458	Before getting out of experimental, actually. We should add the TODO next to the associated config in the config items too.
2470	We should take the max of both no matter what, should we not? `offset = max(offset, self.end(rev), self.end_sidedata(rev))` or something similar.
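
The two approaches discussed in the comments above (reading only the last index entry versus scanning the whole index and taking the max) can be contrasted with a minimal sketch. Everything here is illustrative: `Entry`, `end_from_last_entry` and `end_by_scanning` are hypothetical names, and the named fields are a simplification of the real index tuple, not Mercurial's API.

```python
from typing import NamedTuple, Sequence


class Entry(NamedTuple):
    """Hypothetical, simplified index entry (not the real revlog layout)."""

    data_offset: int
    data_length: int
    sidedata_offset: int  # 0 means "no sidedata", as in the check under review
    sidedata_length: int


def end_from_last_entry(index: Sequence[Entry]) -> int:
    """Approach under review: trust the last entry only."""
    if not index:
        return 0
    last = index[-1]
    if last.sidedata_offset == 0:
        return last.data_offset + last.data_length
    return last.sidedata_offset + last.sidedata_length


def end_by_scanning(index: Sequence[Entry]) -> int:
    """Linear scan: the end of the file is the max over all entries."""
    offset = 0
    for entry in index:
        data_end = entry.data_offset + entry.data_length
        sidedata_end = entry.sidedata_offset + entry.sidedata_length
        offset = max(offset, data_end, sidedata_end)
    return offset


# Case raised in the review: rev 0 got its sidedata computed after the
# fact (written at bytes 25..30), and rev 1, the last one, has none.
index = [Entry(0, 10, 25, 5), Entry(10, 15, 0, 0)]
print(end_from_last_entry(index))  # offset that would overwrite rev 0's sidedata
print(end_by_scanning(index))      # offset past everything written so far
```

The scan is linear in the number of revisions, which is why the thread treats it as a first step until the end offset can be stored in a docket.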

			Path	Packages
M			mercurial/revlog.py (11 lines)

Diff	ID	Description	Created	Lint	Unit
Base		Base
Diff 1	25670		Feb 19 2021, 6:15 AM	★	★
Diff 2	25992		Mar 1 2021, 11:51 AM	★	★
Diff 3	26088		Mar 4 2021, 10:18 AM	★	★
Diff 4	26253		Mar 12 2021, 6:33 AM	★	★
Diff 5	26331		Mar 15 2021, 6:24 AM	★	★
Diff 6	26376	rHG4cd214c9948d40d006fc09df0c20f090f387d426	Feb 19 2021, 5:07 AM	★	★

Commit	Parents	Author	Summary	Date
68cfa31cc3a8	ddcd1014bf1f	Raphaël Gomès		Feb 19 2021, 5:07 AM

Status	Author	Revision
Closed	Alphare	D10106 requirements: also add a generaldelta constant
Closed	Alphare	D10105 requirements: add constant for revlog v1 requirement
Closed	Alphare	D10024 error: add `hint` attribute to `SidedataHashError`
Closed	Alphare	D10023 changegroup: use the local variable instead of reaching through self
Closed	Alphare	D10216 configitems: add TODOs blocking the move out of experimental for revlogv2
Closed	Alphare	D10032 sidedata-exchange: rewrite sidedata on-the-fly whenever possible
Closed	Alphare	D10031 revlog-index: add `replace_sidedata_info` method
Closed	Alphare	D10030 revlogv2: temporarily forbid inline revlogs
Closed	Alphare	D10029 changegroupv4: add sidedata helpers
Closed	Alphare	D10151 revlog: add attribute on revlogs that specifies its kind
Closed	Alphare	D10028 sidedata-exchange: add `wanted_sidedata` and `sidedata_computers` to repos
Closed	Alphare	D10027 delta: add sidedata field to revision delta
Closed	Alphare	D10026 changegroup: add v4 changegroup for revlog v2 exchange
Closed	Alphare	D10025 revlogv2: don't assume that the sidedata of the last rev is right after data
Closed	Alphare	D9993 sidedata: move to new sidedata storage in revlogv2
Closed	Alphare	D9846 cext: add support for revlogv2
Closed	Alphare	D9845 bitmanipulation: add utils to read/write bigendian 64bit integers
Closed	Alphare	D10113 format: remove sidedata format variant
Closed	Alphare	D9844 revlogv2: allow upgrading to v2
Closed	Alphare	D9843 revlog: introduce v2 format
Closed	Alphare	D10109 requirements: also add a fncache constant
Closed	Alphare	D10108 requirements: also add a store constant
Closed	Alphare	D10107 requirements: also add a dotencode constant
Closed	Alphare	D10104 pure-parsers: document index class constants

Diff 25992

mercurial/revlog.py

             fh = ifh
         else:
             fh = dfh

         btext = [rawtext]

         curr = len(self)
         prev = curr - 1

-        offset = self.end(prev)
+        if self.version & 0xFFFF == REVLOGV2:
+            prev_node = self.index[prev]
+            sidedata_offset = prev_node[10]
+            if sidedata_offset == 0:
+                offset = self.end(prev)
+            else:
+                offset = sidedata_offset + prev_node[11]
marmoute (Not Done): Why don't we simply store the absolute offset for the sidedata instead of needing arithmetic?
Alphare (Author, Done): We do store the absolute offset; the point here is to find the index after the last entry's (potential) sidedata. Perhaps this variable should be named `prev_sidedata_offset` to make that clearer?
marmoute (Not Done): So we are looking for the offset of the first byte that does not contain data? If so, why are we looking at sidedata in particular and not at the actual data too? If not, what is going on here?
Alphare (Author, Done): Revlog data files are still append-only (even with the re-writing mechanism of the next patches in the stack), and the sidedata is the last thing that is written for any given entry. So if you want to find the offset of all data (in the general sense of the term), then you look for the end of the sidedata of the last entry. Is there something I missed about there being a simpler solution, or should I add a comment explaining this?
marmoute (Not Done): Is it? If the last entry did not have any sidedata, we need to look at its actual data instead, don't we? And actually, if the previous entry had sidedata that we needed to compute after the fact, its sidedata will be written after the last entry's data… etc. So the current approach seems quite fragile to me. It needs at least a clear comment explaining what it is trying to do, and it should probably be moved into a function with clear semantics and a docstring. The "last offset" information could be stored in the docket to make all this simpler soonish.
Alphare (Author, Done): "If the last entry did not have any sidedata, we need to look at its actual data instead": right, hence the check for the sidedata offset being 0, which falls back to the regular end. "If the previous entry had sidedata that we needed to compute after the fact, its sidedata will be written after the last entry's data": if the sidedata needs to be computed after the fact, it means - for now - that it doesn't currently have sidedata, so we'll write (correctly) after the end of the data, do so for however many revisions are in the group, then compute their potential respective sidedata and append that to the data file one right after the other (so, separate from their data, but always after). Is that clearer? If so, I'll explain better in code.
marmoute (Not Done): What if the case I described (last revision with no sidedata, previous one with sidedata post-computed) happens for data committed before we start this write?
Alphare (Author, Done): Right, sorry, I misread your previous message. I'm sending a change with a linear scan of the index as a first step, pending the use of a docket file, so at least it's correct in all cases.
+        else:
+            offset = self.end(prev)

         if self._concurrencychecker:
             if self._inline:
                 # offset is "as if" it were in the .d file, so we need to add on
                 # the size of the entry metadata.
                 self._concurrencychecker(
                     ifh, self.indexfile, offset + curr * self._io.size
                 )

     def _writeentry(
         self, transaction, ifh, dfh, entry, data, link, offset, sidedata
     ):
         # Files opened in a+ mode have inconsistent behavior on various
         # platforms. Windows requires that a file positioning call be made
         # when the file handle transitions between reads and writes. See
         # 3686fa2b8eee and the mixedfilemodewrapper in windows.py. On other
         # platforms, Python or the platform itself can be buggy. Some versions
         # of Solaris have been observed to not append at the end of the file
marmoute (Done): Before getting out of experimental, actually. We should add the TODO next to the associated config in the config items too.
         # if the file was seeked to before the end. See issue4943 for more.
         #
         # We work around this issue by inserting a seek() before writing.
         # Note: This is likely not necessary on Python 3. However, because
         # the file handle is reused for reads and may be seeked there, we need
         # to be careful before changing this.
         ifh.seek(0, os.SEEK_END)
         if dfh:
             dfh.seek(0, os.SEEK_END)

         curr = len(self) - 1
         if not self._inline:
marmoute (Done): We should take the max of both no matter what, should we not? `offset = max(offset, self.end(rev), self.end_sidedata(rev))` or something similar.
             transaction.add(self.datafile, offset)
             transaction.add(self.indexfile, curr * len(entry))
             if data[0]:
                 dfh.write(data[0])
             dfh.write(data[1])
             if sidedata:
                 dfh.write(sidedata)
             ifh.write(entry)

This is an archive of the discontinued Mercurial Phabricator instance.

revlogv2: don't assume that the sidedata of the last rev is right after data
Closed, Public