This is an archive of the discontinued Mercurial Phabricator instance.

changing-files: rework the way we store changed files in side-data
ClosedPublic

Authored by marmoute on Sep 26 2020, 8:10 AM.

Details

Reviewers

Alphare
pulkit

Group Reviewers

hg-reviewers

Commits

rHG9a6b409b8ebc: changing-files: rework the way we store changed files in side-data

Summary

We need to store new data so this is a good opportunity to rework this fully.

We directly store the list of affected files in the side data:

This avoids having to fetch and parse the files list in the revision in addition to the sidedata, making the data more self-sufficient.

This works around situations where that files field contains wrong information, and opens the way to other bug fixes (e.g. issue6219).

The format (fixed initial index, sorted files) allows for fast lookup of a filename within the structure.

This unifies the storage of affected files and of copy sources and destinations, limiting the number of filenames stored redundantly.

This prepares for the fact that we should drop the files field as soon as we make any change affecting the revision schema.

This relies on compression to avoid a significant increase in the size of changelog.d. More testing on this will be done before we freeze the final format.

We can store additional data:

The new "merged" field,

A future "salvaged" set recording files that might have been deleted but were still present in the final result.
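
The fast-lookup point can be illustrated with a minimal sketch, assuming the layout described in the help text of this revision (4-byte big-endian entry count, 9-byte index entries, filenames concatenated in sorted order). The helper name `find_file` is hypothetical and not part of Mercurial; it binary-searches the index and only reads the entries it visits.

```python
import struct

# Layout assumed from the format description in this revision:
# header = entry count, entry = <flag><filename-end><copy-source>.
INDEX_HEADER = struct.Struct(">L")
INDEX_ENTRY = struct.Struct(">BLL")


def find_file(raw, filename):
    """Return (index, flag) for `filename`, or None (hypothetical helper)."""
    (count,) = INDEX_HEADER.unpack_from(raw, 0)
    data_start = INDEX_HEADER.size + INDEX_ENTRY.size * count

    def entry(i):
        return INDEX_ENTRY.unpack_from(
            raw, INDEX_HEADER.size + INDEX_ENTRY.size * i
        )

    def name_at(i):
        # The end offset of entry i-1 is the start offset of entry i.
        flag, end, _copy_idx = entry(i)
        start = entry(i - 1)[1] if i > 0 else 0
        return raw[data_start + start : data_start + end], flag

    lo, hi = 0, count
    while lo < hi:
        mid = (lo + hi) // 2
        name, flag = name_at(mid)
        if name == filename:
            return mid, flag
        if name < filename:
            lo = mid + 1
        else:
            hi = mid
    return None


# Block taken from the test output in this revision (files "b" and "b2").
raw = (
    b"\x00\x00\x00\x02"
    b"\x0c\x00\x00\x00\x01\x00\x00\x00\x00"
    b"\x06\x00\x00\x00\x03\x00\x00\x00\x00"
    b"bb2"
)
print(find_file(raw, b"b2"))  # (1, 6)
```

Because every index entry has a fixed size, the offset of entry `i` is computed directly, so a lookup reads a logarithmic number of entries instead of scanning the whole block.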

Diff Detail

Repository

rHG Mercurial

Lint

Automatic diff as part of commit; lint not applicable.

Unit

Automatic diff as part of commit; unit tests not applicable.

Event Timeline

marmoute created this revision. Sep 26 2020, 8:10 AM

Herald added a reviewer: hg-reviewers. Sep 26 2020, 8:10 AM

Herald added a subscriber: mercurial-patches.

marmoute added a child revision: D9091: changing-files: drop the now useless changelogrevision argument. Sep 26 2020, 8:10 AM

pulkit added a subscriber: pulkit. Oct 1 2020, 10:33 AM

pulkit added inline comments.

mercurial/metadata.py
397–430	nit: I guess we can directly use `files.*` attributes.
401	nit: we can preserve this line of documentation.
490–498	This diff can ideally be in a different patch as it's not related to storage rework.
mercurial/revlogutils/sidedata.py
56	Coming from documenting mergestate constants, I think it will be nice to have documentation for all these keys as a followup.

Pretty straightforward and clear code, thanks. I have a few remarks that block me from accepting, but other than that it looks good.

mercurial/helptext/internals/revlogs.txt
262	Is there a reason to use big-endian instead of little-endian (as x86 and x86-64 are little-endian)? I had the same question asked when I proposed the first draft of the dirstate cache.
267	unsigned, I assume?
297	Same as above, and below
308	Please fix the typos in this sentence, it's kind of hard to read.
mercurial/metadata.py
430	The `in` + `get` seems wasteful, but I'm guessing you don't expect people to actually use the pure version?
446–450	I think we could benefit from some sort of asserts when reading sizes or in case of overflow.

This revision now requires changes to proceed. Oct 1 2020, 11:31 AM

marmoute added inline comments. Oct 1 2020, 12:01 PM

mercurial/helptext/internals/revlogs.txt
262	My three main reasons to go for big-endian: (1) Big-endian is more standard (network encoding) and everything else in Mercurial uses big-endian, so sticking with the current practice is better for consistency. (2) x86 processors are usually very good at dealing with big-endian data, and I have never seen any significant slowdown from using BE data. (3) Using big-endian for storage usually makes sure people have thought about endianness for storage before it is too late, which avoids releasing inconsistent endianness in the wild and corrupting user data.
267	sure :-/
308	Will do. I think it just needs: `If now copy` → `If no copy/` and `the value or this field` → `the value of this field`. Do you see anything else?
mercurial/metadata.py
397–430	We could, but we do not have caching in the object yet. But it is coming right after. I'll follow up with a cleanup.
401	Indeed, we should. This probably got dropped while manipulating patches.
430	I don't expect this to be a major bottleneck. I can do some performance measurements once the dust settles, but I do not expect to find anything special.
446–450	Good point.
490–498	Which diff are you thinking about? The specific `filesmerged = computechangesetfilesmerged(ctx)` line? If so, the value would not be used anywhere without this change.
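
On the `in` + `get` remark for line 430, a minimal sketch of a single-lookup alternative, assuming plain dicts stand in for `files.copied_from_p1` and `files.copied_from_p2` and that `None` is never a valid copy source. The helper `copy_source` and its return shape are hypothetical; the flag values mirror the constants in this diff.

```python
COPIED_FROM_P1_FLAG = 0b10
COPIED_FROM_P2_FLAG = 0b11


def copy_source(f, copied_from_p1, copied_from_p2):
    """Return (copy_flag, source) using one dict lookup per parent.

    Hypothetical helper: both mappings go from copy destination to source.
    """
    source = copied_from_p1.get(f)
    if source is not None:
        return COPIED_FROM_P1_FLAG, source
    source = copied_from_p2.get(f)
    if source is not None:
        return COPIED_FROM_P2_FLAG, source
    return 0, None


p1 = {b"g": b"a", b"i": b"f"}
p2 = {b"h": b"d"}
print(copy_source(b"h", p1, p2))  # (3, b'd')
print(copy_source(b"a", p1, p2))  # (0, None)
```

Whether this is measurably faster than `in` followed by `get` is not established here; as noted in the reply, this path is not expected to be a bottleneck.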

Alphare added inline comments. Oct 1 2020, 12:06 PM

mercurial/helptext/internals/revlogs.txt
262	Sure, makes sense, thanks!
308	s/irrevant/irrelevant/ also
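
To make the byte-order discussion concrete, a small sketch using Python's standard `struct` module: an explicit `>` prefix pins the stored layout to big-endian regardless of the host CPU, which is what the `INDEX_HEADER` and `INDEX_ENTRY` definitions in this diff rely on. The value 4 is only an illustration.

```python
import struct
import sys

count = 4

big = struct.pack(">L", count)     # explicit big-endian (network order)
little = struct.pack("<L", count)  # explicit little-endian
native = struct.pack("=L", count)  # host byte order, standard size

print(big)     # b'\x00\x00\x00\x04'
print(little)  # b'\x04\x00\x00\x00'

# The native encoding matches one of the two, depending on the machine.
print(sys.byteorder, native == (big if sys.byteorder == "big" else little))

# Reading with the wrong byte order silently yields a different number.
print(struct.unpack("<L", big)[0])  # 67108864
```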

marmoute edited parent revisions, added: D9143: changing-files: fix docstring; removed: D9089: changing-files: add a utility to compute the merged files post-commit. Oct 2 2020, 3:13 AM

marmoute updated this revision to Diff 22978.

Alphare accepted this revision. Oct 2 2020, 3:32 AM

marmoute updated this revision to Diff 23007. Oct 2 2020, 12:39 PM

pulkit accepted this revision. Oct 6 2020, 4:18 AM

This revision is now accepted and ready to land. Oct 6 2020, 4:18 AM

marmoute added a commit: rHG9a6b409b8ebc: changing-files: rework the way we store changed files in side-data. Oct 6 2020, 5:03 AM

Closed by commit rHG9a6b409b8ebc: changing-files: rework the way we store changed files in side-data (authored by marmoute).

This revision was automatically updated to reflect the committed changes.

Revision Contents
Changeset List

			Path	Packages
M			mercurial/helptext/internals/revlogs.txt (72 lines)
M			mercurial/metadata.py (165 lines)
M			mercurial/revlogutils/sidedata.py (1 line)
M			tests/test-copies-in-changeset.t (71 lines)

Diff	ID	Description	Created	Lint	Unit
Base		Base
Diff 1	22865		Sep 26 2020, 8:10 AM	★	★
Diff 2	22978		Oct 2 2020, 3:13 AM	★	★
Diff 3	23007		Oct 2 2020, 12:39 PM	★	★
Diff 4	23045	rHG9a6b409b8ebcfbf1c1ed61a69b3c62ae1dbf73f3	Sep 15 2020, 4:55 AM	★	★

Status	Author	Revision
Closed	marmoute	D9141 copies: move `merged` testing sooner
Closed	marmoute	D9140 copies: return None instead of ChangingFiles when relevant
Closed	marmoute	D9139 copies: add a HASCOPIESINFO flag to highlight rev with useful data
Closed	marmoute	D9130 salvaged: properly deal with salvaged file during copy tracing
Closed	marmoute	D9129 salvaged: persist the salvaged set on disk
Closed	marmoute	D9128 changing-files: add clean computation of changed file for merges
Closed	marmoute	D9127 changing-files: add clean computation of changed files for linear changesets
Closed	marmoute	D9126 changing-files: add clean computation of changed files for roots
Closed	marmoute	D9125 changing-files: add a debug command display changed files
Closed	marmoute	D9124 side-data: add a test to check sidedata upgrade
Closed	marmoute	D9123 changing-files: split the changing files computation from encoding
Closed	marmoute	D9120 salvaged: record salvaged in ChangingFiles at commit time
Closed	marmoute	D9119 salvaged: track removal-candidates in more cases
Closed	marmoute	D9122 salvaged: explicitly skip salvaged file while encoding
Closed	marmoute	D9118 changing-files: add a "salvaged" set to track file that were not removed
Closed	marmoute	D9117 copies: directly pass a changes object to the copy tracing code
Closed	marmoute	D9116 copies: no longer change the sidedata flag
Closed	marmoute	D9115 copies: use dedicated `_revinfo_getter` function and call
Closed	marmoute	D9114 copies: make two version of the changeset centric algorithm
Closed	marmoute	D9113 copies: rename some function to the new naming scheme
Closed	marmoute	D9112 changing-files: cache the various property
Closed	marmoute	D9111 changing-files: always use `mark_touched` to update the touched set
Closed	marmoute	D9092 changing-files: retrieve changelogrevision.files from the sidedata block
Closed	marmoute	D9091 changing-files: drop the now useless changelogrevision argument
Closed	marmoute	D9090 changing-files: rework the way we store changed files in side-data
Closed	marmoute	D9143 changing-files: fix docstring

Diff 23045

mercurial/helptext/internals/revlogs.txt


	Currently, SHA-1 is the only supported hashing algorithm. To obtain the SHA-1			Currently, SHA-1 is the only supported hashing algorithm. To obtain the SHA-1
	hash of a revision:			hash of a revision:

	1. Hash the parent nodes			1. Hash the parent nodes
	2. Hash the fulltext of the revision			2. Hash the fulltext of the revision

	The 20 byte node ids of the parents are fed into the hasher in ascending order.			The 20 byte node ids of the parents are fed into the hasher in ascending order.

				Changed Files side-data
				=======================

				(This feature is in active development and its behavior is not frozen yet. It
				should not be used in any production repository)

				When the `exp-copies-sidedata-changeset` requirement is in use, information
				related to the changed files will be stored as "side-data" for every changeset
				in the changelog.

				These data contain the following information:

				* set of files actively added by the changeset
				* set of files actively removed by the changeset
				* set of files actively merged by the changeset
				* set of files actively touched by the changeset
				* mapping of copy-source, copy-destination from first parent (p1)
				* mapping of copy-source, copy-destination from second parent (p2)

				The block itself is big-endian data, formatted in three sections: header, index,
				and data. See below for details:

				Header:

				4 bytes: unsigned integer

				total number of entries in the index

				Index:

				The index contains an entry for every involved filename. It is sorted by
				filename. Each entry uses the following format:

				1 byte: bits field

				This byte holds two different bit fields:

				The 2 lower bits carry copy information:

				`00`: file has no copy information,
				`10`: file is copied from a p1 source,
				`11`: file is copied from a p2 source.

				The 3 next bits carry action information.

				`000`: file was untouched, it exists in the index as a copy source,
				`001`: file was actively added
				`010`: file was actively merged
				`011`: file was actively removed
				`100`: reserved for future use
				`101`: file was actively touched in any other way

				(The last 2 bits are unused)

				4 bytes: unsigned integer

				Address (in bytes) of the end of the associated filename in the data
				block. (This is the address of the first byte not part of the filename)

				The start of the filename can be retrieved by reading that field for the
				previous index entry. The filename of the first entry starts at zero.

				4 bytes: unsigned integer

				Index (in this very index) of the source of the copy (when a copy is
				happening). If no copy is happening the value of this field is
				irrelevant and could have any value. It is set to zero by convention

				Data:

				raw bytes block containing all filename concatenated without any separator.
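
The bits field in each index entry can be split with two masks. A minimal sketch, assuming the bit layout from the help text above; the `describe_flag` helper and its string labels are hypothetical and only meant for illustration.

```python
ACTION_MASK = 0b11100
COPIED_MASK = 0b00011

# Action values as laid out in the format description (3 bits above the
# 2 copy bits).
ACTIONS = {
    0b00000: "untouched (copy source only)",
    0b00100: "added",
    0b01000: "merged",
    0b01100: "removed",
    0b10100: "touched",
}
COPIES = {0b00: None, 0b10: "p1", 0b11: "p2"}


def describe_flag(flag):
    """Split a bits-field byte into (action, copy parent)."""
    action = ACTIONS.get(flag & ACTION_MASK, "reserved or unknown")
    copy = COPIES.get(flag & COPIED_MASK, "invalid")
    return action, copy


# Flag bytes that appear in the test output of this revision.
for flag in (0x00, 0x06, 0x07, 0x0C, 0x14, 0x16):
    print("0x%02x" % flag, describe_flag(flag))
```

For instance, `0x06` is `001` `10` in binary, which reads as "actively added, copied from a p1 source", and `0x14` is `101` `00`, a file touched without copy information.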

mercurial/metadata.py

	# metadata.py -- code related to various metadata computation and access.			# metadata.py -- code related to various metadata computation and access.
	#			#
	# Copyright 2019 Google, Inc <martinvonz@google.com>			# Copyright 2019 Google, Inc <martinvonz@google.com>
	# Copyright 2020 Pierre-Yves David <pierre-yves.david@octobus.net>			# Copyright 2020 Pierre-Yves David <pierre-yves.david@octobus.net>
	#			#
	# This software may be used and distributed according to the terms of the			# This software may be used and distributed according to the terms of the
	# GNU General Public License version 2 or any later version.			# GNU General Public License version 2 or any later version.
	from __future__ import absolute_import, print_function			from __future__ import absolute_import, print_function

	import multiprocessing			import multiprocessing
				import struct

	from . import (			from . import (
	error,			error,
	node,			node,
	pycompat,			pycompat,
	util,			util,
	)			)

	subset.append(files[i])			subset.append(files[i])
	return subset			return subset
	except (ValueError, IndexError):			except (ValueError, IndexError):
	# Perhaps someone had chosen the same key name (e.g. "added") and			# Perhaps someone had chosen the same key name (e.g. "added") and
	# used different syntax for the value.			# used different syntax for the value.
	return None			return None


				# see mercurial/helptext/internals/revlogs.txt for details about the format

				ACTION_MASK = int("111" "00", 2)
				# note: an untouched file used as a copy source will show as `000` for this mask.
				ADDED_FLAG = int("001" "00", 2)
				MERGED_FLAG = int("010" "00", 2)
				REMOVED_FLAG = int("011" "00", 2)
				# `100` is reserved for future use
				TOUCHED_FLAG = int("101" "00", 2)

				COPIED_MASK = int("11", 2)
				COPIED_FROM_P1_FLAG = int("10", 2)
				COPIED_FROM_P2_FLAG = int("11", 2)

				# structure is <flag><filename-end><copy-source>
				INDEX_HEADER = struct.Struct(">L")
				INDEX_ENTRY = struct.Struct(">bLL")


	def encode_files_sidedata(files):			def encode_files_sidedata(files):
	sortedfiles = sorted(files.touched)			all_files = set(files.touched)
	sidedata = {}			all_files.update(files.copied_from_p1.values())
	p1copies = files.copied_from_p1			all_files.update(files.copied_from_p2.values())
	if p1copies:			all_files = sorted(all_files)
	p1copies = encodecopies(sortedfiles, p1copies)			file_idx = {f: i for (i, f) in enumerate(all_files)}
	sidedata[sidedatamod.SD_P1COPIES] = p1copies			file_idx[None] = 0
	p2copies = files.copied_from_p2
	if p2copies:			chunks = [INDEX_HEADER.pack(len(all_files))]
	p2copies = encodecopies(sortedfiles, p2copies)
	sidedata[sidedatamod.SD_P2COPIES] = p2copies			filename_length = 0
	filesadded = files.added			for f in all_files:
	if filesadded:			filename_size = len(f)
	filesadded = encodefileindices(sortedfiles, filesadded)			filename_length += filename_size
	sidedata[sidedatamod.SD_FILESADDED] = filesadded			flag = 0
	filesremoved = files.removed			if f in files.added:
	if filesremoved:			flag |= ADDED_FLAG
	filesremoved = encodefileindices(sortedfiles, filesremoved)			elif f in files.merged:
	sidedata[sidedatamod.SD_FILESREMOVED] = filesremoved			flag |= MERGED_FLAG
	if not sidedata:			elif f in files.removed:
	sidedata = None			flag |= REMOVED_FLAG
	return sidedata			elif f in files.touched:
				flag |= TOUCHED_FLAG

				copy = None
				if f in files.copied_from_p1:
				flag |= COPIED_FROM_P1_FLAG
				copy = files.copied_from_p1.get(f)
				elif f in files.copied_from_p2:
				copy = files.copied_from_p2.get(f)
				flag |= COPIED_FROM_P2_FLAG
				copy_idx = file_idx[copy]
				chunks.append(INDEX_ENTRY.pack(flag, filename_length, copy_idx))
				chunks.extend(all_files)
				return {sidedatamod.SD_FILES: b''.join(chunks)}


	def decode_files_sidedata(changelogrevision, sidedata):			def decode_files_sidedata(changelogrevision, sidedata):
	"""Return a ChangingFiles instance from a changelogrevision using sidata			md = ChangingFiles()
	"""			raw = sidedata.get(sidedatamod.SD_FILES)
	touched = changelogrevision.files

	rawindices = sidedata.get(sidedatamod.SD_FILESADDED)			if raw is None:
	added = decodefileindices(touched, rawindices)			return md

	rawindices = sidedata.get(sidedatamod.SD_FILESREMOVED)			copies = []
	removed = decodefileindices(touched, rawindices)			all_files = []

	rawcopies = sidedata.get(sidedatamod.SD_P1COPIES)			assert len(raw) >= INDEX_HEADER.size
	p1_copies = decodecopies(touched, rawcopies)			total_files = INDEX_HEADER.unpack_from(raw, 0)[0]

	rawcopies = sidedata.get(sidedatamod.SD_P2COPIES)			offset = INDEX_HEADER.size
	p2_copies = decodecopies(touched, rawcopies)			file_offset_base = offset + (INDEX_ENTRY.size * total_files)
				file_offset_last = file_offset_base
	return ChangingFiles(
	touched=touched,			assert len(raw) >= file_offset_base
	added=added,
	removed=removed,			for idx in range(total_files):
	p1_copies=p1_copies,			flag, file_end, copy_idx = INDEX_ENTRY.unpack_from(raw, offset)
	p2_copies=p2_copies,			file_end += file_offset_base
	)			filename = raw[file_offset_last:file_end]
				filesize = file_end - file_offset_last
				assert len(filename) == filesize
				offset += INDEX_ENTRY.size
				file_offset_last = file_end
				all_files.append(filename)
				if flag & ACTION_MASK == ADDED_FLAG:
				md.mark_added(filename)
				elif flag & ACTION_MASK == MERGED_FLAG:
				md.mark_merged(filename)
				elif flag & ACTION_MASK == REMOVED_FLAG:
				md.mark_removed(filename)
				elif flag & ACTION_MASK == TOUCHED_FLAG:
				md.mark_touched(filename)

				copied = None
				if flag & COPIED_MASK == COPIED_FROM_P1_FLAG:
				copied = md.mark_copied_from_p1
				elif flag & COPIED_MASK == COPIED_FROM_P2_FLAG:
				copied = md.mark_copied_from_p2

				if copied is not None:
				copies.append((copied, filename, copy_idx))

				for copied, filename, copy_idx in copies:
				copied(all_files[copy_idx], filename)

				return md


	def _getsidedata(srcrepo, rev):			def _getsidedata(srcrepo, rev):
	ctx = srcrepo[rev]			ctx = srcrepo[rev]
	filescopies = computechangesetcopies(ctx)			filescopies = computechangesetcopies(ctx)
	filesadded = computechangesetfilesadded(ctx)			filesadded = computechangesetfilesadded(ctx)
	filesremoved = computechangesetfilesremoved(ctx)			filesremoved = computechangesetfilesremoved(ctx)
	sidedata = {}			filesmerged = computechangesetfilesmerged(ctx)
	if any([filescopies, filesadded, filesremoved]):			files = ChangingFiles()
	sortedfiles = sorted(ctx.files())			files.update_touched(ctx.files())
	p1copies, p2copies = filescopies			files.update_added(filesadded)
	p1copies = encodecopies(sortedfiles, p1copies)			files.update_removed(filesremoved)
	p2copies = encodecopies(sortedfiles, p2copies)			files.update_merged(filesmerged)
	filesadded = encodefileindices(sortedfiles, filesadded)			files.update_copies_from_p1(filescopies[0])
	filesremoved = encodefileindices(sortedfiles, filesremoved)			files.update_copies_from_p2(filescopies[1])
	if p1copies:			return encode_files_sidedata(files)
	sidedata[sidedatamod.SD_P1COPIES] = p1copies
	if p2copies:
	sidedata[sidedatamod.SD_P2COPIES] = p2copies
	if filesadded:
	sidedata[sidedatamod.SD_FILESADDED] = filesadded
	if filesremoved:
	sidedata[sidedatamod.SD_FILESREMOVED] = filesremoved
	return sidedata


	def getsidedataadder(srcrepo, destrepo):			def getsidedataadder(srcrepo, destrepo):
	use_w = srcrepo.ui.configbool(b'experimental', b'worker.repository-upgrade')			use_w = srcrepo.ui.configbool(b'experimental', b'worker.repository-upgrade')
	if pycompat.iswindows or not use_w:			if pycompat.iswindows or not use_w:
	return _get_simple_sidedata_adder(srcrepo, destrepo)			return _get_simple_sidedata_adder(srcrepo, destrepo)
	else:			else:
	return _get_worker_sidedata_adder(srcrepo, destrepo)			return _get_worker_sidedata_adder(srcrepo, destrepo)
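
For readers who want to experiment outside Mercurial, the encoding side can be sketched with plain sets and dicts standing in for the `ChangingFiles` object. This is an illustration under assumptions, not the code from the patch: the `encode` function and its parameters are hypothetical, and it uses the unsigned `B` format where the patch declares `b`.

```python
import struct

INDEX_HEADER = struct.Struct(">L")
# The patch declares ">bLL"; ">BLL" has the same 9-byte size and is used
# here only because the flag byte is never negative.
INDEX_ENTRY = struct.Struct(">BLL")

ADDED, MERGED, REMOVED, TOUCHED = 0b00100, 0b01000, 0b01100, 0b10100
FROM_P1, FROM_P2 = 0b10, 0b11


def encode(touched, added=(), merged=(), removed=(), p1=None, p2=None):
    """Encode changed-files data; p1/p2 map copy destination -> source."""
    p1, p2 = p1 or {}, p2 or {}
    # Copy sources get an index entry even if the changeset did not touch them.
    all_files = sorted(set(touched) | set(p1.values()) | set(p2.values()))
    file_idx = {f: i for i, f in enumerate(all_files)}

    index = [INDEX_HEADER.pack(len(all_files))]
    end = 0
    for f in all_files:
        end += len(f)  # address of the first byte after this filename
        if f in added:
            flag = ADDED
        elif f in merged:
            flag = MERGED
        elif f in removed:
            flag = REMOVED
        elif f in touched:
            flag = TOUCHED
        else:
            flag = 0  # untouched, present only as a copy source
        copy_idx = 0  # conventional value when there is no copy
        if f in p1:
            flag |= FROM_P1
            copy_idx = file_idx[p1[f]]
        elif f in p2:
            flag |= FROM_P2
            copy_idx = file_idx[p2[f]]
        index.append(INDEX_ENTRY.pack(flag, end, copy_idx))
    return b"".join(index + all_files)


# "copy a to j" from the tests: j is added and copied from a (first parent).
block = encode(touched={b"j"}, added={b"j"}, p1={b"j": b"a"})
expected = (
    b"\x00\x00\x00\x02"
    b"\x00\x00\x00\x00\x01\x00\x00\x00\x00"
    b"\x06\x00\x00\x00\x02\x00\x00\x00\x00"
    b"aj"
)
print(block == expected, len(block))  # True 24
```

The expected bytes are the 24-byte `entry-0014` block shown for the "copy a to j" commit in `tests/test-copies-in-changeset.t`.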

mercurial/revlogutils/sidedata.py

	SD_TEST6 = 6			SD_TEST6 = 6
	SD_TEST7 = 7			SD_TEST7 = 7

	# key to store copies related information			# key to store copies related information
	SD_P1COPIES = 8			SD_P1COPIES = 8
	SD_P2COPIES = 9			SD_P2COPIES = 9
	SD_FILESADDED = 10			SD_FILESADDED = 10
	SD_FILESREMOVED = 11			SD_FILESREMOVED = 11
				SD_FILES = 12

	# internal format constant			# internal format constant
	SIDEDATA_HEADER = struct.Struct('>H')			SIDEDATA_HEADER = struct.Struct('>H')
	SIDEDATA_ENTRY = struct.Struct('>HL20s')			SIDEDATA_ENTRY = struct.Struct('>HL20s')


	def sidedatawriteprocessor(rl, text, sidedata):			def sidedatawriteprocessor(rl, text, sidedata):
	sidedata = list(sidedata.items())			sidedata = list(sidedata.items())

tests/test-copies-in-changeset.t

	1			1
	2			2

	p1copies: 0\x00a (esc)			p1copies: 0\x00a (esc)
	1\x00a (esc)			1\x00a (esc)
	2\x00a (esc)			2\x00a (esc)
	#else			#else
	$ hg debugsidedata -c -v -- -1			$ hg debugsidedata -c -v -- -1
	2 sidedata entries			1 sidedata entries
	entry-0010 size 11			entry-0014 size 44
	'0\x00a\n1\x00a\n2\x00a'			'\x00\x00\x00\x04\x00\x00\x00\x00\x01\x00\x00\x00\x00\x06\x00\x00\x00\x02\x00\x00\x00\x00\x06\x00\x00\x00\x03\x00\x00\x00\x00\x06\x00\x00\x00\x04\x00\x00\x00\x00abcd'
	entry-0012 size 5
	'0\n1\n2'
	#endif			#endif

	$ hg showcopies			$ hg showcopies
	a -> b			a -> b
	a -> c			a -> c
	a -> d			a -> d

	#if extra			#if extra
	files: b b2			files: b b2
	filesadded: 1			filesadded: 1
	filesremoved: 0			filesremoved: 0

	p1copies: 1\x00b (esc)			p1copies: 1\x00b (esc)

	#else			#else
	$ hg debugsidedata -c -v -- -1			$ hg debugsidedata -c -v -- -1
	3 sidedata entries			1 sidedata entries
	entry-0010 size 3			entry-0014 size 25
	'1\x00b'			'\x00\x00\x00\x02\x0c\x00\x00\x00\x01\x00\x00\x00\x00\x06\x00\x00\x00\x03\x00\x00\x00\x00bb2'
	entry-0012 size 1
	'1'
	entry-0013 size 1
	'0'
	#endif			#endif

	$ hg showcopies			$ hg showcopies
	b -> b2			b -> b2


	Rename onto existing file. This should get recorded in the changeset files list and in the extras,			Rename onto existing file. This should get recorded in the changeset files list and in the extras,
	even though there is no filelog entry.			even though there is no filelog entry.
	$ hg changesetcopies			$ hg changesetcopies
	files: c			files: c

	p1copies: 0\x00b2 (esc)			p1copies: 0\x00b2 (esc)

	#else			#else
	$ hg debugsidedata -c -v -- -1			$ hg debugsidedata -c -v -- -1
	1 sidedata entries			1 sidedata entries
	entry-0010 size 4			entry-0014 size 25
	'0\x00b2'			'\x00\x00\x00\x02\x00\x00\x00\x00\x02\x00\x00\x00\x00\x16\x00\x00\x00\x03\x00\x00\x00\x00b2c'
	#endif			#endif

	$ hg showcopies			$ hg showcopies
	b2 -> c			b2 -> c

	#if extra			#if extra

	$ hg debugindex c			$ hg debugindex c
	2			2

	p1copies: 0\x00a (esc)			p1copies: 0\x00a (esc)
	2\x00f (esc)			2\x00f (esc)
	p2copies: 1\x00d (esc)			p2copies: 1\x00d (esc)

	#else			#else
	$ hg debugsidedata -c -v -- -1			$ hg debugsidedata -c -v -- -1
	3 sidedata entries			1 sidedata entries
	entry-0010 size 7			entry-0014 size 64
	'0\x00a\n2\x00f'			'\x00\x00\x00\x06\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x00\x03\x00\x00\x00\x00\x06\x00\x00\x00\x04\x00\x00\x00\x00\x07\x00\x00\x00\x05\x00\x00\x00\x01\x06\x00\x00\x00\x06\x00\x00\x00\x02adfghi'
	entry-0011 size 3
	'1\x00d'
	entry-0012 size 5
	'0\n1\n2'
	#endif			#endif

	$ hg showcopies			$ hg showcopies
	a -> g			a -> g
	d -> h			d -> h
	f -> i			f -> i

	Test writing to both changeset and filelog			Test writing to both changeset and filelog

	$ hg cp a j			$ hg cp a j
	#if extra			#if extra
	$ hg ci -m 'copy a to j' --config experimental.copies.write-to=compatibility			$ hg ci -m 'copy a to j' --config experimental.copies.write-to=compatibility
	$ hg changesetcopies			$ hg changesetcopies
	files: j			files: j
	filesadded: 0			filesadded: 0
	filesremoved:			filesremoved:

	p1copies: 0\x00a (esc)			p1copies: 0\x00a (esc)
	p2copies:			p2copies:
	#else			#else
	$ hg ci -m 'copy a to j'			$ hg ci -m 'copy a to j'
	$ hg debugsidedata -c -v -- -1			$ hg debugsidedata -c -v -- -1
	2 sidedata entries			1 sidedata entries
	entry-0010 size 3			entry-0014 size 24
	'0\x00a'			'\x00\x00\x00\x02\x00\x00\x00\x00\x01\x00\x00\x00\x00\x06\x00\x00\x00\x02\x00\x00\x00\x00aj'
	entry-0012 size 1
	'0'
	#endif			#endif
	$ hg debugdata j 0			$ hg debugdata j 0
	\x01 (esc)			\x01 (esc)
	copy: a			copy: a
	copyrev: b789fdd96dc2f3bd229c1dd8eedf0fc60e2b68e3			copyrev: b789fdd96dc2f3bd229c1dd8eedf0fc60e2b68e3
	\x01 (esc)			\x01 (esc)
	a			a
	$ hg showcopies			$ hg showcopies
	saved backup bundle to $TESTTMP/repo/.hg/strip-backup/--amend.hg (glob)			saved backup bundle to $TESTTMP/repo/.hg/strip-backup/--amend.hg (glob)
	$ hg changesetcopies			$ hg changesetcopies
	files: j			files: j

	#else			#else
	$ hg ci --amend -m 'copy a to j, v2'			$ hg ci --amend -m 'copy a to j, v2'
	saved backup bundle to $TESTTMP/repo/.hg/strip-backup/--amend.hg (glob)			saved backup bundle to $TESTTMP/repo/.hg/strip-backup/--amend.hg (glob)
	$ hg debugsidedata -c -v -- -1			$ hg debugsidedata -c -v -- -1
	2 sidedata entries			1 sidedata entries
	entry-0010 size 3			entry-0014 size 24
	'0\x00a'			'\x00\x00\x00\x02\x00\x00\x00\x00\x01\x00\x00\x00\x00\x06\x00\x00\x00\x02\x00\x00\x00\x00aj'
	entry-0012 size 1
	'0'
	#endif			#endif
	$ hg showcopies --config experimental.copies.read-from=filelog-only			$ hg showcopies --config experimental.copies.read-from=filelog-only
	a -> j			a -> j
	The entries should be written to extras even if they're empty (so the client			The entries should be written to extras even if they're empty (so the client
	won't have to fall back to reading from filelogs)			won't have to fall back to reading from filelogs)
	$ echo x >> j			$ echo x >> j
	#if extra			#if extra
	$ hg ci -m 'modify j' --config experimental.copies.write-to=compatibility			$ hg ci -m 'modify j' --config experimental.copies.write-to=compatibility
	$ hg changesetcopies			$ hg changesetcopies
	files: j			files: j
	filesadded:			filesadded:
	filesremoved:			filesremoved:

	p1copies:			p1copies:
	p2copies:			p2copies:
	#else			#else
	$ hg ci -m 'modify j'			$ hg ci -m 'modify j'
	$ hg debugsidedata -c -v -- -1			$ hg debugsidedata -c -v -- -1
				1 sidedata entries
				entry-0014 size 14
				'\x00\x00\x00\x01\x14\x00\x00\x00\x01\x00\x00\x00\x00j'
	#endif			#endif

	Test writing only to filelog			Test writing only to filelog

	$ hg cp a k			$ hg cp a k
	#if extra			#if extra
	$ hg ci -m 'copy a to k' --config experimental.copies.write-to=filelog-only			$ hg ci -m 'copy a to k' --config experimental.copies.write-to=filelog-only

	$ hg changesetcopies			$ hg changesetcopies
	files: k			files: k

	#else			#else
	$ hg ci -m 'copy a to k'			$ hg ci -m 'copy a to k'
	$ hg debugsidedata -c -v -- -1			$ hg debugsidedata -c -v -- -1
	2 sidedata entries			1 sidedata entries
	entry-0010 size 3			entry-0014 size 24
	'0\x00a'			'\x00\x00\x00\x02\x00\x00\x00\x00\x01\x00\x00\x00\x00\x06\x00\x00\x00\x02\x00\x00\x00\x00ak'
	entry-0012 size 1
	'0'
	#endif			#endif

	$ hg debugdata k 0			$ hg debugdata k 0
	\x01 (esc)			\x01 (esc)
	copy: a			copy: a
	copyrev: b789fdd96dc2f3bd229c1dd8eedf0fc60e2b68e3			copyrev: b789fdd96dc2f3bd229c1dd8eedf0fc60e2b68e3
	\x01 (esc)			\x01 (esc)
	a			a
	sidedata: yes yes no			sidedata: yes yes no
	persistent-nodemap: no no no			persistent-nodemap: no no no
	copies-sdc: yes yes no			copies-sdc: yes yes no
	plain-cl-delta: yes yes yes			plain-cl-delta: yes yes yes
	compression: zlib zlib zlib			compression: zlib zlib zlib
	compression-level: default default default			compression-level: default default default
	$ hg debugsidedata -c -- 0			$ hg debugsidedata -c -- 0
	1 sidedata entries			1 sidedata entries
	entry-0012 size 1			entry-0014 size 14
	$ hg debugsidedata -c -- 1			$ hg debugsidedata -c -- 1
	1 sidedata entries			1 sidedata entries
	entry-0013 size 1			entry-0014 size 14
	$ hg debugsidedata -m -- 0			$ hg debugsidedata -m -- 0
	$ cat << EOF > .hg/hgrc			$ cat << EOF > .hg/hgrc
	> [format]			> [format]
	> exp-use-side-data = yes			> exp-use-side-data = yes
	> exp-use-copies-side-data-changeset = no			> exp-use-copies-side-data-changeset = no
	> EOF			> EOF
	$ hg debugupgraderepo --run --quiet --no-backup > /dev/null			$ hg debugupgraderepo --run --quiet --no-backup > /dev/null
	$ hg debugformat -v			$ hg debugformat -v
	format-variant repo config default			format-variant repo config default
	fncache: yes yes yes			fncache: yes yes yes
	dotencode: yes yes yes			dotencode: yes yes yes
	generaldelta: yes yes yes			generaldelta: yes yes yes
	sparserevlog: yes yes yes			sparserevlog: yes yes yes
	sidedata: yes yes no			sidedata: yes yes no
	persistent-nodemap: no no no			persistent-nodemap: no no no
	copies-sdc: no no no			copies-sdc: no no no
	plain-cl-delta: yes yes yes			plain-cl-delta: yes yes yes
	compression: zlib zlib zlib			compression: zlib zlib zlib
	compression-level: default default default			compression-level: default default default
	$ hg debugsidedata -c -- 0			$ hg debugsidedata -c -- 0
				1 sidedata entries
				entry-0014 size 14
	$ hg debugsidedata -c -- 1			$ hg debugsidedata -c -- 1
				1 sidedata entries
				entry-0014 size 14
	$ hg debugsidedata -m -- 0			$ hg debugsidedata -m -- 0

	upgrading			upgrading

	$ cat << EOF > .hg/hgrc			$ cat << EOF > .hg/hgrc
	> [format]			> [format]
	> exp-use-copies-side-data-changeset = yes			> exp-use-copies-side-data-changeset = yes
	> EOF			> EOF
	$ hg debugupgraderepo --run --quiet --no-backup > /dev/null			$ hg debugupgraderepo --run --quiet --no-backup > /dev/null
	$ hg debugformat -v			$ hg debugformat -v
	format-variant repo config default			format-variant repo config default
	fncache: yes yes yes			fncache: yes yes yes
	dotencode: yes yes yes			dotencode: yes yes yes
	generaldelta: yes yes yes			generaldelta: yes yes yes
	sparserevlog: yes yes yes			sparserevlog: yes yes yes
	sidedata: yes yes no			sidedata: yes yes no
	persistent-nodemap: no no no			persistent-nodemap: no no no
	copies-sdc: yes yes no			copies-sdc: yes yes no
	plain-cl-delta: yes yes yes			plain-cl-delta: yes yes yes
	compression: zlib zlib zlib			compression: zlib zlib zlib
	compression-level: default default default			compression-level: default default default
	$ hg debugsidedata -c -- 0			$ hg debugsidedata -c -- 0
	1 sidedata entries			1 sidedata entries
	entry-0012 size 1			entry-0014 size 14
	$ hg debugsidedata -c -- 1			$ hg debugsidedata -c -- 1
	1 sidedata entries			1 sidedata entries
	entry-0013 size 1			entry-0014 size 14
	$ hg debugsidedata -m -- 0			$ hg debugsidedata -m -- 0

	#endif			#endif

	$ cd ..			$ cd ..
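
The raw blocks printed by `hg debugsidedata` in these tests can be read back by hand. A minimal sketch of a standalone decoder, assuming the format documented in this revision: the `decode` function is hypothetical, it returns plain dicts rather than a `ChangingFiles` instance, and it raises `ValueError` on inconsistent sizes where the patch uses `assert`.

```python
import struct

INDEX_HEADER = struct.Struct(">L")
INDEX_ENTRY = struct.Struct(">BLL")

ACTIONS = {0b001: "added", 0b010: "merged", 0b011: "removed", 0b101: "touched"}


def decode(raw):
    """Return (actions, p1_copies, p2_copies) decoded from a raw block."""
    if len(raw) < INDEX_HEADER.size:
        raise ValueError("block shorter than its header")
    (count,) = INDEX_HEADER.unpack_from(raw, 0)
    data_start = INDEX_HEADER.size + INDEX_ENTRY.size * count
    if len(raw) < data_start:
        raise ValueError("block shorter than its index")

    names, entries = [], []
    start = 0
    for i in range(count):
        flag, end, copy_idx = INDEX_ENTRY.unpack_from(
            raw, INDEX_HEADER.size + INDEX_ENTRY.size * i
        )
        if end < start or data_start + end > len(raw):
            raise ValueError("filename offset out of range")
        names.append(raw[data_start + start : data_start + end])
        entries.append((flag, copy_idx))
        start = end

    actions, p1_copies, p2_copies = {}, {}, {}
    for name, (flag, copy_idx) in zip(names, entries):
        action = ACTIONS.get((flag >> 2) & 0b111)
        if action is not None:
            actions[name] = action
        copy_bits = flag & 0b11
        if copy_bits in (0b10, 0b11):
            if copy_idx >= count:
                raise ValueError("copy source index out of range")
            target = p1_copies if copy_bits == 0b10 else p2_copies
            target[name] = names[copy_idx]  # destination -> source
    return actions, p1_copies, p2_copies


# The 64-byte block from the merge test above (files a, d, f, g, h, i).
raw = (
    b"\x00\x00\x00\x06"
    b"\x00\x00\x00\x00\x01\x00\x00\x00\x00"
    b"\x00\x00\x00\x00\x02\x00\x00\x00\x00"
    b"\x00\x00\x00\x00\x03\x00\x00\x00\x00"
    b"\x06\x00\x00\x00\x04\x00\x00\x00\x00"
    b"\x07\x00\x00\x00\x05\x00\x00\x00\x01"
    b"\x06\x00\x00\x00\x06\x00\x00\x00\x02"
    b"adfghi"
)
actions, p1_copies, p2_copies = decode(raw)
print(actions)
print(p1_copies, p2_copies)
```

The decoded copies (`g` from `a` and `i` from `f` via p1, `h` from `d` via p2) agree with the `hg showcopies` output `a -> g`, `d -> h`, `f -> i` in the same test.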