copies-rust: extract generic map merge logic from merge_copies_dict
ClosedPublic

Diff	ID	Description	Created	Lint	Unit
Base		Base
Diff 1	24615		Jan 6 2021, 9:12 AM	★	★
Diff 2	24682		Jan 8 2021, 6:16 PM	★	★
Diff 3	25702		Feb 22 2021, 9:14 AM	★	★
Diff 4	25761		Feb 22 2021, 11:22 AM	★	★
Diff 5	25800		Feb 22 2021, 5:12 PM	★	★
Diff 6	25861	rHG26d0acbc6ccee899ae274e822830637afb5dee01	Dec 23 2020, 5:48 AM	★	★

Commit	Parents	Author	Summary	Date
53b0aab6c386	c55914e492f6	Simon Sapin		Dec 23 2020, 5:48 AM

Status	Author	Revision
Closed	marmoute	D9591 copies: rename value/other variable to minor/major for clarity
Closed	marmoute	D9590 copies: extract value comparison in the python copy tracing
Closed	marmoute	D9607 hghave: add some official category for known-bad and missing-good output
Closed	marmoute	D9608 copies: stop attempt to avoid extra dict copies around branching
Closed	marmoute	D9592 copies: deal with the "same revision" special case earlier
Closed	marmoute	D9589 copies-tests: update to null in test-copies-chain-merge.t
Closed	marmoute	D9588 copies-tests: add a summary of all cases created in test-copies-chain-merge.t
Closed	SimonSapin	D9686 copies-rust: send PyBytes values back be dropped ino the parent thread
Closed	SimonSapin	D9685 copies-rust: introduce PyBytesWithData to reduce GIL requirement
Closed	SimonSapin	D9684 copies-rust: move CPU-heavy Rust processing into a child thread
Closed	SimonSapin	D9683 copies-rust: split up combine_changeset_copies function into a struct
Closed	SimonSapin	D9682 copies-rust: extract generic map merge logic from merge_copies_dict
Closed	marmoute	D9656 copies-rust: use imrs::OrdSet instead of imrs::HashSet
Closed	marmoute	D9655 copies-rust: use simpler overwrite when value on both side are identical
Closed	marmoute	D9654 copies-rust: make more use of the new comparison property
Closed	marmoute	D9653 copies-rust: implement PartialEqual manually
Closed	marmoute	D9652 copies-rust: record "overwritten" information from both side on delete
Closed	marmoute	D9651 copies-rust: refactor the "deletion" case
Closed	marmoute	D9650 copies-rust: process copy information of both parent at the same time
Closed	marmoute	D9649 copies-rust: yield both p1 and p2 copies in `ChangedFiles.actions()`
Closed	marmoute	D9648 copies-rust: extract the processing of a single copy information
Closed	marmoute	D9647 copies-rust: use matching to select the final copies information
Closed	marmoute	D9646 copies-rust: get the parents' copies earlier
Closed	marmoute	D9645 copies-rust: remove the ancestor Oracle logic
Closed	marmoute	D9644 copies-rust: track "overwrites" directly within CopySource
Closed	marmoute	D9643 copies-rust: add methods to build and update CopySource
Closed	marmoute	D9657 copies-rust: fix reverted argument when merging tiny minor or major
Closed	marmoute	D9642 copies-rust: rename TimeStampedPathCopy to CopySource
Closed	marmoute	D9641 copies-rust: rename TimeStampedPathCopies to InternalPathCopies
Closed	marmoute	D9613 copies: detect case when a merge decision overwrite previous data
Closed	marmoute	D9612 copies: rearrange all value comparison conditional
Closed	marmoute	D10059 test-copies: introduce merge chains test for the P/Q merges
Closed	marmoute	D10058 test-copies: add a case involving the `b` and a new `r` branch
Closed	marmoute	D10057 test-copies: introduce case combining the `p` and `q` branch
Closed	marmoute	D10056 test-copies: add a `q` branch similar to the `e` but on the new files
Closed	marmoute	D10055 test-copies: add a `p` branch similar to the `a` but on the new files
Closed	marmoute	D10054 test-copies: move the new files in the `i` branch
Closed	marmoute	D10053 test-copies: add 3 new files with their own content
Closed	marmoute	D10052 test-copies: introduce merge chaing test for the A/E + change tests
Closed	marmoute	D10051 test-copies: add a "change during merge" variant to the A+E test
Closed	marmoute	D10050 test-copies: filter out the linkrev part of `debugindex`
Closed	marmoute	D10049 test-copies: use "case-id" instead of revision number when listing sidedata
Closed	marmoute	D10048 test-copies: remove revision number from log
Closed	marmoute	D9611 test-copies: add test chaining multiple merge
Closed	marmoute	D9610 test-copies: add test chaining multiple merges
Closed	marmoute	D9609 test-copies: add test chaining multiple merges
Closed	marmoute	D10047 test-copies: add subcase titles for various "conflicting" information variant
Closed	marmoute	D10046 test-copies: improve description of the B+F case
Closed	marmoute	D10045 test-copies: improve description of the C+H case
Closed	marmoute	D10044 test-copies: improve description of the B+C "revert/restore" case
Closed	marmoute	D10043 test-copies: improve description of the G+C case
Closed	marmoute	D10042 test-copies: improve description of the G+F case
Closed	marmoute	D10041 test-copies: improve description of the D+G case
Closed	marmoute	D10040 test-copies: improve description of the A+E case
Closed	marmoute	D10039 test-copies: improve description of the B+D case
Closed	marmoute	D10038 test-copies: improve description of the B+C case
Closed	marmoute	D10037 test-copies: improve description of the A+B case
Closed	marmoute	D10036 test-copies: use intermediate variable some commit descriptions
Closed	marmoute	D10035 test-copies: don't use empty file for "same content" cases
Closed	marmoute	D9587 test-copies: reinstall initial identical (empty) files for chained copied
Closed	marmoute	D9586 copies: explain the "arbitrary" copy source pick in case of conflict
Closed	marmoute	D9585 copies: properly match result during changeset centric copy tracing
Closed	marmoute	D9584 copies: avoid early return in _combine_changeset_copies
Closed	marmoute	D9499 copies-rust: record overwrite when merging
Closed	marmoute	D9498 copies-rust: make the comparison aware of the revision being current merged
Closed	marmoute	D9497 copies-rust: start recording overwrite as they happens
Closed	marmoute	D9496 copies-rust: rename Oracle.is_ancestor to Oracle.is_overwrite
Closed	marmoute	D9495 copies-rust: use the `entry` API for copy information too
Closed	marmoute	D9494 copies-rust: use the entry API to overwrite deleted entry
Closed	marmoute	D9493 copies-rust: tokenize all paths into integer
Closed	marmoute	D9492 copies-rust: pre-introduce a PathToken type and use it where applicable
Closed	marmoute	D9491 copies-rust: add smarter approach for merging small mapping with large mapping
Closed	marmoute	D9426 copies-rust: hide most of the comparison details inside a closure
Closed	marmoute	D9425 copies-rust: move the mapping merging into a else clause
Closed	marmoute	D9424 copies-rust: extract conflicting value comparison in its own function
Closed	marmoute	D9423 copies: no longer cache the ChangedFiles during copy tracing
Closed	marmoute	D9422 copies: iterate over children directly (instead of parents)
Closed	marmoute	D9581 copies: document the current algorithm step

Diff 24682

rust/hg-core/src/copy_tracing.rs

	use crate::utils::hg_path::HgPath;			use crate::utils::hg_path::HgPath;
	use crate::utils::hg_path::HgPathBuf;			use crate::utils::hg_path::HgPathBuf;
	use crate::Revision;			use crate::Revision;
	use crate::NULL_REVISION;			use crate::NULL_REVISION;

	use im_rc::ordmap::DiffItem;
	use im_rc::ordmap::Entry;			use im_rc::ordmap::Entry;
	use im_rc::ordmap::OrdMap;			use im_rc::ordmap::OrdMap;
	use im_rc::OrdSet;			use im_rc::OrdSet;

	use std::cmp::Ordering;			use std::cmp::Ordering;
	use std::collections::HashMap;			use std::collections::HashMap;
	use std::convert::TryInto;			use std::convert::TryInto;


	/// merge two copies-mapping together, minor and major			/// merge two copies-mapping together, minor and major
	///			///
	/// In case of conflict, value from "major" will be picked, unless in some			/// In case of conflict, value from "major" will be picked, unless in some
	/// cases. See inline documentation for details.			/// cases. See inline documentation for details.
	fn merge_copies_dict(			fn merge_copies_dict(
	path_map: &TwoWayPathMap,			path_map: &TwoWayPathMap,
	current_merge: Revision,			current_merge: Revision,
	mut minor: InternalPathCopies,			minor: InternalPathCopies,
	mut major: InternalPathCopies,			major: InternalPathCopies,
	changes: &ChangedFiles,			changes: &ChangedFiles,
	) -> InternalPathCopies {			) -> InternalPathCopies {
	// This closure exist as temporary help while multiple developper are			use crate::utils::{ordmap_union_with_merge, MergeResult};
	// actively working on this code. Feel free to re-inline it once this
	// code is more settled.			ordmap_union_with_merge(minor, major, \|dest, src_minor, src_major\| {
	let cmp_value =			let (pick, overwrite) = compare_value(
	\|dest: &PathToken, src_minor: &CopySource, src_major: &CopySource\| {
	compare_value(
	path_map,			path_map,
	current_merge,			current_merge,
	changes,			changes,
	dest,			dest,
	src_minor,			src_minor,
	src_major,			src_major,
	)			);
	};
	if minor.is_empty() {
	major
	} else if major.is_empty() {
	minor
	} else if minor.len() * 2 < major.len() {
	// Lets says we are merging two InternalPathCopies instance A and B.
	//
	// If A contains N items, the merge result will never contains more
	// than N values differents than the one in A
	//
	// If B contains M items, with M > N, the merge result will always
	// result in a minimum of M - N value differents than the on in
	// A
	//
	// As a result, if N < (M-N), we know that simply iterating over A will
	// yield less difference than iterating over the difference
	// between A and B.
	//
	// This help performance a lot in case were a tiny
	// InternalPathCopies is merged with a much larger one.
	for (dest, src_minor) in minor {
	let src_major = major.get(&dest);
	match src_major {
	None => {
	major.insert(dest, src_minor);
	}
	Some(src_major) => {
	let (pick, overwrite) =
	cmp_value(&dest, &src_minor, src_major);
	if overwrite {
	let src = match pick {
	MergePick::Major => CopySource::new_from_merge(
	current_merge,
	src_major,
	&src_minor,
	),
	MergePick::Minor => CopySource::new_from_merge(
	current_merge,
	&src_minor,
	src_major,
	),
	MergePick::Any => CopySource::new_from_merge(
	current_merge,
	src_major,
	&src_minor,
	),
	};
	major.insert(dest, src);
	} else {
	match pick {
	MergePick::Any \| MergePick::Major => None,
	MergePick::Minor => major.insert(dest, src_minor),
	};
	}
	}
	};
	}
	major
	} else if major.len() * 2 < minor.len() {
	// This use the same rational than the previous block.
	// (Check previous block documentation for details.)
	for (dest, src_major) in major {
	let src_minor = minor.get(&dest);
	match src_minor {
	None => {
	minor.insert(dest, src_major);
	}
	Some(src_minor) => {
	let (pick, overwrite) =
	cmp_value(&dest, src_minor, &src_major);
	if overwrite {			if overwrite {
	let src = match pick {			let (winner, loser) = match pick {
	MergePick::Major => CopySource::new_from_merge(			MergePick::Major \| MergePick::Any => (src_major, src_minor),
	current_merge,			MergePick::Minor => (src_minor, src_major),
	&src_major,
	src_minor,
	),
	MergePick::Minor => CopySource::new_from_merge(
	current_merge,
	src_minor,
	&src_major,
	),
	MergePick::Any => CopySource::new_from_merge(
	current_merge,
	&src_major,
	src_minor,
	),
	};
	minor.insert(dest, src);
	} else {
	match pick {
	MergePick::Any \| MergePick::Minor => None,
	MergePick::Major => minor.insert(dest, src_major),
	};
	}
	}
	};			};
	}			MergeResult::UseNewValue(CopySource::new_from_merge(
	minor
	} else {
	let mut override_minor = Vec::new();
	let mut override_major = Vec::new();

	let mut to_major = \|k: &PathToken, v: &CopySource\| {
	override_major.push((k.clone(), v.clone()))
	};
	let mut to_minor = \|k: &PathToken, v: &CopySource\| {
	override_minor.push((k.clone(), v.clone()))
	};

	// The diff function leverage detection of the identical subpart if
	// minor and major has some common ancestors. This make it very
	// fast is most case.
	//
	// In case where the two map are vastly different in size, the current
	// approach is still slowish because the iteration will iterate over
	// all the "exclusive" content of the larger on. This situation can be
	// frequent when the subgraph of revision we are processing has a lot
	// of roots. Each roots adding they own fully new map to the mix (and
	// likely a small map, if the path from the root to the "main path" is
	// small.
	//
	// We could do better by detecting such situation and processing them
	// differently.
	for d in minor.diff(&major) {
	match d {
	DiffItem::Add(k, v) => to_minor(k, v),
	DiffItem::Remove(k, v) => to_major(k, v),
	DiffItem::Update { old, new } => {
	let (dest, src_major) = new;
	let (_, src_minor) = old;
	let (pick, overwrite) =
	cmp_value(dest, src_minor, src_major);
	if overwrite {
	let src = match pick {
	MergePick::Major => CopySource::new_from_merge(
	current_merge,
	src_major,
	src_minor,
	),
	MergePick::Minor => CopySource::new_from_merge(
	current_merge,
	src_minor,
	src_major,
	),
	MergePick::Any => CopySource::new_from_merge(
	current_merge,			current_merge,
	src_major,			winner,
	src_minor,			loser,
	),			))
	};
	to_minor(dest, &src);
	to_major(dest, &src);
	} else {			} else {
	match pick {			match pick {
	MergePick::Major => to_minor(dest, src_major),			MergePick::Any \| MergePick::Major => {
	MergePick::Minor => to_major(dest, src_minor),			MergeResult::UseRightValue
	// If the two entry are identical, no need to do
	// anything (but diff should not have yield them)
	MergePick::Any => unreachable!(),
	}
	}			}
				MergePick::Minor => MergeResult::UseLeftValue,
	}			}
	};
	}

	let updates;
	let mut result;
	if override_major.is_empty() {
	result = major
	} else if override_minor.is_empty() {
	result = minor
	} else {
	if override_minor.len() < override_major.len() {
	updates = override_minor;
	result = minor;
	} else {
	updates = override_major;
	result = major;
	}
	for (k, v) in updates {
	result.insert(k, v);
	}
	}
	result
	}			}
				})
	}			}

	/// represent the side that should prevail when merging two			/// represent the side that should prevail when merging two
	/// InternalPathCopies			/// InternalPathCopies
	enum MergePick {			enum MergePick {
	/// The "major" (p1) side prevails			/// The "major" (p1) side prevails
	Major,			Major,
	/// The "minor" (p2) side prevails			/// The "minor" (p2) side prevails
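
The right-hand column of the hunk above is hard to follow once flattened, so here is a minimal sketch of how the new closure turns a `(pick, overwrite)` decision into a `MergeResult`. It is illustrative only: `CopySource` is reduced to hypothetical fields, the body of `new_from_merge` is an assumption, and `to_merge_result` is a made-up name for what the diff writes inline. Since the diff calls `ordmap_union_with_merge(minor, major, ...)`, "left" corresponds to minor and "right" to major.

```rust
/// Simplified stand-in for the real `CopySource`; these fields are assumptions.
#[derive(Clone, Debug, PartialEq)]
struct CopySource {
    rev: i32,
    path: Option<u32>,
}

#[allow(dead_code)]
enum MergePick {
    Major,
    Minor,
    Any,
}

#[allow(dead_code)]
#[derive(Debug, PartialEq)]
enum MergeResult<V> {
    UseLeftValue,
    UseRightValue,
    UseNewValue(V),
}

impl CopySource {
    /// Hypothetical body: keep the winner's path and stamp the merge revision.
    /// The real method also records overwrite information.
    fn new_from_merge(rev: i32, winner: &Self, _loser: &Self) -> Self {
        CopySource { rev, path: winner.path }
    }
}

/// Hypothetical helper mirroring the closure body on the right-hand side.
fn to_merge_result(
    current_merge: i32,
    pick: MergePick,
    overwrite: bool,
    src_minor: &CopySource, // left map value
    src_major: &CopySource, // right map value
) -> MergeResult<CopySource> {
    if overwrite {
        // A new value is needed: build it from the winning side.
        let (winner, loser) = match pick {
            MergePick::Major | MergePick::Any => (src_major, src_minor),
            MergePick::Minor => (src_minor, src_major),
        };
        MergeResult::UseNewValue(CopySource::new_from_merge(
            current_merge,
            winner,
            loser,
        ))
    } else {
        // No overwrite: reuse one of the existing values untouched.
        match pick {
            MergePick::Any | MergePick::Major => MergeResult::UseRightValue,
            MergePick::Minor => MergeResult::UseLeftValue,
        }
    }
}
```

Collapsing `Major` and `Any` into one arm is what lets the three near-identical `match` blocks of the old code shrink to a single `(winner, loser)` selection.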

rust/hg-core/src/utils.rs

	// utils module			// utils module
	//			//
	// Copyright 2019 Raphaël Gomès <rgomes@octobus.net>			// Copyright 2019 Raphaël Gomès <rgomes@octobus.net>
	//			//
	// This software may be used and distributed according to the terms of the			// This software may be used and distributed according to the terms of the
	// GNU General Public License version 2 or any later version.			// GNU General Public License version 2 or any later version.

	//! Contains useful functions, traits, structs, etc. for use in core.			//! Contains useful functions, traits, structs, etc. for use in core.

	use crate::utils::hg_path::HgPath;			use crate::utils::hg_path::HgPath;
				use im_rc::ordmap::DiffItem;
				use im_rc::ordmap::OrdMap;
	use std::{io::Write, ops::Deref};			use std::{io::Write, ops::Deref};

	pub mod files;			pub mod files;
	pub mod hg_path;			pub mod hg_path;
	pub mod path_auditor;			pub mod path_auditor;

	/// Useful until rust/issues/56345 is stable			/// Useful until rust/issues/56345 is stable
	///			///
	// TODO: use the str method when we require Rust 1.45			// TODO: use the str method when we require Rust 1.45
	pub(crate) fn strip_suffix<'a>(s: &'a str, suffix: &str) -> Option<&'a str> {			pub(crate) fn strip_suffix<'a>(s: &'a str, suffix: &str) -> Option<&'a str> {
	if s.ends_with(suffix) {			if s.ends_with(suffix) {
	Some(&s[..s.len() - suffix.len()])			Some(&s[..s.len() - suffix.len()])
	} else {			} else {
	None			None
	}			}
	}			}

				pub(crate) enum MergeResult<V> {
				UseLeftValue,
				UseRightValue,
				UseNewValue(V),
				}

				/// Return the union of the two given maps,
				/// calling `merge(key, left_value, right_value)` to resolve keys that exist in
				/// both.
				///
				/// CC https://github.com/bodil/im-rs/issues/166
				pub(crate) fn ordmap_union_with_merge<K, V>(
				left: OrdMap<K, V>,
				right: OrdMap<K, V>,
				mut merge: impl FnMut(&K, &V, &V) -> MergeResult<V>,
				) -> OrdMap<K, V>
				where
				K: Clone + Ord,
				V: Clone + PartialEq,
				{
				if left.ptr_eq(&right) {
				// One of the two maps is an unmodified clone of the other
				left
				} else if left.len() / 2 > right.len() {
				// When two maps have different sizes,
				// their size difference is a lower bound on
				// how many keys of the larger map are not also in the smaller map.
				// This in turn is a lower bound on the number of differences in
				// `OrdMap::diff` and the "amount of work" that would be done
				// by `ordmap_union_with_merge_by_diff`.
				//
				// Here `left` is more than twice the size of `right`,
				// so the number of differences is more than the total size of
				// `right`. Therefore an algorithm based on iterating `right`
				// is more efficient.
				//
				// This helps a lot when a tiny (or empty) map is merged
				// with a large one.
				ordmap_union_with_merge_by_iter(left, right, merge)
				} else if left.len() < right.len() / 2 {
				// Same as above but with `left` and `right` swapped
				ordmap_union_with_merge_by_iter(right, left, \|key, a, b\| {
				// Also swapped in `merge` arguments:
				match merge(key, b, a) {
				MergeResult::UseNewValue(v) => MergeResult::UseNewValue(v),
				// … and swap back in `merge` result:
				MergeResult::UseLeftValue => MergeResult::UseRightValue,
				MergeResult::UseRightValue => MergeResult::UseLeftValue,
				}
				})
				} else {
				// For maps of similar size, use the algorithm based on `OrdMap::diff`
				ordmap_union_with_merge_by_diff(left, right, merge)
				}
				}

				/// Efficient if `right` is much smaller than `left`
				fn ordmap_union_with_merge_by_iter<K, V>(
				mut left: OrdMap<K, V>,
				right: OrdMap<K, V>,
				mut merge: impl FnMut(&K, &V, &V) -> MergeResult<V>,
				) -> OrdMap<K, V>
				where
				K: Clone + Ord,
				V: Clone,
				{
				for (key, right_value) in right {
				match left.get(&key) {
				None => {
				left.insert(key, right_value);
				}
				Some(left_value) => match merge(&key, left_value, &right_value) {
				MergeResult::UseLeftValue => {}
				MergeResult::UseRightValue => {
				left.insert(key, right_value);
				}
				MergeResult::UseNewValue(new_value) => {
				left.insert(key, new_value);
				}
				},
				}
				}
				left
				}

				/// Fallback when both maps are of similar size
				fn ordmap_union_with_merge_by_diff<K, V>(
				mut left: OrdMap<K, V>,
				mut right: OrdMap<K, V>,
				mut merge: impl FnMut(&K, &V, &V) -> MergeResult<V>,
				) -> OrdMap<K, V>
				where
				K: Clone + Ord,
				V: Clone + PartialEq,
				{
				// (key, value) pairs that would need to be inserted in either map
				// in order to turn it into the union.
				//
				// TODO: if/when https://github.com/bodil/im-rs/pull/168 is accepted,
				// change these from `Vec<(K, V)>` to `Vec<(&K, Cow<V>)>`
				// with `left_updates` only borrowing from `right` and `right_updates` from
				// `left`, and with `Cow::Owned` used for `MergeResult::UseNewValue`.
				//
				// This would allow moving all `.clone()` calls to after we’ve decided
				// which of `right_updates` or `left_updates` to use
				// (value ones becoming `Cow::into_owned`),
				// and avoid making clones we don’t end up using.
				let mut left_updates = Vec::new();
				let mut right_updates = Vec::new();

				for difference in left.diff(&right) {
				match difference {
				DiffItem::Add(key, value) => {
				left_updates.push((key.clone(), value.clone()))
				}
				DiffItem::Remove(key, value) => {
				right_updates.push((key.clone(), value.clone()))
				}
				DiffItem::Update {
				old: (key, left_value),
				new: (_, right_value),
				} => match merge(key, left_value, right_value) {
				MergeResult::UseLeftValue => {
				right_updates.push((key.clone(), left_value.clone()))
				}
				MergeResult::UseRightValue => {
				left_updates.push((key.clone(), right_value.clone()))
				}
				MergeResult::UseNewValue(new_value) => {
				left_updates.push((key.clone(), new_value.clone()));
				right_updates.push((key.clone(), new_value))
				}
				},
				}
				}
				if left_updates.len() < right_updates.len() {
				for (key, value) in left_updates {
				left.insert(key, value);
				}
				left
				} else {
				for (key, value) in right_updates {
				right.insert(key, value);
				}
				right
				}
				}
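
As a rough illustration of the "iterate the smaller map" strategy and the argument swap used above, here is a minimal sketch built on `std::collections::BTreeMap` rather than `im_rc::OrdMap`, so that it compiles without external crates. The names `union_with_merge` and `union_with_merge_by_iter` are hypothetical. The sketch leaves out the `ptr_eq` shortcut and the `OrdMap::diff` fallback, which rely on the structural sharing of `im_rc` maps, and always iterates the smaller map instead of applying the factor-of-two threshold.

```rust
use std::collections::BTreeMap;

/// Same shape as the `MergeResult` enum introduced in the diff.
enum MergeResult<V> {
    UseLeftValue,
    UseRightValue,
    UseNewValue(V),
}

/// Walk `right` and fold each entry into `left`.
/// Cheap when `right` is much smaller than `left`.
fn union_with_merge_by_iter<K: Ord, V>(
    mut left: BTreeMap<K, V>,
    right: BTreeMap<K, V>,
    mut merge: impl FnMut(&K, &V, &V) -> MergeResult<V>,
) -> BTreeMap<K, V> {
    for (key, right_value) in right {
        // Keys missing from `left` are simply taken from `right`.
        let decision = match left.get(&key) {
            None => MergeResult::UseRightValue,
            Some(left_value) => merge(&key, left_value, &right_value),
        };
        match decision {
            MergeResult::UseLeftValue => {}
            MergeResult::UseRightValue => {
                left.insert(key, right_value);
            }
            MergeResult::UseNewValue(new_value) => {
                left.insert(key, new_value);
            }
        }
    }
    left
}

/// Simplified dispatcher: always iterate over the smaller map.
fn union_with_merge<K: Ord, V>(
    left: BTreeMap<K, V>,
    right: BTreeMap<K, V>,
    mut merge: impl FnMut(&K, &V, &V) -> MergeResult<V>,
) -> BTreeMap<K, V> {
    if left.len() < right.len() {
        // Swap the maps, the `merge` arguments, and the meaning of the result.
        union_with_merge_by_iter(right, left, |key, a, b| {
            match merge(key, b, a) {
                MergeResult::UseLeftValue => MergeResult::UseRightValue,
                MergeResult::UseRightValue => MergeResult::UseLeftValue,
                MergeResult::UseNewValue(v) => MergeResult::UseNewValue(v),
            }
        })
    } else {
        union_with_merge_by_iter(left, right, merge)
    }
}
```

The caller's `merge` closure always sees its arguments in the original (left, right) order, whichever map ends up being iterated. This is why the swapped branch has to translate `UseLeftValue` and `UseRightValue` back.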

This is an archive of the discontinued Mercurial Phabricator instance.
