This is an archive of the discontinued Mercurial Phabricator instance.

Whoops, this last version didn't even compile (I compiled the wrong workspace when I made the tweak). I pushed a version that compiles now.
I measured the performance difference on .hgignore parsing (an improvement of about 1ms), but I don't think there's a command that performs that operation in isolation.
I'm looking at adding rhg debugignore.

I just realized that the extra copying that I removed was potentially useful. After a call to extend, the buffer may have spare capacity, so the call to HgPathBuf::from_bytes was trimming the unnecessary bytes. Removing this trimming saves time, but wastes some memory.
I'm assuming this is a fine tradeoff if the standard library is taking it, but I imagine we could get the best of both worlds if we pre-allocated a vector of the right capacity from the start.
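
To make the trade-off above concrete, here is a minimal sketch on plain byte vectors rather than the real HgPath/HgPathBuf types. The function names join_growing and join_preallocated are hypothetical and are not part of hg-core; the separator logic mirrors what the diff in this revision shows.

```rust
/// Hypothetical stand-in for a join that copies `base` and then grows the
/// copy in place. Depending on how `Vec` grows, the result may carry unused
/// capacity, which is the memory cost mentioned above.
fn join_growing(base: &[u8], other: &[u8]) -> Vec<u8> {
    let mut buf = base.to_vec();
    if !buf.is_empty() && buf.last() != Some(&b'/') {
        buf.push(b'/');
    }
    buf.extend_from_slice(other);
    buf
}

/// Hypothetical stand-in for a join that reserves room for `base`, one
/// separator byte, and `other` up front, so the appends below never need
/// to reallocate.
fn join_preallocated(base: &[u8], other: &[u8]) -> Vec<u8> {
    let mut buf = Vec::with_capacity(base.len() + 1 + other.len());
    buf.extend_from_slice(base);
    if !buf.is_empty() && buf.last() != Some(&b'/') {
        buf.push(b'/');
    }
    buf.extend_from_slice(other);
    buf
}
```

Both functions return the same bytes; they differ only in how many allocations happen and how much capacity is left over. Comparing `len()` and `capacity()` on the two results is one way to observe the difference, although the exact capacities depend on the standard library's growth strategy.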

The command hg debugignorerhg I introduced in https://phab.mercurial-scm.org/D11722, run on a repo with a ~3k line hgignore, takes ~31ms after this patch and ~32.5ms before this patch.

I looked at the benefit of pre-allocating the HgPathBuf of the right size from the start, but I couldn't measure much improvement, if any. (possibly because I don't have a sensitive benchmark)
The patch looks like this:

diff --git a/rust/hg-core/src/utils/hg_path.rs b/rust/hg-core/src/utils/hg_path.rs
--- a/rust/hg-core/src/utils/hg_path.rs
+++ b/rust/hg-core/src/utils/hg_path.rs
@@ -172,6 +172,13 @@ impl HgPath {
             inner: self.inner.to_owned(),
         }
     }
+    fn to_hg_path_buf_with_spare_capacity(&self, spare_capacity : usize) -> HgPathBuf {
+        let mut vec = Vec::with_capacity(self.len() + spare_capacity);
+        vec.extend(&self.inner);
+        HgPathBuf {
+            inner: vec,
+        }
+    }
     pub fn bytes(&self) -> std::slice::Iter<u8> {
         self.inner.iter()
     }
@@ -222,7 +229,7 @@ impl HgPath {
     }
 
     pub fn join(&self, path: &HgPath) -> HgPathBuf {
-        let mut buf = self.to_owned();
+        let mut buf = self.to_hg_path_buf_with_spare_capacity(path.len() + 1);
         buf.push(path);
         buf
     }

I'm happy to push that if you think that's better.

I think this is a better change because of the ergonomics (them being closer to stdlib) and because it removes the quadratic code, but the optimizer is probably smart enough to figure some of this out. I don't think a 1ms difference in a 30ms run is very significant (unless you have the most stable of machines).
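
As an illustration of the "quadratic code" point, the sketch below contrasts the two ways of building a root path from components, using plain byte vectors. The names root_by_fold and root_by_push are hypothetical stand-ins, not hg-core functions: the first models folding with a join that copies the accumulated prefix at every step, the second models pushing onto a single buffer.

```rust
/// Models the old pattern: each step allocates a fresh buffer and copies
/// everything accumulated so far, so total work grows quadratically with
/// the accumulated path length.
fn root_by_fold(components: &[&[u8]]) -> Vec<u8> {
    components.iter().fold(Vec::<u8>::new(), |acc, &c| {
        // Copy of the whole prefix, standing in for the copying join.
        let mut next = acc.as_slice().to_vec();
        if !next.is_empty() && next.last() != Some(&b'/') {
            next.push(b'/');
        }
        next.extend_from_slice(c);
        next
    })
}

/// Models the new pattern: one buffer, appended to in place.
fn root_by_push(components: &[&[u8]]) -> Vec<u8> {
    let mut root = Vec::new();
    for &c in components {
        if !root.is_empty() && root.last() != Some(&b'/') {
            root.push(b'/');
        }
        root.extend_from_slice(c);
    }
    root
}
```

Both produce identical output. For the handful of components in a typical ignore pattern the difference is likely small, which is consistent with the modest timings reported in this thread.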

pulkit accepted this revision. Nov 9 2021, 9:49 AM

aalekseyev added a commit: rHG6d69e83e6b6e: rhg: more efficient `HgPath::join`. Nov 9 2021, 10:03 AM

Closed by commit rHG6d69e83e6b6e: rhg: more efficient `HgPath::join` (authored by aalekseyev).

This revision was automatically updated to reflect the committed changes.

Revision Contents

			Path	Packages
M			rust/hg-core/src/filepatterns.rs (2 lines)
M			rust/hg-core/src/matchers.rs (9 lines)
M			rust/hg-core/src/utils/hg_path.rs (22 lines)

Diff	ID	Description	Created	Lint	Unit
Base		Base
Diff 1	30989		Oct 26 2021, 2:49 PM	★	★
Diff 2	30990		Oct 26 2021, 2:52 PM	★	★
Diff 3	30991		Oct 27 2021, 6:16 AM	★	★
Diff 4	31043	rHG6d69e83e6b6ee344f10b4a4b16f410680dc2df98	Oct 26 2021, 2:47 PM	★	★

Diff 31043

rust/hg-core/src/filepatterns.rs

         let path = source_root.join(get_path_from_bytes(pattern));
         let new_root = path.parent().unwrap_or_else(|| path.deref());

         let prefix = canonical_path(root_dir, root_dir, new_root)?;

         Ok(Self {
             prefix: path_to_hg_path_buf(prefix).and_then(|mut p| {
                 if !p.is_empty() {
-                    p.push(b'/');
+                    p.push_byte(b'/');
                 }
                 Ok(p)
             })?,
             path: path.to_owned(),
             root: new_root.to_owned(),
             included_patterns: Vec::new(),
         })
     }

rust/hg-core/src/matchers.rs

     let mut dirs = Vec::new();

     for ignore_pattern in ignore_patterns {
         let IgnorePattern {
             syntax, pattern, ..
         } = ignore_pattern;
         match syntax {
             PatternSyntax::RootGlob | PatternSyntax::Glob => {
-                let mut root = vec![];
+                let mut root = HgPathBuf::new();

                 for p in pattern.split(|c| *c == b'/') {
                     if p.iter().any(|c| match *c {
                         b'[' | b'{' | b'*' | b'?' => true,
                         _ => false,
                     }) {
                         break;
                     }
-                    root.push(HgPathBuf::from_bytes(p));
+                    root.push(HgPathBuf::from_bytes(p).as_ref());
                 }
-                let buf =
-                    root.iter().fold(HgPathBuf::new(), |acc, r| acc.join(r));
-                roots.push(buf);
+                roots.push(root);
             }
             PatternSyntax::Path | PatternSyntax::RelPath => {
                 let pat = HgPath::new(if pattern == b"." {
                     &[] as &[u8]
                 } else {
                     pattern
                 });
                 roots.push(pat.to_owned());

rust/hg-core/src/utils/hg_path.rs

         match &self.inner.iter().rposition(|c| *c == b'/') {
             None => (HgPath::new(""), &self),
             Some(size) => (
                 HgPath::new(&self.inner[..*size]),
                 HgPath::new(&self.inner[*size + 1..]),
             ),
         }
     }
-    pub fn join<T: ?Sized + AsRef<Self>>(&self, other: &T) -> HgPathBuf {
-        let mut inner = self.inner.to_owned();
-        if !inner.is_empty() && inner.last() != Some(&b'/') {
-            inner.push(b'/');
-        }
-        inner.extend(other.as_ref().bytes());
-        HgPathBuf::from_bytes(&inner)
-    }
+    pub fn join(&self, path: &HgPath) -> HgPathBuf {
+        let mut buf = self.to_owned();
+        buf.push(path);
+        buf
+    }

     pub fn components(&self) -> impl Iterator<Item = &HgPath> {
         self.inner.split(|&byte| byte == b'/').map(HgPath::new)
     }

     /// Returns the first (that is "root-most") slash-separated component of
     /// the path, and the rest after the first slash if there is one.
[...]
 pub struct HgPathBuf {
     inner: Vec<u8>,
 }

 impl HgPathBuf {
     pub fn new() -> Self {
         Default::default()
     }
-    pub fn push(&mut self, byte: u8) {
+    pub fn push<T: ?Sized + AsRef<HgPath>>(&mut self, other: &T) -> () {
+        if !self.inner.is_empty() && self.inner.last() != Some(&b'/') {
+            self.inner.push(b'/');
+        }
+        self.inner.extend(other.as_ref().bytes())
+    }
+
+    pub fn push_byte(&mut self, byte: u8) {
         self.inner.push(byte);
     }
     pub fn from_bytes(s: &[u8]) -> HgPathBuf {
         HgPath::new(s).to_owned()
     }
     pub fn into_vec(self) -> Vec<u8> {
         self.inner
     }
