This is an archive of the discontinued Mercurial Phabricator instance.

Differential D12426

rust-changelog: don't skip empty lines when iterating over changeset lines
ClosedPublic

Authored by martinvonz on Apr 1 2022, 2:09 AM.


Details

Reviewers

Alphare

Group Reviewers

hg-reviewers

Commits

rHGfb82b5cb8301: rust-changelog: don't skip empty lines when iterating over changeset lines

Summary

The first empty line in the changeset indicates the end of headers and
beginning of description. Callers can't figure out where that
position is if empty lines are skipped.
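The effect of the filter can be seen in a small standalone sketch written in plain Rust, not against the hg-core types. The helper names `lines_filtered`, `lines_unfiltered` and `split_at_first_empty_line` are hypothetical. The sample entry assumes the usual changeset layout: manifest node, user, date line, files, an empty line, then the description.

```rust
/// Old behaviour (before this revision): empty lines are dropped.
pub fn lines_filtered(bytes: &[u8]) -> impl Iterator<Item = &[u8]> {
    bytes.split(|b| *b == b'\n').filter(|line| !line.is_empty())
}

/// New behaviour (after this revision): every line is yielded, empty ones included.
pub fn lines_unfiltered(bytes: &[u8]) -> impl Iterator<Item = &[u8]> {
    bytes.split(|b| *b == b'\n')
}

/// Hypothetical helper: split an entry at its first empty line into
/// (header lines, raw description bytes). This only works because the
/// unfiltered iterator still reports the empty separator line.
pub fn split_at_first_empty_line(bytes: &[u8]) -> (Vec<&[u8]>, &[u8]) {
    let mut headers = Vec::new();
    let mut start = 0; // byte offset where the current line begins
    for line in lines_unfiltered(bytes) {
        if line.is_empty() {
            // Skip the empty line's own '\n' (if any) to reach the description.
            let description_start = (start + 1).min(bytes.len());
            return (headers, &bytes[description_start..]);
        }
        headers.push(line);
        start += line.len() + 1; // line bytes plus its '\n'
    }
    // No empty line at all: everything is header, the description is empty.
    (headers, &bytes[bytes.len()..])
}
```

For example, on `b"0123abcd\nalice\n0 0\na.txt\n\nsummary\n\nbody"` the filtered iterator yields 6 lines instead of 8. Both the separator and the blank line inside the description disappear. With the unfiltered iterator, the helper returns the four lines before the separator as headers and `summary\n\nbody` as the description.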

Diff Detail

Repository

rHG Mercurial

Branch

default

Lint

No Linters Available

Unit

No Unit Test Coverage

Event Timeline

martinvonz created this revision. Apr 1 2022, 2:09 AM

Herald added a reviewer: hg-reviewers. Apr 1 2022, 2:09 AM

Herald added a subscriber: mercurial-patches.

martinvonz added inline comments. Apr 4 2022, 7:08 PM

rust/hg-core/src/revlog/changelog.rs
Line 48: By the way, how do you want to extend this type in the future? For example, we could do a coarse splitting into header lines, files list, and description in the constructor, and then we can do more detailed parsing of each header line lazily.

martinvonz added a child revision: D12438: rust-changelog: remove special parsing of empty changelog data for null rev. Apr 5 2022, 2:11 PM

Alphare requested changes to this revision. Apr 6 2022, 6:54 AM

Alphare added a subscriber: Alphare.

Alphare added inline comments.

rust/hg-core/src/revlog/changelog.rs
Line 48: We could lazily fill a fixed-size `offsets` array and have functions that return slices of the header lines, files list, and description. This would have the lowest overhead (I think) while still giving a nice API.
Line 60: Please return `Err(HgError::corrupted("changelog does not contain manifest node"))` instead of unwrapping.
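The lazily filled `offsets` idea can be sketched as below. Several assumptions are not fixed by this review: the struct name `ChangesetOffsetsSketch`, the three-entry layout of the array, and the convention that an entry has exactly three header lines (manifest node, user, date) followed by the files list, an empty line, and the description. It uses `std::cell::OnceCell` (stable since Rust 1.70), so the offsets are computed at most once, on first access.

```rust
use std::cell::OnceCell;

/// Hypothetical sketch of the lazily filled `offsets` idea; not the hg-core type.
pub struct ChangesetOffsetsSketch {
    bytes: Vec<u8>,
    /// [end of header lines, end of files list, start of description],
    /// computed on first use.
    offsets: OnceCell<[usize; 3]>,
}

impl ChangesetOffsetsSketch {
    pub fn new(bytes: Vec<u8>) -> Self {
        Self { bytes, offsets: OnceCell::new() }
    }

    fn offsets(&self) -> &[usize; 3] {
        self.offsets.get_or_init(|| compute_offsets(&self.bytes))
    }

    /// Raw bytes of the (assumed) three header lines, '\n' terminators included.
    pub fn header_bytes(&self) -> &[u8] {
        &self.bytes[..self.offsets()[0]]
    }

    /// Raw bytes of the files list, one path per '\n'-terminated line.
    pub fn files_bytes(&self) -> &[u8] {
        let o = self.offsets();
        &self.bytes[o[0]..o[1]]
    }

    /// Raw description bytes following the first empty line.
    pub fn description(&self) -> &[u8] {
        &self.bytes[self.offsets()[2]..]
    }
}

/// Single pass over the lines; empty lines must be visible for this to work.
fn compute_offsets(bytes: &[u8]) -> [usize; 3] {
    let len = bytes.len();
    let mut start = 0; // byte offset where the current line begins
    let mut headers_end = None;
    for (index, line) in bytes.split(|b| *b == b'\n').enumerate() {
        if index == 3 {
            // The fourth line starts right after the three header lines.
            headers_end = Some(start);
        }
        if line.is_empty() {
            // First empty line: the files list ends here, the description follows.
            let headers_end = headers_end.unwrap_or(start);
            return [headers_end, start, (start + 1).min(len)];
        }
        start += line.len() + 1;
    }
    // No empty line: no description; whatever follows the headers is files.
    [headers_end.unwrap_or(len), len, len]
}
```

The constructor stays cheap, and each accessor is a plain slice operation once the single scan has run. That scan depends on empty lines being reported, which is what this revision enables.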

This revision now requires changes to proceed. Apr 6 2022, 6:54 AM

martinvonz marked an inline comment as done. Apr 6 2022, 11:46 AM

martinvonz requested review of this revision.

martinvonz added inline comments.

rust/hg-core/src/revlog/changelog.rs
Line 60: That can't happen AFAIK, since I removed the filtering-out of empty lines in `lines()` above. If the input is empty, the iterator will yield a single empty line.

martinvonz retitled this revision from "rhg: don't skip empty lines when iterating over changeset lines" to "rust-changelog: don't skip empty lines when iterating over changeset lines". Apr 9 2022, 1:35 AM

Alphare added inline comments. Apr 11 2022, 6:09 AM

rust/hg-core/src/revlog/changelog.rs
Line 60: Think of it as a defensive measure against the code evolving. I dislike `unwrap`s outside of tests since they don't convey why they're valid at the moment they're written. All unwraps should be `expect`s IMO (maybe that could become a lint for this codebase). My suggestion of using `Result` is simply because we already have the signature for it, it would turn the corruption into a nicer experience for the user, and it would maybe let us build `rhg verify` more easily than with a panicking parser. What do you think?

martinvonz marked 2 inline comments as done. Apr 11 2022, 11:26 AM

martinvonz added inline comments.

rust/hg-core/src/revlog/changelog.rs
Line 60: I like `expect()` as a way of documenting why we think `unwrap()` would have been safe. I'll change to that. I don't like returning an error here because it would only be returned if we had a bug here (such as filtering out empty lines in `lines()`), so the error might be misleading.
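The two options discussed above can be compared in a standalone sketch. `Corrupted` is a hypothetical stand-in for `HgError::corrupted(...)`. The functions return the raw hex bytes instead of calling `Node::from_hex_for_repo`, and `None` plays the role of `NULL_NODE`. Only the handling of the first line is meant to mirror the patched `manifest_node()`; the `expect` message is illustrative, not the one that was committed.

```rust
/// Hypothetical stand-in for `HgError::corrupted(...)`.
#[derive(Debug, PartialEq)]
pub struct Corrupted(pub &'static str);

/// Same splitting as the patched `lines()`: empty lines are kept.
fn lines(bytes: &[u8]) -> impl Iterator<Item = &[u8]> {
    bytes.split(|b| *b == b'\n')
}

/// `expect()` variant: the message documents why the first line always exists.
/// Returns `None` where the real code returns `NULL_NODE`.
pub fn manifest_hex_expect(bytes: &[u8]) -> Option<&[u8]> {
    let first = lines(bytes)
        .next()
        .expect("split() yields at least one item, even for empty input");
    if first.is_empty() {
        None
    } else {
        Some(first)
    }
}

/// `Result` variant: a missing first line is reported as corruption instead of panicking.
pub fn manifest_hex_result(bytes: &[u8]) -> Result<Option<&[u8]>, Corrupted> {
    let first = lines(bytes)
        .next()
        .ok_or(Corrupted("changelog does not contain manifest node"))?;
    Ok(if first.is_empty() { None } else { Some(first) })
}
```

Because `split()` always yields at least one item, the `ok_or` branch cannot be reached with the current `lines()`, which is why such an error could be misleading. With the old filtering iterator, empty input made `next()` return `None`, which is why the old code matched on `None`.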

martinvonz marked an inline comment as done. Apr 11 2022, 12:07 PM

martinvonz updated this revision to Diff 33001.

Alphare accepted this revision. Apr 12 2022, 4:06 AM

This revision is now accepted and ready to land. Apr 12 2022, 4:06 AM

martinvonz added a commit: rHGfb82b5cb8301: rust-changelog: don't skip empty lines when iterating over changeset lines. Apr 13 2022, 5:18 AM

Closed by commit rHGfb82b5cb8301: rust-changelog: don't skip empty lines when iterating over changeset lines (authored by martinvonz).

This revision was automatically updated to reflect the committed changes.

martinvonz updated this revision to Diff 33149. Apr 13 2022, 9:10 AM

Revision Contents
Changeset List

Status	Path
M	rust/hg-core/src/revlog/changelog.rs (12 lines)

Diff	ID	Description	Created
Base		Base
Diff 1	32740		Apr 1 2022, 2:09 AM
Diff 2	33001		Apr 11 2022, 12:07 PM
Diff 3	33142	rHGfb82b5cb8301e48eb0241a0932699a122a2e7907	Apr 1 2022, 1:06 AM
Diff 4	33149		Apr 13 2022, 9:10 AM

Commit	Parents	Author	Summary	Date
67034616303e	61d7cf024302	Martin von Zweigbergk		Apr 1 2022, 1:06 AM

Status	Author	Revision
Closed	martinvonz	D12439 rust-changelog: start parsing changeset data
Closed	martinvonz	D12438 rust-changelog: remove special parsing of empty changelog data for null rev
Closed	martinvonz	D12426 rust-changelog: don't skip empty lines when iterating over changeset lines
Closed	martinvonz	D12425 rust-requirements: allow loading repos with `bookmarksinstore` requirement

Diff 32740

rust/hg-core/src/revlog/changelog.rs

         self.revlog.node_from_rev(rev)
     }
 }
 
 /// `Changelog` entry which knows how to interpret the `changelog` data bytes.
 #[derive(Debug)]
 pub struct ChangelogRevisionData {
     /// The data bytes of the `changelog` entry.
     bytes: Vec<u8>,
 }
 
 impl ChangelogRevisionData {
     /// Return an iterator over the lines of the entry.
     pub fn lines(&self) -> impl Iterator<Item = &[u8]> {
-        self.bytes
-            .split(|b| b == &b'\n')
-            .filter(|line| !line.is_empty())
+        self.bytes.split(|b| b == &b'\n')
     }
 
     /// Return the node id of the `manifest` referenced by this `changelog`
     /// entry.
     pub fn manifest_node(&self) -> Result<Node, HgError> {
-        match self.lines().next() {
-            None => Ok(NULL_NODE),
-            Some(x) => Node::from_hex_for_repo(x),
+        let manifest_node_hex = self.lines().next().unwrap();
+        if manifest_node_hex.is_empty() {
+            Ok(NULL_NODE)
+        } else {
+            Node::from_hex_for_repo(manifest_node_hex)
         }
     }
 }