( )⚙ D11679 rhg: internally, return a structured representation from hg cat

This is an archive of the discontinued Mercurial Phabricator instance.

rhg: internally, return a structured representation from hg cat
ClosedPublic

Authored by aalekseyev on Oct 15 2021, 9:21 AM.

Details

Summary

The purpose of this change is to make it possible to support limited templating in hg cat, so we could print separators between files etc.
The templating itself is not implemented yet, so this functionality is unused in rhg cat.
However, in our fork of hg we're implementing a slightly different command hg jscat which makes use of this.
So accepting this change will let us minimize the size of the patch we're maintaining on our side.

Diff Detail

Repository
rHG Mercurial
Branch
default
Lint
No Linters Available
Unit
No Unit Test Coverage

Event Timeline

aalekseyev created this revision.Oct 15 2021, 9:21 AM
aalekseyev retitled this revision from rhg: simplify the type of FilelogEntry to rhg: internally, return a structured representatioon from hg cat.Oct 15 2021, 9:27 AM
aalekseyev edited the summary of this revision. (Show Details)
aalekseyev retitled this revision from rhg: internally, return a structured representatioon from hg cat to rhg: simplify the type of FilelogEntry.Oct 15 2021, 9:30 AM
aalekseyev retitled this revision from rhg: simplify the type of FilelogEntry to rhg: add FilelogEntry.into_data.
aalekseyev retitled this revision from rhg: add FilelogEntry.into_data to rhg: simplify the type of FilelogEntry.
aalekseyev edited the summary of this revision. (Show Details)
aalekseyev updated this revision to Diff 30835.
aalekseyev updated this revision to Diff 30837.
aalekseyev retitled this revision from rhg: simplify the type of FilelogEntry to rhg: internally, return a structured representatioon from hg cat.Oct 15 2021, 9:33 AM
aalekseyev edited the summary of this revision. (Show Details)

The change looks like a good idea, as you said it'll be useful for templating later anyway.

However I wonder if the fragmented allocations have a measurable impact on performance for a case with a large output (in both number of files and number of lines). Do you have a way of testing this easily?

rust/hg-core/src/operations/cat.rs
59

Could you possibly send a follow-up that uses more explicit names for the lifetime variables (and possibly M for manifest and F for files)?

However I wonder if the fragmented allocations have a measurable impact on performance for a case with a large output (in both number of files and number of lines). Do you have a way of testing this easily?

I've ran the same benchmarks I used to measure the impact of https://phab.mercurial-scm.org/D11616 and saw no noticeable difference. (the difference was on the order of 1ms, and not always in the same direction)

By the way, we only introduce extra fragmented allocations in the case when files have metadata.
If the file has no metadata, then we're just keeping around buffers created when reading the file, so I expect we're actually making one fewer copy of the data.
I don't know (and didn't measure) which is the common case.

Alphare accepted this revision.Oct 18 2021, 6:08 AM

Sounds good to me!

This revision is now accepted and ready to land.Oct 18 2021, 6:08 AM
Alphare requested changes to this revision.Oct 18 2021, 6:09 AM

This does not apply on top of default sorry, please send a rebase.

This revision now requires changes to proceed.Oct 18 2021, 6:09 AM
aalekseyev added inline comments.Oct 18 2021, 6:10 AM
rust/hg-core/src/operations/cat.rs
59

I can't think of very meaningful names for the lifetime variables. Is there a convention I should follow? (simply 'm and 'f, perhaps?)
(possibly the name 'a can simply be omitted here, actually)

Alphare added inline comments.Oct 18 2021, 6:12 AM
rust/hg-core/src/operations/cat.rs
59

How about 'manifest and 'files?

One-letter variables are fine for trivial cases, but for more complex cases they should be more explicit. We tend to be more lazy with lifetime naming that regular variables for some reason, myself included.

Oh also, (sorry I forgot to say anything) the commit message has a typo in "representation".

aalekseyev retitled this revision from rhg: internally, return a structured representatioon from hg cat to rhg: simplify the type of FilelogEntry.Oct 18 2021, 6:19 AM
aalekseyev edited the summary of this revision. (Show Details)
aalekseyev updated this revision to Diff 30860.
aalekseyev edited the summary of this revision. (Show Details)Oct 18 2021, 6:24 AM
aalekseyev updated this revision to Diff 30866.
aalekseyev updated this revision to Diff 30867.Oct 18 2021, 6:28 AM

Oh also, (sorry I forgot to say anything) the commit message has a typo in "representation".

Thanks, amended.

aalekseyev added inline comments.Oct 18 2021, 6:30 AM
rust/hg-core/src/operations/cat.rs
59

I renamed all one-letter variables here. What do you think? (apparently the 'a aka 'm' aka 'manifest` can't be omitted)
I also renamed "files" into "query" because I think that makes its role clearer.

This does not apply on top of default sorry, please send a rebase.

Rebased.

aalekseyev retitled this revision from rhg: simplify the type of FilelogEntry to rhg: internally, return a structured representation from hg cat.Oct 18 2021, 6:35 AM
aalekseyev edited the summary of this revision. (Show Details)
Alphare accepted this revision.Oct 18 2021, 7:40 AM
Alphare added inline comments.
rust/hg-core/src/operations/cat.rs
59

Yep, that is because there are two inputs and two outputs, so no way of automatically knowing how the lifetimes are supposed to be bound.

That looks good, query is better, thanks.

This revision is now accepted and ready to land.Oct 18 2021, 7:40 AM
spectral added inline comments.
rust/hg-core/src/operations/cat.rs
22

Does this still need to specify/require manifest ordering if we have the path as well?

rust/hg-core/src/revlog/filelog.rs
58

I'm not very familiar with Rust, can you explain why this separate function was necessary/useful? Wouldn't it have been possible to only add into_data without modifying split like this:

/// Consume the entry, and convert it into data, discarding any metadata,
/// if present.
pub fn into_data(self) -> Result<Vec<u8>, HgError> {
    let (metadata, data) = self.split()?;
    Ok(match metadata {
        Some(_) => data.to_owned(),
        None => self.0
    })
}

@aalekseyev could you follow-up on the new comments?

rust/hg-core/src/operations/cat.rs
22

I guess not, there shouldn't be any code making an assumption on ordering.

rust/hg-core/src/revlog/filelog.rs
58

I was expecting @aalekseyev to have a plan with this private function in following patches, but I agree with you that this is more code than it needs to be.

aalekseyev added inline comments.Oct 19 2021, 5:38 AM
rust/hg-core/src/operations/cat.rs
22

The order matters in so far as it shows up in the output of the hg cat. No other reason. (as there wasn't before this change)

rust/hg-core/src/revlog/filelog.rs
58

@spectral, thanks, that's simpler indeed. I just didn't think of that.
It did not occur to me to add a post-condition to split.
I'll make a follow-up change.

Alphare added inline comments.Oct 19 2021, 5:39 AM
rust/hg-core/src/operations/cat.rs
22

Let's keep the comment since it gives more information.