This is an archive of the discontinued Mercurial Phabricator instance.

contrib: add a partial-merge tool for sorted lists (such as Python imports)
ClosedPublic

Authored by martinvonz on Mar 15 2022, 6:51 PM.

Details

Summary

This is a pretty naive tool that uses a regular expression for
matching lines. It is based on a Google-internal tool that worked in a
similar way.

For now, the regular expression is hard-coded to attempt to match
single-line Python imports. The only commit I've found in the hg core
repo where the tool helped was commit 9cd6292abfdf. I think that's
because we often use multiple imports per import statement. I think
this tool is still a decent first step (especially once the regex is
made configurable in the next patch). The merging should ideally use a
proper Python parser and do the merge at the AST (or CST?) level, but
that's significantly harder, especially if you want to preserve
comments and whitespace. It's also less generic.

Diff Detail

Repository
rHG Mercurial
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

martinvonz created this revision.Mar 15 2022, 6:51 PM
martinvonz updated this revision to Diff 32651.Mar 16 2022, 1:45 AM
martinvonz retitled this revision from contrib: add a partial-merge tool for Python imports [WIP] to contrib: add a partial-merge tool for sorted lists (such as Python imports).Tue, Mar 22, 12:58 AM
martinvonz edited the summary of this revision. (Show Details)
martinvonz updated this revision to Diff 32679.
Alphare accepted this revision as: Alphare.Tue, Mar 22, 5:42 AM
Alphare added a subscriber: Alphare.

Accepting this as myself to give you and opportunity to answer my suggestions before taking this. This is a contrib patch, so there is no reason that any of these should be blockers.

contrib/merge-lists/Cargo.toml
10

It would be pretty low effort to still support lower versions of Rust, but going to 1.59 allows the use of edition 2021 and makes sense for a contrib file anyway, so I understand the appeal of a newer version.

contrib/merge-lists/src/main.rs
148

Suggestion: with clap 3 you can use struct-based arguments. I find the declarative approach clearer and less prone to bugs.

154
167

Maybe only write the files that have indeed changed?

190

Why do we want the wrapping behavior here?

martinvonz marked 4 inline comments as done.Tue, Mar 22, 12:12 PM
martinvonz updated this revision to Diff 32692.
martinvonz added inline comments.Tue, Mar 22, 12:13 PM
contrib/merge-lists/Cargo.toml
10

Agreed, it would be pretty low effort, but yes, I figured it would be fine in contrib to not worry about it. Sounds like you're not really objecting either, so I'll leave it as is.

contrib/merge-lists/src/main.rs
148

Good idea! (I should do that upgrade in my hobby project too.)

167

Heh, that's both clearer and more efficient. Silly me.

190

It adds an isize (offset) to a usize. I did that by converting the isize to usize and then doing a wrapping add. Happy to hear alternatives.

Alphare added inline comments.Wed, Mar 23, 4:41 AM
contrib/merge-lists/src/main.rs
190

If the offset isize is positive, there is a chance of overflow. It's unlikely given what we're manipulating and the resulting if will be a bit uglier, but better safe than sorry.

martinvonz added inline comments.Wed, Mar 23, 2:24 PM
contrib/merge-lists/src/main.rs
190

The point of the wrapping_add() was to handle negative offsets. Do you mean that you'd prefer if we do something like this:

let offset = self.offsets[side];
if offset < 0 {
  self.base_range.start.checked_sub(-offset as usize)
} else {
  self.base_range.start.checked_add(offset as usize)  
}
martinvonz added inline comments.Thu, Mar 24, 11:11 AM
contrib/merge-lists/src/main.rs
156

I realized that clap will base the index on the declared order in the struct if you don't specify it, so I've stopped specifying them.

Alphare added inline comments.Fri, Mar 25, 5:32 AM
contrib/merge-lists/src/main.rs
156

That looks good!

190

Yes, that's what I'm referring to (without the - sign in checked_sub, of course). The justification comes from the - maybe too paranoid - risk of overflow if both are positive.

martinvonz added inline comments.Fri, Mar 25, 8:23 AM
contrib/merge-lists/src/main.rs
190

The - should be there. Let's say the base range starts at 10 and the offset is -5, then you'll want to calculate 10 - 5 here. I could use checked_abs() if you're worried and it being usize::MIN.

The overflow should only happen if there are bugs. The offset comes from line 260.

By the way, I need to add .unwrap() calls too; checking_*() return Option<>.

martinvonz marked an inline comment as done.Fri, Mar 25, 12:04 PM
This revision was not accepted when it landed; it landed in state Needs Review.
This revision was automatically updated to reflect the committed changes.