This is an archive of the discontinued Mercurial Phabricator instance.

split: new extension to split changesets
ClosedPublic

Authored by quark on Oct 14 2017, 5:15 PM.

Details

Summary

This diff introduces an experimental split extension to split changesets.

The implementation is largely inspired by Laurent Charignon's implementation
for mutable-history (changeset 9603aa1ecdfd54b0d86e262318a72e0a2ffeb6cc [1])

This version contains various improvements:

  • Rebase by default. This is more friendly for new users. Split won't lead to merge conflicts so a rebase won't give the user more trouble. This has been on by default at Facebook for months now and seems to be a good UX improvement. The rebase skips obsoleted or orphaned changesets, which can avoid issues like allowdivergence, merge conflicts, etc. This is more flexible because the user can decide what to do next (see the last test case in test-split.t)
  • Remove "Done split? [y/n]" prompt. That could be detected by checking repo.status() instead.
  • Works with obsstore disabled. Without obsstore, split uses strip to clean up old nodes, and it can even handle split a non-head changeset with "allowunstable" disabled, since it runs a rebase to solve the "unstable" issue in a same transaction.
  • More friendly editor text. Put what has been already split into the editor text so users won't lost track about where they are.

[1]: https://bitbucket.org/marmoute/mutable-history/commits/9603aa1ecdfd54b

Diff Detail

Repository
rHG Mercurial
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

quark created this revision.Oct 14 2017, 5:15 PM
quark edited the summary of this revision. (Show Details)Oct 14 2017, 5:17 PM
quark updated this revision to Diff 2757.Oct 14 2017, 5:21 PM
quark updated this revision to Diff 2758.
dlax added a subscriber: dlax.Oct 15 2017, 4:55 AM
dlax added inline comments.Oct 15 2017, 5:03 AM
hgext/split.py
54

Not sure "repetitively" exists. Maybe "repeatedly"?

quark updated this revision to Diff 2827.Oct 16 2017, 1:33 PM
quark added a comment.Oct 16 2017, 1:36 PM

I changed it to "repeatedly". The word "repetively" seems to exist but may have negative tone according to https://www.usingenglish.com/forum/threads/12782-Need-advice-on-the-difference-btw-Repeatedly-and-Repetitively-thanks-teachers!

martinvonz added inline comments.
tests/test-split.t
488–494

Leaving these two behind seems reasonable. It would also be reasonable to evolve/stabilize them. Either way, it's different from what "hg rebase" does. Do we eventually want them to behave the same? If so, are we okay with a small BC break here (either in split or in rebase)?

It will be great to have split in core, even if it's only as an experimental experiment for now.

I like the UX improvements, but could we add a config knob to disable the auto-rebase for power-users? I agree that generating orphans is maybe not the best UX for users, so I think having it on by default could be an improvement.

But, I often am in the middle of a too big stack and auto-rebasing would break my flow of fixing commits from bottom to top without mentioning the number of obs-markers it would generate.

quark added a comment.Oct 16 2017, 2:54 PM

It will be great to have split in core, even if it's only as an experimental experiment for now.
I like the UX improvements, but could we add a config knob to disable the auto-rebase for power-users? I agree that generating orphans is maybe not the best UX for users, so I think having it on by default could be an improvement.

I believe most users want auto-rebase by default. Auto-rebase is the default at FB for months and people like it.

I agree power users may want a different default. In that case, you can set alias split = split --no-rebase.

But, I often am in the middle of a too big stack and auto-rebasing would break my flow of fixing commits from bottom to top without mentioning the number of obs-markers it would generate.

tests/test-split.t
488–494

I don't understand why you think "it's different from what "hg rebase" does". The --rebase flag does not suggest what rebase source or destination it uses. So it's up to the split implementation to do a reasonable thing. It can do -r SMART_REVSET -d ... instead of -s SINGLE_REV -d .... I can revise the help text to clarify.

I think we wanted to implement "Option 2 (skip troublemakers)" as the default behavior in https://www.mercurial-scm.org/wiki/CEDRebase according to previous sprint discussion. That said, I don't think that should block this patch.

quark added inline comments.Oct 16 2017, 2:56 PM
tests/test-split.t
488–494

It would also be reasonable to evolve/stabilize them.

Only if we know that would not cause conflicts. Otherwise I don't think it's reasonable to do that automatically. This is also a difference between split's rebase and rebase command itself. The former cares about no-conflict experience and the latter does not care.

In D1082#18693, @quark wrote:

It will be great to have split in core, even if it's only as an experimental experiment for now.
I like the UX improvements, but could we add a config knob to disable the auto-rebase for power-users? I agree that generating orphans is maybe not the best UX for users, so I think having it on by default could be an improvement.

I believe most users want auto-rebase by default. Auto-rebase is the default at FB for months and people like it.

Yes agreed, sorry if I was not clear, I think it's a good behavior for most users.

I agree power users may want a different default. In that case, you can set alias split = split --no-rebase.

But, I often am in the middle of a too big stack and auto-rebasing would break my flow of fixing commits from bottom to top without mentioning the number of obs-markers it would generate.

I'm still concerned about possible obsmarkers explosion as we move to more and more auto-stabilizing commands in core. For leaving a trace of the problem, if we use auto-stabilizing commands on every changeset on the stack (rewrite the first changeset, all parent get stabilized, rewrite the second changeset, all parents get stabilized, etc..), we will end up creating N²/2 obsmarkers with a stack of N:

stack of 4 changeset: 8 obs-markers,
stack of 10 changesets: 50 obs-markers,
stack of 30 changesets: 450 obs-markers,

I'm not saying that everyone has a 30 changesets stack but even with a small stack going back and forth in the stack while making editions would rapidly grow the number of obsmarkers created which will cause scalability issues when exchanging markers.

There is a couple of interesting lead to solve this, maybe we could do something special when more than X (configurable) obs-markers are created. Probably avoid the stabilization and point to some documentation or commands (like restack).

quark added a comment.Oct 17 2017, 4:57 PM

I'm still concerned about possible obsmarkers explosion as we move to more and more auto-stabilizing commands in core. For leaving a trace of the problem, if we use auto-stabilizing commands on every changeset on the stack (rewrite the first changeset, all parent get stabilized, rewrite the second changeset, all parents get stabilized, etc..), we will end up creating N²/2 obsmarkers with a stack of N:

stack of 4 changeset: 8 obs-markers,
stack of 10 changesets: 50 obs-markers,
stack of 30 changesets: 450 obs-markers,

I'm not saying that everyone has a 30 changesets stack but even with a small stack going back and forth in the stack while making editions would rapidly grow the number of obsmarkers created which will cause scalability issues when exchanging markers.
There is a couple of interesting lead to solve this, maybe we could do something special when more than X (configurable) obs-markers are created. Probably avoid the stabilization and point to some documentation or commands (like restack).

In my opinion, the "obsmarkers explosion" problem is because the current algorithm loads the entire obsstore while it is not necessary to know all markers in all cases. i.e. "obsmarkers explosion" is NOT a problem if the algorithm is smarter that only parses or loads markers needed for certain calculations. i.e. obsolete() and all obsmarker-related revsets are lazy. Of course it takes some time to rewrite them into lazy versions.

In additional, the N^2 case you are talking about only happens if the user splits every commit in their draft stack, which is unrealistic. I don't think that should block the default behavior here.

I'd like to get this queued. Jun, can you rebase this and make sure tests still pass?

tests/test-split.t
488–494

I agree now. This seems like a good default.

quark updated this revision to Diff 4534.Dec 18 2017, 7:53 PM
martinvonz added inline comments.Dec 19 2017, 5:21 PM
hgext/split.py
91

s/ctx/rev/ seems a little clearer (I realize that ctx has a int method, but I didn't know that until I read this)

96

Do we want to exclude all troubled commits? Maybe phase- and content-divergent ones are better to not rebase too? (I.e. '%ld - obsolete() - troubled()')

99

maybe s/will/would/ ?

110

Do we really want to allow splitting when there's an unfinished merge? Or is that handled elsewhere?

quark added inline comments.Dec 19 2017, 5:48 PM
hgext/split.py
91

Good advice. This is probably copy-pasted from old code.

96

I think it's better to rebase content-divergent changesets here. Since that enables people to check the final result of the divergence and be able to make a final judgement about which one to pick or whether to merge the divergence.

I also think in the future it might make sense to rebase some unstable ones too as long as there is a fast way to test if doing that won't cause merge conflicts.

99

Will change.

110

Good catch. This is unintentional. Will remove.

quark updated this revision to Diff 4555.Dec 19 2017, 6:00 PM
This revision was automatically updated to reflect the committed changes.