Page MenuHomePhabricator

[PoC] obsolete: config option to enable local only obsolescence mode
Needs ReviewPublic

Authored by indygreg on Mar 4 2018, 7:42 PM.

Details

Reviewers
None
Group Reviewers
hg-reviewers
Summary

Our path to enable obsolescence/evolve in core is to enable
creation of markers locally (no exchange) with user-facing behavior
that mimics existing behavior as closely as possible. Think of it
as evolve light.

We introduce a config option to control this behavior.

Diff Detail

Repository
rHG Mercurial
Lint
Lint Skipped
Unit
Unit Tests Skipped

Event Timeline

indygreg created this revision.Mar 4 2018, 7:42 PM
pulkit added a subscriber: pulkit.Aug 1 2018, 6:24 PM

These patches were result of a very extensive discussion about what ways we have to start supporting obsmarkers by default. I will like to push these changesets so that we can try these in this cycle. If I don't hear any concern in a week, I will push them.

In D2679#62662, @pulkit wrote:

These patches were result of a very extensive discussion about what ways we have to start supporting obsmarkers by default. I will like to push these changesets so that we can try these in this cycle. If I don't hear any concern in a week, I will push them.

We made a series with Ryan during the sprint that had the advantage of not requiring to delete obs-markers. I will undust it, rebase it and send it here for making the discussion go forward.

pulkit added a comment.Aug 7 2018, 6:58 AM
In D2679#62662, @pulkit wrote:

These patches were result of a very extensive discussion about what ways we have to start supporting obsmarkers by default. I will like to push these changesets so that we can try these in this cycle. If I don't hear any concern in a week, I will push them.

We made a series with Ryan during the sprint that had the advantage of not requiring to delete obs-markers. I will undust it, rebase it and send it here for making the discussion go forward.

Oh nice! That will be great! Looking forward to your series.

@lothiraldan Gentle reminder for the series you were talking about.

Hey @lothiraldan, if not the patches itself, can you please explain the approach taken?
If I don't hear from you this week, I will push this series so that we can have enough time testing this in current cycle. Thanks!

Hi Pulkit,

Thanks for your patience. I had a couple of important things to get out of the way before I could turn my attention to this.

I reached out to Ryan and he sent me the code he wrote at the 4.6 Sprint. I made it available here to support the discussion: https://phab.mercurial-scm.org/D4479. The core differences in his approach are to filter out obsmarkers affecting the "re-pulled" nodes instead of "stripping" those markers. The main benefit is to avoid information loss. Stripping markers will break the evolution history, preventing user to benefit from it. Filtering markers out mean that the history will be fully restored if the affected changeset get obsolete again. It can be triggered by the same events.This is a pure implementation detail, having the same user-level effect than the behavior proposed in this Proof of Concept series.

Regarding the implementation of this series, I feel like the current detection code needs to move at a higher level. Having changegroup aware of obsmarkers at such a low level seems wrong and can lead to consistency issues if the bundle contains obsmarkers. An alternative approach would be to keep track of redundant incoming nodes during changegroup processing (on the transaction object as we are already doing for all new nodes), and process that list at the end of the transaction. We'll give a shot at such implementation, it is useful in other scenarios.

To take a step back, I'm wondering what's the end goal? I remember there was a discussion about having rebase enabled by default, is it related?

The behavior target by this series ("unobsolete" re-pulled changeset) conflicts with the final behavior we want for Changeset Evolution. Intermediate steps are a good way to make progress. I feel like it is important to write down a clear plan when it comes to adding behavior that does not match our final goals. How are we planning to transition from the local-only step to full (ie, distributed) Evolution?

To take a step back, I'm wondering what's the end goal? I remember there was a discussion about having rebase enabled by default, is it related?

Getting rebase (and maybe histedit?) enabled by default is my recollection of the rough goal.

The behavior target by this series ("unobsolete" re-pulled changeset) conflicts with the final behavior we want for Changeset Evolution. Intermediate steps are a good way to make progress. I feel like it is important to write down a clear plan when it comes to adding behavior that does not match our final goals. How are we planning to transition from the local-only step to full (ie, distributed) Evolution?

I'm slowly becoming convinced that the long-unquestioned axiom that "all markers are distributed globally" isn't correct, and this is part of why: it's potentially of great value to be able to restore a change by re-pulling it, even though the obsmarkers would normally cause it to be deleted. It's _super confusing_ when I hg import a patch and it seems to work but also immediately disappears, so I've got more sympathy for this PoC series than I do the theoretical purity of markers having any kind of globalness. Does that make sense?

To take a step back, I'm wondering what's the end goal? I remember there was a discussion about having rebase enabled by default, is it related?

Getting rebase (and maybe histedit?) enabled by default is my recollection of the rough goal.

The behavior target by this series ("unobsolete" re-pulled changeset) conflicts with the final behavior we want for Changeset Evolution. Intermediate steps are a good way to make progress. I feel like it is important to write down a clear plan when it comes to adding behavior that does not match our final goals. How are we planning to transition from the local-only step to full (ie, distributed) Evolution?

I'm slowly becoming convinced that the long-unquestioned axiom that "all markers are distributed globally" isn't correct, and this is part of why: it's potentially of great value to be able to restore a change by re-pulling it, even though the obsmarkers would normally cause it to be deleted.

Avoiding silent reappearance of the locally obsolete changeset that we see on a remote repository is a core feature of obsolescence. Actually, it is the one issue that prompted the creation of changeset evolution in the first place. We and the other people using distributed evolution rely on this behavior on a daily basis. The includes people who picked up the evolve extension on their own and came up with their own workflow without ever speaking to us. We can see some user interfaces adjustment in the future, but the set of synchronized markers and the associated behavior is something we are happy about. It has been well tested in diverse teams and situation for multiple years now.

It's _super confusing_ when I hg import a patch and it seems to work but also immediately disappears, so I've got more sympathy for this PoC series than I do the theoretical purity of markers having any kind of globalness. Does that make sense?

This seems like an unrelated user interface issues. We usually warn the user when their working copy becomes obsolete, pointing out to the newest version/evolution. We have to extend this messages logic to hg import to clarify the situation.

I'm slowly becoming convinced that the long-unquestioned axiom that "all markers are distributed globally" isn't correct, and this is part of why: it's potentially of great value to be able to restore a change by re-pulling it, even though the obsmarkers would normally cause it to be deleted.

Avoiding silent reappearance of the locally obsolete changeset that we see on a remote repository is a core feature of obsolescence. Actually, it is the one issue that prompted the creation of changeset evolution in the first place.

There's a difference (in my view) between something accidentally coming back and intentionally coming back. The current state of affairs prevents both. I feel like we can do better.

We and the other people using distributed evolution rely on this behavior on a daily basis. The includes people who picked up the evolve extension on their own and came up with their own workflow without ever speaking to us. We can see some user interfaces adjustment in the future, but the set of synchronized markers and the associated behavior is something we are happy about. It has been well tested in diverse teams and situation for multiple years now.

I've brought this up repeatedly and been dismissed with this exact argument every time. I weary of it, because it's not a new argument, it's just blind devotion to the initial design without any attempt at reflection if we can do better. I don't dispute that evolution as currently implemented is a good thing, but I feel like there's room for significant improvement while simultaneously simplifying the exchange algorithms. Care to try and convince me why I'm wrong, without anecdotal "many people use this and haven't complained" arguments?

It's _super confusing_ when I hg import a patch and it seems to work but also immediately disappears, so I've got more sympathy for this PoC series than I do the theoretical purity of markers having any kind of globalness. Does that make sense?

This seems like an unrelated user interface issues. We usually warn the user when their working copy becomes obsolete, pointing out to the newest version/evolution. We have to extend this messages logic to hg import to clarify the situation.

I disagree, perhaps unsurprisingly.

I'm slowly becoming convinced that the long-unquestioned axiom that "all markers are distributed globally" isn't correct, and this is part of why: it's potentially of great value to be able to restore a change by re-pulling it, even though the obsmarkers would normally cause it to be deleted.

Avoiding silent reappearance of the locally obsolete changeset that we see on a remote repository is a core feature of obsolescence. Actually, it is the one issue that prompted the creation of changeset evolution in the first place.

There's a difference (in my view) between something accidentally coming back and intentionally coming back. The current state of affairs prevents both. I feel like we can do better.

I feel like detecting if an obsolete changeset is re-added accidentally or intentionally is a near impossible problem. We have submitted code recently to detect when an obsolete changeset is re-added, but again we don't know if it is accidentally or intentionally. The safe path here is to inform the user about suspicious situations and give them the tool to explicitly restore changesets they want to be restored. That why we have been working on an explicit hg rewind command in evolve.

We and the other people using distributed evolution rely on this behavior on a daily basis. The includes people who picked up the evolve extension on their own and came up with their own workflow without ever speaking to us. We can see some user interfaces adjustment in the future, but the set of synchronized markers and the associated behavior is something we are happy about. It has been well tested in diverse teams and situation for multiple years now.

I've brought this up repeatedly and been dismissed with this exact argument every time. I weary of it, because it's not a new argument, it's just blind devotion to the initial design without any attempt at reflection if we can do better.

Since I joined Octobus, I've witnessed the exploration of all the remaining uncharted territories and solving of most of the resulting associated challenges. In particular, some important aspects of distributed evolution: instabilities resolutions and efficient obsmarkers exchanges have been solved and have working implementations used on a daily basis. We did alter the initial design when deemed so by new information from exploration or feedback of evolve users.
During these developments, we did not find reasons to challenge the obsolescence core concepts. They, in fact, proved a useful help to build solutions to the distributed evolution challenges. To name a few of these progress and design changes: vocabulary renaming, the missing commands for history exploration and reviving, the new discovery protocol and more recently how to track fold in obsmarkers.

In a similar way, the numbers and types of distributed evolution users have grown and we had more and more opportunity to exchange with them. We base our plans on more than "some people using it without complains". They are a diverse and large corpus of people we have been talking to and studied their workflows. Here again, core concepts, in particular, the global state and instabilities are things that, practically helps teams to use distributed evolution workflows and newcomers to get a good grasp of it. Important behavior designed and tested in the past 5 years, like the exchange behavior, have been battle tested and real people rely on them in their day to day workflow.

So, it might seem like the same argument, but it is actually a stronger one. The data to back it kept growing in the past couple of years.

If you need more details, check our recent blog post about evolution: https://octobus.net/blog/2018-09-03-evolve-status-2018-08.html

I don't dispute that evolution as currently implemented is a good thing, but I feel like there's room for significant improvement while simultaneously simplifying the exchange algorithms. Care to try and convince me why I'm wrong, without anecdotal "many people use this and haven't complained" arguments?

Sure, the proposal in this PoC ("unobsolete" all re-added changesets) break basic distributed evolution workflow very quickly. Here are small examples among many. They display simple draft changesets roundtrips:

  • Example A:
    • I am working on a repository with a changeset A that is present also on my default server
    • I amend A into A'
    • I pull from my default server, the server doesn't know yet that A has been rewritten into A'. A is sent by the server.
    • A is hidden before the pull. Getting A to be visible again would be confusing for me as I would now have both A and A' "alive" at the same time.
  • Example B:
    • I am working on a repository with a changeset A that is present also on my default server
    • I amend A into A'
    • One of my coworkers push a changeset B on top of A to the server
    • I pull from my default server
    • I receive both B and A, and I have all the needed information to stabilize B and continue my work. I don't want A to become "unobsolete" on pull.

We can provide other examples if needed.

Could you give some details about the improvements that you suggest and that what kind of core changes they would require and their effects on distributed evolution?

Here's a case I just ran into where I would have liked to de-obsolete a commit:

  1. Someone queues a patch and pushes to central repo. Repo looks like this:
o  B
|
o  A
  1. I fix a typo and and amend, but I forget to push.
  2. I build my own commit on top. My repo:
o  X
|
o  B'
|
o  A
  1. Someone pushes new patches to central repo. Central repo:
o  C
|
o  B
|
o  A
  1. I pull from central repo. My repo:
o  C
|
x B
|
| o  X
| |
| o  B'
|/
o  A
  1. I move my change onto the new upstream and prune my local B' that's now unwanted:
o X'
|
o  C
|
x B
|
| x B'
|/
o  A

If upstream history was really just a single commit (C above), then the obvious thing to do is to just evolve that commit and push it. However, sometimes there is a long chain on top, and in my case there was a separate commit upstream that fixed the typo I fixed when I amended B and it would be a little confusing to explain why someone's fix was "lost". So I'd prefer to mark B as no longer obsolete. I know I can do that with hg strip B', but that also strips X, which is probably not a big loss, but it's a weird side-effect. Perhaps all I'm asking for is a way of dropping the obsmarker between B and B'. I don't know what the best UI for that would be, though. For now, I guess I'll use hg strip.

Here's a case I just ran into where I would have liked to de-obsolete a commit:

  1. Someone queues a patch and pushes to central repo. Repo looks like this:

You are using the words "Someone queues a patch", is your example taken from the current Mercurial project workflow? If so, keep in mind that this workflow is a bit unique. It is not what we refer as "distributed evolution" and diverges from it in various ways. In particular, we can highlight the following difference:

  • Contributions are exchanged through email or Phabricator. Both ways fail to transfers the associated obsmarkers (and therefore break the evolution chains). Instead people using distributed evolution exchanges changesets through mercurial repositories, fully taking advantage of the obsmarkers data. This makes it possible for them to collaborate on the same feature branch and other evolution benefits (eg: evolution data in code review tools).
  • The publication workflow is odds as hundreds of unrelated draft changeset can stack above each other before being published. This is not really the use-case that draft phase and evolution targets.
o  B
|
o  A
  1. I fix a typo and and amend, but I forget to push.
  2. I build my own commit on top. My repo:
o  X
|
o  B'
|
o  A

Small side question: Why are you building X on top of (draft) B', are they related?

  1. Someone pushes new patches to central repo. Central repo:
o  C
|
o  B
|
o  A

Additional side question: Why is C on top of (draft) B. Are they related?

  1. I pull from central repo. My repo:
o  C
|
x B
|
| o  X
| |
| o  B'
|/
o  A
  1. I move my change onto the new upstream and prune my local B' that's now unwanted:
o X'
|
o  C
|
x B
|
| x B'
|/
o  A

If upstream history was really just a single commit (C above), then the obvious thing to do is to just evolve that commit and push it. However, sometimes there is a long chain on top, and in my case there was a separate commit upstream that fixed the typo I fixed when I amended B and it would be a little confusing to explain why someone's fix was "lost".

I can see two distinct arguments in that description:

  1. Someone else made a similar fix in an independent changeset, and you want to give them the priority.
  2. There are many drafts stacked above it and you don't want to rewrite B.

Regarding the first argument, the case is easily handled by "rewinding" your amend's change and restoring the diff provided by 'B'. The hg rewind command deal with this case well: hg rewind --to B. The use of hg prune in your example is wrong. By using prune, you signal that the full line of change in B' (B' and it predecessors B) is something that is unwanted. So you are not just dropping the change introduced in your amend, but the other associated change.

Regarding the second argument, the one I interpret as it would create too many orphans.
It seems to be the consequence of the current Mercurial Project workflow where many unrelated drafts get stacked.
The point of the draft phase is to signal that a changeset is safe to be rewritten. In other words that the consequences of such rewriting are acceptable. If you get to the point where rewriting draft no longer feel safe you seem to have a workflow issue.

For the sake of the current argument, let's assume that the workflow would be fixed in a direction where B would be public. After your pull. In that case, B' would be "phase-divergent". Solving that phase-divergence will create a ΔB changeset with just the fix (you amend change). Rebasing that ΔB on top of the other fix will make it disappears, giving priority as you intended.

So I'd prefer to mark B as no longer obsolete. I know I can do that with hg strip B', but that also strips X, which is probably not a big loss, but it's a weird side-effect.

You cannot really strip the obsmarkers between B and B' without stripping B' itself. If B' end up being visible again it's Evolve that would be lost introducing confusion too.

Perhaps all I'm asking for is a way of dropping the obsmarker between B and B'. I don't know what the best UI for that would be, though. For now, I guess I'll use hg strip.

Stripping obsmarkers (and changesets) gets tricky quickly as their might continue to exist elsewhere. issue5267 is a good example. There is already a couple of ways to do it (hg strip or hg debugobsolete) but it is currently confined into advanced commands. Before smoothing special case that could benefit stripping (or equivalent feature), it is important to deal with the UI for the main use-case. The use-case that solves both local and distributed evolution with the same command set. We want this commands set to comes first as we know it will be necessary. We can add more advanced commands and concept to deal with stripping afterward.

Let me clarify that last part: imagine two commands (or command set, or concepts):

  1. Command-A handles cases for both local and distributed evolution,
  2. Command-B only handles local evolution, but deal with a corner case a bit later.

If we introduce Command-A early, we can foster a good user experience that will be the same for local and distributed case. This makes the "step" to distributed usage simple and painless. The overall user experience and other commands will make sure they work smoothly with Command-A. Later, we can introduce Command-B that will deal with specialized cases were Command-A is lacking. Since Command-A is already covering the main use-cases well, Command-B can focus on dealing with the specialized cases only. Each command has clear responsibilities.

On the other hand, introducing Command-B first will create a user experience that does not cover all of the main use-cases. The later arrival of Command-A will introduce an overlap in responsibilities confusing to users. In addition, it will create a problematic step to move from the local experience to the distributed one. Users will have to "forget" their previous usage to be able to get the better of Mercurial, this will create more confusion.