( )⚙ D2096 infinitepush: move the extension to core from fb-hgext

This is an archive of the discontinued Mercurial Phabricator instance.

infinitepush: move the extension to core from fb-hgext
ClosedPublic

Authored by pulkit on Feb 9 2018, 6:45 AM.

Details

Summary

This patch moves the infinitepush extension from fb-hgext to core. The
extension is used to store incoming bundles during a push in bundlestore rather
than applying them to the revlog.

The extension was copied from the repository revision at
f27f094e91553d3cae5167c0b1c42ae940f888d5 and following changes were made:

  • added from __future__ import absolute_import where missing
  • fixed module imports to follow the core style
  • minor fixes for test-check-code.t
  • registered the configs
  • adding the testedwith value to match core's convention
  • removed double newlines to make test-check-commit.t happy
  • added one line doc about extension and marked it as experimental

Only one test file test-infinitepush-bundlestore.t is moved to core and
following changes are made to file:

  • remove dependency of library.sh
  • split the tests into two tests i.e. test-infinitepush.t and test-infinitepush-bundlestore.t
  • removed testing related to other facebook's extensions pushrebase, inhibit, fbamend

library-infinitepush.sh is also copied from fb-hgext from the same revision and
following changes are made:

  • change the path to infinitepush extension as it's in core with this patch
  • removed sql handling from the file as we are not testing that initially

Currently at this revision, test-check-module-imports.t does not pass as there
is import of a module from fb/hgext in one the of the file which will be removed
in the next patch.

This extension right now has a lot of things which we don't require in core like
--to, --create flags to hg bookmark, logic related to remotenames
extension and another facebook's extensions, custom bundle2parts which can be
prevented by using bookmarks bundle part and also logic related to sql store
which is probably we don't want initially.

The next patches in this series will remove all the unwanted and unrequired
things from the extension and will make this a nice one.

The end goal is to have a very lighweight extension with no or very less
wrapping on the client side.

Diff Detail

Repository
rHG Mercurial
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

pulkit created this revision.Feb 9 2018, 6:45 AM

I can take review of this series since this feature is of extreme interest to Mozilla. I might get around to looking at it this weekend. But no promises.

I can take review of this series since this feature is of extreme interest to Mozilla. I might get around to looking at it this weekend. But no promises.

Do you have thoughts on which functionality from the following you want in core?

  • the --bundle-store flag to push command
  • functionality to pull from bundlestore using hg pull
  • functionality to pull changesets from bundlestore if a changeset is not found locally on hg update
  • logic around sql store
  • interaction with the hoisting functionality of remotenames extension which is also being moved to core
In D2096#35153, @pulkit wrote:

I can take review of this series since this feature is of extreme interest to Mozilla. I might get around to looking at it this weekend. But no promises.

Do you have thoughts on which functionality from the following you want in core?

  • the --bundle-store flag to push command
  • functionality to pull from bundlestore using hg pull
  • functionality to pull changesets from bundlestore if a changeset is not found locally on hg update
  • logic around sql store
  • interaction with the hoisting functionality of remotenames extension which is also being moved to core

I haven't looked at the full code in detail yet. But I think we should take a step back and think about what are the fundamental features/workflows we're trying to accomplish with infinitepush.

I see the following high-level potential use cases for infinitepush:

  1. A hosting/repo management alternative to forks
  2. A central repository for submitting changesets to a service (e.g. code review, a "try" repo to trigger CI, etc)
  3. As a backup mechanism

Let's elaborate.

As I wrote under the Forks aren't the Model You are Looking For section at https://gregoryszorc.com/blog/2017/12/11/high-level-problems-with-git-and-how-to-fix-them/, using clones / separate repos to manage forks really isn't a great model. I think you want something far less heavyweight where derivations from the main repo are hung off that main repo somehow. I think you want a repo-like primitive - let's call it a workspace - that is attached to the main repo. Conceptually, imagine that every push goes to a central store. But instead of exposing every changeset with a traditional repository, we instead have different views of that data. A view in Mercurial technical terms would be defined by a set of heads, bookmarks, phases, obsolescence markers, etc. Although the rules for what data is in that set isn't clear: I think it is valid to have a view that is independent of the main repo as well as a view that is additive to the main repo (that would allow your fork to automatically track changes to the main repo).

I think the Mercurial server should grow the ability to host workspaces instead of forks. What this looks like and what role infinitepush plays, I'm not sure. But I do know infinitepush dabbles in this space, so we need to consider it.

Another use case is a central repository to push/upload to in order to trigger events. For example, you may want to have a central server that receives pushes to trigger code review, trigger a test run of CI, run static analysis, etc. This is different than the workspaces use case because the data being pushed is more ephemeral and ad-hoc. There is a difference between pushing something to your workspace so you can save and share it versus pushing something to a service to trigger an action on it. To me, this use case sounds best served by server functionality that stashes standalone bundles somewhere and provides access to individual revisions on demand.

Our third use case is backups. If a changeset is produced, it gets uploaded/saved somewhere. This is actually a hybrid between the above 2 items. It initially looks a lot like the central repository storing bundles that can accessed on demand. But if you think of each client using the backup service as an individual entity, then modeling each client as having a workspace starts to make sense as well. What if each client's state were automatically backed up / kept in sync with a dedicated workspace on the server?

I think we need to decide which of these use cases we're targeting and what is the pathway (if any) for achieving the remaining use cases in the future. I suspect there is potential to integrate remotenames into our vision here. We definitely need to think about how we'd address these offshoot stores. Infinitepush currently uses bookmarks. I'm not sure if we want to be forcing people to use bookmarks or if bookmarks are the best way to route pushes/pulls to specific areas of the repository.

My hunch is that teaching the server to accept incoming pushes and store them in standalone bundles (instead of as part of the main repo store) is a much easier problem to solve than providing workspaces. I don't think it requires nearly as many changes and will introduce few - if any - user-facing changes. But the feature isn't as valuable as persisted workspaces would be.

@pulkit: I know you have been doing a lot of work with remotenames. I'd be very curious about your thoughts on how you'd integrate workspaces, infinitepush, and remotenames. You've also looked at this infinitepush code. So I'm curious about your general thoughts on what you think we should include in core and what our plan should be for adding potentially missing features that I've outlined above.

I see the following high-level potential use cases for infinitepush:

  1. A hosting/repo management alternative to forks
  2. A central repository for submitting changesets to a service (e.g. code review, a "try" repo to trigger CI, etc)
  3. As a backup mechanism

I think infinitepush can serve as a very good base for solving these problems.

Let's elaborate.
As I wrote under the Forks aren't the Model You are Looking For section at https://gregoryszorc.com/blog/2017/12/11/high-level-problems-with-git-and-how-to-fix-them/, using clones / separate repos to manage forks really isn't a great model. I think you want something far less heavyweight where derivations from the main repo are hung off that main repo somehow. I think you want a repo-like primitive - let's call it a workspace - that is attached to the main repo. Conceptually, imagine that every push goes to a central store. But instead of exposing every changeset with a traditional repository, we instead have different views of that data. A view in Mercurial technical terms would be defined by a set of heads, bookmarks, phases, obsolescence markers, etc. Although the rules for what data is in that set isn't clear: I think it is valid to have a view that is independent of the main repo as well as a view that is additive to the main repo (that would allow your fork to automatically track changes to the main repo).
I think the Mercurial server should grow the ability to host workspaces instead of forks. What this looks like and what role infinitepush plays, I'm not sure. But I do know infinitepush dabbles in this space, so we need to consider it.

That sounds like a very great idea. If we introduce user namespace in infinitpush bundlestore, we can have per user store, where bundles of that user are stored. Then we can introduce a new namespace like remote workspace which can be used to push to that workspace. For example:

pulkit$ hg push      # should push to my workspace
pulkit$ hg pull        # should pull from my workspace
pulkit$ hg pull --workspace indygreg      # should pull form your workspace
pulkit$ hg push --workspace indygreg    # should raise an error initially

This will make all the pushes default to bundlestore. Since this thing sounds like a big BC, we can have all this as a part of extension.

Another use case is a central repository to push/upload to in order to trigger events. For example, you may want to have a central server that receives pushes to trigger code review, trigger a test run of CI, run static analysis, etc. This is different than the workspaces use case because the data being pushed is more ephemeral and ad-hoc. There is a difference between pushing something to your workspace so you can save and share it versus pushing something to a service to trigger an action on it. To me, this use case sounds best served by server functionality that stashes standalone bundles somewhere and provides access to individual revisions on demand.

I think this can be done using a server which stores all the incoming pushes in the bundlestore unless specified otherwise. The current infinitepush extension can handle that well.

Our third use case is backups. If a changeset is produced, it gets uploaded/saved somewhere. This is actually a hybrid between the above 2 items. It initially looks a lot like the central repository storing bundles that can accessed on demand. But if you think of each client using the backup service as an individual entity, then modeling each client as having a workspace starts to make sense as well. What if each client's state were automatically backed up / kept in sync with a dedicated workspace on the server?

I deleted the backup commands, once the initial series get in, will add that back with appending 'debug' in front of command names.

I think we need to decide which of these use cases we're targeting and what is the pathway (if any) for achieving the remaining use cases in the future. I suspect there is potential to integrate remotenames into our vision here. We definitely need to think about how we'd address these offshoot stores. Infinitepush currently uses bookmarks. I'm not sure if we want to be forcing people to use bookmarks or if bookmarks are the best way to route pushes/pulls to specific areas of the repository.
@pulkit: I know you have been doing a lot of work with remotenames. I'd be very curious about your thoughts on how you'd integrate workspaces, infinitepush, and remotenames. You've also looked at this infinitepush code. So I'm curious about your general thoughts on what you think we should include in core and what our plan should be for adding potentially missing features that I've outlined above.

We need to have a well designed way on how we can differentiate between an infinitepush and normal push. I am not fan of current pattern matching of bookmark which is pushed. We need to have a generic way independent of bookmark, remotenames etc. The --bundle-store flag looks good candidate for that.

I think the plan should be something like:

  • add capability to have a server which forwards every push to bundlestore and does not require any client side extension.
  • define a good way on how to differentiate between a normal push and infinitepush on server without client side extension
  • functionality to mark an outgoing push as infinitepush on the client side -> this is currently done by the extension but there is scope of improvements
  • add a user namespace in the bundlestore

These steps will serve as good base for the workspace ideas.

I'll try to look at this tomorrow. (I was on a mini vacation last week and didn't spend much time looking at a computer.)

indygreg requested changes to this revision.Mar 6 2018, 8:08 PM

Per discussion at sprint, Pulkit will follow up by deleting yet more code around workspaces and scratch branches. We're going to focus on the "try server" use case for the initial landing.

We will land the full code initially. So if we want to land what we have now and delete code later (rather than waiting for all the commits before landing anything), I'm OK with that.

This revision now requires changes to proceed.Mar 6 2018, 8:08 PM
pulkit requested review of this revision.Mar 27 2018, 8:42 AM

Per discussion at sprint, Pulkit will follow up by deleting yet more code around workspaces and scratch branches. We're going to focus on the "try server" use case for the initial landing.
We will land the full code initially. So if we want to land what we have now and delete code later (rather than waiting for all the commits before landing anything), I'm OK with that.

The last commit of this 20 patch series get us to the state where we can easily use it for "try server" use case with no client side wrapping required.

indygreg added inline comments.Mar 30 2018, 3:01 PM
hgext/infinitepush/__init__.py
259

This will need updated to hoistedpeer.

indygreg added inline comments.Mar 30 2018, 3:03 PM
hgext/infinitepush/__init__.py
137

This deleted module isn't dropped later in the series.

267

Default value can be dropped.

277

Default value can be dropped.

indygreg accepted this revision.Mar 30 2018, 3:46 PM

OK. I applied this series locally and looked at the final code.

There are some minor issues with the current series. But I can fix those in flight.

I've also captured a number of other issues in the review of this commit. IMO they can be addressed as follow-ups.

A larger issue that I think may be worth pursuing is dropping the code to support pre-bundle2. I think it is reasonable for a server operator to say that a repo using this feature requires bundle2. Because attempting to handle e.g. bookmark or phases updates pre-bundle2 is a recipe for disaster since these updates are performed on a separate wire protocol command, after changegroup data is exchanged. Clients pushing without bundle2 may see errors on pushkey operations because nodes aren't readily available. And, having a server update bookmarks, phases, etc in multiple bundles isn't reasonable.

I would make it so enabling this extension also implies server.bundle1=false.

hgext/infinitepush/README
2

I think we can delete this file since it is redundant with info in the docstring of the extension.

hgext/infinitepush/__init__.py
1410–1429

This code can be deleted since hg debugfilleinfinitepushmetadata was deleted.

hgext/infinitepush/indexapi.py
11

We'll want to port this to zope.interface.

hgext/infinitepush/sqlindexapi.py
16

I think we should rename this module since it is MySQL specific.

Various database packages do conform to a similar API. So we could potentially make the SQL layer implementation agnostic by coding to an interface.

65

force_ipv6=True. Kudos to Facebook for getting away with that in their network. But this won't fly in the real world.

85–86

And here we have some MySQL specific functionality :/

This revision is now accepted and ready to land.Mar 30 2018, 3:46 PM
This revision was automatically updated to reflect the committed changes.