( )⚙ D5495 revset: add "branch" positional arguments to the merge revset

This is an archive of the discontinued Mercurial Phabricator instance.

revset: add "branch" positional arguments to the merge revset
Needs RevisionPublic

Authored by angel.ezquerra on Jan 6 2019, 2:03 PM.

Details

Reviewers
baymax
Group Reviewers
hg-reviewers
Summary

Make it possible to only include those merge revisions that are merges with one
or more specific branches (passed as a positional argument that can receive
either a single branch name or a regular expression). All merge revisions are
shown by default (i.e. if no "merge with" branch or expression is specified).

Diff Detail

Repository
rHG Mercurial
Lint
Lint Skipped
Unit
Unit Tests Skipped

Event Timeline

angel.ezquerra created this revision.Jan 6 2019, 2:03 PM
yuja added a subscriber: yuja.Jan 7 2019, 9:17 AM

+@predicate('merge(*withbranch)', safe=True)
def merge(repo, subset, x):

  • """Changeset is a merge changeset.

+ """Changeset is a merge changeset
+
+ All merge revisions are returned by default. If one or more "withbranch"
+ names are provided only merges with those branches (i.e. whose
+ second parent belongs to one of those branches) will be returned.

I understand this will be useful in a certain branch strategy, but the
proposed syntax is hardly extensible. Maybe it can be a non-wildcard argument
of stringmatcher type so we can at least add another option later.

Any thoughts? Do anyone love this feature?

If we had a syntax to filter revisions by sub expression, this and the
samebranch option could be expressed as follows:

merge() & filter($a, p2($a) & branch("..."))
merge() & filter($a, samebranch(parents($a)))

where filter(argname, boolean-expr)

This is much more expressive (or verbose) and can support other types of
branches, but is hard to implement.

pulkit added a subscriber: pulkit.Jan 8 2019, 7:05 AM
In D5495#81562, @yuja wrote:

+@predicate('merge(*withbranch)', safe=True)
def merge(repo, subset, x):

  • """Changeset is a merge changeset.

+ """Changeset is a merge changeset
+
+ All merge revisions are returned by default. If one or more "withbranch"
+ names are provided only merges with those branches (i.e. whose
+ second parent belongs to one of those branches) will be returned.

I understand this will be useful in a certain branch strategy, but the
proposed syntax is hardly extensible. Maybe it can be a non-wildcard argument
of stringmatcher type so we can at least add another option later.
Any thoughts? Do anyone love this feature?

I like this feature as it will be quite useful for us. Also I like your filter() suggestion more than the syntax this series introduces.

In D5495#81562, @yuja wrote:

+@predicate('merge(*withbranch)', safe=True)
def merge(repo, subset, x):

  • """Changeset is a merge changeset.

+ """Changeset is a merge changeset
+
+ All merge revisions are returned by default. If one or more "withbranch"
+ names are provided only merges with those branches (i.e. whose
+ second parent belongs to one of those branches) will be returned.

I understand this will be useful in a certain branch strategy, but the
proposed syntax is hardly extensible. Maybe it can be a non-wildcard argument
of stringmatcher type so we can at least add another option later.
Any thoughts? Do anyone love this feature?
If we had a syntax to filter revisions by sub expression, this and the
samebranch option could be expressed as follows:

merge() & filter($a, p2($a) & branch("..."))
merge() & filter($a, samebranch(parents($a)))
where filter(argname, boolean-expr)

This is much more expressive (or verbose) and can support other types of
branches, but is hard to implement.

Thank you for the review, @yuja

I think it would be a good idea to make the "branch" arguments more flexible. One option could be to use a stringmatcher to add support for regular expressions as you suggest. I can look into that. However there may be some other options worth exploring. The one you suggest is very interesting although I find the syntax a bit complicated for the common use cases that I want to enable which are:

  1. Ignore merges from the same branch, which in a named-branch based branching strategy are usually irrelevant
  2. Look into merges with a specific branch (e.g. which branches have been merged with the default branch)?

In my experience those two are the ones that are the most common and I think we should try to make the easy to use. That is, I think that even if mercurial had a filter function like the one you propose I would still want to be able to express those 2 common merge properties in a simple way.

That being said, I really like your idea since I often find myself being unable to express what I want with a revset (as powerful as those are) because of the lack of a filtering mechanism. Adding a generic filter function would be very useful indeed. I'm not sure if the syntax you propose would work as is though. It seems that it would need a new "&" operator? In any case I believe that it is out of the scope of this particular set of patches. Do you agree? If so I can focus on improving this patch by adding the stringmatcher as you suggest (as it seems I'm not the only one who thinks this would be useful). Is that ok?

yuja added a comment.Jan 10 2019, 8:16 AM
I think it would be a good idea to make the "branch" arguments more flexible. One option could be to use a stringmatcher to add support for regular expressions as you suggest. I can look into that. However there may be some other options worth exploring. The one you suggest is very interesting although I find the syntax a bit complicated for the common use cases that I want to enable which are:
1. Ignore merges from the same branch, which in a named-branch based branching strategy are usually irrelevant
2. Look into merges with a specific branch (e.g. which branches have been merged with the default branch)?
In my experience those two are the ones that are the most common and I think we should try to make the easy to use. That is, I think that even if mercurial had a filter function like the one you propose I would still want to be able to express those 2 common merge properties in a simple way.

Yep, I agree with that.

That being said, I really like your idea since I often find myself being unable to express what I want with a revset (as powerful as those are) because of the lack of a filtering mechanism. Adding a generic filter function would be very useful indeed. I'm not sure if the syntax you propose would work as is though. It seems that it would need a new "&" operator? In any case I believe that it is out of the scope of this particular set of patches. Do you agree?

Yes. Actually I have a PoC-level implementation of generic filtering predicate,
which can be reviewed separately.

If so I can focus on improving this patch by adding the stringmatcher as you suggest (as it seems I'm not the only one who thinks this would be useful). Is that ok?

Sounds good to me. To be clear, I want 'withbranch' instead of
'*withbranch', because the withbranch option doesn't look like a first-class
parameter of the merge() predicate.

In D5495#82036, @yuja wrote:
I think it would be a good idea to make the "branch" arguments more flexible. One option could be to use a stringmatcher to add support for regular expressions as you suggest. I can look into that. However there may be some other options worth exploring. The one you suggest is very interesting although I find the syntax a bit complicated for the common use cases that I want to enable which are:
1. Ignore merges from the same branch, which in a named-branch based branching strategy are usually irrelevant
2. Look into merges with a specific branch (e.g. which branches have been merged with the default branch)?
In my experience those two are the ones that are the most common and I think we should try to make the easy to use. That is, I think that even if mercurial had a filter function like the one you propose I would still want to be able to express those 2 common merge properties in a simple way.

Yep, I agree with that.

That being said, I really like your idea since I often find myself being unable to express what I want with a revset (as powerful as those are) because of the lack of a filtering mechanism. Adding a generic filter function would be very useful indeed. I'm not sure if the syntax you propose would work as is though. It seems that it would need a new "&" operator? In any case I believe that it is out of the scope of this particular set of patches. Do you agree?

Yes. Actually I have a PoC-level implementation of generic filtering predicate,
which can be reviewed separately.

If so I can focus on improving this patch by adding the stringmatcher as you suggest (as it seems I'm not the only one who thinks this would be useful). Is that ok?

Sounds good to me. To be clear, I want 'withbranch' instead of
'*withbranch', because the withbranch option doesn't look like a first-class
parameter of the merge() predicate.

This would not make it possible to select multiple "merged with" branches by doing: hg log -r "merge(feature1, feature2)"
Instead I guess you are proposing that for that use case we force the user to do: hg log -r "merge('re:(feature1|feature2)')

Did I understand you correctly?

yuja added a comment.EditedJan 11 2019, 9:23 PM
This would not make it possible to select multiple "merged with" branches by doing: hg log -r "merge(feature1, feature2)"
Instead I guess you are proposing that for that use case we force the user to do: hg log -r "merge('re:(feature1|feature2)')
Did I understand you correctly?

Yes. And we can introduce other prefixes of string matcher if re: seemed
unnecessarily complex.

EDIT: or introduce merge(withbranch=(foo, bar)) syntax. This works, but is really new
syntax so I don't want to go this way at this point.

angel.ezquerra edited the summary of this revision. (Show Details)Jan 13 2019, 5:45 PM
angel.ezquerra updated this revision to Diff 13204.
yuja added a comment.Jan 13 2019, 10:11 PM

Generally looks good.

Can you fix a couple of nits? And if possible, fold the tests from D5577
into this and the next patch. We prefer including relevant test in each
commit.

-@predicate('merge()', safe=True)
+@predicate('merge(withbranch)', safe=True)

merge([withbranch]) as it is an optional parameter.

def merge(repo, subset, x):

  • """Changeset is a merge changeset.

+ """Changeset is a merge changeset
+
+ All merge revisions are returned by default. If a "withbranch"
+ pattern is provided only merges with (i.e. whose second parent
+ belongs to) those branches that match the pattern will be returned.
+ The simplest pattern is the name of a single branch. It is also
+ possible to specify a regular expression by starting the pattern
+ with "re:". This can be used to match more than one branch
+ (e.g. "re:branch1|branch2").

"""
# i18n: "merge" is a keyword
  • getargs(x, 0, 0, _("merge takes no arguments"))

+ args = getargsdict(x, 'merge', 'withbranch')
+ withbranch = ''
+ if 'withbranch' in args:
+ withbranch = getstring(args['withbranch'],
+ _('withbranch argument must be a string'))
+ kind, branchname, branchmatcher = stringutil.stringmatcher(withbranch)

Can you merge this with the next if withbranch: block to reduce the number
of conditionally defined variables referenced later?

cl = repo.changelog
  • return subset.filter(lambda r: cl.parentrevs(r)[1] != -1,
  • condrepr='<merge>')

+ # create the function that will be used to filter the subset
+ if withbranch:

+ # matchfn is a function that returns true when a revision
+ # is a merge and the second parent belongs to a branch that
+ # matches the withbranch pattern (which can be a literal or a regex)

Nit: these comments seem a bit verbose. It's documented in stringmatcher().

+ if kind == 'literal':
+ matchfn = lambda r: (cl.parentrevs(r)[1] != -1
+ and repo[r].p2().branch() == withbranch)
+ else:
+ matchfn = lambda r: (cl.parentrevs(r)[1] != -1
+ and branchmatcher(repo[r].p2().branch()))

If we don't have anything special about the literal kind, we can always use
the branchmatcher(). if kind == 'literal' isn't needed.

What about making the argument a revset instead of a branch name. You can get the same result merge(branch("foo") but have a more expressive result merge(only(4.8, 4.7)) ?

(Sorry to be a bit late to the party)

yuja added a comment.Jan 14 2019, 8:30 AM
What about making the argument a revset instead of a branch name. You can get the same result `merge(branch("foo")` but have a more expressive result `merge(only(4.8, 4.7))` ?

That's basically a simpler form of my filter() proposal.

The problem of merge(branch("foo")) is that it's ambiguous which revision
the expression will be tested against. It could be expressed as
merge(p2=branch("foo")) to disambiguate, but this syntax isn't generic
enough to express the samebranch=True constraint. So if we want a truly
expressive syntax, we'll need something like a lambda function.

filtereach(merge(), p2(_) & branch("foo"))
filtereach(merge(), samebranch(parents(_))

I agreed with Angel in that merge(withbranch=name) would be useful enough
to have a dedicated syntax, but I'm open to other ideas like
merge(p1=expr, p2=expr).

In D5495#82397, @yuja wrote:

Generally looks good.
Can you fix a couple of nits? And if possible, fold the tests from D5577
into this and the next patch. We prefer including relevant test in each
commit.

-@predicate('merge()', safe=True)
+@predicate('merge(withbranch)', safe=True)

merge([withbranch]) as it is an optional parameter.

def merge(repo, subset, x):

  • """Changeset is a merge changeset.

+ """Changeset is a merge changeset
+
+ All merge revisions are returned by default. If a "withbranch"
+ pattern is provided only merges with (i.e. whose second parent
+ belongs to) those branches that match the pattern will be returned.
+ The simplest pattern is the name of a single branch. It is also
+ possible to specify a regular expression by starting the pattern
+ with "re:". This can be used to match more than one branch
+ (e.g. "re:branch1|branch2").

"""
# i18n: "merge" is a keyword
  • getargs(x, 0, 0, _("merge takes no arguments"))

+ args = getargsdict(x, 'merge', 'withbranch')
+ withbranch = ''
+ if 'withbranch' in args:
+ withbranch = getstring(args['withbranch'],
+ _('withbranch argument must be a string'))
+ kind, branchname, branchmatcher = stringutil.stringmatcher(withbranch)

Can you merge this with the next if withbranch: block to reduce the number
of conditionally defined variables referenced later?

cl = repo.changelog
  • return subset.filter(lambda r: cl.parentrevs(r)[1] != -1,
  • condrepr='<merge>')

+ # create the function that will be used to filter the subset
+ if withbranch:
+ # matchfn is a function that returns true when a revision
+ # is a merge and the second parent belongs to a branch that
+ # matches the withbranch pattern (which can be a literal or a regex)

Nit: these comments seem a bit verbose. It's documented in stringmatcher().

+ if kind == 'literal':
+ matchfn = lambda r: (cl.parentrevs(r)[1] != -1
+ and repo[r].p2().branch() == withbranch)
+ else:
+ matchfn = lambda r: (cl.parentrevs(r)[1] != -1
+ and branchmatcher(repo[r].p2().branch()))

If we don't have anything special about the literal kind, we can always use
the branchmatcher(). if kind == 'literal' isn't needed.

In D5495#82407, @yuja wrote:
What about making the argument a revset instead of a branch name. You can get the same result `merge(branch("foo")` but have a more expressive result `merge(only(4.8, 4.7))` ?

That's basically a simpler form of my filter() proposal.
The problem of merge(branch("foo")) is that it's ambiguous which revision
the expression will be tested against. It could be expressed as
merge(p2=branch("foo")) to disambiguate, but this syntax isn't generic
enough to express the samebranch=True constraint. So if we want a truly
expressive syntax, we'll need something like a lambda function.

filtereach(merge(), p2(_) & branch("foo"))
filtereach(merge(), samebranch(parents(_))

I agreed with Angel in that merge(withbranch=name) would be useful enough
to have a dedicated syntax, but I'm open to other ideas like
merge(p1=expr, p2=expr).

@lothiraldan , your proposal is interesting but as @yuja said it is less generic that introducing a full blown filter function. Also, to be really flexible it would require some way to refer to the p2 revision (i.e. you'd need to introduce an implicit or explicit reference to each of the p1 or p2 revisions). I think this would make the syntax for the most common use cases that I want to cover too complex. That being said, the proposed syntax leaves open the door for introducing such revset based filtering in the future if needed (although I think that a generic filter function would be more useful).

@yuja , I've sent an updated set of patches, following your recommendations. There are 2 patches now, since each includes its own tests. This means that the 3rd patch on the original patch set is no longer needed. However I don't know what is the best way to tell that to phabricator...

I've sent an updated set of patches, following your recommendations. There are 2 patches now, since each includes its own tests. This means that the 3rd patch on the original patch set is no longer needed. However I don't know what is the best way to tell that to phabricator...

I think you should be able to go to that 3rd patch, and set it to abandoned in the actions at the bottom.

Thank you Matt. I just did that.

Cheers,

Angel

baymax requested changes to this revision.Jan 24 2020, 12:32 PM

There seems to have been no activities on this Diff for the past 3 Months.

By policy, we are automatically moving it out of the need-review state.

Please, move it back to need-review without hesitation if this diff should still be discussed.

:baymax:need-review-idle:

This revision now requires changes to proceed.Jan 24 2020, 12:32 PM