Page MenuHomePhabricator

revset: add new contiguous(x) function for "x::x"
AbandonedPublic

Authored by martinvonz on Mar 15 2019, 1:27 AM.

Details

Reviewers
None
Group Reviewers
hg-reviewers
Summary

"x::x" is a useful trick for making a range contiguous, but it gets
annoying if "x" is a long expression. Let's provide a simple function
that helps with that. It also makes it the trick more discoverable.

Diff Detail

Repository
rHG Mercurial
Lint
Lint Skipped
Unit
Unit Tests Skipped

Event Timeline

martinvonz created this revision.Mar 15 2019, 1:27 AM

I am a fan of this function, I need this on a regular basis. Having an
explicit function for this also open the way to various optimization.
For example we know that a set already has this property we could skip
all computation.

I am ambivalent about the naming however. It feels a bit odd. There are
case where it could be misleading.

Lets look at the following case:

c e
| |
b d
|/
a

the revset (b+c+d+e)::(b+c+d+e) returns the same b+c+d+e, however
the set is not "contiguous" as b+c and d+e as not connected.

This feels a bit more like a "closure" operation to me.

Cheers,

I am a fan of this function, I need this on a regular basis. Having an
explicit function for this also open the way to various optimization.
For example we know that a set already has this property we could skip
all computation.
I am ambivalent about the naming however. It feels a bit odd. There are
case where it could be misleading.
Lets look at the following case:

c e
| |
b d
|/
a

the revset (b+c+d+e)::(b+c+d+e) returns the same b+c+d+e, however
the set is not "contiguous" as b+c and d+e as not connected.

Right, that's what I tried to express with "without adding new common ancestors or common descendants" in the documentation.

This feels a bit more like a "closure" operation to me.

That's what I suggested on IRC because the operation somehow made me think of a closure, but when I looked up what a closure is, it seems like ancestors() is the actual closure function (if we consider the direction to point from child to parents as we normally do).

I chatted a bit with Georges about this. He suggested something along
the line of fill or complete.

I chatted a bit with Georges about this. He suggested something along
the line of fill or complete.

fill also came up in IRC yesterday. @spectral found it slightly confusing that it there's a template function with the same name. I don't think we talked about complete. Other examples were connect, fillgaps, gapfill. Oh, and apparently nbjoerg also suggested closure (before I did) :) Funny how three of us came up with the same name. I still like contiguous the best, but let's see if we got other suggestions or votes for existing suggestions.

I thought of "closure" as well, but I fear it has too many possible meanings, "transitive closure" being one of them in that context (certainly related, but not the same thing), and of course the closures in functional programming.

I think "hull" could be appropriate, if not too pedantic.

I couldn't find out quickly if people dealing with partial ordered sets theory (another way to think of DAGs) actually use "hull", but here's the analogy:
for the convex hull, you add [a, b] (line segment) to the set whenever a and b belong to it, for this "poset hull" you do the same with a::b (which in poset theory would be called the interval [a, b])

This S::S operation seems to be a special case of Closure Operators in the sense of https://en.wikipedia.org/wiki/Closure_operator, which tells us that "hull" can indeed be used as alternative terminology in some cases where "closure" can be confusing (they quote topology). In that same article, there's inded yet another meaning in the context of partially ordered sets, as a generalisation of the very first definition (generalising the power set to any partially ordered set).

/taking rusty mathematical hat off now

I thought of "closure" as well, but I fear it has too many possible meanings, "transitive closure" being one of them in that context (certainly related, but not the same thing), and of course the closures in functional programming.
I think "hull" could be appropriate, if not too pedantic.
I couldn't find out quickly if people dealing with partial ordered sets theory (another way to think of DAGs) actually use "hull", but here's the analogy:
for the convex hull, you add [a, b] (line segment) to the set whenever a and b belong to it, for this "poset hull" you do the same with a::b (which in poset theory would be called the interval [a, b])

Wouldn't the hull pretty much be heads(contiguous(x)) or roots(contiguous(x))? Regardless, most users are not math nerds. I still think contiguous() is pretty clear. Sure, it won't join separate branches by adding ancestors or descendants, but I'm not sure there is a name that conveys that meaning too while not being too academic. @av6 also prefers contiguous, so we have two votes for that. I'm not sure how many votes for closure we have.

If we still want to find a better name, maybe it helps to remember that the function returns nodes that are *both* ancestors and descendants of some nodes in the input. Maybe something family-related (like "ancestors" and "descendants" are)? But I can't think of any term like that.

This S::S operation seems to be a special case of Closure Operators in the sense of https://en.wikipedia.org/wiki/Closure_operator, which tells us that "hull" can indeed be used as alternative terminology in some cases where "closure" can be confusing (they quote topology). In that same article, there's inded yet another meaning in the context of partially ordered sets, as a generalisation of the very first definition (generalising the power set to any partially ordered set).
/taking rusty mathematical hat off now

yuja added a subscriber: yuja.Mar 15 2019, 10:43 PM

I think "contiguous" is good as it stands for the main use case. "closure"
seems confusing unless we have stronger math background than computer science.
The other candidates would require more knowledge about the theory.

In D6140#89474, @yuja wrote:

I think "contiguous" is good as it stands for the main use case. "closure"
seems confusing unless we have stronger math background than computer science.
The other candidates would require more knowledge about the theory.

I'd say we've reached a decision then, but I'm obviously biased.

I spend some more time thinking about it, especially about the semantic
I would expect from a contiguous function. My conclusion is that I'm
seeing something call contiguous more as a filter than something
adding element to the set.

The first idea that comes to mind is to filter elements in a set that
are "contiguous to" another. contiguous(X, Y) would return element
from Y which are connected to an element in X (through Y). Another
possible behavior would be for contiguous(X) to return X if the set
is contigous and nothing otherwise. (Of course I'm not suggesting we
implement theses now, I am just trying to get a sense of what the name
could mean and how user could get confused.

What Martin is trying to achieve here is a simple function to express
X::X. So maybe it could be a special case of a function expression X::Y.
However, we don't have this function right now. Maybe we jut need to
introduce it, with a clear name eg revbetween or connected or
dagrange, with an optionnal second arguments:

  • X::Y → dagrange(X, Y)
  • X::X → dagrange(X)

Finding the name for this function will be "simpler" because the two
arguments value would be clearer. In the case we are discussing right
now, `contiguous(X, Y) seems less clear to me than connect(X, Y) or
dagrange(X, Y).

Conclusion, I am not very enthousiastic for contiguous. If we stay on
a same API (single argument) I think  connect or fillgap are better
fits. However I feel like introducing an explicit function for :: is
probably the best way to have a clear API.

In all cases, "contiguous" is not awful, so even if I don't like it
much, I'll survive if there is a consensus behind it.

I spend some more time thinking about it, especially about the semantic
I would expect from a contiguous function. My conclusion is that I'm
seeing something call contiguous more as a filter than something
adding element to the set.

I can see the argument that most functions are nouns (which is not exactly what you're saying, but your comment reminded me of it). I thought of interior before, but didn't mention it because it sounds too much like it's exclusive of the input set. Other than that, I kind of like that name.

The first idea that comes to mind is to filter elements in a set that
are "contiguous to" another. contiguous(X, Y) would return element
from Y which are connected to an element in X (through Y). Another
possible behavior would be for contiguous(X) to return X if the set
is contigous and nothing otherwise. (Of course I'm not suggesting we
implement theses now, I am just trying to get a sense of what the name
could mean and how user could get confused.
What Martin is trying to achieve here is a simple function to express
X::X. So maybe it could be a special case of a function expression X::Y.
However, we don't have this function right now. Maybe we jut need to
introduce it, with a clear name eg revbetween or connected or
dagrange, with an optionnal second arguments:

  • X::Y → dagrange(X, Y)
  • X::X → dagrange(X)

And ::X and X::? Maybe dagrange(heads=X) and dagrange(roots=X), respectively? That implies that roots and heads default to null and heads(), but it seems weird that dagrange(X) is equivalent to dagrange(roots=X, heads=X).

Finding the name for this function will be "simpler" because the two
arguments value would be clearer. In the case we are discussing right
now, `contiguous(X, Y) seems less clear to me than connect(X, Y) or
dagrange(X, Y).
Conclusion, I am not very enthousiastic for contiguous. If we stay on
a same API (single argument) I think  connect or fillgap are better
fits. However I feel like introducing an explicit function for :: is
probably the best way to have a clear API.
In all cases, "contiguous" is not awful, so even if I don't like it
much, I'll survive if there is a consensus behind it.

I feel like connect suffers more from the problem you initially raised: it sounds like it will add commits down to a common ancestor (or up to a common descendant).

I'd be fine with "fillgap" (which is also not a noun, though). But we already had a few votes for "contiguous" and I don't want to bikeshed this name much more.

[…]
What Martin is trying to achieve here is a simple function to express
X::X. So maybe it could be a special case of a function expression X::Y.
However, we don't have this function right now. Maybe we jut need to
introduce it, with a clear name eg revbetween or connected or
dagrange, with an optionnal second arguments:

  • X::Y → dagrange(X, Y)
  • X::X → dagrange(X)

And ::X and X::? Maybe dagrange(heads=X) and dagrange(roots=X), respectively? That implies that roots and heads default to null and heads(), but it seems weird that dagrange(X) is equivalent to dagrange(roots=X, heads=X).

::X and X:: are cover by ancestors(X) and descendant(X), so they are already covered. They are also a bit different from X::Y as they are simpler (bounded only on one side). I don't think we need to cover these case in "dagrange".

I've only used X::X where X was trivial, so I'm still trying to get my mind around this. Out of curiosity, what are the scenarios where a nontrivial X is useful?

I'm sure I've used the word "contiguous" when describing the :: operator to people, but the case @marmoute referenced and even the contiguous(9+3+4) result don't match my expectations of the English word. (For the latter, the 3 doesn't seem contiguous with the rest of the set.) FWIW, the first 3 words in the help for :: is "A DAG range".

Maybe if there are a couple of entries in the example section, it would reduce the surprise, whatever the name?

I've only used X::X where X was trivial, so I'm still trying to get my mind around this. Out of curiosity, what are the scenarios where a nontrivial X is useful?

It's only come up a few times for me. Most recently, it was @spectral who wanted to collapse merge commits. For example, if a graph merged A with B and then the result with C, we wanted to find A, B, and C. That could be achieved with heads(contiguous(::x - x - merge())) where x is some descendant.

I'm sure I've used the word "contiguous" when describing the :: operator to people, but the case @marmoute referenced and even the contiguous(9+3+4) result don't match my expectations of the English word.

I think we're still looking for English words where the behavior of this function would match expectations.

(For the latter, the 3 doesn't seem contiguous with the rest of the set.)

I'm not sure which 3 you mean. 4 and 9 were made contiguous by the addition of 8, but 3 was a on different branch, so it wasn't. What the revset actually does is to include all commit that have both ancestors and descendants in the input set.

FWIW, the first 3 words in the help for :: is "A DAG range".

Maybe it's just me, but I think it's a bit unfortunate if we describe :: as "a DAG range" and then have a dagrange() function that doesn't support all the same cases (specifically, it won't support ::x and x::).

Maybe if there are a couple of entries in the example section, it would reduce the surprise, whatever the name?

Probably a good idea.

Josef 'Jeff' Sipek <jeffpc@josefsipek.net> sent this to mercurial-devel. I'm adding it here for reference.

"x::x" is a useful trick for making a range contiguous, but it gets
annoying if "x" is a long expression. Let's provide a simple function
that helps with that. It also makes it the trick more discoverable.

...

+@predicate('contiguous(set)', safe=True, takeorder=True)
+def contiguous(repo, subset, x, order):
+ """Changesets that have both ancestors and descendants in the set. This
+ effectively fills in gaps in the set to make it contiguous, without adding
+ new common ancestors or common descendants.
+
+ "contiguous(x)" is identical to "x::x".

I read this doc string and the patch intro several times, and every time I
concluded that this function was useless. Only after reading some of the
other replies, did I realize that "x" here can be a set.

Therefore, technically, the above is correct. Practically, if you are a
mere mortal and you aren't used to very advanced revsets, the doc string
will make you go "huh?". I don't have a good suggestion how to improve it,
I just thought I'd point out that it's a bit...opaque to more novice users.

I agree that the name isn't the best, but I'll stay away from this bike shed
:)

Jeff.

Josef 'Jeff' Sipek <jeffpc@josefsipek.net> sent this to mercurial-devel. I'm adding it here for reference.

"x::x" is a useful trick for making a range contiguous, but it gets
annoying if "x" is a long expression. Let's provide a simple function
that helps with that. It also makes it the trick more discoverable.

...

+@predicate('contiguous(set)', safe=True, takeorder=True)
+def contiguous(repo, subset, x, order):
+ """Changesets that have both ancestors and descendants in the set. This
+ effectively fills in gaps in the set to make it contiguous, without adding
+ new common ancestors or common descendants.
+
+ "contiguous(x)" is identical to "x::x".

I read this doc string and the patch intro several times, and every time I
concluded that this function was useless. Only after reading some of the
other replies, did I realize that "x" here can be a set.

The docstring does say "in the set" :) But I agree that it's not very clear. I copied the pattern from other functions. I would probably have said "in the input set" otherwise. Do you think that would have been clearer? We could make that change to all the existing cases of plain "set" referring to the input.

Josef 'Jeff' Sipek <jeffpc@josefsipek.net> sent this to mercurial-devel. I'm adding it here for reference.
I read this doc string and the patch intro several times, and every time I
concluded that this function was useless. Only after reading some of the
other replies, did I realize that "x" here can be a set.

The docstring does say "in the set" :) But I agree that it's not very clear. I copied the pattern from other functions. I would probably have said "in the input set" otherwise. Do you think that would have been clearer? We could make that change to all the existing cases of plain "set" referring to the input.

That seems good to me. Maybe there should be a general blurb somewhere that a single element is treated like a set for input purposes, and doesn't need any special syntax to make it a set. That was something I got confused by when I first started using revsets.

I'm sure I've used the word "contiguous" when describing the :: operator to people, but the case @marmoute referenced and even the contiguous(9+3+4) result don't match my expectations of the English word.

I think we're still looking for English words where the behavior of this function would match expectations.

(For the latter, the 3 doesn't seem contiguous with the rest of the set.)

I'm not sure which 3 you mean. 4 and 9 were made contiguous by the addition of 8, but 3 was a on different branch, so it wasn't. What the revset actually does is to include all commit that have both ancestors and descendants in the input set.

Sorry, I meant to specify the last test:

$ hg log -G -T '{rev}\n' --config experimental.graphshorten=True
@  9
o  8
| o  7
| o  6
|/|
| o  5
o |  4
| o  3
o |  2
|/
o  1
o  0

$ log 'contiguous(9+3+4)'
3
4
8
9

I agree that the output is behaving as documented, and this is existing behavior. I was just noting the mental disconnect I have with the name in this situation, where 3 is "contiguous". I'm not sure what you mean by "3 ... wasn't", because it's in the output.

FWIW, the first 3 words in the help for :: is "A DAG range".

Maybe it's just me, but I think it's a bit unfortunate if we describe :: as "a DAG range" and then have a dagrange() function that doesn't support all the same cases (specifically, it won't support ::x and x::).

Yeah, I can see what you mean now. I'm not sure what the :x kind of range is called in python, but it kind of reminds me of an open range, whereas x::y is closed or bounded. So maybe dagclosedrange(x) handles this, and dagopenrange(...) handles ::x and x::? As I was typing this out, I ended up with the same idea you had suggested above with dagrange(roots=X, heads=X). I'm not sure why you think that's weird- it seems like the same rules for heads and roots with ::.

martinvonz abandoned this revision.Mar 23 2019, 12:18 AM