This is an archive of the discontinued Mercurial Phabricator instance.

revset: support reading aliases from a .hgrevsets file
AbandonedPublic

Authored by durin42 on Jul 15 2017, 7:41 PM.

Details

Reviewers
quark
indygreg
Group Reviewers
hg-reviewers
Summary

It is common for projects or companies to define/recommend
common customizations for Mercurial. However, distributing these
customizations can be difficult because there's no distribution
channel built into Mercurial itself. The "configexpress"
extension is making some inroads here. But it is geared towards
hgrc files. I think there's room to make certain customizations
outside of the config system via specially named files checked
into the repository.

In this commit, we introduce a new experimental feature for
reading revset aliases from a .hgrevsets file in the working
directory. A repository can then add this file to define
custom revset aliases which will automagically be made available
to anyone who clones the repo and has a modern enough Mercurial
client [with the feature enabled].

Essentially, the .hgrevsets file is parsed as a [revsetalias]
config section but with lower priority than all config files.
This is because a user may want to overwrite an alias defined in
the repo and since not all users can commit to the repo, config
files or --config must take precedence.

One can imagine applying this pattern of "define customizations in
.hg<thing> files" to other features. For example, repositories could
also define templates - either keywords or full template maps.
Another potential area is filesets. You could create named
collections of files and do things like hg files mycollection
or even reference those named collections as part of building a
narrow clone or sparse checkout profile.

While I'm reasonably confident in the implementation, the feature
is marked as experimental because it is close to freeze and this
feature feels a bit dangerous to enable by default with minimal
time to bake.

Diff Detail

Repository
rHG Mercurial
Lint
Lint Skipped
Unit
Unit Tests Skipped

Event Timeline

indygreg created this revision.Jul 15 2017, 7:41 PM

This should probably be considered an RFC patch, especially given how late in the cycle it is. I'm not opposed to punting this to the next cycle.

quark added a subscriber: quark.EditedJul 16 2017, 8:01 AM

The main feature and implementation looks good to me. My questions are about :minversion and localrepository.anyrevs.

Instead of implementing it at revset.matchany, have you considered localrepository.anyrevs? That might be easier to maintain the current behavior, ex. D97 might become unnecessary.

mercurial/revset.py
2137–2157

Is this version check necessary practically? For example, without the minversion check, defining P=X where X is only available in a new version. Old client will get X undefined. With the minversion check, old client will get P undefined. I think X undefined or P undefined is not that much different.

revset could also be defined by extensions. People can also polyfill old mercurial so they have new revsets. It seems to be a lot of work if we also detect extension enabled and their versions.

Big +1 on this functionality, and I'd be happy to experiment with it now. I couldn't quite figure it out from an extension. Also +1 on doing this for filesets.

My use case is less about generic revsets (e.g., nudge = push -r .), and more about defining ones that collect the paths for various products in the repo. It makes sense to lock those into each revision as the codebase evolves. So my initial thoughts for next cycle are:

  1. Hopefully the paths can define filesets in .hgfileset, and then the fileset be used to define the revset (probably as input to 'files()'), to avoid duplication. (It's not clear to me if there will be ordering/dependency issues when these are read in.)
  1. I think using wdir() by default makes sense. But an option to use another rev makes sense in some cases (consider hg archive -r 1234 -I 'set:product1()', without updating to 1234).
  1. I can see cases where the repo defined alias should be enforced (e.g. a release script running hg archive). I think HGRCPATH= ignores every hgrc except .hg/hgrc, right? A config option that the script knows about to alter the default behavior seems useful.

Not directly related to this, but I've also been wondering how to get defined revsetaliases more discoverable. Maybe recognizing minirst help text inside the # comments in the section, and inserting them into the core help? I don't think an end user cares about the difference between a revset proper and a revsetalias.

yuja added a subscriber: yuja.Jul 19 2017, 9:42 AM

Can we have a single source file for project's revset/fileset aliases and maybe templates?
I don't think it's good idea to create .hgxxx file per config section.

[revsetalias]
filealiasall = all()

[filesetalias]
...

Perhaps they could be opt-in by user config, e.g. experimental.revsetaliasfromrepo=*.

In D98#2104, @yuja wrote:

Can we have a single source file for project's revset/fileset aliases and maybe templates?
I don't think it's good idea to create .hgxxx file per config section.

[revsetalias]
filealiasall = all()
[filesetalias]
...

Perhaps they could be opt-in by user config, e.g. experimental.revsetaliasfromrepo=*.

I like it. What else should we allow in here? Are there security concerns we should think through?

yuja added a comment.Jul 20 2017, 10:17 AM

What else should we allow in here? Are there security concerns we should think through?

Probably safe, but could be used for DoS attack:

  • filesetalias (not implemented)
  • revsetalias
  • committemplate
  • templatealias
  • templates

Unsafe (should never be allowed):

  • alias (arbitrary command execution by shell alias or option like bisect -c CMD)

Obviously, they are all unsafe depending on extension predicates/functions/keywords, such
as a revset shelling out arbitrary inputs.

quark added a comment.EditedAug 9 2017, 5:51 PM

Security-wise, the "shelling out revset" seems hard to solve cleanly. By having %include ../hgrc in $REPO/.hg/hgrc, we could already read config in working copy for a trusted repo.

It seems to me that a lot of security work (ex. knowing the "origin" when executing a revset, marking config items or sections as safe or unsafe by extensions) are required to be able to turn on this feature by default. If we don't turn this on by default because of security, the %include ../hgrc approach seems practical enough already.

yuja added a comment.Aug 10 2017, 8:23 AM

Security-wise, the "shelling out revset" seems hard to solve cleanly.

Yes, and IMHO the only reasonable solution is to not add such revset functions.
revsets and filesets can be executed through the web UI.

In D98#4834, @quark wrote:

It seems to me that a lot of security work (ex. knowing the "origin" when executing a revset, marking config items or sections as safe or unsafe by extensions) are required to be able to turn on this feature by default. If we don't turn this on by default because of security, the %include ../hgrc approach seems practical enough already.

I thought about doing this, instead of fixing up a stripped down projrc extension. But then you have to get every developer to edit the local .hg/hgrc file for every clone they have. So even if we have to have this off by default, I still see value in this in a corporate environment.

I couldn't tell from Yuya's comment if "... not add such revset functions" means don't add support for external revsets at all, or filter them out of this file by default, or something else.

yuja added a comment.Aug 15 2017, 2:46 AM

I couldn't tell from Yuya's comment if "... not add such revset functions" means

I meant "don't load unsafe extensions." revsets and filesets have to be secure against
malicious input because they are accessible through the Web UI. It's up to users to
make a security hole by loading unsafe extensions.

I thought about doing this, instead of fixing up a stripped down projrc extension. But then you have to get every developer to edit the local .hg/hgrc file for every clone they have. So even if we have to have this off by default, I still see value in this in a corporate environment.

Our solution for this is to tell people to use a special clone script.

quark requested changes to this revision.Aug 23 2017, 11:43 PM

Cleaning this from my queue since it's doable using %include .. today in a trusted environment - If you can tell people to toggle a config option in their hgrc, you can also tell people to add that %include.

If we decide to have this feature builtin, I think it's still a long way to go because of the security concerns.

This revision now requires changes to proceed.Aug 23 2017, 11:43 PM

There are a number of security and user control issues at play here.

For security, we need to take a long hard look at what configs can be modified by in-repo files. Revsets already have a "safe" flag that controls what to expose on hgweb. We likely need more of this.

For user control, I could imagine something hooked into hg clone and hg pull. If we encounter a well-known in-tree config file, we can prompt or notify the user to enable it. Enabling would be as simple as dropping a %include into .hg/hgrc or something. We could expose global config options to control the default behavior. e.g. always install, prompt, etc.

This would be a good discussion to have at the Sprint!

durin42 commandeered this revision.
durin42 abandoned this revision.

Let's reopen this if/when we return to it.