D2819 hgweb: refactor repository name URL parsing

This is an archive of the discontinued Mercurial Phabricator instance.

hgweb: refactor repository name URL parsing
Closed, Public

Authored by indygreg on Mar 12 2018, 5:16 PM.

Details

Summary

The hgwebdir WSGI application detects when a requested URL is for
a known repository and it effectively forwards the request to the
hgweb WSGI application.

The hgweb WSGI application needs to route the request based on the
base URL for the repository. The way this normally works is
SCRIPT_NAME is used to resolve the base URL and PATH_INFO
contains the path after the script.

But with hgwebdir, SCRIPT_NAME refers to hgwebdir, not the base
URL for the repository. So, there was a hacky REPO_NAME environment
variable being set to convey the part of the URL that represented
the repository so hgweb could ignore this path component for
routing purposes.

The use of the environment variable for passing internal state
is pretty hacky. Plus, it wasn't clear from the perspective of
the URL parsing code what was going on.

This commit improves matters by making the repository name an
explicit argument to the request parser. The logic around
handling of this value has been shored up. We add various checks
that the argument is used properly - that the repository name
does represent the prefix of the PATH_INFO.
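
As a rough illustration of the idea described above, here is a minimal sketch of a parser that takes the repository name as an explicit argument and checks that it really is a prefix of PATH_INFO. This is not the code from this revision: the function name split_repo_path, its return shape, and the use of ValueError are all assumptions made for the example, and it works on plain str values in a WSGI-style dict.

```python
def split_repo_path(env, reponame=None):
    """Hypothetical helper: return (apppath, dispatchparts).

    apppath is the base URL path of the application or repository;
    dispatchparts are the remaining path components used for routing.
    """
    apppath = env.get('SCRIPT_NAME', '')
    pathinfo = env.get('PATH_INFO', '')

    if reponame:
        if not pathinfo.startswith('/'):
            raise ValueError('reponame requires PATH_INFO starting with /')

        repoprefix = '/' + reponame.strip('/')

        # The repository name must match whole path components, so
        # "/eng/devsetupX" is not accepted for the repo "eng/devsetup".
        if pathinfo != repoprefix and not pathinfo.startswith(repoprefix + '/'):
            raise ValueError('reponame is not a prefix of PATH_INFO')

        # Fold the repository name into the base path and drop it
        # from the part that is used for routing.
        apppath = apppath.rstrip('/') + repoprefix
        dispatchpath = pathinfo[len(repoprefix):]
    else:
        dispatchpath = pathinfo

    return apppath, [p for p in dispatchpath.split('/') if p]


env = {'SCRIPT_NAME': '/hg', 'PATH_INFO': '/eng/devsetup/rev/tip'}
print(split_repo_path(env, 'eng/devsetup'))  # ('/hg/eng/devsetup', ['rev', 'tip'])
print(split_repo_path(env))  # ('/hg', ['eng', 'devsetup', 'rev', 'tip'])
```

With the name passed explicitly, the routing code no longer has to guess which leading components of PATH_INFO belong to the repository, and a mismatch fails loudly instead of being silently misrouted.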

Diff Detail

Repository
rHG Mercurial
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

indygreg created this revision. Mar 12 2018, 5:16 PM
durin42 accepted this revision. Mar 12 2018, 5:27 PM
This revision is now accepted and ready to land. Mar 12 2018, 5:27 PM
This revision was automatically updated to reflect the committed changes.

I installed the latest default branch with SCM Manager, and it 404s even simple things like hg id https://.... I bisected back to this. The paths in the access log look unchanged:

With this commit:

127.0.0.1 - - [04/Apr/2018:12:44:12 -0400] "GET /hook/hg/?ping=true HTTP/1.1" 204 -
10.10.1.36 - - [04/Apr/2018:12:44:12 -0400] "GET /hg/eng/devsetup?cmd=capabilities HTTP/1.1" 404 949

Parent of this commit:

127.0.0.1 - - [04/Apr/2018:12:47:18 -0400] "GET /hook/hg/?ping=true HTTP/1.1" 204 -
10.10.1.36 - - [04/Apr/2018:12:47:19 -0400] "GET /hg/eng/devsetup?cmd=capabilities HTTP/1.1" 200 422
10.10.1.36 - - [04/Apr/2018:12:47:19 -0400] "GET /hg/eng/devsetup?cmd=lookup HTTP/1.1" 200 43
10.10.1.36 - - [04/Apr/2018:12:47:20 -0400] "GET /hg/eng/devsetup?cmd=listkeys HTTP/1.1" 200 30
10.10.1.36 - - [04/Apr/2018:12:47:20 -0400] "GET /hg/eng/devsetup?cmd=listkeys HTTP/1.1" 200 -

I'm going to try to add print statements, but if you have any darts you'd like to throw, I'd be happy to try them.

Finally figured this out. SCM Manager must be generating this stub hgweb.py (it gets rewritten every time tomcat is restarted):

import os
from mercurial import demandimport
from mercurial.hgweb import hgweb, wsgicgi

repositoryPath = os.environ['SCM_REPOSITORY_PATH']

demandimport.enable()

application = hgweb(repositoryPath)
wsgicgi.launch(application)

Note that it's not using hgwebdir. (I have no idea why not, but it does let you organize repos into virtual directories under '/hg', and git repos under '/git'.) If we simply set the reponame value in request.parserequestfromenv() from env if it wasn't passed in, then everything works. Not real nice, but I assume that we don't want to break existing hosting packages when upgrading just Mercurial? I'll send a patch if this is acceptable.
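
For concreteness, the fallback I have in mind looks roughly like the sketch below. It is only an illustration: resolve_reponame is a hypothetical helper, not an existing Mercurial function, and it assumes the environment is a plain dict with str keys. The only name taken from this thread is the REPO_NAME variable.

```python
def resolve_reponame(env, reponame=None):
    """Hypothetical helper: choose the repository name for URL parsing.

    An explicitly passed name wins. Otherwise fall back to the
    REPO_NAME environment variable that older setups relied on.
    Returns None when neither is available.
    """
    if reponame is None:
        reponame = env.get('REPO_NAME')
    return reponame


# An explicit argument takes precedence over the environment.
print(resolve_reponame({'REPO_NAME': 'eng/devsetup'}, 'other'))  # other
# Third-party wrappers that only set REPO_NAME keep working.
print(resolve_reponame({'REPO_NAME': 'eng/devsetup'}))  # eng/devsetup
```

Callers that pass the name explicitly are unaffected, so the fallback only matters for hosting packages that still communicate the repository name through the environment.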