rust: implementation of `hg`
ClosedPublic

Authored by indygreg on Dec 4 2017, 1:59 AM.

Details

Summary

This commit provides a mostly-working implementation of the
hg script in Rust along with scaffolding to support Rust in
the repository.

If you are familiar with Rust, the contents of the added rust/
directory should be pretty straightforward. We create an "hgcli"
package that implements a binary application to run Mercurial.
The output of this package is an "hg" binary.

Our Rust hg (henceforth "rhg") essentially is a port of the existing
hg Python script. The main difference is the creation of the embedded
CPython interpreter is handled by the binary itself instead of relying
on the shebang. In that sense, rhg is more similar to the "exe wrapper"
we currently use on Windows. However, unlike the exe wrapper, rhg does
not call the hg Python script. Instead, it uses the CPython APIs to
import mercurial modules and call appropriate functions. The amount of
code here is surprisingly small.

It is my intent to replace the existing C-based exe wrapper with rhg.
Preferably in the next Mercurial release. This should be achievable -
at least for some Mercurial distributions. The future/timeline for
rhg on other platforms is less clear. We already ship a hg.exe on
Windows. So if we get the quirks with Rust worked out, shipping a
Rust-based hg.exe should hopefully not be too contentious.

Now onto the implementation.

We're using python27-sys and the cpython crates for talking to the
CPython API. We currently don't use too much functionality of the
cpython crate and could have probably cut it out. However, it does
provide a reasonable abstraction over unsafe {} CPython function
calls. While we still have our fair share of those, at least we're
not dealing with too much refcounting, error checking, etc. So I
think the use of the cpython crate is justified. Plus, there is
not-yet-implemented functionality that could benefit from cpython. I
see our use of this crate only increasing.

The cpython and python27-sys crates are not without their issues.
The cpython crate didn't seem to account for the embedding use case
in its design. Instead, it seems to assume that you are building
a Python extension. It is making some questionable decisions around
certain CPython APIs. For example, it insists that
PyEval_ThreadsInitialized() is called and that the Python code
likely isn't the main thread in the underlying application. It
is also missing some functionality that is important for embedded
use cases (such as exporting the path to the Python interpreter
from its build script). After spending several hours trying to
wrangle python27-sys and cpython, I gave up and forked the project
on GitHub. Our Cargo.toml tracks this fork. I'm optimistic that
the upstream project will accept our contributions and we can
eventually unfork.

There is a non-trivial amount of code in our custom Cargo build
script. Our build.rs (which is called as part of building the hgcli
crate):

  • Validates that the Python interpreter that was detected by the python27-sys crate provides a shared library (we only support shared library linking at this time - although this restriction could be loosened).
  • Validates that the Python is built with UCS-4 support. This ensures maximum Unicode compatibility.
  • Exports variables to the crate build allowing the built crate to e.g. find the path to the Python interpreter.

The produced rhg should be considered alpha quality. There are several
known deficiencies. Many of these are documented with inline TODOs.

Probably the biggest limitation of rhg is that it assumes it is
running from the ./rust/target/<target> directory of a source
distribution. So, rhg is currently not very practical for real-world
use. But, if you can cargo build it, running the binary *should*
yield a working Mercurial CLI.

In order to support using rhg with the test harness, we needed to hack
up run-tests.py so the path to Mercurial's Python files is set properly.
The change is extremely hacky and is only intended to be a stop-gap
until the test harness gains first-class support for installing rhg.
This will likely occur after we support running rhg outside the
source directory.

Despite its officially alpha quality, rhg copes extremely well with
the test harness (at least on Linux). Using
run-tests.py --with-hg ../rust/target/debug/hg, I only encounter
the following failures:

  • test-run-tests.t -- Warnings emitted about using an unexpected Mercurial library. This is due to the hacky nature of setting the Python directory when run-tests.py detected rhg.
  • test-devel-warnings.t -- Expected stack trace missing frame for hg (This is expected since we no longer have an hg script!)
  • test-convert.t -- Test running $PYTHON "$BINDIR"/hg, which obviously assumes hg is a Python script.
  • test-merge-tools.t -- Same assumption about hg being executable with Python.
  • test-http-bad-server.t -- Seeing exit code 255 instead of 1 around line 358.
  • test-blackbox.t -- Exit code 255 instead of 1.
  • test-basic.t -- Exit code 255 instead of 1.

It certainly looks like we have a bug around exit code handling. I
don't think it is severe enough to hold up review and landing of this
initial implementation. Perfect is the enemy of good.

Diff Detail

Repository
rHG Mercurial
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.
indygreg created this revision.Dec 4 2017, 1:59 AM
dlax added a subscriber: dlax.Dec 4 2017, 4:12 AM
durin42 added a subscriber: durin42.Dec 4 2017, 5:28 PM
durin42 added inline comments.
rust/hgcli/src/main.rs
131

You might be able to usefully avoid all the unwrap() in here by having an extra layer, eg

fn run() -> Result<(), failure::Error> {
...

let program_name = CString::new(env.python_exe.to_str()?)?.as_ptr();

}

and then

fn main() {

let mut exit_code;
match run() {
  Ok(_) => exit_code = 0,
  Err(e) => { println!("{:?}", e); exit_code = 1; },

}

std::process::exit(exit_code);

}

etc

quark added a subscriber: quark.Dec 4 2017, 6:12 PM
quark added inline comments.
rust/hgcli/src/main.rs
20–32

Maybe use extern crate python27_sys so part of them get defined.

35–38

Have you tried rustfmt? It will remove the space before :.

112

Maybe use .expect("setdefaultencoding failed").

116

Maybe py: Python. That's what rust-cpython uses. It implements Copy.

121

Probably use PyResult<int> since dispatch.run returns the exit code.

173

255 is probably better since that's an uncaught exception.

indygreg updated this revision to Diff 4112.Dec 5 2017, 2:44 AM
yuja added a subscriber: yuja.Dec 5 2017, 9:40 AM
yuja added inline comments.
rust/hgcli/src/main.rs
102

Perhaps we'll need a utility function for platform-specific cstr
conversion.

On Unix, the story is simple. We can use OsStrExt. On Windows,
maybe we'll have to first convert it to "wide characters" by OsStrExt
and then call WideCharToMultiByte to convert it back to ANSI bytes sequence.

I have no idea if Rust stdlib provides a proper way to convert
OsStr to Windows ANSI bytes.

https://stackoverflow.com/a/38948854
https://msdn.microsoft.com/en-us/library/windows/desktop/dd374130(v=vs.85).aspx

indygreg updated this revision to Diff 4165.Dec 7 2017, 2:54 AM
indygreg edited the summary of this revision. (Show Details)
indygreg updated this revision to Diff 4241.Dec 8 2017, 2:21 AM
indygreg edited the summary of this revision. (Show Details)
indygreg retitled this revision from [RFC] rust: Rust implementation of `hg` and standalone packaging to rust: implementation of `hg`.
indygreg updated this revision to Diff 4244.Dec 8 2017, 2:26 AM
indygreg marked 6 inline comments as done.Dec 8 2017, 2:32 AM

I removed the standalone distribution code and cleaned things up a bit.

Over 97% of the tests pass with the Rust hg on Linux. And the failures seem reasonable.

There's a bit of work to be done around packaging, Windows support, Rust linting, etc. But I'd like to get something landed so others have something to play with and so we're not reviewing a massive patch.

What I'm trying to say is "I think this is ready for a real review."

rust/hgcli/src/main.rs
102

I documented a potential path forward in the code. I was thinking we can address this is a follow-up because it is non-trivial to implement. And we may want to do something funky, such as inject the raw bytes into mercurial.argv or something. Since we can call the Windows API directly, we can actually do that now. This opens up a lot of possibilities for encoding...

121

`dispatch.run() actually calls sys.exit()`. There is a bug around somewhere in this code around exit code handling. But it only presents in a few tests, which is odd. I documented that in the commit message.

indygreg updated this revision to Diff 4525.Dec 17 2017, 1:31 PM
yuja requested changes to this revision.Jan 5 2018, 2:38 AM

Suppose this is a kind of contrib code, I think it's good enough to accept.
Can you drop Cargo.lock file?

rust/Cargo.lock
1 ↗(On Diff #4525)

Perhaps Cargo.lock should be excluded from the commit.

rust/hgcli/build.rs
89

Nit: const ?

rust/hgcli/src/main.rs
63

Nit: probably we don't have to clone() here, just move these values.

This revision now requires changes to proceed.Jan 5 2018, 2:38 AM
indygreg updated this revision to Diff 4744.Jan 9 2018, 9:07 PM
indygreg added inline comments.Jan 9 2018, 9:07 PM
rust/Cargo.lock
1 ↗(On Diff #4525)

According to https://doc.rust-lang.org/cargo/faq.html#why-do-binaries-have-cargolock-in-version-control-but-not-libraries (and other places I found on the Internet), Cargo.lock files under version control are a good practice for binaries - but not libraries.

Since this Cargo.lock encompasses the hgcli binary, I think that counts.

FWIW, Firefox vendors Cargo.lock files for libraries. Although those libraries are attached to C++ binaries, so there isn't an appropriate Rust binary to attach to. But a main purpose of Cargo.lock files is to add determinism. For that reason, I think we want them vendored.

indygreg updated this revision to Diff 4745.Jan 9 2018, 9:10 PM
indygreg marked an inline comment as done.
indygreg marked an inline comment as done.Jan 9 2018, 9:10 PM
yuja added inline comments.Jan 10 2018, 8:06 AM
rust/Cargo.lock
1 ↗(On Diff #4525)

Cargo.lock files under version control are a good practice for binaries

Okay, I didn't know that, thanks. (FWIW, I don't like the idea of version-controlling
strict build dependencies, but let's just follow their rule.)

Can we move Cargo.lock under hgcli directory, then? Sounds like we'll need a
separate Cargo.lock file for "libraries".

indygreg updated this revision to Diff 4753.Jan 10 2018, 11:53 AM
indygreg marked 3 inline comments as done.Jan 10 2018, 11:53 AM

Looks pretty good to me-- left a couple nits inline.

There are a lot of uses of unwrap, expect, and panic that can (and probably should) be replaced with proper error handling using Result (and the failure crate).

There are also a couple of crates that provide safe bindings to Python interpreters-- I'm not sure what your external dependency situation is, but you might consider using something like https://crates.io/crates/pyo3 rather than writing your own unsafe calls to the python interpreter.

rust/hgcli/build.rs
13

Nit: if you move this import into have_shared, you'll only need one cfg and it'll be easier to validate that the proper deps are available for the proper platforms.

89

Nit: not sure what version you're targeting, but 'static is automatic for const vars, so you could write [&str; 2]

rust/hgcli/src/main.rs
46

Nit: you can just write &str. Also, I'm not familiar with what you're trying to do here, but is the PYTHON_INTERPRETER always determined at compile-time? It seems like something you might want to switch on at runtime. Is that not the case?

126

If it needs to live for the whole time, consider storing it in a static or similar. There's a subtle unsafety here: if this program panics (due to panic, unwrap, or expect, program_name will be dropped (free'd) before the python interpreter is killed (when the process ends, that is-- Finalize won't ever be called in that case). I don't know how much of an issue this will be in practice, but it's something to think about.

There are a lot of uses of unwrap, expect, and panic that can (and probably should) be replaced with proper error handling using Result (and the failure crate).

I definitely agree. I still consider this just beyond proof-of-concept code. We'll definitely want to shore things up before we ship. Perfect is very much the enemy of good at this stage.

There are also a couple of crates that provide safe bindings to Python interpreters-- I'm not sure what your external dependency situation is, but you might consider using something like https://crates.io/crates/pyo3 rather than writing your own unsafe calls to the python interpreter.

pyo3 requires non-stable Rust features last I checked. That makes it a non-starter for us at this time (since downstream packagers will insist on only using stable Rust).

If other external dependencies provide the interfaces we need, I'm open to taking those dependencies. But this crate is focused on embedding a Python interpreter. Most (all?) of the Rust+Python crates I found seemed to target the "implementing Python extensions with Rust" use case, not embedding Python. As such, their embedding API support is very lacking. I even had to fork rust-cpython because it didn't implement the proper APIs and is forcing its extension-centric religion on consumers. I've upstreamed most of my modifications though. So hopefully the fork doesn't live much longer...

There are a lot of uses of unwrap, expect, and panic that can (and probably should) be replaced with proper error handling using Result (and the failure crate).

I definitely agree. I still consider this just beyond proof-of-concept code. We'll definitely want to shore things up before we ship. Perfect is very much the enemy of good at this stage.

There are also a couple of crates that provide safe bindings to Python interpreters-- I'm not sure what your external dependency situation is, but you might consider using something like https://crates.io/crates/pyo3 rather than writing your own unsafe calls to the python interpreter.

pyo3 requires non-stable Rust features last I checked. That makes it a non-starter for us at this time (since downstream packagers will insist on only using stable Rust).

If other external dependencies provide the interfaces we need, I'm open to taking those dependencies. But this crate is focused on embedding a Python interpreter. Most (all?) of the Rust+Python crates I found seemed to target the "implementing Python extensions with Rust" use case, not embedding Python. As such, their embedding API support is very lacking. I even had to fork rust-cpython because it didn't implement the proper APIs and is forcing its extension-centric religion on consumers. I've upstreamed most of my modifications though. So hopefully the fork doesn't live much longer...

SGTM! Thanks for clarifying.

durin42 accepted this revision.Jan 10 2018, 5:30 PM
This revision was automatically updated to reflect the committed changes.
indygreg marked 7 inline comments as done.Jan 10 2018, 10:41 PM

I'll address this and other review feedback in follow-up patches.

Thanks for the review!

rust/hgcli/src/main.rs
46

This is meant to be dynamic. The gist of this code is we're trying to find the location of the Python install given various search strategies. The search strategy is (currently) defined at compile time. And this localdev search strategy defines the path to Python at compile time. Look for my features and documentation for this code later. (I stripped out unused code from this patch to make it smaller.)

indygreg marked 3 inline comments as done.Jan 10 2018, 10:41 PM

Overall it looks great. I added a bunch of nitpicks. The most common suggestion was to add some contextual information to error messages. Other then that mostly minor style improvements. Feel free to push back on anything you don't agree with :)

rust/hgcli/build.rs
33
assert!(
    Path::new(&python).exists(),
    "Python interpreter {} does not exist; this should never happen",
    python);

I would also recommend {:?} as the quotes highlight the injected variable nicely and it will make some hard-to-notice things more visible due to escaping.

89

I would also recommend using a slice if you don't intend the size of the array to be part of the type signature.

const REQUIRED_CONFIG_FLAGS: &[&str] = &["Py_USING_UNICODE", "WITH_THREAD"];
110

Use assert!.

116

Use assert!

127
#[cfg(not(target_os = "windows"))]
assert_eq!(
    config.config.get("Py_UNICODE_SIZE"), Some("4"),
     "Detected Python doesn't support UCS-4 code points");
rust/hgcli/src/main.rs
37

Use expect for a better error message. .expect("Error getting executable path")

91

This method allows paths that aren't valid UTF-8 by avoiding ever becoming a str.

CString::new(env.python_home.as_ref().as_bytes())

I would also change the unwrap to .expect("Error setting python home").

99

It appears that HG accepts HGUNICODEPEDANTRY= as not enabling unicode pedantry. Maybe the behavior should be the same here. Untested code below should work.

let pedantry = !env::var("HGUNICODEPEDANTRY").unwrap_or("").is_empty();
111

.expect("Error accessing sys.path")

133

CString::new(env.python_exe.as_ref().as_bytes())

134

.expect("Error setting program name")

200

Since this method returns a Result why not handle the error?

let sys_mod = py.import("sys")?;
yuja added inline comments.Jan 11 2018, 7:31 AM
rust/hgcli/src/main.rs
133

That's more correct on Unix, but wouldn't work on Windows since
native string is UTF-16 variant (Rust) or ANSI (Python 2.7).

kevincox added inline comments.Jan 11 2018, 8:09 AM
rust/hgcli/src/main.rs
133

Oops, for some reason I thought this was in a unix block. as_bytes() isn't actually available otherise. I think that this is fine then. non-UTF-8 paths shouldn't be a major issue..

ruuda added a subscriber: ruuda.Jan 11 2018, 12:09 PM

Awesome! I have just a few things that could be written more briefly.

rust/hgcli/build.rs
39

You can safely indent those, leading whitespace on the continuation line will be stripped from the string literal.

107

There is no need to match:

let result = config.config.get(*key) == Some("1");

Or using assert as @kevincox recommends:

assert_eq!(config.config.get(*key), Some("1"), "Detected ...");
rust/hgcli/src/main.rs
187

There exists Result::map_err for this:

result = run_py(&env, py).map_err(|err| {
    err.print(py);
    255
});