Adds a python module that uses the Rust treedirstate to replace the dirstate
map.
Details
- Reviewers
jsgf durham - Group Reviewers
Restricted Project - Commits
- rFBHGX4afa6d8392af: treedirstate: add Python linkage
Diff Detail
- Repository
- rFBHGX Facebook Mercurial Extensions
- Lint
Automatic diff as part of commit; lint not applicable. - Unit
Automatic diff as part of commit; unit tests not applicable.
Event Timeline
Haven't looked into details yet.
The "modern" style is put extensions under hgext3rd.treedirstate. It is less likely to conflict with other modules and avoids one lookup when being loaded.
treedirstate/__init__.py | ||
---|---|---|
2 | We usually keep the GPL header here. | |
3 | This affects hg help output. Use lower-case description to be more consistent with other extensions. | |
4 | Put from __future__ import absolute_import here to make test-check-py3-compat-hg.t happy. |
treedirstate/__init__.py | ||
---|---|---|
66 | This pattern will require searching down into the tree at each step right? Versus maintaining the state of the last search? Maybe add a comment explaining that room for future optimization? | |
328 | Why do we not need to look at the 'now' argument when serializing? The pure/parses.py implementation of pack_dirstate says it's there to handle the case of a dirstate write happening in the same second as a file modification, so it's ambiguous if the file is clean or not. |
The "modern" style is put extensions under hgext3rd.treedirstate.
I thought it was the other way round, so thanks for the pointer. I also took a look at your stack, and I think for consistency I should rename treedirstate/__init__/py to hgext3rd/treedirstate.py and move the rust code to the package hgext3rd.rust.treedirstate. Does that sound right?
Also I see you used non-camel-case for your python interop layer (with #[allow(non_camel_case_types)]). Again, for consistency, I should probably do the same.
I think treedirstate as a whole could be an extension with a Rust submodule.
The reason I put Rust module separately is because I plan to eventually use it not only for changelog indexes, but also obsstore indexes (reused by inhibit or fbamend), or packfile indexes. So it's more like a shared Rust module with some low-level components used by multiple extensions.
rust/treedirstate/src/python.rs | ||
---|---|---|
24–28 | Does this need to be a macro? What about something like: fn map_pyerr<F, T, E>(py: &Python, expr: F) -> PyResult<T> where F: FnOnce() -> Result<T, E>, E: Error, { expr().map_err(|e| PyErr::new(py, e.description()) } Or is that that too cumbersome if the type parameters can't be inferred? | |
66–69 | Might be a bit cleaner to move the PyIterator::from_object() call to its own line (bind to a new var). Or maybe map the iterator rather than using a for loop. | |
70–76 | I assume all these ? will return failure if there's a tuple size or element type mismatch. Are the errors that end up on the Python side reasonably descriptive about what actually went wrong? | |
73 | This this only accept UTF-8 encoded Python strings, or does it do some kind of conversion for Rust strings? | |
77–88 | I'd prefer to flip this around: if !cond { cond not true } else { cond true } reads awkwardly. I assume the 'r' in &FileState::new('r', 0, size, 0) means the same thing as in state != 'r'. | |
127–140 | Lift out the Ok(...)s: let res = if let Ok(filename) ... { ... } else { default }; Ok(res) | |
145–158 | Lift out the Ok(...)s | |
161–164 | The hastrackeddir / has_tracked_dir is bugging me. Is there a reason to make them different? | |
178 | Does this mean anything non-bytes will be treated as the initial case? Should it explicitly check for however None gets represented? Otherwise I could imagine confusing bugs if you accidentally pass something else. Could you use something like filename: Option<PyBytes> to get the macro to handle this automatically? | |
194 | Lift Ok |
rust/treedirstate/Cargo.toml | ||
---|---|---|
26 | Do you need features in the git version? I added D1493 to remove 'static lifetime. A side effect of that is the crates.io version can be used. | |
rust/treedirstate/src/python.rs | ||
24–28 | Maybe it could be Into<PyErr> like D1470. | |
161–164 | The Mercurial coding style is to not use underscores. Either here or .py has to be inconsistent. I slightly prefer making .py consistent. | |
treedirstate/__init__.py | ||
79–81 | This seems unnecessary. There is a check-code rule that forbids .next(): (r'\.next\(\)', "don't use .next(), use next(...)"), | |
178 | Is list() necessary? | |
285 | nit: use a tuple to make it immutable. | |
341 | nit: Prefer with self._opener(...) as st: | |
383 | nit: Use with repo.wlock(): instead. | |
398 | with | |
427 | Maybe move if useinnewrepos to wrapnewreporequirements as if repo.ui.configbool('format', 'usetreedirstate', True):. This is consistent with format.uselz4. | |
432–433 | repo.dirstate is a filecache property. Use extensions.wrapfilecache instead. | |
436 | What is this used for? Could it be removed? setconfig in *setup usually makes chg sad (since the config is a flat map, chg will pick whatever being rewritten by the extension when creating a new ui object. That means the value stored in a config file will be ignored). |
Thanks for the comments. Will take me a little while to rework to address them.
rust/treedirstate/Cargo.toml | ||
---|---|---|
26 | I need this fix from the git version, as I've hit the crash that results without it a couple of times. | |
rust/treedirstate/src/python.rs | ||
24–28 | @jsgf: The reason for the macro is to pass in the type of python exception to raise; PyErr::new is ambiguous without it. An alternative without macros is to write code like: dirstate .get_next_tracked(filename.data(py)) .map_err(|e| PyErr::new::<exc::IOError, _>(py, e.description())? @quark: Is that safe to do? Also, it makes all your errors RuntimeErrors. I was hoping to give more meaningful exception types, but maybe that's not useful. | |
70–76 | Yes, they should be proper Python exceptions (like IndexError), which is the same you'd get elsewhere in Mercurial in that scenario. (In practice these are all C-based dirstatetuple objects, which have exactly four elements and this is the only way to access them). | |
73 | I should probably be using a PyBytes here. In practice it's always a string containing a single ASCII character. For a single byte value, it's surprisingly hard to extract safely. | |
77–88 | Interesting, I see 'r' as the abnormal case and prefer that second. state != 'r' is somehow qualitatively different to !(state == 'r') in my head, even though they're the same. I'll flip them round and see how it looks to me. | |
treedirstate/__init__.py | ||
79–81 | It's needed for Python 2.7's iterator interface. | |
178 | I think so, to match the API for dict.keys(). Importantly you're allowed to modify the dict whilst iterating over keys(). I'm not sure if anything actually uses it. | |
427 | I add the config option in a later diff (this one was large enough already), but I'll move where the test happens. | |
436 | It's used for telemetry. |
rust/treedirstate/Cargo.toml | ||
---|---|---|
26 | That's sad. I guess that explained some of the weird segfaults I have seen. Thanks for locating the root cause! | |
rust/treedirstate/src/python.rs | ||
24–28 | My understanding is ? does an implicit into() so it looks safe? | |
treedirstate/__init__.py | ||
436 | I notice there is tweakdefaults._trackdirstatesizes, since it also has a repo object, could you use repo.requirements instead? |
rust/treedirstate/src/python.rs | ||
---|---|---|
24–28 | There are so many long methods that end up wrapping lines anyway, I think I will just go with the explicit map approach. |
rust/treedirstate/src/python.rs | ||
---|---|---|
24–28 | For Python::acquire_gil(), Python itself allows nested locks from a same thread. With the py_class! macro expanded, all instance methods seem to have py: Python as its input parameter, which indicates lock acquired. That means other foreign threads are not taking the lock. So I think it's safe in instance methods. It adds some overhead calling PyGILState_Ensure and PyGILState_Release though. |
rust/treedirstate/src/python.rs | ||
---|---|---|
161–164 | The mercurial tests also notice these methods because their signatures are sufficiently Pythonish for their regexs to match. We'd either need to mark these files as excluded from the tests, or accept that these are "python functions implemented in Rust" and adopt the python standards. | |
178 | Option<PyBytes> works. I'll use that. I'll also fix up a bunch other places in this file where I could use PyBytes in the signature. |
rust/treedirstate/src/python.rs | ||
---|---|---|
25 | Is it possible to avoid using RefCell<...> and use just make some methods accept &mut self? |
Do you need features in the git version? I added D1493 to remove 'static lifetime. A side effect of that is the crates.io version can be used.