This is an archive of the discontinued Mercurial Phabricator instance.

Differential D2057

rust implementation of hg status
Needs Revision · Public

Authored by Ivzhh on Feb 5 2018, 5:31 PM.

Details

Reviewers

kevincox
baymax

Group Reviewers

hg-reviewers

Summary

implementation of revlog v1
parsing changelog, manifest, dirstate
use .hgignore in repo root
comparable performance with current hg status (Linux & Mac: slightly faster, Windows: slightly slower)
use hg r-status as subcommand, in this case, bypass python engine

Diff Detail

Repository

rHG Mercurial

Branch

phab-submit-D2057-2018-02-05 (bookmark) on default (branch)

Lint

No Linters Available

Unit

No Unit Test Coverage

Event Timeline

Ivzhh created this revision. Feb 5 2018, 5:31 PM

Herald added a reviewer: hg-reviewers. Feb 5 2018, 5:31 PM

Herald added subscribers: mercurial-devel, kevincox, durin42.

I'd be curious to see what @indygreg has to say about this, maybe wait on his input before doing any work in response to my feedback?

I do wonder if we should have at least three crates:

hgcli
libmercurial
hgcext

The first one would be the command-line entry point, the last could use the cpython API, and libmercurial would be "pure rust" and open the door to eventually having a libhg or something that exports C functions and would be suitable for cffi and linking into other binaries?

rust/hgcli/src/hgext/base85.rs
22	(On Diff #5238)	I think I'd like to separate things a bit more and have a Python-free module, and then a glue module that we can use to call into the pure Rust. Part of the reason is that in my perfect world we won't use the cpython crate for speedups so they can be used from pypy as well. Separating them at least makes it easier to have an extern "C" version of the method that can be used from cffi instead of only through the CPython API. (Not sure what opinions others have. It's likely that I'll attempt this approach in the near future as part of a continued attempt to speed up `hg diff`.)
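The split suggested here could look roughly like the sketch below: a pure Rust function with no Python types in its signature, plus a thin C-ABI wrapper that cffi could call. The names `b85_required_len` and `hg_b85_required_len` are hypothetical, and the length rule is an assumption modeled on the base85 encoder under review, not code taken from the patch.

```rust
/// Pure Rust core: knows nothing about Python or C.
/// Returns how many base85 characters `input_len` input bytes produce
/// (assumed rule: 5 characters per 4 bytes, shorter tail unless padded).
pub fn b85_required_len(input_len: usize, pad: bool) -> usize {
    let full_groups = input_len / 4 * 5;
    match (input_len % 4, pad) {
        (0, _) => full_groups,
        (_, true) => full_groups + 5,
        (rem, false) => full_groups + rem + 1,
    }
}

/// C-ABI glue (hypothetical name). C has no `bool` in this sketch, so the
/// flag arrives as an int. A real exported symbol would also carry the
/// `no_mangle` attribute so that cffi can find it by name.
pub extern "C" fn hg_b85_required_len(input_len: usize, pad: i32) -> usize {
    b85_required_len(input_len, pad != 0)
}
```

Keeping the wrapper this thin means a CPython extension and a cffi binding could both sit on top of the same pure Rust function.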

I am open to the three-crates plan. Originally I had hgcli and hgext as separate crates, and I was planning to use CFFI mode. I am a pypy user too, so I am willing to provide a crate that is free of the Python C API for pypy and others.

Yes, we should definitely split things into multiple crates. Small, narrowly-focused crates do seem to be the Rust way, after all.

hgcli should be for things specific to the Rust implementation of hg. I think this can also include the feature set of chg (once we've ported chg to Rust).

I definitely support separating the "pure Rust" from the "Python Rust" via a crate boundary. It is generally useful to have Rust that isn't bound to Python because it will facilitate reuse outside of Python contexts. For example, someone could implement a Mercurial wire protocol server in pure Rust without needing to worry about Python. Of course, we're likely to encounter areas where we really want tight coupling in order to achieve optimal performance in Python. So we may have to design APIs on the pure Rust side to facilitate CPython use. I'm OK with that.

As for how many crates to have, I don't have super strong opinions. I could see us putting every little component/subsystem in its own crate. I could also see us putting everything in one large crate. I don't think it is worth deciding at this early juncture. API design and ability to be reused outside its originally intended purpose is the important property to strive for. I think that has more to do with how the code is authored rather than which crates things are in.

A missing piece of this patch is the build system and module loader integration. We have a module policy that dictates which implementation of a Python module we use. We probably want to introduce a rust policy that uses Rust-based modules where available and falls back to the cext modules/policy if a Rust module isn't available. We also need to figure out how to integrate Rust into setup.py. But I think the build system bit can be deferred until we're actually ready to ship Rust, which is still a bit of ways off. I'm happy for the workflow to be run cargo in order to load Rust modules for the time being. But if you can implement Makefile and/or setup.py integration to build these Rust extensions, that would be awesome.

Sure, thank you for the comments! I can definitely prepare makefile and setup.py to make the building process work with rust part. I am planning to change the policy.py module to support and try to load rust modules and run all the tests. I will submit a new patch after finishing these two tasks.

After reading wiki/OxidationPlan again, I plan to change to cffi for better compatibility (pypy and others), and try to build algorithms in pure rust. Shall I wait until I have migrated to a cffi-based solution and then resubmit this patch with all three changes (building, testing, and cffi)?

Thank you!

We generally prefer that patches to Mercurial be small and do a single thing. This makes it easier to review and understand changes, since each change can be evaluated in isolation. If you submit changesets together using hg phabsend, they automatically show up as a stack in Phabricator. And if changesets at the bottom of the stack are ready to land, we generally land those without waiting for the entire stack to land. This enables forward progress to be made and this is generally better for everyone than waiting until a series of commits is perfect before adding any of them.

What that means is you should ideally split this work into smaller parts. For example:

Add the pure Rust code/crate
Add the Python Rust code/crate
Build system / module policy changes

I'm not sure of the order of things though. Since this is the first Rust extension, it's not clear what needs to be implemented in what order. I'm fine looking at a large commit if things are too tightly coupled to separate. But you should strive to make smaller commits.

Thank you @indygreg for your detailed explanation!

I understand the process now, and I will go back reading the developer's guide thoroughly again. I will try my best to provide a relatively clean stack of patches.

Thank you for your time!

I agree with the splitting comments :) In fact there might already be a base85 crate which can be used: https://docs.rs/zero85. Either way I'll hold off on the review, feel free to ping me when you are ready for me to take a look.

What would be the advantage of taking this? Since we already have the C implementation, it's not likely to gain us any performance. On the other hand, it might make a good test case for integrating Rust and Python, finding the right API boundaries and experimenting with different approaches, precisely because we already have a C implementation. @indygreg @durin42 what are your thoughts about it?

As the author of this patch, I actually have the same concern. I started by translating base85 as a baby step toward integrating rust and cpython. Today I modified setup.py, policy.py and the makefile to run hg's test suite with the new base85. For me, it is only a proof of concept.

Maybe I should take another approach: translate more python modules into CFFI style and let CFFI call the rust implementation, gradually replacing more python module implementations with their cffi-style counterparts while keeping the python interface the same. My own hope is that the rust routines will be able to call each other and eventually run some basic tasks without calling into the python part, while rust still lazily provides info to the python interface for extensions etc.

I am exploring this approach now, and hope the findings will help the community make a decision.

Thank you all for the comments!

To be honest, we're not yet sure what we'll decide for the Python -> Rust bridge. The problem is summarized in the Rust <=> Python Interop section on https://www.mercurial-scm.org/wiki/OxidationPlan.

I suspect at some level we'll need a CPython extension for CPython for performance reasons (especially for high volume function calls). PyPy obviously uses CFFI. I think the ideal outcome is we can write Rust that exposes a C API and use CFFI natively on PyPy and something like cbindgen + Milksnake to auto-generate a CPython extension that acts as a wrapper around the C API exposed by Rust. I'm not sure if anyone has invented this exact wheel yet. If not, it's probably faster to use rust-cpython. Maybe several months from now we have enough Rust and maintaining rust-cpython is painful enough that we pursue the auto-generated CPython extension route.

What I'm trying to say is you have a green field to explore! But at this juncture, perfect is the enemy of done. We'll be happy with any forward progress, even failed experiments.

Thank you @indygreg!

The OxidationPlan is my best reference when I started to make a move, and this thread is even more helpful. I am really interested in exploring this ;-) In 2014 I was trying to change the hg backend storage to Postgres, a silly and failed experiment.

Anyway, I will save everyone's time and stop talking. I will come back later with a more meaningful implementation.

merge with stable
translate base85.c into rust code
move hgbase85 into independent module
add hgstorage crate
hg status implementation in rust

Ivzhh retitled this revision from translate base85.c into rust code to rust implementation of hg status. Mar 8 2018, 1:33 AM

Ivzhh edited the summary of this revision.

Hi all,

Based on the discussion a few weeks ago, I came up with a solution for review and discussion. After reading the Oxidation plan, my first thought was to bypass the python engine and the current plugin system if and only if requested. If a user (maybe the background checker of an IDE) requests an r-* subcommand, then hg runs the rust implementation instead of python's. So I tried to make hg r-status as fast as possible. The submitted version has comparable performance (as an example, not evidence: on my MacBook, in hg's own repo, hg r-status takes 150ms and hg status takes 220ms). I am using CodeXL to profile the performance, and plan to use Future.rs to make the loading parallel, which may save another 30ms.
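The r-* routing described in this message can be sketched as a small decision function. This is a minimal illustration, not the dispatch code from the patch: the `Dispatch` enum, the `RUST_COMMANDS` table and the `dispatch` function are hypothetical names, and global options placed before the subcommand are ignored.

```rust
/// Where a command line should be handled (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum Dispatch {
    /// Run the Rust implementation without starting the Python interpreter.
    Rust { command: String, args: Vec<String> },
    /// Hand the untouched argument list to the Python implementation.
    Python(Vec<String>),
}

/// Commands assumed to have a Rust implementation in this sketch.
const RUST_COMMANDS: &[&str] = &["status"];

/// Decide how to run `args` (the arguments after the program name).
/// Anything that is not a known `r-*` subcommand falls back to Python.
pub fn dispatch(args: &[String]) -> Dispatch {
    if let Some((first, rest)) = args.split_first() {
        if let Some(name) = first.strip_prefix("r-") {
            if RUST_COMMANDS.contains(&name) {
                return Dispatch::Rust {
                    command: name.to_owned(),
                    args: rest.to_vec(),
                };
            }
        }
    }
    Dispatch::Python(args.to_vec())
}
```

Because unknown input always falls through to `Dispatch::Python`, plain `hg status` keeps its existing behaviour.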

The implementation is derived from hg's python implementation, because the python version is really fast. I tried to split it into small changes; however, I eventually had to combine the whole hgstorage module into one commit.

Thank you for your comments!

First of all - wow! Thanks for writing all this code. There's definitely a lot to work with. And work with it we will!

This is definitely too big to review as one commit. If you could do *any* work to split it up, it would be greatly appreciated. I'd focus on the pure Rust pieces first. Everything needed to open revlogs would be great!

You may find our custom Phabricator extensions (linked from https://www.mercurial-scm.org/wiki/Phabricator) useful for submitting series of commits to Phabricator.

Regarding the performance, that's pretty good! The dirstate code is some of the most optimized code in Mercurial. There are some gnarly Python C hacks to make it fast. Some of those tricks involve using special system calls to walk directories to minimize the number of system calls. I'm not sure if the crate you imported has those optimizations. (I wouldn't be surprised either way.) I wouldn't worry too much about performance at this juncture. But I suspect we could make the Rust code another 50% faster with some tweaking. It would also be interesting to test on a larger repo, say https://hg.mozilla.org/mozilla-unified. Also, I believe there are hooks in the dirstate code to use Watchman (fsmonitor). Those hooks are critical in order to achieve peak performance on large repositories.

Since you seem to be proficient at writing lots of Rust code, if you are looking for another project, may I suggest porting chg to Rust? That code is in contrib/chg. That might be the easiest component to actually ship in Rust since it is a standalone binary that doesn't link against Python. But we shouldn't get ahead of ourselves :)

Anyway, it is late for me and I need to detach from my computer. I'm sure others will have things to say as well...

rust/hgbase85/build.rs
1	I see this file was copied. There's nothing wrong with that. But does this mean we will need a custom build.rs for each Rust package doing Python? If that's the case, then I would prefer to isolate all our rust-cpython code to a single package, if possible. I'm guessing that could be challenging due to crossing crate boundaries. I'm sure there are places where we don't want to expose symbols outside the crate. I'm curious how others feel about this.
rust/hgcli/src/main.rs
233–261	This is definitely nifty and an impressive achievement \o/ The `r-` commands for testing pure Rust code paths are an interesting idea! I think I'm OK with including support for this in `hgcli`. But I think the code should live in a separate file so it doesn't pollute `main()`. And it should be behind a Cargo feature flag so we maintain compatibility with `hg` as much as possible by default. Also, Mercurial's command line parser is extremely wonky and has some questionable behavior. If the intent is to make `rhg` compatible with `hg`, we would need to preserve this horrible behavior. We'll likely have to write a custom argument parser because of how quirky Mercurial's argument parser is :(
rust/hgstorage/src/config.rs
95	I would not worry about supporting v0 or v2 at this time. v0 is only important for backwards compatibility with ancient repos. And v2 never got off the ground.
rust/hgstorage/src/revlog_v1.rs
279	IIRC, core Mercurial keeps an open file handle on revlogs and ensures we don't run out of file handles by not keeping too many revlogs open at the same time. For scanning operations, not having to open and close the file handles all the time will make a difference for performance. Also, core Mercurial loads the entirety of the `.i` file into memory. That's a scaling problem for large revlogs. But it does make performance of index lookups really fast.
290–293	A thread pool to help with zlib decompression should go a long way here. Probably too early to think about this, but we'll likely eventually want a global thread pool for doing I/O and CPU expensive tasks, such as reading chunks from a revlog and decompressing them. FWIW, we're going to radically alter the storage format in order to better support shallow clones. But that work hasn't started yet. I still think there is a benefit to implementing the revlog code in Rust though.
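To illustrate why keeping the whole `.i` file in memory makes index lookups cheap, here is a minimal sketch of reading one revlog v1 index entry from an in-memory buffer. It assumes the v1 layout of 64-byte big-endian entries (48-bit offset, 16-bit flags, two lengths, base and link revisions, two parents, a 20-byte node padded to 32 bytes), with the version header overlapping the first 4 bytes of entry 0. The type and function names are hypothetical and unrelated to the patch's own revlog code.

```rust
pub const ENTRY_SIZE: usize = 64;

#[derive(Debug, PartialEq)]
pub struct IndexEntry {
    pub offset: u64,
    pub flags: u16,
    pub compressed_len: u32,
    pub uncompressed_len: u32,
    pub base_rev: i32,
    pub link_rev: i32,
    pub p1: i32,
    pub p2: i32,
    pub node: [u8; 20],
}

/// Read a big-endian u32 from the first four bytes of `bytes`.
fn be_u32(bytes: &[u8]) -> u32 {
    let mut buf = [0u8; 4];
    buf.copy_from_slice(&bytes[..4]);
    u32::from_be_bytes(buf)
}

/// Look up revision `rev` in an index held entirely in memory.
/// Returns `None` when the buffer does not contain that entry.
pub fn parse_entry(index: &[u8], rev: usize) -> Option<IndexEntry> {
    let start = rev.checked_mul(ENTRY_SIZE)?;
    let raw = index.get(start..start.checked_add(ENTRY_SIZE)?)?;
    let mut offset_flags =
        (u64::from(be_u32(&raw[0..4])) << 32) | u64::from(be_u32(&raw[4..8]));
    if rev == 0 {
        // Entry 0 shares its first bytes with the version header; its data
        // offset is always 0, so only the flag bits are kept.
        offset_flags &= 0xFFFF;
    }
    let mut node = [0u8; 20];
    node.copy_from_slice(&raw[32..52]);
    Some(IndexEntry {
        offset: offset_flags >> 16,
        flags: (offset_flags & 0xFFFF) as u16,
        compressed_len: be_u32(&raw[8..12]),
        uncompressed_len: be_u32(&raw[12..16]),
        base_rev: be_u32(&raw[16..20]) as i32,
        link_rev: be_u32(&raw[20..24]) as i32,
        p1: be_u32(&raw[24..28]) as i32,
        p2: be_u32(&raw[28..32]) as i32,
        node,
    })
}
```

A lookup is a bounds check plus a handful of fixed-offset reads, which is the property the comment above describes. The cost is that the whole index must fit in memory.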

Doesn't mononoke have code to read revlogs already?

I will try to finish the review later, but it might be quicker if you incorporate some of the changes first since a lot of them are repeated many times. Overall it looks good, there are a couple of things that I would highlight to make the code easier to read.

Prefer more descriptive variable names.
If you can, avoid "pointer" arithmetic. Cursors and slices are nice and have great convenience methods.
Group your flow control and filtering more.
Try to keep your types straight. In rust using the right types helps catch errors. So be aware of char vs u8 vs String vs Vec<char> vs Vec<u8>.

On a higher level, all of this code appears to be treating file names as strings. This isn't really true and will disallow some valid file names. Maybe we should stick with bytes throughout. Of course this makes windows filepaths difficult because they are actually (utf16) strings.

rust/hgbase85/build.rs
1	If this is going to be reused I would move it into its own crate. It seems like everything here could be boiled down to a single function call in main.
rust/hgbase85/src/base85.rs
21	I prefer something like this: https://play.rust-lang.org/?gist=5ca18a5b95335600e911b8f9310ea5c7&version=stable I doubt lazy_static is too slow. Otherwise we can stay like this until const functions get implemented. Either way note: I changed the type of B85CHARS to an array as opposed to an array ref. The loop condition is much nicer.
23	Would it be possible to separate the decode from the python objects. I'm thinking two helper functions. fn b85_required_len(text: &str) -> usize fn b85_encode(text: &str, pad: i32, out: &mut [u8]) -> Result<()>
23	`&str` can only hold valid utf8 data? Does it make more sense to have `&[u8]` here for a list of bytes?
23	IIUC pad is only ever checked `== 0`. Can it be made into a bool?
45	`ptext` isn't very descriptive.
46	Why the braces here?
47	I suspect this type annotation isn't required.
47	It might be best to use a `std::io::Cursor` and let it track `dst_off` for you.
52	while !ptext.is_empty()
54	I would prefer the name `chunk`; even `accum` is a lot more obvious to me than `acc`.
56	for i in &[24, 16, 8, 0]
58	I would just combine these into one line as the name `ch` isn't adding much.
63	Tracking len manually is a smell. Why not drop it and use `ptext.is_empty()`?
91	This is probably worth a comment that this is safe because D85DEC is required to be initialized before this function is called.
152	Let rust do the overflow checking. acc = acc.checked_mul(85) .ok_or_else(|| { PyErr::new::<exc::ValueError, _>( py, format!("bad base85 character at position {}", i)) })?;
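Several of the base85 comments above (a Python-free helper working on `&[u8]`, `pad` as a bool, a shift table instead of index math, `checked_mul` for overflow) can be combined into one sketch. This is an illustration rather than the code from the patch: the function names are hypothetical, the alphabet is assumed to match Mercurial's base85.c, and the decoder only handles one full 5-character group.

```rust
/// Alphabet assumed to match Mercurial's base85.c.
const B85CHARS: [u8; 85] =
    *b"0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz!#$%&()*+-;<=>?@^_`{|}~";

/// Encode `data`; with `pad` set, a short final chunk still yields 5 characters.
pub fn b85_encode(data: &[u8], pad: bool) -> Vec<u8> {
    let mut out = Vec::with_capacity((data.len() + 3) / 4 * 5);
    for chunk in data.chunks(4) {
        // Pack up to four bytes big-endian; missing bytes count as zero.
        let mut accum: u32 = 0;
        for (&byte, &shift) in chunk.iter().zip([24u32, 16, 8, 0].iter()) {
            accum |= u32::from(byte) << shift;
        }
        // Produce the five base85 digits, least significant digit last.
        let mut digits = [0u8; 5];
        for slot in digits.iter_mut().rev() {
            *slot = B85CHARS[(accum % 85) as usize];
            accum /= 85;
        }
        let keep = if pad || chunk.len() == 4 { 5 } else { chunk.len() + 1 };
        out.extend_from_slice(&digits[..keep]);
    }
    out
}

/// Decode one full group, letting checked arithmetic catch overflow.
pub fn b85_decode_group(group: &[u8; 5]) -> Result<u32, String> {
    let mut accum: u32 = 0;
    for (i, &c) in group.iter().enumerate() {
        let digit = B85CHARS
            .iter()
            .position(|&b| b == c)
            .ok_or_else(|| format!("bad base85 character at position {}", i))?;
        accum = accum
            .checked_mul(85)
            .and_then(|a| a.checked_add(digit as u32))
            .ok_or_else(|| format!("base85 overflow at position {}", i))?;
    }
    Ok(accum)
}
```

The linear `position` search keeps the sketch short; a 256-entry decode table, as in the reviewed code, would be the faster choice. Errors are plain `String`s here so that a glue layer can map them to whatever exception type it needs.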
rust/hgstorage/src/changelog.rs
24	Passing a message as a third argument is really useful.
31	If you aren't using the value I would prefer `truncate(NodeId::hex_len())`
33	Just put `msg: content` in the struct construction.
rust/hgstorage/src/config.rs
49	Is this used yet? It probably also needs some documentation because I don't really understand the fields (but I do have little domain knowledge).
78	If you are just going to convert to String I would recommend taking a String argument. Also prefer `.to_owned()` over `.to_string()`.
rust/hgstorage/src/dirstate.rs
5	I recommend not renaming this. It is confusing.
48	This could have a better name.
87	This should probably return a `Result<Self>` and pass the error to the caller.
90	I would skip this check and rely on `p.metadata()`. Just switch `.unwrap()` to `.expect()` with a nicer message. This also handles race conditions more nicely.
108	Does this function need to be public? It seems internal to the constructor. If it doesn't need to be I would prefer it return the Map so that you don't have a partial-constructed DirState.
125	Is ignoring duplicate entries desired? It might be worth a comment explaining why.
130	Don't use `_` prefix for privates. Rely on rust visibility. Also `is_bad` isn't very informative.
141	s/mtc/matcher/
146	let mut grey = Set::new(); grey.extend(self.dmap.keys().map(|s| s.as_path())); Also I would pick a name like `undiscovered_paths` or something. `grey` is cryptic.
152	I would prefer doing the filter before the loop and storing it in a variable.
155	This is probably worth a helper function.
161	Please explain why you are ignoring the error condition.
162	I would just call this `path` or `pathbuf`.
167	I would move this filter beside the filter in the loop.
169	I would also put this filter above. But more importantly all `_is_bad()` does is check for file types. So it seems like the former filter is redundant with this one.
170	You could do the following for a slight performance win and save a line. if let Occupied(entry) = self.dmap.entry(relpath) { ... }
175	Use an `else if`.
183	s/rem/path/ or remaining_path.
184	You can use the entry api here.
199	Please use a better name for `sent`.
206	In rust we generally avoid brackets around `as` as it is very tightly binding.
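Two of the dirstate comments above (return a `Result` to the caller, and have the parsing helper return the finished map so that no partially constructed state is exposed) can be sketched as follows. The layout is an assumption based on the dirstate v1 format: two 20-byte parent nodes, then entries made of a 1-byte state, big-endian mode, size, mtime and name length, followed by the name. All type and function names here are hypothetical.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
pub struct DirstateEntry {
    pub state: u8,
    pub mode: u32,
    pub size: i32,
    pub mtime: i32,
}

#[derive(Debug)]
pub struct Dirstate {
    pub parents: [[u8; 20]; 2],
    pub entries: HashMap<Vec<u8>, DirstateEntry>,
}

/// Read a big-endian u32 from the first four bytes of `bytes`.
fn be_u32(bytes: &[u8]) -> u32 {
    let mut buf = [0u8; 4];
    buf.copy_from_slice(&bytes[..4]);
    u32::from_be_bytes(buf)
}

/// Parse a whole dirstate buffer, or report why it is malformed.
pub fn parse_dirstate(data: &[u8]) -> Result<Dirstate, String> {
    if data.len() < 40 {
        return Err("dirstate shorter than the 40-byte parent header".to_owned());
    }
    let mut parents = [[0u8; 20]; 2];
    parents[0].copy_from_slice(&data[..20]);
    parents[1].copy_from_slice(&data[20..40]);

    let mut entries = HashMap::new();
    let mut rest = &data[40..];
    while !rest.is_empty() {
        if rest.len() < 17 {
            return Err("truncated dirstate entry header".to_owned());
        }
        let name_len = be_u32(&rest[13..17]) as usize;
        let end = 17usize
            .checked_add(name_len)
            .filter(|&e| e <= rest.len())
            .ok_or_else(|| "truncated dirstate file name".to_owned())?;
        // A copy is stored as "destination\0source"; keep the destination only.
        let raw_name = &rest[17..end];
        let name_end = raw_name.iter().position(|&b| b == 0).unwrap_or(raw_name.len());
        entries.insert(
            raw_name[..name_end].to_vec(),
            DirstateEntry {
                state: rest[0],
                mode: be_u32(&rest[1..5]),
                size: be_u32(&rest[5..9]) as i32,
                mtime: be_u32(&rest[9..13]) as i32,
            },
        );
        rest = &rest[end..];
    }
    Ok(Dirstate { parents, entries })
}
```

File names stay `Vec<u8>` throughout, in line with the bytes-versus-strings concern raised earlier in this review.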
rust/hgstorage/src/lib.rs
54	You can add a later `.arg(dst)` to support non-utf8 paths instead of converting to a str here.
rust/hgstorage/src/local_repo.rs
50	I would replace the condition with. assert!(dot_hg_path.exists(), ".hg folder not found for the path given by -R argument: {:?}", p);
66	while !root.join(".hg").exists() { root = root.parent().expect(".hg folder not found"); }
121	s/fp/path/
127	assert!(abspath.exists(), "path not exists: {:?}", abspath);
129	`gd` is cryptic.
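The root-finding loop suggested for line 66 can also be written without panicking, by walking `Path::ancestors()` and returning an `Option`. This is a minimal sketch with a hypothetical function name. It leaves the choice between `expect` and a friendlier error to the caller.

```rust
use std::path::{Path, PathBuf};

/// Walk from `start` towards the filesystem root and return the first
/// directory that contains a `.hg` folder, if any.
pub fn find_repo_root(start: &Path) -> Option<PathBuf> {
    start
        .ancestors()
        .find(|dir| dir.join(".hg").is_dir())
        .map(Path::to_path_buf)
}
```

A caller that wants the original behaviour can still write `find_repo_root(&cwd).expect(".hg folder not found")`.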

kevincox added inline comments. Mar 8 2018, 12:30 PM

rust/hgstorage/src/local_repo.rs
136	Why does it need to be mutable to clone?
155	This test has no assertions. Consider calling it `test_create_...` or something to indicate that you are just checking for panics.
rust/hgstorage/src/manifest.rs
33	What are these magic numbers?
42	s/ent/entry/
49	What are these numbers?
rust/hgstorage/src/matcher.rs
9	s/pat/glob/
10	Might be worth calling `String::with_capacity(pat.len())` since it will be at least that long.
14	Can you manage a `&[u8]` rather than pointer arithmetic for the whole string? It will make me feel better and will probably be easier to read.
108	If you are going to call `String.as_str()` just take a `&str`.
108	s/relglob/relative_glob_re/
111	If you are just doing one call just return the result.
111	You should be able to do `&string` rather than `string.as_str()` as it coerces.
131	Better name please.
143	s/ln/line/
159	Is this a warning or error? You might want to switch to `panic!`.
160	I would move this into the following match because it dedupes the `starts_with` check and puts the logic closer together.
195	s/rp/path/
200	I would do `self.inner.map(|m| m.is_match(rp)).unwrap_or(false)` but this is fine.
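As an illustration of the "`&[u8]` rather than pointer arithmetic" suggestion for the glob translation, the sketch below consumes the pattern as a slice with `split_first`. The translation rules are a simplified assumption (`*` stays within one path component, `**` crosses components, `?` matches one character). Character classes and `{a,b}` alternation are escaped rather than translated, and only ASCII patterns are handled. `glob_to_regex` is a hypothetical name.

```rust
/// Translate a simplified glob into regex source text.
pub fn glob_to_regex(glob: &[u8]) -> String {
    let mut re = String::with_capacity(glob.len());
    let mut rest = glob;
    // `split_first` hands back the next byte and the remaining slice,
    // so no manual index bookkeeping is needed.
    while let Some((&c, tail)) = rest.split_first() {
        rest = tail;
        match c {
            b'*' => {
                if rest.first() == Some(&b'*') {
                    rest = &rest[1..];
                    re.push_str(".*");
                } else {
                    re.push_str("[^/]*");
                }
            }
            b'?' => re.push('.'),
            // Regex metacharacters are escaped so they match literally.
            b'.' | b'+' | b'(' | b')' | b'[' | b']' | b'{' | b'}' | b'^' | b'$'
            | b'|' | b'\\' => {
                re.push('\\');
                re.push(c as char);
            }
            // ASCII-only assumption: other bytes are copied through as-is.
            _ => re.push(c as char),
        }
    }
    re
}
```

The returned string would then be handed to whichever regex engine the matcher uses.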
rust/hgstorage/src/mpatch.rs
13	Spell these out please.
24	struct Fragment { len: u32, offset: u32, }
35	Maybe it's just me but I think it is more common to put the source before the destination.
35	`pull` is very generic.
39	assert!(!src.is_empty())
40	If you are unwrapping the `pop` there is no need for the prior check.
40	s/f/fragment/
51	`mov` is overly shortened and generic.
51	It seems weird to take a cursor to a vec if you are just going to do an absolute seek. Can it work with `&mut [u8]`?
54	`vec![0; count]` works. (The arguments might be the other way around).
68	for &Fragment{frag_len, frag_ofs} in list.iter().rev()
86	Make this one line and don't bother renaming.
120	Please explain.
137	assert!(!frags.is_empty());
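For readers unfamiliar with what the mpatch code manipulates, here is a minimal sketch of applying a single binary delta to a base text. It assumes the delta layout used by Mercurial's bdiff/mpatch: a sequence of hunks, each with three big-endian u32 values (start, end, length of the new data) followed by the data. It does not fold a chain of deltas together, which is the harder part of the real module, and the names are hypothetical.

```rust
/// One hunk: replace `base[start..end]` with `data`.
#[derive(Debug, PartialEq)]
pub struct Fragment<'a> {
    pub start: usize,
    pub end: usize,
    pub data: &'a [u8],
}

/// Read a big-endian u32 from the first four bytes of `bytes`.
fn be_u32(bytes: &[u8]) -> u32 {
    let mut buf = [0u8; 4];
    buf.copy_from_slice(&bytes[..4]);
    u32::from_be_bytes(buf)
}

/// Split a raw delta into fragments that borrow from `patch`.
pub fn parse_patch<'a>(mut patch: &'a [u8]) -> Result<Vec<Fragment<'a>>, String> {
    let mut frags = Vec::new();
    while !patch.is_empty() {
        if patch.len() < 12 {
            return Err("truncated hunk header".to_owned());
        }
        let start = be_u32(&patch[0..4]) as usize;
        let end = be_u32(&patch[4..8]) as usize;
        let len = be_u32(&patch[8..12]) as usize;
        if start > end || patch.len() - 12 < len {
            return Err("invalid hunk".to_owned());
        }
        frags.push(Fragment { start, end, data: &patch[12..12 + len] });
        patch = &patch[12 + len..];
    }
    Ok(frags)
}

/// Apply fragments (sorted by position, non-overlapping) to `base`.
pub fn apply_patch(base: &[u8], frags: &[Fragment<'_>]) -> Result<Vec<u8>, String> {
    let mut out = Vec::with_capacity(base.len());
    let mut pos = 0;
    for f in frags {
        if f.start < pos || f.start > f.end || f.end > base.len() {
            return Err("fragment out of order or out of range".to_owned());
        }
        out.extend_from_slice(&base[pos..f.start]); // unchanged bytes
        out.extend_from_slice(f.data); // replacement bytes
        pos = f.end;
    }
    out.extend_from_slice(&base[pos..]);
    Ok(out)
}
```

Source comes before destination in every signature here, following the ordering preference noted for line 35.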
rust/hgstorage/src/path_encoding.rs
5	const HEX_DIGIT: [u8; 16] = *b"0123456789abcdef";
7	c should be a `u8`.
11	`Vec<char>` is odd. Is there any reason not to use a `String` or `Vec<u8>`
11	Don't pass a `char` by reference. Also it seems your function wants a `u8`.
22	I don't think you need this.
23	This isn't necessary.
25	p.ends_with(".i") || p.ends_with(".d")
34	Take a `&str`
34	`encode_file_name`?
35	Use a String.
57	fn escape(out: &mut String, b: char) { unimplemented!() }
	pub fn encode_path(path: &str) -> String {
	    let mut out = String::with_capacity(path.len());
	    for c in path.bytes() {
	        let c = c as char;
	        match c {
	            'A'..='Z' => {
	                out.push('_');
	                out.push(c.to_ascii_lowercase());
	            }
	            '\\' | ':' | '*' | '?' | '"' | '<' | '>' | '|' => {
	                escape(&mut out, c);
	            }
	            // The rest of the printable range.
	            ' '..='~' => {
	                out.push(c);
	            }
	            _ => {
	                escape(&mut out, c);
	            }
	        }
	    }
	    out
	}
	https://godbolt.org/g/3WCQs3
62	Take a `&str`.
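A byte-oriented variant of the encoder suggested above, with `escape` filled in, might look like the sketch below. The escaping rules are an assumption about Mercurial's store encoding: uppercase letters become `_` plus the lowercase letter, `_` is doubled, and Windows-reserved characters, control bytes and bytes from 126 upward become `~` plus two hex digits. This is not the code from the patch.

```rust
const HEX_DIGITS: [u8; 16] = *b"0123456789abcdef";

/// Append `~xx`, the two-digit lowercase hex form of `b`.
fn escape(out: &mut String, b: u8) {
    out.push('~');
    out.push(HEX_DIGITS[(b >> 4) as usize] as char);
    out.push(HEX_DIGITS[(b & 0x0f) as usize] as char);
}

/// Encode a store path byte by byte; the result is always ASCII.
pub fn encode_path(path: &[u8]) -> String {
    let mut out = String::with_capacity(path.len());
    for &b in path {
        match b {
            b'A'..=b'Z' => {
                out.push('_');
                out.push(b.to_ascii_lowercase() as char);
            }
            b'_' => out.push_str("__"),
            b'\\' | b':' | b'*' | b'?' | b'"' | b'<' | b'>' | b'|' => escape(&mut out, b),
            // Remaining printable ASCII up to '}', including '/', is kept.
            b' '..=b'}' => out.push(b as char),
            // Control bytes, '~' and everything above it are escaped.
            _ => escape(&mut out, b),
        }
    }
    out
}
```

Taking `&[u8]` means a file name that is not valid UTF-8 can still be encoded, and the output can safely be a `String` because every emitted character is ASCII.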

This revision now requires changes to proceed. Mar 8 2018, 12:30 PM

In D2057#43892, @kevincox wrote:

On a higher level, all of these code appears to be treating file names as strings. This isn't really true and will disallow some valid file names. Maybe we should stick with bytes throughout. Of course this makes windows filepaths difficult because they are actually (utf16) strings.

Mercurial tries to be principled about always treating filenames as bytes. AIUI https://www.mercurial-scm.org/wiki/WindowsUTF8Plan is still the plan of record there?

In D2057#43987, @durin42 wrote:

Mercurial tries to be principled about always treating filenames as bytes. AIUI https://www.mercurial-scm.org/wiki/WindowsUTF8Plan is still the plan of record there?

Reading that page it seems to claim that filenames should be utf8, not bytes. If utf8, this is what the code does, but if it is bytes that definitely won't work.

In D2057#43988, @kevincox wrote:

Reading that page it seems to claim that filenames should be utf8, not bytes. If utf8, this is what the code does, but if it is bytes that definitely won't work.

IIRC it's bytes everyplace except Windows, where we pretend utf8 is real?

We may have to make adjustments to this plan on macOS with APFS, but I'm not sure about that yet.

In D2057#43989, @durin42 wrote:

IIRC it's bytes everyplace except Windows, where we pretend utf8 is real?
We may have to make adjustments to this plan on macOS with APFS, but I'm not sure about that yet.

I think we want to express a path as a dedicated type which has different underlying storage depending on the platform (bytes on Linux, UTF-16 on Windows). All filesystem operations should take a Path instance to operate on. This is the only way to cleanly round trip filenames between the OS, the application, and back to the OS. That leaves us with the hard problem of normalizing Mercurial's storage representation of paths (bytes) with the operating system's. But this world is strictly better than today, where we lose path data from the OS because we use POSIX APIs.

FWIW, Python 3 rewrote the I/O layer to use Win32 APIs everywhere. Combined with the pathlib types, I'm pretty sure Python 3 can round trip paths on Windows. I also think Rust's path type(s) have OS-dependent functionality.

Rust has platform-independent types PathBuf and &Path for paths and OsString and &OsStr for strings (owned and borrowed, respectively). They do have OS-specific extensions, but as long as you don't use them the code should be cross-platform. That being said, if you are serializing and deserializing them you may need to write some platform-dependent code.
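The platform-dependent serialization step mentioned here could be isolated behind a pair of helpers, as in this sketch. On Unix it uses the standard `OsStrExt` and `OsStringExt` extension traits, so any byte sequence round-trips. The non-Unix branch is only a UTF-8 placeholder, and a real Windows conversion would need its own design. The function names are hypothetical.

```rust
use std::path::{Path, PathBuf};

/// Serialize a path to the bytes stored in repository data.
#[cfg(unix)]
pub fn path_to_bytes(path: &Path) -> Option<Vec<u8>> {
    use std::os::unix::ffi::OsStrExt;
    Some(path.as_os_str().as_bytes().to_vec())
}

/// Placeholder for other platforms: succeeds only for Unicode paths.
#[cfg(not(unix))]
pub fn path_to_bytes(path: &Path) -> Option<Vec<u8>> {
    path.to_str().map(|s| s.as_bytes().to_vec())
}

/// Turn stored bytes back into a path usable with filesystem APIs.
#[cfg(unix)]
pub fn bytes_to_path(bytes: &[u8]) -> Option<PathBuf> {
    use std::os::unix::ffi::OsStringExt;
    Some(PathBuf::from(std::ffi::OsString::from_vec(bytes.to_vec())))
}

/// Placeholder for other platforms: succeeds only for valid UTF-8.
#[cfg(not(unix))]
pub fn bytes_to_path(bytes: &[u8]) -> Option<PathBuf> {
    std::str::from_utf8(bytes).ok().map(PathBuf::from)
}
```

Everything outside these two functions can then work with `Path` and `PathBuf` and stay platform-independent.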

Hi everyone,

Thank you for your encouragements and comments! I will follow up with all comments and update the code soon.

@indygreg It is a great idea to test on Mozilla repo, actually I found several things interesting:

I found a bug in my code (shame on me): I did not use a byte literal and I made a typo. This triggers a problem in the Mozilla unified repo.
A regexp pattern in the hgignore of the Mozilla unified repo, the negative lookahead "(?!)", is not supported by rust's regex crate. I chose to ignore these unsupported patterns.
My version is slower in this repo: 70s (hg) and 90s (mine). CodeXL reveals that the mpatch::collect() function uses 63% of the running time. I think I need to optimize it somehow.

I totally agree with @kevincox that I did not keep char/u8/str/String/Path/PathBuf straight. The first bug is caused by this. I need to improve them.

Thank you everyone!

Reading that page it seems to claim that filenames should be utf8, not bytes. If utf8, this is what the code does, but if it is bytes that definitely won't work.

IIRC it's bytes everyplace except Windows, where we pretend utf8 is real?

It's MBCS (i.e. ANSI multi-byte characters) on Windows. The plan was to support
both MBCS and UTF-8-variant on Windows, but that isn't a thing yet.

Perhaps we'll have to write a platform compatibility layer (or serialization/deserialization
layer) on top of the Rust's file API, something like vfs.py we have in Python code.

In D2057#44269, @yuja wrote:

Reading that page it seems to claim that filenames should be utf8, not bytes. If utf8, this is what the code does, but if it is bytes that definitely won't work.

IIRC it's bytes everyplace except Windows, where we pretend utf8 is real?

It's MBCS (i.e. ANSI multi-byte characters) on Windows. The plan was to support
both MBCS and UTF-8-variant on Windows, but that isn't a thing yet.
Perhaps we'll have to write a platform compatibility layer (or serialization/deserialization
layer) on top of the Rust's file API, something like vfs.py we have in Python code.

Thank you for confirming that. I was a bit confused when I read the Encoding Plan wiki page. I am looking at Mozilla's rust winapi bindings, let me see if I can directly wrap around winapi::um::fileapi::FindFirstFileA

I am looking at Mozilla's rust winapi bindings, let me see if I can directly wrap around winapi::um::fileapi::FindFirstFileA

That's probably a hard way. I was thinking of something converting
between OsStr (i.e. Path) and MBCS bytes by using Win32 API, instead
of calling out the "A" API.

https://msdn.microsoft.com/en-us/library/windows/desktop/dd319072(v=vs.85).aspx

We don't do that in Python, but Rust's type system will help making it right.

https://crates.io/crates/local-encoding seems to be the right choice.

I'm not a windows expert but it seems like the rust OsStr, Path and filesystem APIs should handle these conversions for you. I think the only place where you would need to do os-specific code is when doing serialization and deserialization which I think should be handled by https://doc.rust-lang.org/std/os/unix/ffi/trait.OsStringExt.html and https://doc.rust-lang.org/std/os/windows/ffi/trait.OsStringExt.html.

I think the only place where you would need to do os-specific code is when
doing serialization and deserialization

Yes, that will be feasible in strictly typed language like Rust.

which I think should be handled by https://doc.rust-lang.org/std/os/unix/ffi/trait.OsStringExt.html
and https://doc.rust-lang.org/std/os/windows/ffi/trait.OsStringExt.html.

Not true for Windows because Rust uses Unicode (UTF-16-ish) API, whereas
Python 2 does ANSI. We need to convert a "wide" string to a locale-dependent string.

Maybe the local-encoding crate will do that for us?

I think the only place where you would need to do os-specific code is when
doing serialization and deserialization

Yes, that will be feasible in strictly typed language like Rust.

To be clear, I meant serialization/deserialization between filesystem path and
internal dirstate/manifest path, not between dirstate storage and in-memory
dirstate object.

Ah, I forgot to consider the python interop. Now the need for that crate makes sense. Thanks for explaining.

add revlog and mpatch facilities
add changelog parsing
add manifest parsing
path encoding for data store
add dirstate and matcher facilities
add local repository and the supporting modules
use cargo fmt to format code
add hg r-status command
bincode 1.0.0 is a bit slow in my test
delay pattern matching during dir walk
optimize out trie and enable CodeXL profiling
use hashmap
remove thread pool
rust default read is not buf-ed, this is the key of slowness
change to globset
convert glob to regex
hg ignore patterns are all converted to regex (as hg does), and now it is faster
filter dir early to prevent walking
Update matcher mod after testing Mozilla unified repo
bug fix: use byte literals instead of numbers
hg store path encoding is per byte style, update code according to Kevin Cox's comments
update matcher testing according to Match interface change
If clap fails to recognize r-* subcommands, then run python-version hg
changelog coding style revised
remove legacy revlog v0 and unfinished v2.
partially revise the dirstate reviews
remove duplicated build.rs, let the executable module guarantee the python
use cursor in base85 encoding, reducing raw index-math
use cursor in base85 decoding, reducing raw index-math
dirstate update according to review comments
config update according to review comments
mpatch rename to more meaningful names
simplify matcher as when there is no syntax named in the beginning, use regexp
local repo coding style update
dirstate coding style update
manifest coding style update
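One commit note in the list above attributes the early slowness to unbuffered reads. The effect can be sketched with `std::io::BufReader`: many tiny `read_exact` calls on a raw `File` each become a system call, whereas wrapping the source in a buffer serves most of them from memory. The function name below is hypothetical and the fixed-size word format is only an example, not the revlog reader from the patch.

```rust
use std::io::{self, BufReader, Read};

/// Read consecutive big-endian u32 values from any byte source.
/// The source is wrapped in a `BufReader`, so the small 4-byte reads
/// do not each hit the underlying file.
pub fn read_be_u32s<R: Read>(source: R) -> io::Result<Vec<u32>> {
    let mut reader = BufReader::new(source);
    let mut values = Vec::new();
    let mut word = [0u8; 4];
    loop {
        match reader.read_exact(&mut word) {
            Ok(()) => values.push(u32::from_be_bytes(word)),
            // End of input; a trailing partial word is silently dropped here.
            Err(ref e) if e.kind() == io::ErrorKind::UnexpectedEof => break,
            Err(e) => return Err(e),
        }
    }
    Ok(values)
}
```

Passing `std::fs::File::open(path)?` as `source` gives the buffered behaviour. Passing an in-memory `Cursor` works the same way, which is convenient in tests.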

Ivzhh marked 45 inline comments as done. Mar 21 2018, 2:55 PM

Ivzhh added inline comments.

rust/hgbase85/src/base85.rs
23	It should be any &[u8], but the current cpython crate doesn't wrap for &[u8]. I think I need to fork and add that part. I put it in my checklist now.
23	This crate is my previous attempt to integrate Rust into hg. Right now I guess my main pursuit is to add hg r-* commands for Rust. I will follow your suggestion when I am implementing the wire protocol and reuse the code for a pure Rust crate.
23	pad is a bool; however, when I checked it in hg-python, ints are passed to the function. I guess I need to update the cpython wrapper for this, with a broader logical conversion.
46	I guess it is because of NLL. When I started the work, the Rust compiler reported a borrow-check error on this part. I later read an article about the NLL update in Rust. Before that, I used the braces to avoid the error.
91	when I removed the unsafe, I got error: error[E0133]: use of mutable static requires unsafe function or block
rust/hgcli/src/main.rs
233–261	Thank you for the suggestion! I guess I need to extend clap later to support hg style command line. Right now whenever clap cannot handle the argument parsing, I will redirect the arguments to hg directly.
rust/hgstorage/src/changelog.rs
31	I guess I will use the rest of the info later. hg seems to put some metadata in the commit comments. I will keep it for now. Thank you!
rust/hgstorage/src/config.rs
78	I like to_owned(); I will use it on later occasions. Thank you!
95	Sure, I will use v1 only for now. In the beginning I kinda over designed this part.
rust/hgstorage/src/dirstate.rs
48	I remember the python hg uses the name, in the beginning, I tried to replicate py-hg's behaviour. But I think it needs to be renamed. I agree with you.
108	I think dir state needs to 1. read an existing one; 2. create one if it does not exist; maybe private for now.
152	For the filter, I follow the example in the walkdir doc. I guess what I want is to skip the dir for later recursive visiting.
161	I add the error handling back
170	I kind of get borrow check compile error here. Later I use Occupied() when possible.
rust/hgstorage/src/local_repo.rs
136	I think LRU will update reference count (or timestamp) when the data is accessed.
rust/hgstorage/src/matcher.rs
14	I borrowed this logic as a whole from the Python code. It will need some time to re-translate it in a non-pointer-arithmetic way.
rust/hgstorage/src/mpatch.rs
51	This part, including the stream-style, is from python part. I will update later with xi-rope.
rust/hgstorage/src/revlog_v1.rs
279	I think it explains why in mercurial repo, rust version is significantly faster. I am working on cpu future, but I did not finalize design style yet. I will keep working on that.
290–293	I guess I did this because I met some empty change delta in the beginning. I think I won't try to parallelize unzip for now.
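The E0133 error mentioned for line 91 comes from touching a `static mut`, which Rust only permits inside `unsafe`. The sketch below is not the code from the patch. It assumes a lazily filled base85 decode table, and `B85_DECODE`, `init_table` and `decode_digit` are hypothetical names. It shows how each `unsafe` block can carry a comment stating why the access is sound.

```rust
// Base85 alphabet in the git/hg style (85 symbols).
const B85_CHARS: &[u8] =
    b"0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz!#$%&()*+-;<=>?@^_`{|}~";

// Hypothetical decode table: 0 means "not a base85 digit",
// any other value is the digit value plus one.
static mut B85_DECODE: [u8; 256] = [0; 256];

/// Fill the decode table. Must be called once before `decode_digit`.
fn init_table() {
    for (i, &c) in B85_CHARS.iter().enumerate() {
        // SAFETY: this sketch assumes init_table runs on a single
        // thread before any reader exists, so nothing else accesses
        // the table while it is being written.
        unsafe {
            B85_DECODE[c as usize] = (i + 1) as u8;
        }
    }
}

/// Map one input byte to its base85 digit value, if it is a digit.
fn decode_digit(c: u8) -> Option<u8> {
    // SAFETY: the table is only read after init_table has finished,
    // and it is never written again afterwards.
    let v = unsafe { B85_DECODE[c as usize] };
    if v == 0 {
        None
    } else {
        Some(v - 1)
    }
}
```

The compiler cannot check the claims in the `SAFETY` comments. They record the invariant that callers are expected to uphold.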

In D2057#46726, @yuja wrote:

I think the only place where you would need to do os-specific code is when
doing serialization and deserialization

Yes, that will be feasible in a strictly typed language like Rust.

To be clear, I meant serialization/deserialization between filesystem path and
internal dirstate/manifest path, not between dirstate storage and in-memory
dirstate object.

I guess your suggestion is like this: @yuja

if it is windows and the code page is MBCS, try to decode the paths read from manifest and dirstate into unicode equivalent
use utf internally and with rust IO api
when writing back to dirstate and manifest, encode utf to MBCS

Please let me know if I have misunderstood. Thank you!

In D2057#46980, @Ivzhh wrote:

In D2057#46726, @yuja wrote:

I think the only place where you would need to do os-specific code is when
doing serialization and deserialization

Yes, that will be feasible in a strictly typed language like Rust.

To be clear, I meant serialization/deserialization between filesystem path and
internal dirstate/manifest path, not between dirstate storage and in-memory
dirstate object.

I guess your suggestion is like this: @yuja

if it is windows and the code page is MBCS, try to decode the paths read from manifest and dirstate into unicode equivalent

use utf internally and with rust IO api

when writing back to dirstate and manifest, encode utf to MBCS

No. My suggestion is:

keep manifest/dirstate paths as bytes (which are probably wrapped by some type, say HgPath)
but we want to use Rust's standard library for I/O
so, add utility function/trait to convert HgPath to Path/PathBuf, where MBCS-Wide conversion will occur.

I think raw byte paths will be needed to build store paths (e.g. .hg/store/data/~2eclang-format.i).

https://www.mercurial-scm.org/repo/hg/file/4.5.2/mercurial/store.py
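The suggestion above could be sketched roughly as follows. `HgPath` is the name proposed in the comment, but its methods here (`from_bytes`, `as_bytes`, `to_path_buf`) are hypothetical. The non-Unix branch is only a placeholder: it assumes UTF-8 bytes, whereas a real Windows implementation would perform the MBCS-to-wide conversion mentioned above.

```rust
use std::path::PathBuf;

/// Hypothetical wrapper around the raw path bytes stored in the
/// manifest and dirstate. The bytes are kept untouched so they can
/// still be used to build store paths.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct HgPath(Vec<u8>);

impl HgPath {
    pub fn from_bytes(bytes: &[u8]) -> HgPath {
        HgPath(bytes.to_vec())
    }

    /// Raw bytes, e.g. as input for store path encoding.
    pub fn as_bytes(&self) -> &[u8] {
        &self.0
    }

    /// On Unix, filesystem paths are byte strings, so no decoding is needed.
    #[cfg(unix)]
    pub fn to_path_buf(&self) -> PathBuf {
        use std::ffi::OsStr;
        use std::os::unix::ffi::OsStrExt;
        PathBuf::from(OsStr::from_bytes(&self.0))
    }

    /// Placeholder for other platforms: assumes the bytes are UTF-8 and
    /// replaces invalid sequences. A real implementation would convert
    /// from the active code page to wide characters here.
    #[cfg(not(unix))]
    pub fn to_path_buf(&self) -> PathBuf {
        PathBuf::from(String::from_utf8_lossy(&self.0).into_owned())
    }
}
```

With this split, all OS-specific conversion lives in `to_path_buf`, and the rest of the code can pass `HgPath` values around as plain bytes.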

The latest changes are looking really good. I have a couple more comments but I didn't have time for a full review. I'll try to get more reviewed tomorrow. It seems that you still have a lot of stuff in-flight so I'll try to slowly review the changes as I have time. If you want input/feedback on any particular part just ask and I will prioritize it.

This change is very large so it might be worth splitting off a smaller component and getting that submitted first. However I do realize that for starting out it is often helpful to get some actual use cases implemented before committing the base structures.

rust/hgbase85/src/base85.rs
23	I get that, but I still think it makes the code easier to read when the python-interop and the logic are separated where it is easy to do so.
91	I meant safe not in the sense that it didn't need the unsafe keyword, but in that the use of the `unsafe` block is safe. It should really be called the `trust_me,_I_know_this_is_safe` block. But since you are not getting the compiler checking it is often useful to add a comment why the action you are performing is correct. In this case it is correct because the caller initializes this variable before the function is called.
106	If this computation only depends on `len` it would be nice to put it in a helper function.
rust/hgcli/src/main.rs
250	These are `HashSet`'s which don't have a defined iterator order. IIRC the python implementation sorts the results which is probably desirable.
rust/hgstorage/src/config.rs
51	A link to the mentioned wiki page would be very helpful to readers.
rust/hgstorage/src/dirstate.rs
104	Switch the return type to `std::io::Result` and then you can have:
	let metadata = p.metadata()?;
	let mtime = metadata.modified()?;
	// ...
170	Sorry, I misunderstood the logic. You can do this:
	diff -r ccc683587fdb rust/hgstorage/src/dirstate.rs
	--- a/rust/hgstorage/src/dirstate.rs Sat Mar 24 10:05:53 2018 +0000
	+++ b/rust/hgstorage/src/dirstate.rs Sat Mar 24 10:14:58 2018 +0000
	@@ -184,8 +184,7 @@
	     continue;
	 }
	 
	-if self.dir_state_map.contains_key(rel_path) {
	-    let dir_entry = &self.dir_state_map[rel_path];
	+if let Some(dir_entry) = self.dir_state_map.get(rel_path) {
	     files_not_in_walkdir.remove(rel_path);
	     DirState::check_status(&mut res, abs_path, rel_path, dir_entry);
	 } else if !matcher.check_path_ignored(rel_path.to_str().unwrap()) {
264	Does it make sense to make `DirStateEntry.mtime` be a `std::time::SystemTime` and convert upon reading the structure in? If not I would prefer doing the conversion here:
	else if mtd.modified().unwrap() == UNIX_EPOCH + Duration::from_secs(dir_entry.mtime as u64) {
	(Maybe extract the system time to higher up, or even a helper function on dir_entry)
rust/hgstorage/src/local_repo.rs
136	Actually I didn't realize that RwLock doesn't get a regular `get()` since it is doing a compile time borrow check. https://doc.rust-lang.org/std/sync/struct.RwLock.html#method.get_mut. My mistake, the code is fine.
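Two of the dirstate suggestions above (a single `get` lookup instead of `contains_key` plus indexing, and a helper on the entry that converts `mtime` to a `SystemTime`) could be combined roughly as in this sketch. The trimmed-down `DirStateEntry`, the `String` keys and the `is_unchanged` function are assumptions for illustration and do not come from the patch. The sketch also assumes the stored `mtime` has whole-second resolution, so it truncates the filesystem time before comparing.

```rust
use std::collections::HashMap;
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Hypothetical, trimmed-down dirstate entry.
struct DirStateEntry {
    /// Modification time in seconds since the Unix epoch.
    mtime: i32,
}

impl DirStateEntry {
    /// Helper on the entry: stored mtime as a `SystemTime`.
    /// Negative values are clamped to the epoch in this sketch.
    fn modified(&self) -> SystemTime {
        UNIX_EPOCH + Duration::from_secs(self.mtime.max(0) as u64)
    }
}

/// Drop the sub-second part so the value is comparable with the
/// whole-second mtime stored in the entry.
fn truncate_to_secs(t: SystemTime) -> SystemTime {
    match t.duration_since(UNIX_EPOCH) {
        Ok(d) => UNIX_EPOCH + Duration::from_secs(d.as_secs()),
        Err(_) => t, // before the epoch: leave untouched
    }
}

/// `None` if the path is not tracked, otherwise whether the mtime matches.
fn is_unchanged(
    map: &HashMap<String, DirStateEntry>,
    rel_path: &str,
    fs_mtime: SystemTime,
) -> Option<bool> {
    // One lookup instead of `contains_key` followed by indexing.
    if let Some(entry) = map.get(rel_path) {
        Some(entry.modified() == truncate_to_secs(fs_mtime))
    } else {
        None
    }
}
```

In real code `fs_mtime` would come from `path.metadata()?.modified()?`, with the enclosing function returning `std::io::Result`.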

There seems to have been no activity on this Diff for the past 3 months.

By policy, we are automatically moving it out of the need-review state.

Please, move it back to need-review without hesitation if this diff should still be discussed.

:baymax:need-review-idle:

This revision now requires changes to proceed.Jan 24 2020, 12:33 PM

Revision Contents
Changeset List

		Path
M		rust/Cargo.lock (412 lines)
M		rust/Cargo.toml (8 lines)
A	M	rust/hgbase85/Cargo.toml (36 lines)
A	M	rust/hgbase85/build.rs (129 lines)
A	M	rust/hgbase85/src/base85.rs (387 lines)
A	M	rust/hgbase85/src/cpython_ext.rs (26 lines)
A	M	rust/hgbase85/src/lib.rs (104 lines)
M		rust/hgcli/Cargo.toml (2 lines)
M		rust/hgcli/build.rs (12 lines)
M		rust/hgcli/src/main.rs (49 lines)
A	M	rust/hgstorage/Cargo.toml (31 lines)
A	M	rust/hgstorage/src/changelog.rs (37 lines)
A	M	rust/hgstorage/src/config.rs (98 lines)
A	M	rust/hgstorage/src/dirstate.rs (229 lines)
A	M	rust/hgstorage/src/lib.rs (64 lines)
A	M	rust/hgstorage/src/local_repo.rs (177 lines)
A	M	rust/hgstorage/src/manifest.rs (119 lines)
A	M	rust/hgstorage/src/matcher.rs (248 lines)
A	M	rust/hgstorage/src/mpatch.rs (158 lines)
A	M	rust/hgstorage/src/path_encoding.rs (190 lines)
A	M	rust/hgstorage/src/repository.rs (5 lines)
A	M	rust/hgstorage/src/revlog.rs (82 lines)
A	M	rust/hgstorage/src/revlog_v1.rs (422 lines)
A	M	rust/hgstorage/src/working_context.rs (108 lines)

Diff	ID	Description	Created	Lint	Unit
Base		Base
Diff 1	5238		Feb 5 2018, 5:31 PM	★	★
Diff 2	6724	- merge with stable	Mar 8 2018, 1:30 AM	★	★
Diff 3	7188	- add revlog and mpatch facilities	Mar 21 2018, 2:23 PM	★	★

Commit	Local	Parents	Author	Summary	Date
a1f330a046c7	36910	1553b0717b91	Sheng Mao	hg status implementation in rust (Show More…)	Mar 8 2018, 1:22 AM
1553b0717b91	36639	c10cd444c82c	Sheng Mao	add hgstorage crate	Mar 5 2018, 3:07 PM
c10cd444c82c	36638	c6b017545ece	Sheng Mao	move hgbase85 into independent module	Mar 5 2018, 2:59 PM
c6b017545ece	35952	34287c2e3fc4	Sheng Mao	translate base85.c into rust code (Show More…)	Feb 5 2018, 5:21 PM
34287c2e3fc4	35830	4fb2bb61597c 1d60ad093792	Augie Fackler	merge with stable	Feb 1 2018, 2:28 PM

Diff 6724

rust/Cargo.lock

	[[package]]			[[package]]
				name = "adler32"
				version = "1.0.2"
				source = "registry+https://github.com/rust-lang/crates.io-index"

				[[package]]
	name = "aho-corasick"			name = "aho-corasick"
	version = "0.5.3"			version = "0.5.3"
	source = "registry+https://github.com/rust-lang/crates.io-index"			source = "registry+https://github.com/rust-lang/crates.io-index"
	dependencies = [			dependencies = [
	"memchr 0.1.11 (registry+https://github.com/rust-lang/crates.io-index)",			"memchr 0.1.11 (registry+https://github.com/rust-lang/crates.io-index)",
	]			]

	[[package]]			[[package]]
				name = "aho-corasick"
				version = "0.6.4"
				source = "registry+https://github.com/rust-lang/crates.io-index"
				dependencies = [
				"memchr 2.0.1 (registry+https://github.com/rust-lang/crates.io-index)",
				]

				[[package]]
				name = "ansi_term"
				version = "0.10.2"
				source = "registry+https://github.com/rust-lang/crates.io-index"

				[[package]]
				name = "atty"
				version = "0.2.6"
				source = "registry+https://github.com/rust-lang/crates.io-index"
				dependencies = [
				"libc 0.2.36 (registry+https://github.com/rust-lang/crates.io-index)",
				"termion 1.5.1 (registry+https://github.com/rust-lang/crates.io-index)",
				"winapi 0.3.4 (registry+https://github.com/rust-lang/crates.io-index)",
				]

				[[package]]
				name = "bitflags"
				version = "1.0.1"
				source = "registry+https://github.com/rust-lang/crates.io-index"

				[[package]]
				name = "build_const"
				version = "0.2.0"
				source = "registry+https://github.com/rust-lang/crates.io-index"

				[[package]]
				name = "byteorder"
				version = "1.2.1"
				source = "registry+https://github.com/rust-lang/crates.io-index"

				[[package]]
				name = "cc"
				version = "1.0.5"
				source = "registry+https://github.com/rust-lang/crates.io-index"

				[[package]]
				name = "clap"
				version = "2.30.0"
				source = "registry+https://github.com/rust-lang/crates.io-index"
				dependencies = [
				"ansi_term 0.10.2 (registry+https://github.com/rust-lang/crates.io-index)",
				"atty 0.2.6 (registry+https://github.com/rust-lang/crates.io-index)",
				"bitflags 1.0.1 (registry+https://github.com/rust-lang/crates.io-index)",
				"strsim 0.7.0 (registry+https://github.com/rust-lang/crates.io-index)",
				"textwrap 0.9.0 (registry+https://github.com/rust-lang/crates.io-index)",
				"unicode-width 0.1.4 (registry+https://github.com/rust-lang/crates.io-index)",
				"vec_map 0.8.0 (registry+https://github.com/rust-lang/crates.io-index)",
				]

				[[package]]
	name = "cpython"			name = "cpython"
	version = "0.1.0"			version = "0.1.0"
	source = "git+https://github.com/indygreg/rust-cpython.git?rev=c90d65cf84abfffce7ef54476bbfed56017a2f52#c90d65cf84abfffce7ef54476bbfed56017a2f52"			source = "git+https://github.com/indygreg/rust-cpython.git?rev=c90d65cf84abfffce7ef54476bbfed56017a2f52#c90d65cf84abfffce7ef54476bbfed56017a2f52"
	dependencies = [			dependencies = [
	"libc 0.2.35 (registry+https://github.com/rust-lang/crates.io-index)",			"libc 0.2.36 (registry+https://github.com/rust-lang/crates.io-index)",
	"num-traits 0.1.41 (registry+https://github.com/rust-lang/crates.io-index)",			"num-traits 0.1.42 (registry+https://github.com/rust-lang/crates.io-index)",
	"python27-sys 0.1.2 (git+https://github.com/indygreg/rust-cpython.git?rev=c90d65cf84abfffce7ef54476bbfed56017a2f52)",			"python27-sys 0.1.2 (git+https://github.com/indygreg/rust-cpython.git?rev=c90d65cf84abfffce7ef54476bbfed56017a2f52)",
	]			]

	[[package]]			[[package]]
				name = "crc"
				version = "1.7.0"
				source = "registry+https://github.com/rust-lang/crates.io-index"
				dependencies = [
				"build_const 0.2.0 (registry+https://github.com/rust-lang/crates.io-index)",
				]

				[[package]]
				name = "flate2"
				version = "1.0.1"
				source = "registry+https://github.com/rust-lang/crates.io-index"
				dependencies = [
				"libc 0.2.36 (registry+https://github.com/rust-lang/crates.io-index)",
				"miniz_oxide_c_api 0.1.2 (registry+https://github.com/rust-lang/crates.io-index)",
				]

				[[package]]
				name = "fuchsia-zircon"
				version = "0.3.3"
				source = "registry+https://github.com/rust-lang/crates.io-index"
				dependencies = [
				"bitflags 1.0.1 (registry+https://github.com/rust-lang/crates.io-index)",
				"fuchsia-zircon-sys 0.3.3 (registry+https://github.com/rust-lang/crates.io-index)",
				]

				[[package]]
				name = "fuchsia-zircon-sys"
				version = "0.3.3"
				source = "registry+https://github.com/rust-lang/crates.io-index"

				[[package]]
				name = "hex"
				version = "0.3.1"
				source = "registry+https://github.com/rust-lang/crates.io-index"

				[[package]]
	name = "hgcli"			name = "hgcli"
	version = "0.1.0"			version = "0.1.0"
	dependencies = [			dependencies = [
				"clap 2.30.0 (registry+https://github.com/rust-lang/crates.io-index)",
	"cpython 0.1.0 (git+https://github.com/indygreg/rust-cpython.git?rev=c90d65cf84abfffce7ef54476bbfed56017a2f52)",			"cpython 0.1.0 (git+https://github.com/indygreg/rust-cpython.git?rev=c90d65cf84abfffce7ef54476bbfed56017a2f52)",
	"libc 0.2.35 (registry+https://github.com/rust-lang/crates.io-index)",			"hgstorage 0.1.0",
				"libc 0.2.36 (registry+https://github.com/rust-lang/crates.io-index)",
	"python27-sys 0.1.2 (git+https://github.com/indygreg/rust-cpython.git?rev=c90d65cf84abfffce7ef54476bbfed56017a2f52)",			"python27-sys 0.1.2 (git+https://github.com/indygreg/rust-cpython.git?rev=c90d65cf84abfffce7ef54476bbfed56017a2f52)",
	]			]

	[[package]]			[[package]]
				name = "hgext"
				version = "0.1.0"
				dependencies = [
				"cpython 0.1.0 (git+https://github.com/indygreg/rust-cpython.git?rev=c90d65cf84abfffce7ef54476bbfed56017a2f52)",
				"libc 0.2.36 (registry+https://github.com/rust-lang/crates.io-index)",
				"python27-sys 0.1.2 (git+https://github.com/indygreg/rust-cpython.git?rev=c90d65cf84abfffce7ef54476bbfed56017a2f52)",
				]

				[[package]]
				name = "hgstorage"
				version = "0.1.0"
				dependencies = [
				"byteorder 1.2.1 (registry+https://github.com/rust-lang/crates.io-index)",
				"flate2 1.0.1 (registry+https://github.com/rust-lang/crates.io-index)",
				"hex 0.3.1 (registry+https://github.com/rust-lang/crates.io-index)",
				"lazy_static 1.0.0 (registry+https://github.com/rust-lang/crates.io-index)",
				"libc 0.2.36 (registry+https://github.com/rust-lang/crates.io-index)",
				"lru-cache 0.1.1 (registry+https://github.com/rust-lang/crates.io-index)",
				"num_cpus 1.8.0 (registry+https://github.com/rust-lang/crates.io-index)",
				"regex 0.2.6 (registry+https://github.com/rust-lang/crates.io-index)",
				"sha1 0.6.0 (registry+https://github.com/rust-lang/crates.io-index)",
				"tempdir 0.3.6 (registry+https://github.com/rust-lang/crates.io-index)",
				"threadpool 1.7.1 (registry+https://github.com/rust-lang/crates.io-index)",
				"walkdir 2.1.4 (registry+https://github.com/rust-lang/crates.io-index)",
				]

				[[package]]
	name = "kernel32-sys"			name = "kernel32-sys"
	version = "0.2.2"			version = "0.2.2"
	source = "registry+https://github.com/rust-lang/crates.io-index"			source = "registry+https://github.com/rust-lang/crates.io-index"
	dependencies = [			dependencies = [
	"winapi 0.2.8 (registry+https://github.com/rust-lang/crates.io-index)",			"winapi 0.2.8 (registry+https://github.com/rust-lang/crates.io-index)",
	"winapi-build 0.1.1 (registry+https://github.com/rust-lang/crates.io-index)",			"winapi-build 0.1.1 (registry+https://github.com/rust-lang/crates.io-index)",
	]			]

	[[package]]			[[package]]
				name = "lazy_static"
				version = "1.0.0"
				source = "registry+https://github.com/rust-lang/crates.io-index"

				[[package]]
	name = "libc"			name = "libc"
	version = "0.2.35"			version = "0.2.36"
	source = "registry+https://github.com/rust-lang/crates.io-index"			source = "registry+https://github.com/rust-lang/crates.io-index"

	[[package]]			[[package]]
				name = "linked-hash-map"
				version = "0.4.2"
				source = "registry+https://github.com/rust-lang/crates.io-index"

				[[package]]
				name = "lru-cache"
				version = "0.1.1"
				source = "registry+https://github.com/rust-lang/crates.io-index"
				dependencies = [
				"linked-hash-map 0.4.2 (registry+https://github.com/rust-lang/crates.io-index)",
				]

				[[package]]
	name = "memchr"			name = "memchr"
	version = "0.1.11"			version = "0.1.11"
	source = "registry+https://github.com/rust-lang/crates.io-index"			source = "registry+https://github.com/rust-lang/crates.io-index"
	dependencies = [			dependencies = [
	"libc 0.2.35 (registry+https://github.com/rust-lang/crates.io-index)",			"libc 0.2.36 (registry+https://github.com/rust-lang/crates.io-index)",
				]

				[[package]]
				name = "memchr"
				version = "2.0.1"
				source = "registry+https://github.com/rust-lang/crates.io-index"
				dependencies = [
				"libc 0.2.36 (registry+https://github.com/rust-lang/crates.io-index)",
				]

				[[package]]
				name = "miniz_oxide"
				version = "0.1.2"
				source = "registry+https://github.com/rust-lang/crates.io-index"
				dependencies = [
				"adler32 1.0.2 (registry+https://github.com/rust-lang/crates.io-index)",
				"libc 0.2.36 (registry+https://github.com/rust-lang/crates.io-index)",
				]

				[[package]]
				name = "miniz_oxide_c_api"
				version = "0.1.2"
				source = "registry+https://github.com/rust-lang/crates.io-index"
				dependencies = [
				"cc 1.0.5 (registry+https://github.com/rust-lang/crates.io-index)",
				"crc 1.7.0 (registry+https://github.com/rust-lang/crates.io-index)",
				"libc 0.2.36 (registry+https://github.com/rust-lang/crates.io-index)",
				"miniz_oxide 0.1.2 (registry+https://github.com/rust-lang/crates.io-index)",
	]			]

	[[package]]			[[package]]
	name = "num-traits"			name = "num-traits"
	version = "0.1.41"			version = "0.1.42"
	source = "registry+https://github.com/rust-lang/crates.io-index"			source = "registry+https://github.com/rust-lang/crates.io-index"

	[[package]]			[[package]]
				name = "num_cpus"
				version = "1.8.0"
				source = "registry+https://github.com/rust-lang/crates.io-index"
				dependencies = [
				"libc 0.2.36 (registry+https://github.com/rust-lang/crates.io-index)",
				]

				[[package]]
	name = "python27-sys"			name = "python27-sys"
	version = "0.1.2"			version = "0.1.2"
	source = "git+https://github.com/indygreg/rust-cpython.git?rev=c90d65cf84abfffce7ef54476bbfed56017a2f52#c90d65cf84abfffce7ef54476bbfed56017a2f52"			source = "git+https://github.com/indygreg/rust-cpython.git?rev=c90d65cf84abfffce7ef54476bbfed56017a2f52#c90d65cf84abfffce7ef54476bbfed56017a2f52"
	dependencies = [			dependencies = [
	"libc 0.2.35 (registry+https://github.com/rust-lang/crates.io-index)",			"libc 0.2.36 (registry+https://github.com/rust-lang/crates.io-index)",
	"regex 0.1.80 (registry+https://github.com/rust-lang/crates.io-index)",			"regex 0.1.80 (registry+https://github.com/rust-lang/crates.io-index)",
	]			]

	[[package]]			[[package]]
				name = "rand"
				version = "0.4.2"
				source = "registry+https://github.com/rust-lang/crates.io-index"
				dependencies = [
				"fuchsia-zircon 0.3.3 (registry+https://github.com/rust-lang/crates.io-index)",
				"libc 0.2.36 (registry+https://github.com/rust-lang/crates.io-index)",
				"winapi 0.3.4 (registry+https://github.com/rust-lang/crates.io-index)",
				]

				[[package]]
				name = "redox_syscall"
				version = "0.1.37"
				source = "registry+https://github.com/rust-lang/crates.io-index"

				[[package]]
				name = "redox_termios"
				version = "0.1.1"
				source = "registry+https://github.com/rust-lang/crates.io-index"
				dependencies = [
				"redox_syscall 0.1.37 (registry+https://github.com/rust-lang/crates.io-index)",
				]

				[[package]]
	name = "regex"			name = "regex"
	version = "0.1.80"			version = "0.1.80"
	source = "registry+https://github.com/rust-lang/crates.io-index"			source = "registry+https://github.com/rust-lang/crates.io-index"
	dependencies = [			dependencies = [
	"aho-corasick 0.5.3 (registry+https://github.com/rust-lang/crates.io-index)",			"aho-corasick 0.5.3 (registry+https://github.com/rust-lang/crates.io-index)",
	"memchr 0.1.11 (registry+https://github.com/rust-lang/crates.io-index)",			"memchr 0.1.11 (registry+https://github.com/rust-lang/crates.io-index)",
	"regex-syntax 0.3.9 (registry+https://github.com/rust-lang/crates.io-index)",			"regex-syntax 0.3.9 (registry+https://github.com/rust-lang/crates.io-index)",
	"thread_local 0.2.7 (registry+https://github.com/rust-lang/crates.io-index)",			"thread_local 0.2.7 (registry+https://github.com/rust-lang/crates.io-index)",
	"utf8-ranges 0.1.3 (registry+https://github.com/rust-lang/crates.io-index)",			"utf8-ranges 0.1.3 (registry+https://github.com/rust-lang/crates.io-index)",
	]			]

	[[package]]			[[package]]
				name = "regex"
				version = "0.2.6"
				source = "registry+https://github.com/rust-lang/crates.io-index"
				dependencies = [
				"aho-corasick 0.6.4 (registry+https://github.com/rust-lang/crates.io-index)",
				"memchr 2.0.1 (registry+https://github.com/rust-lang/crates.io-index)",
				"regex-syntax 0.4.2 (registry+https://github.com/rust-lang/crates.io-index)",
				"thread_local 0.3.5 (registry+https://github.com/rust-lang/crates.io-index)",
				"utf8-ranges 1.0.0 (registry+https://github.com/rust-lang/crates.io-index)",
				]

				[[package]]
	name = "regex-syntax"			name = "regex-syntax"
	version = "0.3.9"			version = "0.3.9"
	source = "registry+https://github.com/rust-lang/crates.io-index"			source = "registry+https://github.com/rust-lang/crates.io-index"

	[[package]]			[[package]]
				name = "regex-syntax"
				version = "0.4.2"
				source = "registry+https://github.com/rust-lang/crates.io-index"

				[[package]]
				name = "remove_dir_all"
				version = "0.3.0"
				source = "registry+https://github.com/rust-lang/crates.io-index"
				dependencies = [
				"kernel32-sys 0.2.2 (registry+https://github.com/rust-lang/crates.io-index)",
				"winapi 0.2.8 (registry+https://github.com/rust-lang/crates.io-index)",
				]

				[[package]]
				name = "same-file"
				version = "1.0.2"
				source = "registry+https://github.com/rust-lang/crates.io-index"
				dependencies = [
				"winapi 0.3.4 (registry+https://github.com/rust-lang/crates.io-index)",
				]

				[[package]]
				name = "sha1"
				version = "0.6.0"
				source = "registry+https://github.com/rust-lang/crates.io-index"

				[[package]]
				name = "strsim"
				version = "0.7.0"
				source = "registry+https://github.com/rust-lang/crates.io-index"

				[[package]]
				name = "tempdir"
				version = "0.3.6"
				source = "registry+https://github.com/rust-lang/crates.io-index"
				dependencies = [
				"rand 0.4.2 (registry+https://github.com/rust-lang/crates.io-index)",
				"remove_dir_all 0.3.0 (registry+https://github.com/rust-lang/crates.io-index)",
				]

				[[package]]
				name = "termion"
				version = "1.5.1"
				source = "registry+https://github.com/rust-lang/crates.io-index"
				dependencies = [
				"libc 0.2.36 (registry+https://github.com/rust-lang/crates.io-index)",
				"redox_syscall 0.1.37 (registry+https://github.com/rust-lang/crates.io-index)",
				"redox_termios 0.1.1 (registry+https://github.com/rust-lang/crates.io-index)",
				]

				[[package]]
				name = "textwrap"
				version = "0.9.0"
				source = "registry+https://github.com/rust-lang/crates.io-index"
				dependencies = [
				"unicode-width 0.1.4 (registry+https://github.com/rust-lang/crates.io-index)",
				]

				[[package]]
	name = "thread-id"			name = "thread-id"
	version = "2.0.0"			version = "2.0.0"
	source = "registry+https://github.com/rust-lang/crates.io-index"			source = "registry+https://github.com/rust-lang/crates.io-index"
	dependencies = [			dependencies = [
	"kernel32-sys 0.2.2 (registry+https://github.com/rust-lang/crates.io-index)",			"kernel32-sys 0.2.2 (registry+https://github.com/rust-lang/crates.io-index)",
	"libc 0.2.35 (registry+https://github.com/rust-lang/crates.io-index)",			"libc 0.2.36 (registry+https://github.com/rust-lang/crates.io-index)",
	]			]

	[[package]]			[[package]]
	name = "thread_local"			name = "thread_local"
	version = "0.2.7"			version = "0.2.7"
	source = "registry+https://github.com/rust-lang/crates.io-index"			source = "registry+https://github.com/rust-lang/crates.io-index"
	dependencies = [			dependencies = [
	"thread-id 2.0.0 (registry+https://github.com/rust-lang/crates.io-index)",			"thread-id 2.0.0 (registry+https://github.com/rust-lang/crates.io-index)",
	]			]

	[[package]]			[[package]]
				name = "thread_local"
				version = "0.3.5"
				source = "registry+https://github.com/rust-lang/crates.io-index"
				dependencies = [
				"lazy_static 1.0.0 (registry+https://github.com/rust-lang/crates.io-index)",
				"unreachable 1.0.0 (registry+https://github.com/rust-lang/crates.io-index)",
				]

				[[package]]
				name = "threadpool"
				version = "1.7.1"
				source = "registry+https://github.com/rust-lang/crates.io-index"
				dependencies = [
				"num_cpus 1.8.0 (registry+https://github.com/rust-lang/crates.io-index)",
				]

				[[package]]
				name = "unicode-width"
				version = "0.1.4"
				source = "registry+https://github.com/rust-lang/crates.io-index"

				[[package]]
				name = "unreachable"
				version = "1.0.0"
				source = "registry+https://github.com/rust-lang/crates.io-index"
				dependencies = [
				"void 1.0.2 (registry+https://github.com/rust-lang/crates.io-index)",
				]

				[[package]]
	name = "utf8-ranges"			name = "utf8-ranges"
	version = "0.1.3"			version = "0.1.3"
	source = "registry+https://github.com/rust-lang/crates.io-index"			source = "registry+https://github.com/rust-lang/crates.io-index"

	[[package]]			[[package]]
				name = "utf8-ranges"
				version = "1.0.0"
				source = "registry+https://github.com/rust-lang/crates.io-index"

				[[package]]
				name = "vec_map"
				version = "0.8.0"
				source = "registry+https://github.com/rust-lang/crates.io-index"

				[[package]]
				name = "void"
				version = "1.0.2"
				source = "registry+https://github.com/rust-lang/crates.io-index"

				[[package]]
				name = "walkdir"
				version = "2.1.4"
				source = "registry+https://github.com/rust-lang/crates.io-index"
				dependencies = [
				"same-file 1.0.2 (registry+https://github.com/rust-lang/crates.io-index)",
				"winapi 0.3.4 (registry+https://github.com/rust-lang/crates.io-index)",
				]

				[[package]]
	name = "winapi"			name = "winapi"
	version = "0.2.8"			version = "0.2.8"
	source = "registry+https://github.com/rust-lang/crates.io-index"			source = "registry+https://github.com/rust-lang/crates.io-index"

	[[package]]			[[package]]
				name = "winapi"
				version = "0.3.4"
				source = "registry+https://github.com/rust-lang/crates.io-index"
				dependencies = [
				"winapi-i686-pc-windows-gnu 0.4.0 (registry+https://github.com/rust-lang/crates.io-index)",
				"winapi-x86_64-pc-windows-gnu 0.4.0 (registry+https://github.com/rust-lang/crates.io-index)",
				]

				[[package]]
	name = "winapi-build"			name = "winapi-build"
	version = "0.1.1"			version = "0.1.1"
	source = "registry+https://github.com/rust-lang/crates.io-index"			source = "registry+https://github.com/rust-lang/crates.io-index"

				[[package]]
				name = "winapi-i686-pc-windows-gnu"
				version = "0.4.0"
				source = "registry+https://github.com/rust-lang/crates.io-index"

				[[package]]
				name = "winapi-x86_64-pc-windows-gnu"
				version = "0.4.0"
				source = "registry+https://github.com/rust-lang/crates.io-index"

	[metadata]			[metadata]
				"checksum adler32 1.0.2 (registry+https://github.com/rust-lang/crates.io-index)" = "6cbd0b9af8587c72beadc9f72d35b9fbb070982c9e6203e46e93f10df25f8f45"
	"checksum aho-corasick 0.5.3 (registry+https://github.com/rust-lang/crates.io-index)" = "ca972c2ea5f742bfce5687b9aef75506a764f61d37f8f649047846a9686ddb66"			"checksum aho-corasick 0.5.3 (registry+https://github.com/rust-lang/crates.io-index)" = "ca972c2ea5f742bfce5687b9aef75506a764f61d37f8f649047846a9686ddb66"
				"checksum aho-corasick 0.6.4 (registry+https://github.com/rust-lang/crates.io-index)" = "d6531d44de723825aa81398a6415283229725a00fa30713812ab9323faa82fc4"
				"checksum ansi_term 0.10.2 (registry+https://github.com/rust-lang/crates.io-index)" = "6b3568b48b7cefa6b8ce125f9bb4989e52fbcc29ebea88df04cc7c5f12f70455"
				"checksum atty 0.2.6 (registry+https://github.com/rust-lang/crates.io-index)" = "8352656fd42c30a0c3c89d26dea01e3b77c0ab2af18230835c15e2e13cd51859"
				"checksum bitflags 1.0.1 (registry+https://github.com/rust-lang/crates.io-index)" = "b3c30d3802dfb7281680d6285f2ccdaa8c2d8fee41f93805dba5c4cf50dc23cf"
				"checksum build_const 0.2.0 (registry+https://github.com/rust-lang/crates.io-index)" = "e90dc84f5e62d2ebe7676b83c22d33b6db8bd27340fb6ffbff0a364efa0cb9c9"
				"checksum byteorder 1.2.1 (registry+https://github.com/rust-lang/crates.io-index)" = "652805b7e73fada9d85e9a6682a4abd490cb52d96aeecc12e33a0de34dfd0d23"
				"checksum cc 1.0.5 (registry+https://github.com/rust-lang/crates.io-index)" = "9be26b24e988625409b19736d130f0c7d224f01d06454b5f81d8d23d6c1a618f"
				"checksum clap 2.30.0 (registry+https://github.com/rust-lang/crates.io-index)" = "1c07b9257a00f3fc93b7f3c417fc15607ec7a56823bc2c37ec744e266387de5b"
	"checksum cpython 0.1.0 (git+https://github.com/indygreg/rust-cpython.git?rev=c90d65cf84abfffce7ef54476bbfed56017a2f52)" = "<none>"			"checksum cpython 0.1.0 (git+https://github.com/indygreg/rust-cpython.git?rev=c90d65cf84abfffce7ef54476bbfed56017a2f52)" = "<none>"
				"checksum crc 1.7.0 (registry+https://github.com/rust-lang/crates.io-index)" = "bd5d02c0aac6bd68393ed69e00bbc2457f3e89075c6349db7189618dc4ddc1d7"
				"checksum flate2 1.0.1 (registry+https://github.com/rust-lang/crates.io-index)" = "9fac2277e84e5e858483756647a9d0aa8d9a2b7cba517fd84325a0aaa69a0909"
				"checksum fuchsia-zircon 0.3.3 (registry+https://github.com/rust-lang/crates.io-index)" = "2e9763c69ebaae630ba35f74888db465e49e259ba1bc0eda7d06f4a067615d82"
				"checksum fuchsia-zircon-sys 0.3.3 (registry+https://github.com/rust-lang/crates.io-index)" = "3dcaa9ae7725d12cdb85b3ad99a434db70b468c09ded17e012d86b5c1010f7a7"
				"checksum hex 0.3.1 (registry+https://github.com/rust-lang/crates.io-index)" = "459d3cf58137bb02ad4adeef5036377ff59f066dbb82517b7192e3a5462a2abc"
	"checksum kernel32-sys 0.2.2 (registry+https://github.com/rust-lang/crates.io-index)" = "7507624b29483431c0ba2d82aece8ca6cdba9382bff4ddd0f7490560c056098d"			"checksum kernel32-sys 0.2.2 (registry+https://github.com/rust-lang/crates.io-index)" = "7507624b29483431c0ba2d82aece8ca6cdba9382bff4ddd0f7490560c056098d"
	"checksum libc 0.2.35 (registry+https://github.com/rust-lang/crates.io-index)" = "96264e9b293e95d25bfcbbf8a88ffd1aedc85b754eba8b7d78012f638ba220eb"			"checksum lazy_static 1.0.0 (registry+https://github.com/rust-lang/crates.io-index)" = "c8f31047daa365f19be14b47c29df4f7c3b581832407daabe6ae77397619237d"
				"checksum libc 0.2.36 (registry+https://github.com/rust-lang/crates.io-index)" = "1e5d97d6708edaa407429faa671b942dc0f2727222fb6b6539bf1db936e4b121"
				"checksum linked-hash-map 0.4.2 (registry+https://github.com/rust-lang/crates.io-index)" = "7860ec297f7008ff7a1e3382d7f7e1dcd69efc94751a2284bafc3d013c2aa939"
				"checksum lru-cache 0.1.1 (registry+https://github.com/rust-lang/crates.io-index)" = "4d06ff7ff06f729ce5f4e227876cb88d10bc59cd4ae1e09fbb2bde15c850dc21"
	"checksum memchr 0.1.11 (registry+https://github.com/rust-lang/crates.io-index)" = "d8b629fb514376c675b98c1421e80b151d3817ac42d7c667717d282761418d20"			"checksum memchr 0.1.11 (registry+https://github.com/rust-lang/crates.io-index)" = "d8b629fb514376c675b98c1421e80b151d3817ac42d7c667717d282761418d20"
	"checksum num-traits 0.1.41 (registry+https://github.com/rust-lang/crates.io-index)" = "cacfcab5eb48250ee7d0c7896b51a2c5eec99c1feea5f32025635f5ae4b00070"			"checksum memchr 2.0.1 (registry+https://github.com/rust-lang/crates.io-index)" = "796fba70e76612589ed2ce7f45282f5af869e0fdd7cc6199fa1aa1f1d591ba9d"
				"checksum miniz_oxide 0.1.2 (registry+https://github.com/rust-lang/crates.io-index)" = "aaa2d3ad070f428fffbd7d3ca2ea20bb0d8cffe9024405c44e1840bc1418b398"
				"checksum miniz_oxide_c_api 0.1.2 (registry+https://github.com/rust-lang/crates.io-index)" = "92d98fdbd6145645828069b37ea92ca3de225e000d80702da25c20d3584b38a5"
				"checksum num-traits 0.1.42 (registry+https://github.com/rust-lang/crates.io-index)" = "9936036cc70fe4a8b2d338ab665900323290efb03983c86cbe235ae800ad8017"
				"checksum num_cpus 1.8.0 (registry+https://github.com/rust-lang/crates.io-index)" = "c51a3322e4bca9d212ad9a158a02abc6934d005490c054a2778df73a70aa0a30"
	"checksum python27-sys 0.1.2 (git+https://github.com/indygreg/rust-cpython.git?rev=c90d65cf84abfffce7ef54476bbfed56017a2f52)" = "<none>"			"checksum python27-sys 0.1.2 (git+https://github.com/indygreg/rust-cpython.git?rev=c90d65cf84abfffce7ef54476bbfed56017a2f52)" = "<none>"
				"checksum rand 0.4.2 (registry+https://github.com/rust-lang/crates.io-index)" = "eba5f8cb59cc50ed56be8880a5c7b496bfd9bd26394e176bc67884094145c2c5"
				"checksum redox_syscall 0.1.37 (registry+https://github.com/rust-lang/crates.io-index)" = "0d92eecebad22b767915e4d529f89f28ee96dbbf5a4810d2b844373f136417fd"
				"checksum redox_termios 0.1.1 (registry+https://github.com/rust-lang/crates.io-index)" = "7e891cfe48e9100a70a3b6eb652fef28920c117d366339687bd5576160db0f76"
	"checksum regex 0.1.80 (registry+https://github.com/rust-lang/crates.io-index)" = "4fd4ace6a8cf7860714a2c2280d6c1f7e6a413486c13298bbc86fd3da019402f"			"checksum regex 0.1.80 (registry+https://github.com/rust-lang/crates.io-index)" = "4fd4ace6a8cf7860714a2c2280d6c1f7e6a413486c13298bbc86fd3da019402f"
				"checksum regex 0.2.6 (registry+https://github.com/rust-lang/crates.io-index)" = "5be5347bde0c48cfd8c3fdc0766cdfe9d8a755ef84d620d6794c778c91de8b2b"
	"checksum regex-syntax 0.3.9 (registry+https://github.com/rust-lang/crates.io-index)" = "f9ec002c35e86791825ed294b50008eea9ddfc8def4420124fbc6b08db834957"			"checksum regex-syntax 0.3.9 (registry+https://github.com/rust-lang/crates.io-index)" = "f9ec002c35e86791825ed294b50008eea9ddfc8def4420124fbc6b08db834957"
				"checksum regex-syntax 0.4.2 (registry+https://github.com/rust-lang/crates.io-index)" = "8e931c58b93d86f080c734bfd2bce7dd0079ae2331235818133c8be7f422e20e"
				"checksum remove_dir_all 0.3.0 (registry+https://github.com/rust-lang/crates.io-index)" = "b5d2f806b0fcdabd98acd380dc8daef485e22bcb7cddc811d1337967f2528cf5"
				"checksum same-file 1.0.2 (registry+https://github.com/rust-lang/crates.io-index)" = "cfb6eded0b06a0b512c8ddbcf04089138c9b4362c2f696f3c3d76039d68f3637"
				"checksum sha1 0.6.0 (registry+https://github.com/rust-lang/crates.io-index)" = "2579985fda508104f7587689507983eadd6a6e84dd35d6d115361f530916fa0d"
				"checksum strsim 0.7.0 (registry+https://github.com/rust-lang/crates.io-index)" = "bb4f380125926a99e52bc279241539c018323fab05ad6368b56f93d9369ff550"
				"checksum tempdir 0.3.6 (registry+https://github.com/rust-lang/crates.io-index)" = "f73eebdb68c14bcb24aef74ea96079830e7fa7b31a6106e42ea7ee887c1e134e"
				"checksum termion 1.5.1 (registry+https://github.com/rust-lang/crates.io-index)" = "689a3bdfaab439fd92bc87df5c4c78417d3cbe537487274e9b0b2dce76e92096"
				"checksum textwrap 0.9.0 (registry+https://github.com/rust-lang/crates.io-index)" = "c0b59b6b4b44d867f1370ef1bd91bfb262bf07bf0ae65c202ea2fbc16153b693"
	"checksum thread-id 2.0.0 (registry+https://github.com/rust-lang/crates.io-index)" = "a9539db560102d1cef46b8b78ce737ff0bb64e7e18d35b2a5688f7d097d0ff03"			"checksum thread-id 2.0.0 (registry+https://github.com/rust-lang/crates.io-index)" = "a9539db560102d1cef46b8b78ce737ff0bb64e7e18d35b2a5688f7d097d0ff03"
	"checksum thread_local 0.2.7 (registry+https://github.com/rust-lang/crates.io-index)" = "8576dbbfcaef9641452d5cf0df9b0e7eeab7694956dd33bb61515fb8f18cfdd5"			"checksum thread_local 0.2.7 (registry+https://github.com/rust-lang/crates.io-index)" = "8576dbbfcaef9641452d5cf0df9b0e7eeab7694956dd33bb61515fb8f18cfdd5"
				"checksum thread_local 0.3.5 (registry+https://github.com/rust-lang/crates.io-index)" = "279ef31c19ededf577bfd12dfae728040a21f635b06a24cd670ff510edd38963"
				"checksum threadpool 1.7.1 (registry+https://github.com/rust-lang/crates.io-index)" = "e2f0c90a5f3459330ac8bc0d2f879c693bb7a2f59689c1083fc4ef83834da865"
				"checksum unicode-width 0.1.4 (registry+https://github.com/rust-lang/crates.io-index)" = "bf3a113775714a22dcb774d8ea3655c53a32debae63a063acc00a91cc586245f"
				"checksum unreachable 1.0.0 (registry+https://github.com/rust-lang/crates.io-index)" = "382810877fe448991dfc7f0dd6e3ae5d58088fd0ea5e35189655f84e6814fa56"
	"checksum utf8-ranges 0.1.3 (registry+https://github.com/rust-lang/crates.io-index)" = "a1ca13c08c41c9c3e04224ed9ff80461d97e121589ff27c753a16cb10830ae0f"			"checksum utf8-ranges 0.1.3 (registry+https://github.com/rust-lang/crates.io-index)" = "a1ca13c08c41c9c3e04224ed9ff80461d97e121589ff27c753a16cb10830ae0f"
				"checksum utf8-ranges 1.0.0 (registry+https://github.com/rust-lang/crates.io-index)" = "662fab6525a98beff2921d7f61a39e7d59e0b425ebc7d0d9e66d316e55124122"
				"checksum vec_map 0.8.0 (registry+https://github.com/rust-lang/crates.io-index)" = "887b5b631c2ad01628bbbaa7dd4c869f80d3186688f8d0b6f58774fbe324988c"
				"checksum void 1.0.2 (registry+https://github.com/rust-lang/crates.io-index)" = "6a02e4885ed3bc0f2de90ea6dd45ebcbb66dacffe03547fadbb0eeae2770887d"
				"checksum walkdir 2.1.4 (registry+https://github.com/rust-lang/crates.io-index)" = "63636bd0eb3d00ccb8b9036381b526efac53caf112b7783b730ab3f8e44da369"
	"checksum winapi 0.2.8 (registry+https://github.com/rust-lang/crates.io-index)" = "167dc9d6949a9b857f3451275e911c3f44255842c1f7a76f33c55103a909087a"			"checksum winapi 0.2.8 (registry+https://github.com/rust-lang/crates.io-index)" = "167dc9d6949a9b857f3451275e911c3f44255842c1f7a76f33c55103a909087a"
				"checksum winapi 0.3.4 (registry+https://github.com/rust-lang/crates.io-index)" = "04e3bd221fcbe8a271359c04f21a76db7d0c6028862d1bb5512d85e1e2eb5bb3"
	"checksum winapi-build 0.1.1 (registry+https://github.com/rust-lang/crates.io-index)" = "2d315eee3b34aca4797b2da6b13ed88266e6d612562a0c46390af8299fc699bc"			"checksum winapi-build 0.1.1 (registry+https://github.com/rust-lang/crates.io-index)" = "2d315eee3b34aca4797b2da6b13ed88266e6d612562a0c46390af8299fc699bc"
				"checksum winapi-i686-pc-windows-gnu 0.4.0 (registry+https://github.com/rust-lang/crates.io-index)" = "ac3b87c63620426dd9b991e5ce0329eff545bccbbb34f3be09ff6fb6ab51b7b6"
				"checksum winapi-x86_64-pc-windows-gnu 0.4.0 (registry+https://github.com/rust-lang/crates.io-index)" = "712e227841d057c1ee1cd2fb22fa7e5a5461ae8e48fa2ca79ec42cfc1931183f"

rust/Cargo.toml

	[workspace]			[workspace]
	members = ["hgcli"]			members = ["hgcli", "hgbase85", "hgstorage"]

				[profile.release]
				debug = true

				[profile.debug]
				debug = true

rust/hgbase85/Cargo.toml

This file was added.

				[package]
				name = "hgext"
				version = "0.1.0"
				authors = ["Sheng Mao <shngmao@gmail.com>"]
				license = "GPL-2.0"

				build = "build.rs"

				[lib]
				name = "hgbase85"
				crate-type = ["cdylib", "rlib"]

				[features]
				# localdev: detect Python in PATH and use files from source checkout.
				default = ["localdev"]
				localdev = []

				[dependencies]
				libc = "0.2.34"

				# We currently use a custom build of cpython and python27-sys with the
				# following changes:
				# * GILGuard call of prepare_freethreaded_python() is removed.
				# TODO switch to official release when our changes are incorporated.
				[dependencies.cpython]
				version = "0.1"
				default-features = false
				features = ["python27-sys"]
				git = "https://github.com/indygreg/rust-cpython.git"
				rev = "c90d65cf84abfffce7ef54476bbfed56017a2f52"

				[dependencies.python27-sys]
				version = "0.1.2"
				git = "https://github.com/indygreg/rust-cpython.git"
				rev = "c90d65cf84abfffce7ef54476bbfed56017a2f52"

rust/hgbase85/build.rs

This file was added.

				// build.rs -- Configure build environment for `hgcli` Rust package.
				indygreg (Unsubmitted, Not Done): I see this file was copied. There's nothing wrong with that. But does this mean we will need a custom build.rs for each Rust package doing Python? If that's the case, then I would prefer to isolate all our rust-cpython code to a single package, if possible. I'm guessing that could be challenging due to crossing crate boundaries. I'm sure there are places where we don't want to expose symbols outside the crate. I'm curious how others feel about this.
				kevincox (Unsubmitted, Not Done): If this is going to be reused I would move it into its own crate. It seems like everything here could be boiled down to a single function call in main.
				//
				// Copyright 2017 Gregory Szorc <gregory.szorc@gmail.com>
				//
				// This software may be used and distributed according to the terms of the
				// GNU General Public License version 2 or any later version.

				use std::collections::HashMap;
				use std::env;
				use std::path::Path;
				use std::process::Command;

				struct PythonConfig {
				python: String,
				config: HashMap<String, String>,
				}

				fn get_python_config() -> PythonConfig {
				// The python27-sys crate exports a Cargo variable defining the full
				// path to the interpreter being used.
				let python = env::var("DEP_PYTHON27_PYTHON_INTERPRETER")
				.expect("Missing DEP_PYTHON27_PYTHON_INTERPRETER; bad python27-sys crate?");

				println!("{}", python);
				if !Path::new(&python).exists() {
				panic!(
				"Python interpreter {} does not exist; this should never happen",
				python
				);
				}

				// This is a bit hacky but it gets the job done.
				let separator = "SEPARATOR STRING";

				let script = "import sysconfig; \
				c = sysconfig.get_config_vars(); \
				print('SEPARATOR STRING'.join('%s=%s' % i for i in c.items()))";

				let mut command = Command::new(&python);
				command.arg("-c").arg(script);

				let out = command.output().unwrap();

				if !out.status.success() {
				panic!(
				"python script failed: {}",
				String::from_utf8_lossy(&out.stderr)
				);
				}

				let stdout = String::from_utf8_lossy(&out.stdout);
				let mut m = HashMap::new();

				for entry in stdout.split(separator) {
				let mut parts = entry.splitn(2, "=");
				let key = parts.next().unwrap();
				let value = parts.next().unwrap();
				m.insert(String::from(key), String::from(value));
				}

				PythonConfig {
				python: python,
				config: m,
				}
				}

				#[cfg(not(target_os = "windows"))]
				fn have_shared(config: &PythonConfig) -> bool {
				match config.config.get("Py_ENABLE_SHARED") {
				Some(value) => value == "1",
				None => false,
				}
				}

				#[cfg(target_os = "windows")]
				fn have_shared(config: &PythonConfig) -> bool {
				use std::path::PathBuf;

				// python27.dll should exist next to python2.7.exe.
				let mut dll = PathBuf::from(&config.python);
				dll.pop();
				dll.push("python27.dll");

				return dll.exists();
				}

				const REQUIRED_CONFIG_FLAGS: [&str; 2] = ["Py_USING_UNICODE", "WITH_THREAD"];

				fn main() {
				let config = get_python_config();

				println!("Using Python: {}", config.python);
				println!("cargo:rustc-env=PYTHON_INTERPRETER={}", config.python);

				let prefix = config.config.get("prefix").unwrap();

				println!("Prefix: {}", prefix);

				// TODO Windows builds don't expose these config flags. Figure out another
				// way.
				#[cfg(not(target_os = "windows"))]
				for key in REQUIRED_CONFIG_FLAGS.iter() {
				let result = match config.config.get(*key) {
				Some(value) => value == "1",
				None => false,
				};

				if !result {
				panic!("Detected Python requires feature {}", key);
				}
				}

				println!("have shared {}", have_shared(&config));

				// We need a Python shared library.
				if !have_shared(&config) {
				panic!("Detected Python lacks a shared library, which is required");
				}

				let ucs4 = match config.config.get("Py_UNICODE_SIZE") {
				Some(value) => value == "4",
				None => false,
				};

				if !ucs4 {
				#[cfg(not(target_os = "windows"))]
				panic!("Detected Python doesn't support UCS-4 code points");
				}
				}
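If the build logic is moved into a shared helper crate, as the comments on this file suggest, the sysconfig parsing step could be factored out roughly as follows. This is only a sketch: `parse_config_vars` is a hypothetical name, not something in the patch, and unlike `get_python_config()` it skips entries without an `=` instead of panicking and trims surrounding whitespace (the last entry carries the trailing newline from `print`).

```rust
use std::collections::HashMap;

/// Hypothetical helper: split the `KEY=VALUE` entries that the Python
/// one-liner joins with `separator` and collect them into a map.
pub fn parse_config_vars(stdout: &str, separator: &str) -> HashMap<String, String> {
    let mut vars = HashMap::new();
    for entry in stdout.split(separator) {
        // Split on the first '=' only, because values may contain '='.
        let mut parts = entry.splitn(2, '=');
        if let (Some(key), Some(value)) = (parts.next(), parts.next()) {
            vars.insert(key.trim().to_string(), value.trim().to_string());
        }
    }
    vars
}
```

A build script would then call it with the captured stdout and the same separator string it passed to Python.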

rust/hgbase85/src/base85.rs

This file was added.

				use cpython::{exc, PyBytes, PyErr, PyObject, PyResult, Py_ssize_t, Python, PythonObject};
				use cpython::_detail::ffi;

				use std;
				use std::{mem, sync};
				use super::cpython_ext;

				const B85CHARS: &[u8; 85] =
				b"0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz!#$%&()*+-;<=>?@^_`{|}~";
				static mut B85DEC: [u8; 256] = [0; 256];
				static B85DEC_START: sync::Once = sync::ONCE_INIT;

				fn b85prep() {
				B85DEC_START.call_once(|| {
				for i in 0..mem::size_of_val(B85CHARS) {
				unsafe {
				B85DEC[B85CHARS[i] as usize] = (i + 1) as u8;
				}
				}
				});
				}
				kevincox (Unsubmitted, Not Done): I prefer something like this: https://play.rust-lang.org/?gist=5ca18a5b95335600e911b8f9310ea5c7&version=stable I doubt lazy_static is too slow. Otherwise we can stay like this until const functions get implemented. Either way, note: I changed the type of B85CHARS to an array as opposed to an array ref. The loop condition is much nicer.
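For readers coming to this thread later: `const fn` with loops has since been stabilized (Rust 1.46 and newer), so the decode table can be built at compile time without `static mut` or `sync::Once`. The following is a sketch of that idea, not the code from the linked playground; `build_b85dec` is a hypothetical name, and `B85CHARS` is declared as an array rather than an array reference, as suggested above.

```rust
const B85CHARS: [u8; 85] =
    *b"0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz!#$%&()*+-;<=>?@^_`{|}~";

/// Hypothetical helper: build the reverse lookup table at compile time.
/// An entry of 0 means "not a base85 character"; otherwise it is index + 1,
/// matching the convention used by b85prep() in the patch.
const fn build_b85dec() -> [u8; 256] {
    let mut table = [0u8; 256];
    let mut i = 0;
    // `for` loops are not allowed in const fn, so use `while`.
    while i < B85CHARS.len() {
        table[B85CHARS[i] as usize] = (i + 1) as u8;
        i += 1;
    }
    table
}

static B85DEC: [u8; 256] = build_b85dec();
```

With an immutable `static`, readers of the table need no `unsafe` block and no initialization call before first use.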

				pub fn b85encode(py: Python, text: &str, pad: i32) -> PyResult<PyObject> {
				kevincox (Unsubmitted, Not Done): Would it be possible to separate the decode from the Python objects? I'm thinking of two helper functions: `fn b85_required_len(text: &str) -> usize` and `fn b85_encode(text: &str, pad: i32, out: &mut [u8]) -> Result<()>`.
				Ivzhh (Author, Unsubmitted, Not Done): This crate is from my earlier attempt to integrate Rust into hg. Right now my main goal is to add `hg r-*` commands in Rust. I will follow your suggestion when I implement the wire protocol and reuse the code in a pure Rust crate.
				kevincox (Unsubmitted, Not Done): I get that, but I still think it makes the code easier to read when the Python interop and the logic are separated where it is easy to do so.
				kevincox (Unsubmitted, Not Done): `&str` can only hold valid UTF-8 data. Would it make more sense to have `&[u8]` here for a list of bytes?
				Ivzhh (Author, Unsubmitted, Not Done): It should accept any `&[u8]`, but the current cpython crate doesn't provide a wrapper for `&[u8]`. I think I need to fork it and add that part. I have put it on my checklist.
				kevincox (Unsubmitted, Not Done): IIUC pad is only ever checked `== 0`. Can it be made into a bool?
				Ivzhh (Author, Unsubmitted, Not Done): pad is logically a bool; however, when I checked the Python side of hg, ints are passed to the function. I guess I need to update the cpython wrapper for this, with a broader conversion to bool.
				let text = text.as_bytes();
				let tlen: Py_ssize_t = { text.len() as Py_ssize_t };
				let olen: Py_ssize_t = if pad != 0 {
				((tlen + 3) / 4 * 5) - 3
				} else {
				let mut olen: Py_ssize_t = tlen % 4;
				if olen > 0 {
				olen += 1;
				}
				olen += tlen / 4 * 5;
				olen
				};

				let out: PyBytes = cpython_ext::pybytes_new_without_copying(py, olen + 3);

				let dst = unsafe {
				let buffer = ffi::PyBytes_AsString(out.as_object().as_ptr()) as *mut u8;
				let length = ffi::PyBytes_Size(out.as_object().as_ptr()) as usize;
				std::slice::from_raw_parts_mut(buffer, length)
				};

				let mut ptext = &text[..];
				kevincox (Unsubmitted, Not Done): `ptext` isn't very descriptive.
				let mut len = { ptext.len() };
				kevincox (Unsubmitted, Not Done): Why the braces here?
				Ivzhh (Author, Unsubmitted, Not Done): I guess it is because of NLL (non-lexical lifetimes). When I started the work, the Rust compiler reported a borrow-check error on this part. I later read an article about the NLL update in Rust, but before that I used the braces to avoid the error.
				let mut dst_off: usize = 0;
				kevincox (Unsubmitted, Not Done): I suspect this type annotation isn't required.
				kevincox (Unsubmitted, Not Done): It might be best to use a `std::io::Cursor` (https://doc.rust-lang.org/stable/std/io/struct.Cursor.html) and let it track `dst_off` for you.

				loop {
				if len == 0 {
				break;
				}
				kevincox (Unsubmitted, Not Done): `while !ptext.is_empty()`

				let mut acc: u32 = 0;
				kevincox (Unsubmitted, Not Done): I would prefer the name `chunk`, or even `accum`, which is a lot more obvious to me than `acc`.

				for i in [24, 16, 8, 0].iter() {
				kevincox (Unsubmitted, Not Done): `for i in &[24, 16, 8, 0]`
				let ch = ptext[0] as u32;
				acc |= ch << i;
				kevincox (Unsubmitted, Not Done): I would just combine these into one line as the name `ch` isn't adding much.

				ptext = &ptext[1..];
				len -= 1;

				if len == 0 {
				kevincox (Unsubmitted, Not Done): Tracking len manually is a smell. Why not drop it and use `ptext.is_empty()`?
				break;
				}
				}

				for i in [4, 3, 2, 1, 0].iter() {
				let val: usize = (acc % 85) as usize;
				acc /= 85;

				dst[*i + dst_off] = B85CHARS[val];
				}

				dst_off += 5;
				}

				if pad == 0 {
				unsafe {
				ffi::_PyString_Resize(
				&mut out.as_object().as_ptr() as *mut *mut ffi::PyObject,
				olen,
				);
				}
				}

				return Ok(out.into_object());
				}
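The separation requested in the comments on `b85encode` could look roughly like the sketch below: a pure-Rust encoder over `&[u8]` with a `bool` pad flag, leaving the CPython glue to a thin wrapper. The names `b85_encoded_len` and `b85_encode` are illustrative (they follow the reviewer's suggestion only loosely), and returning a `Vec<u8>` instead of writing into a caller-provided buffer is an assumption made to keep the example short.

```rust
const B85CHARS: &[u8; 85] =
    b"0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz!#$%&()*+-;<=>?@^_`{|}~";

/// Number of output characters for `len` input bytes (same arithmetic as the patch).
pub fn b85_encoded_len(len: usize, pad: bool) -> usize {
    if pad {
        (len + 3) / 4 * 5
    } else {
        let rem = len % 4;
        len / 4 * 5 + if rem > 0 { rem + 1 } else { 0 }
    }
}

/// Encode arbitrary bytes; no Python types are involved.
pub fn b85_encode(data: &[u8], pad: bool) -> Vec<u8> {
    let mut out = Vec::with_capacity((data.len() + 3) / 4 * 5);
    for chunk in data.chunks(4) {
        // Pack up to four bytes big-endian; missing bytes stay zero.
        let mut acc: u32 = 0;
        for (k, &byte) in chunk.iter().enumerate() {
            acc |= u32::from(byte) << (24 - 8 * k);
        }
        // Emit five base85 digits, least significant digit last.
        let mut group = [0u8; 5];
        for slot in group.iter_mut().rev() {
            *slot = B85CHARS[(acc % 85) as usize];
            acc /= 85;
        }
        out.extend_from_slice(&group);
    }
    // Without padding, drop the digits that only encode the zero fill.
    out.truncate(b85_encoded_len(data.len(), pad));
    out
}
```

The expected values for `"abc"` used in the tests of this patch (`VPazd` padded, `VPaz` unpadded) hold for this sketch as well.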

				pub fn b85decode(py: Python, text: &str) -> PyResult<PyObject> {
				let b85dec = unsafe { B85DEC };
				kevincox (Unsubmitted, Not Done): This is probably worth a comment that this is safe because B85DEC is required to be initialized before this function is called.
				Ivzhh (Author, Unsubmitted, Not Done): When I removed the unsafe, I got an error: error[E0133]: use of mutable static requires unsafe function or block
				kevincox (Unsubmitted, Not Done): I meant safe not in the sense that it doesn't need the unsafe keyword, but in the sense that this use of the `unsafe` block is safe. It should really be called the `trust_me,_I_know_this_is_safe` block. Since the compiler is not checking it for you, it is often useful to add a comment explaining why the action you are performing is correct. In this case it is correct because the caller initializes this variable before the function is called.

				let text = text.as_bytes();
				let len = { text.len() };
				let mut ptext = &text[..];
				let i = len % 5;
				let olen_g: usize = len / 5 * 4 + {
				if i > 0 {
				i - 1
				} else {
				0
				}
				};

				let out: PyBytes = cpython_ext::pybytes_new_without_copying(py, olen_g as Py_ssize_t);

				kevincox (Unsubmitted, Not Done): If this computation only depends on `len` it would be nice to put it in a helper function.
				let dst = unsafe {
				let buffer = ffi::PyBytes_AsString(out.as_object().as_ptr()) as *mut u8;
				let length = ffi::PyBytes_Size(out.as_object().as_ptr()) as usize;
				std::slice::from_raw_parts_mut(buffer, length)
				};
				let mut dst_off = 0;

				let mut i = 0;
				while i < len {
				let mut acc: u32 = 0;
				let mut cap = len - i - 1;
				if cap > 4 {
				cap = 4
				}
				for _ in 0..cap {
				let c = b85dec[ptext[0] as usize] as i32 - 1;
				ptext = &ptext[1..];
				if c < 0 {
				return Err(PyErr::new::<exc::ValueError, _>(
				py,
				format!("bad base85 character at position {}", i),
				));
				}
				acc = acc * 85 + (c as u32);
				i += 1;
				}
				if i < len {
				i += 1;
				let c = b85dec[ptext[0] as usize] as i32 - 1;
				ptext = &ptext[1..];
				if c < 0 {
				return Err(PyErr::new::<exc::ValueError, _>(
				py,
				format!("bad base85 character at position {}", i),
				));
				}
				/* overflow detection: 0xffffffff == "|NsC0",
				* "|NsC" == 0x03030303 */
				if acc > 0x03030303 {
				return Err(PyErr::new::<exc::ValueError, _>(
				py,
				format!("bad base85 character at position {}", i),
				));
				}

				acc *= 85;
				kevincox (Unsubmitted, Done): Let Rust do the overflow checking: `acc = acc.checked_mul(85).ok_or_else(|| { PyErr::new::<exc::ValueError, _>(py, format!("bad base85 character at position {}", i)) })?;`

				if acc > (0xffffffff_u32 - (c as u32)) {
				return Err(PyErr::new::<exc::ValueError, _>(
				py,
				format!("bad base85 character at position {}", i),
				));
				}
				acc += c as u32;
				}

				let olen = olen_g - dst_off;

				cap = if olen < 4 { olen } else { 4 };

				for _ in 0..(4 - cap) {
				acc *= 85;
				}

				if (cap > 0) && (cap < 4) {
				acc += 0xffffff >> (cap - 1) * 8;
				}

				for j in 0..cap {
				acc = (acc << 8) | (acc >> 24);
				dst[j + dst_off] = acc as u8;
				}

				dst_off += cap;
				}

				return Ok(out.into_object());
				}
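The checked-arithmetic suggestion above can be combined with the Python-free split into a decoder along these lines. This is a sketch under stated assumptions, not the patch's algorithm: `b85_decode` and `b85_value` are hypothetical names, the error value is simply the zero-based offset of the offending character, and a short final group is completed with the highest digit (84, `~`) before truncation rather than with the explicit rounding term used in `b85decode`. A linear scan replaces the lookup table to keep the example self-contained.

```rust
const B85CHARS: &[u8; 85] =
    b"0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz!#$%&()*+-;<=>?@^_`{|}~";

/// Hypothetical helper: digit value of a base85 character, if it is one.
fn b85_value(byte: u8) -> Option<u32> {
    B85CHARS.iter().position(|&c| c == byte).map(|p| p as u32)
}

/// Decode base85 text into bytes. On failure, returns the zero-based offset
/// of the character at which decoding failed (invalid digit or overflow).
pub fn b85_decode(text: &[u8]) -> Result<Vec<u8>, usize> {
    let mut out = Vec::with_capacity(text.len() / 5 * 4 + 4);
    for (group_idx, group) in text.chunks(5).enumerate() {
        let mut acc: u32 = 0;
        for k in 0..5 {
            // Report padding failures at the last real character of the group.
            let pos = group_idx * 5 + k.min(group.len() - 1);
            let digit = match group.get(k) {
                Some(&b) => b85_value(b).ok_or(pos)?,
                None => 84, // complete a short final group with the top digit
            };
            // Let the standard library detect 32-bit overflow.
            acc = acc
                .checked_mul(85)
                .and_then(|a| a.checked_add(digit))
                .ok_or(pos)?;
        }
        // A group of n characters carries n - 1 bytes.
        out.extend_from_slice(&acc.to_be_bytes()[..group.len() - 1]);
    }
    Ok(out)
}
```

A CPython wrapper would then only need to map `Err(pos)` to a `ValueError` and copy the bytes into a `PyBytes`.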

				py_module_initializer!(base85, initbase85, PyInit_base85, |py, m| {
				b85prep();
				m.add(py, "__doc__", "base85 module")?;
				m.add(py, "b85encode", py_fn!(py, b85encode(text: &str, pad: i32)))?;
				m.add(py, "b85decode", py_fn!(py, b85decode(text: &str)))?;
				Ok(())
				});

				#[cfg(test)]
				mod test {
				use cpython::Python;

				#[test]
				fn test_encoder_abc_pad() -> () {
				::set_py_env();

				let gil = Python::acquire_gil();
				let py = gil.python();
				::init_all_hg_ext(py);

				let res: String = super::b85encode(py, "abc", 1).unwrap().extract(py).unwrap();
				assert_eq!(res, "VPazd");

				let base85 = py.import("base85").unwrap();
				let res: String = base85
				.call(py, "b85encode", ("abc", 1), None)
				.unwrap()
				.extract(py)
				.unwrap();
				assert_eq!(res, "VPazd");
				}

				#[test]
				fn test_encoder_chinese_pad() -> () {
				::set_py_env();

				let gil = Python::acquire_gil();
				let py = gil.python();
				::init_all_hg_ext(py);

				let res: String = super::b85encode(py, "这是一个测试的例子", 1)
				.unwrap()
				.extract(py)
				.unwrap();
				assert_eq!(res, "=)alfn6KoxfaJKU=CzCHua)PTgyg=9<*kqa");

				let base85 = py.import("base85").unwrap();
				let res: String = base85
				.call(py, "b85encode", ("这是一个测试的例子", 1), None)
				.unwrap()
				.extract(py)
				.unwrap();
				assert_eq!(res, "=)alfn6KoxfaJKU=CzCHua)PTgyg=9<*kqa");
				}

				#[test]
				fn test_encoder_abc_no_pad() -> () {
				::set_py_env();

				let gil = Python::acquire_gil();
				let py = gil.python();
				::init_all_hg_ext(py);

				let res: String = super::b85encode(py, "abc", 0).unwrap().extract(py).unwrap();
				assert_eq!(res, "VPaz");

				let base85 = py.import("base85").unwrap();
				let res: String = base85
				.call(py, "b85encode", ("abc", 0), None)
				.unwrap()
				.extract(py)
				.unwrap();
				assert_eq!(res, "VPaz");
				}

				#[test]
				fn test_encoder_chinese_no_pad() -> () {
				::set_py_env();

				let gil = Python::acquire_gil();
				let py = gil.python();
				::init_all_hg_ext(py);

				let res: String = super::b85encode(py, "这是一个测试的例子", 0)
				.unwrap()
				.extract(py)
				.unwrap();
				assert_eq!(res, "=)alfn6KoxfaJKU=CzCHua)PTgyg=9<*kq");

				let base85 = py.import("base85").unwrap();
				let res: String = base85
				.call(py, "b85encode", ("这是一个测试的例子", 0), None)
				.unwrap()
				.extract(py)
				.unwrap();
				assert_eq!(res, "=)alfn6KoxfaJKU=CzCHua)PTgyg=9<*kq");
				}

				#[test]
				fn test_decoder_abc_no_pad() -> () {
				::set_py_env();

				let gil = Python::acquire_gil();
				let py = gil.python();
				::init_all_hg_ext(py);

				let res: String = super::b85decode(py, "VPaz").unwrap().extract(py).unwrap();
				assert_eq!(res, "abc");

				let base85 = py.import("base85").unwrap();
				let res: String = base85
				.call(py, "b85decode", ("VPaz",), None)
				.unwrap()
				.extract(py)
				.unwrap();
				assert_eq!(res, "abc");
				}

				#[test]
				fn test_decoder_abc_pad() -> () {
				::set_py_env();

				let gil = Python::acquire_gil();
				let py = gil.python();
				::init_all_hg_ext(py);

				let mut res: String = super::b85decode(py, "VPazd").unwrap().extract(py).unwrap();
				let len = { res.len() };
				res.truncate(len - 1);
				assert_eq!(res, "abc");

				let base85 = py.import("base85").unwrap();
				let mut res: String = base85
				.call(py, "b85decode", ("VPazd",), None)
				.unwrap()
				.extract(py)
				.unwrap();
				let len = { res.len() };
				res.truncate(len - 1);
				assert_eq!(res, "abc");
				}

				#[test]
				fn test_decoder_chinese_pad() -> () {
				::set_py_env();

				let gil = Python::acquire_gil();
				let py = gil.python();
				::init_all_hg_ext(py);

				let mut res: String = super::b85decode(py, "=)alfn6KoxfaJKU=CzCHua)PTgyg=9<*kqa")
				.unwrap()
				.extract(py)
				.unwrap();
				let len = { res.len() };
				res.truncate(len - 1);
				assert_eq!(res, "这是一个测试的例子");

				let base85 = py.import("base85").unwrap();
				let mut res: String = base85
				.call(
				py,
				"b85decode",
				("=)alfn6KoxfaJKU=CzCHua)PTgyg=9<*kqa",),
				None,
				)
				.unwrap()
				.extract(py)
				.unwrap();
				let len = { res.len() };
				res.truncate(len - 1);
				assert_eq!(res, "这是一个测试的例子");
				}

				#[test]
				fn test_decoder_chinese_no_pad() -> () {
				::set_py_env();

				let gil = Python::acquire_gil();
				let py = gil.python();
				::init_all_hg_ext(py);

				let res: String = super::b85decode(py, "=)alfn6KoxfaJKU=CzCHua)PTgyg=9<*kq")
				.unwrap()
				.extract(py)
				.unwrap();
				assert_eq!(res, "这是一个测试的例子");

				let base85 = py.import("base85").unwrap();
				let res: String = base85
				.call(
				py,
				"b85decode",
				("=)alfn6KoxfaJKU=CzCHua)PTgyg=9<*kq",),
				None,
				)
				.unwrap()
				.extract(py)
				.unwrap();
				assert_eq!(res, "这是一个测试的例子");
				}
				}

rust/hgbase85/src/cpython_ext.rs

This file was added.

				use cpython::{PyBytes, PyObject, Py_ssize_t, Python, PythonObjectWithCheckedDowncast};

				use python27_sys as ffi;

				use std;

				#[inline]
				pub unsafe fn cast_from_owned_ptr_or_panic<T>(py: Python, p: *mut ffi::PyObject) -> T
				where
				T: PythonObjectWithCheckedDowncast,
				{
				if p.is_null() {
				panic!("NULL pointer detected.")
				} else {
				PyObject::from_owned_ptr(py, p).cast_into(py).unwrap()
				}
				}

				pub fn pybytes_new_without_copying(py: Python, len: Py_ssize_t) -> PyBytes {
				unsafe {
				if len <= 0 {
				panic!("the request bytes length should be > 0.")
				}
				cast_from_owned_ptr_or_panic(py, ffi::PyBytes_FromStringAndSize(std::ptr::null(), len))
				}
				}

rust/hgbase85/src/lib.rs

This file was added.

				#[macro_use]
				extern crate cpython;
				extern crate libc;
				extern crate python27_sys;

				use python27_sys as ffi;

				pub mod base85;
				pub mod cpython_ext;

				use std::{env, sync};
				use std::path::PathBuf;
				use std::ffi::{CString, OsStr};

				#[cfg(target_family = "unix")]
				use std::os::unix::ffi::OsStrExt;

				static HG_EXT_REG: sync::Once = sync::ONCE_INIT;

				#[no_mangle]
				pub fn init_all_hg_ext(_py: cpython::Python) {
				HG_EXT_REG.call_once(|| unsafe {
				base85::initbase85();
				});
				}

				#[derive(Debug)]
				pub struct Environment {
				_exe: PathBuf,
				python_exe: PathBuf,
				python_home: PathBuf,
				mercurial_modules: PathBuf,
				}

				// On UNIX, platform string is just bytes and should not contain NUL.
				#[cfg(target_family = "unix")]
				fn cstring_from_os<T: AsRef<OsStr>>(s: T) -> CString {
				CString::new(s.as_ref().as_bytes()).unwrap()
				}

				#[cfg(target_family = "windows")]
				fn cstring_from_os<T: AsRef<OsStr>>(s: T) -> CString {
				CString::new(s.as_ref().to_str().unwrap()).unwrap()
				}

				fn set_python_home(env: &Environment) {
				let raw = cstring_from_os(&env.python_home).into_raw();
				unsafe {
				ffi::Py_SetPythonHome(raw);
				}
				}

				static PYTHON_ENV_START: sync::Once = sync::ONCE_INIT;

				/// the second half initialization code are copied from rust-cpython
				/// fn pythonrun::prepare_freethreaded_python()
				/// because this function is called mainly by `cargo test`
				/// and the multi-thread nature requires to properly
				/// set up threads and GIL. In the corresponding version,
				/// prepare_freethreaded_python() is turned off, so the cargo
				/// test features must be properly called.
				pub fn set_py_env() {
				PYTHON_ENV_START.call_once(|| {
				let env = {
				let exe = env::current_exe().unwrap();

				let mercurial_modules = std::env::var("HGROOT").expect(
				"must set mercurial's root folder (one layer above mercurial folder itself",
				);

				let python_exe = std::env::var("HGRUST_PYTHONEXE")
				.expect("set PYTHONEXE to the full path of the python.exe file");

				let python_home = std::env::var("HGRUST_PYTHONHOME").expect(
				"if you don't want to use system one, set PYTHONHOME according to python doc",
				);

				Environment {
				_exe: exe.clone(),
				python_exe: PathBuf::from(python_exe),
				python_home: PathBuf::from(python_home),
				mercurial_modules: PathBuf::from(mercurial_modules),
				}
				};

				set_python_home(&env);

				let program_name = cstring_from_os(&env.python_exe).as_ptr();
				unsafe {
				ffi::Py_SetProgramName(program_name as *mut i8);
				}

				unsafe {
				if ffi::Py_IsInitialized() != 0 {
				assert!(ffi::PyEval_ThreadsInitialized() != 0);
				} else {
				assert!(ffi::PyEval_ThreadsInitialized() == 0);
				ffi::Py_InitializeEx(0);
				ffi::PyEval_InitThreads();
				let _thread_state = ffi::PyEval_SaveThread();
				}
				}
				});
				}
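
A note on `set_py_env()` above: `cstring_from_os(&env.python_exe).as_ptr()` takes a pointer into a temporary `CString` that is dropped at the end of that statement, so the pointer later passed to `Py_SetProgramName` may dangle. The following is a minimal sketch, not part of the diff, of keeping the buffer alive by leaking it with `CString::into_raw`, which is what `set_python_home()` already does. The helper name `leak_c_string` is hypothetical, and the sketch takes a `&str` rather than an `OsStr` to stay platform-neutral.

```rust
use std::ffi::CString;
use std::os::raw::c_char;

/// Hypothetical helper: converts `s` to a NUL-terminated C string and leaks it,
/// so the returned pointer stays valid for the rest of the process.
pub fn leak_c_string(s: &str) -> *mut c_char {
    CString::new(s)
        .expect("string must not contain an interior NUL byte")
        .into_raw()
}
```

Leaking is deliberate here: the CPython documentation asks for the program name to live in storage that does not change for the duration of the program.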

rust/hgcli/Cargo.toml


	[features]			[features]
	# localdev: detect Python in PATH and use files from source checkout.			# localdev: detect Python in PATH and use files from source checkout.
	default = ["localdev"]			default = ["localdev"]
	localdev = []			localdev = []

	[dependencies]			[dependencies]
	libc = "0.2.34"			libc = "0.2.34"
				hgstorage = { path = "../hgstorage" }
				clap = ""

	# We currently use a custom build of cpython and python27-sys with the			# We currently use a custom build of cpython and python27-sys with the
	# following changes:			# following changes:
	# * GILGuard call of prepare_freethreaded_python() is removed.			# * GILGuard call of prepare_freethreaded_python() is removed.
	# TODO switch to official release when our changes are incorporated.			# TODO switch to official release when our changes are incorporated.
	[dependencies.cpython]			[dependencies.cpython]
	version = "0.1"			version = "0.1"
	default-features = false			default-features = false
	features = ["python27-sys"]			features = ["python27-sys"]
	git = "https://github.com/indygreg/rust-cpython.git"			git = "https://github.com/indygreg/rust-cpython.git"
	rev = "c90d65cf84abfffce7ef54476bbfed56017a2f52"			rev = "c90d65cf84abfffce7ef54476bbfed56017a2f52"

	[dependencies.python27-sys]			[dependencies.python27-sys]
	version = "0.1.2"			version = "0.1.2"
	git = "https://github.com/indygreg/rust-cpython.git"			git = "https://github.com/indygreg/rust-cpython.git"
	rev = "c90d65cf84abfffce7ef54476bbfed56017a2f52"			rev = "c90d65cf84abfffce7ef54476bbfed56017a2f52"

rust/hgcli/build.rs

	struct PythonConfig {			struct PythonConfig {
	python: String,			python: String,
	config: HashMap<String, String>,			config: HashMap<String, String>,
	}			}

	fn get_python_config() -> PythonConfig {			fn get_python_config() -> PythonConfig {
	// The python27-sys crate exports a Cargo variable defining the full			// The python27-sys crate exports a Cargo variable defining the full
	// path to the interpreter being used.			// path to the interpreter being used.
	let python = env::var("DEP_PYTHON27_PYTHON_INTERPRETER").expect(			let python = env::var("DEP_PYTHON27_PYTHON_INTERPRETER")
	"Missing DEP_PYTHON27_PYTHON_INTERPRETER; bad python27-sys crate?",			.expect("Missing DEP_PYTHON27_PYTHON_INTERPRETER; bad python27-sys crate?");
	);

				println!("{}", python);
	if !Path::new(&python).exists() {			if !Path::new(&python).exists() {
	panic!(			panic!(
	"Python interpreter {} does not exist; this should never happen",			"Python interpreter {} does not exist; this should never happen",
	python			python
	);			);
	}			}

	// This is a bit hacky but it gets the job done.			// This is a bit hacky but it gets the job done.
	let separator = "SEPARATOR STRING";			let separator = "SEPARATOR STRING";

	let script = "import sysconfig; \			let script = "import sysconfig; \
	c = sysconfig.get_config_vars(); \			c = sysconfig.get_config_vars(); \
	print('SEPARATOR STRING'.join('%s=%s' % i for i in c.items()))";			print('SEPARATOR STRING'.join('%s=%s' % i for i in c.items()))";

	let mut command = Command::new(&python);			let mut command = Command::new(&python);
	command.arg("-c").arg(script);			command.arg("-c").arg(script);

	let out = command.output().unwrap();			let out = command.output().unwrap();

	if !out.status.success() {			if !out.status.success() {
	panic!(			panic!(
	None => false,			None => false,
	};			};

	if !result {			if !result {
	panic!("Detected Python requires feature {}", key);			panic!("Detected Python requires feature {}", key);
	}			}
	}			}

				println!("have shared {}", have_shared(&config));

	// We need a Python shared library.			// We need a Python shared library.
	if !have_shared(&config) {			if !have_shared(&config) {
	panic!("Detected Python lacks a shared library, which is required");			panic!("Detected Python lacks a shared library, which is required");
	}			}

	let ucs4 = match config.config.get("Py_UNICODE_SIZE") {			let ucs4 = match config.config.get("Py_UNICODE_SIZE") {
	Some(value) => value == "4",			Some(value) => value == "4",
	None => false,			None => false,
	};			};

	if !ucs4 {			if !ucs4 {
	#[cfg(not(target_os = "windows"))]			#[cfg(not(target_os = "windows"))]
	panic!("Detected Python doesn't support UCS-4 code points");			panic!("Detected Python doesn't support UCS-4 code points");
	}			}
	}			}

rust/hgcli/src/main.rs

	// main.rs -- Main routines for `hg` program			// main.rs -- Main routines for `hg` program
	//			//
	// Copyright 2017 Gregory Szorc <gregory.szorc@gmail.com>			// Copyright 2017 Gregory Szorc <gregory.szorc@gmail.com>
	//			//
	// This software may be used and distributed according to the terms of the			// This software may be used and distributed according to the terms of the
	// GNU General Public License version 2 or any later version.			// GNU General Public License version 2 or any later version.

	extern crate libc;			extern crate clap;
	extern crate cpython;			extern crate cpython;
				extern crate libc;
	extern crate python27_sys;			extern crate python27_sys;

				extern crate hgstorage;

	use cpython::{NoArgs, ObjectProtocol, PyModule, PyResult, Python};			use cpython::{NoArgs, ObjectProtocol, PyModule, PyResult, Python};
	use libc::{c_char, c_int};			use libc::{c_char, c_int};

	use std::env;			use std::env;
	use std::path::PathBuf;			use std::path::PathBuf;
	use std::ffi::{CString, OsStr};			use std::ffi::{CString, OsStr};
	#[cfg(target_family = "unix")]			#[cfg(target_family = "unix")]
	use std::os::unix::ffi::{OsStrExt, OsStringExt};			use std::os::unix::ffi::{OsStrExt, OsStringExt};

				use hgstorage::local_repo;
				use hgstorage::repository::Repository;

	#[derive(Debug)]			#[derive(Debug)]
	struct Environment {			struct Environment {
	_exe: PathBuf,			_exe: PathBuf,
	python_exe: PathBuf,			python_exe: PathBuf,
	python_home: PathBuf,			python_home: PathBuf,
	mercurial_modules: PathBuf,			mercurial_modules: PathBuf,
	}			}


	let dispatch_mod = py.import("mercurial.dispatch")?;			let dispatch_mod = py.import("mercurial.dispatch")?;
	dispatch_mod.call(py, "run", NoArgs, None)?;			dispatch_mod.call(py, "run", NoArgs, None)?;

	Ok(())			Ok(())
	}			}

	fn main() {			fn main() {
				let matches = clap::App::new("hg rust oxidation")
				.arg(
				clap::Arg::with_name("repository")
				.short("c")
				.long("repository")
				.value_name("dash_r"),
				)
				.subcommand(clap::SubCommand::with_name("r-status"))
				.get_matches();

				if let Some(_r_matches) = matches.subcommand_matches("r-status") {
				let dash_r = match matches.value_of("dash_r") {
				Some(dash_r) => Some(PathBuf::from(dash_r)),
				None => None,
				};
				let repo = local_repo::LocalRepo::new(dash_r);
				let res = repo.status();
				for f in res.modified.iter() {
				kevincox (Unsubmitted, Not Done): These are `HashSet`s, which don't have a defined iteration order. IIRC the Python implementation sorts the results, which is probably desirable.
				println!("M {}", f.to_str().unwrap());
				}
				for f in res.added.iter() {
				println!("A {}", f.to_str().unwrap());
				}
				for f in res.removed.iter() {
				println!("R {}", f.to_str().unwrap());
				}
				for f in res.unknown.iter() {
				println!("? {}", f.to_str().unwrap());
				}
				indygreg (Unsubmitted, Not Done): This is definitely nifty and an impressive achievement \o/
				The `r-` commands for testing pure Rust code paths are an interesting idea! I think I'm OK with including support for this in `hgcli`. But I think the code should live in a separate file so it doesn't pollute `main()`. And it should be behind a Cargo feature flag so we maintain compatibility with `hg` as much as possible by default.
				Also, Mercurial's command line parser is extremely wonky and has some questionable behavior. If the intent is to make `rhg` compatible with `hg`, we would need to preserve this horrible behavior. We'll likely have to write a custom argument parser because of how quirky Mercurial's argument parser is :(
				Ivzhh (Author, Unsubmitted, Not Done): Thank you for the suggestion! I guess I need to extend clap later to support hg-style command lines. Right now, whenever clap cannot handle the argument parsing, I redirect the arguments to hg directly.
				} else {
	let exit_code = match run() {			let exit_code = match run() {
	Err(err) => err,			Err(err) => err,
	Ok(()) => 0,			Ok(()) => 0,
	};			};

	std::process::exit(exit_code);			std::process::exit(exit_code);
	}			}
				}
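
Regarding the iteration-order comment in `main()` above, here is a minimal sketch of producing deterministic output from a `HashSet<PathBuf>`. The helper `status_lines` is hypothetical and not part of the diff. It sorts with `PathBuf`'s own ordering, which compares path components and may not match the order Python's `hg status` prints, so treat the ordering as an assumption.

```rust
use std::collections::HashSet;
use std::path::PathBuf;

/// Hypothetical helper: formats one status line per path, in sorted order.
pub fn status_lines(code: char, paths: &HashSet<PathBuf>) -> Vec<String> {
    // Collect references and sort them so the output no longer depends on
    // the HashSet's unspecified iteration order.
    let mut sorted: Vec<&PathBuf> = paths.iter().collect();
    sorted.sort();
    sorted
        .iter()
        // to_string_lossy avoids the panic that to_str().unwrap() would
        // raise on a non-UTF-8 path.
        .map(|p| format!("{} {}", code, p.to_string_lossy()))
        .collect()
}
```

A caller could then print `status_lines('M', &res.modified)` line by line, followed by the `A`, `R` and `?` sets.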

rust/hgstorage/Cargo.toml

This file was added.

				[package]
				name = "hgstorage"
				version = "0.1.0"
				authors = ["Sheng Mao <shngmao@gmail.com>"]
				license = "GPL-2.0"

				#build = "build.rs"

				[lib]
				name = "hgstorage"
				crate-type = ["cdylib", "rlib"]

				[features]
				# localdev: detect Python in PATH and use files from source checkout.
				default = ["localdev"]
				localdev = []

				[dependencies]
				libc = "0.2.34"
				byteorder = "1.0"
				walkdir = "2"
				tempdir = "0.3.6"
				regex = "0.2.6"
				threadpool = "1.7.1"
				num_cpus = "1.0"
				lazy_static = "1.0.0"
				lru-cache = "0.1.1"
				flate2 = { version = "1.0", features = ["rust_backend"], default-features = false }
				sha1 = "0.6.0"
				hex = "0.3.1"

rust/hgstorage/src/changelog.rs

This file was added.

				use std::sync::{Arc, RwLock};

				use hex;

				use revlog::{NodeId, Revlog};

				#[derive(Clone)]
				pub struct CommitInfo {
				pub manifest_id: NodeId,
				pub msg: Vec<u8>,
				}

				#[derive(Clone)]
				pub struct ChangeLog {
				pub inner: Arc<RwLock<Revlog>>,
				}

				impl ChangeLog {
				pub fn get_commit_info(&self, id: &NodeId) -> CommitInfo {
				let rev = self.inner.read().unwrap().node_id_to_rev(id).unwrap();

				let mut content = self.inner.read().unwrap().revision(&rev).unwrap();

				assert_eq!(content[NodeId::hex_len()], '\n' as u8);
				kevincox (Unsubmitted, Done): Passing a message as a third argument is really useful.

				let manifest_id = {
				let hex_id = &content[..NodeId::hex_len()];
				NodeId::new_from_bytes(&hex::decode(hex_id).unwrap())
				};

				content.drain(..NodeId::hex_len());
				kevincox (Unsubmitted, Done): If you aren't using the value I would prefer `truncate(NodeId::hex_len())`
				Ivzhh (Author, Unsubmitted, Not Done): I guess I will use the rest of the info later. hg seems to put some metadata in the commit comments. I will keep it for now. Thank you!

				let msg = content;
				kevincox (Unsubmitted, Done): Just put `msg: content` in the struct construction.

				CommitInfo { manifest_id, msg }
				}
				}
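
The header handling in `get_commit_info()` above (40 hex digits of manifest id, then a newline) can also be written without panicking. This is a minimal sketch using only the standard library; the function `split_commit` and its `String` error type are hypothetical and not part of the diff, which uses the `hex` crate and `NodeId` instead.

```rust
/// Length of a hex-encoded 20-byte node id.
const HEX_LEN: usize = 40;

/// Converts one ASCII hex digit to its value.
fn hex_val(c: u8) -> Option<u8> {
    match c {
        b'0'..=b'9' => Some(c - b'0'),
        b'a'..=b'f' => Some(c - b'a' + 10),
        b'A'..=b'F' => Some(c - b'A' + 10),
        _ => None,
    }
}

/// Hypothetical helper: splits a changelog entry into the manifest node id
/// and the remaining bytes, reporting malformed input as an error.
pub fn split_commit(content: &[u8]) -> Result<([u8; 20], &[u8]), String> {
    if content.len() <= HEX_LEN || content[HEX_LEN] != b'\n' {
        return Err(format!(
            "expected {} hex digits followed by a newline, got {} bytes",
            HEX_LEN,
            content.len()
        ));
    }
    let mut id = [0u8; 20];
    for (i, pair) in content[..HEX_LEN].chunks(2).enumerate() {
        let hi = hex_val(pair[0]).ok_or_else(|| format!("bad hex digit at offset {}", 2 * i))?;
        let lo = hex_val(pair[1]).ok_or_else(|| format!("bad hex digit at offset {}", 2 * i + 1))?;
        id[i] = (hi << 4) | lo;
    }
    // Like the diff, the remainder still starts with the newline.
    Ok((id, &content[HEX_LEN..]))
}
```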

rust/hgstorage/src/config.rs

This file was added.

				use std::default::Default;
				use std::collections::{HashMap, HashSet};
				use std::io::{BufRead, BufReader};
				use std::fs::File;
				use std::path::Path;

				pub enum RevlogFormat {
				V0,
				V1,
				V2,
				}

				impl Default for RevlogFormat {
				fn default() -> Self {
				RevlogFormat::V1
				}
				}

				pub enum Compressor {
				Zlib,
				Zstd,
				Gzip,
				None,
				}

				impl Default for Compressor {
				fn default() -> Self {
				Compressor::Zlib
				}
				}

				pub enum DeltaPolicy {
				ParentDelta,
				GeneralDelta,
				}

				impl Default for DeltaPolicy {
				fn default() -> Self {
				DeltaPolicy::GeneralDelta
				}
				}

				#[derive(Default)]
				pub struct Configuration {
				pub requires: HashSet<String>,
				pub reg_conf: HashMap<String, RegFn>,
				pub revlog_format: RevlogFormat,
				pub delta: DeltaPolicy,
				}
				kevincox (Unsubmitted, Not Done): Is this used yet? It probably also needs some documentation because I don't really understand the fields (but I do have little domain knowledge).

				pub type RegFn = fn(&mut Configuration) -> ();
				kevincox (Unsubmitted, Not Done): A link to the mentioned wiki page would be very helpful to readers.

				impl Configuration {
				pub fn new(path: &Path) -> Self {
				let mut s: Configuration = Default::default();

				s.register_all();

				if path.exists() {
				let f = File::open(path).unwrap();

				let buffer = BufReader::new(&f);

				for line in buffer.lines() {
				let key = line.unwrap();

				if s.reg_conf.contains_key(&key) {
				s.reg_conf[&key](&mut s);
				}

				s.requires.insert(key);
				}
				}

				s
				}

				pub fn register_conf(&mut self, key: &str, func: RegFn) {
				kevincox (Unsubmitted, Done): If you are just going to convert to String I would recommend taking a String argument. Also prefer `.to_owned()` over `.to_string()`.
				Ivzhh (Author, Unsubmitted, Not Done): I like `to_owned()`; I will use it on later occasions. Thank you!
				self.reg_conf.insert(key.to_string(), func);
				}

				fn register_all(&mut self) {
				self.register_conf("revlogv1", |conf| {
				conf.revlog_format = RevlogFormat::V1;
				});
				self.register_conf("generaldelta", |conf| {
				conf.delta = DeltaPolicy::GeneralDelta;
				});
				}

				pub fn get_revlog_format(&self) -> RevlogFormat {
				if self.requires.contains("revlogv1") {
				RevlogFormat::V1
				} else {
				RevlogFormat::V0
				indygreg (Unsubmitted, Not Done): I would not worry about supporting v0 or v2 at this time. v0 is only important for backwards compatibility with ancient repos. And v2 never got off the ground.
				Ivzhh (Author, Unsubmitted, Not Done): Sure, I will use v1 only for now. In the beginning I kind of over-designed this part.
				}
				}
				}
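
Following the review comments on `Configuration` above, here is a minimal sketch of reading `.hg/requires` into a set and querying it directly, with I/O errors returned to the caller instead of unwrapped. The `Requirements` type and its methods are hypothetical and not part of the diff; the requirement names used in the test are only sample input.

```rust
use std::collections::HashSet;
use std::io::{self, BufRead};

/// Hypothetical container for the entries of a `requires` file.
pub struct Requirements {
    entries: HashSet<String>,
}

impl Requirements {
    /// Reads one requirement per line, skipping blank lines.
    pub fn from_reader<R: BufRead>(reader: R) -> io::Result<Self> {
        let mut entries = HashSet::new();
        for line in reader.lines() {
            // Propagate read errors instead of panicking.
            let line = line?;
            let key = line.trim();
            if !key.is_empty() {
                entries.insert(key.to_owned());
            }
        }
        Ok(Requirements { entries })
    }

    /// Returns true if the repository lists the given requirement.
    pub fn has(&self, key: &str) -> bool {
        self.entries.contains(key)
    }
}
```

With this shape, a check such as `reqs.has("revlogv1")` replaces the function-pointer registry, at the cost of spreading the requirement names across call sites.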

rust/hgstorage/src/dirstate.rs

This file was added.

				use std::str;
				use std::path::{Path, PathBuf};
				use std::io::{Read, Result};
				use std::fs::File;
				use std::collections::HashMap as Map;
				kevincox (Unsubmitted, Done): I recommend not renaming this. It is confusing.
				#[cfg(target_family = "unix")]
				use std::os::unix::fs::FileTypeExt;
				use std::collections::HashSet as Set;
				use std::time::{SystemTime, UNIX_EPOCH};

				use byteorder::{BigEndian, ReadBytesExt};
				use walkdir::{DirEntry, WalkDir};

				use matcher;
				use revlog::*;

				#[derive(Debug, Default)]
				pub struct DirStateEntry {
				status: u8,
				mode: u32,
				/// size of file
				size: u32,
				mtime: u32,
				/// length of file name
				length: u32,
				}

				pub fn read_dirstate_entry<R: Read + ?Sized + ReadBytesExt>(rdr: &mut R) -> Result<DirStateEntry> {
				let status = rdr.read_u8()?;
				let mode = rdr.read_u32::<BigEndian>()?;
				let size = rdr.read_u32::<BigEndian>()?;
				let mtime = rdr.read_u32::<BigEndian>()?;
				let length = rdr.read_u32::<BigEndian>()?;

				Ok(DirStateEntry {
				status,
				mode,
				size,
				mtime,
				length,
				})
				}

				pub struct DirState {
				pub p1: NodeId,
				pub p2: NodeId,
				pub path: PathBuf,
				pub dmap: Map<PathBuf, DirStateEntry>,
				kevincox (Unsubmitted, Done): This could have a better name.
				Ivzhh (Author, Unsubmitted, Not Done): I remember the Python hg uses this name; in the beginning, I tried to replicate py-hg's behaviour. But I think it needs to be renamed. I agree with you.

				pub mtime: SystemTime,
				}

				#[derive(Debug)]
				pub struct CurrentState {
				/// per status-call
				pub added: Set<PathBuf>,
				/// a.k.a forget
				pub removed: Set<PathBuf>,
				/// a.k.a missing
				pub deleted: Set<PathBuf>,
				pub modified: Set<PathBuf>,
				/// the worker thread handling ignored will first add all sub files of the ignored dir/file
				/// to the ignored set, later, when checking the remaining paths, if the path is in ignored set,
				/// then remove them from ignored set
				pub ignored: Set<PathBuf>,
				pub unknown: Set<PathBuf>,
				pub clean: Set<PathBuf>,
				pub lookup: Set<PathBuf>,
				}

				impl CurrentState {
				pub fn new() -> Self {
				Self {
				added: Set::new(),
				removed: Set::new(),
				deleted: Set::new(),
				modified: Set::new(),
				ignored: Set::new(),
				unknown: Set::new(),
				clean: Set::new(),
				lookup: Set::new(),
				}
				}
				}

				impl DirState {
				pub fn new(p: PathBuf) -> Self {
				kevincox (Unsubmitted, Not Done): This should probably return a `Result<Self>` and pass the error to the caller.
				if !p.exists() {
				panic!("dirstate file is missing")
				}
				kevincox (Unsubmitted, Done): I would skip this check and rely on `p.metadata()`. Just switch `.unwrap()` to `.expect()` with a nicer message. This also handles race conditions more nicely.

				let mtime = p.metadata().unwrap().modified().unwrap();

				let mut ret = Self {
				p1: NULL_ID.clone(),
				p2: NULL_ID.clone(),
				path: p,
				dmap: Map::new(),

				mtime,
				};

				ret.parse_dirstate();

				kevincox (Unsubmitted, Not Done): Switch the return type to `std::io::Result` and then you can have
				```
				let metadata = p.metadata()?;
				let mtime = metadata.modified()?;
				// ...
				```
				return ret;
				}

				pub fn parse_dirstate(&mut self) {
				kevincox (Unsubmitted, Not Done):
				1. Does this function need to be public? It seems internal to the constructor.
				2. If it doesn't need to be, I would prefer it return the Map so that you don't have a partially constructed DirState.
				Ivzhh (Author, Unsubmitted, Not Done): I think dirstate needs to (1) read an existing one and (2) create one if it does not exist; maybe private for now.
				let mut dfile = File::open(&self.path).expect("Cannot open dirstate file");

				dfile.read_exact(&mut self.p1.node).unwrap();
				dfile.read_exact(&mut self.p2.node).unwrap();

				loop {
				let entry: DirStateEntry = match read_dirstate_entry(&mut dfile) {
				Ok(v) => v,
				Err(_) => break,
				};

				let mut fname = vec![0u8; entry.length as usize];
				dfile.read_exact(fname.as_mut()).unwrap();

				self.dmap
				.entry(PathBuf::from(str::from_utf8(fname.as_ref()).unwrap()))
				.or_insert(entry);
				kevincox (Unsubmitted, Not Done): Is ignoring duplicate entries desired? It might be worth a comment explaining why.
				}
				}
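
The review comments on `DirState::new()` and `parse_dirstate()` above suggest returning a `Result` and building the map before the struct exists. Below is a minimal sketch of that shape using only the standard library. The names `Entry`, `Parsed` and the free function `parse_dirstate` are hypothetical and not part of the diff. The byte layout mirrors what the diff reads: two 20-byte parents, then per entry one status byte, four big-endian `u32` fields and the file name. Keys are kept as raw bytes here, which is an assumption made to avoid the UTF-8 `unwrap()`.

```rust
use std::collections::HashMap;
use std::io::{self, Read};

/// Hypothetical entry type mirroring the fields read in the diff.
#[derive(Debug, PartialEq)]
pub struct Entry {
    pub status: u8,
    pub mode: u32,
    pub size: u32,
    pub mtime: u32,
}

/// Hypothetical result of parsing: both parents plus the file map.
pub struct Parsed {
    pub p1: [u8; 20],
    pub p2: [u8; 20],
    pub entries: HashMap<Vec<u8>, Entry>,
}

fn be_u32(b: &[u8]) -> u32 {
    u32::from_be_bytes([b[0], b[1], b[2], b[3]])
}

pub fn parse_dirstate<R: Read>(mut rdr: R) -> io::Result<Parsed> {
    let mut p1 = [0u8; 20];
    let mut p2 = [0u8; 20];
    rdr.read_exact(&mut p1)?;
    rdr.read_exact(&mut p2)?;

    let mut entries = HashMap::new();
    loop {
        let mut header = [0u8; 17];
        // Read the first byte on its own so a clean end of file can be told
        // apart from an entry that is cut short.
        if rdr.read(&mut header[..1])? == 0 {
            break;
        }
        rdr.read_exact(&mut header[1..])?;
        let name_len = be_u32(&header[13..17]) as usize;
        let mut name = vec![0u8; name_len];
        rdr.read_exact(&mut name)?;
        let entry = Entry {
            status: header[0],
            mode: be_u32(&header[1..5]),
            size: be_u32(&header[5..9]),
            mtime: be_u32(&header[9..13]),
        };
        // Like the diff, keep the first entry if a name appears twice.
        entries.entry(name).or_insert(entry);
    }
    Ok(Parsed { p1, p2, entries })
}
```

Unlike the `Err(_) => break` in the diff, a truncated file surfaces as an `UnexpectedEof` error here rather than ending the loop silently.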

				#[cfg(target_family = "unix")]
				fn _is_bad(entry: &DirEntry) -> bool {
				kevincox (Unsubmitted, Done): Don't use the `_` prefix for private items; rely on Rust visibility. Also, `is_bad` isn't very informative.
				entry.file_type().is_block_device() || entry.file_type().is_fifo()
				|| entry.file_type().is_char_device() || entry.file_type().is_symlink()
				|| entry.file_type().is_socket()
				}

				#[cfg(not(target_family = "unix"))]
				fn _is_bad(_entry: &DirEntry) -> bool {
				false
				}

				pub fn walk_dir(&mut self, root: &Path, mtc: &matcher::Matcher) -> CurrentState {
				kevincox (Unsubmitted, Done): s/mtc/matcher/
				let mut grey = {
				let mut grey = Set::new();
				grey.extend(self.dmap.keys().map(|s| s.as_path()));
				grey
				};
				kevincox (Unsubmitted, Done):
				```
				let mut grey = Set::new();
				grey.extend(self.dmap.keys().map(|s| s.as_path()));
				```
				Also I would pick a name like `undiscovered_paths` or something. `grey` is cryptic.

				let mut res = CurrentState::new();

				let walker = WalkDir::new(root).into_iter();

				for entry in walker.filter_entry(|ent| {
				kevincox (Unsubmitted, Not Done): I would prefer doing the filter before the loop and storing it in a variable.
				Ivzhh (Author, Unsubmitted, Not Done): For the filter, I follow the example in the walkdir doc. I guess what I want is to skip the dir for later recursive visiting.
				if ent.file_type().is_dir() {
				let mut p = ent.path().strip_prefix(root).unwrap().to_owned();
				p.push("");
				kevincox (Unsubmitted, Not Done): This is probably worth a helper function.
				!mtc.check_path_ignored(p.to_str().unwrap())
				} else {
				true
				}
				}) {
				if let Ok(entry) = entry {
				kevincox (Unsubmitted, Done): Please explain why you are ignoring the error condition.
				Ivzhh (Author, Unsubmitted, Not Done): I added the error handling back.
				let pbuf = entry.path();
				kevincox (Unsubmitted, Not Done): I would just call this `path` or `pathbuf`.
				let relpath = pbuf.strip_prefix(root).unwrap();

				if DirState::_is_bad(&entry) {
				continue;
				}
				kevincox (Unsubmitted, Not Done): I would move this filter beside the filter in the loop.

				if !entry.file_type().is_dir() {
				kevincox (Unsubmitted, Not Done): I would also put this filter above. But more importantly all `_is_bad()` does is check for file types. So it seems like the former filter is redundant with this one.
				if self.dmap.contains_key(relpath) {
				kevincox (Unsubmitted, Not Done): You could do the following for a slight performance win and save a line.
				```
				if let Occupied(entry) = self.dmap.entry(relpath) {
				...
				}
				```
				Ivzhh (Author, Unsubmitted, Not Done): I get a borrow-checker compile error here. Later I will use `Occupied()` when possible.
				kevincox (Unsubmitted, Not Done): Sorry, I misunderstood the logic. You can do this:
				```
				diff -r ccc683587fdb rust/hgstorage/src/dirstate.rs
				--- a/rust/hgstorage/src/dirstate.rs Sat Mar 24 10:05:53 2018 +0000
				+++ b/rust/hgstorage/src/dirstate.rs Sat Mar 24 10:14:58 2018 +0000
				@@ -184,8 +184,7 @@
				 continue;
				 }
				 
				- if self.dir_state_map.contains_key(rel_path) {
				- let dir_entry = &self.dir_state_map[rel_path];
				+ if let Some(dir_entry) = self.dir_state_map.get(rel_path) {
				 files_not_in_walkdir.remove(rel_path);
				 DirState::check_status(&mut res, abs_path, rel_path, dir_entry);
				 } else if !matcher.check_path_ignored(rel_path.to_str().unwrap()) {
				```
				let stent = &self.dmap[relpath];
				grey.remove(relpath);
				DirState::check_status(&mut res, pbuf, relpath, stent);
				} else {
				if !mtc.check_path_ignored(relpath.to_str().unwrap()) {
				kevincox (Unsubmitted, Not Done): Use an `else if`.
				res.unknown.insert(relpath.to_path_buf());
				}
				}
				}
				}
				}

				for rem in grey.drain() {
				kevincox (Unsubmitted, Done): s/rem/path/ or remaining_path.
				if res.ignored.contains(rem) {
				kevincox (Unsubmitted, Not Done): You can use the entry API here.
				res.ignored.remove(rem);
				}

				let relpath = rem;
				let abspath = root.join(relpath);

				let stent = &self.dmap[relpath];

				DirState::check_status(&mut res, &abspath, &relpath, stent);
				}

				return res;
				}

				fn check_status(res: &mut CurrentState, abspath: &Path, relpath: &Path, stent: &DirStateEntry) {
				kevincox (Unsubmitted, Not Done): Please use a better name for `stent`.
				let pb = relpath.to_path_buf();

				// the order here is very important
				// if it is 'r' then it can be 'forget' or 'remove',
				// so the file existence doesn't matter.
				// other status all rely on file existence.
				if stent.status == ('r' as u8) {
				kevincox (Unsubmitted, Done): In Rust we generally avoid parentheses around `as`, since it binds very tightly.
				res.removed.insert(pb);
				} else if !abspath.exists() {
				res.deleted.insert(pb);
				} else if stent.status == ('a' as u8) {
				res.added.insert(pb);
				} else {
				let mtd = abspath.metadata().unwrap();

				if mtd.len() != (stent.size as u64) {
				res.modified.insert(pb);
				} else if mtd.modified()
				.unwrap()
				.duration_since(UNIX_EPOCH)
				.unwrap()
				.as_secs() != (stent.mtime as u64)
				{
				res.lookup.insert(pb);
				} else {
				res.clean.insert(pb);
				}
				}
				}
				}
				kevincox (Unsubmitted, Not Done): Does it make sense to make `DirStateEntry.mtime` be a `std::time::SystemTime` and convert upon reading the structure in? If not I would prefer doing the conversion here: `else if mtd.modified().unwrap() == UNIX_EPOCH + Duration::from_secs(dir_entry.mtime as u64) {` (Maybe extract the system time to higher up, or even a helper function on dir_entry)
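
The two mtime comparisons discussed above are not equivalent when the filesystem reports sub-second timestamps. This minimal sketch shows the difference; the helper names `recorded_mtime` and `same_second` are hypothetical and not part of the diff. Comparing `SystemTime` values directly includes the nanosecond part, while the diff's `as_secs()` comparison only looks at whole seconds.

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Hypothetical helper: the dirstate's whole-second mtime as a SystemTime.
pub fn recorded_mtime(secs: u32) -> SystemTime {
    UNIX_EPOCH + Duration::from_secs(u64::from(secs))
}

/// Hypothetical helper: compares only the whole seconds, as the diff does.
pub fn same_second(actual: SystemTime, recorded_secs: u32) -> bool {
    match actual.duration_since(UNIX_EPOCH) {
        Ok(elapsed) => elapsed.as_secs() == u64::from(recorded_secs),
        // A timestamp before the epoch cannot match an unsigned value.
        Err(_) => false,
    }
}
```

If the dirstate only stores whole seconds, as the `u32` field in the diff suggests, the whole-second comparison is probably the one that avoids flagging every file with a fractional mtime for lookup.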

rust/hgstorage/src/lib.rs

This file was added.

				extern crate byteorder;
				extern crate flate2;
				extern crate hex;
				#[macro_use]
				extern crate lazy_static;
				extern crate lru_cache;
				extern crate num_cpus;
				extern crate regex;
				extern crate sha1;
				extern crate tempdir;
				extern crate threadpool;
				extern crate walkdir;

				use std::process::Command;
				use std::path::Path;

				pub mod mpatch;
				pub mod revlog;
				pub mod revlog_v1;
				pub mod changelog;
				pub mod manifest;
				pub mod path_encoding;
				pub mod matcher;
				pub mod dirstate;
				pub mod repository;
				pub mod local_repo;
				pub mod config;
				pub mod working_context;

				/// assume cffi repo is in the same level of hg repo
				pub fn prepare_testing_repo(temp_dir: &Path) {
				if !Path::new("../../../cffi/").exists() {
				let hg_msg = Command::new("hg")
				.args(&[
				"clone",
				"https://ivzhh@bitbucket.org/ivzhh/cffi",
				"../../../cffi/",
				"-u",
				"e8f05076085cd24d01ba1f5d6163fdee16e90103",
				])
				.output()
				.unwrap();
				println!("stdout: {}", String::from_utf8_lossy(&hg_msg.stdout));
				println!("stderr: {}", String::from_utf8_lossy(&hg_msg.stderr));
				}

				let dst = temp_dir.join("cffi");
				let hg_msg = Command::new("hg")
				.args(&[
				"clone",
				"../../../cffi/",
				"-u",
				"e8f05076085cd24d01ba1f5d6163fdee16e90103",
				dst.to_str().unwrap(),
				kevincox (Unsubmitted, Not Done): You can add a later `.arg(dst)` to support non-UTF-8 paths instead of converting to a str here.
				])
				.output()
				.unwrap();
				if !hg_msg.stdout.is_empty() {
				println!("stdout: {}", String::from_utf8_lossy(&hg_msg.stdout));
				}
				if !hg_msg.stderr.is_empty() {
				println!("stderr: {}", String::from_utf8_lossy(&hg_msg.stderr));
				}
				}
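
The `.arg(dst)` suggestion above works because `Command::arg` accepts anything that implements `AsRef<OsStr>`, so a `Path` can be passed without a UTF-8 conversion. A minimal sketch follows; the function `clone_command` is hypothetical and not part of the diff. It only builds the command and does not run `hg`.

```rust
use std::path::Path;
use std::process::Command;

/// Hypothetical helper: builds `hg clone <src> -u <rev> <dst>` without
/// converting either path to a `str`.
pub fn clone_command(src: &Path, rev: &str, dst: &Path) -> Command {
    let mut cmd = Command::new("hg");
    cmd.arg("clone").arg(src).arg("-u").arg(rev).arg(dst);
    cmd
}
```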

rust/hgstorage/src/local_repo.rs

This file was added.

				use std;
				use std::path::{Path, PathBuf};
				use std::sync::{Arc, RwLock};

				use lru_cache::LruCache;

				use repository::Repository;
				use config;
				use revlog::Revlog;
				use revlog_v1 as rv1;
				use matcher::Matcher;
				use dirstate::CurrentState;
				use path_encoding::encode_path;
				use manifest::FlatManifest;
				use working_context::WorkCtx;
				use changelog::ChangeLog;

				const LRU_SIZE: usize = 100;

				type RevlogPtr = Arc<RwLock<Revlog>>;
				type RevlogLRU = Arc<RwLock<LruCache<PathBuf, RevlogPtr>>>;

				pub struct LocalRepo {
				pub repo_root: Arc<PathBuf>,
				pub dot_hg_path: Arc<PathBuf>,
				pub pwd: Arc<PathBuf>,
				pub store_data: Arc<PathBuf>,
				pub matcher: Matcher,
				pub config: Arc<config::Configuration>,
				pub revlog_factory: Arc<Revlog>,
				pub revlog_db: RevlogLRU,
				pub manifest: Arc<FlatManifest>,
				pub changelog: Arc<ChangeLog>,
				pub work_ctx: Arc<WorkCtx>,
				}

				impl LocalRepo {
				pub fn new(dash_r: Option<PathBuf>) -> Self {
				let pwd = Arc::new(std::env::current_dir().unwrap());

				let repo_root = Arc::new(match dash_r {
				Some(p) => {
				let dot_hg_path = p.join(".hg");
				if dot_hg_path.exists() {
				p
				} else {
				panic!(format!(
				".hg folder not found for the path given by -R argument: {:?}",
				p
				));
				kevincox (Unsubmitted, Done): I would replace the condition with.
				```
				assert!(dot_hg_path.exists(), ".hg folder not found for the path given by -R argument: {:?}", p);
				```
				}
				}
				None => {
				let mut root = pwd.as_path();
				loop {
				let dot_hg_path = root.join(".hg");
				if dot_hg_path.exists() {
				break root.to_path_buf();
				}
				match root.parent() {
				Some(p) => {
				root = p;
				}
				None => panic!(".hg folder not found"),
				}
				}
				kevincox (Unsubmitted, Done):
				```
				while !root.join(".hg").exists() {
				root = root.parent().expect(".hg folder not found");
				}
				```
				}
				});

				let dot_hg_path = Arc::new(repo_root.join(".hg"));
				let requires = dot_hg_path.join("requires");
				let config = Arc::new(config::Configuration::new(&requires));
				let store = dot_hg_path.join("store");
				let store_data = Arc::new(store.join("data"));
				//let fn_changelog = store.join("00changelog.i");
				let fn_manifest = store.join("00manifest.i");
				let fn_changelog = store.join("00changelog.i");

				let revlog_factory = match config.revlog_format {
				config::RevlogFormat::V1 => Arc::new(rv1::RevlogIO::get_factory()),
				_ => panic!("other revlog formats not supported yet."),
				};

				let manifest = Arc::new(FlatManifest {
				inner: revlog_factory.create(fn_manifest.as_path()),
				});

				let changelog = Arc::new(ChangeLog {
				inner: revlog_factory.create(fn_changelog.as_path()),
				});

				let matcher = {
				let default_hgignore = repo_root.join(".hgignore");
				let mut matcher = Matcher::new(&[default_hgignore]);
				matcher
				};

				let revlog_db = Arc::new(RwLock::new(LruCache::new(LRU_SIZE)));

				let work_ctx = Arc::new(WorkCtx::new(
				dot_hg_path.clone(),
				manifest.clone(),
				changelog.clone(),
				));

				return Self {
				repo_root,
				dot_hg_path,
				pwd,
				store_data,
				matcher,
				config,
				revlog_factory,
				revlog_db,
				manifest,
				changelog,
				work_ctx,
				};
				}

				pub fn get_filelog(&self, fp: &Path) -> Arc<RwLock<Revlog>> {
				kevincox [Unsubmitted, Done]: s/fp/path/
				let relpath = encode_path(fp);
				let abspath = self.store_data.join(&relpath);

				if !abspath.exists() {
				panic!(format!("path not exists: {:?}", abspath));
				}
				kevincox [Unsubmitted, Done]: `assert!(abspath.exists(), "path not exists: {:?}", abspath);`

				let mut gd = self.revlog_db.write().unwrap();
				kevincox [Unsubmitted, Not Done]: `gd` is cryptic.

				if !gd.contains_key(fp) {
				let rl = self.revlog_factory.create(abspath.as_path());
				gd.insert(fp.to_path_buf(), rl);
				}

				return gd.get_mut(fp).unwrap().clone();
				kevincox [Unsubmitted, Not Done]: Why does it need to be mutable to clone?
				Ivzhh [Author, Unsubmitted, Not Done]: I think the LRU will update a reference count (or timestamp) when the data is accessed.
				kevincox [Unsubmitted, Not Done]: Actually, I didn't realize that RwLock doesn't have a regular `get()`, since it is doing a compile-time borrow check. https://doc.rust-lang.org/std/sync/struct.RwLock.html#method.get_mut. My mistake, the code is fine.
				}
				}

				impl Repository for LocalRepo {
				fn status(&self) -> CurrentState {
				self.work_ctx.status(&self)
				}
				}

				#[cfg(test)]
				mod test {
				use std::env;
				use tempdir::TempDir;
				use local_repo::LocalRepo;
				use repository::Repository;
				use dirstate::DirState;

				#[test]
				fn test_hgstorage_dirstate() -> () {
				kevincox [Unsubmitted, Done]: This test has no assertions. Consider calling it `test_create_...` or something to indicate that you are just checking for panics.
				println!(
				"current dir {}",
				env::current_dir().unwrap().to_str().unwrap()
				);
				let tmp_dir = TempDir::new("cffi").unwrap();

				println!("tmp_dir {:?}", tmp_dir.path());

				::prepare_testing_repo(tmp_dir.path());

				let root = tmp_dir.path().join("cffi");

				let dfile = tmp_dir.path().join("cffi/.hg/dirstate");
				let mut ds = DirState::new(dfile);
				ds.parse_dirstate();
				println!("p1: {}", ds.p1);

				let repo = LocalRepo::new(Some(root));

				println!("status: {:?}", repo.status());
				}
				}
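
The root-discovery loop in `LocalRepo::new`, and the reviewer's shorter `while` version of it, can also be written without a panic. The following is a minimal sketch rather than code from the diff: `find_repo_root` is a hypothetical helper name, and returning `Option` instead of panicking is an assumption about how a caller might prefer to report a missing repository.

```rust
use std::path::{Path, PathBuf};

/// Walk up from `start` (inclusive) until a directory containing `.hg` is found.
/// Hypothetical helper: returns `None` instead of panicking when no root exists.
pub fn find_repo_root(start: &Path) -> Option<PathBuf> {
    start
        .ancestors() // yields `start`, then its parent, and so on up to the filesystem root
        .find(|dir| dir.join(".hg").exists())
        .map(Path::to_path_buf)
}
```

A caller that wants the diff's current behaviour could still write `find_repo_root(&pwd).expect(".hg folder not found")`.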

rust/hgstorage/src/manifest.rs

This file was added.

				use std::sync::{Arc, RwLock};
				use std::collections::HashMap as Map;
				use std::path::PathBuf;
				use std::str;

				use hex;

				use revlog::{NodeId, Revlog};

				#[derive(Debug)]
				pub struct ManifestEntry {
				pub id: NodeId,
				pub flag: String,
				}

				type FileRevMap = Map<PathBuf, ManifestEntry>;

				#[derive(Clone)]
				pub struct FlatManifest {
				pub inner: Arc<RwLock<Revlog>>,
				}

				impl FlatManifest {
				pub fn build_file_rev_mapping(&self, rev: &i32) -> FileRevMap {
				let mut res = FileRevMap::new();

				let content = self.inner.read().unwrap().revision(rev).unwrap();

				let mut line_start = 0;
				let mut prev_i: usize = 0;

				for i in 0..(content.len()) {
				if content[i] == 0 {
				kevincox [Unsubmitted, Done]: What are these magic numbers?
				prev_i = i;
				} else if content[i] == 10 {
				let file_name = str::from_utf8(&content[line_start..prev_i])
				.unwrap()
				.to_string();

				line_start = i + 1;

				let ent = if i - prev_i - 1 == NodeId::hex_len() {
				kevincox [Unsubmitted, Done]: s/ent/entry/
				let id =
				NodeId::new_from_bytes(&hex::decode(&content[(prev_i + 1)..i]).unwrap());
				let flag = "".to_string();
				ManifestEntry { id, flag }
				} else {
				let id = NodeId::new_from_bytes(&hex::decode(
				&content[(prev_i + 1)..(prev_i + 41)],
				kevincox [Unsubmitted, Done]: What are these numbers?
				).unwrap());
				let flag = str::from_utf8(&content[(prev_i + 41)..i])
				.unwrap()
				.to_string();
				ManifestEntry { id, flag }
				};

				res.insert(PathBuf::from(file_name.as_str()), ent);
				}
				}

				return res;
				}
				}

				#[cfg(test)]
				mod test {
				use std::env;
				use tempdir::TempDir;
				use super::*;
				use manifest::FlatManifest;
				use std::sync::{Arc, RwLock};
				use revlog_v1::*;

				#[test]
				fn test_hgstorage_manifest() -> () {
				println!(
				"current dir {}",
				env::current_dir().unwrap().to_str().unwrap()
				);
				let tmp_dir = TempDir::new("cffi").unwrap();

				println!("tmp_dir {:?}", tmp_dir.path());

				::prepare_testing_repo(tmp_dir.path());
				let mfest = tmp_dir.path().join("cffi/.hg/store/00manifest.i");
				println!("mfest {:?}", mfest);

				assert!(mfest.exists());

				let rvlg = RevlogIO::new(&mfest);

				assert_eq!(rvlg.tip(), 2957);

				let tip = rvlg.tip() as i32;
				let node = rvlg.node_id(&tip);
				println!("node rev{}: {}", tip, node);

				let p1r = rvlg.p1(&tip);
				let p1id = rvlg.p1_nodeid(&tip);
				println!("p1 rev{}: {}", p1r, p1id);

				let p2r = rvlg.p2(&tip);
				let p2id = rvlg.p2_nodeid(&tip);
				println!("p2 rev{}: {}", p2r, p2id);

				let content = rvlg.revision(&tip).unwrap();

				let hash = rvlg.check_hash(&content, &p1id, &p2id);
				println!("{}", str::from_utf8(&content).unwrap());

				assert_eq!(hash, rvlg.rev(&tip).unwrap().borrow().node_id());

				let manifest = FlatManifest {
				inner: Arc::new(RwLock::new(rvlg)),
				};

				manifest.build_file_rev_mapping(&2957);
				}
				}
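
The reviewer's questions about the magic numbers (`0`, `10`, `41`) in `build_file_rev_mapping` can be answered with named constants. This sketch assumes the line layout the diff's parser implies: a path, a NUL byte, 40 hex digits of node id, an optional flag, then a line feed. `RawManifestEntry` and `parse_manifest` are hypothetical names, and the node is kept as a hex string so the example does not need the `hex` crate.

```rust
const NUL: u8 = b'\0'; // separates the path from the node id
const LF: u8 = b'\n'; // terminates each manifest line
const HEX_NODE_LEN: usize = 40; // 20-byte node id written as hex

#[derive(Debug, PartialEq)]
pub struct RawManifestEntry {
    pub path: String,
    pub hex_node: String,
    pub flag: String,
}

/// Parse flat manifest text; returns `None` on any malformed line.
pub fn parse_manifest(content: &[u8]) -> Option<Vec<RawManifestEntry>> {
    let mut entries = Vec::new();
    for line in content.split(|&b| b == LF).filter(|l| !l.is_empty()) {
        let nul = line.iter().position(|&b| b == NUL)?;
        let (path, rest) = (&line[..nul], &line[nul + 1..]);
        if rest.len() < HEX_NODE_LEN {
            return None;
        }
        // Everything after the 40 hex digits is the (possibly empty) flag.
        let (node, flag) = rest.split_at(HEX_NODE_LEN);
        entries.push(RawManifestEntry {
            path: std::str::from_utf8(path).ok()?.to_string(),
            hex_node: std::str::from_utf8(node).ok()?.to_string(),
            flag: std::str::from_utf8(flag).ok()?.to_string(),
        });
    }
    Some(entries)
}
```

Splitting on the line feed first also removes the need for the `prev_i` and `line_start` bookkeeping.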

rust/hgstorage/src/matcher.rs

This file was added.

				use std::fs;
				use std::path::PathBuf;
				use std::vec::Vec;
				use std::io::BufReader;
				use std::io::BufRead;

				use regex::{escape, RegexSet};

				pub fn glob_to_re(pat: &str) -> String {
				kevincox [Unsubmitted, Done]: s/pat/glob/
				let mut res = String::new();
				kevincox [Unsubmitted, Done]: Might be worth calling `String::with_capacity(pat.len())` since it will be at least that long.

				let pat: Vec<char> = pat.chars().collect();
				let mut i = 0;
				let n = pat.len();
				kevincox [Unsubmitted, Not Done]: Can you manage a `&[u8]` rather than pointer arithmetic for the whole string? It will make me feel better and will probably be easier to read.
				Ivzhh [Author, Unsubmitted, Not Done]: I borrowed this logic as a whole from the Python code. It will take some time to re-translate it in a non-pointer-arithmetic way.

				let mut group = 0;

				while i < n {
				let c = pat[i];
				i += 1;

				match c {
				'*' => {
				if i < n && pat[i] == '*' {
				i += 1;
				if i < n && pat[i] == '/' {
				i += 1;
				res.push_str("(?:.*/)?");
				} else {
				res.push_str(".*");
				}
				} else {
				res.push_str("[^/]*");
				}
				}
				'?' => {
				res.push('.');
				}
				'[' => {
				let mut j = i;
				if j < n && (pat[j] == '!' || pat[j] == ']') {
				j += 1;
				}
				while j < n && pat[j] != ']' {
				j += 1;
				}
				if j >= n {
				res.push_str("\\[");
				} else {
				let mut stuff = String::new();

				if pat[i] == '!' {
				stuff.push('^');
				i += 1;
				} else if pat[i] == '^' {
				stuff.push('\\');
				stuff.push('^');
				i += 1;
				}

				for cc in pat[i..j].iter().cloned() {
				stuff.push(cc);
				if cc == '\\' {
				stuff.push('\\');
				}
				}
				i = j + 1;

				res.push('[');
				res.push_str(stuff.as_str());
				res.push(']');
				}
				}
				'{' => {
				group += 1;
				res.push_str("(?:");
				}
				'}' if group != 0 => {
				res.push(')');
				group -= 1;
				}
				',' if group != 0 => {
				res.push('|');
				}
				'\\' => {
				if i < n {
				res.push_str(escape(pat[i].to_string().as_str()).as_str());
				i += 1;
				} else {
				res.push_str(escape(pat[i].to_string().as_str()).as_str());
				}
				}
				_ => {
				res.push_str(escape(c.to_string().as_str()).as_str());
				}
				}
				}

				let res = if cfg!(target_family = "unix") {
				res
				} else {
				res.replace("/", "\\\\")
				};

				res
				}

				fn relglob(pat: String) -> String {
				kevincox [Unsubmitted, Done]: If you are going to call `String.as_str()` just take a `&str`.
				kevincox [Unsubmitted, Not Done]: s/relglob/relative_glob_re/
				let mut res = String::new();
				//res.push_str("(?:\|.*/)");
				res.push_str(glob_to_re(pat.as_str()).as_str());
				kevincox [Unsubmitted, Not Done]: If you are just doing one call just return the result.
				kevincox [Unsubmitted, Not Done]: You should be able to do `&string` rather than `string.as_str()` as it coerces.
				//res.push_str("(?:/\|$)");

				res
				}

				#[derive(Debug)]
				pub struct Matcher {
				pub inner: Option<RegexSet>,
				}

				impl Matcher {
				pub fn new(hgignores: &[PathBuf]) -> Self {
				let mut inner = Vec::<String>::new();

				for fname in hgignores {
				if !fname.exists() {
				continue;
				}

				let fhnd = BufReader::new(fs::File::open(fname).unwrap());
				kevincox [Unsubmitted, Done]: Better name please.

				#[derive(PartialEq)]
				enum State {
				None,
				Glob,
				Re,
				}

				inner.push(relglob(".hg/*".to_owned()));

				let mut cur_state = State::None;
				for ln in fhnd.lines() {
				kevincox [Unsubmitted, Done]: s/ln/line/
				let ln = match ln {
				Err(_) => break,
				Ok(line) => line,
				};

				let ln = ln.trim();

				if ln.is_empty() {
				continue;
				}

				if cur_state == State::None && !ln.starts_with("syntax:") {
				eprintln!(
				"syntax error in {:?}, please use 'syntax: [glob|regexp]'",
				fname
				);
				kevincox [Unsubmitted, Done]: Is this a warning or error? You might want to switch to `panic!`.
				}
				kevincox [Unsubmitted, Done]: I would move this into the following match because it dedupes the `starts_with` check and puts the logic closer together.

				if ln.starts_with("syntax:") {
				let mut iter = ln.split_whitespace();
				assert_eq!("syntax:", iter.next().unwrap());
				let pat = iter.next().unwrap();

				cur_state = if pat == "glob" {
				State::Glob
				} else if pat == "regexp" {
				State::Re
				} else {
				panic!("unsupported pattern {} in file {:?}", pat, fname);
				}
				} else {
				match cur_state {
				State::None => (),
				State::Glob => {
				inner.push(relglob(ln.to_owned()));
				}
				State::Re => {
				inner.push(ln.to_owned());
				}
				}
				}
				}
				}

				return match RegexSet::new(inner) {
				Ok(inner) => Self { inner: Some(inner) },
				Err(e) => panic!("error in building ignore {:?}", e),
				};
				}

				/// rp: relative path, relative to the root of repository
				pub fn check_path_ignored(&self, rp: &str) -> bool {
				kevincox [Unsubmitted, Done]: s/rp/path/
				if let Some(ref m) = self.inner {
				m.is_match(rp)
				} else {
				false
				}
				kevincox [Unsubmitted, Not Done]: I would do `self.inner.map(|m| m.is_match(rp)).unwrap_or(false)` but this is fine.
				}
				}

				#[cfg(test)]
				mod test {
				use std::env;
				use std::path::Path;
				use tempdir::TempDir;
				use super::*;
				use regex::escape;

				#[test]
				fn test_hgstorage_ignore() -> () {
				println!(
				"current dir {}",
				env::current_dir().unwrap().to_str().unwrap()
				);
				let tmp_dir = TempDir::new("cffi").unwrap();

				println!("tmp_dir {:?}", tmp_dir.path());

				::prepare_testing_repo(tmp_dir.path());

				let mut m = Matcher::new();
				let ignore_file = tmp_dir.path().join("cffi/.hgignore");

				m.build_from_hgignore_files(&[ignore_file]);

				assert!(!m.check_path_ignored(&Path::new("a.py")));
				assert!(m.check_path_ignored(&Path::new("testing/__pycache__")));
				assert!(m.check_path_ignored(&Path::new("test/dfsdf/a.egg-info")));
				assert!(!m.check_path_ignored(&Path::new("a.egg-info.tmp")));
				}

				#[test]
				fn test_hgstorage_globre() -> () {
				//assert_eq!(escape(r"/"), r"\/");
				assert_eq!(glob_to_re(r"?"), r".");
				assert_eq!(glob_to_re(r"*"), r"[^/]*");
				assert_eq!(glob_to_re(r"**"), r".*");
				assert_eq!(glob_to_re(r"**/a"), r"(?:.*/)?a");
				//assert_eq!(glob_to_re(r"a/**/b"), r"a\/(?:.*/)?b");
				assert_eq!(glob_to_re(r"a/**/b"), r"a/(?:.*/)?b");
				assert_eq!(glob_to_re(r"[a?!^][^b][!c]"), r"[a?!^][\^b][^c]");
				assert_eq!(glob_to_re(r"{a,b}"), r"(?:a|b)");
				assert_eq!(glob_to_re(r".\*\?"), r"\.\*\?");
				}
				}
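
The reviewer suggested folding the "no syntax line yet" check into the same branch that handles `syntax:` lines. One way that could look is sketched below, using only the standard library. `Syntax` and `parse_hgignore` are hypothetical names, and the sketch mirrors the rule in this diff (a pattern before any `syntax:` line is an error), which is not necessarily what Mercurial itself does. It returns a `Result` so the caller can decide between a warning and a hard failure.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Syntax {
    Glob,
    Regexp,
}

/// Split `.hgignore` text into (syntax, pattern) pairs.
/// Errors carry a 1-based line number for the message.
pub fn parse_hgignore(text: &str) -> Result<Vec<(Syntax, String)>, String> {
    let mut current: Option<Syntax> = None;
    let mut patterns = Vec::new();
    for (index, raw) in text.lines().enumerate() {
        let line = raw.trim();
        if line.is_empty() {
            continue;
        }
        if let Some(rest) = line.strip_prefix("syntax:") {
            // A `syntax:` line switches the mode for the lines that follow.
            current = Some(match rest.trim() {
                "glob" => Syntax::Glob,
                "regexp" => Syntax::Regexp,
                other => {
                    return Err(format!("line {}: unsupported syntax {:?}", index + 1, other))
                }
            });
        } else {
            match current {
                Some(syntax) => patterns.push((syntax, line.to_string())),
                None => {
                    return Err(format!("line {}: pattern before any 'syntax:' line", index + 1))
                }
            }
        }
    }
    Ok(patterns)
}
```

Glob entries would then go through `glob_to_re` before everything is handed to `RegexSet::new`.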

rust/hgstorage/src/mpatch.rs

This file was added.

				use std::io::prelude::*;
				use std::io::{Cursor, Seek};
				use std::io::SeekFrom::Start;
				use std::vec::Vec;
				use std::mem::swap;
				use std::iter::Extend;

				use byteorder::{BigEndian, ReadBytesExt};

				#[derive(Debug, Default)]
				pub struct DiffHeader {
				/// the line where previous chunk ends
				prev_cnk_ln: u32,
				kevincox [Unsubmitted, Done]: Spell these out please.
				/// the line where next chunk starts
				next_cnk_ln: u32,
				/// size of the current diff patch
				diff_size: u32,
				}

				#[derive(Clone, Debug)]
				struct Fragment {
				frag_len: u32,
				frag_ofs: u32,
				}
				kevincox [Unsubmitted, Done]: `struct Fragment { len: u32, offset: u32, }`

				impl Fragment {
				pub fn new(len: u32, ofs: u32) -> Self {
				Fragment {
				frag_len: len,
				frag_ofs: ofs,
				}
				}
				}

				fn pull(dst: &mut Vec<Fragment>, src: &mut Vec<Fragment>, l: u32) {
				kevincox [Unsubmitted, Done]: Maybe it's just me but I think it is more common to put the source before the destination.
				kevincox [Unsubmitted, Done]: `pull` is very generic.
				let mut l = l;

				while l > 0 {
				assert_ne!(src.len(), 0);
				kevincox [Unsubmitted, Done]: `assert!(!src.is_empty())`
				let f = src.pop().unwrap();
				kevincox [Unsubmitted, Done]: If you are unwrapping the `pop` there is no need for the prior check.
				kevincox [Unsubmitted, Not Done]: s/f/fragment/
				if f.frag_len > l {
				src.push(Fragment::new(f.frag_len - l, f.frag_ofs + l));
				dst.push(Fragment::new(l, f.frag_ofs));
				return;
				}
				l -= f.frag_len;
				dst.push(f);
				}
				}

				fn mov(m: &mut Cursor<Vec<u8>>, dest: u32, src: u32, count: u32) {
				kevincox [Unsubmitted, Done]: `mov` is overly shortened and generic.
				kevincox [Unsubmitted, Not Done]: It seems weird to take a cursor to a vec if you are just going to do an absolute seek. Can it work with `&mut [u8]`?
				Ivzhh [Author, Unsubmitted, Not Done]: This part, including the stream style, comes from the Python code. I will update it later with xi-rope.
				m.seek(Start(src as u64)).unwrap();
				let mut buf: Vec<u8> = Vec::new();
				buf.resize(count as usize, 0);
				kevincox [Unsubmitted, Not Done]: `vec![0; count]` works. (The arguments might be the other way around).

				m.read_exact(&mut buf[..]).unwrap();
				m.seek(Start(dest as u64)).unwrap();
				m.write(&buf[..]).unwrap();
				}

				fn collect(m: &mut Cursor<Vec<u8>>, buf: u32, list: &Vec<Fragment>) -> Fragment {
				let start = buf;
				let mut buf = buf;

				for &Fragment {
				frag_len: l,
				frag_ofs: p,
				} in list.iter().rev()
				kevincox [Unsubmitted, Not Done]: `for &Fragment{frag_len, frag_ofs} in list.iter().rev()`
				{
				mov(m, buf, p, l);
				buf += l;
				}
				return Fragment::new(buf - start, start);
				}

				#[derive(Debug)]
				pub struct Patch {
				pub base: Vec<u8>,
				pub bins: Vec<Vec<u8>>,
				}

				pub fn patches(ptc: &Patch) -> Vec<u8> {
				let &Patch {
				base: ref a,
				ref bins,
				} = ptc;
				kevincox [Unsubmitted, Not Done]: Make this one line and don't bother renaming.

				if bins.len() == 0 {
				return a.iter().cloned().collect();
				}

				let plens: Vec<u32> = bins.iter().map(|it| it.len() as u32).collect();
				let pl: u32 = plens.iter().sum();
				let bl: u32 = a.len() as u32 + pl;
				let tl: u32 = bl + bl + pl;

				if tl == 0 {
				return a.iter().cloned().collect();
				}

				let (mut b1, mut b2) = (0_u32, bl);

				let mut m_buf = Vec::<u8>::new();
				m_buf.resize(tl as usize, 0);

				let mut m = Cursor::new(m_buf);

				m.write(&a[..]).unwrap();

				let mut frags = vec![Fragment::new(a.len() as u32, b1 as u32)];

				let mut pos: u32 = b2 + bl;
				m.seek(Start(pos as u64)).unwrap();

				for p in bins.iter() {
				m.write(p).unwrap();
				}

				for plen in plens.iter() {
				if frags.len() > 128 {
				kevincox [Unsubmitted, Not Done]: Please explain.
				swap(&mut b1, &mut b2);
				frags = vec![collect(&mut m, b1, &mut frags)];
				}

				let mut new: Vec<Fragment> = Vec::new();
				let end = pos + plen;
				let mut last = 0;

				while pos < end {
				m.seek(Start(pos as u64)).unwrap();

				let p1 = m.read_u32::<BigEndian>().unwrap();
				let p2 = m.read_u32::<BigEndian>().unwrap();
				let l = m.read_u32::<BigEndian>().unwrap();

				pull(&mut new, &mut frags, p1 - last);
				assert_ne!(frags.len(), 0);
				kevincox [Unsubmitted, Not Done]: `assert!(!frags.is_empty());`
				pull(&mut vec![], &mut frags, p2 - p1);

				new.push(Fragment::new(l, pos + 12));
				pos += l + 12;
				last = p2;
				}

				frags.extend(new.iter().rev().cloned());
				}

				let t = collect(&mut m, b2, &mut frags);

				m.seek(Start(t.frag_ofs as u64)).unwrap();

				let mut res: Vec<u8> = Vec::new();
				res.resize(t.frag_len as usize, 0);

				m.read_exact(&mut res[..]).unwrap();

				return res;
				}
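
On the reviewer's question of whether `mov` can work with `&mut [u8]` instead of a `Cursor`: a minimal sketch, assuming the whole working buffer is already in memory (as it is in `patches`). `move_bytes` is a hypothetical name. It relies on the standard library's `slice::copy_within`, which handles overlapping ranges and avoids the temporary `Vec`.

```rust
/// Copy `count` bytes from offset `src` to offset `dest` within one buffer.
/// Overlapping ranges are handled like `memmove`; out-of-range offsets panic.
pub fn move_bytes(buf: &mut [u8], dest: usize, src: usize, count: usize) {
    buf.copy_within(src..src + count, dest);
}
```

With this shape, `collect` could take `&mut [u8]` as well, and the seek calls around each move would disappear.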

rust/hgstorage/src/path_encoding.rs

This file was added.

				use std::path::{Path, PathBuf};

				const HEX_DIGIT: [char; 16] = [
				'0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a', 'b', 'c', 'd', 'e', 'f'
				];
				kevincox [Unsubmitted, Not Done]: `const HEX_DIGIT: [u8; 16] = *b"0123456789abcdef";`

				fn hex_encode(c: usize) -> (char, char) {
				kevincox [Unsubmitted, Not Done]: c should be a `u8`.
				(HEX_DIGIT[c >> 4], HEX_DIGIT[c & 15])
				}

				fn escape(buf: &mut Vec<char>, c: &char) {
				kevincox [Unsubmitted, Done]: `Vec<char>` is odd. Is there any reason not to use a `String` or `Vec<u8>`?
				kevincox [Unsubmitted, Done]: Don't pass a `char` by reference. Also it seems your function wants a `u8`.
				let c = *c as usize;
				assert!(c < 256);

				buf.push('~');
				let res = hex_encode(c);
				buf.push(res.0);
				buf.push(res.1);
				}

				fn encode_dir(p: &String) -> String {
				let mut ps: Vec<char> = p.chars().collect();
				kevincox [Unsubmitted, Done]: I don't think you need this.
				let len = ps.len();
				kevincox [Unsubmitted, Done]: This isn't necessary.

				if len >= 2 && ps[len - 2] == '.' && (ps[len - 1] == 'i' || ps[len - 1] == 'd') {
				kevincox [Unsubmitted, Done]: `p.ends_with(".i") || p.ends_with(".d")`
				ps.extend(".hg".chars());
				} else if len >= 3 && ps[len - 3] == '.' && ps[len - 2] == 'h' && ps[len - 1] == 'g' {
				ps.extend(".hg".chars());
				}

				return ps.into_iter().collect();
				}

				fn encode_fn(p: &String) -> String {
				kevincox [Unsubmitted, Not Done]: Take a `&str`.
				kevincox [Unsubmitted, Done]: `encode_file_name`?
				let mut ps: Vec<char> = Vec::new();
				kevincox [Unsubmitted, Done]: Use a String.

				for c in p.bytes() {
				match c {
				0...32 | 126...255 => escape(&mut ps, &char::from(c)),
				65...90 => {
				// A...Z | _
				ps.push('_');
				ps.push(char::from(c - 65 + 97));
				}
				95 => {
				ps.push('_');
				ps.push('_');
				}
				_ => {
				let c = char::from(c);
				match c {
				'\\' | ':' | '*' | '?' | '"' | '<' | '>' | '|' => escape(&mut ps, &c),
				_ => ps.push(c),
				}
				}
				}
				}
				kevincox [Unsubmitted, Done]:
				```
				fn escape(out: &mut String, b: char) {
				    unimplemented!()
				}

				pub fn encode_path(path: &str) -> String {
				    let mut out = String::with_capacity(path.len());
				    for c in path.bytes() {
				        let c = c as char;
				        match c {
				            'A'...'Z' => {
				                out.push('_');
				                out.push(c.to_ascii_lowercase());
				            }
				            '\\' | ':' | '*' | '?' | '"' | '<' | '>' | '|' => {
				                escape(&mut out, c);
				            }
				            // The rest of the printable range.
				            ' '...'~' => {
				                out.push(c);
				            }
				            _ => {
				                escape(&mut out, c);
				            }
				        }
				    }
				    out
				}
				```
				https://godbolt.org/g/3WCQs3

				return ps.into_iter().collect();
				}

				pub fn aux_encode(p: &String) -> String {
				kevincox [Unsubmitted, Not Done]: Take a `&str`.
				let mut ps: Vec<char> = Vec::new();

				let ch: Vec<char> = p.chars().collect();
				let len = ch.len();

				if ch[0] == '.' || ch[0] == ' ' {
				escape(&mut ps, &ch[0]);
				ps.extend(ch[1..].iter());
				} else {
				let dotpos = {
				let mut i = 0;
				loop {
				if i < ch.len() && ch[i] == '.' {
				break i as i32;
				} else if i >= ch.len() {
				break -1 as i32;
				}

				i += 1;
				}
				};

				let l = if dotpos == -1 {
				ch.len()
				} else {
				dotpos as usize
				};

				let mut is_aux = false;
				let mut cursor: usize;

				if l == 3 {
				let key: String = ch[..3].into_iter().collect();
				if key == "aux" || key == "con" || key == "prn" || key == "nul" {
				ps.extend(ch[..2].iter());
				escape(&mut ps, &ch[2]);
				is_aux = true;
				}
				} else if l == 4 {
				let key: String = ch[..3].into_iter().collect();
				if (key == "com" || key == "lpt") && '1' <= ch[3] && ch[3] <= '9' {
				ps.extend(ch[..2].iter());
				escape(&mut ps, &ch[2]);
				ps.push(ch[3]);
				is_aux = true;
				}
				}

				if !is_aux {
				ps.extend(ch[..l].iter());
				}

				cursor = l;

				if cursor < len - 1 {
				ps.extend(ch[cursor..(len - 1)].iter());
				cursor = len - 1;
				}

				if cursor == len - 1 {
				if ch[cursor] == '.' || ch[cursor] == ' ' {
				escape(&mut ps, &ch[cursor]);
				} else {
				ps.push(ch[cursor]);
				}
				}
				}

				return ps.into_iter().collect();
				}

				pub fn encode_path(p: &Path) -> PathBuf {
				let mut res = PathBuf::new();

				let leaves: Vec<String> = p.iter().map(|s| s.to_str().unwrap().to_string()).collect();

				let mut i = 0;

				assert_ne!(leaves.len(), 0);

				while i < leaves.len() - 1 {
				let leaf = &leaves[i];
				let leaf = encode_dir(leaf);
				let leaf = encode_fn(&leaf);
				let leaf = aux_encode(&leaf);

				res.push(leaf);
				i += 1;
				}

				let leaf = &leaves[leaves.len() - 1];
				let leaf = encode_fn(&leaf);
				let mut leaf = aux_encode(&leaf);
				leaf.push_str(".i");
				res.push(leaf);

				return res;
				}

				#[cfg(test)]
				mod test {
				use std::path::Path;
				use super::*;

				#[test]
				fn test_hgstorage_path_encoding() -> () {
				assert_eq!(
				"~2efoo/au~78.txt/txt.aux/co~6e/pr~6e/nu~6c/foo~2e",
				encode_path(&Path::new(".foo/aux.txt/txt.aux/con/prn/nul/foo."))
				.to_str()
				.unwrap()
				);

				assert_eq!(
				"foo.i.hg/bar.d.hg/bla.hg.hg/hi~3aworld~3f/_h_e_l_l_o",
				encode_path(&Path::new("foo.i/bar.d/bla.hg/hi:world?/HELLO"))
				.to_str()
				.unwrap()
				);

				assert_eq!(
				"~2ecom1com2/lp~749.lpt4.lpt1/conprn/com0/lpt0/foo~2e",
				encode_path(&Path::new(".com1com2/lpt9.lpt4.lpt1/conprn/com0/lpt0/foo."))
				.to_str()
				.unwrap()
				);
				}
				}
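
Several review comments on this file point the same way: work on `u8` and `String` rather than `char` and `Vec<char>`, and use `ends_with` for the directory suffix check. A minimal sketch of those two pieces follows. The names `HEX_DIGITS`, `escape_byte`, and the `&str` signature of `encode_dir` are illustrative assumptions, not the code that was eventually committed.

```rust
const HEX_DIGITS: [u8; 16] = *b"0123456789abcdef";

/// Append `~` followed by the two lowercase hex digits of `byte`.
pub fn escape_byte(out: &mut String, byte: u8) {
    out.push('~');
    out.push(HEX_DIGITS[(byte >> 4) as usize] as char);
    out.push(HEX_DIGITS[(byte & 0x0f) as usize] as char);
}

/// Append `.hg` to directory names that end in `.i`, `.d`, or `.hg`,
/// the same three suffixes the diff's `encode_dir` checks by hand.
pub fn encode_dir(segment: &str) -> String {
    let mut out = String::from(segment);
    if [".i", ".d", ".hg"].iter().any(|suffix| segment.ends_with(*suffix)) {
        out.push_str(".hg");
    }
    out
}
```

Taking a `u8` makes the `assert!(c < 256)` in the original `escape` unnecessary, since the type already guarantees the range.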

rust/hgstorage/src/repository.rs

This file was added.

				use dirstate::CurrentState;

				pub trait Repository {
				fn status(&self) -> CurrentState;
				}

rust/hgstorage/src/revlog.rs

This file was added.

				use std::path::Path;
				use std::cell::RefCell;
				use std::fmt;
				use std::sync::{Arc, RwLock};

				#[derive(Debug, Default, Clone, PartialEq, Eq, Hash)]
				pub struct NodeId {
				pub node: [u8; 20],
				}

				lazy_static! {
				pub static ref NULL_ID: NodeId = { NodeId::new([0_u8; 20]) };
				}

				impl NodeId {
				pub fn new(content: [u8; 20]) -> Self {
				assert_eq!(content.len(), 20);
				return Self { node: content };
				}

				pub fn new_from_bytes(bytes: &[u8]) -> Self {
				assert_eq!(bytes.len(), 20);

				let mut content = [0_u8; 20];

				content.copy_from_slice(bytes);

				return Self { node: content };
				}

				pub fn null_id() -> Self {
				NULL_ID.clone()
				}

				pub fn is_valid(&self) -> bool {
				self.node.len() <= 20
				}

				pub fn hex_len() -> usize {
				40
				}
				pub fn bin_len() -> usize {
				20
				}
				}

				impl fmt::Display for NodeId {
				fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
				for &byte in self.node.iter() {
				write!(f, "{:02X}", byte).expect("Unable to write");
				}
				return Ok(());
				}
				}

				#[derive(Debug)]
				pub enum NodeLookupError {
				KeyNotFound,
				AmbiguousKeys,
				}

				pub trait RevEntry {
				fn p1(&self) -> u32;
				fn p2(&self) -> u32;
				fn node_id(&self) -> NodeId;
				}

				pub trait Revlog: Send + Sync {
				fn node_id_to_rev(&self, id: &NodeId) -> Result<i32, NodeLookupError>;
				fn rev(&self, id: &i32) -> Result<&RefCell<RevEntry>, NodeLookupError>;
				fn delta_chain(&self, id: &i32) -> Result<Vec<usize>, NodeLookupError>;
				fn revision(&self, id: &i32) -> Result<Vec<u8>, NodeLookupError>;
				fn tip(&self) -> usize;
				fn check_hash(&self, text: &[u8], p1: &NodeId, p2: &NodeId) -> NodeId;
				fn p1(&self, rev: &i32) -> i32;
				fn p1_nodeid(&self, rev: &i32) -> NodeId;
				fn p2(&self, rev: &i32) -> i32;
				fn p2_nodeid(&self, rev: &i32) -> NodeId;
				fn node_id(&self, rev: &i32) -> NodeId;

				fn create(&self, index_file: &Path) -> Arc<RwLock<Revlog>>;
				}
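
`NodeId` fixes its size in the type (`[u8; 20]`), so the length checks in `new` and `is_valid` can never fail. The sketch below shows one way to keep the two lengths as constants and to round-trip a node through hex without the `hex` crate. `from_hex`, the tuple-struct layout, and the lowercase output are assumptions made for illustration (the diff's `Display` prints uppercase).

```rust
use std::fmt;

pub const NODE_BIN_LEN: usize = 20;
pub const NODE_HEX_LEN: usize = 2 * NODE_BIN_LEN;

#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct NodeId([u8; NODE_BIN_LEN]);

impl NodeId {
    /// Parse exactly 40 hex digits; anything else yields `None`.
    pub fn from_hex(hex: &str) -> Option<Self> {
        if hex.len() != NODE_HEX_LEN || !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
            return None;
        }
        let mut node = [0_u8; NODE_BIN_LEN];
        for (i, byte) in node.iter_mut().enumerate() {
            *byte = u8::from_str_radix(&hex[2 * i..2 * i + 2], 16).ok()?;
        }
        Some(NodeId(node))
    }
}

impl fmt::Display for NodeId {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        for byte in self.0.iter() {
            // Propagate formatter errors with `?` instead of panicking.
            write!(f, "{:02x}", byte)?;
        }
        Ok(())
    }
}
```

Propagating the `write!` result is the conventional shape for a `Display` implementation, in place of the `.expect("Unable to write")` used in the diff.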

rust/hgstorage/src/revlog_v1.rs

This file was added.

				use std::path::{Path, PathBuf};
				use std::io;
				use std::io::{BufReader, Read, Seek, SeekFrom};
				use std::fs;
				use std::cell::RefCell;
				use std::sync::{Arc, RwLock};
				use std::collections::HashMap as Map;

				use byteorder::{BigEndian, ReadBytesExt};
				use flate2::read::ZlibDecoder;
				use sha1::Sha1 as Sha;

				use revlog::*;
				use revlog::NodeLookupError;
				use revlog::NodeLookupError::*;
				use mpatch::{patches, Patch};

				pub const FLAG_INLINE_DATA: u32 = (1 << 16) as u32;
				pub const FLAG_GENERALDELTA: u32 = (1 << 17) as u32;

				#[derive(Debug, Default)]
				pub struct Entry {
				offset_w_flags: u64,
				len_compressed: u32,
				len_uncompressesd: u32,
				base_rev: u32,
				link_rev: u32,
				parent1_rev: u32,
				parent2_rev: u32,
				nodeid: [u8; 20],
				padding: [u8; 12],
				}

				pub fn read_entry<R: Read + ?Sized + ReadBytesExt>(rdr: &mut R) -> io::Result<Entry> {
				let offset_w_flags = rdr.read_u64::<BigEndian>()?;
				let len_compressed = rdr.read_u32::<BigEndian>()?;
				let len_uncompressesd = rdr.read_u32::<BigEndian>()?;
				let base_rev = rdr.read_u32::<BigEndian>()?;
				let link_rev = rdr.read_u32::<BigEndian>()?;
				let parent1_rev = rdr.read_u32::<BigEndian>()?;
				let parent2_rev = rdr.read_u32::<BigEndian>()?;
				let mut nodeid = [0_u8; 20];
				rdr.read_exact(&mut nodeid)?;
				let mut padding = [0_u8; 12];
				rdr.read_exact(&mut padding)?;

				Ok(Entry {
				offset_w_flags,
				len_compressed,
				len_uncompressesd,
				base_rev,
				link_rev,
				parent1_rev,
				parent2_rev,
				nodeid,
				padding,
				})
				}

				impl Entry {
				pub fn packed_size() -> u32 {
				8 + 4 * 6 + 20 + 12
				}

				fn offset(&self) -> u64 {
				self.offset_w_flags >> 16
				}
				}

				pub enum DFileFlag {
				Inline,
				Separated(PathBuf),
				}

				use self::DFileFlag::*;

				pub enum CachedEntry {
				Offset(u32),
				Cached(Box<Entry>),
				}

				use self::CachedEntry::*;

				impl RevEntry for CachedEntry {
				fn p1(&self) -> u32 {
				return match self {
				&Offset(_) => panic!("this rev should have been cached."),
				&Cached(ref ent) => ent.parent1_rev,
				};
				}

				fn p2(&self) -> u32 {
				return match self {
				&Offset(_) => panic!("this rev should have been cached."),
				&Cached(ref ent) => ent.parent2_rev,
				};
				}

				fn node_id(&self) -> NodeId {
				return match self {
				&Offset(_) => panic!("this rev should have been cached."),
				&Cached(ref ent) => NodeId::new(ent.nodeid.clone()),
				};
				}
				}

				pub struct RevlogIO {
				index_file: PathBuf,
				dflag: DFileFlag,
				other_flags: u32,
				node2rev: Map<NodeId, i32>,
				revs: Vec<RefCell<CachedEntry>>,
				}

				unsafe impl Send for RevlogIO {}
				unsafe impl Sync for RevlogIO {}

				/// currently assumes RevlogNG v1 format, with parent-delta, without inline
				impl RevlogIO {
				pub fn get_factory() -> Self {
				Self {
				index_file: PathBuf::new(),
				dflag: DFileFlag::Inline,
				other_flags: 0,
				node2rev: Map::new(),
				revs: Vec::<RefCell<CachedEntry>>::new(),
				}
				}

				pub fn new(index_file: &Path) -> Self {
				println!("create revlog for {:?}", index_file);
				if !index_file.exists() {
				panic!("index file must exist: {:?}", index_file);
				}

				let mut f = BufReader::new(fs::File::open(index_file).unwrap());
				let flag = f.read_u32::<BigEndian>().unwrap();
				f.seek(SeekFrom::Start(0)).unwrap();

				let dflag = if (flag & FLAG_INLINE_DATA) > 0 {
				Inline
				} else {
				let data_file = index_file.with_extension("d");
				assert!(data_file.exists());
				Separated(data_file)
				};

				let other_flags = flag;

				let max_cap = (index_file.metadata().unwrap().len() as u32) / Entry::packed_size();

				let mut node2rev = Map::new();
				let mut revs = Vec::with_capacity(max_cap as usize);

				loop {
				match read_entry(&mut f) {
				Ok(hd) => {
				let tip = revs.len() as i32;

				let id = NodeId::new(hd.nodeid.clone());

				node2rev.insert(id, tip);
				revs.push(RefCell::new(Cached(Box::new(hd))));
				}
				Err(_) => break,
				}
				}

				return Self {
				index_file: index_file.to_path_buf(),
				dflag,
				other_flags,
				node2rev,
				revs,
				};
				}

				/// outer functions, which serve as interface, should check argument validity,
				/// the internal calls use execute without question and thus panic on errors
				fn make_sure_cached(&self, r: &usize) {
				let r = *r;

				let ofs_opt = match *self.revs[r].borrow() {
				Offset(ofs) => Some(ofs),
				_ => None,
				};

				if let Some(ofs) = ofs_opt {
				let mut f = fs::File::open(&self.index_file).unwrap();
				f.seek(SeekFrom::Start(ofs as u64)).unwrap();

				let ent: Entry = read_entry(&mut f).unwrap();
				self.revs[r].replace(Cached(Box::new(ent)));
				}
				}

				fn prepare_id(&self, r: &i32) -> Result<usize, NodeLookupError> {
				let len = self.revs.len() as i32;

				if *r < -len || *r >= len {
				return Err(KeyNotFound);
				}

				let r = if *r < 0 {
				(len + *r) as usize
				} else {
				*r as usize
				};

				self.make_sure_cached(&r);

				return Ok(r);
				}

				fn is_general_delta(&self) -> bool {
				(self.other_flags & FLAG_GENERALDELTA) > 0
				}

				fn get_content(&self, f: &mut io::BufReader<fs::File>, r: &usize) -> Option<Vec<u8>> {
				let r = *r;

				self.make_sure_cached(&r);

				if let Cached(ref hd) = *self.revs[r].borrow() {
				let req_len = hd.len_compressed as usize;

				if req_len == 0 {
				return None;
				}

				let mut buf: Vec<u8> = Vec::new();
				buf.resize(req_len, 0);

				let ofs: u64 = if r == 0 {
				0_u64 + Entry::packed_size() as u64
				} else {
				match self.dflag {
				Inline => hd.offset() + (r * (Entry::packed_size() as usize)) as u64,
				Separated(_) => hd.offset(),
				}
				};

				f.seek(SeekFrom::Start(ofs)).unwrap();

				// read_exact: a plain read() may return fewer bytes than requested
				f.read_exact(&mut buf[..]).unwrap();

				let flag_byte = buf[0];

				match flag_byte {
				120 => {
				let mut dec = ZlibDecoder::new(&buf[..]);

				let mut out_buf: Vec<u8> = Vec::new();
				//out_buf.resize(hd.len_compressed as usize, 0);
				dec.read_to_end(&mut out_buf).unwrap();

				return Some(out_buf);
				}
				0 => {
				return Some(buf);
				}
				115 => {
				return Some(buf[1..].to_vec());
				}
				_ => {
				return None;
				}
				}
				} else {
				panic!("read delta content failed.");
				}
				}

				fn get_all_bins(&self, rs: &[usize]) -> Patch {
				assert_ne!(rs.len(), 0);

				let mut fhandle = BufReader::new(match self.dflag {
				Inline => fs::File::open(self.index_file.as_path()).unwrap(),
				indygreg (Unsubmitted, Not Done): IIRC, core Mercurial keeps an open file handle on revlogs and ensures we don't run out of file handles by not keeping too many revlogs open at the same time. For scanning operations, not having to open and close the file handles all the time will make a difference for performance. Also, core Mercurial loads the entirety of the `.i` file into memory. That's a scaling problem for large revlogs. But it does make performance of index lookups really fast.
				Ivzhh (Author, Unsubmitted, Not Done): I think that explains why the Rust version is significantly faster on the Mercurial repo. I am working on a CPU future, but I have not finalized the design yet. I will keep working on that.
				Separated(ref dfile) => fs::File::open(dfile).unwrap(),
				});

				let mut it = rs.iter().rev();

				let base_r = it.next().unwrap();
				let base = self.get_content(&mut fhandle, base_r).unwrap();

				let mut bins: Vec<Vec<u8>> = Vec::with_capacity(rs.len() - 1);

				while let Some(ref chld_r) = it.next() {
				if let Some(bin) = self.get_content(&mut fhandle, chld_r) {
				bins.push(bin);
				} else {
				indygreg (Unsubmitted, Not Done): A thread pool to help with zlib decompression should go a long way here. Probably too early to think about this, but we'll likely eventually want a global thread pool for doing I/O and CPU expensive tasks, such as reading chunks from a revlog and decompressing them. FWIW, we're going to radically alter the storage format in order to better support shallow clones. But that work hasn't started yet. I still think there is a benefit to implementing the revlog code in Rust though.
				Ivzhh (Author, Unsubmitted, Not Done): I guess I did this because I ran into some empty change deltas in the beginning. I think I won't try to parallelize decompression for now.
				}
				}

				return Patch { base, bins };
				}
				}

				impl Revlog for RevlogIO {
				fn node_id_to_rev(&self, id: &NodeId) -> Result<i32, NodeLookupError> {
				self.node2rev.get(id).cloned().ok_or(KeyNotFound)
				}

				fn rev(&self, r: &i32) -> Result<&RefCell<RevEntry>, NodeLookupError> {
				let r = self.prepare_id(r)?;

				Ok(&self.revs[r])
				}

				fn delta_chain(&self, r: &i32) -> Result<Vec<usize>, NodeLookupError> {
				let mut r = self.prepare_id(r)?;

				let mut res = vec![r];

				loop {
				if let Cached(ref rev) = *self.revs[r].borrow() {
				if (r == (rev.base_rev as usize)) || (r == 0) {
				break;
				}

				r = if self.is_general_delta() {
				rev.base_rev as usize
				} else {
				r - 1
				};
				res.push(r);

				self.make_sure_cached(&r);
				} else {
				panic!("the rev must have been cached.");
				}
				}

				return Ok(res);
				}

				fn revision(&self, id: &i32) -> Result<Vec<u8>, NodeLookupError> {
				if let Ok(chn) = self.delta_chain(id) {
				let ptc = self.get_all_bins(&chn);
				let res = patches(&ptc);
				return Ok(res);
				} else {
				return Err(KeyNotFound);
				}
				}

				fn tip(&self) -> usize {
				return self.revs.len() - 1;
				}

				fn check_hash(&self, text: &[u8], p1: &NodeId, p2: &NodeId) -> NodeId {
				let mut s = Sha::new();

				if p2.node == NULL_ID.node {
				s.update(&NULL_ID.node);
				s.update(&p1.node);
				} else {
				let (a, b) = if p1.node < p2.node {
				(p1, p2)
				} else {
				(p2, p1)
				};
				s.update(&a.node);
				s.update(&b.node);
				}
				s.update(text);
				return NodeId::new(s.digest().bytes());
				}

				fn p1(&self, rev: &i32) -> i32 {
				return self.rev(rev).unwrap().borrow().p1() as i32;
				}

				fn p1_nodeid(&self, rev: &i32) -> NodeId {
				let prev = self.rev(rev).unwrap().borrow().p1() as i32;

				if prev == -1 {
				return NULL_ID.clone();
				} else {
				return self.rev(&prev).unwrap().borrow().node_id().clone();
				}
				}

				fn p2(&self, rev: &i32) -> i32 {
				return self.rev(rev).unwrap().borrow().p2() as i32;
				}

				fn p2_nodeid(&self, rev: &i32) -> NodeId {
				let prev = self.rev(rev).unwrap().borrow().p2() as i32;

				if prev == -1 {
				return NULL_ID.clone();
				} else {
				return self.rev(&prev).unwrap().borrow().node_id().clone();
				}
				}

				fn node_id(&self, rev: &i32) -> NodeId {
				if *rev == -1 {
				return NULL_ID.clone();
				} else {
				return self.rev(rev).unwrap().borrow().node_id().clone();
				}
				}

				fn create(&self, index_file: &Path) -> Arc<RwLock<Revlog>> {
				return Arc::new(RwLock::new(RevlogIO::new(index_file)));
				}
				}
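
The delta-chain walk in `delta_chain` above can be read in isolation as the following minimal sketch. It is not the code under review: `IndexEntry` is a hypothetical stand-in that keeps only `base_rev`, the caller passes the general-delta flag explicitly, and the `next >= r` guard against a malformed index is an addition that the patch does not have.

```rust
/// Hypothetical, simplified index entry: only the field the chain walk needs.
struct IndexEntry {
    /// Revision this entry's delta applies against (general delta) or the
    /// chain's snapshot revision (classic layout). A full snapshot stores
    /// its own revision number here.
    base_rev: usize,
}

/// Walk from `rev` back to the nearest full snapshot.
/// Returns the revisions newest first, ending with the snapshot,
/// or `None` if `rev` is out of range or the index is malformed.
fn delta_chain(entries: &[IndexEntry], rev: usize, general_delta: bool) -> Option<Vec<usize>> {
    if rev >= entries.len() {
        return None;
    }

    let mut chain = vec![rev];
    let mut r = rev;

    // Stop at revision 0 or at an entry that is its own base (a snapshot).
    while r != 0 && entries[r].base_rev != r {
        let next = if general_delta {
            // General delta: the delta applies against an arbitrary earlier rev.
            entries[r].base_rev
        } else {
            // Classic layout: the delta applies against the previous rev.
            r - 1
        };

        // Assumed safety check: a base must be strictly earlier,
        // otherwise the walk would never terminate.
        if next >= r {
            return None;
        }

        r = next;
        chain.push(r);
    }

    Some(chain)
}
```

As in `get_all_bins`, a caller would iterate the returned chain in reverse, so that the snapshot is read first and the deltas are applied oldest to newest.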

rust/hgstorage/src/working_context.rs

This file was added.

				use std::path::PathBuf;
				use std::io::prelude::*;
				use std::fs;
				use std::collections::HashMap;
				use std::collections::HashSet as Set;
				use std::sync::{Arc, Mutex, RwLock};

				use threadpool::ThreadPool;
				use num_cpus;

				use dirstate::{CurrentState, DirState};
				use local_repo::LocalRepo;
				use manifest::{FlatManifest, ManifestEntry};
				use changelog::ChangeLog;

				pub struct WorkCtx {
				pub dirstate: Arc<RwLock<DirState>>,
				pub file_revs: HashMap<PathBuf, ManifestEntry>,
				}

				impl WorkCtx {
				pub fn new(
				dot_hg_path: Arc<PathBuf>,
				manifest: Arc<FlatManifest>,
				changelog: Arc<ChangeLog>,
				) -> Self {
				let dirstate = DirState::new(dot_hg_path.join("dirstate"));

				let manifest_id = changelog.get_commit_info(&dirstate.p1);

				let rev = manifest
				.inner
				.read()
				.unwrap()
				.node_id_to_rev(&manifest_id.manifest_id)
				.unwrap();

				let file_revs = manifest.build_file_rev_mapping(&rev);

				let dirstate = Arc::new(RwLock::new(dirstate));

				Self {
				dirstate,
				file_revs,
				}
				}

				pub fn status(&self, repo: &LocalRepo) -> CurrentState {
				let mut state = self.dirstate
				.write()
				.unwrap()
				.walk_dir(repo.repo_root.as_path(), &repo.matcher);

				if !state.lookup.is_empty() {
				let ncpus = num_cpus::get();

				let nworkers = if state.lookup.len() < ncpus {
				state.lookup.len()
				} else {
				ncpus
				};

				let pool = ThreadPool::new(nworkers);

				let clean = Arc::new(Mutex::new(Set::new()));
				let modified = Arc::new(Mutex::new(Set::new()));

				for f in state.lookup.drain() {
				let rl = repo.get_filelog(f.as_path());
				let fl = Arc::new(repo.repo_root.join(f.as_path()));

				let (id, p1, p2) = {
				let id = &self.file_revs[f.as_path()].id;
				let gd = rl.read().unwrap();
				let rev = gd.node_id_to_rev(id).unwrap();

				let p1 = gd.p1_nodeid(&rev);
				let p2 = gd.p2_nodeid(&rev);
				(id.clone(), p1, p2)
				};

				let clean = clean.clone();
				let modified = modified.clone();

				pool.execute(move || {
				let mut wfile = fs::File::open(fl.as_path()).unwrap();
				let mut content = Vec::<u8>::new();
				wfile.read_to_end(&mut content).unwrap();
				if rl.read().unwrap().check_hash(&content, &p1, &p2) == id {
				clean.lock().unwrap().insert(f);
				} else {
				modified.lock().unwrap().insert(f);
				}
				});
				}

				pool.join();
				assert_eq!(pool.panic_count(), 0);

				let mut gd = modified.lock().unwrap();
				state.modified.extend(gd.drain());
				let mut gd = clean.lock().unwrap();
				state.clean.extend(gd.drain());
				}

				return state;
				}
				}
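
The fan-out in `status` above (check each `lookup` candidate on a worker, then collect the results into `clean` and `modified`) can be sketched with the standard library alone. This is an illustration under stated assumptions, not the patch's approach: it swaps the `threadpool` crate for `std::thread::scope` (Rust 1.63 or later), and the hypothetical `is_clean` callback stands in for reading the working file and comparing its hash against the filelog entry.

```rust
use std::collections::HashSet;
use std::sync::Mutex;
use std::thread;

/// Split `candidates` into (clean, modified) using `is_clean`,
/// spreading the checks over at most `max_workers` threads.
fn classify<F>(
    candidates: Vec<String>,
    max_workers: usize,
    is_clean: F,
) -> (HashSet<String>, HashSet<String>)
where
    F: Fn(&str) -> bool + Sync,
{
    // Nothing to check: avoid spawning threads (and a zero chunk size).
    if candidates.is_empty() {
        return (HashSet::new(), HashSet::new());
    }

    // Never use more workers than there are candidates, as in `status`.
    let nworkers = max_workers.max(1).min(candidates.len());
    let chunk_len = (candidates.len() + nworkers - 1) / nworkers;

    let clean = Mutex::new(HashSet::new());
    let modified = Mutex::new(HashSet::new());

    // Scoped threads may borrow from this stack frame, so no Arc is needed.
    // If a worker panics, `scope` panics after joining the remaining threads.
    thread::scope(|s| {
        for chunk in candidates.chunks(chunk_len) {
            let (clean, modified, is_clean) = (&clean, &modified, &is_clean);
            s.spawn(move || {
                for name in chunk {
                    let target = if is_clean(name.as_str()) { clean } else { modified };
                    target.lock().unwrap().insert(name.clone());
                }
            });
        }
    });

    (clean.into_inner().unwrap(), modified.into_inner().unwrap())
}
```

Each worker takes a contiguous chunk rather than one file per task, which keeps lock traffic the same but avoids per-file task scheduling; whether that matters for real repositories has not been measured here.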