Details

Reviewers

sid0
jsgf
kulshrax
mbthomas
ryanmce

Group Reviewers

Restricted Project

Commits

rFBHGXc0492b73c7ef: vlqencoding: encodes integers to variable-length byte arrays

Summary

This is a common technique to store variable-length integers efficiently.
It's compatible with both Thrift and Protobuf [1].

It's intended to be used in:

On-disk file format to make the file compact and avoid issues like https://bz.mercurial-scm.org/5681 (Obsolete markers code crashes with metadata keys/values longer than 255 bytes).
Thrift layer.

[1]: https://developers.google.com/protocol-buffers/docs/encoding#varints
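
As a rough illustration of the varint scheme referenced in [1], the sketch below encodes a `u64` into base-128 groups, least significant group first, with the high bit of each byte marking that more bytes follow. The function name `encode_varint` is hypothetical and is not the interface proposed in this revision.

```rust
/// Encode `value` as a base-128 varint (illustrative helper, not the crate API).
/// Each byte carries 7 payload bits; the high bit is set on every byte
/// except the last one.
fn encode_varint(mut value: u64) -> Vec<u8> {
    let mut out = Vec::new();
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            // Last group: high bit stays clear.
            out.push(byte);
            return out;
        }
        // More groups follow: set the continuation bit.
        out.push(byte | 0x80);
    }
}
```

Under this scheme 300 becomes the two bytes `[0xac, 0x02]`, while values below 128 still take a single byte, so a length field is no longer capped at 255.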

Test Plan

cargo test
cargo clippy

Also ran a kcov coverage check and it says 100%.

cargo rustc --lib --profile test -- -Ccodegen-units=1 -Clink-dead-code -Zno-landing-pads
kcov --include-path $PWD/src --verify target/kcov ./target/debug/*-????????????????

Diff Detail

Repository

rFBHGX Facebook Mercurial Extensions

Lint

Automatic diff as part of commit; lint not applicable.

Unit

Automatic diff as part of commit; unit tests not applicable.

Event Timeline

quark created this revision. Oct 3 2017, 9:41 PM

Herald added a reviewer: Restricted Project. Oct 3 2017, 9:41 PM

quark updated this revision to Diff 2396. Oct 3 2017, 9:49 PM

quark added reviewers: sid0, jsgf, kulshrax. Oct 3 2017, 9:55 PM

ryanmce added a subscriber: ryanmce. Oct 4 2017, 6:34 AM

ryanmce added inline comments.

rust/vlq/src/lib.rs
16 ↗	(On Diff #2396)	I don't know too much about rust, but it seems like it would be great to have a type associated with the `Vec<u8>` to make it safer and more self-explanatory to use.
34–51 ↗	(On Diff #2396)	nit: why `///` rather than `/**` style comments?

quark added inline comments. Oct 4 2017, 12:05 PM

rust/vlq/src/lib.rs
34–51 ↗	(On Diff #2396)	Rust documentation style is different. `///` is what Rust stdlib uses. By having an `example` section, the code could be doctest-ed by running `cargo test`.

quark added inline comments. Oct 4 2017, 12:23 PM

rust/vlq/src/lib.rs
16 ↗	(On Diff #2396)	I imagine the caller would end up with `x::ByteArray`, `y::ByteArray`, `vlq::ByteArray`. It seems redundant to have that many types. I'm not sure about the best practice here. Question for more experienced Rust reviewers.
52 ↗	(On Diff #2396)	Maybe it's better to use `Result<u64>` here to detect the error case. But I'm not sure how much runtime overhead that will introduce.

I don't mean `vlq::ByteArray`, but `vlq::Int` or something, which happens to be a byte array under the covers. But I also don't know what the "oxidized" way to do it is.

You should consider adding method overrides for Write and Read in the same way as the byteorder crate does. See https://docs.rs/byteorder/1.1.0/byteorder/trait.WriteBytesExt.html for documentation and example. What I'd like to be able to do is:

filehandle.write_vlq(12345)?;

and

let n = filehandle.read_vlq()?;

This is actually a really useful concept. Does something like this already exist as a crate? If not, can we publish this as a crate to use in other Rust code?

rust/vlq/src/lib.rs
16 ↗	(On Diff #2396)	`Vec<u8>` is kind of special in that it's considered the de-facto Rust variable-length buffer type. For example, `std::io::Read` contains a method to read directly into a `Vec<u8>`: std::io::Read::read_to_end.
52 ↗	(On Diff #2396)	`Result` type is cheap, so you should use it if you can (from what I understand, the overhead is about the same as the cost of passing back an additional boolean return value, and having the caller check it). If the caller is reading from a file, they will be using `io::Result` all over the place anyway. Use the `error-chain` crate to define errors easily.
52 ↗	(On Diff #2396)	`decode` should take a slice (`&[u8]`) so that it can decode from within any buffer. `Vec<u8>` can decay automatically to `&[u8]` depending on context anyway.

This revision now requires changes to proceed. Oct 5 2017, 11:20 AM

mbthomas added inline comments. Oct 5 2017, 1:38 PM

rust/vlq/src/lib.rs
52 ↗	(On Diff #2396)	Actually, I don't like the signature of this function at all. You should prefer tuple returns to mutable in/out parameters. The latter aren't very Rustish. Something like: fn decode(buf: &[u8]) -> (u64, usize);
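
A minimal sketch of the tuple-returning shape suggested here, assuming the second element is the number of bytes consumed. Wrapping the tuple in `Option` to signal truncated or overflowing input is an assumption of this sketch; the suggested signature has no error case.

```rust
/// Decode one VLQ value from the front of `buf` (illustrative sketch).
/// Returns the value and the number of bytes consumed, or `None` when the
/// input ends early or the value does not fit in a `u64`.
fn decode(buf: &[u8]) -> Option<(u64, usize)> {
    let mut value: u64 = 0;
    for (i, &byte) in buf.iter().enumerate() {
        let shift = 7 * i as u32;
        let part = u64::from(byte & 0x7f);
        // Reject groups whose bits would be shifted out of the 64-bit value.
        if shift >= 64 || (part << shift) >> shift != part {
            return None;
        }
        value |= part << shift;
        if byte & 0x80 == 0 {
            // High bit clear: this was the last byte of the value.
            return Some((value, i + 1));
        }
    }
    // Ran out of input before seeing a terminating byte.
    None
}
```

The caller advances its own offset by the returned length, so no mutable in/out position parameter is needed.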

In D929#15942, @mbthomas wrote:

You should consider adding method overrides for Write and Read in the same way as the byteorder crate does. See https://docs.rs/byteorder/1.1.0/byteorder/trait.WriteBytesExt.html for documentation and example. What I'd like to be able to do is:

Sorry for the late response. I was distracted by oncall stuff.

My original idea was to read (or mmap) a flat file into a Vec<u8> and have vlq operate on that memory buffer, similar to what Mercurial does with revlog.i and what linelog does with linelog.l. So there is no file handle involved. I think it's generally more flexible to decouple computation logic from I/O.

I noticed that Vec<u8> does not implement WriteBytesExt. We can implement a wrapper around Vec<u8> that implements WriteBytesExt, but that might make things unnecessarily complex.

@jsgf: What do you think?

fn decode(buf: &[u8]) -> (u64, usize);

This is a good point. I forgot that Rust could return a tuple and was a bit concerned about the slicing overhead. I'll make the change.

jsgf requested changes to this revision. Oct 10 2017, 5:48 PM

jsgf added inline comments.

rust/vlq/src/lib.rs
16 ↗	(On Diff #2396)	The trouble with returning `Vec<u8>` is that it requires a new `Vec` to be allocated for every `u64`, which is pretty expensive - and the chances are the caller won't be able to use it directly in that form. It would be better to use the `Write` trait which generalizes over a stream of output - as @mbthomas mentioned, `Vec<u8>` implements this (and `&[u8]` implements `Read`) so it's still easy for callers. The only downside is that it can fail with `io::Result` so you need to propagate that up, even if a write to - say - `Vec<u8>` can never actually fail. For example:

    use std::io::{self, Write};

    pub fn encode<W: Write>(value: u64, out: &mut W) -> io::Result<()> { ... }
52 ↗	(On Diff #2396)	A more common signature for this would be:

    pub fn decode(buf: &[u8]) -> Result<(u64, &[u8]), E> { ... }

i.e., a successful result returns the value and a slice representing the remaining input. It would also be possible to use `std::io::Cursor`, but that's probably overkill here. The error type `E` would need to represent at least too-short input (we ran out of input before seeing a byte with the high bit clear) and overflow (the number overflowed before we saw that terminating byte).
55–62 ↗	(On Diff #2396)	This is fairly inefficient because `buf[pos]` will do an array bounds check for each byte. I'd do something like:

    use std::cmp;

    const MAX: usize = 10;
    struct Bad;

    pub fn decode(input: &[u8]) -> Result<(u64, &[u8]), Bad> {
        let mut res = 0;
        let buf = &input[..cmp::min(MAX, input.len())];
        for (idx, x) in buf.iter().enumerate() {
            res |= ((x & 127) as u64) << (idx * 7);
            if x & 128 == 0 {
                return Ok((res, &input[idx + 1..]));
            }
        }
        Err(Bad)
    }

where `MAX` is the max possible encoding length of u64. This doesn't distinguish the short input from overflow. (There's probably something fancier using more combinators, but this will be fine.)
72–75 ↗	(On Diff #2396)	I would use quickcheck to generate these kinds of round-trip test cases (perhaps in addition to the hand-picked ones).

Getting more fancy, you could define VLQEncode/VLQDecode traits, then implement them for u64 and i64 (with zigzag), and perhaps other integer types if it makes sense/is useful.

Thanks for the very detailed explanation!

I made a mistake thinking WriteBytesExt was in stdlib and only checked Vec from stdlib. I'll revise the interface.

quark updated this revision to Diff 2574. Oct 11 2017, 2:31 AM

quark updated this revision to Diff 2575. Oct 11 2017, 2:36 AM

quark updated this revision to Diff 2576. Oct 11 2017, 3:22 AM

I ran some benchmarks encoding 0..100000 into a Vec::with_capacity(283488). Returning a Vec per value is indeed very slow, so the increased complexity is worthwhile!

return_vec	3,252,425 ns/iter (+/- 518,303)
return_slice	1,414,582 ns/iter (+/- 377,286)
write_u64_buffered	946,138 ns/iter (+/- 213,011)
write_u64_unbuffered	902,002 ns/iter (+/- 92,514)
write_num_traits_buffered	1,009,901 ns/iter (+/- 183,366)
write_num_traits_unbuffered (current version)	907,168 ns/iter (+/- 58,427)

I like the Cursor idea since that makes the return value simpler and more intuitive.

num-traits seems to provide some flexibility (e.g. it might be easier to support BigInt in the future) at the cost of a slight overhead. Or maybe we can just use u64, which also sounds reasonable since our main target platform is amd64.

I didn't put zigzag support in this version. If we continue using num-traits, zigzag may be implemented as a separate trait:

trait ZigZagInt {
   type Output;
   fn to_zigzag(&self) -> Self::Output;
}

And x.to_zigzag().write_vlq(...) may be neat enough. Or we can do something like impl<T: PrimInt + ZigZagInt> VLQEncode for T to merge the two features.

If we do not use num-traits, we can just implement VLQ encoder/decoder for u64 and i64 and cast everything to one of them.
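
To make the `ZigZagInt` idea concrete, here is a small sketch that implements such a trait for `i64` and `i8` by hand. The choice of types and the plain shift-and-xor formula are assumptions for illustration; nothing here uses num-traits.

```rust
/// Hypothetical trait mapping a signed integer to its zig-zag unsigned form.
trait ZigZagInt {
    type Output;
    fn to_zigzag(&self) -> Self::Output;
}

impl ZigZagInt for i64 {
    type Output = u64;
    fn to_zigzag(&self) -> u64 {
        // The arithmetic shift yields all ones for negatives, zero otherwise.
        ((*self << 1) ^ (*self >> 63)) as u64
    }
}

impl ZigZagInt for i8 {
    type Output = u8;
    fn to_zigzag(&self) -> u8 {
        ((*self << 1) ^ (*self >> 7)) as u8
    }
}
```

Small magnitudes of either sign map to small unsigned values (0, -1, 1, -2 become 0, 1, 2, 3), which keeps their VLQ form short.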

rust/vlq/src/lib.rs
63 ↗	(On Diff #2576)	I'm not sure if this is a good idea or not. It seems with `impl<T: TraitX>`, `impl<T: TraitY>` becomes impossible, because they may overlap. But maybe that can be solved in a newer rust.
74 ↗	(On Diff #2576)	I benched multiple `write`s vs a single `write` with a pre-allocated buffer on the stack. To my surprise, the former is a bit faster:

    test vlq::tests::bench_write_vlq_num_traits ... bench: 1,085,880 ns/iter (+/- 92,873)
    test vlq::tests::bench_write_vlq_num_traits_multi_writes ... bench: 929,348 ns/iter (+/- 106,070)

So I'll keep the multi-write version. The pre-allocated version has difficulty figuring out the buffer size. Ideally it's `(8 * mem::size_of::<T>() + 6) / 7`, but Rust disallows `T` in a constant expression and associated constants are still experimental.
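
The buffer-size formula above is just the bit width divided by 7, rounded up. A small sketch, using a hypothetical helper `max_vlq_len` evaluated at run time rather than as an array length:

```rust
use std::mem::size_of;

/// Upper bound on the number of VLQ bytes needed for any value of `T`
/// (illustrative helper): ceil(bit width / 7).
fn max_vlq_len<T>() -> usize {
    (8 * size_of::<T>() + 6) / 7
}
```

For the fixed-width unsigned types this gives 2 bytes for `u8`, 3 for `u16`, 5 for `u32`, and 10 for `u64`.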

This looks mostly ok to me, but I think @jsgf should weigh in again.

I'd also like to reiterate that I think this has more wide appeal than just in mercurial and we should consider sharing more widely. Unfortunately there is already a crate on crates.io called vlq (similar thing but with a different encoding).

rust/vlq/src/lib.rs
63 ↗	(On Diff #2576)	It seems a bit odd to have `VLQEncode` on the integer type rather than the `Write` type.
106–108 ↗	(On Diff #2576)	This pattern can be more succinctly written as: base = base.checked_mul(&base_multiplier).ok_or(DecodeError::Overflow)?; `Option::ok_or(e)` translates `Some(x)` into `Ok(x)` and `None` into `Err(e)`. The `?` operator unwraps an `Ok` value and returns an `Err`.
123 ↗	(On Diff #2576)	You can also use `x.write_vlq(&mut v).expect(msg);` with the added advantage that you can write a message for when it fails.
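
A self-contained sketch of the `checked_mul(...).ok_or(...)?` pattern described above. It uses the standard library's `u64::checked_mul`, which takes its argument by value (the `&base_multiplier` form in the comment is the num-traits variant); the `DecodeError` enum and the `next_base` helper are hypothetical stand-ins.

```rust
/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum DecodeError {
    Overflow,
}

/// Multiply `base` by `multiplier`, turning arithmetic overflow into an error.
fn next_base(base: u64, multiplier: u64) -> Result<u64, DecodeError> {
    // checked_mul returns None on overflow; ok_or maps that to Err, and `?`
    // returns the Err early or unwraps the Ok value.
    let next = base.checked_mul(multiplier).ok_or(DecodeError::Overflow)?;
    Ok(next)
}
```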

quark added inline comments. Oct 11 2017, 1:20 PM

rust/vlq/src/lib.rs
63 ↗	(On Diff #2576)	Good point! I think it should be on the writer type for symmetry.
106–108 ↗	(On Diff #2576)	Sounds good. I guess line 100 could also be simplified. I'll scan Rust doc.
123 ↗	(On Diff #2576)	Nice to know!

In D929#16782, @mbthomas wrote:

I'd also like to reiterate that I think this has more wide appeal than just in mercurial and we should consider sharing more widely. Unfortunately there is already a crate on crates.io called vlq (similar thing but with a different encoding).

I agree. It could be something like vlqencoding. I think the source of truth could just be this repo so we have monorepo advantage - atomic refactoring, etc.

quark edited the summary of this revision. Oct 11 2017, 1:35 PM

quark updated this revision to Diff 2600. Oct 11 2017, 4:44 PM

quark updated this revision to Diff 2601.

quark edited the summary of this revision. Oct 11 2017, 4:48 PM

quark retitled this revision from "vlq: add a library that encodes integers to variable-length byte arrays" to "vlqencoding: encodes integers to variable-length byte arrays".

quark updated this revision to Diff 2602.

ryanmce resigned from this revision. Oct 13 2017, 10:19 AM

jsgf requested changes to this revision. Oct 13 2017, 6:22 PM

jsgf added inline comments.

rust/vlq/src/lib.rs
63 ↗	(On Diff #2576)	Agreed.
68–69 ↗	(On Diff #2576)	This is looking pretty unpleasant. I was thinking more along the lines of having a helper function to implement this for `u64`, and then just manually implement it for each type:

    pub fn encode_u64<W: Write>(v: u64, out: &mut W) -> Result<()> { ... }

    impl VLQEncode for u64 {
        fn write_vlq<W: Write>(&self, writer: &mut W) -> Result<()> {
            encode_u64(*self, writer)
        }
    }

    impl VLQEncode for u32 {
        fn write_vlq<W: Write>(&self, writer: &mut W) -> Result<()> {
            encode_u64(*self as u64, writer)
        }
    }
    //...

It's a bit cut'n'paste, but I think it's cleaner overall (and it would be a simple macro). It also makes implementing zigzag versions for `i64`/`i32`/... much more straightforward. Also, since all the interesting types for this are `Copy`, you can make the trait take `self` (rather than `&self`) as it would be cleaner to just pass in values by value than reference. If you later want to implement it for a complex type, then you can implement `impl<'a> VLQEncode for &'a ComplexType` later on.
70 ↗	(On Diff #2574)	Having an iterator of indexes like this is pretty much an antipattern in Rust. It's possible the compiler could work out that the range is bounded and eliminate the array bounds check below, but if it doesn't it makes this pretty inefficient.
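
The helper-plus-macro approach suggested above might look roughly like the sketch below. It assumes a trait on the integer type that takes `self` by value, as proposed in the comment; the macro name `impl_vlq_encode` is made up for illustration, and this is not the interface that was eventually committed.

```rust
use std::io::{self, Write};

/// Shared helper: write `v` as a VLQ byte sequence to any `Write` sink.
pub fn encode_u64<W: Write>(mut v: u64, out: &mut W) -> io::Result<()> {
    loop {
        let mut byte = (v & 0x7f) as u8;
        v >>= 7;
        if v != 0 {
            byte |= 0x80; // more bytes follow
        }
        out.write_all(&[byte])?;
        if v == 0 {
            return Ok(());
        }
    }
}

/// Hypothetical trait on the integer type, taking `self` by value.
pub trait VLQEncode {
    fn write_vlq<W: Write>(self, writer: &mut W) -> io::Result<()>;
}

/// Stamp out one impl per unsigned type by widening to `u64`.
macro_rules! impl_vlq_encode {
    ($($t:ty),*) => {
        $(impl VLQEncode for $t {
            fn write_vlq<W: Write>(self, writer: &mut W) -> io::Result<()> {
                encode_u64(u64::from(self), writer)
            }
        })*
    };
}

impl_vlq_encode!(u8, u16, u32, u64);
```

Because `Vec<u8>` implements `Write`, callers can encode into an in-memory buffer and reuse one allocation across many values.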

This revision now requires changes to proceed. Oct 13 2017, 6:22 PM

quark updated this revision to Diff 2834. Oct 16 2017, 2:32 PM

quark updated this revision to Diff 2835.

quark updated this revision to Diff 3044. Oct 19 2017, 6:43 PM

quark added a child revision: D1203: radixbuf: implement basic serialization logic. Oct 20 2017, 8:01 PM

mbthomas requested changes to this revision. Nov 2 2017, 12:24 PM

mbthomas added inline comments.

rust/vlqencoding/src/lib.rs
30	nit: I think this is called `zig-zag`.
156	I don't think this works for the most negative integer values. e.g. for `i8`, `-128` is encoded as `255u8`. `(255u8 >> 1) + 1` is 128, which will cause overflow (and panic in debug) if you try to convert it to `i8` and then negate it. You can use: ((n >> 1) as $T) ^ -((n & 1) as $T) which also has the advantage of working for both positive and negative numbers, so it doesn't need the `if` block.
171	I'd like to see tests for limits (in particular `i{8,16,32,64}::min_value()`, after my comments above).
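
The branch-free decode expression from the comment on line 156 can be checked in isolation for `i8`. This sketch hard-codes the types instead of using the macro's `$T`, and the function name is illustrative.

```rust
/// Zig-zag decode for i8 using the branch-free expression from the review.
fn zigzag_decode_i8(n: u8) -> i8 {
    // (n & 1) is 1 for negative originals, so the negation is -1 (all ones)
    // and the xor flips every bit of n >> 1; for even n it xors with 0.
    ((n >> 1) as i8) ^ -((n & 1) as i8)
}
```

For `n = 255` this computes `127 ^ -1`, which is `-128`, without ever forming the out-of-range intermediate `128` that the add-then-negate approach needs.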

This revision now requires changes to proceed. Nov 2 2017, 12:24 PM

quark added inline comments. Nov 2 2017, 5:37 PM

rust/vlqencoding/src/lib.rs
30	Thanks!
156	Good catch and nice bit expression! Although tests are passing even for the `-128` case, it seems to be undefined behavior.
171	I will add tests around every bit which will cover `{integer}::{MIN,MAX}`.

quark edited the test plan for this revision. Nov 2 2017, 5:37 PM

quark updated this revision to Diff 3241.

quark updated this revision to Diff 3242. Nov 2 2017, 5:46 PM

LGTM. I plan to use this, too.

Thanks!

Closed by commit rFBHGXc0492b73c7ef: vlqencoding: encodes integers to variable-length byte arrays (authored by quark). Nov 10 2017, 2:13 PM

This revision was automatically updated to reflect the committed changes.

		Path
M		.hgignore (1 line)
A	M	rust/.hgignore (2 lines)
A	M	rust/vlqencoding/Cargo.toml (8 lines)
A	M	rust/vlqencoding/src/lib.rs (269 lines)

Diff	ID	Description	Created	Lint	Unit
Base		Base
Diff 1	2395		Oct 3 2017, 9:41 PM	★	★
Diff 2	2396		Oct 3 2017, 9:48 PM	★	★
Diff 3	2574		Oct 11 2017, 2:31 AM	★	★
Diff 4	2575		Oct 11 2017, 2:36 AM	★	★
Diff 5	2576		Oct 11 2017, 3:22 AM	★	★
Diff 6	2600		Oct 11 2017, 4:44 PM	★	★
Diff 7	2601		Oct 11 2017, 4:46 PM	★	★
Diff 8	2602		Oct 11 2017, 4:49 PM	★	★
Diff 9	2834		Oct 16 2017, 2:32 PM	★	★
Diff 10	2835		Oct 16 2017, 2:33 PM	★	★
Diff 11	3044		Oct 19 2017, 6:43 PM	★	★
Diff 12	3241		Nov 2 2017, 5:37 PM	★	★
Diff 13	3242		Nov 2 2017, 5:46 PM	★	★
Diff 14	3400	rFBHGXc0492b73c7efb4ce78b925c419762b73f42222e1	Nov 10 2017, 2:12 PM	★	★

Status	Author	Revision
Abandoned	quark	D1219 radixbuf: add serialization support for linked nodes
Abandoned	quark	D1218 radixbuf: add serialization support for keys (byte arrays)
Abandoned	quark	D1203 radixbuf: implement basic serialization logic
Closed	quark	D929 vlqencoding: encodes integers to variable-length byte arrays

Diff 3400

.hgignore

	.idea			.idea
	.testtimes*			.testtimes*
	^hgext3rd/.*\.c$			^hgext3rd/.*\.c$
	^hgext3rd/traceprof\.c.*$			^hgext3rd/traceprof\.c.*$
	^cdatapack/cdatapack_dump$			^cdatapack/cdatapack_dump$

	subinclude:cfastmanifest/.hgignore			subinclude:cfastmanifest/.hgignore
	subinclude:linelog/.hgignore			subinclude:linelog/.hgignore
				subinclude:rust/.hgignore

rust/.hgignore

This file was added.

				target/
				vlqencoding/Cargo.lock

rust/vlqencoding/Cargo.toml

This file was added.

				[package]
				name = "vlqencoding"
				version = "0.1.0"

				[dependencies]

				[dev-dependencies]
				quickcheck = "0.4"

rust/vlqencoding/src/lib.rs

This file was added.

				// Copyright 2017 Facebook, Inc.
				//
				// This software may be used and distributed according to the terms of the
				// GNU General Public License version 2 or any later version.

				//! VLQ (Variable-length quantity) encoding.

				#[cfg(test)]
				#[macro_use]
				extern crate quickcheck;

				use std::mem::size_of;
				use std::io::{self, Read, Write};

				pub trait VLQEncode<T> {
				    /// Encode an integer to a VLQ byte array and write it directly to a stream.
				    ///
				    /// # Examples
				    ///
				    /// ```
				    /// use vlqencoding::VLQEncode;
				    /// let mut v = vec![];
				    ///
				    /// let x = 120u8;
				    /// v.write_vlq(x).expect("writing an encoded u8 to a vec should work");
				    /// assert_eq!(v, vec![120]);
				    ///
				    /// let x = 22742734291u64;
				    /// v.write_vlq(x).expect("writing an encoded u64 to a vec should work");
				    ///
				    /// assert_eq!(v, vec![120, 211, 171, 202, 220, 84]);
				    /// ```
				    ///
				    /// Signed integers are encoded via zig-zag:
				    ///
				    /// ```
				    /// use vlqencoding::VLQEncode;
				    /// let mut v = vec![];
				    ///
				    /// let x = -3i8;
				    /// v.write_vlq(x).expect("writing an encoded i8 to a vec should work");
				    /// assert_eq!(v, vec![5]);
				    ///
				    /// let x = 1000i16;
				    /// v.write_vlq(x).expect("writing an encoded i16 to a vec should work");
				    /// assert_eq!(v, vec![5, 208, 15]);
				    /// ```
				    fn write_vlq(&mut self, value: T) -> io::Result<()>;
				}

				pub trait VLQDecode<T> {
				    /// Read a VLQ byte array from stream and decode it to an integer.
				    ///
				    /// # Examples
				    ///
				    /// ```
				    /// use vlqencoding::VLQDecode;
				    /// use std::io::{Cursor,Seek,SeekFrom,ErrorKind};
				    ///
				    /// let mut c = Cursor::new(vec![120u8, 211, 171, 202, 220, 84]);
				    ///
				    /// let x: Result<u8, _> = c.read_vlq();
				    /// assert_eq!(x.unwrap(), 120u8);
				    ///
				    /// let x: Result<u16, _> = c.read_vlq();
				    /// assert_eq!(x.unwrap_err().kind(), ErrorKind::InvalidData);
				    ///
				    /// c.seek(SeekFrom::Start(1)).expect("seek should work");
				    /// let x: Result<u64, _> = c.read_vlq();
				    /// assert_eq!(x.unwrap(), 22742734291u64);
				    /// ```
				    ///
				    /// Signed integers are decoded via zig-zag:
				    ///
				    /// ```
				    /// use vlqencoding::VLQDecode;
				    /// use std::io::{Cursor,Seek,SeekFrom,ErrorKind};
				    ///
				    /// let mut c = Cursor::new(vec![5u8, 208, 15]);
				    ///
				    /// let x: Result<i8, _> = c.read_vlq();
				    /// assert_eq!(x.unwrap(), -3i8);
				    ///
				    /// let x: Result<i8, _> = c.read_vlq();
				    /// assert_eq!(x.unwrap_err().kind(), ErrorKind::InvalidData);
				    ///
				    /// c.seek(SeekFrom::Start(1)).expect("seek should work");
				    /// let x: Result<i32, _> = c.read_vlq();
				    /// assert_eq!(x.unwrap(), 1000i32);
				    /// ```
				    fn read_vlq(&mut self) -> io::Result<T>;
				}

				macro_rules! impl_unsigned_primitive {
				    ($T: ident) => (
				        impl<W: Write> VLQEncode<$T> for W {
				            fn write_vlq(&mut self, value: $T) -> io::Result<()> {
				                let mut buf = [0u8];
				                let mut value = value;
				                loop {
				                    let mut byte = (value & 127) as u8;
				                    let next = value >> 7;
				                    if next != 0 {
				                        byte |= 128;
				                    }
				                    buf[0] = byte;
				                    self.write_all(&buf)?;
				                    value = next;
				                    if value == 0 {
				                        break;
				                    }
				                }
				                Ok(())
				            }
				        }

				        impl<R: Read> VLQDecode<$T> for R {
				            fn read_vlq(&mut self) -> io::Result<$T> {
				                let mut buf = [0u8];
				                let mut value = 0 as $T;
				                let mut base = 1 as $T;
				                let base_multiplier = (1 << 7) as $T;
				                loop {
				                    self.read_exact(&mut buf)?;
				                    let byte = buf[0];
				                    value = ($T::from(byte & 127)).checked_mul(base)
				                        .and_then(|v| v.checked_add(value))
				                        .ok_or(io::ErrorKind::InvalidData)?;
				                    if byte & 128 == 0 {
				                        break;
				                    }
				                    base = base.checked_mul(base_multiplier).ok_or(io::ErrorKind::InvalidData)?;
				                }
				                Ok(value)
				            }
				        }
				    )
				}

				impl_unsigned_primitive!(usize);
				impl_unsigned_primitive!(u64);
				impl_unsigned_primitive!(u32);
				impl_unsigned_primitive!(u16);
				impl_unsigned_primitive!(u8);

				macro_rules! impl_signed_primitive {
				    ($T: ty, $U: ty) => (
				        impl<W: Write> VLQEncode<$T> for W {
				            fn write_vlq(&mut self, v: $T) -> io::Result<()> {
				                self.write_vlq(((v << 1) ^ (v >> (size_of::<$U>() * 8 - 1))) as $U)
				            }
				        }

				        impl<R: Read> VLQDecode<$T> for R {
				            fn read_vlq(&mut self) -> io::Result<$T> {
				                (self.read_vlq() as Result<$U, _>).map (|n| {
				                    ((n >> 1) as $T) ^ -((n & 1) as $T)
				                })
				            }
				        }
				    )
				}

				impl_signed_primitive!(isize, usize);
				impl_signed_primitive!(i64, u64);
				impl_signed_primitive!(i32, u32);
				impl_signed_primitive!(i16, u16);
				impl_signed_primitive!(i8, u8);

				#[cfg(test)]
				mod tests {
				    use std::io::{self, Cursor, Seek, SeekFrom};
				    use {VLQDecode, VLQEncode};

				    macro_rules! check_round_trip {
				        ($N: expr) => (
				            {
				                let mut v = vec![];
				                let mut x = $N;
				                v.write_vlq(x).expect("write");

				                let mut c = Cursor::new(v);
				                let y = x;
				                x = c.read_vlq().unwrap();
				                x == y
				            }
				        )
				    }

				    #[test]
				    fn test_round_trip_manual() {
				        for i in (0..64)
				            .flat_map(|b| vec![1u64 << b, (1 << b) + 1, (1 << b) - 1].into_iter())
				            .chain(vec![0xb3a73ce2ff2, 0xab54a98ceb1f0ad2].into_iter())
				            .flat_map(|i| vec![i, !i].into_iter())
				        {
				            assert!(check_round_trip!(i as i8));
				            assert!(check_round_trip!(i as i16));
				            assert!(check_round_trip!(i as i32));
				            assert!(check_round_trip!(i as i64));
				            assert!(check_round_trip!(i as isize));
				            assert!(check_round_trip!(i as u8));
				            assert!(check_round_trip!(i as u16));
				            assert!(check_round_trip!(i as u32));
				            assert!(check_round_trip!(i as u64));
				            assert!(check_round_trip!(i as usize));
				        }
				    }

				    #[test]
				    fn test_read_errors() {
				        let mut c = Cursor::new(vec![]);
				        assert_eq!(
				            (c.read_vlq() as io::Result<u64>).unwrap_err().kind(),
				            io::ErrorKind::UnexpectedEof
				        );

				        let mut c = Cursor::new(vec![255, 129]);
				        assert_eq!(
				            (c.read_vlq() as io::Result<u64>).unwrap_err().kind(),
				            io::ErrorKind::UnexpectedEof
				        );

				        c.seek(SeekFrom::Start(0)).unwrap();
				        assert_eq!(
				            (c.read_vlq() as io::Result<u8>).unwrap_err().kind(),
				            io::ErrorKind::InvalidData
				        );
				    }

				    #[test]
				    fn test_zig_zag() {
				        let mut c = Cursor::new(vec![]);
				        for &(i, u) in [
				            (0, 0),
				            (-1, 1),
				            (1, 2),
				            (-2, 3),
				            (-127, 253),
				            (127, 254),
				            (-128i8, 255u8),
				        ].iter()
				        {
				            c.seek(SeekFrom::Start(0)).expect("seek");
				            c.write_vlq(i).expect("write");
				            c.seek(SeekFrom::Start(0)).expect("seek");
				            let x: u8 = c.read_vlq().unwrap();
				            assert_eq!(x, u);
				        }
				    }

				    quickcheck! {
				        fn test_round_trip_u64_quickcheck(x: u64) -> bool {
				            check_round_trip!(x)
				        }

				        fn test_round_trip_i64_quickcheck(x: i64) -> bool {
				            check_round_trip!(x)
				        }

				        fn test_round_trip_u8_quickcheck(x: u8) -> bool {
				            check_round_trip!(x)
				        }

				        fn test_round_trip_i8_quickcheck(x: i8) -> bool {
				            check_round_trip!(x)
				        }
				    }
				}

This is an archive of the discontinued Mercurial Phabricator instance.

vlqencoding: encodes integers to variable-length byte arrays
ClosedPublic
