This is an archive of the discontinued Mercurial Phabricator instance.

Differential D1291

radixbuf: implement the main radix tree
Closed · Public

Authored by quark on Nov 2 2017, 2:45 PM.


Details

Reviewers

durham

Group Reviewers

Restricted Project

Commits

rFBHGX747212d2dd06: radixbuf: implement the main radix tree

Summary

Implement the main radix tree with tests. A quick benchmark shows the
insertion performance is similar to the known revlog.c implementation.

The time complexity is about O(N * log N) for inserting or looking up N
entries. The log factor arises because the prefix length grows with N. A rough
(not very accurate) real-world benchmark looks like:

N	Insert	Lookup	Checked Lookup [1]	Index Size
10k	0.70ms	0.25ms	0.36ms	0.25MB
20k	1.3ms	0.58ms	0.8ms	0.45MB
50k	4.9ms	1.9ms	2.6ms	1.1MB
100k	11ms	4.5ms	6.8ms	2.5MB
200k	26ms	13ms	17ms	4.9MB
500k	68ms	46ms	54ms	11MB
1M	170ms	130ms	150ms	24MB
2M	420ms	300ms	350ms	51MB
5M	1.2s	0.9s	1.1s	110MB
10M	2.7s	2.3s	2.7s	220MB
20M	6.2s	5.1s	5.8s	490MB
50M	19s	16s	18s	1.2GB

[1]: After lookup, verify that the key id maps to the key. This can be skipped if the
key length is fixed and the index data can be trusted.
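The difference between the plain and the checked lookup in footnote [1] can be sketched with a small in-memory model. This is only an illustration under assumptions: the buffer layout is copied from the example in the radix.rs module docs of this diff, endianness handling is left out, and `KEY_LEN`, `base16`, `lookup_unchecked`, `lookup_checked` and `example_buffers` are hypothetical names, not the radixbuf API.

```rust
// Sketch only: an in-memory model of unchecked vs. checked lookup.
// All names here are illustrative, not the radixbuf API.

/// Fixed key length in bytes (the module docs use 20-byte keys).
const KEY_LEN: usize = 20;

/// Split a binary key into base16 digits, high nibble first.
fn base16(key: &[u8]) -> Vec<u8> {
    let mut digits = Vec::with_capacity(key.len() * 2);
    for &b in key {
        digits.push(b >> 4);
        digits.push(b & 0x0f);
    }
    digits
}

/// Follow `key` from the node at `root`. An entry of 0 is empty, an even
/// entry is a node offset shifted left by one, and an odd entry is a key id
/// shifted left by one with the low bit set.
fn lookup_unchecked(radix: &[u32], root: usize, key: &[u8]) -> Option<u32> {
    let mut node = root;
    for digit in base16(key) {
        let v = *radix.get(node + digit as usize)?;
        if v == 0 {
            return None;
        } else if v & 1 == 1 {
            // Candidate only: the remaining digits were never compared.
            return Some(v >> 1);
        }
        node = (v >> 1) as usize;
    }
    None
}

/// Checked lookup: resolve the candidate id in the key buffer and compare.
fn lookup_checked(radix: &[u32], root: usize, keys: &[u8], key: &[u8]) -> Option<u32> {
    let id = lookup_unchecked(radix, root, key)?;
    let start = id as usize;
    let stored = keys.get(start..start.checked_add(KEY_LEN)?)?;
    if stored == key { Some(id) } else { None }
}

/// Buffers laid out like the example in the radix.rs module docs.
fn example_buffers() -> (Vec<u32>, Vec<u8>) {
    let mut keys = vec![0u8; 0x114 + KEY_LEN];
    keys[0x100..0x103].copy_from_slice(&[0x12, 0x34, 0x56]);
    keys[0x114..0x117].copy_from_slice(&[0x12, 0x78, 0x9a]);
    let mut radix = vec![0u32; 0x480 + 16];
    radix[0x400 + 0x1] = 0x440 << 1;
    radix[0x440 + 0x2] = 0x480 << 1;
    radix[0x480 + 0x3] = (0x100 << 1) | 1;
    radix[0x480 + 0x7] = (0x114 << 1) | 1;
    (radix, keys)
}
```

In this model a key that shares only its first three base16 digits with a stored key still yields a candidate id from `lookup_unchecked`; the extra key-buffer read in `lookup_checked` is what rejects it, which is presumably where the additional time in the "Checked Lookup" column goes.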

Test Plan

cargo test --lib. Also use kcov to make sure every line is covered
except for return false in quickcheck functions, or things requiring a
buffer size that exceeds u64.

cargo rustc --lib --profile test -- -Ccodegen-units=1 -Clink-dead-code -Zno-landing-pads
kcov --include-path $PWD/src --verify target/kcov ./target/debug/*-????????????????

Diff Detail

Repository

rFBHGX Facebook Mercurial Extensions

Lint

Automatic diff as part of commit; lint not applicable.

Unit

Automatic diff as part of commit; unit tests not applicable.

Event Timeline

quark created this revision. Nov 2 2017, 2:45 PM

Herald added a reviewer: Restricted Project. Nov 2 2017, 2:45 PM

quark added a child revision: D1292: radix204: implement prefix lookup. Nov 2 2017, 2:45 PM

quark edited the summary of this revision. Nov 2 2017, 3:02 PM

quark added a subscriber: durin42. Nov 2 2017, 3:03 PM

quark edited the summary of this revision. Nov 3 2017, 2:17 AM

quark updated this revision to Diff 3251.

quark edited the summary of this revision. Nov 3 2017, 4:04 AM

@durin42 I'm going to re-send my Rust code to internal Phabricator since Rust reviewers are much more active there and the code could be highlighted properly. I will probably send new Rust code directly there. Let me know if you still want copies to be sent here.

Frankly I don't see the point of doing the review in two places, since I won't be able to see things.

Can we please just get your rust reviewers to respond here instead of developing this in secret?

In D1291#21697, @durin42 wrote:

Frankly I don't see the point of doing the review in two places, since I won't be able to see things.
Can we please just get your rust reviewers to respond here instead of developing this in secret?

I have asked several Rust reviewers and they all prefer the internal instance. I just sent a few patches, put #rustreviewers as reviewer, and got instant feedback from people I hadn't explicitly notified. I think it's possible to ping certain people to check here, but #rustreviewers has 20+ members, who are harder to move here (most of them are not doing Mercurial development).

Well, I'm 100% unwilling to participate in reviews that aren't happening in the open, so I guess come back with a giant code bomb and be prepared for it to take a *long* time to land in core. It's just too hard to coordinate.

quark edited the summary of this revision. Nov 6 2017, 2:10 PM

quark updated this revision to Diff 3296.

While trying to integrate with Python, the Shared type is causing trouble. I'm going to change the API to be lower-level.

quark edited the summary of this revision. Nov 11 2017, 1:33 AM

quark edited the test plan for this revision.

quark retitled this revision from radix204: implement the main radix tree to radix20: implement the main radix tree.

quark updated this revision to Diff 3419.

quark removed a child revision: D1292: radix204: implement prefix lookup. Nov 11 2017, 1:35 AM

I had to spend quite a lot of time understanding the logic. Probably this problem can be solved with just a few comments:

- Mention why we have 16 pointers in RadixNode.
- Mention how this radix tree works: that we may have "fat" leaves, that they may need to be split, etc.

quark edited the summary of this revision. Nov 15 2017, 8:55 PM

quark retitled this revision from radix20: implement the main radix tree to radixbuf: implement the main radix tree.

quark updated this revision to Diff 3546.

quark added a child revision: D1469: nodemap: implement nodemap in rust. Nov 20 2017, 7:11 PM

In D1291#22839, @stash wrote:

I had to spend quite a lot of time understanding the logic. Probably this problem can be solved with just a few comments:

- Mention why we have 16 pointers in RadixNode.
- Mention how this radix tree works: that we may have "fat" leaves, that they may need to be split, etc.

Sorry for the late response - got addicted to writing Rust code. Will draw some ASCII graphs to explain things.
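Until those graphs exist, the two points raised above (16 pointers per node, and leaves that must be split) can be illustrated with a simplified insertion sketch. It is not the code from this diff: it assumes distinct keys of equal length, a root node at offset 0, key ids that are plain indexes into a `keys` slice, and no error handling. The names `digit`, `new_node`, `insert` and `lookup` are hypothetical.

```rust
// Sketch only: why a node has 16 slots and how a leaf gets split.
// Assumes distinct, equal-length keys; names are illustrative.

/// One slot per base16 digit (half a byte) of the key.
const NCHILDREN: usize = 16;

/// The i-th base16 digit of `key`, high nibble first.
fn digit(key: &[u8], i: usize) -> usize {
    let b = key[i / 2];
    (if i % 2 == 0 { b >> 4 } else { b & 0x0f }) as usize
}

/// Append an empty node (16 zeroed slots) and return its offset.
fn new_node(radix: &mut Vec<u32>) -> usize {
    let pos = radix.len();
    radix.resize(pos + NCHILDREN, 0);
    pos
}

/// Insert key number `id` (an index into `keys`) below the root at offset 0.
fn insert(radix: &mut Vec<u32>, keys: &[Vec<u8>], id: u32) {
    let key = &keys[id as usize];
    let (mut node, mut i) = (0usize, 0usize);
    loop {
        let slot = node + digit(key, i);
        let v = radix[slot];
        if v == 0 {
            // Empty slot: store the key id here, however many digits of
            // the key are still unread. This is the "fat" leaf.
            radix[slot] = (id << 1) | 1;
            return;
        } else if v & 1 == 0 {
            // Pointer to another node: descend one digit.
            node = (v >> 1) as usize;
        } else {
            // Leaf holding another key: split it by pushing the old key id
            // one level down, then retry the new key in the new node.
            let old_id = v >> 1;
            if old_id == id {
                return;
            }
            let child = new_node(radix);
            radix[child + digit(&keys[old_id as usize], i + 1)] = v;
            // The new node is filled before the parent slot points to it.
            radix[slot] = (child as u32) << 1;
            node = child;
        }
        i += 1;
    }
}

/// Unverified lookup: returns the key id stored on the path of `key`.
fn lookup(radix: &[u32], key: &[u8]) -> Option<u32> {
    let mut node = 0;
    for i in 0..key.len() * 2 {
        let v = radix[node + digit(key, i)];
        if v == 0 {
            return None;
        } else if v & 1 == 1 {
            return Some(v >> 1);
        }
        node = (v >> 1) as usize;
    }
    None
}
```

Inserting the keys `12 34` and `12 78` into an empty tree first places `12 34` directly in slot 1 of the root. The second insert then splits that leaf twice (once per shared digit) until the digits 3 and 7 diverge, so the tree only grows as deep as the longest shared prefix.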

quark removed a parent revision: D1290: radixbuf: add a base16 iterator. Nov 21 2017, 1:35 AM

durham added a subscriber: durham. Nov 29 2017, 8:24 PM

durham added inline comments.

rust/radixbuf/src/radix.rs
11	Does this mean a KeyId can only be at most 2^31 instead of 2^32?
52	I'd make it clear that the match may not actually be a match for the sequence, and is instead the match for some prefix of the sequence. So the caller needs to check that the return 'i' is the length of KeyId, even if the first return parameter is not None.
53	Maybe make this sentence a bit clearer. I didn't realize that you were listing the contents of the follow state after the hyphen, so I was confused about what I was reading. Something like: Also returns the last follow state, which is useful for write operations. It is a tuple containing the RadixOffset, the position we reached within `seq`, and the last base16 index number. So a tuple of `(r, i, b)` means `buf[r.0 + b]` points to the result `KeyId` (if returned), and `b` represents the location representing `seq.nth(i)`.
71	Why `to_le` here? I assumed to_le was used when we have a value in memory and we want to serialize it to little endian to send somewhere that assumes it will receive little endian (so we take our in memory form and convert it to little endian for sending). But we don't appear to be doing that in this case.
95	Should we assert that the key id bit isn't set?
117	Looks like we call to_le before putting it in the buffer, and to_le after taking it out of the buffer. Should one of those be from_le? Sounds like they do the same thing, but would be less confusing if it was symmetrical.
264	Could we do tests that check for corruption handling? Like if the process is killed mid-write, what is the expected behavior when the next process tries to read the tree?
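The question about the 2^31 limit follows from the pointer encoding: one bit of each 32-bit entry distinguishes a `KeyId` from a `RadixOffset`. A minimal sketch of that encoding, assuming the shift-by-one scheme visible in `write_radix` and `write_key_id` in the final diff; the `Pointer` enum and the `encode`/`decode` helpers are hypothetical and exist only for illustration.

```rust
// Sketch only: the tag-bit encoding of a 32-bit radix entry.
// `Pointer`, `encode` and `decode` are illustrative names.

/// Decoded form of one radix entry.
#[derive(Debug, PartialEq)]
enum Pointer {
    Empty,
    Node(u32),
    Key(u32),
}

/// Largest payload that still fits after shifting left by one bit.
const MAX_PAYLOAD: u32 = 0x7fff_ffff;

/// Encode a pointer, or return `None` if it cannot be represented.
fn encode(p: &Pointer) -> Option<u32> {
    match *p {
        Pointer::Empty => Some(0),
        // A node offset of 0 would be indistinguishable from `Empty`.
        Pointer::Node(off) if off != 0 && off <= MAX_PAYLOAD => Some(off << 1),
        Pointer::Key(id) if id <= MAX_PAYLOAD => Some((id << 1) | 1),
        _ => None,
    }
}

/// Decode a raw entry (already converted to native endianness).
fn decode(v: u32) -> Pointer {
    if v == 0 {
        Pointer::Empty
    } else if v & 1 == 1 {
        Pointer::Key(v >> 1)
    } else {
        Pointer::Node(v >> 1)
    }
}
```

With this scheme the values from the module docs decode as expected: `0x880` is node offset `0x440` and `0x201` is key id `0x100`. Any payload above `0x7fff_ffff` loses its top bit when shifted, which is why the diff rejects it with `OffsetOverflow`.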

I've made it up to radix_insert_with_key. Will finish later this evening. Throwing back in your queue for now.

rust/radixbuf/src/radix.rs
52	Ignore that second sentence. The caller has to verify the key matches, but they can't do that via sequence length checking.
125	Maybe document what the `offset` parameter means.
127	The above code uses RB for the radix buffer parameter. Should it be consistent here?
141	I'd document all the parameters on the public facing functions, since it's not always clear what some of these do.
154	Should we use '?' Instead of unwrap here? Like you do in prefix_lookup below? Seems like a common code path that we wouldn't want to panic in.
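The `?` versus `unwrap` point can be illustrated with a stand-alone sketch. It does not reproduce the code at line 154 of that diff; `LookupError`, `read_key` and `key_matches` are hypothetical stand-ins for the crate's error type and key reader, and the 20-byte key length is taken from the module docs.

```rust
// Sketch only: propagating a key-reader failure with `?` instead of
// panicking via `unwrap`. All names are illustrative.

const KEY_LEN: usize = 20;

#[derive(Debug, PartialEq)]
enum LookupError {
    InvalidKeyId(u32),
}

/// Resolve a key id (an offset into `keys`) to a fixed-length key.
fn read_key(keys: &[u8], id: u32) -> Result<&[u8], LookupError> {
    let start = id as usize;
    let end = start
        .checked_add(KEY_LEN)
        .ok_or(LookupError::InvalidKeyId(id))?;
    keys.get(start..end).ok_or(LookupError::InvalidKeyId(id))
}

/// Compare the stored key with `expected`. A bad id becomes an `Err`
/// for the caller to handle rather than a panic.
fn key_matches(keys: &[u8], id: u32, expected: &[u8]) -> Result<bool, LookupError> {
    let stored = read_key(keys, id)?;
    Ok(stored == expected)
}
```

With `unwrap` in place of `?`, an id that points past the end of the key buffer would abort the whole process, which matters when the index data comes from a file that may be stale or truncated.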

This revision now requires changes to proceed. Nov 29 2017, 8:41 PM

quark added inline comments. Nov 29 2017, 9:11 PM

rust/radixbuf/src/radix.rs
11	Yes. I tried `u64` which removes some boundary checks but doubles the size of everything and it did introduce I/O overhead. Maybe we will need `u64` one day but probably not right now.
52	Good point.
53	Yeah, this is confusing but those states do need to be shared - I didn't come up with better ideas about the interface. Since this isn't user-facing, I didn't pay much attention to it. Will probably draw some ASCII graphs here.
71	`buf` is the raw mmap-ed buffer. So when we are reading or writing `buf`, we need to take care of endianness. I didn't pay much attention to `to_le` or `from_le` since they are the same. Will change some of them to `from_le`. Maybe we should use `be` everywhere so errors such as a missed `to_be` will be caught.
95	Good idea.
117	Yes.
264	Errors like out-of-range access are covered by `test_errors`, which are basically what this library can provide at its best. Arbitrary data corruption (ex. some u32 is changed to `0` in a random radix node) cannot be detected. But simpler ones (ex. an offset is larger than the buffer size) can. Data integrity is a much more complex problem so I intentionally avoided them here - the library does not even have I/O code - all it speaks are raw buffers. It's up to the upper layer to guarantee data integrity. Currently we use atomic replace and tiprev + tipnode verification.
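The endianness exchange above (lines 71 and 117) can be made concrete with a short sketch using the standard integer conversion methods. The `store` and `load` helpers are hypothetical; they only mirror the `to_be`/`from_be` pairing that the final diff uses in `write_raw` and `follow`.

```rust
// Sketch only: symmetric endianness handling for a raw `[u32]` buffer.
// `store` and `load` are illustrative helpers, not the radixbuf API.

/// Write `value` so that its bytes in memory are big-endian.
fn store(buf: &mut [u32], pos: usize, value: u32) {
    buf[pos] = value.to_be();
}

/// Read a big-endian entry back into a native integer.
fn load(buf: &[u32], pos: usize) -> u32 {
    u32::from_be(buf[pos])
}
```

`to_le` and `from_le` perform the same operation (a no-op on little-endian hosts, a byte swap on big-endian ones), and the same holds for `to_be` and `from_be`, so the pairing is about readability rather than behavior. Choosing big-endian has the side effect quark mentions: on a little-endian host a forgotten conversion produces a visibly wrong value instead of silently working.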

durham added inline comments. Nov 29 2017, 9:15 PM

rust/radixbuf/src/radix.rs
247	I wonder if writing them backwards like this has any perf implication, like on cache line reads or anything.
252	Or the keys are identical but the key_id is different. Not sure if that's worth special casing.

quark added inline comments. Nov 29 2017, 9:34 PM

rust/radixbuf/src/radix.rs
247	This is more about allowing concurrent read - no bad "pointer"s are exposed. It seems like a nice property to have although we don't really depend on that right now. I'll add a comment.
252	Yes. Will change the comment.

quark marked 19 inline comments as done. Dec 8 2017, 9:32 PM

quark updated this revision to Diff 4282.

quark added inline comments. Dec 8 2017, 9:53 PM

rust/radixbuf/src/radix.rs
95	I actually changed it a bit so `RadixNode` is not aligned but the offset is shifted by one when writing. This is more consistent with `KeyId` handling.
141	Added.
154	Good catch. I cannot recall why I used `unwrap` in the first place.

durham accepted this revision. Dec 14 2017, 7:16 PM

durham added inline comments.

rust/radixbuf/src/radix.rs
164	Why drop the #[inline]? Just curious
275	Why set `new_key = key` here? I guess the same question applies to new_key_id
307	You use `from_bin(&new_key)` (with the &) in the previous two call sites but not here?

This revision is now accepted and ready to land. Dec 14 2017, 7:16 PM

quark updated this revision to Diff 4450. Dec 14 2017, 9:21 PM

quark added inline comments. Dec 14 2017, 9:26 PM

rust/radixbuf/src/radix.rs
164	Jeremy's suggestion on a later diff. It's too large to inline.
275	Just make it more obvious since the following code uses `new_key` and `old_key`. So they are aligned.
307	Wrong codemod. Will remove `&` in previous code. Rust inserts `&` automatically so the previous ones become `Base16Iter::from_bin(&new_key)`.

quark updated this revision to Diff 4459. Dec 14 2017, 10:15 PM

Closed by commit rFBHGX747212d2dd06: radixbuf: implement the main radix tree (authored by quark). Dec 15 2017, 2:35 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents
Changeset List

		Path
M		rust/radixbuf/src/errors.rs (15 lines)
M		rust/radixbuf/src/lib.rs (1 line)
A	M	rust/radixbuf/src/radix.rs (593 lines)

Diff	ID	Description	Created	Lint	Unit
Base		Base
Diff 1	3211		Nov 2 2017, 2:45 PM	★	★
Diff 2	3251		Nov 3 2017, 2:17 AM	★	★
Diff 3	3296		Nov 6 2017, 2:10 PM	★	★
Diff 4	3419		Nov 11 2017, 1:33 AM	★	★
Diff 5	3546		Nov 15 2017, 8:55 PM	★	★
Diff 6	4282		Dec 8 2017, 9:32 PM	★	★
Diff 7	4450		Dec 14 2017, 9:21 PM	★	★
Diff 8	4459		Dec 14 2017, 10:15 PM	★	★
Diff 9	4508	rFBHGX747212d2dd068c6b24502addf3cffaabe6cdc738	Dec 15 2017, 2:29 PM	★	★

Status	Author	Revision
Abandoned	quark	D1517 radixbuf: make key reader function more flexible
Closed	quark	D1472 clindex: integrate Rust nodemap
Closed	quark	D1471 setup: make rust nodemap buildable
Closed	quark	D1470 nodemap: implement Python interface
Closed	quark	D1493 pybuf: add a simple abstraction around Py_buffer interface
Closed	quark	D1469 nodemap: implement nodemap in rust
Closed	quark	D1291 radixbuf: implement the main radix tree
Closed	quark	D1432 radixbuf: add read and write functions for keys
Closed	quark	D1290 radixbuf: add a base16 iterator
Closed	quark	D1289 radixbuf: initial boilerplate

Diff 4508

rust/radixbuf/src/errors.rs

  // Copyright 2017 Facebook, Inc.
  //
  // This software may be used and distributed according to the terms of the
  // GNU General Public License version 2 or any later version.

  use key::KeyId;

  error_chain! {
      foreign_links {
          Io(::std::io::Error);
      }

      errors {
+         OffsetOverflow(offset: u64) {
+             description("offset overflow")
+             display("offset {} is out of range", offset)
+         }
+         AmbiguousPrefix {
+             description("ambiguous prefix")
+         }
+         PrefixConflict(key_id1: KeyId, key_id2: KeyId) {
+             description("key prefix conflict")
+             display("{:?} cannot be a prefix of {:?}", key_id1, key_id2)
+         }
          InvalidKeyId(key_id: KeyId) {
              description("invalid key id")
              display("{:?} cannot be resolved", key_id)
          }
+         InvalidBase16(x: u8) {
+             description("invalid base16 value")
+             display("{} is not a base16 value", x)
+         }
      }
  }

rust/radixbuf/src/lib.rs

  extern crate test;

  extern crate vlqencoding;

  pub mod errors;
  pub mod base16;
  pub mod key;
  pub mod traits;
+ pub mod radix;

rust/radixbuf/src/radix.rs

This file was added.

				// Copyright 2017 Facebook, Inc.
				//
				// This software may be used and distributed according to the terms of the
				// GNU General Public License version 2 or any later version.

				//! Main radix index implementation that maintains efficient Key to `KeyId` look ups.
				//!
				//! Practically, the index usually requires 2 buffers to be fully functional:
				//!
				//! - A key buffer. It stores the actual key contents. It is usually an
				//! append-only buffer.
				//! - A radix buffer. It stores radix nodes and pointers (offsets) to the key
				//! buffer. It does not contain key contents. For operations that require
				//! contents of keys (ex. looking up unknown keys; inserting keys), the key
				//! buffer and a function to convert `KeyId` (offset) to key content must
				//! be provided.
				//!
				//! A radix node consists of 16 "pointer"s (since we follow base-16 sequence
				//! to do lookups). A pointer could be one of the following:
				//!
				//! - Empty (0).
				//! - A `RadixOffset`. Pointing to another radix node. (LSB is 0)
				//! - A `KeyId`. Need to be resolved by an external-provided "key function".
				//! Usually the "key function" uses `KeyId` as an offset of the key buffer.
				//! (LSB is 1).
				//!
				//! The radix buffer could have multiple "root" radix nodes so it contains multiple
				//! distinct indexes.
				//!
				//! A "key function" takes a `KeyId` and a "key argument" (usually the "key buffer")
				//! and returns a slice reference in the "key argument" as a key. That means, full
				//! key contents usually need to be written down in the key buffer, instead of going
				//! through extra pre-processing logic. That said, the "key argument" does not always
				//! have to be a "key buffer" so there could be some flexibility here.
				//!
				//! To give a more detailed example, suppose the "key function" is `FixedKey::read`
				//! (fixed 20-byte keys) and the key buffer ([u8]) looks like:
				//!
				//! Offset Content
				//! 0x100: 0x12 0x34 0x56 .... (key)
				//! 0x114: 0x12 0x78 0x9a .... (another key)
				//!
				//! With both of the keys inserted at radix offset 0x400, the radix buffer ([u32])
				//! looks like:
				//!
				//! Offset Content
				//! 0x400: 0 0x880 0 0 0 0 0 0 0 0 0 0 0 0 0 0 (radix node, 16 pointers)
				//!          ^^^^^
				//!          0x1: RadixOffset 0x440
				//! 0x440: 0 0 0x900 0 0 0 0 0 0 0 0 0 0 0 0 0 (another radix node)
				//!            ^^^^^
				//!            0x2: RadixOffset 0x480
				//! 0x480: 0 0 0 0x201 0 0 0 0x229 0 0 0 0 0 0 0 0 (another radix node)
				//!              ^^^^^         ^^^^^
				//!              0x3: KeyId 0x100  0x7: KeyId 0x114
				//!
				//! Note the radix buffer does not contain full key contents (ex. it does not
				//! have 0x34 0x56 or 0x78 0x9a). It only has the ambiguous prefix (0x12) stored.
				//!
				//! The index does not support deletion or iteration at present. It also forbids
				//! a key being the prefix of another key, to make the format simpler and more
				//! compact.
				//!
				//! Extra flexibility could be achieved by making the "key buffer" include
				//! additional information. For example, instead of just storing plain, fixed
				//! 20-byte keys one after another, a "key entry" could be
				//! "20-byte key + 4-byte offset + ..." so those key entries could contain
				//! additional data.

				use base16::Base16Iter;
				use errors::{Result, ErrorKind};
				use key::KeyId;
				use traits::Resize;

				/// Number of children ("pointer"s) a radix node has
				pub const RADIX_NCHILDREN: usize = 16;

				/// Represent an offset to a radix node which contains 16 optional pointers to other
				/// radix nodes, or `KeyId`s.
				#[derive(Clone, Copy)]
				struct RadixOffset(u32);

				impl RadixOffset {
				#[inline]
				pub fn new(offset: u32) -> Self { RadixOffset(offset) }

				/// Append an empty `RadixNode` (`[u32; 16]`) at the end of a buffer.
				#[inline]
				pub fn create<R: Resize<u32> + AsRef<[u32]>>(vec: &mut R) -> Result<Self> {
				let pos = vec.as_ref().len();
				if (pos as u32) as usize != pos {
				bail!(ErrorKind::OffsetOverflow(pos as u64));
				}
				vec.resize(pos + RADIX_NCHILDREN, 0);
				Ok(RadixOffset(pos as u32))
				}

				/// Follow a base16 sequence. Return a tuple:
				/// - The first item is `Some(key_id)` if a `KeyId` was found, or `None`
				/// - The second item is the "last follow state", useful for write operations
				///
				/// The "last follow state" consists of 3 items:
				/// - r: `RadixOffset`
				/// - i: position reached within `seq`
				/// - b: last base16 index number
				///
				/// So `buf[r.0 + b]` points to the returned `KeyId` (if returned), and
				/// `seq.nth(i)` equals `b`.
				///
				/// For example, given the base16 sequence: [1, 2, 11, 12, 13, 14], and the
				/// following radix buffer:
				///
				/// - Offset 0: RadixNode({0: 0, 1: 100, ... 15: 0}) # This RadixNode
				/// - Offset 100: RadixNode({..., 2: 200, ...})
				/// - Offset 200: RadixNode({..., 11: KeyId 501, ...})
				///
				/// (Entries are shown decoded. In the buffer a `RadixOffset` is stored as
				/// `offset << 1` and a `KeyId` as `(id << 1) | 1`.)
				///
				/// This function will return `Ok((Some(501), (200, 2, 11)))`. Note: the remaining
				/// part of the base16 sequence (starting from 12) is not verified against the
				/// key. It's up to the caller to verify it if needed.
				#[inline]
				pub fn follow<R: AsRef<[u32]>, I: Iterator<Item = u8>>(self, buf: &R, seq: I)
				-> Result<(Option<KeyId>, (RadixOffset, usize, u8))> {
				let buf = buf.as_ref();
				let mut radix = self;
				for (i, b) in seq.enumerate() {
				if b >= RADIX_NCHILDREN as u8 {
				bail!(ErrorKind::InvalidBase16(b));
				}

				let pos = radix.0 as usize + usize::from(b);
				if pos >= buf.len() {
				bail!(ErrorKind::OffsetOverflow(pos as u64));
				}

				let v = u32::from_be(buf[pos]);
				if v == 0 {
				// Missing
				return Ok((None, (radix, i, b)));
				} else if v & 1 != 0 {
				// KeyId
				return Ok((Some(KeyId::from(v >> 1)), (radix, i, b)));
				} else {
				// RadixOffset
				radix = RadixOffset::new(v >> 1);
				}
				}

				// The base16 sequence is too short and does not match a non-radix node.
				// NOTE: The error is not accurate if the prefix is empty and the radix tree is
				// also empty, or has exactly one entry. But without supporting that, the code
				// becomes much shorter. Since that is a rare case, we do not support it for now.
				Err(ErrorKind::AmbiguousPrefix.into())
				}

				/// Rewrite specified entry to point to another radix node.
				#[inline]
				pub fn write_radix<R: AsMut<[u32]>>(&self, vec: &mut R, index: u8, node: RadixOffset)
				-> Result<()> {
				if node.0 > 0x7fff_ffff {
				bail!(ErrorKind::OffsetOverflow(node.0 as u64));
				}
				self.write_raw(vec, index, node.0 << 1)
				}

				/// Rewrite specified entry to point to a `KeyId`.
				#[inline]
				pub fn write_key_id<R: AsMut<[u32]>>(&self, vec: &mut R, index: u8, key_id: KeyId)
				-> Result<()> {
				let id: u32 = key_id.into();
				if id > 0x7fff_ffff {
				bail!(ErrorKind::OffsetOverflow(key_id.into()));
				}
				self.write_raw(vec, index, (id << 1) | 1)
				}

				#[inline]
				fn write_raw<R: AsMut<[u32]>>(&self, vec: &mut R, index: u8, value: u32) -> Result<()> {
				debug_assert!(index < RADIX_NCHILDREN as u8);
				let vec = vec.as_mut();
				let pos = self.0 as usize + usize::from(index);
				if pos >= vec.len() {
				bail!(ErrorKind::OffsetOverflow(pos as u64));
				}
				vec[pos] = value.to_be();
				Ok(())
				}
				}

				// Public APIs

				/// Look up a given `Key`. Return an optional potentially matched `KeyId`.
				/// `radix_buf` is a `[u32]` buffer that contains `RadixNode`s.
				/// `offset` is the offset of the root radix node within the radix buffer.
				/// `key` is a base256 sequence.
				/// The caller is responsible for checking whether `KeyId` matches the given `Key`.
				#[inline]
				pub fn radix_lookup_unchecked<R, K>(radix_buf: &R, offset: u32, key: &K) -> Result<Option<KeyId>>
				where
				R: AsRef<[u32]>,
				K: AsRef<[u8]>,
				{
				let (key_id, _) = RadixOffset::new(offset).follow(
				radix_buf,
				Base16Iter::from_bin(&key),
				)?;
				Ok(key_id)
				}

				// unfortunately rustfmt makes the parameter list longer than 100 chars so it's disabled for now.

				/// Look up a given `Key`. Return a verified `KeyId` or `None`.
				/// `radix_buf` is a `[u32]` buffer that contains `RadixNode`s.
				/// `offset` is the offset of the root radix node within the radix buffer.
				/// `key` is a base256 sequence.
				/// `key_reader` and `key_reader_arg` decide how and where to read a key given a `KeyId`.
				/// Unlike `radix_lookup_unchecked`, this function reads and checks the key.
				#[cfg_attr(rustfmt, rustfmt_skip)]
				pub fn radix_lookup<R, K, KR, KA>(
				radix_buf: &R, offset: u32, key: &K, key_reader: KR, key_reader_arg: &KA)
				-> Result<Option<KeyId>>
				where
				R: AsRef<[u32]>,
				K: AsRef<[u8]>,
				KR: Fn(&KA, KeyId) -> Result<&[u8]>,
				{
				let key_id = radix_lookup_unchecked(radix_buf, offset, key)?;
				if let Some(id) = key_id {
				let existing_key = key_reader(key_reader_arg, id)?;
				if existing_key != key.as_ref() {
				return Ok(None);
				}
				}
				Ok(key_id)
				}

				/// Look up a unique `KeyId` given a prefix of a binary base16 sequence.
				/// `radix_buf` is a `[u32]` buffer that contains `RadixNode`s.
				/// `offset` is the offset of the root radix node within the radix buffer.
				/// `prefix` is a base16 sequence (not base256).
				/// `key_reader` and `key_reader_arg` decide how and where to read a key given a `KeyId`.
				///
				/// Return `Err(ErrorKind::AmbiguousPrefix.into())` or `Err(ErrorKind::PrefixConflict.into())`
				/// if there are multiple matches, or `prefix` is empty. Return `Ok(None)` if there
				/// are no matches.
				///
				/// Return `Ok(key_id)` if there is a unique match. The `key_id` is guaranteed
				/// to resolve to a key whose base16 sequence has a prefix matching
				/// the given `prefix`.
				#[cfg_attr(rustfmt, rustfmt_skip)]
				pub fn radix_prefix_lookup<R, P, KR, KA>(
				radix_buf: &R, offset: u32, prefix: P, key_reader: KR, key_reader_arg: &KA)
				-> Result<Option<KeyId>>
				where
				R: AsRef<[u32]>,
				P: Iterator<Item = u8> + Clone,
				KR: Fn(&KA, KeyId) -> Result<&[u8]>,
				{
				let root = RadixOffset::new(offset);
				let (key_id, (_radix, i, _b)) = root.follow(radix_buf, prefix.clone())?;
				if let Some(id) = key_id {
				let key = key_reader(key_reader_arg, id)?;
				let iter = Base16Iter::from_bin(&key);
				let matched = iter.clone().skip(i).zip(prefix.clone().skip(i)).all(|(b1, b2)| b1 == b2);
				if !matched || iter.count() < prefix.count() {
				return Ok(None);
				}
				}
				Ok(key_id)
				}
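The verification step at the end of `radix_prefix_lookup` can be illustrated in isolation. The following is a minimal sketch, not the code under review: `base16` and `prefix_matches` are hypothetical helpers, `base16` assumes the high nibble comes first (the diff does not show the digit order of `Base16Iter`), and the prefix is passed as a slice of base16 digits instead of an iterator.

```rust
/// Hypothetical helper: expand bytes into base16 digits.
/// Assumption: the high nibble is emitted before the low nibble.
fn base16(bytes: &[u8]) -> impl Iterator<Item = u8> + Clone + '_ {
    bytes.iter().flat_map(|&b| [b >> 4, b & 0xf])
}

/// Check that `key` really starts with `prefix` (a slice of base16 digits).
/// `walked` is the number of digits the tree walk already consumed, so
/// those digits are known to match and are skipped.
fn prefix_matches(key: &[u8], prefix: &[u8], walked: usize) -> bool {
    let key_digits = base16(key);
    // Compare only the digits the tree walk did not look at.
    let rest_equal = key_digits
        .clone()
        .skip(walked)
        .zip(prefix.iter().copied().skip(walked))
        .all(|(k, p)| k == p);
    // `zip` stops at the shorter side, so a prefix that is longer than
    // the key must be rejected explicitly.
    rest_equal && key_digits.count() >= prefix.len()
}
```

With key bytes `[0x12, 0x34]` and one digit already walked, the prefix `[1, 2, 3]` is accepted, `[1, 2, 5]` is rejected by the digit comparison, and `[1, 2, 3, 4, 5]` is rejected by the length check.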

				/// Insert a `key_id` into the radix tree that can be retrieved using its corresponding
				/// key afterwards.
				///
				/// `radix_buf` is a `[u32]` buffer that contains `RadixNode`s.
				/// `offset` is the offset of the root radix node within the radix buffer.
				durham: Why set `new_key = key` here? I guess the same question applies to new_key_id
				quark: Just make it more obvious since the following code uses `new_key` and `old_key`. So they are aligned.
				/// `key_id` is the `KeyId`, which will be passed to `key_reader` to retrieve the actual key.
				/// `key_reader` and `key_reader_arg` decide how and where to read a key given a `KeyId`.
				///
				/// Return `Ok(())` on success.
				///
				/// The key being inserted can neither be a prefix of an existing key, nor have an existing
				/// key as its prefix. If the key already exists, `key_id` must match the existing `key_id`.
				/// Otherwise an `ErrorKind::PrefixConflict` error is returned.
				#[cfg_attr(rustfmt, rustfmt_skip)]
				pub fn radix_insert<R, KR, KA>(
				radix_buf: &mut R, offset: u32, key_id: KeyId, key_reader: KR, key_reader_arg: &KA)
				-> Result<()>
				where
				R: Resize<u32> + AsRef<[u32]> + AsMut<[u32]>,
				KR: Fn(&KA, KeyId) -> Result<&[u8]>,
				{
				let new_key = key_reader(key_reader_arg, key_id)?;
				radix_insert_with_key(
				radix_buf,
				offset,
				key_id,
				&new_key,
				key_reader,
				key_reader_arg,
				)
				}

				/// Insert a `key_id` into the radix tree that can be retrieved using `key` afterwards.
				///
				/// `radix_buf` is a `[u32]` buffer that contains `RadixNode`s.
				/// `offset` is the offset of the root radix node within the radix buffer.
				/// `key_id` is the `KeyId` to insert.
				durham: You use `from_bin(&new_key)` (with the &) in the previous two call sites but not here?
				quark: Wrong codemod. Will remove `&` in previous code. Rust inserts `*` automatically so the previous ones become `Base16Iter::from_bin(&new_key)`.
				/// `key` is the `Key` to be used. It must match provided `key_id`.
				/// `key_reader` and `key_reader_arg` decide how and where to read a key given a `KeyId`.
				///
				/// Return `Ok(())` on success.
				///
				/// The key being inserted can neither be a prefix of an existing key, nor have an existing
				/// key as its prefix. If the key already exists, `key_id` must match the existing `key_id`.
				/// Otherwise an `ErrorKind::PrefixConflict` error is returned.
				#[cfg_attr(rustfmt, rustfmt_skip)]
				pub fn radix_insert_with_key<R, K, KR, KA>(
				radix_buf: &mut R, offset: u32, key_id: KeyId, key: &K, key_reader: KR, key_reader_arg: &KA)
				-> Result<()>
				where
				R: Resize<u32> + AsRef<[u32]> + AsMut<[u32]>,
				K: AsRef<[u8]>,
				KR: Fn(&KA, KeyId) -> Result<&[u8]>,
				{
				let new_key_id = key_id;
				let new_key = key;
				let root = RadixOffset::new(offset);
				let (old_key_id, (radix, i, b)) = root.follow(radix_buf, Base16Iter::from_bin(new_key))?;
				match old_key_id {
				Some(old_key_id) => {
				// No need to re-insert the same key
				if old_key_id == new_key_id {
				return Ok(());
				}

				// Need to do a leaf split
				let old_key = key_reader(key_reader_arg, old_key_id)?;

				// Find common prefix starting from the next base16 integer
				let mut common_len = 0;
				let old_iter = Base16Iter::from_bin(&old_key).skip(i + 1);
				let new_iter = Base16Iter::from_bin(new_key).skip(i + 1);
				for (b1, b2) in old_iter.zip(new_iter) {
				if b1 == b2 {
				common_len += 1;
				} else {
				// Got a chain of radix nodes to write back
				// Write new `RadixNode`s in reversed order so:
				// - Looking up `old_key` works in the meantime
				// - There won't be invalid `RadixOffset` at any time
				// - Write count is optimized
				// (won't write `KeyId` first and then change it to `RadixOffset`)
				// The first two properties could help concurrent reads.
				// Although we are not depending on that right now.
				let mut node = RadixOffset::create(radix_buf)?;
				node.write_key_id(radix_buf, b1, old_key_id)?;
				node.write_key_id(radix_buf, b2, new_key_id)?;
				let new_iter = Base16Iter::from_bin(new_key).skip(i + 1);
				for k in new_iter.take(common_len).rev() {
				let new_node = RadixOffset::create(radix_buf)?;
				new_node.write_radix(radix_buf, k, node)?;
				node = new_node;
				}
				return radix.write_radix(radix_buf, b, node);
				}
				}

				// new_key is a prefix of old_key, or vice versa,
				// or they are the same key but with different key_ids.
				if old_key.len() > new_key.as_ref().len() {
				Err(ErrorKind::PrefixConflict(new_key_id, old_key_id).into())
				} else {
				Err(ErrorKind::PrefixConflict(old_key_id, new_key_id).into())
				}
				}
				None => radix.write_key_id(radix_buf, b, new_key_id),
				}
				}
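The leaf split in `radix_insert_with_key` can be sketched with a toy in-memory tree. This is an illustration under stated assumptions, not the radixbuf layout: `ToyRadix` and `Slot` are hypothetical names, keys are slices of base16 digits (every digit must be below 16), nodes live in a `Vec` instead of a `[u32]` buffer, and every insert allocates a fresh key id, so re-inserting an identical key counts as a conflict. What it keeps from the code above is the write order: the new chain is built bottom-up, and the slot that held the old leaf is redirected last.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum Slot {
    Empty,
    Leaf(usize), // index into `keys`
    Node(usize), // index into `nodes`
}

struct ToyRadix {
    nodes: Vec<[Slot; 16]>,
    keys: Vec<Vec<u8>>, // each key is a sequence of base16 digits
}

impl ToyRadix {
    fn new() -> Self {
        ToyRadix { nodes: vec![[Slot::Empty; 16]], keys: Vec::new() }
    }

    /// Insert `key`; returns false on a prefix conflict.
    fn insert(&mut self, key: &[u8]) -> bool {
        let (mut node, mut i) = (0, 0);
        loop {
            if i >= key.len() {
                return false; // `key` is a prefix of existing keys
            }
            match self.nodes[node][key[i] as usize] {
                Slot::Node(next) => { node = next; i += 1; }
                Slot::Empty => {
                    self.keys.push(key.to_vec());
                    self.nodes[node][key[i] as usize] = Slot::Leaf(self.keys.len() - 1);
                    return true;
                }
                Slot::Leaf(old_id) => return self.split(node, i, old_id, key),
            }
        }
    }

    /// Replace the leaf at `nodes[node][key[i]]` with a chain of nodes.
    fn split(&mut self, node: usize, i: usize, old_id: usize, key: &[u8]) -> bool {
        let old = self.keys[old_id].clone();
        // Count the digits after position `i` that both keys share.
        let common = old[i + 1..].iter().zip(&key[i + 1..])
            .take_while(|(a, b)| a == b).count();
        let d = i + 1 + common; // first position where the keys differ
        if d >= old.len() || d >= key.len() {
            return false; // one key is a prefix of the other, or they are equal
        }
        self.keys.push(key.to_vec());
        let new_id = self.keys.len() - 1;
        // 1. Bottom node holding both leaves.
        let mut bottom = [Slot::Empty; 16];
        bottom[old[d] as usize] = Slot::Leaf(old_id);
        bottom[key[d] as usize] = Slot::Leaf(new_id);
        self.nodes.push(bottom);
        let mut child = self.nodes.len() - 1;
        // 2. One node per shared digit, deepest first.
        for &digit in key[i + 1..d].iter().rev() {
            let mut n = [Slot::Empty; 16];
            n[digit as usize] = Slot::Node(child);
            self.nodes.push(n);
            child = self.nodes.len() - 1;
        }
        // 3. Only now redirect the slot that pointed at the old leaf.
        self.nodes[node][key[i] as usize] = Slot::Node(child);
        true
    }

    /// Exact lookup: walk the digits, then verify the stored key.
    fn lookup(&self, key: &[u8]) -> Option<usize> {
        let mut node = 0;
        for &digit in key {
            match self.nodes[node][digit as usize] {
                Slot::Node(next) => node = next,
                Slot::Leaf(id) => return if self.keys[id] == key { Some(id) } else { None },
                Slot::Empty => return None,
            }
        }
        None
    }
}
```

Inserting `[1, 2, 3, 4]` and then `[1, 2, 3, 5]` into this toy tree triggers one split that appends three nodes: the bottom node for digits 4 and 5, then one node each for the shared digits 3 and 2. Until step 3 runs, a reader walking from the root still reaches the old leaf.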

				#[cfg(test)]
				mod tests {
				use super::*;
				use std::collections::HashSet;
				use std::mem::transmute;
				use key::{VariantKey, FixedKey};
				use rand::{ChaChaRng, Rng};
				use test::Bencher;

				#[test]
				fn test_errors() {
				let mut key_buf = vec![0u8; 10];
				let mut radix_buf = vec![0u32; 15];

				// KeyId exceeds format limit
				let key = [0u8; 20];
				let key_id = (1u32 << 31).into();
				let r = radix_insert_with_key(&mut radix_buf, 0, key_id, &key, FixedKey::read, &key_buf);
				assert_eq!(r.unwrap_err().description(), "offset overflow");

				// KeyId exceeds key buffer length
				let key_id = 30u32.into();
				let r = radix_insert(&mut radix_buf, 0, key_id, FixedKey::read, &key_buf);
				assert_eq!(r.unwrap_err().description(), "invalid key id");
				let r = radix_insert(&mut radix_buf, 0, key_id, FixedKey::read, &key_buf);
				let t = format!("{}", r.unwrap_err());
				assert_eq!(t, "KeyId(30) cannot be resolved");

				// Radix root node offset exceeds radix buffer length
				let r = radix_insert_with_key(&mut radix_buf, 16, key_id, &key, FixedKey::read, &key_buf);
				assert_eq!(format!("{}", r.unwrap_err()), "offset 16 is out of range");

				// Radix node offset out of range during a lookup
				let prefix = [0xf].iter().cloned();
				let r = radix_prefix_lookup(&radix_buf, 0, prefix, FixedKey::read, &key_buf);
				assert_eq!(format!("{}", r.unwrap_err()), "offset 15 is out of range");

				// Base16 sequence overflow
				let prefix = [21].iter().cloned();
				let r = radix_prefix_lookup(&radix_buf, 0, prefix.clone(), FixedKey::read, &key_buf);
				assert_eq!(r.unwrap_err().description(), "invalid base16 value");
				let r = radix_prefix_lookup(&radix_buf, 0, prefix, FixedKey::read, &key_buf);
				assert_eq!(format!("{}", r.unwrap_err()), "21 is not a base16 value");

				// Inserting the same key with the same `KeyId` is okay
				let key_id1 = VariantKey::append(&mut key_buf, &b"ab");
				let key_id2 = VariantKey::append(&mut key_buf, &b"ab");
				radix_insert(&mut radix_buf, 0, key_id1, VariantKey::read, &key_buf).expect("insert");
				radix_insert(&mut radix_buf, 0, key_id1, VariantKey::read, &key_buf).expect("insert");

				// But not okay if the `KeyId`s are different
				let r = radix_insert(&mut radix_buf, 0, key_id2, VariantKey::read, &key_buf);
				assert_eq!(r.unwrap_err().description(), "key prefix conflict");

				// A key cannot be a prefix of another key
				let key_id4 = VariantKey::append(&mut key_buf, &b"a");
				let key_id5 = VariantKey::append(&mut key_buf, &b"abc");
				let r = radix_insert(&mut radix_buf, 0, key_id4, VariantKey::read, &key_buf);
				assert_eq!(
				format!("{}", r.unwrap_err()),
				format!("{:?} cannot be a prefix of {:?}", key_id4, key_id1)
				);
				let r = radix_insert(&mut radix_buf, 0, key_id5, VariantKey::read, &key_buf);
				assert_eq!(
				format!("{}", r.unwrap_err()),
				format!("{:?} cannot be a prefix of {:?}", key_id1, key_id5)
				);

				// Enforce a leaf split of key_id1
				let key_id3 = VariantKey::append(&mut key_buf, &b"ac");
				radix_insert(&mut radix_buf, 0, key_id3, VariantKey::read, &key_buf).expect("insert");

				// Keys with prefix conflicts are still rejected after the leaf split
				let r = radix_insert(&mut radix_buf, 0, key_id4, VariantKey::read, &key_buf);
				assert_eq!(r.unwrap_err().description(), "ambiguous prefix");
				let r = radix_insert(&mut radix_buf, 0, key_id5, VariantKey::read, &key_buf);
				assert_eq!(r.unwrap_err().description(), "key prefix conflict");
				}

				#[test]
				fn test_prefix_lookup() {
				let mut key_buf: Vec<u8> = vec![];
				let mut radix_buf = vec![0u32; 16];

				let query = Base16Iter::from_bin(&b"01abc");

				// With a single key
				let key1 = b"01ab";
				let key1_id = VariantKey::append(&mut key_buf, &key1);
				radix_insert(&mut radix_buf, 0, key1_id, VariantKey::read, &key_buf).expect("insert");
				for i in 0..query.len() {
				let prefix = query.clone().take(i);
				let r = radix_prefix_lookup(&radix_buf, 0, prefix, VariantKey::read, &key_buf);
				if i == 0 {
				// This is sub-optimal. But see the NOTE in RadixOffset::follow.
				assert_eq!(r.unwrap_err().description(), "ambiguous prefix");
				} else if i <= key1.len() * 2 {
				assert_eq!(r.unwrap(), Some(key1_id));
				} else {
				assert_eq!(r.unwrap(), None);
				}
				}

				// With another key
				let key2 = b"01bbc";
				let key2_id = VariantKey::append(&mut key_buf, &key2);
				radix_insert(&mut radix_buf, 0, key2_id, VariantKey::read, &key_buf).expect("insert");
				for i in 0..query.len() {
				let prefix = query.clone().take(i);
				let r = radix_prefix_lookup(&radix_buf, 0, prefix, VariantKey::read, &key_buf);
				if i <= 5 {
				assert_eq!(r.unwrap_err().description(), "ambiguous prefix")
				} else if i <= key1.len() * 2 {
				assert_eq!(r.unwrap(), Some(key1_id));
				} else {
				assert_eq!(r.unwrap(), None);
				}
				}

				let query = Base16Iter::from_bin(&b"1");
				let r = radix_prefix_lookup(&radix_buf, 0, query, VariantKey::read, &key_buf);
				assert_eq!(r.unwrap(), None);

				let query = Base16Iter::from_bin(&b"01b");
				let r = radix_prefix_lookup(&radix_buf, 0, query, VariantKey::read, &key_buf);
				assert_eq!(r.unwrap(), Some(key2_id));
				}

				quickcheck! {
				fn test_compare_with_stdset_sparse(std_set: HashSet<u64>) -> bool {
				let std_set: HashSet<[u8; 10]> = std_set.iter().map(|&x| {
				let mut buf = [0u8; 10];
				let slice: [u8; 8] = unsafe { transmute(x) };
				buf[0..8].copy_from_slice(&slice);
				buf
				}).collect();
				check_with_stdset(std_set)
				}

				fn test_compare_with_stdset_dense(std_set: HashSet<u16>) -> bool {
				let std_set: HashSet<[u8; 10]> = std_set.iter().map(|&x| {
				let mut buf = [0u8; 10];
				let slice: [u8; 2] = unsafe { transmute(x) };
				buf[0..2].copy_from_slice(&slice);
				buf
				}).collect();
				check_with_stdset(std_set)
				}
				}

				// Compare with `HashSet`.
				fn check_with_stdset(std_set: HashSet<[u8; 10]>) -> bool {
				let mut key_buf = Vec::<u8>::with_capacity(std_set.len() * 11);
				let mut radix_buf = vec![0u32; 16];

				// Insert to radix tree
				for key in &std_set {
				let key_id = VariantKey::append(&mut key_buf, key);
				radix_insert(&mut radix_buf, 0, key_id, VariantKey::read, &key_buf).expect("insert");
				}

				// Test key existence
				std_set.iter().all(|key| {
				let r = radix_lookup(&radix_buf, 0, key, VariantKey::read, &key_buf);
				r.unwrap().is_some()
				})
				}

				const COUNT: usize = 51200;

				#[bench]
				fn bench_insert(b: &mut Bencher) {
				let key_buf = randomized_key_buf(COUNT);
				b.iter(|| { batch_insert_radix_buf(&key_buf, COUNT); })
				}

				#[bench]
				fn bench_unchecked_lookups(b: &mut Bencher) {
				let key_buf = randomized_key_buf(COUNT);
				let radix_buf = batch_insert_radix_buf(&key_buf, COUNT);
				b.iter(|| for i in 0..COUNT {
				let key_id = (i as u32 * 20).into();
				let key = FixedKey::read(&key_buf, key_id).unwrap();
				radix_lookup_unchecked(&radix_buf, 0, &key).expect("lookup");
				})
				}

				#[bench]
				fn bench_lookups(b: &mut Bencher) {
				let key_buf = randomized_key_buf(COUNT);
				let radix_buf = batch_insert_radix_buf(&key_buf, COUNT);
				b.iter(|| for i in 0..COUNT {
				let key_id = (i as u32 * 20).into();
				let key = FixedKey::read(&key_buf, key_id).unwrap();
				radix_lookup(&radix_buf, 0, &key, FixedKey::read, &key_buf).expect("lookup");
				})
				}

				fn randomized_key_buf(count: usize) -> Vec<u8> {
				let mut key_buf = vec![0u8; count * 20];
				// Using an unseeded rng so benchmarks are more stable across multiple runs
				ChaChaRng::new_unseeded().fill_bytes(key_buf.as_mut());
				key_buf
				}

				fn batch_insert_radix_buf(key_buf: &Vec<u8>, count: usize) -> Vec<u32> {
				let mut radix_buf = vec![0u32; 16];
				for i in 0..count {
				let key_id: KeyId = ((i * 20) as u32).into();
				radix_insert(&mut radix_buf, 0, key_id, FixedKey::read, key_buf).expect("insert");
				}
				radix_buf
				}
				}