radixbuf: add a base16 iterator
ClosedPublic

Authored by quark on Nov 2 2017, 2:45 PM.

Details

Reviewers
durham
Group Reviewers
Restricted Project
Commits
rFBHGXe7ac50b00874: radixbuf: add a base16 iterator
Summary

The radix tree would be using base16 to save space and support hex-string
prefix queries. Therefore a base16 iterator is needed.

DoubleEndedIterator, ExactSizeIterator and fast paths of skip and
take are implemented so certain patterns (ex. skip.take.rev, used in a
future patch) could work without a temporary Vec.

Test Plan

cargo test --lib

Diff Detail

Repository
rFBHGX Facebook Mercurial Extensions
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.
quark created this revision.Nov 2 2017, 2:45 PM
Herald added a reviewer: Restricted Project. · View Herald TranscriptNov 2 2017, 2:45 PM
jsgf added a subscriber: jsgf.Nov 2 2017, 4:06 PM
jsgf added inline comments.
rust/radix204/src/base16.rs
10–17 ↗(On Diff #3210)

This seems a bit over-complex. You basically want an iterator that can take anything that's AsRef<[u8]> and then emit those bytes nibble-by-nibble. It may also end up being awkward for it to always borrow the value rather than taking ownership of it.

It's always more flexible to have a trait take something by ownership, because then you can either make it own something outright (ie, have 'static lifetime), or implement it for a &'a T type, and it's subject to 'a lifetime. Whereas if you specify the trait as having a lifetime, then you're stuck with that, and its hard to make it own something if that what you need to do.

quark added inline comments.Nov 2 2017, 6:14 PM
rust/radix204/src/base16.rs
10–17 ↗(On Diff #3210)

I had another flexible version that takes an iterator:

pub struct IterIter(Box<Iterator<Item = u8>>, u8, u8);

impl IntoBase16Iter<IterIter> for Vec<u8> {
    fn into_base16iter(self) -> IterIter { IterIter(Box::new(self.into_iter()), 0, 0) }
}

impl Iterator for IterIter {
    type Item = u8;

    fn next(&mut self) -> Option<Self::Item> {
        match self.2 {
            0 => {
                match self.0.next() {
                    Some(x) => {
                        self.1 = x & 0xf;
                        self.2 = 1;
                        Some(x >> 4)
                    }
                    None => None,
                }
            }
            1 => {
                self.2 = 0;
                Some(self.1)
            }
            _ => None,
        }
    }

    fn size_hint(&self) -> (usize, Option<usize>) {
        let (lower, higher) = self.0.size_hint();
        (lower * 2 + 1, higher.map(|x| x * 2 + 2))
    }

    fn count(self) -> usize { self.0.count() * 2 + 1 }
}

But it was slower.

I was using into_base16iter(self) but found it enforced copies and that did hurt perf. So I changed it to to_ version.

I agree the iterator type here is tricky. The current code is after some aggressive simplification. Previously I wanted to make things very flexible and had ended up like:

impl<K, KC, KI, KB, RB, V> RadixTree<K, KC, KI, KB, RB, V>
where
    for<'a> K: ToBase16Iter<'a, KI>,
    KC: KeyConverter<K>,
    KI: Iterator<Item = u8>,
    KB: Read + Write + Seek,
    RB: Read + Write + Seek,
    V: Serialize,

It actually worked thanks for the for<'a> syntax. But the API is not that easy to use. It was also slower (io::Cursor adds 50% overhead). So I decided to trade flexibility for performance.

quark updated this revision to Diff 3250.Nov 3 2017, 2:17 AM
quark updated this revision to Diff 3295.Nov 6 2017, 2:10 PM
quark edited the summary of this revision. (Show Details)
quark updated this revision to Diff 3418.Nov 11 2017, 1:33 AM
quark edited the summary of this revision. (Show Details)
quark retitled this revision from radix204: add a base16 iterator to radix20: add a base16 iterator.
quark updated this revision to Diff 3544.Nov 15 2017, 8:55 PM
quark edited the summary of this revision. (Show Details)
quark retitled this revision from radix20: add a base16 iterator to radixbuf: add a base16 iterator.
quark updated this revision to Diff 3699.Nov 20 2017, 7:11 PM
durham accepted this revision.Nov 29 2017, 5:39 PM
durham added a subscriber: durham.

I'd follow up with Jeremey about his suggestion, but I don't think it's worth blocking getting this out for dogfooding.

rust/radixbuf/src/base16.rs
97

Why test these particular access patterns? Why not others?

109

Why have u8 at the end of 56 and 21 but not at the end of 12 and 34?

This revision is now accepted and ready to land.Nov 29 2017, 5:39 PM
quark added a comment.EditedNov 29 2017, 5:49 PM

For the "take the ownership": The &[u8] comes from a slice owned by a mmap-ed buffer. We cannot own the entire mmap buffer here in the iterator. By copying the slice to heap the radix tree is about ~50% slower. So taking reference to achieve zero-copy here seems to be a reasonable choice.

The ToBase16Iter trait can be replaced by a function to reduce type complexity. That'll be a codemoding replacing all x.to_base16iter() to to_base16iter(x). I'll do that.

rust/radixbuf/src/base16.rs
97

There are too many combinations. I think these are complex enough to reasonably cover interesting cases.

109

Will remove u8 after 0x56. Elements in a Rust array must have a same type. So one explicit type is enough.

quark updated this revision to Diff 4448.Dec 14 2017, 9:21 PM
This revision was automatically updated to reflect the committed changes.