( )⚙ D1290 radixbuf: add a base16 iterator

This is an archive of the discontinued Mercurial Phabricator instance.

radixbuf: add a base16 iterator
ClosedPublic

Authored by quark on Nov 2 2017, 2:45 PM.
Tags
None
Subscribers

Details

Reviewers
durham
Group Reviewers
Restricted Project
Commits
rFBHGXe7ac50b00874: radixbuf: add a base16 iterator
Summary

The radix tree would be using base16 to save space and support hex-string
prefix queries. Therefore a base16 iterator is needed.

DoubleEndedIterator, ExactSizeIterator and fast paths of skip and
take are implemented so certain patterns (ex. skip.take.rev, used in a
future patch) could work without a temporary Vec.

Test Plan

cargo test --lib

Diff Detail

Repository
rFBHGX Facebook Mercurial Extensions
Lint
Lint Skipped
Unit
Unit Tests Skipped

Event Timeline

quark created this revision.Nov 2 2017, 2:45 PM
Herald added a reviewer: Restricted Project. · View Herald TranscriptNov 2 2017, 2:45 PM
jsgf added a subscriber: jsgf.Nov 2 2017, 4:06 PM
jsgf added inline comments.
rust/radix204/src/base16.rs
10–17 ↗(On Diff #3210)

This seems a bit over-complex. You basically want an iterator that can take anything that's AsRef<[u8]> and then emit those bytes nibble-by-nibble. It may also end up being awkward for it to always borrow the value rather than taking ownership of it.

It's always more flexible to have a trait take something by ownership, because then you can either make it own something outright (ie, have 'static lifetime), or implement it for a &'a T type, and it's subject to 'a lifetime. Whereas if you specify the trait as having a lifetime, then you're stuck with that, and its hard to make it own something if that what you need to do.

quark added inline comments.Nov 2 2017, 6:14 PM
rust/radix204/src/base16.rs
10–17 ↗(On Diff #3210)

I had another flexible version that takes an iterator:

pub struct IterIter(Box<Iterator<Item = u8>>, u8, u8);

impl IntoBase16Iter<IterIter> for Vec<u8> {
    fn into_base16iter(self) -> IterIter { IterIter(Box::new(self.into_iter()), 0, 0) }
}

impl Iterator for IterIter {
    type Item = u8;

    fn next(&mut self) -> Option<Self::Item> {
        match self.2 {
            0 => {
                match self.0.next() {
                    Some(x) => {
                        self.1 = x & 0xf;
                        self.2 = 1;
                        Some(x >> 4)
                    }
                    None => None,
                }
            }
            1 => {
                self.2 = 0;
                Some(self.1)
            }
            _ => None,
        }
    }

    fn size_hint(&self) -> (usize, Option<usize>) {
        let (lower, higher) = self.0.size_hint();
        (lower * 2 + 1, higher.map(|x| x * 2 + 2))
    }

    fn count(self) -> usize { self.0.count() * 2 + 1 }
}

But it was slower.

I was using into_base16iter(self) but found it enforced copies and that did hurt perf. So I changed it to to_ version.

I agree the iterator type here is tricky. The current code is after some aggressive simplification. Previously I wanted to make things very flexible and had ended up like:

impl<K, KC, KI, KB, RB, V> RadixTree<K, KC, KI, KB, RB, V>
where
    for<'a> K: ToBase16Iter<'a, KI>,
    KC: KeyConverter<K>,
    KI: Iterator<Item = u8>,
    KB: Read + Write + Seek,
    RB: Read + Write + Seek,
    V: Serialize,

It actually worked thanks for the for<'a> syntax. But the API is not that easy to use. It was also slower (io::Cursor adds 50% overhead). So I decided to trade flexibility for performance.

quark updated this revision to Diff 3250.Nov 3 2017, 2:17 AM
quark edited the summary of this revision. (Show Details)Nov 6 2017, 2:10 PM
quark updated this revision to Diff 3295.
quark edited the summary of this revision. (Show Details)Nov 11 2017, 1:33 AM
quark retitled this revision from radix204: add a base16 iterator to radix20: add a base16 iterator.
quark updated this revision to Diff 3418.
quark edited the summary of this revision. (Show Details)Nov 15 2017, 8:55 PM
quark retitled this revision from radix20: add a base16 iterator to radixbuf: add a base16 iterator.
quark updated this revision to Diff 3544.
quark updated this revision to Diff 3699.Nov 20 2017, 7:11 PM
durham accepted this revision.Nov 29 2017, 5:39 PM
durham added a subscriber: durham.

I'd follow up with Jeremey about his suggestion, but I don't think it's worth blocking getting this out for dogfooding.

rust/radixbuf/src/base16.rs
96 ↗(On Diff #3699)

Why test these particular access patterns? Why not others?

108 ↗(On Diff #3699)

Why have u8 at the end of 56 and 21 but not at the end of 12 and 34?

This revision is now accepted and ready to land.Nov 29 2017, 5:39 PM
quark added a comment.EditedNov 29 2017, 5:49 PM

For the "take the ownership": The &[u8] comes from a slice owned by a mmap-ed buffer. We cannot own the entire mmap buffer here in the iterator. By copying the slice to heap the radix tree is about ~50% slower. So taking reference to achieve zero-copy here seems to be a reasonable choice.

The ToBase16Iter trait can be replaced by a function to reduce type complexity. That'll be a codemoding replacing all x.to_base16iter() to to_base16iter(x). I'll do that.

rust/radixbuf/src/base16.rs
96 ↗(On Diff #3699)

There are too many combinations. I think these are complex enough to reasonably cover interesting cases.

108 ↗(On Diff #3699)

Will remove u8 after 0x56. Elements in a Rust array must have a same type. So one explicit type is enough.

quark updated this revision to Diff 4448.Dec 14 2017, 9:21 PM
This revision was automatically updated to reflect the committed changes.