Implement the main radix tree with tests. A quick benchmark shows the

insertion performance is similar to the known revlog.c implementation.

The time complexity is about `O(N * log N)` for inserting or looking up `N`

entries. The `log` part is because the prefix length is increasing. A rough

(not so accurate) real world benchmark is like:

N | Insert | Lookup | Checked Lookup [1] | Index Size |

10k | 0.70ms | 0.25ms | 0.36ms | 0.25MB |

20k | 1.3ms | 0.58ms | 0.8ms | 0.45MB |

50k | 4.9ms | 1.9ms | 2.6ms | 1.1MB |

100k | 11ms | 4.5ms | 6.8ms | 2.5MB |

200k | 26ms | 13ms | 17ms | 4.9MB |

500k | 68ms | 46ms | 54ms | 11MB |

1M | 170ms | 130ms | 150ms | 24MB |

2M | 420ms | 300ms | 350ms | 51MB |

5M | 1.2s | 0.9s | 1.1s | 110MB |

10M | 2.7s | 2.3s | 2.7s | 220MB |

20M | 6.2s | 5.1s | 5.8s | 490MB |

50M | 19s | 16s | 18s | 1.2GB |

[1]: After lookup, verify the key id maps to the key. Can be skipped if key

length is fixed and index data could be trusted.