In this implementation, we simply make lookup() also return the number
of steps that were needed to reach a conclusion from the nodetree
data, and validate_candidate() takes care of the special cases
related to NULL_NODE.
This approach minimizes code duplication, but it means that the
comparatively slow search for the first non-zero nybble also runs
for all calls to find(), where it is not needed.
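
As a rough illustration of the trade-off described above, here is a minimal
sketch, not the actual change. It assumes prefixes are plain slices of nybbles
and that the result of lookup() is passed in directly as a
(revision, steps) pair. The names first_nonzero_nybble() and
unique_prefix_len(), the simplified signatures, and the step count returned
for an all-zero prefix are assumptions made for this example. The check of the
candidate against the index is left out.

```rust
// Hypothetical stand-ins for the real types, for illustration only.
type Revision = i32;
const NULL_REVISION: Revision = -1;

#[derive(Debug, PartialEq)]
enum NodeMapError {
    MultipleResults,
}

/// Position of the first nybble that differs from NULL_NODE (all zeros),
/// or None if the prefix is made of zeros only.
fn first_nonzero_nybble(prefix: &[u8]) -> Option<usize> {
    prefix.iter().position(|&n| n != 0)
}

/// Apply the NULL_NODE special cases to a (revision, steps) candidate,
/// assumed here to be what lookup() returned.
fn validate_candidate(
    prefix: &[u8],
    candidate: (Option<Revision>, usize),
) -> Result<(Option<Revision>, usize), NodeMapError> {
    let (rev, steps) = candidate;
    match first_nonzero_nybble(prefix) {
        // The prefix must be long enough to be told apart from NULL_NODE.
        Some(nz) => Ok((rev, steps.max(nz + 1))),
        // An all-zero prefix always matches NULL_NODE.
        None => match rev {
            None => Ok((Some(NULL_REVISION), steps)),
            Some(_) => Err(NodeMapError::MultipleResults),
        },
    }
}

/// find() only wants the revision: the step count is computed, then dropped.
fn find(
    prefix: &[u8],
    candidate: (Option<Revision>, usize),
) -> Result<Option<Revision>, NodeMapError> {
    validate_candidate(prefix, candidate).map(|(rev, _steps)| rev)
}

/// The shortest-prefix query keeps the step count instead.
fn unique_prefix_len(
    prefix: &[u8],
    candidate: (Option<Revision>, usize),
) -> Result<Option<usize>, NodeMapError> {
    validate_candidate(prefix, candidate).map(|(rev, steps)| rev.map(|_| steps))
}
```

In this sketch both entry points share validate_candidate(), so find() pays
for the search for the first non-zero nybble even though it discards the
resulting step count.
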
Still running on the file generated for the mozilla-central repository,
it does seem that we now get 320 ns more often than 310 ns. The odds that
this could have a significant impact on real-life Mercurial performance
still look low. Let's wait for actual benchmark runs to see if
an optimization is needed here.
can the explicit lifetime be dropped?