Computing tags requires parsing .hgtags for all heads. Mercurial
therefore keeps a cache to efficiently find the .hgtags version of a
revision without having to parse the manifest, but this cache is
computed lazily and often incomplete.
The new implementation of the cache works a lot more like the
revbranchcache and updates the cache in two stages:
(1) When a new changeset is added, check if .hgtags is touched. The
common case is that it did not change and is therefore inherited from
the parent. The parent might not be fully resolved yet (missing
manifest), so just keep a dictionary mapping each new revision to the
parent revision that potentially touched .hgtags.
(2) At the end of the transaction, resolve the pending entries before
writing the cache to disk. At this point, all manifests are known, so
they can be parsed as necessary. The fast path here is reading just the
manifest delta, but this doesn't necessarily answer the question, since
the delta could have been computed against a non-parent revision.
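
The two stages above can be sketched roughly as follows. This is a
minimal illustration and not the actual Mercurial code: the class name,
the method names and the lookup_manifest callback are hypothetical, and
merges (second parents) are ignored for brevity.

```python
NULL_FNODE = b"\0" * 20  # placeholder filenode meaning "no .hgtags" (assumption)


class TagsFnodeCacheSketch:
    """Hypothetical two-stage cache: record cheaply, resolve at commit time."""

    def __init__(self):
        # rev -> .hgtags filenode; the null revision (-1) never has .hgtags.
        self.resolved = {-1: NULL_FNODE}
        # rev -> revision that potentially touched .hgtags last.
        self.origin = {}

    def add_changeset(self, rev, parent_rev, touched_files):
        """Stage 1: no manifest access, only remember where to look later."""
        if ".hgtags" in touched_files:
            self.origin[rev] = rev
        else:
            # Inherit from the parent; if the parent is itself pending,
            # point straight at the revision it points to.
            self.origin[rev] = self.origin.get(parent_rev, parent_rev)

    def resolve(self, lookup_manifest):
        """Stage 2: at transaction end, manifests are available."""
        for rev in sorted(self.origin):
            source = self.origin[rev]
            if source not in self.resolved:
                # Slow path: consult the manifest of the touching revision.
                self.resolved[source] = lookup_manifest(source)
            self.resolved[rev] = self.resolved[source]
        self.origin.clear()
```

In this sketch, a run of changesets that never touch .hgtags costs at
most one manifest lookup in total, because they all point at the same
originating revision.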
If the cache logic hits an invalid or missing node, it will recheck all
nodes. This is a bit more expensive, but simplifies the logic and avoids
recursions. This penalty is normally hit only once, but read-only
clients should run debugupdatecaches once initially and again after
strips. The rewritten version no longer uses a separate marker for
missing entries. This matters only if a node other than the nullid
exists whose leading 32 bits are all zero, but that is as likely as a
node whose leading 32 bits are all ones with the old code.
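
To illustrate why no separate missing marker is needed, here is a small
sketch of a record lookup. The layout is an assumption made for the
example (fixed 24-byte records indexed by revision, a 4-byte changeset
node prefix followed by a 20-byte .hgtags filenode), and the function
name is hypothetical. A slot that was never written is zero-filled, so
its prefix fails to match any real node unless that node itself starts
with 32 zero bits.

```python
PREFIX_SIZE = 4    # leading 32 bits of the changeset node (from the text)
FNODE_SIZE = 20    # assumed filenode length
RECORD_SIZE = PREFIX_SIZE + FNODE_SIZE


def lookup_fnode(cache, rev, node):
    """Return the cached .hgtags filenode for rev, or None if not valid.

    'cache' is the raw cache content as bytes; 'node' is the full
    changeset node used to validate the record.
    """
    offset = rev * RECORD_SIZE
    record = cache[offset:offset + RECORD_SIZE]
    if len(record) < RECORD_SIZE:
        return None  # cache is shorter than the requested revision
    if record[:PREFIX_SIZE] != node[:PREFIX_SIZE]:
        # Covers both stale entries and never-written (all-zero) slots.
        return None
    return record[PREFIX_SIZE:]
```

A caller following the description above would treat any None result as
a signal to recheck all nodes instead of repairing entries one by one.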
Verification is extended to check that the .hgtags file node at least
exists. This doesn't make a measurable difference to the rebuild time
for a tag-heavy repository.
The code structure of the cache prepares for mmap access to the cache
file. The only in-place operation is append and that verifies the old
file size and is only used if nothing else changed.
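
The append-only write path could look roughly like this. It is a sketch
under stated assumptions, not the real implementation: write_cache and
its parameters are hypothetical, and locking is left out entirely.

```python
import os


def write_cache(path, data, old_size, prefix_unchanged):
    """Write 'data' (the full new cache content) to 'path'.

    Append in place only if nothing before 'old_size' changed and the
    file on disk still has the size we saw when reading it. Otherwise
    rewrite the whole file. Returns which strategy was used.
    """
    if prefix_unchanged and len(data) >= old_size:
        try:
            with open(path, "r+b") as fh:
                fh.seek(0, os.SEEK_END)
                if fh.tell() == old_size:
                    fh.write(data[old_size:])
                    return "append"
        except FileNotFoundError:
            pass  # no existing file, fall through to a full write
    # Full rewrite via a temporary file so readers never see a torn file.
    tmp = path + ".tmp"
    with open(tmp, "wb") as fh:
        fh.write(data)
    os.replace(tmp, path)
    return "rewrite"
```

Checking the size before appending guards against another writer having
changed the file since it was read; in that case the sketch falls back
to a full rewrite.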
Could you please use more descriptive variable names? To me, a small comment would go a long way toward clarifying the semantics of some of the less obvious ones (all the more so as the code evolves).