This is an archive of the discontinued Mercurial Phabricator instance.

storageutil: implement file identifier resolution method (BC)
ClosedPublic

Authored by indygreg on Sep 28 2018, 8:18 PM.

Details

Summary

revlog.lookup() has a number of advanced features, including partial
node matching.

These advanced features aren't needed for file id lookup because file
identifiers are almost always from internal sources. (An exception to
this is hgweb, which appears to have some code paths that attempt
to resolve a user-supplied string to a node.)

This commit implements a new function for resolving file identifiers
to nodes. It behaves like revlog.lookup() but without the
advanced features.

Tests reveal behavior changes:

  • Partial hex nodes are no longer resolved to nodes.
  • "-1" now returns nullid instead of raising LookupError.
  • "0" on an empty store now raises LookupError instead of returning nullid.

I'm not sure why "-1" wasn't recognized before. But it seems reasonable
to accept it as a value since integer -1 resolves to nullid.

These changes all seem reasonable to me. And with the exception of
partial hex node matching, we may want to consider changing
revlog.lookup() as well.

Diff Detail

Repository
rHG Mercurial
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

indygreg created this revision.Sep 28 2018, 8:18 PM

@martinvonz: you may have an interest in this and D4798, specifically the existing wonky behavior of revlog.lookup(), which these changes effectively skate around.

This revision was automatically updated to reflect the committed changes.

"0" on an empty store now raises LookupError instead of returning nullid.

I haven't checked the code, but I suspect this was just a special case of interpreting "rl.lookup(len(rl))" as nullid. I thought I had fixed all those cases about a month ago when I made the index not behave like that (I made only -1 look up the nullid), but maybe I missed a case here.

with the exception of partial hex node matching, we may want to consider changing revlog.lookup() as well

Maybe even without the exception? I thought we used partial nodeids only for debug commands and maybe we can move the partial matching to a higher level.

with the exception of partial hex node matching, we may want to consider changing revlog.lookup() as well

Maybe even without the exception? I thought we used partial nodeids only for debug commands and maybe we can move the partial matching to a higher level.

I support moving partial matching to a higher level. Although some storage backends may support a more efficient "native" partial matcher. e.g. SQLite could use various string matching functions. But I'm not sure it is worth it though: I think I'd rather have a generic object holding DAG/index data [that can be shared across all storage backends] and that exposes a partial match API. I dunno. I don't plan to do anything in this area at this time. If you want to hack on things, go for it.