This is an archive of the discontinued Mercurial Phabricator instance.

debugdatapack: return 1 and warn if there are invalid entries
ClosedPublic

Authored by phillco on Oct 31 2017, 12:32 AM.
Tags
None
Subscribers
None

Details

Summary

An invalid entry is any entry with a base not in the pack, or whose deltabases
form a cycle.

If there are any entries like that, the output will look like this:

(Root):
Node          Delta Base    Delta Length  Blob Size
665a7e7913af  e66038a2894e  61            2142
52bd634be310  000000000000  2142          2142
8b5847087ce0  000000000000  2142          2142
960f5acb3e99  edf2ffd7daab  162           2142
b7d7e5aa692e  8b5847087ce0  162           2142
cdcc4d74d667  960f5acb3e99  324           2142
Total:                      14652         48920     (70.0% smaller)
Bad entry: 960f5acb3e99 has an unknown deltabase (edf2ffd7daab)
Bad entry: b7d7e5aa692e has an unknown deltabase (edf2ffd7daab)
2 invalid entries

Diff Detail

Repository
rFBHGX Facebook Mercurial Extensions
Lint
Lint Skipped
Unit
Unit Tests Skipped

Event Timeline

phillco created this revision.Oct 31 2017, 12:32 AM
Herald added a reviewer: Restricted Project. · View Herald TranscriptOct 31 2017, 12:32 AM
phillco edited the summary of this revision. (Show Details)Oct 31 2017, 2:09 AM
phillco added inline comments.
tests/test-treemanifest.t
71–74 ↗(On Diff #3172)

hm, this is kinda interesting. I guess this triggered because subdir/'s entry interrupts this chain.

durham requested changes to this revision.Oct 31 2017, 4:09 PM

I don't think the assertion being made here is correct.

remotefilelog/debugcommands.py
216

I don't think this is a correct assumption? If you have a commit c1 with two parents, p1 and p2, it's possible that c1 could be delta'd against either p1 or p2, which may not be contiguous in the pack output.

217

We should explain why an entry is bad. Like: "BAD ENTRY - unexpected delta base"

This revision now requires changes to proceed.Oct 31 2017, 4:09 PM
phillco planned changes to this revision.Oct 31 2017, 4:18 PM
phillco added inline comments.
remotefilelog/debugcommands.py
216

Ah, then it is incorrect. I'll rework it.

phillco edited the summary of this revision. (Show Details)Nov 6 2017, 12:55 PM
phillco updated this revision to Diff 3293.

@durham I changed the algorithm to just require that all deltabases be nodes that exist earlier in the pack (with nullid always included).

I think this the most restrictive check we can do, but please lmk if I'm wrong.

phillco edited the summary of this revision. (Show Details)Nov 6 2017, 12:57 PM
phillco marked 3 inline comments as done.Nov 6 2017, 12:58 PM
phillco planned changes to this revision.Nov 6 2017, 2:36 PM

After IRL'ing with @durham, will change this to be even more permissive, and allow deltabases that appear _later_ in the pack as well.

phillco updated this revision to Diff 3309.Nov 7 2017, 11:58 AM
phillco updated this revision to Diff 3310.

I've reworked this to allow bases anywhere in the pack, and also detect cycles.

remotefilelog/debugcommands.py
212–213

Since we're sanity checking, should we warn about duplicate nodes too, while we're here?

293–295

It's probably not worth the optimization, but I think we could set bases[node] to nullid once we exit the while loop (complete the chain) to memoize and save us some iterations.

phillco edited the summary of this revision. (Show Details)Nov 7 2017, 12:09 PM
durham edited the summary of this revision. (Show Details)Nov 7 2017, 1:31 PM
durham accepted this revision.Nov 7 2017, 1:37 PM

Accepting, but we should fix the n^2 issue.

remotefilelog/debugcommands.py
278

If we're not using nodes as a list, I'd just make it a set from the beginning

293–295

I think we should actually. This algorithm is n^2, and if we have a delta chain that is 100,000 entries long then we're doing 10,000,000,000 unnecessary extra comparisons.

This revision is now accepted and ready to land.Nov 7 2017, 1:37 PM
phillco updated this revision to Diff 3318.Nov 7 2017, 1:56 PM
phillco updated this revision to Diff 3319.Nov 7 2017, 2:02 PM
phillco updated this revision to Diff 3320.Nov 7 2017, 2:05 PM
phillco closed this revision.Nov 7 2017, 2:23 PM

landed

remotefilelog/debugcommands.py
293–295

sgtm