( )⚙ D11239 debugcommands: introduce a debug command to repair repos affected by issue6528

This is an archive of the discontinued Mercurial Phabricator instance.

debugcommands: introduce a debug command to repair repos affected by issue6528

Authored by Alphare on Aug 2 2021, 9:36 AM.



This command is quite basic and slow, it will loop over the entirety of the
filelogs in the repository and check each revision for corruption, then fixes
the affected filelogs. It takes under 25 minutes for Mozilla-Central on my
not-top-of-the-line laptop, using the --to-report and --from-report options
will make this pretty tolerable to use, I think.

This change also introduces a test for the fix.

Diff Detail

rHG Mercurial
No Linters Available
No Unit Test Coverage

Event Timeline

Alphare created this revision.Aug 2 2021, 9:36 AM
marmoute requested changes to this revision.Aug 2 2021, 3:03 PM
marmoute added a subscriber: marmoute.

We are moving in the right direction, but I made multiple comment about adjustement I feel necessary before we can consider this function safe.


small nits: maybe "debug-repair-issue6528" instead of "debug-fix-issue6528" ?


idea for improvement (might be useful t write them down in comment:

allow to specify filelog pattern and revision pattern to restrict the search. Useful for faster scan when user know what they are looking for.


The following format would seems more suitable:

<ascii-hex of the corrupted file revision> <unencoded-filename>

Putting the hex first is simpler, since we then know the rest of the line if about the filename.

Putting the filename instead of encoded index path means we are more independant of the underlying format variante (with or without dot-encode for example).


(not sure this is really unlikely, I would drop that remark)


Why do we need the bool here ?


Same feedback about fixrepair


note: this seems "fine" to me for stable. However this seems a good indication that we need some store/repo level API for "give me all the filelog"


It is probably worth adding a programming error if the repository is not locked.


This seems the only line that check and enforce the version.

We need:

  1. a programming error to catch the issue on Windows/Optimized setting
  2. a higher level check of the repository format in the top level command

If there is a crash/interrupt during the operation, we will leave temporary file behind. We should add a try:, finally: clause here, and possibly a no-interrupt context.


nits: this is the sensitive part, it is probably worth factoring it out in a small dedicated function since most of it is common, but for the offset computation.

What do you think ?

I addition it seems like a good idea to mention the "speedup" idea from Joerg (about only checking the delta when possible)


Since this is a sensitive issue, we should probably add a docstring about the detection method here.


Not sure what "unfortunate cyclic sitatuation" means here ‽


I don't think we capitalize error output in general.


tiny nits: this should probably be a msg %= on the previous line


You should probably sort the resulting to_fix set to make sure we process things in a predictable order (and start to end seems like a good idea in general)


adding uninterruptible for the full command seems quite overkill, especially since the command can apparently run for half an hours.

See my earlier comment about only using ui.uninterruptible() while doing the actual fixing.


small nits: if to_fix: seem more pythonic to me


We should sort to_fix to ensure growing order.


I don't think we capitalize output.


We should make this is ===== level title and use ----- level title (and associated description) for the individual tests within this section.


We should add debug index call here to show they are swapped


small nits: the output says we "found" the corruption, but does not state that we fixed it.

We should also check the index content after the command run.


Testing a multi-lines report seems important too.

This revision now requires changes to proceed.Aug 2 2021, 3:03 PM
Alphare marked 23 inline comments as done.Aug 4 2021, 9:16 AM

Pending CI refresh


I've updated the patch to a new format that uses unencoded filenames but still groups the revisions per filelog (when exporting, the format still allows for duplicate filelogs, it's just less efficient since it's just looping over lines).

This is done following an IRC discussion where @marmoute and I disagreed on the exact format, and this is the version I feel is simpler. If more people disagree with my choice, I'll change it, but it's probably not worth bikeshedding too hard on this.


I've created a multiple revisions + multiple files repo

Alphare marked an inline comment as done.Aug 4 2021, 10:22 AM
Alphare retitled this revision from debugcommands: introduce a debug command to fix repos affected by issue6528 to debugcommands: introduce a debug command to repair repos affected by issue6528.
Alphare updated this revision to Diff 29790.
Alphare updated this revision to Diff 29791.Aug 4 2021, 11:07 AM
marmoute requested changes to this revision.Aug 4 2021, 3:55 PM

Added a handful of new comments.


I am fine with using multiple nodes per lines but I could probably use , for separation between node and still for separation with the filename.

I would also gather all the lines before processing since that isn't very costly. However this is a quite minor changes


note: this is not the right location for this comment :-/ the lock checking should be done when we actually rewrite revlog.

We do not usually needs it, but the rewrite code is "exceptionnal" enough to add some double checking.


This should go in _reorder_filelog_parents (I am starting to suspect phabricator simply shifted all comments a couple of line too low)


We need a nointerrupt context here, don't we ?


It would seem cleaner to have this function only determine the status of the revision and have the warning message only issued by the called of this method.

The "corrupted" is a bit scary too. maybe "affected by issue6528"

This will be useful if we want to detect this in more situation (e.g. during exchange)


Here it is. It still feel like too wide and in the wrong place:

  • we want it as narrow as possible since it is "a bit hacky"
  • we want it in the function that actually setup the dangerous bit so that it (1) is always in place (2) stay next to the dangerous implementation.

Looks like you want to:

  1. gather the list of filelog
  2. use its len(xxx) for total
  3. use that list for the iteration
This revision now requires changes to proceed.Aug 4 2021, 3:55 PM
Alphare marked 7 inline comments as done.Aug 5 2021, 4:09 AM
Alphare added inline comments.

I guess the memory footprint wouldn't get too bad even on larger repos.

baymax updated this revision to Diff 29800.Aug 5 2021, 5:11 AM

✅ refresh by Heptapod after a successful CI run (🐙 💚)
⚠ This patch is intended for stable ⚠

marmoute added inline comments.Aug 5 2021, 5:22 AM

Yeah, I dont expect thousand of these, and if so, the associated repository probably have large machine using it. (and worst case the report file can be feed in multiple step).

marmoute accepted this revision.Aug 6 2021, 6:03 AM
marmoute added inline comments.

My two past self eventually talked to each other and we probably want is in both. I'll prepare you a patch you can fold into this one.


This is probably a bit too verbose for the common case. I'll prepare something to control that.

So we have this part covered (slowly). We can now (in other Diffs) move to the next points.

  • implement a faster method
  • fix this issue on the fly during exchange
  • ensure things are fixed on upgrade (cf D11252)

Actually, I amd now triple confused about what I meant, so lets leave it that way,


Actually, this is only issued when processing report, so this seems fine.

Alphare marked an inline comment as done.Aug 6 2021, 6:41 AM
Alphare updated this revision to Diff 29829.
Alphare updated this revision to Diff 29835.Aug 6 2021, 1:50 PM
marmoute accepted this revision.Aug 6 2021, 4:47 PM
baymax updated this revision to Diff 29843.Aug 7 2021, 9:47 AM

✅ refresh by Heptapod after a successful CI run (🐙 💚)
⚠ This patch is intended for stable ⚠

baymax updated this revision to Diff 29855.Aug 7 2021, 6:09 PM

✅ refresh by Heptapod after a successful CI run (🐙 💚)
⚠ This patch is intended for stable ⚠

pulkit accepted this revision.Aug 10 2021, 8:16 AM
This revision is now accepted and ready to land.Aug 10 2021, 8:16 AM