Page MenuHomePhabricator

treedirstate: auto-repack treedirstate once it reaches 3x its original size
ClosedPublic

Authored by mbthomas on Nov 14 2017, 12:39 PM.

Diff Detail

Repository
rFBHGX Facebook Mercurial Extensions
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

mbthomas created this revision.Nov 14 2017, 12:39 PM
Herald added a reviewer: Restricted Project. · View Herald TranscriptNov 14 2017, 12:39 PM
mbthomas updated this revision to Diff 3604.Nov 17 2017, 9:45 AM
quark requested changes to this revision.Nov 20 2017, 4:04 AM
quark added a subscriber: quark.

Mostly looks good. Requesting changes since wrapping runcommand is not the cleanest way to trigger cleanup.

treedirstate/__init__.py
597

The cleanup logic does not look that expensive. Maybe it could be done as a post close hook of a transaction. At that time the wlock is still taken so you don't need to take another wlock (which is expensive for read-only commands).

This revision now requires changes to proceed.Nov 20 2017, 4:04 AM
mbthomas updated this revision to Diff 3659.Nov 20 2017, 12:21 PM
mbthomas added a subscriber: stash.Nov 20 2017, 1:07 PM
mbthomas added inline comments.
treedirstate/__init__.py
597

It only runs if the dirstate has been modified (based on _repacked or _extended being set), so read-only commands won't trigger a repack.

I based this on what infinitepush backupcommand does at @stash's recommendation. Infinitepush used to use a transaction close hook, but it caused problems for commands that contain many transactions (e.g. histedit). I think it's a good idea to keep to one technique, too. What do you think?

quark added inline comments.Nov 20 2017, 3:10 PM
treedirstate/__init__.py
597

Good point about histedit.

The use of a separate wlock could potentially block the command unnecessarily. A transaction hook would avoid the lock issue and save a few ms on commands with a single transaction.

Infinitepush backup needs to be fired at the end of the last write operation, but this cleanup does not have to. To avoid running cleanups multiple times, I think it could be done by having a per-process state.

durham added a subscriber: durham.Nov 21 2017, 12:08 PM
durham added inline comments.
treedirstate/__init__.py
512

I'd doc comment some of these, to make it clear what it's cleaning up.

597

Could we wrap repo.close() instead? I also dislike wrapping runcommand and it'd be nice if we could start treating repo.close() as an actual repo destructor.

durham requested changes to this revision.Nov 21 2017, 12:09 PM

Also, this functionality probably warrants a test.

This revision now requires changes to proceed.Nov 21 2017, 12:09 PM

I'll add some integration tests as a separate diff on top of this stack tomorrow, and include repacking in that.

treedirstate/__init__.py
597

I'll change to wrapping repo.close(). We still need to get the wlock separately, but cleanup happens rarely, so this should be ok.

mbthomas updated this revision to Diff 3734.Nov 21 2017, 1:35 PM
mbthomas updated this revision to Diff 3775.Nov 22 2017, 1:14 PM
mbthomas updated this revision to Diff 3828.Nov 24 2017, 11:53 AM
mbthomas updated this revision to Diff 3845.Nov 24 2017, 3:18 PM
mbthomas updated this revision to Diff 3869.Nov 27 2017, 8:00 AM
mbthomas added inline comments.Nov 27 2017, 12:37 PM
treedirstate/__init__.py
394

As with treedirstate_enabled, I will update this to use ui.log and sampling.

durham requested changes to this revision.Nov 27 2017, 1:02 PM
durham added inline comments.
treedirstate/__init__.py
522

If the file system is not writable, this will throw an exception. We probably don't want cleanup logic to cause commands on readonly filesystems to break. Probably always a good test candidate.

This revision now requires changes to proceed.Nov 27 2017, 1:02 PM
mbthomas added inline comments.Nov 27 2017, 4:01 PM
treedirstate/__init__.py
522

Aside from debugtreedirstate cleanup, we only ever cleanup when we've written to the dirstate already, which means it really ought to be possible to write here. That being said, I don't mind adding some belt-and-braces handling. Should I just ignore all exceptions that come out of cleanup, or are there specific ones?

durham added inline comments.Nov 27 2017, 4:12 PM
treedirstate/__init__.py
522

From the comment in wrapclose, I assumed cleanup was run on 1% of commands that read the dirstate as well? Maybe that big if statement in the wrapclose finally needs to be cleaned up to make it clearer when it runs?

As for exceptions, usually IOError and OSError, but since this is the Lock class it might throw something special as well.

mbthomas added inline comments.Nov 27 2017, 4:14 PM
treedirstate/__init__.py
522

I think it throws Abort. I'll clean up the if statement and the comments to make it clear, but I think it should be ok to fail here if we make it this far and the repo is read-only. That shouldn't have happened, and I'd like to know that it did.

mbthomas updated this revision to Diff 3888.Nov 27 2017, 4:32 PM
durham accepted this revision.Nov 27 2017, 4:47 PM
This revision was automatically updated to reflect the committed changes.