Page MenuHomePhabricator

upgrade: use copy+delete instead of rename while creating backup
Changes PlannedPublic

Authored by pulkit on Thu, Dec 31, 11:43 AM.

Details

Reviewers
marmoute
Group Reviewers
hg-reviewers
Summary

A lot of times, we do an upgrade operation which does not touches all the parts
of the stores. But right not, we have a blind logic which processes everything.
To selectively upgrade parts of repository, we need to persist existing data
which is untouched.

However while creating current repository backup, we rename the whole store
leaving no option to persist untouched files.

We switch to copy+delete so that we can only delete data files which are changed
by the operation and leave rest untouched.

Diff Detail

Repository
rHG Mercurial
Branch
default
Lint
No Linters Available
Unit
No Unit Test Coverage

Event Timeline

pulkit created this revision.Thu, Dec 31, 11:43 AM

I worried about having a much wider inconsistency window if we don't use rename. I'll have a deeper look later.

Help With Essay
from the Academic Experts. Our essay writing service is designed to get you the extra help you need in completing your next university essay

marmoute requested changes to this revision.Thu, Jan 7, 9:52 PM

As pointed in a pa previous comment, this is problematic, because this does not preserve the same consistency as the previous code

This revision now requires changes to proceed.Thu, Jan 7, 9:52 PM

As pointed in a pa previous comment, this is problematic, because this does not preserve the same consistency as the previous code

While the copy consistency can be maintained here, future changes (WIP) will break the consistency more. In upcoming work, we want to only touch parts of repository which needs to be updated. This will lead us to do selectively moving data from the upgraded repository to the current one.

As pointed in a pa previous comment, this is problematic, because this does not preserve the same consistency as the previous code

While the copy consistency can be maintained here, future changes (WIP) will break the consistency more. In upcoming work, we want to only touch parts of repository which needs to be updated. This will lead us to do selectively moving data from the upgraded repository to the current one.

@marmoute are you OK with this explanation?

My question is, would it be better to hardlink and then delete to avoid the copy overhead? (Of course not every filesystem supports that, but I think it falls back to a copy in that case.)

pulkit updated this revision to Diff 24774.Wed, Jan 13, 5:31 AM
marmoute requested changes to this revision.Wed, Jan 13, 5:34 AM

As pointed in a pa previous comment, this is problematic, because this does not preserve the same consistency as the previous code

While the copy consistency can be maintained here, future changes (WIP) will break the consistency more. In upcoming work, we want to only touch parts of repository which needs to be updated. This will lead us to do selectively moving data from the upgraded repository to the current one.

@marmoute are you OK with this explanation?

Not at all :-)

Currently we have:

  1. a fully consistent repository in place (while the upgrade run)
  2. a missing store during a split second (old store was rename)
  3. a fully consistent repository in place (the upgraded store was rename in place)

The process ensure a very narrow inconsistency window and a very clear invalid state "the store is either valid or missing."

The change in this part move the (2) step to a much longer operation where elements get copied one by one. As a result we get a much longer window of inconsistency with a much wonkier inconsistency state as we get "mixed" files within the resulting store.

So overall this change is going in the wrong direction.

We are currently looking for lighter upgrade for 2 feature "persistent nodemap" and "share-safe". And I don't think either of them need this kind of change. A upgrade processs for persistent nodemap could be:

  1. write new persistent nodemaps (data file, then docket) that nobody will read without the requirements.
  2. add the new requirement

The downgrade being the reverse (remove the requirement, remove the nodemaps)

And something similar could be done for the share-safe upgrade

  1. write a new .hg/store/requirements that nobody will read yet
  2. rewrite .hg/requirements
This revision now requires changes to proceed.Wed, Jan 13, 5:34 AM

As pointed in a pa previous comment, this is problematic, because this does not preserve the same consistency as the previous code

While the copy consistency can be maintained here, future changes (WIP) will break the consistency more. In upcoming work, we want to only touch parts of repository which needs to be updated. This will lead us to do selectively moving data from the upgraded repository to the current one.

@marmoute are you OK with this explanation?

Not at all :-)
Currently we have:

  1. a fully consistent repository in place (while the upgrade run)
  2. a missing store during a split second (old store was rename)
  3. a fully consistent repository in place (the upgraded store was rename in place)

The process ensure a very narrow inconsistency window and a very clear invalid state "the store is either valid or missing."
The change in this part move the (2) step to a much longer operation where elements get copied one by one. As a result we get a much longer window of inconsistency with a much wonkier inconsistency state as we get "mixed" files within the resulting store.

Nope, this patch does not do that. We have three stores here: backupstore (containing backup of current store), current store and upgraded store. Below is comparison of split second as part 2) mentioned above.

Before:

  1. Move current store to backup store
  2. Move upgraded store to current store

Store inconsistency starts when step 1) starts and remains till step 2) finishes since both are move operation.

After this patch:

  1. Copy current store to backup store
  2. Move upgraded store to current store

In this one, store inconsistency happens only at step 2) since step 1) is a copy.

So, this patch reduces the time when store was inconsistent.

Note: that renaming a directory is instantaneous on any descent file system. copying full hierarchy is not.

I am afraid your explanation is now confusing me. Can you try to explain what this patch is doing again ?

pulkit planned changes to this revision.Thu, Jan 21, 9:42 AM

Dropping this from stack for now to focus on getting other work pushed.