With updating the format of statefiles to use cbor, there can be cases when a
user updates hg in between a unresolved graft, rebase, histedit, cbor load
function in new hg won't be able to parse that properly. So let's fallback to
reading the file old way if cbor fails or does not returns a dictionary as cbor
may read that data without any error.
Details
- Reviewers
- None
- Group Reviewers
hg-reviewers
Diff Detail
- Repository
- rHG Mercurial
- Lint
Lint Skipped - Unit
Unit Tests Skipped
Event Timeline
Not sure where to record this comment in this series. So I'll pick this commit.
I think we want an explicit version header in the state files so clients know when they may be reading a file in an old format. For example, if I start a merge in one terminal window, I sometimes move to another terminal window to resolve parts of it. The multiple windows may be running different Mercurial versions. For example, sometimes one of the shells has a virtualenv activated and that virtualenv is running an older Mercurial. We don't want the older Mercurial trampling on state needed by the new Mercurial.
mercurial/state.py | ||
---|---|---|
88 | bad ident | |
90 | How about not requiring it to be a dict? I imagine practically all callers will want to pass a dict, but why does this class have to enforce it? I think the API would be simpler if it was an opaque object. In the simple case of graft, the state is simply a list of nodes. However, for more complex states, we could have nested structures. I don't see why cmdstate should be involved in lookups into the top-level structure. The difference is subtle, but here's an example: # With dict-aware cmdstate cmdstate = ... cmdstate.load() version = cmdstate['version'] for car in cmdstate['cars']: for wheel in car['wheels']: # whatever # With agnostic cmdstate cmdstate = ... parking = cmdstate.load() version = parking['version'] for car in parking['cars']: for wheel in car['wheels']: # whatever |
Also, it doesn't seem like CBOR defines any magic bytes to start the top-level object with, so it's not obvious to me that an old state file (e.g. on containing just a nodeid) could not be parsed as a valid CBOR file. Perhaps cmdstate should help with that? We could make it always add a first item that's just "CBOR" or something (it seem very unlikely that we'd have that in an old state file), and we could fail when reading a state file that doesn't have that. Or would could require any new state files to have a new name than the old ones?
mercurial/state.py | ||
---|---|---|
90 | I don't have a strong reason. I just found this an easy way. I was going to add setitem() too so that we can store more new values or change existing values. For graft case, it won't be just a list of nodes, we are going to store info about --no-commit flag in graftstate. Moreover we need graft --abort too which needs to store more things. The one benefit is that we can pass the state object everywhere, lookup for values and update values on fly. |
I can add another key value pair 'magicstring: CBOR' which should be present in the cbor format state files, or we can start storing statefiles in .hg/state/ folder.
mercurial/state.py | ||
---|---|---|
90 |
It seems the other way is still slightly easier :) You'd replace this: def __getitem__(self, key): return self.opts[key] def __setitem__(self, key, value): updates = {key: value} self.opts.update(updates) by this: def get(self): return self.state def set(self, state): self.state = state I'm not even sure we'd need those two methods. It might be enough to make load() return the state and make save() accept (and require) a new state. The constructor would change from: if not opts: self.opts = {} else: self.opts = opts to: self.state = state The graft code for reading would change from: cmdstate.load() if cmdstate['version'] < 2: nodes = cmdstate['nodes'] revs = [repo[node].rev() for node in nodes] else: # state-file is written by a newer mercurial than what we are # using raise error.Abort(_("unable to read to read the state file")) to: state = cmdstate.load() if state['version'] < 2: nodes = state['nodes'] revs = [repo[node].rev() for node in nodes] else: # state-file is written by a newer mercurial than what we are # using raise error.Abort(_("unable to read to read the state file")) The graft code for writing would change from: cmdstate.addopts({'version': 1, 'nodes': nodelines}) cmdstate.save() to: cmdstate.set({'version': 1, 'nodes': nodelines}) cmdstate.save()
We can still pass the "cmdstate" instance around if we wanted to do that. I think it's a small plus that we can pass *just* the state around, though. That means we can pass the state into a function and we can look at the call site and know that it won't be calling .save() on it. |
mercurial/state.py | ||
---|---|---|
90 | I don't feel convinced on what can be the benefit of it and my opinion can be influenced by how I used it in evolve extension. But I can change it the other way around too, let me know if you want me to change. |
mercurial/state.py | ||
---|---|---|
90 | If it's up to me, I'd definitely say that you should change it. I don't see any benefits of the current proposal. I'd be happy to hear what other reviewers think. |
mercurial/state.py | ||
---|---|---|
90 | Oh, and I should clarify that I think the root of the problem is that you're mixing the class for reading with state itself (but only the top-level items of it). Let's say someone realizes they want to iterate over the items in the state, would we also add a .items() then? (I don't think that's a likely scenario, but it shows what I mean.) |
mercurial/state.py | ||
---|---|---|
90 | Sorry if it's already discussed, but what's the benefit of cmdstate I think plain load/save functions are much simpler than a dict-like object state = statemod.load(repo, fname) state = {'version': 1, 'nodes': nodelines} statemod.save(repo, fname, state) (statemod.load/save could be a class for serialization/deserialization) The state dict would be wrapped anyway depending on application logic. |
Thanks for all the reviews and helping me improving this. The improved version of state file which is already pushed to core should raise error.CorruptedState() if an old version of state file is found and caller should catch it and parse the old way. I don't think having a registrar is required anymore. Abandoning this differential.
bad ident