We are going to be rewriting sidedata soon, it's going to be appended to the
revlog data file, meaning that the data and the sidedata might not be
contiguous.
Details
- Reviewers
indygreg marmoute - Group Reviewers
hg-reviewers - Commits
- rHG4cd214c9948d: revlogv2: don't assume that the sidedata of the last rev is right after data
Diff Detail
- Repository
- rHG Mercurial
- Branch
- default
- Lint
No Linters Available - Unit
No Unit Test Coverage
Event Timeline
mercurial/revlog.py | ||
---|---|---|
2338–2341 | Why don't we simply store the absolute offset for the sidedata instead of needing arithmetic ? |
mercurial/revlog.py | ||
---|---|---|
2338–2341 | We do store the absolute offset, the point here is to find the index after the last entry's (potential) sidedata. Perhaps this variable should be named prev_sidedata_offset to make that clearer? |
mercurial/revlog.py | ||
---|---|---|
2338–2341 | So we are looking to the offset of the first byte that does not contains data? If so why are we looking for side data in particular and not just actual data too? If not, what is going on here ? |
mercurial/revlog.py | ||
---|---|---|
2338–2341 | Revlog data files are still append-only (even with the re-writing mechanism of the next patches in the stack), and the sidedata is the last thing that is written for any given entry. So if you want to find the offset of all data (in the general sense of the term), then you look for the end of the sidedata of the last entry. Is there something I missed about there being a simpler solution, or should I add a comment explaining this? |
mercurial/revlog.py | ||
---|---|---|
2338–2341 | Is it ? Because if the last entry did not had any sidedata. We need to take look at its actual data instead, isn't it? and actually if the previous entry had sidedata that we needed to compute after the fact, it sidedata will be written after the last entry data… etc. So the current approach seems quite fragile to me. It need a least a clear comment explaining what is it trying to do. And probably jus to be moved within a function with a clear semantic and docstring. The "last offset" information could be stored in the docket to make all this simpler soonish. |
mercurial/revlog.py | ||
---|---|---|
2338–2341 |
Right, hence the check for the sidedata offset being 0, which falls back to the regular end.
If the sidedata needs to be computer after the fact, it means - for now - that it doesn't currently have sidedata, so we'll write (correctly) after the end of the data, do so for however many revisions in the group, then compute their potential respective sidedata and append that to the data file one right after the other (so, separate from their data, but always after). Is that clearer? If so I'll explain better in code. |
mercurial/revlog.py | ||
---|---|---|
2338–2341 | What is the case I described (last revision with no sidedata, previous one with sidedata post-computed) happens for data committed before we start this write. |
mercurial/revlog.py | ||
---|---|---|
2338–2341 | Right, sorry, I misread your previous message. I'm sending a change with a linear scan of the index as a first step, pending the use of a docket file, so at least it's correct in all cases. |
Why don't we simply store the absolute offset for the sidedata instead of needing arithmetic ?