Superseded by D9525
Thu, Dec 3
Will be done differently.
Wed, Dec 2
Tue, Dec 1
Mon, Nov 30
Sun, Nov 29
Sat, Nov 28
The non-POC part is superseded by D9445.
Fri, Nov 27
Sat, Nov 21
I was looking specifically at bzip2 for a bit. There are essentially two kinds of threaded compressors for it. pbzip2 is the more common one and effectively creates multiple independent streams. That is not handled transparently by Python's bz2 module, so it would break all existing clients, making it a big no-go.

Sourceforge has a more proper implementation for POSIX platforms (http://bzip2smp.sourceforge.net/) which doesn't have that problem, and it would be nice if someone re-implemented the idea for modern libbz2; it could be done more cleanly, too. While this approach doesn't allow multi-threaded decompression for multi-stream-aware clients, it does work with all bzip2 decoders. Sadly, it can't be done from Python without re-implementing a good chunk of bz2, as it hooks deeply into the implementation.

So in short, it would be possible to provide this as a C extension, possibly even vendored, but it is more work than I currently want to do. I haven't looked into the state of pigz.
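The multi-stream problem can be illustrated with Python's own bz2 module: a single BZ2Decompressor only consumes one stream and leaves the rest of the input in `unused_data`. This is a minimal sketch of the effect, not pbzip2's actual framing:

```python
import bz2

# Two independently compressed streams, concatenated -- roughly the shape
# of pbzip2 output (sketch; pbzip2's real block layout differs).
data = bz2.compress(b"hello ") + bz2.compress(b"world")

# A single decompressor object stops at the end of the first stream.
d = bz2.BZ2Decompressor()
first = d.decompress(data)
print(first)   # b'hello '
print(d.eof)   # True -- the second stream sits in d.unused_data

# A multi-stream-aware reader has to loop, feeding unused_data to a
# fresh decompressor until the input is exhausted.
out = b""
rest = data
while rest:
    d = bz2.BZ2Decompressor()
    out += d.decompress(rest)
    rest = d.unused_data
print(out)     # b'hello world'
```

A client that only ever calls a single decompressor (as older bz2-based code does) silently drops everything after the first stream, which is why this would break existing clients.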
Thu, Nov 19
Wed, Nov 18
The last update implemented the option to read the journal from disk, in case that wasn't clear from the description change.
Tue, Nov 10
Please retry, the parent changes are all merged at this point.
Sun, Nov 8
I don't think just aborting when no option is given is a good UX. That feels even worse than making it interactive by default as discussed during the sprint.
The proposal during the sprint was to ask by default if it is not loaded as an extension. Or to make it more explicit:
(1) Check if there is anything to do. If no, just return success.
(2) Check if the extension was enabled. If yes, remove and exit.
(3) Check if the "--confirm" (or "--force"?) flag was given. If not, ask for confirmation to remove X files.
(4) Remove files.
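The four steps above could look roughly like this. All names here (cleanup, remove_files, prompt_user, the confirm flag) are hypothetical placeholders for illustration, not the actual implementation:

```python
def cleanup(files, extension_enabled, confirmed, prompt_user, remove_files):
    """Sketch of the proposed flow; every name here is hypothetical."""
    # (1) Check if there is anything to do. If no, just return success.
    if not files:
        return 0
    # (2) Check if the extension was enabled. If yes, remove and exit.
    if extension_enabled:
        remove_files(files)
        return 0
    # (3) Without --confirm, ask for confirmation to remove N files.
    if not confirmed and not prompt_user("remove %d files? " % len(files)):
        return 1
    # (4) Remove files.
    remove_files(files)
    return 0
```

With this shape, step (3) is skipped entirely both when there is nothing to do and when the extension already opted the user in, which matches the intent of keeping the dangerous path behind an explicit choice.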
My memory from the discussion was that we explicitly wanted to keep it as extension as it normal users shouldn't use this and it is considered dangerous?
Sat, Nov 7
I understand the concern and agree. I see two options for dealing with this without breaking the intention of the change here.
(1) Reverse the order. This would fix the immediate problem, but is more risky for similar classes of issues in other places, e.g. tree manifest chains.
(2) Adopt the logic from D9237 to read back the journal to recover the original order. Since both _playback and repair.strip will only ever be called once per transaction, it doesn't alter the performance much.
I've been wondering about that question as well. In theory, we could run into O(n^2) behavior when there are many files that end up being split. I've decided to step back from the full dropping in D9237, as most transactions are much smaller than the initial clone and therefore are not as sensitive to the memory use.
I'll cut this into smaller parts that are easier to review and decide on which steps are too far.
Fri, Nov 6
My understanding of the transaction processing is that we keep track of all files modified. For each file, we remember the old size. When we have to roll back the transaction, we iterate over that list and truncate each file back to that size. This part works as before, except that we no longer keep the file name -> offset table in memory but read back the on-disk representation. I don't think this makes a performance difference compared to everything else going on. There is one more user of the offsets, and that's the logic in revlog for moving from inline data storage to separate data files. With the current patch, it would be O(n^2) in the worst case, as it reparses the whole file.
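The record-old-size-then-truncate scheme described above can be sketched in a few lines. This is illustrative only; the real transaction code is considerably more involved:

```python
import os
import tempfile

def record(journal, path):
    # Remember the old size the first time a file is touched (sketch).
    if path not in journal:
        journal[path] = os.path.getsize(path) if os.path.exists(path) else 0

def rollback(journal):
    # Truncate every touched file back to its recorded size.
    for path, old_size in journal.items():
        with open(path, "rb+") as f:
            f.truncate(old_size)

# Demo: append inside a "transaction", then roll it back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.i")
    with open(path, "wb") as f:
        f.write(b"committed")
    journal = {}
    record(journal, path)
    with open(path, "ab") as f:
        f.write(b" uncommitted")
    rollback(journal)
    with open(path, "rb") as f:
        print(f.read())   # b'committed'
```

Whether the (path, old size) pairs live in memory or are read back from an on-disk journal doesn't change this rollback loop, which is the point made above.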
Nov 3 2020
Nov 2 2020
Oct 30 2020
Oct 29 2020
Sorry, let me retract that. Wrong review, this one can be pulled out in isolation.
Oct 28 2020
Oct 27 2020
The index has a documented format, but it wasn't enforced so far. With the pending child review, that would change. The case of cs==node is seen when we send full changesets in the bundle. In that case we used to leak the full node id into the index.
No, the two parents are necessary because they store invalid data in the index, i.e. they violate the interface contract; it just wasn't enforced so far.
Oct 23 2020
Oct 22 2020
Oct 21 2020
Oct 20 2020
Oct 15 2020
Oct 14 2020
Re-submitted the change in a way that avoids the Python 2.7 nested-scope issue.
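For context (an assumption about which nested-scope issue is meant): Python 2.7 has no `nonlocal`, so a nested function cannot rebind a name from the enclosing scope; assigning to it just creates a new local. The usual workaround is a one-element mutable container, which works on both Python 2 and 3:

```python
def make_counter():
    # Rebinding `count` inside bump() would make it a local there on
    # Python 2.7 (no `nonlocal`), so keep the state in a list instead.
    count = [0]
    def bump():
        count[0] += 1
        return count[0]
    return bump

c = make_counter()
print(c(), c(), c())   # 1 2 3
```

Restructuring code to avoid rebinding in the nested scope entirely, as the re-submitted change apparently does, sidesteps the problem without the container trick.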
Oct 12 2020
Oct 9 2020
Basically, because the cache in its current form is very expensive in terms of memory use. For a large repository, it can easily consume 100MB+ for something like a 2% runtime gain, which doesn't seem like a good trade-off.