We need to store new data, so this is a good opportunity to rework this fully.
- We directly store the list of affected files in the side data:
  - This avoids having to fetch and parse the files list from the revision in addition to the sidedata, making the data more self-sufficient.
  - This works around situations where the files field contains wrong information, and opens the way to other bug fixing (e.g. issue6219).
  - The format (fixed initial index, sorted files) allows for fast lookup of a filename within the structure (see the sketch after this list).
  - This unifies the storage of affected files and of copy sources and destinations, limiting the number of filenames stored redundantly.
  - This prepares for dropping the files field from the changeset as soon as we make any change affecting the revision schema.
  - This relies on compression to avoid a significant increase in the size of changelog.d. More testing on this will be done before we freeze the final format.
- We can store additional data:
  - The new "merged" field,
  - A future "salvaged" set recording files that might have been deleted but were still present in the final result.
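
A minimal Python sketch of how such an indexed structure could be encoded and queried, assuming a big-endian fixed-size index of (flags, offset) entries followed by the sorted filenames; the entry layout, flag values, and helper names are illustrative assumptions, not the final format:

    import struct

    INDEX_HEADER = struct.Struct(">L")  # number of entries
    INDEX_ENTRY = struct.Struct(">BL")  # (flags, end offset of this filename)

    def encode_files(files):
        """Encode a {filename: flags} map: fixed index first, sorted names after."""
        names = sorted(files)
        index = [INDEX_HEADER.pack(len(names))]
        data = []
        end = 0
        for name in names:
            raw = name.encode('utf-8')
            end += len(raw)
            index.append(INDEX_ENTRY.pack(files[name], end))
            data.append(raw)
        return b''.join(index) + b''.join(data)

    def lookup(blob, filename):
        """Binary-search the sorted index; O(log n), no full parse needed."""
        target = filename.encode('utf-8')
        (count,) = INDEX_HEADER.unpack_from(blob, 0)
        base = INDEX_HEADER.size + count * INDEX_ENTRY.size

        def entry(i):
            flags, end = INDEX_ENTRY.unpack_from(
                blob, INDEX_HEADER.size + i * INDEX_ENTRY.size)
            start = 0 if i == 0 else INDEX_ENTRY.unpack_from(
                blob, INDEX_HEADER.size + (i - 1) * INDEX_ENTRY.size)[1]
            return flags, blob[base + start:base + end]

        lo, hi = 0, count
        while lo < hi:
            mid = (lo + hi) // 2
            flags, name = entry(mid)
            if name == target:
                return flags
            elif name < target:
                lo = mid + 1
            else:
                hi = mid
        return None

The flags byte is where per-file properties such as "merged" or "salvaged" could live, and compression would then apply to the whole encoded blob.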
Is there a reason to use big-endian instead of little-endian (as x86 and x86-64 are little-endian)? I was asked the same question when I proposed the first draft of the dirstate cache.
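
For what it's worth, one common argument for big-endian in on-disk formats, besides staying consistent with the network byte order already used in the revlog structures, is that big-endian unsigned integers compare the same as their raw bytes; a small illustration with hypothetical values:

    import struct

    a, b = 1, 256
    # Big-endian: byte-wise comparison matches numeric comparison.
    assert struct.pack(">L", a) < struct.pack(">L", b)
    # Little-endian: byte-wise comparison does not.
    assert not (struct.pack("<L", a) < struct.pack("<L", b))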