diff --git a/mercurial/helptext/internals/dirstate-v2.txt b/mercurial/helptext/internals/dirstate-v2.txt --- a/mercurial/helptext/internals/dirstate-v2.txt +++ b/mercurial/helptext/internals/dirstate-v2.txt @@ -371,6 +371,111 @@ (For example, `hg rm` makes a file untracked.) This counter is used to implement `has_tracked_dir`. -* Offset 30 and more: - **TODO:** docs not written yet - as this part of the format might be changing soon. +* Offset 30: + Some boolean values packed as bits of a single byte. + Starting from least-significant, bit masks are:: + + WDIR_TRACKED = 1 << 0 + P1_TRACKED = 1 << 1 + P2_INFO = 1 << 2 + HAS_MODE_AND_SIZE = 1 << 3 + HAS_MTIME = 1 << 4 + + Other bits are unset. The meaning of these bits is: + + `WDIR_TRACKED` + Set if the working directory contains a tracked file at this node’s path. + This is typically set and unset by `hg add` and `hg rm`. + + `P1_TRACKED` + Set if the working directory’s first parent changeset + (whose node identifier is found in tree metadata) + contains a tracked file at this node’s path. + This is a cache to reduce manifest lookups. + + `P2_INFO` + Set if the file has been involved in some merge operation, + either because it was actually merged, + or because the version in the second parent (p2) was ahead, + or because some rename moved it there. + In any of these cases `hg status` will want it displayed as modified. + + Files that would be mentioned at all in the `dirstate-v1` file format + have a node with at least one of the above three bits set in `dirstate-v2`. + Let’s call these files "tracked anywhere", + and call "untracked" the nodes with all three of these bits unset. + Untracked nodes are typically for directories: + they hold child nodes and form the tree structure. + Implementations should strive to clean up nodes + that are entirely unused, + but other untracked nodes may also exist.
+ For example, a future version of Mercurial might in some cases + add nodes for untracked files and/or ignored files in the working directory + in order to optimize `hg status` + by enabling it to skip `readdir` in more cases. + + When a node is for a file tracked anywhere, + the rest of the node data consists of three fields: + + * Offset 31: + If `HAS_MODE_AND_SIZE` is unset, four zero bytes. + Otherwise, a 32-bit integer for the Unix mode (as in `stat_result.st_mode`) + expected for this file to be considered clean. + Only the `S_IXUSR` bit (owner has execute permission) is considered. + + * Offset 35: + If `HAS_MODE_AND_SIZE` is unset, four zero bytes. + Otherwise, a 32-bit integer for the expected size of the file, + truncated to its 31 least-significant bits. + Unlike in dirstate-v1, negative values are not used. + + * Offset 39: + If `HAS_MTIME` is unset, four zero bytes. + Otherwise, a 32-bit integer for the expected modification time of the file + (as in `stat_result.st_mtime`), + truncated to its 31 least-significant bits. + Unlike in dirstate-v1, negative values are not used. + + If an untracked node has `HAS_MTIME` *unset*, this space is unused: + + * Offset 31: + 12 bytes set to zero + + If an untracked node has `HAS_MTIME` *set*, + what follows is the modification time of a directory + represented similarly to the C `timespec` struct: + + * Offset 31: + The number of seconds elapsed since the Unix epoch, + as a signed (two’s complement) 64-bit integer. + + * Offset 39: + The number of nanoseconds elapsed since + the instant specified by the previous field alone, + as a 32-bit integer. + Always greater than or equal to zero, and strictly less than a billion.
+ + The presence of a directory modification time means that at some point, + this path in the working directory was observed: + + - To be a directory + - With the given modification time + - That time was already strictly in the past when observed, + meaning that later changes cannot happen in the same clock tick + and must cause a different modification time + (unless the system clock jumps back and we get unlucky, + which is not impossible but deemed unlikely enough). + - All direct children of this directory + (as returned by `std::fs::read_dir`) + either have a corresponding dirstate node, + or are ignored by ignore patterns whose hash is in tree metadata. + + This means that if `std::fs::symlink_metadata` later reports + the same modification time + and ignored patterns haven’t changed, + a run of status that is not listing ignored files + can skip calling `std::fs::read_dir` again for this directory, + and iterate child dirstate nodes instead. + + +* (Offset 43: end of this node) diff --git a/mercurial/pure/parsers.py b/mercurial/pure/parsers.py --- a/mercurial/pure/parsers.py +++ b/mercurial/pure/parsers.py @@ -55,7 +55,7 @@ - p1_tracked: is the file tracked in working copy first parent - p2_info: the file has been involved in some merge operation. Either because it was actually merged, or because the p2 version was - ahead, or because some renamed moved it there. In either case + ahead, or because some rename moved it there. In either case `hg status` will want it displayed as modified. 
# about the file state expected from p1 manifest: diff --git a/rust/hg-core/src/dirstate_tree/on_disk.rs b/rust/hg-core/src/dirstate_tree/on_disk.rs --- a/rust/hg-core/src/dirstate_tree/on_disk.rs +++ b/rust/hg-core/src/dirstate_tree/on_disk.rs @@ -64,44 +64,24 @@ uuid: &'on_disk [u8], } +/// Fields are documented in the *Tree metadata in the docket file* +/// section of `mercurial/helptext/internals/dirstate-v2.txt` #[derive(BytesCast)] #[repr(C)] struct TreeMetadata { root_nodes: ChildNodes, nodes_with_entry_count: Size, nodes_with_copy_source_count: Size, - - /// How many bytes of this data file are not used anymore unreachable_bytes: Size, - - /// Current version always sets these bytes to zero when creating or - /// updating a dirstate. Future versions could assign some bits to signal - /// for example "the version that last wrote/updated this dirstate did so - /// in such and such way that can be relied on by versions that know to." unused: [u8; 4], - /// If non-zero, a hash of ignore files that were used for some previous - /// run of the `status` algorithm. - /// - /// We define: - /// - /// * "Root" ignore files are `.hgignore` at the root of the repository if - /// it exists, and files from `ui.ignore.*` config. This set of files is - /// then sorted by the string representation of their path. - /// * The "expanded contents" of an ignore files is the byte string made - /// by concatenating its contents with the "expanded contents" of other - /// files included with `include:` or `subinclude:` files, in inclusion - /// order. This definition is recursive, as included files can - /// themselves include more files. - /// - /// This hash is defined as the SHA-1 of the concatenation (in sorted - /// order) of the "expanded contents" of each "root" ignore file. - /// (Note that computing this does not require actually concatenating byte - /// strings into contiguous memory, instead SHA-1 hashing can be done - /// incrementally.) 
+ /// See *Optional hash of ignore patterns* section of + /// `mercurial/helptext/internals/dirstate-v2.txt` ignore_patterns_hash: IgnorePatternsHash, } +/// Fields are documented in the *The data file format* +/// section of `mercurial/helptext/internals/dirstate-v2.txt` #[derive(BytesCast)] #[repr(C)] pub(super) struct Node { @@ -114,45 +94,6 @@ children: ChildNodes, pub(super) descendants_with_entry_count: Size, pub(super) tracked_descendants_count: Size, - - /// Depending on the bits in `flags`: - /// - /// * If any of `WDIR_TRACKED`, `P1_TRACKED`, or `P2_INFO` are set, the - /// node has an entry. - /// - /// - If `HAS_MODE_AND_SIZE` is set, `data.mode` and `data.size` are - /// meaningful. Otherwise they are set to zero - /// - If `HAS_MTIME` is set, `data.mtime` is meaningful. Otherwise it is - /// set to zero. - /// - /// * If none of `WDIR_TRACKED`, `P1_TRACKED`, `P2_INFO`, or `HAS_MTIME` - /// are set, the node does not have an entry and `data` is set to all - /// zeros. - /// - /// * If none of `WDIR_TRACKED`, `P1_TRACKED`, `P2_INFO` are set, but - /// `HAS_MTIME` is set, the bytes of `data` should instead be - /// interpreted as the `Timestamp` for the mtime of a cached directory. - /// - /// The presence of this combination of flags means that at some point, - /// this path in the working directory was observed: - /// - /// - To be a directory - /// - With the modification time as given by `Timestamp` - /// - That timestamp was already strictly in the past when observed, - /// meaning that later changes cannot happen in the same clock tick - /// and must cause a different modification time (unless the system - /// clock jumps back and we get unlucky, which is not impossible but - /// but deemed unlikely enough). - /// - All direct children of this directory (as returned by - /// `std::fs::read_dir`) either have a corresponding dirstate node, or - /// are ignored by ignore patterns whose hash is in - /// `TreeMetadata::ignore_patterns_hash`. 
- /// - /// This means that if `std::fs::symlink_metadata` later reports the - /// same modification time and ignored patterns haven’t changed, a run - /// of status that is not listing ignored files can skip calling - /// `std::fs::read_dir` again for this directory, iterate child - /// dirstate nodes instead. flags: Flags, data: Entry, }
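The helptext hunk above defines how bytes 30 to 43 of a node are interpreted depending on the flags byte. A minimal Python sketch of a reader for that layout follows. It assumes integers are stored big-endian (this hunk does not restate the byte order) and that `node` holds one node's 43 bytes; the names `decode_node_state`, `FileEntry` and `CachedDirMtime` are hypothetical and are not part of `mercurial/pure/parsers.py`.

```python
import struct
from typing import NamedTuple, Optional, Union

# Bit masks for the flags byte at offset 30, as listed in the helptext.
WDIR_TRACKED = 1 << 0
P1_TRACKED = 1 << 1
P2_INFO = 1 << 2
HAS_MODE_AND_SIZE = 1 << 3
HAS_MTIME = 1 << 4

NODE_SIZE = 43  # "Offset 43: end of this node"


class FileEntry(NamedTuple):
    """State of a node for a file "tracked anywhere"."""
    wdir_tracked: bool
    p1_tracked: bool
    p2_info: bool
    mode: Optional[int]   # None when HAS_MODE_AND_SIZE is unset
    size: Optional[int]   # None when HAS_MODE_AND_SIZE is unset
    mtime: Optional[int]  # None when HAS_MTIME is unset


class CachedDirMtime(NamedTuple):
    """timespec-like directory mtime stored in an untracked node."""
    seconds: int
    nanoseconds: int


def decode_node_state(node: bytes) -> Union[FileEntry, CachedDirMtime, None]:
    """Interpret bytes 30..43 of one node (big-endian byte order assumed)."""
    if len(node) < NODE_SIZE:
        raise ValueError("a node is %d bytes long" % NODE_SIZE)
    flags = node[30]
    data = node[31:NODE_SIZE]  # the 12 bytes at offsets 31..42
    if flags & (WDIR_TRACKED | P1_TRACKED | P2_INFO):
        # Tracked anywhere: mode, size, mtime as three 32-bit fields,
        # each stored as zero when its HAS_* flag is unset.
        mode, size, mtime = struct.unpack(">III", data)
        has_mode_and_size = bool(flags & HAS_MODE_AND_SIZE)
        return FileEntry(
            wdir_tracked=bool(flags & WDIR_TRACKED),
            p1_tracked=bool(flags & P1_TRACKED),
            p2_info=bool(flags & P2_INFO),
            mode=mode if has_mode_and_size else None,
            size=size if has_mode_and_size else None,
            mtime=mtime if flags & HAS_MTIME else None,
        )
    if flags & HAS_MTIME:
        # Untracked node with a cached directory mtime:
        # signed 64-bit seconds, then 32-bit nanoseconds.
        seconds, nanoseconds = struct.unpack(">qI", data)
        if nanoseconds >= 1_000_000_000:
            raise ValueError("nanoseconds must be less than a billion")
        return CachedDirMtime(seconds, nanoseconds)
    # Untracked node without HAS_MTIME: the 12 bytes are unused zeros.
    return None
```

For a file entry, a caller comparing against `os.lstat` would, per the helptext, only look at the `stat.S_IXUSR` bit of `mode`, and would truncate the observed size and mtime to 31 bits before comparing.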