From: Jeff Hostetler <git@jeffhostetler.com>
To: Johannes Schindelin via GitGitGadget <gitgitgadget@gmail.com>,
git@vger.kernel.org
Cc: Johannes Schindelin <johannes.schindelin@gmx.de>
Subject: Re: [PATCH 1/4] split-index & fsmonitor: demonstrate a bug
Date: Thu, 23 Mar 2023 09:39:44 -0400 [thread overview]
Message-ID: <80be2220-ae7b-be65-2ab0-787e52169250@jeffhostetler.com> (raw)
In-Reply-To: <c025fccbdde1a4df11ba36552f86369ed74d37d2.1679500859.git.gitgitgadget@gmail.com>
On 3/22/23 12:00 PM, Johannes Schindelin via GitGitGadget wrote:
> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>
> This commit adds a new test case that demonstrates a bug in the
> split-index code that is triggered under certain circumstances when the
> FSMonitor is enabled, and its symptom manifests in the form of one of
> the following error messages:
>
> BUG: fsmonitor.c:20: fsmonitor_dirty has more entries than the index (2 > 1)
>
> BUG: unpack-trees.c:776: pos <n> doesn't point to the first entry of <dir>/ in index
>
> error: invalid path ''
> error: The following untracked working tree files would be overwritten by reset:
> initial.t
>
> Which of these error messages appears depends on timing-dependent
> conditions.
>
> Technically the root cause lies with a bug in the split-index code that
> has nothing to do with FSMonitor, but for the sake of this new test case
> it was the easiest way to trigger the bug.
>
> The bug is this: Under specific conditions, Git needs to skip writing
> the "link" extension (which is the index extension containing the
> information pertaining to the split-index). To do that, the `base_oid`
> attribute of the `split_index` structure in the in-memory index is
> zeroed out, and `do_write_index()` specifically checks for a "null"
> `base_oid` to understand that the "link" extension should not be
> written. However, this violates the consistency of the in-memory index
> structure, but that does not cause problems in most cases because the
> process exits without using the in-memory index structure anymore,
> anyway.
>
> But: _When_ the in-memory index is still used (which is the case e.g. in
> `git rebase`), subsequent writes of `the_index` are at risk of writing
> out a bogus index file, one that _should_ have a "link" extension but
> does not. In many cases, the `SPLIT_INDEX_ORDERED` flag _happens_ to be
> set for subsequent writes, forcing the shared index to be written, which
> re-initializes `base_oid` to a non-bogus state, and all is good.
>
> When it is _not_ set, however, all kinds of mayhem ensue, resulting in
> above-mentioned error messages, and often enough putting worktrees in a
> totally broken state where the only recourse is to manually delete the
> `index` and the `index.lock` files and then call `git reset` manually.
> Not something to ask users to do.
>
> The reason why it is comparatively easy to trigger the bug with
> FSMonitor is that there is _another_ bug in the FSMonitor code:
> `mark_fsmonitor_valid()` sets `cache_changed` to 1, i.e. treating that
> variable as a Boolean. But it is a bit field, and 1 happens to be the
> `SOMETHING_CHANGED` bit that forces the "link" extension to be skipped
> when writing the index, among other things.
>
> "Comparatively easy" is a relative term in this context, for sure. The
> essence of how the new test case triggers the bug is as following:
>
> 1. The `git rebase` invocation will first reset the worktree to
> a commit that contains only the `one.t` file, and then execute a
> rebase script that starts with the following commands (commit hashes
> skipped):
>
> label onto
>
> reset initial
> pick two
> label two
>
> reset two
> pick three
> [...]
>
> 2. Before executing the `label` command, a split index is written, as
> well as the shared index.
>
> 3. The `reset initial` command in the rebase script writes out a new
> split index but skips writing the shared index, as intended.
>
> 4. The `pick two` command updates the worktree and refreshes the index,
> marking the `two.t` entry as valid via the FSMonitor, which sets the
> `SOMETHING_CHANGED` bit in `cache_changed`, which in turn causes the
> `base_oid` attribute to be zeroed out and a full (non-split) index
> to be written (making sure _not_ to write the "link" extension).
>
> 5. Now, the `reset two` command will leave the worktree alone, but
> still write out a new split index, not writing the shared index
> (because `base_oid` is still zeroed out, and there is no index entry
> update requiring it to be written, either).
>
> 6. When it is turn to run `pick three`, the index is read, but it is
> too short: It only contains a single entry when there should be two,
> because the "link" extension is missing from the written-out index
> file.
>
> There are three bugs at play, actually, which will be fixed over the
> course of the next commits:
>
> - The `base_oid` attribute should not be zeroed out to indicate when
> the "link" extension should not be written, as it puts the in-memory
> index structure into an inconsistent state.
>
> - The FSMonitor should not overwrite bits in `cache_changed`.
>
> - The `unpack_trees()` function tries to reuse the `split_index`
> structure from the source index, if any, but does not propagate the
> `SPLIT_INDEX_ORDERED` flag.
>
> While a fix for the second bug would let this test case pass, there are
> other conditions where the `SOMETHING_CHANGED` bit is set. Therefore,
> the bug that most crucially needs to be fixed is the first one.
>
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Very well said. Thank you!!!
> ---
> t/t7527-builtin-fsmonitor.sh | 37 ++++++++++++++++++++++++++++++++++++
> 1 file changed, 37 insertions(+)
>
> diff --git a/t/t7527-builtin-fsmonitor.sh b/t/t7527-builtin-fsmonitor.sh
> index d419085379c..cbafdd69602 100755
> --- a/t/t7527-builtin-fsmonitor.sh
> +++ b/t/t7527-builtin-fsmonitor.sh
> @@ -1003,4 +1003,41 @@ test_expect_success !UNICODE_COMPOSITION_SENSITIVE 'Unicode nfc/nfd' '
> egrep "^event: nfd/d_${utf8_nfc}/?$" ./unicode.trace
> '
>
> +test_expect_failure 'split-index and FSMonitor work well together' '
> + git init split-index &&
> + test_when_finished "git -C \"$PWD/split-index\" \
> + fsmonitor--daemon stop" &&
> + (
> + cd split-index &&
> + git config core.splitIndex true &&
> + # force split-index in most cases
> + git config splitIndex.maxPercentChange 99 &&
> + git config core.fsmonitor true &&
> +
> + # Create the following commit topology:
> + #
> + # * merge three
> + # |\
> + # | * three
> + # * | merge two
> + # |\|
> + # | * two
> + # * | one
> + # |/
> + # * 5a5efd7 initial
> +
> + test_commit initial &&
> + test_commit two &&
> + test_commit three &&
> + git reset --hard initial &&
> + test_commit one &&
> + test_tick &&
> + git merge two &&
> + test_tick &&
> + git merge three &&
> +
> + git rebase --force-rebase -r one
> + )
> +'
> +
> test_done
next prev parent reply other threads:[~2023-03-23 13:45 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-03-22 16:00 [PATCH 0/4] Fix a few split-index bugs Johannes Schindelin via GitGitGadget
2023-03-22 16:00 ` [PATCH 1/4] split-index & fsmonitor: demonstrate a bug Johannes Schindelin via GitGitGadget
2023-03-23 13:39 ` Jeff Hostetler [this message]
2023-03-22 16:00 ` [PATCH 2/4] split-index; stop abusing the `base_oid` to strip the "link" extension Johannes Schindelin via GitGitGadget
2023-03-22 21:24 ` Junio C Hamano
2023-03-23 13:59 ` Jeff Hostetler
2023-03-23 16:06 ` Junio C Hamano
2023-03-23 15:22 ` Jeff Hostetler
2023-03-22 16:00 ` [PATCH 3/4] fsmonitor: avoid overriding `cache_changed` bits Johannes Schindelin via GitGitGadget
2023-03-22 16:00 ` [PATCH 4/4] unpack-trees: take care to propagate the split-index flag Johannes Schindelin via GitGitGadget
2023-03-23 14:32 ` Jeff Hostetler
2023-03-22 16:22 ` [PATCH 0/4] Fix a few split-index bugs Junio C Hamano
2023-03-26 22:45 ` [PATCH v2 " Johannes Schindelin via GitGitGadget
2023-03-26 22:45 ` [PATCH v2 1/4] split-index & fsmonitor: demonstrate a bug Johannes Schindelin via GitGitGadget
2023-03-26 22:45 ` [PATCH v2 2/4] split-index; stop abusing the `base_oid` to strip the "link" extension Johannes Schindelin via GitGitGadget
2023-03-26 22:45 ` [PATCH v2 3/4] fsmonitor: avoid overriding `cache_changed` bits Johannes Schindelin via GitGitGadget
2023-03-26 22:45 ` [PATCH v2 4/4] unpack-trees: take care to propagate the split-index flag Johannes Schindelin via GitGitGadget
2023-03-27 14:48 ` [PATCH v2 0/4] Fix a few split-index bugs Jeff Hostetler
2023-03-27 16:42 ` Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=80be2220-ae7b-be65-2ab0-787e52169250@jeffhostetler.com \
--to=git@jeffhostetler.com \
--cc=git@vger.kernel.org \
--cc=gitgitgadget@gmail.com \
--cc=johannes.schindelin@gmx.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).