git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff Hostetler <git@jeffhostetler.com>
To: Johannes Schindelin via GitGitGadget <gitgitgadget@gmail.com>,
	git@vger.kernel.org
Cc: Johannes Schindelin <johannes.schindelin@gmx.de>
Subject: Re: [PATCH 1/4] split-index & fsmonitor: demonstrate a bug
Date: Thu, 23 Mar 2023 09:39:44 -0400	[thread overview]
Message-ID: <80be2220-ae7b-be65-2ab0-787e52169250@jeffhostetler.com> (raw)
In-Reply-To: <c025fccbdde1a4df11ba36552f86369ed74d37d2.1679500859.git.gitgitgadget@gmail.com>



On 3/22/23 12:00 PM, Johannes Schindelin via GitGitGadget wrote:
> From: Johannes Schindelin <johannes.schindelin@gmx.de>
> 
> This commit adds a new test case that demonstrates a bug in the
> split-index code that is triggered under certain circumstances when the
> FSMonitor is enabled, and its symptom manifests in the form of one of
> the following error messages:
> 
>      BUG: fsmonitor.c:20: fsmonitor_dirty has more entries than the index (2 > 1)
> 
>      BUG: unpack-trees.c:776: pos <n> doesn't point to the first entry of <dir>/ in index
> 
>      error: invalid path ''
>      error: The following untracked working tree files would be overwritten by reset:
>              initial.t
> 
> Which of these error messages appears depends on timing-dependent
> conditions.
> 
> Technically the root cause lies with a bug in the split-index code that
> has nothing to do with FSMonitor, but for the sake of this new test case
> it was the easiest way to trigger the bug.
> 
> The bug is this: Under specific conditions, Git needs to skip writing
> the "link" extension (which is the index extension containing the
> information pertaining to the split-index). To do that, the `base_oid`
> attribute of the `split_index` structure in the in-memory index is
> zeroed out, and `do_write_index()` specifically checks for a "null"
> `base_oid` to understand that the "link" extension should not be
> written. However, this violates the consistency of the in-memory index
> structure, but that does not cause problems in most cases because the
> process exits without using the in-memory index structure anymore,
> anyway.
> 
> But: _When_ the in-memory index is still used (which is the case e.g. in
> `git rebase`), subsequent writes of `the_index` are at risk of writing
> out a bogus index file, one that _should_ have a "link" extension but
> does not. In many cases, the `SPLIT_INDEX_ORDERED` flag _happens_ to be
> set for subsequent writes, forcing the shared index to be written, which
> re-initializes `base_oid` to a non-bogus state, and all is good.
> 
> When it is _not_ set, however, all kinds of mayhem ensue, resulting in
> above-mentioned error messages, and often enough putting worktrees in a
> totally broken state where the only recourse is to manually delete the
> `index` and the `index.lock` files and then call `git reset` manually.
> Not something to ask users to do.
> 
> The reason why it is comparatively easy to trigger the bug with
> FSMonitor is that there is _another_ bug in the FSMonitor code:
> `mark_fsmonitor_valid()` sets `cache_changed` to 1, i.e. treating that
> variable as a Boolean. But it is a bit field, and 1 happens to be the
> `SOMETHING_CHANGED` bit that forces the "link" extension to be skipped
> when writing the index, among other things.
> 
> "Comparatively easy" is a relative term in this context, for sure. The
> essence of how the new test case triggers the bug is as following:
> 
> 1. The `git rebase` invocation will first reset the worktree to
>     a commit that contains only the `one.t` file, and then execute a
>     rebase script that starts with the following commands (commit hashes
>     skipped):
> 
>     label onto
> 
>     reset initial
>     pick two
>     label two
> 
>     reset two
>     pick three
>     [...]
> 
> 2. Before executing the `label` command, a split index is written, as
>     well as the shared index.
> 
> 3. The `reset initial` command in the rebase script writes out a new
>     split index but skips writing the shared index, as intended.
> 
> 4. The `pick two` command updates the worktree and refreshes the index,
>     marking the `two.t` entry as valid via the FSMonitor, which sets the
>     `SOMETHING_CHANGED` bit in `cache_changed`, which in turn causes the
>     `base_oid` attribute to be zeroed out and a full (non-split) index
>     to be written (making sure _not_ to write the "link" extension).
> 
> 5. Now, the `reset two` command will leave the worktree alone, but
>     still write out a new split index, not writing the shared index
>     (because `base_oid` is still zeroed out, and there is no index entry
>     update requiring it to be written, either).
> 
> 6. When it is turn to run `pick three`, the index is read, but it is
>     too short: It only contains a single entry when there should be two,
>     because the "link" extension is missing from the written-out index
>     file.
> 
> There are three bugs at play, actually, which will be fixed over the
> course of the next commits:
> 
> - The `base_oid` attribute should not be zeroed out to indicate when
>    the "link" extension should not be written, as it puts the in-memory
>    index structure into an inconsistent state.
> 
> - The FSMonitor should not overwrite bits in `cache_changed`.
> 
> - The `unpack_trees()` function tries to reuse the `split_index`
>    structure from the source index, if any, but does not propagate the
>    `SPLIT_INDEX_ORDERED` flag.
> 
> While a fix for the second bug would let this test case pass, there are
> other conditions where the `SOMETHING_CHANGED` bit is set. Therefore,
> the bug that most crucially needs to be fixed is the first one.
> 
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>

Very well said.  Thank you!!!

> ---
>   t/t7527-builtin-fsmonitor.sh | 37 ++++++++++++++++++++++++++++++++++++
>   1 file changed, 37 insertions(+)
> 
> diff --git a/t/t7527-builtin-fsmonitor.sh b/t/t7527-builtin-fsmonitor.sh
> index d419085379c..cbafdd69602 100755
> --- a/t/t7527-builtin-fsmonitor.sh
> +++ b/t/t7527-builtin-fsmonitor.sh
> @@ -1003,4 +1003,41 @@ test_expect_success !UNICODE_COMPOSITION_SENSITIVE 'Unicode nfc/nfd' '
>   	egrep "^event: nfd/d_${utf8_nfc}/?$" ./unicode.trace
>   '
>   
> +test_expect_failure 'split-index and FSMonitor work well together' '
> +	git init split-index &&
> +	test_when_finished "git -C \"$PWD/split-index\" \
> +		fsmonitor--daemon stop" &&
> +	(
> +		cd split-index &&
> +		git config core.splitIndex true &&
> +		# force split-index in most cases
> +		git config splitIndex.maxPercentChange 99 &&
> +		git config core.fsmonitor true &&
> +
> +		# Create the following commit topology:
> +		#
> +		# *   merge three
> +		# |\
> +		# | * three
> +		# * | merge two
> +		# |\|
> +		# | * two
> +		# * | one
> +		# |/
> +		# * 5a5efd7 initial
> +
> +		test_commit initial &&
> +		test_commit two &&
> +		test_commit three &&
> +		git reset --hard initial &&
> +		test_commit one &&
> +		test_tick &&
> +		git merge two &&
> +		test_tick &&
> +		git merge three &&
> +
> +		git rebase --force-rebase -r one
> +	)
> +'
> +
>   test_done

  reply	other threads:[~2023-03-23 13:45 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-03-22 16:00 [PATCH 0/4] Fix a few split-index bugs Johannes Schindelin via GitGitGadget
2023-03-22 16:00 ` [PATCH 1/4] split-index & fsmonitor: demonstrate a bug Johannes Schindelin via GitGitGadget
2023-03-23 13:39   ` Jeff Hostetler [this message]
2023-03-22 16:00 ` [PATCH 2/4] split-index; stop abusing the `base_oid` to strip the "link" extension Johannes Schindelin via GitGitGadget
2023-03-22 21:24   ` Junio C Hamano
2023-03-23 13:59     ` Jeff Hostetler
2023-03-23 16:06       ` Junio C Hamano
2023-03-23 15:22   ` Jeff Hostetler
2023-03-22 16:00 ` [PATCH 3/4] fsmonitor: avoid overriding `cache_changed` bits Johannes Schindelin via GitGitGadget
2023-03-22 16:00 ` [PATCH 4/4] unpack-trees: take care to propagate the split-index flag Johannes Schindelin via GitGitGadget
2023-03-23 14:32   ` Jeff Hostetler
2023-03-22 16:22 ` [PATCH 0/4] Fix a few split-index bugs Junio C Hamano
2023-03-26 22:45 ` [PATCH v2 " Johannes Schindelin via GitGitGadget
2023-03-26 22:45   ` [PATCH v2 1/4] split-index & fsmonitor: demonstrate a bug Johannes Schindelin via GitGitGadget
2023-03-26 22:45   ` [PATCH v2 2/4] split-index; stop abusing the `base_oid` to strip the "link" extension Johannes Schindelin via GitGitGadget
2023-03-26 22:45   ` [PATCH v2 3/4] fsmonitor: avoid overriding `cache_changed` bits Johannes Schindelin via GitGitGadget
2023-03-26 22:45   ` [PATCH v2 4/4] unpack-trees: take care to propagate the split-index flag Johannes Schindelin via GitGitGadget
2023-03-27 14:48   ` [PATCH v2 0/4] Fix a few split-index bugs Jeff Hostetler
2023-03-27 16:42     ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=80be2220-ae7b-be65-2ab0-787e52169250@jeffhostetler.com \
    --to=git@jeffhostetler.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=johannes.schindelin@gmx.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).