git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/4] Fix a few split-index bugs
@ 2023-03-22 16:00 Johannes Schindelin via GitGitGadget
  2023-03-22 16:00 ` [PATCH 1/4] split-index & fsmonitor: demonstrate a bug Johannes Schindelin via GitGitGadget
                   ` (5 more replies)
  0 siblings, 6 replies; 19+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2023-03-22 16:00 UTC (permalink / raw)
  To: git; +Cc: Jeff Hostetler, Johannes Schindelin

I received an internal bug report that after upgrading from v2.39.2 to
v2.40.0, some users ran into the following error message:

BUG: fsmonitor.c:21: fsmonitor_dirty has more entries than the index (57 > 0)


It sounds very much like the report we received in
https://lore.kernel.org/git/CAC7ZvybvykKQyMWcZoKXxFDu_amnkxZCDq2C6KHoyhmHN2tcKw@mail.gmail.com/,
but sadly that thread petered out when the reporter stopped being able to
reproduce the problem.

After a few days of investigating, I am convinced that this is due to some
old bugs, and not actually a regression in v2.40.0 (although I can believe
that some improvements in v2.40.0 might make it easier to run into these
bugs).

This patch series addresses those bugs.

Note: While the Git maintainer has stated a strong preference to introduce
regression tests in the same patch that fixes the corresponding regression,
this patch series starts with a stand-alone patch that demonstrates a
problematic scenario via a new test_expect_failure test case. The reason why
I specifically split out the test into its own commit is that there is a lot
of information to unpack in the commit message that is larger than any of
the subsequent bug fixes. Besides, it motivates not only the second patch
(which marks the test case as test_expect_success) but paints the larger
picture necessary to understand also the need for the remaining two patches.

This patch series is based on maint-2.37, the oldest maintenance branch it
applies without merge conflicts. When merging with next, there are only
trivial conflicts in unpack-trees.c due to en/dir-api-cleanup where
o->result is now o->internal.result.

Johannes Schindelin (4):
  split-index & fsmonitor: demonstrate a bug
  split-index; stop abusing the `base_oid` to strip the "link" extension
  fsmonitor: avoid overriding `cache_changed` bits
  unpack-trees: take care to propagate the split-index flag

 fsmonitor.h                  |  2 +-
 read-cache.c                 | 37 ++++++++++++++++++++++--------------
 t/t7527-builtin-fsmonitor.sh | 37 ++++++++++++++++++++++++++++++++++++
 unpack-trees.c               |  2 ++
 4 files changed, 63 insertions(+), 15 deletions(-)


base-commit: eb88fe1ff5ceb34845f0919b8bdc60d8a1703cf6
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1497%2Fdscho%2Ffix-split-index-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1497/dscho/fix-split-index-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1497
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 1/4] split-index & fsmonitor: demonstrate a bug
  2023-03-22 16:00 [PATCH 0/4] Fix a few split-index bugs Johannes Schindelin via GitGitGadget
@ 2023-03-22 16:00 ` Johannes Schindelin via GitGitGadget
  2023-03-23 13:39   ` Jeff Hostetler
  2023-03-22 16:00 ` [PATCH 2/4] split-index; stop abusing the `base_oid` to strip the "link" extension Johannes Schindelin via GitGitGadget
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 19+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2023-03-22 16:00 UTC (permalink / raw)
  To: git; +Cc: Jeff Hostetler, Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

This commit adds a new test case that demonstrates a bug in the
split-index code that is triggered under certain circumstances when the
FSMonitor is enabled, and its symptom manifests in the form of one of
the following error messages:

    BUG: fsmonitor.c:20: fsmonitor_dirty has more entries than the index (2 > 1)

    BUG: unpack-trees.c:776: pos <n> doesn't point to the first entry of <dir>/ in index

    error: invalid path ''
    error: The following untracked working tree files would be overwritten by reset:
            initial.t

Which of these error messages appears depends on timing-dependent
conditions.

Technically the root cause lies with a bug in the split-index code that
has nothing to do with FSMonitor, but for the sake of this new test case
it was the easiest way to trigger the bug.

The bug is this: Under specific conditions, Git needs to skip writing
the "link" extension (which is the index extension containing the
information pertaining to the split-index). To do that, the `base_oid`
attribute of the `split_index` structure in the in-memory index is
zeroed out, and `do_write_index()` specifically checks for a "null"
`base_oid` to understand that the "link" extension should not be
written. However, this violates the consistency of the in-memory index
structure, but that does not cause problems in most cases because the
process exits without using the in-memory index structure anymore,
anyway.

But: _When_ the in-memory index is still used (which is the case e.g. in
`git rebase`), subsequent writes of `the_index` are at risk of writing
out a bogus index file, one that _should_ have a "link" extension but
does not. In many cases, the `SPLIT_INDEX_ORDERED` flag _happens_ to be
set for subsequent writes, forcing the shared index to be written, which
re-initializes `base_oid` to a non-bogus state, and all is good.

When it is _not_ set, however, all kinds of mayhem ensue, resulting in
above-mentioned error messages, and often enough putting worktrees in a
totally broken state where the only recourse is to manually delete the
`index` and the `index.lock` files and then call `git reset` manually.
Not something to ask users to do.

The reason why it is comparatively easy to trigger the bug with
FSMonitor is that there is _another_ bug in the FSMonitor code:
`mark_fsmonitor_valid()` sets `cache_changed` to 1, i.e. treating that
variable as a Boolean. But it is a bit field, and 1 happens to be the
`SOMETHING_CHANGED` bit that forces the "link" extension to be skipped
when writing the index, among other things.

"Comparatively easy" is a relative term in this context, for sure. The
essence of how the new test case triggers the bug is as following:

1. The `git rebase` invocation will first reset the worktree to
   a commit that contains only the `one.t` file, and then execute a
   rebase script that starts with the following commands (commit hashes
   skipped):

   label onto

   reset initial
   pick two
   label two

   reset two
   pick three
   [...]

2. Before executing the `label` command, a split index is written, as
   well as the shared index.

3. The `reset initial` command in the rebase script writes out a new
   split index but skips writing the shared index, as intended.

4. The `pick two` command updates the worktree and refreshes the index,
   marking the `two.t` entry as valid via the FSMonitor, which sets the
   `SOMETHING_CHANGED` bit in `cache_changed`, which in turn causes the
   `base_oid` attribute to be zeroed out and a full (non-split) index
   to be written (making sure _not_ to write the "link" extension).

5. Now, the `reset two` command will leave the worktree alone, but
   still write out a new split index, not writing the shared index
   (because `base_oid` is still zeroed out, and there is no index entry
   update requiring it to be written, either).

6. When it is turn to run `pick three`, the index is read, but it is
   too short: It only contains a single entry when there should be two,
   because the "link" extension is missing from the written-out index
   file.

There are three bugs at play, actually, which will be fixed over the
course of the next commits:

- The `base_oid` attribute should not be zeroed out to indicate when
  the "link" extension should not be written, as it puts the in-memory
  index structure into an inconsistent state.

- The FSMonitor should not overwrite bits in `cache_changed`.

- The `unpack_trees()` function tries to reuse the `split_index`
  structure from the source index, if any, but does not propagate the
  `SPLIT_INDEX_ORDERED` flag.

While a fix for the second bug would let this test case pass, there are
other conditions where the `SOMETHING_CHANGED` bit is set. Therefore,
the bug that most crucially needs to be fixed is the first one.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 t/t7527-builtin-fsmonitor.sh | 37 ++++++++++++++++++++++++++++++++++++
 1 file changed, 37 insertions(+)

diff --git a/t/t7527-builtin-fsmonitor.sh b/t/t7527-builtin-fsmonitor.sh
index d419085379c..cbafdd69602 100755
--- a/t/t7527-builtin-fsmonitor.sh
+++ b/t/t7527-builtin-fsmonitor.sh
@@ -1003,4 +1003,41 @@ test_expect_success !UNICODE_COMPOSITION_SENSITIVE 'Unicode nfc/nfd' '
 	egrep "^event: nfd/d_${utf8_nfc}/?$" ./unicode.trace
 '
 
+test_expect_failure 'split-index and FSMonitor work well together' '
+	git init split-index &&
+	test_when_finished "git -C \"$PWD/split-index\" \
+		fsmonitor--daemon stop" &&
+	(
+		cd split-index &&
+		git config core.splitIndex true &&
+		# force split-index in most cases
+		git config splitIndex.maxPercentChange 99 &&
+		git config core.fsmonitor true &&
+
+		# Create the following commit topology:
+		#
+		# *   merge three
+		# |\
+		# | * three
+		# * | merge two
+		# |\|
+		# | * two
+		# * | one
+		# |/
+		# * 5a5efd7 initial
+
+		test_commit initial &&
+		test_commit two &&
+		test_commit three &&
+		git reset --hard initial &&
+		test_commit one &&
+		test_tick &&
+		git merge two &&
+		test_tick &&
+		git merge three &&
+
+		git rebase --force-rebase -r one
+	)
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 2/4] split-index; stop abusing the `base_oid` to strip the "link" extension
  2023-03-22 16:00 [PATCH 0/4] Fix a few split-index bugs Johannes Schindelin via GitGitGadget
  2023-03-22 16:00 ` [PATCH 1/4] split-index & fsmonitor: demonstrate a bug Johannes Schindelin via GitGitGadget
@ 2023-03-22 16:00 ` Johannes Schindelin via GitGitGadget
  2023-03-22 21:24   ` Junio C Hamano
  2023-03-23 15:22   ` Jeff Hostetler
  2023-03-22 16:00 ` [PATCH 3/4] fsmonitor: avoid overriding `cache_changed` bits Johannes Schindelin via GitGitGadget
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 19+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2023-03-22 16:00 UTC (permalink / raw)
  To: git; +Cc: Jeff Hostetler, Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

When a split-index is in effect, the `$GIT_DIR/index` file needs to
contain a "link" extension that contains all the information about the
split-index, including the information about the shared index.

However, in some cases Git needs to suppress writing that "link"
extension (i.e. to fall back to writing a full index) even if the
in-memory index structure _has_ a `split_index` configured. This is the
case e.g. when "too many not shared" index entries exist.

In such instances, the current code sets the `base_oid` field of said
`split_index` structure to all-zero to indicate that `do_write_index()`
should skip writing the "link" extension.

This can lead to problems later on, when the in-memory index is still
used to perform other operations and eventually wants to write a
split-index, detects the presence of the `split_index` and reuses that,
too (under the assumption that it has been initialized correctly and
still has a non-null `base_oid`).

Let's stop zeroing out the `base_oid` to indicate that the "link"
extension should not be written.

One might be tempted to simply call `discard_split_index()` instead,
under the assumption that Git decided to write a non-split index and
therefore the the `split_index` structure might no longer be wanted.
However, that is not possible because that would release index entries
in `split_index->base` that are likely to still be in use. Therefore we
cannot do that.

The next best thing we _can_ do is to introduce a flag, specifically
indicating when the "link" extension should be skipped. So that's what
we do here.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 read-cache.c                 | 37 ++++++++++++++++++++++--------------
 t/t7527-builtin-fsmonitor.sh |  2 +-
 2 files changed, 24 insertions(+), 15 deletions(-)

diff --git a/read-cache.c b/read-cache.c
index b09128b1884..8fcb2d54c05 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -2868,6 +2868,12 @@ static int record_ieot(void)
 	return !git_config_get_index_threads(&val) && val != 1;
 }
 
+enum strip_extensions {
+	WRITE_ALL_EXTENSIONS = 0,
+	STRIP_ALL_EXTENSIONS = 1,
+	STRIP_LINK_EXTENSION_ONLY = 2
+};
+
 /*
  * On success, `tempfile` is closed. If it is the temporary file
  * of a `struct lock_file`, we will therefore effectively perform
@@ -2876,7 +2882,7 @@ static int record_ieot(void)
  * rely on it.
  */
 static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
-			  int strip_extensions, unsigned flags)
+			  enum strip_extensions strip_extensions, unsigned flags)
 {
 	uint64_t start = getnanotime();
 	struct hashfile *f;
@@ -3045,7 +3051,7 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
 			return -1;
 	}
 
-	if (!strip_extensions && istate->split_index &&
+	if (strip_extensions == WRITE_ALL_EXTENSIONS && istate->split_index &&
 	    !is_null_oid(&istate->split_index->base_oid)) {
 		struct strbuf sb = STRBUF_INIT;
 
@@ -3060,7 +3066,7 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
 		if (err)
 			return -1;
 	}
-	if (!strip_extensions && !drop_cache_tree && istate->cache_tree) {
+	if (strip_extensions != STRIP_ALL_EXTENSIONS && !drop_cache_tree && istate->cache_tree) {
 		struct strbuf sb = STRBUF_INIT;
 
 		cache_tree_write(&sb, istate->cache_tree);
@@ -3070,7 +3076,7 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
 		if (err)
 			return -1;
 	}
-	if (!strip_extensions && istate->resolve_undo) {
+	if (strip_extensions != STRIP_ALL_EXTENSIONS && istate->resolve_undo) {
 		struct strbuf sb = STRBUF_INIT;
 
 		resolve_undo_write(&sb, istate->resolve_undo);
@@ -3081,7 +3087,7 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
 		if (err)
 			return -1;
 	}
-	if (!strip_extensions && istate->untracked) {
+	if (strip_extensions != STRIP_ALL_EXTENSIONS && istate->untracked) {
 		struct strbuf sb = STRBUF_INIT;
 
 		write_untracked_extension(&sb, istate->untracked);
@@ -3092,7 +3098,7 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
 		if (err)
 			return -1;
 	}
-	if (!strip_extensions && istate->fsmonitor_last_update) {
+	if (strip_extensions != STRIP_ALL_EXTENSIONS && istate->fsmonitor_last_update) {
 		struct strbuf sb = STRBUF_INIT;
 
 		write_fsmonitor_extension(&sb, istate);
@@ -3166,8 +3172,14 @@ static int commit_locked_index(struct lock_file *lk)
 		return commit_lock_file(lk);
 }
 
+/*
+ * Write the Git index into a `.lock` file
+ *
+ * If `strip_link_extension` is non-zero, avoid writing any "link" extension
+ * (used by the split-index feature).
+ */
 static int do_write_locked_index(struct index_state *istate, struct lock_file *lock,
-				 unsigned flags)
+				 unsigned flags, int strip_link_extension)
 {
 	int ret;
 	int was_full = istate->sparse_index == INDEX_EXPANDED;
@@ -3185,7 +3197,7 @@ static int do_write_locked_index(struct index_state *istate, struct lock_file *l
 	 */
 	trace2_region_enter_printf("index", "do_write_index", the_repository,
 				   "%s", get_lock_file_path(lock));
-	ret = do_write_index(istate, lock->tempfile, 0, flags);
+	ret = do_write_index(istate, lock->tempfile, strip_link_extension ? STRIP_LINK_EXTENSION_ONLY : 0, flags);
 	trace2_region_leave_printf("index", "do_write_index", the_repository,
 				   "%s", get_lock_file_path(lock));
 
@@ -3214,7 +3226,7 @@ static int write_split_index(struct index_state *istate,
 {
 	int ret;
 	prepare_to_write_split_index(istate);
-	ret = do_write_locked_index(istate, lock, flags);
+	ret = do_write_locked_index(istate, lock, flags, 0);
 	finish_writing_split_index(istate);
 	return ret;
 }
@@ -3366,9 +3378,7 @@ int write_locked_index(struct index_state *istate, struct lock_file *lock,
 	if ((!si && !test_split_index_env) ||
 	    alternate_index_output ||
 	    (istate->cache_changed & ~EXTMASK)) {
-		if (si)
-			oidclr(&si->base_oid);
-		ret = do_write_locked_index(istate, lock, flags);
+		ret = do_write_locked_index(istate, lock, flags, 1);
 		goto out;
 	}
 
@@ -3394,8 +3404,7 @@ int write_locked_index(struct index_state *istate, struct lock_file *lock,
 		/* Same initial permissions as the main .git/index file */
 		temp = mks_tempfile_sm(git_path("sharedindex_XXXXXX"), 0, 0666);
 		if (!temp) {
-			oidclr(&si->base_oid);
-			ret = do_write_locked_index(istate, lock, flags);
+			ret = do_write_locked_index(istate, lock, flags, 1);
 			goto out;
 		}
 		ret = write_shared_index(istate, &temp, flags);
diff --git a/t/t7527-builtin-fsmonitor.sh b/t/t7527-builtin-fsmonitor.sh
index cbafdd69602..9fab9a2ab38 100755
--- a/t/t7527-builtin-fsmonitor.sh
+++ b/t/t7527-builtin-fsmonitor.sh
@@ -1003,7 +1003,7 @@ test_expect_success !UNICODE_COMPOSITION_SENSITIVE 'Unicode nfc/nfd' '
 	egrep "^event: nfd/d_${utf8_nfc}/?$" ./unicode.trace
 '
 
-test_expect_failure 'split-index and FSMonitor work well together' '
+test_expect_success 'split-index and FSMonitor work well together' '
 	git init split-index &&
 	test_when_finished "git -C \"$PWD/split-index\" \
 		fsmonitor--daemon stop" &&
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 3/4] fsmonitor: avoid overriding `cache_changed` bits
  2023-03-22 16:00 [PATCH 0/4] Fix a few split-index bugs Johannes Schindelin via GitGitGadget
  2023-03-22 16:00 ` [PATCH 1/4] split-index & fsmonitor: demonstrate a bug Johannes Schindelin via GitGitGadget
  2023-03-22 16:00 ` [PATCH 2/4] split-index; stop abusing the `base_oid` to strip the "link" extension Johannes Schindelin via GitGitGadget
@ 2023-03-22 16:00 ` Johannes Schindelin via GitGitGadget
  2023-03-22 16:00 ` [PATCH 4/4] unpack-trees: take care to propagate the split-index flag Johannes Schindelin via GitGitGadget
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 19+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2023-03-22 16:00 UTC (permalink / raw)
  To: git; +Cc: Jeff Hostetler, Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

As of e636a7b4d030 (read-cache: be specific what part of the index has
changed, 2014-06-13), the paradigm `cache_changed = 1` fell out of
fashion and it became a bit field instead.

This is important because some bits have specific meaning and should not
be unset without care, e.g. `SPLIT_INDEX_ORDERED`.

However, b5a816975206 (mark_fsmonitor_valid(): mark the index as changed
if needed, 2019-05-24) did use the `cache_changed` attribute as if it
were a Boolean instead of a bit field.

That not only would override the `SPLIT_INDEX_ORDERED` bit when marking
index entries as valid via the FSMonitor, but worse: it would set the
`SOMETHING_OTHER` bit (whose value is 1). This means that Git would
unnecessarily force a full index to be written out when a split index
was asked for.

Let's instead use the bit that is specifically intended to indicate
FSMonitor-triggered changes, allowing the split-index feature to work as
designed.

Noticed-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 fsmonitor.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fsmonitor.h b/fsmonitor.h
index edf7ce5203b..778707b131b 100644
--- a/fsmonitor.h
+++ b/fsmonitor.h
@@ -86,7 +86,7 @@ static inline void mark_fsmonitor_valid(struct index_state *istate, struct cache
 	    !(ce->ce_flags & CE_FSMONITOR_VALID)) {
 		if (S_ISGITLINK(ce->ce_mode))
 			return;
-		istate->cache_changed = 1;
+		istate->cache_changed |= FSMONITOR_CHANGED;
 		ce->ce_flags |= CE_FSMONITOR_VALID;
 		trace_printf_key(&trace_fsmonitor, "mark_fsmonitor_clean '%s'", ce->name);
 	}
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 4/4] unpack-trees: take care to propagate the split-index flag
  2023-03-22 16:00 [PATCH 0/4] Fix a few split-index bugs Johannes Schindelin via GitGitGadget
                   ` (2 preceding siblings ...)
  2023-03-22 16:00 ` [PATCH 3/4] fsmonitor: avoid overriding `cache_changed` bits Johannes Schindelin via GitGitGadget
@ 2023-03-22 16:00 ` Johannes Schindelin via GitGitGadget
  2023-03-23 14:32   ` Jeff Hostetler
  2023-03-22 16:22 ` [PATCH 0/4] Fix a few split-index bugs Junio C Hamano
  2023-03-26 22:45 ` [PATCH v2 " Johannes Schindelin via GitGitGadget
  5 siblings, 1 reply; 19+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2023-03-22 16:00 UTC (permalink / raw)
  To: git; +Cc: Jeff Hostetler, Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

When copying the `split_index` structure from one index structure to
another, we need to propagate the `SPLIT_INDEX_ORDERED` flag, too, if it
is set, otherwise Git might forget to write the shared index when that
is actually needed.

It just so _happens_ that in many instances when `unpack_trees()` is
called, the result causes the shared index to be written anyway, but
there are edge cases when that is not so.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 unpack-trees.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/unpack-trees.c b/unpack-trees.c
index 90b92114be8..ca5e47c77c0 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -1916,6 +1916,8 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
 		 * create a new one.
 		 */
 		o->result.split_index = o->src_index->split_index;
+		if (o->src_index->cache_changed & SPLIT_INDEX_ORDERED)
+			o->result.cache_changed |= SPLIT_INDEX_ORDERED;
 		o->result.split_index->refcount++;
 	} else {
 		o->result.split_index = init_split_index(&o->result);
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH 0/4] Fix a few split-index bugs
  2023-03-22 16:00 [PATCH 0/4] Fix a few split-index bugs Johannes Schindelin via GitGitGadget
                   ` (3 preceding siblings ...)
  2023-03-22 16:00 ` [PATCH 4/4] unpack-trees: take care to propagate the split-index flag Johannes Schindelin via GitGitGadget
@ 2023-03-22 16:22 ` Junio C Hamano
  2023-03-26 22:45 ` [PATCH v2 " Johannes Schindelin via GitGitGadget
  5 siblings, 0 replies; 19+ messages in thread
From: Junio C Hamano @ 2023-03-22 16:22 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget
  Cc: git, Jeff Hostetler, Johannes Schindelin

"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> Note: While the Git maintainer has stated a strong preference to introduce
> regression tests in the same patch that fixes the corresponding regression,
> this patch series starts with a stand-alone patch that demonstrates a
> problematic scenario via a new test_expect_failure test case.

It is fine, especially to show existing/old bugs that need extensive
explanation.

> This patch series is based on maint-2.37, the oldest maintenance branch it
> applies without merge conflicts. When merging with next, there are only
> trivial conflicts in unpack-trees.c due to en/dir-api-cleanup where
> o->result is now o->internal.result.

Thanks for digging into old and important case.  Maintenance of the
index data structure is a crucial part of the health of the system.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/4] split-index; stop abusing the `base_oid` to strip the "link" extension
  2023-03-22 16:00 ` [PATCH 2/4] split-index; stop abusing the `base_oid` to strip the "link" extension Johannes Schindelin via GitGitGadget
@ 2023-03-22 21:24   ` Junio C Hamano
  2023-03-23 13:59     ` Jeff Hostetler
  2023-03-23 15:22   ` Jeff Hostetler
  1 sibling, 1 reply; 19+ messages in thread
From: Junio C Hamano @ 2023-03-22 21:24 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget
  Cc: git, Jeff Hostetler, Johannes Schindelin

"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>
> When a split-index is in effect, the `$GIT_DIR/index` file needs to
> contain a "link" extension that contains all the information about the
> split-index, including the information about the shared index.
> ...
> Let's stop zeroing out the `base_oid` to indicate that the "link"
> extension should not be written.

Nicely explained.

> One might be tempted to simply call `discard_split_index()` instead,
> under the assumption that Git decided to write a non-split index and
> therefore the the `split_index` structure might no longer be wanted.

"the the".

> +enum strip_extensions {
> +	WRITE_ALL_EXTENSIONS = 0,
> +	STRIP_ALL_EXTENSIONS = 1,
> +	STRIP_LINK_EXTENSION_ONLY = 2
> +};

We do not need to spell out the specific values for this enum; the
users' (i.e. the callers of do_write_index()) sole requirement is
for these symbols to have different values.

Also do we envision that (1) we would need to keep STRIP_LINK_ONLY
to be with the largest value among the enum values, or (2) we would
never add new value to the set?  Otherwise let's end the last one
with a trailing comma.

Looking at the way strip_extensions variable is used in
do_write_index(), an alternative design might be to make it a set of
bits (e.g. unsigned write_extension) and give one bit to each
extension.  But such a clean-up is better left outside the topic, I
would imagine, as we do not have any need to skip an arbitrary set
of extensions right now.

> +/*
> + * Write the Git index into a `.lock` file
> + *
> + * If `strip_link_extension` is non-zero, avoid writing any "link" extension
> + * (used by the split-index feature).
> + */

Not exposing "enum strip_extensions" to the caller of this function,
like this patch does, is probably a very safe and sensible thing to
do.  We do not have a reason to allow its callers to (perhaps
mistakenly) pass STRIP_ALL_EXTENSIONS to it.

>  static int do_write_locked_index(struct index_state *istate, struct lock_file *lock,
> -				 unsigned flags)
> +				 unsigned flags, int strip_link_extension)
>  {
>  	int ret;
>  	int was_full = istate->sparse_index == INDEX_EXPANDED;
> @@ -3185,7 +3197,7 @@ static int do_write_locked_index(struct index_state *istate, struct lock_file *l
>  	 */
>  	trace2_region_enter_printf("index", "do_write_index", the_repository,
>  				   "%s", get_lock_file_path(lock));
> -	ret = do_write_index(istate, lock->tempfile, 0, flags);
> +	ret = do_write_index(istate, lock->tempfile, strip_link_extension ? STRIP_LINK_EXTENSION_ONLY : 0, flags);
>  	trace2_region_leave_printf("index", "do_write_index", the_repository,
>  				   "%s", get_lock_file_path(lock));
>  

OK.

Very nicely done.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/4] split-index & fsmonitor: demonstrate a bug
  2023-03-22 16:00 ` [PATCH 1/4] split-index & fsmonitor: demonstrate a bug Johannes Schindelin via GitGitGadget
@ 2023-03-23 13:39   ` Jeff Hostetler
  0 siblings, 0 replies; 19+ messages in thread
From: Jeff Hostetler @ 2023-03-23 13:39 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget, git; +Cc: Johannes Schindelin



On 3/22/23 12:00 PM, Johannes Schindelin via GitGitGadget wrote:
> From: Johannes Schindelin <johannes.schindelin@gmx.de>
> 
> This commit adds a new test case that demonstrates a bug in the
> split-index code that is triggered under certain circumstances when the
> FSMonitor is enabled, and its symptom manifests in the form of one of
> the following error messages:
> 
>      BUG: fsmonitor.c:20: fsmonitor_dirty has more entries than the index (2 > 1)
> 
>      BUG: unpack-trees.c:776: pos <n> doesn't point to the first entry of <dir>/ in index
> 
>      error: invalid path ''
>      error: The following untracked working tree files would be overwritten by reset:
>              initial.t
> 
> Which of these error messages appears depends on timing-dependent
> conditions.
> 
> Technically the root cause lies with a bug in the split-index code that
> has nothing to do with FSMonitor, but for the sake of this new test case
> it was the easiest way to trigger the bug.
> 
> The bug is this: Under specific conditions, Git needs to skip writing
> the "link" extension (which is the index extension containing the
> information pertaining to the split-index). To do that, the `base_oid`
> attribute of the `split_index` structure in the in-memory index is
> zeroed out, and `do_write_index()` specifically checks for a "null"
> `base_oid` to understand that the "link" extension should not be
> written. However, this violates the consistency of the in-memory index
> structure, but that does not cause problems in most cases because the
> process exits without using the in-memory index structure anymore,
> anyway.
> 
> But: _When_ the in-memory index is still used (which is the case e.g. in
> `git rebase`), subsequent writes of `the_index` are at risk of writing
> out a bogus index file, one that _should_ have a "link" extension but
> does not. In many cases, the `SPLIT_INDEX_ORDERED` flag _happens_ to be
> set for subsequent writes, forcing the shared index to be written, which
> re-initializes `base_oid` to a non-bogus state, and all is good.
> 
> When it is _not_ set, however, all kinds of mayhem ensue, resulting in
> above-mentioned error messages, and often enough putting worktrees in a
> totally broken state where the only recourse is to manually delete the
> `index` and the `index.lock` files and then call `git reset` manually.
> Not something to ask users to do.
> 
> The reason why it is comparatively easy to trigger the bug with
> FSMonitor is that there is _another_ bug in the FSMonitor code:
> `mark_fsmonitor_valid()` sets `cache_changed` to 1, i.e. treating that
> variable as a Boolean. But it is a bit field, and 1 happens to be the
> `SOMETHING_CHANGED` bit that forces the "link" extension to be skipped
> when writing the index, among other things.
> 
> "Comparatively easy" is a relative term in this context, for sure. The
> essence of how the new test case triggers the bug is as following:
> 
> 1. The `git rebase` invocation will first reset the worktree to
>     a commit that contains only the `one.t` file, and then execute a
>     rebase script that starts with the following commands (commit hashes
>     skipped):
> 
>     label onto
> 
>     reset initial
>     pick two
>     label two
> 
>     reset two
>     pick three
>     [...]
> 
> 2. Before executing the `label` command, a split index is written, as
>     well as the shared index.
> 
> 3. The `reset initial` command in the rebase script writes out a new
>     split index but skips writing the shared index, as intended.
> 
> 4. The `pick two` command updates the worktree and refreshes the index,
>     marking the `two.t` entry as valid via the FSMonitor, which sets the
>     `SOMETHING_CHANGED` bit in `cache_changed`, which in turn causes the
>     `base_oid` attribute to be zeroed out and a full (non-split) index
>     to be written (making sure _not_ to write the "link" extension).
> 
> 5. Now, the `reset two` command will leave the worktree alone, but
>     still write out a new split index, not writing the shared index
>     (because `base_oid` is still zeroed out, and there is no index entry
>     update requiring it to be written, either).
> 
> 6. When it is turn to run `pick three`, the index is read, but it is
>     too short: It only contains a single entry when there should be two,
>     because the "link" extension is missing from the written-out index
>     file.
> 
> There are three bugs at play, actually, which will be fixed over the
> course of the next commits:
> 
> - The `base_oid` attribute should not be zeroed out to indicate when
>    the "link" extension should not be written, as it puts the in-memory
>    index structure into an inconsistent state.
> 
> - The FSMonitor should not overwrite bits in `cache_changed`.
> 
> - The `unpack_trees()` function tries to reuse the `split_index`
>    structure from the source index, if any, but does not propagate the
>    `SPLIT_INDEX_ORDERED` flag.
> 
> While a fix for the second bug would let this test case pass, there are
> other conditions where the `SOMETHING_CHANGED` bit is set. Therefore,
> the bug that most crucially needs to be fixed is the first one.
> 
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>

Very well said.  Thank you!!!

> ---
>   t/t7527-builtin-fsmonitor.sh | 37 ++++++++++++++++++++++++++++++++++++
>   1 file changed, 37 insertions(+)
> 
> diff --git a/t/t7527-builtin-fsmonitor.sh b/t/t7527-builtin-fsmonitor.sh
> index d419085379c..cbafdd69602 100755
> --- a/t/t7527-builtin-fsmonitor.sh
> +++ b/t/t7527-builtin-fsmonitor.sh
> @@ -1003,4 +1003,41 @@ test_expect_success !UNICODE_COMPOSITION_SENSITIVE 'Unicode nfc/nfd' '
>   	egrep "^event: nfd/d_${utf8_nfc}/?$" ./unicode.trace
>   '
>   
> +test_expect_failure 'split-index and FSMonitor work well together' '
> +	git init split-index &&
> +	test_when_finished "git -C \"$PWD/split-index\" \
> +		fsmonitor--daemon stop" &&
> +	(
> +		cd split-index &&
> +		git config core.splitIndex true &&
> +		# force split-index in most cases
> +		git config splitIndex.maxPercentChange 99 &&
> +		git config core.fsmonitor true &&
> +
> +		# Create the following commit topology:
> +		#
> +		# *   merge three
> +		# |\
> +		# | * three
> +		# * | merge two
> +		# |\|
> +		# | * two
> +		# * | one
> +		# |/
> +		# * 5a5efd7 initial
> +
> +		test_commit initial &&
> +		test_commit two &&
> +		test_commit three &&
> +		git reset --hard initial &&
> +		test_commit one &&
> +		test_tick &&
> +		git merge two &&
> +		test_tick &&
> +		git merge three &&
> +
> +		git rebase --force-rebase -r one
> +	)
> +'
> +
>   test_done

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/4] split-index; stop abusing the `base_oid` to strip the "link" extension
  2023-03-22 21:24   ` Junio C Hamano
@ 2023-03-23 13:59     ` Jeff Hostetler
  2023-03-23 16:06       ` Junio C Hamano
  0 siblings, 1 reply; 19+ messages in thread
From: Jeff Hostetler @ 2023-03-23 13:59 UTC (permalink / raw)
  To: Junio C Hamano, Johannes Schindelin via GitGitGadget
  Cc: git, Johannes Schindelin



On 3/22/23 5:24 PM, Junio C Hamano wrote:
> "Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
> writes:
> 
>> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>>
>> When a split-index is in effect, the `$GIT_DIR/index` file needs to
>> contain a "link" extension that contains all the information about the
>> split-index, including the information about the shared index.
>> ...
>> Let's stop zeroing out the `base_oid` to indicate that the "link"
>> extension should not be written.
> 
> Nicely explained.
> 
>> One might be tempted to simply call `discard_split_index()` instead,
>> under the assumption that Git decided to write a non-split index and
>> therefore the the `split_index` structure might no longer be wanted.
> 
> "the the".
> 
>> +enum strip_extensions {
>> +	WRITE_ALL_EXTENSIONS = 0,
>> +	STRIP_ALL_EXTENSIONS = 1,
>> +	STRIP_LINK_EXTENSION_ONLY = 2
>> +};
> 
> We do not need to spell out the specific values for this enum; the
> users' (i.e. the callers of do_write_index()) sole requirement is
> for these symbols to have different values.

There are several calls to do_write_locked_index() that pass 0 or 1
as the new final arg.  If we update them to use these enum values,
then we don't need integer values here.

> 
> Also do we envision that (1) we would need to keep STRIP_LINK_ONLY
> to be with the largest value among the enum values, or (2) we would
> never add new value to the set?  Otherwise let's end the last one
> with a trailing comma.
> 
> Looking at the way strip_extensions variable is used in
> do_write_index(), an alternative design might be to make it a set of
> bits (e.g. unsigned write_extension) and give one bit to each
> extension.  But such a clean-up is better left outside the topic, I
> would imagine, as we do not have any need to skip an arbitrary set
> of extensions right now.

Agreed, I thought about suggesting a set of bits too, but right now
we only need to strip all of them or just this one.

> 
>> +/*
>> + * Write the Git index into a `.lock` file
>> + *
>> + * If `strip_link_extension` is non-zero, avoid writing any "link" extension
>> + * (used by the split-index feature).
>> + */
> 
> Not exposing "enum strip_extensions" to the caller of this function,
> like this patch does, is probably a very safe and sensible thing to
> do.  We do not have a reason to allow its callers to (perhaps
> mistakenly) pass STRIP_ALL_EXTENSIONS to it.
> 
>>   static int do_write_locked_index(struct index_state *istate, struct lock_file *lock,
>> -				 unsigned flags)
>> +				 unsigned flags, int strip_link_extension)
>>   {
>>   	int ret;
>>   	int was_full = istate->sparse_index == INDEX_EXPANDED;
>> @@ -3185,7 +3197,7 @@ static int do_write_locked_index(struct index_state *istate, struct lock_file *l
>>   	 */
>>   	trace2_region_enter_printf("index", "do_write_index", the_repository,
>>   				   "%s", get_lock_file_path(lock));
>> -	ret = do_write_index(istate, lock->tempfile, 0, flags);
>> +	ret = do_write_index(istate, lock->tempfile, strip_link_extension ? STRIP_LINK_EXTENSION_ONLY : 0, flags);

In the else of the ?: operator, could we use the WRITE_ALL_EXTENSIONS
instead of 0?

>>   	trace2_region_leave_printf("index", "do_write_index", the_repository,
>>   				   "%s", get_lock_file_path(lock));
>>   
> 
> OK.
> 
> Very nicely done.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 4/4] unpack-trees: take care to propagate the split-index flag
  2023-03-22 16:00 ` [PATCH 4/4] unpack-trees: take care to propagate the split-index flag Johannes Schindelin via GitGitGadget
@ 2023-03-23 14:32   ` Jeff Hostetler
  0 siblings, 0 replies; 19+ messages in thread
From: Jeff Hostetler @ 2023-03-23 14:32 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget, git; +Cc: Johannes Schindelin



On 3/22/23 12:00 PM, Johannes Schindelin via GitGitGadget wrote:
> From: Johannes Schindelin <johannes.schindelin@gmx.de>
> 
> When copying the `split_index` structure from one index structure to
> another, we need to propagate the `SPLIT_INDEX_ORDERED` flag, too, if it
> is set, otherwise Git might forget to write the shared index when that
> is actually needed.
> 
> It just so _happens_ that in many instances when `unpack_trees()` is
> called, the result causes the shared index to be written anyway, but
> there are edge cases when that is not so.
> 
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
>   unpack-trees.c | 2 ++
>   1 file changed, 2 insertions(+)
> 
> diff --git a/unpack-trees.c b/unpack-trees.c
> index 90b92114be8..ca5e47c77c0 100644
> --- a/unpack-trees.c
> +++ b/unpack-trees.c
> @@ -1916,6 +1916,8 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
>   		 * create a new one.
>   		 */
>   		o->result.split_index = o->src_index->split_index;
> +		if (o->src_index->cache_changed & SPLIT_INDEX_ORDERED)
> +			o->result.cache_changed |= SPLIT_INDEX_ORDERED;

Nice find!

>   		o->result.split_index->refcount++;
>   	} else {
>   		o->result.split_index = init_split_index(&o->result);

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/4] split-index; stop abusing the `base_oid` to strip the "link" extension
  2023-03-22 16:00 ` [PATCH 2/4] split-index; stop abusing the `base_oid` to strip the "link" extension Johannes Schindelin via GitGitGadget
  2023-03-22 21:24   ` Junio C Hamano
@ 2023-03-23 15:22   ` Jeff Hostetler
  1 sibling, 0 replies; 19+ messages in thread
From: Jeff Hostetler @ 2023-03-23 15:22 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget, git; +Cc: Johannes Schindelin



On 3/22/23 12:00 PM, Johannes Schindelin via GitGitGadget wrote:
> From: Johannes Schindelin <johannes.schindelin@gmx.de>
> 
> When a split-index is in effect, the `$GIT_DIR/index` file needs to
> contain a "link" extension that contains all the information about the
> split-index, including the information about the shared index.
> 
> However, in some cases Git needs to suppress writing that "link"
> extension (i.e. to fall back to writing a full index) even if the
> in-memory index structure _has_ a `split_index` configured. This is the
> case e.g. when "too many not shared" index entries exist.
> 
> In such instances, the current code sets the `base_oid` field of said
> `split_index` structure to all-zero to indicate that `do_write_index()`
> should skip writing the "link" extension.
> 
> This can lead to problems later on, when the in-memory index is still
> used to perform other operations and eventually wants to write a
> split-index, detects the presence of the `split_index` and reuses that,
> too (under the assumption that it has been initialized correctly and
> still has a non-null `base_oid`).
> 
> Let's stop zeroing out the `base_oid` to indicate that the "link"
> extension should not be written.
> 
> One might be tempted to simply call `discard_split_index()` instead,
> under the assumption that Git decided to write a non-split index and
> therefore the the `split_index` structure might no longer be wanted.
> However, that is not possible because that would release index entries
> in `split_index->base` that are likely to still be in use. Therefore we
> cannot do that.
> 
> The next best thing we _can_ do is to introduce a flag, specifically
> indicating when the "link" extension should be skipped. So that's what
> we do here.
> 
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
>   read-cache.c                 | 37 ++++++++++++++++++++++--------------
>   t/t7527-builtin-fsmonitor.sh |  2 +-
>   2 files changed, 24 insertions(+), 15 deletions(-)
> 
> diff --git a/read-cache.c b/read-cache.c
> index b09128b1884..8fcb2d54c05 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -2868,6 +2868,12 @@ static int record_ieot(void)
>   	return !git_config_get_index_threads(&val) && val != 1;
>   }
>   
> +enum strip_extensions {
> +	WRITE_ALL_EXTENSIONS = 0,
> +	STRIP_ALL_EXTENSIONS = 1,
> +	STRIP_LINK_EXTENSION_ONLY = 2
> +};

Earlier (in a response to Junio's response on this commit) I said
that I didn't think we needed to make a bit set here, but I want
to re-think that or at least walk thru the change and talk out loud.

I'll explain in-line below.

> +
>   /*
>    * On success, `tempfile` is closed. If it is the temporary file
>    * of a `struct lock_file`, we will therefore effectively perform
> @@ -2876,7 +2882,7 @@ static int record_ieot(void)
>    * rely on it.
>    */
>   static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
> -			  int strip_extensions, unsigned flags)
> +			  enum strip_extensions strip_extensions, unsigned flags)
>   {
>   	uint64_t start = getnanotime();
>   	struct hashfile *f;
> @@ -3045,7 +3051,7 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
>   			return -1;
>   	}
>   
> -	if (!strip_extensions && istate->split_index &&
> +	if (strip_extensions == WRITE_ALL_EXTENSIONS && istate->split_index &&
>   	    !is_null_oid(&istate->split_index->base_oid)) {

(I hate all of this double negative logic...)

Here we only want the extension if we have WRITE_ALL, so that is
NOT STRIP_ALL and NOT STRIP_LINK_ONLY, so that is OK.

>   		struct strbuf sb = STRBUF_INIT;
>   
> @@ -3060,7 +3066,7 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
>   		if (err)
>   			return -1;
>   	}
> -	if (!strip_extensions && !drop_cache_tree && istate->cache_tree) {
> +	if (strip_extensions != STRIP_ALL_EXTENSIONS && !drop_cache_tree && istate->cache_tree) {

Here we only want the extension when NOT STRIP_ALL, so this is
either WRITE_ALL or STRIP_LINK_ONLY, so this is OK.  The rest are
the same, so I'll omit them.

[...]

All of this looks correct, but I stumbled over things on my first
or second reading.  I wonder if it would it simplify things to define
this as:

enum strip_extensions {
	WRITE_ALL_EXTENSIONS   = 0,
	STRIP_LINK_EXTENSION   = (1<0),
	STRIP_OTHER_EXTENSIONS = (1<1),
	STRIP_ALL_EXTENSIONS   = (STRIP_LINK_EXTENSION
				 | STRIP_OTHER_EXTENSIONS),
};

Then the link test becomes:
	if ( ! (strip_extensions & STRIP_LINK_EXTENSION) &&
	    istate->split_index &&
	    ...) {

and the others become:

	if ( ! (strip_extensions & STRIP_OTHER_EXTENSIONS) &&
	    ...) {

If we need to add the ability later to strip an individual,
we can easily add a bit to the enum and update the _ALL_ mask
and the corresponding `if` test.

In a later commit (probably in another series), we can invert
these double negatives to improve readability.

> +/*
> + * Write the Git index into a `.lock` file
> + *
> + * If `strip_link_extension` is non-zero, avoid writing any "link" extension
> + * (used by the split-index feature).
> + */
>   static int do_write_locked_index(struct index_state *istate, struct lock_file *lock,
> -				 unsigned flags)
> +				 unsigned flags, int strip_link_extension)
>   {
>   	int ret;
>   	int was_full = istate->sparse_index == INDEX_EXPANDED;
> @@ -3185,7 +3197,7 @@ static int do_write_locked_index(struct index_state *istate, struct lock_file *l
>   	 */
>   	trace2_region_enter_printf("index", "do_write_index", the_repository,
>   				   "%s", get_lock_file_path(lock));
> -	ret = do_write_index(istate, lock->tempfile, 0, flags);
> +	ret = do_write_index(istate, lock->tempfile, strip_link_extension ? STRIP_LINK_EXTENSION_ONLY : 0, flags);
>   	trace2_region_leave_printf("index", "do_write_index", the_repository,
>   				   "%s", get_lock_file_path(lock));
>   
> @@ -3214,7 +3226,7 @@ static int write_split_index(struct index_state *istate,
>   {
>   	int ret;
>   	prepare_to_write_split_index(istate);
> -	ret = do_write_locked_index(istate, lock, flags);
> +	ret = do_write_locked_index(istate, lock, flags, 0);

could we use the enum values here instead of 0 ?

>   	finish_writing_split_index(istate);
>   	return ret;
>   }
> @@ -3366,9 +3378,7 @@ int write_locked_index(struct index_state *istate, struct lock_file *lock,
>   	if ((!si && !test_split_index_env) ||
>   	    alternate_index_output ||
>   	    (istate->cache_changed & ~EXTMASK)) {
> -		if (si)
> -			oidclr(&si->base_oid);
> -		ret = do_write_locked_index(istate, lock, flags);
> +		ret = do_write_locked_index(istate, lock, flags, 1);

and here

>   		goto out;
>   	}
>   
> @@ -3394,8 +3404,7 @@ int write_locked_index(struct index_state *istate, struct lock_file *lock,
>   		/* Same initial permissions as the main .git/index file */
>   		temp = mks_tempfile_sm(git_path("sharedindex_XXXXXX"), 0, 0666);
>   		if (!temp) {
> -			oidclr(&si->base_oid);
> -			ret = do_write_locked_index(istate, lock, flags);
> +			ret = do_write_locked_index(istate, lock, flags, 1);

and here

>   			goto out;
>   		}
>   		ret = write_shared_index(istate, &temp, flags);
> diff --git a/t/t7527-builtin-fsmonitor.sh b/t/t7527-builtin-fsmonitor.sh
> index cbafdd69602..9fab9a2ab38 100755
> --- a/t/t7527-builtin-fsmonitor.sh
> +++ b/t/t7527-builtin-fsmonitor.sh
> @@ -1003,7 +1003,7 @@ test_expect_success !UNICODE_COMPOSITION_SENSITIVE 'Unicode nfc/nfd' '
>   	egrep "^event: nfd/d_${utf8_nfc}/?$" ./unicode.trace
>   '
>   
> -test_expect_failure 'split-index and FSMonitor work well together' '
> +test_expect_success 'split-index and FSMonitor work well together' '
>   	git init split-index &&
>   	test_when_finished "git -C \"$PWD/split-index\" \
>   		fsmonitor--daemon stop" &&

Thanks
Jeff

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/4] split-index; stop abusing the `base_oid` to strip the "link" extension
  2023-03-23 13:59     ` Jeff Hostetler
@ 2023-03-23 16:06       ` Junio C Hamano
  0 siblings, 0 replies; 19+ messages in thread
From: Junio C Hamano @ 2023-03-23 16:06 UTC (permalink / raw)
  To: Jeff Hostetler
  Cc: Johannes Schindelin via GitGitGadget, git, Johannes Schindelin

Jeff Hostetler <git@jeffhostetler.com> writes:

>>> +enum strip_extensions {
>>> +	WRITE_ALL_EXTENSIONS = 0,
>>> +	STRIP_ALL_EXTENSIONS = 1,
>>> +	STRIP_LINK_EXTENSION_ONLY = 2
>>> +};
>> We do not need to spell out the specific values for this enum; the
>> users' (i.e. the callers of do_write_index()) sole requirement is
>> for these symbols to have different values.
>
> There are several calls to do_write_locked_index() that pass 0 or 1
> as the new final arg.  If we update them to use these enum values,
> then we don't need integer values here.

Good eyes.  Yes, the new caller that selectively passes
STRIP_LINK_EXTENSION_ONLY should pass WRITE_ALL_EXTENSIONS, not 0,
on the other side of ?: as you pointed out.

Thanks.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v2 0/4] Fix a few split-index bugs
  2023-03-22 16:00 [PATCH 0/4] Fix a few split-index bugs Johannes Schindelin via GitGitGadget
                   ` (4 preceding siblings ...)
  2023-03-22 16:22 ` [PATCH 0/4] Fix a few split-index bugs Junio C Hamano
@ 2023-03-26 22:45 ` Johannes Schindelin via GitGitGadget
  2023-03-26 22:45   ` [PATCH v2 1/4] split-index & fsmonitor: demonstrate a bug Johannes Schindelin via GitGitGadget
                     ` (4 more replies)
  5 siblings, 5 replies; 19+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2023-03-26 22:45 UTC (permalink / raw)
  To: git; +Cc: Jeff Hostetler, Johannes Schindelin

I received an internal bug report that after upgrading from v2.39.2 to
v2.40.0, some users ran into the following error message:

BUG: fsmonitor.c:21: fsmonitor_dirty has more entries than the index (57 > 0)


It sounds very much like the report we received in
https://lore.kernel.org/git/CAC7ZvybvykKQyMWcZoKXxFDu_amnkxZCDq2C6KHoyhmHN2tcKw@mail.gmail.com/,
but sadly that thread petered out when the reporter stopped being able to
reproduce the problem.

After a few days of investigating, I am convinced that this is due to some
old bugs, and not actually a regression in v2.40.0 (although I can believe
that some improvements in v2.40.0 might make it easier to run into these
bugs).

This patch series addresses those bugs.

Note: While the Git maintainer has stated a strong preference to introduce
regression tests in the same patch that fixes the corresponding regression,
this patch series starts with a stand-alone patch that demonstrates a
problematic scenario via a new test_expect_failure test case. The reason why
I specifically split out the test into its own commit is that there is a lot
of information to unpack in the commit message that is larger than any of
the subsequent bug fixes. Besides, it motivates not only the second patch
(which marks the test case as test_expect_success) but paints the larger
picture necessary to understand also the need for the remaining two patches.

This patch series is based on maint-2.37, the oldest maintenance branch it
applies without merge conflicts. When merging with next, there are only
trivial conflicts in unpack-trees.c due to en/dir-api-cleanup where
o->result is now o->internal.result.

Changes since v1:

 * Fix a double "the" in a commit message
 * Replace enum strip_extensions by the bit field enum write_extensions,
   inverting the meaning of the values to avoid double negatives
 * Leave a trailing comma at the definition of the enum values

Johannes Schindelin (4):
  split-index & fsmonitor: demonstrate a bug
  split-index; stop abusing the `base_oid` to strip the "link" extension
  fsmonitor: avoid overriding `cache_changed` bits
  unpack-trees: take care to propagate the split-index flag

 fsmonitor.h                  |  2 +-
 read-cache.c                 | 49 +++++++++++++++++++++++-------------
 t/t7527-builtin-fsmonitor.sh | 37 +++++++++++++++++++++++++++
 unpack-trees.c               |  2 ++
 4 files changed, 72 insertions(+), 18 deletions(-)


base-commit: eb88fe1ff5ceb34845f0919b8bdc60d8a1703cf6
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1497%2Fdscho%2Ffix-split-index-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1497/dscho/fix-split-index-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/1497

Range-diff vs v1:

 1:  c025fccbdde = 1:  c025fccbdde split-index & fsmonitor: demonstrate a bug
 2:  f1897b88072 ! 2:  8cc075f6325 split-index; stop abusing the `base_oid` to strip the "link" extension
     @@ Commit message
      
          One might be tempted to simply call `discard_split_index()` instead,
          under the assumption that Git decided to write a non-split index and
     -    therefore the the `split_index` structure might no longer be wanted.
     +    therefore the `split_index` structure might no longer be wanted.
          However, that is not possible because that would release index entries
          in `split_index->base` that are likely to still be in use. Therefore we
          cannot do that.
      
     -    The next best thing we _can_ do is to introduce a flag, specifically
     -    indicating when the "link" extension should be skipped. So that's what
     -    we do here.
     +    The next best thing we _can_ do is to introduce a bit field to indicate
     +    specifically which index extensions (not) to write. So that's what we do
     +    here.
      
          Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
      
     @@ read-cache.c: static int record_ieot(void)
       	return !git_config_get_index_threads(&val) && val != 1;
       }
       
     -+enum strip_extensions {
     -+	WRITE_ALL_EXTENSIONS = 0,
     -+	STRIP_ALL_EXTENSIONS = 1,
     -+	STRIP_LINK_EXTENSION_ONLY = 2
     ++enum write_extensions {
     ++	WRITE_NO_EXTENSION =              0,
     ++	WRITE_SPLIT_INDEX_EXTENSION =     1<<0,
     ++	WRITE_CACHE_TREE_EXTENSION =      1<<1,
     ++	WRITE_RESOLVE_UNDO_EXTENSION =    1<<2,
     ++	WRITE_UNTRACKED_CACHE_EXTENSION = 1<<3,
     ++	WRITE_FSMONITOR_EXTENSION =       1<<4,
      +};
     ++#define WRITE_ALL_EXTENSIONS ((enum write_extensions)-1)
      +
       /*
        * On success, `tempfile` is closed. If it is the temporary file
     @@ read-cache.c: static int record_ieot(void)
        */
       static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
      -			  int strip_extensions, unsigned flags)
     -+			  enum strip_extensions strip_extensions, unsigned flags)
     ++			  enum write_extensions write_extensions, unsigned flags)
       {
       	uint64_t start = getnanotime();
       	struct hashfile *f;
     @@ read-cache.c: static int do_write_index(struct index_state *istate, struct tempf
       	}
       
      -	if (!strip_extensions && istate->split_index &&
     -+	if (strip_extensions == WRITE_ALL_EXTENSIONS && istate->split_index &&
     - 	    !is_null_oid(&istate->split_index->base_oid)) {
     +-	    !is_null_oid(&istate->split_index->base_oid)) {
     ++	if (write_extensions & WRITE_SPLIT_INDEX_EXTENSION &&
     ++	    istate->split_index) {
       		struct strbuf sb = STRBUF_INIT;
       
     + 		if (istate->sparse_index)
      @@ read-cache.c: static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
       		if (err)
       			return -1;
       	}
      -	if (!strip_extensions && !drop_cache_tree && istate->cache_tree) {
     -+	if (strip_extensions != STRIP_ALL_EXTENSIONS && !drop_cache_tree && istate->cache_tree) {
     ++	if (write_extensions & WRITE_CACHE_TREE_EXTENSION &&
     ++	    !drop_cache_tree && istate->cache_tree) {
       		struct strbuf sb = STRBUF_INIT;
       
       		cache_tree_write(&sb, istate->cache_tree);
     @@ read-cache.c: static int do_write_index(struct index_state *istate, struct tempf
       			return -1;
       	}
      -	if (!strip_extensions && istate->resolve_undo) {
     -+	if (strip_extensions != STRIP_ALL_EXTENSIONS && istate->resolve_undo) {
     ++	if (write_extensions & WRITE_RESOLVE_UNDO_EXTENSION &&
     ++	    istate->resolve_undo) {
       		struct strbuf sb = STRBUF_INIT;
       
       		resolve_undo_write(&sb, istate->resolve_undo);
     @@ read-cache.c: static int do_write_index(struct index_state *istate, struct tempf
       			return -1;
       	}
      -	if (!strip_extensions && istate->untracked) {
     -+	if (strip_extensions != STRIP_ALL_EXTENSIONS && istate->untracked) {
     ++	if (write_extensions & WRITE_UNTRACKED_CACHE_EXTENSION &&
     ++	    istate->untracked) {
       		struct strbuf sb = STRBUF_INIT;
       
       		write_untracked_extension(&sb, istate->untracked);
     @@ read-cache.c: static int do_write_index(struct index_state *istate, struct tempf
       			return -1;
       	}
      -	if (!strip_extensions && istate->fsmonitor_last_update) {
     -+	if (strip_extensions != STRIP_ALL_EXTENSIONS && istate->fsmonitor_last_update) {
     ++	if (write_extensions & WRITE_FSMONITOR_EXTENSION &&
     ++	    istate->fsmonitor_last_update) {
       		struct strbuf sb = STRBUF_INIT;
       
       		write_fsmonitor_extension(&sb, istate);
     @@ read-cache.c: static int commit_locked_index(struct lock_file *lk)
       		return commit_lock_file(lk);
       }
       
     -+/*
     -+ * Write the Git index into a `.lock` file
     -+ *
     -+ * If `strip_link_extension` is non-zero, avoid writing any "link" extension
     -+ * (used by the split-index feature).
     -+ */
     - static int do_write_locked_index(struct index_state *istate, struct lock_file *lock,
     +-static int do_write_locked_index(struct index_state *istate, struct lock_file *lock,
      -				 unsigned flags)
     -+				 unsigned flags, int strip_link_extension)
     ++static int do_write_locked_index(struct index_state *istate,
     ++				 struct lock_file *lock,
     ++				 unsigned flags,
     ++				 enum write_extensions write_extensions)
       {
       	int ret;
       	int was_full = istate->sparse_index == INDEX_EXPANDED;
     @@ read-cache.c: static int do_write_locked_index(struct index_state *istate, struc
       	trace2_region_enter_printf("index", "do_write_index", the_repository,
       				   "%s", get_lock_file_path(lock));
      -	ret = do_write_index(istate, lock->tempfile, 0, flags);
     -+	ret = do_write_index(istate, lock->tempfile, strip_link_extension ? STRIP_LINK_EXTENSION_ONLY : 0, flags);
     ++	ret = do_write_index(istate, lock->tempfile, write_extensions, flags);
       	trace2_region_leave_printf("index", "do_write_index", the_repository,
       				   "%s", get_lock_file_path(lock));
       
     @@ read-cache.c: static int write_split_index(struct index_state *istate,
       	int ret;
       	prepare_to_write_split_index(istate);
      -	ret = do_write_locked_index(istate, lock, flags);
     -+	ret = do_write_locked_index(istate, lock, flags, 0);
     ++	ret = do_write_locked_index(istate, lock, flags, WRITE_ALL_EXTENSIONS);
       	finish_writing_split_index(istate);
       	return ret;
       }
     +@@ read-cache.c: static int write_shared_index(struct index_state *istate,
     + 
     + 	trace2_region_enter_printf("index", "shared/do_write_index",
     + 				   the_repository, "%s", get_tempfile_path(*temp));
     +-	ret = do_write_index(si->base, *temp, 1, flags);
     ++	ret = do_write_index(si->base, *temp, WRITE_NO_EXTENSION, flags);
     + 	trace2_region_leave_printf("index", "shared/do_write_index",
     + 				   the_repository, "%s", get_tempfile_path(*temp));
     + 
      @@ read-cache.c: int write_locked_index(struct index_state *istate, struct lock_file *lock,
       	if ((!si && !test_split_index_env) ||
       	    alternate_index_output ||
     @@ read-cache.c: int write_locked_index(struct index_state *istate, struct lock_fil
      -		if (si)
      -			oidclr(&si->base_oid);
      -		ret = do_write_locked_index(istate, lock, flags);
     -+		ret = do_write_locked_index(istate, lock, flags, 1);
     ++		ret = do_write_locked_index(istate, lock, flags,
     ++					    ~WRITE_SPLIT_INDEX_EXTENSION);
       		goto out;
       	}
       
     @@ read-cache.c: int write_locked_index(struct index_state *istate, struct lock_fil
       		if (!temp) {
      -			oidclr(&si->base_oid);
      -			ret = do_write_locked_index(istate, lock, flags);
     -+			ret = do_write_locked_index(istate, lock, flags, 1);
     ++			ret = do_write_locked_index(istate, lock, flags,
     ++						    ~WRITE_SPLIT_INDEX_EXTENSION);
       			goto out;
       		}
       		ret = write_shared_index(istate, &temp, flags);
 3:  c1c35f0f026 = 3:  89b3cd9a668 fsmonitor: avoid overriding `cache_changed` bits
 4:  3963d3e5428 = 4:  df61146eaf5 unpack-trees: take care to propagate the split-index flag

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v2 1/4] split-index & fsmonitor: demonstrate a bug
  2023-03-26 22:45 ` [PATCH v2 " Johannes Schindelin via GitGitGadget
@ 2023-03-26 22:45   ` Johannes Schindelin via GitGitGadget
  2023-03-26 22:45   ` [PATCH v2 2/4] split-index; stop abusing the `base_oid` to strip the "link" extension Johannes Schindelin via GitGitGadget
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 19+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2023-03-26 22:45 UTC (permalink / raw)
  To: git; +Cc: Jeff Hostetler, Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

This commit adds a new test case that demonstrates a bug in the
split-index code that is triggered under certain circumstances when the
FSMonitor is enabled, and its symptom manifests in the form of one of
the following error messages:

    BUG: fsmonitor.c:20: fsmonitor_dirty has more entries than the index (2 > 1)

    BUG: unpack-trees.c:776: pos <n> doesn't point to the first entry of <dir>/ in index

    error: invalid path ''
    error: The following untracked working tree files would be overwritten by reset:
            initial.t

Which of these error messages appears depends on timing-dependent
conditions.

Technically the root cause lies with a bug in the split-index code that
has nothing to do with FSMonitor, but for the sake of this new test case
it was the easiest way to trigger the bug.

The bug is this: Under specific conditions, Git needs to skip writing
the "link" extension (which is the index extension containing the
information pertaining to the split-index). To do that, the `base_oid`
attribute of the `split_index` structure in the in-memory index is
zeroed out, and `do_write_index()` specifically checks for a "null"
`base_oid` to understand that the "link" extension should not be
written. However, this violates the consistency of the in-memory index
structure, but that does not cause problems in most cases because the
process exits without using the in-memory index structure anymore,
anyway.

But: _When_ the in-memory index is still used (which is the case e.g. in
`git rebase`), subsequent writes of `the_index` are at risk of writing
out a bogus index file, one that _should_ have a "link" extension but
does not. In many cases, the `SPLIT_INDEX_ORDERED` flag _happens_ to be
set for subsequent writes, forcing the shared index to be written, which
re-initializes `base_oid` to a non-bogus state, and all is good.

When it is _not_ set, however, all kinds of mayhem ensue, resulting in
above-mentioned error messages, and often enough putting worktrees in a
totally broken state where the only recourse is to manually delete the
`index` and the `index.lock` files and then call `git reset` manually.
Not something to ask users to do.

The reason why it is comparatively easy to trigger the bug with
FSMonitor is that there is _another_ bug in the FSMonitor code:
`mark_fsmonitor_valid()` sets `cache_changed` to 1, i.e. treating that
variable as a Boolean. But it is a bit field, and 1 happens to be the
`SOMETHING_CHANGED` bit that forces the "link" extension to be skipped
when writing the index, among other things.

"Comparatively easy" is a relative term in this context, for sure. The
essence of how the new test case triggers the bug is as following:

1. The `git rebase` invocation will first reset the worktree to
   a commit that contains only the `one.t` file, and then execute a
   rebase script that starts with the following commands (commit hashes
   skipped):

   label onto

   reset initial
   pick two
   label two

   reset two
   pick three
   [...]

2. Before executing the `label` command, a split index is written, as
   well as the shared index.

3. The `reset initial` command in the rebase script writes out a new
   split index but skips writing the shared index, as intended.

4. The `pick two` command updates the worktree and refreshes the index,
   marking the `two.t` entry as valid via the FSMonitor, which sets the
   `SOMETHING_CHANGED` bit in `cache_changed`, which in turn causes the
   `base_oid` attribute to be zeroed out and a full (non-split) index
   to be written (making sure _not_ to write the "link" extension).

5. Now, the `reset two` command will leave the worktree alone, but
   still write out a new split index, not writing the shared index
   (because `base_oid` is still zeroed out, and there is no index entry
   update requiring it to be written, either).

6. When it is turn to run `pick three`, the index is read, but it is
   too short: It only contains a single entry when there should be two,
   because the "link" extension is missing from the written-out index
   file.

There are three bugs at play, actually, which will be fixed over the
course of the next commits:

- The `base_oid` attribute should not be zeroed out to indicate when
  the "link" extension should not be written, as it puts the in-memory
  index structure into an inconsistent state.

- The FSMonitor should not overwrite bits in `cache_changed`.

- The `unpack_trees()` function tries to reuse the `split_index`
  structure from the source index, if any, but does not propagate the
  `SPLIT_INDEX_ORDERED` flag.

While a fix for the second bug would let this test case pass, there are
other conditions where the `SOMETHING_CHANGED` bit is set. Therefore,
the bug that most crucially needs to be fixed is the first one.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 t/t7527-builtin-fsmonitor.sh | 37 ++++++++++++++++++++++++++++++++++++
 1 file changed, 37 insertions(+)

diff --git a/t/t7527-builtin-fsmonitor.sh b/t/t7527-builtin-fsmonitor.sh
index d419085379c..cbafdd69602 100755
--- a/t/t7527-builtin-fsmonitor.sh
+++ b/t/t7527-builtin-fsmonitor.sh
@@ -1003,4 +1003,41 @@ test_expect_success !UNICODE_COMPOSITION_SENSITIVE 'Unicode nfc/nfd' '
 	egrep "^event: nfd/d_${utf8_nfc}/?$" ./unicode.trace
 '
 
+test_expect_failure 'split-index and FSMonitor work well together' '
+	git init split-index &&
+	test_when_finished "git -C \"$PWD/split-index\" \
+		fsmonitor--daemon stop" &&
+	(
+		cd split-index &&
+		git config core.splitIndex true &&
+		# force split-index in most cases
+		git config splitIndex.maxPercentChange 99 &&
+		git config core.fsmonitor true &&
+
+		# Create the following commit topology:
+		#
+		# *   merge three
+		# |\
+		# | * three
+		# * | merge two
+		# |\|
+		# | * two
+		# * | one
+		# |/
+		# * 5a5efd7 initial
+
+		test_commit initial &&
+		test_commit two &&
+		test_commit three &&
+		git reset --hard initial &&
+		test_commit one &&
+		test_tick &&
+		git merge two &&
+		test_tick &&
+		git merge three &&
+
+		git rebase --force-rebase -r one
+	)
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v2 2/4] split-index; stop abusing the `base_oid` to strip the "link" extension
  2023-03-26 22:45 ` [PATCH v2 " Johannes Schindelin via GitGitGadget
  2023-03-26 22:45   ` [PATCH v2 1/4] split-index & fsmonitor: demonstrate a bug Johannes Schindelin via GitGitGadget
@ 2023-03-26 22:45   ` Johannes Schindelin via GitGitGadget
  2023-03-26 22:45   ` [PATCH v2 3/4] fsmonitor: avoid overriding `cache_changed` bits Johannes Schindelin via GitGitGadget
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 19+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2023-03-26 22:45 UTC (permalink / raw)
  To: git; +Cc: Jeff Hostetler, Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

When a split-index is in effect, the `$GIT_DIR/index` file needs to
contain a "link" extension that contains all the information about the
split-index, including the information about the shared index.

However, in some cases Git needs to suppress writing that "link"
extension (i.e. to fall back to writing a full index) even if the
in-memory index structure _has_ a `split_index` configured. This is the
case e.g. when "too many not shared" index entries exist.

In such instances, the current code sets the `base_oid` field of said
`split_index` structure to all-zero to indicate that `do_write_index()`
should skip writing the "link" extension.

This can lead to problems later on, when the in-memory index is still
used to perform other operations and eventually wants to write a
split-index, detects the presence of the `split_index` and reuses that,
too (under the assumption that it has been initialized correctly and
still has a non-null `base_oid`).

Let's stop zeroing out the `base_oid` to indicate that the "link"
extension should not be written.

One might be tempted to simply call `discard_split_index()` instead,
under the assumption that Git decided to write a non-split index and
therefore the `split_index` structure might no longer be wanted.
However, that is not possible because that would release index entries
in `split_index->base` that are likely to still be in use. Therefore we
cannot do that.

The next best thing we _can_ do is to introduce a bit field to indicate
specifically which index extensions (not) to write. So that's what we do
here.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 read-cache.c                 | 49 +++++++++++++++++++++++-------------
 t/t7527-builtin-fsmonitor.sh |  2 +-
 2 files changed, 33 insertions(+), 18 deletions(-)

diff --git a/read-cache.c b/read-cache.c
index b09128b1884..7c9a0eeeac8 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -2868,6 +2868,16 @@ static int record_ieot(void)
 	return !git_config_get_index_threads(&val) && val != 1;
 }
 
+enum write_extensions {
+	WRITE_NO_EXTENSION =              0,
+	WRITE_SPLIT_INDEX_EXTENSION =     1<<0,
+	WRITE_CACHE_TREE_EXTENSION =      1<<1,
+	WRITE_RESOLVE_UNDO_EXTENSION =    1<<2,
+	WRITE_UNTRACKED_CACHE_EXTENSION = 1<<3,
+	WRITE_FSMONITOR_EXTENSION =       1<<4,
+};
+#define WRITE_ALL_EXTENSIONS ((enum write_extensions)-1)
+
 /*
  * On success, `tempfile` is closed. If it is the temporary file
  * of a `struct lock_file`, we will therefore effectively perform
@@ -2876,7 +2886,7 @@ static int record_ieot(void)
  * rely on it.
  */
 static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
-			  int strip_extensions, unsigned flags)
+			  enum write_extensions write_extensions, unsigned flags)
 {
 	uint64_t start = getnanotime();
 	struct hashfile *f;
@@ -3045,8 +3055,8 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
 			return -1;
 	}
 
-	if (!strip_extensions && istate->split_index &&
-	    !is_null_oid(&istate->split_index->base_oid)) {
+	if (write_extensions & WRITE_SPLIT_INDEX_EXTENSION &&
+	    istate->split_index) {
 		struct strbuf sb = STRBUF_INIT;
 
 		if (istate->sparse_index)
@@ -3060,7 +3070,8 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
 		if (err)
 			return -1;
 	}
-	if (!strip_extensions && !drop_cache_tree && istate->cache_tree) {
+	if (write_extensions & WRITE_CACHE_TREE_EXTENSION &&
+	    !drop_cache_tree && istate->cache_tree) {
 		struct strbuf sb = STRBUF_INIT;
 
 		cache_tree_write(&sb, istate->cache_tree);
@@ -3070,7 +3081,8 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
 		if (err)
 			return -1;
 	}
-	if (!strip_extensions && istate->resolve_undo) {
+	if (write_extensions & WRITE_RESOLVE_UNDO_EXTENSION &&
+	    istate->resolve_undo) {
 		struct strbuf sb = STRBUF_INIT;
 
 		resolve_undo_write(&sb, istate->resolve_undo);
@@ -3081,7 +3093,8 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
 		if (err)
 			return -1;
 	}
-	if (!strip_extensions && istate->untracked) {
+	if (write_extensions & WRITE_UNTRACKED_CACHE_EXTENSION &&
+	    istate->untracked) {
 		struct strbuf sb = STRBUF_INIT;
 
 		write_untracked_extension(&sb, istate->untracked);
@@ -3092,7 +3105,8 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
 		if (err)
 			return -1;
 	}
-	if (!strip_extensions && istate->fsmonitor_last_update) {
+	if (write_extensions & WRITE_FSMONITOR_EXTENSION &&
+	    istate->fsmonitor_last_update) {
 		struct strbuf sb = STRBUF_INIT;
 
 		write_fsmonitor_extension(&sb, istate);
@@ -3166,8 +3180,10 @@ static int commit_locked_index(struct lock_file *lk)
 		return commit_lock_file(lk);
 }
 
-static int do_write_locked_index(struct index_state *istate, struct lock_file *lock,
-				 unsigned flags)
+static int do_write_locked_index(struct index_state *istate,
+				 struct lock_file *lock,
+				 unsigned flags,
+				 enum write_extensions write_extensions)
 {
 	int ret;
 	int was_full = istate->sparse_index == INDEX_EXPANDED;
@@ -3185,7 +3201,7 @@ static int do_write_locked_index(struct index_state *istate, struct lock_file *l
 	 */
 	trace2_region_enter_printf("index", "do_write_index", the_repository,
 				   "%s", get_lock_file_path(lock));
-	ret = do_write_index(istate, lock->tempfile, 0, flags);
+	ret = do_write_index(istate, lock->tempfile, write_extensions, flags);
 	trace2_region_leave_printf("index", "do_write_index", the_repository,
 				   "%s", get_lock_file_path(lock));
 
@@ -3214,7 +3230,7 @@ static int write_split_index(struct index_state *istate,
 {
 	int ret;
 	prepare_to_write_split_index(istate);
-	ret = do_write_locked_index(istate, lock, flags);
+	ret = do_write_locked_index(istate, lock, flags, WRITE_ALL_EXTENSIONS);
 	finish_writing_split_index(istate);
 	return ret;
 }
@@ -3289,7 +3305,7 @@ static int write_shared_index(struct index_state *istate,
 
 	trace2_region_enter_printf("index", "shared/do_write_index",
 				   the_repository, "%s", get_tempfile_path(*temp));
-	ret = do_write_index(si->base, *temp, 1, flags);
+	ret = do_write_index(si->base, *temp, WRITE_NO_EXTENSION, flags);
 	trace2_region_leave_printf("index", "shared/do_write_index",
 				   the_repository, "%s", get_tempfile_path(*temp));
 
@@ -3366,9 +3382,8 @@ int write_locked_index(struct index_state *istate, struct lock_file *lock,
 	if ((!si && !test_split_index_env) ||
 	    alternate_index_output ||
 	    (istate->cache_changed & ~EXTMASK)) {
-		if (si)
-			oidclr(&si->base_oid);
-		ret = do_write_locked_index(istate, lock, flags);
+		ret = do_write_locked_index(istate, lock, flags,
+					    ~WRITE_SPLIT_INDEX_EXTENSION);
 		goto out;
 	}
 
@@ -3394,8 +3409,8 @@ int write_locked_index(struct index_state *istate, struct lock_file *lock,
 		/* Same initial permissions as the main .git/index file */
 		temp = mks_tempfile_sm(git_path("sharedindex_XXXXXX"), 0, 0666);
 		if (!temp) {
-			oidclr(&si->base_oid);
-			ret = do_write_locked_index(istate, lock, flags);
+			ret = do_write_locked_index(istate, lock, flags,
+						    ~WRITE_SPLIT_INDEX_EXTENSION);
 			goto out;
 		}
 		ret = write_shared_index(istate, &temp, flags);
diff --git a/t/t7527-builtin-fsmonitor.sh b/t/t7527-builtin-fsmonitor.sh
index cbafdd69602..9fab9a2ab38 100755
--- a/t/t7527-builtin-fsmonitor.sh
+++ b/t/t7527-builtin-fsmonitor.sh
@@ -1003,7 +1003,7 @@ test_expect_success !UNICODE_COMPOSITION_SENSITIVE 'Unicode nfc/nfd' '
 	egrep "^event: nfd/d_${utf8_nfc}/?$" ./unicode.trace
 '
 
-test_expect_failure 'split-index and FSMonitor work well together' '
+test_expect_success 'split-index and FSMonitor work well together' '
 	git init split-index &&
 	test_when_finished "git -C \"$PWD/split-index\" \
 		fsmonitor--daemon stop" &&
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v2 3/4] fsmonitor: avoid overriding `cache_changed` bits
  2023-03-26 22:45 ` [PATCH v2 " Johannes Schindelin via GitGitGadget
  2023-03-26 22:45   ` [PATCH v2 1/4] split-index & fsmonitor: demonstrate a bug Johannes Schindelin via GitGitGadget
  2023-03-26 22:45   ` [PATCH v2 2/4] split-index; stop abusing the `base_oid` to strip the "link" extension Johannes Schindelin via GitGitGadget
@ 2023-03-26 22:45   ` Johannes Schindelin via GitGitGadget
  2023-03-26 22:45   ` [PATCH v2 4/4] unpack-trees: take care to propagate the split-index flag Johannes Schindelin via GitGitGadget
  2023-03-27 14:48   ` [PATCH v2 0/4] Fix a few split-index bugs Jeff Hostetler
  4 siblings, 0 replies; 19+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2023-03-26 22:45 UTC (permalink / raw)
  To: git; +Cc: Jeff Hostetler, Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

As of e636a7b4d030 (read-cache: be specific what part of the index has
changed, 2014-06-13), the paradigm `cache_changed = 1` fell out of
fashion and it became a bit field instead.

This is important because some bits have specific meaning and should not
be unset without care, e.g. `SPLIT_INDEX_ORDERED`.

However, b5a816975206 (mark_fsmonitor_valid(): mark the index as changed
if needed, 2019-05-24) did use the `cache_changed` attribute as if it
were a Boolean instead of a bit field.

That not only would override the `SPLIT_INDEX_ORDERED` bit when marking
index entries as valid via the FSMonitor, but worse: it would set the
`SOMETHING_OTHER` bit (whose value is 1). This means that Git would
unnecessarily force a full index to be written out when a split index
was asked for.

Let's instead use the bit that is specifically intended to indicate
FSMonitor-triggered changes, allowing the split-index feature to work as
designed.

Noticed-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 fsmonitor.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fsmonitor.h b/fsmonitor.h
index edf7ce5203b..778707b131b 100644
--- a/fsmonitor.h
+++ b/fsmonitor.h
@@ -86,7 +86,7 @@ static inline void mark_fsmonitor_valid(struct index_state *istate, struct cache
 	    !(ce->ce_flags & CE_FSMONITOR_VALID)) {
 		if (S_ISGITLINK(ce->ce_mode))
 			return;
-		istate->cache_changed = 1;
+		istate->cache_changed |= FSMONITOR_CHANGED;
 		ce->ce_flags |= CE_FSMONITOR_VALID;
 		trace_printf_key(&trace_fsmonitor, "mark_fsmonitor_clean '%s'", ce->name);
 	}
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v2 4/4] unpack-trees: take care to propagate the split-index flag
  2023-03-26 22:45 ` [PATCH v2 " Johannes Schindelin via GitGitGadget
                     ` (2 preceding siblings ...)
  2023-03-26 22:45   ` [PATCH v2 3/4] fsmonitor: avoid overriding `cache_changed` bits Johannes Schindelin via GitGitGadget
@ 2023-03-26 22:45   ` Johannes Schindelin via GitGitGadget
  2023-03-27 14:48   ` [PATCH v2 0/4] Fix a few split-index bugs Jeff Hostetler
  4 siblings, 0 replies; 19+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2023-03-26 22:45 UTC (permalink / raw)
  To: git; +Cc: Jeff Hostetler, Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

When copying the `split_index` structure from one index structure to
another, we need to propagate the `SPLIT_INDEX_ORDERED` flag, too, if it
is set, otherwise Git might forget to write the shared index when that
is actually needed.

It just so _happens_ that in many instances when `unpack_trees()` is
called, the result causes the shared index to be written anyway, but
there are edge cases when that is not so.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 unpack-trees.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/unpack-trees.c b/unpack-trees.c
index 90b92114be8..ca5e47c77c0 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -1916,6 +1916,8 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
 		 * create a new one.
 		 */
 		o->result.split_index = o->src_index->split_index;
+		if (o->src_index->cache_changed & SPLIT_INDEX_ORDERED)
+			o->result.cache_changed |= SPLIT_INDEX_ORDERED;
 		o->result.split_index->refcount++;
 	} else {
 		o->result.split_index = init_split_index(&o->result);
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 0/4] Fix a few split-index bugs
  2023-03-26 22:45 ` [PATCH v2 " Johannes Schindelin via GitGitGadget
                     ` (3 preceding siblings ...)
  2023-03-26 22:45   ` [PATCH v2 4/4] unpack-trees: take care to propagate the split-index flag Johannes Schindelin via GitGitGadget
@ 2023-03-27 14:48   ` Jeff Hostetler
  2023-03-27 16:42     ` Junio C Hamano
  4 siblings, 1 reply; 19+ messages in thread
From: Jeff Hostetler @ 2023-03-27 14:48 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget, git; +Cc: Johannes Schindelin



On 3/26/23 6:45 PM, Johannes Schindelin via GitGitGadget wrote:
> I received an internal bug report that after upgrading from v2.39.2 to
> v2.40.0, some users ran into the following error message:
> 
...

Very nice. I like the new bit mask and getting rid of the double
negative strip/write logic.


Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>

Thanks!
Jeff


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 0/4] Fix a few split-index bugs
  2023-03-27 14:48   ` [PATCH v2 0/4] Fix a few split-index bugs Jeff Hostetler
@ 2023-03-27 16:42     ` Junio C Hamano
  0 siblings, 0 replies; 19+ messages in thread
From: Junio C Hamano @ 2023-03-27 16:42 UTC (permalink / raw)
  To: Jeff Hostetler
  Cc: Johannes Schindelin via GitGitGadget, git, Johannes Schindelin

Jeff Hostetler <git@jeffhostetler.com> writes:

> On 3/26/23 6:45 PM, Johannes Schindelin via GitGitGadget wrote:
>> I received an internal bug report that after upgrading from v2.39.2 to
>> v2.40.0, some users ran into the following error message:
>> 
> ...
>
> Very nice. I like the new bit mask and getting rid of the double
> negative strip/write logic.

Yup, it indeed made the logic easier to see to split them into
individual bits per extensions even though an initial "gut" reaction
may have been "you ain't gonna need such a flexibility" ;-).

Thanks, both.  Will replace.

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2023-03-27 16:43 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-22 16:00 [PATCH 0/4] Fix a few split-index bugs Johannes Schindelin via GitGitGadget
2023-03-22 16:00 ` [PATCH 1/4] split-index & fsmonitor: demonstrate a bug Johannes Schindelin via GitGitGadget
2023-03-23 13:39   ` Jeff Hostetler
2023-03-22 16:00 ` [PATCH 2/4] split-index; stop abusing the `base_oid` to strip the "link" extension Johannes Schindelin via GitGitGadget
2023-03-22 21:24   ` Junio C Hamano
2023-03-23 13:59     ` Jeff Hostetler
2023-03-23 16:06       ` Junio C Hamano
2023-03-23 15:22   ` Jeff Hostetler
2023-03-22 16:00 ` [PATCH 3/4] fsmonitor: avoid overriding `cache_changed` bits Johannes Schindelin via GitGitGadget
2023-03-22 16:00 ` [PATCH 4/4] unpack-trees: take care to propagate the split-index flag Johannes Schindelin via GitGitGadget
2023-03-23 14:32   ` Jeff Hostetler
2023-03-22 16:22 ` [PATCH 0/4] Fix a few split-index bugs Junio C Hamano
2023-03-26 22:45 ` [PATCH v2 " Johannes Schindelin via GitGitGadget
2023-03-26 22:45   ` [PATCH v2 1/4] split-index & fsmonitor: demonstrate a bug Johannes Schindelin via GitGitGadget
2023-03-26 22:45   ` [PATCH v2 2/4] split-index; stop abusing the `base_oid` to strip the "link" extension Johannes Schindelin via GitGitGadget
2023-03-26 22:45   ` [PATCH v2 3/4] fsmonitor: avoid overriding `cache_changed` bits Johannes Schindelin via GitGitGadget
2023-03-26 22:45   ` [PATCH v2 4/4] unpack-trees: take care to propagate the split-index flag Johannes Schindelin via GitGitGadget
2023-03-27 14:48   ` [PATCH v2 0/4] Fix a few split-index bugs Jeff Hostetler
2023-03-27 16:42     ` Junio C Hamano

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).