git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/6] Fix GIT_TEST_SPLIT_INDEX
@ 2021-08-17 17:49 SZEDER Gábor
  2021-08-17 17:49 ` [PATCH 1/6] t1600-index: remove unnecessary redirection SZEDER Gábor
                   ` (6 more replies)
  0 siblings, 7 replies; 23+ messages in thread
From: SZEDER Gábor @ 2021-08-17 17:49 UTC (permalink / raw)
  To: git
  Cc: Thomas Gummerer, Nguyễn Thái Ngọc Duy,
	SZEDER Gábor

Running tests with GIT_TEST_SPLIT_INDEX=1 is supposed to turn on the
split index feature and trigger index splitting (mostly) randomly.
Alas, this has been broken for ~2.5 years, and it hasn't triggered any
index splitting since then.

The last patch in this series makes GIT_TEST_SPLIT_INDEX=1 work again,
although it slightly changes its behavior; see its log message for all
the details.

Patches 3 and 4 fix old tests that started to fail because
of this slight behavior change.  Bsides the final fix, patch 4 is the
most interesting in this series.

Patch 5 fixes new tests that were introduced while
GIT_TEST_SPLIT_INDEX was broken.

The first two patches are just minor while-at-it test cleanups.

SZEDER Gábor (6):
  t1600-index: remove unnecessary redirection
  t1600-index: don't run git commands upstream of a pipe
  t1600-index: disable GIT_TEST_SPLIT_INDEX
  read-cache: look for shared index files next to the index, too
  tests: disable GIT_TEST_SPLIT_INDEX for sparse index tests
  read-cache: fix GIT_TEST_SPLIT_INDEX

 read-cache.c                       | 37 ++++++++++++++++------
 t/t1091-sparse-checkout-builtin.sh | 25 +++++++++------
 t/t1600-index.sh                   | 13 +++++---
 t/t1700-split-index.sh             | 34 ++++++++++++++++++++
 t/t7519-status-fsmonitor.sh        | 51 ++++++++++++++++--------------
 5 files changed, 113 insertions(+), 47 deletions(-)

-- 
2.33.0.453.gc5e41af357


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 1/6] t1600-index: remove unnecessary redirection
  2021-08-17 17:49 [PATCH 0/6] Fix GIT_TEST_SPLIT_INDEX SZEDER Gábor
@ 2021-08-17 17:49 ` SZEDER Gábor
  2021-08-17 18:12   ` Derrick Stolee
  2021-08-17 17:49 ` [PATCH 2/6] t1600-index: don't run git commands upstream of a pipe SZEDER Gábor
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 23+ messages in thread
From: SZEDER Gábor @ 2021-08-17 17:49 UTC (permalink / raw)
  To: git
  Cc: Thomas Gummerer, Nguyễn Thái Ngọc Duy,
	SZEDER Gábor

In a helper function in the 't1600-index.sh' test script the stderr
of a 'git add' command is redirected to its stdout, but its stdout is
not redirected anywhere.  So apparently this redirection is
unnecessary, remove it.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---
 t/t1600-index.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/t/t1600-index.sh b/t/t1600-index.sh
index c9b9e718b8..5d10cec67a 100755
--- a/t/t1600-index.sh
+++ b/t/t1600-index.sh
@@ -79,7 +79,7 @@ test_index_version () {
 		else
 			unset GIT_INDEX_VERSION
 		fi &&
-		git add a 2>&1 &&
+		git add a &&
 		echo $EXPECTED_OUTPUT_VERSION >expect &&
 		test-tool index-version <.git/index >actual &&
 		test_cmp expect actual
-- 
2.33.0.453.gc5e41af357


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 2/6] t1600-index: don't run git commands upstream of a pipe
  2021-08-17 17:49 [PATCH 0/6] Fix GIT_TEST_SPLIT_INDEX SZEDER Gábor
  2021-08-17 17:49 ` [PATCH 1/6] t1600-index: remove unnecessary redirection SZEDER Gábor
@ 2021-08-17 17:49 ` SZEDER Gábor
  2021-08-17 17:49 ` [PATCH 3/6] t1600-index: disable GIT_TEST_SPLIT_INDEX SZEDER Gábor
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 23+ messages in thread
From: SZEDER Gábor @ 2021-08-17 17:49 UTC (permalink / raw)
  To: git
  Cc: Thomas Gummerer, Nguyễn Thái Ngọc Duy,
	SZEDER Gábor

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---
 t/t1600-index.sh | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/t/t1600-index.sh b/t/t1600-index.sh
index 5d10cec67a..aa88fc30ce 100755
--- a/t/t1600-index.sh
+++ b/t/t1600-index.sh
@@ -13,7 +13,8 @@ test_expect_success 'bogus GIT_INDEX_VERSION issues warning' '
 		rm -f .git/index &&
 		GIT_INDEX_VERSION=2bogus &&
 		export GIT_INDEX_VERSION &&
-		git add a 2>&1 | sed "s/[0-9]//" >actual.err &&
+		git add a 2>err &&
+		sed "s/[0-9]//" err >actual.err &&
 		sed -e "s/ Z$/ /" <<-\EOF >expect.err &&
 			warning: GIT_INDEX_VERSION set, but the value is invalid.
 			Using version Z
@@ -27,7 +28,8 @@ test_expect_success 'out of bounds GIT_INDEX_VERSION issues warning' '
 		rm -f .git/index &&
 		GIT_INDEX_VERSION=1 &&
 		export GIT_INDEX_VERSION &&
-		git add a 2>&1 | sed "s/[0-9]//" >actual.err &&
+		git add a 2>err &&
+		sed "s/[0-9]//" err >actual.err &&
 		sed -e "s/ Z$/ /" <<-\EOF >expect.err &&
 			warning: GIT_INDEX_VERSION set, but the value is invalid.
 			Using version Z
@@ -50,7 +52,8 @@ test_expect_success 'out of bounds index.version issues warning' '
 		sane_unset GIT_INDEX_VERSION &&
 		rm -f .git/index &&
 		git config --add index.version 1 &&
-		git add a 2>&1 | sed "s/[0-9]//" >actual.err &&
+		git add a 2>err &&
+		sed "s/[0-9]//" err >actual.err &&
 		sed -e "s/ Z$/ /" <<-\EOF >expect.err &&
 			warning: index.version set, but the value is invalid.
 			Using version Z
-- 
2.33.0.453.gc5e41af357


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 3/6] t1600-index: disable GIT_TEST_SPLIT_INDEX
  2021-08-17 17:49 [PATCH 0/6] Fix GIT_TEST_SPLIT_INDEX SZEDER Gábor
  2021-08-17 17:49 ` [PATCH 1/6] t1600-index: remove unnecessary redirection SZEDER Gábor
  2021-08-17 17:49 ` [PATCH 2/6] t1600-index: don't run git commands upstream of a pipe SZEDER Gábor
@ 2021-08-17 17:49 ` SZEDER Gábor
  2021-08-17 17:49 ` [PATCH 4/6] read-cache: look for shared index files next to the index, too SZEDER Gábor
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 23+ messages in thread
From: SZEDER Gábor @ 2021-08-17 17:49 UTC (permalink / raw)
  To: git
  Cc: Thomas Gummerer, Nguyễn Thái Ngọc Duy,
	SZEDER Gábor

Tests in 't1600-index.sh' check that various bogus index version
values are recognized and an appropriate warning message is issued.
GIT_TEST_SPLIT_INDEX=1 is supposed to trigger index splitting
randomly, and thus might interfere [1] with these tests: splitting the
index means that two index files are written (the shared base index
and the split '.git/index'), and the same warning message is then
issued twice, failing these tests.

Unset GIT_TEST_SPLIT_INDEX in this test script to avoid such
interference.

[1] There is no such interference at the moment, because, alas,
    GIT_TEST_SPLIT_INDEX=1 is broken and never split the index.  There
    was no such interference in the past (before it broke) either,
    because the first index write with GIT_TEST_SPLIT_INDEX=1 never
    split the index, only the second write did.  A subsequent commit
    fixing GIT_TEST_SPLIT_INDEX will have all the details on this.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---
 t/t1600-index.sh | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/t/t1600-index.sh b/t/t1600-index.sh
index aa88fc30ce..46329c488b 100755
--- a/t/t1600-index.sh
+++ b/t/t1600-index.sh
@@ -4,6 +4,8 @@ test_description='index file specific tests'
 
 . ./test-lib.sh
 
+sane_unset GIT_TEST_SPLIT_INDEX
+
 test_expect_success 'setup' '
 	echo 1 >a
 '
-- 
2.33.0.453.gc5e41af357


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 4/6] read-cache: look for shared index files next to the index, too
  2021-08-17 17:49 [PATCH 0/6] Fix GIT_TEST_SPLIT_INDEX SZEDER Gábor
                   ` (2 preceding siblings ...)
  2021-08-17 17:49 ` [PATCH 3/6] t1600-index: disable GIT_TEST_SPLIT_INDEX SZEDER Gábor
@ 2021-08-17 17:49 ` SZEDER Gábor
  2021-08-17 17:49 ` [PATCH 5/6] tests: disable GIT_TEST_SPLIT_INDEX for sparse index tests SZEDER Gábor
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 23+ messages in thread
From: SZEDER Gábor @ 2021-08-17 17:49 UTC (permalink / raw)
  To: git
  Cc: Thomas Gummerer, Nguyễn Thái Ngọc Duy,
	SZEDER Gábor

When reading a split index git always looks for its referenced shared
base index in the gitdir of the current repository, even when reading
an alternate index specified via GIT_INDEX_FILE, and even when that
alternate index file is the "main" '.git/index' file of an other
repository.  However, if that split index and its referenced shared
index files were written by a git command running entirely in that
other repository, then, naturally, the shared index file is written to
that other repository's gitdir.  Consequently, a git command
attempting to read that shared index file while running in a different
repository won't be able find it and will error out.

I'm not sure in what use case it is necessary to read the index of one
repository by a git command running in a different repository, but it
is certainly possible to do so, and in fact the test 'bare repository:
check that --cached honors index' in 't0003-attributes.sh' does
exactly that.  If GIT_TEST_SPLIT_INDEX=1 were to split the index in
just the right moment [1], then this test would indeed fail, because
the referenced shared index file could not be found.

Let's look for the referenced shared index file not only in the gitdir
of the current directory, but, if the shared index is not there, right
next to the split index as well.

[1] We haven't seen this issue trigger a failure in t0003 yet,
    because:

      - While GIT_TEST_SPLIT_INDEX=1 is supposed to trigger index
        splitting randomly, the first index write has always been
        deterministic and it has never split the index.

      - That alternate index file in the other repository is written
        only once in the entire test script, so it's never split.

    However, the next patch will fix GIT_TEST_SPLIT_INDEX, and while
    doing so it will slightly change its behavior to always split the
    index already on the first index write, and t0003 would always
    fail without this patch.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---
 read-cache.c           | 14 +++++++++++++-
 t/t1700-split-index.sh | 23 +++++++++++++++++++++++
 2 files changed, 36 insertions(+), 1 deletion(-)

diff --git a/read-cache.c b/read-cache.c
index 9048ef9e90..fbd59886a3 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -2391,9 +2391,21 @@ int read_index_from(struct index_state *istate, const char *path,
 	base_path = xstrfmt("%s/sharedindex.%s", gitdir, base_oid_hex);
 	trace2_region_enter_printf("index", "shared/do_read_index",
 				   the_repository, "%s", base_path);
-	ret = do_read_index(split_index->base, base_path, 1);
+	ret = do_read_index(split_index->base, base_path, 0);
 	trace2_region_leave_printf("index", "shared/do_read_index",
 				   the_repository, "%s", base_path);
+	if (!ret) {
+		char *path_copy = xstrdup(path);
+		const char *base_path2 = xstrfmt("%s/sharedindex.%s",
+						 dirname(path_copy),
+						 base_oid_hex);
+		free(path_copy);
+		trace2_region_enter_printf("index", "shared/do_read_index",
+					   the_repository, "%s", base_path2);
+		ret = do_read_index(split_index->base, base_path2, 1);
+		trace2_region_leave_printf("index", "shared/do_read_index",
+					   the_repository, "%s", base_path2);
+	}
 	if (!oideq(&split_index->base_oid, &split_index->base->oid))
 		die(_("broken index, expect %s in %s, got %s"),
 		    base_oid_hex, base_path,
diff --git a/t/t1700-split-index.sh b/t/t1700-split-index.sh
index 986baa612e..e2aa0bd949 100755
--- a/t/t1700-split-index.sh
+++ b/t/t1700-split-index.sh
@@ -510,4 +510,27 @@ test_expect_success 'do not refresh null base index' '
 	)
 '
 
+test_expect_success 'reading split index at alternate location' '
+	git init reading-alternate-location &&
+	(
+		cd reading-alternate-location &&
+		>file-in-alternate &&
+		git update-index --split-index --add file-in-alternate
+	) &&
+	echo file-in-alternate >expect &&
+
+	# Should be able to find the shared index both right next to
+	# the specified split index file ...
+	GIT_INDEX_FILE=./reading-alternate-location/.git/index \
+	git ls-files --cached >actual &&
+	test_cmp expect actual &&
+
+	# ... and, for backwards compatibility, in the current GIT_DIR
+	# as well.
+	mv -v ./reading-alternate-location/.git/sharedindex.* .git &&
+	GIT_INDEX_FILE=./reading-alternate-location/.git/index \
+	git ls-files --cached >actual &&
+	test_cmp expect actual
+'
+
 test_done
-- 
2.33.0.453.gc5e41af357


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 5/6] tests: disable GIT_TEST_SPLIT_INDEX for sparse index tests
  2021-08-17 17:49 [PATCH 0/6] Fix GIT_TEST_SPLIT_INDEX SZEDER Gábor
                   ` (3 preceding siblings ...)
  2021-08-17 17:49 ` [PATCH 4/6] read-cache: look for shared index files next to the index, too SZEDER Gábor
@ 2021-08-17 17:49 ` SZEDER Gábor
  2021-08-17 18:26   ` Derrick Stolee
  2021-08-17 17:49 ` [PATCH 6/6] read-cache: fix GIT_TEST_SPLIT_INDEX SZEDER Gábor
  2021-08-26 20:59 ` [PATCH v2 0/6] Fix GIT_TEST_SPLIT_INDEX SZEDER Gábor
  6 siblings, 1 reply; 23+ messages in thread
From: SZEDER Gábor @ 2021-08-17 17:49 UTC (permalink / raw)
  To: git
  Cc: Thomas Gummerer, Nguyễn Thái Ngọc Duy,
	SZEDER Gábor

The sparse index and split index features are said to be currently
incompatible [1], and consequently GIT_TEST_SPLIT_INDEX=1 might
interfere with the test cases exercising the sparse index feature.
Therefore GIT_TEST_SPLIT_INDEX is already explicitly disabled for the
whole of 't1092-sparse-checkout-compatibility.sh'.  There are,
however, two other test cases exercising sparse index, namely
'sparse-index enabled and disabled' in
't1091-sparse-checkout-builtin.sh' and 'status succeeds with sparse
index' in 't7519-status-fsmonitor.sh', and these two could fail with
GIT_TEST_SPLIT_INDEX=1 as well [2].

Unset GIT_TEST_SPLIT_INDEX and disable the split index in these two
test cases to avoid such interference.

Note that this is the minimal change to merely avoid failures when
these test cases are run with GIT_TEST_SPLIT_INDEX=1.  Interestingly,
though, without these changes the 'git sparse-checkout init --cone
--sparse-index' commands still succeed even with split index, and set
all the necessary configuration variables and create the initial
'$GIT_DIR/info/sparse-checkout' file, but the test failures are caused
by later sanity checks finding that the index is not in fact a sparse
index.  This indicates that 'git sparse-checkout init --sparse-index'
lacks some error checking and its tests lack coverage related to split
index, but fixing those issues (let alone making sparse index
comparible with split index) is beyond the scope of this patch series.

[1] https://public-inbox.org/git/48e9c3d6-407a-1843-2d91-22112410e3f8@gmail.com/

[2] Neither of these test cases fail at the moment, because
    GIT_TEST_SPLIT_INDEX=1 is broken and never splits the index, and
    it broke long before the sparse index feature was added.
    This patch series is about to fix GIT_TEST_SPLIT_INDEX, and then
    both test cases mentioned above would fail.

(The diff is best viewed with '--ignore-all-space')

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---
 t/t1091-sparse-checkout-builtin.sh | 25 +++++++++------
 t/t7519-status-fsmonitor.sh        | 51 ++++++++++++++++--------------
 2 files changed, 43 insertions(+), 33 deletions(-)

diff --git a/t/t1091-sparse-checkout-builtin.sh b/t/t1091-sparse-checkout-builtin.sh
index 38fc8340f5..3f94c41241 100755
--- a/t/t1091-sparse-checkout-builtin.sh
+++ b/t/t1091-sparse-checkout-builtin.sh
@@ -206,16 +206,21 @@ test_expect_success 'sparse-checkout disable' '
 '
 
 test_expect_success 'sparse-index enabled and disabled' '
-	git -C repo sparse-checkout init --cone --sparse-index &&
-	test_cmp_config -C repo true index.sparse &&
-	test-tool -C repo read-cache --table >cache &&
-	grep " tree " cache &&
-
-	git -C repo sparse-checkout disable &&
-	test-tool -C repo read-cache --table >cache &&
-	! grep " tree " cache &&
-	git -C repo config --list >config &&
-	! grep index.sparse config
+	(
+		sane_unset GIT_TEST_SPLIT_INDEX &&
+		git -C repo update-index --no-split-index &&
+
+		git -C repo sparse-checkout init --cone --sparse-index &&
+		test_cmp_config -C repo true index.sparse &&
+		test-tool -C repo read-cache --table >cache &&
+		grep " tree " cache &&
+
+		git -C repo sparse-checkout disable &&
+		test-tool -C repo read-cache --table >cache &&
+		! grep " tree " cache &&
+		git -C repo config --list >config &&
+		! grep index.sparse config
+	)
 '
 
 test_expect_success 'cone mode: init and set' '
diff --git a/t/t7519-status-fsmonitor.sh b/t/t7519-status-fsmonitor.sh
index deea88d443..513e17065f 100755
--- a/t/t7519-status-fsmonitor.sh
+++ b/t/t7519-status-fsmonitor.sh
@@ -402,34 +402,39 @@ check_sparse_index_behavior () {
 }
 
 test_expect_success 'status succeeds with sparse index' '
-	git reset --hard &&
+	(
+		sane_unset GIT_TEST_SPLIT_INDEX &&
+		git update-index --no-split-index &&
 
-	test_config core.fsmonitor "$TEST_DIRECTORY/t7519/fsmonitor-all" &&
-	check_sparse_index_behavior ! &&
+		git reset --hard &&
 
-	write_script .git/hooks/fsmonitor-test<<-\EOF &&
-		printf "last_update_token\0"
-	EOF
-	git config core.fsmonitor .git/hooks/fsmonitor-test &&
-	check_sparse_index_behavior ! &&
+		test_config core.fsmonitor "$TEST_DIRECTORY/t7519/fsmonitor-all" &&
+		check_sparse_index_behavior ! &&
 
-	write_script .git/hooks/fsmonitor-test<<-\EOF &&
-		printf "last_update_token\0"
-		printf "dir1/modified\0"
-	EOF
-	check_sparse_index_behavior ! &&
+		write_script .git/hooks/fsmonitor-test<<-\EOF &&
+			printf "last_update_token\0"
+		EOF
+		git config core.fsmonitor .git/hooks/fsmonitor-test &&
+		check_sparse_index_behavior ! &&
 
-	cp -r dir1 dir1a &&
-	git add dir1a &&
-	git commit -m "add dir1a" &&
+		write_script .git/hooks/fsmonitor-test<<-\EOF &&
+			printf "last_update_token\0"
+			printf "dir1/modified\0"
+		EOF
+		check_sparse_index_behavior ! &&
 
-	# This one modifies outside the sparse-checkout definition
-	# and hence we expect to expand the sparse-index.
-	write_script .git/hooks/fsmonitor-test<<-\EOF &&
-		printf "last_update_token\0"
-		printf "dir1a/modified\0"
-	EOF
-	check_sparse_index_behavior
+		cp -r dir1 dir1a &&
+		git add dir1a &&
+		git commit -m "add dir1a" &&
+
+		# This one modifies outside the sparse-checkout definition
+		# and hence we expect to expand the sparse-index.
+		write_script .git/hooks/fsmonitor-test<<-\EOF &&
+			printf "last_update_token\0"
+			printf "dir1a/modified\0"
+		EOF
+		check_sparse_index_behavior
+	)
 '
 
 test_done
-- 
2.33.0.453.gc5e41af357


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 6/6] read-cache: fix GIT_TEST_SPLIT_INDEX
  2021-08-17 17:49 [PATCH 0/6] Fix GIT_TEST_SPLIT_INDEX SZEDER Gábor
                   ` (4 preceding siblings ...)
  2021-08-17 17:49 ` [PATCH 5/6] tests: disable GIT_TEST_SPLIT_INDEX for sparse index tests SZEDER Gábor
@ 2021-08-17 17:49 ` SZEDER Gábor
  2021-08-17 18:29   ` Derrick Stolee
  2021-08-26 20:59 ` [PATCH v2 0/6] Fix GIT_TEST_SPLIT_INDEX SZEDER Gábor
  6 siblings, 1 reply; 23+ messages in thread
From: SZEDER Gábor @ 2021-08-17 17:49 UTC (permalink / raw)
  To: git
  Cc: Thomas Gummerer, Nguyễn Thái Ngọc Duy,
	SZEDER Gábor

Running tests with GIT_TEST_SPLIT_INDEX=1 is supposed to turn on the
split index feature and trigger index splitting (mostly) randomly.
Alas, this has been broken since 6e37c8ed3c (read-cache.c: fix writing
"link" index ext with null base oid, 2019-02-13), and
GIT_TEST_SPLIT_INDEX=1 hasn't triggered any index splitting since
then.

This patch makes GIT_TEST_SPLIT_INDEX work again, though it doesn't
restore the pre-6e37c8ed3c behavior.  To understand the bug, the fix,
and the behavior change we first have to look at how
GIT_TEST_SPLIT_INDEX used to work before 6e37c8ed3c:

  There are two places where we check the value of
  GIT_TEST_SPLIT_INDEX, and before 6e37c8ed3c they worked like this:

    1) In the lower-level do_write_index(), where, if
       GIT_TEST_SPLIT_INDEX is enabled, we call init_split_index().
       This call merely allocates and zero-initializes
       'istate->split_index', but does nothing else (i.e. doesn't fill
       the base/shared index with cache entries, doesn't actually
       write a shared index file, etc.).  Pertinent to this issue, the
       hash of the base index remains all zeroed out.

    2) In the higher-level write_locked_index(), but only when
       'istate->split_index' has already been initialized.  Then, if
       GIT_TEST_SPLIT_INDEX is enabled, it randomly sets the flag that
       triggers index splitting later in this function.  This
       randomness comes from the first byte of the hash of the base
       index via an 'if ((first_byte & 15) < 6)' condition.

       However, if 'istate->split_index' hasn't been initialized (i.e.
       it is still NULL), then write_locked_index() just calls
       do_write_locked_index(), which internally calls the above
       mentioned do_write_index().

  This means that while GIT_TEST_SPLIT_INDEX=1 usually triggered index
  splitting randomly, the first two index writes were always
  deterministic (though I suspect this was unintentional):

    - The initial index write never splits the index.
      During the first index write write_locked_index() is called with
      'istate->split_index' still uninitialized, so the check in 2) is
      not executed.  It still calls do_write_index(), though, which
      then executes the check in 1).  The resulting all zero base
      index hash then leads to the 'link' extension being written to
      '.git/index', though a shared index file is not written:

        $ rm .git/index
        $ GIT_TEST_SPLIT_INDEX=1 git update-index --add file
        $ test-tool dump-split-index .git/index
        own c6ef71168597caec8553c83d9d0048f1ef416170
        base 0000000000000000000000000000000000000000
        100644 d00491fd7e5bb6fa28c517a0bb32b8b506539d4d 0 file
        replacements:
        deletions:
        $ ls -l .git/sharedindex.*
        ls: cannot access '.git/sharedindex.*': No such file or directory

    - The second index write always splits the index.
      When the index written in the previous point is read,
      'istate->split_index' is initialized because of the presence of
      the 'link' extension.  So during the second write
      write_locked_index() does run the check in 2), and the first
      byte of the all zero base index hash always fulfills the
      randomness condition, which in turn always triggers the index
      splitting.

    - Subsequent index writes will find the 'link' extension with a
      real non-zero base index hash, so from then on the check in 2)
      is executed and the first byte of the base index hash is as
      random as it gets (coming from the SHA-1 of index data including
      timestamps and inodes...).

All this worked until 6e37c8ed3c came along, and stopped writing the
'link' extension if the hash of the base index was all zero:

  $ rm .git/index
  $ GIT_TEST_SPLIT_INDEX=1 git update-index --add file
  $ test-tool dump-split-index .git/index
  own abbd6f6458d5dee73ae8e210ca15a68a390c6fd7
  not a split index
  $ ls -l .git/sharedindex.*
  ls: cannot access '.git/sharedindex.*': No such file or directory

So, since the first index write with GIT_TEST_SPLIT_INDEX=1 doesn't
write a 'link' extension, in the second index write
'istate->split_index' remains uninitialized, and the check in 2) is
not executed, and ultimately the index is never split.

Fix this by modifying write_locked_index() to make sure to check
GIT_TEST_SPLIT_INDEX even if 'istate->split_index' is still
uninitialized, and initialize it if necessary.  The check for
GIT_TEST_SPLIT_INDEX and separate init_split_index() call in
do_write_index() thus becomes unnecessary, so remove it.  Furthermore,
add a test to 't1700-split-index.sh' to make sure that
GIT_TEST_SPLIT_INDEX=1 will keep working (though only check the
index splitting on the first index write, because after that it will
be random).

Note that this change does not restore the pre-6e37c8ed3c behaviour,
as it will deterministically split the index already on the first
index write.  Since GIT_TEST_SPLIT_INDEX is purely a developer aid,
there is no backwards compatibility issue here.  The new behaviour did
trigger test failures in 't0003-attributes.sh' and 't1600-index.sh',
though, which have been fixed in preparatory patches in this series.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---
 read-cache.c           | 23 ++++++++++++++---------
 t/t1700-split-index.sh | 11 +++++++++++
 2 files changed, 25 insertions(+), 9 deletions(-)

diff --git a/read-cache.c b/read-cache.c
index fbd59886a3..335242c1cf 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -2824,11 +2824,8 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
 		}
 	}
 
-	if (!istate->version) {
+	if (!istate->version)
 		istate->version = get_index_format_default(the_repository);
-		if (git_env_bool("GIT_TEST_SPLIT_INDEX", 0))
-			init_split_index(istate);
-	}
 
 	/* demote version 3 to version 2 when the latter suffices */
 	if (istate->version == 3 || istate->version == 2)
@@ -3255,7 +3252,7 @@ static int too_many_not_shared_entries(struct index_state *istate)
 int write_locked_index(struct index_state *istate, struct lock_file *lock,
 		       unsigned flags)
 {
-	int new_shared_index, ret;
+	int new_shared_index, ret, test_split_index_env;
 	struct split_index *si = istate->split_index;
 
 	if (git_env_bool("GIT_TEST_CHECK_CACHE_TREE", 0))
@@ -3270,7 +3267,10 @@ int write_locked_index(struct index_state *istate, struct lock_file *lock,
 	if (istate->fsmonitor_last_update)
 		fill_fsmonitor_bitmap(istate);
 
-	if (!si || alternate_index_output ||
+	test_split_index_env = git_env_bool("GIT_TEST_SPLIT_INDEX", 0);
+
+	if ((!si && !test_split_index_env) ||
+	    alternate_index_output ||
 	    (istate->cache_changed & ~EXTMASK)) {
 		if (si)
 			oidclr(&si->base_oid);
@@ -3278,10 +3278,15 @@ int write_locked_index(struct index_state *istate, struct lock_file *lock,
 		goto out;
 	}
 
-	if (git_env_bool("GIT_TEST_SPLIT_INDEX", 0)) {
-		int v = si->base_oid.hash[0];
-		if ((v & 15) < 6)
+	if (test_split_index_env) {
+		if (!si) {
+			si = init_split_index(istate);
 			istate->cache_changed |= SPLIT_INDEX_ORDERED;
+		} else {
+			int v = si->base_oid.hash[0];
+			if ((v & 15) < 6)
+				istate->cache_changed |= SPLIT_INDEX_ORDERED;
+		}
 	}
 	if (too_many_not_shared_entries(istate))
 		istate->cache_changed |= SPLIT_INDEX_ORDERED;
diff --git a/t/t1700-split-index.sh b/t/t1700-split-index.sh
index e2aa0bd949..decd2527ed 100755
--- a/t/t1700-split-index.sh
+++ b/t/t1700-split-index.sh
@@ -533,4 +533,15 @@ test_expect_success 'reading split index at alternate location' '
 	test_cmp expect actual
 '
 
+test_expect_success 'GIT_TEST_SPLIT_INDEX works' '
+	git init git-test-split-index &&
+	(
+		cd git-test-split-index &&
+		>file &&
+		GIT_TEST_SPLIT_INDEX=1 git update-index --add file &&
+		ls -l .git/sharedindex.* >actual &&
+		test_line_count = 1 actual
+	)
+'
+
 test_done
-- 
2.33.0.453.gc5e41af357


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/6] t1600-index: remove unnecessary redirection
  2021-08-17 17:49 ` [PATCH 1/6] t1600-index: remove unnecessary redirection SZEDER Gábor
@ 2021-08-17 18:12   ` Derrick Stolee
  2021-08-17 18:39     ` SZEDER Gábor
  0 siblings, 1 reply; 23+ messages in thread
From: Derrick Stolee @ 2021-08-17 18:12 UTC (permalink / raw)
  To: SZEDER Gábor, git
  Cc: Thomas Gummerer, Nguyễn Thái Ngọc Duy

On 8/17/2021 1:49 PM, SZEDER Gábor wrote:
> In a helper function in the 't1600-index.sh' test script the stderr
> of a 'git add' command is redirected to its stdout, but its stdout is
> not redirected anywhere.  So apparently this redirection is
> unnecessary, remove it.
> 
> Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
> ---
>  t/t1600-index.sh | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/t/t1600-index.sh b/t/t1600-index.sh
> index c9b9e718b8..5d10cec67a 100755
> --- a/t/t1600-index.sh
> +++ b/t/t1600-index.sh
> @@ -79,7 +79,7 @@ test_index_version () {
>  		else
>  			unset GIT_INDEX_VERSION
>  		fi &&
> -		git add a 2>&1 &&
> +		git add a &&
>  		echo $EXPECTED_OUTPUT_VERSION >expect &&
>  		test-tool index-version <.git/index >actual &&
>  		test_cmp expect actual

Since here we have a 'test_cmp expect actual', perhaps the
actual mistake is that the line isn't

	git add a 2>&1 >actual &&

What do you think about that?

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 5/6] tests: disable GIT_TEST_SPLIT_INDEX for sparse index tests
  2021-08-17 17:49 ` [PATCH 5/6] tests: disable GIT_TEST_SPLIT_INDEX for sparse index tests SZEDER Gábor
@ 2021-08-17 18:26   ` Derrick Stolee
  2021-08-17 21:32     ` SZEDER Gábor
  0 siblings, 1 reply; 23+ messages in thread
From: Derrick Stolee @ 2021-08-17 18:26 UTC (permalink / raw)
  To: SZEDER Gábor, git
  Cc: Thomas Gummerer, Nguyễn Thái Ngọc Duy

On 8/17/2021 1:49 PM, SZEDER Gábor wrote:
> The sparse index and split index features are said to be currently
> incompatible [1], and consequently GIT_TEST_SPLIT_INDEX=1 might
> interfere with the test cases exercising the sparse index feature.
> Therefore GIT_TEST_SPLIT_INDEX is already explicitly disabled for the
> whole of 't1092-sparse-checkout-compatibility.sh'.  There are,
> however, two other test cases exercising sparse index, namely
> 'sparse-index enabled and disabled' in
> 't1091-sparse-checkout-builtin.sh' and 'status succeeds with sparse
> index' in 't7519-status-fsmonitor.sh', and these two could fail with
> GIT_TEST_SPLIT_INDEX=1 as well [2].
> 
> Unset GIT_TEST_SPLIT_INDEX and disable the split index in these two
> test cases to avoid such interference.
> 
> Note that this is the minimal change to merely avoid failures when
> these test cases are run with GIT_TEST_SPLIT_INDEX=1.  Interestingly,
> though, without these changes the 'git sparse-checkout init --cone
> --sparse-index' commands still succeed even with split index, and set
> all the necessary configuration variables and create the initial
> '$GIT_DIR/info/sparse-checkout' file, but the test failures are caused
> by later sanity checks finding that the index is not in fact a sparse
> index.  This indicates that 'git sparse-checkout init --sparse-index'
> lacks some error checking and its tests lack coverage related to split
> index, but fixing those issues (let alone making sparse index
> comparible with split index) is beyond the scope of this patch series.

s/comparible/compatible.

I agree that making these two things compatible is not something to
solve today. I'm not sure they should _ever_ be solved because of
the complexity involved (what if the base index is not sparse but
the tip wants to be, or vice-versa?, or if a directory must be
expanded because of a conflict?). They use very different approaches
to solve a similar problem: how to deal with large index files.

* The split index reduces index _write_ time by only editing a diff
  of the base index.

* The sparse index reduces index _read and write_ time by writing a
  smaller index, but only if the user is using cone mode sparse-
  checkout.

>  test_expect_success 'sparse-index enabled and disabled' '
> -	git -C repo sparse-checkout init --cone --sparse-index &&
> -	test_cmp_config -C repo true index.sparse &&
> -	test-tool -C repo read-cache --table >cache &&
> -	grep " tree " cache &&
> -
> -	git -C repo sparse-checkout disable &&
> -	test-tool -C repo read-cache --table >cache &&
> -	! grep " tree " cache &&
> -	git -C repo config --list >config &&
> -	! grep index.sparse config
> +	(
> +		sane_unset GIT_TEST_SPLIT_INDEX &&
> +		git -C repo update-index --no-split-index &&
> +
> +		git -C repo sparse-checkout init --cone --sparse-index &&
> +		test_cmp_config -C repo true index.sparse &&
> +		test-tool -C repo read-cache --table >cache &&
> +		grep " tree " cache &&
> +
> +		git -C repo sparse-checkout disable &&
> +		test-tool -C repo read-cache --table >cache &&
> +		! grep " tree " cache &&
> +		git -C repo config --list >config &&
> +		! grep index.sparse config
> +	)
>  '

This test is safe for now.

>  test_expect_success 'status succeeds with sparse index' '

This test is being edited in ds/sparse-index-ignored-files. v3
of the relevant patch was just sent today [1].

[1] https://lore.kernel.org/git/e66106f7a99d94145eec983ea5e72b7cf8a8a479.1629206603.git.gitgitgadget@gmail.com/

You might want to rebase on top of that topic. The edits to
the test are likely stable now.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 6/6] read-cache: fix GIT_TEST_SPLIT_INDEX
  2021-08-17 17:49 ` [PATCH 6/6] read-cache: fix GIT_TEST_SPLIT_INDEX SZEDER Gábor
@ 2021-08-17 18:29   ` Derrick Stolee
  0 siblings, 0 replies; 23+ messages in thread
From: Derrick Stolee @ 2021-08-17 18:29 UTC (permalink / raw)
  To: SZEDER Gábor, git
  Cc: Thomas Gummerer, Nguyễn Thái Ngọc Duy

On 8/17/2021 1:49 PM, SZEDER Gábor wrote:

Sorry to crop all of your excellent description of why this wasn't
working before and why it will work now, but...

> +test_expect_success 'GIT_TEST_SPLIT_INDEX works' '
> +	git init git-test-split-index &&
> +	(
> +		cd git-test-split-index &&
> +		>file &&
> +		GIT_TEST_SPLIT_INDEX=1 git update-index --add file &&
> +		ls -l .git/sharedindex.* >actual &&
> +		test_line_count = 1 actual
> +	)
> +'
> +

This is really the critical piece to ensure we don't have a
similar problem in the future. Thanks!

-Stolee

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/6] t1600-index: remove unnecessary redirection
  2021-08-17 18:12   ` Derrick Stolee
@ 2021-08-17 18:39     ` SZEDER Gábor
  2021-08-17 18:48       ` Derrick Stolee
  0 siblings, 1 reply; 23+ messages in thread
From: SZEDER Gábor @ 2021-08-17 18:39 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: git, Thomas Gummerer, Nguyễn Thái Ngọc Duy

On Tue, Aug 17, 2021 at 02:12:36PM -0400, Derrick Stolee wrote:
> On 8/17/2021 1:49 PM, SZEDER Gábor wrote:
> > In a helper function in the 't1600-index.sh' test script the stderr
> > of a 'git add' command is redirected to its stdout, but its stdout is
> > not redirected anywhere.  So apparently this redirection is
> > unnecessary, remove it.
> > 
> > Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
> > ---
> >  t/t1600-index.sh | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/t/t1600-index.sh b/t/t1600-index.sh
> > index c9b9e718b8..5d10cec67a 100755
> > --- a/t/t1600-index.sh
> > +++ b/t/t1600-index.sh
> > @@ -79,7 +79,7 @@ test_index_version () {
> >  		else
> >  			unset GIT_INDEX_VERSION
> >  		fi &&
> > -		git add a 2>&1 &&
> > +		git add a &&
> >  		echo $EXPECTED_OUTPUT_VERSION >expect &&
> >  		test-tool index-version <.git/index >actual &&
> >  		test_cmp expect actual
> 
> Since here we have a 'test_cmp expect actual', perhaps the
> actual mistake is that the line isn't
> 
> 	git add a 2>&1 >actual &&
> 
> What do you think about that?

The actual file is written two lines later as:

  test-tool index-version <.git/index >actual

So it seems that neither stdout nor stderr of that 'git add' command
matters.

However, in most of the preceeding tests 'git add's stderr does indeed
matter, as they do:

    git add a 2>&1 | sed "s/[0-9]//" >actual.err &&
    # write expect.err
    test_cmp expect.err actual.err

I suspect that the 2>&1 redirection this patch is removing might have
been a copy-paste error.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/6] t1600-index: remove unnecessary redirection
  2021-08-17 18:39     ` SZEDER Gábor
@ 2021-08-17 18:48       ` Derrick Stolee
  0 siblings, 0 replies; 23+ messages in thread
From: Derrick Stolee @ 2021-08-17 18:48 UTC (permalink / raw)
  To: SZEDER Gábor
  Cc: git, Thomas Gummerer, Nguyễn Thái Ngọc Duy

On 8/17/2021 2:39 PM, SZEDER Gábor wrote:
> On Tue, Aug 17, 2021 at 02:12:36PM -0400, Derrick Stolee wrote:
>> On 8/17/2021 1:49 PM, SZEDER Gábor wrote:
>>> In a helper function in the 't1600-index.sh' test script the stderr
>>> of a 'git add' command is redirected to its stdout, but its stdout is
>>> not redirected anywhere.  So apparently this redirection is
>>> unnecessary, remove it.
>>>
>>> Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
>>> ---
>>>  t/t1600-index.sh | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/t/t1600-index.sh b/t/t1600-index.sh
>>> index c9b9e718b8..5d10cec67a 100755
>>> --- a/t/t1600-index.sh
>>> +++ b/t/t1600-index.sh
>>> @@ -79,7 +79,7 @@ test_index_version () {
>>>  		else
>>>  			unset GIT_INDEX_VERSION
>>>  		fi &&
>>> -		git add a 2>&1 &&
>>> +		git add a &&
>>>  		echo $EXPECTED_OUTPUT_VERSION >expect &&
>>>  		test-tool index-version <.git/index >actual &&
>>>  		test_cmp expect actual
>>
>> Since here we have a 'test_cmp expect actual', perhaps the
>> actual mistake is that the line isn't
>>
>> 	git add a 2>&1 >actual &&
>>
>> What do you think about that?
> 
> The actual file is written two lines later as:
> 
>   test-tool index-version <.git/index >actual
> 
> So it seems that neither stdout nor stderr of that 'git add' command
> matters.

Of course. Sorry for misreading.

> However, in most of the preceeding tests 'git add's stderr does indeed
> matter, as they do:
> 
>     git add a 2>&1 | sed "s/[0-9]//" >actual.err &&
>     # write expect.err
>     test_cmp expect.err actual.err
> 
> I suspect that the 2>&1 redirection this patch is removing might have
> been a copy-paste error.

That makes more sense than an errant addition.

Thanks,
-Stolee



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 5/6] tests: disable GIT_TEST_SPLIT_INDEX for sparse index tests
  2021-08-17 18:26   ` Derrick Stolee
@ 2021-08-17 21:32     ` SZEDER Gábor
  2021-08-18 18:51       ` Derrick Stolee
  0 siblings, 1 reply; 23+ messages in thread
From: SZEDER Gábor @ 2021-08-17 21:32 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: git, Thomas Gummerer, Nguyễn Thái Ngọc Duy

On Tue, Aug 17, 2021 at 02:26:23PM -0400, Derrick Stolee wrote:
> On 8/17/2021 1:49 PM, SZEDER Gábor wrote:
> > The sparse index and split index features are said to be currently
> > incompatible [1], and consequently GIT_TEST_SPLIT_INDEX=1 might
> > interfere with the test cases exercising the sparse index feature.
> > Therefore GIT_TEST_SPLIT_INDEX is already explicitly disabled for the
> > whole of 't1092-sparse-checkout-compatibility.sh'.  There are,
> > however, two other test cases exercising sparse index, namely
> > 'sparse-index enabled and disabled' in
> > 't1091-sparse-checkout-builtin.sh' and 'status succeeds with sparse
> > index' in 't7519-status-fsmonitor.sh', and these two could fail with
> > GIT_TEST_SPLIT_INDEX=1 as well [2].
> > 
> > Unset GIT_TEST_SPLIT_INDEX and disable the split index in these two
> > test cases to avoid such interference.
> > 
> > Note that this is the minimal change to merely avoid failures when
> > these test cases are run with GIT_TEST_SPLIT_INDEX=1.  Interestingly,
> > though, without these changes the 'git sparse-checkout init --cone
> > --sparse-index' commands still succeed even with split index, and set
> > all the necessary configuration variables and create the initial
> > '$GIT_DIR/info/sparse-checkout' file, but the test failures are caused
> > by later sanity checks finding that the index is not in fact a sparse
> > index.  This indicates that 'git sparse-checkout init --sparse-index'
> > lacks some error checking and its tests lack coverage related to split
> > index, but fixing those issues (let alone making sparse index
> > comparible with split index) is beyond the scope of this patch series.
> 
> s/comparible/compatible.
> 
> I agree that making these two things compatible is not something to
> solve today. I'm not sure they should _ever_ be solved because of
> the complexity involved (what if the base index is not sparse but
> the tip wants to be, or vice-versa?,

I think that this's not an issue, because a shared index file is not
for forever, but it's expected that a new shared index is written
every once in a while anyway, see splitIndex.maxPercentChange.  So
whenever sparse index gets enabled/disabled we could just write a new
shared index accordingly.  Maybe even when the sparse patterns change,
if necessary.

> or if a directory must be expanded because of a conflict?).

I'm not quite up-to-date in sparse index terminology, so I'm not sure
that this means...  but as mentioned above we can just write a new
shared index when deemed necessary.

> They use very different approaches
> to solve a similar problem: how to deal with large index files.
> 
> * The split index reduces index _write_ time by only editing a diff
>   of the base index.
> 
> * The sparse index reduces index _read and write_ time by writing a
>   smaller index, but only if the user is using cone mode sparse-
>   checkout.

Yeah, I think that in general there is more to be gained with sparse
index than with split index, though the split index might further
reduce the write time of a sparse index, because the amount of data
written is proportional with the nr of changed files instead of the nr
of all files in the sparse index.  I'm not sure that it's worth it,
either.

My remarks about compatibility primarily stem from your remarks about
compatibility in response to my sparse vs. split test failure report
in:

  https://public-inbox.org/git/48e9c3d6-407a-1843-2d91-22112410e3f8@gmail.com/

  "the sparse-index is (currently) incompatible with the split-index"

I assumed that that "(currently)" implies that eventually the two will
be made compatible.

Anyway, I'm fine with leaving them incompatible, but 'git
sparse-checkout' should still learn about split index, so it won't
pretend to succeed after it couldn't do what it was asked to when
there is a split index.  Maybe vice-versa as well.

> >  test_expect_success 'sparse-index enabled and disabled' '
> > -	git -C repo sparse-checkout init --cone --sparse-index &&
> > -	test_cmp_config -C repo true index.sparse &&
> > -	test-tool -C repo read-cache --table >cache &&
> > -	grep " tree " cache &&
> > -
> > -	git -C repo sparse-checkout disable &&
> > -	test-tool -C repo read-cache --table >cache &&
> > -	! grep " tree " cache &&
> > -	git -C repo config --list >config &&
> > -	! grep index.sparse config
> > +	(
> > +		sane_unset GIT_TEST_SPLIT_INDEX &&
> > +		git -C repo update-index --no-split-index &&
> > +
> > +		git -C repo sparse-checkout init --cone --sparse-index &&
> > +		test_cmp_config -C repo true index.sparse &&
> > +		test-tool -C repo read-cache --table >cache &&
> > +		grep " tree " cache &&
> > +
> > +		git -C repo sparse-checkout disable &&
> > +		test-tool -C repo read-cache --table >cache &&
> > +		! grep " tree " cache &&
> > +		git -C repo config --list >config &&
> > +		! grep index.sparse config
> > +	)
> >  '
> 
> This test is safe for now.
> 
> >  test_expect_success 'status succeeds with sparse index' '
> 
> This test is being edited in ds/sparse-index-ignored-files. v3
> of the relevant patch was just sent today [1].
> 
> [1] https://lore.kernel.org/git/e66106f7a99d94145eec983ea5e72b7cf8a8a479.1629206603.git.gitgitgadget@gmail.com/
> 
> You might want to rebase on top of that topic. The edits to
> the test are likely stable now.

Oh, no :)  The test 'cone mode clears ignored subdirectories' added in
that patch series (as in 'seen', so not the newes version) fails with
the working GIT_TEST_SPLIT_INDEX=1 as well, and I don't immediately
see why, as it doesn't seem to use sparse index.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 5/6] tests: disable GIT_TEST_SPLIT_INDEX for sparse index tests
  2021-08-17 21:32     ` SZEDER Gábor
@ 2021-08-18 18:51       ` Derrick Stolee
  0 siblings, 0 replies; 23+ messages in thread
From: Derrick Stolee @ 2021-08-18 18:51 UTC (permalink / raw)
  To: SZEDER Gábor
  Cc: git, Thomas Gummerer, Nguyễn Thái Ngọc Duy

On 8/17/2021 5:32 PM, SZEDER Gábor wrote:
> On Tue, Aug 17, 2021 at 02:26:23PM -0400, Derrick Stolee wrote:
>> On 8/17/2021 1:49 PM, SZEDER Gábor wrote:
>>> The sparse index and split index features are said to be currently
>>> incompatible [1], and consequently GIT_TEST_SPLIT_INDEX=1 might
>>> interfere with the test cases exercising the sparse index feature.
>>> Therefore GIT_TEST_SPLIT_INDEX is already explicitly disabled for the
>>> whole of 't1092-sparse-checkout-compatibility.sh'.  There are,
>>> however, two other test cases exercising sparse index, namely
>>> 'sparse-index enabled and disabled' in
>>> 't1091-sparse-checkout-builtin.sh' and 'status succeeds with sparse
>>> index' in 't7519-status-fsmonitor.sh', and these two could fail with
>>> GIT_TEST_SPLIT_INDEX=1 as well [2].
>>>
>>> Unset GIT_TEST_SPLIT_INDEX and disable the split index in these two
>>> test cases to avoid such interference.
>>>
>>> Note that this is the minimal change to merely avoid failures when
>>> these test cases are run with GIT_TEST_SPLIT_INDEX=1.  Interestingly,
>>> though, without these changes the 'git sparse-checkout init --cone
>>> --sparse-index' commands still succeed even with split index, and set
>>> all the necessary configuration variables and create the initial
>>> '$GIT_DIR/info/sparse-checkout' file, but the test failures are caused
>>> by later sanity checks finding that the index is not in fact a sparse
>>> index.  This indicates that 'git sparse-checkout init --sparse-index'
>>> lacks some error checking and its tests lack coverage related to split
>>> index, but fixing those issues (let alone making sparse index
>>> comparible with split index) is beyond the scope of this patch series.
>>
>> s/comparible/compatible.
>>
>> I agree that making these two things compatible is not something to
>> solve today. I'm not sure they should _ever_ be solved because of
>> the complexity involved (what if the base index is not sparse but
>> the tip wants to be, or vice-versa?,
> 
> I think that this's not an issue, because a shared index file is not
> for forever, but it's expected that a new shared index is written
> every once in a while anyway, see splitIndex.maxPercentChange.  So
> whenever sparse index gets enabled/disabled we could just write a new
> shared index accordingly.  Maybe even when the sparse patterns change,
> if necessary.
> 
>> or if a directory must be expanded because of a conflict?).
> 
> I'm not quite up-to-date in sparse index terminology, so I'm not sure
> that this means...  but as mentioned above we can just write a new
> shared index when deemed necessary.
> 
>> They use very different approaches
>> to solve a similar problem: how to deal with large index files.
>>
>> * The split index reduces index _write_ time by only editing a diff
>>   of the base index.
>>
>> * The sparse index reduces index _read and write_ time by writing a
>>   smaller index, but only if the user is using cone mode sparse-
>>   checkout.
> 
> Yeah, I think that in general there is more to be gained with sparse
> index than with split index, though the split index might further
> reduce the write time of a sparse index, because the amount of data
> written is proportional with the nr of changed files instead of the nr
> of all files in the sparse index.  I'm not sure that it's worth it,
> either.
> 
> My remarks about compatibility primarily stem from your remarks about
> compatibility in response to my sparse vs. split test failure report
> in:
> 
>   https://public-inbox.org/git/48e9c3d6-407a-1843-2d91-22112410e3f8@gmail.com/
> 
>   "the sparse-index is (currently) incompatible with the split-index"
> 
> I assumed that that "(currently)" implies that eventually the two will
> be made compatible.

I say "(currently)" because there is nothing stopping someone from
integrating them. I'm not sure anyone ever will.

> Anyway, I'm fine with leaving them incompatible, but 'git
> sparse-checkout' should still learn about split index, so it won't
> pretend to succeed after it couldn't do what it was asked to when
> there is a split index.  Maybe vice-versa as well.

You're saying that when 'git sparse-checkout init --cone --sparse-index'
doesn't write a sparse index because the index is already split, we
should actually die() saying "split index and sparse index are
incompatible" so the user knows why the sparse index was not enabled?
That seems reasonable for now, although it is successful in setting
the config to enable the sparse index opportunistically in the future.

>>>  test_expect_success 'sparse-index enabled and disabled' '
>>> -	git -C repo sparse-checkout init --cone --sparse-index &&
>>> -	test_cmp_config -C repo true index.sparse &&
>>> -	test-tool -C repo read-cache --table >cache &&
>>> -	grep " tree " cache &&
>>> -
>>> -	git -C repo sparse-checkout disable &&
>>> -	test-tool -C repo read-cache --table >cache &&
>>> -	! grep " tree " cache &&
>>> -	git -C repo config --list >config &&
>>> -	! grep index.sparse config
>>> +	(
>>> +		sane_unset GIT_TEST_SPLIT_INDEX &&
>>> +		git -C repo update-index --no-split-index &&
>>> +
>>> +		git -C repo sparse-checkout init --cone --sparse-index &&
>>> +		test_cmp_config -C repo true index.sparse &&
>>> +		test-tool -C repo read-cache --table >cache &&
>>> +		grep " tree " cache &&
>>> +
>>> +		git -C repo sparse-checkout disable &&
>>> +		test-tool -C repo read-cache --table >cache &&
>>> +		! grep " tree " cache &&
>>> +		git -C repo config --list >config &&
>>> +		! grep index.sparse config
>>> +	)
>>>  '
>>
>> This test is safe for now.
>>
>>>  test_expect_success 'status succeeds with sparse index' '
>>
>> This test is being edited in ds/sparse-index-ignored-files. v3
>> of the relevant patch was just sent today [1].
>>
>> [1] https://lore.kernel.org/git/e66106f7a99d94145eec983ea5e72b7cf8a8a479.1629206603.git.gitgitgadget@gmail.com/
>>
>> You might want to rebase on top of that topic. The edits to
>> the test are likely stable now.
> 
> Oh, no :)  The test 'cone mode clears ignored subdirectories' added in
> that patch series (as in 'seen', so not the newes version) fails with
> the working GIT_TEST_SPLIT_INDEX=1 as well, and I don't immediately
> see why, as it doesn't seem to use sparse index.
 
Oh interesting! This is because the index converts to a sparse index
in-memory solely so it can use the sparse directory entries to pick
which directories need to be deleted. In the presence of a split
index, convert_to_sparse() fails due to these lines:

	if (istate->split_index || istate->sparse_index || !istate->cache_nr ||
	    !core_apply_sparse_checkout || !core_sparse_checkout_cone)
		return 0;

	if (!istate->repo)
		istate->repo = the_repository;

	if (!(flags & SPARSE_INDEX_IGNORE_CONFIG)) {
		...

While this if here is checking the condition for creating it in-memory.

The fix is likely to do the following instead:

	if (istate->sparse_index || !istate->cache_nr ||
	    !core_apply_sparse_checkout || !core_sparse_checkout_cone)
		return 0;

	if (!istate->repo)
		istate->repo = the_repository;

	if (!(flags & SPARSE_INDEX_IGNORE_CONFIG)) {
		if (istate->split_index)
			return 0;
		...

and maybe even rename the flag to SPARSE_INDEX_MEMORY_ONLY or something.

Thanks for alerting me to this issue!

-Stolee

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v2 0/6] Fix GIT_TEST_SPLIT_INDEX
  2021-08-17 17:49 [PATCH 0/6] Fix GIT_TEST_SPLIT_INDEX SZEDER Gábor
                   ` (5 preceding siblings ...)
  2021-08-17 17:49 ` [PATCH 6/6] read-cache: fix GIT_TEST_SPLIT_INDEX SZEDER Gábor
@ 2021-08-26 20:59 ` SZEDER Gábor
  2021-08-26 20:59   ` [PATCH v2 1/6] t1600-index: remove unnecessary redirection SZEDER Gábor
                     ` (6 more replies)
  6 siblings, 7 replies; 23+ messages in thread
From: SZEDER Gábor @ 2021-08-26 20:59 UTC (permalink / raw)
  To: git
  Cc: Derrick Stolee, Thomas Gummerer,
	Nguyễn Thái Ngọc Duy, SZEDER Gábor

To recap from v1's cover letter:

Running tests with GIT_TEST_SPLIT_INDEX=1 is supposed to turn on the
split index feature and trigger index splitting (mostly) randomly.
Alas, this has been broken for ~2.5 years, and it hasn't triggered any
index splitting since then.

The last patch in this series makes GIT_TEST_SPLIT_INDEX=1 work again,
although it slightly changes its behavior; see its log message for all
the details.

Patches 3 and 4 fix old tests that started to fail because
of this slight behavior change.  Bsides the final fix, patch 4 is the
most interesting in this series.

Patch 5 fixes recent sparse index tests that were introduced while
GIT_TEST_SPLIT_INDEX was broken.

The first two patches are just minor while-at-it test cleanups.


Changes since v1, both in patch 5:
  - Removed a remark about making split and sparse index compatible.
  - Stolle suggested to rebase this patch series on top of topic
    'ds/sparse-index-ignored-files' to avoid merge conflicts in a
    sparse index test.  Well, I didn't do that, but instead I rebased
    this series on top of the merge of master and 86d6ca5886 (t7519:
    rewrite sparse index test, 2021-08-10), and the latter commit:
      - is the one responsible for said merge conflicts, and
      - is at the bottom of 'ds/sparse-index-ignored-files', and could
	be considered more as a followup fix for topic
	'ds/status-with-sparse-index' (already in v2.33.0) than an
	integral part of 'ds/sparse-index-ignored-files'.
    The rest of 'ds/sparse-index-ignored-files' doesn't have any
    conflicts with this patch series that auto-merge couldn't resolve.

SZEDER Gábor (6):
  t1600-index: remove unnecessary redirection
  t1600-index: don't run git commands upstream of a pipe
  t1600-index: disable GIT_TEST_SPLIT_INDEX
  read-cache: look for shared index files next to the index, too
  tests: disable GIT_TEST_SPLIT_INDEX for sparse index tests
  read-cache: fix GIT_TEST_SPLIT_INDEX

 read-cache.c                       | 37 +++++++++++-----
 t/t1091-sparse-checkout-builtin.sh | 25 ++++++-----
 t/t1600-index.sh                   | 13 ++++--
 t/t1700-split-index.sh             | 34 +++++++++++++++
 t/t7519-status-fsmonitor.sh        | 68 ++++++++++++++++--------------
 5 files changed, 121 insertions(+), 56 deletions(-)

Range-diff against v1:
1:  7e1d38d215 = 1:  ab555030cb t1600-index: remove unnecessary redirection
2:  d86c91d26e = 2:  38d350b02d t1600-index: don't run git commands upstream of a pipe
3:  792b6739e5 = 3:  57b5386aee t1600-index: disable GIT_TEST_SPLIT_INDEX
4:  b12467bab7 = 4:  58c444bcd0 read-cache: look for shared index files next to the index, too
5:  51e13df905 ! 5:  24b8b7738c tests: disable GIT_TEST_SPLIT_INDEX for sparse index tests
    @@ Commit message
         by later sanity checks finding that the index is not in fact a sparse
         index.  This indicates that 'git sparse-checkout init --sparse-index'
         lacks some error checking and its tests lack coverage related to split
    -    index, but fixing those issues (let alone making sparse index
    -    comparible with split index) is beyond the scope of this patch series.
    +    index, but fixing those issues is beyond the scope of this patch
    +    series.
     
         [1] https://public-inbox.org/git/48e9c3d6-407a-1843-2d91-22112410e3f8@gmail.com/
     
    @@ t/t7519-status-fsmonitor.sh: check_sparse_index_behavior () {
      }
      
      test_expect_success 'status succeeds with sparse index' '

  I took the liberty to cut the part of the range-diff showing the
  changes of changes to this test, because 86d6ca5886 totally rewrites
  this test, while this patch changes the test's indentation,
  resulting in an unusable range-diff.
  Instead, to show the only relevant change in this patch here is the
  first hunk touching this test with --ignore-all-space from v1:

    @@ -402,6 +402,10 @@ check_sparse_index_behavior () {
     }
    
     test_expect_success 'status succeeds with sparse index' '
    +       (
    +               sane_unset GIT_TEST_SPLIT_INDEX &&
    +               git update-index --no-split-index &&
    +
                    git reset --hard &&
    
                    test_config core.fsmonitor "$TEST_DIRECTORY/t7519/fsmonitor-all" &&

  And from v2:

    @@ -399,6 +399,9 @@ check_sparse_index_behavior () {
     }
    
     test_expect_success 'status succeeds with sparse index' '
    +       (
    +               sane_unset GIT_TEST_SPLIT_INDEX &&
    +
                    git clone . full &&
                    git clone . sparse &&
                    git -C sparse sparse-checkout init --cone --sparse-index &&

6:  a5c790e622 = 6:  a43a12b0ce read-cache: fix GIT_TEST_SPLIT_INDEX
-- 
2.33.0.358.g803110d36e


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v2 1/6] t1600-index: remove unnecessary redirection
  2021-08-26 20:59 ` [PATCH v2 0/6] Fix GIT_TEST_SPLIT_INDEX SZEDER Gábor
@ 2021-08-26 20:59   ` SZEDER Gábor
  2021-08-26 21:00   ` [PATCH v2 2/6] t1600-index: don't run git commands upstream of a pipe SZEDER Gábor
                     ` (5 subsequent siblings)
  6 siblings, 0 replies; 23+ messages in thread
From: SZEDER Gábor @ 2021-08-26 20:59 UTC (permalink / raw)
  To: git
  Cc: Derrick Stolee, Thomas Gummerer,
	Nguyễn Thái Ngọc Duy, SZEDER Gábor

In a helper function in the 't1600-index.sh' test script the stderr
of a 'git add' command is redirected to its stdout, but its stdout is
not redirected anywhere.  So apparently this redirection is
unnecessary, remove it.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---
 t/t1600-index.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/t/t1600-index.sh b/t/t1600-index.sh
index c9b9e718b8..5d10cec67a 100755
--- a/t/t1600-index.sh
+++ b/t/t1600-index.sh
@@ -79,7 +79,7 @@ test_index_version () {
 		else
 			unset GIT_INDEX_VERSION
 		fi &&
-		git add a 2>&1 &&
+		git add a &&
 		echo $EXPECTED_OUTPUT_VERSION >expect &&
 		test-tool index-version <.git/index >actual &&
 		test_cmp expect actual
-- 
2.33.0.358.g803110d36e


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v2 2/6] t1600-index: don't run git commands upstream of a pipe
  2021-08-26 20:59 ` [PATCH v2 0/6] Fix GIT_TEST_SPLIT_INDEX SZEDER Gábor
  2021-08-26 20:59   ` [PATCH v2 1/6] t1600-index: remove unnecessary redirection SZEDER Gábor
@ 2021-08-26 21:00   ` SZEDER Gábor
  2021-08-26 21:00   ` [PATCH v2 3/6] t1600-index: disable GIT_TEST_SPLIT_INDEX SZEDER Gábor
                     ` (4 subsequent siblings)
  6 siblings, 0 replies; 23+ messages in thread
From: SZEDER Gábor @ 2021-08-26 21:00 UTC (permalink / raw)
  To: git
  Cc: Derrick Stolee, Thomas Gummerer,
	Nguyễn Thái Ngọc Duy, SZEDER Gábor

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---
 t/t1600-index.sh | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/t/t1600-index.sh b/t/t1600-index.sh
index 5d10cec67a..aa88fc30ce 100755
--- a/t/t1600-index.sh
+++ b/t/t1600-index.sh
@@ -13,7 +13,8 @@ test_expect_success 'bogus GIT_INDEX_VERSION issues warning' '
 		rm -f .git/index &&
 		GIT_INDEX_VERSION=2bogus &&
 		export GIT_INDEX_VERSION &&
-		git add a 2>&1 | sed "s/[0-9]//" >actual.err &&
+		git add a 2>err &&
+		sed "s/[0-9]//" err >actual.err &&
 		sed -e "s/ Z$/ /" <<-\EOF >expect.err &&
 			warning: GIT_INDEX_VERSION set, but the value is invalid.
 			Using version Z
@@ -27,7 +28,8 @@ test_expect_success 'out of bounds GIT_INDEX_VERSION issues warning' '
 		rm -f .git/index &&
 		GIT_INDEX_VERSION=1 &&
 		export GIT_INDEX_VERSION &&
-		git add a 2>&1 | sed "s/[0-9]//" >actual.err &&
+		git add a 2>err &&
+		sed "s/[0-9]//" err >actual.err &&
 		sed -e "s/ Z$/ /" <<-\EOF >expect.err &&
 			warning: GIT_INDEX_VERSION set, but the value is invalid.
 			Using version Z
@@ -50,7 +52,8 @@ test_expect_success 'out of bounds index.version issues warning' '
 		sane_unset GIT_INDEX_VERSION &&
 		rm -f .git/index &&
 		git config --add index.version 1 &&
-		git add a 2>&1 | sed "s/[0-9]//" >actual.err &&
+		git add a 2>err &&
+		sed "s/[0-9]//" err >actual.err &&
 		sed -e "s/ Z$/ /" <<-\EOF >expect.err &&
 			warning: index.version set, but the value is invalid.
 			Using version Z
-- 
2.33.0.358.g803110d36e


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v2 3/6] t1600-index: disable GIT_TEST_SPLIT_INDEX
  2021-08-26 20:59 ` [PATCH v2 0/6] Fix GIT_TEST_SPLIT_INDEX SZEDER Gábor
  2021-08-26 20:59   ` [PATCH v2 1/6] t1600-index: remove unnecessary redirection SZEDER Gábor
  2021-08-26 21:00   ` [PATCH v2 2/6] t1600-index: don't run git commands upstream of a pipe SZEDER Gábor
@ 2021-08-26 21:00   ` SZEDER Gábor
  2021-08-26 21:00   ` [PATCH v2 4/6] read-cache: look for shared index files next to the index, too SZEDER Gábor
                     ` (3 subsequent siblings)
  6 siblings, 0 replies; 23+ messages in thread
From: SZEDER Gábor @ 2021-08-26 21:00 UTC (permalink / raw)
  To: git
  Cc: Derrick Stolee, Thomas Gummerer,
	Nguyễn Thái Ngọc Duy, SZEDER Gábor

Tests in 't1600-index.sh' check that various bogus index version
values are recognized and an appropriate warning message is issued.
GIT_TEST_SPLIT_INDEX=1 is supposed to trigger index splitting
randomly, and thus might interfere [1] with these tests: splitting the
index means that two index files are written (the shared base index
and the split '.git/index'), and the same warning message is then
issued twice, failing these tests.

Unset GIT_TEST_SPLIT_INDEX in this test script to avoid such
interference.

[1] There is no such interference at the moment, because, alas,
    GIT_TEST_SPLIT_INDEX=1 is broken and never split the index.  There
    was no such interference in the past (before it broke) either,
    because the first index write with GIT_TEST_SPLIT_INDEX=1 never
    split the index, only the second write did.  A subsequent commit
    fixing GIT_TEST_SPLIT_INDEX will have all the details on this.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---
 t/t1600-index.sh | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/t/t1600-index.sh b/t/t1600-index.sh
index aa88fc30ce..46329c488b 100755
--- a/t/t1600-index.sh
+++ b/t/t1600-index.sh
@@ -4,6 +4,8 @@ test_description='index file specific tests'
 
 . ./test-lib.sh
 
+sane_unset GIT_TEST_SPLIT_INDEX
+
 test_expect_success 'setup' '
 	echo 1 >a
 '
-- 
2.33.0.358.g803110d36e


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v2 4/6] read-cache: look for shared index files next to the index, too
  2021-08-26 20:59 ` [PATCH v2 0/6] Fix GIT_TEST_SPLIT_INDEX SZEDER Gábor
                     ` (2 preceding siblings ...)
  2021-08-26 21:00   ` [PATCH v2 3/6] t1600-index: disable GIT_TEST_SPLIT_INDEX SZEDER Gábor
@ 2021-08-26 21:00   ` SZEDER Gábor
  2021-08-26 21:00   ` [PATCH v2 5/6] tests: disable GIT_TEST_SPLIT_INDEX for sparse index tests SZEDER Gábor
                     ` (2 subsequent siblings)
  6 siblings, 0 replies; 23+ messages in thread
From: SZEDER Gábor @ 2021-08-26 21:00 UTC (permalink / raw)
  To: git
  Cc: Derrick Stolee, Thomas Gummerer,
	Nguyễn Thái Ngọc Duy, SZEDER Gábor

When reading a split index git always looks for its referenced shared
base index in the gitdir of the current repository, even when reading
an alternate index specified via GIT_INDEX_FILE, and even when that
alternate index file is the "main" '.git/index' file of an other
repository.  However, if that split index and its referenced shared
index files were written by a git command running entirely in that
other repository, then, naturally, the shared index file is written to
that other repository's gitdir.  Consequently, a git command
attempting to read that shared index file while running in a different
repository won't be able find it and will error out.

I'm not sure in what use case it is necessary to read the index of one
repository by a git command running in a different repository, but it
is certainly possible to do so, and in fact the test 'bare repository:
check that --cached honors index' in 't0003-attributes.sh' does
exactly that.  If GIT_TEST_SPLIT_INDEX=1 were to split the index in
just the right moment [1], then this test would indeed fail, because
the referenced shared index file could not be found.

Let's look for the referenced shared index file not only in the gitdir
of the current directory, but, if the shared index is not there, right
next to the split index as well.

[1] We haven't seen this issue trigger a failure in t0003 yet,
    because:

      - While GIT_TEST_SPLIT_INDEX=1 is supposed to trigger index
        splitting randomly, the first index write has always been
        deterministic and it has never split the index.

      - That alternate index file in the other repository is written
        only once in the entire test script, so it's never split.

    However, the next patch will fix GIT_TEST_SPLIT_INDEX, and while
    doing so it will slightly change its behavior to always split the
    index already on the first index write, and t0003 would always
    fail without this patch.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---
 read-cache.c           | 14 +++++++++++++-
 t/t1700-split-index.sh | 23 +++++++++++++++++++++++
 2 files changed, 36 insertions(+), 1 deletion(-)

diff --git a/read-cache.c b/read-cache.c
index 9048ef9e90..fbd59886a3 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -2391,9 +2391,21 @@ int read_index_from(struct index_state *istate, const char *path,
 	base_path = xstrfmt("%s/sharedindex.%s", gitdir, base_oid_hex);
 	trace2_region_enter_printf("index", "shared/do_read_index",
 				   the_repository, "%s", base_path);
-	ret = do_read_index(split_index->base, base_path, 1);
+	ret = do_read_index(split_index->base, base_path, 0);
 	trace2_region_leave_printf("index", "shared/do_read_index",
 				   the_repository, "%s", base_path);
+	if (!ret) {
+		char *path_copy = xstrdup(path);
+		const char *base_path2 = xstrfmt("%s/sharedindex.%s",
+						 dirname(path_copy),
+						 base_oid_hex);
+		free(path_copy);
+		trace2_region_enter_printf("index", "shared/do_read_index",
+					   the_repository, "%s", base_path2);
+		ret = do_read_index(split_index->base, base_path2, 1);
+		trace2_region_leave_printf("index", "shared/do_read_index",
+					   the_repository, "%s", base_path2);
+	}
 	if (!oideq(&split_index->base_oid, &split_index->base->oid))
 		die(_("broken index, expect %s in %s, got %s"),
 		    base_oid_hex, base_path,
diff --git a/t/t1700-split-index.sh b/t/t1700-split-index.sh
index 986baa612e..e2aa0bd949 100755
--- a/t/t1700-split-index.sh
+++ b/t/t1700-split-index.sh
@@ -510,4 +510,27 @@ test_expect_success 'do not refresh null base index' '
 	)
 '
 
+test_expect_success 'reading split index at alternate location' '
+	git init reading-alternate-location &&
+	(
+		cd reading-alternate-location &&
+		>file-in-alternate &&
+		git update-index --split-index --add file-in-alternate
+	) &&
+	echo file-in-alternate >expect &&
+
+	# Should be able to find the shared index both right next to
+	# the specified split index file ...
+	GIT_INDEX_FILE=./reading-alternate-location/.git/index \
+	git ls-files --cached >actual &&
+	test_cmp expect actual &&
+
+	# ... and, for backwards compatibility, in the current GIT_DIR
+	# as well.
+	mv -v ./reading-alternate-location/.git/sharedindex.* .git &&
+	GIT_INDEX_FILE=./reading-alternate-location/.git/index \
+	git ls-files --cached >actual &&
+	test_cmp expect actual
+'
+
 test_done
-- 
2.33.0.358.g803110d36e


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v2 5/6] tests: disable GIT_TEST_SPLIT_INDEX for sparse index tests
  2021-08-26 20:59 ` [PATCH v2 0/6] Fix GIT_TEST_SPLIT_INDEX SZEDER Gábor
                     ` (3 preceding siblings ...)
  2021-08-26 21:00   ` [PATCH v2 4/6] read-cache: look for shared index files next to the index, too SZEDER Gábor
@ 2021-08-26 21:00   ` SZEDER Gábor
  2021-08-26 21:00   ` [PATCH v2 6/6] read-cache: fix GIT_TEST_SPLIT_INDEX SZEDER Gábor
  2021-08-31 14:47   ` [PATCH v2 0/6] Fix GIT_TEST_SPLIT_INDEX Derrick Stolee
  6 siblings, 0 replies; 23+ messages in thread
From: SZEDER Gábor @ 2021-08-26 21:00 UTC (permalink / raw)
  To: git
  Cc: Derrick Stolee, Thomas Gummerer,
	Nguyễn Thái Ngọc Duy, SZEDER Gábor

The sparse index and split index features are said to be currently
incompatible [1], and consequently GIT_TEST_SPLIT_INDEX=1 might
interfere with the test cases exercising the sparse index feature.
Therefore GIT_TEST_SPLIT_INDEX is already explicitly disabled for the
whole of 't1092-sparse-checkout-compatibility.sh'.  There are,
however, two other test cases exercising sparse index, namely
'sparse-index enabled and disabled' in
't1091-sparse-checkout-builtin.sh' and 'status succeeds with sparse
index' in 't7519-status-fsmonitor.sh', and these two could fail with
GIT_TEST_SPLIT_INDEX=1 as well [2].

Unset GIT_TEST_SPLIT_INDEX and disable the split index in these two
test cases to avoid such interference.

Note that this is the minimal change to merely avoid failures when
these test cases are run with GIT_TEST_SPLIT_INDEX=1.  Interestingly,
though, without these changes the 'git sparse-checkout init --cone
--sparse-index' commands still succeed even with split index, and set
all the necessary configuration variables and create the initial
'$GIT_DIR/info/sparse-checkout' file, but the test failures are caused
by later sanity checks finding that the index is not in fact a sparse
index.  This indicates that 'git sparse-checkout init --sparse-index'
lacks some error checking and its tests lack coverage related to split
index, but fixing those issues is beyond the scope of this patch
series.

[1] https://public-inbox.org/git/48e9c3d6-407a-1843-2d91-22112410e3f8@gmail.com/

[2] Neither of these test cases fail at the moment, because
    GIT_TEST_SPLIT_INDEX=1 is broken and never splits the index, and
    it broke long before the sparse index feature was added.
    This patch series is about to fix GIT_TEST_SPLIT_INDEX, and then
    both test cases mentioned above would fail.

(The diff is best viewed with '--ignore-all-space')

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---
 t/t1091-sparse-checkout-builtin.sh | 25 ++++++-----
 t/t7519-status-fsmonitor.sh        | 68 ++++++++++++++++--------------
 2 files changed, 51 insertions(+), 42 deletions(-)

diff --git a/t/t1091-sparse-checkout-builtin.sh b/t/t1091-sparse-checkout-builtin.sh
index 38fc8340f5..3f94c41241 100755
--- a/t/t1091-sparse-checkout-builtin.sh
+++ b/t/t1091-sparse-checkout-builtin.sh
@@ -206,16 +206,21 @@ test_expect_success 'sparse-checkout disable' '
 '
 
 test_expect_success 'sparse-index enabled and disabled' '
-	git -C repo sparse-checkout init --cone --sparse-index &&
-	test_cmp_config -C repo true index.sparse &&
-	test-tool -C repo read-cache --table >cache &&
-	grep " tree " cache &&
-
-	git -C repo sparse-checkout disable &&
-	test-tool -C repo read-cache --table >cache &&
-	! grep " tree " cache &&
-	git -C repo config --list >config &&
-	! grep index.sparse config
+	(
+		sane_unset GIT_TEST_SPLIT_INDEX &&
+		git -C repo update-index --no-split-index &&
+
+		git -C repo sparse-checkout init --cone --sparse-index &&
+		test_cmp_config -C repo true index.sparse &&
+		test-tool -C repo read-cache --table >cache &&
+		grep " tree " cache &&
+
+		git -C repo sparse-checkout disable &&
+		test-tool -C repo read-cache --table >cache &&
+		! grep " tree " cache &&
+		git -C repo config --list >config &&
+		! grep index.sparse config
+	)
 '
 
 test_expect_success 'cone mode: init and set' '
diff --git a/t/t7519-status-fsmonitor.sh b/t/t7519-status-fsmonitor.sh
index 6f2cf306f6..451734b9c2 100755
--- a/t/t7519-status-fsmonitor.sh
+++ b/t/t7519-status-fsmonitor.sh
@@ -399,41 +399,45 @@ check_sparse_index_behavior () {
 }
 
 test_expect_success 'status succeeds with sparse index' '
-	git clone . full &&
-	git clone . sparse &&
-	git -C sparse sparse-checkout init --cone --sparse-index &&
-	git -C sparse sparse-checkout set dir1 dir2 &&
+	(
+		sane_unset GIT_TEST_SPLIT_INDEX &&
 
-	write_script .git/hooks/fsmonitor-test <<-\EOF &&
-		printf "last_update_token\0"
-	EOF
-	git -C full config core.fsmonitor ../.git/hooks/fsmonitor-test &&
-	git -C sparse config core.fsmonitor ../.git/hooks/fsmonitor-test &&
-	check_sparse_index_behavior ! &&
+		git clone . full &&
+		git clone . sparse &&
+		git -C sparse sparse-checkout init --cone --sparse-index &&
+		git -C sparse sparse-checkout set dir1 dir2 &&
 
-	write_script .git/hooks/fsmonitor-test <<-\EOF &&
-		printf "last_update_token\0"
-		printf "dir1/modified\0"
-	EOF
-	check_sparse_index_behavior ! &&
-
-	git -C sparse sparse-checkout add dir1a &&
+		write_script .git/hooks/fsmonitor-test <<-\EOF &&
+			printf "last_update_token\0"
+		EOF
+		git -C full config core.fsmonitor ../.git/hooks/fsmonitor-test &&
+		git -C sparse config core.fsmonitor ../.git/hooks/fsmonitor-test &&
+		check_sparse_index_behavior ! &&
 
-	for repo in full sparse
-	do
-		cp -r $repo/dir1 $repo/dir1a &&
-		git -C $repo add dir1a &&
-		git -C $repo commit -m "add dir1a" || return 1
-	done &&
-	git -C sparse sparse-checkout set dir1 dir2 &&
-
-	# This one modifies outside the sparse-checkout definition
-	# and hence we expect to expand the sparse-index.
-	write_script .git/hooks/fsmonitor-test <<-\EOF &&
-		printf "last_update_token\0"
-		printf "dir1a/modified\0"
-	EOF
-	check_sparse_index_behavior
+		write_script .git/hooks/fsmonitor-test <<-\EOF &&
+			printf "last_update_token\0"
+			printf "dir1/modified\0"
+		EOF
+		check_sparse_index_behavior ! &&
+
+		git -C sparse sparse-checkout add dir1a &&
+
+		for repo in full sparse
+		do
+			cp -r $repo/dir1 $repo/dir1a &&
+			git -C $repo add dir1a &&
+			git -C $repo commit -m "add dir1a" || return 1
+		done &&
+		git -C sparse sparse-checkout set dir1 dir2 &&
+
+		# This one modifies outside the sparse-checkout definition
+		# and hence we expect to expand the sparse-index.
+		write_script .git/hooks/fsmonitor-test <<-\EOF &&
+			printf "last_update_token\0"
+			printf "dir1a/modified\0"
+		EOF
+		check_sparse_index_behavior
+	)
 '
 
 test_done
-- 
2.33.0.358.g803110d36e


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v2 6/6] read-cache: fix GIT_TEST_SPLIT_INDEX
  2021-08-26 20:59 ` [PATCH v2 0/6] Fix GIT_TEST_SPLIT_INDEX SZEDER Gábor
                     ` (4 preceding siblings ...)
  2021-08-26 21:00   ` [PATCH v2 5/6] tests: disable GIT_TEST_SPLIT_INDEX for sparse index tests SZEDER Gábor
@ 2021-08-26 21:00   ` SZEDER Gábor
  2021-08-31 14:47   ` [PATCH v2 0/6] Fix GIT_TEST_SPLIT_INDEX Derrick Stolee
  6 siblings, 0 replies; 23+ messages in thread
From: SZEDER Gábor @ 2021-08-26 21:00 UTC (permalink / raw)
  To: git
  Cc: Derrick Stolee, Thomas Gummerer,
	Nguyễn Thái Ngọc Duy, SZEDER Gábor

Running tests with GIT_TEST_SPLIT_INDEX=1 is supposed to turn on the
split index feature and trigger index splitting (mostly) randomly.
Alas, this has been broken since 6e37c8ed3c (read-cache.c: fix writing
"link" index ext with null base oid, 2019-02-13), and
GIT_TEST_SPLIT_INDEX=1 hasn't triggered any index splitting since
then.

This patch makes GIT_TEST_SPLIT_INDEX work again, though it doesn't
restore the pre-6e37c8ed3c behavior.  To understand the bug, the fix,
and the behavior change we first have to look at how
GIT_TEST_SPLIT_INDEX used to work before 6e37c8ed3c:

  There are two places where we check the value of
  GIT_TEST_SPLIT_INDEX, and before 6e37c8ed3c they worked like this:

    1) In the lower-level do_write_index(), where, if
       GIT_TEST_SPLIT_INDEX is enabled, we call init_split_index().
       This call merely allocates and zero-initializes
       'istate->split_index', but does nothing else (i.e. doesn't fill
       the base/shared index with cache entries, doesn't actually
       write a shared index file, etc.).  Pertinent to this issue, the
       hash of the base index remains all zeroed out.

    2) In the higher-level write_locked_index(), but only when
       'istate->split_index' has already been initialized.  Then, if
       GIT_TEST_SPLIT_INDEX is enabled, it randomly sets the flag that
       triggers index splitting later in this function.  This
       randomness comes from the first byte of the hash of the base
       index via an 'if ((first_byte & 15) < 6)' condition.

       However, if 'istate->split_index' hasn't been initialized (i.e.
       it is still NULL), then write_locked_index() just calls
       do_write_locked_index(), which internally calls the above
       mentioned do_write_index().

  This means that while GIT_TEST_SPLIT_INDEX=1 usually triggered index
  splitting randomly, the first two index writes were always
  deterministic (though I suspect this was unintentional):

    - The initial index write never splits the index.
      During the first index write write_locked_index() is called with
      'istate->split_index' still uninitialized, so the check in 2) is
      not executed.  It still calls do_write_index(), though, which
      then executes the check in 1).  The resulting all zero base
      index hash then leads to the 'link' extension being written to
      '.git/index', though a shared index file is not written:

        $ rm .git/index
        $ GIT_TEST_SPLIT_INDEX=1 git update-index --add file
        $ test-tool dump-split-index .git/index
        own c6ef71168597caec8553c83d9d0048f1ef416170
        base 0000000000000000000000000000000000000000
        100644 d00491fd7e5bb6fa28c517a0bb32b8b506539d4d 0 file
        replacements:
        deletions:
        $ ls -l .git/sharedindex.*
        ls: cannot access '.git/sharedindex.*': No such file or directory

    - The second index write always splits the index.
      When the index written in the previous point is read,
      'istate->split_index' is initialized because of the presence of
      the 'link' extension.  So during the second write
      write_locked_index() does run the check in 2), and the first
      byte of the all zero base index hash always fulfills the
      randomness condition, which in turn always triggers the index
      splitting.

    - Subsequent index writes will find the 'link' extension with a
      real non-zero base index hash, so from then on the check in 2)
      is executed and the first byte of the base index hash is as
      random as it gets (coming from the SHA-1 of index data including
      timestamps and inodes...).

All this worked until 6e37c8ed3c came along, and stopped writing the
'link' extension if the hash of the base index was all zero:

  $ rm .git/index
  $ GIT_TEST_SPLIT_INDEX=1 git update-index --add file
  $ test-tool dump-split-index .git/index
  own abbd6f6458d5dee73ae8e210ca15a68a390c6fd7
  not a split index
  $ ls -l .git/sharedindex.*
  ls: cannot access '.git/sharedindex.*': No such file or directory

So, since the first index write with GIT_TEST_SPLIT_INDEX=1 doesn't
write a 'link' extension, in the second index write
'istate->split_index' remains uninitialized, and the check in 2) is
not executed, and ultimately the index is never split.

Fix this by modifying write_locked_index() to make sure to check
GIT_TEST_SPLIT_INDEX even if 'istate->split_index' is still
uninitialized, and initialize it if necessary.  The check for
GIT_TEST_SPLIT_INDEX and separate init_split_index() call in
do_write_index() thus becomes unnecessary, so remove it.  Furthermore,
add a test to 't1700-split-index.sh' to make sure that
GIT_TEST_SPLIT_INDEX=1 will keep working (though only check the
index splitting on the first index write, because after that it will
be random).

Note that this change does not restore the pre-6e37c8ed3c behaviour,
as it will deterministically split the index already on the first
index write.  Since GIT_TEST_SPLIT_INDEX is purely a developer aid,
there is no backwards compatibility issue here.  The new behaviour did
trigger test failures in 't0003-attributes.sh' and 't1600-index.sh',
though, which have been fixed in preparatory patches in this series.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---
 read-cache.c           | 23 ++++++++++++++---------
 t/t1700-split-index.sh | 11 +++++++++++
 2 files changed, 25 insertions(+), 9 deletions(-)

diff --git a/read-cache.c b/read-cache.c
index fbd59886a3..335242c1cf 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -2824,11 +2824,8 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
 		}
 	}
 
-	if (!istate->version) {
+	if (!istate->version)
 		istate->version = get_index_format_default(the_repository);
-		if (git_env_bool("GIT_TEST_SPLIT_INDEX", 0))
-			init_split_index(istate);
-	}
 
 	/* demote version 3 to version 2 when the latter suffices */
 	if (istate->version == 3 || istate->version == 2)
@@ -3255,7 +3252,7 @@ static int too_many_not_shared_entries(struct index_state *istate)
 int write_locked_index(struct index_state *istate, struct lock_file *lock,
 		       unsigned flags)
 {
-	int new_shared_index, ret;
+	int new_shared_index, ret, test_split_index_env;
 	struct split_index *si = istate->split_index;
 
 	if (git_env_bool("GIT_TEST_CHECK_CACHE_TREE", 0))
@@ -3270,7 +3267,10 @@ int write_locked_index(struct index_state *istate, struct lock_file *lock,
 	if (istate->fsmonitor_last_update)
 		fill_fsmonitor_bitmap(istate);
 
-	if (!si || alternate_index_output ||
+	test_split_index_env = git_env_bool("GIT_TEST_SPLIT_INDEX", 0);
+
+	if ((!si && !test_split_index_env) ||
+	    alternate_index_output ||
 	    (istate->cache_changed & ~EXTMASK)) {
 		if (si)
 			oidclr(&si->base_oid);
@@ -3278,10 +3278,15 @@ int write_locked_index(struct index_state *istate, struct lock_file *lock,
 		goto out;
 	}
 
-	if (git_env_bool("GIT_TEST_SPLIT_INDEX", 0)) {
-		int v = si->base_oid.hash[0];
-		if ((v & 15) < 6)
+	if (test_split_index_env) {
+		if (!si) {
+			si = init_split_index(istate);
 			istate->cache_changed |= SPLIT_INDEX_ORDERED;
+		} else {
+			int v = si->base_oid.hash[0];
+			if ((v & 15) < 6)
+				istate->cache_changed |= SPLIT_INDEX_ORDERED;
+		}
 	}
 	if (too_many_not_shared_entries(istate))
 		istate->cache_changed |= SPLIT_INDEX_ORDERED;
diff --git a/t/t1700-split-index.sh b/t/t1700-split-index.sh
index e2aa0bd949..decd2527ed 100755
--- a/t/t1700-split-index.sh
+++ b/t/t1700-split-index.sh
@@ -533,4 +533,15 @@ test_expect_success 'reading split index at alternate location' '
 	test_cmp expect actual
 '
 
+test_expect_success 'GIT_TEST_SPLIT_INDEX works' '
+	git init git-test-split-index &&
+	(
+		cd git-test-split-index &&
+		>file &&
+		GIT_TEST_SPLIT_INDEX=1 git update-index --add file &&
+		ls -l .git/sharedindex.* >actual &&
+		test_line_count = 1 actual
+	)
+'
+
 test_done
-- 
2.33.0.358.g803110d36e


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 0/6] Fix GIT_TEST_SPLIT_INDEX
  2021-08-26 20:59 ` [PATCH v2 0/6] Fix GIT_TEST_SPLIT_INDEX SZEDER Gábor
                     ` (5 preceding siblings ...)
  2021-08-26 21:00   ` [PATCH v2 6/6] read-cache: fix GIT_TEST_SPLIT_INDEX SZEDER Gábor
@ 2021-08-31 14:47   ` Derrick Stolee
  2021-08-31 17:38     ` Junio C Hamano
  6 siblings, 1 reply; 23+ messages in thread
From: Derrick Stolee @ 2021-08-31 14:47 UTC (permalink / raw)
  To: SZEDER Gábor, git
  Cc: Thomas Gummerer, Nguyễn Thái Ngọc Duy

On 8/26/2021 4:59 PM, SZEDER Gábor wrote:
> To recap from v1's cover letter:
> 
> Running tests with GIT_TEST_SPLIT_INDEX=1 is supposed to turn on the
> split index feature and trigger index splitting (mostly) randomly.
> Alas, this has been broken for ~2.5 years, and it hasn't triggered any
> index splitting since then.
> 
> The last patch in this series makes GIT_TEST_SPLIT_INDEX=1 work again,
> although it slightly changes its behavior; see its log message for all
> the details.

I checked the range-diff and reread the patches. This version looks
good to me. Thanks for pointing out the new failures that were
happening in my patches. I think I fixed them in my latest versions
of the sparse-index work, so the latest 'seen' should pass with these
changes to GIT_TEST_SPLIT_INDEX.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 0/6] Fix GIT_TEST_SPLIT_INDEX
  2021-08-31 14:47   ` [PATCH v2 0/6] Fix GIT_TEST_SPLIT_INDEX Derrick Stolee
@ 2021-08-31 17:38     ` Junio C Hamano
  0 siblings, 0 replies; 23+ messages in thread
From: Junio C Hamano @ 2021-08-31 17:38 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: SZEDER Gábor, git, Thomas Gummerer,
	Nguyễn Thái Ngọc Duy

Derrick Stolee <stolee@gmail.com> writes:

> On 8/26/2021 4:59 PM, SZEDER Gábor wrote:
>> To recap from v1's cover letter:
>> 
>> Running tests with GIT_TEST_SPLIT_INDEX=1 is supposed to turn on the
>> split index feature and trigger index splitting (mostly) randomly.
>> Alas, this has been broken for ~2.5 years, and it hasn't triggered any
>> index splitting since then.
>> 
>> The last patch in this series makes GIT_TEST_SPLIT_INDEX=1 work again,
>> although it slightly changes its behavior; see its log message for all
>> the details.
>
> I checked the range-diff and reread the patches. This version looks
> good to me. Thanks for pointing out the new failures that were
> happening in my patches. I think I fixed them in my latest versions
> of the sparse-index work, so the latest 'seen' should pass with these
> changes to GIT_TEST_SPLIT_INDEX.
>
> Thanks,
> -Stolee


This, at least the t7519-status-fsmonitor.sh part, seems to depend
on the 'ds/sparse-index-ignored-files' topic, so I'll prepare a
merge of the topic on top of 'master' then queue these with your
Acked-by.

Thanks, both.

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2021-08-31 17:39 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-17 17:49 [PATCH 0/6] Fix GIT_TEST_SPLIT_INDEX SZEDER Gábor
2021-08-17 17:49 ` [PATCH 1/6] t1600-index: remove unnecessary redirection SZEDER Gábor
2021-08-17 18:12   ` Derrick Stolee
2021-08-17 18:39     ` SZEDER Gábor
2021-08-17 18:48       ` Derrick Stolee
2021-08-17 17:49 ` [PATCH 2/6] t1600-index: don't run git commands upstream of a pipe SZEDER Gábor
2021-08-17 17:49 ` [PATCH 3/6] t1600-index: disable GIT_TEST_SPLIT_INDEX SZEDER Gábor
2021-08-17 17:49 ` [PATCH 4/6] read-cache: look for shared index files next to the index, too SZEDER Gábor
2021-08-17 17:49 ` [PATCH 5/6] tests: disable GIT_TEST_SPLIT_INDEX for sparse index tests SZEDER Gábor
2021-08-17 18:26   ` Derrick Stolee
2021-08-17 21:32     ` SZEDER Gábor
2021-08-18 18:51       ` Derrick Stolee
2021-08-17 17:49 ` [PATCH 6/6] read-cache: fix GIT_TEST_SPLIT_INDEX SZEDER Gábor
2021-08-17 18:29   ` Derrick Stolee
2021-08-26 20:59 ` [PATCH v2 0/6] Fix GIT_TEST_SPLIT_INDEX SZEDER Gábor
2021-08-26 20:59   ` [PATCH v2 1/6] t1600-index: remove unnecessary redirection SZEDER Gábor
2021-08-26 21:00   ` [PATCH v2 2/6] t1600-index: don't run git commands upstream of a pipe SZEDER Gábor
2021-08-26 21:00   ` [PATCH v2 3/6] t1600-index: disable GIT_TEST_SPLIT_INDEX SZEDER Gábor
2021-08-26 21:00   ` [PATCH v2 4/6] read-cache: look for shared index files next to the index, too SZEDER Gábor
2021-08-26 21:00   ` [PATCH v2 5/6] tests: disable GIT_TEST_SPLIT_INDEX for sparse index tests SZEDER Gábor
2021-08-26 21:00   ` [PATCH v2 6/6] read-cache: fix GIT_TEST_SPLIT_INDEX SZEDER Gábor
2021-08-31 14:47   ` [PATCH v2 0/6] Fix GIT_TEST_SPLIT_INDEX Derrick Stolee
2021-08-31 17:38     ` Junio C Hamano

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).