git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/5] Sparse Index: Integrate with 'git add'
@ 2021-07-21 21:06 Derrick Stolee via GitGitGadget
  2021-07-21 21:06 ` [PATCH 1/5] t1092: test merge conflicts outside cone Derrick Stolee via GitGitGadget
                   ` (6 more replies)
  0 siblings, 7 replies; 36+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-07-21 21:06 UTC (permalink / raw)
  To: git; +Cc: gitster, newren, matheus.bernardino, stolee, Derrick Stolee

This patch series re-submits the 'git add' integration with sparse-index.
The performance gains are the same as before.

It is based on ds/commit-and-checkout-with-sparse-index.

This series was delayed from its initial submission for a couple reasons.

The first was because it was colliding with some changes in
mt/add-rm-in-sparse-checkout, so now we are far enough along that that
branch is in our history and we can work forwards.

The other concern was about how 'git add ' should respond when a path
outside of the sparse-checkout cone exists. One recommendation (that I am
failing to find a link to the message, sorry) was to disallow adding files
that would become index entries with SKIP_WORKTREE on. However, as I worked
towards that goal I found that change would cause problems for a realistic
scenario: merge conflicts outside of the sparse-checkout cone.

The first patch of this series adds tests that create merge conflicts
outside of the sparse cone and then presents different ways a user could
resolve the situation. We want all of them to be feasible, and this
includes:

 1. Reverting the file to a known version in history.
 2. Adding the file with its contents on disk.
 3. Moving the file to a new location in the sparse directory.

Without maintaining the behavior of adding files outside of the
sparse-checkout cone, we risk confusing users who get into this state. The
only workaround they would have is to modify their sparse-checkout
definition which might lead to expanding much more data than they need to
resolve the conflicts.

For that reason, I stopped trying to limit 'git add' to be within the cone.
I'm open to suggestions here, but we need an approach that works for
out-of-cone conflicts.

The one place I did continue to update is 'git add --refresh ' to match the
behavior added by mt/add-rm-in-sparse-checkout which outputs an error
message. This happens even when the file exists in the working directory,
but that seems appropriate enough.

Thanks, -Stolee

Derrick Stolee (5):
  t1092: test merge conflicts outside cone
  add: allow operating on a sparse-only index
  pathspec: stop calling ensure_full_index
  t1092: 'git add --refresh' difference with sparse-index
  add: ignore outside the sparse-checkout in refresh()

 builtin/add.c                            | 13 ++++-
 pathspec.c                               |  2 -
 t/t1092-sparse-checkout-compatibility.sh | 71 ++++++++++++++++++++----
 3 files changed, 72 insertions(+), 14 deletions(-)


base-commit: 71e301501c88399711a1bf8515d1747e92cfbb9b
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-999%2Fderrickstolee%2Fsparse-index%2Fadd-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-999/derrickstolee/sparse-index/add-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/999
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH 1/5] t1092: test merge conflicts outside cone
  2021-07-21 21:06 [PATCH 0/5] Sparse Index: Integrate with 'git add' Derrick Stolee via GitGitGadget
@ 2021-07-21 21:06 ` Derrick Stolee via GitGitGadget
  2021-07-23 17:34   ` Elijah Newren
  2021-07-21 21:06 ` [PATCH 2/5] add: allow operating on a sparse-only index Derrick Stolee via GitGitGadget
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 36+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-07-21 21:06 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, matheus.bernardino, stolee, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t1092-sparse-checkout-compatibility.sh | 37 ++++++++++++++++++++++++
 1 file changed, 37 insertions(+)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 91e30d6ec22..a3c01d588d8 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -114,6 +114,16 @@ test_expect_success 'setup' '
 		git add . &&
 		git commit -m "file to dir" &&
 
+		for side in left right
+		do
+			git checkout -b merge-$side base &&
+			echo $side >>deep/deeper2/a &&
+			echo $side >>folder1/a &&
+			echo $side >>folder2/a &&
+			git add . &&
+			git commit -m "$side" || return 1
+		done &&
+
 		git checkout -b deepest base &&
 		echo "updated deepest" >deep/deeper1/deepest/a &&
 		git commit -a -m "update deepest" &&
@@ -482,6 +492,33 @@ test_expect_success 'merge' '
 	test_all_match git rev-parse HEAD^{tree}
 '
 
+test_expect_success 'merge with conflict outside cone' '
+	init_repos &&
+
+	test_all_match git checkout -b merge-tip merge-left &&
+	test_all_match git status --porcelain=v2 &&
+	test_all_match test_must_fail git merge -m merge merge-right &&
+	test_all_match git status --porcelain=v2 &&
+
+	# resolve the conflict in different ways:
+	# 1. revert to the base
+	test_all_match git checkout base -- deep/deeper2/a &&
+	test_all_match git status --porcelain=v2 &&
+
+	# 2. add the file with conflict markers
+	test_all_match git add folder1/a &&
+	test_all_match git status --porcelain=v2 &&
+
+	# 3. rename the file to another sparse filename
+	run_on_all mv folder2/a folder2/z &&
+	test_all_match git add folder2 &&
+	test_all_match git status --porcelain=v2 &&
+
+	test_all_match git merge --continue &&
+	test_all_match git status --porcelain=v2 &&
+	test_all_match git rev-parse HEAD^{tree}
+'
+
 test_expect_success 'merge with outside renames' '
 	init_repos &&
 
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH 2/5] add: allow operating on a sparse-only index
  2021-07-21 21:06 [PATCH 0/5] Sparse Index: Integrate with 'git add' Derrick Stolee via GitGitGadget
  2021-07-21 21:06 ` [PATCH 1/5] t1092: test merge conflicts outside cone Derrick Stolee via GitGitGadget
@ 2021-07-21 21:06 ` Derrick Stolee via GitGitGadget
  2021-07-21 22:19   ` Junio C Hamano
  2021-07-23 17:45   ` Elijah Newren
  2021-07-21 21:06 ` [PATCH 3/5] pathspec: stop calling ensure_full_index Derrick Stolee via GitGitGadget
                   ` (4 subsequent siblings)
  6 siblings, 2 replies; 36+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-07-21 21:06 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, matheus.bernardino, stolee, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Disable command_requires_full_index for 'git add'. This does not require
any additional removals of ensure_full_index(). The main reason is that
'git add' discovers changes based on the pathspec and the worktree
itself. These are then inserted into the index directly, and calls to
index_name_pos() or index_file_exists() already call expand_to_path() at
the appropriate time to support a sparse-index.

Add a test to check that 'git add -A' and 'git add <file>' does not
expand the index at all, as long as <file> is not within a sparse
directory. This does not help the global 'git add .' case.

We can measure the improvement using p2000-sparse-operations.sh with
these results:

Test                                  HEAD~1           HEAD
------------------------------------------------------------------------------
2000.6: git add -A (full-index-v3)    0.35(0.30+0.05)  0.37(0.29+0.06) +5.7%
2000.7: git add -A (full-index-v4)    0.31(0.26+0.06)  0.33(0.27+0.06) +6.5%
2000.8: git add -A (sparse-index-v3)  0.57(0.53+0.07)  0.05(0.04+0.08) -91.2%
2000.9: git add -A (sparse-index-v4)  0.58(0.55+0.06)  0.05(0.05+0.06) -91.4%

While the 91% improvement seems impressive, it's important to recognize
that previously we had significant overhead for expanding the
sparse-index. Comparing to the full index case, 'git add -A' goes from
0.37s to 0.05s, which is "only" an 86% improvement.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/add.c                            |  3 +++
 t/t1092-sparse-checkout-compatibility.sh | 14 ++++++--------
 2 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/builtin/add.c b/builtin/add.c
index b773b5a4993..c76e6ddd359 100644
--- a/builtin/add.c
+++ b/builtin/add.c
@@ -528,6 +528,9 @@ int cmd_add(int argc, const char **argv, const char *prefix)
 	add_new_files = !take_worktree_changes && !refresh_only && !add_renormalize;
 	require_pathspec = !(take_worktree_changes || (0 < addremove_explicit));
 
+	prepare_repo_settings(the_repository);
+	the_repository->settings.command_requires_full_index = 0;
+
 	hold_locked_index(&lock_file, LOCK_DIE_ON_ERROR);
 
 	/*
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index a3c01d588d8..a11d9d7f35d 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -340,13 +340,6 @@ test_expect_success 'status/add: outside sparse cone' '
 
 	test_sparse_match git status --porcelain=v2 &&
 
-	# This "git add folder1/a" fails with a warning
-	# in the sparse repos, differing from the full
-	# repo. This is intentional.
-	test_sparse_match test_must_fail git add folder1/a &&
-	test_sparse_match test_must_fail git add --refresh folder1/a &&
-	test_all_match git status --porcelain=v2 &&
-
 	test_all_match git add . &&
 	test_all_match git status --porcelain=v2 &&
 	test_all_match git commit -m folder1/new &&
@@ -635,7 +628,12 @@ test_expect_success 'sparse-index is not expanded' '
 	git -C sparse-index reset --hard &&
 	ensure_not_expanded checkout rename-out-to-out -- deep/deeper1 &&
 	git -C sparse-index reset --hard &&
-	ensure_not_expanded restore -s rename-out-to-out -- deep/deeper1
+	ensure_not_expanded restore -s rename-out-to-out -- deep/deeper1 &&
+
+	echo >>sparse-index/README.md &&
+	ensure_not_expanded add -A &&
+	echo >>sparse-index/extra.txt &&
+	ensure_not_expanded add extra.txt
 '
 
 # NEEDSWORK: a sparse-checkout behaves differently from a full checkout
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH 3/5] pathspec: stop calling ensure_full_index
  2021-07-21 21:06 [PATCH 0/5] Sparse Index: Integrate with 'git add' Derrick Stolee via GitGitGadget
  2021-07-21 21:06 ` [PATCH 1/5] t1092: test merge conflicts outside cone Derrick Stolee via GitGitGadget
  2021-07-21 21:06 ` [PATCH 2/5] add: allow operating on a sparse-only index Derrick Stolee via GitGitGadget
@ 2021-07-21 21:06 ` Derrick Stolee via GitGitGadget
  2021-07-23 18:17   ` Elijah Newren
  2021-07-21 21:06 ` [PATCH 4/5] t1092: 'git add --refresh' difference with sparse-index Derrick Stolee via GitGitGadget
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 36+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-07-21 21:06 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, matheus.bernardino, stolee, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The add_pathspec_matches_against_index() focuses on matching a pathspec
to file entries in the index. This already works correctly for its only
use: checking if untracked files exist in the index.

The compatibility checks in t1092 already test that 'git add <dir>'
works for a directory outside of the sparse cone. That provides coverage
for removing this guard.

This finalizes our ability to run 'git add .' without expanding a sparse
index to a full one. This is evidenced by an update to t1092 and by
these performance numbers for p2000-sparse-operations.sh:

Test                                    HEAD~1            HEAD
--------------------------------------------------------------------------------
2000.10: git add . (full-index-v3)      0.37(0.28+0.07)   0.36(0.27+0.06) -2.7%
2000.11: git add . (full-index-v4)      0.33(0.26+0.06)   0.32(0.28+0.05) -3.0%
2000.12: git add . (sparse-index-v3)    0.57(0.53+0.07)   0.06(0.06+0.07) -89.5%
2000.13: git add . (sparse-index-v4)    0.57(0.53+0.07)   0.05(0.03+0.09) -91.2%

While the ~90% improvement is shown by the test results, it is worth
noting that expanding the sparse index was adding overhead in previous
commits. Comparing to the full index case, we see the performance go
from 0.33s to 0.05s, an 85% improvement.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 pathspec.c                               | 2 --
 t/t1092-sparse-checkout-compatibility.sh | 7 +++----
 2 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/pathspec.c b/pathspec.c
index 08f8d3eedc3..44306fdaca2 100644
--- a/pathspec.c
+++ b/pathspec.c
@@ -37,8 +37,6 @@ void add_pathspec_matches_against_index(const struct pathspec *pathspec,
 			num_unmatched++;
 	if (!num_unmatched)
 		return;
-	/* TODO: audit for interaction with sparse-index. */
-	ensure_full_index(istate);
 	for (i = 0; i < istate->cache_nr; i++) {
 		const struct cache_entry *ce = istate->cache[i];
 		if (sw_action == PS_IGNORE_SKIP_WORKTREE && ce_skip_worktree(ce))
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index a11d9d7f35d..f9e2f5f4aa1 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -322,9 +322,6 @@ test_expect_success 'commit including unstaged changes' '
 test_expect_success 'status/add: outside sparse cone' '
 	init_repos &&
 
-	# adding a "missing" file outside the cone should fail
-	test_sparse_match test_must_fail git add folder1/a &&
-
 	# folder1 is at HEAD, but outside the sparse cone
 	run_on_sparse mkdir folder1 &&
 	cp initial-repo/folder1/a sparse-checkout/folder1/a &&
@@ -633,7 +630,9 @@ test_expect_success 'sparse-index is not expanded' '
 	echo >>sparse-index/README.md &&
 	ensure_not_expanded add -A &&
 	echo >>sparse-index/extra.txt &&
-	ensure_not_expanded add extra.txt
+	ensure_not_expanded add extra.txt &&
+	echo >>sparse-index/untracked.txt &&
+	ensure_not_expanded add .
 '
 
 # NEEDSWORK: a sparse-checkout behaves differently from a full checkout
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH 4/5] t1092: 'git add --refresh' difference with sparse-index
  2021-07-21 21:06 [PATCH 0/5] Sparse Index: Integrate with 'git add' Derrick Stolee via GitGitGadget
                   ` (2 preceding siblings ...)
  2021-07-21 21:06 ` [PATCH 3/5] pathspec: stop calling ensure_full_index Derrick Stolee via GitGitGadget
@ 2021-07-21 21:06 ` Derrick Stolee via GitGitGadget
  2021-07-21 21:06 ` [PATCH 5/5] add: ignore outside the sparse-checkout in refresh() Derrick Stolee via GitGitGadget
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 36+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-07-21 21:06 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, matheus.bernardino, stolee, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

When running 'git add --refresh <path>', Git does not actually stage the
change, but instead updates the index based on the stat() information in
the working directory.

This typically does not make sense in a sparse-checkout scenario, where
Git wants this file to not exist. However, sometimes the file can exist
on-disk for other reasons, such as a user manually adding the file or a
merge conflict outside of the sparse cone.

Since b243012 (refresh_index(): add flag to ignore SKIP_WORKTREE
entries, 2021-04-08), 'git add --refresh <path>' will output an advice
message to indicate that this is not allowed when <path> is outside the
sparse cone. The check goes around the sparse index protections at the
moment, so it does not find a match when contained in a sparse directory
entry. We will update this behavior with a later change, but want to
demonstrate the failure now.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t1092-sparse-checkout-compatibility.sh | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index f9e2f5f4aa1..73c48a71d89 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -347,6 +347,25 @@ test_expect_success 'status/add: outside sparse cone' '
 	test_all_match git commit -m folder1/newer
 '
 
+test_expect_failure 'add: pathspec within sparse directory' '
+	init_repos &&
+
+	run_on_sparse mkdir folder1 &&
+	run_on_sparse ../edit-contents folder1/a &&
+	run_on_all ../edit-contents folder1/new &&
+
+	# This "git add folder1/a" fails with a warning
+	# in the sparse repos, differing from the full
+	# repo. This is intentional.
+	#
+	# However, in the sparse-index, folder1/a does not
+	# match any cache entry and fails with a different
+	# error message. This needs work.
+	test_sparse_match test_must_fail git add folder1/a &&
+	test_sparse_match test_must_fail git add --refresh folder1/a &&
+	test_all_match git status --porcelain=v2
+'
+
 test_expect_success 'checkout and reset --hard' '
 	init_repos &&
 
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH 5/5] add: ignore outside the sparse-checkout in refresh()
  2021-07-21 21:06 [PATCH 0/5] Sparse Index: Integrate with 'git add' Derrick Stolee via GitGitGadget
                   ` (3 preceding siblings ...)
  2021-07-21 21:06 ` [PATCH 4/5] t1092: 'git add --refresh' difference with sparse-index Derrick Stolee via GitGitGadget
@ 2021-07-21 21:06 ` Derrick Stolee via GitGitGadget
  2021-07-23 19:46   ` Elijah Newren
  2021-07-23 12:51 ` [PATCH 0/5] Sparse Index: Integrate with 'git add' Elijah Newren
  2021-07-26 15:18 ` [PATCH v2 " Derrick Stolee via GitGitGadget
  6 siblings, 1 reply; 36+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-07-21 21:06 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, matheus.bernardino, stolee, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Since b243012 (refresh_index(): add flag to ignore SKIP_WORKTREE
entries, 2021-04-08), 'git add --refresh <path>' will output a warning
message when the path is outside the sparse-checkout definition. The
implementation of this warning happened in parallel with the
sparse-index work to add ensure_full_index() calls throughout the
codebase.

Update this loop to have the proper logic that checks to see if the
pathspec is outside the sparse-checkout definition. This avoids the need
to expand the sparse directory entry and determine if the path is
tracked, untracked, or ignored. We simply avoid updating the stat()
information because there isn't even an entry that matches the path!

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/add.c                            | 10 +++++++++-
 t/t1092-sparse-checkout-compatibility.sh |  6 +-----
 2 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/builtin/add.c b/builtin/add.c
index c76e6ddd359..d512ece655b 100644
--- a/builtin/add.c
+++ b/builtin/add.c
@@ -192,13 +192,21 @@ static int refresh(int verbose, const struct pathspec *pathspec)
 	struct string_list only_match_skip_worktree = STRING_LIST_INIT_NODUP;
 	int flags = REFRESH_IGNORE_SKIP_WORKTREE |
 		    (verbose ? REFRESH_IN_PORCELAIN : REFRESH_QUIET);
+	struct pattern_list pl = { 0 };
+	int sparse_checkout_enabled = !get_sparse_checkout_patterns(&pl);
 
 	seen = xcalloc(pathspec->nr, 1);
 	refresh_index(&the_index, flags, pathspec, seen,
 		      _("Unstaged changes after refreshing the index:"));
 	for (i = 0; i < pathspec->nr; i++) {
 		if (!seen[i]) {
-			if (matches_skip_worktree(pathspec, i, &skip_worktree_seen)) {
+			const char *path = pathspec->items[i].original;
+			int dtype = DT_REG;
+
+			if (matches_skip_worktree(pathspec, i, &skip_worktree_seen) ||
+			    (sparse_checkout_enabled &&
+			     !path_matches_pattern_list(path, strlen(path), NULL,
+							&dtype, &pl, &the_index))) {
 				string_list_append(&only_match_skip_worktree,
 						   pathspec->items[i].original);
 			} else {
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 73c48a71d89..c61424e2074 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -347,7 +347,7 @@ test_expect_success 'status/add: outside sparse cone' '
 	test_all_match git commit -m folder1/newer
 '
 
-test_expect_failure 'add: pathspec within sparse directory' '
+test_expect_success 'add: pathspec within sparse directory' '
 	init_repos &&
 
 	run_on_sparse mkdir folder1 &&
@@ -357,10 +357,6 @@ test_expect_failure 'add: pathspec within sparse directory' '
 	# This "git add folder1/a" fails with a warning
 	# in the sparse repos, differing from the full
 	# repo. This is intentional.
-	#
-	# However, in the sparse-index, folder1/a does not
-	# match any cache entry and fails with a different
-	# error message. This needs work.
 	test_sparse_match test_must_fail git add folder1/a &&
 	test_sparse_match test_must_fail git add --refresh folder1/a &&
 	test_all_match git status --porcelain=v2
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 2/5] add: allow operating on a sparse-only index
  2021-07-21 21:06 ` [PATCH 2/5] add: allow operating on a sparse-only index Derrick Stolee via GitGitGadget
@ 2021-07-21 22:19   ` Junio C Hamano
  2021-07-21 22:50     ` Derrick Stolee
  2021-07-23 17:45   ` Elijah Newren
  1 sibling, 1 reply; 36+ messages in thread
From: Junio C Hamano @ 2021-07-21 22:19 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, newren, matheus.bernardino, stolee, Derrick Stolee, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Derrick Stolee <dstolee@microsoft.com>
>
> Disable command_requires_full_index for 'git add'. This does not require
> any additional removals of ensure_full_index(). The main reason is that
> 'git add' discovers changes based on the pathspec and the worktree
> itself. These are then inserted into the index directly, and calls to
> index_name_pos() or index_file_exists() already call expand_to_path() at
> the appropriate time to support a sparse-index.

OK.  With that explained, it still is quite surprising that we only
need this change (eh, rather, doing this change is safe without
doing anything else).

> -	# This "git add folder1/a" fails with a warning
> -	# in the sparse repos, differing from the full
> -	# repo. This is intentional.
> -	test_sparse_match test_must_fail git add folder1/a &&
> -	test_sparse_match test_must_fail git add --refresh folder1/a &&
> -	test_all_match git status --porcelain=v2 &&

And nice to see a known limitation lifted.

>  	test_all_match git add . &&
>  	test_all_match git status --porcelain=v2 &&
>  	test_all_match git commit -m folder1/new &&
> @@ -635,7 +628,12 @@ test_expect_success 'sparse-index is not expanded' '
>  	git -C sparse-index reset --hard &&
>  	ensure_not_expanded checkout rename-out-to-out -- deep/deeper1 &&
>  	git -C sparse-index reset --hard &&
> -	ensure_not_expanded restore -s rename-out-to-out -- deep/deeper1
> +	ensure_not_expanded restore -s rename-out-to-out -- deep/deeper1 &&
> +
> +	echo >>sparse-index/README.md &&
> +	ensure_not_expanded add -A &&
> +	echo >>sparse-index/extra.txt &&
> +	ensure_not_expanded add extra.txt
>  '
>  
>  # NEEDSWORK: a sparse-checkout behaves differently from a full checkout

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 2/5] add: allow operating on a sparse-only index
  2021-07-21 22:19   ` Junio C Hamano
@ 2021-07-21 22:50     ` Derrick Stolee
  0 siblings, 0 replies; 36+ messages in thread
From: Derrick Stolee @ 2021-07-21 22:50 UTC (permalink / raw)
  To: Junio C Hamano, Derrick Stolee via GitGitGadget
  Cc: git, newren, matheus.bernardino, Derrick Stolee, Derrick Stolee

On 7/21/21 6:19 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> From: Derrick Stolee <dstolee@microsoft.com>
>>
>> Disable command_requires_full_index for 'git add'. This does not require
>> any additional removals of ensure_full_index(). The main reason is that
>> 'git add' discovers changes based on the pathspec and the worktree
>> itself. These are then inserted into the index directly, and calls to
>> index_name_pos() or index_file_exists() already call expand_to_path() at
>> the appropriate time to support a sparse-index.
> 
> OK.  With that explained, it still is quite surprising that we only
> need this change (eh, rather, doing this change is safe without
> doing anything else).

Yes, all of the hard work was done by the earlier work to expand
a sparse index when we search for a specific path that lands
within a sparse directory. See 95e0321 (read-cache: expand on query
into sparse-directory entry, 2021-04-01) for the specifics.

>> -	# This "git add folder1/a" fails with a warning
>> -	# in the sparse repos, differing from the full
>> -	# repo. This is intentional.
>> -	test_sparse_match test_must_fail git add folder1/a &&
>> -	test_sparse_match test_must_fail git add --refresh folder1/a &&
>> -	test_all_match git status --porcelain=v2 &&
> 
> And nice to see a known limitation lifted.

Thank you for pointing this out. This actually starts to _fail_ now
that we allow sparse indexes in 'git add', but it's because the error
messages don't match, not that the 'test_must_fail' is violated.

Patch 4 adds a similar test that is then set to work in patch 5. That
allows us a clear way to describe the behavior change and to motivate
the fix in patch 5. This could be explained better, perhaps by merging
Patch 4 into this one. That helps describe how this specific case
changes behavior (for the worse) in this patch, but is handled in a
careful way later, once the behavior change is documented.

If there is a better way to reorganize these patches, then I could
try another approach.

Thanks,
-Stolee


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 0/5] Sparse Index: Integrate with 'git add'
  2021-07-21 21:06 [PATCH 0/5] Sparse Index: Integrate with 'git add' Derrick Stolee via GitGitGadget
                   ` (4 preceding siblings ...)
  2021-07-21 21:06 ` [PATCH 5/5] add: ignore outside the sparse-checkout in refresh() Derrick Stolee via GitGitGadget
@ 2021-07-23 12:51 ` Elijah Newren
  2021-07-23 20:10   ` Elijah Newren
  2021-07-26 15:18 ` [PATCH v2 " Derrick Stolee via GitGitGadget
  6 siblings, 1 reply; 36+ messages in thread
From: Elijah Newren @ 2021-07-23 12:51 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee

Hi,

I haven't read the series yet (I'm going to try to read them all
today), but wanted to comment on a couple things in your cover
letter...

On Wed, Jul 21, 2021 at 2:07 PM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> This patch series re-submits the 'git add' integration with sparse-index.
> The performance gains are the same as before.
>
> It is based on ds/commit-and-checkout-with-sparse-index.
>
> This series was delayed from its initial submission for a couple reasons.
>
> The first was because it was colliding with some changes in
> mt/add-rm-in-sparse-checkout, so now we are far enough along that that
> branch is in our history and we can work forwards.
>
> The other concern was about how 'git add ' should respond when a path
> outside of the sparse-checkout cone exists. One recommendation (that I am
> failing to find a link to the message, sorry) was to disallow adding files
> that would become index entries with SKIP_WORKTREE on. However, as I worked

I think the recommendation was:
  * Permitted: running git add on tracked files with the SKIP_WORKTREE
bit *clear*
  * Disallowed: running git add on tracked files with the
SKIP_WORKTREE bit *set*
  * Disallowed: running git add on untracked files that would become
index entries with the SKIP_WORKTREE bit set

where the latter two exit with an error message that suggests changing
the sparsity specification first.  I think this is what has existed
for some time, other than Matheus adding some error messages to help
the user when their add command doesn't match anything otherwise.

> towards that goal I found that change would cause problems for a realistic
> scenario: merge conflicts outside of the sparse-checkout cone.

...which wouldn't be a problem because these are tracked files whose
SKIP_WORKTREE bit was cleared by the merge machinery (when it marked
them as conflicted and wrote them to the working tree).

> The first patch of this series adds tests that create merge conflicts
> outside of the sparse cone and then presents different ways a user could
> resolve the situation. We want all of them to be feasible, and this
> includes:
>
>  1. Reverting the file to a known version in history.
>  2. Adding the file with its contents on disk.
>  3. Moving the file to a new location in the sparse directory.
>
> Without maintaining the behavior of adding files outside of the
> sparse-checkout cone, we risk confusing users who get into this state. The
> only workaround they would have is to modify their sparse-checkout
> definition which might lead to expanding much more data than they need to
> resolve the conflicts.
>
> For that reason, I stopped trying to limit 'git add' to be within the cone.
> I'm open to suggestions here, but we need an approach that works for
> out-of-cone conflicts.

I believe my above suggestion works for out-of-cone conflicts.  Some
important other details to keep in mind in regards to how we make add
behave:

* We don't want "git add -A [GLOB_OR_DIRECTORY]" to accidentally be
treated as a directive to remove files from the repository (and
naively noticing that SKIP_WORKTREE files are missing but attempting
to operate on them anyway would give this problematic result).
* We don't want "git rm [GLOB_OR_DIRECTORY]" to nuke SKIP_WORKTREE
files; it should only operate on files that are present.
* We need add and rm to be consistent with each other in terms of how
SKIP_WORKTREE files and the sparsity cone are treated

and more generally about not-SKIP_WORKTREE-despite-not-matching-sparsity-paths:

* These files that aren't SKIP_WORKTREE but normally would be are
prone to "disappear" at some random later time after they are made to
match the index.  The disappearing can happen either with an explicit
"git sparse-checkout reapply" (which is fine since it was explicit) or
as a side-effect of various other commands that run through
unpack_trees() since it attempts to heed the sparsity rules.  Users
tend to get confused by the latter case; they'll understand at some
point that it was because the file was outside the sparsity paths, but
the randomness in when it's pulled out as a side-effect of other
commands can be slightly jarring.  So, I'd like to avoid that where we
can easily do so, which I think the recommendation above does.  (As a
side note to these kinds of cases, maybe we want to consider having
the sparsification logic in unpack_trees() first check whether paths
being removed from the working tree and having their SKIP_WORKTREE bit
set not only match the index but also match HEAD?  That'd be a bit
more expensive to check in the sparsification paths, and I'm not sure
they all have the relevant information, but it's an idea...)

> The one place I did continue to update is 'git add --refresh ' to match the
> behavior added by mt/add-rm-in-sparse-checkout which outputs an error
> message. This happens even when the file exists in the working directory,
> but that seems appropriate enough.
>
> Thanks, -Stolee
>
> Derrick Stolee (5):
>   t1092: test merge conflicts outside cone
>   add: allow operating on a sparse-only index
>   pathspec: stop calling ensure_full_index
>   t1092: 'git add --refresh' difference with sparse-index
>   add: ignore outside the sparse-checkout in refresh()
>
>  builtin/add.c                            | 13 ++++-
>  pathspec.c                               |  2 -
>  t/t1092-sparse-checkout-compatibility.sh | 71 ++++++++++++++++++++----
>  3 files changed, 72 insertions(+), 14 deletions(-)
>
>
> base-commit: 71e301501c88399711a1bf8515d1747e92cfbb9b
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-999%2Fderrickstolee%2Fsparse-index%2Fadd-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-999/derrickstolee/sparse-index/add-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/999
> --
> gitgitgadget

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/5] t1092: test merge conflicts outside cone
  2021-07-21 21:06 ` [PATCH 1/5] t1092: test merge conflicts outside cone Derrick Stolee via GitGitGadget
@ 2021-07-23 17:34   ` Elijah Newren
  2021-07-23 17:44     ` Eric Sunshine
  2021-07-26 14:10     ` Derrick Stolee
  0 siblings, 2 replies; 36+ messages in thread
From: Elijah Newren @ 2021-07-23 17:34 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee, Derrick Stolee

On Wed, Jul 21, 2021 at 2:07 PM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  t/t1092-sparse-checkout-compatibility.sh | 37 ++++++++++++++++++++++++
>  1 file changed, 37 insertions(+)
>
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index 91e30d6ec22..a3c01d588d8 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -114,6 +114,16 @@ test_expect_success 'setup' '
>                 git add . &&
>                 git commit -m "file to dir" &&
>
> +               for side in left right
> +               do
> +                       git checkout -b merge-$side base &&
> +                       echo $side >>deep/deeper2/a &&
> +                       echo $side >>folder1/a &&
> +                       echo $side >>folder2/a &&
> +                       git add . &&
> +                       git commit -m "$side" || return 1

Why is this "|| return 1" here?

It looks like there are a number of other cases of this in the file
too, which I must have overlooked previously, because I don't
understand any of them.

> +               done &&
> +
>                 git checkout -b deepest base &&
>                 echo "updated deepest" >deep/deeper1/deepest/a &&
>                 git commit -a -m "update deepest" &&
> @@ -482,6 +492,33 @@ test_expect_success 'merge' '
>         test_all_match git rev-parse HEAD^{tree}
>  '
>
> +test_expect_success 'merge with conflict outside cone' '
> +       init_repos &&
> +
> +       test_all_match git checkout -b merge-tip merge-left &&
> +       test_all_match git status --porcelain=v2 &&
> +       test_all_match test_must_fail git merge -m merge merge-right &&
> +       test_all_match git status --porcelain=v2 &&
> +
> +       # resolve the conflict in different ways:
> +       # 1. revert to the base
> +       test_all_match git checkout base -- deep/deeper2/a &&
> +       test_all_match git status --porcelain=v2 &&
> +
> +       # 2. add the file with conflict markers
> +       test_all_match git add folder1/a &&
> +       test_all_match git status --porcelain=v2 &&
> +
> +       # 3. rename the file to another sparse filename

But...that doesn't resolve the conflict.  Shouldn't this be titled
"accept the conflict & rename the file elsewhere"?

> +       run_on_all mv folder2/a folder2/z &&
> +       test_all_match git add folder2 &&

'mv' rather than 'git mv', then followed by 'git add'?  Any reason for
this order rather than git add followed by git mv?

Also, if you really do want to move first, did you use mv instead of
"git mv" due to the latter's shortcoming of only operating on stage 0?
(https://lore.kernel.org/git/CABPp-BGJdwpwhQUp4Wa4bKBp4hQFB9OM3N1FXH7SzY0mvLDa7Q@mail.gmail.com/)

Regardless of order, though, I still think mv or add should require a
--force to rename or add a file outside the sparsity paths given the
deferred negative surprises for users around such files.  (Or come up
with a solid way to remove those surprises.)

> +       test_all_match git status --porcelain=v2 &&
> +
> +       test_all_match git merge --continue &&
> +       test_all_match git status --porcelain=v2 &&
> +       test_all_match git rev-parse HEAD^{tree}
> +'
> +
>  test_expect_success 'merge with outside renames' '
>         init_repos &&

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/5] t1092: test merge conflicts outside cone
  2021-07-23 17:34   ` Elijah Newren
@ 2021-07-23 17:44     ` Eric Sunshine
  2021-07-23 17:47       ` Elijah Newren
  2021-07-26 14:10     ` Derrick Stolee
  1 sibling, 1 reply; 36+ messages in thread
From: Eric Sunshine @ 2021-07-23 17:44 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Derrick Stolee via GitGitGadget, Git Mailing List,
	Junio C Hamano, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

On Fri, Jul 23, 2021 at 1:34 PM Elijah Newren <newren@gmail.com> wrote:
> On Wed, Jul 21, 2021 at 2:07 PM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
> > +               for side in left right
> > +               do
> > +                       git checkout -b merge-$side base &&
> > +                       echo $side >>deep/deeper2/a &&
> > +                       echo $side >>folder1/a &&
> > +                       echo $side >>folder2/a &&
> > +                       git add . &&
> > +                       git commit -m "$side" || return 1
>
> Why is this "|| return 1" here?
>
> It looks like there are a number of other cases of this in the file
> too, which I must have overlooked previously, because I don't
> understand any of them.

A shell for-loop won't automatically terminate just because some
command in its body fails. Instead it will run to completion and
return the status of the last command of the last iteration, which may
not be the iteration which failed, thus a failure can be hidden.
Therefore, we need to proactively stop the loop iteration _and_ ensure
that the return status of the loop itself reflects the failure, which
we do by `|| return 1`. (If this loop was inside a subshell, we'd use
`|| exit 1` instead.)

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 2/5] add: allow operating on a sparse-only index
  2021-07-21 21:06 ` [PATCH 2/5] add: allow operating on a sparse-only index Derrick Stolee via GitGitGadget
  2021-07-21 22:19   ` Junio C Hamano
@ 2021-07-23 17:45   ` Elijah Newren
  2021-07-26 13:11     ` Derrick Stolee
  2021-07-26 13:33     ` Derrick Stolee
  1 sibling, 2 replies; 36+ messages in thread
From: Elijah Newren @ 2021-07-23 17:45 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee, Derrick Stolee

On Wed, Jul 21, 2021 at 2:07 PM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> Disable command_requires_full_index for 'git add'. This does not require
> any additional removals of ensure_full_index(). The main reason is that
> 'git add' discovers changes based on the pathspec and the worktree
> itself. These are then inserted into the index directly, and calls to
> index_name_pos() or index_file_exists() already call expand_to_path() at
> the appropriate time to support a sparse-index.

Nice.

> Add a test to check that 'git add -A' and 'git add <file>' does not
> expand the index at all, as long as <file> is not within a sparse
> directory. This does not help the global 'git add .' case.

Good idea.

> We can measure the improvement using p2000-sparse-operations.sh with
> these results:
>
> Test                                  HEAD~1           HEAD
> ------------------------------------------------------------------------------
> 2000.6: git add -A (full-index-v3)    0.35(0.30+0.05)  0.37(0.29+0.06) +5.7%
> 2000.7: git add -A (full-index-v4)    0.31(0.26+0.06)  0.33(0.27+0.06) +6.5%
> 2000.8: git add -A (sparse-index-v3)  0.57(0.53+0.07)  0.05(0.04+0.08) -91.2%
> 2000.9: git add -A (sparse-index-v4)  0.58(0.55+0.06)  0.05(0.05+0.06) -91.4%
>
> While the 91% improvement seems impressive, it's important to recognize
> that previously we had significant overhead for expanding the
> sparse-index. Comparing to the full index case, 'git add -A' goes from
> 0.37s to 0.05s, which is "only" an 86% improvement.

Hehe.  Yep, it's so "disappointing" to "only" have the code be 7x faster.  :-)

Out of curiosity, IIRC any operation involving the index took ~10s on
some of the Microsoft repos.  What does the speedup look like over
there for these changes to git-add?

>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  builtin/add.c                            |  3 +++
>  t/t1092-sparse-checkout-compatibility.sh | 14 ++++++--------
>  2 files changed, 9 insertions(+), 8 deletions(-)
>
> diff --git a/builtin/add.c b/builtin/add.c
> index b773b5a4993..c76e6ddd359 100644
> --- a/builtin/add.c
> +++ b/builtin/add.c
> @@ -528,6 +528,9 @@ int cmd_add(int argc, const char **argv, const char *prefix)
>         add_new_files = !take_worktree_changes && !refresh_only && !add_renormalize;
>         require_pathspec = !(take_worktree_changes || (0 < addremove_explicit));
>
> +       prepare_repo_settings(the_repository);
> +       the_repository->settings.command_requires_full_index = 0;
> +
>         hold_locked_index(&lock_file, LOCK_DIE_ON_ERROR);
>
>         /*
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index a3c01d588d8..a11d9d7f35d 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -340,13 +340,6 @@ test_expect_success 'status/add: outside sparse cone' '
>
>         test_sparse_match git status --porcelain=v2 &&
>
> -       # This "git add folder1/a" fails with a warning
> -       # in the sparse repos, differing from the full
> -       # repo. This is intentional.
> -       test_sparse_match test_must_fail git add folder1/a &&
> -       test_sparse_match test_must_fail git add --refresh folder1/a &&
> -       test_all_match git status --porcelain=v2 &&
> -

Why was this chunk removed?  Nothing in the commit message mentions
this, and it's not clear to me the reason for it.

I tried adding it back in at the end of the series and it still works
(and further I can't change test_sparse_match to test_all_match and
have the test work).

>         test_all_match git add . &&
>         test_all_match git status --porcelain=v2 &&
>         test_all_match git commit -m folder1/new &&
> @@ -635,7 +628,12 @@ test_expect_success 'sparse-index is not expanded' '
>         git -C sparse-index reset --hard &&
>         ensure_not_expanded checkout rename-out-to-out -- deep/deeper1 &&
>         git -C sparse-index reset --hard &&
> -       ensure_not_expanded restore -s rename-out-to-out -- deep/deeper1
> +       ensure_not_expanded restore -s rename-out-to-out -- deep/deeper1 &&
> +
> +       echo >>sparse-index/README.md &&
> +       ensure_not_expanded add -A &&
> +       echo >>sparse-index/extra.txt &&
> +       ensure_not_expanded add extra.txt

...and here's the extra test you mentioned in the commit message.  Looks good.

>  '
>
>  # NEEDSWORK: a sparse-checkout behaves differently from a full checkout
> --
> gitgitgadget

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/5] t1092: test merge conflicts outside cone
  2021-07-23 17:44     ` Eric Sunshine
@ 2021-07-23 17:47       ` Elijah Newren
  0 siblings, 0 replies; 36+ messages in thread
From: Elijah Newren @ 2021-07-23 17:47 UTC (permalink / raw)
  To: Eric Sunshine
  Cc: Derrick Stolee via GitGitGadget, Git Mailing List,
	Junio C Hamano, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

On Fri, Jul 23, 2021 at 10:44 AM Eric Sunshine <sunshine@sunshineco.com> wrote:
>
> On Fri, Jul 23, 2021 at 1:34 PM Elijah Newren <newren@gmail.com> wrote:
> > On Wed, Jul 21, 2021 at 2:07 PM Derrick Stolee via GitGitGadget
> > <gitgitgadget@gmail.com> wrote:
> > > +               for side in left right
> > > +               do
> > > +                       git checkout -b merge-$side base &&
> > > +                       echo $side >>deep/deeper2/a &&
> > > +                       echo $side >>folder1/a &&
> > > +                       echo $side >>folder2/a &&
> > > +                       git add . &&
> > > +                       git commit -m "$side" || return 1
> >
> > Why is this "|| return 1" here?
> >
> > It looks like there are a number of other cases of this in the file
> > too, which I must have overlooked previously, because I don't
> > understand any of them.
>
> A shell for-loop won't automatically terminate just because some
> command in its body fails. Instead it will run to completion and
> return the status of the last command of the last iteration, which may
> not be the iteration which failed, thus a failure can be hidden.
> Therefore, we need to proactively stop the loop iteration _and_ ensure
> that the return status of the loop itself reflects the failure, which
> we do by `|| return 1`. (If this loop was inside a subshell, we'd use
> `|| exit 1` instead.)

Ah, thanks for the explanation.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 3/5] pathspec: stop calling ensure_full_index
  2021-07-21 21:06 ` [PATCH 3/5] pathspec: stop calling ensure_full_index Derrick Stolee via GitGitGadget
@ 2021-07-23 18:17   ` Elijah Newren
  0 siblings, 0 replies; 36+ messages in thread
From: Elijah Newren @ 2021-07-23 18:17 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee, Derrick Stolee

On Wed, Jul 21, 2021 at 2:07 PM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> The add_pathspec_matches_against_index() focuses on matching a pathspec
> to file entries in the index. This already works correctly for its only
> use: checking if untracked files exist in the index.
>
> The compatibility checks in t1092 already test that 'git add <dir>'
> works for a directory outside of the sparse cone. That provides coverage
> for removing this guard.
>
> This finalizes our ability to run 'git add .' without expanding a sparse
> index to a full one. This is evidenced by an update to t1092 and by
> these performance numbers for p2000-sparse-operations.sh:
>
> Test                                    HEAD~1            HEAD
> --------------------------------------------------------------------------------
> 2000.10: git add . (full-index-v3)      0.37(0.28+0.07)   0.36(0.27+0.06) -2.7%
> 2000.11: git add . (full-index-v4)      0.33(0.26+0.06)   0.32(0.28+0.05) -3.0%
> 2000.12: git add . (sparse-index-v3)    0.57(0.53+0.07)   0.06(0.06+0.07) -89.5%
> 2000.13: git add . (sparse-index-v4)    0.57(0.53+0.07)   0.05(0.03+0.09) -91.2%
>
> While the ~90% improvement is shown by the test results, it is worth
> noting that expanding the sparse index was adding overhead in previous
> commits. Comparing to the full index case, we see the performance go
> from 0.33s to 0.05s, an 85% improvement.

These perf improvements are pretty sweet.

> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  pathspec.c                               | 2 --
>  t/t1092-sparse-checkout-compatibility.sh | 7 +++----
>  2 files changed, 3 insertions(+), 6 deletions(-)
>
> diff --git a/pathspec.c b/pathspec.c
> index 08f8d3eedc3..44306fdaca2 100644
> --- a/pathspec.c
> +++ b/pathspec.c
> @@ -37,8 +37,6 @@ void add_pathspec_matches_against_index(const struct pathspec *pathspec,
>                         num_unmatched++;
>         if (!num_unmatched)
>                 return;
> -       /* TODO: audit for interaction with sparse-index. */
> -       ensure_full_index(istate);
>         for (i = 0; i < istate->cache_nr; i++) {
>                 const struct cache_entry *ce = istate->cache[i];
>                 if (sw_action == PS_IGNORE_SKIP_WORKTREE && ce_skip_worktree(ce))
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index a11d9d7f35d..f9e2f5f4aa1 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -322,9 +322,6 @@ test_expect_success 'commit including unstaged changes' '
>  test_expect_success 'status/add: outside sparse cone' '
>         init_repos &&
>
> -       # adding a "missing" file outside the cone should fail
> -       test_sparse_match test_must_fail git add folder1/a &&
> -

So this is removed because of non-matching errors.  In particular,
sparse-checkout shows
"""
The following pathspecs didn't match any eligible path, but they do match index
entries outside the current sparse checkout:
folder1/a
hint: Disable or modify the sparsity rules if you intend to update such entries.
hint: Disable this message with "git config advice.updateSparsePath false"
"""
while sparse-index now shows:
"""
fatal: pathspec 'folder1/a' did not match any files
"""

The new error seems entirely reasonable to me.  No objection here.


But allow me to go on a bit of a diversion...

If we modify this setup slightly by running:

$ mkdir folder1
$ echo garbage >folder1/a
$ git add folder1/a

Then you'll get the first of those errors in both the sparse-index and
the sparse-checkout.  I also like this behavior.

If you unset the SKIP_WORKTREE bit manually, and then add:

$ git update-index --no-skip-worktree folder1/a
$ git add folder1/a

Then the file is added with no error or warning.  I like this behavior too.

If you further change the setup with:

$ echo more garbage >folder1/z
$ git add folder1/z

Then you get no error, despite folder1/z being an untracked file
outside of sparsity paths.  No bueno.  :-(

>         # folder1 is at HEAD, but outside the sparse cone
>         run_on_sparse mkdir folder1 &&
>         cp initial-repo/folder1/a sparse-checkout/folder1/a &&
> @@ -633,7 +630,9 @@ test_expect_success 'sparse-index is not expanded' '
>         echo >>sparse-index/README.md &&
>         ensure_not_expanded add -A &&
>         echo >>sparse-index/extra.txt &&
> -       ensure_not_expanded add extra.txt
> +       ensure_not_expanded add extra.txt &&
> +       echo >>sparse-index/untracked.txt &&
> +       ensure_not_expanded add .

:-)

>  '
>
>  # NEEDSWORK: a sparse-checkout behaves differently from a full checkout
> --

So I added a lot of comments here, in part because I thought I'd test
a bit more of what I said in response to your cover letter and see how
close to it we were.  The patch in question looks fine.

I just added an aside as a convenient place to check whether the
behavior at the end of the series matches what you proposed in the
cover letter, or what I proposed in response.  It appears it matches
neither (though that's not due to this specific patch).

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 5/5] add: ignore outside the sparse-checkout in refresh()
  2021-07-21 21:06 ` [PATCH 5/5] add: ignore outside the sparse-checkout in refresh() Derrick Stolee via GitGitGadget
@ 2021-07-23 19:46   ` Elijah Newren
  0 siblings, 0 replies; 36+ messages in thread
From: Elijah Newren @ 2021-07-23 19:46 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee, Derrick Stolee

On Wed, Jul 21, 2021 at 2:07 PM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> Since b243012 (refresh_index(): add flag to ignore SKIP_WORKTREE
> entries, 2021-04-08), 'git add --refresh <path>' will output a warning
> message when the path is outside the sparse-checkout definition. The
> implementation of this warning happened in parallel with the
> sparse-index work to add ensure_full_index() calls throughout the
> codebase.
>
> Update this loop to have the proper logic that checks to see if the
> pathspec is outside the sparse-checkout definition. This avoids the need
> to expand the sparse directory entry and determine if the path is
> tracked, untracked, or ignored. We simply avoid updating the stat()
> information because there isn't even an entry that matches the path!
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  builtin/add.c                            | 10 +++++++++-
>  t/t1092-sparse-checkout-compatibility.sh |  6 +-----
>  2 files changed, 10 insertions(+), 6 deletions(-)
>
> diff --git a/builtin/add.c b/builtin/add.c
> index c76e6ddd359..d512ece655b 100644
> --- a/builtin/add.c
> +++ b/builtin/add.c
> @@ -192,13 +192,21 @@ static int refresh(int verbose, const struct pathspec *pathspec)
>         struct string_list only_match_skip_worktree = STRING_LIST_INIT_NODUP;
>         int flags = REFRESH_IGNORE_SKIP_WORKTREE |
>                     (verbose ? REFRESH_IN_PORCELAIN : REFRESH_QUIET);
> +       struct pattern_list pl = { 0 };
> +       int sparse_checkout_enabled = !get_sparse_checkout_patterns(&pl);
>
>         seen = xcalloc(pathspec->nr, 1);
>         refresh_index(&the_index, flags, pathspec, seen,
>                       _("Unstaged changes after refreshing the index:"));
>         for (i = 0; i < pathspec->nr; i++) {
>                 if (!seen[i]) {
> -                       if (matches_skip_worktree(pathspec, i, &skip_worktree_seen)) {
> +                       const char *path = pathspec->items[i].original;
> +                       int dtype = DT_REG;
> +
> +                       if (matches_skip_worktree(pathspec, i, &skip_worktree_seen) ||
> +                           (sparse_checkout_enabled &&
> +                            !path_matches_pattern_list(path, strlen(path), NULL,
> +                                                       &dtype, &pl, &the_index))) {

I was slightly worried from the description in the commit message
about the case where you have a file without the SKIP_WORKTREE bit set
despite not matching sparsity paths.  I was worried that you'd skip
refreshing it, but I tweaked your testcases and couldn't trigger it.

>                                 string_list_append(&only_match_skip_worktree,
>                                                    pathspec->items[i].original);
>                         } else {
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index 73c48a71d89..c61424e2074 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -347,7 +347,7 @@ test_expect_success 'status/add: outside sparse cone' '
>         test_all_match git commit -m folder1/newer
>  '
>
> -test_expect_failure 'add: pathspec within sparse directory' '
> +test_expect_success 'add: pathspec within sparse directory' '
>         init_repos &&
>
>         run_on_sparse mkdir folder1 &&
> @@ -357,10 +357,6 @@ test_expect_failure 'add: pathspec within sparse directory' '
>         # This "git add folder1/a" fails with a warning
>         # in the sparse repos, differing from the full
>         # repo. This is intentional.
> -       #
> -       # However, in the sparse-index, folder1/a does not
> -       # match any cache entry and fails with a different
> -       # error message. This needs work.
>         test_sparse_match test_must_fail git add folder1/a &&
>         test_sparse_match test_must_fail git add --refresh folder1/a &&
>         test_all_match git status --porcelain=v2
> --
> gitgitgadget

This and Patch 4/5 look good to me.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 0/5] Sparse Index: Integrate with 'git add'
  2021-07-23 12:51 ` [PATCH 0/5] Sparse Index: Integrate with 'git add' Elijah Newren
@ 2021-07-23 20:10   ` Elijah Newren
  0 siblings, 0 replies; 36+ messages in thread
From: Elijah Newren @ 2021-07-23 20:10 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee

On Fri, Jul 23, 2021 at 5:51 AM Elijah Newren <newren@gmail.com> wrote:
>
> Hi,
>
> I haven't read the series yet (I'm going to try to read them all
> today), but wanted to comment on a couple things in your cover
> letter...
>
> On Wed, Jul 21, 2021 at 2:07 PM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
> >
> > This patch series re-submits the 'git add' integration with sparse-index.
> > The performance gains are the same as before.
> >
> > It is based on ds/commit-and-checkout-with-sparse-index.
> >
> > This series was delayed from its initial submission for a couple reasons.
> >
> > The first was because it was colliding with some changes in
> > mt/add-rm-in-sparse-checkout, so now we are far enough along that that
> > branch is in our history and we can work forwards.
> >
> > The other concern was about how 'git add ' should respond when a path
> > outside of the sparse-checkout cone exists. One recommendation (that I am
> > failing to find a link to the message, sorry) was to disallow adding files
> > that would become index entries with SKIP_WORKTREE on. However, as I worked
>
> I think the recommendation was:
>   * Permitted: running git add on tracked files with the SKIP_WORKTREE
> bit *clear*
>   * Disallowed: running git add on tracked files with the
> SKIP_WORKTREE bit *set*
>   * Disallowed: running git add on untracked files that would become
> index entries with the SKIP_WORKTREE bit set
>
> where the latter two exit with an error message that suggests changing
> the sparsity specification first.  I think this is what has existed
> for some time, other than Matheus adding some error messages to help
> the user when their add command doesn't match anything otherwise.
>
> > towards that goal I found that change would cause problems for a realistic
> > scenario: merge conflicts outside of the sparse-checkout cone.
>
> ...which wouldn't be a problem because these are tracked files whose
> SKIP_WORKTREE bit was cleared by the merge machinery (when it marked
> them as conflicted and wrote them to the working tree).
>
> > The first patch of this series adds tests that create merge conflicts
> > outside of the sparse cone and then presents different ways a user could
> > resolve the situation. We want all of them to be feasible, and this
> > includes:
> >
> >  1. Reverting the file to a known version in history.
> >  2. Adding the file with its contents on disk.
> >  3. Moving the file to a new location in the sparse directory.
> >
> > Without maintaining the behavior of adding files outside of the
> > sparse-checkout cone, we risk confusing users who get into this state. The
> > only workaround they would have is to modify their sparse-checkout
> > definition which might lead to expanding much more data than they need to
> > resolve the conflicts.
> >
> > For that reason, I stopped trying to limit 'git add' to be within the cone.
> > I'm open to suggestions here, but we need an approach that works for
> > out-of-cone conflicts.
>
> I believe my above suggestion works for out-of-cone conflicts.  Some
> important other details to keep in mind in regards to how we make add
> behave:
>
> * We don't want "git add -A [GLOB_OR_DIRECTORY]" to accidentally be
> treated as a directive to remove files from the repository (and
> naively noticing that SKIP_WORKTREE files are missing but attempting
> to operate on them anyway would give this problematic result).
> * We don't want "git rm [GLOB_OR_DIRECTORY]" to nuke SKIP_WORKTREE
> files; it should only operate on files that are present.
> * We need add and rm to be consistent with each other in terms of how
> SKIP_WORKTREE files and the sparsity cone are treated
>
> and more generally about not-SKIP_WORKTREE-despite-not-matching-sparsity-paths:
>
> * These files that aren't SKIP_WORKTREE but normally would be are
> prone to "disappear" at some random later time after they are made to
> match the index.  The disappearing can happen either with an explicit
> "git sparse-checkout reapply" (which is fine since it was explicit) or
> as a side-effect of various other commands that run through
> unpack_trees() since it attempts to heed the sparsity rules.  Users
> tend to get confused by the latter case; they'll understand at some
> point that it was because the file was outside the sparsity paths, but
> the randomness in when it's pulled out as a side-effect of other
> commands can be slightly jarring.  So, I'd like to avoid that where we
> can easily do so, which I think the recommendation above does.  (As a
> side note to these kinds of cases, maybe we want to consider having
> the sparsification logic in unpack_trees() first check whether paths
> being removed from the working tree and having their SKIP_WORKTREE bit
> set not only match the index but also match HEAD?  That'd be a bit
> more expensive to check in the sparsification paths, and I'm not sure
> they all have the relevant information, but it's an idea...)
>
> > The one place I did continue to update is 'git add --refresh ' to match the
> > behavior added by mt/add-rm-in-sparse-checkout which outputs an error
> > message. This happens even when the file exists in the working directory,
> > but that seems appropriate enough.

I've read through the patches now.  They look pretty good, and the
performance improvements are definitely appealing.

My comments on patch 1/5 probably reflect the differences in proposed
paths between your cover letter and my proposal.  My comments on patch
3/5 point out that the current behavior matches neither your
description in your cover letter (or my understanding thereof), nor
mine in my earlier response.  However, it also suggests that the
differences between my proposal and the code isn't due to your series
but a pre-existing issue, and thus wouldn't necessarily need to hold
your series up.

My comments on patch 2/5, asking about why you removed a hunk are thus
the biggest outstanding item for the series to move forward.

It would be nice for us to figure out and agree on the end goal for
add/rm/mv behavior, though.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 2/5] add: allow operating on a sparse-only index
  2021-07-23 17:45   ` Elijah Newren
@ 2021-07-26 13:11     ` Derrick Stolee
  2021-07-26 13:33     ` Derrick Stolee
  1 sibling, 0 replies; 36+ messages in thread
From: Derrick Stolee @ 2021-07-26 13:11 UTC (permalink / raw)
  To: Elijah Newren, Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee

On 7/23/2021 1:45 PM, Elijah Newren wrote:
> On Wed, Jul 21, 2021 at 2:07 PM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
...
>> Test                                  HEAD~1           HEAD
>> ------------------------------------------------------------------------------
>> 2000.6: git add -A (full-index-v3)    0.35(0.30+0.05)  0.37(0.29+0.06) +5.7%
>> 2000.7: git add -A (full-index-v4)    0.31(0.26+0.06)  0.33(0.27+0.06) +6.5%
>> 2000.8: git add -A (sparse-index-v3)  0.57(0.53+0.07)  0.05(0.04+0.08) -91.2%
>> 2000.9: git add -A (sparse-index-v4)  0.58(0.55+0.06)  0.05(0.05+0.06) -91.4%
>>
>> While the 91% improvement seems impressive, it's important to recognize
>> that previously we had significant overhead for expanding the
>> sparse-index. Comparing to the full index case, 'git add -A' goes from
>> 0.37s to 0.05s, which is "only" an 86% improvement.
> 
> Hehe.  Yep, it's so "disappointing" to "only" have the code be 7x faster.  :-)
> 
> Out of curiosity, IIRC any operation involving the index took ~10s on
> some of the Microsoft repos.  What does the speedup look like over
> there for these changes to git-add?

The latest numbers I have for a repo with ~2 million tracked files is that
 index reads take about half a second (because of the threaded reads) and
writes take at least one second. There was a lot of work by Ben Peart, Jeff
Hostetler, and Kevin Willford to reduce this cost as much as possible a few
years ago. VFS for Git is still limited by this bottleneck, but Scalar's
use of sparse-checkout enables the use of the sparse index.

We have an experimental release [1] out to users right now, and I will
report to the mailing list about how that went after we get sufficient
adoption that the data can be significant. When focusing on individual
users I can find things like one user seeing "git commit" going from 4.3s
to 0.35s and "git add" going from 6.1s to 0.13s. (The "git add" time
might also be conflated with a change from the FS Monitor hook to the
builtin FS Monitor.)

[1] https://github.com/microsoft/git/releases/tag/v2.32.0.vfs.0.102.exp

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 2/5] add: allow operating on a sparse-only index
  2021-07-23 17:45   ` Elijah Newren
  2021-07-26 13:11     ` Derrick Stolee
@ 2021-07-26 13:33     ` Derrick Stolee
  1 sibling, 0 replies; 36+ messages in thread
From: Derrick Stolee @ 2021-07-26 13:33 UTC (permalink / raw)
  To: Elijah Newren, Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee

On 7/23/2021 1:45 PM, Elijah Newren wrote:
> On Wed, Jul 21, 2021 at 2:07 PM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>>
...
>> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
>> index a3c01d588d8..a11d9d7f35d 100755
>> --- a/t/t1092-sparse-checkout-compatibility.sh
>> +++ b/t/t1092-sparse-checkout-compatibility.sh
>> @@ -340,13 +340,6 @@ test_expect_success 'status/add: outside sparse cone' '
>>
>>         test_sparse_match git status --porcelain=v2 &&
>>
>> -       # This "git add folder1/a" fails with a warning
>> -       # in the sparse repos, differing from the full
>> -       # repo. This is intentional.
>> -       test_sparse_match test_must_fail git add folder1/a &&
>> -       test_sparse_match test_must_fail git add --refresh folder1/a &&
>> -       test_all_match git status --porcelain=v2 &&
>> -
> 
> Why was this chunk removed?  Nothing in the commit message mentions
> this, and it's not clear to me the reason for it.
> 
> I tried adding it back in at the end of the series and it still works
> (and further I can't change test_sparse_match to test_all_match and
> have the test work).

I mentioned this in a reply to Junio, but this hunk removal is confusing.

As of this patch, this hunk causes a failure due to an error message not
matching, specifically this error:


+ diff -u sparse-checkout-err sparse-index-err
--- sparse-checkout-err 2021-07-26 13:30:50.304291264 +0000
+++ sparse-index-err    2021-07-26 13:30:50.308291259 +0000
@@ -1,5 +1 @@
-The following pathspecs didn't match any eligible path, but they do match index
-entries outside the current sparse checkout:
-folder1/a
-hint: Disable or modify the sparsity rules if you intend to update such entries.
-hint: Disable this message with "git config advice.updateSparsePath false"
+fatal: pathspec 'folder1/a' did not match any files


A similar test is added as a failure case in patch 4, then marked as success
in patch 5. This organization of test changes could be organized better, so
I will work on that in v2, along with your other suggestions.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/5] t1092: test merge conflicts outside cone
  2021-07-23 17:34   ` Elijah Newren
  2021-07-23 17:44     ` Eric Sunshine
@ 2021-07-26 14:10     ` Derrick Stolee
  1 sibling, 0 replies; 36+ messages in thread
From: Derrick Stolee @ 2021-07-26 14:10 UTC (permalink / raw)
  To: Elijah Newren, Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee

On 7/23/2021 1:34 PM, Elijah Newren wrote:
> On Wed, Jul 21, 2021 at 2:07 PM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>>
>> From: Derrick Stolee <dstolee@microsoft.com>

...

>> +       # 3. rename the file to another sparse filename
> 
> But...that doesn't resolve the conflict.  Shouldn't this be titled
> "accept the conflict & rename the file elsewhere"?

Sure. I'm less focused on the content of the file and more the steps
the user might have taken to resolve the conflict.
 
>> +       run_on_all mv folder2/a folder2/z &&
>> +       test_all_match git add folder2 &&
> 
> 'mv' rather than 'git mv', then followed by 'git add'?  Any reason for
> this order rather than git add followed by git mv?

I'm trying to mimic that a user might realize that a filename might
need to be renamed (say, because a naming convention changed that is
causing the conflict) and I don't expect users to use 'git mv' to do
this action.

> Also, if you really do want to move first, did you use mv instead of
> "git mv" due to the latter's shortcoming of only operating on stage 0?
> (https://lore.kernel.org/git/CABPp-BGJdwpwhQUp4Wa4bKBp4hQFB9OM3N1FXH7SzY0mvLDa7Q@mail.gmail.com/)

'git mv' had not occurred to me as a thing to do in this case. I'm
focused on ensuring that 'git add' works as expected to update the
index in response to filesystem changes.

> Regardless of order, though, I still think mv or add should require a
> --force to rename or add a file outside the sparsity paths given the
> deferred negative surprises for users around such files.  (Or come up
> with a solid way to remove those surprises.)

--force is focused on _ignored_ files. I imagine that it could be
repurposed to allow entries outside of the sparse-checkout definition,
but we would want to be careful for users who are adding the entire
directory, not just the individual files, as they might _also_ get any
ignored files that exist in that directory. That might justify creating
a new option instead.

Further, the error message reported when adding something outside of
the sparse cone should probably mention whatever option exists as a
way for users to bypass this limitation. I'll collect my thoughts (in
response to your detailed thoughts shared on my cover letter) and
start a new thread about hardening this behavior. I've got an internal
ticket tracking this, and I want to wrap my head around all of the
interesting commands (add, mv, rm, update-index?) and create a full
recommendation to bring as an RFC.

Of course, if someone else wants to create this clear vision in the
meantime I will not complain.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v2 0/5] Sparse Index: Integrate with 'git add'
  2021-07-21 21:06 [PATCH 0/5] Sparse Index: Integrate with 'git add' Derrick Stolee via GitGitGadget
                   ` (5 preceding siblings ...)
  2021-07-23 12:51 ` [PATCH 0/5] Sparse Index: Integrate with 'git add' Elijah Newren
@ 2021-07-26 15:18 ` Derrick Stolee via GitGitGadget
  2021-07-26 15:18   ` [PATCH v2 1/5] t1092: test merge conflicts outside cone Derrick Stolee via GitGitGadget
                     ` (6 more replies)
  6 siblings, 7 replies; 36+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-07-26 15:18 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, matheus.bernardino, stolee, Eric Sunshine,
	Derrick Stolee

This patch series re-submits the 'git add' integration with sparse-index.
The performance gains are the same as before.

It is based on ds/commit-and-checkout-with-sparse-index.

This series was delayed from its initial submission for a couple reasons.

The first was because it was colliding with some changes in
mt/add-rm-in-sparse-checkout, so now we are far enough along that that
branch is in our history and we can work forwards.

The other concern was about how 'git add ' should respond when a path
outside of the sparse-checkout cone exists. One recommendation (that I am
failing to find a link to the message, sorry) was to disallow adding files
that would become index entries with SKIP_WORKTREE on. However, as I worked
towards that goal I found that change would cause problems for a realistic
scenario: merge conflicts outside of the sparse-checkout cone.

Update: Elijah points out that the SKIP_WORKTREE bit is removed from
conflict files, which allows adding the conflicted files without warning.
(However, we also need to be careful about untracked files, as documented in
the test added here.)

The first patch of this series adds tests that create merge conflicts
outside of the sparse cone and then presents different ways a user could
resolve the situation. We want all of them to be feasible, and this
includes:

 1. Reverting the file to a known version in history.
 2. Adding the file with its contents on disk.
 3. Moving the file to a new location in the sparse directory.

The one place I did continue to update is 'git add --refresh ' to match the
behavior added by mt/add-rm-in-sparse-checkout which outputs an error
message. This happens even when the file exists in the working directory,
but that seems appropriate enough.


Updates in V2
=============

 * Test comments in patch 1 are improved.

 * The test hunk that was removed in patch 2 and reintroduced in the old
   patch 4 is modified to clarify how the behavior changes with that patch.
   Then, the test is modified by future patches.

 * Another instance of ensure_full_index() is removed from the --renormalize
   option. This option already ignored files with the SKIP_WORKTREE bit, so
   this should be an obviously-correct removal.

 * a full proposal for what to do with "git (add|mv|rm)" and paths outside
   the cone is delayed to another series (with an RFC round) because the
   behavior of the sparse-index matches a full index with sparse-checkout.

Thanks, -Stolee

Derrick Stolee (5):
  t1092: test merge conflicts outside cone
  add: allow operating on a sparse-only index
  pathspec: stop calling ensure_full_index
  add: ignore outside the sparse-checkout in refresh()
  add: remove ensure_full_index() with --renormalize

 builtin/add.c                            | 15 ++++--
 pathspec.c                               |  2 -
 t/t1092-sparse-checkout-compatibility.sh | 62 ++++++++++++++++++++----
 3 files changed, 65 insertions(+), 14 deletions(-)


base-commit: 71e301501c88399711a1bf8515d1747e92cfbb9b
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-999%2Fderrickstolee%2Fsparse-index%2Fadd-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-999/derrickstolee/sparse-index/add-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/999

Range-diff vs v1:

 1:  a763a7d15b8 ! 1:  8f2fd9370fe t1092: test merge conflicts outside cone
     @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'merge' '
      +	test_all_match test_must_fail git merge -m merge merge-right &&
      +	test_all_match git status --porcelain=v2 &&
      +
     -+	# resolve the conflict in different ways:
     -+	# 1. revert to the base
     ++	# Resolve the conflict in different ways:
     ++	# 1. Revert to the base
      +	test_all_match git checkout base -- deep/deeper2/a &&
      +	test_all_match git status --porcelain=v2 &&
      +
     -+	# 2. add the file with conflict markers
     ++	# 2. Add the file with conflict markers
      +	test_all_match git add folder1/a &&
      +	test_all_match git status --porcelain=v2 &&
      +
     -+	# 3. rename the file to another sparse filename
     ++	# 3. Rename the file to another sparse filename and
     ++	#    accept conflict markers as resolved content.
      +	run_on_all mv folder2/a folder2/z &&
      +	test_all_match git add folder2 &&
      +	test_all_match git status --porcelain=v2 &&
 2:  791c6c2c9ad ! 2:  6e43f118fa0 add: allow operating on a sparse-only index
     @@ Commit message
          sparse-index. Comparing to the full index case, 'git add -A' goes from
          0.37s to 0.05s, which is "only" an 86% improvement.
      
     +    This modification to 'git add' creates some behavior change depending on
     +    the use of a sparse index. We modify a test in t1092 to demonstrate
     +    these changes which will be remedied in future changes.
     +
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
       ## builtin/add.c ##
     @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'status/add: outsi
      -	# This "git add folder1/a" fails with a warning
      -	# in the sparse repos, differing from the full
      -	# repo. This is intentional.
     --	test_sparse_match test_must_fail git add folder1/a &&
     ++	# Adding the path outside of the sparse-checkout cone should fail.
     + 	test_sparse_match test_must_fail git add folder1/a &&
      -	test_sparse_match test_must_fail git add --refresh folder1/a &&
      -	test_all_match git status --porcelain=v2 &&
     --
     ++
     ++	test_must_fail git -C sparse-checkout add --refresh folder1/a 2>sparse-checkout-err &&
     ++	test_must_fail git -C sparse-index add --refresh folder1/a 2>sparse-index-err &&
     ++	# NEEDSWORK: A sparse index changes the error message.
     ++	! test_cmp sparse-checkout-err sparse-index-err &&
     ++
     ++	# NEEDSWORK: Adding a newly-tracked file outside the cone succeeds
     ++	test_sparse_match git add folder1/new &&
     + 
       	test_all_match git add . &&
       	test_all_match git status --porcelain=v2 &&
       	test_all_match git commit -m folder1/new &&
     ++	test_all_match git rev-parse HEAD^{tree} &&
     + 
     + 	run_on_all ../edit-contents folder1/newer &&
     + 	test_all_match git add folder1/ &&
     + 	test_all_match git status --porcelain=v2 &&
     +-	test_all_match git commit -m folder1/newer
     ++	test_all_match git commit -m folder1/newer &&
     ++	test_all_match git rev-parse HEAD^{tree}
     + '
     + 
     + test_expect_success 'checkout and reset --hard' '
      @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'sparse-index is not expanded' '
       	git -C sparse-index reset --hard &&
       	ensure_not_expanded checkout rename-out-to-out -- deep/deeper1 &&
 3:  a577ea4c74d = 3:  2ae91e0af29 pathspec: stop calling ensure_full_index
 4:  89ec6a7ce67 < -:  ----------- t1092: 'git add --refresh' difference with sparse-index
 5:  76066a78ce0 ! 4:  a79728d4c64 add: ignore outside the sparse-checkout in refresh()
     @@ builtin/add.c: static int refresh(int verbose, const struct pathspec *pathspec)
      
       ## t/t1092-sparse-checkout-compatibility.sh ##
      @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'status/add: outside sparse cone' '
     - 	test_all_match git commit -m folder1/newer
     - '
       
     --test_expect_failure 'add: pathspec within sparse directory' '
     -+test_expect_success 'add: pathspec within sparse directory' '
     - 	init_repos &&
     - 
     - 	run_on_sparse mkdir folder1 &&
     -@@ t/t1092-sparse-checkout-compatibility.sh: test_expect_failure 'add: pathspec within sparse directory' '
     - 	# This "git add folder1/a" fails with a warning
     - 	# in the sparse repos, differing from the full
     - 	# repo. This is intentional.
     --	#
     --	# However, in the sparse-index, folder1/a does not
     --	# match any cache entry and fails with a different
     --	# error message. This needs work.
     + 	# Adding the path outside of the sparse-checkout cone should fail.
       	test_sparse_match test_must_fail git add folder1/a &&
     - 	test_sparse_match test_must_fail git add --refresh folder1/a &&
     - 	test_all_match git status --porcelain=v2
     +-
     +-	test_must_fail git -C sparse-checkout add --refresh folder1/a 2>sparse-checkout-err &&
     +-	test_must_fail git -C sparse-index add --refresh folder1/a 2>sparse-index-err &&
     +-	# NEEDSWORK: A sparse index changes the error message.
     +-	! test_cmp sparse-checkout-err sparse-index-err &&
     ++	test_sparse_match test_must_fail git add --refresh folder1/a &&
     + 
     + 	# NEEDSWORK: Adding a newly-tracked file outside the cone succeeds
     + 	test_sparse_match git add folder1/new &&
 -:  ----------- > 5:  1543550a4e8 add: remove ensure_full_index() with --renormalize

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v2 1/5] t1092: test merge conflicts outside cone
  2021-07-26 15:18 ` [PATCH v2 " Derrick Stolee via GitGitGadget
@ 2021-07-26 15:18   ` Derrick Stolee via GitGitGadget
  2021-07-26 15:18   ` [PATCH v2 2/5] add: allow operating on a sparse-only index Derrick Stolee via GitGitGadget
                     ` (5 subsequent siblings)
  6 siblings, 0 replies; 36+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-07-26 15:18 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, matheus.bernardino, stolee, Eric Sunshine,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t1092-sparse-checkout-compatibility.sh | 38 ++++++++++++++++++++++++
 1 file changed, 38 insertions(+)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 91e30d6ec22..47f8e5e54e3 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -114,6 +114,16 @@ test_expect_success 'setup' '
 		git add . &&
 		git commit -m "file to dir" &&
 
+		for side in left right
+		do
+			git checkout -b merge-$side base &&
+			echo $side >>deep/deeper2/a &&
+			echo $side >>folder1/a &&
+			echo $side >>folder2/a &&
+			git add . &&
+			git commit -m "$side" || return 1
+		done &&
+
 		git checkout -b deepest base &&
 		echo "updated deepest" >deep/deeper1/deepest/a &&
 		git commit -a -m "update deepest" &&
@@ -482,6 +492,34 @@ test_expect_success 'merge' '
 	test_all_match git rev-parse HEAD^{tree}
 '
 
+test_expect_success 'merge with conflict outside cone' '
+	init_repos &&
+
+	test_all_match git checkout -b merge-tip merge-left &&
+	test_all_match git status --porcelain=v2 &&
+	test_all_match test_must_fail git merge -m merge merge-right &&
+	test_all_match git status --porcelain=v2 &&
+
+	# Resolve the conflict in different ways:
+	# 1. Revert to the base
+	test_all_match git checkout base -- deep/deeper2/a &&
+	test_all_match git status --porcelain=v2 &&
+
+	# 2. Add the file with conflict markers
+	test_all_match git add folder1/a &&
+	test_all_match git status --porcelain=v2 &&
+
+	# 3. Rename the file to another sparse filename and
+	#    accept conflict markers as resolved content.
+	run_on_all mv folder2/a folder2/z &&
+	test_all_match git add folder2 &&
+	test_all_match git status --porcelain=v2 &&
+
+	test_all_match git merge --continue &&
+	test_all_match git status --porcelain=v2 &&
+	test_all_match git rev-parse HEAD^{tree}
+'
+
 test_expect_success 'merge with outside renames' '
 	init_repos &&
 
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v2 2/5] add: allow operating on a sparse-only index
  2021-07-26 15:18 ` [PATCH v2 " Derrick Stolee via GitGitGadget
  2021-07-26 15:18   ` [PATCH v2 1/5] t1092: test merge conflicts outside cone Derrick Stolee via GitGitGadget
@ 2021-07-26 15:18   ` Derrick Stolee via GitGitGadget
  2021-07-26 15:18   ` [PATCH v2 3/5] pathspec: stop calling ensure_full_index Derrick Stolee via GitGitGadget
                     ` (4 subsequent siblings)
  6 siblings, 0 replies; 36+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-07-26 15:18 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, matheus.bernardino, stolee, Eric Sunshine,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Disable command_requires_full_index for 'git add'. This does not require
any additional removals of ensure_full_index(). The main reason is that
'git add' discovers changes based on the pathspec and the worktree
itself. These are then inserted into the index directly, and calls to
index_name_pos() or index_file_exists() already call expand_to_path() at
the appropriate time to support a sparse-index.

Add a test to check that 'git add -A' and 'git add <file>' does not
expand the index at all, as long as <file> is not within a sparse
directory. This does not help the global 'git add .' case.

We can measure the improvement using p2000-sparse-operations.sh with
these results:

Test                                  HEAD~1           HEAD
------------------------------------------------------------------------------
2000.6: git add -A (full-index-v3)    0.35(0.30+0.05)  0.37(0.29+0.06) +5.7%
2000.7: git add -A (full-index-v4)    0.31(0.26+0.06)  0.33(0.27+0.06) +6.5%
2000.8: git add -A (sparse-index-v3)  0.57(0.53+0.07)  0.05(0.04+0.08) -91.2%
2000.9: git add -A (sparse-index-v4)  0.58(0.55+0.06)  0.05(0.05+0.06) -91.4%

While the 91% improvement seems impressive, it's important to recognize
that previously we had significant overhead for expanding the
sparse-index. Comparing to the full index case, 'git add -A' goes from
0.37s to 0.05s, which is "only" an 86% improvement.

This modification to 'git add' creates some behavior change depending on
the use of a sparse index. We modify a test in t1092 to demonstrate
these changes which will be remedied in future changes.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/add.c                            |  3 +++
 t/t1092-sparse-checkout-compatibility.sh | 25 +++++++++++++++++-------
 2 files changed, 21 insertions(+), 7 deletions(-)

diff --git a/builtin/add.c b/builtin/add.c
index b773b5a4993..c76e6ddd359 100644
--- a/builtin/add.c
+++ b/builtin/add.c
@@ -528,6 +528,9 @@ int cmd_add(int argc, const char **argv, const char *prefix)
 	add_new_files = !take_worktree_changes && !refresh_only && !add_renormalize;
 	require_pathspec = !(take_worktree_changes || (0 < addremove_explicit));
 
+	prepare_repo_settings(the_repository);
+	the_repository->settings.command_requires_full_index = 0;
+
 	hold_locked_index(&lock_file, LOCK_DIE_ON_ERROR);
 
 	/*
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 47f8e5e54e3..19d38f18ed6 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -340,21 +340,27 @@ test_expect_success 'status/add: outside sparse cone' '
 
 	test_sparse_match git status --porcelain=v2 &&
 
-	# This "git add folder1/a" fails with a warning
-	# in the sparse repos, differing from the full
-	# repo. This is intentional.
+	# Adding the path outside of the sparse-checkout cone should fail.
 	test_sparse_match test_must_fail git add folder1/a &&
-	test_sparse_match test_must_fail git add --refresh folder1/a &&
-	test_all_match git status --porcelain=v2 &&
+
+	test_must_fail git -C sparse-checkout add --refresh folder1/a 2>sparse-checkout-err &&
+	test_must_fail git -C sparse-index add --refresh folder1/a 2>sparse-index-err &&
+	# NEEDSWORK: A sparse index changes the error message.
+	! test_cmp sparse-checkout-err sparse-index-err &&
+
+	# NEEDSWORK: Adding a newly-tracked file outside the cone succeeds
+	test_sparse_match git add folder1/new &&
 
 	test_all_match git add . &&
 	test_all_match git status --porcelain=v2 &&
 	test_all_match git commit -m folder1/new &&
+	test_all_match git rev-parse HEAD^{tree} &&
 
 	run_on_all ../edit-contents folder1/newer &&
 	test_all_match git add folder1/ &&
 	test_all_match git status --porcelain=v2 &&
-	test_all_match git commit -m folder1/newer
+	test_all_match git commit -m folder1/newer &&
+	test_all_match git rev-parse HEAD^{tree}
 '
 
 test_expect_success 'checkout and reset --hard' '
@@ -636,7 +642,12 @@ test_expect_success 'sparse-index is not expanded' '
 	git -C sparse-index reset --hard &&
 	ensure_not_expanded checkout rename-out-to-out -- deep/deeper1 &&
 	git -C sparse-index reset --hard &&
-	ensure_not_expanded restore -s rename-out-to-out -- deep/deeper1
+	ensure_not_expanded restore -s rename-out-to-out -- deep/deeper1 &&
+
+	echo >>sparse-index/README.md &&
+	ensure_not_expanded add -A &&
+	echo >>sparse-index/extra.txt &&
+	ensure_not_expanded add extra.txt
 '
 
 # NEEDSWORK: a sparse-checkout behaves differently from a full checkout
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v2 3/5] pathspec: stop calling ensure_full_index
  2021-07-26 15:18 ` [PATCH v2 " Derrick Stolee via GitGitGadget
  2021-07-26 15:18   ` [PATCH v2 1/5] t1092: test merge conflicts outside cone Derrick Stolee via GitGitGadget
  2021-07-26 15:18   ` [PATCH v2 2/5] add: allow operating on a sparse-only index Derrick Stolee via GitGitGadget
@ 2021-07-26 15:18   ` Derrick Stolee via GitGitGadget
  2021-07-26 15:18   ` [PATCH v2 4/5] add: ignore outside the sparse-checkout in refresh() Derrick Stolee via GitGitGadget
                     ` (3 subsequent siblings)
  6 siblings, 0 replies; 36+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-07-26 15:18 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, matheus.bernardino, stolee, Eric Sunshine,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The add_pathspec_matches_against_index() focuses on matching a pathspec
to file entries in the index. This already works correctly for its only
use: checking if untracked files exist in the index.

The compatibility checks in t1092 already test that 'git add <dir>'
works for a directory outside of the sparse cone. That provides coverage
for removing this guard.

This finalizes our ability to run 'git add .' without expanding a sparse
index to a full one. This is evidenced by an update to t1092 and by
these performance numbers for p2000-sparse-operations.sh:

Test                                    HEAD~1            HEAD
--------------------------------------------------------------------------------
2000.10: git add . (full-index-v3)      0.37(0.28+0.07)   0.36(0.27+0.06) -2.7%
2000.11: git add . (full-index-v4)      0.33(0.26+0.06)   0.32(0.28+0.05) -3.0%
2000.12: git add . (sparse-index-v3)    0.57(0.53+0.07)   0.06(0.06+0.07) -89.5%
2000.13: git add . (sparse-index-v4)    0.57(0.53+0.07)   0.05(0.03+0.09) -91.2%

While the ~90% improvement is shown by the test results, it is worth
noting that expanding the sparse index was adding overhead in previous
commits. Comparing to the full index case, we see the performance go
from 0.33s to 0.05s, an 85% improvement.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 pathspec.c                               | 2 --
 t/t1092-sparse-checkout-compatibility.sh | 7 +++----
 2 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/pathspec.c b/pathspec.c
index 08f8d3eedc3..44306fdaca2 100644
--- a/pathspec.c
+++ b/pathspec.c
@@ -37,8 +37,6 @@ void add_pathspec_matches_against_index(const struct pathspec *pathspec,
 			num_unmatched++;
 	if (!num_unmatched)
 		return;
-	/* TODO: audit for interaction with sparse-index. */
-	ensure_full_index(istate);
 	for (i = 0; i < istate->cache_nr; i++) {
 		const struct cache_entry *ce = istate->cache[i];
 		if (sw_action == PS_IGNORE_SKIP_WORKTREE && ce_skip_worktree(ce))
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 19d38f18ed6..50dce0e0f99 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -322,9 +322,6 @@ test_expect_success 'commit including unstaged changes' '
 test_expect_success 'status/add: outside sparse cone' '
 	init_repos &&
 
-	# adding a "missing" file outside the cone should fail
-	test_sparse_match test_must_fail git add folder1/a &&
-
 	# folder1 is at HEAD, but outside the sparse cone
 	run_on_sparse mkdir folder1 &&
 	cp initial-repo/folder1/a sparse-checkout/folder1/a &&
@@ -647,7 +644,9 @@ test_expect_success 'sparse-index is not expanded' '
 	echo >>sparse-index/README.md &&
 	ensure_not_expanded add -A &&
 	echo >>sparse-index/extra.txt &&
-	ensure_not_expanded add extra.txt
+	ensure_not_expanded add extra.txt &&
+	echo >>sparse-index/untracked.txt &&
+	ensure_not_expanded add .
 '
 
 # NEEDSWORK: a sparse-checkout behaves differently from a full checkout
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v2 4/5] add: ignore outside the sparse-checkout in refresh()
  2021-07-26 15:18 ` [PATCH v2 " Derrick Stolee via GitGitGadget
                     ` (2 preceding siblings ...)
  2021-07-26 15:18   ` [PATCH v2 3/5] pathspec: stop calling ensure_full_index Derrick Stolee via GitGitGadget
@ 2021-07-26 15:18   ` Derrick Stolee via GitGitGadget
  2021-07-26 15:18   ` [PATCH v2 5/5] add: remove ensure_full_index() with --renormalize Derrick Stolee via GitGitGadget
                     ` (2 subsequent siblings)
  6 siblings, 0 replies; 36+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-07-26 15:18 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, matheus.bernardino, stolee, Eric Sunshine,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Since b243012 (refresh_index(): add flag to ignore SKIP_WORKTREE
entries, 2021-04-08), 'git add --refresh <path>' will output a warning
message when the path is outside the sparse-checkout definition. The
implementation of this warning happened in parallel with the
sparse-index work to add ensure_full_index() calls throughout the
codebase.

Update this loop to have the proper logic that checks to see if the
pathspec is outside the sparse-checkout definition. This avoids the need
to expand the sparse directory entry and determine if the path is
tracked, untracked, or ignored. We simply avoid updating the stat()
information because there isn't even an entry that matches the path!

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/add.c                            | 10 +++++++++-
 t/t1092-sparse-checkout-compatibility.sh |  6 +-----
 2 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/builtin/add.c b/builtin/add.c
index c76e6ddd359..d512ece655b 100644
--- a/builtin/add.c
+++ b/builtin/add.c
@@ -192,13 +192,21 @@ static int refresh(int verbose, const struct pathspec *pathspec)
 	struct string_list only_match_skip_worktree = STRING_LIST_INIT_NODUP;
 	int flags = REFRESH_IGNORE_SKIP_WORKTREE |
 		    (verbose ? REFRESH_IN_PORCELAIN : REFRESH_QUIET);
+	struct pattern_list pl = { 0 };
+	int sparse_checkout_enabled = !get_sparse_checkout_patterns(&pl);
 
 	seen = xcalloc(pathspec->nr, 1);
 	refresh_index(&the_index, flags, pathspec, seen,
 		      _("Unstaged changes after refreshing the index:"));
 	for (i = 0; i < pathspec->nr; i++) {
 		if (!seen[i]) {
-			if (matches_skip_worktree(pathspec, i, &skip_worktree_seen)) {
+			const char *path = pathspec->items[i].original;
+			int dtype = DT_REG;
+
+			if (matches_skip_worktree(pathspec, i, &skip_worktree_seen) ||
+			    (sparse_checkout_enabled &&
+			     !path_matches_pattern_list(path, strlen(path), NULL,
+							&dtype, &pl, &the_index))) {
 				string_list_append(&only_match_skip_worktree,
 						   pathspec->items[i].original);
 			} else {
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 50dce0e0f99..3e201546577 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -339,11 +339,7 @@ test_expect_success 'status/add: outside sparse cone' '
 
 	# Adding the path outside of the sparse-checkout cone should fail.
 	test_sparse_match test_must_fail git add folder1/a &&
-
-	test_must_fail git -C sparse-checkout add --refresh folder1/a 2>sparse-checkout-err &&
-	test_must_fail git -C sparse-index add --refresh folder1/a 2>sparse-index-err &&
-	# NEEDSWORK: A sparse index changes the error message.
-	! test_cmp sparse-checkout-err sparse-index-err &&
+	test_sparse_match test_must_fail git add --refresh folder1/a &&
 
 	# NEEDSWORK: Adding a newly-tracked file outside the cone succeeds
 	test_sparse_match git add folder1/new &&
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v2 5/5] add: remove ensure_full_index() with --renormalize
  2021-07-26 15:18 ` [PATCH v2 " Derrick Stolee via GitGitGadget
                     ` (3 preceding siblings ...)
  2021-07-26 15:18   ` [PATCH v2 4/5] add: ignore outside the sparse-checkout in refresh() Derrick Stolee via GitGitGadget
@ 2021-07-26 15:18   ` Derrick Stolee via GitGitGadget
  2021-07-28 23:13   ` [PATCH v2 0/5] Sparse Index: Integrate with 'git add' Elijah Newren
  2021-07-29 14:52   ` [PATCH v3 " Derrick Stolee via GitGitGadget
  6 siblings, 0 replies; 36+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-07-26 15:18 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, matheus.bernardino, stolee, Eric Sunshine,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The --renormalize option updates the EOL conversions for the tracked
files. However, the loop already ignores files marked with the
SKIP_WORKTREE bit, so it will continue to do so with a sparse index
because the sparse directory entries also have this bit set.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/add.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/builtin/add.c b/builtin/add.c
index d512ece655b..c49e179abc3 100644
--- a/builtin/add.c
+++ b/builtin/add.c
@@ -144,8 +144,6 @@ static int renormalize_tracked_files(const struct pathspec *pathspec, int flags)
 {
 	int i, retval = 0;
 
-	/* TODO: audit for interaction with sparse-index. */
-	ensure_full_index(&the_index);
 	for (i = 0; i < active_nr; i++) {
 		struct cache_entry *ce = active_cache[i];
 
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 0/5] Sparse Index: Integrate with 'git add'
  2021-07-26 15:18 ` [PATCH v2 " Derrick Stolee via GitGitGadget
                     ` (4 preceding siblings ...)
  2021-07-26 15:18   ` [PATCH v2 5/5] add: remove ensure_full_index() with --renormalize Derrick Stolee via GitGitGadget
@ 2021-07-28 23:13   ` Elijah Newren
  2021-07-29  2:03     ` Derrick Stolee
  2021-07-29 14:52   ` [PATCH v3 " Derrick Stolee via GitGitGadget
  6 siblings, 1 reply; 36+ messages in thread
From: Elijah Newren @ 2021-07-28 23:13 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Matheus Tavares Bernardino,
	Derrick Stolee, Eric Sunshine, Derrick Stolee

On Mon, Jul 26, 2021 at 9:18 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> This patch series re-submits the 'git add' integration with sparse-index.
> The performance gains are the same as before.
>
> It is based on ds/commit-and-checkout-with-sparse-index.
>
> This series was delayed from its initial submission for a couple reasons.
>
> The first was because it was colliding with some changes in
> mt/add-rm-in-sparse-checkout, so now we are far enough along that that
> branch is in our history and we can work forwards.
>
> The other concern was about how 'git add ' should respond when a path
> outside of the sparse-checkout cone exists. One recommendation (that I am
> failing to find a link to the message, sorry) was to disallow adding files
> that would become index entries with SKIP_WORKTREE on. However, as I worked
> towards that goal I found that change would cause problems for a realistic
> scenario: merge conflicts outside of the sparse-checkout cone.
>
> Update: Elijah points out that the SKIP_WORKTREE bit is removed from
> conflict files, which allows adding the conflicted files without warning.
> (However, we also need to be careful about untracked files, as documented in
> the test added here.)
>
> The first patch of this series adds tests that create merge conflicts
> outside of the sparse cone and then presents different ways a user could
> resolve the situation. We want all of them to be feasible, and this
> includes:
>
>  1. Reverting the file to a known version in history.
>  2. Adding the file with its contents on disk.
>  3. Moving the file to a new location in the sparse directory.
>
> The one place I did continue to update is 'git add --refresh ' to match the
> behavior added by mt/add-rm-in-sparse-checkout which outputs an error
> message. This happens even when the file exists in the working directory,
> but that seems appropriate enough.
>
>
> Updates in V2
> =============
>
>  * Test comments in patch 1 are improved.
>
>  * The test hunk that was removed in patch 2 and reintroduced in the old
>    patch 4 is modified to clarify how the behavior changes with that patch.
>    Then, the test is modified by future patches.
>
>  * Another instance of ensure_full_index() is removed from the --renormalize
>    option. This option already ignored files with the SKIP_WORKTREE bit, so
>    this should be an obviously-correct removal.
>
>  * a full proposal for what to do with "git (add|mv|rm)" and paths outside
>    the cone is delayed to another series (with an RFC round) because the
>    behavior of the sparse-index matches a full index with sparse-checkout.

I think this makes sense.

I've read through the patches, and I like this version...with one
exception.  Can we mark the test added in patch 1 under

     # 3. Rename the file to another sparse filename and
     #    accept conflict markers as resolved content.

as NEEDSWORK or even MAYNEEDWORK?  I'm still quite unconvinced that it
is testing for correct behavior, and don't want to paint ourselves
into a corner.  In particular, we don't allow folks to "git add
$IGNORED_FILE" without a --force override because it's likely to be a
mistake.  I think the same logic holds for adding untracked files
outside the sparsity cone.  But it's actually even worse than that
case because there's a secondary level of surprise too: adding files
outside the sparsity cone will result in delayed user surprises when
the next git command that happens to call unpack_trees() (which are
found all over the codebase) removes the file from the working tree.
I've had some such reports already.

If that test is marked as NEEDSWORK or even as the correct behavior
still being under dispute, then you can happily apply my:

Reviewed-by: Elijah Newren <newren@gmail.com>

> Thanks, -Stolee
>
> Derrick Stolee (5):
>   t1092: test merge conflicts outside cone
>   add: allow operating on a sparse-only index
>   pathspec: stop calling ensure_full_index
>   add: ignore outside the sparse-checkout in refresh()
>   add: remove ensure_full_index() with --renormalize
>
>  builtin/add.c                            | 15 ++++--
>  pathspec.c                               |  2 -
>  t/t1092-sparse-checkout-compatibility.sh | 62 ++++++++++++++++++++----
>  3 files changed, 65 insertions(+), 14 deletions(-)
>
>
> base-commit: 71e301501c88399711a1bf8515d1747e92cfbb9b
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-999%2Fderrickstolee%2Fsparse-index%2Fadd-v2
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-999/derrickstolee/sparse-index/add-v2
> Pull-Request: https://github.com/gitgitgadget/git/pull/999
>
> Range-diff vs v1:
>
>  1:  a763a7d15b8 ! 1:  8f2fd9370fe t1092: test merge conflicts outside cone
>      @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'merge' '
>       + test_all_match test_must_fail git merge -m merge merge-right &&
>       + test_all_match git status --porcelain=v2 &&
>       +
>      -+ # resolve the conflict in different ways:
>      -+ # 1. revert to the base
>      ++ # Resolve the conflict in different ways:
>      ++ # 1. Revert to the base
>       + test_all_match git checkout base -- deep/deeper2/a &&
>       + test_all_match git status --porcelain=v2 &&
>       +
>      -+ # 2. add the file with conflict markers
>      ++ # 2. Add the file with conflict markers
>       + test_all_match git add folder1/a &&
>       + test_all_match git status --porcelain=v2 &&
>       +
>      -+ # 3. rename the file to another sparse filename
>      ++ # 3. Rename the file to another sparse filename and
>      ++ #    accept conflict markers as resolved content.
>       + run_on_all mv folder2/a folder2/z &&
>       + test_all_match git add folder2 &&
>       + test_all_match git status --porcelain=v2 &&
>  2:  791c6c2c9ad ! 2:  6e43f118fa0 add: allow operating on a sparse-only index
>      @@ Commit message
>           sparse-index. Comparing to the full index case, 'git add -A' goes from
>           0.37s to 0.05s, which is "only" an 86% improvement.
>
>      +    This modification to 'git add' creates some behavior change depending on
>      +    the use of a sparse index. We modify a test in t1092 to demonstrate
>      +    these changes which will be remedied in future changes.
>      +
>           Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
>
>        ## builtin/add.c ##
>      @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'status/add: outsi
>       - # This "git add folder1/a" fails with a warning
>       - # in the sparse repos, differing from the full
>       - # repo. This is intentional.
>      -- test_sparse_match test_must_fail git add folder1/a &&
>      ++ # Adding the path outside of the sparse-checkout cone should fail.
>      +  test_sparse_match test_must_fail git add folder1/a &&
>       - test_sparse_match test_must_fail git add --refresh folder1/a &&
>       - test_all_match git status --porcelain=v2 &&
>      --
>      ++
>      ++ test_must_fail git -C sparse-checkout add --refresh folder1/a 2>sparse-checkout-err &&
>      ++ test_must_fail git -C sparse-index add --refresh folder1/a 2>sparse-index-err &&
>      ++ # NEEDSWORK: A sparse index changes the error message.
>      ++ ! test_cmp sparse-checkout-err sparse-index-err &&
>      ++
>      ++ # NEEDSWORK: Adding a newly-tracked file outside the cone succeeds
>      ++ test_sparse_match git add folder1/new &&
>      +
>         test_all_match git add . &&
>         test_all_match git status --porcelain=v2 &&
>         test_all_match git commit -m folder1/new &&
>      ++ test_all_match git rev-parse HEAD^{tree} &&
>      +
>      +  run_on_all ../edit-contents folder1/newer &&
>      +  test_all_match git add folder1/ &&
>      +  test_all_match git status --porcelain=v2 &&
>      +- test_all_match git commit -m folder1/newer
>      ++ test_all_match git commit -m folder1/newer &&
>      ++ test_all_match git rev-parse HEAD^{tree}
>      + '
>      +
>      + test_expect_success 'checkout and reset --hard' '
>       @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'sparse-index is not expanded' '
>         git -C sparse-index reset --hard &&
>         ensure_not_expanded checkout rename-out-to-out -- deep/deeper1 &&
>  3:  a577ea4c74d = 3:  2ae91e0af29 pathspec: stop calling ensure_full_index
>  4:  89ec6a7ce67 < -:  ----------- t1092: 'git add --refresh' difference with sparse-index
>  5:  76066a78ce0 ! 4:  a79728d4c64 add: ignore outside the sparse-checkout in refresh()
>      @@ builtin/add.c: static int refresh(int verbose, const struct pathspec *pathspec)
>
>        ## t/t1092-sparse-checkout-compatibility.sh ##
>       @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'status/add: outside sparse cone' '
>      -  test_all_match git commit -m folder1/newer
>      - '
>
>      --test_expect_failure 'add: pathspec within sparse directory' '
>      -+test_expect_success 'add: pathspec within sparse directory' '
>      -  init_repos &&
>      -
>      -  run_on_sparse mkdir folder1 &&
>      -@@ t/t1092-sparse-checkout-compatibility.sh: test_expect_failure 'add: pathspec within sparse directory' '
>      -  # This "git add folder1/a" fails with a warning
>      -  # in the sparse repos, differing from the full
>      -  # repo. This is intentional.
>      -- #
>      -- # However, in the sparse-index, folder1/a does not
>      -- # match any cache entry and fails with a different
>      -- # error message. This needs work.
>      +  # Adding the path outside of the sparse-checkout cone should fail.
>         test_sparse_match test_must_fail git add folder1/a &&
>      -  test_sparse_match test_must_fail git add --refresh folder1/a &&
>      -  test_all_match git status --porcelain=v2
>      +-
>      +- test_must_fail git -C sparse-checkout add --refresh folder1/a 2>sparse-checkout-err &&
>      +- test_must_fail git -C sparse-index add --refresh folder1/a 2>sparse-index-err &&
>      +- # NEEDSWORK: A sparse index changes the error message.
>      +- ! test_cmp sparse-checkout-err sparse-index-err &&
>      ++ test_sparse_match test_must_fail git add --refresh folder1/a &&
>      +
>      +  # NEEDSWORK: Adding a newly-tracked file outside the cone succeeds
>      +  test_sparse_match git add folder1/new &&
>  -:  ----------- > 5:  1543550a4e8 add: remove ensure_full_index() with --renormalize
>
> --
> gitgitgadget

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 0/5] Sparse Index: Integrate with 'git add'
  2021-07-28 23:13   ` [PATCH v2 0/5] Sparse Index: Integrate with 'git add' Elijah Newren
@ 2021-07-29  2:03     ` Derrick Stolee
  2021-07-29  2:57       ` Elijah Newren
  0 siblings, 1 reply; 36+ messages in thread
From: Derrick Stolee @ 2021-07-29  2:03 UTC (permalink / raw)
  To: Elijah Newren, Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Matheus Tavares Bernardino,
	Eric Sunshine, Derrick Stolee

On 7/28/2021 7:13 PM, Elijah Newren wrote:
> On Mon, Jul 26, 2021 at 9:18 AM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
...
>>  * a full proposal for what to do with "git (add|mv|rm)" and paths outside
>>    the cone is delayed to another series (with an RFC round) because the
>>    behavior of the sparse-index matches a full index with sparse-checkout.
> 
> I think this makes sense.
> 
> I've read through the patches, and I like this version...with one
> exception.  Can we mark the test added in patch 1 under
> 
>      # 3. Rename the file to another sparse filename and
>      #    accept conflict markers as resolved content.
> 
> as NEEDSWORK or even MAYNEEDWORK?

I have no objection to adding a blurb such as:

	# NEEDSWORK: allowing adds outside the sparse cone can be
	# confusingto users, as the file can disappear from the
	# worktree without warning in later Git commands.

And perhaps I'm misunderstanding the situation a bit, but that
seems to apply not just to this third case, but all of them. I
don't see why the untracked case is special compared to the
tracked case. More investigation may be required on my part.

>  I'm still quite unconvinced that it
> is testing for correct behavior, and don't want to paint ourselves
> into a corner.  In particular, we don't allow folks to "git add
> $IGNORED_FILE" without a --force override because it's likely to be a
> mistake. 

I agree about ignored files, and that is true whether or not they
are in the sparse cone.

> I think the same logic holds for adding untracked files
> outside the sparsity cone.  But it's actually even worse than that
> case because there's a secondary level of surprise too: adding files
> outside the sparsity cone will result in delayed user surprises when
> the next git command that happens to call unpack_trees() (which are
> found all over the codebase) removes the file from the working tree.
> I've had some such reports already.

I believe this is testing a realistic scenario that users are
hitting in the wild today. I would believe that users succeed with
these commands more often than they are confused by the file
disappearing from the worktree in a later Git command, so having
this sequence of events be documented as a potential use case has
some value.

I simultaneously don't think it is behavior we want to commit to
as a contract for all future Git versions, but there is value in
showing how this situation changes with any future meddling. In
particular: will users be able to self-discover the "new" way of
doing things?

The proposal part of changing how add/mv/rm behave in these cases
would need to adjust this test with something that would also help
direct users to a helpful resolution. For example, the first run
of

	git add sparse/dir/file

could error out with an error message saying "The pathspec is
outside of your sparse cone, so staging the file might lead to
a staged change that is removed from your working directory."
But we should _also_ include two strategies for getting out of
this state:

1. Adjust your sparse-checkout definition so this file is in scope.

-or- (and this is the part that would be new)

2. If you understand the risks of staging a file outside the sparse
   cone, then run 'git add --sparse sparse/dir/file'.

(Insert whatever option would be appropriate for --sparse here.)

Such a warning message would allow users who follow the steps listed
in the test to know how to adjust their usage to then get into a
good state.

> If that test is marked as NEEDSWORK or even as the correct behavior
> still being under dispute, then you can happily apply my:

I would classify this as "The test documents current behavior, but
isn't a contract for future behavior." With a concept such as my
suggestion above, the test could be modified to check for the
warning and then run the second command with the extra option and
complete the test's expectations. Having the existing behavior
documented in a test helps demonstrate how behavior is changing.

We we've discussed, we want to give such a behavior change the
right venue for feedback and suggestions for alternate approaches,
and this series is not the right place for that. Hopefully you
can tell that it is on my mind and that I want to recommend a
change in the near future.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 0/5] Sparse Index: Integrate with 'git add'
  2021-07-29  2:03     ` Derrick Stolee
@ 2021-07-29  2:57       ` Elijah Newren
  2021-07-29 14:49         ` Derrick Stolee
  0 siblings, 1 reply; 36+ messages in thread
From: Elijah Newren @ 2021-07-29  2:57 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Derrick Stolee via GitGitGadget, Git Mailing List,
	Junio C Hamano, Matheus Tavares Bernardino, Eric Sunshine,
	Derrick Stolee

On Wed, Jul 28, 2021 at 8:03 PM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 7/28/2021 7:13 PM, Elijah Newren wrote:
> > On Mon, Jul 26, 2021 at 9:18 AM Derrick Stolee via GitGitGadget
> > <gitgitgadget@gmail.com> wrote:
> ...
> >>  * a full proposal for what to do with "git (add|mv|rm)" and paths outside
> >>    the cone is delayed to another series (with an RFC round) because the
> >>    behavior of the sparse-index matches a full index with sparse-checkout.
> >
> > I think this makes sense.
> >
> > I've read through the patches, and I like this version...with one
> > exception.  Can we mark the test added in patch 1 under
> >
> >      # 3. Rename the file to another sparse filename and
> >      #    accept conflict markers as resolved content.
> >
> > as NEEDSWORK or even MAYNEEDWORK?
>
> I have no objection to adding a blurb such as:
>
>         # NEEDSWORK: allowing adds outside the sparse cone can be
>         # confusingto users, as the file can disappear from the
>         # worktree without warning in later Git commands.
>

Sounds great to me other than the simple typo (s/confusingto/confusing to/)

> And perhaps I'm misunderstanding the situation a bit, but that
> seems to apply not just to this third case, but all of them. I
> don't see why the untracked case is special compared to the
> tracked case. More investigation may be required on my part.

The possible cases for files outside the sparsity patterns are:
  a) untracked
  b) tracked and SKIP_WORKTREE
  c) tracked and !SKIP_WORKTREE (e.g. because merge conflicts)

From the above set, we've been talking about untracked and I think
we're on the same page about those.  Case (b) was already corrected by
Matheus a number of releases back; git-add will throw an error
explaining the situation and prevent the adding.  The error tells the
user to expand their sparsity set to work on those files.  For case
(c), you are right that those are problematic in the same way (they
can disappear later after a git-add)...but we're also in the situation
where the only way to get rid of the conflicting stages is to run git
add.  So, in my mind, case (c) puts us between a rock and a hard
place, and we probably need to allow the git-add.

> >  I'm still quite unconvinced that it
> > is testing for correct behavior, and don't want to paint ourselves
> > into a corner.  In particular, we don't allow folks to "git add
> > $IGNORED_FILE" without a --force override because it's likely to be a
> > mistake.
>
> I agree about ignored files, and that is true whether or not they
> are in the sparse cone.

Yes, and...

> > I think the same logic holds for adding untracked files
> > outside the sparsity cone.

In my opinion, "outside the sparsity cone" is another form of "being
ignored", and in my mind should be treated similarly -- it should
generally require an override to add such files.  (Case (c) possibly
being an exception, though maybe even it shouldn't be.)

> >  But it's actually even worse than that
> > case because there's a secondary level of surprise too: adding files
> > outside the sparsity cone will result in delayed user surprises when
> > the next git command that happens to call unpack_trees() (which are
> > found all over the codebase) removes the file from the working tree.
> > I've had some such reports already.
>
> I believe this is testing a realistic scenario that users are
> hitting in the wild today. I would believe that users succeed with
> these commands more often than they are confused by the file
> disappearing from the worktree in a later Git command, so having
> this sequence of events be documented as a potential use case has
> some value.
>
> I simultaneously don't think it is behavior we want to commit to
> as a contract for all future Git versions, but there is value in
> showing how this situation changes with any future meddling. In
> particular: will users be able to self-discover the "new" way of
> doing things?

Oh, I totally agree that documenting how things work definitely has
value.  I've added several test_expect_failure cases and whatnot to
the testsuite.  But there's a big difference between documenting how
things work and documenting how we expect them to work.  If the two
differ, then any good provided by documenting how things work with a
test marked as test_expect_success may be counterbalanced or even
overwhelmed by the harm it also causes, particularly in areas where
working around backward compatibility constraints are more difficult.

For example, not that long ago, it seemed people agreed (even Junio)
that commit hooks were never intended to be part of rebase (they
aren't part of the apply backend, and were only part of the
merge/interactive backend due to historical accident) and could be
removed (being replaced by just a rebase hook called at the end of the
rebase instead of with every commit).  There were user complaints
about the commit hooks being triggered when the default backend
switched, backing up the expectation.  But no one jumped in to fix it
at the time.  Then when it was brought up again recently, Junio said
we couldn't just remove those because of backward compatibility.
That's forcing me to consider suggesting a bunch of new arguments to
rebase to let users get unbroken when they discover they need it, or
maybe even a new toplevel command because we painted ourselves into a
corner (there are more backward compatibility corners in rebase
too...).

Trying to get out of a corner we paint ourselves into with
sparse-checkout would be massively harder, which is why I keep harping
on this kind of thing.  I'm very concerned it's happening even despite
my numerous comments and worries about it.

> The proposal part of changing how add/mv/rm behave in these cases
> would need to adjust this test with something that would also help
> direct users to a helpful resolution. For example, the first run
> of
>
>         git add sparse/dir/file
>
> could error out with an error message saying "The pathspec is
> outside of your sparse cone, so staging the file might lead to
> a staged change that is removed from your working directory."

Yes, much like we currently do with tracked files which are SKIP_WORKTREE.

> But we should _also_ include two strategies for getting out of
> this state:
>
> 1. Adjust your sparse-checkout definition so this file is in scope.
>
> -or- (and this is the part that would be new)
>
> 2. If you understand the risks of staging a file outside the sparse
>    cone, then run 'git add --sparse sparse/dir/file'.
>
> (Insert whatever option would be appropriate for --sparse here.)
>
> Such a warning message would allow users who follow the steps listed
> in the test to know how to adjust their usage to then get into a
> good state.

Choice 2 doesn't exist yet, but yeah your suggestion makes sense.

> > If that test is marked as NEEDSWORK or even as the correct behavior
> > still being under dispute, then you can happily apply my:
>
> I would classify this as "The test documents current behavior, but
> isn't a contract for future behavior." With a concept such as my
> suggestion above, the test could be modified to check for the
> warning and then run the second command with the extra option and
> complete the test's expectations. Having the existing behavior
> documented in a test helps demonstrate how behavior is changing.
>
> We we've discussed, we want to give such a behavior change the
> right venue for feedback and suggestions for alternate approaches,
> and this series is not the right place for that. Hopefully you
> can tell that it is on my mind and that I want to recommend a
> change in the near future.

I'm totally fine with such changes not being part of this series.  I
just don't want a test_expect_success that checks for behavior that I
consider buggy unless it comes with a disclaimer that it's checking
for existing rather than expected behavior.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 0/5] Sparse Index: Integrate with 'git add'
  2021-07-29  2:57       ` Elijah Newren
@ 2021-07-29 14:49         ` Derrick Stolee
  0 siblings, 0 replies; 36+ messages in thread
From: Derrick Stolee @ 2021-07-29 14:49 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Derrick Stolee via GitGitGadget, Git Mailing List,
	Junio C Hamano, Matheus Tavares Bernardino, Eric Sunshine,
	Derrick Stolee

On 7/28/2021 10:57 PM, Elijah Newren wrote:
> On Wed, Jul 28, 2021 at 8:03 PM Derrick Stolee <stolee@gmail.com> wrote:
>>
>> On 7/28/2021 7:13 PM, Elijah Newren wrote:
>>> On Mon, Jul 26, 2021 at 9:18 AM Derrick Stolee via GitGitGadget
>>> <gitgitgadget@gmail.com> wrote:
>> ...
>>>>  * a full proposal for what to do with "git (add|mv|rm)" and paths outside
>>>>    the cone is delayed to another series (with an RFC round) because the
>>>>    behavior of the sparse-index matches a full index with sparse-checkout.
>>>
>>> I think this makes sense.
>>>
>>> I've read through the patches, and I like this version...with one
>>> exception.  Can we mark the test added in patch 1 under
>>>
>>>      # 3. Rename the file to another sparse filename and
>>>      #    accept conflict markers as resolved content.
>>>
>>> as NEEDSWORK or even MAYNEEDWORK?
>>
>> I have no objection to adding a blurb such as:
>>
>>         # NEEDSWORK: allowing adds outside the sparse cone can be
>>         # confusingto users, as the file can disappear from the
>>         # worktree without warning in later Git commands.
>>
> 
> Sounds great to me other than the simple typo (s/confusingto/confusing to/)
> 
>> And perhaps I'm misunderstanding the situation a bit, but that
>> seems to apply not just to this third case, but all of them. I
>> don't see why the untracked case is special compared to the
>> tracked case. More investigation may be required on my part.
> 
> The possible cases for files outside the sparsity patterns are:
>   a) untracked
>   b) tracked and SKIP_WORKTREE
>   c) tracked and !SKIP_WORKTREE (e.g. because merge conflicts)
> 
> From the above set, we've been talking about untracked and I think
> we're on the same page about those.  Case (b) was already corrected by
> Matheus a number of releases back; git-add will throw an error
> explaining the situation and prevent the adding.  The error tells the
> user to expand their sparsity set to work on those files.  For case
> (c), you are right that those are problematic in the same way (they
> can disappear later after a git-add)...but we're also in the situation
> where the only way to get rid of the conflicting stages is to run git
> add.  So, in my mind, case (c) puts us between a rock and a hard
> place, and we probably need to allow the git-add.

I appreciate this additional context. Thanks.
 
>>>  I'm still quite unconvinced that it
>>> is testing for correct behavior, and don't want to paint ourselves
>>> into a corner.  In particular, we don't allow folks to "git add
>>> $IGNORED_FILE" without a --force override because it's likely to be a
>>> mistake.
>>
>> I agree about ignored files, and that is true whether or not they
>> are in the sparse cone.
> 
> Yes, and...
> 
>>> I think the same logic holds for adding untracked files
>>> outside the sparsity cone.
> 
> In my opinion, "outside the sparsity cone" is another form of "being
> ignored", and in my mind should be treated similarly -- it should
> generally require an override to add such files.  (Case (c) possibly
> being an exception, though maybe even it shouldn't be.)

I don't hold that same interpretation. I think of it instead as
"hidden" files, but they still matter. I also think that advising
one to adjust their sparsity patterns might be dangerous because
not all users know the ramifications of doing that. They might
accidentally download an enormous amount of data to correct a
single file.

Having an override seems like the best option, and we can hopefully
make it consistent across all the cases and commands.

...

> Trying to get out of a corner we paint ourselves into with
> sparse-checkout would be massively harder, which is why I keep harping
> on this kind of thing.  I'm very concerned it's happening even despite
> my numerous comments and worries about it.
...
> I'm totally fine with such changes not being part of this series.  I
> just don't want a test_expect_success that checks for behavior that I
> consider buggy unless it comes with a disclaimer that it's checking
> for existing rather than expected behavior.

I understand your perspective. I'll send a v3 soon that adds a
comment on top of the entire test signalling the things we talked
about here: this is a documentation of behavior, not an endorsement,
and we should probably change it because users can get confused.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v3 0/5] Sparse Index: Integrate with 'git add'
  2021-07-26 15:18 ` [PATCH v2 " Derrick Stolee via GitGitGadget
                     ` (5 preceding siblings ...)
  2021-07-28 23:13   ` [PATCH v2 0/5] Sparse Index: Integrate with 'git add' Elijah Newren
@ 2021-07-29 14:52   ` Derrick Stolee via GitGitGadget
  2021-07-29 14:52     ` [PATCH v3 1/5] t1092: test merge conflicts outside cone Derrick Stolee via GitGitGadget
                       ` (5 more replies)
  6 siblings, 6 replies; 36+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-07-29 14:52 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, matheus.bernardino, stolee, Eric Sunshine,
	Derrick Stolee

This patch series re-submits the 'git add' integration with sparse-index.
The performance gains are the same as before.

It is based on ds/commit-and-checkout-with-sparse-index.

This series was delayed from its initial submission for a couple reasons.

The first was because it was colliding with some changes in
mt/add-rm-in-sparse-checkout, so now we are far enough along that that
branch is in our history and we can work forwards.

The other concern was about how 'git add ' should respond when a path
outside of the sparse-checkout cone exists. One recommendation (that I am
failing to find a link to the message, sorry) was to disallow adding files
that would become index entries with SKIP_WORKTREE on. However, as I worked
towards that goal I found that change would cause problems for a realistic
scenario: merge conflicts outside of the sparse-checkout cone.

Update: Elijah points out that the SKIP_WORKTREE bit is removed from
conflict files, which allows adding the conflicted files without warning.
(However, we also need to be careful about untracked files, as documented in
the test added here.)

The first patch of this series adds tests that create merge conflicts
outside of the sparse cone and then presents different ways a user could
resolve the situation. We want all of them to be feasible, and this
includes:

 1. Reverting the file to a known version in history.
 2. Adding the file with its contents on disk.
 3. Moving the file to a new location in the sparse directory.

The one place I did continue to update is 'git add --refresh ' to match the
behavior added by mt/add-rm-in-sparse-checkout which outputs an error
message. This happens even when the file exists in the working directory,
but that seems appropriate enough.


Updates in V3
=============

 * Added disclaimer to the merge-conflict test that this is documenting
   current behavior, not endorsing it.

 * Added Elijah's reviewed-by. Thanks for the review!

Thanks, -Stolee

Derrick Stolee (5):
  t1092: test merge conflicts outside cone
  add: allow operating on a sparse-only index
  pathspec: stop calling ensure_full_index
  add: ignore outside the sparse-checkout in refresh()
  add: remove ensure_full_index() with --renormalize

 builtin/add.c                            | 15 ++++--
 pathspec.c                               |  2 -
 t/t1092-sparse-checkout-compatibility.sh | 67 ++++++++++++++++++++----
 3 files changed, 70 insertions(+), 14 deletions(-)


base-commit: 71e301501c88399711a1bf8515d1747e92cfbb9b
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-999%2Fderrickstolee%2Fsparse-index%2Fadd-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-999/derrickstolee/sparse-index/add-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/999

Range-diff vs v2:

 1:  8f2fd9370fe ! 1:  5e96df4df58 t1092: test merge conflicts outside cone
     @@ Metadata
       ## Commit message ##
          t1092: test merge conflicts outside cone
      
     +    Conflicts can occur outside of the sparse-checkout definition, and in
     +    that case users might try to resolve the conflicts in several ways.
     +    Document a few of these ways in a test. Make it clear that this behavior
     +    is not necessarily the optimal flow, since users can become confused
     +    when Git deletes these files from the worktree in later commands.
     +
     +    Reviewed-by: Elijah Newren <newren@gmail.com>
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
       ## t/t1092-sparse-checkout-compatibility.sh ##
     @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'merge' '
       	test_all_match git rev-parse HEAD^{tree}
       '
       
     ++# NEEDSWORK: This test is documenting current behavior, but that
     ++# behavior can be confusing to users so there is desire to change it.
     ++# Right now, users might be using this flow to work through conflicts,
     ++# so any solution should present advice to users who try this sequence
     ++# of commands to follow whatever new method we create.
      +test_expect_success 'merge with conflict outside cone' '
      +	init_repos &&
      +
 2:  6e43f118fa0 ! 2:  defab1b86d3 add: allow operating on a sparse-only index
     @@ Commit message
          the use of a sparse index. We modify a test in t1092 to demonstrate
          these changes which will be remedied in future changes.
      
     +    Reviewed-by: Elijah Newren <newren@gmail.com>
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
       ## builtin/add.c ##
 3:  2ae91e0af29 ! 3:  9fc4313c889 pathspec: stop calling ensure_full_index
     @@ Commit message
          commits. Comparing to the full index case, we see the performance go
          from 0.33s to 0.05s, an 85% improvement.
      
     +    Reviewed-by: Elijah Newren <newren@gmail.com>
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
       ## pathspec.c ##
 4:  a79728d4c64 ! 4:  0ec03ab021d add: ignore outside the sparse-checkout in refresh()
     @@ Commit message
          tracked, untracked, or ignored. We simply avoid updating the stat()
          information because there isn't even an entry that matches the path!
      
     +    Reviewed-by: Elijah Newren <newren@gmail.com>
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
       ## builtin/add.c ##
 5:  1543550a4e8 ! 5:  adf5b15ac3d add: remove ensure_full_index() with --renormalize
     @@ Commit message
          SKIP_WORKTREE bit, so it will continue to do so with a sparse index
          because the sparse directory entries also have this bit set.
      
     +    Reviewed-by: Elijah Newren <newren@gmail.com>
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
       ## builtin/add.c ##

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v3 1/5] t1092: test merge conflicts outside cone
  2021-07-29 14:52   ` [PATCH v3 " Derrick Stolee via GitGitGadget
@ 2021-07-29 14:52     ` Derrick Stolee via GitGitGadget
  2021-07-29 14:52     ` [PATCH v3 2/5] add: allow operating on a sparse-only index Derrick Stolee via GitGitGadget
                       ` (4 subsequent siblings)
  5 siblings, 0 replies; 36+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-07-29 14:52 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, matheus.bernardino, stolee, Eric Sunshine,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Conflicts can occur outside of the sparse-checkout definition, and in
that case users might try to resolve the conflicts in several ways.
Document a few of these ways in a test. Make it clear that this behavior
is not necessarily the optimal flow, since users can become confused
when Git deletes these files from the worktree in later commands.

Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t1092-sparse-checkout-compatibility.sh | 43 ++++++++++++++++++++++++
 1 file changed, 43 insertions(+)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 91e30d6ec22..4c3bcb34999 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -114,6 +114,16 @@ test_expect_success 'setup' '
 		git add . &&
 		git commit -m "file to dir" &&
 
+		for side in left right
+		do
+			git checkout -b merge-$side base &&
+			echo $side >>deep/deeper2/a &&
+			echo $side >>folder1/a &&
+			echo $side >>folder2/a &&
+			git add . &&
+			git commit -m "$side" || return 1
+		done &&
+
 		git checkout -b deepest base &&
 		echo "updated deepest" >deep/deeper1/deepest/a &&
 		git commit -a -m "update deepest" &&
@@ -482,6 +492,39 @@ test_expect_success 'merge' '
 	test_all_match git rev-parse HEAD^{tree}
 '
 
+# NEEDSWORK: This test is documenting current behavior, but that
+# behavior can be confusing to users so there is desire to change it.
+# Right now, users might be using this flow to work through conflicts,
+# so any solution should present advice to users who try this sequence
+# of commands to follow whatever new method we create.
+test_expect_success 'merge with conflict outside cone' '
+	init_repos &&
+
+	test_all_match git checkout -b merge-tip merge-left &&
+	test_all_match git status --porcelain=v2 &&
+	test_all_match test_must_fail git merge -m merge merge-right &&
+	test_all_match git status --porcelain=v2 &&
+
+	# Resolve the conflict in different ways:
+	# 1. Revert to the base
+	test_all_match git checkout base -- deep/deeper2/a &&
+	test_all_match git status --porcelain=v2 &&
+
+	# 2. Add the file with conflict markers
+	test_all_match git add folder1/a &&
+	test_all_match git status --porcelain=v2 &&
+
+	# 3. Rename the file to another sparse filename and
+	#    accept conflict markers as resolved content.
+	run_on_all mv folder2/a folder2/z &&
+	test_all_match git add folder2 &&
+	test_all_match git status --porcelain=v2 &&
+
+	test_all_match git merge --continue &&
+	test_all_match git status --porcelain=v2 &&
+	test_all_match git rev-parse HEAD^{tree}
+'
+
 test_expect_success 'merge with outside renames' '
 	init_repos &&
 
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v3 2/5] add: allow operating on a sparse-only index
  2021-07-29 14:52   ` [PATCH v3 " Derrick Stolee via GitGitGadget
  2021-07-29 14:52     ` [PATCH v3 1/5] t1092: test merge conflicts outside cone Derrick Stolee via GitGitGadget
@ 2021-07-29 14:52     ` Derrick Stolee via GitGitGadget
  2021-07-29 14:52     ` [PATCH v3 3/5] pathspec: stop calling ensure_full_index Derrick Stolee via GitGitGadget
                       ` (3 subsequent siblings)
  5 siblings, 0 replies; 36+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-07-29 14:52 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, matheus.bernardino, stolee, Eric Sunshine,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Disable command_requires_full_index for 'git add'. This does not require
any additional removals of ensure_full_index(). The main reason is that
'git add' discovers changes based on the pathspec and the worktree
itself. These are then inserted into the index directly, and calls to
index_name_pos() or index_file_exists() already call expand_to_path() at
the appropriate time to support a sparse-index.

Add a test to check that 'git add -A' and 'git add <file>' does not
expand the index at all, as long as <file> is not within a sparse
directory. This does not help the global 'git add .' case.

We can measure the improvement using p2000-sparse-operations.sh with
these results:

Test                                  HEAD~1           HEAD
------------------------------------------------------------------------------
2000.6: git add -A (full-index-v3)    0.35(0.30+0.05)  0.37(0.29+0.06) +5.7%
2000.7: git add -A (full-index-v4)    0.31(0.26+0.06)  0.33(0.27+0.06) +6.5%
2000.8: git add -A (sparse-index-v3)  0.57(0.53+0.07)  0.05(0.04+0.08) -91.2%
2000.9: git add -A (sparse-index-v4)  0.58(0.55+0.06)  0.05(0.05+0.06) -91.4%

While the 91% improvement seems impressive, it's important to recognize
that previously we had significant overhead for expanding the
sparse-index. Comparing to the full index case, 'git add -A' goes from
0.37s to 0.05s, which is "only" an 86% improvement.

This modification to 'git add' creates some behavior change depending on
the use of a sparse index. We modify a test in t1092 to demonstrate
these changes which will be remedied in future changes.

Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/add.c                            |  3 +++
 t/t1092-sparse-checkout-compatibility.sh | 25 +++++++++++++++++-------
 2 files changed, 21 insertions(+), 7 deletions(-)

diff --git a/builtin/add.c b/builtin/add.c
index b773b5a4993..c76e6ddd359 100644
--- a/builtin/add.c
+++ b/builtin/add.c
@@ -528,6 +528,9 @@ int cmd_add(int argc, const char **argv, const char *prefix)
 	add_new_files = !take_worktree_changes && !refresh_only && !add_renormalize;
 	require_pathspec = !(take_worktree_changes || (0 < addremove_explicit));
 
+	prepare_repo_settings(the_repository);
+	the_repository->settings.command_requires_full_index = 0;
+
 	hold_locked_index(&lock_file, LOCK_DIE_ON_ERROR);
 
 	/*
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 4c3bcb34999..77343cb6d95 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -340,21 +340,27 @@ test_expect_success 'status/add: outside sparse cone' '
 
 	test_sparse_match git status --porcelain=v2 &&
 
-	# This "git add folder1/a" fails with a warning
-	# in the sparse repos, differing from the full
-	# repo. This is intentional.
+	# Adding the path outside of the sparse-checkout cone should fail.
 	test_sparse_match test_must_fail git add folder1/a &&
-	test_sparse_match test_must_fail git add --refresh folder1/a &&
-	test_all_match git status --porcelain=v2 &&
+
+	test_must_fail git -C sparse-checkout add --refresh folder1/a 2>sparse-checkout-err &&
+	test_must_fail git -C sparse-index add --refresh folder1/a 2>sparse-index-err &&
+	# NEEDSWORK: A sparse index changes the error message.
+	! test_cmp sparse-checkout-err sparse-index-err &&
+
+	# NEEDSWORK: Adding a newly-tracked file outside the cone succeeds
+	test_sparse_match git add folder1/new &&
 
 	test_all_match git add . &&
 	test_all_match git status --porcelain=v2 &&
 	test_all_match git commit -m folder1/new &&
+	test_all_match git rev-parse HEAD^{tree} &&
 
 	run_on_all ../edit-contents folder1/newer &&
 	test_all_match git add folder1/ &&
 	test_all_match git status --porcelain=v2 &&
-	test_all_match git commit -m folder1/newer
+	test_all_match git commit -m folder1/newer &&
+	test_all_match git rev-parse HEAD^{tree}
 '
 
 test_expect_success 'checkout and reset --hard' '
@@ -641,7 +647,12 @@ test_expect_success 'sparse-index is not expanded' '
 	git -C sparse-index reset --hard &&
 	ensure_not_expanded checkout rename-out-to-out -- deep/deeper1 &&
 	git -C sparse-index reset --hard &&
-	ensure_not_expanded restore -s rename-out-to-out -- deep/deeper1
+	ensure_not_expanded restore -s rename-out-to-out -- deep/deeper1 &&
+
+	echo >>sparse-index/README.md &&
+	ensure_not_expanded add -A &&
+	echo >>sparse-index/extra.txt &&
+	ensure_not_expanded add extra.txt
 '
 
 # NEEDSWORK: a sparse-checkout behaves differently from a full checkout
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v3 3/5] pathspec: stop calling ensure_full_index
  2021-07-29 14:52   ` [PATCH v3 " Derrick Stolee via GitGitGadget
  2021-07-29 14:52     ` [PATCH v3 1/5] t1092: test merge conflicts outside cone Derrick Stolee via GitGitGadget
  2021-07-29 14:52     ` [PATCH v3 2/5] add: allow operating on a sparse-only index Derrick Stolee via GitGitGadget
@ 2021-07-29 14:52     ` Derrick Stolee via GitGitGadget
  2021-07-29 14:52     ` [PATCH v3 4/5] add: ignore outside the sparse-checkout in refresh() Derrick Stolee via GitGitGadget
                       ` (2 subsequent siblings)
  5 siblings, 0 replies; 36+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-07-29 14:52 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, matheus.bernardino, stolee, Eric Sunshine,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The add_pathspec_matches_against_index() focuses on matching a pathspec
to file entries in the index. This already works correctly for its only
use: checking if untracked files exist in the index.

The compatibility checks in t1092 already test that 'git add <dir>'
works for a directory outside of the sparse cone. That provides coverage
for removing this guard.

This finalizes our ability to run 'git add .' without expanding a sparse
index to a full one. This is evidenced by an update to t1092 and by
these performance numbers for p2000-sparse-operations.sh:

Test                                    HEAD~1            HEAD
--------------------------------------------------------------------------------
2000.10: git add . (full-index-v3)      0.37(0.28+0.07)   0.36(0.27+0.06) -2.7%
2000.11: git add . (full-index-v4)      0.33(0.26+0.06)   0.32(0.28+0.05) -3.0%
2000.12: git add . (sparse-index-v3)    0.57(0.53+0.07)   0.06(0.06+0.07) -89.5%
2000.13: git add . (sparse-index-v4)    0.57(0.53+0.07)   0.05(0.03+0.09) -91.2%

While the ~90% improvement is shown by the test results, it is worth
noting that expanding the sparse index was adding overhead in previous
commits. Comparing to the full index case, we see the performance go
from 0.33s to 0.05s, an 85% improvement.

Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 pathspec.c                               | 2 --
 t/t1092-sparse-checkout-compatibility.sh | 7 +++----
 2 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/pathspec.c b/pathspec.c
index 08f8d3eedc3..44306fdaca2 100644
--- a/pathspec.c
+++ b/pathspec.c
@@ -37,8 +37,6 @@ void add_pathspec_matches_against_index(const struct pathspec *pathspec,
 			num_unmatched++;
 	if (!num_unmatched)
 		return;
-	/* TODO: audit for interaction with sparse-index. */
-	ensure_full_index(istate);
 	for (i = 0; i < istate->cache_nr; i++) {
 		const struct cache_entry *ce = istate->cache[i];
 		if (sw_action == PS_IGNORE_SKIP_WORKTREE && ce_skip_worktree(ce))
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 77343cb6d95..dee20db5bb1 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -322,9 +322,6 @@ test_expect_success 'commit including unstaged changes' '
 test_expect_success 'status/add: outside sparse cone' '
 	init_repos &&
 
-	# adding a "missing" file outside the cone should fail
-	test_sparse_match test_must_fail git add folder1/a &&
-
 	# folder1 is at HEAD, but outside the sparse cone
 	run_on_sparse mkdir folder1 &&
 	cp initial-repo/folder1/a sparse-checkout/folder1/a &&
@@ -652,7 +649,9 @@ test_expect_success 'sparse-index is not expanded' '
 	echo >>sparse-index/README.md &&
 	ensure_not_expanded add -A &&
 	echo >>sparse-index/extra.txt &&
-	ensure_not_expanded add extra.txt
+	ensure_not_expanded add extra.txt &&
+	echo >>sparse-index/untracked.txt &&
+	ensure_not_expanded add .
 '
 
 # NEEDSWORK: a sparse-checkout behaves differently from a full checkout
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v3 4/5] add: ignore outside the sparse-checkout in refresh()
  2021-07-29 14:52   ` [PATCH v3 " Derrick Stolee via GitGitGadget
                       ` (2 preceding siblings ...)
  2021-07-29 14:52     ` [PATCH v3 3/5] pathspec: stop calling ensure_full_index Derrick Stolee via GitGitGadget
@ 2021-07-29 14:52     ` Derrick Stolee via GitGitGadget
  2021-07-29 14:52     ` [PATCH v3 5/5] add: remove ensure_full_index() with --renormalize Derrick Stolee via GitGitGadget
  2021-07-29 14:58     ` [PATCH v3 0/5] Sparse Index: Integrate with 'git add' Elijah Newren
  5 siblings, 0 replies; 36+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-07-29 14:52 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, matheus.bernardino, stolee, Eric Sunshine,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Since b243012 (refresh_index(): add flag to ignore SKIP_WORKTREE
entries, 2021-04-08), 'git add --refresh <path>' will output a warning
message when the path is outside the sparse-checkout definition. The
implementation of this warning happened in parallel with the
sparse-index work to add ensure_full_index() calls throughout the
codebase.

Update this loop to have the proper logic that checks to see if the
pathspec is outside the sparse-checkout definition. This avoids the need
to expand the sparse directory entry and determine if the path is
tracked, untracked, or ignored. We simply avoid updating the stat()
information because there isn't even an entry that matches the path!

Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/add.c                            | 10 +++++++++-
 t/t1092-sparse-checkout-compatibility.sh |  6 +-----
 2 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/builtin/add.c b/builtin/add.c
index c76e6ddd359..d512ece655b 100644
--- a/builtin/add.c
+++ b/builtin/add.c
@@ -192,13 +192,21 @@ static int refresh(int verbose, const struct pathspec *pathspec)
 	struct string_list only_match_skip_worktree = STRING_LIST_INIT_NODUP;
 	int flags = REFRESH_IGNORE_SKIP_WORKTREE |
 		    (verbose ? REFRESH_IN_PORCELAIN : REFRESH_QUIET);
+	struct pattern_list pl = { 0 };
+	int sparse_checkout_enabled = !get_sparse_checkout_patterns(&pl);
 
 	seen = xcalloc(pathspec->nr, 1);
 	refresh_index(&the_index, flags, pathspec, seen,
 		      _("Unstaged changes after refreshing the index:"));
 	for (i = 0; i < pathspec->nr; i++) {
 		if (!seen[i]) {
-			if (matches_skip_worktree(pathspec, i, &skip_worktree_seen)) {
+			const char *path = pathspec->items[i].original;
+			int dtype = DT_REG;
+
+			if (matches_skip_worktree(pathspec, i, &skip_worktree_seen) ||
+			    (sparse_checkout_enabled &&
+			     !path_matches_pattern_list(path, strlen(path), NULL,
+							&dtype, &pl, &the_index))) {
 				string_list_append(&only_match_skip_worktree,
 						   pathspec->items[i].original);
 			} else {
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index dee20db5bb1..ddc86bb4152 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -339,11 +339,7 @@ test_expect_success 'status/add: outside sparse cone' '
 
 	# Adding the path outside of the sparse-checkout cone should fail.
 	test_sparse_match test_must_fail git add folder1/a &&
-
-	test_must_fail git -C sparse-checkout add --refresh folder1/a 2>sparse-checkout-err &&
-	test_must_fail git -C sparse-index add --refresh folder1/a 2>sparse-index-err &&
-	# NEEDSWORK: A sparse index changes the error message.
-	! test_cmp sparse-checkout-err sparse-index-err &&
+	test_sparse_match test_must_fail git add --refresh folder1/a &&
 
 	# NEEDSWORK: Adding a newly-tracked file outside the cone succeeds
 	test_sparse_match git add folder1/new &&
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v3 5/5] add: remove ensure_full_index() with --renormalize
  2021-07-29 14:52   ` [PATCH v3 " Derrick Stolee via GitGitGadget
                       ` (3 preceding siblings ...)
  2021-07-29 14:52     ` [PATCH v3 4/5] add: ignore outside the sparse-checkout in refresh() Derrick Stolee via GitGitGadget
@ 2021-07-29 14:52     ` Derrick Stolee via GitGitGadget
  2021-07-29 14:58     ` [PATCH v3 0/5] Sparse Index: Integrate with 'git add' Elijah Newren
  5 siblings, 0 replies; 36+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-07-29 14:52 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, matheus.bernardino, stolee, Eric Sunshine,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The --renormalize option updates the EOL conversions for the tracked
files. However, the loop already ignores files marked with the
SKIP_WORKTREE bit, so it will continue to do so with a sparse index
because the sparse directory entries also have this bit set.

Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/add.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/builtin/add.c b/builtin/add.c
index d512ece655b..c49e179abc3 100644
--- a/builtin/add.c
+++ b/builtin/add.c
@@ -144,8 +144,6 @@ static int renormalize_tracked_files(const struct pathspec *pathspec, int flags)
 {
 	int i, retval = 0;
 
-	/* TODO: audit for interaction with sparse-index. */
-	ensure_full_index(&the_index);
 	for (i = 0; i < active_nr; i++) {
 		struct cache_entry *ce = active_cache[i];
 
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v3 0/5] Sparse Index: Integrate with 'git add'
  2021-07-29 14:52   ` [PATCH v3 " Derrick Stolee via GitGitGadget
                       ` (4 preceding siblings ...)
  2021-07-29 14:52     ` [PATCH v3 5/5] add: remove ensure_full_index() with --renormalize Derrick Stolee via GitGitGadget
@ 2021-07-29 14:58     ` Elijah Newren
  5 siblings, 0 replies; 36+ messages in thread
From: Elijah Newren @ 2021-07-29 14:58 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Matheus Tavares Bernardino,
	Derrick Stolee, Eric Sunshine, Derrick Stolee

On Thu, Jul 29, 2021 at 8:52 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> This patch series re-submits the 'git add' integration with sparse-index.
> The performance gains are the same as before.
>
> It is based on ds/commit-and-checkout-with-sparse-index.
>
> This series was delayed from its initial submission for a couple reasons.
>
> The first was because it was colliding with some changes in
> mt/add-rm-in-sparse-checkout, so now we are far enough along that that
> branch is in our history and we can work forwards.
>
> The other concern was about how 'git add ' should respond when a path
> outside of the sparse-checkout cone exists. One recommendation (that I am
> failing to find a link to the message, sorry) was to disallow adding files
> that would become index entries with SKIP_WORKTREE on. However, as I worked
> towards that goal I found that change would cause problems for a realistic
> scenario: merge conflicts outside of the sparse-checkout cone.
>
> Update: Elijah points out that the SKIP_WORKTREE bit is removed from
> conflict files, which allows adding the conflicted files without warning.
> (However, we also need to be careful about untracked files, as documented in
> the test added here.)
>
> The first patch of this series adds tests that create merge conflicts
> outside of the sparse cone and then presents different ways a user could
> resolve the situation. We want all of them to be feasible, and this
> includes:
>
>  1. Reverting the file to a known version in history.
>  2. Adding the file with its contents on disk.
>  3. Moving the file to a new location in the sparse directory.
>
> The one place I did continue to update is 'git add --refresh ' to match the
> behavior added by mt/add-rm-in-sparse-checkout which outputs an error
> message. This happens even when the file exists in the working directory,
> but that seems appropriate enough.
>
>
> Updates in V3
> =============
>
>  * Added disclaimer to the merge-conflict test that this is documenting
>    current behavior, not endorsing it.
>
>  * Added Elijah's reviewed-by. Thanks for the review!

Yep, this one looks ready to merge down to me.  Thanks!

> Thanks, -Stolee
>
> Derrick Stolee (5):
>   t1092: test merge conflicts outside cone
>   add: allow operating on a sparse-only index
>   pathspec: stop calling ensure_full_index
>   add: ignore outside the sparse-checkout in refresh()
>   add: remove ensure_full_index() with --renormalize
>
>  builtin/add.c                            | 15 ++++--
>  pathspec.c                               |  2 -
>  t/t1092-sparse-checkout-compatibility.sh | 67 ++++++++++++++++++++----
>  3 files changed, 70 insertions(+), 14 deletions(-)
>
>
> base-commit: 71e301501c88399711a1bf8515d1747e92cfbb9b
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-999%2Fderrickstolee%2Fsparse-index%2Fadd-v3
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-999/derrickstolee/sparse-index/add-v3
> Pull-Request: https://github.com/gitgitgadget/git/pull/999
>
> Range-diff vs v2:
>
>  1:  8f2fd9370fe ! 1:  5e96df4df58 t1092: test merge conflicts outside cone
>      @@ Metadata
>        ## Commit message ##
>           t1092: test merge conflicts outside cone
>
>      +    Conflicts can occur outside of the sparse-checkout definition, and in
>      +    that case users might try to resolve the conflicts in several ways.
>      +    Document a few of these ways in a test. Make it clear that this behavior
>      +    is not necessarily the optimal flow, since users can become confused
>      +    when Git deletes these files from the worktree in later commands.
>      +
>      +    Reviewed-by: Elijah Newren <newren@gmail.com>
>           Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
>
>        ## t/t1092-sparse-checkout-compatibility.sh ##
>      @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'merge' '
>         test_all_match git rev-parse HEAD^{tree}
>        '
>
>      ++# NEEDSWORK: This test is documenting current behavior, but that
>      ++# behavior can be confusing to users so there is desire to change it.
>      ++# Right now, users might be using this flow to work through conflicts,
>      ++# so any solution should present advice to users who try this sequence
>      ++# of commands to follow whatever new method we create.
>       +test_expect_success 'merge with conflict outside cone' '
>       + init_repos &&
>       +
>  2:  6e43f118fa0 ! 2:  defab1b86d3 add: allow operating on a sparse-only index
>      @@ Commit message
>           the use of a sparse index. We modify a test in t1092 to demonstrate
>           these changes which will be remedied in future changes.
>
>      +    Reviewed-by: Elijah Newren <newren@gmail.com>
>           Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
>
>        ## builtin/add.c ##
>  3:  2ae91e0af29 ! 3:  9fc4313c889 pathspec: stop calling ensure_full_index
>      @@ Commit message
>           commits. Comparing to the full index case, we see the performance go
>           from 0.33s to 0.05s, an 85% improvement.
>
>      +    Reviewed-by: Elijah Newren <newren@gmail.com>
>           Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
>
>        ## pathspec.c ##
>  4:  a79728d4c64 ! 4:  0ec03ab021d add: ignore outside the sparse-checkout in refresh()
>      @@ Commit message
>           tracked, untracked, or ignored. We simply avoid updating the stat()
>           information because there isn't even an entry that matches the path!
>
>      +    Reviewed-by: Elijah Newren <newren@gmail.com>
>           Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
>
>        ## builtin/add.c ##
>  5:  1543550a4e8 ! 5:  adf5b15ac3d add: remove ensure_full_index() with --renormalize
>      @@ Commit message
>           SKIP_WORKTREE bit, so it will continue to do so with a sparse index
>           because the sparse directory entries also have this bit set.
>
>      +    Reviewed-by: Elijah Newren <newren@gmail.com>
>           Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
>
>        ## builtin/add.c ##
>
> --
> gitgitgadget

^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2021-07-29 14:58 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-21 21:06 [PATCH 0/5] Sparse Index: Integrate with 'git add' Derrick Stolee via GitGitGadget
2021-07-21 21:06 ` [PATCH 1/5] t1092: test merge conflicts outside cone Derrick Stolee via GitGitGadget
2021-07-23 17:34   ` Elijah Newren
2021-07-23 17:44     ` Eric Sunshine
2021-07-23 17:47       ` Elijah Newren
2021-07-26 14:10     ` Derrick Stolee
2021-07-21 21:06 ` [PATCH 2/5] add: allow operating on a sparse-only index Derrick Stolee via GitGitGadget
2021-07-21 22:19   ` Junio C Hamano
2021-07-21 22:50     ` Derrick Stolee
2021-07-23 17:45   ` Elijah Newren
2021-07-26 13:11     ` Derrick Stolee
2021-07-26 13:33     ` Derrick Stolee
2021-07-21 21:06 ` [PATCH 3/5] pathspec: stop calling ensure_full_index Derrick Stolee via GitGitGadget
2021-07-23 18:17   ` Elijah Newren
2021-07-21 21:06 ` [PATCH 4/5] t1092: 'git add --refresh' difference with sparse-index Derrick Stolee via GitGitGadget
2021-07-21 21:06 ` [PATCH 5/5] add: ignore outside the sparse-checkout in refresh() Derrick Stolee via GitGitGadget
2021-07-23 19:46   ` Elijah Newren
2021-07-23 12:51 ` [PATCH 0/5] Sparse Index: Integrate with 'git add' Elijah Newren
2021-07-23 20:10   ` Elijah Newren
2021-07-26 15:18 ` [PATCH v2 " Derrick Stolee via GitGitGadget
2021-07-26 15:18   ` [PATCH v2 1/5] t1092: test merge conflicts outside cone Derrick Stolee via GitGitGadget
2021-07-26 15:18   ` [PATCH v2 2/5] add: allow operating on a sparse-only index Derrick Stolee via GitGitGadget
2021-07-26 15:18   ` [PATCH v2 3/5] pathspec: stop calling ensure_full_index Derrick Stolee via GitGitGadget
2021-07-26 15:18   ` [PATCH v2 4/5] add: ignore outside the sparse-checkout in refresh() Derrick Stolee via GitGitGadget
2021-07-26 15:18   ` [PATCH v2 5/5] add: remove ensure_full_index() with --renormalize Derrick Stolee via GitGitGadget
2021-07-28 23:13   ` [PATCH v2 0/5] Sparse Index: Integrate with 'git add' Elijah Newren
2021-07-29  2:03     ` Derrick Stolee
2021-07-29  2:57       ` Elijah Newren
2021-07-29 14:49         ` Derrick Stolee
2021-07-29 14:52   ` [PATCH v3 " Derrick Stolee via GitGitGadget
2021-07-29 14:52     ` [PATCH v3 1/5] t1092: test merge conflicts outside cone Derrick Stolee via GitGitGadget
2021-07-29 14:52     ` [PATCH v3 2/5] add: allow operating on a sparse-only index Derrick Stolee via GitGitGadget
2021-07-29 14:52     ` [PATCH v3 3/5] pathspec: stop calling ensure_full_index Derrick Stolee via GitGitGadget
2021-07-29 14:52     ` [PATCH v3 4/5] add: ignore outside the sparse-checkout in refresh() Derrick Stolee via GitGitGadget
2021-07-29 14:52     ` [PATCH v3 5/5] add: remove ensure_full_index() with --renormalize Derrick Stolee via GitGitGadget
2021-07-29 14:58     ` [PATCH v3 0/5] Sparse Index: Integrate with 'git add' Elijah Newren

git@vger.kernel.org list mirror (unofficial, one of many)

This inbox may be cloned and mirrored by anyone:

	git clone --mirror https://public-inbox.org/git
	git clone --mirror http://ou63pmih66umazou.onion/git
	git clone --mirror http://czquwvybam4bgbro.onion/git
	git clone --mirror http://hjrcffqmbrq6wope.onion/git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V1 git git/ https://public-inbox.org/git \
		git@vger.kernel.org
	public-inbox-index git

Example config snippet for mirrors.
Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.version-control.git
	nntp://7fh6tueqddpjyxjmgtdiueylzoqt6pt7hec3pukyptlmohoowvhde4yd.onion/inbox.comp.version-control.git
	nntp://ie5yzdi7fg72h7s4sdcztq5evakq23rdt33mfyfcddc5u3ndnw24ogqd.onion/inbox.comp.version-control.git
	nntp://4uok3hntl7oi7b4uf4rtfwefqeexfzil2w6kgk2jn5z2f764irre7byd.onion/inbox.comp.version-control.git
	nntp://news.gmane.io/gmane.comp.version-control.git
 note: .onion URLs require Tor: https://www.torproject.org/

code repositories for project(s) associated with this inbox:

	https://80x24.org/mirrors/git.git

AGPL code for this site: git clone https://public-inbox.org/public-inbox.git