git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/5] Sparse Index: Integrate with 'git add'
@ 2021-07-21 21:06 Derrick Stolee via GitGitGadget
  2021-07-21 21:06 ` [PATCH 1/5] t1092: test merge conflicts outside cone Derrick Stolee via GitGitGadget
                   ` (5 more replies)
  0 siblings, 6 replies; 16+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-07-21 21:06 UTC (permalink / raw)
  To: git; +Cc: gitster, newren, matheus.bernardino, stolee, Derrick Stolee

This patch series re-submits the 'git add' integration with sparse-index.
The performance gains are the same as before.

It is based on ds/commit-and-checkout-with-sparse-index.

This series was delayed from its initial submission for a couple reasons.

The first was because it was colliding with some changes in
mt/add-rm-in-sparse-checkout, so now we are far enough along that that
branch is in our history and we can work forwards.

The other concern was about how 'git add ' should respond when a path
outside of the sparse-checkout cone exists. One recommendation (that I am
failing to find a link to the message, sorry) was to disallow adding files
that would become index entries with SKIP_WORKTREE on. However, as I worked
towards that goal I found that change would cause problems for a realistic
scenario: merge conflicts outside of the sparse-checkout cone.

The first patch of this series adds tests that create merge conflicts
outside of the sparse cone and then presents different ways a user could
resolve the situation. We want all of them to be feasible, and this
includes:

 1. Reverting the file to a known version in history.
 2. Adding the file with its contents on disk.
 3. Moving the file to a new location in the sparse directory.

Without maintaining the behavior of adding files outside of the
sparse-checkout cone, we risk confusing users who get into this state. The
only workaround they would have is to modify their sparse-checkout
definition which might lead to expanding much more data than they need to
resolve the conflicts.

For that reason, I stopped trying to limit 'git add' to be within the cone.
I'm open to suggestions here, but we need an approach that works for
out-of-cone conflicts.

The one place I did continue to update is 'git add --refresh ' to match the
behavior added by mt/add-rm-in-sparse-checkout which outputs an error
message. This happens even when the file exists in the working directory,
but that seems appropriate enough.

Thanks, -Stolee

Derrick Stolee (5):
  t1092: test merge conflicts outside cone
  add: allow operating on a sparse-only index
  pathspec: stop calling ensure_full_index
  t1092: 'git add --refresh' difference with sparse-index
  add: ignore outside the sparse-checkout in refresh()

 builtin/add.c                            | 13 ++++-
 pathspec.c                               |  2 -
 t/t1092-sparse-checkout-compatibility.sh | 71 ++++++++++++++++++++----
 3 files changed, 72 insertions(+), 14 deletions(-)


base-commit: 71e301501c88399711a1bf8515d1747e92cfbb9b
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-999%2Fderrickstolee%2Fsparse-index%2Fadd-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-999/derrickstolee/sparse-index/add-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/999
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 1/5] t1092: test merge conflicts outside cone
  2021-07-21 21:06 [PATCH 0/5] Sparse Index: Integrate with 'git add' Derrick Stolee via GitGitGadget
@ 2021-07-21 21:06 ` Derrick Stolee via GitGitGadget
  2021-07-23 17:34   ` Elijah Newren
  2021-07-21 21:06 ` [PATCH 2/5] add: allow operating on a sparse-only index Derrick Stolee via GitGitGadget
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 16+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-07-21 21:06 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, matheus.bernardino, stolee, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t1092-sparse-checkout-compatibility.sh | 37 ++++++++++++++++++++++++
 1 file changed, 37 insertions(+)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 91e30d6ec22..a3c01d588d8 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -114,6 +114,16 @@ test_expect_success 'setup' '
 		git add . &&
 		git commit -m "file to dir" &&
 
+		for side in left right
+		do
+			git checkout -b merge-$side base &&
+			echo $side >>deep/deeper2/a &&
+			echo $side >>folder1/a &&
+			echo $side >>folder2/a &&
+			git add . &&
+			git commit -m "$side" || return 1
+		done &&
+
 		git checkout -b deepest base &&
 		echo "updated deepest" >deep/deeper1/deepest/a &&
 		git commit -a -m "update deepest" &&
@@ -482,6 +492,33 @@ test_expect_success 'merge' '
 	test_all_match git rev-parse HEAD^{tree}
 '
 
+test_expect_success 'merge with conflict outside cone' '
+	init_repos &&
+
+	test_all_match git checkout -b merge-tip merge-left &&
+	test_all_match git status --porcelain=v2 &&
+	test_all_match test_must_fail git merge -m merge merge-right &&
+	test_all_match git status --porcelain=v2 &&
+
+	# resolve the conflict in different ways:
+	# 1. revert to the base
+	test_all_match git checkout base -- deep/deeper2/a &&
+	test_all_match git status --porcelain=v2 &&
+
+	# 2. add the file with conflict markers
+	test_all_match git add folder1/a &&
+	test_all_match git status --porcelain=v2 &&
+
+	# 3. rename the file to another sparse filename
+	run_on_all mv folder2/a folder2/z &&
+	test_all_match git add folder2 &&
+	test_all_match git status --porcelain=v2 &&
+
+	test_all_match git merge --continue &&
+	test_all_match git status --porcelain=v2 &&
+	test_all_match git rev-parse HEAD^{tree}
+'
+
 test_expect_success 'merge with outside renames' '
 	init_repos &&
 
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 2/5] add: allow operating on a sparse-only index
  2021-07-21 21:06 [PATCH 0/5] Sparse Index: Integrate with 'git add' Derrick Stolee via GitGitGadget
  2021-07-21 21:06 ` [PATCH 1/5] t1092: test merge conflicts outside cone Derrick Stolee via GitGitGadget
@ 2021-07-21 21:06 ` Derrick Stolee via GitGitGadget
  2021-07-21 22:19   ` Junio C Hamano
  2021-07-23 17:45   ` Elijah Newren
  2021-07-21 21:06 ` [PATCH 3/5] pathspec: stop calling ensure_full_index Derrick Stolee via GitGitGadget
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 16+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-07-21 21:06 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, matheus.bernardino, stolee, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Disable command_requires_full_index for 'git add'. This does not require
any additional removals of ensure_full_index(). The main reason is that
'git add' discovers changes based on the pathspec and the worktree
itself. These are then inserted into the index directly, and calls to
index_name_pos() or index_file_exists() already call expand_to_path() at
the appropriate time to support a sparse-index.

Add a test to check that 'git add -A' and 'git add <file>' does not
expand the index at all, as long as <file> is not within a sparse
directory. This does not help the global 'git add .' case.

We can measure the improvement using p2000-sparse-operations.sh with
these results:

Test                                  HEAD~1           HEAD
------------------------------------------------------------------------------
2000.6: git add -A (full-index-v3)    0.35(0.30+0.05)  0.37(0.29+0.06) +5.7%
2000.7: git add -A (full-index-v4)    0.31(0.26+0.06)  0.33(0.27+0.06) +6.5%
2000.8: git add -A (sparse-index-v3)  0.57(0.53+0.07)  0.05(0.04+0.08) -91.2%
2000.9: git add -A (sparse-index-v4)  0.58(0.55+0.06)  0.05(0.05+0.06) -91.4%

While the 91% improvement seems impressive, it's important to recognize
that previously we had significant overhead for expanding the
sparse-index. Comparing to the full index case, 'git add -A' goes from
0.37s to 0.05s, which is "only" an 86% improvement.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/add.c                            |  3 +++
 t/t1092-sparse-checkout-compatibility.sh | 14 ++++++--------
 2 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/builtin/add.c b/builtin/add.c
index b773b5a4993..c76e6ddd359 100644
--- a/builtin/add.c
+++ b/builtin/add.c
@@ -528,6 +528,9 @@ int cmd_add(int argc, const char **argv, const char *prefix)
 	add_new_files = !take_worktree_changes && !refresh_only && !add_renormalize;
 	require_pathspec = !(take_worktree_changes || (0 < addremove_explicit));
 
+	prepare_repo_settings(the_repository);
+	the_repository->settings.command_requires_full_index = 0;
+
 	hold_locked_index(&lock_file, LOCK_DIE_ON_ERROR);
 
 	/*
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index a3c01d588d8..a11d9d7f35d 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -340,13 +340,6 @@ test_expect_success 'status/add: outside sparse cone' '
 
 	test_sparse_match git status --porcelain=v2 &&
 
-	# This "git add folder1/a" fails with a warning
-	# in the sparse repos, differing from the full
-	# repo. This is intentional.
-	test_sparse_match test_must_fail git add folder1/a &&
-	test_sparse_match test_must_fail git add --refresh folder1/a &&
-	test_all_match git status --porcelain=v2 &&
-
 	test_all_match git add . &&
 	test_all_match git status --porcelain=v2 &&
 	test_all_match git commit -m folder1/new &&
@@ -635,7 +628,12 @@ test_expect_success 'sparse-index is not expanded' '
 	git -C sparse-index reset --hard &&
 	ensure_not_expanded checkout rename-out-to-out -- deep/deeper1 &&
 	git -C sparse-index reset --hard &&
-	ensure_not_expanded restore -s rename-out-to-out -- deep/deeper1
+	ensure_not_expanded restore -s rename-out-to-out -- deep/deeper1 &&
+
+	echo >>sparse-index/README.md &&
+	ensure_not_expanded add -A &&
+	echo >>sparse-index/extra.txt &&
+	ensure_not_expanded add extra.txt
 '
 
 # NEEDSWORK: a sparse-checkout behaves differently from a full checkout
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 3/5] pathspec: stop calling ensure_full_index
  2021-07-21 21:06 [PATCH 0/5] Sparse Index: Integrate with 'git add' Derrick Stolee via GitGitGadget
  2021-07-21 21:06 ` [PATCH 1/5] t1092: test merge conflicts outside cone Derrick Stolee via GitGitGadget
  2021-07-21 21:06 ` [PATCH 2/5] add: allow operating on a sparse-only index Derrick Stolee via GitGitGadget
@ 2021-07-21 21:06 ` Derrick Stolee via GitGitGadget
  2021-07-23 18:17   ` Elijah Newren
  2021-07-21 21:06 ` [PATCH 4/5] t1092: 'git add --refresh' difference with sparse-index Derrick Stolee via GitGitGadget
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 16+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-07-21 21:06 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, matheus.bernardino, stolee, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The add_pathspec_matches_against_index() focuses on matching a pathspec
to file entries in the index. This already works correctly for its only
use: checking if untracked files exist in the index.

The compatibility checks in t1092 already test that 'git add <dir>'
works for a directory outside of the sparse cone. That provides coverage
for removing this guard.

This finalizes our ability to run 'git add .' without expanding a sparse
index to a full one. This is evidenced by an update to t1092 and by
these performance numbers for p2000-sparse-operations.sh:

Test                                    HEAD~1            HEAD
--------------------------------------------------------------------------------
2000.10: git add . (full-index-v3)      0.37(0.28+0.07)   0.36(0.27+0.06) -2.7%
2000.11: git add . (full-index-v4)      0.33(0.26+0.06)   0.32(0.28+0.05) -3.0%
2000.12: git add . (sparse-index-v3)    0.57(0.53+0.07)   0.06(0.06+0.07) -89.5%
2000.13: git add . (sparse-index-v4)    0.57(0.53+0.07)   0.05(0.03+0.09) -91.2%

While the ~90% improvement is shown by the test results, it is worth
noting that expanding the sparse index was adding overhead in previous
commits. Comparing to the full index case, we see the performance go
from 0.33s to 0.05s, an 85% improvement.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 pathspec.c                               | 2 --
 t/t1092-sparse-checkout-compatibility.sh | 7 +++----
 2 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/pathspec.c b/pathspec.c
index 08f8d3eedc3..44306fdaca2 100644
--- a/pathspec.c
+++ b/pathspec.c
@@ -37,8 +37,6 @@ void add_pathspec_matches_against_index(const struct pathspec *pathspec,
 			num_unmatched++;
 	if (!num_unmatched)
 		return;
-	/* TODO: audit for interaction with sparse-index. */
-	ensure_full_index(istate);
 	for (i = 0; i < istate->cache_nr; i++) {
 		const struct cache_entry *ce = istate->cache[i];
 		if (sw_action == PS_IGNORE_SKIP_WORKTREE && ce_skip_worktree(ce))
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index a11d9d7f35d..f9e2f5f4aa1 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -322,9 +322,6 @@ test_expect_success 'commit including unstaged changes' '
 test_expect_success 'status/add: outside sparse cone' '
 	init_repos &&
 
-	# adding a "missing" file outside the cone should fail
-	test_sparse_match test_must_fail git add folder1/a &&
-
 	# folder1 is at HEAD, but outside the sparse cone
 	run_on_sparse mkdir folder1 &&
 	cp initial-repo/folder1/a sparse-checkout/folder1/a &&
@@ -633,7 +630,9 @@ test_expect_success 'sparse-index is not expanded' '
 	echo >>sparse-index/README.md &&
 	ensure_not_expanded add -A &&
 	echo >>sparse-index/extra.txt &&
-	ensure_not_expanded add extra.txt
+	ensure_not_expanded add extra.txt &&
+	echo >>sparse-index/untracked.txt &&
+	ensure_not_expanded add .
 '
 
 # NEEDSWORK: a sparse-checkout behaves differently from a full checkout
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 4/5] t1092: 'git add --refresh' difference with sparse-index
  2021-07-21 21:06 [PATCH 0/5] Sparse Index: Integrate with 'git add' Derrick Stolee via GitGitGadget
                   ` (2 preceding siblings ...)
  2021-07-21 21:06 ` [PATCH 3/5] pathspec: stop calling ensure_full_index Derrick Stolee via GitGitGadget
@ 2021-07-21 21:06 ` Derrick Stolee via GitGitGadget
  2021-07-21 21:06 ` [PATCH 5/5] add: ignore outside the sparse-checkout in refresh() Derrick Stolee via GitGitGadget
  2021-07-23 12:51 ` [PATCH 0/5] Sparse Index: Integrate with 'git add' Elijah Newren
  5 siblings, 0 replies; 16+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-07-21 21:06 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, matheus.bernardino, stolee, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

When running 'git add --refresh <path>', Git does not actually stage the
change, but instead updates the index based on the stat() information in
the working directory.

This typically does not make sense in a sparse-checkout scenario, where
Git wants this file to not exist. However, sometimes the file can exist
on-disk for other reasons, such as a user manually adding the file or a
merge conflict outside of the sparse cone.

Since b243012 (refresh_index(): add flag to ignore SKIP_WORKTREE
entries, 2021-04-08), 'git add --refresh <path>' will output an advice
message to indicate that this is not allowed when <path> is outside the
sparse cone. The check goes around the sparse index protections at the
moment, so it does not find a match when contained in a sparse directory
entry. We will update this behavior with a later change, but want to
demonstrate the failure now.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t1092-sparse-checkout-compatibility.sh | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index f9e2f5f4aa1..73c48a71d89 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -347,6 +347,25 @@ test_expect_success 'status/add: outside sparse cone' '
 	test_all_match git commit -m folder1/newer
 '
 
+test_expect_failure 'add: pathspec within sparse directory' '
+	init_repos &&
+
+	run_on_sparse mkdir folder1 &&
+	run_on_sparse ../edit-contents folder1/a &&
+	run_on_all ../edit-contents folder1/new &&
+
+	# This "git add folder1/a" fails with a warning
+	# in the sparse repos, differing from the full
+	# repo. This is intentional.
+	#
+	# However, in the sparse-index, folder1/a does not
+	# match any cache entry and fails with a different
+	# error message. This needs work.
+	test_sparse_match test_must_fail git add folder1/a &&
+	test_sparse_match test_must_fail git add --refresh folder1/a &&
+	test_all_match git status --porcelain=v2
+'
+
 test_expect_success 'checkout and reset --hard' '
 	init_repos &&
 
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 5/5] add: ignore outside the sparse-checkout in refresh()
  2021-07-21 21:06 [PATCH 0/5] Sparse Index: Integrate with 'git add' Derrick Stolee via GitGitGadget
                   ` (3 preceding siblings ...)
  2021-07-21 21:06 ` [PATCH 4/5] t1092: 'git add --refresh' difference with sparse-index Derrick Stolee via GitGitGadget
@ 2021-07-21 21:06 ` Derrick Stolee via GitGitGadget
  2021-07-23 19:46   ` Elijah Newren
  2021-07-23 12:51 ` [PATCH 0/5] Sparse Index: Integrate with 'git add' Elijah Newren
  5 siblings, 1 reply; 16+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-07-21 21:06 UTC (permalink / raw)
  To: git
  Cc: gitster, newren, matheus.bernardino, stolee, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Since b243012 (refresh_index(): add flag to ignore SKIP_WORKTREE
entries, 2021-04-08), 'git add --refresh <path>' will output a warning
message when the path is outside the sparse-checkout definition. The
implementation of this warning happened in parallel with the
sparse-index work to add ensure_full_index() calls throughout the
codebase.

Update this loop to have the proper logic that checks to see if the
pathspec is outside the sparse-checkout definition. This avoids the need
to expand the sparse directory entry and determine if the path is
tracked, untracked, or ignored. We simply avoid updating the stat()
information because there isn't even an entry that matches the path!

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/add.c                            | 10 +++++++++-
 t/t1092-sparse-checkout-compatibility.sh |  6 +-----
 2 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/builtin/add.c b/builtin/add.c
index c76e6ddd359..d512ece655b 100644
--- a/builtin/add.c
+++ b/builtin/add.c
@@ -192,13 +192,21 @@ static int refresh(int verbose, const struct pathspec *pathspec)
 	struct string_list only_match_skip_worktree = STRING_LIST_INIT_NODUP;
 	int flags = REFRESH_IGNORE_SKIP_WORKTREE |
 		    (verbose ? REFRESH_IN_PORCELAIN : REFRESH_QUIET);
+	struct pattern_list pl = { 0 };
+	int sparse_checkout_enabled = !get_sparse_checkout_patterns(&pl);
 
 	seen = xcalloc(pathspec->nr, 1);
 	refresh_index(&the_index, flags, pathspec, seen,
 		      _("Unstaged changes after refreshing the index:"));
 	for (i = 0; i < pathspec->nr; i++) {
 		if (!seen[i]) {
-			if (matches_skip_worktree(pathspec, i, &skip_worktree_seen)) {
+			const char *path = pathspec->items[i].original;
+			int dtype = DT_REG;
+
+			if (matches_skip_worktree(pathspec, i, &skip_worktree_seen) ||
+			    (sparse_checkout_enabled &&
+			     !path_matches_pattern_list(path, strlen(path), NULL,
+							&dtype, &pl, &the_index))) {
 				string_list_append(&only_match_skip_worktree,
 						   pathspec->items[i].original);
 			} else {
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 73c48a71d89..c61424e2074 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -347,7 +347,7 @@ test_expect_success 'status/add: outside sparse cone' '
 	test_all_match git commit -m folder1/newer
 '
 
-test_expect_failure 'add: pathspec within sparse directory' '
+test_expect_success 'add: pathspec within sparse directory' '
 	init_repos &&
 
 	run_on_sparse mkdir folder1 &&
@@ -357,10 +357,6 @@ test_expect_failure 'add: pathspec within sparse directory' '
 	# This "git add folder1/a" fails with a warning
 	# in the sparse repos, differing from the full
 	# repo. This is intentional.
-	#
-	# However, in the sparse-index, folder1/a does not
-	# match any cache entry and fails with a different
-	# error message. This needs work.
 	test_sparse_match test_must_fail git add folder1/a &&
 	test_sparse_match test_must_fail git add --refresh folder1/a &&
 	test_all_match git status --porcelain=v2
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/5] add: allow operating on a sparse-only index
  2021-07-21 21:06 ` [PATCH 2/5] add: allow operating on a sparse-only index Derrick Stolee via GitGitGadget
@ 2021-07-21 22:19   ` Junio C Hamano
  2021-07-21 22:50     ` Derrick Stolee
  2021-07-23 17:45   ` Elijah Newren
  1 sibling, 1 reply; 16+ messages in thread
From: Junio C Hamano @ 2021-07-21 22:19 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, newren, matheus.bernardino, stolee, Derrick Stolee, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Derrick Stolee <dstolee@microsoft.com>
>
> Disable command_requires_full_index for 'git add'. This does not require
> any additional removals of ensure_full_index(). The main reason is that
> 'git add' discovers changes based on the pathspec and the worktree
> itself. These are then inserted into the index directly, and calls to
> index_name_pos() or index_file_exists() already call expand_to_path() at
> the appropriate time to support a sparse-index.

OK.  With that explained, it still is quite surprising that we only
need this change (eh, rather, doing this change is safe without
doing anything else).

> -	# This "git add folder1/a" fails with a warning
> -	# in the sparse repos, differing from the full
> -	# repo. This is intentional.
> -	test_sparse_match test_must_fail git add folder1/a &&
> -	test_sparse_match test_must_fail git add --refresh folder1/a &&
> -	test_all_match git status --porcelain=v2 &&

And nice to see a known limitation lifted.

>  	test_all_match git add . &&
>  	test_all_match git status --porcelain=v2 &&
>  	test_all_match git commit -m folder1/new &&
> @@ -635,7 +628,12 @@ test_expect_success 'sparse-index is not expanded' '
>  	git -C sparse-index reset --hard &&
>  	ensure_not_expanded checkout rename-out-to-out -- deep/deeper1 &&
>  	git -C sparse-index reset --hard &&
> -	ensure_not_expanded restore -s rename-out-to-out -- deep/deeper1
> +	ensure_not_expanded restore -s rename-out-to-out -- deep/deeper1 &&
> +
> +	echo >>sparse-index/README.md &&
> +	ensure_not_expanded add -A &&
> +	echo >>sparse-index/extra.txt &&
> +	ensure_not_expanded add extra.txt
>  '
>  
>  # NEEDSWORK: a sparse-checkout behaves differently from a full checkout

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/5] add: allow operating on a sparse-only index
  2021-07-21 22:19   ` Junio C Hamano
@ 2021-07-21 22:50     ` Derrick Stolee
  0 siblings, 0 replies; 16+ messages in thread
From: Derrick Stolee @ 2021-07-21 22:50 UTC (permalink / raw)
  To: Junio C Hamano, Derrick Stolee via GitGitGadget
  Cc: git, newren, matheus.bernardino, Derrick Stolee, Derrick Stolee

On 7/21/21 6:19 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> From: Derrick Stolee <dstolee@microsoft.com>
>>
>> Disable command_requires_full_index for 'git add'. This does not require
>> any additional removals of ensure_full_index(). The main reason is that
>> 'git add' discovers changes based on the pathspec and the worktree
>> itself. These are then inserted into the index directly, and calls to
>> index_name_pos() or index_file_exists() already call expand_to_path() at
>> the appropriate time to support a sparse-index.
> 
> OK.  With that explained, it still is quite surprising that we only
> need this change (eh, rather, doing this change is safe without
> doing anything else).

Yes, all of the hard work was done by the earlier work to expand
a sparse index when we search for a specific path that lands
within a sparse directory. See 95e0321 (read-cache: expand on query
into sparse-directory entry, 2021-04-01) for the specifics.

>> -	# This "git add folder1/a" fails with a warning
>> -	# in the sparse repos, differing from the full
>> -	# repo. This is intentional.
>> -	test_sparse_match test_must_fail git add folder1/a &&
>> -	test_sparse_match test_must_fail git add --refresh folder1/a &&
>> -	test_all_match git status --porcelain=v2 &&
> 
> And nice to see a known limitation lifted.

Thank you for pointing this out. This actually starts to _fail_ now
that we allow sparse indexes in 'git add', but it's because the error
messages don't match, not that the 'test_must_fail' is violated.

Patch 4 adds a similar test that is then set to work in patch 5. That
allows us a clear way to describe the behavior change and to motivate
the fix in patch 5. This could be explained better, perhaps by merging
Patch 4 into this one. That helps describe how this specific case
changes behavior (for the worse) in this patch, but is handled in a
careful way later, once the behavior change is documented.

If there is a better way to reorganize these patches, then I could
try another approach.

Thanks,
-Stolee


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/5] Sparse Index: Integrate with 'git add'
  2021-07-21 21:06 [PATCH 0/5] Sparse Index: Integrate with 'git add' Derrick Stolee via GitGitGadget
                   ` (4 preceding siblings ...)
  2021-07-21 21:06 ` [PATCH 5/5] add: ignore outside the sparse-checkout in refresh() Derrick Stolee via GitGitGadget
@ 2021-07-23 12:51 ` Elijah Newren
  2021-07-23 20:10   ` Elijah Newren
  5 siblings, 1 reply; 16+ messages in thread
From: Elijah Newren @ 2021-07-23 12:51 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee

Hi,

I haven't read the series yet (I'm going to try to read them all
today), but wanted to comment on a couple things in your cover
letter...

On Wed, Jul 21, 2021 at 2:07 PM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> This patch series re-submits the 'git add' integration with sparse-index.
> The performance gains are the same as before.
>
> It is based on ds/commit-and-checkout-with-sparse-index.
>
> This series was delayed from its initial submission for a couple reasons.
>
> The first was because it was colliding with some changes in
> mt/add-rm-in-sparse-checkout, so now we are far enough along that that
> branch is in our history and we can work forwards.
>
> The other concern was about how 'git add ' should respond when a path
> outside of the sparse-checkout cone exists. One recommendation (that I am
> failing to find a link to the message, sorry) was to disallow adding files
> that would become index entries with SKIP_WORKTREE on. However, as I worked

I think the recommendation was:
  * Permitted: running git add on tracked files with the SKIP_WORKTREE
bit *clear*
  * Disallowed: running git add on tracked files with the
SKIP_WORKTREE bit *set*
  * Disallowed: running git add on untracked files that would become
index entries with the SKIP_WORKTREE bit set

where the latter two exit with an error message that suggests changing
the sparsity specification first.  I think this is what has existed
for some time, other than Matheus adding some error messages to help
the user when their add command doesn't match anything otherwise.

> towards that goal I found that change would cause problems for a realistic
> scenario: merge conflicts outside of the sparse-checkout cone.

...which wouldn't be a problem because these are tracked files whose
SKIP_WORKTREE bit was cleared by the merge machinery (when it marked
them as conflicted and wrote them to the working tree).

> The first patch of this series adds tests that create merge conflicts
> outside of the sparse cone and then presents different ways a user could
> resolve the situation. We want all of them to be feasible, and this
> includes:
>
>  1. Reverting the file to a known version in history.
>  2. Adding the file with its contents on disk.
>  3. Moving the file to a new location in the sparse directory.
>
> Without maintaining the behavior of adding files outside of the
> sparse-checkout cone, we risk confusing users who get into this state. The
> only workaround they would have is to modify their sparse-checkout
> definition which might lead to expanding much more data than they need to
> resolve the conflicts.
>
> For that reason, I stopped trying to limit 'git add' to be within the cone.
> I'm open to suggestions here, but we need an approach that works for
> out-of-cone conflicts.

I believe my above suggestion works for out-of-cone conflicts.  Some
important other details to keep in mind in regards to how we make add
behave:

* We don't want "git add -A [GLOB_OR_DIRECTORY]" to accidentally be
treated as a directive to remove files from the repository (and
naively noticing that SKIP_WORKTREE files are missing but attempting
to operate on them anyway would give this problematic result).
* We don't want "git rm [GLOB_OR_DIRECTORY]" to nuke SKIP_WORKTREE
files; it should only operate on files that are present.
* We need add and rm to be consistent with each other in terms of how
SKIP_WORKTREE files and the sparsity cone are treated

and more generally about not-SKIP_WORKTREE-despite-not-matching-sparsity-paths:

* These files that aren't SKIP_WORKTREE but normally would be are
prone to "disappear" at some random later time after they are made to
match the index.  The disappearing can happen either with an explicit
"git sparse-checkout reapply" (which is fine since it was explicit) or
as a side-effect of various other commands that run through
unpack_trees() since it attempts to heed the sparsity rules.  Users
tend to get confused by the latter case; they'll understand at some
point that it was because the file was outside the sparsity paths, but
the randomness in when it's pulled out as a side-effect of other
commands can be slightly jarring.  So, I'd like to avoid that where we
can easily do so, which I think the recommendation above does.  (As a
side note to these kinds of cases, maybe we want to consider having
the sparsification logic in unpack_trees() first check whether paths
being removed from the working tree and having their SKIP_WORKTREE bit
set not only match the index but also match HEAD?  That'd be a bit
more expensive to check in the sparsification paths, and I'm not sure
they all have the relevant information, but it's an idea...)

> The one place I did continue to update is 'git add --refresh ' to match the
> behavior added by mt/add-rm-in-sparse-checkout which outputs an error
> message. This happens even when the file exists in the working directory,
> but that seems appropriate enough.
>
> Thanks, -Stolee
>
> Derrick Stolee (5):
>   t1092: test merge conflicts outside cone
>   add: allow operating on a sparse-only index
>   pathspec: stop calling ensure_full_index
>   t1092: 'git add --refresh' difference with sparse-index
>   add: ignore outside the sparse-checkout in refresh()
>
>  builtin/add.c                            | 13 ++++-
>  pathspec.c                               |  2 -
>  t/t1092-sparse-checkout-compatibility.sh | 71 ++++++++++++++++++++----
>  3 files changed, 72 insertions(+), 14 deletions(-)
>
>
> base-commit: 71e301501c88399711a1bf8515d1747e92cfbb9b
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-999%2Fderrickstolee%2Fsparse-index%2Fadd-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-999/derrickstolee/sparse-index/add-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/999
> --
> gitgitgadget

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/5] t1092: test merge conflicts outside cone
  2021-07-21 21:06 ` [PATCH 1/5] t1092: test merge conflicts outside cone Derrick Stolee via GitGitGadget
@ 2021-07-23 17:34   ` Elijah Newren
  2021-07-23 17:44     ` Eric Sunshine
  0 siblings, 1 reply; 16+ messages in thread
From: Elijah Newren @ 2021-07-23 17:34 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee, Derrick Stolee

On Wed, Jul 21, 2021 at 2:07 PM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  t/t1092-sparse-checkout-compatibility.sh | 37 ++++++++++++++++++++++++
>  1 file changed, 37 insertions(+)
>
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index 91e30d6ec22..a3c01d588d8 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -114,6 +114,16 @@ test_expect_success 'setup' '
>                 git add . &&
>                 git commit -m "file to dir" &&
>
> +               for side in left right
> +               do
> +                       git checkout -b merge-$side base &&
> +                       echo $side >>deep/deeper2/a &&
> +                       echo $side >>folder1/a &&
> +                       echo $side >>folder2/a &&
> +                       git add . &&
> +                       git commit -m "$side" || return 1

Why is this "|| return 1" here?

It looks like there are a number of other cases of this in the file
too, which I must have overlooked previously, because I don't
understand any of them.

> +               done &&
> +
>                 git checkout -b deepest base &&
>                 echo "updated deepest" >deep/deeper1/deepest/a &&
>                 git commit -a -m "update deepest" &&
> @@ -482,6 +492,33 @@ test_expect_success 'merge' '
>         test_all_match git rev-parse HEAD^{tree}
>  '
>
> +test_expect_success 'merge with conflict outside cone' '
> +       init_repos &&
> +
> +       test_all_match git checkout -b merge-tip merge-left &&
> +       test_all_match git status --porcelain=v2 &&
> +       test_all_match test_must_fail git merge -m merge merge-right &&
> +       test_all_match git status --porcelain=v2 &&
> +
> +       # resolve the conflict in different ways:
> +       # 1. revert to the base
> +       test_all_match git checkout base -- deep/deeper2/a &&
> +       test_all_match git status --porcelain=v2 &&
> +
> +       # 2. add the file with conflict markers
> +       test_all_match git add folder1/a &&
> +       test_all_match git status --porcelain=v2 &&
> +
> +       # 3. rename the file to another sparse filename

But...that doesn't resolve the conflict.  Shouldn't this be titled
"accept the conflict & rename the file elsewhere"?

> +       run_on_all mv folder2/a folder2/z &&
> +       test_all_match git add folder2 &&

'mv' rather than 'git mv', then followed by 'git add'?  Any reason for
this order rather than git add followed by git mv?

Also, if you really do want to move first, did you use mv instead of
"git mv" due to the latter's shortcoming of only operating on stage 0?
(https://lore.kernel.org/git/CABPp-BGJdwpwhQUp4Wa4bKBp4hQFB9OM3N1FXH7SzY0mvLDa7Q@mail.gmail.com/)

Regardless of order, though, I still think mv or add should require a
--force to rename or add a file outside the sparsity paths given the
deferred negative surprises for users around such files.  (Or come up
with a solid way to remove those surprises.)

> +       test_all_match git status --porcelain=v2 &&
> +
> +       test_all_match git merge --continue &&
> +       test_all_match git status --porcelain=v2 &&
> +       test_all_match git rev-parse HEAD^{tree}
> +'
> +
>  test_expect_success 'merge with outside renames' '
>         init_repos &&

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/5] t1092: test merge conflicts outside cone
  2021-07-23 17:34   ` Elijah Newren
@ 2021-07-23 17:44     ` Eric Sunshine
  2021-07-23 17:47       ` Elijah Newren
  0 siblings, 1 reply; 16+ messages in thread
From: Eric Sunshine @ 2021-07-23 17:44 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Derrick Stolee via GitGitGadget, Git Mailing List,
	Junio C Hamano, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

On Fri, Jul 23, 2021 at 1:34 PM Elijah Newren <newren@gmail.com> wrote:
> On Wed, Jul 21, 2021 at 2:07 PM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
> > +               for side in left right
> > +               do
> > +                       git checkout -b merge-$side base &&
> > +                       echo $side >>deep/deeper2/a &&
> > +                       echo $side >>folder1/a &&
> > +                       echo $side >>folder2/a &&
> > +                       git add . &&
> > +                       git commit -m "$side" || return 1
>
> Why is this "|| return 1" here?
>
> It looks like there are a number of other cases of this in the file
> too, which I must have overlooked previously, because I don't
> understand any of them.

A shell for-loop won't automatically terminate just because some
command in its body fails. Instead it will run to completion and
return the status of the last command of the last iteration, which may
not be the iteration which failed, thus a failure can be hidden.
Therefore, we need to proactively stop the loop iteration _and_ ensure
that the return status of the loop itself reflects the failure, which
we do by `|| return 1`. (If this loop was inside a subshell, we'd use
`|| exit 1` instead.)

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/5] add: allow operating on a sparse-only index
  2021-07-21 21:06 ` [PATCH 2/5] add: allow operating on a sparse-only index Derrick Stolee via GitGitGadget
  2021-07-21 22:19   ` Junio C Hamano
@ 2021-07-23 17:45   ` Elijah Newren
  1 sibling, 0 replies; 16+ messages in thread
From: Elijah Newren @ 2021-07-23 17:45 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee, Derrick Stolee

On Wed, Jul 21, 2021 at 2:07 PM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> Disable command_requires_full_index for 'git add'. This does not require
> any additional removals of ensure_full_index(). The main reason is that
> 'git add' discovers changes based on the pathspec and the worktree
> itself. These are then inserted into the index directly, and calls to
> index_name_pos() or index_file_exists() already call expand_to_path() at
> the appropriate time to support a sparse-index.

Nice.

> Add a test to check that 'git add -A' and 'git add <file>' does not
> expand the index at all, as long as <file> is not within a sparse
> directory. This does not help the global 'git add .' case.

Good idea.

> We can measure the improvement using p2000-sparse-operations.sh with
> these results:
>
> Test                                  HEAD~1           HEAD
> ------------------------------------------------------------------------------
> 2000.6: git add -A (full-index-v3)    0.35(0.30+0.05)  0.37(0.29+0.06) +5.7%
> 2000.7: git add -A (full-index-v4)    0.31(0.26+0.06)  0.33(0.27+0.06) +6.5%
> 2000.8: git add -A (sparse-index-v3)  0.57(0.53+0.07)  0.05(0.04+0.08) -91.2%
> 2000.9: git add -A (sparse-index-v4)  0.58(0.55+0.06)  0.05(0.05+0.06) -91.4%
>
> While the 91% improvement seems impressive, it's important to recognize
> that previously we had significant overhead for expanding the
> sparse-index. Comparing to the full index case, 'git add -A' goes from
> 0.37s to 0.05s, which is "only" an 86% improvement.

Hehe.  Yep, it's so "disappointing" to "only" have the code be 7x faster.  :-)

Out of curiosity, IIRC any operation involving the index took ~10s on
some of the Microsoft repos.  What does the speedup look like over
there for these changes to git-add?

>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  builtin/add.c                            |  3 +++
>  t/t1092-sparse-checkout-compatibility.sh | 14 ++++++--------
>  2 files changed, 9 insertions(+), 8 deletions(-)
>
> diff --git a/builtin/add.c b/builtin/add.c
> index b773b5a4993..c76e6ddd359 100644
> --- a/builtin/add.c
> +++ b/builtin/add.c
> @@ -528,6 +528,9 @@ int cmd_add(int argc, const char **argv, const char *prefix)
>         add_new_files = !take_worktree_changes && !refresh_only && !add_renormalize;
>         require_pathspec = !(take_worktree_changes || (0 < addremove_explicit));
>
> +       prepare_repo_settings(the_repository);
> +       the_repository->settings.command_requires_full_index = 0;
> +
>         hold_locked_index(&lock_file, LOCK_DIE_ON_ERROR);
>
>         /*
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index a3c01d588d8..a11d9d7f35d 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -340,13 +340,6 @@ test_expect_success 'status/add: outside sparse cone' '
>
>         test_sparse_match git status --porcelain=v2 &&
>
> -       # This "git add folder1/a" fails with a warning
> -       # in the sparse repos, differing from the full
> -       # repo. This is intentional.
> -       test_sparse_match test_must_fail git add folder1/a &&
> -       test_sparse_match test_must_fail git add --refresh folder1/a &&
> -       test_all_match git status --porcelain=v2 &&
> -

Why was this chunk removed?  Nothing in the commit message mentions
this, and it's not clear to me the reason for it.

I tried adding it back in at the end of the series and it still works
(and further I can't change test_sparse_match to test_all_match and
have the test work).

>         test_all_match git add . &&
>         test_all_match git status --porcelain=v2 &&
>         test_all_match git commit -m folder1/new &&
> @@ -635,7 +628,12 @@ test_expect_success 'sparse-index is not expanded' '
>         git -C sparse-index reset --hard &&
>         ensure_not_expanded checkout rename-out-to-out -- deep/deeper1 &&
>         git -C sparse-index reset --hard &&
> -       ensure_not_expanded restore -s rename-out-to-out -- deep/deeper1
> +       ensure_not_expanded restore -s rename-out-to-out -- deep/deeper1 &&
> +
> +       echo >>sparse-index/README.md &&
> +       ensure_not_expanded add -A &&
> +       echo >>sparse-index/extra.txt &&
> +       ensure_not_expanded add extra.txt

...and here's the extra test you mentioned in the commit message.  Looks good.

>  '
>
>  # NEEDSWORK: a sparse-checkout behaves differently from a full checkout
> --
> gitgitgadget

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/5] t1092: test merge conflicts outside cone
  2021-07-23 17:44     ` Eric Sunshine
@ 2021-07-23 17:47       ` Elijah Newren
  0 siblings, 0 replies; 16+ messages in thread
From: Elijah Newren @ 2021-07-23 17:47 UTC (permalink / raw)
  To: Eric Sunshine
  Cc: Derrick Stolee via GitGitGadget, Git Mailing List,
	Junio C Hamano, Matheus Tavares Bernardino, Derrick Stolee,
	Derrick Stolee, Derrick Stolee

On Fri, Jul 23, 2021 at 10:44 AM Eric Sunshine <sunshine@sunshineco.com> wrote:
>
> On Fri, Jul 23, 2021 at 1:34 PM Elijah Newren <newren@gmail.com> wrote:
> > On Wed, Jul 21, 2021 at 2:07 PM Derrick Stolee via GitGitGadget
> > <gitgitgadget@gmail.com> wrote:
> > > +               for side in left right
> > > +               do
> > > +                       git checkout -b merge-$side base &&
> > > +                       echo $side >>deep/deeper2/a &&
> > > +                       echo $side >>folder1/a &&
> > > +                       echo $side >>folder2/a &&
> > > +                       git add . &&
> > > +                       git commit -m "$side" || return 1
> >
> > Why is this "|| return 1" here?
> >
> > It looks like there are a number of other cases of this in the file
> > too, which I must have overlooked previously, because I don't
> > understand any of them.
>
> A shell for-loop won't automatically terminate just because some
> command in its body fails. Instead it will run to completion and
> return the status of the last command of the last iteration, which may
> not be the iteration which failed, thus a failure can be hidden.
> Therefore, we need to proactively stop the loop iteration _and_ ensure
> that the return status of the loop itself reflects the failure, which
> we do by `|| return 1`. (If this loop was inside a subshell, we'd use
> `|| exit 1` instead.)

Ah, thanks for the explanation.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/5] pathspec: stop calling ensure_full_index
  2021-07-21 21:06 ` [PATCH 3/5] pathspec: stop calling ensure_full_index Derrick Stolee via GitGitGadget
@ 2021-07-23 18:17   ` Elijah Newren
  0 siblings, 0 replies; 16+ messages in thread
From: Elijah Newren @ 2021-07-23 18:17 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee, Derrick Stolee

On Wed, Jul 21, 2021 at 2:07 PM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> The add_pathspec_matches_against_index() focuses on matching a pathspec
> to file entries in the index. This already works correctly for its only
> use: checking if untracked files exist in the index.
>
> The compatibility checks in t1092 already test that 'git add <dir>'
> works for a directory outside of the sparse cone. That provides coverage
> for removing this guard.
>
> This finalizes our ability to run 'git add .' without expanding a sparse
> index to a full one. This is evidenced by an update to t1092 and by
> these performance numbers for p2000-sparse-operations.sh:
>
> Test                                    HEAD~1            HEAD
> --------------------------------------------------------------------------------
> 2000.10: git add . (full-index-v3)      0.37(0.28+0.07)   0.36(0.27+0.06) -2.7%
> 2000.11: git add . (full-index-v4)      0.33(0.26+0.06)   0.32(0.28+0.05) -3.0%
> 2000.12: git add . (sparse-index-v3)    0.57(0.53+0.07)   0.06(0.06+0.07) -89.5%
> 2000.13: git add . (sparse-index-v4)    0.57(0.53+0.07)   0.05(0.03+0.09) -91.2%
>
> While the ~90% improvement is shown by the test results, it is worth
> noting that expanding the sparse index was adding overhead in previous
> commits. Comparing to the full index case, we see the performance go
> from 0.33s to 0.05s, an 85% improvement.

These perf improvements are pretty sweet.

> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  pathspec.c                               | 2 --
>  t/t1092-sparse-checkout-compatibility.sh | 7 +++----
>  2 files changed, 3 insertions(+), 6 deletions(-)
>
> diff --git a/pathspec.c b/pathspec.c
> index 08f8d3eedc3..44306fdaca2 100644
> --- a/pathspec.c
> +++ b/pathspec.c
> @@ -37,8 +37,6 @@ void add_pathspec_matches_against_index(const struct pathspec *pathspec,
>                         num_unmatched++;
>         if (!num_unmatched)
>                 return;
> -       /* TODO: audit for interaction with sparse-index. */
> -       ensure_full_index(istate);
>         for (i = 0; i < istate->cache_nr; i++) {
>                 const struct cache_entry *ce = istate->cache[i];
>                 if (sw_action == PS_IGNORE_SKIP_WORKTREE && ce_skip_worktree(ce))
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index a11d9d7f35d..f9e2f5f4aa1 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -322,9 +322,6 @@ test_expect_success 'commit including unstaged changes' '
>  test_expect_success 'status/add: outside sparse cone' '
>         init_repos &&
>
> -       # adding a "missing" file outside the cone should fail
> -       test_sparse_match test_must_fail git add folder1/a &&
> -

So this is removed because of non-matching errors.  In particular,
sparse-checkout shows
"""
The following pathspecs didn't match any eligible path, but they do match index
entries outside the current sparse checkout:
folder1/a
hint: Disable or modify the sparsity rules if you intend to update such entries.
hint: Disable this message with "git config advice.updateSparsePath false"
"""
while sparse-index now shows:
"""
fatal: pathspec 'folder1/a' did not match any files
"""

The new error seems entirely reasonable to me.  No objection here.


But allow me to go on a bit of a diversion...

If we modify this setup slightly by running:

$ mkdir folder1
$ echo garbage >folder1/a
$ git add folder1/a

Then you'll get the first of those errors in both the sparse-index and
the sparse-checkout.  I also like this behavior.

If you unset the SKIP_WORKTREE bit manually, and then add:

$ git update-index --no-skip-worktree folder1/a
$ git add folder1/a

Then the file is added with no error or warning.  I like this behavior too.

If you further change the setup with:

$ echo more garbage >folder1/z
$ git add folder1/z

Then you get no error, despite folder1/z being an untracked file
outside of sparsity paths.  No bueno.  :-(

>         # folder1 is at HEAD, but outside the sparse cone
>         run_on_sparse mkdir folder1 &&
>         cp initial-repo/folder1/a sparse-checkout/folder1/a &&
> @@ -633,7 +630,9 @@ test_expect_success 'sparse-index is not expanded' '
>         echo >>sparse-index/README.md &&
>         ensure_not_expanded add -A &&
>         echo >>sparse-index/extra.txt &&
> -       ensure_not_expanded add extra.txt
> +       ensure_not_expanded add extra.txt &&
> +       echo >>sparse-index/untracked.txt &&
> +       ensure_not_expanded add .

:-)

>  '
>
>  # NEEDSWORK: a sparse-checkout behaves differently from a full checkout
> --

So I added a lot of comments here, in part because I thought I'd test
a bit more of what I said in response to your cover letter and see how
close to it we were.  The patch in question looks fine.

I just added an aside as a convenient place to check whether the
behavior at the end of the series matches what you proposed in the
cover letter, or what I proposed in response.  It appears it matches
neither (though that's not due to this specific patch).

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 5/5] add: ignore outside the sparse-checkout in refresh()
  2021-07-21 21:06 ` [PATCH 5/5] add: ignore outside the sparse-checkout in refresh() Derrick Stolee via GitGitGadget
@ 2021-07-23 19:46   ` Elijah Newren
  0 siblings, 0 replies; 16+ messages in thread
From: Elijah Newren @ 2021-07-23 19:46 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee, Derrick Stolee

On Wed, Jul 21, 2021 at 2:07 PM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> Since b243012 (refresh_index(): add flag to ignore SKIP_WORKTREE
> entries, 2021-04-08), 'git add --refresh <path>' will output a warning
> message when the path is outside the sparse-checkout definition. The
> implementation of this warning happened in parallel with the
> sparse-index work to add ensure_full_index() calls throughout the
> codebase.
>
> Update this loop to have the proper logic that checks to see if the
> pathspec is outside the sparse-checkout definition. This avoids the need
> to expand the sparse directory entry and determine if the path is
> tracked, untracked, or ignored. We simply avoid updating the stat()
> information because there isn't even an entry that matches the path!
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  builtin/add.c                            | 10 +++++++++-
>  t/t1092-sparse-checkout-compatibility.sh |  6 +-----
>  2 files changed, 10 insertions(+), 6 deletions(-)
>
> diff --git a/builtin/add.c b/builtin/add.c
> index c76e6ddd359..d512ece655b 100644
> --- a/builtin/add.c
> +++ b/builtin/add.c
> @@ -192,13 +192,21 @@ static int refresh(int verbose, const struct pathspec *pathspec)
>         struct string_list only_match_skip_worktree = STRING_LIST_INIT_NODUP;
>         int flags = REFRESH_IGNORE_SKIP_WORKTREE |
>                     (verbose ? REFRESH_IN_PORCELAIN : REFRESH_QUIET);
> +       struct pattern_list pl = { 0 };
> +       int sparse_checkout_enabled = !get_sparse_checkout_patterns(&pl);
>
>         seen = xcalloc(pathspec->nr, 1);
>         refresh_index(&the_index, flags, pathspec, seen,
>                       _("Unstaged changes after refreshing the index:"));
>         for (i = 0; i < pathspec->nr; i++) {
>                 if (!seen[i]) {
> -                       if (matches_skip_worktree(pathspec, i, &skip_worktree_seen)) {
> +                       const char *path = pathspec->items[i].original;
> +                       int dtype = DT_REG;
> +
> +                       if (matches_skip_worktree(pathspec, i, &skip_worktree_seen) ||
> +                           (sparse_checkout_enabled &&
> +                            !path_matches_pattern_list(path, strlen(path), NULL,
> +                                                       &dtype, &pl, &the_index))) {

I was slightly worried from the description in the commit message
about the case where you have a file without the SKIP_WORKTREE bit set
despite not matching sparsity paths.  I was worried that you'd skip
refreshing it, but I tweaked your testcases and couldn't trigger it.

>                                 string_list_append(&only_match_skip_worktree,
>                                                    pathspec->items[i].original);
>                         } else {
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index 73c48a71d89..c61424e2074 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -347,7 +347,7 @@ test_expect_success 'status/add: outside sparse cone' '
>         test_all_match git commit -m folder1/newer
>  '
>
> -test_expect_failure 'add: pathspec within sparse directory' '
> +test_expect_success 'add: pathspec within sparse directory' '
>         init_repos &&
>
>         run_on_sparse mkdir folder1 &&
> @@ -357,10 +357,6 @@ test_expect_failure 'add: pathspec within sparse directory' '
>         # This "git add folder1/a" fails with a warning
>         # in the sparse repos, differing from the full
>         # repo. This is intentional.
> -       #
> -       # However, in the sparse-index, folder1/a does not
> -       # match any cache entry and fails with a different
> -       # error message. This needs work.
>         test_sparse_match test_must_fail git add folder1/a &&
>         test_sparse_match test_must_fail git add --refresh folder1/a &&
>         test_all_match git status --porcelain=v2
> --
> gitgitgadget

This and Patch 4/5 look good to me.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/5] Sparse Index: Integrate with 'git add'
  2021-07-23 12:51 ` [PATCH 0/5] Sparse Index: Integrate with 'git add' Elijah Newren
@ 2021-07-23 20:10   ` Elijah Newren
  0 siblings, 0 replies; 16+ messages in thread
From: Elijah Newren @ 2021-07-23 20:10 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee

On Fri, Jul 23, 2021 at 5:51 AM Elijah Newren <newren@gmail.com> wrote:
>
> Hi,
>
> I haven't read the series yet (I'm going to try to read them all
> today), but wanted to comment on a couple things in your cover
> letter...
>
> On Wed, Jul 21, 2021 at 2:07 PM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
> >
> > This patch series re-submits the 'git add' integration with sparse-index.
> > The performance gains are the same as before.
> >
> > It is based on ds/commit-and-checkout-with-sparse-index.
> >
> > This series was delayed from its initial submission for a couple reasons.
> >
> > The first was because it was colliding with some changes in
> > mt/add-rm-in-sparse-checkout, so now we are far enough along that that
> > branch is in our history and we can work forwards.
> >
> > The other concern was about how 'git add ' should respond when a path
> > outside of the sparse-checkout cone exists. One recommendation (that I am
> > failing to find a link to the message, sorry) was to disallow adding files
> > that would become index entries with SKIP_WORKTREE on. However, as I worked
>
> I think the recommendation was:
>   * Permitted: running git add on tracked files with the SKIP_WORKTREE
> bit *clear*
>   * Disallowed: running git add on tracked files with the
> SKIP_WORKTREE bit *set*
>   * Disallowed: running git add on untracked files that would become
> index entries with the SKIP_WORKTREE bit set
>
> where the latter two exit with an error message that suggests changing
> the sparsity specification first.  I think this is what has existed
> for some time, other than Matheus adding some error messages to help
> the user when their add command doesn't match anything otherwise.
>
> > towards that goal I found that change would cause problems for a realistic
> > scenario: merge conflicts outside of the sparse-checkout cone.
>
> ...which wouldn't be a problem because these are tracked files whose
> SKIP_WORKTREE bit was cleared by the merge machinery (when it marked
> them as conflicted and wrote them to the working tree).
>
> > The first patch of this series adds tests that create merge conflicts
> > outside of the sparse cone and then presents different ways a user could
> > resolve the situation. We want all of them to be feasible, and this
> > includes:
> >
> >  1. Reverting the file to a known version in history.
> >  2. Adding the file with its contents on disk.
> >  3. Moving the file to a new location in the sparse directory.
> >
> > Without maintaining the behavior of adding files outside of the
> > sparse-checkout cone, we risk confusing users who get into this state. The
> > only workaround they would have is to modify their sparse-checkout
> > definition which might lead to expanding much more data than they need to
> > resolve the conflicts.
> >
> > For that reason, I stopped trying to limit 'git add' to be within the cone.
> > I'm open to suggestions here, but we need an approach that works for
> > out-of-cone conflicts.
>
> I believe my above suggestion works for out-of-cone conflicts.  Some
> important other details to keep in mind in regards to how we make add
> behave:
>
> * We don't want "git add -A [GLOB_OR_DIRECTORY]" to accidentally be
> treated as a directive to remove files from the repository (and
> naively noticing that SKIP_WORKTREE files are missing but attempting
> to operate on them anyway would give this problematic result).
> * We don't want "git rm [GLOB_OR_DIRECTORY]" to nuke SKIP_WORKTREE
> files; it should only operate on files that are present.
> * We need add and rm to be consistent with each other in terms of how
> SKIP_WORKTREE files and the sparsity cone are treated
>
> and more generally about not-SKIP_WORKTREE-despite-not-matching-sparsity-paths:
>
> * These files that aren't SKIP_WORKTREE but normally would be are
> prone to "disappear" at some random later time after they are made to
> match the index.  The disappearing can happen either with an explicit
> "git sparse-checkout reapply" (which is fine since it was explicit) or
> as a side-effect of various other commands that run through
> unpack_trees() since it attempts to heed the sparsity rules.  Users
> tend to get confused by the latter case; they'll understand at some
> point that it was because the file was outside the sparsity paths, but
> the randomness in when it's pulled out as a side-effect of other
> commands can be slightly jarring.  So, I'd like to avoid that where we
> can easily do so, which I think the recommendation above does.  (As a
> side note to these kinds of cases, maybe we want to consider having
> the sparsification logic in unpack_trees() first check whether paths
> being removed from the working tree and having their SKIP_WORKTREE bit
> set not only match the index but also match HEAD?  That'd be a bit
> more expensive to check in the sparsification paths, and I'm not sure
> they all have the relevant information, but it's an idea...)
>
> > The one place I did continue to update is 'git add --refresh ' to match the
> > behavior added by mt/add-rm-in-sparse-checkout which outputs an error
> > message. This happens even when the file exists in the working directory,
> > but that seems appropriate enough.

I've read through the patches now.  They look pretty good, and the
performance improvements are definitely appealing.

My comments on patch 1/5 probably reflect the differences in proposed
paths between your cover letter and my proposal.  My comments on patch
3/5 point out that the current behavior matches neither your
description in your cover letter (or my understanding thereof), nor
mine in my earlier response.  However, it also suggests that the
differences between my proposal and the code isn't due to your series
but a pre-existing issue, and thus wouldn't necessarily need to hold
your series up.

My comments on patch 2/5, asking about why you removed a hunk are thus
the biggest outstanding item for the series to move forward.

It would be nice for us to figure out and agree on the end goal for
add/rm/mv behavior, though.

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2021-07-23 20:10 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-21 21:06 [PATCH 0/5] Sparse Index: Integrate with 'git add' Derrick Stolee via GitGitGadget
2021-07-21 21:06 ` [PATCH 1/5] t1092: test merge conflicts outside cone Derrick Stolee via GitGitGadget
2021-07-23 17:34   ` Elijah Newren
2021-07-23 17:44     ` Eric Sunshine
2021-07-23 17:47       ` Elijah Newren
2021-07-21 21:06 ` [PATCH 2/5] add: allow operating on a sparse-only index Derrick Stolee via GitGitGadget
2021-07-21 22:19   ` Junio C Hamano
2021-07-21 22:50     ` Derrick Stolee
2021-07-23 17:45   ` Elijah Newren
2021-07-21 21:06 ` [PATCH 3/5] pathspec: stop calling ensure_full_index Derrick Stolee via GitGitGadget
2021-07-23 18:17   ` Elijah Newren
2021-07-21 21:06 ` [PATCH 4/5] t1092: 'git add --refresh' difference with sparse-index Derrick Stolee via GitGitGadget
2021-07-21 21:06 ` [PATCH 5/5] add: ignore outside the sparse-checkout in refresh() Derrick Stolee via GitGitGadget
2021-07-23 19:46   ` Elijah Newren
2021-07-23 12:51 ` [PATCH 0/5] Sparse Index: Integrate with 'git add' Elijah Newren
2021-07-23 20:10   ` Elijah Newren

git@vger.kernel.org list mirror (unofficial, one of many)

This inbox may be cloned and mirrored by anyone:

	git clone --mirror https://public-inbox.org/git
	git clone --mirror http://ou63pmih66umazou.onion/git
	git clone --mirror http://czquwvybam4bgbro.onion/git
	git clone --mirror http://hjrcffqmbrq6wope.onion/git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V1 git git/ https://public-inbox.org/git \
		git@vger.kernel.org
	public-inbox-index git

Example config snippet for mirrors.
Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.version-control.git
	nntp://7fh6tueqddpjyxjmgtdiueylzoqt6pt7hec3pukyptlmohoowvhde4yd.onion/inbox.comp.version-control.git
	nntp://ie5yzdi7fg72h7s4sdcztq5evakq23rdt33mfyfcddc5u3ndnw24ogqd.onion/inbox.comp.version-control.git
	nntp://4uok3hntl7oi7b4uf4rtfwefqeexfzil2w6kgk2jn5z2f764irre7byd.onion/inbox.comp.version-control.git
	nntp://news.gmane.io/gmane.comp.version-control.git
 note: .onion URLs require Tor: https://www.torproject.org/

code repositories for project(s) associated with this inbox:

	https://80x24.org/mirrors/git.git

AGPL code for this site: git clone https://public-inbox.org/public-inbox.git