git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH v1 0/4] rm: integrate with sparse-index
@ 2022-08-03  4:51 Shaoxuan Yuan
  2022-08-03  4:51 ` [PATCH v1 1/4] t1092: add tests for `git-rm` Shaoxuan Yuan
                   ` (4 more replies)
  0 siblings, 5 replies; 25+ messages in thread
From: Shaoxuan Yuan @ 2022-08-03  4:51 UTC (permalink / raw)
  To: git; +Cc: vdye, derrickstolee, Shaoxuan Yuan

Turn on sparse-index feature within `git-rm` command.
Add necessary modifications and test them.

Shaoxuan Yuan (4):
  t1092: add tests for `git-rm`
  pathspec.h: move pathspec_needs_expanded_index() from reset.c to here
  rm: expand the index only when necessary
  rm: integrate with sparse-index

 builtin/reset.c                          | 84 +---------------------
 builtin/rm.c                             |  7 +-
 pathspec.c                               | 89 ++++++++++++++++++++++++
 pathspec.h                               | 12 ++++
 t/perf/p2000-sparse-operations.sh        |  1 +
 t/t1092-sparse-checkout-compatibility.sh | 73 ++++++++++++++++++-
 6 files changed, 180 insertions(+), 86 deletions(-)


base-commit: 350dc9f0e8974b6fcbdeb3808186c5a79c3e7386
-- 
2.37.0


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v1 1/4] t1092: add tests for `git-rm`
  2022-08-03  4:51 [PATCH v1 0/4] rm: integrate with sparse-index Shaoxuan Yuan
@ 2022-08-03  4:51 ` Shaoxuan Yuan
  2022-08-03 14:32   ` Derrick Stolee
  2022-08-03  4:51 ` [PATCH v1 2/4] pathspec.h: move pathspec_needs_expanded_index() from reset.c to here Shaoxuan Yuan
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 25+ messages in thread
From: Shaoxuan Yuan @ 2022-08-03  4:51 UTC (permalink / raw)
  To: git; +Cc: vdye, derrickstolee, Shaoxuan Yuan

Add tests for `git-rm`, make sure it behaves as expected when
<pathspec> is both inside or outside of sparse-checkout definition.

Also add ensure_not_expanded test to make sure `git-rm` does not
accidentally expand the index when <pathspec> is within the
sparse-checkout definition.

Signed-off-by: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com>
---
 t/t1092-sparse-checkout-compatibility.sh | 71 ++++++++++++++++++++++++
 1 file changed, 71 insertions(+)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 763c6cc684..75649e3265 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -1853,4 +1853,75 @@ test_expect_success 'mv directory from out-of-cone to in-cone' '
 	grep -e "H deep/0/1" actual
 '
 
+test_expect_success 'rm pathspec inside sparse definition' '
+	init_repos &&
+
+	test_all_match git rm deep/a &&
+	test_all_match git status --porcelain=v2 &&
+
+	# test wildcard
+	run_on_all git reset --hard &&
+	test_all_match git rm deep/* &&
+	test_all_match git status --porcelain=v2 &&
+
+	# test recursive rm
+	run_on_all git reset --hard &&
+	test_all_match git rm -r deep &&
+	test_all_match git status --porcelain=v2
+'
+
+test_expect_failure 'rm pathspec outside sparse definition' '
+	init_repos &&
+
+	for file in folder1/a folder1/0/1
+	do
+		test_sparse_match test_must_fail git rm $file &&
+		test_sparse_match test_must_fail git rm --cached $file &&
+		test_sparse_match git rm --sparse $file &&
+		test_sparse_match git status --porcelain=v2
+	done &&
+
+	cat >folder1-full <<-EOF &&
+	rm ${SQ}folder1/0/0/0${SQ}
+	rm ${SQ}folder1/0/1${SQ}
+	rm ${SQ}folder1/a${SQ}
+	EOF
+
+	cat >folder1-sparse <<-EOF &&
+	rm ${SQ}folder1/${SQ}
+	EOF
+
+	# test wildcard
+	run_on_sparse git reset --hard &&
+	run_on_sparse git sparse-checkout reapply &&
+	test_sparse_match test_must_fail git rm folder1/* &&
+	run_on_sparse git rm --sparse folder1/* &&
+	test_cmp folder1-full sparse-checkout-out &&
+	test_cmp folder1-sparse sparse-index-out &&
+	test_sparse_match git status --porcelain=v2 &&
+
+	# test recursive rm
+	run_on_sparse git reset --hard &&
+	run_on_sparse git sparse-checkout reapply &&
+	test_sparse_match test_must_fail git rm --sparse folder1 &&
+	run_on_sparse git rm --sparse -r folder1 &&
+	test_cmp folder1-full sparse-checkout-out &&
+	test_cmp folder1-sparse sparse-index-out &&
+	test_sparse_match git status --porcelain=v2
+'
+
+test_expect_failure 'sparse index is not expanded: rm' '
+	init_repos &&
+
+	ensure_not_expanded rm deep/a &&
+
+	# test in-cone wildcard
+	git -C sparse-index reset --hard &&
+	ensure_not_expanded rm deep/* &&
+
+	# test recursive rm
+	git -C sparse-index reset --hard &&
+	ensure_not_expanded rm -r deep
+'
+
 test_done
-- 
2.37.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v1 2/4] pathspec.h: move pathspec_needs_expanded_index() from reset.c to here
  2022-08-03  4:51 [PATCH v1 0/4] rm: integrate with sparse-index Shaoxuan Yuan
  2022-08-03  4:51 ` [PATCH v1 1/4] t1092: add tests for `git-rm` Shaoxuan Yuan
@ 2022-08-03  4:51 ` Shaoxuan Yuan
  2022-08-03 14:35   ` Derrick Stolee
  2022-08-03  4:51 ` [PATCH v1 3/4] rm: expand the index only when necessary Shaoxuan Yuan
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 25+ messages in thread
From: Shaoxuan Yuan @ 2022-08-03  4:51 UTC (permalink / raw)
  To: git; +Cc: vdye, derrickstolee, Shaoxuan Yuan

Method pathspec_needs_expanded_index() in reset.c from 4d1cfc1351
(reset: make --mixed sparse-aware, 2021-11-29) is reusable when we
need to verify if the index needs to be expanded when the command
is utilizing a pathspec rather than a literal path.

Move it to pathspec.h for reusability.

Add a few items to the function so it can better serve its purpose as
a standalone public function:

* Add a check in front so if the index is not sparse, return early since
  no expansion is needed.

* Add documentation to the function.

Signed-off-by: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com>
---
 builtin/reset.c | 84 +---------------------------------------------
 pathspec.c      | 89 +++++++++++++++++++++++++++++++++++++++++++++++++
 pathspec.h      | 12 +++++++
 3 files changed, 102 insertions(+), 83 deletions(-)

diff --git a/builtin/reset.c b/builtin/reset.c
index 344fff8f3a..fdce6f8c85 100644
--- a/builtin/reset.c
+++ b/builtin/reset.c
@@ -174,88 +174,6 @@ static void update_index_from_diff(struct diff_queue_struct *q,
 	}
 }
 
-static int pathspec_needs_expanded_index(const struct pathspec *pathspec)
-{
-	unsigned int i, pos;
-	int res = 0;
-	char *skip_worktree_seen = NULL;
-
-	/*
-	 * When using a magic pathspec, assume for the sake of simplicity that
-	 * the index needs to be expanded to match all matchable files.
-	 */
-	if (pathspec->magic)
-		return 1;
-
-	for (i = 0; i < pathspec->nr; i++) {
-		struct pathspec_item item = pathspec->items[i];
-
-		/*
-		 * If the pathspec item has a wildcard, the index should be expanded
-		 * if the pathspec has the possibility of matching a subset of entries inside
-		 * of a sparse directory (but not the entire directory).
-		 *
-		 * If the pathspec item is a literal path, the index only needs to be expanded
-		 * if a) the pathspec isn't in the sparse checkout cone (to make sure we don't
-		 * expand for in-cone files) and b) it doesn't match any sparse directories
-		 * (since we can reset whole sparse directories without expanding them).
-		 */
-		if (item.nowildcard_len < item.len) {
-			/*
-			 * Special case: if the pattern is a path inside the cone
-			 * followed by only wildcards, the pattern cannot match
-			 * partial sparse directories, so we know we don't need to
-			 * expand the index.
-			 *
-			 * Examples:
-			 * - in-cone/foo***: doesn't need expanded index
-			 * - not-in-cone/bar*: may need expanded index
-			 * - **.c: may need expanded index
-			 */
-			if (strspn(item.original + item.nowildcard_len, "*") == item.len - item.nowildcard_len &&
-			    path_in_cone_mode_sparse_checkout(item.original, &the_index))
-				continue;
-
-			for (pos = 0; pos < active_nr; pos++) {
-				struct cache_entry *ce = active_cache[pos];
-
-				if (!S_ISSPARSEDIR(ce->ce_mode))
-					continue;
-
-				/*
-				 * If the pre-wildcard length is longer than the sparse
-				 * directory name and the sparse directory is the first
-				 * component of the pathspec, need to expand the index.
-				 */
-				if (item.nowildcard_len > ce_namelen(ce) &&
-				    !strncmp(item.original, ce->name, ce_namelen(ce))) {
-					res = 1;
-					break;
-				}
-
-				/*
-				 * If the pre-wildcard length is shorter than the sparse
-				 * directory and the pathspec does not match the whole
-				 * directory, need to expand the index.
-				 */
-				if (!strncmp(item.original, ce->name, item.nowildcard_len) &&
-				    wildmatch(item.original, ce->name, 0)) {
-					res = 1;
-					break;
-				}
-			}
-		} else if (!path_in_cone_mode_sparse_checkout(item.original, &the_index) &&
-			   !matches_skip_worktree(pathspec, i, &skip_worktree_seen))
-			res = 1;
-
-		if (res > 0)
-			break;
-	}
-
-	free(skip_worktree_seen);
-	return res;
-}
-
 static int read_from_tree(const struct pathspec *pathspec,
 			  struct object_id *tree_oid,
 			  int intent_to_add)
@@ -273,7 +191,7 @@ static int read_from_tree(const struct pathspec *pathspec,
 	opt.change = diff_change;
 	opt.add_remove = diff_addremove;
 
-	if (pathspec->nr && the_index.sparse_index && pathspec_needs_expanded_index(pathspec))
+	if (pathspec->nr && pathspec_needs_expanded_index(&the_index, pathspec))
 		ensure_full_index(&the_index);
 
 	if (do_diff_cache(tree_oid, &opt))
diff --git a/pathspec.c b/pathspec.c
index 84ad9c73cf..46e77a85fe 100644
--- a/pathspec.c
+++ b/pathspec.c
@@ -759,3 +759,92 @@ int match_pathspec_attrs(struct index_state *istate,
 
 	return 1;
 }
+
+int pathspec_needs_expanded_index(struct index_state *istate,
+				  const struct pathspec *pathspec)
+{
+	unsigned int i, pos;
+	int res = 0;
+	char *skip_worktree_seen = NULL;
+
+	/*
+	 * If index is not sparse, no index expansion is needed.
+	 */
+	if (!istate->sparse_index)
+		return 0;
+
+	/*
+	 * When using a magic pathspec, assume for the sake of simplicity that
+	 * the index needs to be expanded to match all matchable files.
+	 */
+	if (pathspec->magic)
+		return 1;
+
+	for (i = 0; i < pathspec->nr; i++) {
+		struct pathspec_item item = pathspec->items[i];
+
+		/*
+		 * If the pathspec item has a wildcard, the index should be expanded
+		 * if the pathspec has the possibility of matching a subset of entries inside
+		 * of a sparse directory (but not the entire directory).
+		 *
+		 * If the pathspec item is a literal path, the index only needs to be expanded
+		 * if a) the pathspec isn't in the sparse checkout cone (to make sure we don't
+		 * expand for in-cone files) and b) it doesn't match any sparse directories
+		 * (since we can reset whole sparse directories without expanding them).
+		 */
+		if (item.nowildcard_len < item.len) {
+			/*
+			 * Special case: if the pattern is a path inside the cone
+			 * followed by only wildcards, the pattern cannot match
+			 * partial sparse directories, so we know we don't need to
+			 * expand the index.
+			 *
+			 * Examples:
+			 * - in-cone/foo***: doesn't need expanded index
+			 * - not-in-cone/bar*: may need expanded index
+			 * - **.c: may need expanded index
+			 */
+			if (strspn(item.original + item.nowildcard_len, "*") == item.len - item.nowildcard_len &&
+			    path_in_cone_mode_sparse_checkout(item.original, istate))
+				continue;
+
+			for (pos = 0; pos < istate->cache_nr; pos++) {
+				struct cache_entry *ce = istate->cache[pos];
+
+				if (!S_ISSPARSEDIR(ce->ce_mode))
+					continue;
+
+				/*
+				 * If the pre-wildcard length is longer than the sparse
+				 * directory name and the sparse directory is the first
+				 * component of the pathspec, need to expand the index.
+				 */
+				if (item.nowildcard_len > ce_namelen(ce) &&
+				    !strncmp(item.original, ce->name, ce_namelen(ce))) {
+					res = 1;
+					break;
+				}
+
+				/*
+				 * If the pre-wildcard length is shorter than the sparse
+				 * directory and the pathspec does not match the whole
+				 * directory, need to expand the index.
+				 */
+				if (!strncmp(item.original, ce->name, item.nowildcard_len) &&
+				    wildmatch(item.original, ce->name, 0)) {
+					res = 1;
+					break;
+				}
+			}
+		} else if (!path_in_cone_mode_sparse_checkout(item.original, istate) &&
+			   !matches_skip_worktree(pathspec, i, &skip_worktree_seen))
+			res = 1;
+
+		if (res > 0)
+			break;
+	}
+
+	free(skip_worktree_seen);
+	return res;
+}
diff --git a/pathspec.h b/pathspec.h
index 402ebb8080..41f6adfbb4 100644
--- a/pathspec.h
+++ b/pathspec.h
@@ -171,4 +171,16 @@ int match_pathspec_attrs(struct index_state *istate,
 			 const char *name, int namelen,
 			 const struct pathspec_item *item);
 
+/*
+ * Determine whether a pathspec will match only entire index entries (non-sparse
+ * files and/or entire sparse directories). If the pathspec has the potential to
+ * match partial contents of a sparse directory, return 1 to indicate the index
+ * should be expanded to match the  appropriate index entries.
+ *
+ * For the sake of simplicity, always return 1 if using a more complex "magic"
+ * pathspec.
+ */
+int pathspec_needs_expanded_index(struct index_state *istate,
+				  const struct pathspec *pathspec);
+
 #endif /* PATHSPEC_H */
-- 
2.37.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v1 3/4] rm: expand the index only when necessary
  2022-08-03  4:51 [PATCH v1 0/4] rm: integrate with sparse-index Shaoxuan Yuan
  2022-08-03  4:51 ` [PATCH v1 1/4] t1092: add tests for `git-rm` Shaoxuan Yuan
  2022-08-03  4:51 ` [PATCH v1 2/4] pathspec.h: move pathspec_needs_expanded_index() from reset.c to here Shaoxuan Yuan
@ 2022-08-03  4:51 ` Shaoxuan Yuan
  2022-08-03 14:40   ` Derrick Stolee
  2022-08-03  4:51 ` [PATCH v1 4/4] rm: integrate with sparse-index Shaoxuan Yuan
  2022-08-07  4:13 ` [PATCH v2 0/4] " Shaoxuan Yuan
  4 siblings, 1 reply; 25+ messages in thread
From: Shaoxuan Yuan @ 2022-08-03  4:51 UTC (permalink / raw)
  To: git; +Cc: vdye, derrickstolee, Shaoxuan Yuan

Originally, rm a pathspec that is out-of-cone in a sparse-index
environment, Git dies with "pathspec '<x>' did not match any files",
mainly because it does not expand the index so nothing is matched.

Remove the `ensure_full_index()` method so `git-rm` does not always
expand the index when the expansion is unnecessary, i.e. when
<pathspec> does not have any possibilities to match anything outside
of sparse-checkout definition.

Expand the index when the <pathspec> needs an expanded index, i.e. the
<pathspec> contains wildcard that may need a full-index or the
<pathspec> is simply outside of sparse-checkout definition.

Signed-off-by: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com>
---
 builtin/rm.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/builtin/rm.c b/builtin/rm.c
index 84a935a16e..58ed924f0d 100644
--- a/builtin/rm.c
+++ b/builtin/rm.c
@@ -296,8 +296,9 @@ int cmd_rm(int argc, const char **argv, const char *prefix)
 
 	seen = xcalloc(pathspec.nr, 1);
 
-	/* TODO: audit for interaction with sparse-index. */
-	ensure_full_index(&the_index);
+	if (pathspec_needs_expanded_index(&the_index, &pathspec))
+		ensure_full_index(&the_index);
+
 	for (i = 0; i < active_nr; i++) {
 		const struct cache_entry *ce = active_cache[i];
 
-- 
2.37.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v1 4/4] rm: integrate with sparse-index
  2022-08-03  4:51 [PATCH v1 0/4] rm: integrate with sparse-index Shaoxuan Yuan
                   ` (2 preceding siblings ...)
  2022-08-03  4:51 ` [PATCH v1 3/4] rm: expand the index only when necessary Shaoxuan Yuan
@ 2022-08-03  4:51 ` Shaoxuan Yuan
  2022-08-04 14:48   ` Derrick Stolee
  2022-08-07  4:13 ` [PATCH v2 0/4] " Shaoxuan Yuan
  4 siblings, 1 reply; 25+ messages in thread
From: Shaoxuan Yuan @ 2022-08-03  4:51 UTC (permalink / raw)
  To: git; +Cc: vdye, derrickstolee, Shaoxuan Yuan

Enable the sparse index within the `git-rm` command.

The `p2000` tests demonstrate a ~96% execution time reduction for
'git rm' using a sparse index.

Test                                     before  after
-------------------------------------------------------------
2000.74: git rm -f f2/f4/a (full-v3)     0.66    0.88 +33.0%
2000.75: git rm -f f2/f4/a (full-v4)     0.67    0.75 +12.0%
2000.76: git rm -f f2/f4/a (sparse-v3)   1.99    0.08 -96.0%
2000.77: git rm -f f2/f4/a (sparse-v4)   2.06    0.07 -96.6%

Also, normalize a behavioral difference of `git-rm` under sparse-index.
See related discussion [1].

`git-rm` a sparse-directory entry within a sparse-index enabled repo
behaves differently from a sparse directory within a sparse-checkout
enabled repo.

For example, in a sparse-index repo, where 'folder1' is a
sparse-directory entry, `git rm -r --sparse folder1` provides this:

        rm 'folder1/'

Whereas in a sparse-checkout repo *without* sparse-index, doing so
provides this:

        rm 'folder1/0/0/0'
        rm 'folder1/0/1'
        rm 'folder1/a'

Because `git rm` a sparse-directory entry does not need to expand the
index, therefore we should accept the current behavior, which is faster
than "expand the sparse-directory entry to match the sparse-checkout
situation".

Modify a previous test so such difference is not considered as an error.

[1] https://github.com/ffyuanda/git/pull/6#discussion_r934861398

Signed-off-by: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com>
---
 builtin/rm.c                             | 2 ++
 t/perf/p2000-sparse-operations.sh        | 1 +
 t/t1092-sparse-checkout-compatibility.sh | 6 +++---
 3 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/builtin/rm.c b/builtin/rm.c
index 58ed924f0d..b6ba859fe4 100644
--- a/builtin/rm.c
+++ b/builtin/rm.c
@@ -287,6 +287,8 @@ int cmd_rm(int argc, const char **argv, const char *prefix)
 	if (!index_only)
 		setup_work_tree();
 
+	prepare_repo_settings(the_repository);
+	the_repository->settings.command_requires_full_index = 0;
 	hold_locked_index(&lock_file, LOCK_DIE_ON_ERROR);
 
 	if (read_cache() < 0)
diff --git a/t/perf/p2000-sparse-operations.sh b/t/perf/p2000-sparse-operations.sh
index c181110a43..853513eb9b 100755
--- a/t/perf/p2000-sparse-operations.sh
+++ b/t/perf/p2000-sparse-operations.sh
@@ -123,5 +123,6 @@ test_perf_on_all git blame $SPARSE_CONE/f3/a
 test_perf_on_all git read-tree -mu HEAD
 test_perf_on_all git checkout-index -f --all
 test_perf_on_all git update-index --add --remove $SPARSE_CONE/a
+test_perf_on_all git rm -f $SPARSE_CONE/a
 
 test_done
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 75649e3265..58632fe483 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -912,7 +912,7 @@ test_expect_success 'read-tree --prefix' '
 	test_all_match git read-tree --prefix=deep/deeper1/deepest -u deepest &&
 	test_all_match git status --porcelain=v2 &&
 
-	test_all_match git rm -rf --sparse folder1/ &&
+	run_on_all git rm -rf --sparse folder1/ &&
 	test_all_match git read-tree --prefix=folder1/ -u update-folder1 &&
 	test_all_match git status --porcelain=v2 &&
 
@@ -1870,7 +1870,7 @@ test_expect_success 'rm pathspec inside sparse definition' '
 	test_all_match git status --porcelain=v2
 '
 
-test_expect_failure 'rm pathspec outside sparse definition' '
+test_expect_success 'rm pathspec outside sparse definition' '
 	init_repos &&
 
 	for file in folder1/a folder1/0/1
@@ -1910,7 +1910,7 @@ test_expect_failure 'rm pathspec outside sparse definition' '
 	test_sparse_match git status --porcelain=v2
 '
 
-test_expect_failure 'sparse index is not expanded: rm' '
+test_expect_success 'sparse index is not expanded: rm' '
 	init_repos &&
 
 	ensure_not_expanded rm deep/a &&
-- 
2.37.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH v1 1/4] t1092: add tests for `git-rm`
  2022-08-03  4:51 ` [PATCH v1 1/4] t1092: add tests for `git-rm` Shaoxuan Yuan
@ 2022-08-03 14:32   ` Derrick Stolee
  0 siblings, 0 replies; 25+ messages in thread
From: Derrick Stolee @ 2022-08-03 14:32 UTC (permalink / raw)
  To: Shaoxuan Yuan, git; +Cc: vdye

On 8/3/2022 12:51 AM, Shaoxuan Yuan wrote:
> Add tests for `git-rm`, make sure it behaves as expected when
> <pathspec> is both inside or outside of sparse-checkout definition.

This is good to demonstrate that we already have feature parity,
even if it is because we expand the sparse index immediately.
 
> Also add ensure_not_expanded test to make sure `git-rm` does not
> accidentally expand the index when <pathspec> is within the
> sparse-checkout definition.

> +test_expect_failure 'sparse index is not expanded: rm' '
> +	init_repos &&
> +
> +	ensure_not_expanded rm deep/a &&
> +
> +	# test in-cone wildcard
> +	git -C sparse-index reset --hard &&
> +	ensure_not_expanded rm deep/* &&
> +
> +	# test recursive rm
> +	git -C sparse-index reset --hard &&
> +	ensure_not_expanded rm -r deep
> +'
> +

Instead of adding a test_expect_failure here, I would wait to add
this as a test_expect_success in patch 4.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v1 2/4] pathspec.h: move pathspec_needs_expanded_index() from reset.c to here
  2022-08-03  4:51 ` [PATCH v1 2/4] pathspec.h: move pathspec_needs_expanded_index() from reset.c to here Shaoxuan Yuan
@ 2022-08-03 14:35   ` Derrick Stolee
  2022-08-05  7:53     ` Shaoxuan Yuan
  0 siblings, 1 reply; 25+ messages in thread
From: Derrick Stolee @ 2022-08-03 14:35 UTC (permalink / raw)
  To: Shaoxuan Yuan, git; +Cc: vdye

On 8/3/2022 12:51 AM, Shaoxuan Yuan wrote:
> Method pathspec_needs_expanded_index() in reset.c from 4d1cfc1351
> (reset: make --mixed sparse-aware, 2021-11-29) is reusable when we
> need to verify if the index needs to be expanded when the command
> is utilizing a pathspec rather than a literal path.
> 
> Move it to pathspec.h for reusability.
> 
> Add a few items to the function so it can better serve its purpose as
> a standalone public function:
> 
> * Add a check in front so if the index is not sparse, return early since
>   no expansion is needed.
> 
> * Add documentation to the function.

I took a look at this diff on my machine with --color-moved, which
highlighted the other valuable thing about this move: it takes an
arbitrary 'struct index_state' pointer instead of using the_index and
active_cache. These are good things that might be worth mentioning in
the commit message.

Thanks,
-Stolee


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v1 3/4] rm: expand the index only when necessary
  2022-08-03  4:51 ` [PATCH v1 3/4] rm: expand the index only when necessary Shaoxuan Yuan
@ 2022-08-03 14:40   ` Derrick Stolee
  2022-08-05  8:07     ` Shaoxuan Yuan
  0 siblings, 1 reply; 25+ messages in thread
From: Derrick Stolee @ 2022-08-03 14:40 UTC (permalink / raw)
  To: Shaoxuan Yuan, git; +Cc: vdye

On 8/3/2022 12:51 AM, Shaoxuan Yuan wrote:
> Originally, rm a pathspec that is out-of-cone in a sparse-index
> environment, Git dies with "pathspec '<x>' did not match any files",
> mainly because it does not expand the index so nothing is matched.

This paragraph appears to be assuming that we've stopped expanding the
sparse index already. It might be worthwhile to rewrite this to say
"Before integrating 'git rm' with the sparse index, we need to..." or
something like that. 

> Remove the `ensure_full_index()` method so `git-rm` does not always
> expand the index when the expansion is unnecessary, i.e. when
> <pathspec> does not have any possibilities to match anything outside
> of sparse-checkout definition.
> 
> Expand the index when the <pathspec> needs an expanded index, i.e. the
> <pathspec> contains wildcard that may need a full-index or the
> <pathspec> is simply outside of sparse-checkout definition.
> 
> Signed-off-by: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com>
> ---
>  builtin/rm.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/builtin/rm.c b/builtin/rm.c
> index 84a935a16e..58ed924f0d 100644
> --- a/builtin/rm.c
> +++ b/builtin/rm.c
> @@ -296,8 +296,9 @@ int cmd_rm(int argc, const char **argv, const char *prefix)
>  
>  	seen = xcalloc(pathspec.nr, 1);
>  
> -	/* TODO: audit for interaction with sparse-index. */
> -	ensure_full_index(&the_index);
> +	if (pathspec_needs_expanded_index(&the_index, &pathspec))
> +		ensure_full_index(&the_index);
> +
>  	for (i = 0; i < active_nr; i++) {
>  		const struct cache_entry *ce = active_cache[i];

Looking back on the tests in patch 1, I don't see any tests that really
emphasize the kinds of pathspecs that could not ever integrate with the
sparse index. They are all of the form "folder1/*" or similar, making it
be something that could be seen as a prefix match. Such a pattern _could_
be integrated carefully with the sparse index.

Instead, something like `git rm "*/a"` would be much harder to integrate
with the sparse index. Could we add a test (in this patch) that checks
that kind of case. That would also help justify this as its own patch and
not squashed with patch 4.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v1 4/4] rm: integrate with sparse-index
  2022-08-03  4:51 ` [PATCH v1 4/4] rm: integrate with sparse-index Shaoxuan Yuan
@ 2022-08-04 14:48   ` Derrick Stolee
  2022-08-06  3:18     ` Shaoxuan Yuan
  0 siblings, 1 reply; 25+ messages in thread
From: Derrick Stolee @ 2022-08-04 14:48 UTC (permalink / raw)
  To: Shaoxuan Yuan, git; +Cc: vdye

On 8/3/2022 12:51 AM, Shaoxuan Yuan wrote:
> Enable the sparse index within the `git-rm` command.
> 
> The `p2000` tests demonstrate a ~96% execution time reduction for
> 'git rm' using a sparse index.

Sorry that I got sidetracked yesterday when I was reviewing this
series, but I noticed something looking at these results:
 
> Test                                     before  after
> -------------------------------------------------------------
> 2000.74: git rm -f f2/f4/a (full-v3)     0.66    0.88 +33.0%
> 2000.75: git rm -f f2/f4/a (full-v4)     0.67    0.75 +12.0%

The range of _growth_ here seemed odd, so I wanted to check if this was
due to a small sample size or not.

> 2000.76: git rm -f f2/f4/a (sparse-v3)   1.99    0.08 -96.0%
> 2000.77: git rm -f f2/f4/a (sparse-v4)   2.06    0.07 -96.6%

These numbers are as expected.

>  test_perf_on_all git read-tree -mu HEAD
>  test_perf_on_all git checkout-index -f --all
>  test_perf_on_all git update-index --add --remove $SPARSE_CONE/a
> +test_perf_on_all git rm -f $SPARSE_CONE/a

At first, I was confused why we needed '-f' and thought that maybe
this was turning into a no-op after the first deletion. However, the
test_perf_on_all helper does an "echo >>$SPARSE_CONE/a" before hand,
so the file exists _in the worktree_ every time. That requires '-f'
since otherwise Git complains that we have modifications.

However, after the first instance the file no longer exists in the
index, so we are losing some testing of the index modification.

We can fix this by resetting the index in each test loop:

  test_perf_on_all "git rm -f $SPARSE_CONE/a && git checkout HEAD -- $SPARSE_CONE/a"

Running this version of the test with GIT_PERF_REPEAT_COUNT=10 and
using the Git repository itself, I get these numbers:

Test                              HEAD~1            HEAD
--------------------------------------------------------------------------
2000.74: git rm ... (full-v3)     0.41(0.37+0.05)   0.43(0.36+0.07) +4.9% 
2000.75: git rm ... (full-v4)     0.38(0.34+0.05)   0.39(0.35+0.05) +2.6% 
2000.76: git rm ... (sparse-v3)   0.57(0.56+0.01)   0.05(0.05+0.00) -91.2%
2000.77: git rm ... (sparse-v4)   0.57(0.55+0.02)   0.03(0.03+0.00) -94.7%

Yes, the 'git checkout' command is contributing to the overall
numbers, but it also already has the performance improvements of
the sparse-index, so it contributes only a little to the performance
on the left.

(Also note that the full index cases change only by amounts within
reasonable noise. The repeat count helps there.)

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v1 2/4] pathspec.h: move pathspec_needs_expanded_index() from reset.c to here
  2022-08-03 14:35   ` Derrick Stolee
@ 2022-08-05  7:53     ` Shaoxuan Yuan
  0 siblings, 0 replies; 25+ messages in thread
From: Shaoxuan Yuan @ 2022-08-05  7:53 UTC (permalink / raw)
  To: Derrick Stolee, git; +Cc: vdye

On 8/3/2022 10:35 PM, Derrick Stolee wrote:
 > On 8/3/2022 12:51 AM, Shaoxuan Yuan wrote:
 >> Method pathspec_needs_expanded_index() in reset.c from 4d1cfc1351
 >> (reset: make --mixed sparse-aware, 2021-11-29) is reusable when we
 >> need to verify if the index needs to be expanded when the command
 >> is utilizing a pathspec rather than a literal path.
 >>
 >> Move it to pathspec.h for reusability.
 >>
 >> Add a few items to the function so it can better serve its purpose as
 >> a standalone public function:
 >>
 >> * Add a check in front so if the index is not sparse, return early since
 >>   no expansion is needed.
 >>
 >> * Add documentation to the function.
 >
 > I took a look at this diff on my machine with --color-moved, which
 > highlighted the other valuable thing about this move: it takes an
 > arbitrary 'struct index_state' pointer instead of using the_index and
 > active_cache. These are good things that might be worth mentioning in
 > the commit message.

Thanks for pointing it out! Will add.

--
Thanks,
Shaoxuan


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v1 3/4] rm: expand the index only when necessary
  2022-08-03 14:40   ` Derrick Stolee
@ 2022-08-05  8:07     ` Shaoxuan Yuan
  0 siblings, 0 replies; 25+ messages in thread
From: Shaoxuan Yuan @ 2022-08-05  8:07 UTC (permalink / raw)
  To: Derrick Stolee, git; +Cc: vdye



On 8/3/2022 10:40 PM, Derrick Stolee wrote:
 > On 8/3/2022 12:51 AM, Shaoxuan Yuan wrote:
 >> Originally, rm a pathspec that is out-of-cone in a sparse-index
 >> environment, Git dies with "pathspec '<x>' did not match any files",
 >> mainly because it does not expand the index so nothing is matched.
 >
 > This paragraph appears to be assuming that we've stopped expanding the
 > sparse index already. It might be worthwhile to rewrite this to say
 > "Before integrating 'git rm' with the sparse index, we need to..." or
 > something like that.

I have absolutely no idea why I wrote this paragraph this way, maybe
I was zoning out composing it. Will fix.

 >> Remove the `ensure_full_index()` method so `git-rm` does not always
 >> expand the index when the expansion is unnecessary, i.e. when
 >> <pathspec> does not have any possibilities to match anything outside
 >> of sparse-checkout definition.
 >>
 >> Expand the index when the <pathspec> needs an expanded index, i.e. the
 >> <pathspec> contains wildcard that may need a full-index or the
 >> <pathspec> is simply outside of sparse-checkout definition.
 >>
 >> Signed-off-by: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com>
 >> ---
 >>  builtin/rm.c | 5 +++--
 >>  1 file changed, 3 insertions(+), 2 deletions(-)
 >>
 >> diff --git a/builtin/rm.c b/builtin/rm.c
 >> index 84a935a16e..58ed924f0d 100644
 >> --- a/builtin/rm.c
 >> +++ b/builtin/rm.c
 >> @@ -296,8 +296,9 @@ int cmd_rm(int argc, const char **argv, const 
char *prefix)
 >>
 >>      seen = xcalloc(pathspec.nr, 1);
 >>
 >> -    /* TODO: audit for interaction with sparse-index. */
 >> -    ensure_full_index(&the_index);
 >> +    if (pathspec_needs_expanded_index(&the_index, &pathspec))
 >> +        ensure_full_index(&the_index);
 >> +
 >>      for (i = 0; i < active_nr; i++) {
 >>          const struct cache_entry *ce = active_cache[i];
 >
 > Looking back on the tests in patch 1, I don't see any tests that really
 > emphasize the kinds of pathspecs that could not ever integrate with the
 > sparse index. They are all of the form "folder1/*" or similar, making it
 > be something that could be seen as a prefix match. Such a pattern _could_
 > be integrated carefully with the sparse index.
 >
 > Instead, something like `git rm "*/a"` would be much harder to integrate
 > with the sparse index. Could we add a test (in this patch) that checks
 > that kind of case. That would also help justify this as its own patch and
 > not squashed with patch 4.

Makes sense. Will fix.

--
Thanks,
Shaoxuan


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v1 4/4] rm: integrate with sparse-index
  2022-08-04 14:48   ` Derrick Stolee
@ 2022-08-06  3:18     ` Shaoxuan Yuan
  0 siblings, 0 replies; 25+ messages in thread
From: Shaoxuan Yuan @ 2022-08-06  3:18 UTC (permalink / raw)
  To: Derrick Stolee, git; +Cc: vdye

On 8/4/2022 10:48 PM, Derrick Stolee wrote:
 > On 8/3/2022 12:51 AM, Shaoxuan Yuan wrote:
 >> Enable the sparse index within the `git-rm` command.
 >>
 >> The `p2000` tests demonstrate a ~96% execution time reduction for
 >> 'git rm' using a sparse index.
 >
 > Sorry that I got sidetracked yesterday when I was reviewing this
 > series, but I noticed something looking at these results:
 >
 >> Test                                     before  after
 >> -------------------------------------------------------------
 >> 2000.74: git rm -f f2/f4/a (full-v3)     0.66    0.88 +33.0%
 >> 2000.75: git rm -f f2/f4/a (full-v4)     0.67    0.75 +12.0%
 >
 > The range of _growth_ here seemed odd, so I wanted to check if this was
 > due to a small sample size or not.

Yes, I do feel they are odd, as I've been checking pervious
integrations and p2000 results, they usuallly fall below 10% range.
But I was not discerning enough to name a problem :-(

 >> 2000.76: git rm -f f2/f4/a (sparse-v3)   1.99    0.08 -96.0%
 >> 2000.77: git rm -f f2/f4/a (sparse-v4)   2.06    0.07 -96.6%
 >
 > These numbers are as expected.
 >
 >>  test_perf_on_all git read-tree -mu HEAD
 >>  test_perf_on_all git checkout-index -f --all
 >>  test_perf_on_all git update-index --add --remove $SPARSE_CONE/a
 >> +test_perf_on_all git rm -f $SPARSE_CONE/a
 >
 > At first, I was confused why we needed '-f' and thought that maybe
 > this was turning into a no-op after the first deletion. However, the
 > test_perf_on_all helper does an "echo >>$SPARSE_CONE/a" before hand,
 > so the file exists _in the worktree_ every time. That requires '-f'
 > since otherwise Git complains that we have modifications.

Yeah, it took me some time to find out.

 > However, after the first instance the file no longer exists in the
 > index, so we are losing some testing of the index modification.

So true, I didn't realize at all.

 > We can fix this by resetting the index in each test loop:
 >
 >   test_perf_on_all "git rm -f $SPARSE_CONE/a && git checkout HEAD -- 
$SPARSE_CONE/a"
 >
 > Running this version of the test with GIT_PERF_REPEAT_COUNT=10 and
 > using the Git repository itself, I get these numbers:
 >
 > Test                              HEAD~1            HEAD
 > 
--------------------------------------------------------------------------
 > 2000.74: git rm ... (full-v3)     0.41(0.37+0.05) 0.43(0.36+0.07) +4.9%
 > 2000.75: git rm ... (full-v4)     0.38(0.34+0.05) 0.39(0.35+0.05) +2.6%
 > 2000.76: git rm ... (sparse-v3)   0.57(0.56+0.01) 0.05(0.05+0.00) -91.2%
 > 2000.77: git rm ... (sparse-v4)   0.57(0.55+0.02) 0.03(0.03+0.00) -94.7%
 >
 > Yes, the 'git checkout' command is contributing to the overall
 > numbers, but it also already has the performance improvements of
 > the sparse-index, so it contributes only a little to the performance
 > on the left.
 >
 > (Also note that the full index cases change only by amounts within
 > reasonable noise. The repeat count helps there.)

New thing learned, repeat to average out noise.

--
Thanks,
Shaoxuan


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v2 0/4] rm: integrate with sparse-index
  2022-08-03  4:51 [PATCH v1 0/4] rm: integrate with sparse-index Shaoxuan Yuan
                   ` (3 preceding siblings ...)
  2022-08-03  4:51 ` [PATCH v1 4/4] rm: integrate with sparse-index Shaoxuan Yuan
@ 2022-08-07  4:13 ` Shaoxuan Yuan
  2022-08-07  4:13   ` [PATCH v2 1/4] t1092: add tests for `git-rm` Shaoxuan Yuan
                     ` (5 more replies)
  4 siblings, 6 replies; 25+ messages in thread
From: Shaoxuan Yuan @ 2022-08-07  4:13 UTC (permalink / raw)
  To: git; +Cc: vdye, derrickstolee, Shaoxuan Yuan

## Changes since PATCH v1 ##

1. Move `ensure_not_expanded` test from the first patch to the last one.

2. Mention the parameter of `pathspec_needs_expanded_index()` is
   changed to use `struct index_state`.

3. Modify `ensure_not_expanded` method to record Git commands' stderr
   and stdout.

4. Add a test 'rm pathspec expands index when necessary' to test
   the expected index expansion when different pathspec is supplied.

5. Modify p2000 test by resetting the index in each test loop, so the
   index modification is properly tested. Update the perf stats using
   the results from the modified test.

## PATCH v1 info ##

Turn on sparse-index feature within `git-rm` command.
Add necessary modifications and test them.

Shaoxuan Yuan (4):
  t1092: add tests for `git-rm`
  pathspec.h: move pathspec_needs_expanded_index() from reset.c to here
  rm: expand the index only when necessary
  rm: integrate with sparse-index

 builtin/reset.c                          |  84 +------------------
 builtin/rm.c                             |   7 +-
 pathspec.c                               |  89 ++++++++++++++++++++
 pathspec.h                               |  12 +++
 t/perf/p2000-sparse-operations.sh        |   1 +
 t/t1092-sparse-checkout-compatibility.sh | 100 ++++++++++++++++++++++-
 6 files changed, 205 insertions(+), 88 deletions(-)

Range-diff against v1:
1:  6b424a1eb1 ! 1:  ea4162c6ab t1092: add tests for `git-rm`
    @@ Commit message
         Add tests for `git-rm`, make sure it behaves as expected when
         <pathspec> is both inside or outside of sparse-checkout definition.
     
    -    Also add ensure_not_expanded test to make sure `git-rm` does not
    -    accidentally expand the index when <pathspec> is within the
    -    sparse-checkout definition.
    -
    +    Helped-by: Victoria Dye <vdye@github.com>
    +    Helped-by: Derrick Stolee <derrickstolee@github.com>
         Signed-off-by: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com>
     
      ## t/t1092-sparse-checkout-compatibility.sh ##
    @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'mv directory from
     +	test_cmp folder1-sparse sparse-index-out &&
     +	test_sparse_match git status --porcelain=v2
     +'
    -+
    -+test_expect_failure 'sparse index is not expanded: rm' '
    -+	init_repos &&
    -+
    -+	ensure_not_expanded rm deep/a &&
    -+
    -+	# test in-cone wildcard
    -+	git -C sparse-index reset --hard &&
    -+	ensure_not_expanded rm deep/* &&
    -+
    -+	# test recursive rm
    -+	git -C sparse-index reset --hard &&
    -+	ensure_not_expanded rm -r deep
    -+'
     +
      test_done
2:  c2cf8b3c86 ! 2:  061c675c46 pathspec.h: move pathspec_needs_expanded_index() from reset.c to here
    @@ Commit message
         * Add a check in front so if the index is not sparse, return early since
           no expansion is needed.
     
    +    * It now takes an arbitrary 'struct index_state' pointer instead of
    +      using `the_index` and `active_cache`.
    +
         * Add documentation to the function.
     
    +    Helped-by: Victoria Dye <vdye@github.com>
    +    Helped-by: Derrick Stolee <derrickstolee@github.com>
         Signed-off-by: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com>
     
      ## builtin/reset.c ##
3:  443ca7a682 ! 3:  1c4a85fad3 rm: expand the index only when necessary
    @@ Metadata
      ## Commit message ##
         rm: expand the index only when necessary
     
    -    Originally, rm a pathspec that is out-of-cone in a sparse-index
    -    environment, Git dies with "pathspec '<x>' did not match any files",
    -    mainly because it does not expand the index so nothing is matched.
    -
         Remove the `ensure_full_index()` method so `git-rm` does not always
         expand the index when the expansion is unnecessary, i.e. when
         <pathspec> does not have any possibilities to match anything outside
    @@ Commit message
         <pathspec> contains wildcard that may need a full-index or the
         <pathspec> is simply outside of sparse-checkout definition.
     
    +    Notice that the test 'rm pathspec expands index when necessary' in
    +    t1092 *is* testing this code change behavior, though it will be marked
    +    as 'test_expect_success' only in the next patch, where we officially
    +    mark `command_requires_full_index = 0`, so the index does not expand
    +    unless we tell it to do so.
    +
    +    Notice that because we also want `ensure_full_index` to record the
    +    stdout and stderr from Git command, a corresponding modification
    +    is also included in this patch. The reason we want the "sparse-index-out"
    +    and "sparse-index-err", is that we need to make sure there is no error
    +    from Git command itself, so we can rely on the `test_region` result
    +    and determine if the index is expanded or not.
    +
    +    Helped-by: Victoria Dye <vdye@github.com>
    +    Helped-by: Derrick Stolee <derrickstolee@github.com>
         Signed-off-by: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com>
     
      ## builtin/rm.c ##
    @@ builtin/rm.c: int cmd_rm(int argc, const char **argv, const char *prefix)
      	for (i = 0; i < active_nr; i++) {
      		const struct cache_entry *ce = active_cache[i];
      
    +
    + ## t/t1092-sparse-checkout-compatibility.sh ##
    +@@ t/t1092-sparse-checkout-compatibility.sh: ensure_not_expanded () {
    + 		shift &&
    + 		test_must_fail env \
    + 			GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \
    +-			git -C sparse-index "$@" || return 1
    ++			git -C sparse-index "$@" \
    ++			>sparse-index-out \
    ++			2>sparse-index-error || return 1
    + 	else
    + 		GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \
    +-			git -C sparse-index "$@" || return 1
    ++			git -C sparse-index "$@" \
    ++			>sparse-index-out \
    ++			2>sparse-index-error || return 1
    + 	fi &&
    + 	test_region ! index ensure_full_index trace2.txt
    + }
    +@@ t/t1092-sparse-checkout-compatibility.sh: test_expect_failure 'rm pathspec outside sparse definition' '
    + 	test_sparse_match git status --porcelain=v2
    + '
    + 
    ++test_expect_failure 'rm pathspec expands index when necessary' '
    ++	init_repos &&
    ++
    ++	# in-cone pathspec (do not expand)
    ++	ensure_not_expanded rm "deep/deep*" &&
    ++	test_must_be_empty sparse-index-err &&
    ++
    ++	# out-of-cone pathspec (expand)
    ++	! ensure_not_expanded rm --sparse "folder1/a*" &&
    ++	test_must_be_empty sparse-index-err &&
    ++
    ++	# pathspec that should expand index
    ++	! ensure_not_expanded rm "*/a" &&
    ++	test_must_be_empty sparse-index-err &&
    ++
    ++	! ensure_not_expanded rm "**a" &&
    ++	test_must_be_empty sparse-index-err
    ++'
    ++
    + test_done
4:  adb62ca9bf ! 4:  861be8a91e rm: integrate with sparse-index
    @@ Commit message
     
         Enable the sparse index within the `git-rm` command.
     
    -    The `p2000` tests demonstrate a ~96% execution time reduction for
    +    The `p2000` tests demonstrate a ~92% execution time reduction for
         'git rm' using a sparse index.
     
    -    Test                                     before  after
    -    -------------------------------------------------------------
    -    2000.74: git rm -f f2/f4/a (full-v3)     0.66    0.88 +33.0%
    -    2000.75: git rm -f f2/f4/a (full-v4)     0.67    0.75 +12.0%
    -    2000.76: git rm -f f2/f4/a (sparse-v3)   1.99    0.08 -96.0%
    -    2000.77: git rm -f f2/f4/a (sparse-v4)   2.06    0.07 -96.6%
    +    Test                              HEAD~1            HEAD
    +    --------------------------------------------------------------------------
    +    2000.74: git rm ... (full-v3)     0.41(0.37+0.05)   0.43(0.36+0.07) +4.9%
    +    2000.75: git rm ... (full-v4)     0.38(0.34+0.05)   0.39(0.35+0.05) +2.6%
    +    2000.76: git rm ... (sparse-v3)   0.57(0.56+0.01)   0.05(0.05+0.00) -91.2%
    +    2000.77: git rm ... (sparse-v4)   0.57(0.55+0.02)   0.03(0.03+0.00) -94.7%
     
         ----
         Also, normalize a behavioral difference of `git-rm` under sparse-index.
    @@ Commit message
     
         [1] https://github.com/ffyuanda/git/pull/6#discussion_r934861398
     
    +    Helped-by: Victoria Dye <vdye@github.com>
    +    Helped-by: Derrick Stolee <derrickstolee@github.com>
         Signed-off-by: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com>
     
      ## builtin/rm.c ##
    @@ t/perf/p2000-sparse-operations.sh: test_perf_on_all git blame $SPARSE_CONE/f3/a
      test_perf_on_all git read-tree -mu HEAD
      test_perf_on_all git checkout-index -f --all
      test_perf_on_all git update-index --add --remove $SPARSE_CONE/a
    -+test_perf_on_all git rm -f $SPARSE_CONE/a
    ++test_perf_on_all "git rm -f $SPARSE_CONE/a && git checkout HEAD -- $SPARSE_CONE/a"
      
      test_done
     
    @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_failure 'rm pathspec outsi
      	test_sparse_match git status --porcelain=v2
      '
      
    --test_expect_failure 'sparse index is not expanded: rm' '
    -+test_expect_success 'sparse index is not expanded: rm' '
    +-test_expect_failure 'rm pathspec expands index when necessary' '
    ++test_expect_success 'rm pathspec expands index when necessary' '
      	init_repos &&
      
    - 	ensure_not_expanded rm deep/a &&
    + 	# in-cone pathspec (do not expand)
    +@@ t/t1092-sparse-checkout-compatibility.sh: test_expect_failure 'rm pathspec expands index when necessary' '
    + 	test_must_be_empty sparse-index-err
    + '
    + 
    ++test_expect_success 'sparse index is not expanded: rm' '
    ++	init_repos &&
    ++
    ++	ensure_not_expanded rm deep/a &&
    ++
    ++	# test in-cone wildcard
    ++	git -C sparse-index reset --hard &&
    ++	ensure_not_expanded rm deep/* &&
    ++
    ++	# test recursive rm
    ++	git -C sparse-index reset --hard &&
    ++	ensure_not_expanded rm -r deep
    ++'
    ++
    + test_done

base-commit: 679aad9e82d0dfd8ef3d1f98fa4629665496cec9
-- 
2.37.0


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v2 1/4] t1092: add tests for `git-rm`
  2022-08-07  4:13 ` [PATCH v2 0/4] " Shaoxuan Yuan
@ 2022-08-07  4:13   ` Shaoxuan Yuan
  2022-08-10 12:47     ` Derrick Stolee
  2022-08-07  4:13   ` [PATCH v2 2/4] pathspec.h: move pathspec_needs_expanded_index() from reset.c to here Shaoxuan Yuan
                     ` (4 subsequent siblings)
  5 siblings, 1 reply; 25+ messages in thread
From: Shaoxuan Yuan @ 2022-08-07  4:13 UTC (permalink / raw)
  To: git; +Cc: vdye, derrickstolee, Shaoxuan Yuan

Add tests for `git-rm`, make sure it behaves as expected when
<pathspec> is both inside or outside of sparse-checkout definition.

Helped-by: Victoria Dye <vdye@github.com>
Helped-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com>
---
 t/t1092-sparse-checkout-compatibility.sh | 57 ++++++++++++++++++++++++
 1 file changed, 57 insertions(+)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 763c6cc684..c9300b77dd 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -1853,4 +1853,61 @@ test_expect_success 'mv directory from out-of-cone to in-cone' '
 	grep -e "H deep/0/1" actual
 '
 
+test_expect_success 'rm pathspec inside sparse definition' '
+	init_repos &&
+
+	test_all_match git rm deep/a &&
+	test_all_match git status --porcelain=v2 &&
+
+	# test wildcard
+	run_on_all git reset --hard &&
+	test_all_match git rm deep/* &&
+	test_all_match git status --porcelain=v2 &&
+
+	# test recursive rm
+	run_on_all git reset --hard &&
+	test_all_match git rm -r deep &&
+	test_all_match git status --porcelain=v2
+'
+
+test_expect_failure 'rm pathspec outside sparse definition' '
+	init_repos &&
+
+	for file in folder1/a folder1/0/1
+	do
+		test_sparse_match test_must_fail git rm $file &&
+		test_sparse_match test_must_fail git rm --cached $file &&
+		test_sparse_match git rm --sparse $file &&
+		test_sparse_match git status --porcelain=v2
+	done &&
+
+	cat >folder1-full <<-EOF &&
+	rm ${SQ}folder1/0/0/0${SQ}
+	rm ${SQ}folder1/0/1${SQ}
+	rm ${SQ}folder1/a${SQ}
+	EOF
+
+	cat >folder1-sparse <<-EOF &&
+	rm ${SQ}folder1/${SQ}
+	EOF
+
+	# test wildcard
+	run_on_sparse git reset --hard &&
+	run_on_sparse git sparse-checkout reapply &&
+	test_sparse_match test_must_fail git rm folder1/* &&
+	run_on_sparse git rm --sparse folder1/* &&
+	test_cmp folder1-full sparse-checkout-out &&
+	test_cmp folder1-sparse sparse-index-out &&
+	test_sparse_match git status --porcelain=v2 &&
+
+	# test recursive rm
+	run_on_sparse git reset --hard &&
+	run_on_sparse git sparse-checkout reapply &&
+	test_sparse_match test_must_fail git rm --sparse folder1 &&
+	run_on_sparse git rm --sparse -r folder1 &&
+	test_cmp folder1-full sparse-checkout-out &&
+	test_cmp folder1-sparse sparse-index-out &&
+	test_sparse_match git status --porcelain=v2
+'
+
 test_done
-- 
2.37.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 2/4] pathspec.h: move pathspec_needs_expanded_index() from reset.c to here
  2022-08-07  4:13 ` [PATCH v2 0/4] " Shaoxuan Yuan
  2022-08-07  4:13   ` [PATCH v2 1/4] t1092: add tests for `git-rm` Shaoxuan Yuan
@ 2022-08-07  4:13   ` Shaoxuan Yuan
  2022-08-07  4:13   ` [PATCH v2 3/4] rm: expand the index only when necessary Shaoxuan Yuan
                     ` (3 subsequent siblings)
  5 siblings, 0 replies; 25+ messages in thread
From: Shaoxuan Yuan @ 2022-08-07  4:13 UTC (permalink / raw)
  To: git; +Cc: vdye, derrickstolee, Shaoxuan Yuan

Method pathspec_needs_expanded_index() in reset.c from 4d1cfc1351
(reset: make --mixed sparse-aware, 2021-11-29) is reusable when we
need to verify if the index needs to be expanded when the command
is utilizing a pathspec rather than a literal path.

Move it to pathspec.h for reusability.

Add a few items to the function so it can better serve its purpose as
a standalone public function:

* Add a check in front so if the index is not sparse, return early since
  no expansion is needed.

* It now takes an arbitrary 'struct index_state' pointer instead of
  using `the_index` and `active_cache`.

* Add documentation to the function.

Helped-by: Victoria Dye <vdye@github.com>
Helped-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com>
---
 builtin/reset.c | 84 +---------------------------------------------
 pathspec.c      | 89 +++++++++++++++++++++++++++++++++++++++++++++++++
 pathspec.h      | 12 +++++++
 3 files changed, 102 insertions(+), 83 deletions(-)

diff --git a/builtin/reset.c b/builtin/reset.c
index 344fff8f3a..fdce6f8c85 100644
--- a/builtin/reset.c
+++ b/builtin/reset.c
@@ -174,88 +174,6 @@ static void update_index_from_diff(struct diff_queue_struct *q,
 	}
 }
 
-static int pathspec_needs_expanded_index(const struct pathspec *pathspec)
-{
-	unsigned int i, pos;
-	int res = 0;
-	char *skip_worktree_seen = NULL;
-
-	/*
-	 * When using a magic pathspec, assume for the sake of simplicity that
-	 * the index needs to be expanded to match all matchable files.
-	 */
-	if (pathspec->magic)
-		return 1;
-
-	for (i = 0; i < pathspec->nr; i++) {
-		struct pathspec_item item = pathspec->items[i];
-
-		/*
-		 * If the pathspec item has a wildcard, the index should be expanded
-		 * if the pathspec has the possibility of matching a subset of entries inside
-		 * of a sparse directory (but not the entire directory).
-		 *
-		 * If the pathspec item is a literal path, the index only needs to be expanded
-		 * if a) the pathspec isn't in the sparse checkout cone (to make sure we don't
-		 * expand for in-cone files) and b) it doesn't match any sparse directories
-		 * (since we can reset whole sparse directories without expanding them).
-		 */
-		if (item.nowildcard_len < item.len) {
-			/*
-			 * Special case: if the pattern is a path inside the cone
-			 * followed by only wildcards, the pattern cannot match
-			 * partial sparse directories, so we know we don't need to
-			 * expand the index.
-			 *
-			 * Examples:
-			 * - in-cone/foo***: doesn't need expanded index
-			 * - not-in-cone/bar*: may need expanded index
-			 * - **.c: may need expanded index
-			 */
-			if (strspn(item.original + item.nowildcard_len, "*") == item.len - item.nowildcard_len &&
-			    path_in_cone_mode_sparse_checkout(item.original, &the_index))
-				continue;
-
-			for (pos = 0; pos < active_nr; pos++) {
-				struct cache_entry *ce = active_cache[pos];
-
-				if (!S_ISSPARSEDIR(ce->ce_mode))
-					continue;
-
-				/*
-				 * If the pre-wildcard length is longer than the sparse
-				 * directory name and the sparse directory is the first
-				 * component of the pathspec, need to expand the index.
-				 */
-				if (item.nowildcard_len > ce_namelen(ce) &&
-				    !strncmp(item.original, ce->name, ce_namelen(ce))) {
-					res = 1;
-					break;
-				}
-
-				/*
-				 * If the pre-wildcard length is shorter than the sparse
-				 * directory and the pathspec does not match the whole
-				 * directory, need to expand the index.
-				 */
-				if (!strncmp(item.original, ce->name, item.nowildcard_len) &&
-				    wildmatch(item.original, ce->name, 0)) {
-					res = 1;
-					break;
-				}
-			}
-		} else if (!path_in_cone_mode_sparse_checkout(item.original, &the_index) &&
-			   !matches_skip_worktree(pathspec, i, &skip_worktree_seen))
-			res = 1;
-
-		if (res > 0)
-			break;
-	}
-
-	free(skip_worktree_seen);
-	return res;
-}
-
 static int read_from_tree(const struct pathspec *pathspec,
 			  struct object_id *tree_oid,
 			  int intent_to_add)
@@ -273,7 +191,7 @@ static int read_from_tree(const struct pathspec *pathspec,
 	opt.change = diff_change;
 	opt.add_remove = diff_addremove;
 
-	if (pathspec->nr && the_index.sparse_index && pathspec_needs_expanded_index(pathspec))
+	if (pathspec->nr && pathspec_needs_expanded_index(&the_index, pathspec))
 		ensure_full_index(&the_index);
 
 	if (do_diff_cache(tree_oid, &opt))
diff --git a/pathspec.c b/pathspec.c
index 84ad9c73cf..46e77a85fe 100644
--- a/pathspec.c
+++ b/pathspec.c
@@ -759,3 +759,92 @@ int match_pathspec_attrs(struct index_state *istate,
 
 	return 1;
 }
+
+int pathspec_needs_expanded_index(struct index_state *istate,
+				  const struct pathspec *pathspec)
+{
+	unsigned int i, pos;
+	int res = 0;
+	char *skip_worktree_seen = NULL;
+
+	/*
+	 * If index is not sparse, no index expansion is needed.
+	 */
+	if (!istate->sparse_index)
+		return 0;
+
+	/*
+	 * When using a magic pathspec, assume for the sake of simplicity that
+	 * the index needs to be expanded to match all matchable files.
+	 */
+	if (pathspec->magic)
+		return 1;
+
+	for (i = 0; i < pathspec->nr; i++) {
+		struct pathspec_item item = pathspec->items[i];
+
+		/*
+		 * If the pathspec item has a wildcard, the index should be expanded
+		 * if the pathspec has the possibility of matching a subset of entries inside
+		 * of a sparse directory (but not the entire directory).
+		 *
+		 * If the pathspec item is a literal path, the index only needs to be expanded
+		 * if a) the pathspec isn't in the sparse checkout cone (to make sure we don't
+		 * expand for in-cone files) and b) it doesn't match any sparse directories
+		 * (since we can reset whole sparse directories without expanding them).
+		 */
+		if (item.nowildcard_len < item.len) {
+			/*
+			 * Special case: if the pattern is a path inside the cone
+			 * followed by only wildcards, the pattern cannot match
+			 * partial sparse directories, so we know we don't need to
+			 * expand the index.
+			 *
+			 * Examples:
+			 * - in-cone/foo***: doesn't need expanded index
+			 * - not-in-cone/bar*: may need expanded index
+			 * - **.c: may need expanded index
+			 */
+			if (strspn(item.original + item.nowildcard_len, "*") == item.len - item.nowildcard_len &&
+			    path_in_cone_mode_sparse_checkout(item.original, istate))
+				continue;
+
+			for (pos = 0; pos < istate->cache_nr; pos++) {
+				struct cache_entry *ce = istate->cache[pos];
+
+				if (!S_ISSPARSEDIR(ce->ce_mode))
+					continue;
+
+				/*
+				 * If the pre-wildcard length is longer than the sparse
+				 * directory name and the sparse directory is the first
+				 * component of the pathspec, need to expand the index.
+				 */
+				if (item.nowildcard_len > ce_namelen(ce) &&
+				    !strncmp(item.original, ce->name, ce_namelen(ce))) {
+					res = 1;
+					break;
+				}
+
+				/*
+				 * If the pre-wildcard length is shorter than the sparse
+				 * directory and the pathspec does not match the whole
+				 * directory, need to expand the index.
+				 */
+				if (!strncmp(item.original, ce->name, item.nowildcard_len) &&
+				    wildmatch(item.original, ce->name, 0)) {
+					res = 1;
+					break;
+				}
+			}
+		} else if (!path_in_cone_mode_sparse_checkout(item.original, istate) &&
+			   !matches_skip_worktree(pathspec, i, &skip_worktree_seen))
+			res = 1;
+
+		if (res > 0)
+			break;
+	}
+
+	free(skip_worktree_seen);
+	return res;
+}
diff --git a/pathspec.h b/pathspec.h
index 402ebb8080..41f6adfbb4 100644
--- a/pathspec.h
+++ b/pathspec.h
@@ -171,4 +171,16 @@ int match_pathspec_attrs(struct index_state *istate,
 			 const char *name, int namelen,
 			 const struct pathspec_item *item);
 
+/*
+ * Determine whether a pathspec will match only entire index entries (non-sparse
+ * files and/or entire sparse directories). If the pathspec has the potential to
+ * match partial contents of a sparse directory, return 1 to indicate the index
+ * should be expanded to match the  appropriate index entries.
+ *
+ * For the sake of simplicity, always return 1 if using a more complex "magic"
+ * pathspec.
+ */
+int pathspec_needs_expanded_index(struct index_state *istate,
+				  const struct pathspec *pathspec);
+
 #endif /* PATHSPEC_H */
-- 
2.37.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 3/4] rm: expand the index only when necessary
  2022-08-07  4:13 ` [PATCH v2 0/4] " Shaoxuan Yuan
  2022-08-07  4:13   ` [PATCH v2 1/4] t1092: add tests for `git-rm` Shaoxuan Yuan
  2022-08-07  4:13   ` [PATCH v2 2/4] pathspec.h: move pathspec_needs_expanded_index() from reset.c to here Shaoxuan Yuan
@ 2022-08-07  4:13   ` Shaoxuan Yuan
  2022-08-10  0:24     ` Victoria Dye
  2022-08-07  4:13   ` [PATCH v2 4/4] rm: integrate with sparse-index Shaoxuan Yuan
                     ` (2 subsequent siblings)
  5 siblings, 1 reply; 25+ messages in thread
From: Shaoxuan Yuan @ 2022-08-07  4:13 UTC (permalink / raw)
  To: git; +Cc: vdye, derrickstolee, Shaoxuan Yuan

Remove the `ensure_full_index()` method so `git-rm` does not always
expand the index when the expansion is unnecessary, i.e. when
<pathspec> does not have any possibilities to match anything outside
of sparse-checkout definition.

Expand the index when the <pathspec> needs an expanded index, i.e. the
<pathspec> contains wildcard that may need a full-index or the
<pathspec> is simply outside of sparse-checkout definition.

Notice that the test 'rm pathspec expands index when necessary' in
t1092 *is* testing this code change behavior, though it will be marked
as 'test_expect_success' only in the next patch, where we officially
mark `command_requires_full_index = 0`, so the index does not expand
unless we tell it to do so.

Notice that because we also want `ensure_full_index` to record the
stdout and stderr from Git command, a corresponding modification
is also included in this patch. The reason we want the "sparse-index-out"
and "sparse-index-err", is that we need to make sure there is no error
from Git command itself, so we can rely on the `test_region` result
and determine if the index is expanded or not.

Helped-by: Victoria Dye <vdye@github.com>
Helped-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com>
---
 builtin/rm.c                             |  5 +++--
 t/t1092-sparse-checkout-compatibility.sh | 27 ++++++++++++++++++++++--
 2 files changed, 28 insertions(+), 4 deletions(-)

diff --git a/builtin/rm.c b/builtin/rm.c
index 84a935a16e..58ed924f0d 100644
--- a/builtin/rm.c
+++ b/builtin/rm.c
@@ -296,8 +296,9 @@ int cmd_rm(int argc, const char **argv, const char *prefix)
 
 	seen = xcalloc(pathspec.nr, 1);
 
-	/* TODO: audit for interaction with sparse-index. */
-	ensure_full_index(&the_index);
+	if (pathspec_needs_expanded_index(&the_index, &pathspec))
+		ensure_full_index(&the_index);
+
 	for (i = 0; i < active_nr; i++) {
 		const struct cache_entry *ce = active_cache[i];
 
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index c9300b77dd..94464cf911 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -1340,10 +1340,14 @@ ensure_not_expanded () {
 		shift &&
 		test_must_fail env \
 			GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \
-			git -C sparse-index "$@" || return 1
+			git -C sparse-index "$@" \
+			>sparse-index-out \
+			2>sparse-index-error || return 1
 	else
 		GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \
-			git -C sparse-index "$@" || return 1
+			git -C sparse-index "$@" \
+			>sparse-index-out \
+			2>sparse-index-error || return 1
 	fi &&
 	test_region ! index ensure_full_index trace2.txt
 }
@@ -1910,4 +1914,23 @@ test_expect_failure 'rm pathspec outside sparse definition' '
 	test_sparse_match git status --porcelain=v2
 '
 
+test_expect_failure 'rm pathspec expands index when necessary' '
+	init_repos &&
+
+	# in-cone pathspec (do not expand)
+	ensure_not_expanded rm "deep/deep*" &&
+	test_must_be_empty sparse-index-err &&
+
+	# out-of-cone pathspec (expand)
+	! ensure_not_expanded rm --sparse "folder1/a*" &&
+	test_must_be_empty sparse-index-err &&
+
+	# pathspec that should expand index
+	! ensure_not_expanded rm "*/a" &&
+	test_must_be_empty sparse-index-err &&
+
+	! ensure_not_expanded rm "**a" &&
+	test_must_be_empty sparse-index-err
+'
+
 test_done
-- 
2.37.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 4/4] rm: integrate with sparse-index
  2022-08-07  4:13 ` [PATCH v2 0/4] " Shaoxuan Yuan
                     ` (2 preceding siblings ...)
  2022-08-07  4:13   ` [PATCH v2 3/4] rm: expand the index only when necessary Shaoxuan Yuan
@ 2022-08-07  4:13   ` Shaoxuan Yuan
  2022-08-08 17:24   ` [PATCH v2 0/4] " Junio C Hamano
  2022-08-10  0:27   ` Victoria Dye
  5 siblings, 0 replies; 25+ messages in thread
From: Shaoxuan Yuan @ 2022-08-07  4:13 UTC (permalink / raw)
  To: git; +Cc: vdye, derrickstolee, Shaoxuan Yuan

Enable the sparse index within the `git-rm` command.

The `p2000` tests demonstrate a ~92% execution time reduction for
'git rm' using a sparse index.

Test                              HEAD~1            HEAD
--------------------------------------------------------------------------
2000.74: git rm ... (full-v3)     0.41(0.37+0.05)   0.43(0.36+0.07) +4.9%
2000.75: git rm ... (full-v4)     0.38(0.34+0.05)   0.39(0.35+0.05) +2.6%
2000.76: git rm ... (sparse-v3)   0.57(0.56+0.01)   0.05(0.05+0.00) -91.2%
2000.77: git rm ... (sparse-v4)   0.57(0.55+0.02)   0.03(0.03+0.00) -94.7%

----
Also, normalize a behavioral difference of `git-rm` under sparse-index.
See related discussion [1].

`git-rm` a sparse-directory entry within a sparse-index enabled repo
behaves differently from a sparse directory within a sparse-checkout
enabled repo.

For example, in a sparse-index repo, where 'folder1' is a
sparse-directory entry, `git rm -r --sparse folder1` provides this:

        rm 'folder1/'

Whereas in a sparse-checkout repo *without* sparse-index, doing so
provides this:

        rm 'folder1/0/0/0'
        rm 'folder1/0/1'
        rm 'folder1/a'

Because `git rm` a sparse-directory entry does not need to expand the
index, therefore we should accept the current behavior, which is faster
than "expand the sparse-directory entry to match the sparse-checkout
situation".

Modify a previous test so such difference is not considered as an error.

[1] https://github.com/ffyuanda/git/pull/6#discussion_r934861398

Helped-by: Victoria Dye <vdye@github.com>
Helped-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com>
---
 builtin/rm.c                             |  2 ++
 t/perf/p2000-sparse-operations.sh        |  1 +
 t/t1092-sparse-checkout-compatibility.sh | 20 +++++++++++++++++---
 3 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/builtin/rm.c b/builtin/rm.c
index 58ed924f0d..b6ba859fe4 100644
--- a/builtin/rm.c
+++ b/builtin/rm.c
@@ -287,6 +287,8 @@ int cmd_rm(int argc, const char **argv, const char *prefix)
 	if (!index_only)
 		setup_work_tree();
 
+	prepare_repo_settings(the_repository);
+	the_repository->settings.command_requires_full_index = 0;
 	hold_locked_index(&lock_file, LOCK_DIE_ON_ERROR);
 
 	if (read_cache() < 0)
diff --git a/t/perf/p2000-sparse-operations.sh b/t/perf/p2000-sparse-operations.sh
index c181110a43..fce8151d41 100755
--- a/t/perf/p2000-sparse-operations.sh
+++ b/t/perf/p2000-sparse-operations.sh
@@ -123,5 +123,6 @@ test_perf_on_all git blame $SPARSE_CONE/f3/a
 test_perf_on_all git read-tree -mu HEAD
 test_perf_on_all git checkout-index -f --all
 test_perf_on_all git update-index --add --remove $SPARSE_CONE/a
+test_perf_on_all "git rm -f $SPARSE_CONE/a && git checkout HEAD -- $SPARSE_CONE/a"
 
 test_done
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 94464cf911..68ded9063b 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -912,7 +912,7 @@ test_expect_success 'read-tree --prefix' '
 	test_all_match git read-tree --prefix=deep/deeper1/deepest -u deepest &&
 	test_all_match git status --porcelain=v2 &&
 
-	test_all_match git rm -rf --sparse folder1/ &&
+	run_on_all git rm -rf --sparse folder1/ &&
 	test_all_match git read-tree --prefix=folder1/ -u update-folder1 &&
 	test_all_match git status --porcelain=v2 &&
 
@@ -1874,7 +1874,7 @@ test_expect_success 'rm pathspec inside sparse definition' '
 	test_all_match git status --porcelain=v2
 '
 
-test_expect_failure 'rm pathspec outside sparse definition' '
+test_expect_success 'rm pathspec outside sparse definition' '
 	init_repos &&
 
 	for file in folder1/a folder1/0/1
@@ -1914,7 +1914,7 @@ test_expect_failure 'rm pathspec outside sparse definition' '
 	test_sparse_match git status --porcelain=v2
 '
 
-test_expect_failure 'rm pathspec expands index when necessary' '
+test_expect_success 'rm pathspec expands index when necessary' '
 	init_repos &&
 
 	# in-cone pathspec (do not expand)
@@ -1933,4 +1933,18 @@ test_expect_failure 'rm pathspec expands index when necessary' '
 	test_must_be_empty sparse-index-err
 '
 
+test_expect_success 'sparse index is not expanded: rm' '
+	init_repos &&
+
+	ensure_not_expanded rm deep/a &&
+
+	# test in-cone wildcard
+	git -C sparse-index reset --hard &&
+	ensure_not_expanded rm deep/* &&
+
+	# test recursive rm
+	git -C sparse-index reset --hard &&
+	ensure_not_expanded rm -r deep
+'
+
 test_done
-- 
2.37.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 0/4] rm: integrate with sparse-index
  2022-08-07  4:13 ` [PATCH v2 0/4] " Shaoxuan Yuan
                     ` (3 preceding siblings ...)
  2022-08-07  4:13   ` [PATCH v2 4/4] rm: integrate with sparse-index Shaoxuan Yuan
@ 2022-08-08 17:24   ` Junio C Hamano
  2022-08-08 17:51     ` Victoria Dye
  2022-08-10  0:27   ` Victoria Dye
  5 siblings, 1 reply; 25+ messages in thread
From: Junio C Hamano @ 2022-08-08 17:24 UTC (permalink / raw)
  To: Shaoxuan Yuan; +Cc: git, vdye, derrickstolee

Shaoxuan Yuan <shaoxuan.yuan02@gmail.com> writes:

> Turn on sparse-index feature within `git-rm` command.

That is a clearly written single-line summary.

> Add necessary modifications and test them.

This states an obvious without adding any useful information.  What
modifications were necessary and why they were necessary, what old
behaviour was undesirable and added tests prevent them to appear
again?  These details are better left to the proposed log message of
individual patches.

This series, when queued on top of 'master' without anything else,
seems to pass its own tests, but when combined with the "reset and
checkout fixes" <pull.1312.v2.git.1659841030.gitgitgadget@gmail.com>
by Victoria, the last one t1092 fails.

---- ---- ---- ---- ---- ---- ---- ---- ---- ---- 
expecting success of 1092.27 'reset hard with removed sparse dir':
        init_repos &&

        test_all_match git rm -r --sparse folder1 &&
        test_all_match git status --porcelain=v2 &&

        test_all_match git reset --hard &&
        test_all_match git status --porcelain=v2 &&

        cat >expect <<-\EOF &&
        folder1/
        EOF

        git -C sparse-index ls-files --sparse folder1 >out &&
        test_cmp expect out

HEAD is now at 703fd3e initial commit
HEAD is now at 703fd3e initial commit
HEAD is now at 703fd3e initial commit
--- full-checkout-out   2022-08-08 17:19:19.820840016 +0000
+++ sparse-index-out    2022-08-08 17:19:19.836841239 +0000
@@ -1,3 +1 @@
-rm 'folder1/0/0/0'
-rm 'folder1/0/1'
-rm 'folder1/a'
+rm 'folder1/'
not ok 27 - reset hard with removed sparse dir
#
#               init_repos &&
#
#               test_all_match git rm -r --sparse folder1 &&
#               test_all_match git status --porcelain=v2 &&
#
#               test_all_match git reset --hard &&
#               test_all_match git status --porcelain=v2 &&
#
#               cat >expect <<-\EOF &&
#               folder1/
#               EOF
#
#               git -C sparse-index ls-files --sparse folder1 >out &&
#               test_cmp expect out
#
---- ---- ---- ---- ---- ---- ---- ---- ---- ---- 

When we have the index (incorrectly) fully expanded, and may have
(incorrectly) working tree files outside of our sparse-cone of
interest, we may have paths under the 'folder1/' that we may need to
remove (and report as removed), but after the bug that causes us to
"incorrectly check out" gets fixed, perhaps the 'folder1/' is the
only thing that needs removed if it is outside our sparse-cone of
interest?  IOW, is the test hardcoding the behaviour of a bug that
was fixed?  I dunno.




^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 0/4] rm: integrate with sparse-index
  2022-08-08 17:24   ` [PATCH v2 0/4] " Junio C Hamano
@ 2022-08-08 17:51     ` Victoria Dye
  2022-08-08 19:01       ` Junio C Hamano
  0 siblings, 1 reply; 25+ messages in thread
From: Victoria Dye @ 2022-08-08 17:51 UTC (permalink / raw)
  To: Junio C Hamano, Shaoxuan Yuan; +Cc: git, derrickstolee

Junio C Hamano wrote:
> Shaoxuan Yuan <shaoxuan.yuan02@gmail.com> writes:
> 
>> Turn on sparse-index feature within `git-rm` command.
> 
> That is a clearly written single-line summary.
> 
>> Add necessary modifications and test them.
> 
> This states an obvious without adding any useful information.  What
> modifications were necessary and why they were necessary, what old
> behaviour was undesirable and added tests prevent them to appear
> again?  These details are better left to the proposed log message of
> individual patches.
> 
> This series, when queued on top of 'master' without anything else,
> seems to pass its own tests, but when combined with the "reset and
> checkout fixes" <pull.1312.v2.git.1659841030.gitgitgadget@gmail.com>
> by Victoria, the last one t1092 fails.
> 
> ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- 
> expecting success of 1092.27 'reset hard with removed sparse dir':
>         init_repos &&
> 
>         test_all_match git rm -r --sparse folder1 &&
>         test_all_match git status --porcelain=v2 &&
> 
>         test_all_match git reset --hard &&
>         test_all_match git status --porcelain=v2 &&
> 
>         cat >expect <<-\EOF &&
>         folder1/
>         EOF
> 
>         git -C sparse-index ls-files --sparse folder1 >out &&
>         test_cmp expect out
> 
> HEAD is now at 703fd3e initial commit
> HEAD is now at 703fd3e initial commit
> HEAD is now at 703fd3e initial commit
> --- full-checkout-out   2022-08-08 17:19:19.820840016 +0000
> +++ sparse-index-out    2022-08-08 17:19:19.836841239 +0000
> @@ -1,3 +1 @@
> -rm 'folder1/0/0/0'
> -rm 'folder1/0/1'
> -rm 'folder1/a'
> +rm 'folder1/'
> not ok 27 - reset hard with removed sparse dir
> #
> #               init_repos &&
> #
> #               test_all_match git rm -r --sparse folder1 &&
> #               test_all_match git status --porcelain=v2 &&
> #
> #               test_all_match git reset --hard &&
> #               test_all_match git status --porcelain=v2 &&
> #
> #               cat >expect <<-\EOF &&
> #               folder1/
> #               EOF
> #
> #               git -C sparse-index ls-files --sparse folder1 >out &&
> #               test_cmp expect out
> #
> ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- 
> 
> When we have the index (incorrectly) fully expanded, and may have
> (incorrectly) working tree files outside of our sparse-cone of
> interest, we may have paths under the 'folder1/' that we may need to
> remove (and report as removed), but after the bug that causes us to
> "incorrectly check out" gets fixed, perhaps the 'folder1/' is the
> only thing that needs removed if it is outside our sparse-cone of
> interest?  IOW, is the test hardcoding the behaviour of a bug that
> was fixed?  I dunno.
> 

This test failure is a result of a behavior change in the logging of 'git
rm' in this series when removing a sparse directory. Patch 4 talks about it
in more detail [1]; I failed to account for it in my series.

I'll re-roll my series and replace the 'test_all_match' on that line to
'run_on_all' to avoid the failure. This isn't the first conflict my series
has caused with this one, so I'll make sure everything builds and tests pass
with the changes from both series before resubmitting.

Thanks for catching this, and sorry for the inconvenience.

[1] https://lore.kernel.org/git/20220807041335.1790658-5-shaoxuan.yuan02@gmail.com/

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 0/4] rm: integrate with sparse-index
  2022-08-08 17:51     ` Victoria Dye
@ 2022-08-08 19:01       ` Junio C Hamano
  0 siblings, 0 replies; 25+ messages in thread
From: Junio C Hamano @ 2022-08-08 19:01 UTC (permalink / raw)
  To: Victoria Dye; +Cc: Shaoxuan Yuan, git, derrickstolee

Victoria Dye <vdye@github.com> writes:

> This test failure is a result of a behavior change in the logging of 'git
> rm' in this series when removing a sparse directory. Patch 4 talks about it
> in more detail [1]; I failed to account for it in my series.

Ah, OK.  When a directory being "git rm"'ed is outside the cone(s)
of our interest, it indeed does feel like a waste to expand the
index only to report which paths in that directory are being
removed.  It would be OK for the behaviour to be different inside
and outside the cone(s).



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 3/4] rm: expand the index only when necessary
  2022-08-07  4:13   ` [PATCH v2 3/4] rm: expand the index only when necessary Shaoxuan Yuan
@ 2022-08-10  0:24     ` Victoria Dye
  0 siblings, 0 replies; 25+ messages in thread
From: Victoria Dye @ 2022-08-10  0:24 UTC (permalink / raw)
  To: Shaoxuan Yuan, git; +Cc: derrickstolee

Shaoxuan Yuan wrote:
> Remove the `ensure_full_index()` method so `git-rm` does not always
> expand the index when the expansion is unnecessary, i.e. when
> <pathspec> does not have any possibilities to match anything outside
> of sparse-checkout definition.
> 
> Expand the index when the <pathspec> needs an expanded index, i.e. the
> <pathspec> contains wildcard that may need a full-index or the
> <pathspec> is simply outside of sparse-checkout definition.
> 
> Notice that the test 'rm pathspec expands index when necessary' in
> t1092 *is* testing this code change behavior, though it will be marked
> as 'test_expect_success' only in the next patch, where we officially
> mark `command_requires_full_index = 0`, so the index does not expand
> unless we tell it to do so.
> 
> Notice that because we also want `ensure_full_index` to record the
> stdout and stderr from Git command, a corresponding modification
> is also included in this patch. The reason we want the "sparse-index-out"
> and "sparse-index-err", is that we need to make sure there is no error
> from Git command itself, so we can rely on the `test_region` result
> and determine if the index is expanded or not.

I think this patch might make more sense _after_ patch 4. Without the
changes in patch 4, modifying how a sparse index is handled here doesn't
immediately change any functionality. Then, patch 4 effectively makes its
own changes (enable the sparse index) + "turns on" the changes from this
series, all at once.

I usually recommend trying to make the effects of a patch testable in that
patch (as long as it doesn't make a series more complicated/confusing). In
this case, it looks like you could swap the order of the commits and only
need to adjust the 'test_expect_success'/'test_expect_failure' settings on
the tests, making it a good candidate for this kind of reordering.

All that said, I don't think changing this is worth a re-roll on its own -
it's moreso intended as "things to consider for future series". :) 

> 
> Helped-by: Victoria Dye <vdye@github.com>
> Helped-by: Derrick Stolee <derrickstolee@github.com>
> Signed-off-by: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com>
> ---
>  builtin/rm.c                             |  5 +++--
>  t/t1092-sparse-checkout-compatibility.sh | 27 ++++++++++++++++++++++--
>  2 files changed, 28 insertions(+), 4 deletions(-)
> 
> diff --git a/builtin/rm.c b/builtin/rm.c
> index 84a935a16e..58ed924f0d 100644
> --- a/builtin/rm.c
> +++ b/builtin/rm.c
> @@ -296,8 +296,9 @@ int cmd_rm(int argc, const char **argv, const char *prefix)
>  
>  	seen = xcalloc(pathspec.nr, 1);
>  
> -	/* TODO: audit for interaction with sparse-index. */
> -	ensure_full_index(&the_index);
> +	if (pathspec_needs_expanded_index(&the_index, &pathspec))
> +		ensure_full_index(&the_index);
> +
>  	for (i = 0; i < active_nr; i++) {
>  		const struct cache_entry *ce = active_cache[i];
>  
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index c9300b77dd..94464cf911 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -1340,10 +1340,14 @@ ensure_not_expanded () {
>  		shift &&
>  		test_must_fail env \
>  			GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \
> -			git -C sparse-index "$@" || return 1
> +			git -C sparse-index "$@" \
> +			>sparse-index-out \
> +			2>sparse-index-error || return 1
>  	else
>  		GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \
> -			git -C sparse-index "$@" || return 1
> +			git -C sparse-index "$@" \
> +			>sparse-index-out \
> +			2>sparse-index-error || return 1
>  	fi &&
>  	test_region ! index ensure_full_index trace2.txt
>  }
> @@ -1910,4 +1914,23 @@ test_expect_failure 'rm pathspec outside sparse definition' '
>  	test_sparse_match git status --porcelain=v2
>  '
>  
> +test_expect_failure 'rm pathspec expands index when necessary' '
> +	init_repos &&
> +
> +	# in-cone pathspec (do not expand)
> +	ensure_not_expanded rm "deep/deep*" &&
> +	test_must_be_empty sparse-index-err &&
> +
> +	# out-of-cone pathspec (expand)
> +	! ensure_not_expanded rm --sparse "folder1/a*" &&
> +	test_must_be_empty sparse-index-err &&
> +
> +	# pathspec that should expand index
> +	! ensure_not_expanded rm "*/a" &&
> +	test_must_be_empty sparse-index-err &&
> +
> +	! ensure_not_expanded rm "**a" &&
> +	test_must_be_empty sparse-index-err
> +'
> +
>  test_done


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 0/4] rm: integrate with sparse-index
  2022-08-07  4:13 ` [PATCH v2 0/4] " Shaoxuan Yuan
                     ` (4 preceding siblings ...)
  2022-08-08 17:24   ` [PATCH v2 0/4] " Junio C Hamano
@ 2022-08-10  0:27   ` Victoria Dye
  2022-08-10  0:31     ` Shaoxuan Yuan
  2022-08-12 18:36     ` Junio C Hamano
  5 siblings, 2 replies; 25+ messages in thread
From: Victoria Dye @ 2022-08-10  0:27 UTC (permalink / raw)
  To: Shaoxuan Yuan, git; +Cc: derrickstolee

Shaoxuan Yuan wrote:
> ## Changes since PATCH v1 ##
> 
> 1. Move `ensure_not_expanded` test from the first patch to the last one.
> 
> 2. Mention the parameter of `pathspec_needs_expanded_index()` is
>    changed to use `struct index_state`.
> 
> 3. Modify `ensure_not_expanded` method to record Git commands' stderr
>    and stdout.
> 
> 4. Add a test 'rm pathspec expands index when necessary' to test
>    the expected index expansion when different pathspec is supplied.
> 
> 5. Modify p2000 test by resetting the index in each test loop, so the
>    index modification is properly tested. Update the perf stats using
>    the results from the modified test.
> 
> ## PATCH v1 info ##
> 
> Turn on sparse-index feature within `git-rm` command.
> Add necessary modifications and test them.

Other than a completely optional recommendation on commit ordering [1], I didn't have any comments on any individual patches. This series looks good to me!

[1] https://lore.kernel.org/git/2c0cb658-cd5a-420a-d313-6839149b9b40@github.com/

> 
> Shaoxuan Yuan (4):
>   t1092: add tests for `git-rm`
>   pathspec.h: move pathspec_needs_expanded_index() from reset.c to here
>   rm: expand the index only when necessary
>   rm: integrate with sparse-index
> 
>  builtin/reset.c                          |  84 +------------------
>  builtin/rm.c                             |   7 +-
>  pathspec.c                               |  89 ++++++++++++++++++++
>  pathspec.h                               |  12 +++
>  t/perf/p2000-sparse-operations.sh        |   1 +
>  t/t1092-sparse-checkout-compatibility.sh | 100 ++++++++++++++++++++++-
>  6 files changed, 205 insertions(+), 88 deletions(-)
> 
> Range-diff against v1:
> 1:  6b424a1eb1 ! 1:  ea4162c6ab t1092: add tests for `git-rm`
>     @@ Commit message
>          Add tests for `git-rm`, make sure it behaves as expected when
>          <pathspec> is both inside or outside of sparse-checkout definition.
>      
>     -    Also add ensure_not_expanded test to make sure `git-rm` does not
>     -    accidentally expand the index when <pathspec> is within the
>     -    sparse-checkout definition.
>     -
>     +    Helped-by: Victoria Dye <vdye@github.com>
>     +    Helped-by: Derrick Stolee <derrickstolee@github.com>
>          Signed-off-by: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com>
>      
>       ## t/t1092-sparse-checkout-compatibility.sh ##
>     @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'mv directory from
>      +	test_cmp folder1-sparse sparse-index-out &&
>      +	test_sparse_match git status --porcelain=v2
>      +'
>     -+
>     -+test_expect_failure 'sparse index is not expanded: rm' '
>     -+	init_repos &&
>     -+
>     -+	ensure_not_expanded rm deep/a &&
>     -+
>     -+	# test in-cone wildcard
>     -+	git -C sparse-index reset --hard &&
>     -+	ensure_not_expanded rm deep/* &&
>     -+
>     -+	# test recursive rm
>     -+	git -C sparse-index reset --hard &&
>     -+	ensure_not_expanded rm -r deep
>     -+'
>      +
>       test_done
> 2:  c2cf8b3c86 ! 2:  061c675c46 pathspec.h: move pathspec_needs_expanded_index() from reset.c to here
>     @@ Commit message
>          * Add a check in front so if the index is not sparse, return early since
>            no expansion is needed.
>      
>     +    * It now takes an arbitrary 'struct index_state' pointer instead of
>     +      using `the_index` and `active_cache`.
>     +
>          * Add documentation to the function.
>      
>     +    Helped-by: Victoria Dye <vdye@github.com>
>     +    Helped-by: Derrick Stolee <derrickstolee@github.com>
>          Signed-off-by: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com>
>      
>       ## builtin/reset.c ##
> 3:  443ca7a682 ! 3:  1c4a85fad3 rm: expand the index only when necessary
>     @@ Metadata
>       ## Commit message ##
>          rm: expand the index only when necessary
>      
>     -    Originally, rm a pathspec that is out-of-cone in a sparse-index
>     -    environment, Git dies with "pathspec '<x>' did not match any files",
>     -    mainly because it does not expand the index so nothing is matched.
>     -
>          Remove the `ensure_full_index()` method so `git-rm` does not always
>          expand the index when the expansion is unnecessary, i.e. when
>          <pathspec> does not have any possibilities to match anything outside
>     @@ Commit message
>          <pathspec> contains wildcard that may need a full-index or the
>          <pathspec> is simply outside of sparse-checkout definition.
>      
>     +    Notice that the test 'rm pathspec expands index when necessary' in
>     +    t1092 *is* testing this code change behavior, though it will be marked
>     +    as 'test_expect_success' only in the next patch, where we officially
>     +    mark `command_requires_full_index = 0`, so the index does not expand
>     +    unless we tell it to do so.
>     +
>     +    Notice that because we also want `ensure_full_index` to record the
>     +    stdout and stderr from Git command, a corresponding modification
>     +    is also included in this patch. The reason we want the "sparse-index-out"
>     +    and "sparse-index-err", is that we need to make sure there is no error
>     +    from Git command itself, so we can rely on the `test_region` result
>     +    and determine if the index is expanded or not.
>     +
>     +    Helped-by: Victoria Dye <vdye@github.com>
>     +    Helped-by: Derrick Stolee <derrickstolee@github.com>
>          Signed-off-by: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com>
>      
>       ## builtin/rm.c ##
>     @@ builtin/rm.c: int cmd_rm(int argc, const char **argv, const char *prefix)
>       	for (i = 0; i < active_nr; i++) {
>       		const struct cache_entry *ce = active_cache[i];
>       
>     +
>     + ## t/t1092-sparse-checkout-compatibility.sh ##
>     +@@ t/t1092-sparse-checkout-compatibility.sh: ensure_not_expanded () {
>     + 		shift &&
>     + 		test_must_fail env \
>     + 			GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \
>     +-			git -C sparse-index "$@" || return 1
>     ++			git -C sparse-index "$@" \
>     ++			>sparse-index-out \
>     ++			2>sparse-index-error || return 1
>     + 	else
>     + 		GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \
>     +-			git -C sparse-index "$@" || return 1
>     ++			git -C sparse-index "$@" \
>     ++			>sparse-index-out \
>     ++			2>sparse-index-error || return 1
>     + 	fi &&
>     + 	test_region ! index ensure_full_index trace2.txt
>     + }
>     +@@ t/t1092-sparse-checkout-compatibility.sh: test_expect_failure 'rm pathspec outside sparse definition' '
>     + 	test_sparse_match git status --porcelain=v2
>     + '
>     + 
>     ++test_expect_failure 'rm pathspec expands index when necessary' '
>     ++	init_repos &&
>     ++
>     ++	# in-cone pathspec (do not expand)
>     ++	ensure_not_expanded rm "deep/deep*" &&
>     ++	test_must_be_empty sparse-index-err &&
>     ++
>     ++	# out-of-cone pathspec (expand)
>     ++	! ensure_not_expanded rm --sparse "folder1/a*" &&
>     ++	test_must_be_empty sparse-index-err &&
>     ++
>     ++	# pathspec that should expand index
>     ++	! ensure_not_expanded rm "*/a" &&
>     ++	test_must_be_empty sparse-index-err &&
>     ++
>     ++	! ensure_not_expanded rm "**a" &&
>     ++	test_must_be_empty sparse-index-err
>     ++'
>     ++
>     + test_done
> 4:  adb62ca9bf ! 4:  861be8a91e rm: integrate with sparse-index
>     @@ Commit message
>      
>          Enable the sparse index within the `git-rm` command.
>      
>     -    The `p2000` tests demonstrate a ~96% execution time reduction for
>     +    The `p2000` tests demonstrate a ~92% execution time reduction for
>          'git rm' using a sparse index.
>      
>     -    Test                                     before  after
>     -    -------------------------------------------------------------
>     -    2000.74: git rm -f f2/f4/a (full-v3)     0.66    0.88 +33.0%
>     -    2000.75: git rm -f f2/f4/a (full-v4)     0.67    0.75 +12.0%
>     -    2000.76: git rm -f f2/f4/a (sparse-v3)   1.99    0.08 -96.0%
>     -    2000.77: git rm -f f2/f4/a (sparse-v4)   2.06    0.07 -96.6%
>     +    Test                              HEAD~1            HEAD
>     +    --------------------------------------------------------------------------
>     +    2000.74: git rm ... (full-v3)     0.41(0.37+0.05)   0.43(0.36+0.07) +4.9%
>     +    2000.75: git rm ... (full-v4)     0.38(0.34+0.05)   0.39(0.35+0.05) +2.6%
>     +    2000.76: git rm ... (sparse-v3)   0.57(0.56+0.01)   0.05(0.05+0.00) -91.2%
>     +    2000.77: git rm ... (sparse-v4)   0.57(0.55+0.02)   0.03(0.03+0.00) -94.7%
>      
>          ----
>          Also, normalize a behavioral difference of `git-rm` under sparse-index.
>     @@ Commit message
>      
>          [1] https://github.com/ffyuanda/git/pull/6#discussion_r934861398
>      
>     +    Helped-by: Victoria Dye <vdye@github.com>
>     +    Helped-by: Derrick Stolee <derrickstolee@github.com>
>          Signed-off-by: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com>
>      
>       ## builtin/rm.c ##
>     @@ t/perf/p2000-sparse-operations.sh: test_perf_on_all git blame $SPARSE_CONE/f3/a
>       test_perf_on_all git read-tree -mu HEAD
>       test_perf_on_all git checkout-index -f --all
>       test_perf_on_all git update-index --add --remove $SPARSE_CONE/a
>     -+test_perf_on_all git rm -f $SPARSE_CONE/a
>     ++test_perf_on_all "git rm -f $SPARSE_CONE/a && git checkout HEAD -- $SPARSE_CONE/a"
>       
>       test_done
>      
>     @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_failure 'rm pathspec outsi
>       	test_sparse_match git status --porcelain=v2
>       '
>       
>     --test_expect_failure 'sparse index is not expanded: rm' '
>     -+test_expect_success 'sparse index is not expanded: rm' '
>     +-test_expect_failure 'rm pathspec expands index when necessary' '
>     ++test_expect_success 'rm pathspec expands index when necessary' '
>       	init_repos &&
>       
>     - 	ensure_not_expanded rm deep/a &&
>     + 	# in-cone pathspec (do not expand)
>     +@@ t/t1092-sparse-checkout-compatibility.sh: test_expect_failure 'rm pathspec expands index when necessary' '
>     + 	test_must_be_empty sparse-index-err
>     + '
>     + 
>     ++test_expect_success 'sparse index is not expanded: rm' '
>     ++	init_repos &&
>     ++
>     ++	ensure_not_expanded rm deep/a &&
>     ++
>     ++	# test in-cone wildcard
>     ++	git -C sparse-index reset --hard &&
>     ++	ensure_not_expanded rm deep/* &&
>     ++
>     ++	# test recursive rm
>     ++	git -C sparse-index reset --hard &&
>     ++	ensure_not_expanded rm -r deep
>     ++'
>     ++
>     + test_done
> 
> base-commit: 679aad9e82d0dfd8ef3d1f98fa4629665496cec9


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 0/4] rm: integrate with sparse-index
  2022-08-10  0:27   ` Victoria Dye
@ 2022-08-10  0:31     ` Shaoxuan Yuan
  2022-08-12 18:36     ` Junio C Hamano
  1 sibling, 0 replies; 25+ messages in thread
From: Shaoxuan Yuan @ 2022-08-10  0:31 UTC (permalink / raw)
  To: Victoria Dye, git; +Cc: derrickstolee

On 8/10/2022 8:27 AM, Victoria Dye wrote:
 > Shaoxuan Yuan wrote:
 >> ## Changes since PATCH v1 ##
 >>
 >> 1. Move `ensure_not_expanded` test from the first patch to the last one.
 >>
 >> 2. Mention the parameter of `pathspec_needs_expanded_index()` is
 >>    changed to use `struct index_state`.
 >>
 >> 3. Modify `ensure_not_expanded` method to record Git commands' stderr
 >>    and stdout.
 >>
 >> 4. Add a test 'rm pathspec expands index when necessary' to test
 >>    the expected index expansion when different pathspec is supplied.
 >>
 >> 5. Modify p2000 test by resetting the index in each test loop, so the
 >>    index modification is properly tested. Update the perf stats using
 >>    the results from the modified test.
 >>
 >> ## PATCH v1 info ##
 >>
 >> Turn on sparse-index feature within `git-rm` command.
 >> Add necessary modifications and test them.
 >
 > Other than a completely optional recommendation on commit ordering 
[1], I didn't have any comments on any individual patches. This series 
looks good to me!
 >
 > [1] 
https://lore.kernel.org/git/2c0cb658-cd5a-420a-d313-6839149b9b40@github.com/

Thanks for reviewing! :)
I think I'll just leave the commit ordering as-is.

--
Thanks,
Shaoxuan




^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 1/4] t1092: add tests for `git-rm`
  2022-08-07  4:13   ` [PATCH v2 1/4] t1092: add tests for `git-rm` Shaoxuan Yuan
@ 2022-08-10 12:47     ` Derrick Stolee
  0 siblings, 0 replies; 25+ messages in thread
From: Derrick Stolee @ 2022-08-10 12:47 UTC (permalink / raw)
  To: Shaoxuan Yuan, git; +Cc: vdye

On 8/7/22 12:13 AM, Shaoxuan Yuan wrote:

> +test_expect_failure 'rm pathspec outside sparse definition' '

My only concern with this version is a minor one, and I didn't
notice it until this version: this test_expect_failure.

test_expect_failure doesn't help too much except to say "something
fails in this test". It could be the very first command, or it
could be the last.

> +	init_repos &&
> +
> +	for file in folder1/a folder1/0/1
> +	do
> +		test_sparse_match test_must_fail git rm $file &&
> +		test_sparse_match test_must_fail git rm --cached $file &&
> +		test_sparse_match git rm --sparse $file &&
> +		test_sparse_match git status --porcelain=v2
> +	done &&
> +
> +	cat >folder1-full <<-EOF &&
> +	rm ${SQ}folder1/0/0/0${SQ}
> +	rm ${SQ}folder1/0/1${SQ}
> +	rm ${SQ}folder1/a${SQ}
> +	EOF
> +
> +	cat >folder1-sparse <<-EOF &&
> +	rm ${SQ}folder1/${SQ}
> +	EOF

The difference you are demonstrating is that this output is
different. I think that at the point of this patch, they are
the same. The goal of this patch is to establish a common
point of reference for the full index and sparse index cases.

If everything below was "test_sparse_match" in this patch,
then I believe the test would pass.

The behavior changes when we enable the sparse index in the
'rm' builtin. Demonstrating the changes to the test at that
time helps collect all of the different ways behavior changes
with a sparse index, making it really easy to audit what
exactly is different between the modes.

Another approach would be to integrate the sparse index with
the builtin early, but keep the ensure_full_index() calls in
certain places (so we still expand to a full index) and slowly
add modes that do not expand. This is even trickier to do than
to delay the test changes to the end.

That said, finding out how to organize these tests is very
difficult because there is a bit of a chicken-or-egg problem:
How can we test the custom integration logic without enabling
the sparse index across the entire builtin? How can we enable
the sparse index across the builtin without having all of the
integration logic implemented?

So please take my ramblings here as food for thought, but not
any need to make changes to this series. v2 looks good to me.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 0/4] rm: integrate with sparse-index
  2022-08-10  0:27   ` Victoria Dye
  2022-08-10  0:31     ` Shaoxuan Yuan
@ 2022-08-12 18:36     ` Junio C Hamano
  1 sibling, 0 replies; 25+ messages in thread
From: Junio C Hamano @ 2022-08-12 18:36 UTC (permalink / raw)
  To: Victoria Dye; +Cc: Shaoxuan Yuan, git, derrickstolee

Victoria Dye <vdye@github.com> writes:

> Other than a completely optional recommendation on commit ordering
> [1], I didn't have any comments on any individual patches. This
> series looks good to me!

Thanks, all, for working on and reviewing these patches.

Let's merge it down to 'next' soonish.


^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2022-08-12 18:36 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-03  4:51 [PATCH v1 0/4] rm: integrate with sparse-index Shaoxuan Yuan
2022-08-03  4:51 ` [PATCH v1 1/4] t1092: add tests for `git-rm` Shaoxuan Yuan
2022-08-03 14:32   ` Derrick Stolee
2022-08-03  4:51 ` [PATCH v1 2/4] pathspec.h: move pathspec_needs_expanded_index() from reset.c to here Shaoxuan Yuan
2022-08-03 14:35   ` Derrick Stolee
2022-08-05  7:53     ` Shaoxuan Yuan
2022-08-03  4:51 ` [PATCH v1 3/4] rm: expand the index only when necessary Shaoxuan Yuan
2022-08-03 14:40   ` Derrick Stolee
2022-08-05  8:07     ` Shaoxuan Yuan
2022-08-03  4:51 ` [PATCH v1 4/4] rm: integrate with sparse-index Shaoxuan Yuan
2022-08-04 14:48   ` Derrick Stolee
2022-08-06  3:18     ` Shaoxuan Yuan
2022-08-07  4:13 ` [PATCH v2 0/4] " Shaoxuan Yuan
2022-08-07  4:13   ` [PATCH v2 1/4] t1092: add tests for `git-rm` Shaoxuan Yuan
2022-08-10 12:47     ` Derrick Stolee
2022-08-07  4:13   ` [PATCH v2 2/4] pathspec.h: move pathspec_needs_expanded_index() from reset.c to here Shaoxuan Yuan
2022-08-07  4:13   ` [PATCH v2 3/4] rm: expand the index only when necessary Shaoxuan Yuan
2022-08-10  0:24     ` Victoria Dye
2022-08-07  4:13   ` [PATCH v2 4/4] rm: integrate with sparse-index Shaoxuan Yuan
2022-08-08 17:24   ` [PATCH v2 0/4] " Junio C Hamano
2022-08-08 17:51     ` Victoria Dye
2022-08-08 19:01       ` Junio C Hamano
2022-08-10  0:27   ` Victoria Dye
2022-08-10  0:31     ` Shaoxuan Yuan
2022-08-12 18:36     ` Junio C Hamano

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).