git@vger.kernel.org list mirror (unofficial, one of many)
* [PATCH 0/7] Sparse Index: integrate with reset
@ 2021-09-30 14:50 Victoria Dye via GitGitGadget
  2021-09-30 14:50 ` [PATCH 1/7] reset: behave correctly with sparse-checkout Kevin Willford via GitGitGadget
                   ` (7 more replies)
  0 siblings, 8 replies; 85+ messages in thread
From: Victoria Dye via GitGitGadget @ 2021-09-30 14:50 UTC (permalink / raw)
  To: git; +Cc: stolee, gitster, newren, Victoria Dye

This series integrates the sparse index with git reset and provides
miscellaneous fixes and improvements to the command in sparse checkouts.
This includes:

 1. tests added to t1092 and p2000 to establish the baseline functionality
    of the command
 2. repository settings to enable the sparse index with ensure_full_index
    guarding any code paths that break tests without other compatibility
    updates.
 3. modifications to remove or reduce the scope in which ensure_full_index
    must be called.

The sparse index updates are predicated on a fix originating from the
microsoft/git fork [1], correcting how git reset --mixed handles resetting
entries outside the sparse checkout definition. Additionally, a performance
"bug" in next_cache_entry with sparse index is corrected, preventing
repeatedly looping over already-searched entries.
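
For readers unfamiliar with that kind of fix, the general idea of not
re-scanning already-searched entries can be sketched as below. This is
only an illustration under assumptions: `struct toy_index`, its fields,
and `toy_next_entry` are hypothetical names, and the real change lives in
patch 7 of this series, which may differ in detail.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of an index; NOT git's struct index_state. */
struct toy_index {
	const int *handled; /* nonzero means the entry was already processed */
	size_t nr;          /* number of entries */
	size_t bottom;      /* every entry before this position is handled */
};

/*
 * Return the position of the next unhandled entry, or idx->nr if none
 * is left. The scan starts at the remembered `bottom` and stores its
 * progress there, so later calls do not walk the same prefix again.
 */
static size_t toy_next_entry(struct toy_index *idx)
{
	size_t pos;

	assert(idx->bottom <= idx->nr);
	pos = idx->bottom;
	while (pos < idx->nr && idx->handled[pos])
		pos++;
	idx->bottom = pos;
	return pos;
}
```

Without the stored position, each call would start from entry 0 and
the total work would grow quadratically with the number of entries.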

The p2000 tests demonstrate an overall ~70% execution time reduction across
all tested usages of git reset using a sparse index:

Test                                               before   after       
------------------------------------------------------------------------
2000.22: git reset (full-v3)                       0.48     0.51 +6.3% 
2000.23: git reset (full-v4)                       0.47     0.50 +6.4% 
2000.24: git reset (sparse-v3)                     0.93     0.30 -67.7%
2000.25: git reset (sparse-v4)                     0.94     0.29 -69.1%
2000.26: git reset --hard (full-v3)                0.69     0.68 -1.4% 
2000.27: git reset --hard (full-v4)                0.75     0.68 -9.3% 
2000.28: git reset --hard (sparse-v3)              1.29     0.34 -73.6%
2000.29: git reset --hard (sparse-v4)              1.31     0.34 -74.0%
2000.30: git reset -- does-not-exist (full-v3)     0.54     0.51 -5.6% 
2000.31: git reset -- does-not-exist (full-v4)     0.54     0.52 -3.7% 
2000.32: git reset -- does-not-exist (sparse-v3)   1.02     0.31 -69.6%
2000.33: git reset -- does-not-exist (sparse-v4)   1.07     0.30 -72.0%


[1] microsoft@6b8a074

Thanks! -Victoria

Kevin Willford (1):
  reset: behave correctly with sparse-checkout

Victoria Dye (6):
  sparse-index: update command for expand/collapse test
  reset: expand test coverage for sparse checkouts
  reset: integrate with sparse index
  reset: make sparse-aware (except --mixed)
  reset: make --mixed sparse-aware
  unpack-trees: improve performance of next_cache_entry

 builtin/reset.c                          |  77 ++++++++++++-
 cache-tree.c                             |  43 ++++++-
 cache.h                                  |  10 ++
 read-cache.c                             |  22 ++--
 t/perf/p2000-sparse-operations.sh        |   3 +
 t/t1092-sparse-checkout-compatibility.sh | 139 ++++++++++++++++++++++-
 t/t7114-reset-sparse-checkout.sh         |  61 ++++++++++
 unpack-trees.c                           |  23 +++-
 8 files changed, 353 insertions(+), 25 deletions(-)
 create mode 100755 t/t7114-reset-sparse-checkout.sh


base-commit: cefe983a320c03d7843ac78e73bd513a27806845
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1048%2Fvdye%2Fvdye%2Fsparse-index-part1-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1048/vdye/vdye/sparse-index-part1-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1048
-- 
gitgitgadget


* [PATCH 1/7] reset: behave correctly with sparse-checkout
  2021-09-30 14:50 [PATCH 0/7] Sparse Index: integrate with reset Victoria Dye via GitGitGadget
@ 2021-09-30 14:50 ` Kevin Willford via GitGitGadget
  2021-09-30 18:34   ` Junio C Hamano
  2021-09-30 14:50 ` [PATCH 2/7] sparse-index: update command for expand/collapse test Victoria Dye via GitGitGadget
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 85+ messages in thread
From: Kevin Willford via GitGitGadget @ 2021-09-30 14:50 UTC (permalink / raw)
  To: git; +Cc: stolee, gitster, newren, Victoria Dye, Kevin Willford

From: Kevin Willford <kewillf@microsoft.com>

When using the sparse checkout feature, 'git reset' adds entries to
the index with the skip-worktree bit off but leaves the working
directory empty. File data is lost because the index version of the
files has changed while nothing exists in the working directory. As a
result, the next 'git status' call shows modified or deleted files as
deleted and shows nothing for added files. Added files should instead
be shown as untracked, and modified files should be shown as modified.

To fix this, while the reset is running, if a file is absent from the
working directory and it will be missing with the new index entry or
was not missing in the previous version, we create the previous index
version of the file in the working directory. That way, status reports
correctly and the files are available for the user to deal with.
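
The restore condition can be sketched as a standalone predicate. This is
a minimal illustration, not the patch's code: `struct reset_path_state`
and `needs_worktree_restore` are hypothetical names, and each field
stands in for one check the patch performs (`core_apply_sparse_checkout`,
`file_exists()`, `ce_skip_worktree()`, `is_missing`, `was_missing`).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of the per-path facts the patch looks at. */
struct reset_path_state {
	bool sparse_checkout_enabled; /* core_apply_sparse_checkout */
	bool file_on_disk;            /* file_exists(path) */
	bool entry_is_skip_worktree;  /* index entry found with skip-worktree set */
	bool is_missing;              /* path absent on the reset target side */
	bool was_missing;             /* path absent on the pre-reset side */
};

/*
 * Return true when the pre-reset version of the file should be written
 * to the working directory before the index entry is updated.
 */
static bool needs_worktree_restore(const struct reset_path_state *s)
{
	assert(s != NULL);

	if (!s->sparse_checkout_enabled || s->file_on_disk)
		return false;
	if (!s->entry_is_skip_worktree)
		return false;
	return s->is_missing || !s->was_missing;
}
```

A path that is already on disk, or whose index entry does not carry the
skip-worktree bit, is left alone, which matches the second test case in
t7114 where a deleted `m` stays deleted.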

This fixes a documented failure from t1092 that was created in 19a0acc
(t1092: test interesting sparse-checkout scenarios, 2021-01-23).

Signed-off-by: Kevin Willford <kewillf@microsoft.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Victoria Dye <vdye@github.com>
---
 builtin/reset.c                          | 39 +++++++++++++++
 t/t1092-sparse-checkout-compatibility.sh |  4 +-
 t/t7114-reset-sparse-checkout.sh         | 61 ++++++++++++++++++++++++
 3 files changed, 101 insertions(+), 3 deletions(-)
 create mode 100755 t/t7114-reset-sparse-checkout.sh

diff --git a/builtin/reset.c b/builtin/reset.c
index 51c9e2f43ff..8ffcd713720 100644
--- a/builtin/reset.c
+++ b/builtin/reset.c
@@ -25,6 +25,8 @@
 #include "cache-tree.h"
 #include "submodule.h"
 #include "submodule-config.h"
+#include "dir.h"
+#include "entry.h"
 
 #define REFRESH_INDEX_DELAY_WARNING_IN_MS (2 * 1000)
 
@@ -127,12 +129,49 @@ static void update_index_from_diff(struct diff_queue_struct *q,
 		struct diff_options *opt, void *data)
 {
 	int i;
+	int pos;
 	int intent_to_add = *(int *)data;
 
 	for (i = 0; i < q->nr; i++) {
 		struct diff_filespec *one = q->queue[i]->one;
+		struct diff_filespec *two = q->queue[i]->two;
 		int is_missing = !(one->mode && !is_null_oid(&one->oid));
+		int was_missing = !two->mode && is_null_oid(&two->oid);
 		struct cache_entry *ce;
+		struct cache_entry *ce_before;
+		struct checkout state = CHECKOUT_INIT;
+
+		/*
+		 * When using the sparse-checkout feature the cache entries
+		 * that are added here will not have the skip-worktree bit
+		 * set. Without this code there is data that is lost because
+		 * the files that would normally be in the working directory
+		 * are not there and show as deleted for the next status.
+		 * In the case of added files, they just disappear.
+		 *
+		 * We need to create the previous version of the files in
+		 * the working directory so that they will have the right
+		 * content and the next status call will show modified or
+		 * untracked files correctly.
+		 */
+		if (core_apply_sparse_checkout && !file_exists(two->path)) {
+			pos = cache_name_pos(two->path, strlen(two->path));
+			if ((pos >= 0 && ce_skip_worktree(active_cache[pos])) &&
+			    (is_missing || !was_missing)) {
+				state.force = 1;
+				state.refresh_cache = 1;
+				state.istate = &the_index;
+
+				ce_before = make_cache_entry(&the_index, two->mode,
+							     &two->oid, two->path,
+							     0, 0);
+				if (!ce_before)
+					die(_("make_cache_entry failed for path '%s'"),
+						two->path);
+
+				checkout_entry(ce_before, &state, NULL, NULL);
+			}
+		}
 
 		if (is_missing && !intent_to_add) {
 			remove_file_from_cache(one->path);
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 886e78715fe..c5977152661 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -459,9 +459,7 @@ test_expect_failure 'blame with pathspec outside sparse definition' '
 	test_all_match git blame deep/deeper2/deepest/a
 '
 
-# NEEDSWORK: a sparse-checkout behaves differently from a full checkout
-# in this scenario, but it shouldn't.
-test_expect_failure 'checkout and reset (mixed)' '
+test_expect_success 'checkout and reset (mixed)' '
 	init_repos &&
 
 	test_all_match git checkout -b reset-test update-deep &&
diff --git a/t/t7114-reset-sparse-checkout.sh b/t/t7114-reset-sparse-checkout.sh
new file mode 100755
index 00000000000..a8029707fb1
--- /dev/null
+++ b/t/t7114-reset-sparse-checkout.sh
@@ -0,0 +1,61 @@
+#!/bin/sh
+
+test_description='reset when using a sparse-checkout'
+
+. ./test-lib.sh
+
+test_expect_success 'setup' '
+	test_tick &&
+	echo "checkout file" >c &&
+	echo "modify file" >m &&
+	echo "delete file" >d &&
+	git add . &&
+	git commit -m "initial commit" &&
+	echo "added file" >a &&
+	echo "modification of a file" >m &&
+	git rm d &&
+	git add . &&
+	git commit -m "second commit" &&
+	git checkout -b endCommit
+'
+
+test_expect_success 'reset when there is a sparse-checkout' '
+	echo "/c" >.git/info/sparse-checkout &&
+	test_config core.sparsecheckout true &&
+	git checkout -B resetBranch &&
+	test_path_is_missing m &&
+	test_path_is_missing a &&
+	test_path_is_missing d &&
+	git reset HEAD~1 &&
+	echo "checkout file" >expect &&
+	test_cmp expect c &&
+	echo "added file" >expect &&
+	test_cmp expect a &&
+	echo "modification of a file" >expect &&
+	test_cmp expect m &&
+	test_path_is_missing d
+'
+
+test_expect_success 'reset after deleting file without skip-worktree bit' '
+	git checkout -f endCommit &&
+	git clean -xdf &&
+	cat >.git/info/sparse-checkout <<-\EOF &&
+	/c
+	/m
+	EOF
+	test_config core.sparsecheckout true &&
+	git checkout -B resetAfterDelete &&
+	test_path_is_file m &&
+	test_path_is_missing a &&
+	test_path_is_missing d &&
+	rm -f m &&
+	git reset HEAD~1 &&
+	echo "checkout file" >expect &&
+	test_cmp expect c &&
+	echo "added file" >expect &&
+	test_cmp expect a &&
+	test_path_is_missing m &&
+	test_path_is_missing d
+'
+
+test_done
-- 
gitgitgadget



* [PATCH 2/7] sparse-index: update command for expand/collapse test
  2021-09-30 14:50 [PATCH 0/7] Sparse Index: integrate with reset Victoria Dye via GitGitGadget
  2021-09-30 14:50 ` [PATCH 1/7] reset: behave correctly with sparse-checkout Kevin Willford via GitGitGadget
@ 2021-09-30 14:50 ` Victoria Dye via GitGitGadget
  2021-09-30 19:17   ` Taylor Blau
  2021-10-01  9:14   ` Bagas Sanjaya
  2021-09-30 14:50 ` [PATCH 3/7] reset: expand test coverage for sparse checkouts Victoria Dye via GitGitGadget
                   ` (5 subsequent siblings)
  7 siblings, 2 replies; 85+ messages in thread
From: Victoria Dye via GitGitGadget @ 2021-09-30 14:50 UTC (permalink / raw)
  To: git; +Cc: stolee, gitster, newren, Victoria Dye, Victoria Dye

From: Victoria Dye <vdye@github.com>

In anticipation of multiple commands being fully integrated with the
sparse index, update the test that checks index expansion and collapse
for commands not yet integrated with the sparse index so that it uses
`mv` instead of `reset --hard`.

Signed-off-by: Victoria Dye <vdye@github.com>
---
 t/t1092-sparse-checkout-compatibility.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index c5977152661..aed8683e629 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -642,7 +642,7 @@ test_expect_success 'sparse-index is expanded and converted back' '
 	init_repos &&
 
 	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
-		git -C sparse-index -c core.fsmonitor="" reset --hard &&
+		git -C sparse-index -c core.fsmonitor="" mv a b &&
 	test_region index convert_to_sparse trace2.txt &&
 	test_region index ensure_full_index trace2.txt
 '
-- 
gitgitgadget



* [PATCH 3/7] reset: expand test coverage for sparse checkouts
  2021-09-30 14:50 [PATCH 0/7] Sparse Index: integrate with reset Victoria Dye via GitGitGadget
  2021-09-30 14:50 ` [PATCH 1/7] reset: behave correctly with sparse-checkout Kevin Willford via GitGitGadget
  2021-09-30 14:50 ` [PATCH 2/7] sparse-index: update command for expand/collapse test Victoria Dye via GitGitGadget
@ 2021-09-30 14:50 ` Victoria Dye via GitGitGadget
  2021-09-30 14:50 ` [PATCH 4/7] reset: integrate with sparse index Victoria Dye via GitGitGadget
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 85+ messages in thread
From: Victoria Dye via GitGitGadget @ 2021-09-30 14:50 UTC (permalink / raw)
  To: git; +Cc: stolee, gitster, newren, Victoria Dye, Victoria Dye

From: Victoria Dye <vdye@github.com>

Add new tests for `--merge` and `--keep` modes, as well as mixed reset with
pathspecs both inside and outside of the sparse checkout definition. New
performance test cases exercise various execution paths for `reset`.

Co-authored-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Victoria Dye <vdye@github.com>
---
 t/perf/p2000-sparse-operations.sh        |   3 +
 t/t1092-sparse-checkout-compatibility.sh | 107 +++++++++++++++++++++++
 2 files changed, 110 insertions(+)

diff --git a/t/perf/p2000-sparse-operations.sh b/t/perf/p2000-sparse-operations.sh
index 597626276fb..bfd332120c8 100755
--- a/t/perf/p2000-sparse-operations.sh
+++ b/t/perf/p2000-sparse-operations.sh
@@ -110,5 +110,8 @@ test_perf_on_all git add -A
 test_perf_on_all git add .
 test_perf_on_all git commit -a -m A
 test_perf_on_all git checkout -f -
+test_perf_on_all git reset
+test_perf_on_all git reset --hard
+test_perf_on_all git reset -- does-not-exist
 
 test_done
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index aed8683e629..e36fb18098d 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -479,6 +479,113 @@ test_expect_success 'checkout and reset (mixed) [sparse]' '
 	test_sparse_match git reset update-folder2
 '
 
+# NEEDSWORK: with mixed reset, files with differences between HEAD and <commit>
+# will be added to the work tree even if outside the sparse checkout
+# definition, and even if the file is modified to a state of having no local
+# changes. The file is "re-ignored" if a hard reset is executed. We may want to
+# change this behavior in the future and enforce that files are not written
+# outside of the sparse checkout definition.
+test_expect_success 'checkout and mixed reset file tracking [sparse]' '
+	init_repos &&
+
+	test_all_match git checkout -b reset-test update-deep &&
+	test_all_match git reset update-folder1 &&
+	test_all_match git reset update-deep &&
+
+	# At this point, there are no changes in the working tree. However,
+	# folder1/a now exists locally (even though it is outside of the sparse
+	# paths).
+	run_on_sparse test_path_exists folder1 &&
+
+	run_on_all rm folder1/a &&
+	test_all_match git status --porcelain=v2 &&
+
+	test_all_match git reset --hard update-deep &&
+	run_on_sparse test_path_is_missing folder1 &&
+	test_path_exists full-checkout/folder1
+'
+
+test_expect_success 'checkout and reset (merge)' '
+	init_repos &&
+
+	write_script edit-contents <<-\EOF &&
+	echo text >>$1
+	EOF
+
+	test_all_match git checkout -b reset-test update-deep &&
+	run_on_all ../edit-contents a &&
+	test_all_match git reset --merge deepest &&
+	test_all_match git status --porcelain=v2 &&
+
+	test_all_match git reset --hard update-deep &&
+	run_on_all ../edit-contents deep/a &&
+	test_all_match test_must_fail git reset --merge deepest
+'
+
+test_expect_success 'checkout and reset (keep)' '
+	init_repos &&
+
+	write_script edit-contents <<-\EOF &&
+	echo text >>$1
+	EOF
+
+	test_all_match git checkout -b reset-test update-deep &&
+	run_on_all ../edit-contents a &&
+	test_all_match git reset --keep deepest &&
+	test_all_match git status --porcelain=v2 &&
+
+	test_all_match git reset --hard update-deep &&
+	run_on_all ../edit-contents deep/a &&
+	test_all_match test_must_fail git reset --keep deepest
+'
+
+test_expect_success 'reset with pathspecs inside sparse definition' '
+	init_repos &&
+
+	write_script edit-contents <<-\EOF &&
+	echo text >>$1
+	EOF
+
+	test_all_match git checkout -b reset-test update-deep &&
+	run_on_all ../edit-contents deep/a &&
+
+	test_all_match git reset base -- deep/a &&
+	test_all_match git status --porcelain=v2 &&
+
+	test_all_match git reset base -- nonexistent-file &&
+	test_all_match git status --porcelain=v2 &&
+
+	test_all_match git reset deepest -- deep &&
+	test_all_match git status --porcelain=v2
+'
+
+test_expect_success 'reset with sparse directory pathspec outside definition' '
+	init_repos &&
+
+	test_all_match git checkout -b reset-test update-deep &&
+	test_all_match git reset --hard update-folder1 &&
+	test_all_match git reset base -- folder1 &&
+	test_all_match git status --porcelain=v2
+'
+
+test_expect_success 'reset with pathspec match in sparse directory' '
+	init_repos &&
+
+	test_all_match git checkout -b reset-test update-deep &&
+	test_all_match git reset --hard update-folder1 &&
+	test_all_match git reset base -- folder1/a &&
+	test_all_match git status --porcelain=v2
+'
+
+test_expect_success 'reset with wildcard pathspec' '
+	init_repos &&
+
+	test_all_match git checkout -b reset-test update-deep &&
+	test_all_match git reset --hard update-folder1 &&
+	test_all_match git reset base -- \*/a &&
+	test_all_match git status --porcelain=v2
+'
+
 test_expect_success 'merge, cherry-pick, and rebase' '
 	init_repos &&
 
-- 
gitgitgadget



* [PATCH 4/7] reset: integrate with sparse index
  2021-09-30 14:50 [PATCH 0/7] Sparse Index: integrate with reset Victoria Dye via GitGitGadget
                   ` (2 preceding siblings ...)
  2021-09-30 14:50 ` [PATCH 3/7] reset: expand test coverage for sparse checkouts Victoria Dye via GitGitGadget
@ 2021-09-30 14:50 ` Victoria Dye via GitGitGadget
  2021-09-30 14:50 ` [PATCH 5/7] reset: make sparse-aware (except --mixed) Victoria Dye via GitGitGadget
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 85+ messages in thread
From: Victoria Dye via GitGitGadget @ 2021-09-30 14:50 UTC (permalink / raw)
  To: git; +Cc: stolee, gitster, newren, Victoria Dye, Victoria Dye

From: Victoria Dye <vdye@github.com>

`reset --soft` does not modify the index, so no compatibility changes are
needed for it to function without expanding the index. For all other reset
modes (`--mixed`, `--hard`, `--keep`, `--merge`), the full index is
explicitly expanded with `ensure_full_index` to maintain current behavior.

Additionally, the `read_cache()` check verifying an uncorrupted index is
moved after argument parsing and preparing the repo settings. The index is
not used by the preceding argument handling, but `read_cache()` does need to
be run after enabling sparse index for the command and before resetting.

Signed-off-by: Victoria Dye <vdye@github.com>
---
 builtin/reset.c | 10 +++++++---
 cache-tree.c    |  1 +
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/builtin/reset.c b/builtin/reset.c
index 8ffcd713720..92b9a3815c7 100644
--- a/builtin/reset.c
+++ b/builtin/reset.c
@@ -205,6 +205,7 @@ static int read_from_tree(const struct pathspec *pathspec,
 	opt.flags.override_submodule_config = 1;
 	opt.repo = the_repository;
 
+	ensure_full_index(&the_index);
 	if (do_diff_cache(tree_oid, &opt))
 		return 1;
 	diffcore_std(&opt);
@@ -282,9 +283,6 @@ static void parse_args(struct pathspec *pathspec,
 	}
 	*rev_ret = rev;
 
-	if (read_cache() < 0)
-		die(_("index file corrupt"));
-
 	parse_pathspec(pathspec, 0,
 		       PATHSPEC_PREFER_FULL |
 		       (patch_mode ? PATHSPEC_PREFIX_ORIGIN : 0),
@@ -430,6 +428,12 @@ int cmd_reset(int argc, const char **argv, const char *prefix)
 	if (intent_to_add && reset_type != MIXED)
 		die(_("-N can only be used with --mixed"));
 
+	prepare_repo_settings(the_repository);
+	the_repository->settings.command_requires_full_index = 0;
+
+	if (read_cache() < 0)
+		die(_("index file corrupt"));
+
 	/* Soft reset does not touch the index file nor the working tree
 	 * at all, but requires them in a good order.  Other resets reset
 	 * the index file to the tree object we are switching to. */
diff --git a/cache-tree.c b/cache-tree.c
index 90919f9e345..9be19c85b66 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -776,6 +776,7 @@ void prime_cache_tree(struct repository *r,
 	cache_tree_free(&istate->cache_tree);
 	istate->cache_tree = cache_tree();
 
+	ensure_full_index(istate);
 	prime_cache_tree_rec(r, istate->cache_tree, tree);
 	istate->cache_changed |= CACHE_TREE_CHANGED;
 	trace2_region_leave("cache-tree", "prime_cache_tree", the_repository);
-- 
gitgitgadget



* [PATCH 5/7] reset: make sparse-aware (except --mixed)
  2021-09-30 14:50 [PATCH 0/7] Sparse Index: integrate with reset Victoria Dye via GitGitGadget
                   ` (3 preceding siblings ...)
  2021-09-30 14:50 ` [PATCH 4/7] reset: integrate with sparse index Victoria Dye via GitGitGadget
@ 2021-09-30 14:50 ` Victoria Dye via GitGitGadget
  2021-09-30 14:51 ` [PATCH 6/7] reset: make --mixed sparse-aware Victoria Dye via GitGitGadget
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 85+ messages in thread
From: Victoria Dye via GitGitGadget @ 2021-09-30 14:50 UTC (permalink / raw)
  To: git; +Cc: stolee, gitster, newren, Victoria Dye, Victoria Dye

From: Victoria Dye <vdye@github.com>

To accurately reconstruct the cache tree in `prime_cache_tree_rec`, the
function must determine whether the directory currently being processed
in the tree is sparse. If it is not sparse, the tree is parsed and its
subtrees are recursively constructed. If it is sparse, no subtrees are
added to the tree and the entry count is set to 1 (representing the
sparse directory itself).
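
The entry-count rule can be illustrated with a toy model. This is a
sketch under assumptions, not git's data structures: `struct toy_tree`
and `toy_entry_count` are hypothetical, and the `is_sparse_dir` flag
stands in for the index lookup the patch uses to detect a sparse
directory.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a directory in a tree; NOT git's struct tree. */
struct toy_tree {
	int is_sparse_dir;                /* collapsed to one sparse index entry */
	size_t nr_files;                  /* files directly in this directory */
	size_t nr_subtrees;               /* number of child directories */
	const struct toy_tree **subtrees; /* child directories, may be NULL */
};

/*
 * Count the index entries covered by a directory. A sparse directory
 * contributes exactly one entry (itself) and is not descended into;
 * any other directory contributes its files plus its subtrees' counts.
 */
static size_t toy_entry_count(const struct toy_tree *t)
{
	size_t i, cnt;

	assert(t != NULL);
	if (t->is_sparse_dir)
		return 1;

	cnt = t->nr_files;
	for (i = 0; i < t->nr_subtrees; i++)
		cnt += toy_entry_count(t->subtrees[i]);
	return cnt;
}
```

With a root holding 2 files, a regular subdirectory of 3 files, and a
sparse subdirectory of 5 files, the count is 2 + 3 + 1 rather than
2 + 3 + 5.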

Signed-off-by: Victoria Dye <vdye@github.com>
---
 cache-tree.c                             | 44 +++++++++++++++++++++---
 cache.h                                  | 10 ++++++
 read-cache.c                             | 22 ++++++++----
 t/t1092-sparse-checkout-compatibility.sh | 15 ++++++--
 4 files changed, 78 insertions(+), 13 deletions(-)

diff --git a/cache-tree.c b/cache-tree.c
index 9be19c85b66..9021669d682 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -740,15 +740,29 @@ out:
 	return ret;
 }
 
+static void prime_cache_tree_sparse_dir(struct repository *r,
+					struct cache_tree *it,
+					struct tree *tree,
+					struct strbuf *tree_path)
+{
+
+	oidcpy(&it->oid, &tree->object.oid);
+	it->entry_count = 1;
+	return;
+}
+
 static void prime_cache_tree_rec(struct repository *r,
 				 struct cache_tree *it,
-				 struct tree *tree)
+				 struct tree *tree,
+				 struct strbuf *tree_path)
 {
+	struct strbuf subtree_path = STRBUF_INIT;
 	struct tree_desc desc;
 	struct name_entry entry;
 	int cnt;
 
 	oidcpy(&it->oid, &tree->object.oid);
+
 	init_tree_desc(&desc, tree->buffer, tree->size);
 	cnt = 0;
 	while (tree_entry(&desc, &entry)) {
@@ -757,27 +771,49 @@ static void prime_cache_tree_rec(struct repository *r,
 		else {
 			struct cache_tree_sub *sub;
 			struct tree *subtree = lookup_tree(r, &entry.oid);
+
 			if (!subtree->object.parsed)
 				parse_tree(subtree);
 			sub = cache_tree_sub(it, entry.path);
 			sub->cache_tree = cache_tree();
-			prime_cache_tree_rec(r, sub->cache_tree, subtree);
+			strbuf_reset(&subtree_path);
+			strbuf_grow(&subtree_path, tree_path->len + entry.pathlen + 1);
+			strbuf_addbuf(&subtree_path, tree_path);
+			strbuf_add(&subtree_path, entry.path, entry.pathlen);
+			strbuf_addch(&subtree_path, '/');
+
+			/*
+			 * If a sparse index is in use, the directory being processed may be
+			 * sparse. To confirm that, we can check whether an entry with that
+			 * exact name exists in the index. If it does, the created subtree
+			 * should be sparse. Otherwise, cache tree expansion should continue
+			 * as normal.
+			 */
+			if (r->index->sparse_index &&
+			    index_entry_exists(r->index, subtree_path.buf, subtree_path.len))
+				prime_cache_tree_sparse_dir(r, sub->cache_tree, subtree, &subtree_path);
+			else
+				prime_cache_tree_rec(r, sub->cache_tree, subtree, &subtree_path);
 			cnt += sub->cache_tree->entry_count;
 		}
 	}
 	it->entry_count = cnt;
+
+	strbuf_release(&subtree_path);
 }
 
 void prime_cache_tree(struct repository *r,
 		      struct index_state *istate,
 		      struct tree *tree)
 {
+	struct strbuf tree_path = STRBUF_INIT;
+
 	trace2_region_enter("cache-tree", "prime_cache_tree", the_repository);
 	cache_tree_free(&istate->cache_tree);
 	istate->cache_tree = cache_tree();
 
-	ensure_full_index(istate);
-	prime_cache_tree_rec(r, istate->cache_tree, tree);
+	prime_cache_tree_rec(r, istate->cache_tree, tree, &tree_path);
+	strbuf_release(&tree_path);
 	istate->cache_changed |= CACHE_TREE_CHANGED;
 	trace2_region_leave("cache-tree", "prime_cache_tree", the_repository);
 }
diff --git a/cache.h b/cache.h
index f6295f3b048..1d3e4665562 100644
--- a/cache.h
+++ b/cache.h
@@ -816,6 +816,16 @@ struct cache_entry *index_file_exists(struct index_state *istate, const char *na
  */
 int index_name_pos(struct index_state *, const char *name, int namelen);
 
+/*
+ * Determines whether an entry with the given name exists within the
+ * given index. The return value is 1 if an exact match is found, otherwise
+ * it is 0. Note that, unlike index_name_pos, this function does not expand
+ * the index if it is sparse. If an item exists within the full index but it
+ * is contained within a sparse directory (and not in the sparse index), 0 is
+ * returned.
+ */
+int index_entry_exists(struct index_state *, const char *name, int namelen);
+
 /*
  * Some functions return the negative complement of an insert position when a
  * precise match was not found but a position was found where the entry would
diff --git a/read-cache.c b/read-cache.c
index f5d4385c408..ea1166895f8 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -551,7 +551,10 @@ int cache_name_stage_compare(const char *name1, int len1, int stage1, const char
 	return 0;
 }
 
-static int index_name_stage_pos(struct index_state *istate, const char *name, int namelen, int stage)
+static int index_name_stage_pos(struct index_state *istate,
+				const char *name, int namelen,
+				int stage,
+				int search_sparse)
 {
 	int first, last;
 
@@ -570,7 +573,7 @@ static int index_name_stage_pos(struct index_state *istate, const char *name, in
 		first = next+1;
 	}
 
-	if (istate->sparse_index &&
+	if (search_sparse && istate->sparse_index &&
 	    first > 0) {
 		/* Note: first <= istate->cache_nr */
 		struct cache_entry *ce = istate->cache[first - 1];
@@ -586,7 +589,7 @@ static int index_name_stage_pos(struct index_state *istate, const char *name, in
 		    ce_namelen(ce) < namelen &&
 		    !strncmp(name, ce->name, ce_namelen(ce))) {
 			ensure_full_index(istate);
-			return index_name_stage_pos(istate, name, namelen, stage);
+			return index_name_stage_pos(istate, name, namelen, stage, search_sparse);
 		}
 	}
 
@@ -595,7 +598,12 @@ static int index_name_stage_pos(struct index_state *istate, const char *name, in
 
 int index_name_pos(struct index_state *istate, const char *name, int namelen)
 {
-	return index_name_stage_pos(istate, name, namelen, 0);
+	return index_name_stage_pos(istate, name, namelen, 0, 1);
+}
+
+int index_entry_exists(struct index_state *istate, const char *name, int namelen)
+{
+	return index_name_stage_pos(istate, name, namelen, 0, 0) >= 0;
 }
 
 int remove_index_entry_at(struct index_state *istate, int pos)
@@ -1222,7 +1230,7 @@ static int has_dir_name(struct index_state *istate,
 			 */
 		}
 
-		pos = index_name_stage_pos(istate, name, len, stage);
+		pos = index_name_stage_pos(istate, name, len, stage, 1);
 		if (pos >= 0) {
 			/*
 			 * Found one, but not so fast.  This could
@@ -1322,7 +1330,7 @@ static int add_index_entry_with_check(struct index_state *istate, struct cache_e
 		strcmp(ce->name, istate->cache[istate->cache_nr - 1]->name) > 0)
 		pos = index_pos_to_insert_pos(istate->cache_nr);
 	else
-		pos = index_name_stage_pos(istate, ce->name, ce_namelen(ce), ce_stage(ce));
+		pos = index_name_stage_pos(istate, ce->name, ce_namelen(ce), ce_stage(ce), 1);
 
 	/* existing match? Just replace it. */
 	if (pos >= 0) {
@@ -1357,7 +1365,7 @@ static int add_index_entry_with_check(struct index_state *istate, struct cache_e
 		if (!ok_to_replace)
 			return error(_("'%s' appears as both a file and as a directory"),
 				     ce->name);
-		pos = index_name_stage_pos(istate, ce->name, ce_namelen(ce), ce_stage(ce));
+		pos = index_name_stage_pos(istate, ce->name, ce_namelen(ce), ce_stage(ce), 1);
 		pos = -pos-1;
 	}
 	return pos + 1;
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index e36fb18098d..0b6ff0de17d 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -786,9 +786,9 @@ test_expect_success 'sparse-index is not expanded' '
 	ensure_not_expanded checkout - &&
 	ensure_not_expanded switch rename-out-to-out &&
 	ensure_not_expanded switch - &&
-	git -C sparse-index reset --hard &&
+	ensure_not_expanded reset --hard &&
 	ensure_not_expanded checkout rename-out-to-out -- deep/deeper1 &&
-	git -C sparse-index reset --hard &&
+	ensure_not_expanded reset --hard &&
 	ensure_not_expanded restore -s rename-out-to-out -- deep/deeper1 &&
 
 	echo >>sparse-index/README.md &&
@@ -798,6 +798,17 @@ test_expect_success 'sparse-index is not expanded' '
 	echo >>sparse-index/untracked.txt &&
 	ensure_not_expanded add . &&
 
+	for ref in update-deep update-folder1 update-folder2 update-deep
+	do
+		echo >>sparse-index/README.md &&
+		ensure_not_expanded reset --hard $ref || return 1
+	done &&
+
+	ensure_not_expanded reset --hard update-deep &&
+	ensure_not_expanded reset --keep base &&
+	ensure_not_expanded reset --merge update-deep &&
+	ensure_not_expanded reset --hard &&
+
 	ensure_not_expanded checkout -f update-deep &&
 	test_config -C sparse-index pull.twohead ort &&
 	(
-- 
gitgitgadget



* [PATCH 6/7] reset: make --mixed sparse-aware
  2021-09-30 14:50 [PATCH 0/7] Sparse Index: integrate with reset Victoria Dye via GitGitGadget
                   ` (4 preceding siblings ...)
  2021-09-30 14:50 ` [PATCH 5/7] reset: make sparse-aware (except --mixed) Victoria Dye via GitGitGadget
@ 2021-09-30 14:51 ` Victoria Dye via GitGitGadget
  2021-10-01 15:03   ` Victoria Dye
  2021-09-30 14:51 ` [PATCH 7/7] unpack-trees: improve performance of next_cache_entry Victoria Dye via GitGitGadget
  2021-10-05 13:20 ` [PATCH v2 0/7] Sparse Index: integrate with reset Victoria Dye via GitGitGadget
  7 siblings, 1 reply; 85+ messages in thread
From: Victoria Dye via GitGitGadget @ 2021-09-30 14:51 UTC (permalink / raw)
  To: git; +Cc: stolee, gitster, newren, Victoria Dye, Victoria Dye

From: Victoria Dye <vdye@github.com>

Sparse directory entries are "diffed" as trees in `diff_cache` (used
internally by `reset --mixed`), following a code path separate from
individual file handling. The use of `diff_tree_oid` there requires setting
explicit `change` and `add_remove` functions to process the internal
contents of a sparse directory.

Additionally, the `recursive` diff option handles cases in which `reset
--mixed` must diff/merge files that are nested multiple levels deep in a
sparse directory.

Signed-off-by: Victoria Dye <vdye@github.com>
---
 builtin/reset.c                          | 30 +++++++++++++++++++++++-
 t/t1092-sparse-checkout-compatibility.sh | 13 +++++++++-
 2 files changed, 41 insertions(+), 2 deletions(-)

diff --git a/builtin/reset.c b/builtin/reset.c
index 92b9a3815c7..2d95ce76f20 100644
--- a/builtin/reset.c
+++ b/builtin/reset.c
@@ -196,6 +196,8 @@ static int read_from_tree(const struct pathspec *pathspec,
 			  int intent_to_add)
 {
 	struct diff_options opt;
+	unsigned int i;
+	char *skip_worktree_seen = NULL;
 
 	memset(&opt, 0, sizeof(opt));
 	copy_pathspec(&opt.pathspec, pathspec);
@@ -203,9 +205,35 @@ static int read_from_tree(const struct pathspec *pathspec,
 	opt.format_callback = update_index_from_diff;
 	opt.format_callback_data = &intent_to_add;
 	opt.flags.override_submodule_config = 1;
+	opt.flags.recursive = 1;
 	opt.repo = the_repository;
+	opt.change = diff_change;
+	opt.add_remove = diff_addremove;
+
+	/*
+	 * When pathspec is given for resetting a cone-mode sparse checkout, it may
+	 * identify entries that are nested in sparse directories, in which case the
+	 * index should be expanded. For the sake of efficiency, this check is
+	 * overly-cautious: anything with a wildcard or a magic prefix requires
+	 * expansion, as well as literal paths that aren't in the sparse checkout
+	 * definition AND don't match any directory in the index.
+	 */
+	if (pathspec->nr && the_index.sparse_index) {
+		if (pathspec->magic || pathspec->has_wildcard) {
+			ensure_full_index(&the_index);
+		} else {
+			for (i = 0; i < pathspec->nr; i++) {
+				if (!path_in_cone_mode_sparse_checkout(pathspec->items[i].original, &the_index) &&
+				    !matches_skip_worktree(pathspec, i, &skip_worktree_seen)) {
+					ensure_full_index(&the_index);
+					break;
+				}
+			}
+		}
+	}
+
+	free(skip_worktree_seen);
 
-	ensure_full_index(&the_index);
 	if (do_diff_cache(tree_oid, &opt))
 		return 1;
 	diffcore_std(&opt);
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 0b6ff0de17d..c9b9ef4992c 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -801,14 +801,25 @@ test_expect_success 'sparse-index is not expanded' '
 	for ref in update-deep update-folder1 update-folder2 update-deep
 	do
 		echo >>sparse-index/README.md &&
+		ensure_not_expanded reset --mixed $ref &&
 		ensure_not_expanded reset --hard $ref || return 1
 	done &&
 
 	ensure_not_expanded reset --hard update-deep &&
 	ensure_not_expanded reset --keep base &&
 	ensure_not_expanded reset --merge update-deep &&
-	ensure_not_expanded reset --hard &&
 
+	ensure_not_expanded reset base -- deep/a &&
+	ensure_not_expanded reset base -- nonexistent-file &&
+	ensure_not_expanded reset deepest -- deep &&
+
+	# Although folder1 is outside the sparse definition, it exists as a
+	# directory entry in the index, so it will be reset without needing to
+	# expand the full index.
+	ensure_not_expanded reset --hard update-folder1 &&
+	ensure_not_expanded reset base -- folder1 &&
+
+	ensure_not_expanded reset --hard update-deep &&
 	ensure_not_expanded checkout -f update-deep &&
 	test_config -C sparse-index pull.twohead ort &&
 	(
-- 
gitgitgadget
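
For readers following along, the expansion check described in the comment
added to `read_from_tree` can be modeled in isolation. This is only a sketch
under stated assumptions: `toy_item`, `toy_pathspec` and
`toy_needs_full_index` are hypothetical names, and the two booleans on each
item stand in for the results of `path_in_cone_mode_sparse_checkout()` and
`matches_skip_worktree()` rather than calling into Git.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one literal pathspec item. */
struct toy_item {
	bool in_cone;        /* path is inside the sparse-checkout definition */
	bool is_sparse_dir;  /* path matches a sparse directory entry in the index */
};

/* Hypothetical stand-in for a parsed pathspec. */
struct toy_pathspec {
	size_t nr;
	bool has_magic;
	bool has_wildcard;
	const struct toy_item *items;
};

/* Returns true when the sparse index would have to be expanded first. */
static bool toy_needs_full_index(const struct toy_pathspec *ps,
				 bool index_is_sparse)
{
	size_t i;

	assert(ps != NULL);

	/* No pathspec, or the index is already full: nothing to expand. */
	if (!ps->nr || !index_is_sparse)
		return false;

	/* Overly cautious: any magic or wildcard forces expansion. */
	if (ps->has_magic || ps->has_wildcard)
		return true;

	/* Literal paths: expand only if one is outside the cone AND
	 * does not correspond to a sparse directory entry. */
	for (i = 0; i < ps->nr; i++)
		if (!ps->items[i].in_cone && !ps->items[i].is_sparse_dir)
			return true;

	return false;
}
```

Under this model, `reset base -- folder1` stays sparse because `folder1` is
itself a sparse directory entry, while a literal path nested inside `folder1`
would trigger expansion.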


^ permalink raw reply	[flat|nested] 85+ messages in thread

* [PATCH 7/7] unpack-trees: improve performance of next_cache_entry
  2021-09-30 14:50 [PATCH 0/7] Sparse Index: integrate with reset Victoria Dye via GitGitGadget
                   ` (5 preceding siblings ...)
  2021-09-30 14:51 ` [PATCH 6/7] reset: make --mixed sparse-aware Victoria Dye via GitGitGadget
@ 2021-09-30 14:51 ` Victoria Dye via GitGitGadget
  2021-10-05 13:20 ` [PATCH v2 0/7] Sparse Index: integrate with reset Victoria Dye via GitGitGadget
  7 siblings, 0 replies; 85+ messages in thread
From: Victoria Dye via GitGitGadget @ 2021-09-30 14:51 UTC (permalink / raw)
  To: git; +Cc: stolee, gitster, newren, Victoria Dye, Victoria Dye

From: Victoria Dye <vdye@github.com>

To find the first non-unpacked cache entry, `next_cache_entry` iterates
through the index, starting at `cache_bottom`. The performance of this in full
indexes is helped by `cache_bottom` advancing with each invocation of
`mark_ce_used` (called by `unpack_index_entry`). However, the presence of
sparse directories can prevent the `cache_bottom` from advancing in a sparse
index case, effectively forcing `next_cache_entry` to search from the
beginning of the index each time it is called.

The `cache_bottom` must be preserved for the sparse index (see 17a1bb570b
(unpack-trees: preserve cache_bottom, 2021-07-14)).  Therefore, to retain
the benefit `cache_bottom` provides in non-sparse index cases, a separate
`hint` position indicates the first position `next_cache_entry` should
search, updated each execution with a new position.  The performance of `git
reset -- does-not-exist` (testing the "worst case" in which all entries in
the index are unpacked with `next_cache_entry`) is significantly improved
for the sparse index case:

Test          before            after
------------------------------------------------------
(full-v3)     0.79(0.38+0.30)   0.91(0.43+0.34) +15.2%
(full-v4)     0.80(0.38+0.29)   0.85(0.40+0.35) +6.2%
(sparse-v3)   0.76(0.43+0.69)   0.44(0.08+0.67) -42.1%
(sparse-v4)   0.71(0.40+0.65)   0.41(0.09+0.65) -42.3%

Signed-off-by: Victoria Dye <vdye@github.com>
---
 unpack-trees.c | 23 +++++++++++++++++------
 1 file changed, 17 insertions(+), 6 deletions(-)

diff --git a/unpack-trees.c b/unpack-trees.c
index 8ea0a542da8..b94733de6be 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -645,17 +645,24 @@ static void mark_ce_used_same_name(struct cache_entry *ce,
 	}
 }
 
-static struct cache_entry *next_cache_entry(struct unpack_trees_options *o)
+static struct cache_entry *next_cache_entry(struct unpack_trees_options *o, int *hint)
 {
 	const struct index_state *index = o->src_index;
 	int pos = o->cache_bottom;
 
+	if (*hint > pos)
+		pos = *hint;
+
 	while (pos < index->cache_nr) {
 		struct cache_entry *ce = index->cache[pos];
-		if (!(ce->ce_flags & CE_UNPACKED))
+		if (!(ce->ce_flags & CE_UNPACKED)) {
+			*hint = pos + 1;
 			return ce;
+		}
 		pos++;
 	}
+
+	*hint = pos;
 	return NULL;
 }
 
@@ -1365,12 +1372,13 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
 
 	/* Are we supposed to look at the index too? */
 	if (o->merge) {
+		int hint = -1;
 		while (1) {
 			int cmp;
 			struct cache_entry *ce;
 
 			if (o->diff_index_cached)
-				ce = next_cache_entry(o);
+				ce = next_cache_entry(o, &hint);
 			else
 				ce = find_cache_entry(info, p);
 
@@ -1690,7 +1698,7 @@ static int verify_absent(const struct cache_entry *,
 int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options *o)
 {
 	struct repository *repo = the_repository;
-	int i, ret;
+	int i, hint, ret;
 	static struct cache_entry *dfc;
 	struct pattern_list pl;
 	int free_pattern_list = 0;
@@ -1763,13 +1771,15 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
 		info.pathspec = o->pathspec;
 
 		if (o->prefix) {
+			hint = -1;
+
 			/*
 			 * Unpack existing index entries that sort before the
 			 * prefix the tree is spliced into.  Note that o->merge
 			 * is always true in this case.
 			 */
 			while (1) {
-				struct cache_entry *ce = next_cache_entry(o);
+				struct cache_entry *ce = next_cache_entry(o, &hint);
 				if (!ce)
 					break;
 				if (ce_in_traverse_path(ce, &info))
@@ -1790,8 +1800,9 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
 
 	/* Any left-over entries in the index? */
 	if (o->merge) {
+		hint = -1;
 		while (1) {
-			struct cache_entry *ce = next_cache_entry(o);
+			struct cache_entry *ce = next_cache_entry(o, &hint);
 			if (!ce)
 				break;
 			if (unpack_index_entry(ce, o) < 0)
-- 
gitgitgadget
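
The effect of the `hint` parameter can be seen in a reduced, standalone form.
The sketch below mirrors the loop in the patch but uses hypothetical toy types
(`toy_entry`, `toy_index`, `TOY_UNPACKED`) instead of Git's `cache_entry` and
`index_state`, so it illustrates the idea and is not the actual
implementation.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_UNPACKED 1u  /* hypothetical stand-in for CE_UNPACKED */

struct toy_entry {
	unsigned flags;
};

struct toy_index {
	struct toy_entry *cache;
	int cache_nr;
	int cache_bottom;  /* may stay at 0 when sparse directories are present */
};

/*
 * Return the first entry at or after max(cache_bottom, *hint) that is not
 * yet unpacked, and remember where the next search should start.
 */
static struct toy_entry *toy_next_entry(struct toy_index *idx, int *hint)
{
	int pos;

	assert(idx != NULL && hint != NULL);

	pos = idx->cache_bottom;
	if (*hint > pos)
		pos = *hint;

	while (pos < idx->cache_nr) {
		struct toy_entry *e = &idx->cache[pos];
		if (!(e->flags & TOY_UNPACKED)) {
			*hint = pos + 1;  /* skip this entry next time */
			return e;
		}
		pos++;
	}

	*hint = pos;  /* nothing left: later calls return immediately */
	return NULL;
}
```

Starting each loop with `hint = -1` makes the first call behave exactly as
before, since `cache_bottom` is then always the larger of the two. Because the
hint moves past the returned entry unconditionally, the caller is assumed to
consume each entry it is handed before asking for the next one.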

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 1/7] reset: behave correctly with sparse-checkout
  2021-09-30 14:50 ` [PATCH 1/7] reset: behave correctly with sparse-checkout Kevin Willford via GitGitGadget
@ 2021-09-30 18:34   ` Junio C Hamano
  2021-10-01 14:55     ` Victoria Dye
  0 siblings, 1 reply; 85+ messages in thread
From: Junio C Hamano @ 2021-09-30 18:34 UTC (permalink / raw)
  To: Kevin Willford via GitGitGadget
  Cc: git, stolee, newren, Victoria Dye, Kevin Willford

"Kevin Willford via GitGitGadget" <gitgitgadget@gmail.com> writes:

> @@ -127,12 +129,49 @@ static void update_index_from_diff(struct diff_queue_struct *q,
>  		struct diff_options *opt, void *data)
>  {
>  	int i;
> +	int pos;
>  	int intent_to_add = *(int *)data;
>  
>  	for (i = 0; i < q->nr; i++) {
>  		struct diff_filespec *one = q->queue[i]->one;
> +		struct diff_filespec *two = q->queue[i]->two;
>  		int is_missing = !(one->mode && !is_null_oid(&one->oid));
> +		int was_missing = !two->mode && is_null_oid(&two->oid);

Not a problem introduced by this patch per-se, but is_missing is a
counter-intuitive name for what the boolean wants to represent, I
think, which was brought in by b4b313f9 (reset: support "--mixed
--intent-to-add" mode, 2014-02-04).  Before the commit, we used to
say

 	for (i = 0; i < q->nr; i++) {
 		struct diff_filespec *one = q->queue[i]->one;
		if (one->mode && !is_null_sha1(one->sha1)) {
			... create ce out of one and add to the	index ...
		} else
 			remove_file_from_cache(one->path);
		...

i.e. "if one is not missing, create a ce and add it, otherwise
remove the path".

It should have been called "one_is_missing" if we wanted to
literally express the condition the code checked, but an even better
name would have been given after the intent of what the code wants
to do with the information.  If the resetted-to tree (that is what
'one' side of the comparison in diff_cache() is) has a valid blob,
we want it to be in the index, and otherwise, we do not want it in
the index.

Now, the patch makes things worse and I had to do the above digging
to see why the new code is even more confusing.  The 'two' side of
the comparison is what is in the to-be-corrected-by-reset index.
"was_missing" in contrast to "is_missing" makes it sound as if it
was the state before whatever "is_missing" tries to represent, but
that is not what is happening.  "is_missing" does not mean "the
entry is currently not there in the index", but "was_missing" does
mean exactly that: "the entry is currently not there in the index".

There isn't any "was" missing about it.  It "is" missing in the
index.  Instead of renaming, I wonder if we can do without this new
variable.  Let's read on the patch.

Also, now the code uses both sides of the filepair, we must double
check that our do_diff_cache() is *not* doing any rename detection.
It might be even prudent to ensure that 

	if (strcmp(one->path, two->path))
		BUG("reset drove diff-cache with rename detection");

but it might be with too much paranoia.  I dunno.

>  		struct cache_entry *ce;
> +		struct cache_entry *ce_before;
> +		struct checkout state = CHECKOUT_INIT;

These two new variables do not need this wide a scope, I would
think.  Shouldn't it be inside the body of the new "if" statement
this patch adds?

> +		/*
> +		 * When using the sparse-checkout feature the cache entries
> +		 * that are added here will not have the skip-worktree bit
> +		 * set. Without this code there is data that is lost because
> +		 * the files that would normally be in the working directory
> +		 * are not there and show as deleted for the next status.
> +		 * In the case of added files, they just disappear.
> +		 *
> +		 * We need to create the previous version of the files in
> +		 * the working directory so that they will have the right
> +		 * content and the next status call will show modified or
> +		 * untracked files correctly.
> +		 */
> +		if (core_apply_sparse_checkout && !file_exists(two->path)) {

In a sparsely checked out working tree, there is nothing in the
working tree at the path.  It may be because it is sparse and we
didn't want to have anything there, or it may be because the user
wanted to get rid of it and said "rm path" (not "git rm path") and
this part of the tree was of interest even if the sparse checkout
feature was used to hide other parts of the tree.  With the above
two checks alone, we cannot tell which.  Let's read on.

> +			pos = cache_name_pos(two->path, strlen(two->path));

We check the index to see if there is an entry for it.  I suspect
that because we need to do this check anyway, we shouldn't even have
to look at 'two' (and add a new 'was_missing' variable), because
'one' and 'two' came from a comparison between the resetted-to tree
object and the current index, and if cache_name_pos() for the path
(we can use 'one->path') says there is an entry in the index, by
definition, 'two' would not be showing a "removed" state (i.e. "the
resetted-to tree had it, the index does not" is what "was_missing"
wants to say).

So I wonder if it is better to

 - use one->path for !file_exists() above and cache_name_pos() here
   instead of two->path.

 - drop the confusingly named 'was_missing', because (pos < 0) is
   equivalent to it after this point, and we didn't need it up to
   this point.

> +			if ((pos >= 0 && ce_skip_worktree(active_cache[pos])) &&

And we do find an entry for it.  So this path is not something
sparse cone specifies not to check out (otherwise we would have a
tree-like entry that covers this path in the index and not an entry
for this specific path)?

Anyway, if it is marked with the skip-worktree bit, does that mean
there is no risk that the reason why two->path does not exist in the
working tree is because we earlier gave it in the working tree but
it was later removed by the user?  Just making sure that we are not
breaking the end-user's wish that the path should be removed by
resurrecting it in the working tree with a new call to
checkout_entry().

> +			    (is_missing || !was_missing)) {

And in such a case, if the resetted-to tree says we shouldn't have
the path in the resulting index, or if the original state in the
index had this path (but because (0 <= pos) must be true for us to 
reach this point, I am not sure if "was_missing" can ever be true
here), then do the following, which is ...

> +				state.force = 1;
> +				state.refresh_cache = 1;
> +				state.istate = &the_index;
> +
> +				ce_before = make_cache_entry(&the_index, two->mode,
> +							     &two->oid, two->path,
> +							     0, 0);
> +				if (!ce_before)
> +					die(_("make_cache_entry failed for path '%s'"),
> +						two->path);
> +
> +				checkout_entry(ce_before, &state, NULL, NULL);

... to resurrect the last "git add"ed state from the index and write
it out to the working tree.  As I suspected, ce_before and state
should be scoped inside this block and not visible outside, no?

I am not sure why this behaviour is desirable.  The "mixed" reset
should not have to touch the working tree in the first place.

The large comment before this block says "... will not have the
skip-worktree bit set", but we are dealing with a case where the
original index had a cache entry there with skip-worktree bit set,
so isn't the more desirable outcome that the cache entry added back
to the index has the skip-worktree bit still set and there is no
working tree file that the user did not desire to have?

And isn't it the matter of preserving the skip-worktree bit when the
code in the post context of this hunk this patch did not touch adds
the entry back to the index?

> +			}
> +		}
>  
>  		if (is_missing && !intent_to_add) {
>  			remove_file_from_cache(one->path);

If we look at the code after this point, we do use "is_missing"
information to tweak ce->ce_flags with the intent-to-add bit.

Perhaps we can do a similar tweak to the cache entry to mark it with
skip-worktree bit if the index had a cache entry at the path with
the bit set?  The code that needs to do so would only have to
remember if the one->path is in the current index and the cache
entry for the path has the skip-worktree bit in the body of the new
if() statement about (core_apply_sparse_checkout && !file_exists())
added by this patch (I am not sure if !file_exists() even matters,
though, as the approach I am suggesting is to preserve the skip bit
and not disturb the working tree files at all).

Thanks.
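
The alternative suggested above (carry the skip-worktree bit over to the entry
that is added back, and leave the working tree alone) could look roughly like
the following. This is a sketch with hypothetical toy types and names
(`toy_ce`, `toy_find`, `toy_make_reset_entry`, `TOY_SKIP_WORKTREE`); it does
not use Git's index API and only shows the flag-preservation idea.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_SKIP_WORKTREE 0x1u  /* hypothetical stand-in for the skip-worktree bit */

struct toy_ce {
	const char *path;
	unsigned flags;
};

/* Linear lookup of a path in the toy index; NULL if absent. */
static const struct toy_ce *toy_find(const struct toy_ce *index, size_t nr,
				     const char *path)
{
	size_t i;

	for (i = 0; i < nr; i++)
		if (!strcmp(index[i].path, path))
			return &index[i];
	return NULL;
}

/*
 * Build the entry that reset would add back for 'path', preserving the
 * skip-worktree bit if the current index entry for that path had it set.
 * No working tree file is created or touched.
 */
static struct toy_ce toy_make_reset_entry(const struct toy_ce *index,
					  size_t nr, const char *path)
{
	struct toy_ce fresh = { path, 0 };
	const struct toy_ce *old;

	assert(path != NULL);

	old = toy_find(index, nr, path);
	if (old && (old->flags & TOY_SKIP_WORKTREE))
		fresh.flags |= TOY_SKIP_WORKTREE;
	return fresh;
}
```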





^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 2/7] sparse-index: update command for expand/collapse test
  2021-09-30 14:50 ` [PATCH 2/7] sparse-index: update command for expand/collapse test Victoria Dye via GitGitGadget
@ 2021-09-30 19:17   ` Taylor Blau
  2021-09-30 20:11     ` Victoria Dye
  2021-10-01  9:14   ` Bagas Sanjaya
  1 sibling, 1 reply; 85+ messages in thread
From: Taylor Blau @ 2021-09-30 19:17 UTC (permalink / raw)
  To: Victoria Dye via GitGitGadget; +Cc: git, stolee, gitster, newren, Victoria Dye

On Thu, Sep 30, 2021 at 02:50:56PM +0000, Victoria Dye via GitGitGadget wrote:
> From: Victoria Dye <vdye@github.com>
>
> In anticipation of multiple commands being fully integrated with sparse
> index, update the test for index expand and collapse for non-sparse index
> integrated commands to use `mv`.
>
> Signed-off-by: Victoria Dye <vdye@github.com>
> ---
>  t/t1092-sparse-checkout-compatibility.sh | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index c5977152661..aed8683e629 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -642,7 +642,7 @@ test_expect_success 'sparse-index is expanded and converted back' '
>  	init_repos &&
>
>  	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
> -		git -C sparse-index -c core.fsmonitor="" reset --hard &&
> +		git -C sparse-index -c core.fsmonitor="" mv a b &&

Double-checking my understanding as somebody who is not so familiar with
t1092: this test is to ensure that commands which don't yet understand
the sparse index can temporarily expand it in order to do their work?

If so, makes sense to me. And renaming 'a' to 'b' is arbitrary and fine
to do since we end up recreating the sparse-index repository each time
via init_repos.

Looks good to me.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 2/7] sparse-index: update command for expand/collapse test
  2021-09-30 19:17   ` Taylor Blau
@ 2021-09-30 20:11     ` Victoria Dye
  2021-09-30 21:32       ` Junio C Hamano
  0 siblings, 1 reply; 85+ messages in thread
From: Victoria Dye @ 2021-09-30 20:11 UTC (permalink / raw)
  To: Taylor Blau, Victoria Dye via GitGitGadget; +Cc: git, stolee, gitster, newren

Taylor Blau wrote:
> On Thu, Sep 30, 2021 at 02:50:56PM +0000, Victoria Dye via GitGitGadget wrote:
>> From: Victoria Dye <vdye@github.com>
>>
>> In anticipation of multiple commands being fully integrated with sparse
>> index, update the test for index expand and collapse for non-sparse index
>> integrated commands to use `mv`.
>>
>> Signed-off-by: Victoria Dye <vdye@github.com>
>> ---
>>  t/t1092-sparse-checkout-compatibility.sh | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
>> index c5977152661..aed8683e629 100755
>> --- a/t/t1092-sparse-checkout-compatibility.sh
>> +++ b/t/t1092-sparse-checkout-compatibility.sh
>> @@ -642,7 +642,7 @@ test_expect_success 'sparse-index is expanded and converted back' '
>>  	init_repos &&
>>
>>  	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
>> -		git -C sparse-index -c core.fsmonitor="" reset --hard &&
>> +		git -C sparse-index -c core.fsmonitor="" mv a b &&
> 
> Double-checking my understanding as somebody who is not so familiar with
> t1092: this test is to ensure that commands which don't yet understand
> the sparse index can temporarily expand it in order to do their work?

Exactly - if a command doesn't explicitly enable use of the sparse index by
setting `command_requires_full_index` to 0, the index is expanded if/when it
is first read during the command's execution and collapsed if/when it is
written to disk. This test makes sure that mechanism works as intended.

-Victoria
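
As a rough illustration of that expand-on-read, collapse-on-write behavior,
here is a toy model. The struct and function names (`toy_index`,
`toy_read_index`, `toy_write_index`) are hypothetical and greatly simplified;
only the `command_requires_full_index` setting is taken from the description
above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_index {
	bool sparse_on_disk;  /* repository is set up to store a sparse index */
	bool is_sparse;       /* in-memory index holds sparse directory entries */
	int expansions;       /* how many times the index was expanded */
};

/* Reading: expand only if the command has not opted in to the sparse index. */
static void toy_read_index(struct toy_index *idx,
			   bool command_requires_full_index)
{
	assert(idx != NULL);

	idx->is_sparse = idx->sparse_on_disk;
	if (idx->is_sparse && command_requires_full_index) {
		idx->is_sparse = false;
		idx->expansions++;
	}
}

/* Writing: collapse back to sparse if that is how the index is stored. */
static void toy_write_index(struct toy_index *idx)
{
	assert(idx != NULL);

	if (idx->sparse_on_disk)
		idx->is_sparse = true;
}
```

In this model a command such as `git mv`, which leaves
`command_requires_full_index` at 1, records one expansion on read and a
collapse on write, which is the pattern the test looks for in the trace.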




^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 2/7] sparse-index: update command for expand/collapse test
  2021-09-30 20:11     ` Victoria Dye
@ 2021-09-30 21:32       ` Junio C Hamano
  2021-09-30 22:59         ` Victoria Dye
  0 siblings, 1 reply; 85+ messages in thread
From: Junio C Hamano @ 2021-09-30 21:32 UTC (permalink / raw)
  To: Victoria Dye
  Cc: Taylor Blau, Victoria Dye via GitGitGadget, git, stolee, newren

Victoria Dye <vdye@github.com> writes:

> Taylor Blau wrote:
>> On Thu, Sep 30, 2021 at 02:50:56PM +0000, Victoria Dye via GitGitGadget wrote:
>>> From: Victoria Dye <vdye@github.com>
>>>
>>> In anticipation of multiple commands being fully integrated with sparse
>>> index, update the test for index expand and collapse for non-sparse index
>>> integrated commands to use `mv`.
>>> ...
>>>  	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
>>> -		git -C sparse-index -c core.fsmonitor="" reset --hard &&
>>> +		git -C sparse-index -c core.fsmonitor="" mv a b &&
>> 
>> Double-checking my understanding as somebody who is not so familiar with
>> t1092: this test is to ensure that commands which don't yet understand
>> the sparse index can temporarily expand it in order to do their work?
>
> Exactly - if a command doesn't explicitly enable use of the sparse index by
> setting `command_requires_full_index` to 0, the index is expanded if/when it
> is first read during the command's execution and collapsed if/when it is
> written to disk. This test makes sure that mechanism works as intended.

Sorry, I do not quite follow.  

So is this "before this series of patches, 'reset --hard' can be
used as a sample of a command that expands and then collapses,
but because it is no longer a good sample of such a command, we replace
it with 'mv a b'"?  Do we need to update this further when "mv a b"
learns to expand and then collapse?

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 2/7] sparse-index: update command for expand/collapse test
  2021-09-30 21:32       ` Junio C Hamano
@ 2021-09-30 22:59         ` Victoria Dye
  2021-10-01  0:04           ` Junio C Hamano
  0 siblings, 1 reply; 85+ messages in thread
From: Victoria Dye @ 2021-09-30 22:59 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Taylor Blau, Victoria Dye via GitGitGadget, git, stolee, newren

Junio C Hamano wrote:
> Victoria Dye <vdye@github.com> writes:
> 
>> Taylor Blau wrote:
>>> On Thu, Sep 30, 2021 at 02:50:56PM +0000, Victoria Dye via GitGitGadget wrote:
>>>> From: Victoria Dye <vdye@github.com>
>>>>
>>>> In anticipation of multiple commands being fully integrated with sparse
>>>> index, update the test for index expand and collapse for non-sparse index
>>>> integrated commands to use `mv`.
>>>> ...
>>>>  	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
>>>> -		git -C sparse-index -c core.fsmonitor="" reset --hard &&
>>>> +		git -C sparse-index -c core.fsmonitor="" mv a b &&
>>>
>>> Double-checking my understanding as somebody who is not so familiar with
>>> t1092: this test is to ensure that commands which don't yet understand
>>> the sparse index can temporarily expand it in order to do their work?
>>
>> Exactly - if a command doesn't explicitly enable use of the sparse index by
>> setting `command_requires_full_index` to 0, the index is expanded if/when it
>> is first read during the command's execution and collapsed if/when it is
>> written to disk. This test makes sure that mechanism works as intended.
> 
> Sorry, I do not quite follow.  
> 
> So is this "before this series of patches, 'reset --hard' can be
> used as a sample of a command that expands and then collapses,
> but because it is no longer a good sample of such a command, we replace
> it with 'mv a b'"?

Yes, because this series enables sparse index integration in `git reset`,
the test no longer applies to that command (but it does apply to `git mv`).

> Do we need to update this further when "mv a b"
> learns to expand and then collapse?

Unfortunately, yes. `git mv` was picked more-or-less at random from the set
of commands that read the index and don't already have sparse index
integrations (excluding those I know are planned for sparse index
integration in the near future). If `git mv` were to be updated to disable
`command_requires_full_index`, the command in the test would need to change
again.

For what it's worth, I do think the test itself is valuable, since it makes
sure a command's capability to use the sparse index is always the result of
an intentional update to (and review of) the code.

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 2/7] sparse-index: update command for expand/collapse test
  2021-09-30 22:59         ` Victoria Dye
@ 2021-10-01  0:04           ` Junio C Hamano
  2021-10-04 13:47             ` Victoria Dye
  0 siblings, 1 reply; 85+ messages in thread
From: Junio C Hamano @ 2021-10-01  0:04 UTC (permalink / raw)
  To: Victoria Dye
  Cc: Taylor Blau, Victoria Dye via GitGitGadget, git, stolee, newren

Victoria Dye <vdye@github.com> writes:

>> Do we need to update this further when "mv a b"
>> learns to expand and then collapse?
>
> Unfortunately, yes. `git mv` was picked more-or-less at random from the set
> of commands that read the index and don't already have sparse index
> integrations (excluding those I know are planned for sparse index
> integration in the near future). If `git mv` were to be updated to disable
> `command_requires_full_index`, the command in the test would need to change
> again.
>
> For what it's worth, I do think the test itself is valuable, since it makes
> sure a command's capability to use the sparse index is always the result of
> an intentional update to (and review of) the code.

Oh, of course.  

I was actually wondering if it would be a good idea to leave a
command that will never be "converted" so that we can keep using it
for testing.

Perhaps a new option that is invented exactly for the purpose added
to a plumbing e.g. "git update-index --expand-collapse"?

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 2/7] sparse-index: update command for expand/collapse test
  2021-09-30 14:50 ` [PATCH 2/7] sparse-index: update command for expand/collapse test Victoria Dye via GitGitGadget
  2021-09-30 19:17   ` Taylor Blau
@ 2021-10-01  9:14   ` Bagas Sanjaya
  1 sibling, 0 replies; 85+ messages in thread
From: Bagas Sanjaya @ 2021-10-01  9:14 UTC (permalink / raw)
  To: Victoria Dye via GitGitGadget, git; +Cc: stolee, gitster, newren, Victoria Dye

On 30/09/21 21.50, Victoria Dye via GitGitGadget wrote:
> From: Victoria Dye <vdye@github.com>
> 
> In anticipation of multiple commands being fully integrated with sparse
> index, update the test for index expand and collapse for non-sparse index
> integrated commands to use `mv`.
> 

We can say "use git sparse-index mv instead of git sparse-index reset".

Why is mv used for this case?

-- 
An old man doll... just what I always wanted! - Clara

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 1/7] reset: behave correctly with sparse-checkout
  2021-09-30 18:34   ` Junio C Hamano
@ 2021-10-01 14:55     ` Victoria Dye
  0 siblings, 0 replies; 85+ messages in thread
From: Victoria Dye @ 2021-10-01 14:55 UTC (permalink / raw)
  To: Junio C Hamano, Kevin Willford via GitGitGadget
  Cc: git, stolee, newren, Kevin Willford

Junio C Hamano wrote:
> "Kevin Willford via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> @@ -127,12 +129,49 @@ static void update_index_from_diff(struct diff_queue_struct *q,
>>  		struct diff_options *opt, void *data)
>>  {
>>  	int i;
>> +	int pos;
>>  	int intent_to_add = *(int *)data;
>>  
>>  	for (i = 0; i < q->nr; i++) {
>>  		struct diff_filespec *one = q->queue[i]->one;
>> +		struct diff_filespec *two = q->queue[i]->two;
>>  		int is_missing = !(one->mode && !is_null_oid(&one->oid));
>> +		int was_missing = !two->mode && is_null_oid(&two->oid);
> 
> Not a problem introduced by this patch per-se, but is_missing is a
> counter-intuitive name for what the boolean wants to represent, I
> think, which was brought in by b4b313f9 (reset: support "--mixed
> --intent-to-add" mode, 2014-02-04).  Before the commit, we used to
> say
> 
>  	for (i = 0; i < q->nr; i++) {
>  		struct diff_filespec *one = q->queue[i]->one;
> 		if (one->mode && !is_null_sha1(one->sha1)) {
> 			... create ce out of one and add to the	index ...
> 		} else
>  			remove_file_from_cache(one->path);
> 		...
> 
> i.e. "if one is not missing, create a ce and add it, otherwise
> remove the path".
> 
> It should have been called "one_is_missing" if we wanted to
> literally express the condition the code checked, but an even better
> name would have been given after the intent of what the code wants
> to do with the information.  If the resetted-to tree (that is what
> 'one' side of the comparison in diff_cache() is) has a valid blob,
> we want it to be in the index, and otherwise, we do not want it in
> the index.
> 
> Now, the patch makes things worse and I had to do the above digging
> to see why the new code is even more confusing.  The 'two' side of
> the comparison is what is in the to-be-corrected-by-reset index.
> "was_missing" in contrast to "is_missing" makes it sound as if it
> was the state before whatever "is_missing" tries to represent, but
> that is not what is happening.  "is_missing" does not mean "the
> entry is currently not there in the index", but "was_missing" does
> mean exactly that: "the entry is currently not there in the index".
> 
> There isn't any "was" missing about it.  It "is" missing in the
> index.  Instead of renaming, I wonder if we can do without this new
> variable.  Let's read on the patch.

The new variable can most likely be refactored away, but based on this it's
probably worth renaming "is_missing" to "is_missing_in_reset_tree" (or
inverting the boolean and using "is_in_reset_tree").

> Also, now the code uses both sides of the filepair, we must double
> check that our do_diff_cache() is *not* doing any rename detection.
> It might be even prudent to ensure that 
> 
> 	if (strcmp(one->path, two->path))
> 		BUG("reset drove diff-cache with rename detection");
> 
> but it might be with too much paranoia.  I dunno.

I don't think a rename would break what this change intends to do (although
it does break some of the current assumptions in the patch). I'll make sure
to verify the rename case works before submitting a new version, just in 
case.

>>  		struct cache_entry *ce;
>> +		struct cache_entry *ce_before;
>> +		struct checkout state = CHECKOUT_INIT;
> 
> These two new variables do not need this wide a scope, I would
> think.  Shouldn't it be inside the body of the new "if" statement
> this patch adds?

I will likely need to make other changes to this patch and re-roll, so I'll
fix the scoping of all of the variables added here when I do.

>> +		/*
>> +		 * When using the sparse-checkout feature the cache entries
>> +		 * that are added here will not have the skip-worktree bit
>> +		 * set. Without this code there is data that is lost because
>> +		 * the files that would normally be in the working directory
>> +		 * are not there and show as deleted for the next status.
>> +		 * In the case of added files, they just disappear.
>> +		 *
>> +		 * We need to create the previous version of the files in
>> +		 * the working directory so that they will have the right
>> +		 * content and the next status call will show modified or
>> +		 * untracked files correctly.
>> +		 */
>> +		if (core_apply_sparse_checkout && !file_exists(two->path)) {
> 
> In a sparsely checked out working tree, there is nothing in the
> working tree at the path.  It may be because it is sparse and we
> didn't want to have anything there, or it may be because the user
> wanted to get rid of it and said "rm path" (not "git rm path") and
> this part of the tree were of interest even if the sparse checkout
> feature was used to hide other parts of the tree.  With the above
> two checks alone, we cannot tell which.  Let's read on.
> 
>> +			pos = cache_name_pos(two->path, strlen(two->path));
> 
> We check the index to see if there is an entry for it.  I suspect
> that because we need to do this check anyway, we shouldn't even have
> to look at 'two' (and add a new 'was_missing' variable), because
> 'one' and 'two' came from a comparison between the resetted-to tree
> object and the current index, and if cache_name_pos() for the path
> (we can use 'one->path') says there is an entry in the index, by
> definition, 'two' would not be showing a "removed" state (i.e. "the
> resetted-to tree had it, the index does not" is what "was_missing"
> wants to say).
> 
> So I wonder if it is better to
> 
>  - use one->path for !file_exists() above and cache_name_pos() here
>    instead of two->path.
> 
>  - drop the confusingly named 'was_missing', because (pos < 0) is
>    equivalent to it after this point, and we didn't need it up to
>    this point.
> 
>> +			if ((pos >= 0 && ce_skip_worktree(active_cache[pos])) &&
> 
> And we do find an entry for it.  So this path is not something
> sparse cone specifies not to check out (otherwise we would have a
> tree-like entry that covers this path in the index and not an entry
> for this specific path)?
> 
> Anyway, if it is marked with the skip-worktree bit, does that mean
> there is no risk that the reason why two->path does not exist in the
> working tree is because we earlier gave it in the working tree but
> it was later removed by the user?  Just making sure that we are not
> breaking the end-user's wish that the path should be removed by
> resurrecting it in the working tree with a new call to
> checkout_entry().
> 
>> +			    (is_missing || !was_missing)) {
> 
> And in such a case, if the resetted-to tree says we shouldn't have
> the path in the resulting index, or if the original state in the
> index had this path (but because (0 <= pos) must be true for us to 
> reach this point, I am not sure if "was_missing" can ever be true
> here), then do the following, which is ...
> 
>> +				state.force = 1;
>> +				state.refresh_cache = 1;
>> +				state.istate = &the_index;
>> +
>> +				ce_before = make_cache_entry(&the_index, two->mode,
>> +							     &two->oid, two->path,
>> +							     0, 0);
>> +				if (!ce_before)
>> +					die(_("make_cache_entry failed for path '%s'"),
>> +						two->path);
>> +
>> +				checkout_entry(ce_before, &state, NULL, NULL);
> 
> ... to resurrect the last "git add"ed state from the index and write
> it out to the working tree.  As I suspected, ce_before and state
> should be scoped inside this block and not visible outside, no?
> 
> I am not sure why this behaviour is desirable.  The "mixed" reset
> should not have to touch the working tree in the first place.
> 
> The large comment before this block says "... will not have the
> skip-worktree bit set", but we are dealing with a case where the
> original index had a cache entry there with skip-worktree bit set,
> so isn't the more desirable outcome that the cache entry added back
> to the index has the skip-worktree bit still set and there is no
> working tree file that the user did not desire to have?
> 
> And isn't it the matter of preserving the skip-worktree bit when the
> code in the post context of this hunk this patch did not touch adds
> the entry back to the index?
> 
>> +			}
>> +		}
>>  
>>  		if (is_missing && !intent_to_add) {
>>  			remove_file_from_cache(one->path);
> 
> If we look at the code after this point, we do use "is_missing"
> information to tweak ce->ce_flags with the intent-to-add bit.
> 
> Perhaps we can do a similar tweak to the cache entry to mark it with
> skip-worktree bit if the index had a cache entry at the path with
> the bit set?  The code that needs to do so would only have to
> remember if the one->path is in the current index and the cache
> entry for the path has the skip-worktree bit in the body of the new
> if() statement about (core_apply_sparse_checkout && !file_exists())
> added by this patch (I am not sure if !file_exists() even matters,
> though, as the approach I am suggesting is to preserve the skip bit
> and not disturb the working tree files at all).

I think it might be easier to address these points as a whole rather than
inline.

The problem this patch is attempting to solve is that, while (as you noted)
`git reset --mixed` should not touch the working tree, it is *also* expected
to preserve the files of the pre-reset state (both statements paraphrased
from the `--mixed` option doc). Normally these statements don't conflict,
but if `skip-worktree` is respected and nothing is done to the working tree
before resetting the index, `skip-worktree` files will effectively be `reset
--hard`. So, to force preservation of the pre-reset state, the files are
checked out.

Based on that high-level intent, the implementation here can be simplified
(and clarified). The condition on checking out a file (to avoid the `reset 
--hard`) would be "if the path exists in the current index and the entry 
in the index has `skip-worktree` enabled".

* "if the path exists in the current index" - if it does not exist in the
  index, there's nothing to preserve.
* "if the entry in the index has `skip-worktree` enabled" - if it does not,
  it's already in the working tree so we don't need to checkout.

Then, `checkout_entry()` can be run on the index entry found (rather
than a "fake" one created with `make_cache_entry`). This eliminates a lot of
unnecessary usage of `one` and `two`, which hopefully addresses some of your
concerns about them. After that, the index reset proceeds as normal (without
manual changes to the `skip-worktree` bit).
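
As a rough illustration of that condition, here is a toy model. It is not
the actual builtin/reset.c code: `toy_entry`, `toy_name_pos` and
`toy_needs_checkout` are made-up names, and the real `cache_name_pos()`
encodes an insertion point in its negative return value rather than
returning a plain -1.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for an index entry. */
struct toy_entry {
	const char *path;
	int skip_worktree; /* 1 if the skip-worktree bit is set */
};

/* Returns the position of `path` in `entries`, or -1 when it is absent. */
int toy_name_pos(const struct toy_entry *entries, size_t nr, const char *path)
{
	size_t i;

	assert(path != NULL);
	for (i = 0; i < nr; i++)
		if (!strcmp(entries[i].path, path))
			return (int)i;
	return -1;
}

/*
 * A path needs to be checked out before the index is reset only if
 * it exists in the current index AND its entry has skip-worktree set.
 */
int toy_needs_checkout(const struct toy_entry *entries, size_t nr,
		       const char *path)
{
	int pos = toy_name_pos(entries, nr, path);

	return pos >= 0 && entries[pos].skip_worktree;
}
```

A path that is absent from the index, or present without `skip-worktree`,
yields 0 here, matching the two bullet points above.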

As for the issue of ignoring `skip-worktree`: all of this could be
conditioned on a "--ignore-skip-worktree-bits" flag (or something like it)
if you'd prefer the default behavior is "don't touch the working tree".

* Re: [PATCH 6/7] reset: make --mixed sparse-aware
  2021-09-30 14:51 ` [PATCH 6/7] reset: make --mixed sparse-aware Victoria Dye via GitGitGadget
@ 2021-10-01 15:03   ` Victoria Dye
  0 siblings, 0 replies; 85+ messages in thread
From: Victoria Dye @ 2021-10-01 15:03 UTC (permalink / raw)
  To: Victoria Dye via GitGitGadget, git; +Cc: stolee, gitster, newren

Victoria Dye via GitGitGadget wrote:
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index 0b6ff0de17d..c9b9ef4992c 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -801,14 +801,25 @@ test_expect_success 'sparse-index is not expanded' '
>  	for ref in update-deep update-folder1 update-folder2 update-deep
>  	do
>  		echo >>sparse-index/README.md &&
> +		ensure_not_expanded reset --mixed $ref
>  		ensure_not_expanded reset --hard $ref || return 1
>  	done &&
This is a bug - it's missing `&&` at the end of the line (and adding it will
cause the test to fail). The index is expanded if a mixed reset modifies an
entry outside the sparse cone, so I'll update the test in V2 to verify reset
between two refs with only in-cone files changed between them. 

* Re: [PATCH 2/7] sparse-index: update command for expand/collapse test
  2021-10-01  0:04           ` Junio C Hamano
@ 2021-10-04 13:47             ` Victoria Dye
  0 siblings, 0 replies; 85+ messages in thread
From: Victoria Dye @ 2021-10-04 13:47 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Taylor Blau, Victoria Dye via GitGitGadget, git, stolee, newren

Junio C Hamano wrote:
> Victoria Dye <vdye@github.com> writes:
> 
>>> Do we need to update this further when "mv a b"
>>> learns to expand and then collapse?
>>
>> Unfortunately, yes. `git mv` was picked more-or-less at random from the set
>> of commands that read the index and don't already have sparse index
>> integrations (excluding those I know are planned for sparse index
>> integration in the near future). If `git mv` were to be updated to disable
>> `command_requires_full_index`, the command in the test would need to change
>> again.
>>
>> For what it's worth, I do think the test itself is valuable, since it makes
>> sure a command's capability to use the sparse index is always the result of
>> an intentional update to (and review of) the code.
> 
> Oh, of course.  
> 
> I was actually wondering if it would be a good idea to leave a
> command that will never be "converted" so that we can keep using it
> for testing.
> 
> Perhaps a new option that is invented exactly for the purpose added
> to a plumbing e.g. "git update-index --expand-collapse"?
> 

That sounds good to me! I'll add an `update-index --expand-collapse`
implementation and update the test in v2 of this series.

* [PATCH v2 0/7] Sparse Index: integrate with reset
  2021-09-30 14:50 [PATCH 0/7] Sparse Index: integrate with reset Victoria Dye via GitGitGadget
                   ` (6 preceding siblings ...)
  2021-09-30 14:51 ` [PATCH 7/7] unpack-trees: improve performance of next_cache_entry Victoria Dye via GitGitGadget
@ 2021-10-05 13:20 ` Victoria Dye via GitGitGadget
  2021-10-05 13:20   ` [PATCH v2 1/7] reset: behave correctly with sparse-checkout Kevin Willford via GitGitGadget
                     ` (9 more replies)
  7 siblings, 10 replies; 85+ messages in thread
From: Victoria Dye via GitGitGadget @ 2021-10-05 13:20 UTC (permalink / raw)
  To: git; +Cc: stolee, gitster, newren, Taylor Blau, Bagas Sanjaya, Victoria Dye

This series integrates the sparse index with git reset and provides
miscellaneous fixes and improvements to the command in sparse checkouts.
This includes:

 1. tests added to t1092 and p2000 to establish the baseline functionality
    of the command
 2. repository settings to enable the sparse index with ensure_full_index
    guarding any code paths that break tests without other compatibility
    updates.
 3. modifications to remove or reduce the scope in which ensure_full_index
    must be called.

The sparse index updates are predicated on a fix originating from the
microsoft/git fork [1], correcting how git reset --mixed handles resetting
entries outside the sparse checkout definition. Additionally, a performance
"bug" in next_cache_entry with sparse index is corrected, preventing
repeatedly looping over already-searched entries.

The p2000 tests demonstrate an overall ~70% execution time reduction across
all tested usages of git reset using a sparse index:

Test                                               before   after       
------------------------------------------------------------------------
2000.22: git reset (full-v3)                       0.48     0.51 +6.3% 
2000.23: git reset (full-v4)                       0.47     0.50 +6.4% 
2000.24: git reset (sparse-v3)                     0.93     0.30 -67.7%
2000.25: git reset (sparse-v4)                     0.94     0.29 -69.1%
2000.26: git reset --hard (full-v3)                0.69     0.68 -1.4% 
2000.27: git reset --hard (full-v4)                0.75     0.68 -9.3% 
2000.28: git reset --hard (sparse-v3)              1.29     0.34 -73.6%
2000.29: git reset --hard (sparse-v4)              1.31     0.34 -74.0%
2000.30: git reset -- does-not-exist (full-v3)     0.54     0.51 -5.6% 
2000.31: git reset -- does-not-exist (full-v4)     0.54     0.52 -3.7% 
2000.32: git reset -- does-not-exist (sparse-v3)   1.02     0.31 -69.6%
2000.33: git reset -- does-not-exist (sparse-v4)   1.07     0.30 -72.0%
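
For reference, the percentage column appears to be the relative change in
run time, (after - before) / before. A minimal sketch of that arithmetic,
assuming that formula (the helper name `percent_change` is hypothetical and
not part of the perf suite):

```c
#include <assert.h>

/*
 * Relative change from `before` to `after`, in percent.
 * Negative values mean the command got faster.
 */
double percent_change(double before, double after)
{
	assert(before > 0.0); /* a zero baseline has no relative change */
	return (after - before) / before * 100.0;
}
```

For the sparse-v3 row of plain `git reset`, percent_change(0.93, 0.30) is
roughly -67.7, which agrees with the table.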



Changes since V1
================

 * Add --force-full-index option to update-index. The option is used to
   circumvent changing command_requires_full_index from its default value -
   right now this is effectively a no-op, but will change once update-index
   is integrated with sparse index. By using this option in the t1092
   expand/collapse test, the command used to test will not need to be
   updated with subsequent sparse index integrations.
 * Update implementation of mixed reset for entries outside sparse checkout
   definition. The condition in which a file should be checked out before
   index reset is simplified to "if it has skip-worktree enabled and a reset
   would change the file, check it out".
   * After checking the behavior of update_index_from_diff with renames,
     found that the diff used by reset does not produce diff queue entries
     with different pathnames for one and two. Because of this, and that
     nothing in the implementation seems to rely on identical path names, no
     BUG check is added.
 * Correct a bug in the 'sparse-index is not expanded' test in t1092 where
   failure of a git reset --mixed test was not being reported. Test now
   verifies an appropriate scenario with corrected failure-checking.
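
To illustrate the first item above: the new option only decides whether
update-index overrides the repository default. A toy sketch, assuming the
default for `command_requires_full_index` is 1 and that a future sparse
index integration would set it to 0 (`toy_requires_full_index` and its
`integrated` parameter are hypothetical, for illustration only):

```c
#include <assert.h>

/*
 * Returns the effective value of command_requires_full_index.
 *   force_full_index: 1 if --force-full-index was given
 *   integrated:       1 if the command has (hypothetically) been
 *                     integrated with the sparse index
 */
int toy_requires_full_index(int force_full_index, int integrated)
{
	int requires_full = 1; /* assumed repository default */

	assert(force_full_index == 0 || force_full_index == 1);
	if (!force_full_index)
		requires_full = integrated ? 0 : 1;
	return requires_full;
}
```

Today `integrated` is effectively 0, so both branches yield 1 and the option
is a no-op. Once update-index is integrated, only the --force-full-index
case would keep returning 1.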

Thanks! -Victoria

[1] microsoft@6b8a074

Kevin Willford (1):
  reset: behave correctly with sparse-checkout

Victoria Dye (6):
  update-index: add --force-full-index option for expand/collapse test
  reset: expand test coverage for sparse checkouts
  reset: integrate with sparse index
  reset: make sparse-aware (except --mixed)
  reset: make --mixed sparse-aware
  unpack-trees: improve performance of next_cache_entry

 Documentation/git-update-index.txt       |   5 +
 builtin/reset.c                          |  62 +++++++++-
 builtin/update-index.c                   |  11 ++
 cache-tree.c                             |  43 ++++++-
 cache.h                                  |  10 ++
 read-cache.c                             |  22 ++--
 t/perf/p2000-sparse-operations.sh        |   3 +
 t/t1092-sparse-checkout-compatibility.sh | 139 ++++++++++++++++++++++-
 t/t7114-reset-sparse-checkout.sh         |  61 ++++++++++
 unpack-trees.c                           |  23 +++-
 10 files changed, 351 insertions(+), 28 deletions(-)
 create mode 100755 t/t7114-reset-sparse-checkout.sh


base-commit: cefe983a320c03d7843ac78e73bd513a27806845
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1048%2Fvdye%2Fvdye%2Fsparse-index-part1-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1048/vdye/vdye/sparse-index-part1-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/1048

Range-diff vs v1:

 1:  65905bf4e00 ! 1:  22c69bc6030 reset: behave correctly with sparse-checkout
     @@ builtin/reset.c
       #define REFRESH_INDEX_DELAY_WARNING_IN_MS (2 * 1000)
       
      @@ builtin/reset.c: static void update_index_from_diff(struct diff_queue_struct *q,
     - 		struct diff_options *opt, void *data)
     - {
     - 	int i;
     -+	int pos;
       	int intent_to_add = *(int *)data;
       
       	for (i = 0; i < q->nr; i++) {
     ++		int pos;
       		struct diff_filespec *one = q->queue[i]->one;
     +-		int is_missing = !(one->mode && !is_null_oid(&one->oid));
      +		struct diff_filespec *two = q->queue[i]->two;
     - 		int is_missing = !(one->mode && !is_null_oid(&one->oid));
     -+		int was_missing = !two->mode && is_null_oid(&two->oid);
     ++		int is_in_reset_tree = one->mode && !is_null_oid(&one->oid);
       		struct cache_entry *ce;
     -+		struct cache_entry *ce_before;
     -+		struct checkout state = CHECKOUT_INIT;
     -+
     + 
     +-		if (is_missing && !intent_to_add) {
      +		/*
     -+		 * When using the sparse-checkout feature the cache entries
     -+		 * that are added here will not have the skip-worktree bit
     -+		 * set. Without this code there is data that is lost because
     -+		 * the files that would normally be in the working directory
     -+		 * are not there and show as deleted for the next status.
     -+		 * In the case of added files, they just disappear.
     -+		 *
     -+		 * We need to create the previous version of the files in
     -+		 * the working directory so that they will have the right
     -+		 * content and the next status call will show modified or
     -+		 * untracked files correctly.
     ++		 * If the file being reset has `skip-worktree` enabled, we need
     ++		 * to check it out to prevent the file from being hard reset.
      +		 */
     -+		if (core_apply_sparse_checkout && !file_exists(two->path)) {
     -+			pos = cache_name_pos(two->path, strlen(two->path));
     -+			if ((pos >= 0 && ce_skip_worktree(active_cache[pos])) &&
     -+			    (is_missing || !was_missing)) {
     -+				state.force = 1;
     -+				state.refresh_cache = 1;
     -+				state.istate = &the_index;
     -+
     -+				ce_before = make_cache_entry(&the_index, two->mode,
     -+							     &two->oid, two->path,
     -+							     0, 0);
     -+				if (!ce_before)
     -+					die(_("make_cache_entry failed for path '%s'"),
     -+						two->path);
     ++		pos = cache_name_pos(two->path, strlen(two->path));
     ++		if (pos >= 0 && ce_skip_worktree(active_cache[pos])) {
     ++			struct checkout state = CHECKOUT_INIT;
     ++			state.force = 1;
     ++			state.refresh_cache = 1;
     ++			state.istate = &the_index;
      +
     -+				checkout_entry(ce_before, &state, NULL, NULL);
     -+			}
     ++			checkout_entry(active_cache[pos], &state, NULL, NULL);
      +		}
     - 
     - 		if (is_missing && !intent_to_add) {
     ++
     ++		if (!is_in_reset_tree && !intent_to_add) {
       			remove_file_from_cache(one->path);
     + 			continue;
     + 		}
     +@@ builtin/reset.c: static void update_index_from_diff(struct diff_queue_struct *q,
     + 		if (!ce)
     + 			die(_("make_cache_entry failed for path '%s'"),
     + 			    one->path);
     +-		if (is_missing) {
     ++		if (!is_in_reset_tree) {
     + 			ce->ce_flags |= CE_INTENT_TO_ADD;
     + 			set_object_name_for_intent_to_add_entry(ce);
     + 		}
      
       ## t/t1092-sparse-checkout-compatibility.sh ##
      @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_failure 'blame with pathspec outside sparse definition' '
 2:  a1fa7c080ae ! 2:  f7cb9013d46 sparse-index: update command for expand/collapse test
     @@ Metadata
      Author: Victoria Dye <vdye@github.com>
      
       ## Commit message ##
     -    sparse-index: update command for expand/collapse test
     +    update-index: add --force-full-index option for expand/collapse test
      
     -    In anticipation of multiple commands being fully integrated with sparse
     -    index, update the test for index expand and collapse for non-sparse index
     -    integrated commands to use `mv`.
     +    Add a new `--force-full-index` option to `git update-index`, which skips
     +    explicitly setting `command_requires_full_index`. This lets `git
     +    update-index --force-full-index` run as a command without sparse index
     +    compatibility implemented, even after it receives sparse index compatibility
     +    updates.
     +
     +    By using `git update-index --force-full-index` in the `t1092` test
     +    `sparse-index is expanded and converted back`, commands can continue to
     +    integrate with the sparse index without the need to keep modifying the
     +    command used in the test.
      
          Signed-off-by: Victoria Dye <vdye@github.com>
      
     + ## Documentation/git-update-index.txt ##
     +@@ Documentation/git-update-index.txt: SYNOPSIS
     + 	     [--[no-]fsmonitor]
     + 	     [--really-refresh] [--unresolve] [--again | -g]
     + 	     [--info-only] [--index-info]
     ++	     [--force-full-index]
     + 	     [-z] [--stdin] [--index-version <n>]
     + 	     [--verbose]
     + 	     [--] [<file>...]
     +@@ Documentation/git-update-index.txt: time. Version 4 is relatively young (first released in 1.8.0 in
     + October 2012). Other Git implementations such as JGit and libgit2
     + may not support it yet.
     + 
     ++--force-full-index::
     ++	Force the command to operate on a full index, expanding a sparse
     ++	index if necessary.
     ++
     + -z::
     + 	Only meaningful with `--stdin` or `--index-info`; paths are
     + 	separated with NUL character instead of LF.
     +
     + ## builtin/update-index.c ##
     +@@ builtin/update-index.c: int cmd_update_index(int argc, const char **argv, const char *prefix)
     + 	int split_index = -1;
     + 	int force_write = 0;
     + 	int fsmonitor = -1;
     ++	int use_default_full_index = 0;
     + 	struct lock_file lock_file = LOCK_INIT;
     + 	struct parse_opt_ctx_t ctx;
     + 	strbuf_getline_fn getline_fn;
     +@@ builtin/update-index.c: int cmd_update_index(int argc, const char **argv, const char *prefix)
     + 		{OPTION_SET_INT, 0, "no-fsmonitor-valid", &mark_fsmonitor_only, NULL,
     + 			N_("clear fsmonitor valid bit"),
     + 			PARSE_OPT_NOARG | PARSE_OPT_NONEG, NULL, UNMARK_FLAG},
     ++		OPT_SET_INT(0, "force-full-index", &use_default_full_index,
     ++			N_("run with full index explicitly required"), 1),
     + 		OPT_END()
     + 	};
     + 
     +@@ builtin/update-index.c: int cmd_update_index(int argc, const char **argv, const char *prefix)
     + 	if (newfd < 0)
     + 		lock_error = errno;
     + 
     ++	/*
     ++	 * If --force-full-index is set, the command should skip manually
     ++	 * setting `command_requires_full_index`.
     ++	 */
     ++	prepare_repo_settings(r);
     ++	if (!use_default_full_index)
     ++		r->settings.command_requires_full_index = 1;
     ++
     + 	entries = read_cache();
     + 	if (entries < 0)
     + 		die("cache corrupted");
     +
       ## t/t1092-sparse-checkout-compatibility.sh ##
      @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'sparse-index is expanded and converted back' '
       	init_repos &&
       
       	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
      -		git -C sparse-index -c core.fsmonitor="" reset --hard &&
     -+		git -C sparse-index -c core.fsmonitor="" mv a b &&
     ++		git -C sparse-index -c core.fsmonitor="" update-index --force-full-index &&
       	test_region index convert_to_sparse trace2.txt &&
       	test_region index ensure_full_index trace2.txt
       '
 3:  d033c5e365f = 3:  c7e9d9f4e03 reset: expand test coverage for sparse checkouts
 4:  2d63a250637 = 4:  49813c8d9ed reset: integrate with sparse index
 5:  e919e6d3270 = 5:  78cd85d8dcc reset: make sparse-aware (except --mixed)
 6:  e7cda32efb6 ! 6:  5eaae0825af reset: make --mixed sparse-aware
     @@ builtin/reset.c: static int read_from_tree(const struct pathspec *pathspec,
      
       ## t/t1092-sparse-checkout-compatibility.sh ##
      @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'sparse-index is not expanded' '
     - 	for ref in update-deep update-folder1 update-folder2 update-deep
     - 	do
     - 		echo >>sparse-index/README.md &&
     -+		ensure_not_expanded reset --mixed $ref
       		ensure_not_expanded reset --hard $ref || return 1
       	done &&
       
     ++	ensure_not_expanded reset --mixed base &&
       	ensure_not_expanded reset --hard update-deep &&
       	ensure_not_expanded reset --keep base &&
       	ensure_not_expanded reset --merge update-deep &&
 7:  8637ec1660e = 7:  aa963eefae7 unpack-trees: improve performance of next_cache_entry

-- 
gitgitgadget

* [PATCH v2 1/7] reset: behave correctly with sparse-checkout
  2021-10-05 13:20 ` [PATCH v2 0/7] Sparse Index: integrate with reset Victoria Dye via GitGitGadget
@ 2021-10-05 13:20   ` Kevin Willford via GitGitGadget
  2021-10-05 19:30     ` Junio C Hamano
                       ` (2 more replies)
  2021-10-05 13:20   ` [PATCH v2 2/7] update-index: add --force-full-index option for expand/collapse test Victoria Dye via GitGitGadget
                     ` (8 subsequent siblings)
  9 siblings, 3 replies; 85+ messages in thread
From: Kevin Willford via GitGitGadget @ 2021-10-05 13:20 UTC (permalink / raw)
  To: git
  Cc: stolee, gitster, newren, Taylor Blau, Bagas Sanjaya,
	Victoria Dye, Kevin Willford

From: Kevin Willford <kewillf@microsoft.com>

When using the sparse checkout feature, 'git reset' will add entries to
the index that will have the skip-worktree bit off but will leave the
working directory empty. File data is lost because the index version of
the files has been changed but there is nothing that is in the working
directory. This will cause the next 'git status' call to show modified or
deleted files as deleted and to show nothing for added files. The
added files should be shown as untracked and modified files should be
shown as modified.

To fix this, when the reset is running and there is no file in the
working directory, and the file will be missing with the new index entry
or was not missing in the previous version, we create the previous index
version of the file in the working directory so that status will report
correctly and the files will be available for the user to deal with.

This fixes a documented failure from t1092 that was created in 19a0acc
(t1092: test interesting sparse-checkout scenarios, 2021-01-23).

Signed-off-by: Kevin Willford <kewillf@microsoft.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Victoria Dye <vdye@github.com>
---
 builtin/reset.c                          | 24 ++++++++--
 t/t1092-sparse-checkout-compatibility.sh |  4 +-
 t/t7114-reset-sparse-checkout.sh         | 61 ++++++++++++++++++++++++
 3 files changed, 83 insertions(+), 6 deletions(-)
 create mode 100755 t/t7114-reset-sparse-checkout.sh

diff --git a/builtin/reset.c b/builtin/reset.c
index 51c9e2f43ff..3b75d3b2f20 100644
--- a/builtin/reset.c
+++ b/builtin/reset.c
@@ -25,6 +25,8 @@
 #include "cache-tree.h"
 #include "submodule.h"
 #include "submodule-config.h"
+#include "dir.h"
+#include "entry.h"
 
 #define REFRESH_INDEX_DELAY_WARNING_IN_MS (2 * 1000)
 
@@ -130,11 +132,27 @@ static void update_index_from_diff(struct diff_queue_struct *q,
 	int intent_to_add = *(int *)data;
 
 	for (i = 0; i < q->nr; i++) {
+		int pos;
 		struct diff_filespec *one = q->queue[i]->one;
-		int is_missing = !(one->mode && !is_null_oid(&one->oid));
+		struct diff_filespec *two = q->queue[i]->two;
+		int is_in_reset_tree = one->mode && !is_null_oid(&one->oid);
 		struct cache_entry *ce;
 
-		if (is_missing && !intent_to_add) {
+		/*
+		 * If the file being reset has `skip-worktree` enabled, we need
+		 * to check it out to prevent the file from being hard reset.
+		 */
+		pos = cache_name_pos(two->path, strlen(two->path));
+		if (pos >= 0 && ce_skip_worktree(active_cache[pos])) {
+			struct checkout state = CHECKOUT_INIT;
+			state.force = 1;
+			state.refresh_cache = 1;
+			state.istate = &the_index;
+
+			checkout_entry(active_cache[pos], &state, NULL, NULL);
+		}
+
+		if (!is_in_reset_tree && !intent_to_add) {
 			remove_file_from_cache(one->path);
 			continue;
 		}
@@ -144,7 +162,7 @@ static void update_index_from_diff(struct diff_queue_struct *q,
 		if (!ce)
 			die(_("make_cache_entry failed for path '%s'"),
 			    one->path);
-		if (is_missing) {
+		if (!is_in_reset_tree) {
 			ce->ce_flags |= CE_INTENT_TO_ADD;
 			set_object_name_for_intent_to_add_entry(ce);
 		}
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 886e78715fe..c5977152661 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -459,9 +459,7 @@ test_expect_failure 'blame with pathspec outside sparse definition' '
 	test_all_match git blame deep/deeper2/deepest/a
 '
 
-# NEEDSWORK: a sparse-checkout behaves differently from a full checkout
-# in this scenario, but it shouldn't.
-test_expect_failure 'checkout and reset (mixed)' '
+test_expect_success 'checkout and reset (mixed)' '
 	init_repos &&
 
 	test_all_match git checkout -b reset-test update-deep &&
diff --git a/t/t7114-reset-sparse-checkout.sh b/t/t7114-reset-sparse-checkout.sh
new file mode 100755
index 00000000000..a8029707fb1
--- /dev/null
+++ b/t/t7114-reset-sparse-checkout.sh
@@ -0,0 +1,61 @@
+#!/bin/sh
+
+test_description='reset when using a sparse-checkout'
+
+. ./test-lib.sh
+
+test_expect_success 'setup' '
+	test_tick &&
+	echo "checkout file" >c &&
+	echo "modify file" >m &&
+	echo "delete file" >d &&
+	git add . &&
+	git commit -m "initial commit" &&
+	echo "added file" >a &&
+	echo "modification of a file" >m &&
+	git rm d &&
+	git add . &&
+	git commit -m "second commit" &&
+	git checkout -b endCommit
+'
+
+test_expect_success 'reset when there is a sparse-checkout' '
+	echo "/c" >.git/info/sparse-checkout &&
+	test_config core.sparsecheckout true &&
+	git checkout -B resetBranch &&
+	test_path_is_missing m &&
+	test_path_is_missing a &&
+	test_path_is_missing d &&
+	git reset HEAD~1 &&
+	echo "checkout file" >expect &&
+	test_cmp expect c &&
+	echo "added file" >expect &&
+	test_cmp expect a &&
+	echo "modification of a file" >expect &&
+	test_cmp expect m &&
+	test_path_is_missing d
+'
+
+test_expect_success 'reset after deleting file without skip-worktree bit' '
+	git checkout -f endCommit &&
+	git clean -xdf &&
+	cat >.git/info/sparse-checkout <<-\EOF &&
+	/c
+	/m
+	EOF
+	test_config core.sparsecheckout true &&
+	git checkout -B resetAfterDelete &&
+	test_path_is_file m &&
+	test_path_is_missing a &&
+	test_path_is_missing d &&
+	rm -f m &&
+	git reset HEAD~1 &&
+	echo "checkout file" >expect &&
+	test_cmp expect c &&
+	echo "added file" >expect &&
+	test_cmp expect a &&
+	test_path_is_missing m &&
+	test_path_is_missing d
+'
+
+test_done
-- 
gitgitgadget


* [PATCH v2 2/7] update-index: add --force-full-index option for expand/collapse test
  2021-10-05 13:20 ` [PATCH v2 0/7] Sparse Index: integrate with reset Victoria Dye via GitGitGadget
  2021-10-05 13:20   ` [PATCH v2 1/7] reset: behave correctly with sparse-checkout Kevin Willford via GitGitGadget
@ 2021-10-05 13:20   ` Victoria Dye via GitGitGadget
  2021-10-06  2:00     ` Elijah Newren
  2021-10-06 10:33     ` Bagas Sanjaya
  2021-10-05 13:20   ` [PATCH v2 3/7] reset: expand test coverage for sparse checkouts Victoria Dye via GitGitGadget
                     ` (7 subsequent siblings)
  9 siblings, 2 replies; 85+ messages in thread
From: Victoria Dye via GitGitGadget @ 2021-10-05 13:20 UTC (permalink / raw)
  To: git
  Cc: stolee, gitster, newren, Taylor Blau, Bagas Sanjaya,
	Victoria Dye, Victoria Dye

From: Victoria Dye <vdye@github.com>

Add a new `--force-full-index` option to `git update-index`, which skips
explicitly setting `command_requires_full_index`. This lets `git
update-index --force-full-index` run as a command without sparse index
compatibility implemented, even after it receives sparse index compatibility
updates.

By using `git update-index --force-full-index` in the `t1092` test
`sparse-index is expanded and converted back`, commands can continue to
integrate with the sparse index without the need to keep modifying the
command used in the test.

Signed-off-by: Victoria Dye <vdye@github.com>
---
 Documentation/git-update-index.txt       |  5 +++++
 builtin/update-index.c                   | 11 +++++++++++
 t/t1092-sparse-checkout-compatibility.sh |  2 +-
 3 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-update-index.txt b/Documentation/git-update-index.txt
index 2853f168d97..06255e321a3 100644
--- a/Documentation/git-update-index.txt
+++ b/Documentation/git-update-index.txt
@@ -24,6 +24,7 @@ SYNOPSIS
 	     [--[no-]fsmonitor]
 	     [--really-refresh] [--unresolve] [--again | -g]
 	     [--info-only] [--index-info]
+	     [--force-full-index]
 	     [-z] [--stdin] [--index-version <n>]
 	     [--verbose]
 	     [--] [<file>...]
@@ -170,6 +171,10 @@ time. Version 4 is relatively young (first released in 1.8.0 in
 October 2012). Other Git implementations such as JGit and libgit2
 may not support it yet.
 
+--force-full-index::
+	Force the command to operate on a full index, expanding a sparse
+	index if necessary.
+
 -z::
 	Only meaningful with `--stdin` or `--index-info`; paths are
 	separated with NUL character instead of LF.
diff --git a/builtin/update-index.c b/builtin/update-index.c
index 187203e8bb5..32ada3ead77 100644
--- a/builtin/update-index.c
+++ b/builtin/update-index.c
@@ -964,6 +964,7 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
 	int split_index = -1;
 	int force_write = 0;
 	int fsmonitor = -1;
+	int use_default_full_index = 0;
 	struct lock_file lock_file = LOCK_INIT;
 	struct parse_opt_ctx_t ctx;
 	strbuf_getline_fn getline_fn;
@@ -1069,6 +1070,8 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
 		{OPTION_SET_INT, 0, "no-fsmonitor-valid", &mark_fsmonitor_only, NULL,
 			N_("clear fsmonitor valid bit"),
 			PARSE_OPT_NOARG | PARSE_OPT_NONEG, NULL, UNMARK_FLAG},
+		OPT_SET_INT(0, "force-full-index", &use_default_full_index,
+			N_("run with full index explicitly required"), 1),
 		OPT_END()
 	};
 
@@ -1082,6 +1085,14 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
 	if (newfd < 0)
 		lock_error = errno;
 
+	/*
+	 * If --force-full-index is set, the command should skip manually
+	 * setting `command_requires_full_index`.
+	 */
+	prepare_repo_settings(r);
+	if (!use_default_full_index)
+		r->settings.command_requires_full_index = 1;
+
 	entries = read_cache();
 	if (entries < 0)
 		die("cache corrupted");
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index c5977152661..b3c0d3b98ee 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -642,7 +642,7 @@ test_expect_success 'sparse-index is expanded and converted back' '
 	init_repos &&
 
 	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
-		git -C sparse-index -c core.fsmonitor="" reset --hard &&
+		git -C sparse-index -c core.fsmonitor="" update-index --force-full-index &&
 	test_region index convert_to_sparse trace2.txt &&
 	test_region index ensure_full_index trace2.txt
 '
-- 
gitgitgadget



* [PATCH v2 3/7] reset: expand test coverage for sparse checkouts
  2021-10-05 13:20 ` [PATCH v2 0/7] Sparse Index: integrate with reset Victoria Dye via GitGitGadget
  2021-10-05 13:20   ` [PATCH v2 1/7] reset: behave correctly with sparse-checkout Kevin Willford via GitGitGadget
  2021-10-05 13:20   ` [PATCH v2 2/7] update-index: add --force-full-index option for expand/collapse test Victoria Dye via GitGitGadget
@ 2021-10-05 13:20   ` Victoria Dye via GitGitGadget
  2021-10-06  2:04     ` Elijah Newren
  2021-10-05 13:20   ` [PATCH v2 4/7] reset: integrate with sparse index Victoria Dye via GitGitGadget
                     ` (6 subsequent siblings)
  9 siblings, 1 reply; 85+ messages in thread
From: Victoria Dye via GitGitGadget @ 2021-10-05 13:20 UTC (permalink / raw)
  To: git
  Cc: stolee, gitster, newren, Taylor Blau, Bagas Sanjaya,
	Victoria Dye, Victoria Dye

From: Victoria Dye <vdye@github.com>

Add new tests for `--merge` and `--keep` modes, as well as mixed reset with
pathspecs both inside and outside of the sparse checkout definition. New
performance test cases exercise various execution paths for `reset`.

Co-authored-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Victoria Dye <vdye@github.com>
---
 t/perf/p2000-sparse-operations.sh        |   3 +
 t/t1092-sparse-checkout-compatibility.sh | 107 +++++++++++++++++++++++
 2 files changed, 110 insertions(+)

diff --git a/t/perf/p2000-sparse-operations.sh b/t/perf/p2000-sparse-operations.sh
index 597626276fb..bfd332120c8 100755
--- a/t/perf/p2000-sparse-operations.sh
+++ b/t/perf/p2000-sparse-operations.sh
@@ -110,5 +110,8 @@ test_perf_on_all git add -A
 test_perf_on_all git add .
 test_perf_on_all git commit -a -m A
 test_perf_on_all git checkout -f -
+test_perf_on_all git reset
+test_perf_on_all git reset --hard
+test_perf_on_all git reset -- does-not-exist
 
 test_done
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index b3c0d3b98ee..f0723a6ac97 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -479,6 +479,113 @@ test_expect_success 'checkout and reset (mixed) [sparse]' '
 	test_sparse_match git reset update-folder2
 '
 
+# NEEDSWORK: with mixed reset, files with differences between HEAD and <commit>
+# will be added to the work tree even if outside the sparse checkout
+# definition, and even if the file is modified to a state of having no local
+# changes. The file is "re-ignored" if a hard reset is executed. We may want to
+# change this behavior in the future and enforce that files are not written
+# outside of the sparse checkout definition.
+test_expect_success 'checkout and mixed reset file tracking [sparse]' '
+	init_repos &&
+
+	test_all_match git checkout -b reset-test update-deep &&
+	test_all_match git reset update-folder1 &&
+	test_all_match git reset update-deep &&
+
+	# At this point, there are no changes in the working tree. However,
+	# folder1/a now exists locally (even though it is outside of the sparse
+	# paths).
+	run_on_sparse test_path_exists folder1 &&
+
+	run_on_all rm folder1/a &&
+	test_all_match git status --porcelain=v2 &&
+
+	test_all_match git reset --hard update-deep &&
+	run_on_sparse test_path_is_missing folder1 &&
+	test_path_exists full-checkout/folder1
+'
+
+test_expect_success 'checkout and reset (merge)' '
+	init_repos &&
+
+	write_script edit-contents <<-\EOF &&
+	echo text >>$1
+	EOF
+
+	test_all_match git checkout -b reset-test update-deep &&
+	run_on_all ../edit-contents a &&
+	test_all_match git reset --merge deepest &&
+	test_all_match git status --porcelain=v2 &&
+
+	test_all_match git reset --hard update-deep &&
+	run_on_all ../edit-contents deep/a &&
+	test_all_match test_must_fail git reset --merge deepest
+'
+
+test_expect_success 'checkout and reset (keep)' '
+	init_repos &&
+
+	write_script edit-contents <<-\EOF &&
+	echo text >>$1
+	EOF
+
+	test_all_match git checkout -b reset-test update-deep &&
+	run_on_all ../edit-contents a &&
+	test_all_match git reset --keep deepest &&
+	test_all_match git status --porcelain=v2 &&
+
+	test_all_match git reset --hard update-deep &&
+	run_on_all ../edit-contents deep/a &&
+	test_all_match test_must_fail git reset --keep deepest
+'
+
+test_expect_success 'reset with pathspecs inside sparse definition' '
+	init_repos &&
+
+	write_script edit-contents <<-\EOF &&
+	echo text >>$1
+	EOF
+
+	test_all_match git checkout -b reset-test update-deep &&
+	run_on_all ../edit-contents deep/a &&
+
+	test_all_match git reset base -- deep/a &&
+	test_all_match git status --porcelain=v2 &&
+
+	test_all_match git reset base -- nonexistent-file &&
+	test_all_match git status --porcelain=v2 &&
+
+	test_all_match git reset deepest -- deep &&
+	test_all_match git status --porcelain=v2
+'
+
+test_expect_success 'reset with sparse directory pathspec outside definition' '
+	init_repos &&
+
+	test_all_match git checkout -b reset-test update-deep &&
+	test_all_match git reset --hard update-folder1 &&
+	test_all_match git reset base -- folder1 &&
+	test_all_match git status --porcelain=v2
+'
+
+test_expect_success 'reset with pathspec match in sparse directory' '
+	init_repos &&
+
+	test_all_match git checkout -b reset-test update-deep &&
+	test_all_match git reset --hard update-folder1 &&
+	test_all_match git reset base -- folder1/a &&
+	test_all_match git status --porcelain=v2
+'
+
+test_expect_success 'reset with wildcard pathspec' '
+	init_repos &&
+
+	test_all_match git checkout -b reset-test update-deep &&
+	test_all_match git reset --hard update-folder1 &&
+	test_all_match git reset base -- \*/a &&
+	test_all_match git status --porcelain=v2
+'
+
 test_expect_success 'merge, cherry-pick, and rebase' '
 	init_repos &&
 
-- 
gitgitgadget



* [PATCH v2 4/7] reset: integrate with sparse index
  2021-10-05 13:20 ` [PATCH v2 0/7] Sparse Index: integrate with reset Victoria Dye via GitGitGadget
                     ` (2 preceding siblings ...)
  2021-10-05 13:20   ` [PATCH v2 3/7] reset: expand test coverage for sparse checkouts Victoria Dye via GitGitGadget
@ 2021-10-05 13:20   ` Victoria Dye via GitGitGadget
  2021-10-06  2:15     ` Elijah Newren
  2021-10-05 13:20   ` [PATCH v2 5/7] reset: make sparse-aware (except --mixed) Victoria Dye via GitGitGadget
                     ` (5 subsequent siblings)
  9 siblings, 1 reply; 85+ messages in thread
From: Victoria Dye via GitGitGadget @ 2021-10-05 13:20 UTC (permalink / raw)
  To: git
  Cc: stolee, gitster, newren, Taylor Blau, Bagas Sanjaya,
	Victoria Dye, Victoria Dye

From: Victoria Dye <vdye@github.com>

`reset --soft` does not modify the index, so no compatibility changes are
needed for it to function without expanding the index. For all other reset
modes (`--mixed`, `--hard`, `--keep`, `--merge`), the full index is
explicitly expanded with `ensure_full_index` to maintain current behavior.

Additionally, the `read_cache()` check verifying an uncorrupted index is
moved after argument parsing and preparing the repo settings. The index is
not used by the preceding argument handling, but `read_cache()` does need to
be run after enabling sparse index for the command and before resetting.

Signed-off-by: Victoria Dye <vdye@github.com>
---
 builtin/reset.c | 10 +++++++---
 cache-tree.c    |  1 +
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/builtin/reset.c b/builtin/reset.c
index 3b75d3b2f20..e1f2a2bb2c4 100644
--- a/builtin/reset.c
+++ b/builtin/reset.c
@@ -184,6 +184,7 @@ static int read_from_tree(const struct pathspec *pathspec,
 	opt.flags.override_submodule_config = 1;
 	opt.repo = the_repository;
 
+	ensure_full_index(&the_index);
 	if (do_diff_cache(tree_oid, &opt))
 		return 1;
 	diffcore_std(&opt);
@@ -261,9 +262,6 @@ static void parse_args(struct pathspec *pathspec,
 	}
 	*rev_ret = rev;
 
-	if (read_cache() < 0)
-		die(_("index file corrupt"));
-
 	parse_pathspec(pathspec, 0,
 		       PATHSPEC_PREFER_FULL |
 		       (patch_mode ? PATHSPEC_PREFIX_ORIGIN : 0),
@@ -409,6 +407,12 @@ int cmd_reset(int argc, const char **argv, const char *prefix)
 	if (intent_to_add && reset_type != MIXED)
 		die(_("-N can only be used with --mixed"));
 
+	prepare_repo_settings(the_repository);
+	the_repository->settings.command_requires_full_index = 0;
+
+	if (read_cache() < 0)
+		die(_("index file corrupt"));
+
 	/* Soft reset does not touch the index file nor the working tree
 	 * at all, but requires them in a good order.  Other resets reset
 	 * the index file to the tree object we are switching to. */
diff --git a/cache-tree.c b/cache-tree.c
index 90919f9e345..9be19c85b66 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -776,6 +776,7 @@ void prime_cache_tree(struct repository *r,
 	cache_tree_free(&istate->cache_tree);
 	istate->cache_tree = cache_tree();
 
+	ensure_full_index(istate);
 	prime_cache_tree_rec(r, istate->cache_tree, tree);
 	istate->cache_changed |= CACHE_TREE_CHANGED;
 	trace2_region_leave("cache-tree", "prime_cache_tree", the_repository);
-- 
gitgitgadget



* [PATCH v2 5/7] reset: make sparse-aware (except --mixed)
  2021-10-05 13:20 ` [PATCH v2 0/7] Sparse Index: integrate with reset Victoria Dye via GitGitGadget
                     ` (3 preceding siblings ...)
  2021-10-05 13:20   ` [PATCH v2 4/7] reset: integrate with sparse index Victoria Dye via GitGitGadget
@ 2021-10-05 13:20   ` Victoria Dye via GitGitGadget
  2021-10-06  3:43     ` Elijah Newren
  2021-10-06 10:34     ` Bagas Sanjaya
  2021-10-05 13:20   ` [PATCH v2 6/7] reset: make --mixed sparse-aware Victoria Dye via GitGitGadget
                     ` (4 subsequent siblings)
  9 siblings, 2 replies; 85+ messages in thread
From: Victoria Dye via GitGitGadget @ 2021-10-05 13:20 UTC (permalink / raw)
  To: git
  Cc: stolee, gitster, newren, Taylor Blau, Bagas Sanjaya,
	Victoria Dye, Victoria Dye

From: Victoria Dye <vdye@github.com>

In order to accurately reconstruct the cache tree in `prime_cache_tree_rec`,
the function must determine whether the currently-processing directory in
the tree is sparse or not. If it is not sparse, the tree is parsed and its
subtrees are recursively constructed. If it is sparse, no subtrees are added to
the tree and the entry count is set to 1 (representing the sparse directory
itself).

Signed-off-by: Victoria Dye <vdye@github.com>
---
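For illustration only (not part of this patch): a minimal standalone sketch
of the entry counting described above. The `toy_*` names and the flat list
of sparse directory paths are assumptions standing in for Git's cache tree
and index; the real code works on `struct cache_tree` and looks names up
with `index_entry_exists`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy tree node; a stand-in for a parsed Git tree, not the real struct. */
struct toy_tree {
	const char *name;
	int is_dir;
	const struct toy_tree *children;
	int nr_children;
};

/*
 * Exact-match lookup of a directory path (with trailing '/') among the
 * sparse directory entries. Models a lookup that never expands the index.
 */
static int toy_index_entry_exists(const char *const *sparse_dirs,
				  int nr_sparse, const char *path)
{
	int i;

	for (i = 0; i < nr_sparse; i++)
		if (!strcmp(sparse_dirs[i], path))
			return 1;
	return 0;
}

/*
 * Count index entries covered by `tree`: each file counts once, a sparse
 * directory counts once (for the directory entry itself), and any other
 * directory contributes the count of its subtree.
 */
static int toy_entry_count(const struct toy_tree *tree, const char *tree_path,
			   const char *const *sparse_dirs, int nr_sparse)
{
	int i, cnt = 0;

	for (i = 0; i < tree->nr_children; i++) {
		const struct toy_tree *child = &tree->children[i];
		char subtree_path[256];
		int len;

		if (!child->is_dir) {
			cnt++;
			continue;
		}

		len = snprintf(subtree_path, sizeof(subtree_path), "%s%s/",
			       tree_path, child->name);
		assert(len > 0 && (size_t)len < sizeof(subtree_path));

		if (toy_index_entry_exists(sparse_dirs, nr_sparse, subtree_path))
			cnt += 1;	/* sparse: do not descend */
		else
			cnt += toy_entry_count(child, subtree_path,
					       sparse_dirs, nr_sparse);
	}
	return cnt;
}
```

For a tree holding README.md, deep/{a,b} and folder1/{0,a}, this yields 5
entries with a full index and 4 when folder1/ is a sparse directory.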
 cache-tree.c                             | 44 +++++++++++++++++++++---
 cache.h                                  | 10 ++++++
 read-cache.c                             | 22 ++++++++----
 t/t1092-sparse-checkout-compatibility.sh | 15 ++++++--
 4 files changed, 78 insertions(+), 13 deletions(-)

diff --git a/cache-tree.c b/cache-tree.c
index 9be19c85b66..9021669d682 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -740,15 +740,29 @@ out:
 	return ret;
 }
 
+static void prime_cache_tree_sparse_dir(struct repository *r,
+					struct cache_tree *it,
+					struct tree *tree,
+					struct strbuf *tree_path)
+{
+
+	oidcpy(&it->oid, &tree->object.oid);
+	it->entry_count = 1;
+	return;
+}
+
 static void prime_cache_tree_rec(struct repository *r,
 				 struct cache_tree *it,
-				 struct tree *tree)
+				 struct tree *tree,
+				 struct strbuf *tree_path)
 {
+	struct strbuf subtree_path = STRBUF_INIT;
 	struct tree_desc desc;
 	struct name_entry entry;
 	int cnt;
 
 	oidcpy(&it->oid, &tree->object.oid);
+
 	init_tree_desc(&desc, tree->buffer, tree->size);
 	cnt = 0;
 	while (tree_entry(&desc, &entry)) {
@@ -757,27 +771,49 @@ static void prime_cache_tree_rec(struct repository *r,
 		else {
 			struct cache_tree_sub *sub;
 			struct tree *subtree = lookup_tree(r, &entry.oid);
+
 			if (!subtree->object.parsed)
 				parse_tree(subtree);
 			sub = cache_tree_sub(it, entry.path);
 			sub->cache_tree = cache_tree();
-			prime_cache_tree_rec(r, sub->cache_tree, subtree);
+			strbuf_reset(&subtree_path);
+			strbuf_grow(&subtree_path, tree_path->len + entry.pathlen + 1);
+			strbuf_addbuf(&subtree_path, tree_path);
+			strbuf_add(&subtree_path, entry.path, entry.pathlen);
+			strbuf_addch(&subtree_path, '/');
+
+			/*
+			 * If a sparse index is in use, the directory being processed may be
+			 * sparse. To confirm that, we can check whether an entry with that
+			 * exact name exists in the index. If it does, the created subtree
+			 * should be sparse. Otherwise, cache tree expansion should continue
+			 * as normal.
+			 */
+			if (r->index->sparse_index &&
+			    index_entry_exists(r->index, subtree_path.buf, subtree_path.len))
+				prime_cache_tree_sparse_dir(r, sub->cache_tree, subtree, &subtree_path);
+			else
+				prime_cache_tree_rec(r, sub->cache_tree, subtree, &subtree_path);
 			cnt += sub->cache_tree->entry_count;
 		}
 	}
 	it->entry_count = cnt;
+
+	strbuf_release(&subtree_path);
 }
 
 void prime_cache_tree(struct repository *r,
 		      struct index_state *istate,
 		      struct tree *tree)
 {
+	struct strbuf tree_path = STRBUF_INIT;
+
 	trace2_region_enter("cache-tree", "prime_cache_tree", the_repository);
 	cache_tree_free(&istate->cache_tree);
 	istate->cache_tree = cache_tree();
 
-	ensure_full_index(istate);
-	prime_cache_tree_rec(r, istate->cache_tree, tree);
+	prime_cache_tree_rec(r, istate->cache_tree, tree, &tree_path);
+	strbuf_release(&tree_path);
 	istate->cache_changed |= CACHE_TREE_CHANGED;
 	trace2_region_leave("cache-tree", "prime_cache_tree", the_repository);
 }
diff --git a/cache.h b/cache.h
index f6295f3b048..1d3e4665562 100644
--- a/cache.h
+++ b/cache.h
@@ -816,6 +816,16 @@ struct cache_entry *index_file_exists(struct index_state *istate, const char *na
  */
 int index_name_pos(struct index_state *, const char *name, int namelen);
 
+/*
+ * Determines whether an entry with the given name exists within the
+ * given index. The return value is 1 if an exact match is found, otherwise
+ * it is 0. Note that, unlike index_name_pos, this function does not expand
+ * the index if it is sparse. If an item exists within the full index but it
+ * is contained within a sparse directory (and not in the sparse index), 0 is
+ * returned.
+ */
+int index_entry_exists(struct index_state *, const char *name, int namelen);
+
 /*
  * Some functions return the negative complement of an insert position when a
  * precise match was not found but a position was found where the entry would
diff --git a/read-cache.c b/read-cache.c
index f5d4385c408..ea1166895f8 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -551,7 +551,10 @@ int cache_name_stage_compare(const char *name1, int len1, int stage1, const char
 	return 0;
 }
 
-static int index_name_stage_pos(struct index_state *istate, const char *name, int namelen, int stage)
+static int index_name_stage_pos(struct index_state *istate,
+				const char *name, int namelen,
+				int stage,
+				int search_sparse)
 {
 	int first, last;
 
@@ -570,7 +573,7 @@ static int index_name_stage_pos(struct index_state *istate, const char *name, in
 		first = next+1;
 	}
 
-	if (istate->sparse_index &&
+	if (search_sparse && istate->sparse_index &&
 	    first > 0) {
 		/* Note: first <= istate->cache_nr */
 		struct cache_entry *ce = istate->cache[first - 1];
@@ -586,7 +589,7 @@ static int index_name_stage_pos(struct index_state *istate, const char *name, in
 		    ce_namelen(ce) < namelen &&
 		    !strncmp(name, ce->name, ce_namelen(ce))) {
 			ensure_full_index(istate);
-			return index_name_stage_pos(istate, name, namelen, stage);
+			return index_name_stage_pos(istate, name, namelen, stage, search_sparse);
 		}
 	}
 
@@ -595,7 +598,12 @@ static int index_name_stage_pos(struct index_state *istate, const char *name, in
 
 int index_name_pos(struct index_state *istate, const char *name, int namelen)
 {
-	return index_name_stage_pos(istate, name, namelen, 0);
+	return index_name_stage_pos(istate, name, namelen, 0, 1);
+}
+
+int index_entry_exists(struct index_state *istate, const char *name, int namelen)
+{
+	return index_name_stage_pos(istate, name, namelen, 0, 0) >= 0;
 }
 
 int remove_index_entry_at(struct index_state *istate, int pos)
@@ -1222,7 +1230,7 @@ static int has_dir_name(struct index_state *istate,
 			 */
 		}
 
-		pos = index_name_stage_pos(istate, name, len, stage);
+		pos = index_name_stage_pos(istate, name, len, stage, 1);
 		if (pos >= 0) {
 			/*
 			 * Found one, but not so fast.  This could
@@ -1322,7 +1330,7 @@ static int add_index_entry_with_check(struct index_state *istate, struct cache_e
 		strcmp(ce->name, istate->cache[istate->cache_nr - 1]->name) > 0)
 		pos = index_pos_to_insert_pos(istate->cache_nr);
 	else
-		pos = index_name_stage_pos(istate, ce->name, ce_namelen(ce), ce_stage(ce));
+		pos = index_name_stage_pos(istate, ce->name, ce_namelen(ce), ce_stage(ce), 1);
 
 	/* existing match? Just replace it. */
 	if (pos >= 0) {
@@ -1357,7 +1365,7 @@ static int add_index_entry_with_check(struct index_state *istate, struct cache_e
 		if (!ok_to_replace)
 			return error(_("'%s' appears as both a file and as a directory"),
 				     ce->name);
-		pos = index_name_stage_pos(istate, ce->name, ce_namelen(ce), ce_stage(ce));
+		pos = index_name_stage_pos(istate, ce->name, ce_namelen(ce), ce_stage(ce), 1);
 		pos = -pos-1;
 	}
 	return pos + 1;
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index f0723a6ac97..e301ef5633a 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -786,9 +786,9 @@ test_expect_success 'sparse-index is not expanded' '
 	ensure_not_expanded checkout - &&
 	ensure_not_expanded switch rename-out-to-out &&
 	ensure_not_expanded switch - &&
-	git -C sparse-index reset --hard &&
+	ensure_not_expanded reset --hard &&
 	ensure_not_expanded checkout rename-out-to-out -- deep/deeper1 &&
-	git -C sparse-index reset --hard &&
+	ensure_not_expanded reset --hard &&
 	ensure_not_expanded restore -s rename-out-to-out -- deep/deeper1 &&
 
 	echo >>sparse-index/README.md &&
@@ -798,6 +798,17 @@ test_expect_success 'sparse-index is not expanded' '
 	echo >>sparse-index/untracked.txt &&
 	ensure_not_expanded add . &&
 
+	for ref in update-deep update-folder1 update-folder2 update-deep
+	do
+		echo >>sparse-index/README.md &&
+		ensure_not_expanded reset --hard $ref || return 1
+	done &&
+
+	ensure_not_expanded reset --hard update-deep &&
+	ensure_not_expanded reset --keep base &&
+	ensure_not_expanded reset --merge update-deep &&
+	ensure_not_expanded reset --hard &&
+
 	ensure_not_expanded checkout -f update-deep &&
 	test_config -C sparse-index pull.twohead ort &&
 	(
-- 
gitgitgadget



* [PATCH v2 6/7] reset: make --mixed sparse-aware
  2021-10-05 13:20 ` [PATCH v2 0/7] Sparse Index: integrate with reset Victoria Dye via GitGitGadget
                     ` (4 preceding siblings ...)
  2021-10-05 13:20   ` [PATCH v2 5/7] reset: make sparse-aware (except --mixed) Victoria Dye via GitGitGadget
@ 2021-10-05 13:20   ` Victoria Dye via GitGitGadget
  2021-10-06  4:43     ` Elijah Newren
  2021-10-05 13:20   ` [PATCH v2 7/7] unpack-trees: improve performance of next_cache_entry Victoria Dye via GitGitGadget
                     ` (3 subsequent siblings)
  9 siblings, 1 reply; 85+ messages in thread
From: Victoria Dye via GitGitGadget @ 2021-10-05 13:20 UTC (permalink / raw)
  To: git
  Cc: stolee, gitster, newren, Taylor Blau, Bagas Sanjaya,
	Victoria Dye, Victoria Dye

From: Victoria Dye <vdye@github.com>

Sparse directory entries are "diffed" as trees in `diff_cache` (used
internally by `reset --mixed`), following a code path separate from
individual file handling. The use of `diff_tree_oid` there requires setting
explicit `change` and `add_remove` functions to process the internal
contents of a sparse directory.

Additionally, the `recursive` diff option handles cases in which `reset
--mixed` must diff/merge files that are nested multiple levels deep in a
sparse directory.

Signed-off-by: Victoria Dye <vdye@github.com>
---
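For illustration only (not part of this patch): a minimal standalone sketch
of the "does this pathspec force index expansion?" decision that the diff
below adds to `read_from_tree`. The `toy_*` names are hypothetical, and
"inside the sparse checkout definition" is simplified here to "neither
equal to nor below a sparse directory entry"; the real code uses
`path_in_cone_mode_sparse_checkout` and `matches_skip_worktree`.

```c
#include <assert.h>
#include <string.h>

/* Toy pathspec item; a stand-in for Git's struct pathspec_item. */
struct toy_pathspec_item {
	const char *path;
	int has_magic;
	int has_wildcard;
};

/* Does `path` name the sparse directory itself ("folder1" or "folder1/")? */
static int toy_is_sparse_dir(const char *path, const char *dir)
{
	size_t dirlen = strlen(dir);

	assert(dirlen > 1 && dir[dirlen - 1] == '/');
	if (!strcmp(path, dir))
		return 1;
	return strlen(path) == dirlen - 1 && !strncmp(path, dir, dirlen - 1);
}

/* Is `path` strictly inside the sparse directory ("folder1/a")? */
static int toy_is_below_sparse_dir(const char *path, const char *dir)
{
	size_t dirlen = strlen(dir);

	return strlen(path) > dirlen && !strncmp(path, dir, dirlen);
}

/* Return 1 if the index would have to be expanded before diffing. */
static int toy_reset_needs_full_index(const struct toy_pathspec_item *items,
				      int nr, const char *const *sparse_dirs,
				      int nr_sparse)
{
	int i, j;

	/* Overly cautious: any wildcard or magic forces expansion. */
	for (i = 0; i < nr; i++)
		if (items[i].has_magic || items[i].has_wildcard)
			return 1;

	for (i = 0; i < nr; i++) {
		int in_cone = 1, is_dir_entry = 0;

		for (j = 0; j < nr_sparse; j++) {
			if (toy_is_sparse_dir(items[i].path, sparse_dirs[j])) {
				in_cone = 0;
				is_dir_entry = 1;
			} else if (toy_is_below_sparse_dir(items[i].path,
							   sparse_dirs[j])) {
				in_cone = 0;
			}
		}

		/* Outside the cone and not itself an index entry: expand. */
		if (!in_cone && !is_dir_entry)
			return 1;
	}
	return 0;
}
```

With sparse directories folder1/ and folder2/, the paths deep/a and folder1
leave the index collapsed, while folder1/a and \*/a force expansion, which
matches the cases exercised in t1092.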
 builtin/reset.c                          | 30 +++++++++++++++++++++++-
 t/t1092-sparse-checkout-compatibility.sh | 13 +++++++++-
 2 files changed, 41 insertions(+), 2 deletions(-)

diff --git a/builtin/reset.c b/builtin/reset.c
index e1f2a2bb2c4..ceb9b122897 100644
--- a/builtin/reset.c
+++ b/builtin/reset.c
@@ -175,6 +175,8 @@ static int read_from_tree(const struct pathspec *pathspec,
 			  int intent_to_add)
 {
 	struct diff_options opt;
+	unsigned int i;
+	char *skip_worktree_seen = NULL;
 
 	memset(&opt, 0, sizeof(opt));
 	copy_pathspec(&opt.pathspec, pathspec);
@@ -182,9 +184,35 @@ static int read_from_tree(const struct pathspec *pathspec,
 	opt.format_callback = update_index_from_diff;
 	opt.format_callback_data = &intent_to_add;
 	opt.flags.override_submodule_config = 1;
+	opt.flags.recursive = 1;
 	opt.repo = the_repository;
+	opt.change = diff_change;
+	opt.add_remove = diff_addremove;
+
+	/*
+	 * When pathspec is given for resetting a cone-mode sparse checkout, it may
+	 * identify entries that are nested in sparse directories, in which case the
+	 * index should be expanded. For the sake of efficiency, this check is
+	 * overly-cautious: anything with a wildcard or a magic prefix requires
+	 * expansion, as well as literal paths that aren't in the sparse checkout
+	 * definition AND don't match any directory in the index.
+	 */
+	if (pathspec->nr && the_index.sparse_index) {
+		if (pathspec->magic || pathspec->has_wildcard) {
+			ensure_full_index(&the_index);
+		} else {
+			for (i = 0; i < pathspec->nr; i++) {
+				if (!path_in_cone_mode_sparse_checkout(pathspec->items[i].original, &the_index) &&
+				    !matches_skip_worktree(pathspec, i, &skip_worktree_seen)) {
+					ensure_full_index(&the_index);
+					break;
+				}
+			}
+		}
+	}
+
+	free(skip_worktree_seen);
 
-	ensure_full_index(&the_index);
 	if (do_diff_cache(tree_oid, &opt))
 		return 1;
 	diffcore_std(&opt);
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index e301ef5633a..4afcbc2d673 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -804,11 +804,22 @@ test_expect_success 'sparse-index is not expanded' '
 		ensure_not_expanded reset --hard $ref || return 1
 	done &&
 
+	ensure_not_expanded reset --mixed base &&
 	ensure_not_expanded reset --hard update-deep &&
 	ensure_not_expanded reset --keep base &&
 	ensure_not_expanded reset --merge update-deep &&
-	ensure_not_expanded reset --hard &&
 
+	ensure_not_expanded reset base -- deep/a &&
+	ensure_not_expanded reset base -- nonexistent-file &&
+	ensure_not_expanded reset deepest -- deep &&
+
+	# Although folder1 is outside the sparse definition, it exists as a
+	# directory entry in the index, so it will be reset without needing to
+	# expand the full index.
+	ensure_not_expanded reset --hard update-folder1 &&
+	ensure_not_expanded reset base -- folder1 &&
+
+	ensure_not_expanded reset --hard update-deep &&
 	ensure_not_expanded checkout -f update-deep &&
 	test_config -C sparse-index pull.twohead ort &&
 	(
-- 
gitgitgadget



* [PATCH v2 7/7] unpack-trees: improve performance of next_cache_entry
  2021-10-05 13:20 ` [PATCH v2 0/7] Sparse Index: integrate with reset Victoria Dye via GitGitGadget
                     ` (5 preceding siblings ...)
  2021-10-05 13:20   ` [PATCH v2 6/7] reset: make --mixed sparse-aware Victoria Dye via GitGitGadget
@ 2021-10-05 13:20   ` Victoria Dye via GitGitGadget
  2021-10-06 10:37     ` Bagas Sanjaya
  2021-10-05 15:34   ` [PATCH v2 0/7] Sparse Index: integrate with reset Ævar Arnfjörð Bjarmason
                     ` (2 subsequent siblings)
  9 siblings, 1 reply; 85+ messages in thread
From: Victoria Dye via GitGitGadget @ 2021-10-05 13:20 UTC (permalink / raw)
  To: git
  Cc: stolee, gitster, newren, Taylor Blau, Bagas Sanjaya,
	Victoria Dye, Victoria Dye

From: Victoria Dye <vdye@github.com>

To find the first non-unpacked cache entry, `next_cache_entry` iterates
through the index, starting at `cache_bottom`. The performance of this in full
indexes is helped by `cache_bottom` advancing with each invocation of
`mark_ce_used` (called by `unpack_index_entry`). However, the presence of
sparse directories can prevent the `cache_bottom` from advancing in a sparse
index case, effectively forcing `next_cache_entry` to search from the
beginning of the index each time it is called.

The `cache_bottom` must be preserved for the sparse index (see 17a1bb570b
(unpack-trees: preserve cache_bottom, 2021-07-14)).  Therefore, to retain
the benefit `cache_bottom` provides in non-sparse index cases, a separate
`hint` position indicates the first position `next_cache_entry` should
search, updated each execution with a new position.  The performance of `git
reset -- does-not-exist` (testing the "worst case" in which all entries in
the index are unpacked with `next_cache_entry`) is significantly improved
for the sparse index case:

Test          before            after
------------------------------------------------------
(full-v3)     0.79(0.38+0.30)   0.91(0.43+0.34) +15.2%
(full-v4)     0.80(0.38+0.29)   0.85(0.40+0.35) +6.2%
(sparse-v3)   0.76(0.43+0.69)   0.44(0.08+0.67) -42.1%
(sparse-v4)   0.71(0.40+0.65)   0.41(0.09+0.65) -42.3%

Signed-off-by: Victoria Dye <vdye@github.com>
---
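For illustration only (not part of this patch): a minimal standalone sketch
of the `hint` logic described above. `toy_entry` and `toy_index` are
hypothetical stand-ins for `struct cache_entry` and the unpack options, and
the function returns a position rather than an entry pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins; not Git's real structs. */
struct toy_entry {
	int unpacked;		/* models the CE_UNPACKED flag */
};

struct toy_index {
	struct toy_entry *cache;
	int cache_nr;
	int cache_bottom;	/* may not advance with sparse directories */
};

/*
 * Return the position of the first entry not yet unpacked, or -1 if none
 * remains. The scan starts at the larger of cache_bottom and *hint, and
 * *hint is updated so that the next call resumes after this position.
 */
static int toy_next_cache_entry(const struct toy_index *index, int *hint)
{
	int pos = index->cache_bottom;

	assert(hint != NULL);
	if (*hint > pos)
		pos = *hint;

	while (pos < index->cache_nr) {
		if (!index->cache[pos].unpacked) {
			*hint = pos + 1;
			return pos;
		}
		pos++;
	}

	*hint = pos;
	return -1;
}
```

Each calling loop starts with `hint = -1`, so the first call behaves like
the old code, and later calls in the same loop skip entries already
returned even when `cache_bottom` stays at 0.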
 unpack-trees.c | 23 +++++++++++++++++------
 1 file changed, 17 insertions(+), 6 deletions(-)

diff --git a/unpack-trees.c b/unpack-trees.c
index 8ea0a542da8..b94733de6be 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -645,17 +645,24 @@ static void mark_ce_used_same_name(struct cache_entry *ce,
 	}
 }
 
-static struct cache_entry *next_cache_entry(struct unpack_trees_options *o)
+static struct cache_entry *next_cache_entry(struct unpack_trees_options *o, int *hint)
 {
 	const struct index_state *index = o->src_index;
 	int pos = o->cache_bottom;
 
+	if (*hint > pos)
+		pos = *hint;
+
 	while (pos < index->cache_nr) {
 		struct cache_entry *ce = index->cache[pos];
-		if (!(ce->ce_flags & CE_UNPACKED))
+		if (!(ce->ce_flags & CE_UNPACKED)) {
+			*hint = pos + 1;
 			return ce;
+		}
 		pos++;
 	}
+
+	*hint = pos;
 	return NULL;
 }
 
@@ -1365,12 +1372,13 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
 
 	/* Are we supposed to look at the index too? */
 	if (o->merge) {
+		int hint = -1;
 		while (1) {
 			int cmp;
 			struct cache_entry *ce;
 
 			if (o->diff_index_cached)
-				ce = next_cache_entry(o);
+				ce = next_cache_entry(o, &hint);
 			else
 				ce = find_cache_entry(info, p);
 
@@ -1690,7 +1698,7 @@ static int verify_absent(const struct cache_entry *,
 int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options *o)
 {
 	struct repository *repo = the_repository;
-	int i, ret;
+	int i, hint, ret;
 	static struct cache_entry *dfc;
 	struct pattern_list pl;
 	int free_pattern_list = 0;
@@ -1763,13 +1771,15 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
 		info.pathspec = o->pathspec;
 
 		if (o->prefix) {
+			hint = -1;
+
 			/*
 			 * Unpack existing index entries that sort before the
 			 * prefix the tree is spliced into.  Note that o->merge
 			 * is always true in this case.
 			 */
 			while (1) {
-				struct cache_entry *ce = next_cache_entry(o);
+				struct cache_entry *ce = next_cache_entry(o, &hint);
 				if (!ce)
 					break;
 				if (ce_in_traverse_path(ce, &info))
@@ -1790,8 +1800,9 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
 
 	/* Any left-over entries in the index? */
 	if (o->merge) {
+		hint = -1;
 		while (1) {
-			struct cache_entry *ce = next_cache_entry(o);
+			struct cache_entry *ce = next_cache_entry(o, &hint);
 			if (!ce)
 				break;
 			if (unpack_index_entry(ce, o) < 0)
-- 
gitgitgadget


* Re: [PATCH v2 0/7] Sparse Index: integrate with reset
  2021-10-05 13:20 ` [PATCH v2 0/7] Sparse Index: integrate with reset Victoria Dye via GitGitGadget
                     ` (6 preceding siblings ...)
  2021-10-05 13:20   ` [PATCH v2 7/7] unpack-trees: improve performance of next_cache_entry Victoria Dye via GitGitGadget
@ 2021-10-05 15:34   ` Ævar Arnfjörð Bjarmason
  2021-10-05 20:44     ` Victoria Dye
  2021-10-06  5:46   ` Elijah Newren
  2021-10-07 21:15   ` [PATCH v3 0/8] " Victoria Dye via GitGitGadget
  9 siblings, 1 reply; 85+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-10-05 15:34 UTC (permalink / raw)
  To: Victoria Dye via GitGitGadget
  Cc: git, stolee, gitster, newren, Taylor Blau, Bagas Sanjaya, Victoria Dye


On Tue, Oct 05 2021, Victoria Dye via GitGitGadget wrote:

> The p2000 tests demonstrate an overall ~70% execution time reduction across
> all tested usages of git reset using a sparse index:

[...]

> Test                                               before   after       
> ------------------------------------------------------------------------
> 2000.22: git reset (full-v3)                       0.48     0.51 +6.3% 
> 2000.23: git reset (full-v4)                       0.47     0.50 +6.4% 
> 2000.24: git reset (sparse-v3)                     0.93     0.30 -67.7%
> 2000.25: git reset (sparse-v4)                     0.94     0.29 -69.1%
> 2000.26: git reset --hard (full-v3)                0.69     0.68 -1.4% 
> 2000.27: git reset --hard (full-v4)                0.75     0.68 -9.3% 
> 2000.28: git reset --hard (sparse-v3)              1.29     0.34 -73.6%
> 2000.29: git reset --hard (sparse-v4)              1.31     0.34 -74.0%
> 2000.30: git reset -- does-not-exist (full-v3)     0.54     0.51 -5.6% 
> 2000.31: git reset -- does-not-exist (full-v4)     0.54     0.52 -3.7% 
> 2000.32: git reset -- does-not-exist (sparse-v3)   1.02     0.31 -69.6%
> 2000.33: git reset -- does-not-exist (sparse-v4)   1.07     0.30 -72.0%

This series looks like it really improves some cases, but at the cost of
that -70% improvement we've got a ~5% regression in 7/7 for the full-v3
"-- does-not-exist" cases. As noted in your 7/7 (which improves all other
cases):

    (full-v3)     0.79(0.38+0.30)   0.91(0.43+0.34) +15.2%
    (full-v4)     0.80(0.38+0.29)   0.85(0.40+0.35) +6.2%

Which b.t.w. I had to read a couple of times before realizing that it's
quoted:
    
    Test          before            after
    ------------------------------------------------------
    (full-v3)     0.79(0.38+0.30)   0.91(0.43+0.34) +15.2%
    (full-v4)     0.80(0.38+0.29)   0.85(0.40+0.35) +6.2%
    (sparse-v3)   0.76(0.43+0.69)   0.44(0.08+0.67) -42.1%
    (sparse-v4)   0.71(0.40+0.65)   0.41(0.09+0.65) -42.3%

Is just the does-not-exist part of this bigger table, are the other
cases all ~0% changed, or ...?
    
Anyway, until 7/7 the v3 had been sped up, but a ~10% increase landed us
at ~+6%, and full-v4 had been ~0% but got ~6% worse?

Is there a way we can get those improvements in performance without
regressing on the full-* cases?

Also, these tests only check sparse performance, but isn't some of the
code being modified here general enough to not be used exclusively by
the sparse mode, full checkout cone or not?

It looks fairly easy to extend p2000-sparse-operations.sh to run the
same tests but just pretend that it's running in a "full" mode without
actually setting up anything sparse-specific (the meat of those tests
just runs "git status" etc.). How does that look with this series?

Since only the CL and 7/7 quote numbers from p2000, and 7/7 is at least
a partial regression, it would be nice to have perf numbers on each
commit (if only as a one-off for ML consumption). Are there any more
improvements followed by regressions followed by improvements as we go
along? Would be useful to know...
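
For anyone cross-checking the tables quoted above: the last column
appears to be the relative change in elapsed time versus the "before"
column. A minimal sketch of that arithmetic, assuming that reading is
right (the helper name pct_change is made up for illustration and is not
part of t/perf):

```shell
#!/bin/sh
# pct_change BEFORE AFTER
# Print the change from BEFORE to AFTER as a signed percentage with one
# decimal place, i.e. 100 * (after - before) / before.
# Hypothetical helper for illustration; not part of git's t/perf tooling.
pct_change () {
	awk -v before="$1" -v after="$2" \
		'BEGIN { printf "%+.1f%%\n", 100 * (after - before) / before }'
}

# Two rows from the cover letter table: sparse-v3 "git reset" and
# full-v3 "git reset -- does-not-exist".
pct_change 0.93 0.30
pct_change 0.54 0.51
```

A negative value means the "after" run was faster than the "before" run.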


* Re: [PATCH v2 1/7] reset: behave correctly with sparse-checkout
  2021-10-05 13:20   ` [PATCH v2 1/7] reset: behave correctly with sparse-checkout Kevin Willford via GitGitGadget
@ 2021-10-05 19:30     ` Junio C Hamano
  2021-10-05 21:59       ` Victoria Dye
  2021-10-06  1:46     ` Elijah Newren
  2021-10-06 10:31     ` Bagas Sanjaya
  2 siblings, 1 reply; 85+ messages in thread
From: Junio C Hamano @ 2021-10-05 19:30 UTC (permalink / raw)
  To: Kevin Willford via GitGitGadget
  Cc: git, stolee, newren, Taylor Blau, Bagas Sanjaya, Victoria Dye,
	Kevin Willford

"Kevin Willford via GitGitGadget" <gitgitgadget@gmail.com> writes:

> When using the sparse checkout feature, 'git reset' will add entries to
> the index that will have the skip-worktree bit off but will leave the
> working directory empty. File data is lost because the index version of
> the files has been changed but there is nothing that is in the working
> directory. This will cause the next 'git status' call to show either
> deleted for files modified or deleting or nothing for files added. The
> added files should be shown as untracked and modified files should be
> shown as modified.

I am on vacation today, so let me be brief.

Let me see if I am understanding the situation correctly.

We have the index, with a path that records a blob, but the path is
marked with skip-worktree bit.

    $ rm -fr test && mkdir test && cd test
    $ git init .
    $ date >no-skip
    $ date >skip
    $ git add no-skip skip
    $ git commit -m initial
     2 files changed, 2 insertions(+)
     create mode 100644 no-skip
     create mode 100644 skip
    $ date >no-skip
    $ date >skip
    $ git add no-skip skip
    $ git update-index --skip-worktree skip
    $ rm skip
    $ git commit -m second
    [master e9088ad] second
     2 files changed, 2 insertions(+), 2 deletions(-)
    $ ls *skip
    no-skip
    $ git ls-files -t
    H no-skip
    S skip
    $ git status
    On branch master
    nothing to commit, working tree clean

Note.  There is no 'reset' done yet so far.

The user is happy with the state because

 (1) The user marked the path "skip" with skip-worktree bit, and
     thanks to that, even though "skip" is absent in the working
     tree, the "git status" does not complain.

 (2) The user marked the path "skip" with skip-worktree bit because
     the user did not want to see such a file in the working tree.
     And "git commit -m second", "git ls-files -t", or "git status"
     that were done to get here did not make it materialize in the
     working tree all of a sudden.

And then the user says "git reset HEAD^" to switch to a different
commit.

    $ git reset HEAD^
    $ ls *skip
    no-skip
    $ git ls-files -t
    M no-skip
    D skip
    $ git status -suno
     M no-skip
     D skip

The user is unhappy with the state because "skip" is shown as lost.

Do I understand the situation you are trying to deal with correctly?

> To fix this when the reset is running if there is not a file in the
> working directory and if it will be missing with the new index entry or
> was not missing in the previous version, we create the previous index
> version of the file in the working directory so that status will report
> correctly and the files will be availble for the user to deal with.

Assuming I read the problem description correctly, I am highly
skeptical that the above is a correct approach to keep the user
happy.  Yes, if you created a working tree file with contents that
match the blob recorded for the path in the initial commit when
"reset HEAD^" is done, you may keep "git status" quiet, so (1) above
will be kept, but what about (2)?  The user marked the path with
"skip" because the path should not appear in the working tree.
The "fix" is countermanding that wish by the user, isn't it?

Wouldn't a fix to the situation be to 

 * Add the blob for "skip" taken from the initial commit to the
   index, just like the entry for "no-skip" is updated;

 * But remember that "skip" was marked with "skip-worktree" bit
   immediately before "git reset" was asked to do its thing, and
   re-add the bit to the path in the index before "git reset" gives
   the control back to the user;

 * And keep the working tree untouched, without writing anything out
   to "skip".  If the user had a (possibly unrelated) file there, it
   will not be overwritten, and if the user left the path absent, it
   will still be absent.

so that the last three diagnostic commands in the above sample
sequence would instead read:

    $ ls *skip
    no-skip
    $ git ls-files -t
    M no-skip
    S skip
    $ git status -suno
     M no-skip

i.e. skip gets updated in the index only, nothing changes in the
working tree for "skip" or "no-skip", and status reports that
"no-skip" is different from the index but "skip" hasn't changed in
the working tree since the index (thanks to its skip-worktree bit).

Then the user will be happy in the same way as the user was happy
immediately after the state marked with "There is no 'reset' done
yet so far." above, on both counts, not just for "status does not
report something got changed" part but also "user didn't want to see
'skip' in the working tree, and 'skip' did not materialize" part.

Thanks.
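
The sequence above can be scripted by anyone who wants to compare the
"git ls-files -t" output before and after the reset on their own build.
This is only a sketch: it uses "echo" instead of "date" so the contents
are deterministic, sets a placeholder identity so the commits succeed in
a clean environment, and spells the option "--skip-worktree". What the
"after" listing shows depends on the git version under test, so the
script just prints it:

```shell
#!/bin/sh
# Recreate the skip-worktree scenario in a throwaway repository and show
# "git ls-files -t" before and after a mixed reset to the first commit.
repo=$(mktemp -d) || exit 1
cd "$repo" || exit 1
git init -q .

# Placeholder identity (an assumption, only so "git commit" works anywhere).
git config user.name "Example User"
git config user.email "user@example.com"

echo one >no-skip
echo one >skip
git add no-skip skip
git commit -q -m initial

echo two >no-skip
echo two >skip
git add no-skip skip
# Mark "skip" as skip-worktree, then remove it from the working tree.
git update-index --skip-worktree skip
rm skip
git commit -q -m second

before=$(git ls-files -t)
git reset -q HEAD^
after=$(git ls-files -t)

printf 'before reset:\n%s\n' "$before"
printf 'after reset:\n%s\n' "$after"
```

The temporary directory is left in place so the resulting state can be
inspected further, e.g. with "git status -suno".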


* Re: [PATCH v2 0/7] Sparse Index: integrate with reset
  2021-10-05 15:34   ` [PATCH v2 0/7] Sparse Index: integrate with reset Ævar Arnfjörð Bjarmason
@ 2021-10-05 20:44     ` Victoria Dye
  0 siblings, 0 replies; 85+ messages in thread
From: Victoria Dye @ 2021-10-05 20:44 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, Victoria Dye via GitGitGadget
  Cc: git, stolee, gitster, newren, Taylor Blau, Bagas Sanjaya

Ævar Arnfjörð Bjarmason wrote:
> 
> On Tue, Oct 05 2021, Victoria Dye via GitGitGadget wrote:
> 
>> The p2000 tests demonstrate an overall ~70% execution time reduction across
>> all tested usages of git reset using a sparse index:
> 
> [...]
> 
>> Test                                               before   after       
>> ------------------------------------------------------------------------
>> 2000.22: git reset (full-v3)                       0.48     0.51 +6.3% 
>> 2000.23: git reset (full-v4)                       0.47     0.50 +6.4% 
>> 2000.24: git reset (sparse-v3)                     0.93     0.30 -67.7%
>> 2000.25: git reset (sparse-v4)                     0.94     0.29 -69.1%
>> 2000.26: git reset --hard (full-v3)                0.69     0.68 -1.4% 
>> 2000.27: git reset --hard (full-v4)                0.75     0.68 -9.3% 
>> 2000.28: git reset --hard (sparse-v3)              1.29     0.34 -73.6%
>> 2000.29: git reset --hard (sparse-v4)              1.31     0.34 -74.0%
>> 2000.30: git reset -- does-not-exist (full-v3)     0.54     0.51 -5.6% 
>> 2000.31: git reset -- does-not-exist (full-v4)     0.54     0.52 -3.7% 
>> 2000.32: git reset -- does-not-exist (sparse-v3)   1.02     0.31 -69.6%
>> 2000.33: git reset -- does-not-exist (sparse-v4)   1.07     0.30 -72.0%
> 
> This series looks like it really improves some cases, but at the cost of
> that -70% improvement we've got a ~5% regression in 7/7 for the full-v3
> "-- does-not-exist" cases. As noted in your 7/7 (which improves all other
> cases):
> 
>     (full-v3)     0.79(0.38+0.30)   0.91(0.43+0.34) +15.2%
>     (full-v4)     0.80(0.38+0.29)   0.85(0.40+0.35) +6.2%
> 

New performance numbers at the end - I think I have an explanation for this.

> Which b.t.w. I had to read a couple of times before realizing that it's
> quoted:
>     
>     Test          before            after
>     ------------------------------------------------------
>     (full-v3)     0.79(0.38+0.30)   0.91(0.43+0.34) +15.2%
>     (full-v4)     0.80(0.38+0.29)   0.85(0.40+0.35) +6.2%
>     (sparse-v3)   0.76(0.43+0.69)   0.44(0.08+0.67) -42.1%
>     (sparse-v4)   0.71(0.40+0.65)   0.41(0.09+0.65) -42.3%
> 
> Is just the does-not-exist part of this bigger table, are the other
> cases all ~0% changed, or ...?
>     

These numbers were for the `git reset -- does-not-exist` case only. If I end
up needing to send a V3, though, I'll probably remove the performance
numbers from 7/7 altogether - looking at them now, they make the commit
message somewhat cluttered. That said, performance numbers *are* helpful for
reviews on the mailing list, so I'd keep the information in the cover
letter at the very least.

> Anyway, until 7/7 the v3 had been sped up, but a ~10% increase landed us
> at ~+6%, and full-v4 had been ~0% but got ~6% worse?
> 
> Is there a way we can get those improvements in performance without
> regressing on the full-* cases?
> 
> Also, these tests only check sparse performance, but isn't some of the
> code being modified here general enough to not be used exclusively by
> the sparse mode, full checkout cone or not?
> 
> It looks fairly easy to extend p2000-sparse-operations.sh to run the
> same tests but just pretend that it's running in a "full" mode without
> actually setting up anything sparse-specific (the meat of those tests
> just runs "git status" etc.). How does that look with this series?
> 

I updated `p2000` locally to do this but the setup was substantially slower
for the full checkout, to the point that it was infeasible to run the
complete test for all relevant commits. Looking at the changes in this
series, nothing appears to affect the full checkout case differently than
the sparse checkout/full index case, so I'm fairly confident there won't be
a regression specific to full checkouts.

> Since only the CL and 7/7 quote numbers from p2000, and 7/7 is at least
> a partial regression, it would be nice to have perf numbers on each
> commit (if only as a one-off for ML consumption). Are there any more
> improvements followed by regressions followed by improvements as we go
> along? Would be useful to know...
> 

I don't think any of the apparent slowdowns seen in these results represent
real regressions. After re-running the performance tests, I saw variability
of up to ~20% execution time across changes with commands that should see no
effect on their execution time (e.g. sparse-v* from 1/7 to 4/7).
Additionally, I saw different increases & decreases each time for each
end-to-end run of the tests. The most reliable, noticeable changes across
the test executions were:

1. When each variant of `git reset` was integrated with sparse index, a
   65-75% execution time reduction in relevant sparse-v* tests.
2. `git reset -- does-not-exist` slower than `git reset` in 6/7,
   then matching its speed after 7/7.
3. As of 7/7, full-v* to sparse-v* showing a 50% execution time reduction.

My guess is that the variability comes from general "uncontrolled" factors
when running the tests (e.g., background processes on my system). The good
news is, when the tests are re-run with more trials (and the recent bugfix
to `t/perf/perf-lib.sh` [1]), the execution times look a lot less worrisome 
(apologies for the table width, but I'd like to err on the side of providing
more complete information):

Test                                               base              [1/7]                    [4/7]                    [5/7]                    [6/7]                    [7/7]            
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
2000.22: git reset (full-v3)                       0.44(0.16+0.19)   0.44(0.17+0.18) +0.0%    0.44(0.17+0.19) +0.0%    0.45(0.17+0.18) +2.3%    0.44(0.17+0.19) +0.0%    0.45(0.17+0.18) +2.3% 
2000.23: git reset (full-v4)                       0.43(0.16+0.18)   0.43(0.16+0.19) +0.0%    0.45(0.17+0.18) +4.7%    0.44(0.17+0.18) +2.3%    0.44(0.17+0.18) +2.3%    0.44(0.18+0.18) +2.3% 
2000.24: git reset (sparse-v3)                     0.82(0.54+0.19)   0.84(0.56+0.19) +2.4%    0.81(0.54+0.19) -1.2%    0.88(0.60+0.19) +7.3%    0.27(0.03+0.45) -67.1%   0.27(0.03+0.47) -67.1%
2000.25: git reset (sparse-v4)                     0.82(0.55+0.18)   0.82(0.53+0.20) +0.0%    0.83(0.55+0.19) +1.2%    0.82(0.54+0.19) +0.0%    0.27(0.03+0.50) -67.1%   0.27(0.03+0.48) -67.1%
2000.26: git reset --hard (full-v3)                0.71(0.38+0.24)   0.69(0.37+0.23) -2.8%    0.70(0.37+0.24) -1.4%    0.78(0.41+0.27) +9.9%    0.71(0.38+0.25) +0.0%    0.70(0.37+0.23) -1.4% 
2000.27: git reset --hard (full-v4)                0.71(0.38+0.23)   0.77(0.42+0.25) +8.5%    0.76(0.41+0.26) +7.0%    0.72(0.40+0.24) +1.4%    0.68(0.37+0.23) -4.2%    0.67(0.36+0.22) -5.6%
2000.28: git reset --hard (sparse-v3)              1.29(0.93+0.26)   1.33(0.95+0.27) +3.1%    1.11(0.76+0.25) -14.0%   0.38(0.05+0.25) -70.5%   0.36(0.04+0.22) -72.1%   0.34(0.04+0.21) -73.6%
2000.29: git reset --hard (sparse-v4)              1.17(0.84+0.24)   1.10(0.79+0.23) -6.0%    1.01(0.69+0.24) -13.7%   0.42(0.05+0.26) -64.1%   0.39(0.05+0.25) -66.7%   0.38(0.05+0.23) -67.5%
2000.30: git reset -- does-not-exist (full-v3)     0.50(0.19+0.20)   0.50(0.19+0.20) +0.0%    0.53(0.21+0.22) +6.0%    0.47(0.18+0.19) -6.0%    0.45(0.18+0.18) -10.0%   0.45(0.18+0.19) -10.0%
2000.31: git reset -- does-not-exist (full-v4)     0.45(0.18+0.18)   0.46(0.18+0.19) +2.2%    0.47(0.19+0.19) +4.4%    0.45(0.18+0.19) +0.0%    0.45(0.18+0.18) +0.0%    0.45(0.18+0.18) +0.0% 
2000.32: git reset -- does-not-exist (sparse-v3)   1.01(0.70+0.21)   0.91(0.62+0.20) -9.9%    0.93(0.64+0.20) -7.9%    0.89(0.61+0.20) -11.9%   0.48(0.23+0.46) -52.5%   0.27(0.03+0.49) -73.3%
2000.33: git reset -- does-not-exist (sparse-v4)   0.99(0.67+0.21)   1.02(0.70+0.22) +3.0%    1.04(0.70+0.22) +5.1%    0.83(0.55+0.19) -16.2%   0.48(0.24+0.48) -51.5%   0.27(0.03+0.49) -72.7%

Note that some commits in this series are not included because they don't
touch any code used by `git reset`.
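
A sketch of how a per-commit comparison like the one above might be
gathered. It assumes the "run" script in t/perf accepts a list of
revisions followed by a perf script, and that GIT_PERF_REPEAT_COUNT sets
the number of trials per test. The revision names and the REPEAT and RUN
variables are placeholders invented for this illustration:

```shell
#!/bin/sh
# perf_compare SCRIPT REVISION...
# Build a t/perf "run" command line that times SCRIPT against each
# REVISION. The command is printed by default; set RUN=1 to execute it
# (from the t/perf directory of a git.git checkout).
# REPEAT (default 10) is passed through as GIT_PERF_REPEAT_COUNT.
perf_compare () {
	script=$1
	shift
	cmd="GIT_PERF_REPEAT_COUNT=${REPEAT:-10} ./run $* $script"
	if [ "${RUN:-0}" = 1 ]
	then
		eval "$cmd"
	else
		printf '%s\n' "$cmd"
	fi
}

# Hypothetical revision names; substitute the commits of the series.
perf_compare p2000-sparse-operations.sh base-commit patch-6 patch-7
```

Raising the repeat count makes each run slower but should damp the kind
of run-to-run variability described above.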

[1] https://lore.kernel.org/git/pull.1051.git.1633386543759.gitgitgadget@gmail.com/


* Re: [PATCH v2 1/7] reset: behave correctly with sparse-checkout
  2021-10-05 19:30     ` Junio C Hamano
@ 2021-10-05 21:59       ` Victoria Dye
  2021-10-06 12:44         ` Junio C Hamano
  0 siblings, 1 reply; 85+ messages in thread
From: Victoria Dye @ 2021-10-05 21:59 UTC (permalink / raw)
  To: Junio C Hamano, Kevin Willford via GitGitGadget
  Cc: git, stolee, newren, Taylor Blau, Bagas Sanjaya, Kevin Willford

Junio C Hamano wrote:
> Wouldn't a fix to the situation be to 
> 
>  * Add the blob for "skip" taken from the initial commit to the
>    index, just like the entry for "no-skip" is updated;
> 
>  * But remember that "skip" was marked with "skip-worktree" bit
>    immediately before "git reset" was asked to do its thing, and
>    re-add the bit to the path in the index before "git reset" gives
>    the control back to the user;
> 
>  * And keep the working tree untouched, without writing anything out
>    to "skip".  If the user had a (possibly unrelated) file there, it
>    will not be overwritten, and if the user left the path absent, it
>    will still be absent.
> 
> so that the last three diagnostic commands in the above sample
> sequence would instead read:
> 
>     $ ls *skip
>     no-skip
>     $ git ls-files -t
>     M no-skip
>     S skip
>     $ git status -suno
>      M no-skip
> 
> i.e. skip gets updated in the index only, nothing changes in the
> working tree for "skip" or "no-skip", and status reports that
> "no-skip" is different from the index but "skip" hasn't changed in
> the working tree since the index (thanks to its skip-worktree bit).
> 
> Then the user will be happy in the same way as the user was happy
> immediately after the state marked with "There is no 'reset' done
> yet so far." above, on both counts, not just for "status does not
> report something got changed" part but also "user didn't want to see
> 'skip' in the working tree, and 'skip' did not materialize" part.
> 
> Thanks.
> 

Thanks for the thorough explanation, I'm on-board with your approach (and
will re-roll the series with that implemented). A lot of my thought process
(and confusion) came from a comment in e5ca291076 (t1092: document bad
sparse-checkout behavior, 2021-07-14) suggesting that full and sparse
checkouts should have the same result in scenarios like the one you
outlined above. The problem is, as noted earlier, it's impossible to tell
whether (using your example):

1. the user deleted `skip` because they intentionally want to remove it from
   the worktree, and it should continue to be deleted after a reset.
2. `skip` doesn't exist in the worktree because it's excluded from the
   sparse checkout definition and the user does not want its current state
   "deleted" after a reset.

As a result, there's no way `git reset --mixed` could be expected to behave
the same way in full checkouts as it does in sparse, and the most consistent
solution is that the worktree should remain untouched with `skip-worktree`
preserved.


* Re: [PATCH v2 1/7] reset: behave correctly with sparse-checkout
  2021-10-05 13:20   ` [PATCH v2 1/7] reset: behave correctly with sparse-checkout Kevin Willford via GitGitGadget
  2021-10-05 19:30     ` Junio C Hamano
@ 2021-10-06  1:46     ` Elijah Newren
  2021-10-06 20:09       ` Victoria Dye
  2021-10-06 10:31     ` Bagas Sanjaya
  2 siblings, 1 reply; 85+ messages in thread
From: Elijah Newren @ 2021-10-06  1:46 UTC (permalink / raw)
  To: Kevin Willford via GitGitGadget
  Cc: Git Mailing List, Derrick Stolee, Junio C Hamano, Taylor Blau,
	Bagas Sanjaya, Victoria Dye, Kevin Willford

Hi!

It appears Junio has already commented on this patch and in more
detail, but since I already typed up some comments I'll send them
along in case they are useful.

On Tue, Oct 5, 2021 at 6:20 AM Kevin Willford via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Kevin Willford <kewillf@microsoft.com>
>
> When using the sparse checkout feature, 'git reset' will add entries to
> the index that will have the skip-worktree bit off but will leave the
> working directory empty.

Yes, that seems like a problem.

> File data is lost because the index version of
> the files has been changed but there is nothing that is in the working
> directory. This will cause the next 'git status' call to show either
> deleted for files modified or deleting or nothing for files added. The
> added files should be shown as untracked and modified files should be
> shown as modified.

Why is the solution to add the files to the working tree rather than
to make sure the files have the skip-worktree bit set?  That's not at
all what I would have expected.

> To fix this when the reset is running if there is not a file in the
> working directory and if it will be missing with the new index entry or
> was not missing in the previous version, we create the previous index
> version of the file in the working directory so that status will report
> correctly and the files will be availble for the user to deal with.

s/availble/available/

>
> This fixes a documented failure from t1092 that was created in 19a0acc
> (t1092: test interesting sparse-checkout scenarios, 2021-01-23).
>
> Signed-off-by: Kevin Willford <kewillf@microsoft.com>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> Signed-off-by: Victoria Dye <vdye@github.com>
> ---
>  builtin/reset.c                          | 24 ++++++++--
>  t/t1092-sparse-checkout-compatibility.sh |  4 +-
>  t/t7114-reset-sparse-checkout.sh         | 61 ++++++++++++++++++++++++
>  3 files changed, 83 insertions(+), 6 deletions(-)
>  create mode 100755 t/t7114-reset-sparse-checkout.sh
>
> diff --git a/builtin/reset.c b/builtin/reset.c
> index 51c9e2f43ff..3b75d3b2f20 100644
> --- a/builtin/reset.c
> +++ b/builtin/reset.c
> @@ -25,6 +25,8 @@
>  #include "cache-tree.h"
>  #include "submodule.h"
>  #include "submodule-config.h"
> +#include "dir.h"
> +#include "entry.h"
>
>  #define REFRESH_INDEX_DELAY_WARNING_IN_MS (2 * 1000)
>
> @@ -130,11 +132,27 @@ static void update_index_from_diff(struct diff_queue_struct *q,
>         int intent_to_add = *(int *)data;
>
>         for (i = 0; i < q->nr; i++) {
> +               int pos;
>                 struct diff_filespec *one = q->queue[i]->one;
> -               int is_missing = !(one->mode && !is_null_oid(&one->oid));
> +               struct diff_filespec *two = q->queue[i]->two;
> +               int is_in_reset_tree = one->mode && !is_null_oid(&one->oid);

Isn't !is_null_oid(&one->oid) redundant to checking one->mode?  When
does the diff machinery ever give you a non-zero mode with a null oid?

Also, is_in_reset_tree == !is_missing; I'll note that below.

>                 struct cache_entry *ce;
>
> +               /*
> +                * If the file being reset has `skip-worktree` enabled, we need
> +                * to check it out to prevent the file from being hard reset.

I don't understand this comment.  If the file wasn't originally in the
index (is_missing), and is being added to it, and is correctly marked
as skip_worktree, and the file isn't in the working tree, then it
sounds like everything is already in a good state.  Files outside the
sparse checkout are meant to have the skip_worktree bit set and be
missing from the working tree.

Also, I don't know what you mean by 'hard reset' here.

> +                */
> +               pos = cache_name_pos(two->path, strlen(two->path));
> +               if (pos >= 0 && ce_skip_worktree(active_cache[pos])) {
> +                       struct checkout state = CHECKOUT_INIT;
> +                       state.force = 1;
> +                       state.refresh_cache = 1;
> +                       state.istate = &the_index;
> +
> +                       checkout_entry(active_cache[pos], &state, NULL, NULL);

Does this introduce an error in the opposite direction from the one
stated in the commit message?  Namely we have two things that should
be in sync: the skip_worktree flag stating whether the file should be
present in the working directory (skip_worktree), and the question of
whether the file is actually in the working directory.  In the commit
message, you pointed out a case where they were out of sync one way:
the skip_worktree flag was not set but the file was missing.  Here you
say the skip_worktree flag is set, but you add it to the working tree
anyway.

Or am I misunderstanding the code?

> +               }
> +

[I did some slight editing to the diff to make the next two parts
appear next to each other]

> -               if (is_missing && !intent_to_add) {
> +               if (!is_in_reset_tree && !intent_to_add) {

I thought this was some subtle bugfix or something, and spent a while
trying to figure it out, before realizing that is_in_reset_tree was
simply defined as !is_missing (for some reason I was assuming it was
dealing with two->mode while is_missing was looking at one->mode).  So
this is a simple variable renaming, which I think is probably good,
but I'd prefer if this was separated into a different patch to make it
easier to review.

>                         remove_file_from_cache(one->path);
>                         continue;
>                 }
> @@ -144,7 +162,7 @@ static void update_index_from_diff(struct diff_queue_struct *q,
>                 if (!ce)
>                         die(_("make_cache_entry failed for path '%s'"),
>                             one->path);
> -               if (is_missing) {
> +               if (!is_in_reset_tree) {

same note as above; the variable rename is good, but should be a separate patch.

>                         ce->ce_flags |= CE_INTENT_TO_ADD;
>                         set_object_name_for_intent_to_add_entry(ce);
>                 }
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index 886e78715fe..c5977152661 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -459,9 +459,7 @@ test_expect_failure 'blame with pathspec outside sparse definition' '
>         test_all_match git blame deep/deeper2/deepest/a
>  '
>
> -# NEEDSWORK: a sparse-checkout behaves differently from a full checkout
> -# in this scenario, but it shouldn't.
> -test_expect_failure 'checkout and reset (mixed)' '
> +test_expect_success 'checkout and reset (mixed)' '
>         init_repos &&
>
>         test_all_match git checkout -b reset-test update-deep &&
> diff --git a/t/t7114-reset-sparse-checkout.sh b/t/t7114-reset-sparse-checkout.sh
> new file mode 100755
> index 00000000000..a8029707fb1
> --- /dev/null
> +++ b/t/t7114-reset-sparse-checkout.sh
> @@ -0,0 +1,61 @@
> +#!/bin/sh
> +
> +test_description='reset when using a sparse-checkout'
> +
> +. ./test-lib.sh
> +
> +test_expect_success 'setup' '
> +       test_tick &&
> +       echo "checkout file" >c &&
> +       echo "modify file" >m &&
> +       echo "delete file" >d &&
> +       git add . &&
> +       git commit -m "initial commit" &&
> +       echo "added file" >a &&
> +       echo "modification of a file" >m &&
> +       git rm d &&
> +       git add . &&
> +       git commit -m "second commit" &&
> +       git checkout -b endCommit
> +'
> +
> +test_expect_success 'reset when there is a sparse-checkout' '
> +       echo "/c" >.git/info/sparse-checkout &&
> +       test_config core.sparsecheckout true &&
> +       git checkout -B resetBranch &&
> +       test_path_is_missing m &&
> +       test_path_is_missing a &&
> +       test_path_is_missing d &&
> +       git reset HEAD~1 &&
> +       echo "checkout file" >expect &&
> +       test_cmp expect c &&
> +       echo "added file" >expect &&
> +       test_cmp expect a &&
> +       echo "modification of a file" >expect &&
> +       test_cmp expect m &&
> +       test_path_is_missing d
> +'
> +
> +test_expect_success 'reset after deleting file without skip-worktree bit' '
> +       git checkout -f endCommit &&
> +       git clean -xdf &&
> +       cat >.git/info/sparse-checkout <<-\EOF &&
> +       /c
> +       /m
> +       EOF
> +       test_config core.sparsecheckout true &&
> +       git checkout -B resetAfterDelete &&
> +       test_path_is_file m &&
> +       test_path_is_missing a &&
> +       test_path_is_missing d &&
> +       rm -f m &&
> +       git reset HEAD~1 &&
> +       echo "checkout file" >expect &&
> +       test_cmp expect c &&
> +       echo "added file" >expect &&
> +       test_cmp expect a &&
> +       test_path_is_missing m &&
> +       test_path_is_missing d
> +'
> +
> +test_done
> --
> gitgitgadget
>


* Re: [PATCH v2 2/7] update-index: add --force-full-index option for expand/collapse test
  2021-10-05 13:20   ` [PATCH v2 2/7] update-index: add --force-full-index option for expand/collapse test Victoria Dye via GitGitGadget
@ 2021-10-06  2:00     ` Elijah Newren
  2021-10-06 20:40       ` Victoria Dye
  2021-10-06 10:33     ` Bagas Sanjaya
  1 sibling, 1 reply; 85+ messages in thread
From: Elijah Newren @ 2021-10-06  2:00 UTC (permalink / raw)
  To: Victoria Dye via GitGitGadget
  Cc: Git Mailing List, Derrick Stolee, Junio C Hamano, Taylor Blau,
	Bagas Sanjaya, Victoria Dye

On Tue, Oct 5, 2021 at 6:20 AM Victoria Dye via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Victoria Dye <vdye@github.com>
>
> Add a new `--force-full-index` option to `git update-index`, which skips
> explicitly setting `command_requires_full_index`. This lets `git
> update-index --force-full-index` run as a command without sparse index
> compatibility implemented, even after it receives sparse index compatibility
> updates.
>
> By using `git update-index --force-full-index` in the `t1092` test
> `sparse-index is expanded and converted back`, commands can continue to
> integrate with the sparse index without the need to keep modifying the
> command used in the test.

So...we're adding a permanent user-facing command line flag, whose
purpose is just to help us with the transition work of implementing
sparse indexes everywhere?  Am I reading that right, or is that just
the reason for t1092 and there are more reasons for it elsewhere?

Also, I'm curious if update-index is the right place to add this.  If
you don't want a sparse index anymore, wouldn't a user want to run
   git sparse-checkout disable
?  Or is the point that you do want to keep the sparse checkout, but
you just don't want the index to also be sparse?  Still, even in that
case, it seems like adding a subcommand or flag to an existing
sparse-checkout subcommand would feel more natural, since
sparse-checkout is the command the user uses to request to get into a
sparse-checkout and sparse index.


> Signed-off-by: Victoria Dye <vdye@github.com>
> ---
>  Documentation/git-update-index.txt       |  5 +++++
>  builtin/update-index.c                   | 11 +++++++++++
>  t/t1092-sparse-checkout-compatibility.sh |  2 +-
>  3 files changed, 17 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/git-update-index.txt b/Documentation/git-update-index.txt
> index 2853f168d97..06255e321a3 100644
> --- a/Documentation/git-update-index.txt
> +++ b/Documentation/git-update-index.txt
> @@ -24,6 +24,7 @@ SYNOPSIS
>              [--[no-]fsmonitor]
>              [--really-refresh] [--unresolve] [--again | -g]
>              [--info-only] [--index-info]
> +            [--force-full-index]
>              [-z] [--stdin] [--index-version <n>]
>              [--verbose]
>              [--] [<file>...]
> @@ -170,6 +171,10 @@ time. Version 4 is relatively young (first released in 1.8.0 in
>  October 2012). Other Git implementations such as JGit and libgit2
>  may not support it yet.
>
> +--force-full-index::
> +       Force the command to operate on a full index, expanding a sparse
> +       index if necessary.
> +
>  -z::
>         Only meaningful with `--stdin` or `--index-info`; paths are
>         separated with NUL character instead of LF.
> diff --git a/builtin/update-index.c b/builtin/update-index.c
> index 187203e8bb5..32ada3ead77 100644
> --- a/builtin/update-index.c
> +++ b/builtin/update-index.c
> @@ -964,6 +964,7 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
>         int split_index = -1;
>         int force_write = 0;
>         int fsmonitor = -1;
> +       int use_default_full_index = 0;
>         struct lock_file lock_file = LOCK_INIT;
>         struct parse_opt_ctx_t ctx;
>         strbuf_getline_fn getline_fn;
> @@ -1069,6 +1070,8 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
>                 {OPTION_SET_INT, 0, "no-fsmonitor-valid", &mark_fsmonitor_only, NULL,
>                         N_("clear fsmonitor valid bit"),
>                         PARSE_OPT_NOARG | PARSE_OPT_NONEG, NULL, UNMARK_FLAG},
> +               OPT_SET_INT(0, "force-full-index", &use_default_full_index,
> +                       N_("run with full index explicitly required"), 1),
>                 OPT_END()
>         };
>
> @@ -1082,6 +1085,14 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
>         if (newfd < 0)
>                 lock_error = errno;
>
> +       /*
> +        * If --force-full-index is set, the command should skip manually
> +        * setting `command_requires_full_index`.
> +        */
> +       prepare_repo_settings(r);
> +       if (!use_default_full_index)
> +               r->settings.command_requires_full_index = 1;
> +
>         entries = read_cache();
>         if (entries < 0)
>                 die("cache corrupted");
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index c5977152661..b3c0d3b98ee 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -642,7 +642,7 @@ test_expect_success 'sparse-index is expanded and converted back' '
>         init_repos &&
>
>         GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
> -               git -C sparse-index -c core.fsmonitor="" reset --hard &&
> +               git -C sparse-index -c core.fsmonitor="" update-index --force-full-index &&
>         test_region index convert_to_sparse trace2.txt &&
>         test_region index ensure_full_index trace2.txt
>  '
> --
> gitgitgadget


* Re: [PATCH v2 3/7] reset: expand test coverage for sparse checkouts
  2021-10-05 13:20   ` [PATCH v2 3/7] reset: expand test coverage for sparse checkouts Victoria Dye via GitGitGadget
@ 2021-10-06  2:04     ` Elijah Newren
  0 siblings, 0 replies; 85+ messages in thread
From: Elijah Newren @ 2021-10-06  2:04 UTC (permalink / raw)
  To: Victoria Dye via GitGitGadget
  Cc: Git Mailing List, Derrick Stolee, Junio C Hamano, Taylor Blau,
	Bagas Sanjaya, Victoria Dye

On Tue, Oct 5, 2021 at 6:21 AM Victoria Dye via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Victoria Dye <vdye@github.com>
>
> Add new tests for `--merge` and `--keep` modes, as well as mixed reset with
> pathspecs both inside and outside of the sparse checkout definition. New
> performance test cases exercise various execution paths for `reset`.
>
> Co-authored-by: Derrick Stolee <dstolee@microsoft.com>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> Signed-off-by: Victoria Dye <vdye@github.com>
> ---
>  t/perf/p2000-sparse-operations.sh        |   3 +
>  t/t1092-sparse-checkout-compatibility.sh | 107 +++++++++++++++++++++++
>  2 files changed, 110 insertions(+)
>
> diff --git a/t/perf/p2000-sparse-operations.sh b/t/perf/p2000-sparse-operations.sh
> index 597626276fb..bfd332120c8 100755
> --- a/t/perf/p2000-sparse-operations.sh
> +++ b/t/perf/p2000-sparse-operations.sh
> @@ -110,5 +110,8 @@ test_perf_on_all git add -A
>  test_perf_on_all git add .
>  test_perf_on_all git commit -a -m A
>  test_perf_on_all git checkout -f -
> +test_perf_on_all git reset
> +test_perf_on_all git reset --hard
> +test_perf_on_all git reset -- does-not-exist
>
>  test_done
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index b3c0d3b98ee..f0723a6ac97 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -479,6 +479,113 @@ test_expect_success 'checkout and reset (mixed) [sparse]' '
>         test_sparse_match git reset update-folder2
>  '
>
> +# NEEDSWORK: with mixed reset, files with differences between HEAD and <commit>
> +# will be added to the work tree even if outside the sparse checkout
> +# definition, and even if the file is modified to a state of having no local
> +# changes. The file is "re-ignored" if a hard reset is executed. We may want to
> +# change this behavior in the future and enforce that files are not written
> +# outside of the sparse checkout definition.

Yeah, I think this comment highlights some of the reasons why writing
those files to the working directory isn't the way I'd prefer to
resolve the inconsistency between the skip-worktree bit and the
presence of the file in the working directory.

> +test_expect_success 'checkout and mixed reset file tracking [sparse]' '
> +       init_repos &&
> +
> +       test_all_match git checkout -b reset-test update-deep &&
> +       test_all_match git reset update-folder1 &&
> +       test_all_match git reset update-deep &&
> +
> +       # At this point, there are no changes in the working tree. However,
> +       # folder1/a now exists locally (even though it is outside of the sparse
> +       # paths).
> +       run_on_sparse test_path_exists folder1 &&
> +
> +       run_on_all rm folder1/a &&
> +       test_all_match git status --porcelain=v2 &&
> +
> +       test_all_match git reset --hard update-deep &&
> +       run_on_sparse test_path_is_missing folder1 &&
> +       test_path_exists full-checkout/folder1
> +'
> +
> +test_expect_success 'checkout and reset (merge)' '
> +       init_repos &&
> +
> +       write_script edit-contents <<-\EOF &&
> +       echo text >>$1
> +       EOF
> +
> +       test_all_match git checkout -b reset-test update-deep &&
> +       run_on_all ../edit-contents a &&
> +       test_all_match git reset --merge deepest &&
> +       test_all_match git status --porcelain=v2 &&
> +
> +       test_all_match git reset --hard update-deep &&
> +       run_on_all ../edit-contents deep/a &&
> +       test_all_match test_must_fail git reset --merge deepest
> +'
> +
> +test_expect_success 'checkout and reset (keep)' '
> +       init_repos &&
> +
> +       write_script edit-contents <<-\EOF &&
> +       echo text >>$1
> +       EOF
> +
> +       test_all_match git checkout -b reset-test update-deep &&
> +       run_on_all ../edit-contents a &&
> +       test_all_match git reset --keep deepest &&
> +       test_all_match git status --porcelain=v2 &&
> +
> +       test_all_match git reset --hard update-deep &&
> +       run_on_all ../edit-contents deep/a &&
> +       test_all_match test_must_fail git reset --keep deepest
> +'
> +
> +test_expect_success 'reset with pathspecs inside sparse definition' '
> +       init_repos &&
> +
> +       write_script edit-contents <<-\EOF &&
> +       echo text >>$1
> +       EOF
> +
> +       test_all_match git checkout -b reset-test update-deep &&
> +       run_on_all ../edit-contents deep/a &&
> +
> +       test_all_match git reset base -- deep/a &&
> +       test_all_match git status --porcelain=v2 &&
> +
> +       test_all_match git reset base -- nonexistent-file &&
> +       test_all_match git status --porcelain=v2 &&
> +
> +       test_all_match git reset deepest -- deep &&
> +       test_all_match git status --porcelain=v2
> +'
> +
> +test_expect_success 'reset with sparse directory pathspec outside definition' '
> +       init_repos &&
> +
> +       test_all_match git checkout -b reset-test update-deep &&
> +       test_all_match git reset --hard update-folder1 &&
> +       test_all_match git reset base -- folder1 &&
> +       test_all_match git status --porcelain=v2
> +'
> +
> +test_expect_success 'reset with pathspec match in sparse directory' '
> +       init_repos &&
> +
> +       test_all_match git checkout -b reset-test update-deep &&
> +       test_all_match git reset --hard update-folder1 &&
> +       test_all_match git reset base -- folder1/a &&
> +       test_all_match git status --porcelain=v2
> +'
> +
> +test_expect_success 'reset with wildcard pathspec' '
> +       init_repos &&
> +
> +       test_all_match git checkout -b reset-test update-deep &&
> +       test_all_match git reset --hard update-folder1 &&
> +       test_all_match git reset base -- \*/a &&
> +       test_all_match git status --porcelain=v2
> +'
> +
>  test_expect_success 'merge, cherry-pick, and rebase' '
>         init_repos &&
>
> --
> gitgitgadget
>


* Re: [PATCH v2 4/7] reset: integrate with sparse index
  2021-10-05 13:20   ` [PATCH v2 4/7] reset: integrate with sparse index Victoria Dye via GitGitGadget
@ 2021-10-06  2:15     ` Elijah Newren
  2021-10-06 17:48       ` Junio C Hamano
  0 siblings, 1 reply; 85+ messages in thread
From: Elijah Newren @ 2021-10-06  2:15 UTC (permalink / raw)
  To: Victoria Dye via GitGitGadget
  Cc: Git Mailing List, Derrick Stolee, Junio C Hamano, Taylor Blau,
	Bagas Sanjaya, Victoria Dye

On Tue, Oct 5, 2021 at 6:21 AM Victoria Dye via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Victoria Dye <vdye@github.com>
>
> `reset --soft` does not modify the index, so no compatibility changes are
> needed for it to function without expanding the index. For all other reset
> modes (`--mixed`, `--hard`, `--keep`, `--merge`), the full index is
> explicitly expanded with `ensure_full_index` to maintain current behavior.

"to maintain current behavior"?  You are changing code here, which
suggests some kind of behavior is changing, but that description seems
to be claiming the opposite.  Is it some kind of preventative change
to add ensure_full_index calls in an additional place, with a later
patch in the series intending to remove the other one(s), so you're
making sure that later changes won't cause unwanted behavioral
changes?  Or was something else meant here?

If the above wasn't what you meant, but you're adding
ensure_full_index calls, does that suggest that we had some important
code paths that were not protected by such calls?  I thought Stolee
said we had them all covered (at least to the best of our knowledge),
so I'm curious if we just discovered we missed some.  If so, are there
other codepaths like this one where we missed protective
ensure_full_index calls?

> Additionally, the `read_cache()` check verifying an uncorrupted index is
> moved after argument parsing and preparing the repo settings. The index is
> not used by the preceding argument handling, but `read_cache()` does need to
> be run after enabling sparse index for the command and before resetting.

This seems to be discussing what code changes are being made, but not
why.  I'm guessing at the reasoning, but is it something along the
lines of:

"""
Also, make sure to read_cache() after setting
command_requires_full_index = 0, so that we don't unnecessarily expand
the index as part of our early index-corruption check.
"""

?

>
> Signed-off-by: Victoria Dye <vdye@github.com>
> ---
>  builtin/reset.c | 10 +++++++---
>  cache-tree.c    |  1 +
>  2 files changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/builtin/reset.c b/builtin/reset.c
> index 3b75d3b2f20..e1f2a2bb2c4 100644
> --- a/builtin/reset.c
> +++ b/builtin/reset.c
> @@ -184,6 +184,7 @@ static int read_from_tree(const struct pathspec *pathspec,
>         opt.flags.override_submodule_config = 1;
>         opt.repo = the_repository;
>
> +       ensure_full_index(&the_index);
>         if (do_diff_cache(tree_oid, &opt))
>                 return 1;
>         diffcore_std(&opt);
> @@ -261,9 +262,6 @@ static void parse_args(struct pathspec *pathspec,
>         }
>         *rev_ret = rev;
>
> -       if (read_cache() < 0)
> -               die(_("index file corrupt"));
> -
>         parse_pathspec(pathspec, 0,
>                        PATHSPEC_PREFER_FULL |
>                        (patch_mode ? PATHSPEC_PREFIX_ORIGIN : 0),
> @@ -409,6 +407,12 @@ int cmd_reset(int argc, const char **argv, const char *prefix)
>         if (intent_to_add && reset_type != MIXED)
>                 die(_("-N can only be used with --mixed"));
>
> +       prepare_repo_settings(the_repository);
> +       the_repository->settings.command_requires_full_index = 0;
> +
> +       if (read_cache() < 0)
> +               die(_("index file corrupt"));
> +
>         /* Soft reset does not touch the index file nor the working tree
>          * at all, but requires them in a good order.  Other resets reset
>          * the index file to the tree object we are switching to. */
> diff --git a/cache-tree.c b/cache-tree.c
> index 90919f9e345..9be19c85b66 100644
> --- a/cache-tree.c
> +++ b/cache-tree.c
> @@ -776,6 +776,7 @@ void prime_cache_tree(struct repository *r,
>         cache_tree_free(&istate->cache_tree);
>         istate->cache_tree = cache_tree();
>
> +       ensure_full_index(istate);
>         prime_cache_tree_rec(r, istate->cache_tree, tree);
>         istate->cache_changed |= CACHE_TREE_CHANGED;
>         trace2_region_leave("cache-tree", "prime_cache_tree", the_repository);
> --
> gitgitgadget
>


* Re: [PATCH v2 5/7] reset: make sparse-aware (except --mixed)
  2021-10-05 13:20   ` [PATCH v2 5/7] reset: make sparse-aware (except --mixed) Victoria Dye via GitGitGadget
@ 2021-10-06  3:43     ` Elijah Newren
  2021-10-06 20:56       ` Victoria Dye
  2021-10-06 10:34     ` Bagas Sanjaya
  1 sibling, 1 reply; 85+ messages in thread
From: Elijah Newren @ 2021-10-06  3:43 UTC (permalink / raw)
  To: Victoria Dye via GitGitGadget
  Cc: Git Mailing List, Derrick Stolee, Junio C Hamano, Taylor Blau,
	Bagas Sanjaya, Victoria Dye

On Tue, Oct 5, 2021 at 6:21 AM Victoria Dye via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Victoria Dye <vdye@github.com>
>
> In order to accurately reconstruct the cache tree in `prime_cache_tree_rec`,
> the function must determine whether the currently-processing directory in
> the tree is sparse or not. If it is not sparse, the tree is parsed and
> subtree recursively constructed. If it is sparse, no subtrees are added to
> the tree and the entry count is set to 1 (representing the sparse directory
> itself).
>
> Signed-off-by: Victoria Dye <vdye@github.com>
> ---
>  cache-tree.c                             | 44 +++++++++++++++++++++---
>  cache.h                                  | 10 ++++++
>  read-cache.c                             | 22 ++++++++----
>  t/t1092-sparse-checkout-compatibility.sh | 15 ++++++--
>  4 files changed, 78 insertions(+), 13 deletions(-)
>
> diff --git a/cache-tree.c b/cache-tree.c
> index 9be19c85b66..9021669d682 100644
> --- a/cache-tree.c
> +++ b/cache-tree.c
> @@ -740,15 +740,29 @@ out:
>         return ret;
>  }
>
> +static void prime_cache_tree_sparse_dir(struct repository *r,
> +                                       struct cache_tree *it,
> +                                       struct tree *tree,
> +                                       struct strbuf *tree_path)
> +{
> +
> +       oidcpy(&it->oid, &tree->object.oid);
> +       it->entry_count = 1;
> +       return;

Why are 'r' and 'tree_path' passed to this function?

> +}
> +
>  static void prime_cache_tree_rec(struct repository *r,
>                                  struct cache_tree *it,
> -                                struct tree *tree)
> +                                struct tree *tree,
> +                                struct strbuf *tree_path)
>  {
> +       struct strbuf subtree_path = STRBUF_INIT;
>         struct tree_desc desc;
>         struct name_entry entry;
>         int cnt;
>
>         oidcpy(&it->oid, &tree->object.oid);
> +

Why the blank line addition here?

>         init_tree_desc(&desc, tree->buffer, tree->size);
>         cnt = 0;
>         while (tree_entry(&desc, &entry)) {
> @@ -757,27 +771,49 @@ static void prime_cache_tree_rec(struct repository *r,
>                 else {
>                         struct cache_tree_sub *sub;
>                         struct tree *subtree = lookup_tree(r, &entry.oid);
> +
>                         if (!subtree->object.parsed)
>                                 parse_tree(subtree);
>                         sub = cache_tree_sub(it, entry.path);
>                         sub->cache_tree = cache_tree();
> -                       prime_cache_tree_rec(r, sub->cache_tree, subtree);

> +                       strbuf_reset(&subtree_path);
> +                       strbuf_grow(&subtree_path, tree_path->len + entry.pathlen + 1);
> +                       strbuf_addbuf(&subtree_path, tree_path);
> +                       strbuf_add(&subtree_path, entry.path, entry.pathlen);
> +                       strbuf_addch(&subtree_path, '/');

Reconstructing the full path each time?  And despite only being useful
for the sparse-index case?

Would it be better to drop subtree_path from this function, then
append entry.path + '/' here to tree_path, and then after the if-block
below, call strbuf_setlen to remove the part that this function call
added?  That way, we don't need subtree_path, and don't have to copy
the leading path every time.

Also, maybe it'd be better to only do this strbuf manipulation if
r->index->sparse_index, since it's not ever used otherwise?
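To illustrate the append-then-truncate idea, here is a minimal,
self-contained sketch.  It assumes a hypothetical fixed-size `path_buf`
standing in for git's strbuf, and the helper names `path_push_dir` and
`path_pop_to` are made up for illustration; in the real code the
truncation step would be the strbuf_setlen call mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for git's strbuf; fixed capacity for brevity. */
struct path_buf {
	char buf[256];
	size_t len;
};

/*
 * Append `name` plus a trailing '/' to the shared buffer and return the
 * length the buffer had before the append, so the caller can undo it.
 */
size_t path_push_dir(struct path_buf *pb, const char *name)
{
	size_t old_len = pb->len;
	size_t name_len = strlen(name);

	/* Leave room for the '/' and the terminating NUL. */
	assert(old_len + name_len + 1 < sizeof(pb->buf));
	memcpy(pb->buf + old_len, name, name_len);
	pb->buf[old_len + name_len] = '/';
	pb->len = old_len + name_len + 1;
	pb->buf[pb->len] = '\0';
	return old_len;
}

/* Truncate the buffer back to a length returned by path_push_dir(). */
void path_pop_to(struct path_buf *pb, size_t old_len)
{
	assert(old_len <= pb->len);
	pb->len = old_len;
	pb->buf[old_len] = '\0';
}
```

Each recursion level would push its entry name, recurse (or check the
index), then pop back to the saved length, so the leading path is never
copied again and no per-call subtree_path is needed.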

> +
> +                       /*
> +                        * If a sparse index is in use, the directory being processed may be
> +                        * sparse. To confirm that, we can check whether an entry with that
> +                        * exact name exists in the index. If it does, the created subtree
> +                        * should be sparse. Otherwise, cache tree expansion should continue
> +                        * as normal.
> +                        */
> +                       if (r->index->sparse_index &&
> +                           index_entry_exists(r->index, subtree_path.buf, subtree_path.len))
> +                               prime_cache_tree_sparse_dir(r, sub->cache_tree, subtree, &subtree_path);
> +                       else
> +                               prime_cache_tree_rec(r, sub->cache_tree, subtree, &subtree_path);
>                         cnt += sub->cache_tree->entry_count;
>                 }
>         }
>         it->entry_count = cnt;
> +
> +       strbuf_release(&subtree_path);
>  }
>
>  void prime_cache_tree(struct repository *r,
>                       struct index_state *istate,
>                       struct tree *tree)
>  {
> +       struct strbuf tree_path = STRBUF_INIT;
> +
>         trace2_region_enter("cache-tree", "prime_cache_tree", the_repository);
>         cache_tree_free(&istate->cache_tree);
>         istate->cache_tree = cache_tree();
>
> -       ensure_full_index(istate);
> -       prime_cache_tree_rec(r, istate->cache_tree, tree);
> +       prime_cache_tree_rec(r, istate->cache_tree, tree, &tree_path);
> +       strbuf_release(&tree_path);
>         istate->cache_changed |= CACHE_TREE_CHANGED;
>         trace2_region_leave("cache-tree", "prime_cache_tree", the_repository);
>  }
> diff --git a/cache.h b/cache.h
> index f6295f3b048..1d3e4665562 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -816,6 +816,16 @@ struct cache_entry *index_file_exists(struct index_state *istate, const char *na
>   */
>  int index_name_pos(struct index_state *, const char *name, int namelen);
>
> +/*
> + * Determines whether an entry with the given name exists within the
> + * given index. The return value is 1 if an exact match is found, otherwise
> + * it is 0. Note that, unlike index_name_pos, this function does not expand
> + * the index if it is sparse. If an item exists within the full index but it
> + * is contained within a sparse directory (and not in the sparse index), 0 is
> + * returned.
> + */
> +int index_entry_exists(struct index_state *, const char *name, int namelen);
> +
>  /*
>   * Some functions return the negative complement of an insert position when a
>   * precise match was not found but a position was found where the entry would
> diff --git a/read-cache.c b/read-cache.c
> index f5d4385c408..ea1166895f8 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -551,7 +551,10 @@ int cache_name_stage_compare(const char *name1, int len1, int stage1, const char
>         return 0;
>  }
>
> -static int index_name_stage_pos(struct index_state *istate, const char *name, int namelen, int stage)
> +static int index_name_stage_pos(struct index_state *istate,
> +                               const char *name, int namelen,
> +                               int stage,
> +                               int search_sparse)

It'd be nicer to make search_sparse an enum defined within this file, so that...

>  {
>         int first, last;
>
> @@ -570,7 +573,7 @@ static int index_name_stage_pos(struct index_state *istate, const char *name, in
>                 first = next+1;
>         }
>
> -       if (istate->sparse_index &&
> +       if (search_sparse && istate->sparse_index &&
>             first > 0) {
>                 /* Note: first <= istate->cache_nr */
>                 struct cache_entry *ce = istate->cache[first - 1];
> @@ -586,7 +589,7 @@ static int index_name_stage_pos(struct index_state *istate, const char *name, in
>                     ce_namelen(ce) < namelen &&
>                     !strncmp(name, ce->name, ce_namelen(ce))) {
>                         ensure_full_index(istate);
> -                       return index_name_stage_pos(istate, name, namelen, stage);
> +                       return index_name_stage_pos(istate, name, namelen, stage, search_sparse);
>                 }
>         }
>
> @@ -595,7 +598,12 @@ static int index_name_stage_pos(struct index_state *istate, const char *name, in
>
>  int index_name_pos(struct index_state *istate, const char *name, int namelen)
>  {
> -       return index_name_stage_pos(istate, name, namelen, 0);
> +       return index_name_stage_pos(istate, name, namelen, 0, 1);

...this could use SEARCH_SPARSE or some name like that which is more
meaningful than "1" here.

> +}
> +
> +int index_entry_exists(struct index_state *istate, const char *name, int namelen)
> +{
> +       return index_name_stage_pos(istate, name, namelen, 0, 0) >= 0;

...and likewise this spot could use SEARCH_FULL or some name like
that, which is more meaningful than the second "0".

Similarly for multiple call sites below...
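As a rough sketch of what I mean, using a toy sorted name list rather
than the real index: the enum and its constant names below are only
suggestions, and `name_is_covered` is a hypothetical helper, not
anything in read-cache.c.  The point is just that call sites read
better with a named mode than with a bare 0 or 1.

```c
#include <assert.h>
#include <string.h>

/* Suggested names only; pick whatever reads best in read-cache.c. */
enum index_search_mode {
	SEARCH_FULL = 0,   /* exact-match lookup only */
	SEARCH_SPARSE = 1  /* also consider a containing sparse directory */
};

/*
 * Toy lookup over a sorted array of names.  Returns 1 if `name` is
 * present, or (in SEARCH_SPARSE mode) if it falls inside a directory
 * entry, i.e. an entry ending in '/'.  Returns 0 otherwise.
 */
int name_is_covered(const char **names, int nr, const char *name,
		    enum index_search_mode mode)
{
	int first = 0, last = nr;

	assert(nr >= 0);
	while (first < last) {
		int mid = first + (last - first) / 2;
		int cmp = strcmp(name, names[mid]);

		if (!cmp)
			return 1;
		if (cmp < 0)
			last = mid;
		else
			first = mid + 1;
	}

	if (mode == SEARCH_SPARSE && first > 0) {
		const char *prev = names[first - 1];
		size_t prev_len = strlen(prev);

		/*
		 * In this toy model a directory entry's contents are
		 * absent, so the entry sits right before the insert
		 * position of anything it contains.
		 */
		if (prev_len && prev[prev_len - 1] == '/' &&
		    !strncmp(name, prev, prev_len))
			return 1;
	}
	return 0;
}
```

A call like name_is_covered(names, nr, "folder1/a", SEARCH_FULL) then
documents itself, where a trailing ", 0" would not.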


>  }
>
>  int remove_index_entry_at(struct index_state *istate, int pos)
> @@ -1222,7 +1230,7 @@ static int has_dir_name(struct index_state *istate,
>                          */
>                 }
>
> -               pos = index_name_stage_pos(istate, name, len, stage);
> +               pos = index_name_stage_pos(istate, name, len, stage, 1);
>                 if (pos >= 0) {
>                         /*
>                          * Found one, but not so fast.  This could
> @@ -1322,7 +1330,7 @@ static int add_index_entry_with_check(struct index_state *istate, struct cache_e
>                 strcmp(ce->name, istate->cache[istate->cache_nr - 1]->name) > 0)
>                 pos = index_pos_to_insert_pos(istate->cache_nr);
>         else
> -               pos = index_name_stage_pos(istate, ce->name, ce_namelen(ce), ce_stage(ce));
> +               pos = index_name_stage_pos(istate, ce->name, ce_namelen(ce), ce_stage(ce), 1);
>
>         /* existing match? Just replace it. */
>         if (pos >= 0) {
> @@ -1357,7 +1365,7 @@ static int add_index_entry_with_check(struct index_state *istate, struct cache_e
>                 if (!ok_to_replace)
>                         return error(_("'%s' appears as both a file and as a directory"),
>                                      ce->name);
> -               pos = index_name_stage_pos(istate, ce->name, ce_namelen(ce), ce_stage(ce));
> +               pos = index_name_stage_pos(istate, ce->name, ce_namelen(ce), ce_stage(ce), 1);
>                 pos = -pos-1;
>         }
>         return pos + 1;
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index f0723a6ac97..e301ef5633a 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -786,9 +786,9 @@ test_expect_success 'sparse-index is not expanded' '
>         ensure_not_expanded checkout - &&
>         ensure_not_expanded switch rename-out-to-out &&
>         ensure_not_expanded switch - &&
> -       git -C sparse-index reset --hard &&
> +       ensure_not_expanded reset --hard &&
>         ensure_not_expanded checkout rename-out-to-out -- deep/deeper1 &&
> -       git -C sparse-index reset --hard &&
> +       ensure_not_expanded reset --hard &&
>         ensure_not_expanded restore -s rename-out-to-out -- deep/deeper1 &&
>
>         echo >>sparse-index/README.md &&
> @@ -798,6 +798,17 @@ test_expect_success 'sparse-index is not expanded' '
>         echo >>sparse-index/untracked.txt &&
>         ensure_not_expanded add . &&
>
> +       for ref in update-deep update-folder1 update-folder2 update-deep
> +       do
> +               echo >>sparse-index/README.md &&
> +               ensure_not_expanded reset --hard $ref || return 1
> +       done &&
> +
> +       ensure_not_expanded reset --hard update-deep &&
> +       ensure_not_expanded reset --keep base &&
> +       ensure_not_expanded reset --merge update-deep &&
> +       ensure_not_expanded reset --hard &&
> +
>         ensure_not_expanded checkout -f update-deep &&
>         test_config -C sparse-index pull.twohead ort &&
>         (
> --
> gitgitgadget


* Re: [PATCH v2 6/7] reset: make --mixed sparse-aware
  2021-10-05 13:20   ` [PATCH v2 6/7] reset: make --mixed sparse-aware Victoria Dye via GitGitGadget
@ 2021-10-06  4:43     ` Elijah Newren
  2021-10-07 14:34       ` Victoria Dye
  0 siblings, 1 reply; 85+ messages in thread
From: Elijah Newren @ 2021-10-06  4:43 UTC (permalink / raw)
  To: Victoria Dye via GitGitGadget
  Cc: Git Mailing List, Derrick Stolee, Junio C Hamano, Taylor Blau,
	Bagas Sanjaya, Victoria Dye

On Tue, Oct 5, 2021 at 6:21 AM Victoria Dye via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Victoria Dye <vdye@github.com>
>
> Sparse directory entries are "diffed" as trees in `diff_cache` (used
> internally by `reset --mixed`), following a code path separate from
> individual file handling. The use of `diff_tree_oid` there requires setting
> explicit `change` and `add_remove` functions to process the internal
> contents of a sparse directory.
>
> Additionally, the `recursive` diff option handles cases in which `reset
> --mixed` must diff/merge files that are nested multiple levels deep in a
> sparse directory.
>
> Signed-off-by: Victoria Dye <vdye@github.com>
> ---
>  builtin/reset.c                          | 30 +++++++++++++++++++++++-
>  t/t1092-sparse-checkout-compatibility.sh | 13 +++++++++-
>  2 files changed, 41 insertions(+), 2 deletions(-)
>
> diff --git a/builtin/reset.c b/builtin/reset.c
> index e1f2a2bb2c4..ceb9b122897 100644
> --- a/builtin/reset.c
> +++ b/builtin/reset.c
> @@ -175,6 +175,8 @@ static int read_from_tree(const struct pathspec *pathspec,
>                           int intent_to_add)
>  {
>         struct diff_options opt;
> +       unsigned int i;
> +       char *skip_worktree_seen = NULL;
>
>         memset(&opt, 0, sizeof(opt));
>         copy_pathspec(&opt.pathspec, pathspec);
> @@ -182,9 +184,35 @@ static int read_from_tree(const struct pathspec *pathspec,
>         opt.format_callback = update_index_from_diff;
>         opt.format_callback_data = &intent_to_add;
>         opt.flags.override_submodule_config = 1;
> +       opt.flags.recursive = 1;
>         opt.repo = the_repository;
> +       opt.change = diff_change;
> +       opt.add_remove = diff_addremove;
> +
> +       /*
> +        * When pathspec is given for resetting a cone-mode sparse checkout, it may
> +        * identify entries that are nested in sparse directories, in which case the
> +        * index should be expanded. For the sake of efficiency, this check is
> +        * overly-cautious: anything with a wildcard or a magic prefix requires
> +        * expansion, as well as literal paths that aren't in the sparse checkout
> +        * definition AND don't match any directory in the index.

s/efficiency/efficiency of checking/ ?  Being overly-cautious suggests
you'll expand to a full index more than is needed, and full indexes
are more expensive.  But perhaps the checking would be expensive too
so you have a tradeoff?

Or maybe s/efficiency/simplicity/?

> +        */
> +       if (pathspec->nr && the_index.sparse_index) {
> +               if (pathspec->magic || pathspec->has_wildcard) {
> +                       ensure_full_index(&the_index);

dir.c has the notion of matching the characters preceding the wildcard
characters; look for "no_wildcard_len".  If the pathspec doesn't match
a path up to no_wildcard_len, then the wildcard character(s) later in
the pathspec can't make the pathspec match that path.

It might at least be worth mentioning this as a possible future optimization.
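Roughly, the check could look like the sketch below.  This is a
simplified, standalone illustration and not the dir.c code: the helper
names are hypothetical, pathspec magic is ignored, and the set of glob
characters is an assumption on my part.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Length of the leading part of `pattern` with no glob characters. */
size_t literal_prefix_len(const char *pattern)
{
	assert(pattern);
	return strcspn(pattern, "*?[\\");
}

/*
 * Judging only by the literal prefix, could `pattern` match a path
 * under directory `dir` (which is expected to end in '/')?
 * Returns 0 when the prefix already rules the directory out, so the
 * wildcard part cannot rescue the match; returns 1 when it might match.
 */
int pattern_may_match_under(const char *pattern, const char *dir)
{
	size_t prefix_len = literal_prefix_len(pattern);
	size_t dir_len = strlen(dir);
	size_t n = prefix_len < dir_len ? prefix_len : dir_len;

	/* Compare only the part where both strings are literal. */
	return !strncmp(pattern, dir, n);
}
```

With something like this, a wildcard pathspec would only force
expansion when it may match under at least one sparse directory entry;
a pathspec such as "deep/*.c" could leave "folder1/" collapsed.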

> +               } else {
> +                       for (i = 0; i < pathspec->nr; i++) {
> +                               if (!path_in_cone_mode_sparse_checkout(pathspec->items[i].original, &the_index) &&
> +                                   !matches_skip_worktree(pathspec, i, &skip_worktree_seen)) {

What if the pathspec corresponds to a sparse-directory in the index,
but possibly without the trailing '/' character?  e.g.:

   git reset HEAD~1 -- sparse-directory

One should be able to reset that directory without recursing into
it...does this code handle that?  Does it handle it if we add the
trailing slash on the path for the reset command line?
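To make the question concrete, this is the kind of comparison I would
expect both spellings to satisfy.  It is only a standalone sketch with
a hypothetical helper name, assuming sparse directory entries are
stored with a trailing '/'; it is not a claim about what the patch
currently does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return 1 if `pathspec` names the sparse directory entry `dir_entry`,
 * whether or not the pathspec carries a trailing '/'.
 * `dir_entry` must end in '/'.
 */
int names_sparse_dir(const char *pathspec, const char *dir_entry)
{
	size_t spec_len = strlen(pathspec);
	size_t dir_len = strlen(dir_entry);

	assert(dir_len && dir_entry[dir_len - 1] == '/');
	dir_len--; /* compare without the entry's trailing slash */

	/* Ignore one trailing slash on the pathspec, if present. */
	if (spec_len && pathspec[spec_len - 1] == '/')
		spec_len--;

	return spec_len == dir_len && !memcmp(pathspec, dir_entry, dir_len);
}
```

Both "sparse-directory" and "sparse-directory/" should come out as a
match here, while "sparse-directory/file" should not.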

> +                                       ensure_full_index(&the_index);
> +                                       break;
> +                               }
> +                       }
> +               }
> +       }
> +
> +       free(skip_worktree_seen);
>
> -       ensure_full_index(&the_index);
>         if (do_diff_cache(tree_oid, &opt))
>                 return 1;
>         diffcore_std(&opt);
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index e301ef5633a..4afcbc2d673 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -804,11 +804,22 @@ test_expect_success 'sparse-index is not expanded' '
>                 ensure_not_expanded reset --hard $ref || return 1
>         done &&
>
> +       ensure_not_expanded reset --mixed base &&
>         ensure_not_expanded reset --hard update-deep &&
>         ensure_not_expanded reset --keep base &&
>         ensure_not_expanded reset --merge update-deep &&
> -       ensure_not_expanded reset --hard &&

This commit was only touching the --mixed case; why is it removing one
of the tests for --hard?

>
> +       ensure_not_expanded reset base -- deep/a &&
> +       ensure_not_expanded reset base -- nonexistent-file &&
> +       ensure_not_expanded reset deepest -- deep &&
> +
> +       # Although folder1 is outside the sparse definition, it exists as a
> +       # directory entry in the index, so it will be reset without needing to
> +       # expand the full index.

Ah, I think this answers one of my earlier questions.  Does it also
work with 'folder1/' as well as 'folder1'?

> +       ensure_not_expanded reset --hard update-folder1 &&

Wait...is update-folder1 a branch or a path?  And if this commit is
about --mixed, why are --hard testcases being added?

> +       ensure_not_expanded reset base -- folder1 &&
> +
> +       ensure_not_expanded reset --hard update-deep &&

another --hard testcase...was this an accidental squash by chance?



>         ensure_not_expanded checkout -f update-deep &&
>         test_config -C sparse-index pull.twohead ort &&
>         (
> --
> gitgitgadget
>


* Re: [PATCH v2 0/7] Sparse Index: integrate with reset
  2021-10-05 13:20 ` [PATCH v2 0/7] Sparse Index: integrate with reset Victoria Dye via GitGitGadget
                     ` (7 preceding siblings ...)
  2021-10-05 15:34   ` [PATCH v2 0/7] Sparse Index: integrate with reset Ævar Arnfjörð Bjarmason
@ 2021-10-06  5:46   ` Elijah Newren
  2021-10-07 21:15   ` [PATCH v3 0/8] " Victoria Dye via GitGitGadget
  9 siblings, 0 replies; 85+ messages in thread
From: Elijah Newren @ 2021-10-06  5:46 UTC (permalink / raw)
  To: Victoria Dye via GitGitGadget
  Cc: Git Mailing List, Derrick Stolee, Junio C Hamano, Taylor Blau,
	Bagas Sanjaya, Victoria Dye

Hi Victoria,

On Tue, Oct 5, 2021 at 6:20 AM Victoria Dye via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> This series integrates the sparse index with git reset and provides
> miscellaneous fixes and improvements to the command in sparse checkouts.
> This includes:
>
>  1. tests added to t1092 and p2000 to establish the baseline functionality
>     of the command
>  2. repository settings to enable the sparse index with ensure_full_index
>     guarding any code paths that break tests without other compatibility
>     updates.
>  3. modifications to remove or reduce the scope in which ensure_full_index
>     must be called.
>
> The sparse index updates are predicated on a fix originating from the
> microsoft/git fork [1], correcting how git reset --mixed handles resetting
> entries outside the sparse checkout definition. Additionally, a performance
> "bug" in next_cache_entry with sparse index is corrected, preventing
> repeatedly looping over already-searched entries.
>
> The p2000 tests demonstrate an overall ~70% execution time reduction across
> all tested usages of git reset using a sparse index:
>
> Test                                               before   after
> ------------------------------------------------------------------------
> 2000.22: git reset (full-v3)                       0.48     0.51 +6.3%
> 2000.23: git reset (full-v4)                       0.47     0.50 +6.4%
> 2000.24: git reset (sparse-v3)                     0.93     0.30 -67.7%
> 2000.25: git reset (sparse-v4)                     0.94     0.29 -69.1%
> 2000.26: git reset --hard (full-v3)                0.69     0.68 -1.4%
> 2000.27: git reset --hard (full-v4)                0.75     0.68 -9.3%
> 2000.28: git reset --hard (sparse-v3)              1.29     0.34 -73.6%
> 2000.29: git reset --hard (sparse-v4)              1.31     0.34 -74.0%
> 2000.30: git reset -- does-not-exist (full-v3)     0.54     0.51 -5.6%
> 2000.31: git reset -- does-not-exist (full-v4)     0.54     0.52 -3.7%
> 2000.32: git reset -- does-not-exist (sparse-v3)   1.02     0.31 -69.6%
> 2000.33: git reset -- does-not-exist (sparse-v4)   1.07     0.30 -72.0%
>
>
>
> Changes since V1
> ================
>
>  * Add --force-full-index option to update-index. The option is used to
>    circumvent changing command_requires_full_index from its default value -
>    right now this is effectively a no-op, but will change once update-index
>    is integrated with sparse index. By using this option in the t1092
>    expand/collapse test, the command used to test will not need to be
>    updated with subsequent sparse index integrations.
>  * Update implementation of mixed reset for entries outside sparse checkout
>    definition. The condition in which a file should be checked out before
>    index reset is simplified to "if it has skip-worktree enabled and a reset
>    would change the file, check it out".
>    * After checking the behavior of update_index_from_diff with renames,
>      found that the diff used by reset does not produce diff queue entries
>      with different pathnames for one and two. Because of this, and that
>      nothing in the implementation seems to rely on identical path names, no
>      BUG check is added.
>  * Correct a bug in the sparse index is not expanded tests in t1092 where
>    failure of a git reset --mixed test was not being reported. Test now
>    verifies an appropriate scenario with corrected failure-checking.

I read over the first six patches.  I tried to read over the seventh,
but I've never figured out cache_bottom for some reason and I did
nothing beyond spot checking when Stolee touched that area either.

Anyway, I had lots of little comments, tweaks to the way to fix the
inconsistency in patch 1, various questions, etc.  It probably adds up
to a lot, but it's all small fixable stuff; overall it looks like you
(and Kevin) are making a solid contribution on the sparse-checkout
stuff; I look forward to reading the next round.


* Re: [PATCH v2 1/7] reset: behave correctly with sparse-checkout
  2021-10-05 13:20   ` [PATCH v2 1/7] reset: behave correctly with sparse-checkout Kevin Willford via GitGitGadget
  2021-10-05 19:30     ` Junio C Hamano
  2021-10-06  1:46     ` Elijah Newren
@ 2021-10-06 10:31     ` Bagas Sanjaya
  2 siblings, 0 replies; 85+ messages in thread
From: Bagas Sanjaya @ 2021-10-06 10:31 UTC (permalink / raw)
  To: Kevin Willford via GitGitGadget, git
  Cc: stolee, gitster, newren, Taylor Blau, Victoria Dye, Kevin Willford

On 05/10/21 20.20, Kevin Willford via GitGitGadget wrote:
> When using the sparse checkout feature, 'git reset' will add entries to
> the index that will have the skip-worktree bit off but will leave the
> working directory empty. File data is lost because the index version of
> the files has been changed but there is nothing that is in the working
> directory. This will cause the next 'git status' call to show either
> deleted for files modified or deleting or nothing for files added. The
> added files should be shown as untracked and modified files should be
> shown as modified.
> 

Better say `... but there is nothing in the working directory`.

> To fix this when the reset is running if there is not a file in the
> working directory and if it will be missing with the new index entry or
> was not missing in the previous version, we create the previous index
> version of the file in the working directory so that status will report
> correctly and the files will be availble for the user to deal with.
> 

s/availble/available

-- 
An old man doll... just what I always wanted! - Clara


* Re: [PATCH v2 2/7] update-index: add --force-full-index option for expand/collapse test
  2021-10-05 13:20   ` [PATCH v2 2/7] update-index: add --force-full-index option for expand/collapse test Victoria Dye via GitGitGadget
  2021-10-06  2:00     ` Elijah Newren
@ 2021-10-06 10:33     ` Bagas Sanjaya
  1 sibling, 0 replies; 85+ messages in thread
From: Bagas Sanjaya @ 2021-10-06 10:33 UTC (permalink / raw)
  To: Victoria Dye via GitGitGadget, git
  Cc: stolee, gitster, newren, Taylor Blau, Victoria Dye

On 05/10/21 20.20, Victoria Dye via GitGitGadget wrote:
> From: Victoria Dye <vdye@github.com>
> 
> Add a new `--force-full-index` option to `git update-index`, which skips
> explicitly setting `command_requires_full_index`. This lets `git
> update-index --force-full-index` run as a command without sparse index
> compatibility implemented, even after it receives sparse index compatibility
> updates.

`... explicitly setting ...` or `... explicitly set ...`? I thought of 
the latter.

-- 
An old man doll... just what I always wanted! - Clara


* Re: [PATCH v2 5/7] reset: make sparse-aware (except --mixed)
  2021-10-05 13:20   ` [PATCH v2 5/7] reset: make sparse-aware (except --mixed) Victoria Dye via GitGitGadget
  2021-10-06  3:43     ` Elijah Newren
@ 2021-10-06 10:34     ` Bagas Sanjaya
  1 sibling, 0 replies; 85+ messages in thread
From: Bagas Sanjaya @ 2021-10-06 10:34 UTC (permalink / raw)
  To: Victoria Dye via GitGitGadget, git
  Cc: stolee, gitster, newren, Taylor Blau, Victoria Dye

On 05/10/21 20.20, Victoria Dye via GitGitGadget wrote:
> From: Victoria Dye <vdye@github.com>
> 
> In order to accurately reconstruct the cache tree in `prime_cache_tree_rec`,
> the function must determine whether the currently-processing directory in
> the tree is sparse or not. If it is not sparse, the tree is parsed and
> subtree recursively constructed. If it is sparse, no subtrees are added to
> the tree and the entry count is set to 1 (representing the sparse directory
> itself).
> 

Better say `If it is sparse, no subtrees ..., else the tree ...`

-- 
An old man doll... just what I always wanted! - Clara


* Re: [PATCH v2 7/7] unpack-trees: improve performance of next_cache_entry
  2021-10-05 13:20   ` [PATCH v2 7/7] unpack-trees: improve performance of next_cache_entry Victoria Dye via GitGitGadget
@ 2021-10-06 10:37     ` Bagas Sanjaya
  0 siblings, 0 replies; 85+ messages in thread
From: Bagas Sanjaya @ 2021-10-06 10:37 UTC (permalink / raw)
  To: Victoria Dye via GitGitGadget, git
  Cc: stolee, gitster, newren, Taylor Blau, Victoria Dye

On 05/10/21 20.20, Victoria Dye via GitGitGadget wrote:
> The `cache_bottom` must be preserved for the sparse index (see 17a1bb570b
> (unpack-trees: preserve cache_bottom, 2021-07-14)).  Therefore, to retain
> the benefit `cache_bottom` provides in non-sparse index cases, a separate
> `hint` position indicates the first position `next_cache_entry` should
> search, updated each execution with a new position.  The performance of `git
> reset -- does-not-exist` (testing the "worst case" in which all entries in
> the index are unpacked with `next_cache_entry`) is significantly improved
> for the sparse index case:

Did you mean `a separate `hint` ... should be searched`?
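
For anyone following along, here is a rough sketch of how I read the
"hint" idea. It is a toy model under my own assumptions, not the
unpack-trees.c code: `toy_entry` and `toy_next_entry` are made-up names,
and the `unpacked` field merely stands in for whatever flag marks an
entry as already handled. The caller keeps the hint between calls, so
each search resumes where the previous one stopped instead of
rescanning from position 0.

```c
#include <assert.h>

/* Hypothetical stand-in for an index entry; only the flag matters here. */
struct toy_entry {
	int unpacked;
};

/*
 * Return the position of the first not-yet-unpacked entry at or after
 * *hint, or -1 if none remains.  *hint is advanced past everything that
 * was just searched, so the next call does not look at it again.
 */
int toy_next_entry(const struct toy_entry *entries, int nr, int *hint)
{
	int pos;

	assert(hint && *hint >= 0);
	for (pos = *hint; pos < nr; pos++) {
		if (!entries[pos].unpacked) {
			*hint = pos + 1;
			return pos;
		}
	}
	*hint = nr;
	return -1;
}
```

With a single shared starting point that cannot advance, every call
would walk the already-unpacked prefix again, which is quadratic in the
worst case; with the hint, the total work over all calls stays linear.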

-- 
An old man doll... just what I always wanted! - Clara


* Re: [PATCH v2 1/7] reset: behave correctly with sparse-checkout
  2021-10-05 21:59       ` Victoria Dye
@ 2021-10-06 12:44         ` Junio C Hamano
  0 siblings, 0 replies; 85+ messages in thread
From: Junio C Hamano @ 2021-10-06 12:44 UTC (permalink / raw)
  To: Victoria Dye
  Cc: Kevin Willford via GitGitGadget, git, stolee, newren,
	Taylor Blau, Bagas Sanjaya, Kevin Willford

Victoria Dye <vdye@github.com> writes:

> Thanks for the thorough explanation, I'm on-board with your approach (and
> will re-roll the series with that implemented). A lot of my thought process
> (and confusion) came from a comment in e5ca291076 (t1092: document bad
> sparse-checkout behavior, 2021-07-14) suggesting that full and sparse
> checkouts should have the same result in scenarios like the one you
> outlined above.

Thanks for bringing this up.  I agree that it is crucial to clarify
what use case we are aiming for.  If the objective were to make a
sparse checkout behave just like full checkout, the desired
behaviour would be very different from a system whose objective is
to allow users to pretend as if the hidden parts of sparse checkout
do not even exist, which was the model my example was after.  I
agree with you that the "comment" in an earlier commit may have been
unhelpful in that they stopped at "should behave the same but they
shouldn't" without saying "why they should behave the same".

If the goal were to make sparse behave like full, continuing with
the previous example, after a

    $ git reset --mixed HEAD^

the user should be able to say

    $ git commit -a --amend

to replace the original two-commit history with a single commit
history that records the same resulting tree.  If the path "skip"
were to be reset to the blob from the first commit, just like the
path "no-skip" is, for such a "commit -a --amend" to work, we would
need to have a working tree file for "skip" magically materialized
with the contents from the *second* commit.  After all, the whole
point of mixed (and soft) reset is that they do not (logically)
change the files in the working tree, so if you are resetting from
the second commit to the first, if you were to have a working tree
file, it should come from the second commit, so that both "skip"
and "no-skip" should show "changed in the working tree relative to
the index", i.e.

    $ git reset --mixed HEAD^
    $ git ls-files -t
    M no-skip
    M skip

While such a "make sparse behave the same way as full" can be made
internally consistent, however, as the above example shows, it would
make the resulting "sparse checkout" practically unusable.

By stepping back a bit and realizing that the reason why the user
wanted to mark some path as "skip-worktree" was because the user had
no intention to make any change to them, we can make it usable again,
by not insisting that sparse should behave the same way as full.

When we redesign these patches, I would like to see where we fell
short the last time get improved.  Instead of saying "skip-worktree
entries should stay so" and stopping there, we should leave a note
for later readers to explain why they should.

Thanks.




* Re: [PATCH v2 4/7] reset: integrate with sparse index
  2021-10-06  2:15     ` Elijah Newren
@ 2021-10-06 17:48       ` Junio C Hamano
  0 siblings, 0 replies; 85+ messages in thread
From: Junio C Hamano @ 2021-10-06 17:48 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Victoria Dye via GitGitGadget, Git Mailing List, Derrick Stolee,
	Taylor Blau, Bagas Sanjaya, Victoria Dye

Elijah Newren <newren@gmail.com> writes:

> On Tue, Oct 5, 2021 at 6:21 AM Victoria Dye via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>>
>> From: Victoria Dye <vdye@github.com>
>>
>> `reset --soft` does not modify the index, so no compatibility changes are
>> needed for it to function without expanding the index. For all other reset
>> modes (`--mixed`, `--hard`, `--keep`, `--merge`), the full index is
>> explicitly expanded with `ensure_full_index` to maintain current behavior.
>
> "to maintain current behavior"?  You are changing code here, which
> suggests some kind of behavior is changing, but that description seems
> to be claiming the opposite.  Is it some kind of preventative change
> to add ensure_full_index calls in an additional place, with a later
> patch in the series intending to remove the other one(s), so you're
> making sure that later changes won't cause unwanted behavioral
> changes?  Or was something else meant here?
>
> If the above wasn't what you meant, but you're adding
> ensure_full_index calls, does that suggest that we had some important
> code paths that were not protected by such calls?

The original called read_cache() before we know which mode we
operate in, near the end of parse_args(), which resulted in an
unconditional call to ensure_full_index() in repo_read_index().

This patch delays the call to read_cache().  If parse_pathspec()
and everything the original called after the point where it called
read_cache() needed to have a populated in-core index, the change
can break things---I didn't check thoroughly, but I am guessing
it is OK.

>> Additionally, the `read_cache()` check verifying an uncorrupted index is
>> moved after argument parsing and preparing the repo settings. The index is
>> not used by the preceding argument handling, but `read_cache()` does need to
>> be run after enabling sparse index for the command and before resetting.
>
> This seems to be discussing what code changes are being made, but not
> why.  I'm guessing at the reasoning, but is it something along the
> lines of:
>
> """
> Also, make sure to read_cache() after setting
> command_requires_full_index = 0, so that we don't unnecessarily expand
> the index as part of our early index-corruption check.
> """

I think it is more like "we used to expand very early for all modes,
but with this change we move the read_cache() call to much later,
and force it not to expand.  The modes that call read_from_tree()
need the in-core index fully expanded, so we do so there, but the soft
reset does not call it and would stop expanding."



* Re: [PATCH v2 1/7] reset: behave correctly with sparse-checkout
  2021-10-06  1:46     ` Elijah Newren
@ 2021-10-06 20:09       ` Victoria Dye
  0 siblings, 0 replies; 85+ messages in thread
From: Victoria Dye @ 2021-10-06 20:09 UTC (permalink / raw)
  To: Elijah Newren, Kevin Willford via GitGitGadget
  Cc: Git Mailing List, Derrick Stolee, Junio C Hamano, Taylor Blau,
	Bagas Sanjaya, Kevin Willford

Elijah Newren wrote:
>> -               int is_missing = !(one->mode && !is_null_oid(&one->oid));
>> +               struct diff_filespec *two = q->queue[i]->two;
>> +               int is_in_reset_tree = one->mode && !is_null_oid(&one->oid);
> 
> Isn't !is_null_oid(&one->oid) redundant to checking one->mode?  When
> does the diff machinery ever give you a non-zero mode with a null oid?
> 

It looks like this originally only checked the mode, and the extra OID check
was introduced in ff00b682f2 (reset [<commit>] paths...: do not mishandle
unmerged paths, 2011-07-13). I was able to remove `!is_null_oid(&one->oid)`
from the condition and run the `t71*` tests without any failures, but I'm
hesitant to remove it on the off chance that this handles a case I'm not
thinking of.

> Also, is_in_reset_tree == !is_missing; I'll note that below.
> 
>>                 struct cache_entry *ce;
>>
>> +               /*
>> +                * If the file being reset has `skip-worktree` enabled, we need
>> +                * to check it out to prevent the file from being hard reset.
> 
> I don't understand this comment.  If the file wasn't originally in the
> index (is_missing), and is being added to it, and is correctly marked
> as skip_worktree, and the file isn't in the working tree, then it
> sounds like everything is already in a good state.  Files outside the
> sparse checkout are meant to have the skip_worktree bit set and be
> missing from the working tree.
> 
> Also, I don't know what you mean by 'hard reset' here.
> 
>> +                */
>> +               pos = cache_name_pos(two->path, strlen(two->path));
>> +               if (pos >= 0 && ce_skip_worktree(active_cache[pos])) {
>> +                       struct checkout state = CHECKOUT_INIT;
>> +                       state.force = 1;
>> +                       state.refresh_cache = 1;
>> +                       state.istate = &the_index;
>> +
>> +                       checkout_entry(active_cache[pos], &state, NULL, NULL);
> 
> Does this introduce an error in the opposite direction from the one
> stated in the commit message?  Namely we have two things that should
> be in sync: the skip_worktree flag stating whether the file should be
> present in the working directory (skip_worktree), and the question of
> whether the file is actually in the working directory.  In the commit
> message, you pointed out a case where they were out of sync one way:
> the skip_worktree flag was not set but the file was missing.  Here you
> say the skip_worktree flag is set, but you add it to the working tree
> anyway.
> 
> Or am I misunderstanding the code?
> 

Most of this is addressed in [1], and you're right that what's in this 
patch isn't the right fix for the problem. This patch tried to solve the
issue of "skip-worktree is being ignored and reset files are showing up 
deleted" by continuing to ignore `skip-worktree`, but now checking out the
`skip-worktree` files based on their pre-reset state in the index (unless
they, for some reason, were already present in the worktree). However, that
completely disregards the reasoning for having `skip-worktree` in the first
place (the user wants the file *ignored* in the worktree) and violates the
premise of `git reset --mixed` not modifying the worktree, so the better
solution is to set `skip-worktree` in the resulting index entry and not
check out anything.

[1] https://lore.kernel.org/git/9b99e856-24cc-03fd-7871-de92dc6e39b6@github.com/
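
To illustrate the direction described above (carry `skip-worktree`
over to the resulting index entry and leave the worktree alone), here
is a small sketch. It is only a model under my own assumptions:
`toy_ce`, `TOY_SKIP_WORKTREE` and `toy_reset_entry` are hypothetical
names, and `blob_id` is a plain integer standing in for an object id.
It is not what the V3 patch will necessarily look like.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_SKIP_WORKTREE (1u << 0)

/* Hypothetical, heavily reduced index entry. */
struct toy_ce {
	unsigned flags;
	int blob_id;	/* stand-in for the object id */
};

/*
 * Build the post-reset entry for one path: the content comes from the
 * reset target, while skip-worktree is inherited from the existing
 * entry (if there is one).  Nothing is checked out.
 */
struct toy_ce toy_reset_entry(const struct toy_ce *old, int target_blob_id)
{
	struct toy_ce out = { 0, target_blob_id };

	assert(target_blob_id >= 0);
	if (old && (old->flags & TOY_SKIP_WORKTREE))
		out.flags |= TOY_SKIP_WORKTREE;
	return out;
}
```

The point of the model is that the worktree never appears in it: a
mixed reset only rewrites index state, so the flag is the one thing
that has to survive.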

>> +               }
>> +
> 
> [I did some slight editing to the diff to make the next two parts
> appear next to each other]
> 
>> -               if (is_missing && !intent_to_add) {
>> +               if (!is_in_reset_tree && !intent_to_add) {
> 
> I thought this was some subtle bugfix or something, and spent a while
> trying to figure it out, before realizing that is_in_reset_tree was
> simply defined as !is_missing (for some reason I was assuming it was
> dealing with two->mode while is_missing was looking at one->mode).  So
> this is a simple variable renaming, which I think is probably good,
> but I'd prefer if this was separated into a different patch to make it
> easier to review.
> 

Good call, I'll include this in V3.


* Re: [PATCH v2 2/7] update-index: add --force-full-index option for expand/collapse test
  2021-10-06  2:00     ` Elijah Newren
@ 2021-10-06 20:40       ` Victoria Dye
  2021-10-08  3:42         ` Elijah Newren
  0 siblings, 1 reply; 85+ messages in thread
From: Victoria Dye @ 2021-10-06 20:40 UTC (permalink / raw)
  To: Elijah Newren, Victoria Dye via GitGitGadget
  Cc: Git Mailing List, Derrick Stolee, Junio C Hamano, Taylor Blau,
	Bagas Sanjaya

Elijah Newren wrote:
> On Tue, Oct 5, 2021 at 6:20 AM Victoria Dye via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>>
>> From: Victoria Dye <vdye@github.com>
>>
>> Add a new `--force-full-index` option to `git update-index`, which skips
>> explicitly setting `command_requires_full_index`. This lets `git
>> update-index --force-full-index` run as a command without sparse index
>> compatibility implemented, even after it receives sparse index compatibility
>> updates.
>>
>> By using `git update-index --force-full-index` in the `t1092` test
>> `sparse-index is expanded and converted back`, commands can continue to
>> integrate with the sparse index without the need to keep modifying the
>> command used in the test.
> 
> So...we're adding a permanent user-facing command line flag, whose
> purpose is just to help us with the transition work of implementing
> sparse indexes everywhere?  Am I reading that right, or is that just
> the reason for t1092 and there are more reasons for it elsewhere?
> 
> Also, I'm curious if update-index is the right place to add this.  If
> you don't want a sparse index anymore, wouldn't a user want to run
>    git sparse-checkout disable
> ?  Or is the point that you do want to keep the sparse checkout, but
> you just don't want the index to also be sparse?  Still, even in that
> case, it seems like adding a subcommand or flag to an existing
> sparse-checkout subcommand would feel more natural, since
> sparse-checkout is the command the user uses to request to get into a
> sparse-checkout and sparse index.
> 

This came out of a conversation [1] on an earlier version of this patch.
Because the `t1092 - sparse-index is expanded and converted back` test
verifies sparse index compatibility (i.e., expand the index when reading,
collapse back to sparse when writing) on commands that don't have any sparse
index integration, it needed to be changed from `git reset` to something
else. However, as we keep integrating commands with sparse index we'd need
to keep changing the command in the test, creating a bunch of patches doing
effectively the same thing for no long-term benefit. 

The `--force-full-index` flag isn't meant to be used externally or modify
the index in any "new" way - it's really just a "test" version of `git
update-index` that we guarantee will accurately represent a command using
the default settings. Right now, it does exactly what `git update-index`
(without the flag) does, and will only behave differently once `git
update-index` is integrated with sparse index. Using `--force-full-index`,
the test won't need to be regularly updated and will continue to catch
errors like:

1. Changing the default value of `command_requires_full_index` to 0
2. Not expanding a sparse index to full when `command_requires_full_index`
   is 1
3. Not collapsing the index back to sparse if sparse index is enabled

I see the issue of introducing a test-only option (when sparse index is
integrated everywhere, shouldn't it be deprecated?). If there's a way to
make this more obviously internal/temporary, I'm happy to modify it. Or, if
semi-frequent updates of the command in the test aren't a huge issue, I can
revert to V1.

[1] https://lore.kernel.org/git/xmqqr1d58v9x.fsf@gitster.g/


* Re: [PATCH v2 5/7] reset: make sparse-aware (except --mixed)
  2021-10-06  3:43     ` Elijah Newren
@ 2021-10-06 20:56       ` Victoria Dye
  0 siblings, 0 replies; 85+ messages in thread
From: Victoria Dye @ 2021-10-06 20:56 UTC (permalink / raw)
  To: Elijah Newren, Victoria Dye via GitGitGadget
  Cc: Git Mailing List, Derrick Stolee, Junio C Hamano, Taylor Blau,
	Bagas Sanjaya

Elijah Newren wrote:
>> +static void prime_cache_tree_sparse_dir(struct repository *r,
>> +                                       struct cache_tree *it,
>> +                                       struct tree *tree,
>> +                                       struct strbuf *tree_path)
>> +{
>> +
>> +       oidcpy(&it->oid, &tree->object.oid);
>> +       it->entry_count = 1;
>> +       return;
> 
> Why are 'r' and 'tree_path' passed to this function?
> 

I mindlessly copied the function signature of `prime_cache_tree_rec` and
didn't notice those variables weren't needed (I'll remove them in V3).

>> +}
>> +
>>  static void prime_cache_tree_rec(struct repository *r,
>>                                  struct cache_tree *it,
>> -                                struct tree *tree)
>> +                                struct tree *tree,
>> +                                struct strbuf *tree_path)
>>  {
>> +       struct strbuf subtree_path = STRBUF_INIT;
>>         struct tree_desc desc;
>>         struct name_entry entry;
>>         int cnt;
>>
>>         oidcpy(&it->oid, &tree->object.oid);
>> +
> 
> Why the blank line addition here?
> 

My goal was to visually separate the parts of `prime_cache_tree_rec` that
update the properties of the `tree` itself and the parts that deal with its
entries. For me, it was helpful when reading and understanding what this
function does and seemed like a good (minor) readability change.

>>         init_tree_desc(&desc, tree->buffer, tree->size);
>>         cnt = 0;
>>         while (tree_entry(&desc, &entry)) {
>> @@ -757,27 +771,49 @@ static void prime_cache_tree_rec(struct repository *r,
>>                 else {
>>                         struct cache_tree_sub *sub;
>>                         struct tree *subtree = lookup_tree(r, &entry.oid);
>> +
>>                         if (!subtree->object.parsed)
>>                                 parse_tree(subtree);
>>                         sub = cache_tree_sub(it, entry.path);
>>                         sub->cache_tree = cache_tree();
>> -                       prime_cache_tree_rec(r, sub->cache_tree, subtree);
> 
>> +                       strbuf_reset(&subtree_path);
>> +                       strbuf_grow(&subtree_path, tree_path->len + entry.pathlen + 1);
>> +                       strbuf_addbuf(&subtree_path, tree_path);
>> +                       strbuf_add(&subtree_path, entry.path, entry.pathlen);
>> +                       strbuf_addch(&subtree_path, '/');
> 
> Reconstructing the full path each time?  And despite only being useful
> for the sparse-index case?
> 
> Would it be better to drop subtree_path from this function, then
> append entry.path + '/' here to tree_path, and then after the if-block
> below, call strbuf_setlen to remove the part that this function call
> added?  That way, we don't need subtree_path, and don't have to copy
> the leading path every time.
> 
> Also, maybe it'd be better to only do this strbuf manipulation if
> r->index->sparse_index, since it's not ever used otherwise?
> 

[...]
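
The append-then-truncate pattern suggested above could look roughly
like the following. This is a self-contained sketch that does not use
strbuf: `toy_path`, `toy_path_push`, `toy_path_pop` and `TOY_PATH_MAX`
are made-up stand-ins, with the pop playing the role that
strbuf_setlen() would play in the real code.

```c
#include <assert.h>
#include <string.h>

#define TOY_PATH_MAX 256

/* Hypothetical fixed-size path buffer standing in for a strbuf. */
struct toy_path {
	char buf[TOY_PATH_MAX];
	size_t len;
};

/*
 * Append "name/" in place and return the previous length, so the
 * caller can truncate back to it after handling the subtree.
 */
size_t toy_path_push(struct toy_path *p, const char *name)
{
	size_t old_len = p->len;
	size_t name_len = strlen(name);

	assert(old_len + name_len + 1 < TOY_PATH_MAX);
	memcpy(p->buf + old_len, name, name_len);
	p->buf[old_len + name_len] = '/';
	p->len = old_len + name_len + 1;
	p->buf[p->len] = '\0';
	return old_len;
}

/* Drop whatever the matching push added. */
void toy_path_pop(struct toy_path *p, size_t old_len)
{
	assert(old_len <= p->len);
	p->len = old_len;
	p->buf[old_len] = '\0';
}
```

Each level of recursion then touches only its own path component, and
the leading path is never copied again.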

>> -static int index_name_stage_pos(struct index_state *istate, const char *name, int namelen, int stage)
>> +static int index_name_stage_pos(struct index_state *istate,
>> +                               const char *name, int namelen,
>> +                               int stage,
>> +                               int search_sparse)
> 
> It'd be nicer to make search_sparse an enum defined within this file, so that...
> 
>>  {
>>         int first, last;
>>
>> @@ -570,7 +573,7 @@ static int index_name_stage_pos(struct index_state *istate, const char *name, in
>>                 first = next+1;
>>         }
>>
>> -       if (istate->sparse_index &&
>> +       if (search_sparse && istate->sparse_index &&
>>             first > 0) {
>>                 /* Note: first <= istate->cache_nr */
>>                 struct cache_entry *ce = istate->cache[first - 1];
>> @@ -586,7 +589,7 @@ static int index_name_stage_pos(struct index_state *istate, const char *name, in
>>                     ce_namelen(ce) < namelen &&
>>                     !strncmp(name, ce->name, ce_namelen(ce))) {
>>                         ensure_full_index(istate);
>> -                       return index_name_stage_pos(istate, name, namelen, stage);
>> +                       return index_name_stage_pos(istate, name, namelen, stage, search_sparse);
>>                 }
>>         }
>>
>> @@ -595,7 +598,12 @@ static int index_name_stage_pos(struct index_state *istate, const char *name, in
>>
>>  int index_name_pos(struct index_state *istate, const char *name, int namelen)
>>  {
>> -       return index_name_stage_pos(istate, name, namelen, 0);
>> +       return index_name_stage_pos(istate, name, namelen, 0, 1);
> 
> ...this could use SEARCH_SPARSE or some name like that which is more
> meaningful than "1" here.
> 
>> +}
>> +
>> +int index_entry_exists(struct index_state *istate, const char *name, int namelen)
>> +{
>> +       return index_name_stage_pos(istate, name, namelen, 0, 0) >= 0;
> 
> ...and likewise this spot could use SEARCH_FULL or some name like
> that, which is more meaningful than the second "0".
> 
> Similarly for multiple call sites below...
> 
> 

I like all of these suggestions and will include them in the next version. Thanks!
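
As a rough idea of the shape (a toy sketch, not the read-cache.c
change): the enum and function names below are hypothetical, the
"index" is just a sorted array of names, and a sparse directory is
modeled as a name ending in '/'. Unlike the real code, the sparse mode
here simply reports the containing directory instead of expanding the
index.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical replacement for the bare int search_sparse parameter. */
enum toy_search_mode {
	TOY_SEARCH_FULL = 0,	/* exact entries only */
	TOY_SEARCH_SPARSE	/* also accept a containing sparse directory */
};

/* Return the position of a matching entry, or -1 if there is none. */
int toy_lookup(const char *const *names, int nr, const char *name,
	       enum toy_search_mode mode)
{
	int i;

	assert(names && name);
	for (i = 0; i < nr; i++) {
		size_t len = strlen(names[i]);

		if (!strcmp(names[i], name))
			return i;
		if (mode == TOY_SEARCH_SPARSE && len > 0 &&
		    names[i][len - 1] == '/' && !strncmp(names[i], name, len))
			return i;
	}
	return -1;
}

int toy_name_pos(const char *const *names, int nr, const char *name)
{
	return toy_lookup(names, nr, name, TOY_SEARCH_SPARSE);
}

int toy_entry_exists(const char *const *names, int nr, const char *name)
{
	return toy_lookup(names, nr, name, TOY_SEARCH_FULL) >= 0;
}
```

The call sites then say which behavior they want, rather than passing
a trailing 0 or 1 that needs a trip to the function definition to decode.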


* Re: [PATCH v2 6/7] reset: make --mixed sparse-aware
  2021-10-06  4:43     ` Elijah Newren
@ 2021-10-07 14:34       ` Victoria Dye
  0 siblings, 0 replies; 85+ messages in thread
From: Victoria Dye @ 2021-10-07 14:34 UTC (permalink / raw)
  To: Elijah Newren, Victoria Dye via GitGitGadget
  Cc: Git Mailing List, Derrick Stolee, Junio C Hamano, Taylor Blau,
	Bagas Sanjaya

Elijah Newren wrote:
>> +       /*
>> +        * When pathspec is given for resetting a cone-mode sparse checkout, it may
>> +        * identify entries that are nested in sparse directories, in which case the
>> +        * index should be expanded. For the sake of efficiency, this check is
>> +        * overly-cautious: anything with a wildcard or a magic prefix requires
>> +        * expansion, as well as literal paths that aren't in the sparse checkout
>> +        * definition AND don't match any directory in the index.
> 
> s/efficiency/efficiency of checking/ ?  Being overly-cautious suggests
> you'll expand to a full index more than is needed, and full indexes
> are more expensive.  But perhaps the checking would be expensive too
> so you have a tradeoff?
> 
> Or maybe s/efficiency/simplicity/?
> 

"Simplicity" is probably more appropriate, although the original intent was
"efficiency of checking". I wanted to avoid repeated iteration over the
index (for example, matching the `no_wildcard_len` of each wildcard pathspec
item against each sparse directory in the index). However, to your point,
expanding the index is a more expensive operation anyway, so it's probably
worth the more involved checks.

>> +        */
>> +       if (pathspec->nr && the_index.sparse_index) {
>> +               if (pathspec->magic || pathspec->has_wildcard) {
>> +                       ensure_full_index(&the_index);
> 
> dir.c has the notion of matching the characters preceding the wildcard
> characters; look for "no_wildcard_len".  If the pathspec doesn't match
> a path up to no_wildcard_len, then the wildcard character(s) later in
> the pathspec can't make the pathspec match that path.
> 
> It might at least be worth mentioning this as a possible future optimization.
> 

I'll incorporate something like this into the next version.
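
A minimal sketch of the prefix idea described above, assuming a sparse
directory is named with a trailing '/' and that the literal prefix ends at
the first glob-special character. Both helper names are hypothetical and
this is not the dir.c implementation:

```c
#include <string.h>

/* Length of the leading part of `pattern` with no glob characters. */
static size_t literal_prefix_len(const char *pattern)
{
	return strcspn(pattern, "*?[\\");
}

/*
 * Returns 0 when `pattern` provably cannot match any path below `dir`
 * (expected to end in '/'), and 1 when a match is still possible.
 *
 * Every path below `dir` starts with `dir`, and every match starts with
 * the literal prefix, so the two must agree on their common length.
 */
static int may_match_under_dir(const char *pattern, const char *dir)
{
	size_t prefix_len = literal_prefix_len(pattern);
	size_t dir_len = strlen(dir);
	size_t n = prefix_len < dir_len ? prefix_len : dir_len;

	return !strncmp(pattern, dir, n);
}
```

With a check like this, a pathspec such as "deep/*" could be ruled out for
a sparse directory "folder1/" without expanding the index, while "*/a"
(empty literal prefix) would still be treated as a possible match.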

>> +               } else {
>> +                       for (i = 0; i < pathspec->nr; i++) {
>> +                               if (!path_in_cone_mode_sparse_checkout(pathspec->items[i].original, &the_index) &&
>> +                                   !matches_skip_worktree(pathspec, i, &skip_worktree_seen)) {
> 
> What if the pathspec corresponds to a sparse-directory in the index,
> but possibly without the trailing '/' character?  e.g.:
> 
>    git reset HEAD~1 -- sparse-directory
> 
> One should be able to reset that directory without recursing into
> it...does this code handle that?  Does it handle it if we add the
> trailing slash on the path for the reset command line?
> 

It handles both cases (with and without trailing slash), the former due to
`!matches_skip_worktree(...)` and the latter due to
`!path_in_cone_mode_sparse_checkout(...)`.
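
The decision made by the quoted hunk can be modeled as below. This is a
sketch only: the toy_* types are hypothetical, and the two per-item flags
stand in for the results of path_in_cone_mode_sparse_checkout() and
matches_skip_worktree() rather than calling them.

```c
#include <stddef.h>

struct toy_item {
	int in_sparse_cone;      /* stand-in: path is inside the cone */
	int matches_sparse_dir;  /* stand-in: matches a sparse dir entry */
};

struct toy_pathspec {
	int magic;
	int has_wildcard;
	size_t nr;
	const struct toy_item *items;
};

/* Returns 1 when the index would be expanded before the reset. */
static int needs_full_index(const struct toy_pathspec *ps,
			    int index_is_sparse)
{
	size_t i;

	if (!ps->nr || !index_is_sparse)
		return 0;
	/* Overly cautious: any magic or wildcard forces expansion. */
	if (ps->magic || ps->has_wildcard)
		return 1;
	/* A literal path forces expansion only if both checks fail. */
	for (i = 0; i < ps->nr; i++)
		if (!ps->items[i].in_sparse_cone &&
		    !ps->items[i].matches_sparse_dir)
			return 1;
	return 0;
}
```

In this model a pathspec naming a sparse directory itself passes one of
the two checks and leaves the index sparse, whereas a path nested inside a
sparse directory fails both and triggers expansion.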

>> +                                       ensure_full_index(&the_index);
>> +                                       break;
>> +                               }
>> +                       }
>> +               }
>> +       }
>> +
>> +       free(skip_worktree_seen);
>>
>> -       ensure_full_index(&the_index);
>>         if (do_diff_cache(tree_oid, &opt))
>>                 return 1;
>>         diffcore_std(&opt);
>> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
>> index e301ef5633a..4afcbc2d673 100755
>> --- a/t/t1092-sparse-checkout-compatibility.sh
>> +++ b/t/t1092-sparse-checkout-compatibility.sh
>> @@ -804,11 +804,22 @@ test_expect_success 'sparse-index is not expanded' '
>>                 ensure_not_expanded reset --hard $ref || return 1
>>         done &&
>>
>> +       ensure_not_expanded reset --mixed base &&
>>         ensure_not_expanded reset --hard update-deep &&
>>         ensure_not_expanded reset --keep base &&
>>         ensure_not_expanded reset --merge update-deep &&
>> -       ensure_not_expanded reset --hard &&
> 
> This commit was only touching the --mixed case; why is it removing one
> of the tests for --hard?
> 

[...]

>> +       ensure_not_expanded reset --hard update-folder1 &&
> 
> Wait...is update-folder1 a branch or a path?  And if this commit is
> about --mixed, why are --hard testcases being added?
> 
>> +       ensure_not_expanded reset base -- folder1 &&
>> +
>> +       ensure_not_expanded reset --hard update-deep &&
> 
> another --hard testcase...was this an accidental squash by chance?
> 

I included `git reset --hard` between the "actual" test cases so that the
`git reset --mixed` tests would start in a "clean" state (clear out any
modified files), but they're unnecessary in most cases, so I'll remove them
in V3. To answer your other question, `update-folder1` is a branch.

* [PATCH v3 0/8] Sparse Index: integrate with reset
  2021-10-05 13:20 ` [PATCH v2 0/7] Sparse Index: integrate with reset Victoria Dye via GitGitGadget
                     ` (8 preceding siblings ...)
  2021-10-06  5:46   ` Elijah Newren
@ 2021-10-07 21:15   ` Victoria Dye via GitGitGadget
  2021-10-07 21:15     ` [PATCH v3 1/8] reset: rename is_missing to !is_in_reset_tree Victoria Dye via GitGitGadget
                       ` (8 more replies)
  9 siblings, 9 replies; 85+ messages in thread
From: Victoria Dye via GitGitGadget @ 2021-10-07 21:15 UTC (permalink / raw)
  To: git
  Cc: stolee, gitster, newren, Taylor Blau, Bagas Sanjaya,
	Ævar Arnfjörð Bjarmason, Victoria Dye

This series integrates the sparse index with git reset and provides
miscellaneous fixes and improvements to the command in sparse checkouts.
This includes:

 1. tests added to t1092 and p2000 to establish the baseline functionality
    of the command
 2. repository settings to enable the sparse index with ensure_full_index
    guarding any code paths that break tests without other compatibility
    updates.
 3. modifications to remove or reduce the scope in which ensure_full_index
    must be called.

The sparse index updates are predicated on a fix originating from the
microsoft/git fork [1], correcting how git reset --mixed handles resetting
entries outside the sparse checkout definition. Additionally, a performance
"bug" in next_cache_entry with sparse index is corrected, preventing
repeatedly looping over already-searched entries.
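
As a generic illustration of avoiding repeated scans over already-searched
entries (not the unpack-trees.c change itself), a search can remember where
the previous one stopped. All names below are hypothetical, and the sketch
assumes entries are only ever marked used, never unmarked:

```c
#include <stddef.h>

struct toy_entries {
	const int *used;  /* nonzero when entry i has been handled */
	size_t nr;
	size_t hint;      /* every entry before this is known used */
};

/* Returns the position of the next unused entry, or nr if none remain. */
static size_t next_unused(struct toy_entries *e)
{
	size_t pos = e->hint;

	while (pos < e->nr && e->used[pos])
		pos++;
	e->hint = pos;  /* later calls resume here, not at 0 */
	return pos;
}
```

Because the hint only moves forward, the total work across all calls is
proportional to the number of entries rather than to entries times calls.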

The p2000 tests demonstrate a ~70% execution time reduction in git reset
using a sparse index, and no change (within expected variability [2]) using
a full index. Results summarized below [3, 4]:

Test                           base              [5/8]                 
-----------------------------------------------------------------------
git reset --hard (full-v3)     1.00(0.50+0.39)   0.97(0.50+0.37) -3.0% 
git reset --hard (full-v4)     1.00(0.51+0.38)   0.96(0.50+0.36) -4.0% 
git reset --hard (sparse-v3)   1.68(1.17+0.39)   1.37(0.91+0.35) -18.5%
git reset --hard (sparse-v4)   1.70(1.18+0.40)   1.41(0.94+0.35) -17.1%

Test                           base              [6/8]   
-----------------------------------------------------------------------
git reset --hard (full-v3)     1.00(0.50+0.39)   0.94(0.48+0.34) -6.0% 
git reset --hard (full-v4)     1.00(0.51+0.38)   0.95(0.51+0.34) -5.0% 
git reset --hard (sparse-v3)   1.68(1.17+0.39)   0.46(0.05+0.29) -72.6%
git reset --hard (sparse-v4)   1.70(1.18+0.40)   0.46(0.06+0.29) -72.9%

Test                               base              [7/8]
---------------------------------------------------------------------------
git reset (full-v3)                0.77(0.27+0.37)   0.72(0.26+0.32) -6.5%
git reset (full-v4)                0.75(0.27+0.34)   0.73(0.26+0.32) -2.7%
git reset (sparse-v3)              1.44(0.96+0.36)   0.43(0.04+0.96) -70.1%
git reset (sparse-v4)              1.46(0.97+0.36)   0.43(0.05+0.79) -70.5%
git reset -- missing (full-v3)     0.72(0.26+0.32)   0.69(0.26+0.30) -4.2%
git reset -- missing (full-v4)     0.74(0.28+0.33)   0.71(0.27+0.32) -4.1% 
git reset -- missing (sparse-v3)   1.45(0.97+0.35)   0.81(0.42+0.90) -44.1%
git reset -- missing (sparse-v4)   1.41(0.94+0.34)   0.79(0.42+0.76) -44.0%

Test                               base              [8/8]            
---------------------------------------------------------------------------
git reset -- missing (full-v3)     0.72(0.26+0.32)   0.73(0.26+0.33) +1.4% 
git reset -- missing (full-v4)     0.74(0.28+0.33)   0.74(0.27+0.32) +0.0% 
git reset -- missing (sparse-v3)   1.45(0.97+0.35)   0.43(0.05+0.80) -70.3%
git reset -- missing (sparse-v4)   1.41(0.94+0.34)   0.44(0.05+0.76) -68.8%
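
The percentage column can be reproduced from the two timings in each row,
assuming the number before the parentheses is the elapsed time. A small
sketch (the function name is hypothetical):

```c
/* Relative change from `before` to `after`, in percent; negative
 * values mean the command became faster. */
static double percent_change(double before, double after)
{
	return (after - before) / before * 100.0;
}
```

For the sparse-v3 row above, (0.43 - 1.45) / 1.45 is about -0.703, which
matches the reported -70.3%.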



Changes since V1
================

 * Add --force-full-index option to update-index. The option is used to
   circumvent changing command_requires_full_index from its default value -
   right now this is effectively a no-op, but will change once update-index
   is integrated with sparse index. By using this option in the t1092
   expand/collapse test, the command used to test will not need to be
   updated with subsequent sparse index integrations.
 * Update implementation of mixed reset for entries outside sparse checkout
   definition. The condition in which a file should be checked out before
   index reset is simplified to "if it has skip-worktree enabled and a reset
   would change the file, check it out".
   * After checking the behavior of update_index_from_diff with renames,
     found that the diff used by reset does not produce diff queue entries
     with different pathnames for one and two. Because of this, and that
     nothing in the implementation seems to rely on identical path names, no
     BUG check is added.
 * Correct a bug in the "sparse-index is not expanded" tests in t1092 where
   failure of a git reset --mixed test was not being reported. Test now
   verifies an appropriate scenario with corrected failure-checking.


Changes since V2
================

 * Replace the patch that checked out files during git reset --mixed in a
   sparse checkout with one that preserves the skip-worktree flag (including
   a new test for git reset --mixed and an update to t1092 - checkout and
   reset (mixed))
 * Move rename of is_missing into its own patch
 * Further extend t1092 tests and remove unnecessary commands/tests where
   possible
 * Refine logic determining which pathspecs require ensure_full_index in git
   reset --mixed, add related ensure_not_expanded tests
 * Add index_search_mode enum to index_name_stage_pos
 * Clean up variable usage & remove unnecessary subtree_path in
   prime_cache_tree_rec
 * Update cover letter performance data
 * More thoroughly explain changes in each commit message
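
The skip-worktree rule from the first item above can be summarized as a
small predicate. This is a sketch: the function and parameter names are
hypothetical, and the two flags stand in for ce_skip_worktree() on the
existing entry and path_in_sparse_checkout() on the path.

```c
/*
 * pos >= 0 means the path already has an index entry; pos < 0 means it
 * does not. Returns 1 when the new entry should get skip-worktree set.
 */
static int new_entry_needs_skip_worktree(int pos,
					 int existing_is_skip_worktree,
					 int path_in_sparse_checkout)
{
	if (pos >= 0)
		return existing_is_skip_worktree;
	return !path_in_sparse_checkout;
}
```

An existing entry keeps whatever skip-worktree state it had, and a path
new to the index gets the bit only when it lies outside the sparse
checkout definition.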

Thanks! -Victoria

[1] microsoft@6b8a074
[2] https://lore.kernel.org/git/8b9fe3f8-f0e3-4567-b20b-17c92bd1a5c5@github.com/
[3] If a test and/or commit is not mentioned, there is no significant change
    to performance
[4] Pathspec "does-not-exist" is changed to "missing" to save space in
    performance report

Kevin Willford (1):
  reset: preserve skip-worktree bit in mixed reset

Victoria Dye (7):
  reset: rename is_missing to !is_in_reset_tree
  update-index: add --force-full-index option for expand/collapse test
  reset: expand test coverage for sparse checkouts
  reset: integrate with sparse index
  reset: make sparse-aware (except --mixed)
  reset: make --mixed sparse-aware
  unpack-trees: improve performance of next_cache_entry

 Documentation/git-update-index.txt       |   5 +
 builtin/reset.c                          | 104 +++++++++++++++++-
 builtin/update-index.c                   |  11 ++
 cache-tree.c                             |  46 +++++++-
 cache.h                                  |  10 ++
 read-cache.c                             |  27 +++--
 t/perf/p2000-sparse-operations.sh        |   3 +
 t/t1092-sparse-checkout-compatibility.sh | 133 ++++++++++++++++++++---
 t/t7102-reset.sh                         |  17 +++
 unpack-trees.c                           |  23 +++-
 10 files changed, 342 insertions(+), 37 deletions(-)


base-commit: cefe983a320c03d7843ac78e73bd513a27806845
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1048%2Fvdye%2Fvdye%2Fsparse-index-part1-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1048/vdye/vdye/sparse-index-part1-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/1048

Range-diff vs v2:

 -:  ----------- > 1:  ad7013a31aa reset: rename is_missing to !is_in_reset_tree
 1:  22c69bc6030 ! 2:  1f6da84830b reset: behave correctly with sparse-checkout
     @@ Metadata
      Author: Kevin Willford <kewillf@microsoft.com>
      
       ## Commit message ##
     -    reset: behave correctly with sparse-checkout
     +    reset: preserve skip-worktree bit in mixed reset
      
     -    When using the sparse checkout feature, 'git reset' will add entries to
     -    the index that will have the skip-worktree bit off but will leave the
     -    working directory empty. File data is lost because the index version of
     -    the files has been changed but there is nothing that is in the working
     -    directory. This will cause the next 'git status' call to show either
     -    deleted for files modified or deleting or nothing for files added. The
     -    added files should be shown as untracked and modified files should be
     -    shown as modified.
     +    Change `update_index_from_diff` to set `skip-worktree` when applicable for
     +    new index entries. When `git reset --mixed <tree-ish>` is run, entries in
     +    the index with differences between the pre-reset HEAD and reset <tree-ish>
     +    are identified and handled with `update_index_from_diff`. For each file, a
     +    new cache entry is inserted into the index, created from the <tree-ish> side
     +    of the reset (without changing the working tree). However, the newly-created
     +    entry must have `skip-worktree` explicitly set in either of the following
     +    scenarios:
      
     -    To fix this when the reset is running if there is not a file in the
     -    working directory and if it will be missing with the new index entry or
     -    was not missing in the previous version, we create the previous index
     -    version of the file in the working directory so that status will report
     -    correctly and the files will be availble for the user to deal with.
     +    1. the file is in the current index and has `skip-worktree` set
     +    2. the file is not in the current index but is outside of a defined sparse
     +       checkout definition
      
     -    This fixes a documented failure from t1092 that was created in 19a0acc
     -    (t1092: test interesting sparse-checkout scenarios, 2021-01-23).
     +    Not setting the `skip-worktree` bit leads to likely-undesirable results for
     +    a user. It causes `skip-worktree` settings to disappear on the
     +    "diff"-containing files (but *only* the diff-containing files), leading to
     +    those files now showing modifications in `git status`. For example, when
     +    running `git reset --mixed` in a sparse checkout, some file entries outside
     +    of sparse checkout could show up as deleted, despite the user never deleting
     +    anything (and not wanting them on-disk anyway).
      
     -    Signed-off-by: Kevin Willford <kewillf@microsoft.com>
     -    Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
     +    Additionally, add a test to `t7102` to ensure `skip-worktree` is preserved
     +    in a basic `git reset --mixed` scenario and update a failure-documenting
     +    test from 19a0acc (t1092: test interesting sparse-checkout scenarios,
     +    2021-01-23) with new expected behavior.
     +
     +    Helped-by: Junio C Hamano <gitster@pobox.com>
          Signed-off-by: Victoria Dye <vdye@github.com>
      
       ## builtin/reset.c ##
     @@ builtin/reset.c
       #include "submodule.h"
       #include "submodule-config.h"
      +#include "dir.h"
     -+#include "entry.h"
       
       #define REFRESH_INDEX_DELAY_WARNING_IN_MS (2 * 1000)
       
     @@ builtin/reset.c: static void update_index_from_diff(struct diff_queue_struct *q,
       	for (i = 0; i < q->nr; i++) {
      +		int pos;
       		struct diff_filespec *one = q->queue[i]->one;
     --		int is_missing = !(one->mode && !is_null_oid(&one->oid));
     -+		struct diff_filespec *two = q->queue[i]->two;
     -+		int is_in_reset_tree = one->mode && !is_null_oid(&one->oid);
     + 		int is_in_reset_tree = one->mode && !is_null_oid(&one->oid);
       		struct cache_entry *ce;
     +@@ builtin/reset.c: static void update_index_from_diff(struct diff_queue_struct *q,
       
     --		if (is_missing && !intent_to_add) {
     + 		ce = make_cache_entry(&the_index, one->mode, &one->oid, one->path,
     + 				      0, 0);
     ++
      +		/*
     -+		 * If the file being reset has `skip-worktree` enabled, we need
     -+		 * to check it out to prevent the file from being hard reset.
     ++		 * If the file 1) corresponds to an existing index entry with
     ++		 * skip-worktree set, or 2) does not exist in the index but is
     ++		 * outside the sparse checkout definition, add a skip-worktree bit
     ++		 * to the new index entry.
      +		 */
     -+		pos = cache_name_pos(two->path, strlen(two->path));
     -+		if (pos >= 0 && ce_skip_worktree(active_cache[pos])) {
     -+			struct checkout state = CHECKOUT_INIT;
     -+			state.force = 1;
     -+			state.refresh_cache = 1;
     -+			state.istate = &the_index;
     ++		pos = cache_name_pos(one->path, strlen(one->path));
     ++		if ((pos >= 0 && ce_skip_worktree(active_cache[pos])) ||
     ++		    (pos < 0 && !path_in_sparse_checkout(one->path, &the_index)))
     ++			ce->ce_flags |= CE_SKIP_WORKTREE;
      +
     -+			checkout_entry(active_cache[pos], &state, NULL, NULL);
     -+		}
     -+
     -+		if (!is_in_reset_tree && !intent_to_add) {
     - 			remove_file_from_cache(one->path);
     - 			continue;
     - 		}
     -@@ builtin/reset.c: static void update_index_from_diff(struct diff_queue_struct *q,
       		if (!ce)
       			die(_("make_cache_entry failed for path '%s'"),
       			    one->path);
     --		if (is_missing) {
     -+		if (!is_in_reset_tree) {
     - 			ce->ce_flags |= CE_INTENT_TO_ADD;
     - 			set_object_name_for_intent_to_add_entry(ce);
     - 		}
      
       ## t/t1092-sparse-checkout-compatibility.sh ##
      @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_failure 'blame with pathspec outside sparse definition' '
     @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_failure 'blame with pathsp
       	init_repos &&
       
       	test_all_match git checkout -b reset-test update-deep &&
     + 	test_all_match git reset deepest &&
     +-	test_all_match git reset update-folder1 &&
     +-	test_all_match git reset update-folder2
     +-'
     +-
     +-# NEEDSWORK: a sparse-checkout behaves differently from a full checkout
     +-# in this scenario, but it shouldn't.
     +-test_expect_success 'checkout and reset (mixed) [sparse]' '
     +-	init_repos &&
     + 
     +-	test_sparse_match git checkout -b reset-test update-deep &&
     +-	test_sparse_match git reset deepest &&
     ++	# Because skip-worktree is preserved, resetting to update-folder1
     ++	# will show worktree changes for full-checkout that are not present
     ++	# in sparse-checkout or sparse-index.
     + 	test_sparse_match git reset update-folder1 &&
     +-	test_sparse_match git reset update-folder2
     ++	run_on_sparse test_path_is_missing folder1
     + '
     + 
     + test_expect_success 'merge, cherry-pick, and rebase' '
      
     - ## t/t7114-reset-sparse-checkout.sh (new) ##
     -@@
     -+#!/bin/sh
     -+
     -+test_description='reset when using a sparse-checkout'
     -+
     -+. ./test-lib.sh
     -+
     -+test_expect_success 'setup' '
     -+	test_tick &&
     -+	echo "checkout file" >c &&
     -+	echo "modify file" >m &&
     -+	echo "delete file" >d &&
     -+	git add . &&
     -+	git commit -m "initial commit" &&
     -+	echo "added file" >a &&
     -+	echo "modification of a file" >m &&
     -+	git rm d &&
     -+	git add . &&
     -+	git commit -m "second commit" &&
     -+	git checkout -b endCommit
     -+'
     -+
     -+test_expect_success 'reset when there is a sparse-checkout' '
     -+	echo "/c" >.git/info/sparse-checkout &&
     -+	test_config core.sparsecheckout true &&
     -+	git checkout -B resetBranch &&
     -+	test_path_is_missing m &&
     -+	test_path_is_missing a &&
     -+	test_path_is_missing d &&
     -+	git reset HEAD~1 &&
     -+	echo "checkout file" >expect &&
     -+	test_cmp expect c &&
     -+	echo "added file" >expect &&
     -+	test_cmp expect a &&
     -+	echo "modification of a file" >expect &&
     -+	test_cmp expect m &&
     -+	test_path_is_missing d
     -+'
     + ## t/t7102-reset.sh ##
     +@@ t/t7102-reset.sh: test_expect_success '--mixed refreshes the index' '
     + 	test_cmp expect output
     + '
     + 
     ++test_expect_success '--mixed preserves skip-worktree' '
     ++	echo 123 >>file2 &&
     ++	git add file2 &&
     ++	git update-index --skip-worktree file2 &&
     ++	git reset --mixed HEAD >output &&
     ++	test_must_be_empty output &&
      +
     -+test_expect_success 'reset after deleting file without skip-worktree bit' '
     -+	git checkout -f endCommit &&
     -+	git clean -xdf &&
     -+	cat >.git/info/sparse-checkout <<-\EOF &&
     -+	/c
     -+	/m
     ++	cat >expect <<-\EOF &&
     ++	Unstaged changes after reset:
     ++	M	file2
      +	EOF
     -+	test_config core.sparsecheckout true &&
     -+	git checkout -B resetAfterDelete &&
     -+	test_path_is_file m &&
     -+	test_path_is_missing a &&
     -+	test_path_is_missing d &&
     -+	rm -f m &&
     -+	git reset HEAD~1 &&
     -+	echo "checkout file" >expect &&
     -+	test_cmp expect c &&
     -+	echo "added file" >expect &&
     -+	test_cmp expect a &&
     -+	test_path_is_missing m &&
     -+	test_path_is_missing d
     ++	git update-index --no-skip-worktree file2 &&
     ++	git add file2 &&
     ++	git reset --mixed HEAD >output &&
     ++	test_cmp expect output
      +'
      +
     -+test_done
     + test_expect_success 'resetting specific path that is unmerged' '
     + 	git rm --cached file2 &&
     + 	F1=$(git rev-parse HEAD:file1) &&
 2:  f7cb9013d46 ! 3:  014a408ea5d update-index: add --force-full-index option for expand/collapse test
     @@ Commit message
          update-index: add --force-full-index option for expand/collapse test
      
          Add a new `--force-full-index` option to `git update-index`, which skips
     -    explicitly setting `command_requires_full_index`. This lets `git
     -    update-index --force-full-index` run as a command without sparse index
     -    compatibility implemented, even after it receives sparse index compatibility
     -    updates.
     +    explicitly setting `command_requires_full_index`. This option, intended for
     +    use in internal testing purposes only, lets `git update-index` run as a
     +    command without sparse index compatibility implemented, even after it
     +    receives updates to otherwise use the sparse index.
      
     -    By using `git update-index --force-full-index` in the `t1092` test
     -    `sparse-index is expanded and converted back`, commands can continue to
     -    integrate with the sparse index without the need to keep modifying the
     -    command used in the test.
     +    The specific test `--force-full-index` is intended for - `t1092 -
     +    sparse-index is expanded and converted back` - verifies index compatibility
     +    in commands that do not change the default (enabled)
     +    `command_requires_full_index` repo setting. In the past, the test used `git
     +    reset`. However, as `reset` and other commands are integrated with the
     +    sparse index, the command used in the test would need to keep changing.
     +    Conversely, the `--force-full-index` option makes `git update-index` behave
     +    like a not-yet-sparse-aware command, and can be used in the test
     +    indefinitely without interfering with future sparse index integrations.
      
     +    Helped-by: Junio C Hamano <gitster@pobox.com>
          Signed-off-by: Victoria Dye <vdye@github.com>
      
       ## Documentation/git-update-index.txt ##
 3:  c7e9d9f4e03 ! 4:  7f21cf53e9d reset: expand test coverage for sparse checkouts
     @@ Commit message
          reset: expand test coverage for sparse checkouts
      
          Add new tests for `--merge` and `--keep` modes, as well as mixed reset with
     -    pathspecs both inside and outside of the sparse checkout definition. New
     -    performance test cases exercise various execution paths for `reset`.
     +    pathspecs. New performance test cases exercise various execution paths for
     +    `reset`.
      
          Co-authored-by: Derrick Stolee <dstolee@microsoft.com>
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
     @@ t/perf/p2000-sparse-operations.sh: test_perf_on_all git add -A
       test_done
      
       ## t/t1092-sparse-checkout-compatibility.sh ##
     -@@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'checkout and reset (mixed) [sparse]' '
     - 	test_sparse_match git reset update-folder2
     +@@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'checkout and reset (mixed)' '
     + 	run_on_sparse test_path_is_missing folder1
       '
       
     -+# NEEDSWORK: with mixed reset, files with differences between HEAD and <commit>
     -+# will be added to the work tree even if outside the sparse checkout
     -+# definition, and even if the file is modified to a state of having no local
     -+# changes. The file is "re-ignored" if a hard reset is executed. We may want to
     -+# change this behavior in the future and enforce that files are not written
     -+# outside of the sparse checkout definition.
     -+test_expect_success 'checkout and mixed reset file tracking [sparse]' '
     -+	init_repos &&
     -+
     -+	test_all_match git checkout -b reset-test update-deep &&
     -+	test_all_match git reset update-folder1 &&
     -+	test_all_match git reset update-deep &&
     -+
     -+	# At this point, there are no changes in the working tree. However,
     -+	# folder1/a now exists locally (even though it is outside of the sparse
     -+	# paths).
     -+	run_on_sparse test_path_exists folder1 &&
     -+
     -+	run_on_all rm folder1/a &&
     -+	test_all_match git status --porcelain=v2 &&
     -+
     -+	test_all_match git reset --hard update-deep &&
     -+	run_on_sparse test_path_is_missing folder1 &&
     -+	test_path_exists full-checkout/folder1
     -+'
     -+
      +test_expect_success 'checkout and reset (merge)' '
      +	init_repos &&
      +
     @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'checkout and rese
      +	test_all_match git status --porcelain=v2
      +'
      +
     -+test_expect_success 'reset with sparse directory pathspec outside definition' '
     ++# Although the working tree differs between full and sparse checkouts after
     ++# reset, the state of the index is the same.
     ++test_expect_success 'reset with pathspecs outside sparse definition' '
      +	init_repos &&
     ++	test_all_match git checkout -b reset-test base &&
      +
     -+	test_all_match git checkout -b reset-test update-deep &&
     -+	test_all_match git reset --hard update-folder1 &&
     -+	test_all_match git reset base -- folder1 &&
     -+	test_all_match git status --porcelain=v2
     -+'
     ++	test_sparse_match git reset update-folder1 -- folder1 &&
     ++	git -C full-checkout reset update-folder1 -- folder1 &&
     ++	test_sparse_match git status --porcelain=v2 &&
     ++	test_all_match git rev-parse HEAD:folder1 &&
      +
     -+test_expect_success 'reset with pathspec match in sparse directory' '
     -+	init_repos &&
     -+
     -+	test_all_match git checkout -b reset-test update-deep &&
     -+	test_all_match git reset --hard update-folder1 &&
     -+	test_all_match git reset base -- folder1/a &&
     -+	test_all_match git status --porcelain=v2
     ++	test_sparse_match git reset update-folder2 -- folder2/a &&
     ++	git -C full-checkout reset update-folder2 -- folder2/a &&
     ++	test_sparse_match git status --porcelain=v2 &&
     ++	test_all_match git rev-parse HEAD:folder2/a
      +'
      +
      +test_expect_success 'reset with wildcard pathspec' '
      +	init_repos &&
      +
      +	test_all_match git checkout -b reset-test update-deep &&
     -+	test_all_match git reset --hard update-folder1 &&
      +	test_all_match git reset base -- \*/a &&
     -+	test_all_match git status --porcelain=v2
     ++	test_all_match git status --porcelain=v2 &&
     ++	test_all_match git rev-parse HEAD:folder1/a &&
     ++
     ++	test_all_match git reset base -- folder\* &&
     ++	test_all_match git status --porcelain=v2 &&
     ++	test_all_match git rev-parse HEAD:folder2
      +'
      +
       test_expect_success 'merge, cherry-pick, and rebase' '
 4:  49813c8d9ed ! 5:  a2d6212e287 reset: integrate with sparse index
     @@ Metadata
       ## Commit message ##
          reset: integrate with sparse index
      
     -    `reset --soft` does not modify the index, so no compatibility changes are
     -    needed for it to function without expanding the index. For all other reset
     -    modes (`--mixed`, `--hard`, `--keep`, `--merge`), the full index is
     -    explicitly expanded with `ensure_full_index` to maintain current behavior.
     +    Disable `command_requires_full_index` repo setting and add
     +    `ensure_full_index` guards around code paths that cannot yet use sparse
     +    directory index entries. `reset --soft` does not modify the index, so no
     +    compatibility changes are needed for it to function without expanding the
     +    index. For all other reset modes (`--mixed`, `--hard`, `--keep`, `--merge`),
     +    the full index is expanded to prevent cache tree corruption and invalid
     +    variable accesses.
      
          Additionally, the `read_cache()` check verifying an uncorrupted index is
          moved after argument parsing and preparing the repo settings. The index is
     -    not used by the preceding argument handling, but `read_cache()` does need to
     -    be run after enabling sparse index for the command and before resetting.
     +    not used by the preceding argument handling, but `read_cache()` must be run
     +    *after* enabling sparse index for the command (so that the index is not
     +    expanded unnecessarily) and *before* using the index for reset (so that it
     +    is verified as uncorrupted).
      
          Signed-off-by: Victoria Dye <vdye@github.com>
      
 5:  78cd85d8dcc ! 6:  330e0c09774 reset: make sparse-aware (except --mixed)
     @@ Metadata
       ## Commit message ##
          reset: make sparse-aware (except --mixed)
      
     -    In order to accurately reconstruct the cache tree in `prime_cache_tree_rec`,
     -    the function must determine whether the currently-processing directory in
     -    the tree is sparse or not. If it is not sparse, the tree is parsed and
     -    subtree recursively constructed. If it is sparse, no subtrees are added to
     -    the tree and the entry count is set to 1 (representing the sparse directory
     -    itself).
     +    Remove `ensure_full_index` guard on `prime_cache_tree` and update
     +    `prime_cache_tree_rec` to correctly reconstruct sparse directory entries in
     +    the cache tree. While processing a tree's entries, `prime_cache_tree_rec`
     +    must determine whether a directory entry is sparse or not by searching for
     +    it in the index (*without* expanding the index). If a matching sparse
     +    directory index entry is found, no subtrees are added to the cache tree
     +    entry and the entry count is set to 1 (representing the sparse directory
     +    itself). Otherwise, the tree is assumed to not be sparse and its subtrees
     +    are recursively added to the cache tree.
      
     +    Helped-by: Elijah Newren <newren@gmail.com>
          Signed-off-by: Victoria Dye <vdye@github.com>
      
       ## cache-tree.c ##
     @@ cache-tree.c: out:
       	return ret;
       }
       
     -+static void prime_cache_tree_sparse_dir(struct repository *r,
     -+					struct cache_tree *it,
     -+					struct tree *tree,
     -+					struct strbuf *tree_path)
     ++static void prime_cache_tree_sparse_dir(struct cache_tree *it,
     ++					struct tree *tree)
      +{
      +
      +	oidcpy(&it->oid, &tree->object.oid);
      +	it->entry_count = 1;
     -+	return;
      +}
      +
       static void prime_cache_tree_rec(struct repository *r,
     @@ cache-tree.c: out:
      +				 struct tree *tree,
      +				 struct strbuf *tree_path)
       {
     -+	struct strbuf subtree_path = STRBUF_INIT;
       	struct tree_desc desc;
       	struct name_entry entry;
       	int cnt;
     ++	int base_path_len = tree_path->len;
       
       	oidcpy(&it->oid, &tree->object.oid);
      +
     @@ cache-tree.c: static void prime_cache_tree_rec(struct repository *r,
       			sub = cache_tree_sub(it, entry.path);
       			sub->cache_tree = cache_tree();
      -			prime_cache_tree_rec(r, sub->cache_tree, subtree);
     -+			strbuf_reset(&subtree_path);
     -+			strbuf_grow(&subtree_path, tree_path->len + entry.pathlen + 1);
     -+			strbuf_addbuf(&subtree_path, tree_path);
     -+			strbuf_add(&subtree_path, entry.path, entry.pathlen);
     -+			strbuf_addch(&subtree_path, '/');
     ++
     ++			/*
     ++			 * Recursively-constructed subtree path is only needed when working
     ++			 * in a sparse index (where it's used to determine whether the
     ++			 * subtree is a sparse directory in the index).
     ++			 */
     ++			if (r->index->sparse_index) {
     ++				strbuf_setlen(tree_path, base_path_len);
     ++				strbuf_grow(tree_path, base_path_len + entry.pathlen + 1);
     ++				strbuf_add(tree_path, entry.path, entry.pathlen);
     ++				strbuf_addch(tree_path, '/');
     ++			}
      +
      +			/*
      +			 * If a sparse index is in use, the directory being processed may be
     @@ cache-tree.c: static void prime_cache_tree_rec(struct repository *r,
      +			 * as normal.
      +			 */
      +			if (r->index->sparse_index &&
     -+			    index_entry_exists(r->index, subtree_path.buf, subtree_path.len))
     -+				prime_cache_tree_sparse_dir(r, sub->cache_tree, subtree, &subtree_path);
     ++			    index_entry_exists(r->index, tree_path->buf, tree_path->len))
     ++				prime_cache_tree_sparse_dir(sub->cache_tree, subtree);
      +			else
     -+				prime_cache_tree_rec(r, sub->cache_tree, subtree, &subtree_path);
     ++				prime_cache_tree_rec(r, sub->cache_tree, subtree, tree_path);
       			cnt += sub->cache_tree->entry_count;
       		}
       	}
     - 	it->entry_count = cnt;
      +
     -+	strbuf_release(&subtree_path);
     + 	it->entry_count = cnt;
       }
       
     - void prime_cache_tree(struct repository *r,
     +@@ cache-tree.c: void prime_cache_tree(struct repository *r,
       		      struct index_state *istate,
       		      struct tree *tree)
       {
     @@ cache.h: struct cache_entry *index_file_exists(struct index_state *istate, const
        * precise match was not found but a position was found where the entry would
      
       ## read-cache.c ##
     +@@
     +  */
     + #define CACHE_ENTRY_PATH_LENGTH 80
     + 
     ++enum index_search_mode {
     ++	NO_EXPAND_SPARSE = 0,
     ++	EXPAND_SPARSE = 1
     ++};
     ++
     + static inline struct cache_entry *mem_pool__ce_alloc(struct mem_pool *mem_pool, size_t len)
     + {
     + 	struct cache_entry *ce;
      @@ read-cache.c: int cache_name_stage_compare(const char *name1, int len1, int stage1, const char
       	return 0;
       }
     @@ read-cache.c: int cache_name_stage_compare(const char *name1, int len1, int stag
      +static int index_name_stage_pos(struct index_state *istate,
      +				const char *name, int namelen,
      +				int stage,
     -+				int search_sparse)
     ++				enum index_search_mode search_mode)
       {
       	int first, last;
       
     @@ read-cache.c: static int index_name_stage_pos(struct index_state *istate, const
       	}
       
      -	if (istate->sparse_index &&
     -+	if (search_sparse && istate->sparse_index &&
     ++	if (search_mode == EXPAND_SPARSE && istate->sparse_index &&
       	    first > 0) {
       		/* Note: first <= istate->cache_nr */
       		struct cache_entry *ce = istate->cache[first - 1];
     @@ read-cache.c: static int index_name_stage_pos(struct index_state *istate, const
       		    !strncmp(name, ce->name, ce_namelen(ce))) {
       			ensure_full_index(istate);
      -			return index_name_stage_pos(istate, name, namelen, stage);
     -+			return index_name_stage_pos(istate, name, namelen, stage, search_sparse);
     ++			return index_name_stage_pos(istate, name, namelen, stage, search_mode);
       		}
       	}
       
     @@ read-cache.c: static int index_name_stage_pos(struct index_state *istate, const
       int index_name_pos(struct index_state *istate, const char *name, int namelen)
       {
      -	return index_name_stage_pos(istate, name, namelen, 0);
     -+	return index_name_stage_pos(istate, name, namelen, 0, 1);
     ++	return index_name_stage_pos(istate, name, namelen, 0, EXPAND_SPARSE);
      +}
      +
      +int index_entry_exists(struct index_state *istate, const char *name, int namelen)
      +{
     -+	return index_name_stage_pos(istate, name, namelen, 0, 0) >= 0;
     ++	return index_name_stage_pos(istate, name, namelen, 0, NO_EXPAND_SPARSE) >= 0;
       }
       
       int remove_index_entry_at(struct index_state *istate, int pos)
     @@ read-cache.c: static int has_dir_name(struct index_state *istate,
       		}
       
      -		pos = index_name_stage_pos(istate, name, len, stage);
     -+		pos = index_name_stage_pos(istate, name, len, stage, 1);
     ++		pos = index_name_stage_pos(istate, name, len, stage, EXPAND_SPARSE);
       		if (pos >= 0) {
       			/*
       			 * Found one, but not so fast.  This could
     @@ read-cache.c: static int add_index_entry_with_check(struct index_state *istate,
       		pos = index_pos_to_insert_pos(istate->cache_nr);
       	else
      -		pos = index_name_stage_pos(istate, ce->name, ce_namelen(ce), ce_stage(ce));
     -+		pos = index_name_stage_pos(istate, ce->name, ce_namelen(ce), ce_stage(ce), 1);
     ++		pos = index_name_stage_pos(istate, ce->name, ce_namelen(ce), ce_stage(ce), EXPAND_SPARSE);
       
       	/* existing match? Just replace it. */
       	if (pos >= 0) {
     @@ read-cache.c: static int add_index_entry_with_check(struct index_state *istate,
       			return error(_("'%s' appears as both a file and as a directory"),
       				     ce->name);
      -		pos = index_name_stage_pos(istate, ce->name, ce_namelen(ce), ce_stage(ce));
     -+		pos = index_name_stage_pos(istate, ce->name, ce_namelen(ce), ce_stage(ce), 1);
     ++		pos = index_name_stage_pos(istate, ce->name, ce_namelen(ce), ce_stage(ce), EXPAND_SPARSE);
       		pos = -pos-1;
       	}
       	return pos + 1;
 6:  5eaae0825af ! 7:  6ef8e4e31d3 reset: make --mixed sparse-aware
     @@ Metadata
       ## Commit message ##
          reset: make --mixed sparse-aware
      
     -    Sparse directory entries are "diffed" as trees in `diff_cache` (used
     -    internally by `reset --mixed`), following a code path separate from
     -    individual file handling. The use of `diff_tree_oid` there requires setting
     -    explicit `change` and `add_remove` functions to process the internal
     -    contents of a sparse directory.
     +    Remove the `ensure_full_index` guard on `read_from_tree` and update `git
     +    reset --mixed` to ensure it can use sparse directory index entries wherever
     +    possible. Sparse directory entries are reset using `diff_tree_oid`, which
     +    requires `change` and `add_remove` functions to process the internal
     +    contents of the sparse directory. The `recursive` diff option handles cases
     +    in which `reset --mixed` must diff/merge files that are nested multiple
     +    levels deep in a sparse directory.
      
     -    Additionally, the `recursive` diff option handles cases in which `reset
     -    --mixed` must diff/merge files that are nested multiple levels deep in a
     -    sparse directory.
     +    The use of pathspecs with `git reset --mixed` introduces scenarios in which
     +    internal contents of sparse directories may be matched by the pathspec. In
     +    order to reset *all* files in the repo that may match the pathspec, the
     +    following conditions on the pathspec require index expansion before
     +    performing the reset:
      
     +    * "magic" pathspecs
     +    * wildcard pathspecs that do not match only in-cone files or entire sparse
     +      directories
     +    * literal pathspecs matching something outside the sparse checkout
     +      definition
     +
     +    Helped-by: Elijah Newren <newren@gmail.com>
          Signed-off-by: Victoria Dye <vdye@github.com>
      
       ## builtin/reset.c ##
     -@@ builtin/reset.c: static int read_from_tree(const struct pathspec *pathspec,
     - 			  int intent_to_add)
     - {
     - 	struct diff_options opt;
     -+	unsigned int i;
     -+	char *skip_worktree_seen = NULL;
     +@@ builtin/reset.c: static void update_index_from_diff(struct diff_queue_struct *q,
     + 		 * If the file 1) corresponds to an existing index entry with
     + 		 * skip-worktree set, or 2) does not exist in the index but is
     + 		 * outside the sparse checkout definition, add a skip-worktree bit
     +-		 * to the new index entry.
     ++		 * to the new index entry. Note that a sparse index will be expanded
     ++		 * if this entry is outside the sparse cone - this is necessary
     ++		 * to properly construct the reset sparse directory.
     + 		 */
     + 		pos = cache_name_pos(one->path, strlen(one->path));
     + 		if ((pos >= 0 && ce_skip_worktree(active_cache[pos])) ||
     +@@ builtin/reset.c: static void update_index_from_diff(struct diff_queue_struct *q,
     + 	}
     + }
       
     - 	memset(&opt, 0, sizeof(opt));
     - 	copy_pathspec(&opt.pathspec, pathspec);
     -@@ builtin/reset.c: static int read_from_tree(const struct pathspec *pathspec,
     - 	opt.format_callback = update_index_from_diff;
     - 	opt.format_callback_data = &intent_to_add;
     - 	opt.flags.override_submodule_config = 1;
     -+	opt.flags.recursive = 1;
     - 	opt.repo = the_repository;
     -+	opt.change = diff_change;
     -+	opt.add_remove = diff_addremove;
     ++static int pathspec_needs_expanded_index(const struct pathspec *pathspec)
     ++{
     ++	unsigned int i, pos;
     ++	int res = 0;
     ++	char *skip_worktree_seen = NULL;
      +
      +	/*
     -+	 * When pathspec is given for resetting a cone-mode sparse checkout, it may
     -+	 * identify entries that are nested in sparse directories, in which case the
     -+	 * index should be expanded. For the sake of efficiency, this check is
     -+	 * overly-cautious: anything with a wildcard or a magic prefix requires
     -+	 * expansion, as well as literal paths that aren't in the sparse checkout
     -+	 * definition AND don't match any directory in the index.
     ++	 * When using a magic pathspec, assume for the sake of simplicity that
     ++	 * the index needs to be expanded to match all matchable files.
      +	 */
     -+	if (pathspec->nr && the_index.sparse_index) {
     -+		if (pathspec->magic || pathspec->has_wildcard) {
     -+			ensure_full_index(&the_index);
     -+		} else {
     -+			for (i = 0; i < pathspec->nr; i++) {
     -+				if (!path_in_cone_mode_sparse_checkout(pathspec->items[i].original, &the_index) &&
     -+				    !matches_skip_worktree(pathspec, i, &skip_worktree_seen)) {
     -+					ensure_full_index(&the_index);
     ++	if (pathspec->magic)
     ++		return 1;
     ++
     ++	for (i = 0; i < pathspec->nr; i++) {
     ++		struct pathspec_item item = pathspec->items[i];
     ++
     ++		/*
     ++		 * If the pathspec item has a wildcard, the index should be expanded
     ++		 * if the pathspec has the possibility of matching a subset of entries inside
     ++		 * of a sparse directory (but not the entire directory).
     ++		 *
     ++		 * If the pathspec item is a literal path, the index only needs to be expanded
     ++		 * if a) the pathspec isn't in the sparse checkout cone (to make sure we don't
     ++		 * expand for in-cone files) and b) it doesn't match any sparse directories
     ++		 * (since we can reset whole sparse directories without expanding them).
     ++		 */
     ++		if (item.nowildcard_len < item.len) {
     ++			for (pos = 0; pos < active_nr; pos++) {
     ++				struct cache_entry *ce = active_cache[pos];
     ++
     ++				if (!S_ISSPARSEDIR(ce->ce_mode))
     ++					continue;
     ++
     ++				/*
     ++				 * If the pre-wildcard length is longer than the sparse
     ++				 * directory name and the sparse directory is the first
     ++				 * component of the pathspec, need to expand the index.
     ++				 */
     ++				if (item.nowildcard_len > ce_namelen(ce) &&
     ++				    !strncmp(item.original, ce->name, ce_namelen(ce))) {
     ++					res = 1;
     ++					break;
     ++				}
     ++
     ++				/*
     ++				 * If the pre-wildcard length is shorter than the sparse
     ++				 * directory and the pathspec does not match the whole
     ++				 * directory, need to expand the index.
     ++				 */
     ++				if (!strncmp(item.original, ce->name, item.nowildcard_len) &&
     ++				    wildmatch(item.original, ce->name, 0)) {
     ++					res = 1;
      +					break;
      +				}
      +			}
     -+		}
     ++		} else if (!path_in_cone_mode_sparse_checkout(item.original, &the_index) &&
     ++			   !matches_skip_worktree(pathspec, i, &skip_worktree_seen))
     ++			res = 1;
     ++
     ++		if (res > 0)
     ++			break;
      +	}
      +
      +	free(skip_worktree_seen);
     ++	return res;
     ++}
     ++
     + static int read_from_tree(const struct pathspec *pathspec,
     + 			  struct object_id *tree_oid,
     + 			  int intent_to_add)
     +@@ builtin/reset.c: static int read_from_tree(const struct pathspec *pathspec,
     + 	opt.format_callback = update_index_from_diff;
     + 	opt.format_callback_data = &intent_to_add;
     + 	opt.flags.override_submodule_config = 1;
     ++	opt.flags.recursive = 1;
     + 	opt.repo = the_repository;
     ++	opt.change = diff_change;
     ++	opt.add_remove = diff_addremove;
     ++
     ++	if (pathspec->nr && the_index.sparse_index && pathspec_needs_expanded_index(pathspec))
     ++		ensure_full_index(&the_index);
       
      -	ensure_full_index(&the_index);
       	if (do_diff_cache(tree_oid, &opt))
     @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'sparse-index is n
       	ensure_not_expanded reset --hard update-deep &&
       	ensure_not_expanded reset --keep base &&
       	ensure_not_expanded reset --merge update-deep &&
     --	ensure_not_expanded reset --hard &&
     + 	ensure_not_expanded reset --hard &&
       
      +	ensure_not_expanded reset base -- deep/a &&
      +	ensure_not_expanded reset base -- nonexistent-file &&
      +	ensure_not_expanded reset deepest -- deep &&
      +
      +	# Although folder1 is outside the sparse definition, it exists as a
     -+	# directory entry in the index, so it will be reset without needing to
     -+	# expand the full index.
     -+	ensure_not_expanded reset --hard update-folder1 &&
     -+	ensure_not_expanded reset base -- folder1 &&
     ++	# directory entry in the index, so the pathspec will not force the
     ++	# index to be expanded.
     ++	ensure_not_expanded reset deepest -- folder1 &&
     ++	ensure_not_expanded reset deepest -- folder1/ &&
     ++
     ++	# Wildcard identifies only in-cone files, no index expansion
     ++	ensure_not_expanded reset deepest -- deep/\* &&
     ++
     ++	# Wildcard identifies only full sparse directories, no index expansion
     ++	ensure_not_expanded reset deepest -- folder\* &&
      +
     -+	ensure_not_expanded reset --hard update-deep &&
       	ensure_not_expanded checkout -f update-deep &&
       	test_config -C sparse-index pull.twohead ort &&
       	(
 7:  aa963eefae7 ! 8:  c7145e039f3 unpack-trees: improve performance of next_cache_entry
     @@ Commit message
          beginning of the index each time it is called.
      
          The `cache_bottom` must be preserved for the sparse index (see 17a1bb570b
     -    (unpack-trees: preserve cache_bottom, 2021-07-14)).  Therefore, to retain
     -    the benefit `cache_bottom` provides in non-sparse index cases, a separate
     -    `hint` position indicates the first position `next_cache_entry` should
     -    search, updated each execution with a new position.  The performance of `git
     -    reset -- does-not-exist` (testing the "worst case" in which all entries in
     -    the index are unpacked with `next_cache_entry`) is significantly improved
     -    for the sparse index case:
     -
     -    Test          before            after
     -    ------------------------------------------------------
     -    (full-v3)     0.79(0.38+0.30)   0.91(0.43+0.34) +15.2%
     -    (full-v4)     0.80(0.38+0.29)   0.85(0.40+0.35) +6.2%
     -    (sparse-v3)   0.76(0.43+0.69)   0.44(0.08+0.67) -42.1%
     -    (sparse-v4)   0.71(0.40+0.65)   0.41(0.09+0.65) -42.3%
     +    (unpack-trees: preserve cache_bottom, 2021-07-14)). Therefore, to retain the
     +    benefit `cache_bottom` provides in non-sparse index cases, a separate `hint`
     +    position indicates the first position `next_cache_entry` should search,
     +    updated each execution with a new position.
      
          Signed-off-by: Victoria Dye <vdye@github.com>
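
As a rough illustration of the wildcard check that
`pathspec_needs_expanded_index` gains in the range-diff for patch 7 above, the
sketch below models the two per-sparse-directory conditions on plain strings.
It is not the code from the series: `wildcard_needs_expanded_index` and the
`sparse_dirs` array are hypothetical, the literal prefix length is computed
with `strcspn` rather than taken from `struct pathspec_item`, and POSIX
`fnmatch` stands in for git's `wildmatch`.

```c
#include <fnmatch.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical, simplified model: sparse_dirs holds the names of the
 * sparse directory entries in the index, each ending in '/'.
 */
static bool wildcard_needs_expanded_index(const char *pattern,
					  const char *const *sparse_dirs,
					  size_t nr)
{
	/* Length of the literal prefix before the first glob character. */
	size_t prefix_len = strcspn(pattern, "*?[\\");
	size_t i;

	for (i = 0; i < nr; i++) {
		const char *dir = sparse_dirs[i];
		size_t dir_len = strlen(dir);

		/* The literal prefix reaches inside this sparse directory. */
		if (prefix_len > dir_len && !strncmp(pattern, dir, dir_len))
			return true;

		/*
		 * The prefix is consistent with the directory name, but the
		 * pattern does not match the directory as a whole, so it may
		 * select only part of the directory's contents.
		 */
		if (!strncmp(pattern, dir, prefix_len) &&
		    fnmatch(pattern, dir, 0) != 0)
			return true;
	}
	return false;
}
```

With sparse directories `folder1/`, `folder2/` and `x/`, this model keeps the
index sparse for `deep/*` and `folder*` (the two wildcard cases added to
t1092) and asks for expansion for a pattern such as `folder1/0/*`.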
      

-- 
gitgitgadget


* [PATCH v3 1/8] reset: rename is_missing to !is_in_reset_tree
  2021-10-07 21:15   ` [PATCH v3 0/8] " Victoria Dye via GitGitGadget
@ 2021-10-07 21:15     ` Victoria Dye via GitGitGadget
  2021-10-07 21:15     ` [PATCH v3 2/8] reset: preserve skip-worktree bit in mixed reset Kevin Willford via GitGitGadget
                       ` (7 subsequent siblings)
  8 siblings, 0 replies; 85+ messages in thread
From: Victoria Dye via GitGitGadget @ 2021-10-07 21:15 UTC (permalink / raw)
  To: git
  Cc: stolee, gitster, newren, Taylor Blau, Bagas Sanjaya,
	Ævar Arnfjörð Bjarmason, Victoria Dye,
	Victoria Dye

From: Victoria Dye <vdye@github.com>

Rename and invert value of `is_missing` to `is_in_reset_tree` to make the
variable more descriptive of what it represents.

Signed-off-by: Victoria Dye <vdye@github.com>
---
 builtin/reset.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/builtin/reset.c b/builtin/reset.c
index 51c9e2f43ff..d3695ce43c4 100644
--- a/builtin/reset.c
+++ b/builtin/reset.c
@@ -131,10 +131,10 @@ static void update_index_from_diff(struct diff_queue_struct *q,
 
 	for (i = 0; i < q->nr; i++) {
 		struct diff_filespec *one = q->queue[i]->one;
-		int is_missing = !(one->mode && !is_null_oid(&one->oid));
+		int is_in_reset_tree = one->mode && !is_null_oid(&one->oid);
 		struct cache_entry *ce;
 
-		if (is_missing && !intent_to_add) {
+		if (!is_in_reset_tree && !intent_to_add) {
 			remove_file_from_cache(one->path);
 			continue;
 		}
@@ -144,7 +144,7 @@ static void update_index_from_diff(struct diff_queue_struct *q,
 		if (!ce)
 			die(_("make_cache_entry failed for path '%s'"),
 			    one->path);
-		if (is_missing) {
+		if (!is_in_reset_tree) {
 			ce->ce_flags |= CE_INTENT_TO_ADD;
 			set_object_name_for_intent_to_add_entry(ce);
 		}
-- 
gitgitgadget



* [PATCH v3 2/8] reset: preserve skip-worktree bit in mixed reset
  2021-10-07 21:15   ` [PATCH v3 0/8] " Victoria Dye via GitGitGadget
  2021-10-07 21:15     ` [PATCH v3 1/8] reset: rename is_missing to !is_in_reset_tree Victoria Dye via GitGitGadget
@ 2021-10-07 21:15     ` Kevin Willford via GitGitGadget
  2021-10-08  9:04       ` Junio C Hamano
  2021-10-07 21:15     ` [PATCH v3 3/8] update-index: add --force-full-index option for expand/collapse test Victoria Dye via GitGitGadget
                       ` (6 subsequent siblings)
  8 siblings, 1 reply; 85+ messages in thread
From: Kevin Willford via GitGitGadget @ 2021-10-07 21:15 UTC (permalink / raw)
  To: git
  Cc: stolee, gitster, newren, Taylor Blau, Bagas Sanjaya,
	Ævar Arnfjörð Bjarmason, Victoria Dye,
	Kevin Willford

From: Kevin Willford <kewillf@microsoft.com>

Change `update_index_from_diff` to set `skip-worktree` when applicable for
new index entries. When `git reset --mixed <tree-ish>` is run, entries in
the index with differences between the pre-reset HEAD and reset <tree-ish>
are identified and handled with `update_index_from_diff`. For each file, a
new cache entry is inserted into the index, created from the <tree-ish> side
of the reset (without changing the working tree). However, the newly-created
entry must have `skip-worktree` explicitly set in either of the following
scenarios:

1. the file is in the current index and has `skip-worktree` set
2. the file is not in the current index but is outside the sparse checkout
   definition

Not setting the `skip-worktree` bit leads to likely-undesirable results for
a user. It causes `skip-worktree` settings to disappear on the
"diff"-containing files (but *only* the diff-containing files), leading to
those files now showing modifications in `git status`. For example, when
running `git reset --mixed` in a sparse checkout, some file entries outside
of sparse checkout could show up as deleted, despite the user never deleting
anything (and not wanting them on-disk anyway).
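
The two scenarios listed above amount to a small decision function. The sketch
below is illustrative only: `new_entry_needs_skip_worktree` and its boolean
parameters are hypothetical stand-ins for the results of the
`cache_name_pos`, `ce_skip_worktree` and `path_in_sparse_checkout` calls used
in the diff, not code from this patch.

```c
#include <stdbool.h>

/*
 * Hypothetical helper mirroring the rule described in this message.
 *
 * pos: result of an index lookup (>= 0 means an entry already exists).
 * existing_has_skip_worktree: only meaningful when pos >= 0.
 * path_in_sparse_definition: only meaningful when pos < 0.
 */
static bool new_entry_needs_skip_worktree(int pos,
					  bool existing_has_skip_worktree,
					  bool path_in_sparse_definition)
{
	/* Scenario 1: keep the bit that the existing entry already has. */
	if (pos >= 0)
		return existing_has_skip_worktree;

	/* Scenario 2: a new entry outside the sparse definition. */
	return !path_in_sparse_definition;
}
```

An existing entry without `skip-worktree`, or a new entry inside the sparse
checkout definition, is left untouched under this rule.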

Additionally, add a test to `t7102` to ensure `skip-worktree` is preserved
in a basic `git reset --mixed` scenario and update a failure-documenting
test from 19a0acc (t1092: test interesting sparse-checkout scenarios,
2021-01-23) with new expected behavior.

Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Victoria Dye <vdye@github.com>
---
 builtin/reset.c                          | 14 ++++++++++++++
 t/t1092-sparse-checkout-compatibility.sh | 19 +++++--------------
 t/t7102-reset.sh                         | 17 +++++++++++++++++
 3 files changed, 36 insertions(+), 14 deletions(-)

diff --git a/builtin/reset.c b/builtin/reset.c
index d3695ce43c4..e441b6601b9 100644
--- a/builtin/reset.c
+++ b/builtin/reset.c
@@ -25,6 +25,7 @@
 #include "cache-tree.h"
 #include "submodule.h"
 #include "submodule-config.h"
+#include "dir.h"
 
 #define REFRESH_INDEX_DELAY_WARNING_IN_MS (2 * 1000)
 
@@ -130,6 +131,7 @@ static void update_index_from_diff(struct diff_queue_struct *q,
 	int intent_to_add = *(int *)data;
 
 	for (i = 0; i < q->nr; i++) {
+		int pos;
 		struct diff_filespec *one = q->queue[i]->one;
 		int is_in_reset_tree = one->mode && !is_null_oid(&one->oid);
 		struct cache_entry *ce;
@@ -141,6 +143,18 @@ static void update_index_from_diff(struct diff_queue_struct *q,
 
 		ce = make_cache_entry(&the_index, one->mode, &one->oid, one->path,
 				      0, 0);
+
+		/*
+		 * If the file 1) corresponds to an existing index entry with
+		 * skip-worktree set, or 2) does not exist in the index but is
+		 * outside the sparse checkout definition, add a skip-worktree bit
+		 * to the new index entry.
+		 */
+		pos = cache_name_pos(one->path, strlen(one->path));
+		if ((pos >= 0 && ce_skip_worktree(active_cache[pos])) ||
+		    (pos < 0 && !path_in_sparse_checkout(one->path, &the_index)))
+			ce->ce_flags |= CE_SKIP_WORKTREE;
+
 		if (!ce)
 			die(_("make_cache_entry failed for path '%s'"),
 			    one->path);
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 886e78715fe..889079f55b8 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -459,26 +459,17 @@ test_expect_failure 'blame with pathspec outside sparse definition' '
 	test_all_match git blame deep/deeper2/deepest/a
 '
 
-# NEEDSWORK: a sparse-checkout behaves differently from a full checkout
-# in this scenario, but it shouldn't.
-test_expect_failure 'checkout and reset (mixed)' '
+test_expect_success 'checkout and reset (mixed)' '
 	init_repos &&
 
 	test_all_match git checkout -b reset-test update-deep &&
 	test_all_match git reset deepest &&
-	test_all_match git reset update-folder1 &&
-	test_all_match git reset update-folder2
-'
-
-# NEEDSWORK: a sparse-checkout behaves differently from a full checkout
-# in this scenario, but it shouldn't.
-test_expect_success 'checkout and reset (mixed) [sparse]' '
-	init_repos &&
 
-	test_sparse_match git checkout -b reset-test update-deep &&
-	test_sparse_match git reset deepest &&
+	# Because skip-worktree is preserved, resetting to update-folder1
+	# will show worktree changes for full-checkout that are not present
+	# in sparse-checkout or sparse-index.
 	test_sparse_match git reset update-folder1 &&
-	test_sparse_match git reset update-folder2
+	run_on_sparse test_path_is_missing folder1
 '
 
 test_expect_success 'merge, cherry-pick, and rebase' '
diff --git a/t/t7102-reset.sh b/t/t7102-reset.sh
index 601b2bf97f0..d05426062ec 100755
--- a/t/t7102-reset.sh
+++ b/t/t7102-reset.sh
@@ -472,6 +472,23 @@ test_expect_success '--mixed refreshes the index' '
 	test_cmp expect output
 '
 
+test_expect_success '--mixed preserves skip-worktree' '
+	echo 123 >>file2 &&
+	git add file2 &&
+	git update-index --skip-worktree file2 &&
+	git reset --mixed HEAD >output &&
+	test_must_be_empty output &&
+
+	cat >expect <<-\EOF &&
+	Unstaged changes after reset:
+	M	file2
+	EOF
+	git update-index --no-skip-worktree file2 &&
+	git add file2 &&
+	git reset --mixed HEAD >output &&
+	test_cmp expect output
+'
+
 test_expect_success 'resetting specific path that is unmerged' '
 	git rm --cached file2 &&
 	F1=$(git rev-parse HEAD:file1) &&
-- 
gitgitgadget



* [PATCH v3 3/8] update-index: add --force-full-index option for expand/collapse test
  2021-10-07 21:15   ` [PATCH v3 0/8] " Victoria Dye via GitGitGadget
  2021-10-07 21:15     ` [PATCH v3 1/8] reset: rename is_missing to !is_in_reset_tree Victoria Dye via GitGitGadget
  2021-10-07 21:15     ` [PATCH v3 2/8] reset: preserve skip-worktree bit in mixed reset Kevin Willford via GitGitGadget
@ 2021-10-07 21:15     ` Victoria Dye via GitGitGadget
  2021-10-08  2:50       ` Bagas Sanjaya
  2021-10-08  5:24       ` Junio C Hamano
  2021-10-07 21:15     ` [PATCH v3 4/8] reset: expand test coverage for sparse checkouts Victoria Dye via GitGitGadget
                       ` (5 subsequent siblings)
  8 siblings, 2 replies; 85+ messages in thread
From: Victoria Dye via GitGitGadget @ 2021-10-07 21:15 UTC (permalink / raw)
  To: git
  Cc: stolee, gitster, newren, Taylor Blau, Bagas Sanjaya,
	Ævar Arnfjörð Bjarmason, Victoria Dye,
	Victoria Dye

From: Victoria Dye <vdye@github.com>

Add a new `--force-full-index` option to `git update-index`, which skips
explicitly setting `command_requires_full_index`. This option, intended for
internal testing purposes only, lets `git update-index` run as a
command without sparse index compatibility implemented, even after it
receives updates to otherwise use the sparse index.

The test that `--force-full-index` is intended for (`t1092 - sparse-index is
expanded and converted back`) verifies index compatibility in commands that
do not change the default (enabled)
`command_requires_full_index` repo setting. In the past, the test used `git
reset`. However, as `reset` and other commands are integrated with the
sparse index, the command used in the test would need to keep changing.
Conversely, the `--force-full-index` option makes `git update-index` behave
like a not-yet-sparse-aware command, and can be used in the test
indefinitely without interfering with future sparse index integrations.
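
The effect of the option can be pictured with a small model. This is a sketch
under stated assumptions, not the builtin's code: `settings_model`,
`model_prepare_settings` and `model_update_index_startup` are hypothetical
names, and `builtin_value` stands for whatever the command would normally
assign to `command_requires_full_index` (enabled in this patch; presumably
disabled once `update-index` becomes sparse-aware).

```c
#include <stdbool.h>

/* Hypothetical stand-in for the one repo setting discussed here. */
struct settings_model {
	bool command_requires_full_index;
};

/* Models the default described above: the setting starts out enabled. */
static void model_prepare_settings(struct settings_model *s)
{
	s->command_requires_full_index = true;
}

/*
 * Models the command's startup. With --force-full-index the explicit
 * assignment is skipped, so the enabled default is what the command
 * runs with, regardless of builtin_value.
 */
static void model_update_index_startup(struct settings_model *s,
				       bool force_full_index,
				       bool builtin_value)
{
	model_prepare_settings(s);
	if (!force_full_index)
		s->command_requires_full_index = builtin_value;
}
```

In this model, passing the flag always leaves the setting enabled, which is
the property the expand/collapse test relies on.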

Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Victoria Dye <vdye@github.com>
---
 Documentation/git-update-index.txt       |  5 +++++
 builtin/update-index.c                   | 11 +++++++++++
 t/t1092-sparse-checkout-compatibility.sh |  2 +-
 3 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-update-index.txt b/Documentation/git-update-index.txt
index 2853f168d97..06255e321a3 100644
--- a/Documentation/git-update-index.txt
+++ b/Documentation/git-update-index.txt
@@ -24,6 +24,7 @@ SYNOPSIS
 	     [--[no-]fsmonitor]
 	     [--really-refresh] [--unresolve] [--again | -g]
 	     [--info-only] [--index-info]
+	     [--force-full-index]
 	     [-z] [--stdin] [--index-version <n>]
 	     [--verbose]
 	     [--] [<file>...]
@@ -170,6 +171,10 @@ time. Version 4 is relatively young (first released in 1.8.0 in
 October 2012). Other Git implementations such as JGit and libgit2
 may not support it yet.
 
+--force-full-index::
+	Force the command to operate on a full index, expanding a sparse
+	index if necessary.
+
 -z::
 	Only meaningful with `--stdin` or `--index-info`; paths are
 	separated with NUL character instead of LF.
diff --git a/builtin/update-index.c b/builtin/update-index.c
index 187203e8bb5..32ada3ead77 100644
--- a/builtin/update-index.c
+++ b/builtin/update-index.c
@@ -964,6 +964,7 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
 	int split_index = -1;
 	int force_write = 0;
 	int fsmonitor = -1;
+	int use_default_full_index = 0;
 	struct lock_file lock_file = LOCK_INIT;
 	struct parse_opt_ctx_t ctx;
 	strbuf_getline_fn getline_fn;
@@ -1069,6 +1070,8 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
 		{OPTION_SET_INT, 0, "no-fsmonitor-valid", &mark_fsmonitor_only, NULL,
 			N_("clear fsmonitor valid bit"),
 			PARSE_OPT_NOARG | PARSE_OPT_NONEG, NULL, UNMARK_FLAG},
+		OPT_SET_INT(0, "force-full-index", &use_default_full_index,
+			N_("run with full index explicitly required"), 1),
 		OPT_END()
 	};
 
@@ -1082,6 +1085,14 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
 	if (newfd < 0)
 		lock_error = errno;
 
+	/*
+	 * If --force-full-index is set, the command should skip manually
+	 * setting `command_requires_full_index`.
+	 */
+	prepare_repo_settings(r);
+	if (!use_default_full_index)
+		r->settings.command_requires_full_index = 1;
+
 	entries = read_cache();
 	if (entries < 0)
 		die("cache corrupted");
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 889079f55b8..4aa4fef7b4f 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -635,7 +635,7 @@ test_expect_success 'sparse-index is expanded and converted back' '
 	init_repos &&
 
 	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
-		git -C sparse-index -c core.fsmonitor="" reset --hard &&
+		git -C sparse-index -c core.fsmonitor="" update-index --force-full-index &&
 	test_region index convert_to_sparse trace2.txt &&
 	test_region index ensure_full_index trace2.txt
 '
-- 
gitgitgadget



* [PATCH v3 4/8] reset: expand test coverage for sparse checkouts
  2021-10-07 21:15   ` [PATCH v3 0/8] " Victoria Dye via GitGitGadget
                       ` (2 preceding siblings ...)
  2021-10-07 21:15     ` [PATCH v3 3/8] update-index: add --force-full-index option for expand/collapse test Victoria Dye via GitGitGadget
@ 2021-10-07 21:15     ` Victoria Dye via GitGitGadget
  2021-10-07 21:15     ` [PATCH v3 5/8] reset: integrate with sparse index Victoria Dye via GitGitGadget
                       ` (4 subsequent siblings)
  8 siblings, 0 replies; 85+ messages in thread
From: Victoria Dye via GitGitGadget @ 2021-10-07 21:15 UTC (permalink / raw)
  To: git
  Cc: stolee, gitster, newren, Taylor Blau, Bagas Sanjaya,
	Ævar Arnfjörð Bjarmason, Victoria Dye,
	Victoria Dye

From: Victoria Dye <vdye@github.com>

Add new tests for `--merge` and `--keep` modes, as well as mixed reset with
pathspecs. New performance test cases exercise various execution paths for
`reset`.

Co-authored-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Victoria Dye <vdye@github.com>
---
 t/perf/p2000-sparse-operations.sh        |  3 +
 t/t1092-sparse-checkout-compatibility.sh | 84 ++++++++++++++++++++++++
 2 files changed, 87 insertions(+)

diff --git a/t/perf/p2000-sparse-operations.sh b/t/perf/p2000-sparse-operations.sh
index 597626276fb..bfd332120c8 100755
--- a/t/perf/p2000-sparse-operations.sh
+++ b/t/perf/p2000-sparse-operations.sh
@@ -110,5 +110,8 @@ test_perf_on_all git add -A
 test_perf_on_all git add .
 test_perf_on_all git commit -a -m A
 test_perf_on_all git checkout -f -
+test_perf_on_all git reset
+test_perf_on_all git reset --hard
+test_perf_on_all git reset -- does-not-exist
 
 test_done
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 4aa4fef7b4f..875cdcb0495 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -472,6 +472,90 @@ test_expect_success 'checkout and reset (mixed)' '
 	run_on_sparse test_path_is_missing folder1
 '
 
+test_expect_success 'checkout and reset (merge)' '
+	init_repos &&
+
+	write_script edit-contents <<-\EOF &&
+	echo text >>$1
+	EOF
+
+	test_all_match git checkout -b reset-test update-deep &&
+	run_on_all ../edit-contents a &&
+	test_all_match git reset --merge deepest &&
+	test_all_match git status --porcelain=v2 &&
+
+	test_all_match git reset --hard update-deep &&
+	run_on_all ../edit-contents deep/a &&
+	test_all_match test_must_fail git reset --merge deepest
+'
+
+test_expect_success 'checkout and reset (keep)' '
+	init_repos &&
+
+	write_script edit-contents <<-\EOF &&
+	echo text >>$1
+	EOF
+
+	test_all_match git checkout -b reset-test update-deep &&
+	run_on_all ../edit-contents a &&
+	test_all_match git reset --keep deepest &&
+	test_all_match git status --porcelain=v2 &&
+
+	test_all_match git reset --hard update-deep &&
+	run_on_all ../edit-contents deep/a &&
+	test_all_match test_must_fail git reset --keep deepest
+'
+
+test_expect_success 'reset with pathspecs inside sparse definition' '
+	init_repos &&
+
+	write_script edit-contents <<-\EOF &&
+	echo text >>$1
+	EOF
+
+	test_all_match git checkout -b reset-test update-deep &&
+	run_on_all ../edit-contents deep/a &&
+
+	test_all_match git reset base -- deep/a &&
+	test_all_match git status --porcelain=v2 &&
+
+	test_all_match git reset base -- nonexistent-file &&
+	test_all_match git status --porcelain=v2 &&
+
+	test_all_match git reset deepest -- deep &&
+	test_all_match git status --porcelain=v2
+'
+
+# Although the working tree differs between full and sparse checkouts after
+# reset, the state of the index is the same.
+test_expect_success 'reset with pathspecs outside sparse definition' '
+	init_repos &&
+	test_all_match git checkout -b reset-test base &&
+
+	test_sparse_match git reset update-folder1 -- folder1 &&
+	git -C full-checkout reset update-folder1 -- folder1 &&
+	test_sparse_match git status --porcelain=v2 &&
+	test_all_match git rev-parse HEAD:folder1 &&
+
+	test_sparse_match git reset update-folder2 -- folder2/a &&
+	git -C full-checkout reset update-folder2 -- folder2/a &&
+	test_sparse_match git status --porcelain=v2 &&
+	test_all_match git rev-parse HEAD:folder2/a
+'
+
+test_expect_success 'reset with wildcard pathspec' '
+	init_repos &&
+
+	test_all_match git checkout -b reset-test update-deep &&
+	test_all_match git reset base -- \*/a &&
+	test_all_match git status --porcelain=v2 &&
+	test_all_match git rev-parse HEAD:folder1/a &&
+
+	test_all_match git reset base -- folder\* &&
+	test_all_match git status --porcelain=v2 &&
+	test_all_match git rev-parse HEAD:folder2
+'
+
 test_expect_success 'merge, cherry-pick, and rebase' '
 	init_repos &&
 
-- 
gitgitgadget



* [PATCH v3 5/8] reset: integrate with sparse index
  2021-10-07 21:15   ` [PATCH v3 0/8] " Victoria Dye via GitGitGadget
                       ` (3 preceding siblings ...)
  2021-10-07 21:15     ` [PATCH v3 4/8] reset: expand test coverage for sparse checkouts Victoria Dye via GitGitGadget
@ 2021-10-07 21:15     ` Victoria Dye via GitGitGadget
  2021-10-07 21:15     ` [PATCH v3 6/8] reset: make sparse-aware (except --mixed) Victoria Dye via GitGitGadget
                       ` (3 subsequent siblings)
  8 siblings, 0 replies; 85+ messages in thread
From: Victoria Dye via GitGitGadget @ 2021-10-07 21:15 UTC (permalink / raw)
  To: git
  Cc: stolee, gitster, newren, Taylor Blau, Bagas Sanjaya,
	Ævar Arnfjörð Bjarmason, Victoria Dye,
	Victoria Dye

From: Victoria Dye <vdye@github.com>

Disable `command_requires_full_index` repo setting and add
`ensure_full_index` guards around code paths that cannot yet use sparse
directory index entries. `reset --soft` does not modify the index, so no
compatibility changes are needed for it to function without expanding the
index. For all other reset modes (`--mixed`, `--hard`, `--keep`, `--merge`),
the full index is expanded to prevent cache tree corruption and invalid
variable accesses.

Additionally, the `read_cache()` check verifying an uncorrupted index is
moved after argument parsing and preparing the repo settings. The index is
not used by the preceding argument handling, but `read_cache()` must be run
*after* enabling sparse index for the command (so that the index is not
expanded unnecessarily) and *before* using the index for reset (so that it
is verified as uncorrupted).

Signed-off-by: Victoria Dye <vdye@github.com>
---
 builtin/reset.c | 10 +++++++---
 cache-tree.c    |  1 +
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/builtin/reset.c b/builtin/reset.c
index e441b6601b9..0ac0de7dc97 100644
--- a/builtin/reset.c
+++ b/builtin/reset.c
@@ -180,6 +180,7 @@ static int read_from_tree(const struct pathspec *pathspec,
 	opt.flags.override_submodule_config = 1;
 	opt.repo = the_repository;
 
+	ensure_full_index(&the_index);
 	if (do_diff_cache(tree_oid, &opt))
 		return 1;
 	diffcore_std(&opt);
@@ -257,9 +258,6 @@ static void parse_args(struct pathspec *pathspec,
 	}
 	*rev_ret = rev;
 
-	if (read_cache() < 0)
-		die(_("index file corrupt"));
-
 	parse_pathspec(pathspec, 0,
 		       PATHSPEC_PREFER_FULL |
 		       (patch_mode ? PATHSPEC_PREFIX_ORIGIN : 0),
@@ -405,6 +403,12 @@ int cmd_reset(int argc, const char **argv, const char *prefix)
 	if (intent_to_add && reset_type != MIXED)
 		die(_("-N can only be used with --mixed"));
 
+	prepare_repo_settings(the_repository);
+	the_repository->settings.command_requires_full_index = 0;
+
+	if (read_cache() < 0)
+		die(_("index file corrupt"));
+
 	/* Soft reset does not touch the index file nor the working tree
 	 * at all, but requires them in a good order.  Other resets reset
 	 * the index file to the tree object we are switching to. */
diff --git a/cache-tree.c b/cache-tree.c
index 90919f9e345..9be19c85b66 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -776,6 +776,7 @@ void prime_cache_tree(struct repository *r,
 	cache_tree_free(&istate->cache_tree);
 	istate->cache_tree = cache_tree();
 
+	ensure_full_index(istate);
 	prime_cache_tree_rec(r, istate->cache_tree, tree);
 	istate->cache_changed |= CACHE_TREE_CHANGED;
 	trace2_region_leave("cache-tree", "prime_cache_tree", the_repository);
-- 
gitgitgadget



* [PATCH v3 6/8] reset: make sparse-aware (except --mixed)
  2021-10-07 21:15   ` [PATCH v3 0/8] " Victoria Dye via GitGitGadget
                       ` (4 preceding siblings ...)
  2021-10-07 21:15     ` [PATCH v3 5/8] reset: integrate with sparse index Victoria Dye via GitGitGadget
@ 2021-10-07 21:15     ` Victoria Dye via GitGitGadget
  2021-10-08 11:09       ` Phillip Wood
  2021-10-07 21:15     ` [PATCH v3 7/8] reset: make --mixed sparse-aware Victoria Dye via GitGitGadget
                       ` (2 subsequent siblings)
  8 siblings, 1 reply; 85+ messages in thread
From: Victoria Dye via GitGitGadget @ 2021-10-07 21:15 UTC (permalink / raw)
  To: git
  Cc: stolee, gitster, newren, Taylor Blau, Bagas Sanjaya,
	Ævar Arnfjörð Bjarmason, Victoria Dye,
	Victoria Dye

From: Victoria Dye <vdye@github.com>

Remove `ensure_full_index` guard on `prime_cache_tree` and update
`prime_cache_tree_rec` to correctly reconstruct sparse directory entries in
the cache tree. While processing a tree's entries, `prime_cache_tree_rec`
must determine whether a directory entry is sparse or not by searching for
it in the index (*without* expanding the index). If a matching sparse
directory index entry is found, no subtrees are added to the cache tree
entry and the entry count is set to 1 (representing the sparse directory
itself). Otherwise, the tree is assumed to not be sparse and its subtrees
are recursively added to the cache tree.

Helped-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Victoria Dye <vdye@github.com>
---
 cache-tree.c                             | 47 ++++++++++++++++++++++--
 cache.h                                  | 10 +++++
 read-cache.c                             | 27 ++++++++++----
 t/t1092-sparse-checkout-compatibility.sh | 15 +++++++-
 4 files changed, 86 insertions(+), 13 deletions(-)
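
As an illustration of the non-expanding lookup described in the commit
message above, here is a minimal sketch over a toy sorted array of names
rather than git's `struct index_state`. The names `toy_name_compare` and
`toy_entry_exists` are hypothetical and exist only for this example; the
real `index_entry_exists` goes through `index_name_stage_pos` and also
accounts for the entry stage, which this sketch ignores.

```c
#include <assert.h>
#include <string.h>

/* Compare two names bytewise; on a shared prefix the shorter name sorts first. */
static int toy_name_compare(const char *a, size_t alen,
			    const char *b, size_t blen)
{
	size_t min = alen < blen ? alen : blen;
	int cmp = memcmp(a, b, min);

	if (cmp)
		return cmp;
	if (alen < blen)
		return -1;
	if (alen > blen)
		return 1;
	return 0;
}

/*
 * Return 1 if `name` appears verbatim in the sorted array `names`, else 0.
 * Nothing is expanded: a path that would live inside a collapsed
 * directory entry such as "folder1/" is simply reported as absent.
 */
static int toy_entry_exists(const char *const *names, size_t nr,
			    const char *name, size_t namelen)
{
	size_t first = 0, last = nr;

	assert(names != NULL || nr == 0);
	while (first < last) {
		size_t mid = first + (last - first) / 2;
		int cmp = toy_name_compare(name, namelen,
					   names[mid], strlen(names[mid]));

		if (!cmp)
			return 1;
		if (cmp < 0)
			last = mid;
		else
			first = mid + 1;
	}
	return 0;
}
```

With a toy index of "a", "deep/a", "deep/deeper1/a", "folder1/" and
"folder2/", a lookup of "folder1/" succeeds, which is the signal that the
cache tree entry should be treated as a sparse directory. A lookup of
"folder1/a" or "deep/" fails, so recursion into the subtree would continue
as normal.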

diff --git a/cache-tree.c b/cache-tree.c
index 9be19c85b66..2866101052c 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -740,15 +740,26 @@ out:
 	return ret;
 }
 
+static void prime_cache_tree_sparse_dir(struct cache_tree *it,
+					struct tree *tree)
+{
+
+	oidcpy(&it->oid, &tree->object.oid);
+	it->entry_count = 1;
+}
+
 static void prime_cache_tree_rec(struct repository *r,
 				 struct cache_tree *it,
-				 struct tree *tree)
+				 struct tree *tree,
+				 struct strbuf *tree_path)
 {
 	struct tree_desc desc;
 	struct name_entry entry;
 	int cnt;
+	int base_path_len = tree_path->len;
 
 	oidcpy(&it->oid, &tree->object.oid);
+
 	init_tree_desc(&desc, tree->buffer, tree->size);
 	cnt = 0;
 	while (tree_entry(&desc, &entry)) {
@@ -757,14 +768,40 @@ static void prime_cache_tree_rec(struct repository *r,
 		else {
 			struct cache_tree_sub *sub;
 			struct tree *subtree = lookup_tree(r, &entry.oid);
+
 			if (!subtree->object.parsed)
 				parse_tree(subtree);
 			sub = cache_tree_sub(it, entry.path);
 			sub->cache_tree = cache_tree();
-			prime_cache_tree_rec(r, sub->cache_tree, subtree);
+
+			/*
+			 * Recursively-constructed subtree path is only needed when working
+			 * in a sparse index (where it's used to determine whether the
+			 * subtree is a sparse directory in the index).
+			 */
+			if (r->index->sparse_index) {
+				strbuf_setlen(tree_path, base_path_len);
+				strbuf_grow(tree_path, base_path_len + entry.pathlen + 1);
+				strbuf_add(tree_path, entry.path, entry.pathlen);
+				strbuf_addch(tree_path, '/');
+			}
+
+			/*
+			 * If a sparse index is in use, the directory being processed may be
+			 * sparse. To confirm that, we can check whether an entry with that
+			 * exact name exists in the index. If it does, the created subtree
+			 * should be sparse. Otherwise, cache tree expansion should continue
+			 * as normal.
+			 */
+			if (r->index->sparse_index &&
+			    index_entry_exists(r->index, tree_path->buf, tree_path->len))
+				prime_cache_tree_sparse_dir(sub->cache_tree, subtree);
+			else
+				prime_cache_tree_rec(r, sub->cache_tree, subtree, tree_path);
 			cnt += sub->cache_tree->entry_count;
 		}
 	}
+
 	it->entry_count = cnt;
 }
 
@@ -772,12 +809,14 @@ void prime_cache_tree(struct repository *r,
 		      struct index_state *istate,
 		      struct tree *tree)
 {
+	struct strbuf tree_path = STRBUF_INIT;
+
 	trace2_region_enter("cache-tree", "prime_cache_tree", the_repository);
 	cache_tree_free(&istate->cache_tree);
 	istate->cache_tree = cache_tree();
 
-	ensure_full_index(istate);
-	prime_cache_tree_rec(r, istate->cache_tree, tree);
+	prime_cache_tree_rec(r, istate->cache_tree, tree, &tree_path);
+	strbuf_release(&tree_path);
 	istate->cache_changed |= CACHE_TREE_CHANGED;
 	trace2_region_leave("cache-tree", "prime_cache_tree", the_repository);
 }
diff --git a/cache.h b/cache.h
index f6295f3b048..1d3e4665562 100644
--- a/cache.h
+++ b/cache.h
@@ -816,6 +816,16 @@ struct cache_entry *index_file_exists(struct index_state *istate, const char *na
  */
 int index_name_pos(struct index_state *, const char *name, int namelen);
 
+/*
+ * Determines whether an entry with the given name exists within the
+ * given index. The return value is 1 if an exact match is found, otherwise
+ * it is 0. Note that, unlike index_name_pos, this function does not expand
+ * the index if it is sparse. If an item exists within the full index but it
+ * is contained within a sparse directory (and not in the sparse index), 0 is
+ * returned.
+ */
+int index_entry_exists(struct index_state *, const char *name, int namelen);
+
 /*
  * Some functions return the negative complement of an insert position when a
  * precise match was not found but a position was found where the entry would
diff --git a/read-cache.c b/read-cache.c
index f5d4385c408..c079ece981a 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -68,6 +68,11 @@
  */
 #define CACHE_ENTRY_PATH_LENGTH 80
 
+enum index_search_mode {
+	NO_EXPAND_SPARSE = 0,
+	EXPAND_SPARSE = 1
+};
+
 static inline struct cache_entry *mem_pool__ce_alloc(struct mem_pool *mem_pool, size_t len)
 {
 	struct cache_entry *ce;
@@ -551,7 +556,10 @@ int cache_name_stage_compare(const char *name1, int len1, int stage1, const char
 	return 0;
 }
 
-static int index_name_stage_pos(struct index_state *istate, const char *name, int namelen, int stage)
+static int index_name_stage_pos(struct index_state *istate,
+				const char *name, int namelen,
+				int stage,
+				enum index_search_mode search_mode)
 {
 	int first, last;
 
@@ -570,7 +578,7 @@ static int index_name_stage_pos(struct index_state *istate, const char *name, in
 		first = next+1;
 	}
 
-	if (istate->sparse_index &&
+	if (search_mode == EXPAND_SPARSE && istate->sparse_index &&
 	    first > 0) {
 		/* Note: first <= istate->cache_nr */
 		struct cache_entry *ce = istate->cache[first - 1];
@@ -586,7 +594,7 @@ static int index_name_stage_pos(struct index_state *istate, const char *name, in
 		    ce_namelen(ce) < namelen &&
 		    !strncmp(name, ce->name, ce_namelen(ce))) {
 			ensure_full_index(istate);
-			return index_name_stage_pos(istate, name, namelen, stage);
+			return index_name_stage_pos(istate, name, namelen, stage, search_mode);
 		}
 	}
 
@@ -595,7 +603,12 @@ static int index_name_stage_pos(struct index_state *istate, const char *name, in
 
 int index_name_pos(struct index_state *istate, const char *name, int namelen)
 {
-	return index_name_stage_pos(istate, name, namelen, 0);
+	return index_name_stage_pos(istate, name, namelen, 0, EXPAND_SPARSE);
+}
+
+int index_entry_exists(struct index_state *istate, const char *name, int namelen)
+{
+	return index_name_stage_pos(istate, name, namelen, 0, NO_EXPAND_SPARSE) >= 0;
 }
 
 int remove_index_entry_at(struct index_state *istate, int pos)
@@ -1222,7 +1235,7 @@ static int has_dir_name(struct index_state *istate,
 			 */
 		}
 
-		pos = index_name_stage_pos(istate, name, len, stage);
+		pos = index_name_stage_pos(istate, name, len, stage, EXPAND_SPARSE);
 		if (pos >= 0) {
 			/*
 			 * Found one, but not so fast.  This could
@@ -1322,7 +1335,7 @@ static int add_index_entry_with_check(struct index_state *istate, struct cache_e
 		strcmp(ce->name, istate->cache[istate->cache_nr - 1]->name) > 0)
 		pos = index_pos_to_insert_pos(istate->cache_nr);
 	else
-		pos = index_name_stage_pos(istate, ce->name, ce_namelen(ce), ce_stage(ce));
+		pos = index_name_stage_pos(istate, ce->name, ce_namelen(ce), ce_stage(ce), EXPAND_SPARSE);
 
 	/* existing match? Just replace it. */
 	if (pos >= 0) {
@@ -1357,7 +1370,7 @@ static int add_index_entry_with_check(struct index_state *istate, struct cache_e
 		if (!ok_to_replace)
 			return error(_("'%s' appears as both a file and as a directory"),
 				     ce->name);
-		pos = index_name_stage_pos(istate, ce->name, ce_namelen(ce), ce_stage(ce));
+		pos = index_name_stage_pos(istate, ce->name, ce_namelen(ce), ce_stage(ce), EXPAND_SPARSE);
 		pos = -pos-1;
 	}
 	return pos + 1;
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 875cdcb0495..4ac93874cb2 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -756,9 +756,9 @@ test_expect_success 'sparse-index is not expanded' '
 	ensure_not_expanded checkout - &&
 	ensure_not_expanded switch rename-out-to-out &&
 	ensure_not_expanded switch - &&
-	git -C sparse-index reset --hard &&
+	ensure_not_expanded reset --hard &&
 	ensure_not_expanded checkout rename-out-to-out -- deep/deeper1 &&
-	git -C sparse-index reset --hard &&
+	ensure_not_expanded reset --hard &&
 	ensure_not_expanded restore -s rename-out-to-out -- deep/deeper1 &&
 
 	echo >>sparse-index/README.md &&
@@ -768,6 +768,17 @@ test_expect_success 'sparse-index is not expanded' '
 	echo >>sparse-index/untracked.txt &&
 	ensure_not_expanded add . &&
 
+	for ref in update-deep update-folder1 update-folder2 update-deep
+	do
+		echo >>sparse-index/README.md &&
+		ensure_not_expanded reset --hard $ref || return 1
+	done &&
+
+	ensure_not_expanded reset --hard update-deep &&
+	ensure_not_expanded reset --keep base &&
+	ensure_not_expanded reset --merge update-deep &&
+	ensure_not_expanded reset --hard &&
+
 	ensure_not_expanded checkout -f update-deep &&
 	test_config -C sparse-index pull.twohead ort &&
 	(
-- 
gitgitgadget



* [PATCH v3 7/8] reset: make --mixed sparse-aware
  2021-10-07 21:15   ` [PATCH v3 0/8] " Victoria Dye via GitGitGadget
                       ` (5 preceding siblings ...)
  2021-10-07 21:15     ` [PATCH v3 6/8] reset: make sparse-aware (except --mixed) Victoria Dye via GitGitGadget
@ 2021-10-07 21:15     ` Victoria Dye via GitGitGadget
  2021-10-07 21:15     ` [PATCH v3 8/8] unpack-trees: improve performance of next_cache_entry Victoria Dye via GitGitGadget
  2021-10-11 20:30     ` [PATCH v4 0/8] Sparse Index: integrate with reset Victoria Dye via GitGitGadget
  8 siblings, 0 replies; 85+ messages in thread
From: Victoria Dye via GitGitGadget @ 2021-10-07 21:15 UTC (permalink / raw)
  To: git
  Cc: stolee, gitster, newren, Taylor Blau, Bagas Sanjaya,
	Ævar Arnfjörð Bjarmason, Victoria Dye,
	Victoria Dye

From: Victoria Dye <vdye@github.com>

Remove the `ensure_full_index` guard on `read_from_tree` and update `git
reset --mixed` to ensure it can use sparse directory index entries wherever
possible. Sparse directory entries are reset using `diff_tree_oid`, which
requires `change` and `add_remove` functions to process the internal
contents of the sparse directory. The `recursive` diff option handles cases
in which `reset --mixed` must diff/merge files that are nested multiple
levels deep in a sparse directory.

The use of pathspecs with `git reset --mixed` introduces scenarios in which
internal contents of sparse directories may be matched by the pathspec. In
order to reset *all* files in the repo that may match the pathspec, the
following conditions on the pathspec require index expansion before
performing the reset:

* "magic" pathspecs
* wildcard pathspecs that do not match only in-cone files or entire sparse
  directories
* literal pathspecs matching something outside the sparse checkout
  definition

Helped-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Victoria Dye <vdye@github.com>
---
 builtin/reset.c                          | 78 +++++++++++++++++++++++-
 t/t1092-sparse-checkout-compatibility.sh | 17 ++++++
 2 files changed, 93 insertions(+), 2 deletions(-)
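
The wildcard condition listed in the commit message above can be sketched
in isolation. This is a simplified illustration, not the code in the
patch: it works on a plain array of sparse directory names, the function
name `toy_wildcard_needs_expansion` is hypothetical, and POSIX `fnmatch`
is swapped in for git's internal `wildmatch`, so edge-case matching
behavior may differ.

```c
#include <assert.h>
#include <fnmatch.h>
#include <string.h>

/*
 * Decide whether a wildcard pattern forces index expansion, given the
 * names of the sparse directory entries (each ending in '/').
 * `nowildcard_len` is the length of the literal prefix that precedes the
 * first wildcard character. Returns 1 if expansion is needed, 0 otherwise.
 */
static int toy_wildcard_needs_expansion(const char *pattern,
					size_t nowildcard_len,
					const char *const *sparse_dirs,
					size_t nr)
{
	size_t i;

	assert(pattern != NULL);
	for (i = 0; i < nr; i++) {
		const char *dir = sparse_dirs[i];
		size_t dirlen = strlen(dir);

		/* The literal prefix reaches inside this sparse directory. */
		if (nowildcard_len > dirlen && !strncmp(pattern, dir, dirlen))
			return 1;

		/*
		 * The literal prefix is consistent with the directory name,
		 * but the pattern does not match the directory as a whole.
		 */
		if (!strncmp(pattern, dir, nowildcard_len) &&
		    fnmatch(pattern, dir, 0) != 0)
			return 1;
	}
	return 0;
}
```

Assuming sparse directories "folder1/" and "folder2/", the patterns
"deep/*" and "folder*" need no expansion, matching the two wildcard cases
added to the `sparse-index is not expanded` test. A pattern such as "*/a"
could select a single file inside a sparse directory, so it reports that
expansion is needed.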

diff --git a/builtin/reset.c b/builtin/reset.c
index 0ac0de7dc97..60517e7e1d6 100644
--- a/builtin/reset.c
+++ b/builtin/reset.c
@@ -148,7 +148,9 @@ static void update_index_from_diff(struct diff_queue_struct *q,
 		 * If the file 1) corresponds to an existing index entry with
 		 * skip-worktree set, or 2) does not exist in the index but is
 		 * outside the sparse checkout definition, add a skip-worktree bit
-		 * to the new index entry.
+		 * to the new index entry. Note that a sparse index will be expanded
+		 * if this entry is outside the sparse cone - this is necessary
+		 * to properly construct the reset sparse directory.
 		 */
 		pos = cache_name_pos(one->path, strlen(one->path));
 		if ((pos >= 0 && ce_skip_worktree(active_cache[pos])) ||
@@ -166,6 +168,73 @@ static void update_index_from_diff(struct diff_queue_struct *q,
 	}
 }
 
+static int pathspec_needs_expanded_index(const struct pathspec *pathspec)
+{
+	unsigned int i, pos;
+	int res = 0;
+	char *skip_worktree_seen = NULL;
+
+	/*
+	 * When using a magic pathspec, assume for the sake of simplicity that
+	 * the index needs to be expanded to match all matchable files.
+	 */
+	if (pathspec->magic)
+		return 1;
+
+	for (i = 0; i < pathspec->nr; i++) {
+		struct pathspec_item item = pathspec->items[i];
+
+		/*
+		 * If the pathspec item has a wildcard, the index should be expanded
+		 * if the pathspec has the possibility of matching a subset of entries inside
+		 * of a sparse directory (but not the entire directory).
+		 *
+		 * If the pathspec item is a literal path, the index only needs to be expanded
+		 * if a) the pathspec isn't in the sparse checkout cone (to make sure we don't
+		 * expand for in-cone files) and b) it doesn't match any sparse directories
+		 * (since we can reset whole sparse directories without expanding them).
+		 */
+		if (item.nowildcard_len < item.len) {
+			for (pos = 0; pos < active_nr; pos++) {
+				struct cache_entry *ce = active_cache[pos];
+
+				if (!S_ISSPARSEDIR(ce->ce_mode))
+					continue;
+
+				/*
+				 * If the pre-wildcard length is longer than the sparse
+				 * directory name and the sparse directory is the first
+				 * component of the pathspec, need to expand the index.
+				 */
+				if (item.nowildcard_len > ce_namelen(ce) &&
+				    !strncmp(item.original, ce->name, ce_namelen(ce))) {
+					res = 1;
+					break;
+				}
+
+				/*
+				 * If the pre-wildcard length is shorter than the sparse
+				 * directory and the pathspec does not match the whole
+				 * directory, need to expand the index.
+				 */
+				if (!strncmp(item.original, ce->name, item.nowildcard_len) &&
+				    wildmatch(item.original, ce->name, 0)) {
+					res = 1;
+					break;
+				}
+			}
+		} else if (!path_in_cone_mode_sparse_checkout(item.original, &the_index) &&
+			   !matches_skip_worktree(pathspec, i, &skip_worktree_seen))
+			res = 1;
+
+		if (res > 0)
+			break;
+	}
+
+	free(skip_worktree_seen);
+	return res;
+}
+
 static int read_from_tree(const struct pathspec *pathspec,
 			  struct object_id *tree_oid,
 			  int intent_to_add)
@@ -178,9 +247,14 @@ static int read_from_tree(const struct pathspec *pathspec,
 	opt.format_callback = update_index_from_diff;
 	opt.format_callback_data = &intent_to_add;
 	opt.flags.override_submodule_config = 1;
+	opt.flags.recursive = 1;
 	opt.repo = the_repository;
+	opt.change = diff_change;
+	opt.add_remove = diff_addremove;
+
+	if (pathspec->nr && the_index.sparse_index && pathspec_needs_expanded_index(pathspec))
+		ensure_full_index(&the_index);
 
-	ensure_full_index(&the_index);
 	if (do_diff_cache(tree_oid, &opt))
 		return 1;
 	diffcore_std(&opt);
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 4ac93874cb2..c9343ff5b9c 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -774,11 +774,28 @@ test_expect_success 'sparse-index is not expanded' '
 		ensure_not_expanded reset --hard $ref || return 1
 	done &&
 
+	ensure_not_expanded reset --mixed base &&
 	ensure_not_expanded reset --hard update-deep &&
 	ensure_not_expanded reset --keep base &&
 	ensure_not_expanded reset --merge update-deep &&
 	ensure_not_expanded reset --hard &&
 
+	ensure_not_expanded reset base -- deep/a &&
+	ensure_not_expanded reset base -- nonexistent-file &&
+	ensure_not_expanded reset deepest -- deep &&
+
+	# Although folder1 is outside the sparse definition, it exists as a
+	# directory entry in the index, so the pathspec will not force the
+	# index to be expanded.
+	ensure_not_expanded reset deepest -- folder1 &&
+	ensure_not_expanded reset deepest -- folder1/ &&
+
+	# Wildcard identifies only in-cone files, no index expansion
+	ensure_not_expanded reset deepest -- deep/\* &&
+
+	# Wildcard identifies only full sparse directories, no index expansion
+	ensure_not_expanded reset deepest -- folder\* &&
+
 	ensure_not_expanded checkout -f update-deep &&
 	test_config -C sparse-index pull.twohead ort &&
 	(
-- 
gitgitgadget



* [PATCH v3 8/8] unpack-trees: improve performance of next_cache_entry
  2021-10-07 21:15   ` [PATCH v3 0/8] " Victoria Dye via GitGitGadget
                       ` (6 preceding siblings ...)
  2021-10-07 21:15     ` [PATCH v3 7/8] reset: make --mixed sparse-aware Victoria Dye via GitGitGadget
@ 2021-10-07 21:15     ` Victoria Dye via GitGitGadget
  2021-10-11 20:30     ` [PATCH v4 0/8] Sparse Index: integrate with reset Victoria Dye via GitGitGadget
  8 siblings, 0 replies; 85+ messages in thread
From: Victoria Dye via GitGitGadget @ 2021-10-07 21:15 UTC (permalink / raw)
  To: git
  Cc: stolee, gitster, newren, Taylor Blau, Bagas Sanjaya,
	Ævar Arnfjörð Bjarmason, Victoria Dye,
	Victoria Dye

From: Victoria Dye <vdye@github.com>

To find the first non-unpacked cache entry, `next_cache_entry` iterates
through the index, starting at `cache_bottom`. The performance of this in full
indexes is helped by `cache_bottom` advancing with each invocation of
`mark_ce_used` (called by `unpack_index_entry`). However, the presence of
sparse directories can prevent the `cache_bottom` from advancing in a sparse
index case, effectively forcing `next_cache_entry` to search from the
beginning of the index each time it is called.

The `cache_bottom` must be preserved for the sparse index (see 17a1bb570b
(unpack-trees: preserve cache_bottom, 2021-07-14)). Therefore, to retain the
benefit `cache_bottom` provides in non-sparse index cases, a separate `hint`
position records where `next_cache_entry` should start its next search and is
updated on every call.

Signed-off-by: Victoria Dye <vdye@github.com>
---
 unpack-trees.c | 23 +++++++++++++++++------
 1 file changed, 17 insertions(+), 6 deletions(-)
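
To illustrate the effect of the `hint` position described in the commit
message above, here is a minimal standalone sketch. The `toy_entry` and
`toy_index` types and the `toy_next_entry` function are hypothetical
stand-ins for `struct cache_entry`, `struct index_state` and
`next_cache_entry`; the `unpacked` field plays the role of the
`CE_UNPACKED` flag.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for git's structures; all names are hypothetical. */
struct toy_entry {
	const char *name;
	int unpacked;		/* plays the role of CE_UNPACKED */
};

struct toy_index {
	struct toy_entry *cache;
	int cache_nr;
	int cache_bottom;	/* may fail to advance in a sparse index */
};

/*
 * Return the first entry at or after max(cache_bottom, *hint) that has
 * not been unpacked, and record in *hint where the next search should
 * begin. Returns NULL once no such entry remains.
 */
static struct toy_entry *toy_next_entry(struct toy_index *index, int *hint)
{
	int pos = index->cache_bottom;

	assert(hint != NULL);
	if (*hint > pos)
		pos = *hint;

	while (pos < index->cache_nr) {
		struct toy_entry *ce = &index->cache[pos];

		if (!ce->unpacked) {
			*hint = pos + 1;
			return ce;
		}
		pos++;
	}

	*hint = pos;
	return NULL;
}
```

Even if `cache_bottom` stays at 0 for the whole loop, each call resumes
just past the previously returned entry, so a sequence of calls walks the
array once instead of rescanning it from the start every time. Callers
are expected to initialize the hint to -1 before the loop, as the patch
does.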

diff --git a/unpack-trees.c b/unpack-trees.c
index 8ea0a542da8..b94733de6be 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -645,17 +645,24 @@ static void mark_ce_used_same_name(struct cache_entry *ce,
 	}
 }
 
-static struct cache_entry *next_cache_entry(struct unpack_trees_options *o)
+static struct cache_entry *next_cache_entry(struct unpack_trees_options *o, int *hint)
 {
 	const struct index_state *index = o->src_index;
 	int pos = o->cache_bottom;
 
+	if (*hint > pos)
+		pos = *hint;
+
 	while (pos < index->cache_nr) {
 		struct cache_entry *ce = index->cache[pos];
-		if (!(ce->ce_flags & CE_UNPACKED))
+		if (!(ce->ce_flags & CE_UNPACKED)) {
+			*hint = pos + 1;
 			return ce;
+		}
 		pos++;
 	}
+
+	*hint = pos;
 	return NULL;
 }
 
@@ -1365,12 +1372,13 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
 
 	/* Are we supposed to look at the index too? */
 	if (o->merge) {
+		int hint = -1;
 		while (1) {
 			int cmp;
 			struct cache_entry *ce;
 
 			if (o->diff_index_cached)
-				ce = next_cache_entry(o);
+				ce = next_cache_entry(o, &hint);
 			else
 				ce = find_cache_entry(info, p);
 
@@ -1690,7 +1698,7 @@ static int verify_absent(const struct cache_entry *,
 int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options *o)
 {
 	struct repository *repo = the_repository;
-	int i, ret;
+	int i, hint, ret;
 	static struct cache_entry *dfc;
 	struct pattern_list pl;
 	int free_pattern_list = 0;
@@ -1763,13 +1771,15 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
 		info.pathspec = o->pathspec;
 
 		if (o->prefix) {
+			hint = -1;
+
 			/*
 			 * Unpack existing index entries that sort before the
 			 * prefix the tree is spliced into.  Note that o->merge
 			 * is always true in this case.
 			 */
 			while (1) {
-				struct cache_entry *ce = next_cache_entry(o);
+				struct cache_entry *ce = next_cache_entry(o, &hint);
 				if (!ce)
 					break;
 				if (ce_in_traverse_path(ce, &info))
@@ -1790,8 +1800,9 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
 
 	/* Any left-over entries in the index? */
 	if (o->merge) {
+		hint = -1;
 		while (1) {
-			struct cache_entry *ce = next_cache_entry(o);
+			struct cache_entry *ce = next_cache_entry(o, &hint);
 			if (!ce)
 				break;
 			if (unpack_index_entry(ce, o) < 0)
-- 
gitgitgadget


* Re: [PATCH v3 3/8] update-index: add --force-full-index option for expand/collapse test
  2021-10-07 21:15     ` [PATCH v3 3/8] update-index: add --force-full-index option for expand/collapse test Victoria Dye via GitGitGadget
@ 2021-10-08  2:50       ` Bagas Sanjaya
  2021-10-08  5:24       ` Junio C Hamano
  1 sibling, 0 replies; 85+ messages in thread
From: Bagas Sanjaya @ 2021-10-08  2:50 UTC (permalink / raw)
  To: Victoria Dye via GitGitGadget, git
  Cc: stolee, gitster, newren, Taylor Blau,
	Ævar Arnfjörð Bjarmason, Victoria Dye

On 08/10/21 04.15, Victoria Dye via GitGitGadget wrote:
> From: Victoria Dye <vdye@github.com>
> 
> Add a new `--force-full-index` option to `git update-index`, which skips
> explicitly setting `command_requires_full_index`. This option, intended for
> use in internal testing purposes only, lets `git update-index` run as a
> command without sparse index compatibility implemented, even after it
> receives updates to otherwise use the sparse index.
> 
> The specific test `--force-full-index` is intended for - `t1092 -
> sparse-index is expanded and converted back` - verifies index compatibility
> in commands that do not change the default (enabled)
> `command_requires_full_index` repo setting. In the past, the test used `git
> reset`. However, as `reset` and other commands are integrated with the
> sparse index, the command used in the test would need to keep changing.
> Conversely, the `--force-full-index` option makes `git update-index` behave
> like a not-yet-sparse-aware command, and can be used in the test
> indefinitely without interfering with future sparse index integrations.
> 
> Helped-by: Junio C Hamano <gitster@pobox.com>
> Signed-off-by: Victoria Dye <vdye@github.com>

Grammar looks OK.

Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>

-- 
An old man doll... just what I always wanted! - Clara


* Re: [PATCH v2 2/7] update-index: add --force-full-index option for expand/collapse test
  2021-10-06 20:40       ` Victoria Dye
@ 2021-10-08  3:42         ` Elijah Newren
  2021-10-08 17:11           ` Junio C Hamano
  0 siblings, 1 reply; 85+ messages in thread
From: Elijah Newren @ 2021-10-08  3:42 UTC (permalink / raw)
  To: Victoria Dye
  Cc: Victoria Dye via GitGitGadget, Git Mailing List, Derrick Stolee,
	Junio C Hamano, Taylor Blau, Bagas Sanjaya

On Wed, Oct 6, 2021 at 1:40 PM Victoria Dye <vdye@github.com> wrote:
>
> Elijah Newren wrote:
> > On Tue, Oct 5, 2021 at 6:20 AM Victoria Dye via GitGitGadget
> > <gitgitgadget@gmail.com> wrote:
> >>
> >> From: Victoria Dye <vdye@github.com>
> >>
> >> Add a new `--force-full-index` option to `git update-index`, which skips
> >> explicitly setting `command_requires_full_index`. This lets `git
> >> update-index --force-full-index` run as a command without sparse index
> >> compatibility implemented, even after it receives sparse index compatibility
> >> updates.
> >>
> >> By using `git update-index --force-full-index` in the `t1092` test
> >> `sparse-index is expanded and converted back`, commands can continue to
> >> integrate with the sparse index without the need to keep modifying the
> >> command used in the test.
> >
> > So...we're adding a permanent user-facing command line flag, whose
> > purpose is just to help us with the transition work of implementing
> > sparse indexes everywhere?  Am I reading that right, or is that just
> > the reason for t1092 and there are more reasons for it elsewhere?
> >
> > Also, I'm curious if update-index is the right place to add this.  If
> > you don't want a sparse index anymore, wouldn't a user want to run
> >    git sparse-checkout disable
> > ?  Or is the point that you do want to keep the sparse checkout, but
> > you just don't want the index to also be sparse?  Still, even in that
> > case, it seems like adding a subcommand or flag to an existing
> > sparse-checkout subcommand would feel more natural, since
> > sparse-checkout is the command the user uses to request to get into a
> > sparse-checkout and sparse index.
> >
>
> This came out of a conversation [1] on an earlier version of this patch.
> Because the `t1092 - sparse-index is expanded and converted back` test
> verifies sparse index compatibility (i.e., expand the index when reading,
> collapse back to sparse when writing) on commands that don't have any sparse
> index integration, it needed to be changed from `git reset` to something
> else. However, as we keep integrating commands with sparse index we'd need
> to keep changing the command in the test, creating a bunch of patches doing
> effectively the same thing for no long-term benefit.
>
> The `--force-full-index` flag isn't meant to be used externally or modify
> the index in any "new" way - it's really just a "test" version of `git
> update-index` that we guarantee will accurately represent a command using
> the default settings. Right now, it does exactly what `git update-index`
> (without the flag) does, and will only behave differently once `git
> update-index` is integrated with sparse index. Using `--force-full-index`,
> the test won't need to be regularly updated and will continue to catch
> errors like:
>
> 1. Changing the default value of `command_requires_full_index` to 0
> 2. Not expanding a sparse index to full when `command_requires_full_index`
>    is 1
> 3. Not collapsing the index back to sparse if sparse index is enabled
>
> I see the issue of introducing a test-only option (when sparse index is
> integrated everywhere, shouldn't it be deprecated?). If there's a way to
> make this more obviously internal/temporary, I'm happy to modify it. Or, if
> semi-frequent updates of the command in the test aren't a huge issue, I can
> revert to V1.

If it's a test-only capability you need, I'd say add it under
t/helper/ somewhere, either as a new flag for an existing subcommand of
test-tool, or as a new subcommand for test-tool.

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH v3 3/8] update-index: add --force-full-index option for expand/collapse test
  2021-10-07 21:15     ` [PATCH v3 3/8] update-index: add --force-full-index option for expand/collapse test Victoria Dye via GitGitGadget
  2021-10-08  2:50       ` Bagas Sanjaya
@ 2021-10-08  5:24       ` Junio C Hamano
  2021-10-08 15:47         ` Victoria Dye
  1 sibling, 1 reply; 85+ messages in thread
From: Junio C Hamano @ 2021-10-08  5:24 UTC (permalink / raw)
  To: Victoria Dye via GitGitGadget
  Cc: git, stolee, newren, Taylor Blau, Bagas Sanjaya,
	Ævar Arnfjörð Bjarmason, Victoria Dye

"Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes:

> +	/*
> +	 * If --force-full-index is set, the command should skip manually
> +	 * setting `command_requires_full_index`.
> +	 */

Hmph, doesn't that feel unnaturally backwards, though?

The settings.command_requires_full_index bit forces read-cache to
call ensure_full_index() immediately after the in-core index is read
from the disk.  If we are forcing operating on the full index, I'd
imagine that we'd be making sure that ensure_full_index() is
called.

I do not see anything in the code that ensures active_cache_changed
is flipped on.  So the new test that says

    git -C sparse-index -c core.fsmonitor="" update-index --force-full-index

may not call ensure_full_index(), but because nothing marks
the_index as changed, I think we won't call write_locked_index() at
the end of cmd_update_index().  IOW, what we have in the test patch
may be an expensive noop, no?

Or perhaps I am reading the patch completely incorrectly.  I dunno.

> +	prepare_repo_settings(r);
> +	if (!use_default_full_index)
> +		r->settings.command_requires_full_index = 1;
> +
>  	entries = read_cache();
>  	if (entries < 0)
>  		die("cache corrupted");
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index 889079f55b8..4aa4fef7b4f 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -635,7 +635,7 @@ test_expect_success 'sparse-index is expanded and converted back' '
>  	init_repos &&
>  
>  	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
> -		git -C sparse-index -c core.fsmonitor="" reset --hard &&
> +		git -C sparse-index -c core.fsmonitor="" update-index --force-full-index &&
>  	test_region index convert_to_sparse trace2.txt &&
>  	test_region index ensure_full_index trace2.txt
>  '

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH v3 2/8] reset: preserve skip-worktree bit in mixed reset
  2021-10-07 21:15     ` [PATCH v3 2/8] reset: preserve skip-worktree bit in mixed reset Kevin Willford via GitGitGadget
@ 2021-10-08  9:04       ` Junio C Hamano
  0 siblings, 0 replies; 85+ messages in thread
From: Junio C Hamano @ 2021-10-08  9:04 UTC (permalink / raw)
  To: Kevin Willford via GitGitGadget
  Cc: git, stolee, newren, Taylor Blau, Bagas Sanjaya,
	Ævar Arnfjörð Bjarmason, Victoria Dye,
	Kevin Willford

"Kevin Willford via GitGitGadget" <gitgitgadget@gmail.com> writes:

> @@ -141,6 +143,18 @@ static void update_index_from_diff(struct diff_queue_struct *q,
>  
>  		ce = make_cache_entry(&the_index, one->mode, &one->oid, one->path,
>  				      0, 0);
> +
> +		/*
> +		 * If the file 1) corresponds to an existing index entry with
> +		 * skip-worktree set, or 2) does not exist in the index but is
> +		 * outside the sparse checkout definition, add a skip-worktree bit
> +		 * to the new index entry.
> +		 */
> +		pos = cache_name_pos(one->path, strlen(one->path));
> +		if ((pos >= 0 && ce_skip_worktree(active_cache[pos])) ||
> +		    (pos < 0 && !path_in_sparse_checkout(one->path, &the_index)))
> +			ce->ce_flags |= CE_SKIP_WORKTREE;

OK.  Nicely explained.
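
The condition in the hunk above can be read as a two-case truth table.
The sketch below restates it as a standalone function so each case is
easy to check in isolation. It is only an illustration:
toy_should_skip_worktree() and its parameters are hypothetical names
and are not part of git.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical restatement of the condition added to
 * update_index_from_diff().
 *
 *   pos                - result of the index lookup (>= 0 if found)
 *   existing_skip      - skip-worktree bit of the existing entry; only
 *                        meaningful when pos >= 0
 *   in_sparse_checkout - whether the path matches the sparse-checkout
 *                        definition
 */
bool toy_should_skip_worktree(int pos, bool existing_skip,
			      bool in_sparse_checkout)
{
	/* There is no existing entry to carry a bit when pos < 0. */
	assert(pos >= 0 || !existing_skip);

	if (pos >= 0)
		return existing_skip;   /* case 1: keep the existing bit */
	return !in_sparse_checkout;     /* case 2: new entry outside the definition */
}
```

An entry that already exists without skip-worktree never gains the bit,
even if its path is outside the sparse-checkout definition.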

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH v3 6/8] reset: make sparse-aware (except --mixed)
  2021-10-07 21:15     ` [PATCH v3 6/8] reset: make sparse-aware (except --mixed) Victoria Dye via GitGitGadget
@ 2021-10-08 11:09       ` Phillip Wood
  2021-10-08 17:14         ` Victoria Dye
  0 siblings, 1 reply; 85+ messages in thread
From: Phillip Wood @ 2021-10-08 11:09 UTC (permalink / raw)
  To: Victoria Dye via GitGitGadget, git
  Cc: stolee, gitster, newren, Taylor Blau, Bagas Sanjaya,
	Ævar Arnfjörð Bjarmason, Victoria Dye

Hi Victoria

On 07/10/2021 22:15, Victoria Dye via GitGitGadget wrote:
> From: Victoria Dye <vdye@github.com>
> 
> Remove `ensure_full_index` guard on `prime_cache_tree` and update
> `prime_cache_tree_rec` to correctly reconstruct sparse directory entries in
> the cache tree. While processing a tree's entries, `prime_cache_tree_rec`
> must determine whether a directory entry is sparse or not by searching for
> it in the index (*without* expanding the index). If a matching sparse
> directory index entry is found, no subtrees are added to the cache tree
> entry and the entry count is set to 1 (representing the sparse directory
> itself). Otherwise, the tree is assumed to not be sparse and its subtrees
> are recursively added to the cache tree.

I was looking at the callers to prime_cache_tree() this morning and 
would like to suggest an alternative approach - just delete 
prime_cache_tree() and all of its callers! As far as I can see it is 
only ever called after a successful call to unpack_trees() and since 
52fca2184d ("unpack-trees: populate cache-tree on successful merge", 
2015-07-28) unpack_trees() updates the cache tree for the caller. All 
the call sites are pretty obvious apart from the one in 
t/helper/test-fast-rebase.c, where unpack_trees() is called by 
merge_switch_to_result().

Best Wishes

Phillip

> Helped-by: Elijah Newren <newren@gmail.com>
> Signed-off-by: Victoria Dye <vdye@github.com>
> ---
>   cache-tree.c                             | 47 ++++++++++++++++++++++--
>   cache.h                                  | 10 +++++
>   read-cache.c                             | 27 ++++++++++----
>   t/t1092-sparse-checkout-compatibility.sh | 15 +++++++-
>   4 files changed, 86 insertions(+), 13 deletions(-)
> 
> diff --git a/cache-tree.c b/cache-tree.c
> index 9be19c85b66..2866101052c 100644
> --- a/cache-tree.c
> +++ b/cache-tree.c
> @@ -740,15 +740,26 @@ out:
>   	return ret;
>   }
>   
> +static void prime_cache_tree_sparse_dir(struct cache_tree *it,
> +					struct tree *tree)
> +{
> +
> +	oidcpy(&it->oid, &tree->object.oid);
> +	it->entry_count = 1;
> +}
> +
>   static void prime_cache_tree_rec(struct repository *r,
>   				 struct cache_tree *it,
> -				 struct tree *tree)
> +				 struct tree *tree,
> +				 struct strbuf *tree_path)
>   {
>   	struct tree_desc desc;
>   	struct name_entry entry;
>   	int cnt;
> +	int base_path_len = tree_path->len;
>   
>   	oidcpy(&it->oid, &tree->object.oid);
> +
>   	init_tree_desc(&desc, tree->buffer, tree->size);
>   	cnt = 0;
>   	while (tree_entry(&desc, &entry)) {
> @@ -757,14 +768,40 @@ static void prime_cache_tree_rec(struct repository *r,
>   		else {
>   			struct cache_tree_sub *sub;
>   			struct tree *subtree = lookup_tree(r, &entry.oid);
> +
>   			if (!subtree->object.parsed)
>   				parse_tree(subtree);
>   			sub = cache_tree_sub(it, entry.path);
>   			sub->cache_tree = cache_tree();
> -			prime_cache_tree_rec(r, sub->cache_tree, subtree);
> +
> +			/*
> +			 * Recursively-constructed subtree path is only needed when working
> +			 * in a sparse index (where it's used to determine whether the
> +			 * subtree is a sparse directory in the index).
> +			 */
> +			if (r->index->sparse_index) {
> +				strbuf_setlen(tree_path, base_path_len);
> +				strbuf_grow(tree_path, base_path_len + entry.pathlen + 1);
> +				strbuf_add(tree_path, entry.path, entry.pathlen);
> +				strbuf_addch(tree_path, '/');
> +			}
> +
> +			/*
> +			 * If a sparse index is in use, the directory being processed may be
> +			 * sparse. To confirm that, we can check whether an entry with that
> +			 * exact name exists in the index. If it does, the created subtree
> +			 * should be sparse. Otherwise, cache tree expansion should continue
> +			 * as normal.
> +			 */
> +			if (r->index->sparse_index &&
> +			    index_entry_exists(r->index, tree_path->buf, tree_path->len))
> +				prime_cache_tree_sparse_dir(sub->cache_tree, subtree);
> +			else
> +				prime_cache_tree_rec(r, sub->cache_tree, subtree, tree_path);
>   			cnt += sub->cache_tree->entry_count;
>   		}
>   	}
> +
>   	it->entry_count = cnt;
>   }
>   
> @@ -772,12 +809,14 @@ void prime_cache_tree(struct repository *r,
>   		      struct index_state *istate,
>   		      struct tree *tree)
>   {
> +	struct strbuf tree_path = STRBUF_INIT;
> +
>   	trace2_region_enter("cache-tree", "prime_cache_tree", the_repository);
>   	cache_tree_free(&istate->cache_tree);
>   	istate->cache_tree = cache_tree();
>   
> -	ensure_full_index(istate);
> -	prime_cache_tree_rec(r, istate->cache_tree, tree);
> +	prime_cache_tree_rec(r, istate->cache_tree, tree, &tree_path);
> +	strbuf_release(&tree_path);
>   	istate->cache_changed |= CACHE_TREE_CHANGED;
>   	trace2_region_leave("cache-tree", "prime_cache_tree", the_repository);
>   }
> diff --git a/cache.h b/cache.h
> index f6295f3b048..1d3e4665562 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -816,6 +816,16 @@ struct cache_entry *index_file_exists(struct index_state *istate, const char *na
>    */
>   int index_name_pos(struct index_state *, const char *name, int namelen);
>   
> +/*
> + * Determines whether an entry with the given name exists within the
> + * given index. The return value is 1 if an exact match is found, otherwise
> + * it is 0. Note that, unlike index_name_pos, this function does not expand
> + * the index if it is sparse. If an item exists within the full index but it
> + * is contained within a sparse directory (and not in the sparse index), 0 is
> + * returned.
> + */
> +int index_entry_exists(struct index_state *, const char *name, int namelen);
> +
>   /*
>    * Some functions return the negative complement of an insert position when a
>    * precise match was not found but a position was found where the entry would
> diff --git a/read-cache.c b/read-cache.c
> index f5d4385c408..c079ece981a 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -68,6 +68,11 @@
>    */
>   #define CACHE_ENTRY_PATH_LENGTH 80
>   
> +enum index_search_mode {
> +	NO_EXPAND_SPARSE = 0,
> +	EXPAND_SPARSE = 1
> +};
> +
>   static inline struct cache_entry *mem_pool__ce_alloc(struct mem_pool *mem_pool, size_t len)
>   {
>   	struct cache_entry *ce;
> @@ -551,7 +556,10 @@ int cache_name_stage_compare(const char *name1, int len1, int stage1, const char
>   	return 0;
>   }
>   
> -static int index_name_stage_pos(struct index_state *istate, const char *name, int namelen, int stage)
> +static int index_name_stage_pos(struct index_state *istate,
> +				const char *name, int namelen,
> +				int stage,
> +				enum index_search_mode search_mode)
>   {
>   	int first, last;
>   
> @@ -570,7 +578,7 @@ static int index_name_stage_pos(struct index_state *istate, const char *name, in
>   		first = next+1;
>   	}
>   
> -	if (istate->sparse_index &&
> +	if (search_mode == EXPAND_SPARSE && istate->sparse_index &&
>   	    first > 0) {
>   		/* Note: first <= istate->cache_nr */
>   		struct cache_entry *ce = istate->cache[first - 1];
> @@ -586,7 +594,7 @@ static int index_name_stage_pos(struct index_state *istate, const char *name, in
>   		    ce_namelen(ce) < namelen &&
>   		    !strncmp(name, ce->name, ce_namelen(ce))) {
>   			ensure_full_index(istate);
> -			return index_name_stage_pos(istate, name, namelen, stage);
> +			return index_name_stage_pos(istate, name, namelen, stage, search_mode);
>   		}
>   	}
>   
> @@ -595,7 +603,12 @@ static int index_name_stage_pos(struct index_state *istate, const char *name, in
>   
>   int index_name_pos(struct index_state *istate, const char *name, int namelen)
>   {
> -	return index_name_stage_pos(istate, name, namelen, 0);
> +	return index_name_stage_pos(istate, name, namelen, 0, EXPAND_SPARSE);
> +}
> +
> +int index_entry_exists(struct index_state *istate, const char *name, int namelen)
> +{
> +	return index_name_stage_pos(istate, name, namelen, 0, NO_EXPAND_SPARSE) >= 0;
>   }
>   
>   int remove_index_entry_at(struct index_state *istate, int pos)
> @@ -1222,7 +1235,7 @@ static int has_dir_name(struct index_state *istate,
>   			 */
>   		}
>   
> -		pos = index_name_stage_pos(istate, name, len, stage);
> +		pos = index_name_stage_pos(istate, name, len, stage, EXPAND_SPARSE);
>   		if (pos >= 0) {
>   			/*
>   			 * Found one, but not so fast.  This could
> @@ -1322,7 +1335,7 @@ static int add_index_entry_with_check(struct index_state *istate, struct cache_e
>   		strcmp(ce->name, istate->cache[istate->cache_nr - 1]->name) > 0)
>   		pos = index_pos_to_insert_pos(istate->cache_nr);
>   	else
> -		pos = index_name_stage_pos(istate, ce->name, ce_namelen(ce), ce_stage(ce));
> +		pos = index_name_stage_pos(istate, ce->name, ce_namelen(ce), ce_stage(ce), EXPAND_SPARSE);
>   
>   	/* existing match? Just replace it. */
>   	if (pos >= 0) {
> @@ -1357,7 +1370,7 @@ static int add_index_entry_with_check(struct index_state *istate, struct cache_e
>   		if (!ok_to_replace)
>   			return error(_("'%s' appears as both a file and as a directory"),
>   				     ce->name);
> -		pos = index_name_stage_pos(istate, ce->name, ce_namelen(ce), ce_stage(ce));
> +		pos = index_name_stage_pos(istate, ce->name, ce_namelen(ce), ce_stage(ce), EXPAND_SPARSE);
>   		pos = -pos-1;
>   	}
>   	return pos + 1;
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index 875cdcb0495..4ac93874cb2 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -756,9 +756,9 @@ test_expect_success 'sparse-index is not expanded' '
>   	ensure_not_expanded checkout - &&
>   	ensure_not_expanded switch rename-out-to-out &&
>   	ensure_not_expanded switch - &&
> -	git -C sparse-index reset --hard &&
> +	ensure_not_expanded reset --hard &&
>   	ensure_not_expanded checkout rename-out-to-out -- deep/deeper1 &&
> -	git -C sparse-index reset --hard &&
> +	ensure_not_expanded reset --hard &&
>   	ensure_not_expanded restore -s rename-out-to-out -- deep/deeper1 &&
>   
>   	echo >>sparse-index/README.md &&
> @@ -768,6 +768,17 @@ test_expect_success 'sparse-index is not expanded' '
>   	echo >>sparse-index/untracked.txt &&
>   	ensure_not_expanded add . &&
>   
> +	for ref in update-deep update-folder1 update-folder2 update-deep
> +	do
> +		echo >>sparse-index/README.md &&
> +		ensure_not_expanded reset --hard $ref || return 1
> +	done &&
> +
> +	ensure_not_expanded reset --hard update-deep &&
> +	ensure_not_expanded reset --keep base &&
> +	ensure_not_expanded reset --merge update-deep &&
> +	ensure_not_expanded reset --hard &&
> +
>   	ensure_not_expanded checkout -f update-deep &&
>   	test_config -C sparse-index pull.twohead ort &&
>   	(
> 
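
The entry-count rule described in the commit message (a sparse
directory contributes 1, any other subtree contributes its recursive
count) can be sketched on a toy tree. This is a simplified model under
stated assumptions: struct toy_tree, its is_sparse_dir flag (standing
in for the index_entry_exists() lookup), and toy_prime_entry_count()
are hypothetical and do not exist in git.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_SUBTREES 4

/* Hypothetical stand-in for a parsed tree object. */
struct toy_tree {
	const char *name;
	bool is_sparse_dir; /* real code: index_entry_exists() on "path/" */
	int nr_blobs;       /* non-directory entries directly in this tree */
	const struct toy_tree *subtrees[TOY_MAX_SUBTREES];
	size_t nr_subtrees;
};

/*
 * Mirrors the counting in prime_cache_tree_rec(): each blob counts
 * once, a sparse directory counts as a single entry without recursing,
 * and any other subtree adds its own recursive count.
 */
int toy_prime_entry_count(const struct toy_tree *tree)
{
	int cnt;
	size_t i;

	assert(tree != NULL);
	cnt = tree->nr_blobs;
	for (i = 0; i < tree->nr_subtrees; i++) {
		const struct toy_tree *sub = tree->subtrees[i];

		if (sub->is_sparse_dir)
			cnt += 1; /* the sparse directory entry itself */
		else
			cnt += toy_prime_entry_count(sub);
	}
	return cnt;
}
```

With a root holding one blob, a non-sparse "deep" directory holding
five entries in total, and a sparse "folder1" that would expand to ten
blobs, the count is 1 + 5 + 1, because the contents of "folder1" are
never visited.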


^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH v3 3/8] update-index: add --force-full-index option for expand/collapse test
  2021-10-08  5:24       ` Junio C Hamano
@ 2021-10-08 15:47         ` Victoria Dye
  2021-10-08 17:19           ` Junio C Hamano
  0 siblings, 1 reply; 85+ messages in thread
From: Victoria Dye @ 2021-10-08 15:47 UTC (permalink / raw)
  To: Junio C Hamano, Victoria Dye via GitGitGadget
  Cc: git, stolee, newren, Taylor Blau, Bagas Sanjaya,
	Ævar Arnfjörð Bjarmason

Junio C Hamano wrote:
> "Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> +	/*
>> +	 * If --force-full-index is set, the command should skip manually
>> +	 * setting `command_requires_full_index`.
>> +	 */
> 
> Hmph, doesn't that feel unnaturally backwards, though?
> 
> The settings.command_requires_full_index bit forces read-cache to
> call ensure_full_index() immediately after the in-core index is read
> from the disk.  If we are forcing operating on the full index, I'd
> imagine that we'd be making sure that ensure_full_index() is
> called.
> 

I tried coming up with a user-facing name that wasn't too focused on the
internal implementation, but it ends up being misleading. The intention was
to have this be a variation of `git update-index` that uses the default
setting for `command_requires_full_index` but then proceeds to read and
write the index as `update-index` normally would. Something like
`--use-default-index-sparsity` might have been more accurate?

> I do not see anything in the code that ensures active_cache_changed
> is flipped on.  So the new test that says
> 
>     git -C sparse-index -c core.fsmonitor="" update-index --force-full-index
> 
> may not call ensure_full_index(), but because nothing marks
> the_index as changed, I think we won't call write_locked_index() at
> the end of cmd_update_index().  IOW, what we have in the test patch
> may be an expensive noop, no?
> 

In the test's use-case, `active_cache_changed` ends up set to
`CACHE_TREE_CHANGED`, which forces writing the index. It is still
effectively a no-op, but it serves the needs of the test.

In any case, Elijah suggested using a `test-tool` subcommand for this
purpose [1], which I think is more appropriate overall. Something like
`test-tool read-write-cache` can be implemented to make no mention of
`command_requires_full_index` (therefore using its default value) and force
a basic read & write of the index. It also eliminates the issue of having a
user-facing name at all, and can easily be removed once all sparse index
integrations are done.

[1] https://lore.kernel.org/git/CABPp-BF+bEUcyE0N79uRCkpCayJx_NMqOpnMSHHrpJM5a9hAWw@mail.gmail.com/

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH v2 2/7] update-index: add --force-full-index option for expand/collapse test
  2021-10-08  3:42         ` Elijah Newren
@ 2021-10-08 17:11           ` Junio C Hamano
  0 siblings, 0 replies; 85+ messages in thread
From: Junio C Hamano @ 2021-10-08 17:11 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Victoria Dye, Victoria Dye via GitGitGadget, Git Mailing List,
	Derrick Stolee, Taylor Blau, Bagas Sanjaya

Elijah Newren <newren@gmail.com> writes:

>> I see the issue of introducing a test-only option (when sparse index is
>> integrated everywhere, shouldn't it be deprecated?). If there's a way to
>> make this more obviously internal/temporary, I'm happy to modify it. Or, if
>> semi-frequent updates of the command in the test aren't a huge issue, I can
>> revert to V1.
>
> If it's a test-only capability you need, I'd say add it under
> t/helper/ somewhere, either as a new flag for an existing subcommand of
> test-tool, or as a new subcommand for test-tool.

Is the ability to force expanding to full index completely useless
in the field?  It might help in diagnosing breakage that end-users
see in the wild.  Or, in a specialist use case where working on the
full index is preferable for whatever reason, the user may want to
force it once to correct an earlier mistake of enabling sparse-index
before toggling the configuration off, or something?

If we do not foresee any such reason, I'd agree it is good to move
that to t/helper/; otherwise, I think update-index is as good as
any other place, and the option will sit well next to other options
like "--[no-]skip-worktree", "--[no-]assume-unchanged".  It would
most likely need to be used together with "--force-write-index" (or
be made to imply the latter) to be useful, I suspect.

Thanks.

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH v3 6/8] reset: make sparse-aware (except --mixed)
  2021-10-08 11:09       ` Phillip Wood
@ 2021-10-08 17:14         ` Victoria Dye
  2021-10-08 18:31           ` Junio C Hamano
  2021-10-12 10:17           ` Phillip Wood
  0 siblings, 2 replies; 85+ messages in thread
From: Victoria Dye @ 2021-10-08 17:14 UTC (permalink / raw)
  To: phillip.wood, Victoria Dye via GitGitGadget, git
  Cc: stolee, gitster, newren, Taylor Blau, Bagas Sanjaya,
	Ævar Arnfjörð Bjarmason

Phillip Wood wrote:
> Hi Victoria
> 
> On 07/10/2021 22:15, Victoria Dye via GitGitGadget wrote:
>> From: Victoria Dye <vdye@github.com>
>>
>> Remove `ensure_full_index` guard on `prime_cache_tree` and update
>> `prime_cache_tree_rec` to correctly reconstruct sparse directory entries in
>> the cache tree. While processing a tree's entries, `prime_cache_tree_rec`
>> must determine whether a directory entry is sparse or not by searching for
>> it in the index (*without* expanding the index). If a matching sparse
>> directory index entry is found, no subtrees are added to the cache tree
>> entry and the entry count is set to 1 (representing the sparse directory
>> itself). Otherwise, the tree is assumed to not be sparse and its subtrees
>> are recursively added to the cache tree.
> 
> I was looking at the callers to prime_cache_tree() this morning and
> would like to suggest an alternative approach - just delete
> prime_cache_tree() and all of its callers! As far as I can see it is
> only ever called after a successful call to unpack_trees() and since
> 52fca2184d ("unpack-trees: populate cache-tree on successful merge",
> 2015-07-28) unpack_trees() updates the cache tree for the caller. All
> the call sites are pretty obvious apart from the one in
> t/helper/test-fast-rebase.c, where unpack_trees() is called by
> merge_switch_to_result().
> 

It looks like `prime_cache_tree` can be removed mostly without issue, but
it causes the two last tests in `t4058-diff-duplicates.sh` to fail. Those
tests document failure cases when dealing with duplicate tree entries [1],
and it looks like `prime_cache_tree` was creating the appearance of a
fully-reset index but was still leaving it in a state where subsequent
operations could fail.

I'm inclined to say the solution here would be to update the tests to
document the "new" failure behavior and proceed with removing
`prime_cache_tree`, because:

* the test using `git reset --hard` disables `GIT_TEST_CHECK_CACHE_TREE`,
  indicating that `prime_cache_tree` already wasn't behaving correctly
* attempting to fix the overarching issues with duplicate tree entries will
  substantially delay this patch series
* a duplicate entry fix is largely unrelated to the intended scope of the
  series

Another option would be to leave `prime_cache_tree` as it is, but with it
being apparently useless outside of mostly-broken use cases in `t4058`, it
seems like a waste to keep it around.

[1] ac14de13b2 (t4058: explore duplicate tree entry handling in a bit more detail, 2020-12-11)

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH v3 3/8] update-index: add --force-full-index option for expand/collapse test
  2021-10-08 15:47         ` Victoria Dye
@ 2021-10-08 17:19           ` Junio C Hamano
  2021-10-11 14:12             ` Derrick Stolee
  0 siblings, 1 reply; 85+ messages in thread
From: Junio C Hamano @ 2021-10-08 17:19 UTC (permalink / raw)
  To: Victoria Dye
  Cc: Victoria Dye via GitGitGadget, git, stolee, newren, Taylor Blau,
	Bagas Sanjaya, Ævar Arnfjörð Bjarmason

Victoria Dye <vdye@github.com> writes:

> I tried coming up with a user-facing name that wasn't too focused on the
> internal implementation, but it ends up being misleading. The intention was
> to have this be a variation of `git update-index` that uses the default
> setting for `command_requires_full_index` but then proceeds to read and
> write the index as `update-index` normally would. Something like
> `--use-default-index-sparsity` might have been more accurate?

The option name in the reviewed patch does imply "we force expanding
to full" and not "use the default", so it probably needs renaming,
if we want the "use the default" semantics.  But is that useful in
the context of the test you are using it in place of "reset" or "mv"?
Even if the default is somehow flipped to use sparse always, wouldn't
the particular test want the index expanded?  I dunno.

> In the test's use-case, `active_cache_changed` ends up set to
> `CACHE_TREE_CHANGED`, which forces writing the index. It is still
> effectively a no-op, but it serves the needs of the test.

Ah, cache-tree is updated, then it's OK.

As to test-tool vs end-user-accessible-command, I do not have a
strong opinion, but use your imagination and ask Derrick or somebody
else for their imagination to see if such a "force expand" feature
may be something the end-users might need access to in order to
dig themselves out of a hole (in which case, it may be better to
make it end-user-accessible) or not (in which case, test-tool is
more appropriate).

Thanks.



^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH v3 6/8] reset: make sparse-aware (except --mixed)
  2021-10-08 17:14         ` Victoria Dye
@ 2021-10-08 18:31           ` Junio C Hamano
  2021-10-09 11:18             ` Phillip Wood
  2021-10-12 10:17           ` Phillip Wood
  1 sibling, 1 reply; 85+ messages in thread
From: Junio C Hamano @ 2021-10-08 18:31 UTC (permalink / raw)
  To: Victoria Dye
  Cc: phillip.wood, Victoria Dye via GitGitGadget, git, stolee, newren,
	Taylor Blau, Bagas Sanjaya,
	Ævar Arnfjörð Bjarmason

Victoria Dye <vdye@github.com> writes:

> Phillip Wood wrote:

>> I was looking at the callers to prime_cache_tree() this morning
>> and would like to suggest an alternative approach - just delete
>> prime_cache_tree() and all of its callers!

Do you mean the calls added by new patches without understanding
what they are doing, or all calls to it?

Every time you update a path in the index from the working tree
(e.g. "git add") and other sources, the directory in the cache-tree
that includes the path is invalidated, and the surviving subtrees of
the cache-tree are used to speed up writing the index as a tree object,
doing "diff-index --cached" (hence "git status"), etc.  So over
time, the cache-tree "degrades" as you muck with the index entries.

When you write out the index as a tree, we by definition have to
know the object names of all the tree objects that correspond to
each directory in the index.  A fully valid cache-tree is saved when
it happens, so the above process can start over.

There are cases other than "git write-tree" in which we can cheaply
learn the object names of all the tree objects that correspond to
each directory in the index.  When we read the index from an
existing tree object, we know which tree (and its subtrees) we
populated the index from, so we can salvage a degraded cache-tree.

"reset --hard" and "reset --mixed" may be good opportunities, so is
"checkout <branch>" that starts from a clean index.  And cache tree
priming is a mechanism to take advantage of such an opportunity.

The cache-tree does not have to be primed and all you lose is
performance, so priming can be removed mostly "without an issue", if
you are not paying attention to cache-tree degradation.  Priming
with incorrect data, however, would leave permanent damage by
writing a wrong tree via "git write-tree" (hence "git commit") and
showing a wrong diff via "git diff-index [--cached]" (hence "git
status" and probably "git add -- <pathspec>"), so not priming is
safer than priming incorrectly.

HTH.
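
The "degradation" described above can be sketched with a toy
cache-tree in which an entry count of -1 marks a node as invalid.
This is a simplified model, not git's cache-tree.c: struct
toy_cache_tree, toy_invalidate_path(), and
toy_cache_tree_fully_valid() are hypothetical names, and object IDs
are left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_CHILDREN 4

/* Hypothetical cache-tree node; entry_count < 0 means "invalid". */
struct toy_cache_tree {
	const char *name;
	int entry_count;
	struct toy_cache_tree *children[TOY_MAX_CHILDREN];
	size_t nr_children;
};

/*
 * Invalidate every directory on the way to `path` (for example
 * "deep/deeper/file"). Sibling subtrees are left untouched.
 */
void toy_invalidate_path(struct toy_cache_tree *it, const char *path)
{
	const char *slash;
	size_t i, len;

	assert(it != NULL && path != NULL);
	it->entry_count = -1;

	slash = strchr(path, '/');
	if (!slash)
		return; /* `path` names a file directly inside `it` */

	len = (size_t)(slash - path);
	for (i = 0; i < it->nr_children; i++) {
		struct toy_cache_tree *child = it->children[i];

		if (strlen(child->name) == len &&
		    !strncmp(child->name, path, len)) {
			toy_invalidate_path(child, slash + 1);
			return;
		}
	}
}

/* A tree can be written cheaply only if no node in it is invalid. */
bool toy_cache_tree_fully_valid(const struct toy_cache_tree *it)
{
	size_t i;

	assert(it != NULL);
	if (it->entry_count < 0)
		return false;
	for (i = 0; i < it->nr_children; i++)
		if (!toy_cache_tree_fully_valid(it->children[i]))
			return false;
	return true;
}
```

Updating "deep/a" in this model invalidates the root and "deep", while
"deep/deeper" and "folder1" keep their counts and can still be reused.
Priming corresponds to refilling every node from a known tree so that
the whole structure is valid again.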

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH v3 6/8] reset: make sparse-aware (except --mixed)
  2021-10-08 18:31           ` Junio C Hamano
@ 2021-10-09 11:18             ` Phillip Wood
  2021-10-10 22:03               ` Junio C Hamano
  0 siblings, 1 reply; 85+ messages in thread
From: Phillip Wood @ 2021-10-09 11:18 UTC (permalink / raw)
  To: Junio C Hamano, Victoria Dye
  Cc: phillip.wood, Victoria Dye via GitGitGadget, git, stolee, newren,
	Taylor Blau, Bagas Sanjaya,
	Ævar Arnfjörð Bjarmason

On 08/10/2021 19:31, Junio C Hamano wrote:
> Victoria Dye <vdye@github.com> writes:
> 
>> Phillip Wood wrote:
> 
>>> I was looking at the callers to prime_cache_tree() this morning
>>> and would like to suggest an alternative approach - just delete
>>> prime_cache_tree() and all of its callers!
> 
> Do you mean the calls added by new patches without understanding
> what they are doing, or all calls to it?

I mean all calls to prime_cache_tree() after having understood (or at 
least thinking that I understand) what they are doing. As I tried to 
explain in the part of my message that you have cut:

(a) a successful call to unpack_trees() updates the cache tree

(b) all the existing calls to prime_cache_tree() follow a successful 
call to unpack_trees() and nothing touches the index in between the call 
to unpack_trees() and prime_cache_tree().

Maybe I've misunderstood something but that leads me to believe those calls 
can be removed without degrading performance.

Best Wishes

Phillip

> Every time you update a path in the index from the working tree
> (e.g. "git add") and other sources, the directory in the cache-tree
> that includes the path is invalidated, and the surviving subtrees of
> the cache-tree are used to speed up writing the index as a tree object,
> doing "diff-index --cached" (hence "git status"), etc.  So over
> time, the cache-tree "degrades" as you muck with the index entries.
> 
> When you write out the index as a tree, we by definition have to
> know the object names of all the tree objects that correspond to
> each directory in the index.  A fully valid cache-tree is saved when
> it happens, so the above process can start over.
> 
> There are cases other than "git write-tree" that we can cheaply
> learn the object names of all the tree objects that correspond to
> each directory in the index.  When we read the index from an
> existing tree object, we know which tree (and its subtrees) we
> populated the index from, so we can salvage a degraded cache-tree.
> 
> "reset --hard" and "reset --mixed" may be good opportunities, so is
> "checkout <branch>" that starts from a clean index.  And cache tree
> priming is a mechanism to take advantage of such an opportunity.
> 
> The cache-tree does not have to be primed and all you lose is
> performance, so priming can be removed mostly "without an issue", if
> you are not paying attention to cache-tree degradation.  Priming
> with incorrect data, however, would leave permanent damage by
> writing a wrong tree via "git write-tree" (hence "git commit") and
> showing a wrong diff via "git diff-index [--cached]" (hence "git
> status" and probably "git add -- <pathspec>"), so not priming is
> safer than priming incorrectly.
> 
> HTH.
> 
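
The degradation and priming described in the quoted text can be modelled
with a toy structure. This is only a sketch: struct toy_cache_tree and the
toy_* helpers are hypothetical names, not Git's cache-tree API, and the real
code tracks far more than one "valid" flag per directory.

```c
#include <string.h>

#define TOY_MAX_DIRS 8

/* Hypothetical model: one "valid" flag per directory ("" is the root). */
struct toy_cache_tree {
	const char *dirs[TOY_MAX_DIRS];
	int valid[TOY_MAX_DIRS];
	int nr;
};

/* Return 1 if 'path' lives somewhere underneath directory 'dir'. */
static int toy_path_in_dir(const char *path, const char *dir)
{
	size_t len = strlen(dir);

	return len == 0 ||
	       (strncmp(path, dir, len) == 0 && path[len] == '/');
}

/* Updating a path invalidates every directory that contains it. */
void toy_invalidate_path(struct toy_cache_tree *t, const char *path)
{
	int i;

	for (i = 0; i < t->nr; i++)
		if (toy_path_in_dir(path, t->dirs[i]))
			t->valid[i] = 0;
}

/* Priming from a known tree makes every directory valid again. */
void toy_prime(struct toy_cache_tree *t)
{
	int i;

	for (i = 0; i < t->nr; i++)
		t->valid[i] = 1;
}

/* Count the directories whose cached tree is still usable. */
int toy_count_valid(const struct toy_cache_tree *t)
{
	int i, n = 0;

	for (i = 0; i < t->nr; i++)
		n += t->valid[i];
	return n;
}
```

Starting from a fully valid tree over "", folder1, folder2 and folder1/sub,
invalidating folder1/a leaves only folder2 and folder1/sub usable; priming
restores all four.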


^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH v3 6/8] reset: make sparse-aware (except --mixed)
  2021-10-09 11:18             ` Phillip Wood
@ 2021-10-10 22:03               ` Junio C Hamano
  2021-10-11 15:55                 ` Victoria Dye
  2021-10-12 10:16                 ` Phillip Wood
  0 siblings, 2 replies; 85+ messages in thread
From: Junio C Hamano @ 2021-10-10 22:03 UTC (permalink / raw)
  To: Phillip Wood
  Cc: Victoria Dye, phillip.wood, Victoria Dye via GitGitGadget, git,
	stolee, newren, Taylor Blau, Bagas Sanjaya,
	Ævar Arnfjörð Bjarmason

Phillip Wood <phillip.wood123@gmail.com> writes:

> On 08/10/2021 19:31, Junio C Hamano wrote:
>> Victoria Dye <vdye@github.com> writes:
>> 
>>> Phillip Wood wrote:
>> 
>>>> I was looking at the callers to prime_cache_tree() this morning
>>>> and would like to suggest an alternative approach - just delete
>>>> prime_cache_tree() and all of its callers!
>> Do you mean the calls added by new patches without understanding
>> what they are doing, or all calls to it?
>
> I mean all calls to prime_cache_tree() after having understood (or at
> least thinking that I understand) what they are doing.

Sorry, my statement was confusingly written.  I meant "calls added
by new patches, written by those who do not understand what
prime_cache_tree() calls are doing", but after re-reading it, I
think it could be taken to be referring to "you may be commenting
without understanding what prime_cache_tree() calls are doing",
which wasn't my intention.

> (a) a successful call to unpack_trees() updates the cache tree
>
> (b) all the existing calls to prime_cache_tree() follow a successful
> call to unpack_trees() and nothing touches the index in between the
> call to unpack_trees() and prime_cache_tree().

Ahh, OK.

I think we originally avoided calling cache_tree_update() lightly
(because it is essentially a "write-tree", a fairly heavy-weight
operation, without I/O) and instead relied on prime_cache_tree() to
get degraded cache-tree back into freshness.

What I forgot was that 52fca218 (unpack-trees: populate cache-tree
on successful merge, 2015-07-28) added cache_tree_update() there at
the end of unpack_trees().  The commit covers quite a wide range of
operations---the log message says "merge", but in fact anything that
uses unpack_trees() including branch switching and the resetting of
the index are affected, and they cause a full reconstruction of the
cache tree by calling cache_tree_update().

For most callers of prime_cache_tree(), like the ones in "git
read-tree" and "git reset", it is immediately obvious that we just
read from the same tree, and we should have everything from the tree
and nothing else in the resulting index, so it is clear that the
prime_cache_tree() call is recreating the same cache-tree
information that we already should have computed ourselves, and
these calls can go (or if "prime" is still cheaper than "update",
these callers can pass an option to tell unpack_trees() to skip the
cache_tree_update() call, because they will call "prime" immediately
after).
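
The redundancy, and the option floated above, can be sketched with a toy
model. Everything here is hypothetical (struct toy_index,
toy_unpack_trees(), and the skip_cache_tree_update flag are illustration
only); it merely counts how often the cache-tree gets rebuilt and says
nothing about which rebuild is cheaper.

```c
/* Hypothetical stand-in for the index and its cache-tree state. */
struct toy_index {
	int cache_tree_valid; /* 1 when the cache-tree covers the index */
	int rebuilds;         /* how many times it was (re)computed */
};

/* Hypothetical option letting a caller opt out of the built-in update. */
struct toy_unpack_opts {
	int skip_cache_tree_update;
};

/* Models unpack_trees(): replaces entries, then optionally updates. */
void toy_unpack_trees(struct toy_index *idx, const struct toy_unpack_opts *o)
{
	idx->cache_tree_valid = 0;
	if (!o->skip_cache_tree_update) {
		idx->cache_tree_valid = 1;
		idx->rebuilds++;
	}
}

/* Models prime_cache_tree(): rebuilds from the tree that was just read. */
void toy_prime_cache_tree(struct toy_index *idx)
{
	idx->cache_tree_valid = 1;
	idx->rebuilds++;
}
```

With the flag unset, "unpack then prime" rebuilds twice; with it set, the
same sequence rebuilds once and still ends with a valid cache-tree.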

For other callers it is not immediately obvious, but I trust you are
correctly reading the code ;-)

Thanks.




^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH v3 3/8] update-index: add --force-full-index option for expand/collapse test
  2021-10-08 17:19           ` Junio C Hamano
@ 2021-10-11 14:12             ` Derrick Stolee
  2021-10-11 15:05               ` Victoria Dye
  2021-10-11 15:24               ` Junio C Hamano
  0 siblings, 2 replies; 85+ messages in thread
From: Derrick Stolee @ 2021-10-11 14:12 UTC (permalink / raw)
  To: Junio C Hamano, Victoria Dye
  Cc: Victoria Dye via GitGitGadget, git, newren, Taylor Blau,
	Bagas Sanjaya, Ævar Arnfjörð Bjarmason

On 10/8/21 1:19 PM, Junio C Hamano wrote:
> Victoria Dye <vdye@github.com> writes:
> 
>> I tried coming up with a user-facing name that wasn't too focused on the
>> internal implementation, but it ends up being misleading. The intention was
>> to have this be a variation of `git update-index` that uses the default
>> setting for `command_requires_full_index` but then proceeds to read and
>> write the index as `update-index` normally would. Something like
>> `--use-default-index-sparsity` might have been more accurate?
> 
> The option name in the reviewed patch does imply "we force expanding
> to full" and not "use the default", so it probably needs renaming,
> if we want the "use the default" semantics.  But is that useful in
> the context of the test you are using it in place of "reset" or "mv"?
> Even if the default is somehow flipped to use sparse always, wouldn't
> the particular test want the index expanded?  I dunno.
> 
>> In the test's use-case, `active_cache_changed` ends up set to
>> `CACHE_TREE_CHANGED`, which forces writing the index. It is still
>> effectively a no-op, but it serves the needs of the test.
> 
> Ah, cache-tree is updated, then it's OK.
> 
> As to test-tool vs end-user-accessible-command, I do not have a
> strong opinion, but use your imagination and ask Derrick or somebody
> else for their imagination to see if such a "force expand" feature
> may be something the end-users might need access to in order to
> dig themselves out of a hole (in which case, it may be better to
> make it end-user-accessible) or not (in which case, test-tool is
> more appropriate).

I think there is something to be said about the name being confusing,
because the current implementation focuses on "expand a sparse index
upon read" but it also allows the index to be written as sparse.

Conversely, if the user runs

  git -c index.sparse=false update-index ...

then the index.sparse config setting forbids conversion from full to
sparse, but does not say anything about expanding to full.

Perhaps this should be corrected: the index.sparse=false setting
should expand a sparse index to a full one, then prevent it from
being converted to a sparse one on write.

This diff should do it:

diff --git a/read-cache.c b/read-cache.c
index 564283c7e7e..04df1051e18 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -2376,7 +2376,8 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist)
 	if (!istate->repo)
 		istate->repo = the_repository;
 	prepare_repo_settings(istate->repo);
-	if (istate->repo->settings.command_requires_full_index)
+	if (!istate->repo->settings.sparse_index ||
+	    istate->repo->settings.command_requires_full_index)
 		ensure_full_index(istate);
 
 	return istate->cache_nr;
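
For reference, the condition in the diff reduces to the following, written
as a standalone sketch. struct toy_settings and both helpers are
hypothetical stand-ins for the two repo-settings fields used above; nothing
else about do_read_index() is modelled.

```c
/* Hypothetical mirror of the two settings the condition reads. */
struct toy_settings {
	int sparse_index;                /* index.sparse */
	int command_requires_full_index; /* per-command default */
};

/* Condition before the diff: only the command setting forces expansion. */
int toy_expand_on_read_before(const struct toy_settings *s)
{
	return s->command_requires_full_index;
}

/* Condition after the diff: index.sparse=false also forces expansion. */
int toy_expand_on_read_after(const struct toy_settings *s)
{
	return !s->sparse_index || s->command_requires_full_index;
}
```

The only input for which the two differ is sparse_index = 0 with
command_requires_full_index = 0, which is the "index.sparse=false on a
sparse-aware command" case described above.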

Victoria, what are your thoughts about including such a change?

Junio, would it be better to change the config setting, and then
update this test to use the config setting over a command-line flag?
This would allow us to punt on the --force-full-index flag until we
have time to focus on the 'git update-index' command and interactions
like this.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH v3 3/8] update-index: add --force-full-index option for expand/collapse test
  2021-10-11 14:12             ` Derrick Stolee
@ 2021-10-11 15:05               ` Victoria Dye
  2021-10-11 15:24               ` Junio C Hamano
  1 sibling, 0 replies; 85+ messages in thread
From: Victoria Dye @ 2021-10-11 15:05 UTC (permalink / raw)
  To: Derrick Stolee, Junio C Hamano
  Cc: Victoria Dye via GitGitGadget, git, newren, Taylor Blau,
	Bagas Sanjaya, Ævar Arnfjörð Bjarmason

Derrick Stolee wrote:
> On 10/8/21 1:19 PM, Junio C Hamano wrote:
>> Victoria Dye <vdye@github.com> writes:
>>
>>> I tried coming up with a user-facing name that wasn't too focused on the
>>> internal implementation, but it ends up being misleading. The intention was
>>> to have this be a variation of `git update-index` that uses the default
>>> setting for `command_requires_full_index` but then proceeds to read and
>>> write the index as `update-index` normally would. Something like
>>> `--use-default-index-sparsity` might have been more accurate?
>>
>> The option name in the reviewed patch does imply "we force expanding
>> to full" and not "use the default", so it probably needs renaming,
>> if we want the "use the default" semantics.  But is that useful in
>> the context of the test you are using it in place of "reset" or "mv"?
>> Even if the default is somehow flipped to use sparse always, wouldn't
>> the particular test want the index expanded?  I dunno.
>>
>>> In the test's use-case, `active_cache_changed` ends up set to
>>> `CACHE_TREE_CHANGED`, which forces writing the index. It is still
>>> effectively a no-op, but it serves the needs of the test.
>>
>> Ah, cache-tree is updated, then it's OK.
>>
>> As to test-tool vs end-user-accessible-command, I do not have a
>> strong opinion, but use your imagination and ask Derrick or somebody
>> else for their imagination to see if such a "force expand" feature
>> may be something the end-users might need access to in order to
>> dig themselves out of a hole (in which case, it may be better to
>> make it end-user-accessible) or not (in which case, test-tool is
>> more appropriate).
> 
> I think there is something to be said about the name being confusing,
> because the current implementation focuses on "expand a sparse index
> upon read" but it also allows the index to be written as sparse.
> 

This helps clarify what I was misinterpreting in the test. It isn't looking
for "default" behavior, it's verifying whether trace2 logs capture index
expansion and collapse when those operations are expected to happen,
regardless of whether that's because `command_requires_full_index` is 1 or
because the command needs to use entries inside of sparse directories. With
that interpretation, I can replace the command with `git reset
update-folder1 -- folder1/a` and get the same result (without needing to
change the test in the future *or* add a new `git` command option /
`test-tool` subcommand).

> Conversely, if the user runs
> 
>   git -c index.sparse=false update-index ...
> 
> then the index.sparse config setting forbids conversion from full to
> sparse, but does not say anything about expanding to full.
> 
> Perhaps this should be corrected: the index.sparse=false setting
> should expand a sparse index to a full one, then prevent it from
> being converted to a sparse one on write.
> 
> This diff should do it:
> 
> diff --git a/read-cache.c b/read-cache.c
> index 564283c7e7e..04df1051e18 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -2376,7 +2376,8 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist)
>  	if (!istate->repo)
>  		istate->repo = the_repository;
>  	prepare_repo_settings(istate->repo);
> -	if (istate->repo->settings.command_requires_full_index)
> +	if (!istate->repo->settings.sparse_index ||
> +	    istate->repo->settings.command_requires_full_index)
>  		ensure_full_index(istate);
>  
>  	return istate->cache_nr;
> 
> Victoria, what are your thoughts about including such a change?
> 

I think this is a worthwhile change, but I'd prefer submitting it separately
(either in an upcoming sparse index integration or on its own). It's not
directly needed by anything in this series, and I'd like to avoid adding
features to the scope if possible.

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH v3 3/8] update-index: add --force-full-index option for expand/collapse test
  2021-10-11 14:12             ` Derrick Stolee
  2021-10-11 15:05               ` Victoria Dye
@ 2021-10-11 15:24               ` Junio C Hamano
  1 sibling, 0 replies; 85+ messages in thread
From: Junio C Hamano @ 2021-10-11 15:24 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Victoria Dye, Victoria Dye via GitGitGadget, git, newren,
	Taylor Blau, Bagas Sanjaya,
	Ævar Arnfjörð Bjarmason

Derrick Stolee <stolee@gmail.com> writes:

> Junio, would it be better to change the config setting, and then
> update this test to use the config setting over a command-line flag?
> This would allow us to punt on the --force-full-index flag until we
> have time to focus on the 'git update-index' command and interactions
> like this.

I do not have a strong opinion on where we add the feature; as long
as we have a way to let us avoid having to unnecessarily change this
particular test, that's perfectly fine, and if we can reuse it as a
way for end-users to help those who are debugging their issues, that
would be an added bonus.

Thanks.


^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH v3 6/8] reset: make sparse-aware (except --mixed)
  2021-10-10 22:03               ` Junio C Hamano
@ 2021-10-11 15:55                 ` Victoria Dye
  2021-10-11 16:16                   ` Junio C Hamano
  2021-10-12 10:16                 ` Phillip Wood
  1 sibling, 1 reply; 85+ messages in thread
From: Victoria Dye @ 2021-10-11 15:55 UTC (permalink / raw)
  To: Junio C Hamano, Phillip Wood
  Cc: phillip.wood, Victoria Dye via GitGitGadget, git, stolee, newren,
	Taylor Blau, Bagas Sanjaya,
	Ævar Arnfjörð Bjarmason

Junio C Hamano wrote:
> For most callers of prime_cache_tree(), like the ones in "git
> read-tree" and "git reset", it is immediately obvious that we just
> read from the same tree, and we should have everything from the tree
> and nothing else in the resulting index, so it is clear that the
> prime_cache_tree() call is recreating the same cache-tree
> information that we already should have computed ourselves, and
> these calls can go (or if "prime" is still cheaper than "update",
> these callers can pass an option to tell unpack_trees() to skip the
> cache_tree_update() call, because they will call "prime" immediately
> after).
> 

After some basic performance testing of `git reset [--hard]`, it's not clear
whether `cache_tree_update` is definitively faster or slower than
`prime_cache_tree`; more conclusive results would indicate which of the two
could be skipped. I'd like to defer this to a future patch (tracking it with
an internal issue so I don't forget) where I can perform a more thorough
analysis across all of the commands currently using `prime_cache_tree` and
update its usage accordingly.

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH v3 6/8] reset: make sparse-aware (except --mixed)
  2021-10-11 15:55                 ` Victoria Dye
@ 2021-10-11 16:16                   ` Junio C Hamano
  0 siblings, 0 replies; 85+ messages in thread
From: Junio C Hamano @ 2021-10-11 16:16 UTC (permalink / raw)
  To: Victoria Dye
  Cc: Phillip Wood, phillip.wood, Victoria Dye via GitGitGadget, git,
	stolee, newren, Taylor Blau, Bagas Sanjaya,
	Ævar Arnfjörð Bjarmason

Victoria Dye <vdye@github.com> writes:

> After some basic performance testing of `git reset [--hard]`, it's not clear
> whether `cache_tree_update` is definitively faster or slower than
> `prime_cache_tree`; more conclusive results would indicate which of the two
> could be skipped. I'd like to defer this to a future patch (tracking it with
> an internal issue so I don't forget) where I can perform a more thorough
> analysis across all of the commands currently using `prime_cache_tree` and
> update its usage accordingly.

Yup.  That sounds sensible.  Concentrating on correctness first is a
good direction to go.

^ permalink raw reply	[flat|nested] 85+ messages in thread

* [PATCH v4 0/8] Sparse Index: integrate with reset
  2021-10-07 21:15   ` [PATCH v3 0/8] " Victoria Dye via GitGitGadget
                       ` (7 preceding siblings ...)
  2021-10-07 21:15     ` [PATCH v3 8/8] unpack-trees: improve performance of next_cache_entry Victoria Dye via GitGitGadget
@ 2021-10-11 20:30     ` Victoria Dye via GitGitGadget
  2021-10-11 20:30       ` [PATCH v4 1/8] reset: rename is_missing to !is_in_reset_tree Victoria Dye via GitGitGadget
                         ` (7 more replies)
  8 siblings, 8 replies; 85+ messages in thread
From: Victoria Dye via GitGitGadget @ 2021-10-11 20:30 UTC (permalink / raw)
  To: git
  Cc: stolee, gitster, newren, Taylor Blau, Bagas Sanjaya,
	Ævar Arnfjörð Bjarmason, Phillip Wood,
	Victoria Dye

This series integrates the sparse index with git reset and provides
miscellaneous fixes and improvements to the command in sparse checkouts.
This includes:

 1. tests added to t1092 and p2000 to establish the baseline functionality
    of the command
 2. repository settings to enable the sparse index with ensure_full_index
    guarding any code paths that break tests without other compatibility
    updates.
 3. modifications to remove or reduce the scope in which ensure_full_index
    must be called.

The sparse index updates are predicated on a fix originating from the
microsoft/git fork [1], correcting how git reset --mixed handles resetting
entries outside the sparse checkout definition. Additionally, a performance
"bug" in next_cache_entry with sparse index is corrected, preventing
repeatedly looping over already-searched entries.

The p2000 tests demonstrate a ~70% execution time reduction in git reset
using a sparse index, and no change (within expected variability [2]) using
a full index. Results summarized below [3, 4]:

Test                           base              [5/8]                 
-----------------------------------------------------------------------
git reset --hard (full-v3)     1.00(0.50+0.39)   0.97(0.50+0.37) -3.0% 
git reset --hard (full-v4)     1.00(0.51+0.38)   0.96(0.50+0.36) -4.0% 
git reset --hard (sparse-v3)   1.68(1.17+0.39)   1.37(0.91+0.35) -18.5%
git reset --hard (sparse-v4)   1.70(1.18+0.40)   1.41(0.94+0.35) -17.1%

Test                           base              [6/8]   
-----------------------------------------------------------------------
git reset --hard (full-v3)     1.00(0.50+0.39)   0.94(0.48+0.34) -6.0% 
git reset --hard (full-v4)     1.00(0.51+0.38)   0.95(0.51+0.34) -5.0% 
git reset --hard (sparse-v3)   1.68(1.17+0.39)   0.46(0.05+0.29) -72.6%
git reset --hard (sparse-v4)   1.70(1.18+0.40)   0.46(0.06+0.29) -72.9%

Test                               base              [7/8]
---------------------------------------------------------------------------
git reset (full-v3)                0.77(0.27+0.37)   0.72(0.26+0.32) -6.5%
git reset (full-v4)                0.75(0.27+0.34)   0.73(0.26+0.32) -2.7%
git reset (sparse-v3)              1.44(0.96+0.36)   0.43(0.04+0.96) -70.1%
git reset (sparse-v4)              1.46(0.97+0.36)   0.43(0.05+0.79) -70.5%
git reset -- missing (full-v3)     0.72(0.26+0.32)   0.69(0.26+0.30) -4.2%
git reset -- missing (full-v4)     0.74(0.28+0.33)   0.71(0.27+0.32) -4.1% 
git reset -- missing (sparse-v3)   1.45(0.97+0.35)   0.81(0.42+0.90) -44.1%
git reset -- missing (sparse-v4)   1.41(0.94+0.34)   0.79(0.42+0.76) -44.0%

Test                               base              [8/8]            
---------------------------------------------------------------------------
git reset -- missing (full-v3)     0.72(0.26+0.32)   0.73(0.26+0.33) +1.4% 
git reset -- missing (full-v4)     0.74(0.28+0.33)   0.74(0.27+0.32) +0.0% 
git reset -- missing (sparse-v3)   1.45(0.97+0.35)   0.43(0.05+0.80) -70.3%
git reset -- missing (sparse-v4)   1.41(0.94+0.34)   0.44(0.05+0.76) -68.8%
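
The percentage column in the tables above is consistent with the relative
change in the first (elapsed) number of each cell. A minimal sketch for
re-checking a row, assuming that reading of the column;
relative_change_pct() is a hypothetical helper, not part of the perf suite:

```c
#include <assert.h>

/*
 * Relative change from 'before' to 'after', in percent.
 * Negative values mean the command got faster.
 */
double relative_change_pct(double before, double after)
{
	assert(before > 0.0);
	return (after - before) / before * 100.0;
}
```

For example, the sparse-v3 row of the [6/8] table (1.68 -> 0.46) gives
(0.46 - 1.68) / 1.68 * 100, which rounds to the -72.6% shown.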



Changes since V1
================

 * Add --force-full-index option to update-index. The option is used to
   circumvent changing command_requires_full_index from its default value -
   right now this is effectively a no-op, but will change once update-index
   is integrated with sparse index. By using this option in the t1092
   expand/collapse test, the command used to test will not need to be
   updated with subsequent sparse index integrations.
 * Update implementation of mixed reset for entries outside sparse checkout
   definition. The condition in which a file should be checked out before
   index reset is simplified to "if it has skip-worktree enabled and a reset
   would change the file, check it out".
   * After checking the behavior of update_index_from_diff with renames,
     found that the diff used by reset does not produce diff queue entries
     with different pathnames for one and two. Because of this, and that
     nothing in the implementation seems to rely on identical path names, no
     BUG check is added.
 * Correct a bug in the "sparse index is not expanded" tests in t1092 where
   failure of a git reset --mixed test was not being reported. Test now
   verifies an appropriate scenario with corrected failure-checking.


Changes since V2
================

 * Replace the patch that added checkouts for git reset --mixed in sparse
   checkouts with one that preserves the skip-worktree flag (including a new
   test for git reset --mixed and an update to t1092 - checkout and reset
   (mixed))
 * Move rename of is_missing into its own patch
 * Further extend t1092 tests and remove unnecessary commands/tests where
   possible
 * Refine logic determining which pathspecs require ensure_full_index in git
   reset --mixed, add related ensure_not_expanded tests
 * Add index_search_mode enum to index_name_stage_pos
 * Clean up variable usage & remove unnecessary subtree_path in
   prime_cache_tree_rec
 * Update cover letter performance data
 * More thoroughly explain changes in each commit message


Changes since V3
================

 * Replace git update-index --force-full-index with git reset update-folder1
   -- folder1/a, remove introduction of new --force-full-index option
   entirely, and add a comment clarifying the intent of the "sparse-index is
   expanded and converted back" test
 * Fix authorship on reset: preserve skip-worktree bit in mixed reset
   (current patch fully replaces original patch, but metadata of the
   original wasn't properly replaced)

Thanks! -Victoria

[1] microsoft@6b8a074
[2] https://lore.kernel.org/git/8b9fe3f8-f0e3-4567-b20b-17c92bd1a5c5@github.com/
[3] If a test and/or commit is not mentioned, there is no significant change
    to performance
[4] Pathspec "does-not-exist" is changed to "missing" to save space in
    performance report

Victoria Dye (8):
  reset: rename is_missing to !is_in_reset_tree
  reset: preserve skip-worktree bit in mixed reset
  sparse-index: update command for expand/collapse test
  reset: expand test coverage for sparse checkouts
  reset: integrate with sparse index
  reset: make sparse-aware (except --mixed)
  reset: make --mixed sparse-aware
  unpack-trees: improve performance of next_cache_entry

 builtin/reset.c                          | 104 ++++++++++++++++-
 cache-tree.c                             |  46 +++++++-
 cache.h                                  |  10 ++
 read-cache.c                             |  27 +++--
 t/perf/p2000-sparse-operations.sh        |   3 +
 t/t1092-sparse-checkout-compatibility.sh | 137 ++++++++++++++++++++---
 t/t7102-reset.sh                         |  17 +++
 unpack-trees.c                           |  23 +++-
 8 files changed, 330 insertions(+), 37 deletions(-)


base-commit: cefe983a320c03d7843ac78e73bd513a27806845
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1048%2Fvdye%2Fvdye%2Fsparse-index-part1-v4
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1048/vdye/vdye/sparse-index-part1-v4
Pull-Request: https://github.com/gitgitgadget/git/pull/1048

Range-diff vs v3:

 1:  ad7013a31aa = 1:  ad7013a31aa reset: rename is_missing to !is_in_reset_tree
 2:  1f6da84830b ! 2:  bd72bd175da reset: preserve skip-worktree bit in mixed reset
     @@
       ## Metadata ##
     -Author: Kevin Willford <kewillf@microsoft.com>
     +Author: Victoria Dye <vdye@github.com>
      
       ## Commit message ##
          reset: preserve skip-worktree bit in mixed reset
 3:  014a408ea5d < -:  ----------- update-index: add --force-full-index option for expand/collapse test
 -:  ----------- > 3:  c4df0d6b136 sparse-index: update command for expand/collapse test
 4:  7f21cf53e9d = 4:  cfbb23e9fe2 reset: expand test coverage for sparse checkouts
 5:  a2d6212e287 = 5:  62fdbf2ad26 reset: integrate with sparse index
 6:  330e0c09774 = 6:  b0d437207e7 reset: make sparse-aware (except --mixed)
 7:  6ef8e4e31d3 = 7:  00d14fb60bd reset: make --mixed sparse-aware
 8:  c7145e039f3 = 8:  e523dadb8bf unpack-trees: improve performance of next_cache_entry

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 85+ messages in thread

* [PATCH v4 1/8] reset: rename is_missing to !is_in_reset_tree
  2021-10-11 20:30     ` [PATCH v4 0/8] Sparse Index: integrate with reset Victoria Dye via GitGitGadget
@ 2021-10-11 20:30       ` Victoria Dye via GitGitGadget
  2021-10-11 20:30       ` [PATCH v4 2/8] reset: preserve skip-worktree bit in mixed reset Victoria Dye via GitGitGadget
                         ` (6 subsequent siblings)
  7 siblings, 0 replies; 85+ messages in thread
From: Victoria Dye via GitGitGadget @ 2021-10-11 20:30 UTC (permalink / raw)
  To: git
  Cc: stolee, gitster, newren, Taylor Blau, Bagas Sanjaya,
	Ævar Arnfjörð Bjarmason, Phillip Wood,
	Victoria Dye, Victoria Dye

From: Victoria Dye <vdye@github.com>

Rename and invert value of `is_missing` to `is_in_reset_tree` to make the
variable more descriptive of what it represents.

Signed-off-by: Victoria Dye <vdye@github.com>
---
 builtin/reset.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/builtin/reset.c b/builtin/reset.c
index 51c9e2f43ff..d3695ce43c4 100644
--- a/builtin/reset.c
+++ b/builtin/reset.c
@@ -131,10 +131,10 @@ static void update_index_from_diff(struct diff_queue_struct *q,
 
 	for (i = 0; i < q->nr; i++) {
 		struct diff_filespec *one = q->queue[i]->one;
-		int is_missing = !(one->mode && !is_null_oid(&one->oid));
+		int is_in_reset_tree = one->mode && !is_null_oid(&one->oid);
 		struct cache_entry *ce;
 
-		if (is_missing && !intent_to_add) {
+		if (!is_in_reset_tree && !intent_to_add) {
 			remove_file_from_cache(one->path);
 			continue;
 		}
@@ -144,7 +144,7 @@ static void update_index_from_diff(struct diff_queue_struct *q,
 		if (!ce)
 			die(_("make_cache_entry failed for path '%s'"),
 			    one->path);
-		if (is_missing) {
+		if (!is_in_reset_tree) {
 			ce->ce_flags |= CE_INTENT_TO_ADD;
 			set_object_name_for_intent_to_add_entry(ce);
 		}
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 85+ messages in thread

* [PATCH v4 2/8] reset: preserve skip-worktree bit in mixed reset
  2021-10-11 20:30     ` [PATCH v4 0/8] Sparse Index: integrate with reset Victoria Dye via GitGitGadget
  2021-10-11 20:30       ` [PATCH v4 1/8] reset: rename is_missing to !is_in_reset_tree Victoria Dye via GitGitGadget
@ 2021-10-11 20:30       ` Victoria Dye via GitGitGadget
  2021-10-11 20:30       ` [PATCH v4 3/8] sparse-index: update command for expand/collapse test Victoria Dye via GitGitGadget
                         ` (5 subsequent siblings)
  7 siblings, 0 replies; 85+ messages in thread
From: Victoria Dye via GitGitGadget @ 2021-10-11 20:30 UTC (permalink / raw)
  To: git
  Cc: stolee, gitster, newren, Taylor Blau, Bagas Sanjaya,
	Ævar Arnfjörð Bjarmason, Phillip Wood,
	Victoria Dye, Victoria Dye

From: Victoria Dye <vdye@github.com>

Change `update_index_from_diff` to set `skip-worktree` when applicable for
new index entries. When `git reset --mixed <tree-ish>` is run, entries in
the index with differences between the pre-reset HEAD and reset <tree-ish>
are identified and handled with `update_index_from_diff`. For each file, a
new cache entry is inserted into the index, created from the <tree-ish> side
of the reset (without changing the working tree). However, the newly-created
entry must have `skip-worktree` explicitly set in either of the following
scenarios:

1. the file is in the current index and has `skip-worktree` set
2. the file is not in the current index but is outside of the sparse
   checkout definition

Not setting the `skip-worktree` bit leads to likely-undesirable results for
a user. It causes `skip-worktree` settings to disappear on the
"diff"-containing files (but *only* the diff-containing files), leading to
those files now showing modifications in `git status`. For example, when
running `git reset --mixed` in a sparse checkout, some file entries outside
of sparse checkout could show up as deleted, despite the user never deleting
anything (and not wanting them on-disk anyway).

Additionally, add a test to `t7102` to ensure `skip-worktree` is preserved
in a basic `git reset --mixed` scenario and update a failure-documenting
test from 19a0acc (t1092: test interesting sparse-checkout scenarios,
2021-01-23) with new expected behavior.

Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Victoria Dye <vdye@github.com>
---
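
The two scenarios listed in the commit message, as a standalone sketch.
struct toy_entry and toy_needs_skip_worktree() are hypothetical; in the
patch the two inputs come from cache_name_pos() and
path_in_sparse_checkout().

```c
#include <stddef.h>

/* Hypothetical stand-in for an existing index entry. */
struct toy_entry {
	const char *path;
	int skip_worktree;
};

/*
 * Decide whether the new entry created by a mixed reset should carry
 * skip-worktree.
 *
 * existing:           the entry currently in the index, or NULL if absent
 * in_sparse_checkout: 1 if the path matches the sparse checkout definition
 */
int toy_needs_skip_worktree(const struct toy_entry *existing,
			    int in_sparse_checkout)
{
	if (existing)
		return existing->skip_worktree; /* scenario 1 */
	return !in_sparse_checkout;             /* scenario 2 */
}
```

An entry that already exists keeps whatever skip-worktree state it had,
regardless of the sparse definition; only paths missing from the index fall
back to the sparse checkout check.
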
 builtin/reset.c                          | 14 ++++++++++++++
 t/t1092-sparse-checkout-compatibility.sh | 19 +++++--------------
 t/t7102-reset.sh                         | 17 +++++++++++++++++
 3 files changed, 36 insertions(+), 14 deletions(-)

diff --git a/builtin/reset.c b/builtin/reset.c
index d3695ce43c4..e441b6601b9 100644
--- a/builtin/reset.c
+++ b/builtin/reset.c
@@ -25,6 +25,7 @@
 #include "cache-tree.h"
 #include "submodule.h"
 #include "submodule-config.h"
+#include "dir.h"
 
 #define REFRESH_INDEX_DELAY_WARNING_IN_MS (2 * 1000)
 
@@ -130,6 +131,7 @@ static void update_index_from_diff(struct diff_queue_struct *q,
 	int intent_to_add = *(int *)data;
 
 	for (i = 0; i < q->nr; i++) {
+		int pos;
 		struct diff_filespec *one = q->queue[i]->one;
 		int is_in_reset_tree = one->mode && !is_null_oid(&one->oid);
 		struct cache_entry *ce;
@@ -141,6 +143,18 @@ static void update_index_from_diff(struct diff_queue_struct *q,
 
 		ce = make_cache_entry(&the_index, one->mode, &one->oid, one->path,
 				      0, 0);
+
+		/*
+		 * If the file 1) corresponds to an existing index entry with
+		 * skip-worktree set, or 2) does not exist in the index but is
+		 * outside the sparse checkout definition, add a skip-worktree bit
+		 * to the new index entry.
+		 */
+		pos = cache_name_pos(one->path, strlen(one->path));
+		if ((pos >= 0 && ce_skip_worktree(active_cache[pos])) ||
+		    (pos < 0 && !path_in_sparse_checkout(one->path, &the_index)))
+			ce->ce_flags |= CE_SKIP_WORKTREE;
+
 		if (!ce)
 			die(_("make_cache_entry failed for path '%s'"),
 			    one->path);
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 886e78715fe..889079f55b8 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -459,26 +459,17 @@ test_expect_failure 'blame with pathspec outside sparse definition' '
 	test_all_match git blame deep/deeper2/deepest/a
 '
 
-# NEEDSWORK: a sparse-checkout behaves differently from a full checkout
-# in this scenario, but it shouldn't.
-test_expect_failure 'checkout and reset (mixed)' '
+test_expect_success 'checkout and reset (mixed)' '
 	init_repos &&
 
 	test_all_match git checkout -b reset-test update-deep &&
 	test_all_match git reset deepest &&
-	test_all_match git reset update-folder1 &&
-	test_all_match git reset update-folder2
-'
-
-# NEEDSWORK: a sparse-checkout behaves differently from a full checkout
-# in this scenario, but it shouldn't.
-test_expect_success 'checkout and reset (mixed) [sparse]' '
-	init_repos &&
 
-	test_sparse_match git checkout -b reset-test update-deep &&
-	test_sparse_match git reset deepest &&
+	# Because skip-worktree is preserved, resetting to update-folder1
+	# will show worktree changes for full-checkout that are not present
+	# in sparse-checkout or sparse-index.
 	test_sparse_match git reset update-folder1 &&
-	test_sparse_match git reset update-folder2
+	run_on_sparse test_path_is_missing folder1
 '
 
 test_expect_success 'merge, cherry-pick, and rebase' '
diff --git a/t/t7102-reset.sh b/t/t7102-reset.sh
index 601b2bf97f0..d05426062ec 100755
--- a/t/t7102-reset.sh
+++ b/t/t7102-reset.sh
@@ -472,6 +472,23 @@ test_expect_success '--mixed refreshes the index' '
 	test_cmp expect output
 '
 
+test_expect_success '--mixed preserves skip-worktree' '
+	echo 123 >>file2 &&
+	git add file2 &&
+	git update-index --skip-worktree file2 &&
+	git reset --mixed HEAD >output &&
+	test_must_be_empty output &&
+
+	cat >expect <<-\EOF &&
+	Unstaged changes after reset:
+	M	file2
+	EOF
+	git update-index --no-skip-worktree file2 &&
+	git add file2 &&
+	git reset --mixed HEAD >output &&
+	test_cmp expect output
+'
+
 test_expect_success 'resetting specific path that is unmerged' '
 	git rm --cached file2 &&
 	F1=$(git rev-parse HEAD:file1) &&
-- 
gitgitgadget


* [PATCH v4 3/8] sparse-index: update command for expand/collapse test
  2021-10-11 20:30     ` [PATCH v4 0/8] Sparse Index: integrate with reset Victoria Dye via GitGitGadget
  2021-10-11 20:30       ` [PATCH v4 1/8] reset: rename is_missing to !is_in_reset_tree Victoria Dye via GitGitGadget
  2021-10-11 20:30       ` [PATCH v4 2/8] reset: preserve skip-worktree bit in mixed reset Victoria Dye via GitGitGadget
@ 2021-10-11 20:30       ` Victoria Dye via GitGitGadget
  2021-10-11 20:30       ` [PATCH v4 4/8] reset: expand test coverage for sparse checkouts Victoria Dye via GitGitGadget
                         ` (4 subsequent siblings)
  7 siblings, 0 replies; 85+ messages in thread
From: Victoria Dye via GitGitGadget @ 2021-10-11 20:30 UTC (permalink / raw)
  To: git
  Cc: stolee, gitster, newren, Taylor Blau, Bagas Sanjaya,
	Ævar Arnfjörð Bjarmason, Phillip Wood,
	Victoria Dye, Victoria Dye

From: Victoria Dye <vdye@github.com>

In anticipation of `git reset --hard` being able to use the sparse index
without expanding it, replace the command in `sparse-index is expanded and
converted back` with `git reset -- folder1/a`. This command will need to
expand the index to work properly, even after integrating the rest of
`reset` with sparse index.

Helped-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Victoria Dye <vdye@github.com>
---
 t/t1092-sparse-checkout-compatibility.sh | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 889079f55b8..e1422797013 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -631,11 +631,15 @@ test_expect_success 'submodule handling' '
 	grep "160000 commit $(git -C initial-repo rev-parse HEAD)	modules/sub" cache
 '
 
+# When working with a sparse index, some commands will need to expand the
+# index to operate properly. If those commands also write the index back
+# to disk, they need to convert the index to sparse before writing.
+# This test verifies that both of these events are logged in trace2 logs.
 test_expect_success 'sparse-index is expanded and converted back' '
 	init_repos &&
 
 	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
-		git -C sparse-index -c core.fsmonitor="" reset --hard &&
+		git -C sparse-index reset -- folder1/a &&
 	test_region index convert_to_sparse trace2.txt &&
 	test_region index ensure_full_index trace2.txt
 '
-- 
gitgitgadget


* [PATCH v4 4/8] reset: expand test coverage for sparse checkouts
  2021-10-11 20:30     ` [PATCH v4 0/8] Sparse Index: integrate with reset Victoria Dye via GitGitGadget
                         ` (2 preceding siblings ...)
  2021-10-11 20:30       ` [PATCH v4 3/8] sparse-index: update command for expand/collapse test Victoria Dye via GitGitGadget
@ 2021-10-11 20:30       ` Victoria Dye via GitGitGadget
  2021-10-11 20:30       ` [PATCH v4 5/8] reset: integrate with sparse index Victoria Dye via GitGitGadget
                         ` (3 subsequent siblings)
  7 siblings, 0 replies; 85+ messages in thread
From: Victoria Dye via GitGitGadget @ 2021-10-11 20:30 UTC (permalink / raw)
  To: git
  Cc: stolee, gitster, newren, Taylor Blau, Bagas Sanjaya,
	Ævar Arnfjörð Bjarmason, Phillip Wood,
	Victoria Dye, Victoria Dye

From: Victoria Dye <vdye@github.com>

Add new tests for `--merge` and `--keep` modes, as well as mixed reset with
pathspecs. New performance test cases exercise various execution paths for
`reset`.

Co-authored-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Victoria Dye <vdye@github.com>
---
 t/perf/p2000-sparse-operations.sh        |  3 +
 t/t1092-sparse-checkout-compatibility.sh | 84 ++++++++++++++++++++++++
 2 files changed, 87 insertions(+)

diff --git a/t/perf/p2000-sparse-operations.sh b/t/perf/p2000-sparse-operations.sh
index 597626276fb..bfd332120c8 100755
--- a/t/perf/p2000-sparse-operations.sh
+++ b/t/perf/p2000-sparse-operations.sh
@@ -110,5 +110,8 @@ test_perf_on_all git add -A
 test_perf_on_all git add .
 test_perf_on_all git commit -a -m A
 test_perf_on_all git checkout -f -
+test_perf_on_all git reset
+test_perf_on_all git reset --hard
+test_perf_on_all git reset -- does-not-exist
 
 test_done
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index e1422797013..535686a2954 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -472,6 +472,90 @@ test_expect_success 'checkout and reset (mixed)' '
 	run_on_sparse test_path_is_missing folder1
 '
 
+test_expect_success 'checkout and reset (merge)' '
+	init_repos &&
+
+	write_script edit-contents <<-\EOF &&
+	echo text >>$1
+	EOF
+
+	test_all_match git checkout -b reset-test update-deep &&
+	run_on_all ../edit-contents a &&
+	test_all_match git reset --merge deepest &&
+	test_all_match git status --porcelain=v2 &&
+
+	test_all_match git reset --hard update-deep &&
+	run_on_all ../edit-contents deep/a &&
+	test_all_match test_must_fail git reset --merge deepest
+'
+
+test_expect_success 'checkout and reset (keep)' '
+	init_repos &&
+
+	write_script edit-contents <<-\EOF &&
+	echo text >>$1
+	EOF
+
+	test_all_match git checkout -b reset-test update-deep &&
+	run_on_all ../edit-contents a &&
+	test_all_match git reset --keep deepest &&
+	test_all_match git status --porcelain=v2 &&
+
+	test_all_match git reset --hard update-deep &&
+	run_on_all ../edit-contents deep/a &&
+	test_all_match test_must_fail git reset --keep deepest
+'
+
+test_expect_success 'reset with pathspecs inside sparse definition' '
+	init_repos &&
+
+	write_script edit-contents <<-\EOF &&
+	echo text >>$1
+	EOF
+
+	test_all_match git checkout -b reset-test update-deep &&
+	run_on_all ../edit-contents deep/a &&
+
+	test_all_match git reset base -- deep/a &&
+	test_all_match git status --porcelain=v2 &&
+
+	test_all_match git reset base -- nonexistent-file &&
+	test_all_match git status --porcelain=v2 &&
+
+	test_all_match git reset deepest -- deep &&
+	test_all_match git status --porcelain=v2
+'
+
+# Although the working tree differs between full and sparse checkouts after
+# reset, the state of the index is the same.
+test_expect_success 'reset with pathspecs outside sparse definition' '
+	init_repos &&
+	test_all_match git checkout -b reset-test base &&
+
+	test_sparse_match git reset update-folder1 -- folder1 &&
+	git -C full-checkout reset update-folder1 -- folder1 &&
+	test_sparse_match git status --porcelain=v2 &&
+	test_all_match git rev-parse HEAD:folder1 &&
+
+	test_sparse_match git reset update-folder2 -- folder2/a &&
+	git -C full-checkout reset update-folder2 -- folder2/a &&
+	test_sparse_match git status --porcelain=v2 &&
+	test_all_match git rev-parse HEAD:folder2/a
+'
+
+test_expect_success 'reset with wildcard pathspec' '
+	init_repos &&
+
+	test_all_match git checkout -b reset-test update-deep &&
+	test_all_match git reset base -- \*/a &&
+	test_all_match git status --porcelain=v2 &&
+	test_all_match git rev-parse HEAD:folder1/a &&
+
+	test_all_match git reset base -- folder\* &&
+	test_all_match git status --porcelain=v2 &&
+	test_all_match git rev-parse HEAD:folder2
+'
+
 test_expect_success 'merge, cherry-pick, and rebase' '
 	init_repos &&
 
-- 
gitgitgadget


* [PATCH v4 5/8] reset: integrate with sparse index
  2021-10-11 20:30     ` [PATCH v4 0/8] Sparse Index: integrate with reset Victoria Dye via GitGitGadget
                         ` (3 preceding siblings ...)
  2021-10-11 20:30       ` [PATCH v4 4/8] reset: expand test coverage for sparse checkouts Victoria Dye via GitGitGadget
@ 2021-10-11 20:30       ` Victoria Dye via GitGitGadget
  2021-10-11 20:30       ` [PATCH v4 6/8] reset: make sparse-aware (except --mixed) Victoria Dye via GitGitGadget
                         ` (2 subsequent siblings)
  7 siblings, 0 replies; 85+ messages in thread
From: Victoria Dye via GitGitGadget @ 2021-10-11 20:30 UTC (permalink / raw)
  To: git
  Cc: stolee, gitster, newren, Taylor Blau, Bagas Sanjaya,
	Ævar Arnfjörð Bjarmason, Phillip Wood,
	Victoria Dye, Victoria Dye

From: Victoria Dye <vdye@github.com>

Disable the `command_requires_full_index` repo setting and add
`ensure_full_index` guards around code paths that cannot yet use sparse
directory index entries. `reset --soft` does not modify the index, so no
compatibility changes are needed for it to function without expanding the
index. For all other reset modes (`--mixed`, `--hard`, `--keep`, `--merge`),
the full index is expanded to prevent cache tree corruption and invalid
variable accesses.

Additionally, the `read_cache()` check verifying an uncorrupted index is
moved after argument parsing and preparing the repo settings. The index is
not used by the preceding argument handling, but `read_cache()` must be run
*after* enabling sparse index for the command (so that the index is not
expanded unnecessarily) and *before* using the index for reset (so that it
is verified as uncorrupted).

Signed-off-by: Victoria Dye <vdye@github.com>
---
 builtin/reset.c | 10 +++++++---
 cache-tree.c    |  1 +
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/builtin/reset.c b/builtin/reset.c
index e441b6601b9..0ac0de7dc97 100644
--- a/builtin/reset.c
+++ b/builtin/reset.c
@@ -180,6 +180,7 @@ static int read_from_tree(const struct pathspec *pathspec,
 	opt.flags.override_submodule_config = 1;
 	opt.repo = the_repository;
 
+	ensure_full_index(&the_index);
 	if (do_diff_cache(tree_oid, &opt))
 		return 1;
 	diffcore_std(&opt);
@@ -257,9 +258,6 @@ static void parse_args(struct pathspec *pathspec,
 	}
 	*rev_ret = rev;
 
-	if (read_cache() < 0)
-		die(_("index file corrupt"));
-
 	parse_pathspec(pathspec, 0,
 		       PATHSPEC_PREFER_FULL |
 		       (patch_mode ? PATHSPEC_PREFIX_ORIGIN : 0),
@@ -405,6 +403,12 @@ int cmd_reset(int argc, const char **argv, const char *prefix)
 	if (intent_to_add && reset_type != MIXED)
 		die(_("-N can only be used with --mixed"));
 
+	prepare_repo_settings(the_repository);
+	the_repository->settings.command_requires_full_index = 0;
+
+	if (read_cache() < 0)
+		die(_("index file corrupt"));
+
 	/* Soft reset does not touch the index file nor the working tree
 	 * at all, but requires them in a good order.  Other resets reset
 	 * the index file to the tree object we are switching to. */
diff --git a/cache-tree.c b/cache-tree.c
index 90919f9e345..9be19c85b66 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -776,6 +776,7 @@ void prime_cache_tree(struct repository *r,
 	cache_tree_free(&istate->cache_tree);
 	istate->cache_tree = cache_tree();
 
+	ensure_full_index(istate);
 	prime_cache_tree_rec(r, istate->cache_tree, tree);
 	istate->cache_changed |= CACHE_TREE_CHANGED;
 	trace2_region_leave("cache-tree", "prime_cache_tree", the_repository);
-- 
gitgitgadget


* [PATCH v4 6/8] reset: make sparse-aware (except --mixed)
  2021-10-11 20:30     ` [PATCH v4 0/8] Sparse Index: integrate with reset Victoria Dye via GitGitGadget
                         ` (4 preceding siblings ...)
  2021-10-11 20:30       ` [PATCH v4 5/8] reset: integrate with sparse index Victoria Dye via GitGitGadget
@ 2021-10-11 20:30       ` Victoria Dye via GitGitGadget
  2021-10-11 20:30       ` [PATCH v4 7/8] reset: make --mixed sparse-aware Victoria Dye via GitGitGadget
  2021-10-11 20:30       ` [PATCH v4 8/8] unpack-trees: improve performance of next_cache_entry Victoria Dye via GitGitGadget
  7 siblings, 0 replies; 85+ messages in thread
From: Victoria Dye via GitGitGadget @ 2021-10-11 20:30 UTC (permalink / raw)
  To: git
  Cc: stolee, gitster, newren, Taylor Blau, Bagas Sanjaya,
	Ævar Arnfjörð Bjarmason, Phillip Wood,
	Victoria Dye, Victoria Dye

From: Victoria Dye <vdye@github.com>

Remove the `ensure_full_index` guard on `prime_cache_tree` and update
`prime_cache_tree_rec` to correctly reconstruct sparse directory entries in
the cache tree. While processing a tree's entries, `prime_cache_tree_rec`
must determine whether a directory entry is sparse or not by searching for
it in the index (*without* expanding the index). If a matching sparse
directory index entry is found, no subtrees are added to the cache tree
entry and the entry count is set to 1 (representing the sparse directory
itself). Otherwise, the tree is assumed to not be sparse and its subtrees
are recursively added to the cache tree.
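
The entry-count rule described above can be sketched with a toy model. This
is only an illustration under stated assumptions, not the code in this
patch: `toy_tree`, `toy_index`, `toy_entry_exists` and `toy_prime_count` are
hypothetical names, and a linear scan stands in for the binary search that
`index_entry_exists` gets from `index_name_stage_pos`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TOY_MAX_CHILDREN 4

/* Toy tree object: a name, a count of files (blobs), and child directories. */
struct toy_tree {
	const char *name;
	int nr_files;
	int nr_children;
	const struct toy_tree *children[TOY_MAX_CHILDREN];
};

/* Toy index: a list of entry names; sparse directory names end in '/'. */
struct toy_index {
	const char **names;
	int nr;
};

/* Exact-name lookup that never expands anything (hypothetical helper). */
int toy_entry_exists(const struct toy_index *idx, const char *name)
{
	int i;

	for (i = 0; i < idx->nr; i++)
		if (!strcmp(idx->names[i], name))
			return 1;
	return 0;
}

/*
 * Count the index entries covered by `tree`. `path` is the tree's path
 * prefix: "" for the root, otherwise a directory name ending in '/'.
 */
int toy_prime_count(const struct toy_index *idx, const struct toy_tree *tree,
		    const char *path)
{
	int cnt = tree->nr_files;
	int i;

	for (i = 0; i < tree->nr_children; i++) {
		const struct toy_tree *sub = tree->children[i];
		char sub_path[256];
		int len = snprintf(sub_path, sizeof(sub_path), "%s%s/",
				   path, sub->name);

		assert(len > 0 && (size_t)len < sizeof(sub_path));
		if (toy_entry_exists(idx, sub_path))
			cnt += 1; /* sparse directory: one entry, no recursion */
		else
			cnt += toy_prime_count(idx, sub, sub_path);
	}
	return cnt;
}
```

In this model a directory that appears in the index under its own name
contributes exactly one entry, no matter how many files it contains, while
every other directory contributes the sum of its recursively counted
contents.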

Helped-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Victoria Dye <vdye@github.com>
---
 cache-tree.c                             | 47 ++++++++++++++++++++++--
 cache.h                                  | 10 +++++
 read-cache.c                             | 27 ++++++++++----
 t/t1092-sparse-checkout-compatibility.sh | 15 +++++++-
 4 files changed, 86 insertions(+), 13 deletions(-)

diff --git a/cache-tree.c b/cache-tree.c
index 9be19c85b66..2866101052c 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -740,15 +740,26 @@ out:
 	return ret;
 }
 
+static void prime_cache_tree_sparse_dir(struct cache_tree *it,
+					struct tree *tree)
+{
+
+	oidcpy(&it->oid, &tree->object.oid);
+	it->entry_count = 1;
+}
+
 static void prime_cache_tree_rec(struct repository *r,
 				 struct cache_tree *it,
-				 struct tree *tree)
+				 struct tree *tree,
+				 struct strbuf *tree_path)
 {
 	struct tree_desc desc;
 	struct name_entry entry;
 	int cnt;
+	int base_path_len = tree_path->len;
 
 	oidcpy(&it->oid, &tree->object.oid);
+
 	init_tree_desc(&desc, tree->buffer, tree->size);
 	cnt = 0;
 	while (tree_entry(&desc, &entry)) {
@@ -757,14 +768,40 @@ static void prime_cache_tree_rec(struct repository *r,
 		else {
 			struct cache_tree_sub *sub;
 			struct tree *subtree = lookup_tree(r, &entry.oid);
+
 			if (!subtree->object.parsed)
 				parse_tree(subtree);
 			sub = cache_tree_sub(it, entry.path);
 			sub->cache_tree = cache_tree();
-			prime_cache_tree_rec(r, sub->cache_tree, subtree);
+
+			/*
+			 * Recursively-constructed subtree path is only needed when working
+			 * in a sparse index (where it's used to determine whether the
+			 * subtree is a sparse directory in the index).
+			 */
+			if (r->index->sparse_index) {
+				strbuf_setlen(tree_path, base_path_len);
+				strbuf_grow(tree_path, base_path_len + entry.pathlen + 1);
+				strbuf_add(tree_path, entry.path, entry.pathlen);
+				strbuf_addch(tree_path, '/');
+			}
+
+			/*
+			 * If a sparse index is in use, the directory being processed may be
+			 * sparse. To confirm that, we can check whether an entry with that
+			 * exact name exists in the index. If it does, the created subtree
+			 * should be sparse. Otherwise, cache tree expansion should continue
+			 * as normal.
+			 */
+			if (r->index->sparse_index &&
+			    index_entry_exists(r->index, tree_path->buf, tree_path->len))
+				prime_cache_tree_sparse_dir(sub->cache_tree, subtree);
+			else
+				prime_cache_tree_rec(r, sub->cache_tree, subtree, tree_path);
 			cnt += sub->cache_tree->entry_count;
 		}
 	}
+
 	it->entry_count = cnt;
 }
 
@@ -772,12 +809,14 @@ void prime_cache_tree(struct repository *r,
 		      struct index_state *istate,
 		      struct tree *tree)
 {
+	struct strbuf tree_path = STRBUF_INIT;
+
 	trace2_region_enter("cache-tree", "prime_cache_tree", the_repository);
 	cache_tree_free(&istate->cache_tree);
 	istate->cache_tree = cache_tree();
 
-	ensure_full_index(istate);
-	prime_cache_tree_rec(r, istate->cache_tree, tree);
+	prime_cache_tree_rec(r, istate->cache_tree, tree, &tree_path);
+	strbuf_release(&tree_path);
 	istate->cache_changed |= CACHE_TREE_CHANGED;
 	trace2_region_leave("cache-tree", "prime_cache_tree", the_repository);
 }
diff --git a/cache.h b/cache.h
index f6295f3b048..1d3e4665562 100644
--- a/cache.h
+++ b/cache.h
@@ -816,6 +816,16 @@ struct cache_entry *index_file_exists(struct index_state *istate, const char *na
  */
 int index_name_pos(struct index_state *, const char *name, int namelen);
 
+/*
+ * Determines whether an entry with the given name exists within the
+ * given index. The return value is 1 if an exact match is found, otherwise
+ * it is 0. Note that, unlike index_name_pos, this function does not expand
+ * the index if it is sparse. If an item exists within the full index but it
+ * is contained within a sparse directory (and not in the sparse index), 0 is
+ * returned.
+ */
+int index_entry_exists(struct index_state *, const char *name, int namelen);
+
 /*
  * Some functions return the negative complement of an insert position when a
  * precise match was not found but a position was found where the entry would
diff --git a/read-cache.c b/read-cache.c
index f5d4385c408..c079ece981a 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -68,6 +68,11 @@
  */
 #define CACHE_ENTRY_PATH_LENGTH 80
 
+enum index_search_mode {
+	NO_EXPAND_SPARSE = 0,
+	EXPAND_SPARSE = 1
+};
+
 static inline struct cache_entry *mem_pool__ce_alloc(struct mem_pool *mem_pool, size_t len)
 {
 	struct cache_entry *ce;
@@ -551,7 +556,10 @@ int cache_name_stage_compare(const char *name1, int len1, int stage1, const char
 	return 0;
 }
 
-static int index_name_stage_pos(struct index_state *istate, const char *name, int namelen, int stage)
+static int index_name_stage_pos(struct index_state *istate,
+				const char *name, int namelen,
+				int stage,
+				enum index_search_mode search_mode)
 {
 	int first, last;
 
@@ -570,7 +578,7 @@ static int index_name_stage_pos(struct index_state *istate, const char *name, in
 		first = next+1;
 	}
 
-	if (istate->sparse_index &&
+	if (search_mode == EXPAND_SPARSE && istate->sparse_index &&
 	    first > 0) {
 		/* Note: first <= istate->cache_nr */
 		struct cache_entry *ce = istate->cache[first - 1];
@@ -586,7 +594,7 @@ static int index_name_stage_pos(struct index_state *istate, const char *name, in
 		    ce_namelen(ce) < namelen &&
 		    !strncmp(name, ce->name, ce_namelen(ce))) {
 			ensure_full_index(istate);
-			return index_name_stage_pos(istate, name, namelen, stage);
+			return index_name_stage_pos(istate, name, namelen, stage, search_mode);
 		}
 	}
 
@@ -595,7 +603,12 @@ static int index_name_stage_pos(struct index_state *istate, const char *name, in
 
 int index_name_pos(struct index_state *istate, const char *name, int namelen)
 {
-	return index_name_stage_pos(istate, name, namelen, 0);
+	return index_name_stage_pos(istate, name, namelen, 0, EXPAND_SPARSE);
+}
+
+int index_entry_exists(struct index_state *istate, const char *name, int namelen)
+{
+	return index_name_stage_pos(istate, name, namelen, 0, NO_EXPAND_SPARSE) >= 0;
 }
 
 int remove_index_entry_at(struct index_state *istate, int pos)
@@ -1222,7 +1235,7 @@ static int has_dir_name(struct index_state *istate,
 			 */
 		}
 
-		pos = index_name_stage_pos(istate, name, len, stage);
+		pos = index_name_stage_pos(istate, name, len, stage, EXPAND_SPARSE);
 		if (pos >= 0) {
 			/*
 			 * Found one, but not so fast.  This could
@@ -1322,7 +1335,7 @@ static int add_index_entry_with_check(struct index_state *istate, struct cache_e
 		strcmp(ce->name, istate->cache[istate->cache_nr - 1]->name) > 0)
 		pos = index_pos_to_insert_pos(istate->cache_nr);
 	else
-		pos = index_name_stage_pos(istate, ce->name, ce_namelen(ce), ce_stage(ce));
+		pos = index_name_stage_pos(istate, ce->name, ce_namelen(ce), ce_stage(ce), EXPAND_SPARSE);
 
 	/* existing match? Just replace it. */
 	if (pos >= 0) {
@@ -1357,7 +1370,7 @@ static int add_index_entry_with_check(struct index_state *istate, struct cache_e
 		if (!ok_to_replace)
 			return error(_("'%s' appears as both a file and as a directory"),
 				     ce->name);
-		pos = index_name_stage_pos(istate, ce->name, ce_namelen(ce), ce_stage(ce));
+		pos = index_name_stage_pos(istate, ce->name, ce_namelen(ce), ce_stage(ce), EXPAND_SPARSE);
 		pos = -pos-1;
 	}
 	return pos + 1;
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 535686a2954..78476de18ea 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -760,9 +760,9 @@ test_expect_success 'sparse-index is not expanded' '
 	ensure_not_expanded checkout - &&
 	ensure_not_expanded switch rename-out-to-out &&
 	ensure_not_expanded switch - &&
-	git -C sparse-index reset --hard &&
+	ensure_not_expanded reset --hard &&
 	ensure_not_expanded checkout rename-out-to-out -- deep/deeper1 &&
-	git -C sparse-index reset --hard &&
+	ensure_not_expanded reset --hard &&
 	ensure_not_expanded restore -s rename-out-to-out -- deep/deeper1 &&
 
 	echo >>sparse-index/README.md &&
@@ -772,6 +772,17 @@ test_expect_success 'sparse-index is not expanded' '
 	echo >>sparse-index/untracked.txt &&
 	ensure_not_expanded add . &&
 
+	for ref in update-deep update-folder1 update-folder2 update-deep
+	do
+		echo >>sparse-index/README.md &&
+		ensure_not_expanded reset --hard $ref || return 1
+	done &&
+
+	ensure_not_expanded reset --hard update-deep &&
+	ensure_not_expanded reset --keep base &&
+	ensure_not_expanded reset --merge update-deep &&
+	ensure_not_expanded reset --hard &&
+
 	ensure_not_expanded checkout -f update-deep &&
 	test_config -C sparse-index pull.twohead ort &&
 	(
-- 
gitgitgadget


* [PATCH v4 7/8] reset: make --mixed sparse-aware
  2021-10-11 20:30     ` [PATCH v4 0/8] Sparse Index: integrate with reset Victoria Dye via GitGitGadget
                         ` (5 preceding siblings ...)
  2021-10-11 20:30       ` [PATCH v4 6/8] reset: make sparse-aware (except --mixed) Victoria Dye via GitGitGadget
@ 2021-10-11 20:30       ` Victoria Dye via GitGitGadget
  2021-10-11 20:30       ` [PATCH v4 8/8] unpack-trees: improve performance of next_cache_entry Victoria Dye via GitGitGadget
  7 siblings, 0 replies; 85+ messages in thread
From: Victoria Dye via GitGitGadget @ 2021-10-11 20:30 UTC (permalink / raw)
  To: git
  Cc: stolee, gitster, newren, Taylor Blau, Bagas Sanjaya,
	Ævar Arnfjörð Bjarmason, Phillip Wood,
	Victoria Dye, Victoria Dye

From: Victoria Dye <vdye@github.com>

Remove the `ensure_full_index` guard on `read_from_tree` and update `git
reset --mixed` to ensure it can use sparse directory index entries wherever
possible. Sparse directory entries are reset using `diff_tree_oid`, which
requires `change` and `add_remove` functions to process the internal
contents of the sparse directory. The `recursive` diff option handles cases
in which `reset --mixed` must diff/merge files that are nested multiple
levels deep in a sparse directory.

The use of pathspecs with `git reset --mixed` introduces scenarios in which
internal contents of sparse directories may be matched by the pathspec. In
order to reset *all* files in the repo that may match the pathspec, the
following conditions on the pathspec require index expansion before
performing the reset:

* "magic" pathspecs
* wildcard pathspecs that may match part of a sparse directory's contents
  (rather than only in-cone files or entire sparse directories)
* literal pathspecs matching something outside the sparse checkout
  definition
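
The wildcard condition in the list above can be sketched in isolation. This
is a simplified model under stated assumptions, not the function added by
this patch: `toy_wildcard_needs_expansion` is a hypothetical name, the sparse
directories are passed in as plain strings ending in '/', and POSIX
`fnmatch()` stands in for git's `wildmatch()` (both return 0 on a match). It
does not cover magic or literal pathspecs.

```c
#include <assert.h>
#include <fnmatch.h>
#include <string.h>

/*
 * Return 1 if a wildcard pathspec could match only part of a sparse
 * directory, 0 otherwise. `nowildcard_len` is the length of the literal
 * prefix before the first wildcard character in `pattern`.
 */
int toy_wildcard_needs_expansion(const char *pattern, size_t nowildcard_len,
				 const char **sparse_dirs, int nr)
{
	int i;

	assert(nowildcard_len <= strlen(pattern));
	for (i = 0; i < nr; i++) {
		const char *dir = sparse_dirs[i];
		size_t dir_len = strlen(dir);

		/* The literal prefix reaches inside the sparse directory. */
		if (nowildcard_len > dir_len && !strncmp(pattern, dir, dir_len))
			return 1;

		/*
		 * The literal prefix is compatible with the directory name,
		 * but the pattern does not match the directory entry as a
		 * whole, so it may select a subset of its contents.
		 */
		if (!strncmp(pattern, dir, nowildcard_len) &&
		    fnmatch(pattern, dir, 0) != 0)
			return 1;
	}
	return 0;
}
```

With sparse directories `folder1/` and `folder2/`, this model reports no
expansion for `folder*` and `deep/*`, which agrees with the
`ensure_not_expanded` cases added to t1092 in this patch, and reports
expansion for a pattern such as `*/a`.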

Helped-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Victoria Dye <vdye@github.com>
---
 builtin/reset.c                          | 78 +++++++++++++++++++++++-
 t/t1092-sparse-checkout-compatibility.sh | 17 ++++++
 2 files changed, 93 insertions(+), 2 deletions(-)

diff --git a/builtin/reset.c b/builtin/reset.c
index 0ac0de7dc97..60517e7e1d6 100644
--- a/builtin/reset.c
+++ b/builtin/reset.c
@@ -148,7 +148,9 @@ static void update_index_from_diff(struct diff_queue_struct *q,
 		 * If the file 1) corresponds to an existing index entry with
 		 * skip-worktree set, or 2) does not exist in the index but is
 		 * outside the sparse checkout definition, add a skip-worktree bit
-		 * to the new index entry.
+		 * to the new index entry. Note that a sparse index will be expanded
+		 * if this entry is outside the sparse cone - this is necessary
+		 * to properly construct the reset sparse directory.
 		 */
 		pos = cache_name_pos(one->path, strlen(one->path));
 		if ((pos >= 0 && ce_skip_worktree(active_cache[pos])) ||
@@ -166,6 +168,73 @@ static void update_index_from_diff(struct diff_queue_struct *q,
 	}
 }
 
+static int pathspec_needs_expanded_index(const struct pathspec *pathspec)
+{
+	unsigned int i, pos;
+	int res = 0;
+	char *skip_worktree_seen = NULL;
+
+	/*
+	 * When using a magic pathspec, assume for the sake of simplicity that
+	 * the index needs to be expanded to match all matchable files.
+	 */
+	if (pathspec->magic)
+		return 1;
+
+	for (i = 0; i < pathspec->nr; i++) {
+		struct pathspec_item item = pathspec->items[i];
+
+		/*
+		 * If the pathspec item has a wildcard, the index should be expanded
+		 * if the pathspec has the possibility of matching a subset of entries inside
+		 * of a sparse directory (but not the entire directory).
+		 *
+		 * If the pathspec item is a literal path, the index only needs to be expanded
+		 * if a) the pathspec isn't in the sparse checkout cone (to make sure we don't
+		 * expand for in-cone files) and b) it doesn't match any sparse directories
+		 * (since we can reset whole sparse directories without expanding them).
+		 */
+		if (item.nowildcard_len < item.len) {
+			for (pos = 0; pos < active_nr; pos++) {
+				struct cache_entry *ce = active_cache[pos];
+
+				if (!S_ISSPARSEDIR(ce->ce_mode))
+					continue;
+
+				/*
+				 * If the pre-wildcard length is longer than the sparse
+				 * directory name and the sparse directory is the first
+				 * component of the pathspec, need to expand the index.
+				 */
+				if (item.nowildcard_len > ce_namelen(ce) &&
+				    !strncmp(item.original, ce->name, ce_namelen(ce))) {
+					res = 1;
+					break;
+				}
+
+				/*
+				 * If the pre-wildcard length is shorter than the sparse
+				 * directory and the pathspec does not match the whole
+				 * directory, need to expand the index.
+				 */
+				if (!strncmp(item.original, ce->name, item.nowildcard_len) &&
+				    wildmatch(item.original, ce->name, 0)) {
+					res = 1;
+					break;
+				}
+			}
+		} else if (!path_in_cone_mode_sparse_checkout(item.original, &the_index) &&
+			   !matches_skip_worktree(pathspec, i, &skip_worktree_seen))
+			res = 1;
+
+		if (res > 0)
+			break;
+	}
+
+	free(skip_worktree_seen);
+	return res;
+}
+
 static int read_from_tree(const struct pathspec *pathspec,
 			  struct object_id *tree_oid,
 			  int intent_to_add)
@@ -178,9 +247,14 @@ static int read_from_tree(const struct pathspec *pathspec,
 	opt.format_callback = update_index_from_diff;
 	opt.format_callback_data = &intent_to_add;
 	opt.flags.override_submodule_config = 1;
+	opt.flags.recursive = 1;
 	opt.repo = the_repository;
+	opt.change = diff_change;
+	opt.add_remove = diff_addremove;
+
+	if (pathspec->nr && the_index.sparse_index && pathspec_needs_expanded_index(pathspec))
+		ensure_full_index(&the_index);
 
-	ensure_full_index(&the_index);
 	if (do_diff_cache(tree_oid, &opt))
 		return 1;
 	diffcore_std(&opt);
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 78476de18ea..f19c1b3e2eb 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -778,11 +778,28 @@ test_expect_success 'sparse-index is not expanded' '
 		ensure_not_expanded reset --hard $ref || return 1
 	done &&
 
+	ensure_not_expanded reset --mixed base &&
 	ensure_not_expanded reset --hard update-deep &&
 	ensure_not_expanded reset --keep base &&
 	ensure_not_expanded reset --merge update-deep &&
 	ensure_not_expanded reset --hard &&
 
+	ensure_not_expanded reset base -- deep/a &&
+	ensure_not_expanded reset base -- nonexistent-file &&
+	ensure_not_expanded reset deepest -- deep &&
+
+	# Although folder1 is outside the sparse definition, it exists as a
+	# directory entry in the index, so the pathspec will not force the
+	# index to be expanded.
+	ensure_not_expanded reset deepest -- folder1 &&
+	ensure_not_expanded reset deepest -- folder1/ &&
+
+	# Wildcard identifies only in-cone files, no index expansion
+	ensure_not_expanded reset deepest -- deep/\* &&
+
+	# Wildcard identifies only full sparse directories, no index expansion
+	ensure_not_expanded reset deepest -- folder\* &&
+
 	ensure_not_expanded checkout -f update-deep &&
 	test_config -C sparse-index pull.twohead ort &&
 	(
-- 
gitgitgadget


* [PATCH v4 8/8] unpack-trees: improve performance of next_cache_entry
  2021-10-11 20:30     ` [PATCH v4 0/8] Sparse Index: integrate with reset Victoria Dye via GitGitGadget
                         ` (6 preceding siblings ...)
  2021-10-11 20:30       ` [PATCH v4 7/8] reset: make --mixed sparse-aware Victoria Dye via GitGitGadget
@ 2021-10-11 20:30       ` Victoria Dye via GitGitGadget
  7 siblings, 0 replies; 85+ messages in thread
From: Victoria Dye via GitGitGadget @ 2021-10-11 20:30 UTC (permalink / raw)
  To: git
  Cc: stolee, gitster, newren, Taylor Blau, Bagas Sanjaya,
	Ævar Arnfjörð Bjarmason, Phillip Wood,
	Victoria Dye, Victoria Dye

From: Victoria Dye <vdye@github.com>

To find the first non-unpacked cache entry, `next_cache_entry` iterates
through the index, starting at `cache_bottom`. In a full index this is fast
because `cache_bottom` advances with each invocation of `mark_ce_used`
(called by `unpack_index_entry`). However, the presence of sparse
directories can prevent `cache_bottom` from advancing in a sparse index,
effectively forcing `next_cache_entry` to search from the beginning of the
index each time it is called.

The `cache_bottom` must be preserved for the sparse index (see 17a1bb570b
(unpack-trees: preserve cache_bottom, 2021-07-14)). Therefore, to retain the
benefit `cache_bottom` provides in non-sparse index cases, a separate `hint`
records the first position `next_cache_entry` should search. Each call
updates it to just past the entry it returned, or to the end of the index if
no entry remains.
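
As a rough illustration of the `hint` idea, here is a toy model. It is not
the code in this patch: `toy_scan` and `toy_next_entry` are hypothetical
names, and a plain int array stands in for the `CE_UNPACKED` flag. The point
is that the search resumes where the previous call stopped even when
`bottom` never moves.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for the traversal state; not git's unpack_trees_options. */
struct toy_scan {
	const int *unpacked; /* nonzero means the entry was already handled */
	int nr;              /* number of entries */
	int bottom;          /* like cache_bottom; may lag in a sparse index */
};

/*
 * Return the position of the first not-yet-unpacked entry at or after
 * max(bottom, *hint), or -1 if there is none. *hint is advanced so that
 * the next call skips everything already examined.
 */
int toy_next_entry(const struct toy_scan *s, int *hint)
{
	int pos;

	assert(s != NULL && hint != NULL);
	pos = s->bottom;
	if (*hint > pos)
		pos = *hint;

	while (pos < s->nr) {
		if (!s->unpacked[pos]) {
			*hint = pos + 1;
			return pos;
		}
		pos++;
	}

	*hint = pos;
	return -1;
}
```

Starting with `hint = -1`, as the callers in the diff below do, the first
call scans from `bottom`. Later calls start from the saved position, so a
run of calls examines each position at most once even if `bottom` stays at 0.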

Signed-off-by: Victoria Dye <vdye@github.com>
---
 unpack-trees.c | 23 +++++++++++++++++------
 1 file changed, 17 insertions(+), 6 deletions(-)

diff --git a/unpack-trees.c b/unpack-trees.c
index 8ea0a542da8..b94733de6be 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -645,17 +645,24 @@ static void mark_ce_used_same_name(struct cache_entry *ce,
 	}
 }
 
-static struct cache_entry *next_cache_entry(struct unpack_trees_options *o)
+static struct cache_entry *next_cache_entry(struct unpack_trees_options *o, int *hint)
 {
 	const struct index_state *index = o->src_index;
 	int pos = o->cache_bottom;
 
+	if (*hint > pos)
+		pos = *hint;
+
 	while (pos < index->cache_nr) {
 		struct cache_entry *ce = index->cache[pos];
-		if (!(ce->ce_flags & CE_UNPACKED))
+		if (!(ce->ce_flags & CE_UNPACKED)) {
+			*hint = pos + 1;
 			return ce;
+		}
 		pos++;
 	}
+
+	*hint = pos;
 	return NULL;
 }
 
@@ -1365,12 +1372,13 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
 
 	/* Are we supposed to look at the index too? */
 	if (o->merge) {
+		int hint = -1;
 		while (1) {
 			int cmp;
 			struct cache_entry *ce;
 
 			if (o->diff_index_cached)
-				ce = next_cache_entry(o);
+				ce = next_cache_entry(o, &hint);
 			else
 				ce = find_cache_entry(info, p);
 
@@ -1690,7 +1698,7 @@ static int verify_absent(const struct cache_entry *,
 int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options *o)
 {
 	struct repository *repo = the_repository;
-	int i, ret;
+	int i, hint, ret;
 	static struct cache_entry *dfc;
 	struct pattern_list pl;
 	int free_pattern_list = 0;
@@ -1763,13 +1771,15 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
 		info.pathspec = o->pathspec;
 
 		if (o->prefix) {
+			hint = -1;
+
 			/*
 			 * Unpack existing index entries that sort before the
 			 * prefix the tree is spliced into.  Note that o->merge
 			 * is always true in this case.
 			 */
 			while (1) {
-				struct cache_entry *ce = next_cache_entry(o);
+				struct cache_entry *ce = next_cache_entry(o, &hint);
 				if (!ce)
 					break;
 				if (ce_in_traverse_path(ce, &info))
@@ -1790,8 +1800,9 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
 
 	/* Any left-over entries in the index? */
 	if (o->merge) {
+		hint = -1;
 		while (1) {
-			struct cache_entry *ce = next_cache_entry(o);
+			struct cache_entry *ce = next_cache_entry(o, &hint);
 			if (!ce)
 				break;
 			if (unpack_index_entry(ce, o) < 0)
-- 
gitgitgadget
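
For readers following along, here is a minimal standalone sketch of the
effect described in the commit message above. It is not git code: the
`toy_entry` and `toy_index` types, the `UNPACKED` flag, and the `drain`
helper are hypothetical stand-ins, and the model assumes the worst case in
which `cache_bottom` never advances. It only counts how many entries each
variant of the scan inspects.

```c
#include <assert.h>
#include <stddef.h>

#define UNPACKED 1u /* hypothetical stand-in for CE_UNPACKED */

struct toy_entry { unsigned flags; };

struct toy_index {
	struct toy_entry *cache;
	int cache_nr;
	int cache_bottom; /* never advanced here, modelling the sparse case */
};

/* Number of entries inspected by the scans since the last reset. */
static long visited;

/* Scan without a hint: every call restarts at cache_bottom. */
static struct toy_entry *next_entry_plain(struct toy_index *idx)
{
	int pos = idx->cache_bottom;

	while (pos < idx->cache_nr) {
		struct toy_entry *e = &idx->cache[pos];
		visited++;
		if (!(e->flags & UNPACKED))
			return e;
		pos++;
	}
	return NULL;
}

/* Scan with a hint: resume just past the entry returned last time. */
static struct toy_entry *next_entry_hinted(struct toy_index *idx, int *hint)
{
	int pos = idx->cache_bottom;

	assert(hint != NULL);
	if (*hint > pos)
		pos = *hint;

	while (pos < idx->cache_nr) {
		struct toy_entry *e = &idx->cache[pos];
		visited++;
		if (!(e->flags & UNPACKED)) {
			*hint = pos + 1;
			return e;
		}
		pos++;
	}

	*hint = pos;
	return NULL;
}

/*
 * Mark every entry unpacked, one at a time, the way the "left-over
 * entries" loop consumes them. Returns the total entries inspected.
 */
static long drain(struct toy_index *idx, int use_hint)
{
	int hint = -1;

	visited = 0;
	for (;;) {
		struct toy_entry *e = use_hint ?
			next_entry_hinted(idx, &hint) : next_entry_plain(idx);
		if (!e)
			break;
		e->flags |= UNPACKED;
	}
	return visited;
}
```

In this toy model, draining a four-entry index inspects 14 entries without
the hint (1 + 2 + 3 + 4, plus a final pass over all 4) and only 4 with it,
which is the quadratic-versus-linear difference the patch is after.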


* Re: [PATCH v3 6/8] reset: make sparse-aware (except --mixed)
  2021-10-10 22:03               ` Junio C Hamano
  2021-10-11 15:55                 ` Victoria Dye
@ 2021-10-12 10:16                 ` Phillip Wood
  2021-10-12 19:15                   ` Junio C Hamano
  1 sibling, 1 reply; 85+ messages in thread
From: Phillip Wood @ 2021-10-12 10:16 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Victoria Dye, phillip.wood, Victoria Dye via GitGitGadget, git,
	stolee, newren, Taylor Blau, Bagas Sanjaya,
	Ævar Arnfjörð Bjarmason

On 10/10/2021 23:03, Junio C Hamano wrote:
> Phillip Wood <phillip.wood123@gmail.com> writes:
> 
>> On 08/10/2021 19:31, Junio C Hamano wrote:
>>> Victoria Dye <vdye@github.com> writes:
>>>
>>>> Phillip Wood wrote:
>>>
>>>>> I was looking at the callers to prime_cache_tree() this morning
>>>>> and would like to suggest an alternative approach - just delete
>>>>> prime_cache_tree() and all of its callers!
>>> Do you mean the calls added by new patches without understanding
>>> what they are doing, or all calls to it?
>>
>> I mean all calls to prime_cache_tree() after having understood (or at
>> least thinking that I understand) what they are doing.
> 
> Sorry, my statement was confusingly written.  I meant "calls added
> by new patches, written by those who do not understand what
> prime_cache_tree() calls are doing", but after re-reading it, I
> think it could be taken to be referring to "you may be commenting
> without understanding what prime_cache_tree() calls are doing",
> which wasn't my intention.

Thanks for clarifying that, I had misunderstood what you had written.

>> (a) a successful call to unpack_trees() updates the cache tree
>>
>> (b) all the existing calls to prime_cache_tree() follow a successful
>> call to unpack_trees() and nothing touches the index in between the
>> call to unpack_trees() and prime_cache_tree().
> 
> Ahh, OK.
> 
> I think we originally avoided calling cache_tree_update() lightly
> (because it is essentially a "write-tree", a fairly heavy-weight
> operation, without I/O) and instead relied on prime_cache_tree() to
> get degraded cache-tree back into freshness.
> 
> What I forgot was that 52fca218 (unpack-trees: populate cache-tree
> on successful merge, 2015-07-28) added cache_tree_update() there at
> the end of unpack_trees().  The commit covers quite a wide range of
> operations---the log message says "merge", but in fact anything that
> uses unpack_trees() including branch switching and the resetting of
> the index are affected, and they cause a full reconstruction of the
> cache tree by calling cache_tree_update().
> 
> For most callers of prime_cache_tree(), like the ones in "git
> read-tree" and "git reset", it is immediately obvious that we just
> read from the same tree, and we should have everything from the tree
> and nothing else in the resulting index, so it is clear that the
> prime_cache_tree() call is recreating the same cache-tree
> information that we already should have computed ourselves, and
> these calls can go (or if "prime" is still cheaper than "update",
> these callers can pass an option to tell unpack_trees() to skip the
> cache_tree_update() call, because they will call "prime" immediately
> after).

I haven't really thought this through, but could we teach unpack_trees()
to call prime_cache_tree() rather than cache_tree_update() when that
would be safe? For callers that use oneway_merge() it should always be
safe, I think, and it might be possible to modify twoway_merge() to
signal if the final tree in the index matches the second one passed to
it. We could have a more general mechanism for the callback to signal
if it is safe to prime the tree, but I suspect the callers that are using
custom callbacks are not updating the whole tree.

Best Wishes

Phillip

> For other callers it is not immediately obvious, but I trust you are
> correctly reading the code ;-)
> 
> Thanks.
> 
> 
> 



* Re: [PATCH v3 6/8] reset: make sparse-aware (except --mixed)
  2021-10-08 17:14         ` Victoria Dye
  2021-10-08 18:31           ` Junio C Hamano
@ 2021-10-12 10:17           ` Phillip Wood
  1 sibling, 0 replies; 85+ messages in thread
From: Phillip Wood @ 2021-10-12 10:17 UTC (permalink / raw)
  To: Victoria Dye, phillip.wood, Victoria Dye via GitGitGadget, git
  Cc: stolee, gitster, newren, Taylor Blau, Bagas Sanjaya,
	Ævar Arnfjörð Bjarmason

On 08/10/2021 18:14, Victoria Dye wrote:
> Phillip Wood wrote:
>> Hi Victoria
>>
>> On 07/10/2021 22:15, Victoria Dye via GitGitGadget wrote:
>>> From: Victoria Dye <vdye@github.com>
>>>
>>> Remove `ensure_full_index` guard on `prime_cache_tree` and update
>>> `prime_cache_tree_rec` to correctly reconstruct sparse directory entries in
>>> the cache tree. While processing a tree's entries, `prime_cache_tree_rec`
>>> must determine whether a directory entry is sparse or not by searching for
>>> it in the index (*without* expanding the index). If a matching sparse
>>> directory index entry is found, no subtrees are added to the cache tree
>>> entry and the entry count is set to 1 (representing the sparse directory
>>> itself). Otherwise, the tree is assumed to not be sparse and its subtrees
>>> are recursively added to the cache tree.
>>
>> I was looking at the callers to prime_cache_tree() this morning and
>> would like to suggest an alternative approach - just delete
>> prime_cache_tree() and all of its callers! As far as I can see it is
>> only ever called after a successful call to unpack_trees() and since
>> 52fca2184d ("unpack-trees: populate cache-tree on successful merge",
>> 2015-07-28) unpack_trees() updates the cache tree for the caller. All
>> the call sites are pretty obvious apart from the one in
>> t/help/test-fast-rebase.c where unpack_trees() is called by
>> merge_switch_to_result().
>>
> 
> It looks like `prime_cache_tree` can be removed mostly without issue, but
> it causes the two last tests in `t4058-diff-duplicates.sh` to fail. Those
> tests document failure cases when dealing with duplicate tree entries [1],
> and it looks like `prime_cache_tree` was creating the appearance of a
> fully-reset index but was still leaving it in a state where subsequent
> operations could fail.
> 
> I'm inclined to say the solution here would be to update the tests to
> document the "new" failure behavior and proceed with removing
> `prime_cache_tree`, because:
> 
> * the test using `git reset --hard` disables `GIT_TEST_CHECK_CACHE_TREE`,
>    indicating that `prime_cache_tree` already wasn't behaving correctly
> * attempting to fix the overarching issues with duplicate tree entries will
>    substantially delay this patch series
> * a duplicate entry fix is largely unrelated to the intended scope of the
>    series

That sounds like a good way forward

Best Wishes

Phillip

> Another option would be to leave `prime_cache_tree` as it is, but with it
> being apparently useless outside of mostly-broken use cases in `t4058`, it
> seems like a waste to keep it around.
> 
> [1] ac14de13b2 (t4058: explore duplicate tree entry handling in a bit more detail, 2020-12-11)
> 



* Re: [PATCH v3 6/8] reset: make sparse-aware (except --mixed)
  2021-10-12 10:16                 ` Phillip Wood
@ 2021-10-12 19:15                   ` Junio C Hamano
  0 siblings, 0 replies; 85+ messages in thread
From: Junio C Hamano @ 2021-10-12 19:15 UTC (permalink / raw)
  To: Phillip Wood
  Cc: Victoria Dye, phillip.wood, Victoria Dye via GitGitGadget, git,
	stolee, newren, Taylor Blau, Bagas Sanjaya,
	Ævar Arnfjörð Bjarmason

Phillip Wood <phillip.wood123@gmail.com> writes:

> I haven't really thought this through, but could we teach
> unpack_trees() to call prime_cache_tree() rather than
> cache_tree_update() when that would be safe? For callers that use
> oneway_merge() it should always be safe, I think, and it might be
> possible to modify twoway_merge() to signal if the final tree in the
> index matches the second one passed to it. We could have a more
> general mechanism for the callback to signal if it is safe to prime
> the tree, but I suspect the callers that are using custom callbacks are
> not updating the whole tree.

Before going in any direction, other than doing nothing ;-), we'd
need to see how expensive "prime" and "update" are.  

Having said that.

 * Your idea is quite beneficial for callers of unpack_trees() as
   they no longer have to decide whether they want to make a
   separate call to "prime".

 * Right now we do not seem to have a codepath that

   (1) populates the index entries from existing trees (not
   necessarily making the index in complete sync with the trees)
   without unpack_trees() and

   (2) does "prime" to fix the cache tree

   but such a codepath may want to do either "prime" or "update", or
   neither.  When it knows that it damages cache-tree so badly, and
   that it is often expected that the user would make many other
   changes to the index before writing it out as a tree, it may
   choose not to do either.

Thanks.


Thread overview: 85+ messages
2021-09-30 14:50 [PATCH 0/7] Sparse Index: integrate with reset Victoria Dye via GitGitGadget
2021-09-30 14:50 ` [PATCH 1/7] reset: behave correctly with sparse-checkout Kevin Willford via GitGitGadget
2021-09-30 18:34   ` Junio C Hamano
2021-10-01 14:55     ` Victoria Dye
2021-09-30 14:50 ` [PATCH 2/7] sparse-index: update command for expand/collapse test Victoria Dye via GitGitGadget
2021-09-30 19:17   ` Taylor Blau
2021-09-30 20:11     ` Victoria Dye
2021-09-30 21:32       ` Junio C Hamano
2021-09-30 22:59         ` Victoria Dye
2021-10-01  0:04           ` Junio C Hamano
2021-10-04 13:47             ` Victoria Dye
2021-10-01  9:14   ` Bagas Sanjaya
2021-09-30 14:50 ` [PATCH 3/7] reset: expand test coverage for sparse checkouts Victoria Dye via GitGitGadget
2021-09-30 14:50 ` [PATCH 4/7] reset: integrate with sparse index Victoria Dye via GitGitGadget
2021-09-30 14:50 ` [PATCH 5/7] reset: make sparse-aware (except --mixed) Victoria Dye via GitGitGadget
2021-09-30 14:51 ` [PATCH 6/7] reset: make --mixed sparse-aware Victoria Dye via GitGitGadget
2021-10-01 15:03   ` Victoria Dye
2021-09-30 14:51 ` [PATCH 7/7] unpack-trees: improve performance of next_cache_entry Victoria Dye via GitGitGadget
2021-10-05 13:20 ` [PATCH v2 0/7] Sparse Index: integrate with reset Victoria Dye via GitGitGadget
2021-10-05 13:20   ` [PATCH v2 1/7] reset: behave correctly with sparse-checkout Kevin Willford via GitGitGadget
2021-10-05 19:30     ` Junio C Hamano
2021-10-05 21:59       ` Victoria Dye
2021-10-06 12:44         ` Junio C Hamano
2021-10-06  1:46     ` Elijah Newren
2021-10-06 20:09       ` Victoria Dye
2021-10-06 10:31     ` Bagas Sanjaya
2021-10-05 13:20   ` [PATCH v2 2/7] update-index: add --force-full-index option for expand/collapse test Victoria Dye via GitGitGadget
2021-10-06  2:00     ` Elijah Newren
2021-10-06 20:40       ` Victoria Dye
2021-10-08  3:42         ` Elijah Newren
2021-10-08 17:11           ` Junio C Hamano
2021-10-06 10:33     ` Bagas Sanjaya
2021-10-05 13:20   ` [PATCH v2 3/7] reset: expand test coverage for sparse checkouts Victoria Dye via GitGitGadget
2021-10-06  2:04     ` Elijah Newren
2021-10-05 13:20   ` [PATCH v2 4/7] reset: integrate with sparse index Victoria Dye via GitGitGadget
2021-10-06  2:15     ` Elijah Newren
2021-10-06 17:48       ` Junio C Hamano
2021-10-05 13:20   ` [PATCH v2 5/7] reset: make sparse-aware (except --mixed) Victoria Dye via GitGitGadget
2021-10-06  3:43     ` Elijah Newren
2021-10-06 20:56       ` Victoria Dye
2021-10-06 10:34     ` Bagas Sanjaya
2021-10-05 13:20   ` [PATCH v2 6/7] reset: make --mixed sparse-aware Victoria Dye via GitGitGadget
2021-10-06  4:43     ` Elijah Newren
2021-10-07 14:34       ` Victoria Dye
2021-10-05 13:20   ` [PATCH v2 7/7] unpack-trees: improve performance of next_cache_entry Victoria Dye via GitGitGadget
2021-10-06 10:37     ` Bagas Sanjaya
2021-10-05 15:34   ` [PATCH v2 0/7] Sparse Index: integrate with reset Ævar Arnfjörð Bjarmason
2021-10-05 20:44     ` Victoria Dye
2021-10-06  5:46   ` Elijah Newren
2021-10-07 21:15   ` [PATCH v3 0/8] " Victoria Dye via GitGitGadget
2021-10-07 21:15     ` [PATCH v3 1/8] reset: rename is_missing to !is_in_reset_tree Victoria Dye via GitGitGadget
2021-10-07 21:15     ` [PATCH v3 2/8] reset: preserve skip-worktree bit in mixed reset Kevin Willford via GitGitGadget
2021-10-08  9:04       ` Junio C Hamano
2021-10-07 21:15     ` [PATCH v3 3/8] update-index: add --force-full-index option for expand/collapse test Victoria Dye via GitGitGadget
2021-10-08  2:50       ` Bagas Sanjaya
2021-10-08  5:24       ` Junio C Hamano
2021-10-08 15:47         ` Victoria Dye
2021-10-08 17:19           ` Junio C Hamano
2021-10-11 14:12             ` Derrick Stolee
2021-10-11 15:05               ` Victoria Dye
2021-10-11 15:24               ` Junio C Hamano
2021-10-07 21:15     ` [PATCH v3 4/8] reset: expand test coverage for sparse checkouts Victoria Dye via GitGitGadget
2021-10-07 21:15     ` [PATCH v3 5/8] reset: integrate with sparse index Victoria Dye via GitGitGadget
2021-10-07 21:15     ` [PATCH v3 6/8] reset: make sparse-aware (except --mixed) Victoria Dye via GitGitGadget
2021-10-08 11:09       ` Phillip Wood
2021-10-08 17:14         ` Victoria Dye
2021-10-08 18:31           ` Junio C Hamano
2021-10-09 11:18             ` Phillip Wood
2021-10-10 22:03               ` Junio C Hamano
2021-10-11 15:55                 ` Victoria Dye
2021-10-11 16:16                   ` Junio C Hamano
2021-10-12 10:16                 ` Phillip Wood
2021-10-12 19:15                   ` Junio C Hamano
2021-10-12 10:17           ` Phillip Wood
2021-10-07 21:15     ` [PATCH v3 7/8] reset: make --mixed sparse-aware Victoria Dye via GitGitGadget
2021-10-07 21:15     ` [PATCH v3 8/8] unpack-trees: improve performance of next_cache_entry Victoria Dye via GitGitGadget
2021-10-11 20:30     ` [PATCH v4 0/8] Sparse Index: integrate with reset Victoria Dye via GitGitGadget
2021-10-11 20:30       ` [PATCH v4 1/8] reset: rename is_missing to !is_in_reset_tree Victoria Dye via GitGitGadget
2021-10-11 20:30       ` [PATCH v4 2/8] reset: preserve skip-worktree bit in mixed reset Victoria Dye via GitGitGadget
2021-10-11 20:30       ` [PATCH v4 3/8] sparse-index: update command for expand/collapse test Victoria Dye via GitGitGadget
2021-10-11 20:30       ` [PATCH v4 4/8] reset: expand test coverage for sparse checkouts Victoria Dye via GitGitGadget
2021-10-11 20:30       ` [PATCH v4 5/8] reset: integrate with sparse index Victoria Dye via GitGitGadget
2021-10-11 20:30       ` [PATCH v4 6/8] reset: make sparse-aware (except --mixed) Victoria Dye via GitGitGadget
2021-10-11 20:30       ` [PATCH v4 7/8] reset: make --mixed sparse-aware Victoria Dye via GitGitGadget
2021-10-11 20:30       ` [PATCH v4 8/8] unpack-trees: improve performance of next_cache_entry Victoria Dye via GitGitGadget
