git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/4] reset/checkout: fix miscellaneous sparse index bugs
@ 2022-08-04 20:46 Victoria Dye via GitGitGadget
  2022-08-04 20:46 ` [PATCH 1/4] checkout: fix nested sparse directory diff in sparse index Victoria Dye via GitGitGadget
                   ` (4 more replies)
  0 siblings, 5 replies; 23+ messages in thread
From: Victoria Dye via GitGitGadget @ 2022-08-04 20:46 UTC (permalink / raw)
  To: git; +Cc: derrickstolee, shaoxuan.yuan02, newren, gitster, Victoria Dye

While working on sparse index integration for 'git rm' [1], Shaoxuan found
that removed sparse directories, when reset, would no longer be sparse. This
was due to how 'unpack_trees()' determined whether a traversed directory was
a sparse directory or not; it would only unpack an entry as a sparse
directory if it existed in the index. However, if the sparse directory was
removed, it would be treated like a non-sparse directory and its contents
would be individually unpacked.

To avoid this unnecessary traversal and keep the results of 'reset' as
sparse as possible, the decision logic for whether a directory is sparse is
changed to:

 * If the directory is a sparse directory in the index, unpack it.
 * If not, is the directory inside the sparse cone? If so, do not unpack it.
 * If the directory is outside the sparse cone, does it have any child
   entries in the index? If so, do not unpack it.
 * Otherwise, unpack the entry as a sparse directory.

In the process of updating 'reset', a separate issue was found in 'checkout'
where collapsed sparse directories did not have modified contents reported
file-by-file. A similar bug was found with 'status' in 2c521b0e49 (status:
fix nested sparse directory diff in sparse index, 2022-03-01), and
'checkout' was corrected the same way (setting the diff flag 'recursive' to
1).

Thanks!

 * Victoria

[1]
https://lore.kernel.org/git/20220803045118.1243087-1-shaoxuan.yuan02@gmail.com/

Victoria Dye (4):
  checkout: fix nested sparse directory diff in sparse index
  oneway_diff: handle removed sparse directories
  cache.h: replace 'index_entry_exists()' with 'index_name_pos_sparse()'
  unpack-trees: handle missing sparse directories

 builtin/checkout.c                       |  1 +
 cache-tree.c                             |  2 +-
 cache.h                                  | 15 ++--
 diff-lib.c                               |  5 ++
 read-cache.c                             |  4 +-
 t/t1092-sparse-checkout-compatibility.sh | 25 +++++++
 unpack-trees.c                           | 88 +++++++++++++++++++++---
 7 files changed, 119 insertions(+), 21 deletions(-)


base-commit: 4af7188bc97f70277d0f10d56d5373022b1fa385
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1312%2Fvdye%2Freset%2Fhandle-missing-dirs-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1312/vdye/reset/handle-missing-dirs-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1312
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 1/4] checkout: fix nested sparse directory diff in sparse index
  2022-08-04 20:46 [PATCH 0/4] reset/checkout: fix miscellaneous sparse index bugs Victoria Dye via GitGitGadget
@ 2022-08-04 20:46 ` Victoria Dye via GitGitGadget
  2022-08-05 17:59   ` Derrick Stolee
  2022-08-04 20:46 ` [PATCH 2/4] oneway_diff: handle removed sparse directories Victoria Dye via GitGitGadget
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 23+ messages in thread
From: Victoria Dye via GitGitGadget @ 2022-08-04 20:46 UTC (permalink / raw)
  To: git
  Cc: derrickstolee, shaoxuan.yuan02, newren, gitster, Victoria Dye,
	Victoria Dye

From: Victoria Dye <vdye@github.com>

Add the 'recursive' diff flag to the local changes reporting done by 'git
checkout' in 'show_local_changes()'. Without the flag enabled, unexpanded
sparse directories will not be recursed into to report the diff of each
file's contents, resulting in the reported local changes including
"modified" sparse directories.

The same issue was found and fixed for 'git status' in 2c521b0e49 (status:
fix nested sparse directory diff in sparse index, 2022-03-01)

Signed-off-by: Victoria Dye <vdye@github.com>
---
 builtin/checkout.c                       | 1 +
 t/t1092-sparse-checkout-compatibility.sh | 8 ++++++++
 2 files changed, 9 insertions(+)

diff --git a/builtin/checkout.c b/builtin/checkout.c
index 29c74f898bf..f9d63d80b92 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -626,6 +626,7 @@ static void show_local_changes(struct object *head,
 	repo_init_revisions(the_repository, &rev, NULL);
 	rev.diffopt.flags = opts->flags;
 	rev.diffopt.output_format |= DIFF_FORMAT_NAME_STATUS;
+	rev.diffopt.flags.recursive = 1;
 	diff_setup_done(&rev.diffopt);
 	add_pending_object(&rev, head, NULL);
 	run_diff_index(&rev, 0);
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 763c6cc6846..99a1425a929 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -372,6 +372,14 @@ test_expect_success 'deep changes during checkout' '
 	test_all_match git checkout base
 '
 
+test_expect_success 'checkout with modified sparse directory' '
+	init_repos &&
+
+	test_all_match git checkout rename-in-to-out -- . &&
+	test_sparse_match git sparse-checkout reapply &&
+	test_all_match git checkout base
+'
+
 test_expect_success 'add outside sparse cone' '
 	init_repos &&
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 2/4] oneway_diff: handle removed sparse directories
  2022-08-04 20:46 [PATCH 0/4] reset/checkout: fix miscellaneous sparse index bugs Victoria Dye via GitGitGadget
  2022-08-04 20:46 ` [PATCH 1/4] checkout: fix nested sparse directory diff in sparse index Victoria Dye via GitGitGadget
@ 2022-08-04 20:46 ` Victoria Dye via GitGitGadget
  2022-08-04 20:46 ` [PATCH 3/4] cache.h: replace 'index_entry_exists()' with 'index_name_pos_sparse()' Victoria Dye via GitGitGadget
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 23+ messages in thread
From: Victoria Dye via GitGitGadget @ 2022-08-04 20:46 UTC (permalink / raw)
  To: git
  Cc: derrickstolee, shaoxuan.yuan02, newren, gitster, Victoria Dye,
	Victoria Dye

From: Victoria Dye <vdye@github.com>

Update 'do_oneway_diff()' to perform a 'diff_tree_oid()' on removed sparse
directories, as it does for added or modified sparse directories (see
9eb00af562 (diff-lib: handle index diffs with sparse dirs, 2021-07-14)).

At the moment, this update is unreachable code because 'unpack_trees()'
(currently the only way 'oneway_diff()' can be called, via 'diff_cache()')
will always traverse trees down to the individual removed files of a deleted
sparse directory. A subsequent patch will change this to better preserve a
sparse index in other uses of 'unpack_tree()', e.g. 'git reset --hard'.
However, making that change without this patch would result in (among other
issues) 'git status' printing only the name of a deleted sparse directory,
not its contents. To avoid introducing that bug, 'do_oneway_diff()' is
updated before modifying 'unpack_trees()'.

Signed-off-by: Victoria Dye <vdye@github.com>
---
 diff-lib.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/diff-lib.c b/diff-lib.c
index 7eb66a417aa..2edea41a234 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -466,6 +466,11 @@ static void do_oneway_diff(struct unpack_trees_options *o,
 	 * Something removed from the tree?
 	 */
 	if (!idx) {
+		if (S_ISSPARSEDIR(tree->ce_mode)) {
+			diff_tree_oid(&tree->oid, NULL, tree->name, &revs->diffopt);
+			return;
+		}
+
 		diff_index_show_file(revs, "-", tree, &tree->oid, 1,
 				     tree->ce_mode, 0);
 		return;
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 3/4] cache.h: replace 'index_entry_exists()' with 'index_name_pos_sparse()'
  2022-08-04 20:46 [PATCH 0/4] reset/checkout: fix miscellaneous sparse index bugs Victoria Dye via GitGitGadget
  2022-08-04 20:46 ` [PATCH 1/4] checkout: fix nested sparse directory diff in sparse index Victoria Dye via GitGitGadget
  2022-08-04 20:46 ` [PATCH 2/4] oneway_diff: handle removed sparse directories Victoria Dye via GitGitGadget
@ 2022-08-04 20:46 ` Victoria Dye via GitGitGadget
  2022-08-04 22:16   ` Junio C Hamano
  2022-08-06  0:09   ` Junio C Hamano
  2022-08-04 20:46 ` [PATCH 4/4] unpack-trees: handle missing sparse directories Victoria Dye via GitGitGadget
  2022-08-07  2:57 ` [PATCH v2 0/4] reset/checkout: fix miscellaneous sparse index bugs Victoria Dye via GitGitGadget
  4 siblings, 2 replies; 23+ messages in thread
From: Victoria Dye via GitGitGadget @ 2022-08-04 20:46 UTC (permalink / raw)
  To: git
  Cc: derrickstolee, shaoxuan.yuan02, newren, gitster, Victoria Dye,
	Victoria Dye

From: Victoria Dye <vdye@github.com>

Replace 'index_entry_exists()' (which returns a binary '1' or '0' depending
on whether a specified entry exists in the index) with
'index_name_pos_sparse()' (which behaves the same as 'index_name_pos()',
except that it does not expand a sparse index to search for an entry inside
a sparse directory).

'index_entry_exists()' was original implemented in 20ec2d034c (reset: make
sparse-aware (except --mixed), 2021-11-29) to allow callers to search for an
index entry without expanding a sparse index. That particular case only
required knowing whether the requested entry existed. This patch expands the
amount of information returned by indicating both 1) whether the entry
exists, and 2) its position (or potential position) in the index.

Signed-off-by: Victoria Dye <vdye@github.com>
---
 cache-tree.c |  2 +-
 cache.h      | 15 +++++++--------
 read-cache.c |  4 ++--
 3 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/cache-tree.c b/cache-tree.c
index 56db0b5026b..ff0c1733356 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -798,7 +798,7 @@ static void prime_cache_tree_rec(struct repository *r,
 			 * as normal.
 			 */
 			if (r->index->sparse_index &&
-			    index_entry_exists(r->index, tree_path->buf, tree_path->len))
+			    index_name_pos_sparse(r->index, tree_path->buf, tree_path->len) >= 0)
 				prime_cache_tree_sparse_dir(sub->cache_tree, subtree);
 			else
 				prime_cache_tree_rec(r, sub->cache_tree, subtree, tree_path);
diff --git a/cache.h b/cache.h
index 4aa1bd079d5..6bfd1d80b24 100644
--- a/cache.h
+++ b/cache.h
@@ -831,14 +831,13 @@ struct cache_entry *index_file_exists(struct index_state *istate, const char *na
 int index_name_pos(struct index_state *, const char *name, int namelen);
 
 /*
- * Determines whether an entry with the given name exists within the
- * given index. The return value is 1 if an exact match is found, otherwise
- * it is 0. Note that, unlike index_name_pos, this function does not expand
- * the index if it is sparse. If an item exists within the full index but it
- * is contained within a sparse directory (and not in the sparse index), 0 is
- * returned.
- */
-int index_entry_exists(struct index_state *, const char *name, int namelen);
+ * Like index_name_pos, returns the position of an entry of the given name in
+ * the index if one exists, otherwise returns a negative value where the negated
+ * value minus 1 is the position where the index entry would be inserted. Unlike
+ * index_name_pos, however, a sparse index is not expanded to find an entry
+ * inside a sparse directory.
+ */
+int index_name_pos_sparse(struct index_state *, const char *name, int namelen);
 
 /*
  * Some functions return the negative complement of an insert position when a
diff --git a/read-cache.c b/read-cache.c
index 4de207752dc..a85b922a3bc 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -620,9 +620,9 @@ int index_name_pos(struct index_state *istate, const char *name, int namelen)
 	return index_name_stage_pos(istate, name, namelen, 0, EXPAND_SPARSE);
 }
 
-int index_entry_exists(struct index_state *istate, const char *name, int namelen)
+int index_name_pos_sparse(struct index_state *istate, const char *name, int namelen)
 {
-	return index_name_stage_pos(istate, name, namelen, 0, NO_EXPAND_SPARSE) >= 0;
+	return index_name_stage_pos(istate, name, namelen, 0, NO_EXPAND_SPARSE);
 }
 
 int remove_index_entry_at(struct index_state *istate, int pos)
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 4/4] unpack-trees: handle missing sparse directories
  2022-08-04 20:46 [PATCH 0/4] reset/checkout: fix miscellaneous sparse index bugs Victoria Dye via GitGitGadget
                   ` (2 preceding siblings ...)
  2022-08-04 20:46 ` [PATCH 3/4] cache.h: replace 'index_entry_exists()' with 'index_name_pos_sparse()' Victoria Dye via GitGitGadget
@ 2022-08-04 20:46 ` Victoria Dye via GitGitGadget
  2022-08-04 23:23   ` Junio C Hamano
  2022-08-07  2:57 ` [PATCH v2 0/4] reset/checkout: fix miscellaneous sparse index bugs Victoria Dye via GitGitGadget
  4 siblings, 1 reply; 23+ messages in thread
From: Victoria Dye via GitGitGadget @ 2022-08-04 20:46 UTC (permalink / raw)
  To: git
  Cc: derrickstolee, shaoxuan.yuan02, newren, gitster, Victoria Dye,
	Victoria Dye

From: Victoria Dye <vdye@github.com>

If a sparse directory does not exist in the index, unpack it at the
directory level rather than recursing into it an unpacking its contents
file-by-file. This helps keep the sparse index as collapsed as possible in
cases such as 'git reset --hard' restoring a sparse directory.

A directory is determined to be truly non-existent in the index (rather than
the parent of existing index entries), if 1) its path is outside the sparse
cone and 2) there are no children of the directory in the index. This check
is performed by 'missing_dir_is_sparse()' in 'unpack_single_entry()'. If the
directory is a missing sparse dir, 'unpack_single_entry()'  will proceed
with unpacking it. This determination is also propagated back up to
'unpack_callback()' via 'is_missing_sparse_dir' to prevent further tree
traversal into the unpacked directory.

Reported-by: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com>
Signed-off-by: Victoria Dye <vdye@github.com>
---
 t/t1092-sparse-checkout-compatibility.sh | 17 +++++
 unpack-trees.c                           | 88 +++++++++++++++++++++---
 2 files changed, 95 insertions(+), 10 deletions(-)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 99a1425a929..eb5edebfa5d 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -695,6 +695,23 @@ test_expect_success 'reset with wildcard pathspec' '
 	test_all_match git ls-files -s -- folder1
 '
 
+test_expect_success 'reset hard with removed sparse dir' '
+	init_repos &&
+
+	test_all_match git rm -r --sparse folder1 &&
+	test_all_match git status --porcelain=v2 &&
+
+	test_all_match git reset --hard &&
+	test_all_match git status --porcelain=v2 &&
+
+	cat >expect <<-\EOF &&
+	folder1/
+	EOF
+
+	git -C sparse-index ls-files --sparse folder1 >out &&
+	test_cmp expect out
+'
+
 test_expect_success 'update-index modify outside sparse definition' '
 	init_repos &&
 
diff --git a/unpack-trees.c b/unpack-trees.c
index 8a454e03bff..aa62cef20fe 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -1069,6 +1069,53 @@ static struct cache_entry *create_ce_entry(const struct traverse_info *info,
 	return ce;
 }
 
+/*
+ * Determine whether the path specified corresponds to a sparse directory
+ * completely missing from the index. This function is assumed to only be
+ * called when the named path isn't already in the index.
+ */
+static int missing_dir_is_sparse(const struct traverse_info *info,
+				 const struct name_entry *p)
+{
+	int res, pos;
+	struct strbuf dirpath = STRBUF_INIT;
+	struct unpack_trees_options *o = info->data;
+
+	/*
+	 * First, check whether the path is in the sparse cone. If it is,
+	 * then this directory shouldn't be sparse.
+	 */
+	strbuf_add(&dirpath, info->traverse_path, info->pathlen);
+	strbuf_add(&dirpath, p->path, p->pathlen);
+	strbuf_addch(&dirpath, '/');
+	if (path_in_cone_mode_sparse_checkout(dirpath.buf, o->src_index)) {
+		res = 0;
+		goto cleanup;
+	}
+
+	/*
+	 * Given that the directory is not inside the sparse cone, it could be
+	 * (partially) expanded in the index. If child entries exist, the path
+	 * is not a missing sparse directory.
+	 */
+	pos = index_name_pos_sparse(o->src_index, dirpath.buf, dirpath.len);
+	if (pos >= 0)
+		BUG("cache entry '%s%s' shouldn't exist in the index",
+		    info->traverse_path, p->path);
+
+	pos = -pos - 1;
+	if (pos >= o->src_index->cache_nr) {
+		res = 1;
+		goto cleanup;
+	}
+
+	res = strncmp(o->src_index->cache[pos]->name, dirpath.buf, dirpath.len);
+
+cleanup:
+	strbuf_release(&dirpath);
+	return res;
+}
+
 /*
  * Note that traverse_by_cache_tree() duplicates some logic in this function
  * without actually calling it. If you change the logic here you may need to
@@ -1078,21 +1125,40 @@ static int unpack_single_entry(int n, unsigned long mask,
 			       unsigned long dirmask,
 			       struct cache_entry **src,
 			       const struct name_entry *names,
-			       const struct traverse_info *info)
+			       const struct traverse_info *info,
+			       int *is_missing_sparse_dir)
 {
 	int i;
 	struct unpack_trees_options *o = info->data;
 	unsigned long conflicts = info->df_conflicts | dirmask;
+	const struct name_entry *p = names;
 
-	if (mask == dirmask && !src[0])
-		return 0;
+	*is_missing_sparse_dir = 0;
+	if (mask == dirmask && !src[0]) {
+		/*
+		 * If the directory is completely missing from the index but
+		 * would otherwise be a sparse directory, we should unpack it.
+		 * If not, we'll return and continue recursively traversing the
+		 * tree.
+		 */
+		if (!o->src_index->sparse_index)
+			return 0;
+
+		/* Find first entry with a real name (we could use "mask" too) */
+		while (!p->mode)
+			p++;
+
+		*is_missing_sparse_dir = missing_dir_is_sparse(info, p);
+		if (!*is_missing_sparse_dir)
+			return 0;
+	}
 
 	/*
-	 * When we have a sparse directory entry for src[0],
-	 * then this isn't necessarily a directory-file conflict.
+	 * When we are unpacking a sparse directory, then this isn't necessarily
+	 * a directory-file conflict.
 	 */
-	if (mask == dirmask && src[0] &&
-	    S_ISSPARSEDIR(src[0]->ce_mode))
+	if (mask == dirmask &&
+	    (*is_missing_sparse_dir || (src[0] && S_ISSPARSEDIR(src[0]->ce_mode))))
 		conflicts = 0;
 
 	/*
@@ -1352,7 +1418,7 @@ static int unpack_sparse_callback(int n, unsigned long mask, unsigned long dirma
 {
 	struct cache_entry *src[MAX_UNPACK_TREES + 1] = { NULL, };
 	struct unpack_trees_options *o = info->data;
-	int ret;
+	int ret, is_missing_sparse_dir;
 
 	assert(o->merge);
 
@@ -1376,7 +1442,7 @@ static int unpack_sparse_callback(int n, unsigned long mask, unsigned long dirma
 	 * "index" tree (i.e., names[0]) and adjust 'names', 'n', 'mask', and
 	 * 'dirmask' accordingly.
 	 */
-	ret = unpack_single_entry(n - 1, mask >> 1, dirmask >> 1, src, names + 1, info);
+	ret = unpack_single_entry(n - 1, mask >> 1, dirmask >> 1, src, names + 1, info, &is_missing_sparse_dir);
 
 	if (src[0])
 		discard_cache_entry(src[0]);
@@ -1394,6 +1460,7 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
 	struct cache_entry *src[MAX_UNPACK_TREES + 1] = { NULL, };
 	struct unpack_trees_options *o = info->data;
 	const struct name_entry *p = names;
+	int is_missing_sparse_dir;
 
 	/* Find first entry with a real name (we could use "mask" too) */
 	while (!p->mode)
@@ -1440,7 +1507,7 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
 		}
 	}
 
-	if (unpack_single_entry(n, mask, dirmask, src, names, info) < 0)
+	if (unpack_single_entry(n, mask, dirmask, src, names, info, &is_missing_sparse_dir))
 		return -1;
 
 	if (o->merge && src[0]) {
@@ -1478,6 +1545,7 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
 		}
 
 		if (!is_sparse_directory_entry(src[0], names, info) &&
+		    !is_missing_sparse_dir &&
 		    traverse_trees_recursive(n, dirmask, mask & ~dirmask,
 						    names, info) < 0) {
 			return -1;
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH 3/4] cache.h: replace 'index_entry_exists()' with 'index_name_pos_sparse()'
  2022-08-04 20:46 ` [PATCH 3/4] cache.h: replace 'index_entry_exists()' with 'index_name_pos_sparse()' Victoria Dye via GitGitGadget
@ 2022-08-04 22:16   ` Junio C Hamano
  2022-08-06  0:09   ` Junio C Hamano
  1 sibling, 0 replies; 23+ messages in thread
From: Junio C Hamano @ 2022-08-04 22:16 UTC (permalink / raw)
  To: Victoria Dye via GitGitGadget
  Cc: git, derrickstolee, shaoxuan.yuan02, newren, Victoria Dye

"Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Victoria Dye <vdye@github.com>
>
> Replace 'index_entry_exists()' (which returns a binary '1' or '0' depending
> on whether a specified entry exists in the index) with
> 'index_name_pos_sparse()' (which behaves the same as 'index_name_pos()',
> except that it does not expand a sparse index to search for an entry inside
> a sparse directory).
>
> 'index_entry_exists()' was original implemented in 20ec2d034c (reset: make

"original" -> "originally"?

> sparse-aware (except --mixed), 2021-11-29) to allow callers to search for an
> index entry without expanding a sparse index. That particular case only
> required knowing whether the requested entry existed. This patch expands the
> amount of information returned by indicating both 1) whether the entry
> exists, and 2) its position (or potential position) in the index.

"position or potential position" is good, but the latter stalled my
reading briefly, wondering if "potential" refers to "the position
that the entry would be at when the sparse entries are all
expanded", which is not what the patch and the description intend to
say.  I am not sure how to rephrase the proposed log message to
alleviate the confusion, though.

> diff --git a/cache.h b/cache.h
> index 4aa1bd079d5..6bfd1d80b24 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -831,14 +831,13 @@ struct cache_entry *index_file_exists(struct index_state *istate, const char *na
>  int index_name_pos(struct index_state *, const char *name, int namelen);
>  
>  /*
> + * Like index_name_pos, returns the position of an entry of the given name in
> + * the index if one exists, otherwise returns a negative value where the negated
> + * value minus 1 is the position where the index entry would be inserted. Unlike
> + * index_name_pos, however, a sparse index is not expanded to find an entry
> + * inside a sparse directory.
> + */

This description is perfectly clear.

> +int index_name_pos_sparse(struct index_state *, const char *name, int namelen);
>  
>  /*
>   * Some functions return the negative complement of an insert position when a
> diff --git a/read-cache.c b/read-cache.c
> index 4de207752dc..a85b922a3bc 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -620,9 +620,9 @@ int index_name_pos(struct index_state *istate, const char *name, int namelen)
>  	return index_name_stage_pos(istate, name, namelen, 0, EXPAND_SPARSE);
>  }
>  
> -int index_entry_exists(struct index_state *istate, const char *name, int namelen)
> +int index_name_pos_sparse(struct index_state *istate, const char *name, int namelen)
>  {
> -	return index_name_stage_pos(istate, name, namelen, 0, NO_EXPAND_SPARSE) >= 0;
> +	return index_name_stage_pos(istate, name, namelen, 0, NO_EXPAND_SPARSE);
>  }

Nice.

Thanks.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 4/4] unpack-trees: handle missing sparse directories
  2022-08-04 20:46 ` [PATCH 4/4] unpack-trees: handle missing sparse directories Victoria Dye via GitGitGadget
@ 2022-08-04 23:23   ` Junio C Hamano
  2022-08-05 16:36     ` Victoria Dye
  0 siblings, 1 reply; 23+ messages in thread
From: Junio C Hamano @ 2022-08-04 23:23 UTC (permalink / raw)
  To: Victoria Dye via GitGitGadget
  Cc: git, derrickstolee, shaoxuan.yuan02, newren, Victoria Dye

"Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Victoria Dye <vdye@github.com>
>
> If a sparse directory does not exist in the index, unpack it at the
> directory level rather than recursing into it an unpacking its contents
> file-by-file. This helps keep the sparse index as collapsed as possible in
> cases such as 'git reset --hard' restoring a sparse directory.

My reading hiccuped at "does not exist in".  After reading the above
twice, I think the situation is that there is a directory A/B in
which a file C sits, and A/ is marked as sparse and the paragraph is
talking about directory A/B.  Because the index has A/ as a tree in
index, A/B does not exist in the index.

When we _somehow_ need to ensure that A/B exists in the index for
_unknown_ reason, we could flatten the index fully and have A/B/C as
a blob (without A or A/B in the index proper, even though they may
appear in cache-tree), but if we stop at removing A and adding A/B
still as a tree without showing A/B/C in the index, we may gain
efficiency, under the assumption that we do not have to access A/B/C
and its siblings.

Did I read the intention correctly?  I suspect future readers of the
commit that would result from this patch would also wonder what the
"somehow" and "unknown" above are.

> A directory is determined to be truly non-existent in the index (rather than
> the parent of existing index entries), if 1) its path is outside the sparse
> cone and 2) there are no children of the directory in the index. This check
> is performed by 'missing_dir_is_sparse()' in 'unpack_single_entry()'. If the
> directory is a missing sparse dir, 'unpack_single_entry()'  will proceed
> with unpacking it. This determination is also propagated back up to
> 'unpack_callback()' via 'is_missing_sparse_dir' to prevent further tree
> traversal into the unpacked directory.

Makes sense.

> diff --git a/unpack-trees.c b/unpack-trees.c
> index 8a454e03bff..aa62cef20fe 100644
> --- a/unpack-trees.c
> +++ b/unpack-trees.c
> @@ -1069,6 +1069,53 @@ static struct cache_entry *create_ce_entry(const struct traverse_info *info,
>  	return ce;
>  }
>  
> +/*
> + * Determine whether the path specified corresponds to a sparse directory
> + * completely missing from the index. This function is assumed to only be
> + * called when the named path isn't already in the index.
> + */
> +static int missing_dir_is_sparse(const struct traverse_info *info,
> +				 const struct name_entry *p)
> +{
> +	int res, pos;
> +	struct strbuf dirpath = STRBUF_INIT;
> +	struct unpack_trees_options *o = info->data;
> +
> +	/*
> +	 * First, check whether the path is in the sparse cone. If it is,
> +	 * then this directory shouldn't be sparse.
> +	 */
> +	strbuf_add(&dirpath, info->traverse_path, info->pathlen);
> +	strbuf_add(&dirpath, p->path, p->pathlen);
> +	strbuf_addch(&dirpath, '/');
> +	if (path_in_cone_mode_sparse_checkout(dirpath.buf, o->src_index)) {
> +		res = 0;
> +		goto cleanup;
> +	}

OK.

> +	/*
> +	 * Given that the directory is not inside the sparse cone, it could be
> +	 * (partially) expanded in the index. If child entries exist, the path
> +	 * is not a missing sparse directory.
> +	 */
> +	pos = index_name_pos_sparse(o->src_index, dirpath.buf, dirpath.len);
> +	if (pos >= 0)
> +		BUG("cache entry '%s%s' shouldn't exist in the index",
> +		    info->traverse_path, p->path);

So, we fed 'p' to this function, and we didn't expect it to exist in
the index if it is outside the sparse cone.

> +	pos = -pos - 1;

This is the location that a cache entry for the dirpath.buf would
sit in the index if it were a sparse entry.

> +	if (pos >= o->src_index->cache_nr) {
> +		res = 1;
> +		goto cleanup;
> +	}
> +	res = strncmp(o->src_index->cache[pos]->name, dirpath.buf, dirpath.len);

So, we found where dirpath.buf would be inserted.  If we (can) look
at the cache entry that is currently sitting at that position, and
find that it is a path inside it, then res becomes zero.  IOW, we
found that the sparse directory is missing in the index, but there
is a path that is in the directory recorded in the index.

The current entry, on the other hand, is outside the dirpath.buf,
then res becomes non-zero.  We are asked to yield true (i.e.
nonzero) if and only if the given directory and all paths in that
directory are missing from the index, so that is what happens here.
Sounds OK.

And the "out of bounds" check just means that the entries that sit
near the end of the index sort strictly before our dirpath.buf, so
they cannot be inside our directory, which is also the case where we
are expected to yield "true".

OK.

> +
> +cleanup:
> +	strbuf_release(&dirpath);
> +	return res;
> +}
> +
>  /*
>   * Note that traverse_by_cache_tree() duplicates some logic in this function
>   * without actually calling it. If you change the logic here you may need to
> @@ -1078,21 +1125,40 @@ static int unpack_single_entry(int n, unsigned long mask,
>  			       unsigned long dirmask,
>  			       struct cache_entry **src,
>  			       const struct name_entry *names,
> -			       const struct traverse_info *info)
> +			       const struct traverse_info *info,
> +			       int *is_missing_sparse_dir)
>  {
>  	int i;
>  	struct unpack_trees_options *o = info->data;
>  	unsigned long conflicts = info->df_conflicts | dirmask;
> +	const struct name_entry *p = names;
>  
> -	if (mask == dirmask && !src[0])
> -		return 0;
> +	*is_missing_sparse_dir = 0;
> +	if (mask == dirmask && !src[0]) {
> +		/*
> +		 * If the directory is completely missing from the index but
> +		 * would otherwise be a sparse directory, we should unpack it.
> +		 * If not, we'll return and continue recursively traversing the
> +		 * tree.
> +		 */
> +		if (!o->src_index->sparse_index)
> +			return 0;
> +
> +		/* Find first entry with a real name (we could use "mask" too) */
> +		while (!p->mode)
> +			p++;
> +
> +		*is_missing_sparse_dir = missing_dir_is_sparse(info, p);
> +		if (!*is_missing_sparse_dir)
> +			return 0;

Interesting.  Nobody checked if the name in 'p' is a directory up to
this point, but missing_dir_is_sparse() does not care?  The only
thing we know is that !src[0], so the name represented by 'p' does
not exist in the index.  IIRC, the function only checked if p names
a path inside or outside the sparse cone and if there are or are not
paths that would be inside the path if p _were_ a directory but I
didn't read it carefully what it did when the caller fed a path to a
non-directory to the function.  As long as it will say "no" to such
a situation, I won't complain, but I have to wonder how the logic
around here tells apart a case where a sparse directory is
completely hidden (which missing_dir_is_sparse() gives us) from a
case where there is absolutely nothing, not a directory and not a
blob, at the path in the index.  Or perhaps it works correctly
without having to tell them apart?  I dunno.

By the way, there is a comment before the function that reminds us
that traverse_by_cache_tree() may need a matching change whenever a
change is made to this function.  Does this require such a matching
change?

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 4/4] unpack-trees: handle missing sparse directories
  2022-08-04 23:23   ` Junio C Hamano
@ 2022-08-05 16:36     ` Victoria Dye
  2022-08-05 19:24       ` Junio C Hamano
  0 siblings, 1 reply; 23+ messages in thread
From: Victoria Dye @ 2022-08-05 16:36 UTC (permalink / raw)
  To: Junio C Hamano, Victoria Dye via GitGitGadget
  Cc: git, derrickstolee, shaoxuan.yuan02, newren

Junio C Hamano wrote:
> "Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> From: Victoria Dye <vdye@github.com>
>>
>> If a sparse directory does not exist in the index, unpack it at the
>> directory level rather than recursing into it an unpacking its contents
>> file-by-file. This helps keep the sparse index as collapsed as possible in
>> cases such as 'git reset --hard' restoring a sparse directory.
> 
> My reading hiccuped at "does not exist in".  After reading the above
> twice, I think the situation is that there is a directory A/B in
> which a file C sits, and A/ is marked as sparse and the paragraph is
> talking about directory A/B.  Because the index has A/ as a tree in
> index, A/B does not exist in the index.
> > When we _somehow_ need to ensure that A/B exists in the index for
> _unknown_ reason, we could flatten the index fully and have A/B/C as
> a blob (without A or A/B in the index proper, even though they may
> appear in cache-tree), but if we stop at removing A and adding A/B
> still as a tree without showing A/B/C in the index, we may gain
> efficiency, under the assumption that we do not have to access A/B/C
> and its siblings.
> 
> Did I read the intention correctly?  I suspect future readers of the
> commit that would result from this patch would also wonder what the
> "somehow" and "unknown" above are.
> 
If I'm reading this correctly, it's not quite what I meant - the situation
this patch addresses is when _nothing_ in the tree rooted at 'A/' (files,
subdirectories) exists in the index, but 'unpack_trees()' is merging tree(s)
where 'A/' *does* exist. I originally wanted to write "If a sparse directory
*is deleted from* the index", but that doesn't cover the case where 'A/'
never existed in the index and we're merging in tree(s) that introduce it.

Maybe it would be clearer to describe it with a different perspective: "If
'unpack_callback()' is merging a new tree into a sparse index, merge the
tree as a sparse directory rather than traversing its contents" or something
like that? I'll try to come up with a better way of explaining this and
update in V2. 

>> A directory is determined to be truly non-existent in the index (rather than
>> the parent of existing index entries), if 1) its path is outside the sparse
>> cone and 2) there are no children of the directory in the index. This check
>> is performed by 'missing_dir_is_sparse()' in 'unpack_single_entry()'. If the
>> directory is a missing sparse dir, 'unpack_single_entry()'  will proceed
>> with unpacking it. This determination is also propagated back up to
>> 'unpack_callback()' via 'is_missing_sparse_dir' to prevent further tree
>> traversal into the unpacked directory.
> 
> Makes sense.
> 
>> diff --git a/unpack-trees.c b/unpack-trees.c
>> index 8a454e03bff..aa62cef20fe 100644
>> --- a/unpack-trees.c
>> +++ b/unpack-trees.c
>> @@ -1069,6 +1069,53 @@ static struct cache_entry *create_ce_entry(const struct traverse_info *info,
>>  	return ce;
>>  }
>>  
>> +/*
>> + * Determine whether the path specified corresponds to a sparse directory
>> + * completely missing from the index. This function is assumed to only be
>> + * called when the named path isn't already in the index.
>> + */
>> +static int missing_dir_is_sparse(const struct traverse_info *info,
>> +				 const struct name_entry *p)
>> +{
>> +	int res, pos;
>> +	struct strbuf dirpath = STRBUF_INIT;
>> +	struct unpack_trees_options *o = info->data;
>> +
>> +	/*
>> +	 * First, check whether the path is in the sparse cone. If it is,
>> +	 * then this directory shouldn't be sparse.
>> +	 */
>> +	strbuf_add(&dirpath, info->traverse_path, info->pathlen);
>> +	strbuf_add(&dirpath, p->path, p->pathlen);
>> +	strbuf_addch(&dirpath, '/');
>> +	if (path_in_cone_mode_sparse_checkout(dirpath.buf, o->src_index)) {
>> +		res = 0;
>> +		goto cleanup;
>> +	}
> 
> OK.
> 
>> +	/*
>> +	 * Given that the directory is not inside the sparse cone, it could be
>> +	 * (partially) expanded in the index. If child entries exist, the path
>> +	 * is not a missing sparse directory.
>> +	 */
>> +	pos = index_name_pos_sparse(o->src_index, dirpath.buf, dirpath.len);
>> +	if (pos >= 0)
>> +		BUG("cache entry '%s%s' shouldn't exist in the index",
>> +		    info->traverse_path, p->path);
> 
> So, we fed 'p' to this function, and we didn't expect it to exist in
> the index if it is outside the sparse cone.
> 

Correct; if an index entry with the name 'p' existed, 'src[0]' would not be
NULL in 'unpack_callback()'/'unpack_single_entry()' and this function
wouldn't have been called.

>> +	pos = -pos - 1;
> 
> This is the location that a cache entry for the dirpath.buf would
> sit in the index if it were a sparse entry.
> 
>> +	if (pos >= o->src_index->cache_nr) {
>> +		res = 1;
>> +		goto cleanup;
>> +	}
>> +	res = strncmp(o->src_index->cache[pos]->name, dirpath.buf, dirpath.len);
> 
> So, we found where dirpath.buf would be inserted.  If we (can) look
> at the cache entry that is currently sitting at that position, and
> find that it is a path inside it, then res becomes zero.  IOW, we
> found that the sparse directory is missing in the index, but there
> is a path that is in the directory recorded in the index.
> 
> The current entry, on the other hand, is outside the dirpath.buf,
> then res becomes non-zero.  We are asked to yield true (i.e.
> nonzero) if and only if the given directory and all paths in that
> directory are missing from the index, so that is what happens here.
> Sounds OK.
> 
> And the "out of bounds" check just means that the entries that sit
> near the end of the index sort strictly before our dirpath.buf, so
> they cannot be inside our directory, which is also the case where we
> are expected to yield "true".
> 
> OK.
> 
>> +
>> +cleanup:
>> +	strbuf_release(&dirpath);
>> +	return res;
>> +}
>> +
>>  /*
>>   * Note that traverse_by_cache_tree() duplicates some logic in this function
>>   * without actually calling it. If you change the logic here you may need to
>> @@ -1078,21 +1125,40 @@ static int unpack_single_entry(int n, unsigned long mask,
>>  			       unsigned long dirmask,
>>  			       struct cache_entry **src,
>>  			       const struct name_entry *names,
>> -			       const struct traverse_info *info)
>> +			       const struct traverse_info *info,
>> +			       int *is_missing_sparse_dir)
>>  {
>>  	int i;
>>  	struct unpack_trees_options *o = info->data;
>>  	unsigned long conflicts = info->df_conflicts | dirmask;
>> +	const struct name_entry *p = names;
>>  
>> -	if (mask == dirmask && !src[0])
>> -		return 0;
>> +	*is_missing_sparse_dir = 0;
>> +	if (mask == dirmask && !src[0]) {
>> +		/*
>> +		 * If the directory is completely missing from the index but
>> +		 * would otherwise be a sparse directory, we should unpack it.
>> +		 * If not, we'll return and continue recursively traversing the
>> +		 * tree.
>> +		 */
>> +		if (!o->src_index->sparse_index)
>> +			return 0;
>> +
>> +		/* Find first entry with a real name (we could use "mask" too) */
>> +		while (!p->mode)
>> +			p++;
>> +
>> +		*is_missing_sparse_dir = missing_dir_is_sparse(info, p);
>> +		if (!*is_missing_sparse_dir)
>> +			return 0;
> 
> Interesting.  Nobody checked if the name in 'p' is a directory up to
> this point, but missing_dir_is_sparse() does not care?  The only
> thing we know is that !src[0], so the name represented by 'p' does
> not exist in the index.  IIRC, the function only checked if p names
> a path inside or outside the sparse cone and if there are or are not
> paths that would be inside the path if p _were_ a directory but I
> didn't read it carefully what it did when the caller fed a path to a
> non-directory to the function.  As long as it will say "no" to such
> a situation, I won't complain, but I have to wonder how the logic
> around here tells apart a case where a sparse directory is
> completely hidden (which missing_dir_is_sparse() gives us) from a
> case where there is absolutely nothing, not a directory and not a
> blob, at the path in the index.  Or perhaps it works correctly
> without having to tell them apart?  I dunno.
> 

I wrote 'missing_dir_is_sparse()' in an attempt keep some complex logic out
of the already-complicated 'unpack_single_entry()', but as part of that it
relies on information already established by its caller.

We know 'p' is a directory because 'missing_dir_is_sparse()' is called
inside a 'mask == dirmask' condition. 'mask' is a representation of which
trees being traversed have an entry with the given name and 'dirmask' is a
representation of which of those entries are directories, so the only way
'mask == dirmask' and 'p' is *not* a directory is if the currently-traversed
entries in all of the trees do not exist. *That* won't happen because
'unpack_callback()' won't be invoked at all by 'traverse_trees()' if 'mask'
is 0.

Given that it requires jumping through multiple function invocations and
callbacks to figure that out, I can add some assertions or 'return 0'
conditions at the beginning of 'missing_dir_is_sparse()' to codify its
assumptions. Even if the assertions are slightly redundant now, they'll make
the code clearer and make the function safer for reuse.

> By the way, there is a comment before the function that reminds us
> that traverse_by_cache_tree() may need a matching change whenever a
> change is made to this function.  Does this require such a matching
> change?

In this case, I *think* a parallel change there is unnecessary.
'traverse_by_cache_tree()' iterates on the contents of the index, so its
equivalent of 'src[0]' always exists when it merges individual entries. And,
to trigger that function, the cache tree needs to identically match the
tree(s) being merged; if a subtree doesn't exist in either the index or any
of the merging trees, there's nothing to merge.

In any case, thanks for the reminder - I didn't notice the comment while
implementing this, so it could have just as easily needed fixing.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/4] checkout: fix nested sparse directory diff in sparse index
  2022-08-04 20:46 ` [PATCH 1/4] checkout: fix nested sparse directory diff in sparse index Victoria Dye via GitGitGadget
@ 2022-08-05 17:59   ` Derrick Stolee
  0 siblings, 0 replies; 23+ messages in thread
From: Derrick Stolee @ 2022-08-05 17:59 UTC (permalink / raw)
  To: Victoria Dye via GitGitGadget, git
  Cc: shaoxuan.yuan02, newren, gitster, Victoria Dye

On 8/4/2022 4:46 PM, Victoria Dye via GitGitGadget wrote:
> From: Victoria Dye <vdye@github.com>
> 
> Add the 'recursive' diff flag to the local changes reporting done by 'git
> checkout' in 'show_local_changes()'. Without the flag enabled, unexpanded
> sparse directories will not be recursed into to report the diff of each
> file's contents, resulting in the reported local changes including
> "modified" sparse directories.

Nice find!

> +	rev.diffopt.flags.recursive = 1;

Simple fix.

> +test_expect_success 'checkout with modified sparse directory' '
> +	init_repos &&
> +
> +	test_all_match git checkout rename-in-to-out -- . &&
> +	test_sparse_match git sparse-checkout reapply &&
> +	test_all_match git checkout base
> +'

Simple test.

Excellent.
-Stolee

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 4/4] unpack-trees: handle missing sparse directories
  2022-08-05 16:36     ` Victoria Dye
@ 2022-08-05 19:24       ` Junio C Hamano
  0 siblings, 0 replies; 23+ messages in thread
From: Junio C Hamano @ 2022-08-05 19:24 UTC (permalink / raw)
  To: Victoria Dye
  Cc: Victoria Dye via GitGitGadget, git, derrickstolee,
	shaoxuan.yuan02, newren

Victoria Dye <vdye@github.com> writes:

> If I'm reading this correctly, it's not quite what I meant - the situation
> this patch addresses is when _nothing_ in the tree rooted at 'A/' (files,
> subdirectories) exists in the index, but 'unpack_trees()' is merging tree(s)
> where 'A/' *does* exist.

OK, that is very different from my flawed readign.

> Maybe it would be clearer to describe it with a different perspective: "If
> 'unpack_callback()' is merging a new tree into a sparse index, merge the
> tree as a sparse directory rather than traversing its contents" or something
> like that? I'll try to come up with a better way of explaining this and
> update in V2. 

Yeah, that explains a typical scenario where you want to do this
kind of thing better.  Right now, do we just merge in the contents
of that whole tree whether the root of that new tree is or is not in
the sparse cone(s)?  Noticing that the new tree is outside the cones
of our interest and populating the index minimally, just enough to
be able to write the resulting index as a tree, does make sense.

> I wrote 'missing_dir_is_sparse()' in an attempt keep some complex logic out
> of the already-complicated 'unpack_single_entry()', but as part of that it
> relies on information already established by its caller.
>
> We know 'p' is a directory because 'missing_dir_is_sparse()' is called
> inside a 'mask == dirmask' condition.

Ahh, thanks---that is exactly what I missed, and led to my questions.

> 'mask' is a representation of which
> trees being traversed have an entry with the given name and 'dirmask' is a
> representation of which of those entries are directories, so the only way
> 'mask == dirmask' and 'p' is *not* a directory is if the currently-traversed
> entries in all of the trees do not exist. *That* won't happen because
> 'unpack_callback()' won't be invoked at all by 'traverse_trees()' if 'mask'
> is 0.
>
> Given that it requires jumping through multiple function invocations and
> callbacks to figure that out, I can add some assertions or 'return 0'
> conditions at the beginning of 'missing_dir_is_sparse()' to codify its
> assumptions. Even if the assertions are slightly redundant now, they'll make
> the code clearer and make the function safer for reuse.

Sounds good.

Thanks.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 3/4] cache.h: replace 'index_entry_exists()' with 'index_name_pos_sparse()'
  2022-08-04 20:46 ` [PATCH 3/4] cache.h: replace 'index_entry_exists()' with 'index_name_pos_sparse()' Victoria Dye via GitGitGadget
  2022-08-04 22:16   ` Junio C Hamano
@ 2022-08-06  0:09   ` Junio C Hamano
  1 sibling, 0 replies; 23+ messages in thread
From: Junio C Hamano @ 2022-08-06  0:09 UTC (permalink / raw)
  To: Victoria Dye via GitGitGadget
  Cc: git, derrickstolee, shaoxuan.yuan02, newren, Victoria Dye

"Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes:

> 'index_entry_exists()' was original implemented in 20ec2d034c (reset: make
> sparse-aware (except --mixed), 2021-11-29) to allow callers to search for an
> index entry without expanding a sparse index. That particular case only
> required knowing whether the requested entry existed. This patch expands the
> amount of information returned by indicating both 1) whether the entry
> exists, and 2) its position (or potential position) in the index.

This patch probably should keep index_entry_exists() to potential
new callers that may be introduced by contemporary topics in flight,
by doing something like the following squashed in.  We know that
Shaoxuan's series does add a new callsite, which is of a lessor
concern as that series may want to be rebased on top of these fixes
anyway.  But there may be other topics that may want to add new
calls for this helper that is more trivially and obviously correct
than these four patches, in which case such topics want to proceed
independent of these four patches.

 cache.h | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git c/cache.h w/cache.h
index ba85435fee..039a32a317 100644
--- c/cache.h
+++ w/cache.h
@@ -839,6 +839,17 @@ int index_name_pos(struct index_state *, const char *name, int namelen);
  */
 int index_name_pos_sparse(struct index_state *, const char *name, int namelen);
 
+/*
+ * This helper function is only kept to help possible
+ * contemporary topics in flight.  Do not use it in new
+ * code; use index_name_pos_sparse() instead.
+ */
+static inline int index_entry_exists(struct index_state *istate,
+				     const char *name, int namelen)
+{
+	return 0 <= index_name_pos_sparse(istate, name, namelen);
+}
+
 /*
  * Some functions return the negative complement of an insert position when a
  * precise match was not found but a position was found where the entry would

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v2 0/4] reset/checkout: fix miscellaneous sparse index bugs
  2022-08-04 20:46 [PATCH 0/4] reset/checkout: fix miscellaneous sparse index bugs Victoria Dye via GitGitGadget
                   ` (3 preceding siblings ...)
  2022-08-04 20:46 ` [PATCH 4/4] unpack-trees: handle missing sparse directories Victoria Dye via GitGitGadget
@ 2022-08-07  2:57 ` Victoria Dye via GitGitGadget
  2022-08-07  2:57   ` [PATCH v2 1/4] checkout: fix nested sparse directory diff in sparse index Victoria Dye via GitGitGadget
                     ` (4 more replies)
  4 siblings, 5 replies; 23+ messages in thread
From: Victoria Dye via GitGitGadget @ 2022-08-07  2:57 UTC (permalink / raw)
  To: git; +Cc: derrickstolee, shaoxuan.yuan02, newren, gitster, Victoria Dye

While working on sparse index integration for 'git rm' [1], Shaoxuan found
that removed sparse directories, when reset, would no longer be sparse. This
was due to how 'unpack_trees()' determined whether a traversed directory was
a sparse directory or not; it would only unpack an entry as a sparse
directory if it existed in the index. However, if the sparse directory was
removed, it would be treated like a non-sparse directory and its contents
would be individually unpacked.

To avoid this unnecessary traversal and keep the results of 'reset' as
sparse as possible, the decision logic for whether a directory is sparse is
changed to:

 * If the directory is a sparse directory in the index, unpack it.
 * If not, is the directory inside the sparse cone? If so, do not unpack it.
 * If the directory is outside the sparse cone, does it have any child
   entries in the index? If so, do not unpack it.
 * Otherwise, unpack the entry as a sparse directory.

In the process of updating 'reset', a separate issue was found in 'checkout'
where collapsed sparse directories did not have modified contents reported
file-by-file. A similar bug was found with 'status' in 2c521b0e49 (status:
fix nested sparse directory diff in sparse index, 2022-03-01), and
'checkout' was corrected the same way (setting the diff flag 'recursive' to
1).


Changes since V1
================

 * Reverted the removal of 'index_entry_exists()' to avoid breaking other
   in-flight series.
 * Renamed 'is_missing_sparse_dir()' to 'is_new_sparse_dir()'; revised
   comments and commit messages to clarify what that function is doing and
   why.
 * Handled "unexpected" inputs to 'is_new_sparse_dir()' more gently,
   returning 0 if 'p' is not a directory or the directory already exists in
   the index (rather than exiting with 'BUG()'). This is intended to make
   'is_new_sparse_dir()' less reliant on information about the index
   established by 'unpack_callback()' & 'unpack_single_entry()', resulting
   in easier-to-read and more reusable code.

Thanks!

 * Victoria

[1]
https://lore.kernel.org/git/20220803045118.1243087-1-shaoxuan.yuan02@gmail.com/

Victoria Dye (4):
  checkout: fix nested sparse directory diff in sparse index
  oneway_diff: handle removed sparse directories
  cache.h: create 'index_name_pos_sparse()'
  unpack-trees: unpack new trees as sparse directories

 builtin/checkout.c                       |   1 +
 cache.h                                  |   9 ++
 diff-lib.c                               |   5 ++
 read-cache.c                             |   5 ++
 t/t1092-sparse-checkout-compatibility.sh |  25 ++++++
 unpack-trees.c                           | 106 ++++++++++++++++++++---
 6 files changed, 141 insertions(+), 10 deletions(-)


base-commit: 4af7188bc97f70277d0f10d56d5373022b1fa385
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1312%2Fvdye%2Freset%2Fhandle-missing-dirs-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1312/vdye/reset/handle-missing-dirs-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/1312

Range-diff vs v1:

 1:  255318f4dc6 = 1:  255318f4dc6 checkout: fix nested sparse directory diff in sparse index
 2:  55c77ba4b29 = 2:  55c77ba4b29 oneway_diff: handle removed sparse directories
 3:  f7978d223fe ! 3:  d0bdec63286 cache.h: replace 'index_entry_exists()' with 'index_name_pos_sparse()'
     @@ Metadata
      Author: Victoria Dye <vdye@github.com>
      
       ## Commit message ##
     -    cache.h: replace 'index_entry_exists()' with 'index_name_pos_sparse()'
     +    cache.h: create 'index_name_pos_sparse()'
      
     -    Replace 'index_entry_exists()' (which returns a binary '1' or '0' depending
     -    on whether a specified entry exists in the index) with
     -    'index_name_pos_sparse()' (which behaves the same as 'index_name_pos()',
     +    Add 'index_name_pos_sparse()', which behaves the same as 'index_name_pos()',
          except that it does not expand a sparse index to search for an entry inside
     -    a sparse directory).
     +    a sparse directory.
      
     -    'index_entry_exists()' was original implemented in 20ec2d034c (reset: make
     -    sparse-aware (except --mixed), 2021-11-29) to allow callers to search for an
     -    index entry without expanding a sparse index. That particular case only
     -    required knowing whether the requested entry existed. This patch expands the
     -    amount of information returned by indicating both 1) whether the entry
     -    exists, and 2) its position (or potential position) in the index.
     +    'index_entry_exists()' was originally implemented in 20ec2d034c (reset: make
     +    sparse-aware (except --mixed), 2021-11-29) as an alternative to
     +    'index_name_pos()' to allow callers to search for an index entry without
     +    expanding a sparse index. However, that particular use case only required
     +    knowing whether the requested entry existed, so 'index_entry_exists()' does
     +    not return the index positioning information provided by 'index_name_pos()'.
      
     -    Signed-off-by: Victoria Dye <vdye@github.com>
     +    This patch implements 'index_name_pos_sparse()' to accommodate callers that
     +    need the positioning information of 'index_name_pos()', but do not want to
     +    expand the index.
      
     - ## cache-tree.c ##
     -@@ cache-tree.c: static void prime_cache_tree_rec(struct repository *r,
     - 			 * as normal.
     - 			 */
     - 			if (r->index->sparse_index &&
     --			    index_entry_exists(r->index, tree_path->buf, tree_path->len))
     -+			    index_name_pos_sparse(r->index, tree_path->buf, tree_path->len) >= 0)
     - 				prime_cache_tree_sparse_dir(sub->cache_tree, subtree);
     - 			else
     - 				prime_cache_tree_rec(r, sub->cache_tree, subtree, tree_path);
     +    Signed-off-by: Victoria Dye <vdye@github.com>
      
       ## cache.h ##
      @@ cache.h: struct cache_entry *index_file_exists(struct index_state *istate, const char *na
     +  */
       int index_name_pos(struct index_state *, const char *name, int namelen);
       
     - /*
     -- * Determines whether an entry with the given name exists within the
     -- * given index. The return value is 1 if an exact match is found, otherwise
     -- * it is 0. Note that, unlike index_name_pos, this function does not expand
     -- * the index if it is sparse. If an item exists within the full index but it
     -- * is contained within a sparse directory (and not in the sparse index), 0 is
     -- * returned.
     -- */
     --int index_entry_exists(struct index_state *, const char *name, int namelen);
     ++/*
      + * Like index_name_pos, returns the position of an entry of the given name in
      + * the index if one exists, otherwise returns a negative value where the negated
      + * value minus 1 is the position where the index entry would be inserted. Unlike
     @@ cache.h: struct cache_entry *index_file_exists(struct index_state *istate, const
      + * inside a sparse directory.
      + */
      +int index_name_pos_sparse(struct index_state *, const char *name, int namelen);
     - 
     ++
       /*
     -  * Some functions return the negative complement of an insert position when a
     +  * Determines whether an entry with the given name exists within the
     +  * given index. The return value is 1 if an exact match is found, otherwise
      
       ## read-cache.c ##
      @@ read-cache.c: int index_name_pos(struct index_state *istate, const char *name, int namelen)
       	return index_name_stage_pos(istate, name, namelen, 0, EXPAND_SPARSE);
       }
       
     --int index_entry_exists(struct index_state *istate, const char *name, int namelen)
      +int index_name_pos_sparse(struct index_state *istate, const char *name, int namelen)
     - {
     --	return index_name_stage_pos(istate, name, namelen, 0, NO_EXPAND_SPARSE) >= 0;
     ++{
      +	return index_name_stage_pos(istate, name, namelen, 0, NO_EXPAND_SPARSE);
     - }
     - 
     - int remove_index_entry_at(struct index_state *istate, int pos)
     ++}
     ++
     + int index_entry_exists(struct index_state *istate, const char *name, int namelen)
     + {
     + 	return index_name_stage_pos(istate, name, namelen, 0, NO_EXPAND_SPARSE) >= 0;
 4:  016971a6711 ! 4:  97ca668102c unpack-trees: handle missing sparse directories
     @@ Metadata
      Author: Victoria Dye <vdye@github.com>
      
       ## Commit message ##
     -    unpack-trees: handle missing sparse directories
     +    unpack-trees: unpack new trees as sparse directories
      
     -    If a sparse directory does not exist in the index, unpack it at the
     -    directory level rather than recursing into it an unpacking its contents
     -    file-by-file. This helps keep the sparse index as collapsed as possible in
     -    cases such as 'git reset --hard' restoring a sparse directory.
     +    If 'unpack_single_entry()' is unpacking a new directory tree (that is, one
     +    not already present in the index) into a sparse index, unpack the tree as a
     +    sparse directory rather than traversing its contents and unpacking each file
     +    individually. This helps keep the sparse index as collapsed as possible in
     +    cases such as 'git reset --hard' restoring a outside-of-cone directory
     +    removed with 'git rm -r --sparse'.
      
     -    A directory is determined to be truly non-existent in the index (rather than
     -    the parent of existing index entries), if 1) its path is outside the sparse
     -    cone and 2) there are no children of the directory in the index. This check
     -    is performed by 'missing_dir_is_sparse()' in 'unpack_single_entry()'. If the
     -    directory is a missing sparse dir, 'unpack_single_entry()'  will proceed
     -    with unpacking it. This determination is also propagated back up to
     -    'unpack_callback()' via 'is_missing_sparse_dir' to prevent further tree
     -    traversal into the unpacked directory.
     +    Without this patch, 'unpack_single_entry()' will only unpack a directory
     +    into the index as a sparse directory (rather than traversing into it and
     +    unpacking its files one-by-one) if an entry with the same name already
     +    exists in the index. This patch allows sparse directory unpacking without a
     +    matching index entry when the following conditions are met:
     +
     +    1. the directory's path is outside the sparse cone, and
     +    2. there are no children of the directory in the index
     +
     +    If a directory meets these requirements (as determined by
     +    'is_new_sparse_dir()'), 'unpack_single_entry()' unpacks the sparse directory
     +    index entry and propagates the decision back up to 'unpack_callback()' to
     +    prevent unnecessary tree traversal into the unpacked directory.
      
          Reported-by: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com>
          Signed-off-by: Victoria Dye <vdye@github.com>
     @@ unpack-trees.c: static struct cache_entry *create_ce_entry(const struct traverse
       }
       
      +/*
     -+ * Determine whether the path specified corresponds to a sparse directory
     -+ * completely missing from the index. This function is assumed to only be
     -+ * called when the named path isn't already in the index.
     ++ * Determine whether the path specified by 'p' should be unpacked as a new
     ++ * sparse directory in a sparse index. A new sparse directory 'A/':
     ++ * - must be outside the sparse cone.
     ++ * - must not already be in the index (i.e., no index entry with name 'A/'
     ++ *   exists).
     ++ * - must not have any child entries in the index (i.e., no index entry
     ++ *   'A/<something>' exists).
     ++ * If 'p' meets the above requirements, return 1; otherwise, return 0.
      + */
     -+static int missing_dir_is_sparse(const struct traverse_info *info,
     -+				 const struct name_entry *p)
     ++static int entry_is_new_sparse_dir(const struct traverse_info *info,
     ++				   const struct name_entry *p)
      +{
      +	int res, pos;
      +	struct strbuf dirpath = STRBUF_INIT;
      +	struct unpack_trees_options *o = info->data;
      +
     ++	if (!S_ISDIR(p->mode))
     ++		return 0;
     ++
      +	/*
     -+	 * First, check whether the path is in the sparse cone. If it is,
     -+	 * then this directory shouldn't be sparse.
     ++	 * If the path is inside the sparse cone, it can't be a sparse directory.
      +	 */
      +	strbuf_add(&dirpath, info->traverse_path, info->pathlen);
      +	strbuf_add(&dirpath, p->path, p->pathlen);
     @@ unpack-trees.c: static struct cache_entry *create_ce_entry(const struct traverse
      +		goto cleanup;
      +	}
      +
     -+	/*
     -+	 * Given that the directory is not inside the sparse cone, it could be
     -+	 * (partially) expanded in the index. If child entries exist, the path
     -+	 * is not a missing sparse directory.
     -+	 */
      +	pos = index_name_pos_sparse(o->src_index, dirpath.buf, dirpath.len);
     -+	if (pos >= 0)
     -+		BUG("cache entry '%s%s' shouldn't exist in the index",
     -+		    info->traverse_path, p->path);
     ++	if (pos >= 0) {
     ++		/* Path is already in the index, not a new sparse dir */
     ++		res = 0;
     ++		goto cleanup;
     ++	}
      +
     ++	/* Where would this sparse dir be inserted into the index? */
      +	pos = -pos - 1;
      +	if (pos >= o->src_index->cache_nr) {
     ++		/*
     ++		 * Sparse dir would be inserted at the end of the index, so we
     ++		 * know it has no child entries.
     ++		 */
      +		res = 1;
      +		goto cleanup;
      +	}
      +
     ++	/*
     ++	 * If the dir has child entries in the index, the first would be at the
     ++	 * position the sparse directory would be inserted. If the entry at this
     ++	 * position is inside the dir, not a new sparse dir.
     ++	 */
      +	res = strncmp(o->src_index->cache[pos]->name, dirpath.buf, dirpath.len);
      +
      +cleanup:
     @@ unpack-trees.c: static int unpack_single_entry(int n, unsigned long mask,
       			       const struct name_entry *names,
      -			       const struct traverse_info *info)
      +			       const struct traverse_info *info,
     -+			       int *is_missing_sparse_dir)
     ++			       int *is_new_sparse_dir)
       {
       	int i;
       	struct unpack_trees_options *o = info->data;
     @@ unpack-trees.c: static int unpack_single_entry(int n, unsigned long mask,
       
      -	if (mask == dirmask && !src[0])
      -		return 0;
     -+	*is_missing_sparse_dir = 0;
     ++	*is_new_sparse_dir = 0;
      +	if (mask == dirmask && !src[0]) {
      +		/*
     -+		 * If the directory is completely missing from the index but
     -+		 * would otherwise be a sparse directory, we should unpack it.
     -+		 * If not, we'll return and continue recursively traversing the
     -+		 * tree.
     ++		 * If we're not in a sparse index, we can't unpack a directory
     ++		 * without recursing into it, so we return.
      +		 */
      +		if (!o->src_index->sparse_index)
      +			return 0;
     @@ unpack-trees.c: static int unpack_single_entry(int n, unsigned long mask,
      +		while (!p->mode)
      +			p++;
      +
     -+		*is_missing_sparse_dir = missing_dir_is_sparse(info, p);
     -+		if (!*is_missing_sparse_dir)
     ++		/*
     ++		 * If the directory is completely missing from the index but
     ++		 * would otherwise be a sparse directory, we should unpack it.
     ++		 * If not, we'll return and continue recursively traversing the
     ++		 * tree.
     ++		 */
     ++		*is_new_sparse_dir = entry_is_new_sparse_dir(info, p);
     ++		if (!*is_new_sparse_dir)
      +			return 0;
      +	}
       
     @@ unpack-trees.c: static int unpack_single_entry(int n, unsigned long mask,
      -	if (mask == dirmask && src[0] &&
      -	    S_ISSPARSEDIR(src[0]->ce_mode))
      +	if (mask == dirmask &&
     -+	    (*is_missing_sparse_dir || (src[0] && S_ISSPARSEDIR(src[0]->ce_mode))))
     ++	    (*is_new_sparse_dir || (src[0] && S_ISSPARSEDIR(src[0]->ce_mode))))
       		conflicts = 0;
       
       	/*
     @@ unpack-trees.c: static int unpack_sparse_callback(int n, unsigned long mask, uns
       	struct cache_entry *src[MAX_UNPACK_TREES + 1] = { NULL, };
       	struct unpack_trees_options *o = info->data;
      -	int ret;
     -+	int ret, is_missing_sparse_dir;
     ++	int ret, is_new_sparse_dir;
       
       	assert(o->merge);
       
     @@ unpack-trees.c: static int unpack_sparse_callback(int n, unsigned long mask, uns
       	 * 'dirmask' accordingly.
       	 */
      -	ret = unpack_single_entry(n - 1, mask >> 1, dirmask >> 1, src, names + 1, info);
     -+	ret = unpack_single_entry(n - 1, mask >> 1, dirmask >> 1, src, names + 1, info, &is_missing_sparse_dir);
     ++	ret = unpack_single_entry(n - 1, mask >> 1, dirmask >> 1, src, names + 1, info, &is_new_sparse_dir);
       
       	if (src[0])
       		discard_cache_entry(src[0]);
     @@ unpack-trees.c: static int unpack_callback(int n, unsigned long mask, unsigned l
       	struct cache_entry *src[MAX_UNPACK_TREES + 1] = { NULL, };
       	struct unpack_trees_options *o = info->data;
       	const struct name_entry *p = names;
     -+	int is_missing_sparse_dir;
     ++	int is_new_sparse_dir;
       
       	/* Find first entry with a real name (we could use "mask" too) */
       	while (!p->mode)
     @@ unpack-trees.c: static int unpack_callback(int n, unsigned long mask, unsigned l
       	}
       
      -	if (unpack_single_entry(n, mask, dirmask, src, names, info) < 0)
     -+	if (unpack_single_entry(n, mask, dirmask, src, names, info, &is_missing_sparse_dir))
     ++	if (unpack_single_entry(n, mask, dirmask, src, names, info, &is_new_sparse_dir))
       		return -1;
       
       	if (o->merge && src[0]) {
     @@ unpack-trees.c: static int unpack_callback(int n, unsigned long mask, unsigned l
       		}
       
       		if (!is_sparse_directory_entry(src[0], names, info) &&
     -+		    !is_missing_sparse_dir &&
     ++		    !is_new_sparse_dir &&
       		    traverse_trees_recursive(n, dirmask, mask & ~dirmask,
       						    names, info) < 0) {
       			return -1;

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v2 1/4] checkout: fix nested sparse directory diff in sparse index
  2022-08-07  2:57 ` [PATCH v2 0/4] reset/checkout: fix miscellaneous sparse index bugs Victoria Dye via GitGitGadget
@ 2022-08-07  2:57   ` Victoria Dye via GitGitGadget
  2022-08-07  2:57   ` [PATCH v2 2/4] oneway_diff: handle removed sparse directories Victoria Dye via GitGitGadget
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 23+ messages in thread
From: Victoria Dye via GitGitGadget @ 2022-08-07  2:57 UTC (permalink / raw)
  To: git
  Cc: derrickstolee, shaoxuan.yuan02, newren, gitster, Victoria Dye,
	Victoria Dye

From: Victoria Dye <vdye@github.com>

Add the 'recursive' diff flag to the local changes reporting done by 'git
checkout' in 'show_local_changes()'. Without the flag enabled, unexpanded
sparse directories will not be recursed into to report the diff of each
file's contents, resulting in the reported local changes including
"modified" sparse directories.

The same issue was found and fixed for 'git status' in 2c521b0e49 (status:
fix nested sparse directory diff in sparse index, 2022-03-01)

Signed-off-by: Victoria Dye <vdye@github.com>
---
 builtin/checkout.c                       | 1 +
 t/t1092-sparse-checkout-compatibility.sh | 8 ++++++++
 2 files changed, 9 insertions(+)

diff --git a/builtin/checkout.c b/builtin/checkout.c
index 29c74f898bf..f9d63d80b92 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -626,6 +626,7 @@ static void show_local_changes(struct object *head,
 	repo_init_revisions(the_repository, &rev, NULL);
 	rev.diffopt.flags = opts->flags;
 	rev.diffopt.output_format |= DIFF_FORMAT_NAME_STATUS;
+	rev.diffopt.flags.recursive = 1;
 	diff_setup_done(&rev.diffopt);
 	add_pending_object(&rev, head, NULL);
 	run_diff_index(&rev, 0);
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 763c6cc6846..99a1425a929 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -372,6 +372,14 @@ test_expect_success 'deep changes during checkout' '
 	test_all_match git checkout base
 '
 
+test_expect_success 'checkout with modified sparse directory' '
+	init_repos &&
+
+	test_all_match git checkout rename-in-to-out -- . &&
+	test_sparse_match git sparse-checkout reapply &&
+	test_all_match git checkout base
+'
+
 test_expect_success 'add outside sparse cone' '
 	init_repos &&
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v2 2/4] oneway_diff: handle removed sparse directories
  2022-08-07  2:57 ` [PATCH v2 0/4] reset/checkout: fix miscellaneous sparse index bugs Victoria Dye via GitGitGadget
  2022-08-07  2:57   ` [PATCH v2 1/4] checkout: fix nested sparse directory diff in sparse index Victoria Dye via GitGitGadget
@ 2022-08-07  2:57   ` Victoria Dye via GitGitGadget
  2022-08-07  2:57   ` [PATCH v2 3/4] cache.h: create 'index_name_pos_sparse()' Victoria Dye via GitGitGadget
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 23+ messages in thread
From: Victoria Dye via GitGitGadget @ 2022-08-07  2:57 UTC (permalink / raw)
  To: git
  Cc: derrickstolee, shaoxuan.yuan02, newren, gitster, Victoria Dye,
	Victoria Dye

From: Victoria Dye <vdye@github.com>

Update 'do_oneway_diff()' to perform a 'diff_tree_oid()' on removed sparse
directories, as it does for added or modified sparse directories (see
9eb00af562 (diff-lib: handle index diffs with sparse dirs, 2021-07-14)).

At the moment, this update is unreachable code because 'unpack_trees()'
(currently the only way 'oneway_diff()' can be called, via 'diff_cache()')
will always traverse trees down to the individual removed files of a deleted
sparse directory. A subsequent patch will change this to better preserve a
sparse index in other uses of 'unpack_tree()', e.g. 'git reset --hard'.
However, making that change without this patch would result in (among other
issues) 'git status' printing only the name of a deleted sparse directory,
not its contents. To avoid introducing that bug, 'do_oneway_diff()' is
updated before modifying 'unpack_trees()'.

Signed-off-by: Victoria Dye <vdye@github.com>
---
 diff-lib.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/diff-lib.c b/diff-lib.c
index 7eb66a417aa..2edea41a234 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -466,6 +466,11 @@ static void do_oneway_diff(struct unpack_trees_options *o,
 	 * Something removed from the tree?
 	 */
 	if (!idx) {
+		if (S_ISSPARSEDIR(tree->ce_mode)) {
+			diff_tree_oid(&tree->oid, NULL, tree->name, &revs->diffopt);
+			return;
+		}
+
 		diff_index_show_file(revs, "-", tree, &tree->oid, 1,
 				     tree->ce_mode, 0);
 		return;
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v2 3/4] cache.h: create 'index_name_pos_sparse()'
  2022-08-07  2:57 ` [PATCH v2 0/4] reset/checkout: fix miscellaneous sparse index bugs Victoria Dye via GitGitGadget
  2022-08-07  2:57   ` [PATCH v2 1/4] checkout: fix nested sparse directory diff in sparse index Victoria Dye via GitGitGadget
  2022-08-07  2:57   ` [PATCH v2 2/4] oneway_diff: handle removed sparse directories Victoria Dye via GitGitGadget
@ 2022-08-07  2:57   ` Victoria Dye via GitGitGadget
  2022-08-07  2:57   ` [PATCH v2 4/4] unpack-trees: unpack new trees as sparse directories Victoria Dye via GitGitGadget
  2022-08-08 19:07   ` [PATCH v3 0/4] reset/checkout: fix miscellaneous sparse index bugs Victoria Dye via GitGitGadget
  4 siblings, 0 replies; 23+ messages in thread
From: Victoria Dye via GitGitGadget @ 2022-08-07  2:57 UTC (permalink / raw)
  To: git
  Cc: derrickstolee, shaoxuan.yuan02, newren, gitster, Victoria Dye,
	Victoria Dye

From: Victoria Dye <vdye@github.com>

Add 'index_name_pos_sparse()', which behaves the same as 'index_name_pos()',
except that it does not expand a sparse index to search for an entry inside
a sparse directory.

'index_entry_exists()' was originally implemented in 20ec2d034c (reset: make
sparse-aware (except --mixed), 2021-11-29) as an alternative to
'index_name_pos()' to allow callers to search for an index entry without
expanding a sparse index. However, that particular use case only required
knowing whether the requested entry existed, so 'index_entry_exists()' does
not return the index positioning information provided by 'index_name_pos()'.

This patch implements 'index_name_pos_sparse()' to accommodate callers that
need the positioning information of 'index_name_pos()', but do not want to
expand the index.

Signed-off-by: Victoria Dye <vdye@github.com>
---
 cache.h      | 9 +++++++++
 read-cache.c | 5 +++++
 2 files changed, 14 insertions(+)

diff --git a/cache.h b/cache.h
index 4aa1bd079d5..4fa3be26393 100644
--- a/cache.h
+++ b/cache.h
@@ -830,6 +830,15 @@ struct cache_entry *index_file_exists(struct index_state *istate, const char *na
  */
 int index_name_pos(struct index_state *, const char *name, int namelen);
 
+/*
+ * Like index_name_pos, returns the position of an entry of the given name in
+ * the index if one exists, otherwise returns a negative value where the negated
+ * value minus 1 is the position where the index entry would be inserted. Unlike
+ * index_name_pos, however, a sparse index is not expanded to find an entry
+ * inside a sparse directory.
+ */
+int index_name_pos_sparse(struct index_state *, const char *name, int namelen);
+
 /*
  * Determines whether an entry with the given name exists within the
  * given index. The return value is 1 if an exact match is found, otherwise
diff --git a/read-cache.c b/read-cache.c
index 4de207752dc..b09128b1884 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -620,6 +620,11 @@ int index_name_pos(struct index_state *istate, const char *name, int namelen)
 	return index_name_stage_pos(istate, name, namelen, 0, EXPAND_SPARSE);
 }
 
+int index_name_pos_sparse(struct index_state *istate, const char *name, int namelen)
+{
+	return index_name_stage_pos(istate, name, namelen, 0, NO_EXPAND_SPARSE);
+}
+
 int index_entry_exists(struct index_state *istate, const char *name, int namelen)
 {
 	return index_name_stage_pos(istate, name, namelen, 0, NO_EXPAND_SPARSE) >= 0;
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v2 4/4] unpack-trees: unpack new trees as sparse directories
  2022-08-07  2:57 ` [PATCH v2 0/4] reset/checkout: fix miscellaneous sparse index bugs Victoria Dye via GitGitGadget
                     ` (2 preceding siblings ...)
  2022-08-07  2:57   ` [PATCH v2 3/4] cache.h: create 'index_name_pos_sparse()' Victoria Dye via GitGitGadget
@ 2022-08-07  2:57   ` Victoria Dye via GitGitGadget
  2022-08-08 19:07   ` [PATCH v3 0/4] reset/checkout: fix miscellaneous sparse index bugs Victoria Dye via GitGitGadget
  4 siblings, 0 replies; 23+ messages in thread
From: Victoria Dye via GitGitGadget @ 2022-08-07  2:57 UTC (permalink / raw)
  To: git
  Cc: derrickstolee, shaoxuan.yuan02, newren, gitster, Victoria Dye,
	Victoria Dye

From: Victoria Dye <vdye@github.com>

If 'unpack_single_entry()' is unpacking a new directory tree (that is, one
not already present in the index) into a sparse index, unpack the tree as a
sparse directory rather than traversing its contents and unpacking each file
individually. This helps keep the sparse index as collapsed as possible in
cases such as 'git reset --hard' restoring a outside-of-cone directory
removed with 'git rm -r --sparse'.

Without this patch, 'unpack_single_entry()' will only unpack a directory
into the index as a sparse directory (rather than traversing into it and
unpacking its files one-by-one) if an entry with the same name already
exists in the index. This patch allows sparse directory unpacking without a
matching index entry when the following conditions are met:

1. the directory's path is outside the sparse cone, and
2. there are no children of the directory in the index

If a directory meets these requirements (as determined by
'is_new_sparse_dir()'), 'unpack_single_entry()' unpacks the sparse directory
index entry and propagates the decision back up to 'unpack_callback()' to
prevent unnecessary tree traversal into the unpacked directory.

Reported-by: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com>
Signed-off-by: Victoria Dye <vdye@github.com>
---
 t/t1092-sparse-checkout-compatibility.sh |  17 ++++
 unpack-trees.c                           | 106 ++++++++++++++++++++---
 2 files changed, 113 insertions(+), 10 deletions(-)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 99a1425a929..eb5edebfa5d 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -695,6 +695,23 @@ test_expect_success 'reset with wildcard pathspec' '
 	test_all_match git ls-files -s -- folder1
 '
 
+test_expect_success 'reset hard with removed sparse dir' '
+	init_repos &&
+
+	test_all_match git rm -r --sparse folder1 &&
+	test_all_match git status --porcelain=v2 &&
+
+	test_all_match git reset --hard &&
+	test_all_match git status --porcelain=v2 &&
+
+	cat >expect <<-\EOF &&
+	folder1/
+	EOF
+
+	git -C sparse-index ls-files --sparse folder1 >out &&
+	test_cmp expect out
+'
+
 test_expect_success 'update-index modify outside sparse definition' '
 	init_repos &&
 
diff --git a/unpack-trees.c b/unpack-trees.c
index 8a454e03bff..90b92114be8 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -1069,6 +1069,67 @@ static struct cache_entry *create_ce_entry(const struct traverse_info *info,
 	return ce;
 }
 
+/*
+ * Determine whether the path specified by 'p' should be unpacked as a new
+ * sparse directory in a sparse index. A new sparse directory 'A/':
+ * - must be outside the sparse cone.
+ * - must not already be in the index (i.e., no index entry with name 'A/'
+ *   exists).
+ * - must not have any child entries in the index (i.e., no index entry
+ *   'A/<something>' exists).
+ * If 'p' meets the above requirements, return 1; otherwise, return 0.
+ */
+static int entry_is_new_sparse_dir(const struct traverse_info *info,
+				   const struct name_entry *p)
+{
+	int res, pos;
+	struct strbuf dirpath = STRBUF_INIT;
+	struct unpack_trees_options *o = info->data;
+
+	if (!S_ISDIR(p->mode))
+		return 0;
+
+	/*
+	 * If the path is inside the sparse cone, it can't be a sparse directory.
+	 */
+	strbuf_add(&dirpath, info->traverse_path, info->pathlen);
+	strbuf_add(&dirpath, p->path, p->pathlen);
+	strbuf_addch(&dirpath, '/');
+	if (path_in_cone_mode_sparse_checkout(dirpath.buf, o->src_index)) {
+		res = 0;
+		goto cleanup;
+	}
+
+	pos = index_name_pos_sparse(o->src_index, dirpath.buf, dirpath.len);
+	if (pos >= 0) {
+		/* Path is already in the index, not a new sparse dir */
+		res = 0;
+		goto cleanup;
+	}
+
+	/* Where would this sparse dir be inserted into the index? */
+	pos = -pos - 1;
+	if (pos >= o->src_index->cache_nr) {
+		/*
+		 * Sparse dir would be inserted at the end of the index, so we
+		 * know it has no child entries.
+		 */
+		res = 1;
+		goto cleanup;
+	}
+
+	/*
+	 * If the dir has child entries in the index, the first would be at the
+	 * position the sparse directory would be inserted. If the entry at this
+	 * position is inside the dir, not a new sparse dir.
+	 */
+	res = strncmp(o->src_index->cache[pos]->name, dirpath.buf, dirpath.len);
+
+cleanup:
+	strbuf_release(&dirpath);
+	return res;
+}
+
 /*
  * Note that traverse_by_cache_tree() duplicates some logic in this function
  * without actually calling it. If you change the logic here you may need to
@@ -1078,21 +1139,44 @@ static int unpack_single_entry(int n, unsigned long mask,
 			       unsigned long dirmask,
 			       struct cache_entry **src,
 			       const struct name_entry *names,
-			       const struct traverse_info *info)
+			       const struct traverse_info *info,
+			       int *is_new_sparse_dir)
 {
 	int i;
 	struct unpack_trees_options *o = info->data;
 	unsigned long conflicts = info->df_conflicts | dirmask;
+	const struct name_entry *p = names;
 
-	if (mask == dirmask && !src[0])
-		return 0;
+	*is_new_sparse_dir = 0;
+	if (mask == dirmask && !src[0]) {
+		/*
+		 * If we're not in a sparse index, we can't unpack a directory
+		 * without recursing into it, so we return.
+		 */
+		if (!o->src_index->sparse_index)
+			return 0;
+
+		/* Find first entry with a real name (we could use "mask" too) */
+		while (!p->mode)
+			p++;
+
+		/*
+		 * If the directory is completely missing from the index but
+		 * would otherwise be a sparse directory, we should unpack it.
+		 * If not, we'll return and continue recursively traversing the
+		 * tree.
+		 */
+		*is_new_sparse_dir = entry_is_new_sparse_dir(info, p);
+		if (!*is_new_sparse_dir)
+			return 0;
+	}
 
 	/*
-	 * When we have a sparse directory entry for src[0],
-	 * then this isn't necessarily a directory-file conflict.
+	 * When we are unpacking a sparse directory, then this isn't necessarily
+	 * a directory-file conflict.
 	 */
-	if (mask == dirmask && src[0] &&
-	    S_ISSPARSEDIR(src[0]->ce_mode))
+	if (mask == dirmask &&
+	    (*is_new_sparse_dir || (src[0] && S_ISSPARSEDIR(src[0]->ce_mode))))
 		conflicts = 0;
 
 	/*
@@ -1352,7 +1436,7 @@ static int unpack_sparse_callback(int n, unsigned long mask, unsigned long dirma
 {
 	struct cache_entry *src[MAX_UNPACK_TREES + 1] = { NULL, };
 	struct unpack_trees_options *o = info->data;
-	int ret;
+	int ret, is_new_sparse_dir;
 
 	assert(o->merge);
 
@@ -1376,7 +1460,7 @@ static int unpack_sparse_callback(int n, unsigned long mask, unsigned long dirma
 	 * "index" tree (i.e., names[0]) and adjust 'names', 'n', 'mask', and
 	 * 'dirmask' accordingly.
 	 */
-	ret = unpack_single_entry(n - 1, mask >> 1, dirmask >> 1, src, names + 1, info);
+	ret = unpack_single_entry(n - 1, mask >> 1, dirmask >> 1, src, names + 1, info, &is_new_sparse_dir);
 
 	if (src[0])
 		discard_cache_entry(src[0]);
@@ -1394,6 +1478,7 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
 	struct cache_entry *src[MAX_UNPACK_TREES + 1] = { NULL, };
 	struct unpack_trees_options *o = info->data;
 	const struct name_entry *p = names;
+	int is_new_sparse_dir;
 
 	/* Find first entry with a real name (we could use "mask" too) */
 	while (!p->mode)
@@ -1440,7 +1525,7 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
 		}
 	}
 
-	if (unpack_single_entry(n, mask, dirmask, src, names, info) < 0)
+	if (unpack_single_entry(n, mask, dirmask, src, names, info, &is_new_sparse_dir))
 		return -1;
 
 	if (o->merge && src[0]) {
@@ -1478,6 +1563,7 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
 		}
 
 		if (!is_sparse_directory_entry(src[0], names, info) &&
+		    !is_new_sparse_dir &&
 		    traverse_trees_recursive(n, dirmask, mask & ~dirmask,
 						    names, info) < 0) {
 			return -1;
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v3 0/4] reset/checkout: fix miscellaneous sparse index bugs
  2022-08-07  2:57 ` [PATCH v2 0/4] reset/checkout: fix miscellaneous sparse index bugs Victoria Dye via GitGitGadget
                     ` (3 preceding siblings ...)
  2022-08-07  2:57   ` [PATCH v2 4/4] unpack-trees: unpack new trees as sparse directories Victoria Dye via GitGitGadget
@ 2022-08-08 19:07   ` Victoria Dye via GitGitGadget
  2022-08-08 19:07     ` [PATCH v3 1/4] checkout: fix nested sparse directory diff in sparse index Victoria Dye via GitGitGadget
                       ` (4 more replies)
  4 siblings, 5 replies; 23+ messages in thread
From: Victoria Dye via GitGitGadget @ 2022-08-08 19:07 UTC (permalink / raw)
  To: git; +Cc: derrickstolee, shaoxuan.yuan02, newren, gitster, Victoria Dye

While working on sparse index integration for 'git rm' [1], Shaoxuan found
that removed sparse directories, when reset, would no longer be sparse. This
was due to how 'unpack_trees()' determined whether a traversed directory was
a sparse directory or not; it would only unpack an entry as a sparse
directory if it existed in the index. However, if the sparse directory was
removed, it would be treated like a non-sparse directory and its contents
would be individually unpacked.

To avoid this unnecessary traversal and keep the results of 'reset' as
sparse as possible, the decision logic for whether a directory is sparse is
changed to:

 * If the directory is a sparse directory in the index, unpack it.
 * If not, is the directory inside the sparse cone? If so, do not unpack it.
 * If the directory is outside the sparse cone, does it have any child
   entries in the index? If so, do not unpack it.
 * Otherwise, unpack the entry as a sparse directory.

In the process of updating 'reset', a separate issue was found in 'checkout'
where collapsed sparse directories did not have modified contents reported
file-by-file. A similar bug was found with 'status' in 2c521b0e49 (status:
fix nested sparse directory diff in sparse index, 2022-03-01), and
'checkout' was corrected the same way (setting the diff flag 'recursive' to
1).


Changes since V2
================

 * Adjusted 'reset hard with removed sparse dir' test in
   't1092-sparse-checkout-compatibility.sh' to avoid 'git rm' log message
   conflicts with [1]


Changes since V1
================

 * Reverted the removal of 'index_entry_exists()' to avoid breaking other
   in-flight series.
 * Renamed 'is_missing_sparse_dir()' to 'is_new_sparse_dir()'; revised
   comments and commit messages to clarify what that function is doing and
   why.
 * Handled "unexpected" inputs to 'is_new_sparse_dir()' more gently,
   returning 0 if 'p' is not a directory or the directory already exists in
   the index (rather than exiting with 'BUG()'). This is intended to make
   'is_new_sparse_dir()' less reliant on information about the index
   established by 'unpack_callback()' & 'unpack_single_entry()', resulting
   in easier-to-read and more reusable code.

Thanks!

 * Victoria

[1]
https://lore.kernel.org/git/20220803045118.1243087-1-shaoxuan.yuan02@gmail.com/

Victoria Dye (4):
  checkout: fix nested sparse directory diff in sparse index
  oneway_diff: handle removed sparse directories
  cache.h: create 'index_name_pos_sparse()'
  unpack-trees: unpack new trees as sparse directories

 builtin/checkout.c                       |   1 +
 cache.h                                  |   9 ++
 diff-lib.c                               |   5 ++
 read-cache.c                             |   5 ++
 t/t1092-sparse-checkout-compatibility.sh |  25 ++++++
 unpack-trees.c                           | 106 ++++++++++++++++++++---
 6 files changed, 141 insertions(+), 10 deletions(-)


base-commit: 4af7188bc97f70277d0f10d56d5373022b1fa385
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1312%2Fvdye%2Freset%2Fhandle-missing-dirs-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1312/vdye/reset/handle-missing-dirs-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/1312

Range-diff vs v2:

 1:  255318f4dc6 = 1:  255318f4dc6 checkout: fix nested sparse directory diff in sparse index
 2:  55c77ba4b29 = 2:  55c77ba4b29 oneway_diff: handle removed sparse directories
 3:  d0bdec63286 = 3:  d0bdec63286 cache.h: create 'index_name_pos_sparse()'
 4:  97ca668102c ! 4:  010d5e51595 unpack-trees: unpack new trees as sparse directories
     @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'reset with wildca
      +test_expect_success 'reset hard with removed sparse dir' '
      +	init_repos &&
      +
     -+	test_all_match git rm -r --sparse folder1 &&
     ++	run_on_all git rm -r --sparse folder1 &&
      +	test_all_match git status --porcelain=v2 &&
      +
      +	test_all_match git reset --hard &&

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v3 1/4] checkout: fix nested sparse directory diff in sparse index
  2022-08-08 19:07   ` [PATCH v3 0/4] reset/checkout: fix miscellaneous sparse index bugs Victoria Dye via GitGitGadget
@ 2022-08-08 19:07     ` Victoria Dye via GitGitGadget
  2022-08-08 19:07     ` [PATCH v3 2/4] oneway_diff: handle removed sparse directories Victoria Dye via GitGitGadget
                       ` (3 subsequent siblings)
  4 siblings, 0 replies; 23+ messages in thread
From: Victoria Dye via GitGitGadget @ 2022-08-08 19:07 UTC (permalink / raw)
  To: git
  Cc: derrickstolee, shaoxuan.yuan02, newren, gitster, Victoria Dye,
	Victoria Dye

From: Victoria Dye <vdye@github.com>

Add the 'recursive' diff flag to the local changes reporting done by 'git
checkout' in 'show_local_changes()'. Without the flag enabled, unexpanded
sparse directories will not be recursed into to report the diff of each
file's contents, resulting in the reported local changes including
"modified" sparse directories.

The same issue was found and fixed for 'git status' in 2c521b0e49 (status:
fix nested sparse directory diff in sparse index, 2022-03-01)

Signed-off-by: Victoria Dye <vdye@github.com>
---
 builtin/checkout.c                       | 1 +
 t/t1092-sparse-checkout-compatibility.sh | 8 ++++++++
 2 files changed, 9 insertions(+)

diff --git a/builtin/checkout.c b/builtin/checkout.c
index 29c74f898bf..f9d63d80b92 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -626,6 +626,7 @@ static void show_local_changes(struct object *head,
 	repo_init_revisions(the_repository, &rev, NULL);
 	rev.diffopt.flags = opts->flags;
 	rev.diffopt.output_format |= DIFF_FORMAT_NAME_STATUS;
+	rev.diffopt.flags.recursive = 1;
 	diff_setup_done(&rev.diffopt);
 	add_pending_object(&rev, head, NULL);
 	run_diff_index(&rev, 0);
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 763c6cc6846..99a1425a929 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -372,6 +372,14 @@ test_expect_success 'deep changes during checkout' '
 	test_all_match git checkout base
 '
 
+test_expect_success 'checkout with modified sparse directory' '
+	init_repos &&
+
+	test_all_match git checkout rename-in-to-out -- . &&
+	test_sparse_match git sparse-checkout reapply &&
+	test_all_match git checkout base
+'
+
 test_expect_success 'add outside sparse cone' '
 	init_repos &&
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v3 2/4] oneway_diff: handle removed sparse directories
  2022-08-08 19:07   ` [PATCH v3 0/4] reset/checkout: fix miscellaneous sparse index bugs Victoria Dye via GitGitGadget
  2022-08-08 19:07     ` [PATCH v3 1/4] checkout: fix nested sparse directory diff in sparse index Victoria Dye via GitGitGadget
@ 2022-08-08 19:07     ` Victoria Dye via GitGitGadget
  2022-08-08 19:07     ` [PATCH v3 3/4] cache.h: create 'index_name_pos_sparse()' Victoria Dye via GitGitGadget
                       ` (2 subsequent siblings)
  4 siblings, 0 replies; 23+ messages in thread
From: Victoria Dye via GitGitGadget @ 2022-08-08 19:07 UTC (permalink / raw)
  To: git
  Cc: derrickstolee, shaoxuan.yuan02, newren, gitster, Victoria Dye,
	Victoria Dye

From: Victoria Dye <vdye@github.com>

Update 'do_oneway_diff()' to perform a 'diff_tree_oid()' on removed sparse
directories, as it does for added or modified sparse directories (see
9eb00af562 (diff-lib: handle index diffs with sparse dirs, 2021-07-14)).

At the moment, this update is unreachable code because 'unpack_trees()'
(currently the only way 'oneway_diff()' can be called, via 'diff_cache()')
will always traverse trees down to the individual removed files of a deleted
sparse directory. A subsequent patch will change this to better preserve a
sparse index in other uses of 'unpack_tree()', e.g. 'git reset --hard'.
However, making that change without this patch would result in (among other
issues) 'git status' printing only the name of a deleted sparse directory,
not its contents. To avoid introducing that bug, 'do_oneway_diff()' is
updated before modifying 'unpack_trees()'.

Signed-off-by: Victoria Dye <vdye@github.com>
---
 diff-lib.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/diff-lib.c b/diff-lib.c
index 7eb66a417aa..2edea41a234 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -466,6 +466,11 @@ static void do_oneway_diff(struct unpack_trees_options *o,
 	 * Something removed from the tree?
 	 */
 	if (!idx) {
+		if (S_ISSPARSEDIR(tree->ce_mode)) {
+			diff_tree_oid(&tree->oid, NULL, tree->name, &revs->diffopt);
+			return;
+		}
+
 		diff_index_show_file(revs, "-", tree, &tree->oid, 1,
 				     tree->ce_mode, 0);
 		return;
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v3 3/4] cache.h: create 'index_name_pos_sparse()'
  2022-08-08 19:07   ` [PATCH v3 0/4] reset/checkout: fix miscellaneous sparse index bugs Victoria Dye via GitGitGadget
  2022-08-08 19:07     ` [PATCH v3 1/4] checkout: fix nested sparse directory diff in sparse index Victoria Dye via GitGitGadget
  2022-08-08 19:07     ` [PATCH v3 2/4] oneway_diff: handle removed sparse directories Victoria Dye via GitGitGadget
@ 2022-08-08 19:07     ` Victoria Dye via GitGitGadget
  2022-08-08 19:07     ` [PATCH v3 4/4] unpack-trees: unpack new trees as sparse directories Victoria Dye via GitGitGadget
  2022-08-08 21:17     ` [PATCH v3 0/4] reset/checkout: fix miscellaneous sparse index bugs Junio C Hamano
  4 siblings, 0 replies; 23+ messages in thread
From: Victoria Dye via GitGitGadget @ 2022-08-08 19:07 UTC (permalink / raw)
  To: git
  Cc: derrickstolee, shaoxuan.yuan02, newren, gitster, Victoria Dye,
	Victoria Dye

From: Victoria Dye <vdye@github.com>

Add 'index_name_pos_sparse()', which behaves the same as 'index_name_pos()',
except that it does not expand a sparse index to search for an entry inside
a sparse directory.

'index_entry_exists()' was originally implemented in 20ec2d034c (reset: make
sparse-aware (except --mixed), 2021-11-29) as an alternative to
'index_name_pos()' to allow callers to search for an index entry without
expanding a sparse index. However, that particular use case only required
knowing whether the requested entry existed, so 'index_entry_exists()' does
not return the index positioning information provided by 'index_name_pos()'.

This patch implements 'index_name_pos_sparse()' to accommodate callers that
need the positioning information of 'index_name_pos()', but do not want to
expand the index.

Signed-off-by: Victoria Dye <vdye@github.com>
---
 cache.h      | 9 +++++++++
 read-cache.c | 5 +++++
 2 files changed, 14 insertions(+)

diff --git a/cache.h b/cache.h
index 4aa1bd079d5..4fa3be26393 100644
--- a/cache.h
+++ b/cache.h
@@ -830,6 +830,15 @@ struct cache_entry *index_file_exists(struct index_state *istate, const char *na
  */
 int index_name_pos(struct index_state *, const char *name, int namelen);
 
+/*
+ * Like index_name_pos, returns the position of an entry of the given name in
+ * the index if one exists, otherwise returns a negative value where the negated
+ * value minus 1 is the position where the index entry would be inserted. Unlike
+ * index_name_pos, however, a sparse index is not expanded to find an entry
+ * inside a sparse directory.
+ */
+int index_name_pos_sparse(struct index_state *, const char *name, int namelen);
+
 /*
  * Determines whether an entry with the given name exists within the
  * given index. The return value is 1 if an exact match is found, otherwise
diff --git a/read-cache.c b/read-cache.c
index 4de207752dc..b09128b1884 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -620,6 +620,11 @@ int index_name_pos(struct index_state *istate, const char *name, int namelen)
 	return index_name_stage_pos(istate, name, namelen, 0, EXPAND_SPARSE);
 }
 
+int index_name_pos_sparse(struct index_state *istate, const char *name, int namelen)
+{
+	return index_name_stage_pos(istate, name, namelen, 0, NO_EXPAND_SPARSE);
+}
+
 int index_entry_exists(struct index_state *istate, const char *name, int namelen)
 {
 	return index_name_stage_pos(istate, name, namelen, 0, NO_EXPAND_SPARSE) >= 0;
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v3 4/4] unpack-trees: unpack new trees as sparse directories
  2022-08-08 19:07   ` [PATCH v3 0/4] reset/checkout: fix miscellaneous sparse index bugs Victoria Dye via GitGitGadget
                       ` (2 preceding siblings ...)
  2022-08-08 19:07     ` [PATCH v3 3/4] cache.h: create 'index_name_pos_sparse()' Victoria Dye via GitGitGadget
@ 2022-08-08 19:07     ` Victoria Dye via GitGitGadget
  2022-08-08 21:17     ` [PATCH v3 0/4] reset/checkout: fix miscellaneous sparse index bugs Junio C Hamano
  4 siblings, 0 replies; 23+ messages in thread
From: Victoria Dye via GitGitGadget @ 2022-08-08 19:07 UTC (permalink / raw)
  To: git
  Cc: derrickstolee, shaoxuan.yuan02, newren, gitster, Victoria Dye,
	Victoria Dye

From: Victoria Dye <vdye@github.com>

If 'unpack_single_entry()' is unpacking a new directory tree (that is, one
not already present in the index) into a sparse index, unpack the tree as a
sparse directory rather than traversing its contents and unpacking each file
individually. This helps keep the sparse index as collapsed as possible in
cases such as 'git reset --hard' restoring a outside-of-cone directory
removed with 'git rm -r --sparse'.

Without this patch, 'unpack_single_entry()' will only unpack a directory
into the index as a sparse directory (rather than traversing into it and
unpacking its files one-by-one) if an entry with the same name already
exists in the index. This patch allows sparse directory unpacking without a
matching index entry when the following conditions are met:

1. the directory's path is outside the sparse cone, and
2. there are no children of the directory in the index

If a directory meets these requirements (as determined by
'is_new_sparse_dir()'), 'unpack_single_entry()' unpacks the sparse directory
index entry and propagates the decision back up to 'unpack_callback()' to
prevent unnecessary tree traversal into the unpacked directory.

Reported-by: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com>
Signed-off-by: Victoria Dye <vdye@github.com>
---
 t/t1092-sparse-checkout-compatibility.sh |  17 ++++
 unpack-trees.c                           | 106 ++++++++++++++++++++---
 2 files changed, 113 insertions(+), 10 deletions(-)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 99a1425a929..3588dd7b102 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -695,6 +695,23 @@ test_expect_success 'reset with wildcard pathspec' '
 	test_all_match git ls-files -s -- folder1
 '
 
+test_expect_success 'reset hard with removed sparse dir' '
+	init_repos &&
+
+	run_on_all git rm -r --sparse folder1 &&
+	test_all_match git status --porcelain=v2 &&
+
+	test_all_match git reset --hard &&
+	test_all_match git status --porcelain=v2 &&
+
+	cat >expect <<-\EOF &&
+	folder1/
+	EOF
+
+	git -C sparse-index ls-files --sparse folder1 >out &&
+	test_cmp expect out
+'
+
 test_expect_success 'update-index modify outside sparse definition' '
 	init_repos &&
 
diff --git a/unpack-trees.c b/unpack-trees.c
index 8a454e03bff..90b92114be8 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -1069,6 +1069,67 @@ static struct cache_entry *create_ce_entry(const struct traverse_info *info,
 	return ce;
 }
 
+/*
+ * Determine whether the path specified by 'p' should be unpacked as a new
+ * sparse directory in a sparse index. A new sparse directory 'A/':
+ * - must be outside the sparse cone.
+ * - must not already be in the index (i.e., no index entry with name 'A/'
+ *   exists).
+ * - must not have any child entries in the index (i.e., no index entry
+ *   'A/<something>' exists).
+ * If 'p' meets the above requirements, return 1; otherwise, return 0.
+ */
+static int entry_is_new_sparse_dir(const struct traverse_info *info,
+				   const struct name_entry *p)
+{
+	int res, pos;
+	struct strbuf dirpath = STRBUF_INIT;
+	struct unpack_trees_options *o = info->data;
+
+	if (!S_ISDIR(p->mode))
+		return 0;
+
+	/*
+	 * If the path is inside the sparse cone, it can't be a sparse directory.
+	 */
+	strbuf_add(&dirpath, info->traverse_path, info->pathlen);
+	strbuf_add(&dirpath, p->path, p->pathlen);
+	strbuf_addch(&dirpath, '/');
+	if (path_in_cone_mode_sparse_checkout(dirpath.buf, o->src_index)) {
+		res = 0;
+		goto cleanup;
+	}
+
+	pos = index_name_pos_sparse(o->src_index, dirpath.buf, dirpath.len);
+	if (pos >= 0) {
+		/* Path is already in the index, not a new sparse dir */
+		res = 0;
+		goto cleanup;
+	}
+
+	/* Where would this sparse dir be inserted into the index? */
+	pos = -pos - 1;
+	if (pos >= o->src_index->cache_nr) {
+		/*
+		 * Sparse dir would be inserted at the end of the index, so we
+		 * know it has no child entries.
+		 */
+		res = 1;
+		goto cleanup;
+	}
+
+	/*
+	 * If the dir has child entries in the index, the first would be at the
+	 * position the sparse directory would be inserted. If the entry at this
+	 * position is inside the dir, not a new sparse dir.
+	 */
+	res = strncmp(o->src_index->cache[pos]->name, dirpath.buf, dirpath.len);
+
+cleanup:
+	strbuf_release(&dirpath);
+	return res;
+}
+
 /*
  * Note that traverse_by_cache_tree() duplicates some logic in this function
  * without actually calling it. If you change the logic here you may need to
@@ -1078,21 +1139,44 @@ static int unpack_single_entry(int n, unsigned long mask,
 			       unsigned long dirmask,
 			       struct cache_entry **src,
 			       const struct name_entry *names,
-			       const struct traverse_info *info)
+			       const struct traverse_info *info,
+			       int *is_new_sparse_dir)
 {
 	int i;
 	struct unpack_trees_options *o = info->data;
 	unsigned long conflicts = info->df_conflicts | dirmask;
+	const struct name_entry *p = names;
 
-	if (mask == dirmask && !src[0])
-		return 0;
+	*is_new_sparse_dir = 0;
+	if (mask == dirmask && !src[0]) {
+		/*
+		 * If we're not in a sparse index, we can't unpack a directory
+		 * without recursing into it, so we return.
+		 */
+		if (!o->src_index->sparse_index)
+			return 0;
+
+		/* Find first entry with a real name (we could use "mask" too) */
+		while (!p->mode)
+			p++;
+
+		/*
+		 * If the directory is completely missing from the index but
+		 * would otherwise be a sparse directory, we should unpack it.
+		 * If not, we'll return and continue recursively traversing the
+		 * tree.
+		 */
+		*is_new_sparse_dir = entry_is_new_sparse_dir(info, p);
+		if (!*is_new_sparse_dir)
+			return 0;
+	}
 
 	/*
-	 * When we have a sparse directory entry for src[0],
-	 * then this isn't necessarily a directory-file conflict.
+	 * When we are unpacking a sparse directory, then this isn't necessarily
+	 * a directory-file conflict.
 	 */
-	if (mask == dirmask && src[0] &&
-	    S_ISSPARSEDIR(src[0]->ce_mode))
+	if (mask == dirmask &&
+	    (*is_new_sparse_dir || (src[0] && S_ISSPARSEDIR(src[0]->ce_mode))))
 		conflicts = 0;
 
 	/*
@@ -1352,7 +1436,7 @@ static int unpack_sparse_callback(int n, unsigned long mask, unsigned long dirma
 {
 	struct cache_entry *src[MAX_UNPACK_TREES + 1] = { NULL, };
 	struct unpack_trees_options *o = info->data;
-	int ret;
+	int ret, is_new_sparse_dir;
 
 	assert(o->merge);
 
@@ -1376,7 +1460,7 @@ static int unpack_sparse_callback(int n, unsigned long mask, unsigned long dirma
 	 * "index" tree (i.e., names[0]) and adjust 'names', 'n', 'mask', and
 	 * 'dirmask' accordingly.
 	 */
-	ret = unpack_single_entry(n - 1, mask >> 1, dirmask >> 1, src, names + 1, info);
+	ret = unpack_single_entry(n - 1, mask >> 1, dirmask >> 1, src, names + 1, info, &is_new_sparse_dir);
 
 	if (src[0])
 		discard_cache_entry(src[0]);
@@ -1394,6 +1478,7 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
 	struct cache_entry *src[MAX_UNPACK_TREES + 1] = { NULL, };
 	struct unpack_trees_options *o = info->data;
 	const struct name_entry *p = names;
+	int is_new_sparse_dir;
 
 	/* Find first entry with a real name (we could use "mask" too) */
 	while (!p->mode)
@@ -1440,7 +1525,7 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
 		}
 	}
 
-	if (unpack_single_entry(n, mask, dirmask, src, names, info) < 0)
+	if (unpack_single_entry(n, mask, dirmask, src, names, info, &is_new_sparse_dir))
 		return -1;
 
 	if (o->merge && src[0]) {
@@ -1478,6 +1563,7 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
 		}
 
 		if (!is_sparse_directory_entry(src[0], names, info) &&
+		    !is_new_sparse_dir &&
 		    traverse_trees_recursive(n, dirmask, mask & ~dirmask,
 						    names, info) < 0) {
 			return -1;
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 0/4] reset/checkout: fix miscellaneous sparse index bugs
  2022-08-08 19:07   ` [PATCH v3 0/4] reset/checkout: fix miscellaneous sparse index bugs Victoria Dye via GitGitGadget
                       ` (3 preceding siblings ...)
  2022-08-08 19:07     ` [PATCH v3 4/4] unpack-trees: unpack new trees as sparse directories Victoria Dye via GitGitGadget
@ 2022-08-08 21:17     ` Junio C Hamano
  2022-08-09 13:20       ` Derrick Stolee
  4 siblings, 1 reply; 23+ messages in thread
From: Junio C Hamano @ 2022-08-08 21:17 UTC (permalink / raw)
  To: Victoria Dye via GitGitGadget
  Cc: git, derrickstolee, shaoxuan.yuan02, newren, Victoria Dye

"Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes:

> Changes since V2
> ================
>
>  * Adjusted 'reset hard with removed sparse dir' test in
>    't1092-sparse-checkout-compatibility.sh' to avoid 'git rm' log message
>    conflicts with [1]

The topic looks quite ready for 'next'.

Thanks.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 0/4] reset/checkout: fix miscellaneous sparse index bugs
  2022-08-08 21:17     ` [PATCH v3 0/4] reset/checkout: fix miscellaneous sparse index bugs Junio C Hamano
@ 2022-08-09 13:20       ` Derrick Stolee
  0 siblings, 0 replies; 23+ messages in thread
From: Derrick Stolee @ 2022-08-09 13:20 UTC (permalink / raw)
  To: Junio C Hamano, Victoria Dye via GitGitGadget
  Cc: git, shaoxuan.yuan02, newren, Victoria Dye

On 8/8/2022 5:17 PM, Junio C Hamano wrote:
> "Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> Changes since V2
>> ================
>>
>>  * Adjusted 'reset hard with removed sparse dir' test in
>>    't1092-sparse-checkout-compatibility.sh' to avoid 'git rm' log message
>>    conflicts with [1]
> 
> The topic looks quite ready for 'next'.

I agree. Thank you!

-Stolee

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2022-08-09 13:20 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-04 20:46 [PATCH 0/4] reset/checkout: fix miscellaneous sparse index bugs Victoria Dye via GitGitGadget
2022-08-04 20:46 ` [PATCH 1/4] checkout: fix nested sparse directory diff in sparse index Victoria Dye via GitGitGadget
2022-08-05 17:59   ` Derrick Stolee
2022-08-04 20:46 ` [PATCH 2/4] oneway_diff: handle removed sparse directories Victoria Dye via GitGitGadget
2022-08-04 20:46 ` [PATCH 3/4] cache.h: replace 'index_entry_exists()' with 'index_name_pos_sparse()' Victoria Dye via GitGitGadget
2022-08-04 22:16   ` Junio C Hamano
2022-08-06  0:09   ` Junio C Hamano
2022-08-04 20:46 ` [PATCH 4/4] unpack-trees: handle missing sparse directories Victoria Dye via GitGitGadget
2022-08-04 23:23   ` Junio C Hamano
2022-08-05 16:36     ` Victoria Dye
2022-08-05 19:24       ` Junio C Hamano
2022-08-07  2:57 ` [PATCH v2 0/4] reset/checkout: fix miscellaneous sparse index bugs Victoria Dye via GitGitGadget
2022-08-07  2:57   ` [PATCH v2 1/4] checkout: fix nested sparse directory diff in sparse index Victoria Dye via GitGitGadget
2022-08-07  2:57   ` [PATCH v2 2/4] oneway_diff: handle removed sparse directories Victoria Dye via GitGitGadget
2022-08-07  2:57   ` [PATCH v2 3/4] cache.h: create 'index_name_pos_sparse()' Victoria Dye via GitGitGadget
2022-08-07  2:57   ` [PATCH v2 4/4] unpack-trees: unpack new trees as sparse directories Victoria Dye via GitGitGadget
2022-08-08 19:07   ` [PATCH v3 0/4] reset/checkout: fix miscellaneous sparse index bugs Victoria Dye via GitGitGadget
2022-08-08 19:07     ` [PATCH v3 1/4] checkout: fix nested sparse directory diff in sparse index Victoria Dye via GitGitGadget
2022-08-08 19:07     ` [PATCH v3 2/4] oneway_diff: handle removed sparse directories Victoria Dye via GitGitGadget
2022-08-08 19:07     ` [PATCH v3 3/4] cache.h: create 'index_name_pos_sparse()' Victoria Dye via GitGitGadget
2022-08-08 19:07     ` [PATCH v3 4/4] unpack-trees: unpack new trees as sparse directories Victoria Dye via GitGitGadget
2022-08-08 21:17     ` [PATCH v3 0/4] reset/checkout: fix miscellaneous sparse index bugs Junio C Hamano
2022-08-09 13:20       ` Derrick Stolee

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).