git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: derrickstolee@github.com, shaoxuan.yuan02@gmail.com,
	newren@gmail.com, gitster@pobox.com,
	Victoria Dye <vdye@github.com>, Victoria Dye <vdye@github.com>
Subject: [PATCH 4/4] unpack-trees: handle missing sparse directories
Date: Thu, 04 Aug 2022 20:46:07 +0000	[thread overview]
Message-ID: <016971a67112efe2d15fe7908e86c5d2631f8e66.1659645967.git.gitgitgadget@gmail.com> (raw)
In-Reply-To: <pull.1312.git.1659645967.gitgitgadget@gmail.com>

From: Victoria Dye <vdye@github.com>

If a sparse directory does not exist in the index, unpack it at the
directory level rather than recursing into it an unpacking its contents
file-by-file. This helps keep the sparse index as collapsed as possible in
cases such as 'git reset --hard' restoring a sparse directory.

A directory is determined to be truly non-existent in the index (rather than
the parent of existing index entries), if 1) its path is outside the sparse
cone and 2) there are no children of the directory in the index. This check
is performed by 'missing_dir_is_sparse()' in 'unpack_single_entry()'. If the
directory is a missing sparse dir, 'unpack_single_entry()'  will proceed
with unpacking it. This determination is also propagated back up to
'unpack_callback()' via 'is_missing_sparse_dir' to prevent further tree
traversal into the unpacked directory.

Reported-by: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com>
Signed-off-by: Victoria Dye <vdye@github.com>
---
 t/t1092-sparse-checkout-compatibility.sh | 17 +++++
 unpack-trees.c                           | 88 +++++++++++++++++++++---
 2 files changed, 95 insertions(+), 10 deletions(-)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 99a1425a929..eb5edebfa5d 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -695,6 +695,23 @@ test_expect_success 'reset with wildcard pathspec' '
 	test_all_match git ls-files -s -- folder1
 '
 
+test_expect_success 'reset hard with removed sparse dir' '
+	init_repos &&
+
+	test_all_match git rm -r --sparse folder1 &&
+	test_all_match git status --porcelain=v2 &&
+
+	test_all_match git reset --hard &&
+	test_all_match git status --porcelain=v2 &&
+
+	cat >expect <<-\EOF &&
+	folder1/
+	EOF
+
+	git -C sparse-index ls-files --sparse folder1 >out &&
+	test_cmp expect out
+'
+
 test_expect_success 'update-index modify outside sparse definition' '
 	init_repos &&
 
diff --git a/unpack-trees.c b/unpack-trees.c
index 8a454e03bff..aa62cef20fe 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -1069,6 +1069,53 @@ static struct cache_entry *create_ce_entry(const struct traverse_info *info,
 	return ce;
 }
 
+/*
+ * Determine whether the path specified corresponds to a sparse directory
+ * completely missing from the index. This function is assumed to only be
+ * called when the named path isn't already in the index.
+ */
+static int missing_dir_is_sparse(const struct traverse_info *info,
+				 const struct name_entry *p)
+{
+	int res, pos;
+	struct strbuf dirpath = STRBUF_INIT;
+	struct unpack_trees_options *o = info->data;
+
+	/*
+	 * First, check whether the path is in the sparse cone. If it is,
+	 * then this directory shouldn't be sparse.
+	 */
+	strbuf_add(&dirpath, info->traverse_path, info->pathlen);
+	strbuf_add(&dirpath, p->path, p->pathlen);
+	strbuf_addch(&dirpath, '/');
+	if (path_in_cone_mode_sparse_checkout(dirpath.buf, o->src_index)) {
+		res = 0;
+		goto cleanup;
+	}
+
+	/*
+	 * Given that the directory is not inside the sparse cone, it could be
+	 * (partially) expanded in the index. If child entries exist, the path
+	 * is not a missing sparse directory.
+	 */
+	pos = index_name_pos_sparse(o->src_index, dirpath.buf, dirpath.len);
+	if (pos >= 0)
+		BUG("cache entry '%s%s' shouldn't exist in the index",
+		    info->traverse_path, p->path);
+
+	pos = -pos - 1;
+	if (pos >= o->src_index->cache_nr) {
+		res = 1;
+		goto cleanup;
+	}
+
+	res = strncmp(o->src_index->cache[pos]->name, dirpath.buf, dirpath.len);
+
+cleanup:
+	strbuf_release(&dirpath);
+	return res;
+}
+
 /*
  * Note that traverse_by_cache_tree() duplicates some logic in this function
  * without actually calling it. If you change the logic here you may need to
@@ -1078,21 +1125,40 @@ static int unpack_single_entry(int n, unsigned long mask,
 			       unsigned long dirmask,
 			       struct cache_entry **src,
 			       const struct name_entry *names,
-			       const struct traverse_info *info)
+			       const struct traverse_info *info,
+			       int *is_missing_sparse_dir)
 {
 	int i;
 	struct unpack_trees_options *o = info->data;
 	unsigned long conflicts = info->df_conflicts | dirmask;
+	const struct name_entry *p = names;
 
-	if (mask == dirmask && !src[0])
-		return 0;
+	*is_missing_sparse_dir = 0;
+	if (mask == dirmask && !src[0]) {
+		/*
+		 * If the directory is completely missing from the index but
+		 * would otherwise be a sparse directory, we should unpack it.
+		 * If not, we'll return and continue recursively traversing the
+		 * tree.
+		 */
+		if (!o->src_index->sparse_index)
+			return 0;
+
+		/* Find first entry with a real name (we could use "mask" too) */
+		while (!p->mode)
+			p++;
+
+		*is_missing_sparse_dir = missing_dir_is_sparse(info, p);
+		if (!*is_missing_sparse_dir)
+			return 0;
+	}
 
 	/*
-	 * When we have a sparse directory entry for src[0],
-	 * then this isn't necessarily a directory-file conflict.
+	 * When we are unpacking a sparse directory, then this isn't necessarily
+	 * a directory-file conflict.
 	 */
-	if (mask == dirmask && src[0] &&
-	    S_ISSPARSEDIR(src[0]->ce_mode))
+	if (mask == dirmask &&
+	    (*is_missing_sparse_dir || (src[0] && S_ISSPARSEDIR(src[0]->ce_mode))))
 		conflicts = 0;
 
 	/*
@@ -1352,7 +1418,7 @@ static int unpack_sparse_callback(int n, unsigned long mask, unsigned long dirma
 {
 	struct cache_entry *src[MAX_UNPACK_TREES + 1] = { NULL, };
 	struct unpack_trees_options *o = info->data;
-	int ret;
+	int ret, is_missing_sparse_dir;
 
 	assert(o->merge);
 
@@ -1376,7 +1442,7 @@ static int unpack_sparse_callback(int n, unsigned long mask, unsigned long dirma
 	 * "index" tree (i.e., names[0]) and adjust 'names', 'n', 'mask', and
 	 * 'dirmask' accordingly.
 	 */
-	ret = unpack_single_entry(n - 1, mask >> 1, dirmask >> 1, src, names + 1, info);
+	ret = unpack_single_entry(n - 1, mask >> 1, dirmask >> 1, src, names + 1, info, &is_missing_sparse_dir);
 
 	if (src[0])
 		discard_cache_entry(src[0]);
@@ -1394,6 +1460,7 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
 	struct cache_entry *src[MAX_UNPACK_TREES + 1] = { NULL, };
 	struct unpack_trees_options *o = info->data;
 	const struct name_entry *p = names;
+	int is_missing_sparse_dir;
 
 	/* Find first entry with a real name (we could use "mask" too) */
 	while (!p->mode)
@@ -1440,7 +1507,7 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
 		}
 	}
 
-	if (unpack_single_entry(n, mask, dirmask, src, names, info) < 0)
+	if (unpack_single_entry(n, mask, dirmask, src, names, info, &is_missing_sparse_dir))
 		return -1;
 
 	if (o->merge && src[0]) {
@@ -1478,6 +1545,7 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
 		}
 
 		if (!is_sparse_directory_entry(src[0], names, info) &&
+		    !is_missing_sparse_dir &&
 		    traverse_trees_recursive(n, dirmask, mask & ~dirmask,
 						    names, info) < 0) {
 			return -1;
-- 
gitgitgadget

  parent reply	other threads:[~2022-08-04 20:46 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-08-04 20:46 [PATCH 0/4] reset/checkout: fix miscellaneous sparse index bugs Victoria Dye via GitGitGadget
2022-08-04 20:46 ` [PATCH 1/4] checkout: fix nested sparse directory diff in sparse index Victoria Dye via GitGitGadget
2022-08-05 17:59   ` Derrick Stolee
2022-08-04 20:46 ` [PATCH 2/4] oneway_diff: handle removed sparse directories Victoria Dye via GitGitGadget
2022-08-04 20:46 ` [PATCH 3/4] cache.h: replace 'index_entry_exists()' with 'index_name_pos_sparse()' Victoria Dye via GitGitGadget
2022-08-04 22:16   ` Junio C Hamano
2022-08-06  0:09   ` Junio C Hamano
2022-08-04 20:46 ` Victoria Dye via GitGitGadget [this message]
2022-08-04 23:23   ` [PATCH 4/4] unpack-trees: handle missing sparse directories Junio C Hamano
2022-08-05 16:36     ` Victoria Dye
2022-08-05 19:24       ` Junio C Hamano
2022-08-07  2:57 ` [PATCH v2 0/4] reset/checkout: fix miscellaneous sparse index bugs Victoria Dye via GitGitGadget
2022-08-07  2:57   ` [PATCH v2 1/4] checkout: fix nested sparse directory diff in sparse index Victoria Dye via GitGitGadget
2022-08-07  2:57   ` [PATCH v2 2/4] oneway_diff: handle removed sparse directories Victoria Dye via GitGitGadget
2022-08-07  2:57   ` [PATCH v2 3/4] cache.h: create 'index_name_pos_sparse()' Victoria Dye via GitGitGadget
2022-08-07  2:57   ` [PATCH v2 4/4] unpack-trees: unpack new trees as sparse directories Victoria Dye via GitGitGadget
2022-08-08 19:07   ` [PATCH v3 0/4] reset/checkout: fix miscellaneous sparse index bugs Victoria Dye via GitGitGadget
2022-08-08 19:07     ` [PATCH v3 1/4] checkout: fix nested sparse directory diff in sparse index Victoria Dye via GitGitGadget
2022-08-08 19:07     ` [PATCH v3 2/4] oneway_diff: handle removed sparse directories Victoria Dye via GitGitGadget
2022-08-08 19:07     ` [PATCH v3 3/4] cache.h: create 'index_name_pos_sparse()' Victoria Dye via GitGitGadget
2022-08-08 19:07     ` [PATCH v3 4/4] unpack-trees: unpack new trees as sparse directories Victoria Dye via GitGitGadget
2022-08-08 21:17     ` [PATCH v3 0/4] reset/checkout: fix miscellaneous sparse index bugs Junio C Hamano
2022-08-09 13:20       ` Derrick Stolee

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=016971a67112efe2d15fe7908e86c5d2631f8e66.1659645967.git.gitgitgadget@gmail.com \
    --to=gitgitgadget@gmail.com \
    --cc=derrickstolee@github.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=newren@gmail.com \
    --cc=shaoxuan.yuan02@gmail.com \
    --cc=vdye@github.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).