git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/8] Sparse index: integrate with sparse-checkout
@ 2022-05-16 18:11 Derrick Stolee via GitGitGadget
  2022-05-16 18:11 ` [PATCH 1/8] sparse-index: create expand_to_pattern_list() Derrick Stolee via GitGitGadget
                   ` (8 more replies)
  0 siblings, 9 replies; 55+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-05-16 18:11 UTC (permalink / raw)
  To: git; +Cc: gitster, vdye, shaoxuan.yuan02, Derrick Stolee

This series is based on ds/sparse-colon-path.

This series integrates the 'git sparse-checkout' builtin with the sparse
index. This is the last integration that we fast-tracked into the
microsoft/git fork. After this, we have no work in flight that would
conflict with a Google Summer of Code project in this area.

The tricky part about the sparse-checkout builtin is that we actually do
need to expand the index when growing the sparse-checkout boundary. The
trick is to expand it only as far as we need it, and then ensure that we
collapse any removed directories before the command completes.

To do this, we introduce a concept of a "partially expanded" index. In fact,
we break the boolean sparse_index member into an enum with three states:

 * COMPLETELY_FULL (0): No sparse directories exist.

 * COMPLETELY_SPARSE (1): Sparse directories may exist. Files outside the
   sparse-checkout cone are reduced to sparse directory entries whenever
   possible.

 * PARTIALLY_SPARSE (2): Sparse directories may exist. Some file entries
   outside the sparse-checkout cone may exist. Running convert_to_sparse()
   may further reduce those files to sparse directory entries.

Most of the patches in this series focus on introducing this enum and
carefully converting previous uses of the boolean to use the enum. Some
additional work is required to refactor ensure_full_index() into a new
expand_to_pattern_list() method, as they are doing essentially the same
thing, but with different scopes.

The result is improved performance on the sparse-checkout builtin as
demonstrated in a new p2000-sparse-operations.sh performance test:


Test HEAD~1 HEAD
================

2000.24: git sparse-checkout ... (sparse-v3) 2.14(1.55+0.58) 1.57(1.03+0.53)
-26.6% 2000.25: git sparse-checkout ... (sparse-v4) 2.20(1.62+0.57)
1.58(0.98+0.59) -28.2%

The improvement here is less dramatic because the operation is dominated by
writing and deleting a lot of files in the worktree. A repeated no-op
operation such as git sparse-checkout set $SPARSE_CONE would show a greater
improvement, but is less interesting since it could gain that improvement
without satisfying the "hard" parts of this builtin.

I specifically chose how to update the tests in t1092 and p2000 to avoid
conflicts with Victoria's 'git stash' work.

Thanks, -Stolee

Derrick Stolee (8):
  sparse-index: create expand_to_pattern_list()
  sparse-index: introduce partially-sparse indexes
  cache-tree: implement cache_tree_find_path()
  sparse-checkout: --no-sparse-index needs a full index
  sparse-index: partially expand directories
  sparse-index: complete partial expansion
  p2000: add test for 'git sparse-checkout [add|set]'
  sparse-checkout: integrate with sparse index

 builtin/sparse-checkout.c                |   8 +-
 cache-tree.c                             |  24 +++++
 cache-tree.h                             |   2 +
 cache.h                                  |  32 ++++--
 read-cache.c                             |   6 +-
 sparse-index.c                           | 126 ++++++++++++++++++++---
 sparse-index.h                           |  14 +++
 t/perf/p2000-sparse-operations.sh        |   1 +
 t/t1092-sparse-checkout-compatibility.sh |  25 +++++
 unpack-trees.c                           |   4 +
 10 files changed, 214 insertions(+), 28 deletions(-)


base-commit: 124b05b23005437fa5fb91863bde2a8f5840e164
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1208%2Fderrickstolee%2Fsparse-index%2Fsparse-checkout-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1208/derrickstolee/sparse-index/sparse-checkout-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1208
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH 1/8] sparse-index: create expand_to_pattern_list()
  2022-05-16 18:11 [PATCH 0/8] Sparse index: integrate with sparse-checkout Derrick Stolee via GitGitGadget
@ 2022-05-16 18:11 ` Derrick Stolee via GitGitGadget
  2022-05-16 20:36   ` Victoria Dye
  2022-05-16 18:11 ` [PATCH 2/8] sparse-index: introduce partially-sparse indexes Derrick Stolee via GitGitGadget
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 55+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-05-16 18:11 UTC (permalink / raw)
  To: git; +Cc: gitster, vdye, shaoxuan.yuan02, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

This is the first change in a series to allow modifying the
sparse-checkout pattern set without expanding a sparse index to a full
one in the process. Here, we focus on the problem of expanding the
pattern set through a command like 'git sparse-checkout add <path>'
which needs to create new index entries for the paths now being written
to the worktree.

To achieve this, we need to be able to replace sparse directory entries
with their contained files and subdirectories. Once this is complete,
other code paths can discover those cache entries and write the
corresponding files to disk before committing the index.

We already have logic in ensure_full_index() that expands the index
entries, so we will use that as our base. Create a new method,
expand_to_pattern_list(), which takes a pattern list, but for now mostly
ignores it. The current implementation is only correct when the pattern
list is NULL as that does the same as ensure_full_index(). In fact,
ensure_full_index() is converted to a shim over
expand_to_pattern_list().

A future update will actually implement expand_to_pattern_list() to its
full capabilities. For now, it is created and documented.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 sparse-index.c | 35 ++++++++++++++++++++++++++++++++---
 sparse-index.h | 14 ++++++++++++++
 2 files changed, 46 insertions(+), 3 deletions(-)

diff --git a/sparse-index.c b/sparse-index.c
index 8636af72de5..37c7df877a6 100644
--- a/sparse-index.c
+++ b/sparse-index.c
@@ -248,19 +248,41 @@ static int add_path_to_index(const struct object_id *oid,
 	return 0;
 }
 
-void ensure_full_index(struct index_state *istate)
+void expand_to_pattern_list(struct index_state *istate,
+			      struct pattern_list *pl)
 {
 	int i;
 	struct index_state *full;
 	struct strbuf base = STRBUF_INIT;
 
+	/*
+	 * If the index is already full, then keep it full. We will convert
+	 * it to a sparse index on write, if possible.
+	 */
 	if (!istate || !istate->sparse_index)
 		return;
 
+	/*
+	 * If our index is sparse, but our new pattern set does not use
+	 * cone mode patterns, then we need to expand the index before we
+	 * continue. A NULL pattern set indicates a full expansion to a
+	 * full index.
+	 */
+	if (pl && !pl->use_cone_patterns)
+		pl = NULL;
+
 	if (!istate->repo)
 		istate->repo = the_repository;
 
-	trace2_region_enter("index", "ensure_full_index", istate->repo);
+	/*
+	 * A NULL pattern set indicates we are expanding a full index, so
+	 * we use a special region name that indicates the full expansion.
+	 * This is used by test cases, but also helps to differentiate the
+	 * two cases.
+	 */
+	trace2_region_enter("index",
+			    pl ? "expand_to_pattern_list" : "ensure_full_index",
+			    istate->repo);
 
 	/* initialize basics of new index */
 	full = xcalloc(1, sizeof(struct index_state));
@@ -322,7 +344,14 @@ void ensure_full_index(struct index_state *istate)
 	cache_tree_free(&istate->cache_tree);
 	cache_tree_update(istate, 0);
 
-	trace2_region_leave("index", "ensure_full_index", istate->repo);
+	trace2_region_leave("index",
+			    pl ? "expand_to_pattern_list" : "ensure_full_index",
+			    istate->repo);
+}
+
+void ensure_full_index(struct index_state *istate)
+{
+	expand_to_pattern_list(istate, NULL);
 }
 
 void ensure_correct_sparsity(struct index_state *istate)
diff --git a/sparse-index.h b/sparse-index.h
index 633d4fb7e31..037b541f49d 100644
--- a/sparse-index.h
+++ b/sparse-index.h
@@ -23,4 +23,18 @@ void expand_to_path(struct index_state *istate,
 struct repository;
 int set_sparse_index_config(struct repository *repo, int enable);
 
+struct pattern_list;
+
+/**
+ * Scan the given index and compare its entries to the given pattern list.
+ * If the index is sparse and the pattern list uses cone mode patterns,
+ * then modify the index to contain the all of the file entries within that
+ * new pattern list. This expands sparse directories only as far as needed.
+ *
+ * If the pattern list is NULL or does not use cone mode patterns, then the
+ * index is expanded to a full index.
+ */
+void expand_to_pattern_list(struct index_state *istate,
+			      struct pattern_list *pl);
+
 #endif
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 2/8] sparse-index: introduce partially-sparse indexes
  2022-05-16 18:11 [PATCH 0/8] Sparse index: integrate with sparse-checkout Derrick Stolee via GitGitGadget
  2022-05-16 18:11 ` [PATCH 1/8] sparse-index: create expand_to_pattern_list() Derrick Stolee via GitGitGadget
@ 2022-05-16 18:11 ` Derrick Stolee via GitGitGadget
  2022-05-16 18:11 ` [PATCH 3/8] cache-tree: implement cache_tree_find_path() Derrick Stolee via GitGitGadget
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 55+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-05-16 18:11 UTC (permalink / raw)
  To: git; +Cc: gitster, vdye, shaoxuan.yuan02, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

A future change will present a temporary, in-memory mode where the index
can both contain sparse directory entries but also not be completely
collapsed to the smallest possible sparse directories. This will be
necessary for modifying the sparse-checkout definition while using a
sparse index.

For now, convert the single-bit member 'sparse_index' in 'struct
index_state' to be a an 'enum sparse_index_mode' with three modes:

* COMPLETELY_FULL (0): No sparse directories exist.

* COMPLETELY_SPARSE (1): Sparse directories may exist. Files outside the
  sparse-checkout cone are reduced to sparse directory entries whenever
  possible.

* PARTIALLY_SPARSE (2): Sparse directories may exist. Some file entries
  outside the sparse-checkout cone may exist. Running
  convert_to_sparse() may further reduce those files to sparse directory
  entries.

The main reason to store this extra information is to allow
convert_to_sparse() to short-circuit when the index is already in
COMPLETELY_SPARSE mode but to actually do the necessary work when in
PARTIALLY_SPARSE mode.

The PARTIALLY_SPARSE mode will be used in an upcoming change.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 builtin/sparse-checkout.c |  2 +-
 cache.h                   | 32 ++++++++++++++++++++++++--------
 read-cache.c              |  6 +++---
 sparse-index.c            |  6 +++---
 4 files changed, 31 insertions(+), 15 deletions(-)

diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c
index 0217d44c5b1..88eea069ad4 100644
--- a/builtin/sparse-checkout.c
+++ b/builtin/sparse-checkout.c
@@ -128,7 +128,7 @@ static void clean_tracked_sparse_directories(struct repository *r)
 	 * sparse index will not delete directories that contain
 	 * conflicted entries or submodules.
 	 */
-	if (!r->index->sparse_index) {
+	if (r->index->sparse_index == COMPLETELY_FULL) {
 		/*
 		 * If something, such as a merge conflict or other concern,
 		 * prevents us from converting to a sparse index, then do
diff --git a/cache.h b/cache.h
index 6226f6a8a53..2d067aca2fd 100644
--- a/cache.h
+++ b/cache.h
@@ -310,6 +310,28 @@ struct untracked_cache;
 struct progress;
 struct pattern_list;
 
+enum sparse_index_mode {
+	/*
+	 * COMPLETELY_FULL: there are no sparse directories
+	 * in the index at all.
+	 */
+	COMPLETELY_FULL = 0,
+
+	/*
+	 * COLLAPSED: the index has already been collapsed to sparse
+	 * directories whereever possible.
+	 */
+	COLLAPSED = 1,
+
+	/*
+	 * PARTIALLY_SPARSE: the sparse directories that exist are
+	 * outside the sparse-checkout boundary, but it is possible
+	 * that some file entries could collapse to sparse directory
+	 * entries.
+	 */
+	PARTIALLY_SPARSE = 2,
+};
+
 struct index_state {
 	struct cache_entry **cache;
 	unsigned int version;
@@ -323,14 +345,8 @@ struct index_state {
 		 drop_cache_tree : 1,
 		 updated_workdir : 1,
 		 updated_skipworktree : 1,
-		 fsmonitor_has_run_once : 1,
-
-		 /*
-		  * sparse_index == 1 when sparse-directory
-		  * entries exist. Requires sparse-checkout
-		  * in cone mode.
-		  */
-		 sparse_index : 1;
+		 fsmonitor_has_run_once : 1;
+	enum sparse_index_mode sparse_index;
 	struct hashmap name_hash;
 	struct hashmap dir_hash;
 	struct object_id oid;
diff --git a/read-cache.c b/read-cache.c
index 4df97e185e9..cb9b33169fd 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -112,7 +112,7 @@ static const char *alternate_index_output;
 static void set_index_entry(struct index_state *istate, int nr, struct cache_entry *ce)
 {
 	if (S_ISSPARSEDIR(ce->ce_mode))
-		istate->sparse_index = 1;
+		istate->sparse_index = COLLAPSED;
 
 	istate->cache[nr] = ce;
 	add_name_hash(istate, ce);
@@ -1856,7 +1856,7 @@ static int read_index_extension(struct index_state *istate,
 		break;
 	case CACHE_EXT_SPARSE_DIRECTORIES:
 		/* no content, only an indicator */
-		istate->sparse_index = 1;
+		istate->sparse_index = COLLAPSED;
 		break;
 	default:
 		if (*ext < 'A' || 'Z' < *ext)
@@ -3149,7 +3149,7 @@ static int do_write_locked_index(struct index_state *istate, struct lock_file *l
 				 unsigned flags)
 {
 	int ret;
-	int was_full = !istate->sparse_index;
+	int was_full = istate->sparse_index == COMPLETELY_FULL;
 
 	ret = convert_to_sparse(istate, 0);
 
diff --git a/sparse-index.c b/sparse-index.c
index 37c7df877a6..79e8ff087bc 100644
--- a/sparse-index.c
+++ b/sparse-index.c
@@ -173,7 +173,7 @@ int convert_to_sparse(struct index_state *istate, int flags)
 	 * If the index is already sparse, empty, or otherwise
 	 * cannot be converted to sparse, do not convert.
 	 */
-	if (istate->sparse_index || !istate->cache_nr ||
+	if (istate->sparse_index == COLLAPSED || !istate->cache_nr ||
 	    !is_sparse_index_allowed(istate, flags))
 		return 0;
 
@@ -214,7 +214,7 @@ int convert_to_sparse(struct index_state *istate, int flags)
 	FREE_AND_NULL(istate->fsmonitor_dirty);
 	FREE_AND_NULL(istate->fsmonitor_last_update);
 
-	istate->sparse_index = 1;
+	istate->sparse_index = COLLAPSED;
 	trace2_region_leave("index", "convert_to_sparse", istate->repo);
 	return 0;
 }
@@ -259,7 +259,7 @@ void expand_to_pattern_list(struct index_state *istate,
 	 * If the index is already full, then keep it full. We will convert
 	 * it to a sparse index on write, if possible.
 	 */
-	if (!istate || !istate->sparse_index)
+	if (!istate || istate->sparse_index == COMPLETELY_FULL)
 		return;
 
 	/*
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 3/8] cache-tree: implement cache_tree_find_path()
  2022-05-16 18:11 [PATCH 0/8] Sparse index: integrate with sparse-checkout Derrick Stolee via GitGitGadget
  2022-05-16 18:11 ` [PATCH 1/8] sparse-index: create expand_to_pattern_list() Derrick Stolee via GitGitGadget
  2022-05-16 18:11 ` [PATCH 2/8] sparse-index: introduce partially-sparse indexes Derrick Stolee via GitGitGadget
@ 2022-05-16 18:11 ` Derrick Stolee via GitGitGadget
  2022-05-16 18:11 ` [PATCH 4/8] sparse-checkout: --no-sparse-index needs a full index Derrick Stolee via GitGitGadget
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 55+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-05-16 18:11 UTC (permalink / raw)
  To: git; +Cc: gitster, vdye, shaoxuan.yuan02, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Given a 'struct cache_tree', it may be beneficial to navigate directly
to a node within that corresponds to a given path name. Create
cache_tree_find_path() for this function. It returns NULL when no such
path exists.

The implementation is adapted from do_invalidate_path() which does a
similar search but also modifies the nodes it finds along the way.

This new method is not currently used, but will be in an upcoming
change.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 cache-tree.c | 24 ++++++++++++++++++++++++
 cache-tree.h |  2 ++
 2 files changed, 26 insertions(+)

diff --git a/cache-tree.c b/cache-tree.c
index 6752f69d515..23893a7b113 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -100,6 +100,30 @@ struct cache_tree_sub *cache_tree_sub(struct cache_tree *it, const char *path)
 	return find_subtree(it, path, pathlen, 1);
 }
 
+struct cache_tree *cache_tree_find_path(struct cache_tree *it, const char *path)
+{
+	const char *slash;
+	int namelen;
+	struct cache_tree_sub *down;
+
+	if (!it)
+		return NULL;
+	slash = strchrnul(path, '/');
+	namelen = slash - path;
+	it->entry_count = -1;
+	if (!*slash) {
+		int pos;
+		pos = cache_tree_subtree_pos(it, path, namelen);
+		if (0 <= pos)
+			return it->down[pos]->cache_tree;
+		return NULL;
+	}
+	down = find_subtree(it, path, namelen, 0);
+	if (down)
+		return cache_tree_find_path(down->cache_tree, slash + 1);
+	return NULL;
+}
+
 static int do_invalidate_path(struct cache_tree *it, const char *path)
 {
 	/* a/b/c
diff --git a/cache-tree.h b/cache-tree.h
index 8efeccebfc9..f75f8e74dcd 100644
--- a/cache-tree.h
+++ b/cache-tree.h
@@ -29,6 +29,8 @@ struct cache_tree_sub *cache_tree_sub(struct cache_tree *, const char *);
 
 int cache_tree_subtree_pos(struct cache_tree *it, const char *path, int pathlen);
 
+struct cache_tree *cache_tree_find_path(struct cache_tree *it, const char *path);
+
 void cache_tree_write(struct strbuf *, struct cache_tree *root);
 struct cache_tree *cache_tree_read(const char *buffer, unsigned long size);
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 4/8] sparse-checkout: --no-sparse-index needs a full index
  2022-05-16 18:11 [PATCH 0/8] Sparse index: integrate with sparse-checkout Derrick Stolee via GitGitGadget
                   ` (2 preceding siblings ...)
  2022-05-16 18:11 ` [PATCH 3/8] cache-tree: implement cache_tree_find_path() Derrick Stolee via GitGitGadget
@ 2022-05-16 18:11 ` Derrick Stolee via GitGitGadget
  2022-05-16 18:11 ` [PATCH 5/8] sparse-index: partially expand directories Derrick Stolee via GitGitGadget
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 55+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-05-16 18:11 UTC (permalink / raw)
  To: git; +Cc: gitster, vdye, shaoxuan.yuan02, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

When the --no-sparse-index option is supplied, the sparse-checkout
builtin should explicitly ask to expand a sparse index to a full one.
This is currently done implicitly due to the command_requires_full_index
protection, but that will be removed in an upcoming change.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 builtin/sparse-checkout.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c
index 88eea069ad4..cbff6ad00b0 100644
--- a/builtin/sparse-checkout.c
+++ b/builtin/sparse-checkout.c
@@ -413,6 +413,9 @@ static int update_modes(int *cone_mode, int *sparse_index)
 		/* force an index rewrite */
 		repo_read_index(the_repository);
 		the_repository->index->updated_workdir = 1;
+
+		if (!*sparse_index)
+			ensure_full_index(the_repository->index);
 	}
 
 	return 0;
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 5/8] sparse-index: partially expand directories
  2022-05-16 18:11 [PATCH 0/8] Sparse index: integrate with sparse-checkout Derrick Stolee via GitGitGadget
                   ` (3 preceding siblings ...)
  2022-05-16 18:11 ` [PATCH 4/8] sparse-checkout: --no-sparse-index needs a full index Derrick Stolee via GitGitGadget
@ 2022-05-16 18:11 ` Derrick Stolee via GitGitGadget
  2022-05-16 20:36   ` Victoria Dye
  2022-05-16 18:11 ` [PATCH 6/8] sparse-index: complete partial expansion Derrick Stolee via GitGitGadget
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 55+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-05-16 18:11 UTC (permalink / raw)
  To: git; +Cc: gitster, vdye, shaoxuan.yuan02, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The expand_to_pattern_list() method expands sparse directory entries
to their list of contained files when either the pattern list is NULL or
the directory is contained in the new pattern list's cone mode patterns.

It is possible that the pattern list has a recursive match with a
directory 'A/B/C/' and so an existing sparse directory 'A/B/' would need
to be expanded. If there exists a directory 'A/B/D/', then that
directory should not be expanded and instead we can create a sparse
directory.

To implement this, we plug into the add_path_to_index() callback for the
call to read_tree_at(). Since we now need access to both the index we
are writing and the pattern list we are comparing, create a 'struct
modify_index_context' to use as a data transfer object. It is important
that we use the given pattern list since we will use this pattern list
to change the sparse-checkout patterns and cannot use
istate->sparse_checkout_patterns.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 sparse-index.c | 46 +++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 39 insertions(+), 7 deletions(-)

diff --git a/sparse-index.c b/sparse-index.c
index 79e8ff087bc..3d8eed585b5 100644
--- a/sparse-index.c
+++ b/sparse-index.c
@@ -9,6 +9,11 @@
 #include "dir.h"
 #include "fsmonitor.h"
 
+struct modify_index_context {
+	struct index_state *write;
+	struct pattern_list *pl;
+};
+
 static struct cache_entry *construct_sparse_dir_entry(
 				struct index_state *istate,
 				const char *sparse_dir,
@@ -231,18 +236,41 @@ static int add_path_to_index(const struct object_id *oid,
 			     struct strbuf *base, const char *path,
 			     unsigned int mode, void *context)
 {
-	struct index_state *istate = (struct index_state *)context;
+	struct modify_index_context *ctx = (struct modify_index_context *)context;
 	struct cache_entry *ce;
 	size_t len = base->len;
 
-	if (S_ISDIR(mode))
-		return READ_TREE_RECURSIVE;
+	if (S_ISDIR(mode)) {
+		int dtype;
+		size_t baselen = base->len;
+		if (!ctx->pl)
+			return READ_TREE_RECURSIVE;
 
-	strbuf_addstr(base, path);
+		/*
+		 * Have we expanded to a point outside of the sparse-checkout?
+		 */
+		strbuf_addstr(base, path);
+		strbuf_add(base, "/-", 2);
+
+		if (path_matches_pattern_list(base->buf, base->len,
+					      NULL, &dtype,
+					      ctx->pl, ctx->write)) {
+			strbuf_setlen(base, baselen);
+			return READ_TREE_RECURSIVE;
+		}
 
-	ce = make_cache_entry(istate, mode, oid, base->buf, 0, 0);
+		/*
+		 * The path "{base}{path}/" is a sparse directory. Create the correct
+		 * name for inserting the entry into the idnex.
+		 */
+		strbuf_setlen(base, base->len - 1);
+	} else {
+		strbuf_addstr(base, path);
+	}
+
+	ce = make_cache_entry(ctx->write, mode, oid, base->buf, 0, 0);
 	ce->ce_flags |= CE_SKIP_WORKTREE | CE_EXTENDED;
-	set_index_entry(istate, istate->cache_nr++, ce);
+	set_index_entry(ctx->write, ctx->write->cache_nr++, ce);
 
 	strbuf_setlen(base, len);
 	return 0;
@@ -254,6 +282,7 @@ void expand_to_pattern_list(struct index_state *istate,
 	int i;
 	struct index_state *full;
 	struct strbuf base = STRBUF_INIT;
+	struct modify_index_context ctx;
 
 	/*
 	 * If the index is already full, then keep it full. We will convert
@@ -294,6 +323,9 @@ void expand_to_pattern_list(struct index_state *istate,
 	full->cache_nr = 0;
 	ALLOC_ARRAY(full->cache, full->cache_alloc);
 
+	ctx.write = full;
+	ctx.pl = pl;
+
 	for (i = 0; i < istate->cache_nr; i++) {
 		struct cache_entry *ce = istate->cache[i];
 		struct tree *tree;
@@ -319,7 +351,7 @@ void expand_to_pattern_list(struct index_state *istate,
 		strbuf_add(&base, ce->name, strlen(ce->name));
 
 		read_tree_at(istate->repo, tree, &base, &ps,
-			     add_path_to_index, full);
+			     add_path_to_index, &ctx);
 
 		/* free directory entries. full entries are re-used */
 		discard_cache_entry(ce);
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 6/8] sparse-index: complete partial expansion
  2022-05-16 18:11 [PATCH 0/8] Sparse index: integrate with sparse-checkout Derrick Stolee via GitGitGadget
                   ` (4 preceding siblings ...)
  2022-05-16 18:11 ` [PATCH 5/8] sparse-index: partially expand directories Derrick Stolee via GitGitGadget
@ 2022-05-16 18:11 ` Derrick Stolee via GitGitGadget
  2022-05-16 20:38   ` Victoria Dye
  2022-05-16 18:11 ` [PATCH 7/8] p2000: add test for 'git sparse-checkout [add|set]' Derrick Stolee via GitGitGadget
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 55+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-05-16 18:11 UTC (permalink / raw)
  To: git; +Cc: gitster, vdye, shaoxuan.yuan02, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

To complete the implementation of expand_to_pattern_list(), we need to
detect when a sparse directory entry should remain sparse. This avoids a
full expansion, so we now need to use the PARTIALLY_SPARSE mode to
indicate this state.

There still are no callers to this method, but we will add one in the
next change.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 sparse-index.c | 41 +++++++++++++++++++++++++++++++++++++----
 1 file changed, 37 insertions(+), 4 deletions(-)

diff --git a/sparse-index.c b/sparse-index.c
index 3d8eed585b5..0bad5503304 100644
--- a/sparse-index.c
+++ b/sparse-index.c
@@ -297,8 +297,24 @@ void expand_to_pattern_list(struct index_state *istate,
 	 * continue. A NULL pattern set indicates a full expansion to a
 	 * full index.
 	 */
-	if (pl && !pl->use_cone_patterns)
+	if (pl && !pl->use_cone_patterns) {
 		pl = NULL;
+	} else {
+		/*
+		 * We might contract file entries into sparse-directory
+		 * entries, and for that we will need the cache tree to
+		 * be recomputed.
+		 */
+		cache_tree_free(&istate->cache_tree);
+
+		/*
+		 * If there is a problem creating the cache tree, then we
+		 * need to expand to a full index since we cannot satisfy
+		 * the current request as a sparse index.
+		 */
+		if (cache_tree_update(istate, WRITE_TREE_MISSING_OK))
+			pl = NULL;
+	}
 
 	if (!istate->repo)
 		istate->repo = the_repository;
@@ -317,8 +333,14 @@ void expand_to_pattern_list(struct index_state *istate,
 	full = xcalloc(1, sizeof(struct index_state));
 	memcpy(full, istate, sizeof(struct index_state));
 
+	/*
+	 * This slightly-misnamed 'full' index might still be sparse if we
+	 * are only modifying the list of sparse directories. This hinges
+	 * on whether we have a non-NULL pattern list.
+	 */
+	full->sparse_index = pl ? PARTIALLY_SPARSE : COMPLETELY_FULL;
+
 	/* then change the necessary things */
-	full->sparse_index = 0;
 	full->cache_alloc = (3 * istate->cache_alloc) / 2;
 	full->cache_nr = 0;
 	ALLOC_ARRAY(full->cache, full->cache_alloc);
@@ -330,11 +352,22 @@ void expand_to_pattern_list(struct index_state *istate,
 		struct cache_entry *ce = istate->cache[i];
 		struct tree *tree;
 		struct pathspec ps;
+		int dtype;
 
 		if (!S_ISSPARSEDIR(ce->ce_mode)) {
 			set_index_entry(full, full->cache_nr++, ce);
 			continue;
 		}
+
+		/* We now have a sparse directory entry. Should we expand? */
+		if (pl &&
+		    path_matches_pattern_list(ce->name, ce->ce_namelen,
+					      NULL, &dtype,
+					      pl, istate) <= 0) {
+			set_index_entry(full, full->cache_nr++, ce);
+			continue;
+		}
+
 		if (!(ce->ce_flags & CE_SKIP_WORKTREE))
 			warning(_("index entry is a directory, but not sparse (%08x)"),
 				ce->ce_flags);
@@ -360,7 +393,7 @@ void expand_to_pattern_list(struct index_state *istate,
 	/* Copy back into original index. */
 	memcpy(&istate->name_hash, &full->name_hash, sizeof(full->name_hash));
 	memcpy(&istate->dir_hash, &full->dir_hash, sizeof(full->dir_hash));
-	istate->sparse_index = 0;
+	istate->sparse_index = pl ? PARTIALLY_SPARSE : COMPLETELY_FULL;
 	free(istate->cache);
 	istate->cache = full->cache;
 	istate->cache_nr = full->cache_nr;
@@ -374,7 +407,7 @@ void expand_to_pattern_list(struct index_state *istate,
 
 	/* Clear and recompute the cache-tree */
 	cache_tree_free(&istate->cache_tree);
-	cache_tree_update(istate, 0);
+	cache_tree_update(istate, WRITE_TREE_MISSING_OK);
 
 	trace2_region_leave("index",
 			    pl ? "expand_to_pattern_list" : "ensure_full_index",
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 7/8] p2000: add test for 'git sparse-checkout [add|set]'
  2022-05-16 18:11 [PATCH 0/8] Sparse index: integrate with sparse-checkout Derrick Stolee via GitGitGadget
                   ` (5 preceding siblings ...)
  2022-05-16 18:11 ` [PATCH 6/8] sparse-index: complete partial expansion Derrick Stolee via GitGitGadget
@ 2022-05-16 18:11 ` Derrick Stolee via GitGitGadget
  2022-05-16 18:11 ` [PATCH 8/8] sparse-checkout: integrate with sparse index Derrick Stolee via GitGitGadget
  2022-05-19 17:52 ` [PATCH v2 00/10] Sparse index: integrate with sparse-checkout Derrick Stolee via GitGitGadget
  8 siblings, 0 replies; 55+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-05-16 18:11 UTC (permalink / raw)
  To: git; +Cc: gitster, vdye, shaoxuan.yuan02, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

The sparse-checkout builtin is almost completely integrated with the
sparse index, allowing the sparse-checkout boundary to be modified
without expanding a sparse index to a full one. Add a test to
p2000-sparse-operations.sh that adds a directory to the sparse-checkout
definition, then removes it. Using both operations is important to
ensure that the operation is doing the same work in each repetition as
well as leaving the test repo in a good state for later tests.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 t/perf/p2000-sparse-operations.sh | 1 +
 1 file changed, 1 insertion(+)

diff --git a/t/perf/p2000-sparse-operations.sh b/t/perf/p2000-sparse-operations.sh
index 382716cfca9..ce5cfac5714 100755
--- a/t/perf/p2000-sparse-operations.sh
+++ b/t/perf/p2000-sparse-operations.sh
@@ -110,6 +110,7 @@ test_perf_on_all git add -A
 test_perf_on_all git add .
 test_perf_on_all git commit -a -m A
 test_perf_on_all git checkout -f -
+test_perf_on_all "git sparse-checkout add f2/f3/f1 && git sparse-checkout set $SPARSE_CONE"
 test_perf_on_all git reset
 test_perf_on_all git reset --hard
 test_perf_on_all git reset -- does-not-exist
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 8/8] sparse-checkout: integrate with sparse index
  2022-05-16 18:11 [PATCH 0/8] Sparse index: integrate with sparse-checkout Derrick Stolee via GitGitGadget
                   ` (6 preceding siblings ...)
  2022-05-16 18:11 ` [PATCH 7/8] p2000: add test for 'git sparse-checkout [add|set]' Derrick Stolee via GitGitGadget
@ 2022-05-16 18:11 ` Derrick Stolee via GitGitGadget
  2022-05-16 20:38   ` Victoria Dye
  2022-05-19 17:52 ` [PATCH v2 00/10] Sparse index: integrate with sparse-checkout Derrick Stolee via GitGitGadget
  8 siblings, 1 reply; 55+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-05-16 18:11 UTC (permalink / raw)
  To: git; +Cc: gitster, vdye, shaoxuan.yuan02, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

When modifying the sparse-checkout definition, the sparse-checkout
builtin calls update_sparsity() to modify the SKIP_WORKTREE bits of all
cache entries in the index. Before, we needed the index to be fully
expanded in order to ensure we had the full list of files necessary that
match the new patterns.

Insert a call to reset_sparse_directories() that expands sparse
directories that are within the new pattern list, but only far enough
that every necessary file path now exists as a cache entry. The
remaining logic within update_sparsity() will modify the SKIP_WORKTREE
bits appropriately.

This allows us to disable command_requires_full_index within the
sparse-checkout builtin. Add tests that demonstrate that we are not
expanding to a full index unnecessarily.

We can see the improved performance in the p2000 test script:

Test                           HEAD~1            HEAD
------------------------------------------------------------------------
2000.24: git ... (sparse-v3)   2.14(1.55+0.58)   1.57(1.03+0.53) -26.6%
2000.25: git ... (sparse-v4)   2.20(1.62+0.57)   1.58(0.98+0.59) -28.2%

These reductions of 26-28% are small compared to most examples, but the
time is dominated by writing a new copy of the base repository to the
worktree and then deleting it again. The fact that the previous index
expansion was such a large portion of the time is telling how important
it is to complete this sparse index integration.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 builtin/sparse-checkout.c                |  3 +++
 t/t1092-sparse-checkout-compatibility.sh | 25 ++++++++++++++++++++++++
 unpack-trees.c                           |  4 ++++
 3 files changed, 32 insertions(+)

diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c
index cbff6ad00b0..0157b292b36 100644
--- a/builtin/sparse-checkout.c
+++ b/builtin/sparse-checkout.c
@@ -937,6 +937,9 @@ int cmd_sparse_checkout(int argc, const char **argv, const char *prefix)
 
 	git_config(git_default_config, NULL);
 
+	prepare_repo_settings(the_repository);
+	the_repository->settings.command_requires_full_index = 0;
+
 	if (argc > 0) {
 		if (!strcmp(argv[0], "list"))
 			return sparse_checkout_list(argc, argv);
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 93bcfd20bbc..614357fc48c 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -1552,6 +1552,31 @@ test_expect_success 'ls-files' '
 	ensure_not_expanded ls-files --sparse
 '
 
+test_expect_success 'sparse index is not expanded: sparse-checkout' '
+	init_repos &&
+
+	ensure_not_expanded sparse-checkout set deep/deeper2 &&
+	ensure_not_expanded sparse-checkout set deep/deeper1 &&
+	ensure_not_expanded sparse-checkout set deep &&
+	ensure_not_expanded sparse-checkout add folder1 &&
+	ensure_not_expanded sparse-checkout set deep/deeper1 &&
+	ensure_not_expanded sparse-checkout set folder2 &&
+
+	# Demonstrate that the checks that "folder1/a" is a file
+	# do not cause a sparse-index expansion (since it is in the
+	# sparse-checkout cone).
+	echo >>sparse-index/folder2/a &&
+	git -C sparse-index add folder2/a &&
+
+	ensure_not_expanded sparse-checkout add folder1 &&
+
+	# Skip checks here, since deep/deeper1 is inside a sparse directory
+	# that must be expanded to check whether `deep/deeper1` is a file
+	# or not.
+	ensure_not_expanded sparse-checkout set --skip-checks deep/deeper1 &&
+	ensure_not_expanded sparse-checkout set
+'
+
 # NEEDSWORK: a sparse-checkout behaves differently from a full checkout
 # in this scenario, but it shouldn't.
 test_expect_success 'reset mixed and checkout orphan' '
diff --git a/unpack-trees.c b/unpack-trees.c
index 7f528d35cc2..9745e0dfc34 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -18,6 +18,7 @@
 #include "promisor-remote.h"
 #include "entry.h"
 #include "parallel-checkout.h"
+#include "sparse-index.h"
 
 /*
  * Error messages expected by scripts out of plumbing commands such as
@@ -2018,6 +2019,9 @@ enum update_sparsity_result update_sparsity(struct unpack_trees_options *o)
 			goto skip_sparse_checkout;
 	}
 
+	/* Expand sparse directories as needed */
+	expand_to_pattern_list(o->src_index, o->pl);
+
 	/* Set NEW_SKIP_WORKTREE on existing entries. */
 	mark_all_ce_unused(o->src_index);
 	mark_new_skip_worktree(o->pl, o->src_index, 0,
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/8] sparse-index: create expand_to_pattern_list()
  2022-05-16 18:11 ` [PATCH 1/8] sparse-index: create expand_to_pattern_list() Derrick Stolee via GitGitGadget
@ 2022-05-16 20:36   ` Victoria Dye
  2022-05-16 20:49     ` Derrick Stolee
  0 siblings, 1 reply; 55+ messages in thread
From: Victoria Dye @ 2022-05-16 20:36 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget, git
  Cc: gitster, shaoxuan.yuan02, Derrick Stolee, Derrick Stolee

Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <dstolee@microsoft.com>
> 
> This is the first change in a series to allow modifying the
> sparse-checkout pattern set without expanding a sparse index to a full
> one in the process. Here, we focus on the problem of expanding the
> pattern set through a command like 'git sparse-checkout add <path>'
> which needs to create new index entries for the paths now being written
> to the worktree.
> 
> To achieve this, we need to be able to replace sparse directory entries
> with their contained files and subdirectories. Once this is complete,
> other code paths can discover those cache entries and write the
> corresponding files to disk before committing the index.
> 
> We already have logic in ensure_full_index() that expands the index
> entries, so we will use that as our base. Create a new method,
> expand_to_pattern_list(), which takes a pattern list, but for now mostly
> ignores it. The current implementation is only correct when the pattern
> list is NULL as that does the same as ensure_full_index(). In fact,
> ensure_full_index() is converted to a shim over
> expand_to_pattern_list().
> 
> A future update will actually implement expand_to_pattern_list() to its
> full capabilities. For now, it is created and documented.
> 
> Signed-off-by: Derrick Stolee <derrickstolee@github.com>
> ---
>  sparse-index.c | 35 ++++++++++++++++++++++++++++++++---
>  sparse-index.h | 14 ++++++++++++++
>  2 files changed, 46 insertions(+), 3 deletions(-)
> 
> diff --git a/sparse-index.c b/sparse-index.c
> index 8636af72de5..37c7df877a6 100644
> --- a/sparse-index.c
> +++ b/sparse-index.c
> @@ -248,19 +248,41 @@ static int add_path_to_index(const struct object_id *oid,
>  	return 0;
>  }
>  
> -void ensure_full_index(struct index_state *istate)
> +void expand_to_pattern_list(struct index_state *istate,
> +			      struct pattern_list *pl)

Hyper-nit: I don't think this is aligned (it's probably not worth fixing
unless you end up rerolling for something else).

>  {
>  	int i;
>  	struct index_state *full;
>  	struct strbuf base = STRBUF_INIT;
>  
> +	/*
> +	 * If the index is already full, then keep it full. We will convert
> +	 * it to a sparse index on write, if possible.
> +	 */
>  	if (!istate || !istate->sparse_index)
>  		return;
>  
> +	/*
> +	 * If our index is sparse, but our new pattern set does not use
> +	 * cone mode patterns, then we need to expand the index before we
> +	 * continue. A NULL pattern set indicates a full expansion to a
> +	 * full index.
> +	 */
> +	if (pl && !pl->use_cone_patterns)
> +		pl = NULL;
> +
>  	if (!istate->repo)
>  		istate->repo = the_repository;
>  
> -	trace2_region_enter("index", "ensure_full_index", istate->repo);
> +	/*
> +	 * A NULL pattern set indicates we are expanding a full index, so
> +	 * we use a special region name that indicates the full expansion.
> +	 * This is used by test cases, but also helps to differentiate the
> +	 * two cases.
> +	 */
> +	trace2_region_enter("index",
> +			    pl ? "expand_to_pattern_list" : "ensure_full_index",
> +			    istate->repo);
>  
>  	/* initialize basics of new index */
>  	full = xcalloc(1, sizeof(struct index_state));
> @@ -322,7 +344,14 @@ void ensure_full_index(struct index_state *istate)
>  	cache_tree_free(&istate->cache_tree);
>  	cache_tree_update(istate, 0);
>  
> -	trace2_region_leave("index", "ensure_full_index", istate->repo);
> +	trace2_region_leave("index",
> +			    pl ? "expand_to_pattern_list" : "ensure_full_index",
> +			    istate->repo);
> +}
> +
> +void ensure_full_index(struct index_state *istate)
> +{
> +	expand_to_pattern_list(istate, NULL);
>  }
>  
>  void ensure_correct_sparsity(struct index_state *istate)
> diff --git a/sparse-index.h b/sparse-index.h
> index 633d4fb7e31..037b541f49d 100644
> --- a/sparse-index.h
> +++ b/sparse-index.h
> @@ -23,4 +23,18 @@ void expand_to_path(struct index_state *istate,
>  struct repository;
>  int set_sparse_index_config(struct repository *repo, int enable);
>  
> +struct pattern_list;
> +
> +/**
> + * Scan the given index and compare its entries to the given pattern list.
> + * If the index is sparse and the pattern list uses cone mode patterns,
> + * then modify the index to contain the all of the file entries within that
> + * new pattern list. This expands sparse directories only as far as needed.
> + *
> + * If the pattern list is NULL or does not use cone mode patterns, then the
> + * index is expanded to a full index.
> + */
> +void expand_to_pattern_list(struct index_state *istate,
> +			      struct pattern_list *pl);
> +
>  #endif


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 5/8] sparse-index: partially expand directories
  2022-05-16 18:11 ` [PATCH 5/8] sparse-index: partially expand directories Derrick Stolee via GitGitGadget
@ 2022-05-16 20:36   ` Victoria Dye
  0 siblings, 0 replies; 55+ messages in thread
From: Victoria Dye @ 2022-05-16 20:36 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget, git
  Cc: gitster, shaoxuan.yuan02, Derrick Stolee, Derrick Stolee

Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <dstolee@microsoft.com>
> 
> The expand_to_pattern_list() method expands sparse directory entries
> to their list of contained files when either the pattern list is NULL or
> the directory is contained in the new pattern list's cone mode patterns.
> 
> It is possible that the pattern list has a recursive match with a
> directory 'A/B/C/' and so an existing sparse directory 'A/B/' would need
> to be expanded. If there exists a directory 'A/B/D/', then that
> directory should not be expanded and instead we can create a sparse
> directory.
> 
> To implement this, we plug into the add_path_to_index() callback for the
> call to read_tree_at(). Since we now need access to both the index we
> are writing and the pattern list we are comparing, create a 'struct
> modify_index_context' to use as a data transfer object. It is important
> that we use the given pattern list since we will use this pattern list
> to change the sparse-checkout patterns and cannot use
> istate->sparse_checkout_patterns.
> 
> Signed-off-by: Derrick Stolee <derrickstolee@github.com>
> ---
>  sparse-index.c | 46 +++++++++++++++++++++++++++++++++++++++-------
>  1 file changed, 39 insertions(+), 7 deletions(-)
> 
> diff --git a/sparse-index.c b/sparse-index.c
> index 79e8ff087bc..3d8eed585b5 100644
> --- a/sparse-index.c
> +++ b/sparse-index.c
> @@ -9,6 +9,11 @@
>  #include "dir.h"
>  #include "fsmonitor.h"
>  
> +struct modify_index_context {
> +	struct index_state *write;
> +	struct pattern_list *pl;
> +};
> +
>  static struct cache_entry *construct_sparse_dir_entry(
>  				struct index_state *istate,
>  				const char *sparse_dir,
> @@ -231,18 +236,41 @@ static int add_path_to_index(const struct object_id *oid,
>  			     struct strbuf *base, const char *path,
>  			     unsigned int mode, void *context)
>  {
> -	struct index_state *istate = (struct index_state *)context;
> +	struct modify_index_context *ctx = (struct modify_index_context *)context;
>  	struct cache_entry *ce;
>  	size_t len = base->len;
>  
> -	if (S_ISDIR(mode))
> -		return READ_TREE_RECURSIVE;
> +	if (S_ISDIR(mode)) {
> +		int dtype;
> +		size_t baselen = base->len;
> +		if (!ctx->pl)
> +			return READ_TREE_RECURSIVE;
>  
> -	strbuf_addstr(base, path);
> +		/*
> +		 * Have we expanded to a point outside of the sparse-checkout?
> +		 */
> +		strbuf_addstr(base, path);
> +		strbuf_add(base, "/-", 2);
> +
> +		if (path_matches_pattern_list(base->buf, base->len,
> +					      NULL, &dtype,
> +					      ctx->pl, ctx->write)) {
> +			strbuf_setlen(base, baselen);
> +			return READ_TREE_RECURSIVE;
> +		}
>  
> -	ce = make_cache_entry(istate, mode, oid, base->buf, 0, 0);
> +		/*
> +		 * The path "{base}{path}/" is a sparse directory. Create the correct
> +		 * name for inserting the entry into the idnex.

s/idnex/index

> +		 */
> +		strbuf_setlen(base, base->len - 1);
> +	} else {
> +		strbuf_addstr(base, path);
> +	}
> +
> +	ce = make_cache_entry(ctx->write, mode, oid, base->buf, 0, 0);
>  	ce->ce_flags |= CE_SKIP_WORKTREE | CE_EXTENDED;
> -	set_index_entry(istate, istate->cache_nr++, ce);
> +	set_index_entry(ctx->write, ctx->write->cache_nr++, ce);
>  
>  	strbuf_setlen(base, len);
>  	return 0;
> @@ -254,6 +282,7 @@ void expand_to_pattern_list(struct index_state *istate,
>  	int i;
>  	struct index_state *full;
>  	struct strbuf base = STRBUF_INIT;
> +	struct modify_index_context ctx;
>  
>  	/*
>  	 * If the index is already full, then keep it full. We will convert
> @@ -294,6 +323,9 @@ void expand_to_pattern_list(struct index_state *istate,
>  	full->cache_nr = 0;
>  	ALLOC_ARRAY(full->cache, full->cache_alloc);
>  
> +	ctx.write = full;
> +	ctx.pl = pl;
> +
>  	for (i = 0; i < istate->cache_nr; i++) {
>  		struct cache_entry *ce = istate->cache[i];
>  		struct tree *tree;
> @@ -319,7 +351,7 @@ void expand_to_pattern_list(struct index_state *istate,
>  		strbuf_add(&base, ce->name, strlen(ce->name));
>  
>  		read_tree_at(istate->repo, tree, &base, &ps,
> -			     add_path_to_index, full);
> +			     add_path_to_index, &ctx);
>  
>  		/* free directory entries. full entries are re-used */
>  		discard_cache_entry(ce);


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 6/8] sparse-index: complete partial expansion
  2022-05-16 18:11 ` [PATCH 6/8] sparse-index: complete partial expansion Derrick Stolee via GitGitGadget
@ 2022-05-16 20:38   ` Victoria Dye
  2022-05-17 13:23     ` Derrick Stolee
  0 siblings, 1 reply; 55+ messages in thread
From: Victoria Dye @ 2022-05-16 20:38 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget, git
  Cc: gitster, shaoxuan.yuan02, Derrick Stolee, Derrick Stolee

Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <dstolee@microsoft.com>
> 
> To complete the implementation of expand_to_pattern_list(), we need to
> detect when a sparse directory entry should remain sparse. This avoids a
> full expansion, so we now need to use the PARTIALLY_SPARSE mode to
> indicate this state.
> 
> There still are no callers to this method, but we will add one in the
> next change.
> 
> Signed-off-by: Derrick Stolee <derrickstolee@github.com>
> ---
>  sparse-index.c | 41 +++++++++++++++++++++++++++++++++++++----
>  1 file changed, 37 insertions(+), 4 deletions(-)
> 
> diff --git a/sparse-index.c b/sparse-index.c
> index 3d8eed585b5..0bad5503304 100644
> --- a/sparse-index.c
> +++ b/sparse-index.c
> @@ -297,8 +297,24 @@ void expand_to_pattern_list(struct index_state *istate,
>  	 * continue. A NULL pattern set indicates a full expansion to a
>  	 * full index.
>  	 */
> -	if (pl && !pl->use_cone_patterns)
> +	if (pl && !pl->use_cone_patterns) {
>  		pl = NULL;
> +	} else {
> +		/*
> +		 * We might contract file entries into sparse-directory
> +		 * entries, and for that we will need the cache tree to
> +		 * be recomputed.
> +		 */
> +		cache_tree_free(&istate->cache_tree);
> +
> +		/*
> +		 * If there is a problem creating the cache tree, then we
> +		 * need to expand to a full index since we cannot satisfy
> +		 * the current request as a sparse index.
> +		 */
> +		if (cache_tree_update(istate, WRITE_TREE_MISSING_OK))
> +			pl = NULL;
> +	}
>  
>  	if (!istate->repo)
>  		istate->repo = the_repository;
> @@ -317,8 +333,14 @@ void expand_to_pattern_list(struct index_state *istate,
>  	full = xcalloc(1, sizeof(struct index_state));
>  	memcpy(full, istate, sizeof(struct index_state));
>  
> +	/*
> +	 * This slightly-misnamed 'full' index might still be sparse if we
> +	 * are only modifying the list of sparse directories. This hinges
> +	 * on whether we have a non-NULL pattern list.
> +	 */
> +	full->sparse_index = pl ? PARTIALLY_SPARSE : COMPLETELY_FULL;
> +
>  	/* then change the necessary things */
> -	full->sparse_index = 0;
>  	full->cache_alloc = (3 * istate->cache_alloc) / 2;
>  	full->cache_nr = 0;
>  	ALLOC_ARRAY(full->cache, full->cache_alloc);
> @@ -330,11 +352,22 @@ void expand_to_pattern_list(struct index_state *istate,
>  		struct cache_entry *ce = istate->cache[i];
>  		struct tree *tree;
>  		struct pathspec ps;
> +		int dtype;
>  
>  		if (!S_ISSPARSEDIR(ce->ce_mode)) {
>  			set_index_entry(full, full->cache_nr++, ce);
>  			continue;
>  		}
> +
> +		/* We now have a sparse directory entry. Should we expand? */
> +		if (pl &&
> +		    path_matches_pattern_list(ce->name, ce->ce_namelen,
> +					      NULL, &dtype,
> +					      pl, istate) <= 0) {

If I'm reading this correctly, what this is doing is:

- if we have a sparse directory entry
- ...and we're expanding only what matches the pattern list (i.e., not
  'ensure_full_index')
- ...and that sparse directory path is either *not matching* or *undecided
  whether it matches* the pattern list
- ...then we add the sparse directory to the result index and continue. 

The part that's confusing me is the "<= 0", which means a return value of
'UNDECIDED' from 'path_matches_pattern_list' adds the sparse directory
as-is. At the moment, it looks like 'UNDECIDED' is only returned if not
using cone patterns (so it shouldn't make a functional difference at this
point), but wouldn't that return value indicate that the pattern *may or may
not* match the path, so we should continue on to 'read_tree_at()'?

All that to say, should the condition be:

		/* We now have a sparse directory entry. Should we expand? */
		if (pl &&
		    path_matches_pattern_list(ce->name, ce->ce_namelen,
					      NULL, &dtype,
					      pl, istate) == NOT_MATCHED) {

to reflect that a sparse directory should only be added to the index if we
*know* it isn't matched?

To be clear, this is ultimately a non-functional nit - my question is mostly
to make sure I understand the intent of the code.

> +			set_index_entry(full, full->cache_nr++, ce);
> +			continue;
> +		}
> +
>  		if (!(ce->ce_flags & CE_SKIP_WORKTREE))
>  			warning(_("index entry is a directory, but not sparse (%08x)"),
>  				ce->ce_flags);
> @@ -360,7 +393,7 @@ void expand_to_pattern_list(struct index_state *istate,
>  	/* Copy back into original index. */
>  	memcpy(&istate->name_hash, &full->name_hash, sizeof(full->name_hash));
>  	memcpy(&istate->dir_hash, &full->dir_hash, sizeof(full->dir_hash));
> -	istate->sparse_index = 0;
> +	istate->sparse_index = pl ? PARTIALLY_SPARSE : COMPLETELY_FULL;
>  	free(istate->cache);
>  	istate->cache = full->cache;
>  	istate->cache_nr = full->cache_nr;
> @@ -374,7 +407,7 @@ void expand_to_pattern_list(struct index_state *istate,
>  
>  	/* Clear and recompute the cache-tree */
>  	cache_tree_free(&istate->cache_tree);
> -	cache_tree_update(istate, 0);
> +	cache_tree_update(istate, WRITE_TREE_MISSING_OK);
>  
>  	trace2_region_leave("index",
>  			    pl ? "expand_to_pattern_list" : "ensure_full_index",


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 8/8] sparse-checkout: integrate with sparse index
  2022-05-16 18:11 ` [PATCH 8/8] sparse-checkout: integrate with sparse index Derrick Stolee via GitGitGadget
@ 2022-05-16 20:38   ` Victoria Dye
  2022-05-17 13:28     ` Derrick Stolee
  0 siblings, 1 reply; 55+ messages in thread
From: Victoria Dye @ 2022-05-16 20:38 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget, git
  Cc: gitster, shaoxuan.yuan02, Derrick Stolee, Derrick Stolee

Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <dstolee@microsoft.com>
> 
> When modifying the sparse-checkout definition, the sparse-checkout
> builtin calls update_sparsity() to modify the SKIP_WORKTREE bits of all
> cache entries in the index. Before, we needed the index to be fully
> expanded in order to ensure we had the full list of files necessary that
> match the new patterns.
> 
> Insert a call to reset_sparse_directories() that expands sparse
> directories that are within the new pattern list, but only far enough
> that every necessary file path now exists as a cache entry. The
> remaining logic within update_sparsity() will modify the SKIP_WORKTREE
> bits appropriately.
> 
> This allows us to disable command_requires_full_index within the
> sparse-checkout builtin. Add tests that demonstrate that we are not
> expanding to a full index unnecessarily.
> 
> We can see the improved performance in the p2000 test script:
> 
> Test                           HEAD~1            HEAD
> ------------------------------------------------------------------------
> 2000.24: git ... (sparse-v3)   2.14(1.55+0.58)   1.57(1.03+0.53) -26.6%
> 2000.25: git ... (sparse-v4)   2.20(1.62+0.57)   1.58(0.98+0.59) -28.2%
> 
> These reductions of 26-28% are small compared to most examples, but the
> time is dominated by writing a new copy of the base repository to the
> worktree and then deleting it again. The fact that the previous index
> expansion was such a large portion of the time is telling how important
> it is to complete this sparse index integration.
> 
> Signed-off-by: Derrick Stolee <derrickstolee@github.com>
> ---
>  builtin/sparse-checkout.c                |  3 +++
>  t/t1092-sparse-checkout-compatibility.sh | 25 ++++++++++++++++++++++++
>  unpack-trees.c                           |  4 ++++
>  3 files changed, 32 insertions(+)
> 
> diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c
> index cbff6ad00b0..0157b292b36 100644
> --- a/builtin/sparse-checkout.c
> +++ b/builtin/sparse-checkout.c
> @@ -937,6 +937,9 @@ int cmd_sparse_checkout(int argc, const char **argv, const char *prefix)
>  
>  	git_config(git_default_config, NULL);
>  
> +	prepare_repo_settings(the_repository);
> +	the_repository->settings.command_requires_full_index = 0;
> +
>  	if (argc > 0) {
>  		if (!strcmp(argv[0], "list"))
>  			return sparse_checkout_list(argc, argv);
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index 93bcfd20bbc..614357fc48c 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -1552,6 +1552,31 @@ test_expect_success 'ls-files' '
>  	ensure_not_expanded ls-files --sparse
>  '
>  
> +test_expect_success 'sparse index is not expanded: sparse-checkout' '
> +	init_repos &&
> +
> +	ensure_not_expanded sparse-checkout set deep/deeper2 &&
> +	ensure_not_expanded sparse-checkout set deep/deeper1 &&
> +	ensure_not_expanded sparse-checkout set deep &&
> +	ensure_not_expanded sparse-checkout add folder1 &&
> +	ensure_not_expanded sparse-checkout set deep/deeper1 &&
> +	ensure_not_expanded sparse-checkout set folder2 &&
> +
> +	# Demonstrate that the checks that "folder1/a" is a file
> +	# do not cause a sparse-index expansion (since it is in the
> +	# sparse-checkout cone).
> +	echo >>sparse-index/folder2/a &&
> +	git -C sparse-index add folder2/a &&
> +
> +	ensure_not_expanded sparse-checkout add folder1 &&
> +
> +	# Skip checks here, since deep/deeper1 is inside a sparse directory
> +	# that must be expanded to check whether `deep/deeper1` is a file
> +	# or not.
> +	ensure_not_expanded sparse-checkout set --skip-checks deep/deeper1 &&
> +	ensure_not_expanded sparse-checkout set
> +'
> +

These tests look good for ensuring sparsity is preserved, but it'd be nice
to also have some "stress tests" of 'sparse-checkout (add|set)'. The purpose
would be to make sure the index has the right contents for various types of
pattern changes, e.g. running 'sparse-checkout (add|set) <path>', then
verifying index contents with 'ls-files --sparse'. Paths might be:

- in vs. out of (current) cone
- match an existing vs. nonexistent directory

etc.

>  # NEEDSWORK: a sparse-checkout behaves differently from a full checkout
>  # in this scenario, but it shouldn't.
>  test_expect_success 'reset mixed and checkout orphan' '
> diff --git a/unpack-trees.c b/unpack-trees.c
> index 7f528d35cc2..9745e0dfc34 100644
> --- a/unpack-trees.c
> +++ b/unpack-trees.c
> @@ -18,6 +18,7 @@
>  #include "promisor-remote.h"
>  #include "entry.h"
>  #include "parallel-checkout.h"
> +#include "sparse-index.h"
>  
>  /*
>   * Error messages expected by scripts out of plumbing commands such as
> @@ -2018,6 +2019,9 @@ enum update_sparsity_result update_sparsity(struct unpack_trees_options *o)
>  			goto skip_sparse_checkout;
>  	}
>  
> +	/* Expand sparse directories as needed */
> +	expand_to_pattern_list(o->src_index, o->pl);
> +
>  	/* Set NEW_SKIP_WORKTREE on existing entries. */
>  	mark_all_ce_unused(o->src_index);
>  	mark_new_skip_worktree(o->pl, o->src_index, 0,


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/8] sparse-index: create expand_to_pattern_list()
  2022-05-16 20:36   ` Victoria Dye
@ 2022-05-16 20:49     ` Derrick Stolee
  0 siblings, 0 replies; 55+ messages in thread
From: Derrick Stolee @ 2022-05-16 20:49 UTC (permalink / raw)
  To: Victoria Dye, Derrick Stolee via GitGitGadget, git
  Cc: gitster, shaoxuan.yuan02, Derrick Stolee

On 5/16/2022 4:36 PM, Victoria Dye wrote:
> Derrick Stolee via GitGitGadget wrote:

>> -void ensure_full_index(struct index_state *istate)
>> +void expand_to_pattern_list(struct index_state *istate,
>> +			      struct pattern_list *pl)
> 
> Hyper-nit: I don't think this is aligned (it's probably not worth fixing
> unless you end up rerolling for something else).

You're right. When I saw this I thought it was just the
"off by one" issue when patches have a "+" in the beginning,
but it's actually the backwards order. Thanks for catching it.

-Stolee

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 6/8] sparse-index: complete partial expansion
  2022-05-16 20:38   ` Victoria Dye
@ 2022-05-17 13:23     ` Derrick Stolee
  0 siblings, 0 replies; 55+ messages in thread
From: Derrick Stolee @ 2022-05-17 13:23 UTC (permalink / raw)
  To: Victoria Dye, Derrick Stolee via GitGitGadget, git
  Cc: gitster, shaoxuan.yuan02, Derrick Stolee

On 5/16/2022 4:38 PM, Victoria Dye wrote:
> Derrick Stolee via GitGitGadget wrote:
>> From: Derrick Stolee <dstolee@microsoft.com>

>> +
>> +		/* We now have a sparse directory entry. Should we expand? */
>> +		if (pl &&
>> +		    path_matches_pattern_list(ce->name, ce->ce_namelen,
>> +					      NULL, &dtype,
>> +					      pl, istate) <= 0) {
> 
> If I'm reading this correctly, what this is doing is:
> 
> - if we have a sparse directory entry
> - ...and we're expanding only what matches the pattern list (i.e., not
>   'ensure_full_index')
> - ...and that sparse directory path is either *not matching* or *undecided
>   whether it matches* the pattern list
> - ...then we add the sparse directory to the result index and continue. 
> 
> The part that's confusing me is the "<= 0", which means a return value of
> 'UNDECIDED' from 'path_matches_pattern_list' adds the sparse directory
> as-is. At the moment, it looks like 'UNDECIDED' is only returned if not
> using cone patterns (so it shouldn't make a functional difference at this
> point), but wouldn't that return value indicate that the pattern *may or may
> not* match the path, so we should continue on to 'read_tree_at()'?
> 
> All that to say, should the condition be:
> 
> 		/* We now have a sparse directory entry. Should we expand? */
> 		if (pl &&
> 		    path_matches_pattern_list(ce->name, ce->ce_namelen,
> 					      NULL, &dtype,
> 					      pl, istate) == NOT_MATCHED) {
> 
> to reflect that a sparse directory should only be added to the index if we
> *know* it isn't matched?
> 
> To be clear, this is ultimately a non-functional nit - my question is mostly
> to make sure I understand the intent of the code.

You are 100% correct, and using "== NOT_MATCHED" makes the intention clear,
as well as being more robust to a breakage in path_matches_pattern_list().

Thanks,
-Stolee


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 8/8] sparse-checkout: integrate with sparse index
  2022-05-16 20:38   ` Victoria Dye
@ 2022-05-17 13:28     ` Derrick Stolee
  0 siblings, 0 replies; 55+ messages in thread
From: Derrick Stolee @ 2022-05-17 13:28 UTC (permalink / raw)
  To: Victoria Dye, Derrick Stolee via GitGitGadget, git
  Cc: gitster, shaoxuan.yuan02, Derrick Stolee

On 5/16/2022 4:38 PM, Victoria Dye wrote:
> Derrick Stolee via GitGitGadget wrote:
>> From: Derrick Stolee <dstolee@microsoft.com>

>> +test_expect_success 'sparse index is not expanded: sparse-checkout' '
>> +	init_repos &&
>> +
>> +	ensure_not_expanded sparse-checkout set deep/deeper2 &&
>> +	ensure_not_expanded sparse-checkout set deep/deeper1 &&
>> +	ensure_not_expanded sparse-checkout set deep &&
>> +	ensure_not_expanded sparse-checkout add folder1 &&
>> +	ensure_not_expanded sparse-checkout set deep/deeper1 &&
>> +	ensure_not_expanded sparse-checkout set folder2 &&
>> +
>> +	# Demonstrate that the checks that "folder1/a" is a file
>> +	# do not cause a sparse-index expansion (since it is in the
>> +	# sparse-checkout cone).
>> +	echo >>sparse-index/folder2/a &&
>> +	git -C sparse-index add folder2/a &&
>> +
>> +	ensure_not_expanded sparse-checkout add folder1 &&
>> +
>> +	# Skip checks here, since deep/deeper1 is inside a sparse directory
>> +	# that must be expanded to check whether `deep/deeper1` is a file
>> +	# or not.
>> +	ensure_not_expanded sparse-checkout set --skip-checks deep/deeper1 &&
>> +	ensure_not_expanded sparse-checkout set
>> +'
>> +
> 
> These tests look good for ensuring sparsity is preserved, but it'd be nice
> to also have some "stress tests" of 'sparse-checkout (add|set)'. The purpose
> would be to make sure the index has the right contents for various types of
> pattern changes, e.g. running 'sparse-checkout (add|set) <path>', then
> verifying index contents with 'ls-files --sparse'. Paths might be:
> 
> - in vs. out of (current) cone
> - match an existing vs. nonexistent directory
> 
> etc.

I guess I was relying on tests added previously for the sparse index,
such as this one:

test_expect_success 'sparse-index contents' '
	init_repos &&

	git -C sparse-index ls-files --sparse --stage >cache &&
	for dir in folder1 folder2 x
	do
		TREE=$(git -C sparse-index rev-parse HEAD:$dir) &&
		grep "040000 $TREE 0	$dir/" cache \
			|| return 1
	done &&

	git -C sparse-index sparse-checkout set folder1 &&

	git -C sparse-index ls-files --sparse --stage >cache &&
	for dir in deep folder2 x
	do
		TREE=$(git -C sparse-index rev-parse HEAD:$dir) &&
		grep "040000 $TREE 0	$dir/" cache \
			|| return 1
	done &&

	git -C sparse-index sparse-checkout set deep/deeper1 &&

	git -C sparse-index ls-files --sparse --stage >cache &&
	for dir in deep/deeper2 folder1 folder2 x
	do
		TREE=$(git -C sparse-index rev-parse HEAD:$dir) &&
		grep "040000 $TREE 0	$dir/" cache \
			|| return 1
	done &&

	# Disabling the sparse-index replaces tree entries with full ones
	git -C sparse-index sparse-checkout init --no-sparse-index &&
	test_sparse_match git ls-files --stage --sparse
'

But this test isn't covering enough interesting cases that might
cause issues with the changes in this series. I'll add a patch that
increases coverage in this area.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH v2 00/10] Sparse index: integrate with sparse-checkout
  2022-05-16 18:11 [PATCH 0/8] Sparse index: integrate with sparse-checkout Derrick Stolee via GitGitGadget
                   ` (7 preceding siblings ...)
  2022-05-16 18:11 ` [PATCH 8/8] sparse-checkout: integrate with sparse index Derrick Stolee via GitGitGadget
@ 2022-05-19 17:52 ` Derrick Stolee via GitGitGadget
  2022-05-19 17:52   ` [PATCH v2 01/10] t1092: refactor 'sparse-index contents' test Derrick Stolee via GitGitGadget
                     ` (10 more replies)
  8 siblings, 11 replies; 55+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-05-19 17:52 UTC (permalink / raw)
  To: git; +Cc: gitster, vdye, shaoxuan.yuan02, Derrick Stolee

This series is based on ds/sparse-colon-path.

This series integrates the 'git sparse-checkout' builtin with the sparse
index. This is the last integration that we fast-tracked into the
microsoft/git fork. After this, we have no work in flight that would
conflict with a Google Summer of Code project in this area.

The tricky part about the sparse-checkout builtin is that we actually do
need to expand the index when growing the sparse-checkout boundary. The
trick is to expand it only as far as we need it, and then ensure that we
collapse any removed directories before the command completes.

To do this, we introduce a concept of a "partially expanded" index. In fact,
we break the boolean sparse_index member into an enum with three states:

 * COMPLETELY_FULL (0): No sparse directories exist.

 * COMPLETELY_SPARSE (1): Sparse directories may exist. Files outside the
   sparse-checkout cone are reduced to sparse directory entries whenever
   possible.

 * PARTIALLY_SPARSE (2): Sparse directories may exist. Some file entries
   outside the sparse-checkout cone may exist. Running convert_to_sparse()
   may further reduce those files to sparse directory entries.

Most of the patches in this series focus on introducing this enum and
carefully converting previous uses of the boolean to use the enum. Some
additional work is required to refactor ensure_full_index() into a new
expand_to_pattern_list() method, as they are doing essentially the same
thing, but with different scopes.

The result is improved performance on the sparse-checkout builtin as
demonstrated in a new p2000-sparse-operations.sh performance test:


Test HEAD~1 HEAD
================

2000.24: git sparse-checkout ... (sparse-v3) 2.14(1.55+0.58) 1.57(1.03+0.53)
-26.6% 2000.25: git sparse-checkout ... (sparse-v4) 2.20(1.62+0.57)
1.58(0.98+0.59) -28.2%

The improvement here is less dramatic because the operation is dominated by
writing and deleting a lot of files in the worktree. A repeated no-op
operation such as git sparse-checkout set $SPARSE_CONE would show a greater
improvement, but is less interesting since it could gain that improvement
without satisfying the "hard" parts of this builtin.

I specifically chose how to update the tests in t1092 and p2000 to avoid
conflicts with Victoria's 'git stash' work.


Updates in v2
=============

 * Typo fixes.
 * Two patches are added to the start to (a) refactor existing sparse index
   content tests, and (b) add new sparse index content tests with additional
   scenarios.
 * Use NOT_MATCHED directly instead of implicitly allowing UNDECIDED when
   matching in cone mode.

Thanks, -Stolee

Derrick Stolee (10):
  t1092: refactor 'sparse-index contents' test
  t1092: stress test 'git sparse-checkout set'
  sparse-index: create expand_to_pattern_list()
  sparse-index: introduce partially-sparse indexes
  cache-tree: implement cache_tree_find_path()
  sparse-checkout: --no-sparse-index needs a full index
  sparse-index: partially expand directories
  sparse-index: complete partial expansion
  p2000: add test for 'git sparse-checkout [add|set]'
  sparse-checkout: integrate with sparse index

 builtin/sparse-checkout.c                |   8 +-
 cache-tree.c                             |  24 +++++
 cache-tree.h                             |   2 +
 cache.h                                  |  32 ++++--
 read-cache.c                             |   6 +-
 sparse-index.c                           | 126 ++++++++++++++++++++---
 sparse-index.h                           |  14 +++
 t/perf/p2000-sparse-operations.sh        |   1 +
 t/t1092-sparse-checkout-compatibility.sh |  95 +++++++++++++----
 unpack-trees.c                           |   4 +
 10 files changed, 265 insertions(+), 47 deletions(-)


base-commit: 124b05b23005437fa5fb91863bde2a8f5840e164
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1208%2Fderrickstolee%2Fsparse-index%2Fsparse-checkout-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1208/derrickstolee/sparse-index/sparse-checkout-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/1208

Range-diff vs v1:

  -:  ----------- >  1:  f2960747ed8 t1092: refactor 'sparse-index contents' test
  -:  ----------- >  2:  5030eeecf4f t1092: stress test 'git sparse-checkout set'
  1:  f1194d56d33 !  3:  d15338573e5 sparse-index: create expand_to_pattern_list()
     @@ sparse-index.c: static int add_path_to_index(const struct object_id *oid,
       
      -void ensure_full_index(struct index_state *istate)
      +void expand_to_pattern_list(struct index_state *istate,
     -+			      struct pattern_list *pl)
     ++			    struct pattern_list *pl)
       {
       	int i;
       	struct index_state *full;
  2:  d394d0e20e8 =  4:  269c206c331 sparse-index: introduce partially-sparse indexes
  3:  c0e81be97aa =  5:  c977001c033 cache-tree: implement cache_tree_find_path()
  4:  d1fb2e0e0d3 =  6:  e42463de0d2 sparse-checkout: --no-sparse-index needs a full index
  5:  5c7546ab070 !  7:  346c56bf256 sparse-index: partially expand directories
     @@ sparse-index.c: static int add_path_to_index(const struct object_id *oid,
      -	ce = make_cache_entry(istate, mode, oid, base->buf, 0, 0);
      +		/*
      +		 * The path "{base}{path}/" is a sparse directory. Create the correct
     -+		 * name for inserting the entry into the idnex.
     ++		 * name for inserting the entry into the index.
      +		 */
      +		strbuf_setlen(base, base->len - 1);
      +	} else {
  6:  eba63cc12af !  8:  ed640e3645b sparse-index: complete partial expansion
     @@ sparse-index.c: void expand_to_pattern_list(struct index_state *istate,
      +		if (pl &&
      +		    path_matches_pattern_list(ce->name, ce->ce_namelen,
      +					      NULL, &dtype,
     -+					      pl, istate) <= 0) {
     ++					      pl, istate) == NOT_MATCHED) {
      +			set_index_entry(full, full->cache_nr++, ce);
      +			continue;
      +		}
  7:  2804326c8bb =  9:  089ab086f58 p2000: add test for 'git sparse-checkout [add|set]'
  8:  b8a349c6dee = 10:  ad9ed6973d5 sparse-checkout: integrate with sparse index

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH v2 01/10] t1092: refactor 'sparse-index contents' test
  2022-05-19 17:52 ` [PATCH v2 00/10] Sparse index: integrate with sparse-checkout Derrick Stolee via GitGitGadget
@ 2022-05-19 17:52   ` Derrick Stolee via GitGitGadget
  2022-05-19 17:52   ` [PATCH v2 02/10] t1092: stress test 'git sparse-checkout set' Derrick Stolee via GitGitGadget
                     ` (9 subsequent siblings)
  10 siblings, 0 replies; 55+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-05-19 17:52 UTC (permalink / raw)
  To: git; +Cc: gitster, vdye, shaoxuan.yuan02, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

Before expanding this test with more involved cases, first extract the
repeated logic into a new test_sparse_checkout_set helper. This helper
checks that 'git sparse-checkout set ...' succeeds and then verifies
that certain directories have sparse directory entries in the sparse
index. It also verifies that the in-cone directories are _not_ sparse
directory entries in the sparse index.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 t/t1092-sparse-checkout-compatibility.sh | 53 ++++++++++++++++--------
 1 file changed, 35 insertions(+), 18 deletions(-)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 93bcfd20bbc..e7c0ae9b953 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -205,36 +205,53 @@ test_sparse_unstaged () {
 	done
 }
 
-test_expect_success 'sparse-index contents' '
-	init_repos &&
-
+# Usage: test_sprase_checkout_set "<c1> ... <cN>" "<s1> ... <sM>"
+# Verifies that "git sparse-checkout set <c1> ... <cN>" succeeds and
+# leaves the sparse index in a state where <s1> ... <sM> are sparse
+# directories (and <c1> ... <cN> are not).
+test_sparse_checkout_set () {
+	CONE_DIRS=$1 &&
+	SPARSE_DIRS=$2 &&
+	git -C sparse-index sparse-checkout set $CONE_DIRS &&
 	git -C sparse-index ls-files --sparse --stage >cache &&
-	for dir in folder1 folder2 x
+
+	# Check that the directories outside of the sparse-checkout cone
+	# have sparse directory entries.
+	for dir in $SPARSE_DIRS
 	do
 		TREE=$(git -C sparse-index rev-parse HEAD:$dir) &&
 		grep "040000 $TREE 0	$dir/" cache \
 			|| return 1
 	done &&
 
-	git -C sparse-index sparse-checkout set folder1 &&
-
-	git -C sparse-index ls-files --sparse --stage >cache &&
-	for dir in deep folder2 x
+	# Check that the directories in the sparse-checkout cone
+	# are not sparse directory entries.
+	for dir in $CONE_DIRS
 	do
 		TREE=$(git -C sparse-index rev-parse HEAD:$dir) &&
-		grep "040000 $TREE 0	$dir/" cache \
+		! grep "040000 $TREE 0	$dir/" cache \
 			|| return 1
-	done &&
+	done
+}
 
-	git -C sparse-index sparse-checkout set deep/deeper1 &&
+test_expect_success 'sparse-index contents' '
+	init_repos &&
 
-	git -C sparse-index ls-files --sparse --stage >cache &&
-	for dir in deep/deeper2 folder1 folder2 x
-	do
-		TREE=$(git -C sparse-index rev-parse HEAD:$dir) &&
-		grep "040000 $TREE 0	$dir/" cache \
-			|| return 1
-	done &&
+	# Remove deep, add three other directories.
+	test_sparse_checkout_set \
+		"folder1 folder2 x" \
+		"before deep" &&
+
+	# Remove folder1, add deep
+	test_sparse_checkout_set \
+		"deep folder2 x" \
+		"before folder1" &&
+
+	# Replace deep with deep/deeper2 (dropping deep/deeper1)
+	# Add folder1
+	test_sparse_checkout_set \
+		"deep/deeper2 folder1 folder2 x" \
+		"before deep/deeper1" &&
 
 	# Disabling the sparse-index replaces tree entries with full ones
 	git -C sparse-index sparse-checkout init --no-sparse-index &&
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v2 02/10] t1092: stress test 'git sparse-checkout set'
  2022-05-19 17:52 ` [PATCH v2 00/10] Sparse index: integrate with sparse-checkout Derrick Stolee via GitGitGadget
  2022-05-19 17:52   ` [PATCH v2 01/10] t1092: refactor 'sparse-index contents' test Derrick Stolee via GitGitGadget
@ 2022-05-19 17:52   ` Derrick Stolee via GitGitGadget
  2022-05-19 17:52   ` [PATCH v2 03/10] sparse-index: create expand_to_pattern_list() Derrick Stolee via GitGitGadget
                     ` (8 subsequent siblings)
  10 siblings, 0 replies; 55+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-05-19 17:52 UTC (permalink / raw)
  To: git; +Cc: gitster, vdye, shaoxuan.yuan02, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

The 'sparse-index contents' test checks that the sparse index has the
correct set of sparse directories in the index after modifying the cone
mode patterns using 'git sparse-checkout set'. Add to the coverage here
by adding more complicated scenarios that were not previously tested.

In order to check paths that do not exist at HEAD, we need to modify the
test_sparse_checkout_set helper slightly:

1. Add the --skip-checks argument to the 'set' command to avoid failures
   when passing paths that do not exist at HEAD.

2. When looking for the non-existence of sparse directories for the
   paths in $CONE_DIRS, allow the rev-list command to fail because the
   path does not exist at HEAD.

This allows us to add some interesting test cases.

Helped-by: Victoria Dye <vdye@github.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 t/t1092-sparse-checkout-compatibility.sh | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index e7c0ae9b953..785820f9fd5 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -212,7 +212,7 @@ test_sparse_unstaged () {
 test_sparse_checkout_set () {
 	CONE_DIRS=$1 &&
 	SPARSE_DIRS=$2 &&
-	git -C sparse-index sparse-checkout set $CONE_DIRS &&
+	git -C sparse-index sparse-checkout set --skip-checks $CONE_DIRS &&
 	git -C sparse-index ls-files --sparse --stage >cache &&
 
 	# Check that the directories outside of the sparse-checkout cone
@@ -228,7 +228,9 @@ test_sparse_checkout_set () {
 	# are not sparse directory entries.
 	for dir in $CONE_DIRS
 	do
-		TREE=$(git -C sparse-index rev-parse HEAD:$dir) &&
+		# Allow TREE to not exist because
+		# $dir does not exist at HEAD.
+		TREE=$(git -C sparse-index rev-parse HEAD:$dir) ||
 		! grep "040000 $TREE 0	$dir/" cache \
 			|| return 1
 	done
@@ -253,6 +255,19 @@ test_expect_success 'sparse-index contents' '
 		"deep/deeper2 folder1 folder2 x" \
 		"before deep/deeper1" &&
 
+	# Replace deep/deeper2 with deep/deeper1
+	# Replace folder1 with folder1/0/0
+	# Replace folder2 with non-existent folder2/2/3
+	# Add non-existent "bogus"
+	test_sparse_checkout_set \
+		"bogus deep/deeper1 folder1/0/0 folder2/2/3 x" \
+		"before deep/deeper2 folder2/0" &&
+
+	# Drop down to only files at root
+	test_sparse_checkout_set \
+		"" \
+		"before deep folder1 folder2 x" &&
+
 	# Disabling the sparse-index replaces tree entries with full ones
 	git -C sparse-index sparse-checkout init --no-sparse-index &&
 	test_sparse_match git ls-files --stage --sparse
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v2 03/10] sparse-index: create expand_to_pattern_list()
  2022-05-19 17:52 ` [PATCH v2 00/10] Sparse index: integrate with sparse-checkout Derrick Stolee via GitGitGadget
  2022-05-19 17:52   ` [PATCH v2 01/10] t1092: refactor 'sparse-index contents' test Derrick Stolee via GitGitGadget
  2022-05-19 17:52   ` [PATCH v2 02/10] t1092: stress test 'git sparse-checkout set' Derrick Stolee via GitGitGadget
@ 2022-05-19 17:52   ` Derrick Stolee via GitGitGadget
  2022-05-19 19:50     ` Junio C Hamano
  2022-05-19 17:52   ` [PATCH v2 04/10] sparse-index: introduce partially-sparse indexes Derrick Stolee via GitGitGadget
                     ` (7 subsequent siblings)
  10 siblings, 1 reply; 55+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-05-19 17:52 UTC (permalink / raw)
  To: git; +Cc: gitster, vdye, shaoxuan.yuan02, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

This is the first change in a series to allow modifying the
sparse-checkout pattern set without expanding a sparse index to a full
one in the process. Here, we focus on the problem of expanding the
pattern set through a command like 'git sparse-checkout add <path>'
which needs to create new index entries for the paths now being written
to the worktree.

To achieve this, we need to be able to replace sparse directory entries
with their contained files and subdirectories. Once this is complete,
other code paths can discover those cache entries and write the
corresponding files to disk before committing the index.

We already have logic in ensure_full_index() that expands the index
entries, so we will use that as our base. Create a new method,
expand_to_pattern_list(), which takes a pattern list, but for now mostly
ignores it. The current implementation is only correct when the pattern
list is NULL as that does the same as ensure_full_index(). In fact,
ensure_full_index() is converted to a shim over
expand_to_pattern_list().

A future update will actually implement expand_to_pattern_list() to its
full capabilities. For now, it is created and documented.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 sparse-index.c | 35 ++++++++++++++++++++++++++++++++---
 sparse-index.h | 14 ++++++++++++++
 2 files changed, 46 insertions(+), 3 deletions(-)

diff --git a/sparse-index.c b/sparse-index.c
index 8636af72de5..2a06ef58051 100644
--- a/sparse-index.c
+++ b/sparse-index.c
@@ -248,19 +248,41 @@ static int add_path_to_index(const struct object_id *oid,
 	return 0;
 }
 
-void ensure_full_index(struct index_state *istate)
+void expand_to_pattern_list(struct index_state *istate,
+			    struct pattern_list *pl)
 {
 	int i;
 	struct index_state *full;
 	struct strbuf base = STRBUF_INIT;
 
+	/*
+	 * If the index is already full, then keep it full. We will convert
+	 * it to a sparse index on write, if possible.
+	 */
 	if (!istate || !istate->sparse_index)
 		return;
 
+	/*
+	 * If our index is sparse, but our new pattern set does not use
+	 * cone mode patterns, then we need to expand the index before we
+	 * continue. A NULL pattern set indicates a full expansion to a
+	 * full index.
+	 */
+	if (pl && !pl->use_cone_patterns)
+		pl = NULL;
+
 	if (!istate->repo)
 		istate->repo = the_repository;
 
-	trace2_region_enter("index", "ensure_full_index", istate->repo);
+	/*
+	 * A NULL pattern set indicates we are expanding a full index, so
+	 * we use a special region name that indicates the full expansion.
+	 * This is used by test cases, but also helps to differentiate the
+	 * two cases.
+	 */
+	trace2_region_enter("index",
+			    pl ? "expand_to_pattern_list" : "ensure_full_index",
+			    istate->repo);
 
 	/* initialize basics of new index */
 	full = xcalloc(1, sizeof(struct index_state));
@@ -322,7 +344,14 @@ void ensure_full_index(struct index_state *istate)
 	cache_tree_free(&istate->cache_tree);
 	cache_tree_update(istate, 0);
 
-	trace2_region_leave("index", "ensure_full_index", istate->repo);
+	trace2_region_leave("index",
+			    pl ? "expand_to_pattern_list" : "ensure_full_index",
+			    istate->repo);
+}
+
+void ensure_full_index(struct index_state *istate)
+{
+	expand_to_pattern_list(istate, NULL);
 }
 
 void ensure_correct_sparsity(struct index_state *istate)
diff --git a/sparse-index.h b/sparse-index.h
index 633d4fb7e31..037b541f49d 100644
--- a/sparse-index.h
+++ b/sparse-index.h
@@ -23,4 +23,18 @@ void expand_to_path(struct index_state *istate,
 struct repository;
 int set_sparse_index_config(struct repository *repo, int enable);
 
+struct pattern_list;
+
+/**
+ * Scan the given index and compare its entries to the given pattern list.
+ * If the index is sparse and the pattern list uses cone mode patterns,
+ * then modify the index to contain the all of the file entries within that
+ * new pattern list. This expands sparse directories only as far as needed.
+ *
+ * If the pattern list is NULL or does not use cone mode patterns, then the
+ * index is expanded to a full index.
+ */
+void expand_to_pattern_list(struct index_state *istate,
+			      struct pattern_list *pl);
+
 #endif
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v2 04/10] sparse-index: introduce partially-sparse indexes
  2022-05-19 17:52 ` [PATCH v2 00/10] Sparse index: integrate with sparse-checkout Derrick Stolee via GitGitGadget
                     ` (2 preceding siblings ...)
  2022-05-19 17:52   ` [PATCH v2 03/10] sparse-index: create expand_to_pattern_list() Derrick Stolee via GitGitGadget
@ 2022-05-19 17:52   ` Derrick Stolee via GitGitGadget
  2022-05-19 20:05     ` Junio C Hamano
  2022-05-19 17:52   ` [PATCH v2 05/10] cache-tree: implement cache_tree_find_path() Derrick Stolee via GitGitGadget
                     ` (6 subsequent siblings)
  10 siblings, 1 reply; 55+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-05-19 17:52 UTC (permalink / raw)
  To: git; +Cc: gitster, vdye, shaoxuan.yuan02, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

A future change will present a temporary, in-memory mode where the index
can both contain sparse directory entries but also not be completely
collapsed to the smallest possible sparse directories. This will be
necessary for modifying the sparse-checkout definition while using a
sparse index.

For now, convert the single-bit member 'sparse_index' in 'struct
index_state' to be a an 'enum sparse_index_mode' with three modes:

* COMPLETELY_FULL (0): No sparse directories exist.

* COMPLETELY_SPARSE (1): Sparse directories may exist. Files outside the
  sparse-checkout cone are reduced to sparse directory entries whenever
  possible.

* PARTIALLY_SPARSE (2): Sparse directories may exist. Some file entries
  outside the sparse-checkout cone may exist. Running
  convert_to_sparse() may further reduce those files to sparse directory
  entries.

The main reason to store this extra information is to allow
convert_to_sparse() to short-circuit when the index is already in
COMPLETELY_SPARSE mode but to actually do the necessary work when in
PARTIALLY_SPARSE mode.

The PARTIALLY_SPARSE mode will be used in an upcoming change.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 builtin/sparse-checkout.c |  2 +-
 cache.h                   | 32 ++++++++++++++++++++++++--------
 read-cache.c              |  6 +++---
 sparse-index.c            |  6 +++---
 4 files changed, 31 insertions(+), 15 deletions(-)

diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c
index 0217d44c5b1..88eea069ad4 100644
--- a/builtin/sparse-checkout.c
+++ b/builtin/sparse-checkout.c
@@ -128,7 +128,7 @@ static void clean_tracked_sparse_directories(struct repository *r)
 	 * sparse index will not delete directories that contain
 	 * conflicted entries or submodules.
 	 */
-	if (!r->index->sparse_index) {
+	if (r->index->sparse_index == COMPLETELY_FULL) {
 		/*
 		 * If something, such as a merge conflict or other concern,
 		 * prevents us from converting to a sparse index, then do
diff --git a/cache.h b/cache.h
index 6226f6a8a53..2d067aca2fd 100644
--- a/cache.h
+++ b/cache.h
@@ -310,6 +310,28 @@ struct untracked_cache;
 struct progress;
 struct pattern_list;
 
+enum sparse_index_mode {
+	/*
+	 * COMPLETELY_FULL: there are no sparse directories
+	 * in the index at all.
+	 */
+	COMPLETELY_FULL = 0,
+
+	/*
+	 * COLLAPSED: the index has already been collapsed to sparse
+	 * directories whereever possible.
+	 */
+	COLLAPSED = 1,
+
+	/*
+	 * PARTIALLY_SPARSE: the sparse directories that exist are
+	 * outside the sparse-checkout boundary, but it is possible
+	 * that some file entries could collapse to sparse directory
+	 * entries.
+	 */
+	PARTIALLY_SPARSE = 2,
+};
+
 struct index_state {
 	struct cache_entry **cache;
 	unsigned int version;
@@ -323,14 +345,8 @@ struct index_state {
 		 drop_cache_tree : 1,
 		 updated_workdir : 1,
 		 updated_skipworktree : 1,
-		 fsmonitor_has_run_once : 1,
-
-		 /*
-		  * sparse_index == 1 when sparse-directory
-		  * entries exist. Requires sparse-checkout
-		  * in cone mode.
-		  */
-		 sparse_index : 1;
+		 fsmonitor_has_run_once : 1;
+	enum sparse_index_mode sparse_index;
 	struct hashmap name_hash;
 	struct hashmap dir_hash;
 	struct object_id oid;
diff --git a/read-cache.c b/read-cache.c
index 4df97e185e9..cb9b33169fd 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -112,7 +112,7 @@ static const char *alternate_index_output;
 static void set_index_entry(struct index_state *istate, int nr, struct cache_entry *ce)
 {
 	if (S_ISSPARSEDIR(ce->ce_mode))
-		istate->sparse_index = 1;
+		istate->sparse_index = COLLAPSED;
 
 	istate->cache[nr] = ce;
 	add_name_hash(istate, ce);
@@ -1856,7 +1856,7 @@ static int read_index_extension(struct index_state *istate,
 		break;
 	case CACHE_EXT_SPARSE_DIRECTORIES:
 		/* no content, only an indicator */
-		istate->sparse_index = 1;
+		istate->sparse_index = COLLAPSED;
 		break;
 	default:
 		if (*ext < 'A' || 'Z' < *ext)
@@ -3149,7 +3149,7 @@ static int do_write_locked_index(struct index_state *istate, struct lock_file *l
 				 unsigned flags)
 {
 	int ret;
-	int was_full = !istate->sparse_index;
+	int was_full = istate->sparse_index == COMPLETELY_FULL;
 
 	ret = convert_to_sparse(istate, 0);
 
diff --git a/sparse-index.c b/sparse-index.c
index 2a06ef58051..c2cd3bdb614 100644
--- a/sparse-index.c
+++ b/sparse-index.c
@@ -173,7 +173,7 @@ int convert_to_sparse(struct index_state *istate, int flags)
 	 * If the index is already sparse, empty, or otherwise
 	 * cannot be converted to sparse, do not convert.
 	 */
-	if (istate->sparse_index || !istate->cache_nr ||
+	if (istate->sparse_index == COLLAPSED || !istate->cache_nr ||
 	    !is_sparse_index_allowed(istate, flags))
 		return 0;
 
@@ -214,7 +214,7 @@ int convert_to_sparse(struct index_state *istate, int flags)
 	FREE_AND_NULL(istate->fsmonitor_dirty);
 	FREE_AND_NULL(istate->fsmonitor_last_update);
 
-	istate->sparse_index = 1;
+	istate->sparse_index = COLLAPSED;
 	trace2_region_leave("index", "convert_to_sparse", istate->repo);
 	return 0;
 }
@@ -259,7 +259,7 @@ void expand_to_pattern_list(struct index_state *istate,
 	 * If the index is already full, then keep it full. We will convert
 	 * it to a sparse index on write, if possible.
 	 */
-	if (!istate || !istate->sparse_index)
+	if (!istate || istate->sparse_index == COMPLETELY_FULL)
 		return;
 
 	/*
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v2 05/10] cache-tree: implement cache_tree_find_path()
  2022-05-19 17:52 ` [PATCH v2 00/10] Sparse index: integrate with sparse-checkout Derrick Stolee via GitGitGadget
                     ` (3 preceding siblings ...)
  2022-05-19 17:52   ` [PATCH v2 04/10] sparse-index: introduce partially-sparse indexes Derrick Stolee via GitGitGadget
@ 2022-05-19 17:52   ` Derrick Stolee via GitGitGadget
  2022-05-19 20:14     ` Junio C Hamano
  2022-05-19 17:52   ` [PATCH v2 06/10] sparse-checkout: --no-sparse-index needs a full index Derrick Stolee via GitGitGadget
                     ` (5 subsequent siblings)
  10 siblings, 1 reply; 55+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-05-19 17:52 UTC (permalink / raw)
  To: git; +Cc: gitster, vdye, shaoxuan.yuan02, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Given a 'struct cache_tree', it may be beneficial to navigate directly
to a node within that corresponds to a given path name. Create
cache_tree_find_path() for this function. It returns NULL when no such
path exists.

The implementation is adapted from do_invalidate_path() which does a
similar search but also modifies the nodes it finds along the way.

This new method is not currently used, but will be in an upcoming
change.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 cache-tree.c | 24 ++++++++++++++++++++++++
 cache-tree.h |  2 ++
 2 files changed, 26 insertions(+)

diff --git a/cache-tree.c b/cache-tree.c
index 6752f69d515..23893a7b113 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -100,6 +100,30 @@ struct cache_tree_sub *cache_tree_sub(struct cache_tree *it, const char *path)
 	return find_subtree(it, path, pathlen, 1);
 }
 
+struct cache_tree *cache_tree_find_path(struct cache_tree *it, const char *path)
+{
+	const char *slash;
+	int namelen;
+	struct cache_tree_sub *down;
+
+	if (!it)
+		return NULL;
+	slash = strchrnul(path, '/');
+	namelen = slash - path;
+	it->entry_count = -1;
+	if (!*slash) {
+		int pos;
+		pos = cache_tree_subtree_pos(it, path, namelen);
+		if (0 <= pos)
+			return it->down[pos]->cache_tree;
+		return NULL;
+	}
+	down = find_subtree(it, path, namelen, 0);
+	if (down)
+		return cache_tree_find_path(down->cache_tree, slash + 1);
+	return NULL;
+}
+
 static int do_invalidate_path(struct cache_tree *it, const char *path)
 {
 	/* a/b/c
diff --git a/cache-tree.h b/cache-tree.h
index 8efeccebfc9..f75f8e74dcd 100644
--- a/cache-tree.h
+++ b/cache-tree.h
@@ -29,6 +29,8 @@ struct cache_tree_sub *cache_tree_sub(struct cache_tree *, const char *);
 
 int cache_tree_subtree_pos(struct cache_tree *it, const char *path, int pathlen);
 
+struct cache_tree *cache_tree_find_path(struct cache_tree *it, const char *path);
+
 void cache_tree_write(struct strbuf *, struct cache_tree *root);
 struct cache_tree *cache_tree_read(const char *buffer, unsigned long size);
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v2 06/10] sparse-checkout: --no-sparse-index needs a full index
  2022-05-19 17:52 ` [PATCH v2 00/10] Sparse index: integrate with sparse-checkout Derrick Stolee via GitGitGadget
                     ` (4 preceding siblings ...)
  2022-05-19 17:52   ` [PATCH v2 05/10] cache-tree: implement cache_tree_find_path() Derrick Stolee via GitGitGadget
@ 2022-05-19 17:52   ` Derrick Stolee via GitGitGadget
  2022-05-19 20:19     ` Junio C Hamano
  2022-05-19 17:52   ` [PATCH v2 07/10] sparse-index: partially expand directories Derrick Stolee via GitGitGadget
                     ` (4 subsequent siblings)
  10 siblings, 1 reply; 55+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-05-19 17:52 UTC (permalink / raw)
  To: git; +Cc: gitster, vdye, shaoxuan.yuan02, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

When the --no-sparse-index option is supplied, the sparse-checkout
builtin should explicitly ask to expand a sparse index to a full one.
This is currently done implicitly due to the command_requires_full_index
protection, but that will be removed in an upcoming change.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 builtin/sparse-checkout.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c
index 88eea069ad4..cbff6ad00b0 100644
--- a/builtin/sparse-checkout.c
+++ b/builtin/sparse-checkout.c
@@ -413,6 +413,9 @@ static int update_modes(int *cone_mode, int *sparse_index)
 		/* force an index rewrite */
 		repo_read_index(the_repository);
 		the_repository->index->updated_workdir = 1;
+
+		if (!*sparse_index)
+			ensure_full_index(the_repository->index);
 	}
 
 	return 0;
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v2 07/10] sparse-index: partially expand directories
  2022-05-19 17:52 ` [PATCH v2 00/10] Sparse index: integrate with sparse-checkout Derrick Stolee via GitGitGadget
                     ` (5 preceding siblings ...)
  2022-05-19 17:52   ` [PATCH v2 06/10] sparse-checkout: --no-sparse-index needs a full index Derrick Stolee via GitGitGadget
@ 2022-05-19 17:52   ` Derrick Stolee via GitGitGadget
  2022-05-20 18:17     ` Junio C Hamano
  2022-05-19 17:52   ` [PATCH v2 08/10] sparse-index: complete partial expansion Derrick Stolee via GitGitGadget
                     ` (3 subsequent siblings)
  10 siblings, 1 reply; 55+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-05-19 17:52 UTC (permalink / raw)
  To: git; +Cc: gitster, vdye, shaoxuan.yuan02, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The expand_to_pattern_list() method expands sparse directory entries
to their list of contained files when either the pattern list is NULL or
the directory is contained in the new pattern list's cone mode patterns.

It is possible that the pattern list has a recursive match with a
directory 'A/B/C/' and so an existing sparse directory 'A/B/' would need
to be expanded. If there exists a directory 'A/B/D/', then that
directory should not be expanded and instead we can create a sparse
directory.

To implement this, we plug into the add_path_to_index() callback for the
call to read_tree_at(). Since we now need access to both the index we
are writing and the pattern list we are comparing, create a 'struct
modify_index_context' to use as a data transfer object. It is important
that we use the given pattern list since we will use this pattern list
to change the sparse-checkout patterns and cannot use
istate->sparse_checkout_patterns.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 sparse-index.c | 46 +++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 39 insertions(+), 7 deletions(-)

diff --git a/sparse-index.c b/sparse-index.c
index c2cd3bdb614..73b82e5017b 100644
--- a/sparse-index.c
+++ b/sparse-index.c
@@ -9,6 +9,11 @@
 #include "dir.h"
 #include "fsmonitor.h"
 
+struct modify_index_context {
+	struct index_state *write;
+	struct pattern_list *pl;
+};
+
 static struct cache_entry *construct_sparse_dir_entry(
 				struct index_state *istate,
 				const char *sparse_dir,
@@ -231,18 +236,41 @@ static int add_path_to_index(const struct object_id *oid,
 			     struct strbuf *base, const char *path,
 			     unsigned int mode, void *context)
 {
-	struct index_state *istate = (struct index_state *)context;
+	struct modify_index_context *ctx = (struct modify_index_context *)context;
 	struct cache_entry *ce;
 	size_t len = base->len;
 
-	if (S_ISDIR(mode))
-		return READ_TREE_RECURSIVE;
+	if (S_ISDIR(mode)) {
+		int dtype;
+		size_t baselen = base->len;
+		if (!ctx->pl)
+			return READ_TREE_RECURSIVE;
 
-	strbuf_addstr(base, path);
+		/*
+		 * Have we expanded to a point outside of the sparse-checkout?
+		 */
+		strbuf_addstr(base, path);
+		strbuf_add(base, "/-", 2);
+
+		if (path_matches_pattern_list(base->buf, base->len,
+					      NULL, &dtype,
+					      ctx->pl, ctx->write)) {
+			strbuf_setlen(base, baselen);
+			return READ_TREE_RECURSIVE;
+		}
 
-	ce = make_cache_entry(istate, mode, oid, base->buf, 0, 0);
+		/*
+		 * The path "{base}{path}/" is a sparse directory. Create the correct
+		 * name for inserting the entry into the index.
+		 */
+		strbuf_setlen(base, base->len - 1);
+	} else {
+		strbuf_addstr(base, path);
+	}
+
+	ce = make_cache_entry(ctx->write, mode, oid, base->buf, 0, 0);
 	ce->ce_flags |= CE_SKIP_WORKTREE | CE_EXTENDED;
-	set_index_entry(istate, istate->cache_nr++, ce);
+	set_index_entry(ctx->write, ctx->write->cache_nr++, ce);
 
 	strbuf_setlen(base, len);
 	return 0;
@@ -254,6 +282,7 @@ void expand_to_pattern_list(struct index_state *istate,
 	int i;
 	struct index_state *full;
 	struct strbuf base = STRBUF_INIT;
+	struct modify_index_context ctx;
 
 	/*
 	 * If the index is already full, then keep it full. We will convert
@@ -294,6 +323,9 @@ void expand_to_pattern_list(struct index_state *istate,
 	full->cache_nr = 0;
 	ALLOC_ARRAY(full->cache, full->cache_alloc);
 
+	ctx.write = full;
+	ctx.pl = pl;
+
 	for (i = 0; i < istate->cache_nr; i++) {
 		struct cache_entry *ce = istate->cache[i];
 		struct tree *tree;
@@ -319,7 +351,7 @@ void expand_to_pattern_list(struct index_state *istate,
 		strbuf_add(&base, ce->name, strlen(ce->name));
 
 		read_tree_at(istate->repo, tree, &base, &ps,
-			     add_path_to_index, full);
+			     add_path_to_index, &ctx);
 
 		/* free directory entries. full entries are re-used */
 		discard_cache_entry(ce);
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v2 08/10] sparse-index: complete partial expansion
  2022-05-19 17:52 ` [PATCH v2 00/10] Sparse index: integrate with sparse-checkout Derrick Stolee via GitGitGadget
                     ` (6 preceding siblings ...)
  2022-05-19 17:52   ` [PATCH v2 07/10] sparse-index: partially expand directories Derrick Stolee via GitGitGadget
@ 2022-05-19 17:52   ` Derrick Stolee via GitGitGadget
  2022-05-21  7:45     ` Junio C Hamano
  2022-05-19 17:52   ` [PATCH v2 09/10] p2000: add test for 'git sparse-checkout [add|set]' Derrick Stolee via GitGitGadget
                     ` (2 subsequent siblings)
  10 siblings, 1 reply; 55+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-05-19 17:52 UTC (permalink / raw)
  To: git; +Cc: gitster, vdye, shaoxuan.yuan02, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

To complete the implementation of expand_to_pattern_list(), we need to
detect when a sparse directory entry should remain sparse. This avoids a
full expansion, so we now need to use the PARTIALLY_SPARSE mode to
indicate this state.

There still are no callers to this method, but we will add one in the
next change.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 sparse-index.c | 41 +++++++++++++++++++++++++++++++++++++----
 1 file changed, 37 insertions(+), 4 deletions(-)

diff --git a/sparse-index.c b/sparse-index.c
index 73b82e5017b..a65169030a2 100644
--- a/sparse-index.c
+++ b/sparse-index.c
@@ -297,8 +297,24 @@ void expand_to_pattern_list(struct index_state *istate,
 	 * continue. A NULL pattern set indicates a full expansion to a
 	 * full index.
 	 */
-	if (pl && !pl->use_cone_patterns)
+	if (pl && !pl->use_cone_patterns) {
 		pl = NULL;
+	} else {
+		/*
+		 * We might contract file entries into sparse-directory
+		 * entries, and for that we will need the cache tree to
+		 * be recomputed.
+		 */
+		cache_tree_free(&istate->cache_tree);
+
+		/*
+		 * If there is a problem creating the cache tree, then we
+		 * need to expand to a full index since we cannot satisfy
+		 * the current request as a sparse index.
+		 */
+		if (cache_tree_update(istate, WRITE_TREE_MISSING_OK))
+			pl = NULL;
+	}
 
 	if (!istate->repo)
 		istate->repo = the_repository;
@@ -317,8 +333,14 @@ void expand_to_pattern_list(struct index_state *istate,
 	full = xcalloc(1, sizeof(struct index_state));
 	memcpy(full, istate, sizeof(struct index_state));
 
+	/*
+	 * This slightly-misnamed 'full' index might still be sparse if we
+	 * are only modifying the list of sparse directories. This hinges
+	 * on whether we have a non-NULL pattern list.
+	 */
+	full->sparse_index = pl ? PARTIALLY_SPARSE : COMPLETELY_FULL;
+
 	/* then change the necessary things */
-	full->sparse_index = 0;
 	full->cache_alloc = (3 * istate->cache_alloc) / 2;
 	full->cache_nr = 0;
 	ALLOC_ARRAY(full->cache, full->cache_alloc);
@@ -330,11 +352,22 @@ void expand_to_pattern_list(struct index_state *istate,
 		struct cache_entry *ce = istate->cache[i];
 		struct tree *tree;
 		struct pathspec ps;
+		int dtype;
 
 		if (!S_ISSPARSEDIR(ce->ce_mode)) {
 			set_index_entry(full, full->cache_nr++, ce);
 			continue;
 		}
+
+		/* We now have a sparse directory entry. Should we expand? */
+		if (pl &&
+		    path_matches_pattern_list(ce->name, ce->ce_namelen,
+					      NULL, &dtype,
+					      pl, istate) == NOT_MATCHED) {
+			set_index_entry(full, full->cache_nr++, ce);
+			continue;
+		}
+
 		if (!(ce->ce_flags & CE_SKIP_WORKTREE))
 			warning(_("index entry is a directory, but not sparse (%08x)"),
 				ce->ce_flags);
@@ -360,7 +393,7 @@ void expand_to_pattern_list(struct index_state *istate,
 	/* Copy back into original index. */
 	memcpy(&istate->name_hash, &full->name_hash, sizeof(full->name_hash));
 	memcpy(&istate->dir_hash, &full->dir_hash, sizeof(full->dir_hash));
-	istate->sparse_index = 0;
+	istate->sparse_index = pl ? PARTIALLY_SPARSE : COMPLETELY_FULL;
 	free(istate->cache);
 	istate->cache = full->cache;
 	istate->cache_nr = full->cache_nr;
@@ -374,7 +407,7 @@ void expand_to_pattern_list(struct index_state *istate,
 
 	/* Clear and recompute the cache-tree */
 	cache_tree_free(&istate->cache_tree);
-	cache_tree_update(istate, 0);
+	cache_tree_update(istate, WRITE_TREE_MISSING_OK);
 
 	trace2_region_leave("index",
 			    pl ? "expand_to_pattern_list" : "ensure_full_index",
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v2 09/10] p2000: add test for 'git sparse-checkout [add|set]'
  2022-05-19 17:52 ` [PATCH v2 00/10] Sparse index: integrate with sparse-checkout Derrick Stolee via GitGitGadget
                     ` (7 preceding siblings ...)
  2022-05-19 17:52   ` [PATCH v2 08/10] sparse-index: complete partial expansion Derrick Stolee via GitGitGadget
@ 2022-05-19 17:52   ` Derrick Stolee via GitGitGadget
  2022-05-19 17:52   ` [PATCH v2 10/10] sparse-checkout: integrate with sparse index Derrick Stolee via GitGitGadget
  2022-05-23 13:48   ` [PATCH v3 00/10] Sparse index: integrate with sparse-checkout Derrick Stolee via GitGitGadget
  10 siblings, 0 replies; 55+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-05-19 17:52 UTC (permalink / raw)
  To: git; +Cc: gitster, vdye, shaoxuan.yuan02, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

The sparse-checkout builtin is almost completely integrated with the
sparse index, allowing the sparse-checkout boundary to be modified
without expanding a sparse index to a full one. Add a test to
p2000-sparse-operations.sh that adds a directory to the sparse-checkout
definition, then removes it. Using both operations is important to
ensure that the operation is doing the same work in each repetition as
well as leaving the test repo in a good state for later tests.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 t/perf/p2000-sparse-operations.sh | 1 +
 1 file changed, 1 insertion(+)

diff --git a/t/perf/p2000-sparse-operations.sh b/t/perf/p2000-sparse-operations.sh
index 382716cfca9..ce5cfac5714 100755
--- a/t/perf/p2000-sparse-operations.sh
+++ b/t/perf/p2000-sparse-operations.sh
@@ -110,6 +110,7 @@ test_perf_on_all git add -A
 test_perf_on_all git add .
 test_perf_on_all git commit -a -m A
 test_perf_on_all git checkout -f -
+test_perf_on_all "git sparse-checkout add f2/f3/f1 && git sparse-checkout set $SPARSE_CONE"
 test_perf_on_all git reset
 test_perf_on_all git reset --hard
 test_perf_on_all git reset -- does-not-exist
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v2 10/10] sparse-checkout: integrate with sparse index
  2022-05-19 17:52 ` [PATCH v2 00/10] Sparse index: integrate with sparse-checkout Derrick Stolee via GitGitGadget
                     ` (8 preceding siblings ...)
  2022-05-19 17:52   ` [PATCH v2 09/10] p2000: add test for 'git sparse-checkout [add|set]' Derrick Stolee via GitGitGadget
@ 2022-05-19 17:52   ` Derrick Stolee via GitGitGadget
  2022-05-23 13:48   ` [PATCH v3 00/10] Sparse index: integrate with sparse-checkout Derrick Stolee via GitGitGadget
  10 siblings, 0 replies; 55+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-05-19 17:52 UTC (permalink / raw)
  To: git; +Cc: gitster, vdye, shaoxuan.yuan02, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

When modifying the sparse-checkout definition, the sparse-checkout
builtin calls update_sparsity() to modify the SKIP_WORKTREE bits of all
cache entries in the index. Before, we needed the index to be fully
expanded in order to ensure we had the full list of files necessary that
match the new patterns.

Insert a call to reset_sparse_directories() that expands sparse
directories that are within the new pattern list, but only far enough
that every necessary file path now exists as a cache entry. The
remaining logic within update_sparsity() will modify the SKIP_WORKTREE
bits appropriately.

This allows us to disable command_requires_full_index within the
sparse-checkout builtin. Add tests that demonstrate that we are not
expanding to a full index unnecessarily.

We can see the improved performance in the p2000 test script:

Test                           HEAD~1            HEAD
------------------------------------------------------------------------
2000.24: git ... (sparse-v3)   2.14(1.55+0.58)   1.57(1.03+0.53) -26.6%
2000.25: git ... (sparse-v4)   2.20(1.62+0.57)   1.58(0.98+0.59) -28.2%

These reductions of 26-28% are small compared to most examples, but the
time is dominated by writing a new copy of the base repository to the
worktree and then deleting it again. The fact that the previous index
expansion was such a large portion of the time is telling how important
it is to complete this sparse index integration.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 builtin/sparse-checkout.c                |  3 +++
 t/t1092-sparse-checkout-compatibility.sh | 25 ++++++++++++++++++++++++
 unpack-trees.c                           |  4 ++++
 3 files changed, 32 insertions(+)

diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c
index cbff6ad00b0..0157b292b36 100644
--- a/builtin/sparse-checkout.c
+++ b/builtin/sparse-checkout.c
@@ -937,6 +937,9 @@ int cmd_sparse_checkout(int argc, const char **argv, const char *prefix)
 
 	git_config(git_default_config, NULL);
 
+	prepare_repo_settings(the_repository);
+	the_repository->settings.command_requires_full_index = 0;
+
 	if (argc > 0) {
 		if (!strcmp(argv[0], "list"))
 			return sparse_checkout_list(argc, argv);
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 785820f9fd5..73f4cf47314 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -1584,6 +1584,31 @@ test_expect_success 'ls-files' '
 	ensure_not_expanded ls-files --sparse
 '
 
+test_expect_success 'sparse index is not expanded: sparse-checkout' '
+	init_repos &&
+
+	ensure_not_expanded sparse-checkout set deep/deeper2 &&
+	ensure_not_expanded sparse-checkout set deep/deeper1 &&
+	ensure_not_expanded sparse-checkout set deep &&
+	ensure_not_expanded sparse-checkout add folder1 &&
+	ensure_not_expanded sparse-checkout set deep/deeper1 &&
+	ensure_not_expanded sparse-checkout set folder2 &&
+
+	# Demonstrate that the checks that "folder1/a" is a file
+	# do not cause a sparse-index expansion (since it is in the
+	# sparse-checkout cone).
+	echo >>sparse-index/folder2/a &&
+	git -C sparse-index add folder2/a &&
+
+	ensure_not_expanded sparse-checkout add folder1 &&
+
+	# Skip checks here, since deep/deeper1 is inside a sparse directory
+	# that must be expanded to check whether `deep/deeper1` is a file
+	# or not.
+	ensure_not_expanded sparse-checkout set --skip-checks deep/deeper1 &&
+	ensure_not_expanded sparse-checkout set
+'
+
 # NEEDSWORK: a sparse-checkout behaves differently from a full checkout
 # in this scenario, but it shouldn't.
 test_expect_success 'reset mixed and checkout orphan' '
diff --git a/unpack-trees.c b/unpack-trees.c
index 7f528d35cc2..9745e0dfc34 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -18,6 +18,7 @@
 #include "promisor-remote.h"
 #include "entry.h"
 #include "parallel-checkout.h"
+#include "sparse-index.h"
 
 /*
  * Error messages expected by scripts out of plumbing commands such as
@@ -2018,6 +2019,9 @@ enum update_sparsity_result update_sparsity(struct unpack_trees_options *o)
 			goto skip_sparse_checkout;
 	}
 
+	/* Expand sparse directories as needed */
+	expand_to_pattern_list(o->src_index, o->pl);
+
 	/* Set NEW_SKIP_WORKTREE on existing entries. */
 	mark_all_ce_unused(o->src_index);
 	mark_new_skip_worktree(o->pl, o->src_index, 0,
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* Re: [PATCH v2 03/10] sparse-index: create expand_to_pattern_list()
  2022-05-19 17:52   ` [PATCH v2 03/10] sparse-index: create expand_to_pattern_list() Derrick Stolee via GitGitGadget
@ 2022-05-19 19:50     ` Junio C Hamano
  2022-05-20 18:01       ` Derrick Stolee
  0 siblings, 1 reply; 55+ messages in thread
From: Junio C Hamano @ 2022-05-19 19:50 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, vdye, shaoxuan.yuan02, Derrick Stolee, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> list is NULL as that does the same as ensure_full_index(). In fact,
> ensure_full_index() is converted to a shim over
> expand_to_pattern_list().

Sounds like a natural evolution of the API that used to be
all-or-none to expand-only-those-that-match.

The old one had a sensible name to tell us that it is about the
in-core index (and "full index" implied it was about sparse-index
feature because what state other than "full" the index can be---some
are shrunk into tree entries, which by definition is the
sparse-index feature).  Contrasted to that, the name of the new one
is horrible.  It does not even have index anywhere in the name.

I wonder expand_index() would work?

> +void expand_to_pattern_list(struct index_state *istate,
> +			    struct pattern_list *pl)
>  {
>  	int i;
>  	struct index_state *full;
>  	struct strbuf base = STRBUF_INIT;
>  
> +	/*
> +	 * If the index is already full, then keep it full. We will convert
> +	 * it to a sparse index on write, if possible.
> +	 */
>  	if (!istate || !istate->sparse_index)
>  		return;
>  
> +	/*
> +	 * If our index is sparse, but our new pattern set does not use
> +	 * cone mode patterns, then we need to expand the index before we
> +	 * continue. A NULL pattern set indicates a full expansion to a
> +	 * full index.
> +	 */
> +	if (pl && !pl->use_cone_patterns)
> +		pl = NULL;
> +
>  	if (!istate->repo)
>  		istate->repo = the_repository;
>  
> -	trace2_region_enter("index", "ensure_full_index", istate->repo);
> +	/*
> +	 * A NULL pattern set indicates we are expanding a full index, so
> +	 * we use a special region name that indicates the full expansion.
> +	 * This is used by test cases, but also helps to differentiate the
> +	 * two cases.
> +	 */

Except that we lost the distinction for non-cone mode, which I am
not sure matters, but I suspect we do not have to, if we do not want
to.  Nobody used "pl" up to this point, so resetting it to NULL can
be done much later.  In later phases of this series, we add another
case where we can lose pl even if we are not using cone mode, so
this distinction may start to matter later.  I dunno.

I'd invent a separate "const char *tr2_region_label" variable and
set it at the beginning, regardless of where we clobber pl and why,
and use that label variable for trace2 calls, if I were doing this
patch.  That feels much simpler and cleaner.

> +	trace2_region_enter("index",
> +			    pl ? "expand_to_pattern_list" : "ensure_full_index",
> +			    istate->repo);

> diff --git a/sparse-index.h b/sparse-index.h
> index 633d4fb7e31..037b541f49d 100644
> --- a/sparse-index.h
> +++ b/sparse-index.h
> @@ -23,4 +23,18 @@ void expand_to_path(struct index_state *istate,
>  struct repository;
>  int set_sparse_index_config(struct repository *repo, int enable);
>  
> +struct pattern_list;
> +
> +/**
> + * Scan the given index and compare its entries to the given pattern list.
> + * If the index is sparse and the pattern list uses cone mode patterns,
> + * then modify the index to contain the all of the file entries within that
> + * new pattern list. This expands sparse directories only as far as needed.
> + *
> + * If the pattern list is NULL or does not use cone mode patterns, then the
> + * index is expanded to a full index.
> + */
> +void expand_to_pattern_list(struct index_state *istate,
> +			      struct pattern_list *pl);
> +
>  #endif

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v2 04/10] sparse-index: introduce partially-sparse indexes
  2022-05-19 17:52   ` [PATCH v2 04/10] sparse-index: introduce partially-sparse indexes Derrick Stolee via GitGitGadget
@ 2022-05-19 20:05     ` Junio C Hamano
  2022-05-20 18:05       ` Derrick Stolee
  0 siblings, 1 reply; 55+ messages in thread
From: Junio C Hamano @ 2022-05-19 20:05 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, vdye, shaoxuan.yuan02, Derrick Stolee, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> +enum sparse_index_mode {
> +	/*
> +	 * COMPLETELY_FULL: there are no sparse directories
> +	 * in the index at all.
> +	 */
> +	COMPLETELY_FULL = 0,

Two things that make me wonder are:

 * Do concrete values assigned to these symbols matter?  If we do

	if (some_function_that_returns_this_type())
		/* ok, we know it is full */
		do_full_thing();

   then the answer is yes.  If we write these values to on-disk
   file, and other versions and reimplementations of Git need to
   know the concrete values, then the answer is yes.  Otherwise, we
   may not want to say "= 0" and the like here.

 * Is it worth repeating the label in the comment?  IOW, I am
   wondering if

	/* there are no sparse directories in the index */
	COMPLETELY_FULL,

   is equally readable and slightly more maintainable.

Also these names being in cache.h and visible everywhere is somewhat
worrying.  Other CPP macros and enum constants in the file have
names that are prepared to be truly global, but COMPLETELY_FULL and
COLLAPSED do not hint that they are about sparse-index state as
strongly as PARTIALLY_SPARSE does.

> -		 /*
> -		  * sparse_index == 1 when sparse-directory
> -		  * entries exist. Requires sparse-checkout
> -		  * in cone mode.
> -		  */
> -		 sparse_index : 1;
> +		 fsmonitor_has_run_once : 1;
> +	enum sparse_index_mode sparse_index;

Good that we can lose the comment on a single bit.  Are some of the
enum sparse_index_mode values only possible in cone mode, just like
having true in this bit was only possible in cone mode?  Perhaps a
comment before "enum sparse_index_mode {" can say "in non-cone mode,
COMPLETELY_FULL is the only possible value" or somesuch?

Thanks.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v2 05/10] cache-tree: implement cache_tree_find_path()
  2022-05-19 17:52   ` [PATCH v2 05/10] cache-tree: implement cache_tree_find_path() Derrick Stolee via GitGitGadget
@ 2022-05-19 20:14     ` Junio C Hamano
  2022-05-20 18:13       ` Derrick Stolee
  0 siblings, 1 reply; 55+ messages in thread
From: Junio C Hamano @ 2022-05-19 20:14 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, vdye, shaoxuan.yuan02, Derrick Stolee, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> +struct cache_tree *cache_tree_find_path(struct cache_tree *it, const char *path)
> +{
> +	const char *slash;
> +	int namelen;
> +	struct cache_tree_sub *down;
> +
> +	if (!it)
> +		return NULL;
> +	slash = strchrnul(path, '/');
> +	namelen = slash - path;
> +	it->entry_count = -1;
> +	if (!*slash) {
> +		int pos;
> +		pos = cache_tree_subtree_pos(it, path, namelen);
> +		if (0 <= pos)
> +			return it->down[pos]->cache_tree;
> +		return NULL;
> +	}
> +	down = find_subtree(it, path, namelen, 0);
> +	if (down)
> +		return cache_tree_find_path(down->cache_tree, slash + 1);
> +	return NULL;
> +}

The tail recursion (and the one in the orginal) may want to be
trivially converted to iteration with "goto".

It is somewhat surprising that we didn't have any external interface
to expose a sub-part of the cache_tree at all so far.  It may be
because the API was so well designed that the abstraction did not
have to be exposed.  I dunno.

> diff --git a/cache-tree.h b/cache-tree.h
> index 8efeccebfc9..f75f8e74dcd 100644
> --- a/cache-tree.h
> +++ b/cache-tree.h
> @@ -29,6 +29,8 @@ struct cache_tree_sub *cache_tree_sub(struct cache_tree *, const char *);
>  
>  int cache_tree_subtree_pos(struct cache_tree *it, const char *path, int pathlen);
>  
> +struct cache_tree *cache_tree_find_path(struct cache_tree *it, const char *path);
> +
>  void cache_tree_write(struct strbuf *, struct cache_tree *root);
>  struct cache_tree *cache_tree_read(const char *buffer, unsigned long size);

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v2 06/10] sparse-checkout: --no-sparse-index needs a full index
  2022-05-19 17:52   ` [PATCH v2 06/10] sparse-checkout: --no-sparse-index needs a full index Derrick Stolee via GitGitGadget
@ 2022-05-19 20:19     ` Junio C Hamano
  0 siblings, 0 replies; 55+ messages in thread
From: Junio C Hamano @ 2022-05-19 20:19 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, vdye, shaoxuan.yuan02, Derrick Stolee, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Derrick Stolee <dstolee@microsoft.com>
>
> When the --no-sparse-index option is supplied, the sparse-checkout
> builtin should explicitly ask to expand a sparse index to a full one.
> This is currently done implicitly due to the command_requires_full_index
> protection, but that will be removed in an upcoming change.
>
> Signed-off-by: Derrick Stolee <derrickstolee@github.com>
> ---
>  builtin/sparse-checkout.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c
> index 88eea069ad4..cbff6ad00b0 100644
> --- a/builtin/sparse-checkout.c
> +++ b/builtin/sparse-checkout.c
> @@ -413,6 +413,9 @@ static int update_modes(int *cone_mode, int *sparse_index)
>  		/* force an index rewrite */
>  		repo_read_index(the_repository);
>  		the_repository->index->updated_workdir = 1;
> +
> +		if (!*sparse_index)
> +			ensure_full_index(the_repository->index);

init_opts.sparse_index is initialized to -1 (unknown) and parse_options()
may set it to 0 (false) or 1 (true).  We call ensure_full only when
the caller explicitly asks with --no-sparse-index.

Makes sense.

>  	}
>  
>  	return 0;

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v2 03/10] sparse-index: create expand_to_pattern_list()
  2022-05-19 19:50     ` Junio C Hamano
@ 2022-05-20 18:01       ` Derrick Stolee
  0 siblings, 0 replies; 55+ messages in thread
From: Derrick Stolee @ 2022-05-20 18:01 UTC (permalink / raw)
  To: Junio C Hamano, Derrick Stolee via GitGitGadget
  Cc: git, vdye, shaoxuan.yuan02, Derrick Stolee

On 5/19/2022 3:50 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> list is NULL as that does the same as ensure_full_index(). In fact,
>> ensure_full_index() is converted to a shim over
>> expand_to_pattern_list().
> 
> Sounds like a natural evolution of the API that used to be
> all-or-none to expand-only-those-that-match.
> 
> The old one had a sensible name to tell us that it is about the
> in-core index (and "full index" implied it was about sparse-index
> feature because what state other than "full" the index can be---some
> are shrunk into tree entries, which by definition is the
> sparse-index feature).  Contrasted to that, the name of the new one
> is horrible.  It does not even have index anywhere in the name.
> 
> I wonder expand_index() would work?

Makes sense. Good suggestion.
 
>> -	trace2_region_enter("index", "ensure_full_index", istate->repo);
>> +	/*
>> +	 * A NULL pattern set indicates we are expanding a full index, so
>> +	 * we use a special region name that indicates the full expansion.
>> +	 * This is used by test cases, but also helps to differentiate the
>> +	 * two cases.
>> +	 */
> 
> Except that we lost the distinction for non-cone mode, which I am
> not sure matters, but I suspect we do not have to, if we do not want
> to.  Nobody used "pl" up to this point, so resetting it to NULL can
> be done much later.  In later phases of this series, we add another
> case where we can lose pl even if we are not using cone mode, so
> this distinction may start to matter later.  I dunno.
> 
> I'd invent a separate "const char *tr2_region_label" variable and
> set it at the beginning, regardless of where we clobber pl and why,
> and use that label variable for trace2 calls, if I were doing this
> patch.  That feels much simpler and cleaner.

Good idea.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v2 04/10] sparse-index: introduce partially-sparse indexes
  2022-05-19 20:05     ` Junio C Hamano
@ 2022-05-20 18:05       ` Derrick Stolee
  2022-05-20 18:23         ` Junio C Hamano
  0 siblings, 1 reply; 55+ messages in thread
From: Derrick Stolee @ 2022-05-20 18:05 UTC (permalink / raw)
  To: Junio C Hamano, Derrick Stolee via GitGitGadget
  Cc: git, vdye, shaoxuan.yuan02, Derrick Stolee

On 5/19/2022 4:05 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> +enum sparse_index_mode {
>> +	/*
>> +	 * COMPLETELY_FULL: there are no sparse directories
>> +	 * in the index at all.
>> +	 */
>> +	COMPLETELY_FULL = 0,
> 
> Two things that make me wonder are:
> 
>  * Do concrete values assigned to these symbols matter?  If we do
> 
> 	if (some_function_that_returns_this_type())
> 		/* ok, we know it is full */
> 		do_full_thing();
> 
>    then the answer is yes.  If we write these values to on-disk
>    file, and other versions and reimplementations of Git need to
>    know the concrete values, then the answer is yes.  Otherwise, we
>    may not want to say "= 0" and the like here.

The = 0 case matters because that is the default when the index
struct is initialized to zero. The other values are not important.
 
>  * Is it worth repeating the label in the comment?  IOW, I am
>    wondering if
> 
> 	/* there are no sparse directories in the index */
> 	COMPLETELY_FULL,
> 
>    is equally readable and slightly more maintainable.

Can do.

> Also these names being in cache.h and visible everywhere is somewhat
> worrying.  Other CPP macros and enum constants in the file have
> names that are prepared to be truly global, but COMPLETELY_FULL and
> COLLAPSED do not hint that they are about sparse-index state as
> strongly as PARTIALLY_SPARSE does.

Would prepending with "INDEX_" be sufficient?

>> -		 /*
>> -		  * sparse_index == 1 when sparse-directory
>> -		  * entries exist. Requires sparse-checkout
>> -		  * in cone mode.
>> -		  */
>> -		 sparse_index : 1;
>> +		 fsmonitor_has_run_once : 1;
>> +	enum sparse_index_mode sparse_index;
> 
> Good that we can lose the comment on a single bit.  Are some of the
> enum sparse_index_mode values only possible in cone mode, just like
> having true in this bit was only possible in cone mode?  Perhaps a
> comment before "enum sparse_index_mode {" can say "in non-cone mode,
> COMPLETELY_FULL is the only possible value" or somesuch?

I can add that comment for the COMPLETELY_FULL enum value.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v2 05/10] cache-tree: implement cache_tree_find_path()
  2022-05-19 20:14     ` Junio C Hamano
@ 2022-05-20 18:13       ` Derrick Stolee
  0 siblings, 0 replies; 55+ messages in thread
From: Derrick Stolee @ 2022-05-20 18:13 UTC (permalink / raw)
  To: Junio C Hamano, Derrick Stolee via GitGitGadget
  Cc: git, vdye, shaoxuan.yuan02, Derrick Stolee

On 5/19/2022 4:14 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> +struct cache_tree *cache_tree_find_path(struct cache_tree *it, const char *path)
>> +{
>> +	const char *slash;
>> +	int namelen;
>> +	struct cache_tree_sub *down;
>> +
>> +	if (!it)
>> +		return NULL;
>> +	slash = strchrnul(path, '/');
>> +	namelen = slash - path;
>> +	it->entry_count = -1;
>> +	if (!*slash) {
>> +		int pos;
>> +		pos = cache_tree_subtree_pos(it, path, namelen);
>> +		if (0 <= pos)
>> +			return it->down[pos]->cache_tree;
>> +		return NULL;
>> +	}
>> +	down = find_subtree(it, path, namelen, 0);
>> +	if (down)
>> +		return cache_tree_find_path(down->cache_tree, slash + 1);
>> +	return NULL;
>> +}
> 
> The tail recursion (and the one in the orginal) may want to be
> trivially converted to iteration with "goto".

Good idea!
 
> It is somewhat surprising that we didn't have any external interface
> to expose a sub-part of the cache_tree at all so far.  It may be
> because the API was so well designed that the abstraction did not
> have to be exposed.  I dunno.

I think the abstraction definitely fit the shape of its consumers
pretty perfectly. I wonder if having a method like this will make
us think of new ways to use the cache tree for other things.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v2 07/10] sparse-index: partially expand directories
  2022-05-19 17:52   ` [PATCH v2 07/10] sparse-index: partially expand directories Derrick Stolee via GitGitGadget
@ 2022-05-20 18:17     ` Junio C Hamano
  2022-05-20 18:33       ` Derrick Stolee
  0 siblings, 1 reply; 55+ messages in thread
From: Junio C Hamano @ 2022-05-20 18:17 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, vdye, shaoxuan.yuan02, Derrick Stolee, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> @@ -231,18 +236,41 @@ static int add_path_to_index(const struct object_id *oid,
>  			     struct strbuf *base, const char *path,
>  			     unsigned int mode, void *context)
>  {

OK, this function is a callback of read_tree_at(), whose caller is
walking the source (=current) index, and while it is copying
existing non-directory entries to the destination (=ctx->write)
index, it found a tree in the source in-core index that does not
match the cone pattern (or we are expanding fully).  Our job is to
expand the given tree as a "subdirectory" in the project into the
destination index.

We used to just let the calling read_tree_at() fully and recursively
walk the tree while adding any non-tree entries to the destination.
Now, we have the pattern list in the context, we go more selective.

The design makes sense.

> +	struct modify_index_context *ctx = (struct modify_index_context *)context;
>  	struct cache_entry *ce;
>  	size_t len = base->len;
>  
> +	if (S_ISDIR(mode)) {
> +		int dtype;
> +		size_t baselen = base->len;
> +		if (!ctx->pl)
> +			return READ_TREE_RECURSIVE;

Fully recursive case.  We can do without any match, just like the
caller (i.e. expand_to_pattern_list) did, when the pattern list is
NULL.

> +		/*
> +		 * Have we expanded to a point outside of the sparse-checkout?
> +		 */
> +		strbuf_addstr(base, path);
> +		strbuf_add(base, "/-", 2);
> +
> +		if (path_matches_pattern_list(base->buf, base->len,
> +					      NULL, &dtype,
> +					      ctx->pl, ctx->write)) {

If that sample path in the directory matches (MATCHED or
MATCHED_RECURSIVE) the patterns, we recurse into the tree to expand.

Can the function return UNDECIDED here?  If so what should happen?
As written, the code will behave as if it matched, and it is not
quite clear if that is the behaviour intended by this patch or an
accident.

> +			strbuf_setlen(base, baselen);
> +			return READ_TREE_RECURSIVE;
> +		}

The caller found a tree at path <path> in the index.  We check if
our patterns match a fictitious path <path> + "/-", which may exist
if the <path> is a directory and if there is a funny named file or
directory "-" in it.

Why "-"?  Are we trying ot see if "ctx->pl" matches "anything" in
the directory that is at <path>?  Is the assumption here that pl
only names directories literally without blobs (I presume that is
the same thing as assuming the cone mode)?

I am trying to see if there is a way that expresses the intention of
this code more clearly than using an arbitrary path "-" (and trying
to figure out if I got the intention right in the first place ;-)..

> +		/*
> +		 * The path "{base}{path}/" is a sparse directory. Create the correct
> +		 * name for inserting the entry into the index.
> +		 */
> +		strbuf_setlen(base, base->len - 1);

This removes that phoney "-" while keeping the trailing "/".  Just
like "-" was unclear, understanding this "- 1" requires that the
reader understands why "/-" was used earlier.

The resulting "base" is used in the newly created cache entry to
represent the (unexpanded) directory below, and such a cache entry
is supposed to have a path with a trailing slash, so it makes sense.

> +	} else {
> +		strbuf_addstr(base, path);

For any non-tree thing, we use the given path for the cache entry to
represent it.

> +	}
> +
> +	ce = make_cache_entry(ctx->write, mode, oid, base->buf, 0, 0);
>  	ce->ce_flags |= CE_SKIP_WORKTREE | CE_EXTENDED;
> -	set_index_entry(istate, istate->cache_nr++, ce);
> +	set_index_entry(ctx->write, ctx->write->cache_nr++, ce);

And the cache entry (newly created) is added to the destination.
Unlike what happens in the caller (i.e. expand_to_pattern_list) that
moves the cache entry taken from the source index to the destination
index, the caller will discard the cache entry taken from the source
index after read_tree_at() returns, and we create a new instance for
ourselves here, even if we _could_ have reused it (by passing it in
the context structure, for example).

>  	strbuf_setlen(base, len);

And we restore the length of the path in the base to the original
before we return.

>  	return 0;

Returning 0 as opposed to READ_TREE_RECURSIVE here means "we've
dealt with the entry; don't recurse into subtree even if you called
us with a tree", which is the right thing to do here, as we did have
done all we need to here.

OK, except for that "/-" thing, all of the above makes sense to me.

Thanks.

> @@ -254,6 +282,7 @@ void expand_to_pattern_list(struct index_state *istate,
>  	int i;
>  	struct index_state *full;
>  	struct strbuf base = STRBUF_INIT;
> +	struct modify_index_context ctx;
>  
>  	/*
>  	 * If the index is already full, then keep it full. We will convert
> @@ -294,6 +323,9 @@ void expand_to_pattern_list(struct index_state *istate,
>  	full->cache_nr = 0;
>  	ALLOC_ARRAY(full->cache, full->cache_alloc);
>  
> +	ctx.write = full;
> +	ctx.pl = pl;
> +
>  	for (i = 0; i < istate->cache_nr; i++) {
>  		struct cache_entry *ce = istate->cache[i];
>  		struct tree *tree;
> @@ -319,7 +351,7 @@ void expand_to_pattern_list(struct index_state *istate,
>  		strbuf_add(&base, ce->name, strlen(ce->name));
>  
>  		read_tree_at(istate->repo, tree, &base, &ps,
> -			     add_path_to_index, full);
> +			     add_path_to_index, &ctx);
>  
>  		/* free directory entries. full entries are re-used */
>  		discard_cache_entry(ce);

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v2 04/10] sparse-index: introduce partially-sparse indexes
  2022-05-20 18:05       ` Derrick Stolee
@ 2022-05-20 18:23         ` Junio C Hamano
  0 siblings, 0 replies; 55+ messages in thread
From: Junio C Hamano @ 2022-05-20 18:23 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Derrick Stolee via GitGitGadget, git, vdye, shaoxuan.yuan02,
	Derrick Stolee

Derrick Stolee <derrickstolee@github.com> writes:

>> Also these names being in cache.h and visible everywhere is somewhat
>> worrying.  Other CPP macros and enum constants in the file have
>> names that are prepared to be truly global, but COMPLETELY_FULL and
>> COLLAPSED do not hint that they are about sparse-index state as
>> strongly as PARTIALLY_SPARSE does.
>
> Would prepending with "INDEX_" be sufficient?

Yeah, index-collapsed, index-expanded, etc., sounds distinctive
enough.  Sorry, but I cannot think of a good phrase for partially
expanded case that is short, succinct and would go well with the
fully expanded and fully collapsed cases.

Thanks.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v2 07/10] sparse-index: partially expand directories
  2022-05-20 18:17     ` Junio C Hamano
@ 2022-05-20 18:33       ` Derrick Stolee
  0 siblings, 0 replies; 55+ messages in thread
From: Derrick Stolee @ 2022-05-20 18:33 UTC (permalink / raw)
  To: Junio C Hamano, Derrick Stolee via GitGitGadget
  Cc: git, vdye, shaoxuan.yuan02, Derrick Stolee

On 5/20/2022 2:17 PM, Junio C Hamano wrote:> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> @@ -231,18 +236,41 @@ static int add_path_to_index(const struct object_id *oid,
>>  			     struct strbuf *base, const char *path,
>>  			     unsigned int mode, void *context)
>>  {
> 
> OK, this function is a callback of read_tree_at(), whose caller is
> walking the source (=current) index, and while it is copying
> existing non-directory entries to the destination (=ctx->write)
> index, it found a tree in the source in-core index that does not
> match the cone pattern (or we are expanding fully).  Our job is to
> expand the given tree as a "subdirectory" in the project into the
> destination index.
> 
> We used to just let the calling read_tree_at() fully and recursively
> walk the tree while adding any non-tree entries to the destination.
> Now, we have the pattern list in the context, we go more selective.
> 
> The design makes sense.
> 
>> +	struct modify_index_context *ctx = (struct modify_index_context *)context;
>>  	struct cache_entry *ce;
>>  	size_t len = base->len;
>>  
>> +	if (S_ISDIR(mode)) {
>> +		int dtype;
>> +		size_t baselen = base->len;
>> +		if (!ctx->pl)
>> +			return READ_TREE_RECURSIVE;
> 
> Fully recursive case.  We can do without any match, just like the
> caller (i.e. expand_to_pattern_list) did, when the pattern list is
> NULL.
> 
>> +		/*
>> +		 * Have we expanded to a point outside of the sparse-checkout?
>> +		 */
>> +		strbuf_addstr(base, path);
>> +		strbuf_add(base, "/-", 2);
>> +
>> +		if (path_matches_pattern_list(base->buf, base->len,
>> +					      NULL, &dtype,
>> +					      ctx->pl, ctx->write)) {
> 
> If that sample path in the directory matches (MATCHED or
> MATCHED_RECURSIVE) the patterns, we recurse into the tree to expand.
> 
> Can the function return UNDECIDED here?  If so what should happen?
> As written, the code will behave as if it matched, and it is not
> quite clear if that is the behaviour intended by this patch or an
> accident.

UNDECIDED can only be returned when not in cone mode. We are in cone mode
when this helper is used.

>> +			strbuf_setlen(base, baselen);
>> +			return READ_TREE_RECURSIVE;
>> +		}
> 
> The caller found a tree at path <path> in the index.  We check if
> our patterns match a fictitious path <path> + "/-", which may exist
> if the <path> is a directory and if there is a funny named file or
> directory "-" in it.
> 
> Why "-"?  Are we trying ot see if "ctx->pl" matches "anything" in
> the directory that is at <path>?  Is the assumption here that pl
> only names directories literally without blobs (I presume that is
> the same thing as assuming the cone mode)?
> 
> I am trying to see if there is a way that expresses the intention of
> this code more clearly than using an arbitrary path "-" (and trying
> to figure out if I got the intention right in the first place ;-)..

The '+ "/-"' is there to give us an example file path for "<path>" as a
directory. This is specifically because path_matches_pattern_list() checks
the parent directories for matches, expecting the input to be a file name.

This could use a comment. I'm thinking...

/*
 * Have we expanded to a point outside of the sparse-checkout?
 *
 * Artificially pad the path name with a slash "/" to
 * indicate it as a directory, and add an arbitrary file
 * name ("-") so we can consider base->buf as a file name
 * to match against the cone-mode patterns.
 *
 * If we compared just "path", then we would expand more
 * than we should. Since every file at root is always
 * included, we would expand every directory at root at
 * least one level deep instead of using sparse directory
 * entries.
 */

>> +		/*
>> +		 * The path "{base}{path}/" is a sparse directory. Create the correct
>> +		 * name for inserting the entry into the index.
>> +		 */
>> +		strbuf_setlen(base, base->len - 1);
> 
> This removes that phoney "-" while keeping the trailing "/".  Just
> like "-" was unclear, understanding this "- 1" requires that the
> reader understands why "/-" was used earlier.
> 
> The resulting "base" is used in the newly created cache entry to
> represent the (unexpanded) directory below, and such a cache entry
> is supposed to have a path with a trailing slash, so it makes sense.
> 
>> +	} else {
>> +		strbuf_addstr(base, path);
> 
> For any non-tree thing, we use the given path for the cache entry to
> represent it.
> 
>> +	}
>> +
>> +	ce = make_cache_entry(ctx->write, mode, oid, base->buf, 0, 0);
>>  	ce->ce_flags |= CE_SKIP_WORKTREE | CE_EXTENDED;
>> -	set_index_entry(istate, istate->cache_nr++, ce);
>> +	set_index_entry(ctx->write, ctx->write->cache_nr++, ce);
> 
> And the cache entry (newly created) is added to the destination.
> Unlike what happens in the caller (i.e. expand_to_pattern_list) that
> moves the cache entry taken from the source index to the destination
> index, the caller will discard the cache entry taken from the source
> index after read_tree_at() returns, and we create a new instance for
> ourselves here, even if we _could_ have reused it (by passing it in
> the context structure, for example).
> 
>>  	strbuf_setlen(base, len);
> 
> And we restore the length of the path in the base to the original
> before we return.
> 
>>  	return 0;
> 
> Returning 0 as opposed to READ_TREE_RECURSIVE here means "we've
> dealt with the entry; don't recurse into subtree even if you called
> us with a tree", which is the right thing to do here, as we did have
> done all we need to here.
> 
> OK, except for that "/-" thing, all of the above makes sense to me.

Thanks for your careful read. Hopefully my explanation of the "/-" thing
helps.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v2 08/10] sparse-index: complete partial expansion
  2022-05-19 17:52   ` [PATCH v2 08/10] sparse-index: complete partial expansion Derrick Stolee via GitGitGadget
@ 2022-05-21  7:45     ` Junio C Hamano
  2022-05-23 13:13       ` Derrick Stolee
  0 siblings, 1 reply; 55+ messages in thread
From: Junio C Hamano @ 2022-05-21  7:45 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, vdye, shaoxuan.yuan02, Derrick Stolee, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> +	if (pl && !pl->use_cone_patterns) {
>  		pl = NULL;
> +	} else {
> +		/*
> +		 * We might contract file entries into sparse-directory
> +		 * entries, and for that we will need the cache tree to
> +		 * be recomputed.
> +		 */
> +		cache_tree_free(&istate->cache_tree);
> +
> +		/*
> +		 * If there is a problem creating the cache tree, then we
> +		 * need to expand to a full index since we cannot satisfy
> +		 * the current request as a sparse index.
> +		 */
> +		if (cache_tree_update(istate, WRITE_TREE_MISSING_OK))
> +			pl = NULL;
> +	}

So, if the current index has some irrelevant (i.e. do not match the
pattern list) subtrees in collapsed form, presense of an unmerged
entry, presumably inside the cone(s) we are interested in, makes us
lose the pattern list here, and we end up expanding everything?

I suspect that is a situation that is not so uncommon.  Working
inside a narrow cone of a wide tree, performing a merge would
hopefully allow many subtrees that are outside of the cones of our
interest merged without getting expanded at all (e.g. only the other
side touched these subtrees we are not working on, so their version
will become the merge result), while changes to some paths in the
cone of our interest may result in true conflicts represented as
cache entries at higher stages, needing conflict resolution
concluded with "git add".  Having to expand these subtrees that we
managed to merge while still collapsed, only because we have
conflicts in some other parts of the tree, feels somewhat sad.

By the way, why are we passing the "--missing-ok" option to "git
write-tree" here?

> @@ -330,11 +352,22 @@ void expand_to_pattern_list(struct index_state *istate,
>  		struct cache_entry *ce = istate->cache[i];
>  		struct tree *tree;
>  		struct pathspec ps;
> +		int dtype;
>  
>  		if (!S_ISSPARSEDIR(ce->ce_mode)) {
>  			set_index_entry(full, full->cache_nr++, ce);
>  			continue;
>  		}
> +
> +		/* We now have a sparse directory entry. Should we expand? */
> +		if (pl &&
> +		    path_matches_pattern_list(ce->name, ce->ce_namelen,
> +					      NULL, &dtype,
> +					      pl, istate) == NOT_MATCHED) {
> +			set_index_entry(full, full->cache_nr++, ce);
> +			continue;
> +		}
> +
>  		if (!(ce->ce_flags & CE_SKIP_WORKTREE))
>  			warning(_("index entry is a directory, but not sparse (%08x)"),
>  				ce->ce_flags);
> @@ -360,7 +393,7 @@ void expand_to_pattern_list(struct index_state *istate,
>  	/* Copy back into original index. */
>  	memcpy(&istate->name_hash, &full->name_hash, sizeof(full->name_hash));
>  	memcpy(&istate->dir_hash, &full->dir_hash, sizeof(full->dir_hash));
> -	istate->sparse_index = 0;
> +	istate->sparse_index = pl ? PARTIALLY_SPARSE : COMPLETELY_FULL;
>  	free(istate->cache);
>  	istate->cache = full->cache;
>  	istate->cache_nr = full->cache_nr;
> @@ -374,7 +407,7 @@ void expand_to_pattern_list(struct index_state *istate,
>  
>  	/* Clear and recompute the cache-tree */
>  	cache_tree_free(&istate->cache_tree);
> -	cache_tree_update(istate, 0);
> +	cache_tree_update(istate, WRITE_TREE_MISSING_OK);

The same question here.  We didn't say "missing trees are OK".  What
made it OK in this change?


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v2 08/10] sparse-index: complete partial expansion
  2022-05-21  7:45     ` Junio C Hamano
@ 2022-05-23 13:13       ` Derrick Stolee
  2022-05-23 13:18         ` Derrick Stolee
  2022-05-23 22:48         ` Junio C Hamano
  0 siblings, 2 replies; 55+ messages in thread
From: Derrick Stolee @ 2022-05-23 13:13 UTC (permalink / raw)
  To: Junio C Hamano, Derrick Stolee via GitGitGadget
  Cc: git, vdye, shaoxuan.yuan02, Derrick Stolee

On 5/21/2022 3:45 AM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> +	if (pl && !pl->use_cone_patterns) {
>>  		pl = NULL;
>> +	} else {
>> +		/*
>> +		 * We might contract file entries into sparse-directory
>> +		 * entries, and for that we will need the cache tree to
>> +		 * be recomputed.
>> +		 */
>> +		cache_tree_free(&istate->cache_tree);
>> +
>> +		/*
>> +		 * If there is a problem creating the cache tree, then we
>> +		 * need to expand to a full index since we cannot satisfy
>> +		 * the current request as a sparse index.
>> +		 */
>> +		if (cache_tree_update(istate, WRITE_TREE_MISSING_OK))
>> +			pl = NULL;
>> +	}
> 
> So, if the current index has some irrelevant (i.e. do not match the
> pattern list) subtrees in collapsed form, presense of an unmerged
> entry, presumably inside the cone(s) we are interested in, makes us
> lose the pattern list here, and we end up expanding everything?
> 
> I suspect that is a situation that is not so uncommon.  Working
> inside a narrow cone of a wide tree, performing a merge would
> hopefully allow many subtrees that are outside of the cones of our
> interest merged without getting expanded at all (e.g. only the other
> side touched these subtrees we are not working on, so their version
> will become the merge result), while changes to some paths in the
> cone of our interest may result in true conflicts represented as
> cache entries at higher stages, needing conflict resolution
> concluded with "git add".  Having to expand these subtrees that we
> managed to merge while still collapsed, only because we have
> conflicts in some other parts of the tree, feels somewhat sad.

You are correct that conflicts outside of the sparse-checkout cone will
cause index expansion. That happens during the 'git merge' command, but
the index will continue to fail to collapse as long as those conflicts
still exist in the index.

When there are conflicts like this during the merge, then the index
expansion is not as large of a portion of the command as normal, because
the conflict resolution also takes some time to compute. The commands
afterwards do take longer purely because of the expanded index.

However, this state is also not as common as you might think. If a user
has a sparse-checkout cone specified, then they are unlikely to change
files outside of the sparse-checkout cone. They would not be the reason
that those files have a conflict. The conflicts would exist only if they
are merging branches that had conflicts outside of the cone. Typically,
any merge of external changes like this are of the form of "git pull" or
"git rebase", in which case the conflicts are still "local" to the
developer's changes.

You are right that there is additional work that could be done here,
specifically allowing the cache tree to partially succeed and use the
successfully generated trees to create sparse directory entries where
possible. This was not viable before because we lacked the "partially
expanded" index state. This series establishes the necessary vocabulary to
do such an improvement later.
> By the way, why are we passing the "--missing-ok" option to "git
> write-tree" here?
> 
>> +	cache_tree_update(istate, WRITE_TREE_MISSING_OK);
> 
> The same question here.  We didn't say "missing trees are OK".  What
> made it OK in this change?
 
Both of these additions of WRITE_TREE_MISSING_OK are not needed. I
think I added them in an earlier version, thinking they were needed
due to something in the Scalar functional tests. I confirmed just now
that they are not needed for that. I will remove them.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v2 08/10] sparse-index: complete partial expansion
  2022-05-23 13:13       ` Derrick Stolee
@ 2022-05-23 13:18         ` Derrick Stolee
  2022-05-23 18:01           ` Junio C Hamano
  2022-05-23 22:48         ` Junio C Hamano
  1 sibling, 1 reply; 55+ messages in thread
From: Derrick Stolee @ 2022-05-23 13:18 UTC (permalink / raw)
  To: Junio C Hamano, Derrick Stolee via GitGitGadget
  Cc: git, vdye, shaoxuan.yuan02, Derrick Stolee

On 5/23/2022 9:13 AM, Derrick Stolee wrote:
> On 5/21/2022 3:45 AM, Junio C Hamano wrote:
>> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

>> By the way, why are we passing the "--missing-ok" option to "git
>> write-tree" here?
>>
>>> +	cache_tree_update(istate, WRITE_TREE_MISSING_OK);
>>
>> The same question here.  We didn't say "missing trees are OK".  What
>> made it OK in this change?
>  
> Both of these additions of WRITE_TREE_MISSING_OK are not needed. I
> think I added them in an earlier version, thinking they were needed
> due to something in the Scalar functional tests. I confirmed just now
> that they are not needed for that. I will remove them.

As I went to remove these from sparse-index.c, I found another that
exists, but for good reason. See 8a96b9d0a (sparse-index: use
WRITE_TREE_MISSING_OK, 2021-09-08), which explains that that instance
needs the flag because it could be used during 'git add'.

It still isn't necessary in the instances being added here, so I will
remove them.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH v3 00/10] Sparse index: integrate with sparse-checkout
  2022-05-19 17:52 ` [PATCH v2 00/10] Sparse index: integrate with sparse-checkout Derrick Stolee via GitGitGadget
                     ` (9 preceding siblings ...)
  2022-05-19 17:52   ` [PATCH v2 10/10] sparse-checkout: integrate with sparse index Derrick Stolee via GitGitGadget
@ 2022-05-23 13:48   ` Derrick Stolee via GitGitGadget
  2022-05-23 13:48     ` [PATCH v3 01/10] t1092: refactor 'sparse-index contents' test Derrick Stolee via GitGitGadget
                       ` (9 more replies)
  10 siblings, 10 replies; 55+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-05-23 13:48 UTC (permalink / raw)
  To: git; +Cc: gitster, vdye, shaoxuan.yuan02, Derrick Stolee

This series is based on ds/sparse-colon-path.

This series integrates the 'git sparse-checkout' builtin with the sparse
index. This is the last integration that we fast-tracked into the
microsoft/git fork. After this, we have no work in flight that would
conflict with a Google Summer of Code project in this area.

The tricky part about the sparse-checkout builtin is that we actually do
need to expand the index when growing the sparse-checkout boundary. The
trick is to expand it only as far as we need it, and then ensure that we
collapse any removed directories before the command completes.

To do this, we introduce a concept of a "partially expanded" index. In fact,
we break the boolean sparse_index member into an enum with three states:

 * INDEX_EXPANDED (0): No sparse directories exist.

 * INDEX_COLLAPSED: Sparse directories may exist. Files outside the
   sparse-checkout cone are reduced to sparse directory entries whenever
   possible.

 * INDEX_PARTIALLY_SPARSE: Sparse directories may exist. Some file entries
   outside the sparse-checkout cone may exist. Running convert_to_sparse()
   may further reduce those files to sparse directory entries.

Most of the patches in this series focus on introducing this enum and
carefully converting previous uses of the boolean to use the enum. Some
additional work is required to refactor ensure_full_index() into a new
expand_index() method, as they are doing essentially the same thing, but
with different scopes.

The result is improved performance on the sparse-checkout builtin as
demonstrated in a new p2000-sparse-operations.sh performance test:


Test HEAD~1 HEAD
================

2000.24: git sparse-checkout ... (sparse-v3) 2.14(1.55+0.58) 1.57(1.03+0.53)
-26.6% 2000.25: git sparse-checkout ... (sparse-v4) 2.20(1.62+0.57)
1.58(0.98+0.59) -28.2%

The improvement here is less dramatic because the operation is dominated by
writing and deleting a lot of files in the worktree. A repeated no-op
operation such as git sparse-checkout set $SPARSE_CONE would show a greater
improvement, but is less interesting since it could gain that improvement
without satisfying the "hard" parts of this builtin.

I specifically chose how to update the tests in t1092 and p2000 to avoid
conflicts with Victoria's 'git stash' work.


Updates in v3
=============

 * expand_to_pattern_list() is renamed to expand_index().

 * The enum values are renamed to be less ambiguous.

 * Use a local variable to store the trace2 region label.

 * A tail-recursive method is converted to an iterative one.

 * Comments are added.

 * WRITE_MISSING_TREE_OK flags are removed as unnecessary.


Updates in v2
=============

 * Typo fixes.
 * Two patches are added to the start to (a) refactor existing sparse index
   content tests, and (b) add new sparse index content tests with additional
   scenarios.
 * Use NOT_MATCHED directly instead of implicitly allowing UNDECIDED when
   matching in cone mode.

Thanks, -Stolee

Derrick Stolee (10):
  t1092: refactor 'sparse-index contents' test
  t1092: stress test 'git sparse-checkout set'
  sparse-index: create expand_index()
  sparse-index: introduce partially-sparse indexes
  cache-tree: implement cache_tree_find_path()
  sparse-checkout: --no-sparse-index needs a full index
  sparse-index: partially expand directories
  sparse-index: complete partial expansion
  p2000: add test for 'git sparse-checkout [add|set]'
  sparse-checkout: integrate with sparse index

 builtin/sparse-checkout.c                |   8 +-
 cache-tree.c                             |  27 +++++
 cache-tree.h                             |   2 +
 cache.h                                  |  33 ++++--
 read-cache.c                             |   6 +-
 sparse-index.c                           | 132 ++++++++++++++++++++---
 sparse-index.h                           |  13 +++
 t/perf/p2000-sparse-operations.sh        |   1 +
 t/t1092-sparse-checkout-compatibility.sh |  95 ++++++++++++----
 unpack-trees.c                           |   4 +
 10 files changed, 275 insertions(+), 46 deletions(-)


base-commit: 124b05b23005437fa5fb91863bde2a8f5840e164
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1208%2Fderrickstolee%2Fsparse-index%2Fsparse-checkout-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1208/derrickstolee/sparse-index/sparse-checkout-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/1208

Range-diff vs v2:

  1:  f2960747ed8 =  1:  f2960747ed8 t1092: refactor 'sparse-index contents' test
  2:  5030eeecf4f =  2:  5030eeecf4f t1092: stress test 'git sparse-checkout set'
  3:  d15338573e5 !  3:  44b0549a288 sparse-index: create expand_to_pattern_list()
     @@ Metadata
      Author: Derrick Stolee <dstolee@microsoft.com>
      
       ## Commit message ##
     -    sparse-index: create expand_to_pattern_list()
     +    sparse-index: create expand_index()
      
          This is the first change in a series to allow modifying the
          sparse-checkout pattern set without expanding a sparse index to a full
     @@ Commit message
      
          We already have logic in ensure_full_index() that expands the index
          entries, so we will use that as our base. Create a new method,
     -    expand_to_pattern_list(), which takes a pattern list, but for now mostly
     -    ignores it. The current implementation is only correct when the pattern
     -    list is NULL as that does the same as ensure_full_index(). In fact,
     -    ensure_full_index() is converted to a shim over
     -    expand_to_pattern_list().
     +    expand_index(), which takes a pattern list, but for now mostly ignores
     +    it. The current implementation is only correct when the pattern list is
     +    NULL as that does the same as ensure_full_index(). In fact,
     +    ensure_full_index() is converted to a shim over expand_index().
      
     -    A future update will actually implement expand_to_pattern_list() to its
     -    full capabilities. For now, it is created and documented.
     +    A future update will actually implement expand_index() to its full
     +    capabilities. For now, it is created and documented.
      
          Signed-off-by: Derrick Stolee <derrickstolee@github.com>
      
     @@ sparse-index.c: static int add_path_to_index(const struct object_id *oid,
       }
       
      -void ensure_full_index(struct index_state *istate)
     -+void expand_to_pattern_list(struct index_state *istate,
     -+			    struct pattern_list *pl)
     ++void expand_index(struct index_state *istate, struct pattern_list *pl)
       {
       	int i;
       	struct index_state *full;
       	struct strbuf base = STRBUF_INIT;
     ++	const char *tr_region;
       
      +	/*
      +	 * If the index is already full, then keep it full. We will convert
     @@ sparse-index.c: static int add_path_to_index(const struct object_id *oid,
      +	 * This is used by test cases, but also helps to differentiate the
      +	 * two cases.
      +	 */
     -+	trace2_region_enter("index",
     -+			    pl ? "expand_to_pattern_list" : "ensure_full_index",
     -+			    istate->repo);
     ++	tr_region = pl ? "expand_index" : "ensure_full_index";
     ++	trace2_region_enter("index", tr_region, istate->repo);
       
       	/* initialize basics of new index */
       	full = xcalloc(1, sizeof(struct index_state));
     @@ sparse-index.c: void ensure_full_index(struct index_state *istate)
       	cache_tree_update(istate, 0);
       
      -	trace2_region_leave("index", "ensure_full_index", istate->repo);
     -+	trace2_region_leave("index",
     -+			    pl ? "expand_to_pattern_list" : "ensure_full_index",
     -+			    istate->repo);
     ++	trace2_region_leave("index", tr_region, istate->repo);
      +}
      +
      +void ensure_full_index(struct index_state *istate)
      +{
     -+	expand_to_pattern_list(istate, NULL);
     ++	expand_index(istate, NULL);
       }
       
       void ensure_correct_sparsity(struct index_state *istate)
     @@ sparse-index.h: void expand_to_path(struct index_state *istate,
      + * If the pattern list is NULL or does not use cone mode patterns, then the
      + * index is expanded to a full index.
      + */
     -+void expand_to_pattern_list(struct index_state *istate,
     -+			      struct pattern_list *pl);
     ++void expand_index(struct index_state *istate, struct pattern_list *pl);
      +
       #endif
  4:  269c206c331 !  4:  8a459d6c67d sparse-index: introduce partially-sparse indexes
     @@ Commit message
          For now, convert the single-bit member 'sparse_index' in 'struct
          index_state' to be a an 'enum sparse_index_mode' with three modes:
      
     -    * COMPLETELY_FULL (0): No sparse directories exist.
     +    * INDEX_EXPANDED (0): No sparse directories exist. This is always the
     +      case for repositories that do not use cone-mode sparse-checkout.
      
     -    * COMPLETELY_SPARSE (1): Sparse directories may exist. Files outside the
     +    * INDEX_COLLAPSED: Sparse directories may exist. Files outside the
            sparse-checkout cone are reduced to sparse directory entries whenever
            possible.
      
     -    * PARTIALLY_SPARSE (2): Sparse directories may exist. Some file entries
     -      outside the sparse-checkout cone may exist. Running
     +    * INDEX_PARTIALLY_SPARSE: Sparse directories may exist. Some file
     +      entries outside the sparse-checkout cone may exist. Running
            convert_to_sparse() may further reduce those files to sparse directory
            entries.
      
          The main reason to store this extra information is to allow
          convert_to_sparse() to short-circuit when the index is already in
     -    COMPLETELY_SPARSE mode but to actually do the necessary work when in
     -    PARTIALLY_SPARSE mode.
     +    INDEX_EXPANDED mode but to actually do the necessary work when in
     +    INDEX_PARTIALLY_SPARSE mode.
      
     -    The PARTIALLY_SPARSE mode will be used in an upcoming change.
     +    The INDEX_PARTIALLY_SPARSE mode will be used in an upcoming change.
      
          Signed-off-by: Derrick Stolee <derrickstolee@github.com>
      
     @@ builtin/sparse-checkout.c: static void clean_tracked_sparse_directories(struct r
       	 * conflicted entries or submodules.
       	 */
      -	if (!r->index->sparse_index) {
     -+	if (r->index->sparse_index == COMPLETELY_FULL) {
     ++	if (r->index->sparse_index == INDEX_EXPANDED) {
       		/*
       		 * If something, such as a merge conflict or other concern,
       		 * prevents us from converting to a sparse index, then do
     @@ cache.h: struct untracked_cache;
       
      +enum sparse_index_mode {
      +	/*
     -+	 * COMPLETELY_FULL: there are no sparse directories
     -+	 * in the index at all.
     ++	 * There are no sparse directories in the index at all.
     ++	 *
     ++	 * Repositories that don't use cone-mode sparse-checkout will
     ++	 * always have their indexes in this mode.
      +	 */
     -+	COMPLETELY_FULL = 0,
     ++	INDEX_EXPANDED = 0,
      +
      +	/*
     -+	 * COLLAPSED: the index has already been collapsed to sparse
     -+	 * directories whereever possible.
     ++	 * The index has already been collapsed to sparse directories
     ++	 * whereever possible.
      +	 */
     -+	COLLAPSED = 1,
     ++	INDEX_COLLAPSED,
      +
      +	/*
     -+	 * PARTIALLY_SPARSE: the sparse directories that exist are
     -+	 * outside the sparse-checkout boundary, but it is possible
     -+	 * that some file entries could collapse to sparse directory
     -+	 * entries.
     ++	 * The sparse directories that exist are outside the
     ++	 * sparse-checkout boundary, but it is possible that some file
     ++	 * entries could collapse to sparse directory entries.
      +	 */
     -+	PARTIALLY_SPARSE = 2,
     ++	INDEX_PARTIALLY_SPARSE,
      +};
      +
       struct index_state {
     @@ read-cache.c: static const char *alternate_index_output;
       {
       	if (S_ISSPARSEDIR(ce->ce_mode))
      -		istate->sparse_index = 1;
     -+		istate->sparse_index = COLLAPSED;
     ++		istate->sparse_index = INDEX_COLLAPSED;
       
       	istate->cache[nr] = ce;
       	add_name_hash(istate, ce);
     @@ read-cache.c: static int read_index_extension(struct index_state *istate,
       	case CACHE_EXT_SPARSE_DIRECTORIES:
       		/* no content, only an indicator */
      -		istate->sparse_index = 1;
     -+		istate->sparse_index = COLLAPSED;
     ++		istate->sparse_index = INDEX_COLLAPSED;
       		break;
       	default:
       		if (*ext < 'A' || 'Z' < *ext)
     @@ read-cache.c: static int do_write_locked_index(struct index_state *istate, struc
       {
       	int ret;
      -	int was_full = !istate->sparse_index;
     -+	int was_full = istate->sparse_index == COMPLETELY_FULL;
     ++	int was_full = istate->sparse_index == INDEX_EXPANDED;
       
       	ret = convert_to_sparse(istate, 0);
       
     @@ sparse-index.c: int convert_to_sparse(struct index_state *istate, int flags)
       	 * cannot be converted to sparse, do not convert.
       	 */
      -	if (istate->sparse_index || !istate->cache_nr ||
     -+	if (istate->sparse_index == COLLAPSED || !istate->cache_nr ||
     ++	if (istate->sparse_index == INDEX_COLLAPSED || !istate->cache_nr ||
       	    !is_sparse_index_allowed(istate, flags))
       		return 0;
       
     @@ sparse-index.c: int convert_to_sparse(struct index_state *istate, int flags)
       	FREE_AND_NULL(istate->fsmonitor_last_update);
       
      -	istate->sparse_index = 1;
     -+	istate->sparse_index = COLLAPSED;
     ++	istate->sparse_index = INDEX_COLLAPSED;
       	trace2_region_leave("index", "convert_to_sparse", istate->repo);
       	return 0;
       }
     -@@ sparse-index.c: void expand_to_pattern_list(struct index_state *istate,
     +@@ sparse-index.c: void expand_index(struct index_state *istate, struct pattern_list *pl)
       	 * If the index is already full, then keep it full. We will convert
       	 * it to a sparse index on write, if possible.
       	 */
      -	if (!istate || !istate->sparse_index)
     -+	if (!istate || istate->sparse_index == COMPLETELY_FULL)
     ++	if (!istate || istate->sparse_index == INDEX_EXPANDED)
       		return;
       
       	/*
  5:  c977001c033 !  5:  9103584ed75 cache-tree: implement cache_tree_find_path()
     @@ Commit message
          path exists.
      
          The implementation is adapted from do_invalidate_path() which does a
     -    similar search but also modifies the nodes it finds along the way.
     +    similar search but also modifies the nodes it finds along the way. The
     +    method could be implemented simply using tail-recursion, but this while
     +    loop does the same thing.
      
          This new method is not currently used, but will be in an upcoming
          change.
      
     +    Helped-by: Junio C Hamano <gitster@pobox.com>
          Signed-off-by: Derrick Stolee <derrickstolee@github.com>
      
       ## cache-tree.c ##
     @@ cache-tree.c: struct cache_tree_sub *cache_tree_sub(struct cache_tree *it, const
      +{
      +	const char *slash;
      +	int namelen;
     -+	struct cache_tree_sub *down;
     ++	struct cache_tree_sub it_sub = {
     ++		.cache_tree = it,
     ++	};
     ++	struct cache_tree_sub *down = &it_sub;
      +
     -+	if (!it)
     -+		return NULL;
     -+	slash = strchrnul(path, '/');
     -+	namelen = slash - path;
     -+	it->entry_count = -1;
     -+	if (!*slash) {
     -+		int pos;
     -+		pos = cache_tree_subtree_pos(it, path, namelen);
     -+		if (0 <= pos)
     -+			return it->down[pos]->cache_tree;
     -+		return NULL;
     ++	while (down) {
     ++		slash = strchrnul(path, '/');
     ++		namelen = slash - path;
     ++		down->cache_tree->entry_count = -1;
     ++		if (!*slash) {
     ++			int pos;
     ++			pos = cache_tree_subtree_pos(down->cache_tree, path, namelen);
     ++			if (0 <= pos)
     ++				return down->cache_tree->down[pos]->cache_tree;
     ++			return NULL;
     ++		}
     ++		down = find_subtree(it, path, namelen, 0);
     ++		path = slash + 1;
      +	}
     -+	down = find_subtree(it, path, namelen, 0);
     -+	if (down)
     -+		return cache_tree_find_path(down->cache_tree, slash + 1);
     ++
      +	return NULL;
      +}
      +
  6:  e42463de0d2 =  6:  75ecb579a9f sparse-checkout: --no-sparse-index needs a full index
  7:  346c56bf256 !  7:  310e59d9f0e sparse-index: partially expand directories
     @@ sparse-index.c: static int add_path_to_index(const struct object_id *oid,
      -	strbuf_addstr(base, path);
      +		/*
      +		 * Have we expanded to a point outside of the sparse-checkout?
     ++		 *
     ++		 * Artificially pad the path name with a slash "/" to
     ++		 * indicate it as a directory, and add an arbitrary file
     ++		 * name ("-") so we can consider base->buf as a file name
     ++		 * to match against the cone-mode patterns.
     ++		 *
     ++		 * If we compared just "path", then we would expand more
     ++		 * than we should. Since every file at root is always
     ++		 * included, we would expand every directory at root at
     ++		 * least one level deep instead of using sparse directory
     ++		 * entries.
      +		 */
      +		strbuf_addstr(base, path);
      +		strbuf_add(base, "/-", 2);
     @@ sparse-index.c: static int add_path_to_index(const struct object_id *oid,
       
       	strbuf_setlen(base, len);
       	return 0;
     -@@ sparse-index.c: void expand_to_pattern_list(struct index_state *istate,
     - 	int i;
     +@@ sparse-index.c: void expand_index(struct index_state *istate, struct pattern_list *pl)
       	struct index_state *full;
       	struct strbuf base = STRBUF_INIT;
     + 	const char *tr_region;
      +	struct modify_index_context ctx;
       
       	/*
       	 * If the index is already full, then keep it full. We will convert
     -@@ sparse-index.c: void expand_to_pattern_list(struct index_state *istate,
     +@@ sparse-index.c: void expand_index(struct index_state *istate, struct pattern_list *pl)
       	full->cache_nr = 0;
       	ALLOC_ARRAY(full->cache, full->cache_alloc);
       
     @@ sparse-index.c: void expand_to_pattern_list(struct index_state *istate,
       	for (i = 0; i < istate->cache_nr; i++) {
       		struct cache_entry *ce = istate->cache[i];
       		struct tree *tree;
     -@@ sparse-index.c: void expand_to_pattern_list(struct index_state *istate,
     +@@ sparse-index.c: void expand_index(struct index_state *istate, struct pattern_list *pl)
       		strbuf_add(&base, ce->name, strlen(ce->name));
       
       		read_tree_at(istate->repo, tree, &base, &ps,
  8:  ed640e3645b !  8:  b3cbfd3f136 sparse-index: complete partial expansion
     @@ Commit message
          Signed-off-by: Derrick Stolee <derrickstolee@github.com>
      
       ## sparse-index.c ##
     -@@ sparse-index.c: void expand_to_pattern_list(struct index_state *istate,
     +@@ sparse-index.c: void expand_index(struct index_state *istate, struct pattern_list *pl)
       	 * continue. A NULL pattern set indicates a full expansion to a
       	 * full index.
       	 */
     @@ sparse-index.c: void expand_to_pattern_list(struct index_state *istate,
      +		 * need to expand to a full index since we cannot satisfy
      +		 * the current request as a sparse index.
      +		 */
     -+		if (cache_tree_update(istate, WRITE_TREE_MISSING_OK))
     ++		if (cache_tree_update(istate, 0))
      +			pl = NULL;
      +	}
       
       	if (!istate->repo)
       		istate->repo = the_repository;
     -@@ sparse-index.c: void expand_to_pattern_list(struct index_state *istate,
     +@@ sparse-index.c: void expand_index(struct index_state *istate, struct pattern_list *pl)
       	full = xcalloc(1, sizeof(struct index_state));
       	memcpy(full, istate, sizeof(struct index_state));
       
     @@ sparse-index.c: void expand_to_pattern_list(struct index_state *istate,
      +	 * are only modifying the list of sparse directories. This hinges
      +	 * on whether we have a non-NULL pattern list.
      +	 */
     -+	full->sparse_index = pl ? PARTIALLY_SPARSE : COMPLETELY_FULL;
     ++	full->sparse_index = pl ? INDEX_PARTIALLY_SPARSE : INDEX_EXPANDED;
      +
       	/* then change the necessary things */
      -	full->sparse_index = 0;
       	full->cache_alloc = (3 * istate->cache_alloc) / 2;
       	full->cache_nr = 0;
       	ALLOC_ARRAY(full->cache, full->cache_alloc);
     -@@ sparse-index.c: void expand_to_pattern_list(struct index_state *istate,
     +@@ sparse-index.c: void expand_index(struct index_state *istate, struct pattern_list *pl)
       		struct cache_entry *ce = istate->cache[i];
       		struct tree *tree;
       		struct pathspec ps;
     @@ sparse-index.c: void expand_to_pattern_list(struct index_state *istate,
       		if (!(ce->ce_flags & CE_SKIP_WORKTREE))
       			warning(_("index entry is a directory, but not sparse (%08x)"),
       				ce->ce_flags);
     -@@ sparse-index.c: void expand_to_pattern_list(struct index_state *istate,
     +@@ sparse-index.c: void expand_index(struct index_state *istate, struct pattern_list *pl)
       	/* Copy back into original index. */
       	memcpy(&istate->name_hash, &full->name_hash, sizeof(full->name_hash));
       	memcpy(&istate->dir_hash, &full->dir_hash, sizeof(full->dir_hash));
      -	istate->sparse_index = 0;
     -+	istate->sparse_index = pl ? PARTIALLY_SPARSE : COMPLETELY_FULL;
     ++	istate->sparse_index = pl ? INDEX_PARTIALLY_SPARSE : INDEX_EXPANDED;
       	free(istate->cache);
       	istate->cache = full->cache;
       	istate->cache_nr = full->cache_nr;
     -@@ sparse-index.c: void expand_to_pattern_list(struct index_state *istate,
     - 
     - 	/* Clear and recompute the cache-tree */
     - 	cache_tree_free(&istate->cache_tree);
     --	cache_tree_update(istate, 0);
     -+	cache_tree_update(istate, WRITE_TREE_MISSING_OK);
     - 
     - 	trace2_region_leave("index",
     - 			    pl ? "expand_to_pattern_list" : "ensure_full_index",
  9:  089ab086f58 =  9:  aa3bdcf705c p2000: add test for 'git sparse-checkout [add|set]'
 10:  ad9ed6973d5 ! 10:  11728619120 sparse-checkout: integrate with sparse index
     @@ unpack-trees.c: enum update_sparsity_result update_sparsity(struct unpack_trees_
       	}
       
      +	/* Expand sparse directories as needed */
     -+	expand_to_pattern_list(o->src_index, o->pl);
     ++	expand_index(o->src_index, o->pl);
      +
       	/* Set NEW_SKIP_WORKTREE on existing entries. */
       	mark_all_ce_unused(o->src_index);

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH v3 01/10] t1092: refactor 'sparse-index contents' test
  2022-05-23 13:48   ` [PATCH v3 00/10] Sparse index: integrate with sparse-checkout Derrick Stolee via GitGitGadget
@ 2022-05-23 13:48     ` Derrick Stolee via GitGitGadget
  2022-05-23 13:48     ` [PATCH v3 02/10] t1092: stress test 'git sparse-checkout set' Derrick Stolee via GitGitGadget
                       ` (8 subsequent siblings)
  9 siblings, 0 replies; 55+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-05-23 13:48 UTC (permalink / raw)
  To: git; +Cc: gitster, vdye, shaoxuan.yuan02, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

Before expanding this test with more involved cases, first extract the
repeated logic into a new test_sparse_checkout_set helper. This helper
checks that 'git sparse-checkout set ...' succeeds and then verifies
that certain directories have sparse directory entries in the sparse
index. It also verifies that the in-cone directories are _not_ sparse
directory entries in the sparse index.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 t/t1092-sparse-checkout-compatibility.sh | 53 ++++++++++++++++--------
 1 file changed, 35 insertions(+), 18 deletions(-)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 93bcfd20bbc..e7c0ae9b953 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -205,36 +205,53 @@ test_sparse_unstaged () {
 	done
 }
 
-test_expect_success 'sparse-index contents' '
-	init_repos &&
-
+# Usage: test_sprase_checkout_set "<c1> ... <cN>" "<s1> ... <sM>"
+# Verifies that "git sparse-checkout set <c1> ... <cN>" succeeds and
+# leaves the sparse index in a state where <s1> ... <sM> are sparse
+# directories (and <c1> ... <cN> are not).
+test_sparse_checkout_set () {
+	CONE_DIRS=$1 &&
+	SPARSE_DIRS=$2 &&
+	git -C sparse-index sparse-checkout set $CONE_DIRS &&
 	git -C sparse-index ls-files --sparse --stage >cache &&
-	for dir in folder1 folder2 x
+
+	# Check that the directories outside of the sparse-checkout cone
+	# have sparse directory entries.
+	for dir in $SPARSE_DIRS
 	do
 		TREE=$(git -C sparse-index rev-parse HEAD:$dir) &&
 		grep "040000 $TREE 0	$dir/" cache \
 			|| return 1
 	done &&
 
-	git -C sparse-index sparse-checkout set folder1 &&
-
-	git -C sparse-index ls-files --sparse --stage >cache &&
-	for dir in deep folder2 x
+	# Check that the directories in the sparse-checkout cone
+	# are not sparse directory entries.
+	for dir in $CONE_DIRS
 	do
 		TREE=$(git -C sparse-index rev-parse HEAD:$dir) &&
-		grep "040000 $TREE 0	$dir/" cache \
+		! grep "040000 $TREE 0	$dir/" cache \
 			|| return 1
-	done &&
+	done
+}
 
-	git -C sparse-index sparse-checkout set deep/deeper1 &&
+test_expect_success 'sparse-index contents' '
+	init_repos &&
 
-	git -C sparse-index ls-files --sparse --stage >cache &&
-	for dir in deep/deeper2 folder1 folder2 x
-	do
-		TREE=$(git -C sparse-index rev-parse HEAD:$dir) &&
-		grep "040000 $TREE 0	$dir/" cache \
-			|| return 1
-	done &&
+	# Remove deep, add three other directories.
+	test_sparse_checkout_set \
+		"folder1 folder2 x" \
+		"before deep" &&
+
+	# Remove folder1, add deep
+	test_sparse_checkout_set \
+		"deep folder2 x" \
+		"before folder1" &&
+
+	# Replace deep with deep/deeper2 (dropping deep/deeper1)
+	# Add folder1
+	test_sparse_checkout_set \
+		"deep/deeper2 folder1 folder2 x" \
+		"before deep/deeper1" &&
 
 	# Disabling the sparse-index replaces tree entries with full ones
 	git -C sparse-index sparse-checkout init --no-sparse-index &&
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v3 02/10] t1092: stress test 'git sparse-checkout set'
  2022-05-23 13:48   ` [PATCH v3 00/10] Sparse index: integrate with sparse-checkout Derrick Stolee via GitGitGadget
  2022-05-23 13:48     ` [PATCH v3 01/10] t1092: refactor 'sparse-index contents' test Derrick Stolee via GitGitGadget
@ 2022-05-23 13:48     ` Derrick Stolee via GitGitGadget
  2022-05-23 13:48     ` [PATCH v3 03/10] sparse-index: create expand_index() Derrick Stolee via GitGitGadget
                       ` (7 subsequent siblings)
  9 siblings, 0 replies; 55+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-05-23 13:48 UTC (permalink / raw)
  To: git; +Cc: gitster, vdye, shaoxuan.yuan02, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

The 'sparse-index contents' test checks that the sparse index has the
correct set of sparse directories in the index after modifying the cone
mode patterns using 'git sparse-checkout set'. Add to the coverage here
by adding more complicated scenarios that were not previously tested.

In order to check paths that do not exist at HEAD, we need to modify the
test_sparse_checkout_set helper slightly:

1. Add the --skip-checks argument to the 'set' command to avoid failures
   when passing paths that do not exist at HEAD.

2. When looking for the non-existence of sparse directories for the
   paths in $CONE_DIRS, allow the rev-list command to fail because the
   path does not exist at HEAD.

This allows us to add some interesting test cases.

Helped-by: Victoria Dye <vdye@github.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 t/t1092-sparse-checkout-compatibility.sh | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index e7c0ae9b953..785820f9fd5 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -212,7 +212,7 @@ test_sparse_unstaged () {
 test_sparse_checkout_set () {
 	CONE_DIRS=$1 &&
 	SPARSE_DIRS=$2 &&
-	git -C sparse-index sparse-checkout set $CONE_DIRS &&
+	git -C sparse-index sparse-checkout set --skip-checks $CONE_DIRS &&
 	git -C sparse-index ls-files --sparse --stage >cache &&
 
 	# Check that the directories outside of the sparse-checkout cone
@@ -228,7 +228,9 @@ test_sparse_checkout_set () {
 	# are not sparse directory entries.
 	for dir in $CONE_DIRS
 	do
-		TREE=$(git -C sparse-index rev-parse HEAD:$dir) &&
+		# Allow TREE to not exist because
+		# $dir does not exist at HEAD.
+		TREE=$(git -C sparse-index rev-parse HEAD:$dir) ||
 		! grep "040000 $TREE 0	$dir/" cache \
 			|| return 1
 	done
@@ -253,6 +255,19 @@ test_expect_success 'sparse-index contents' '
 		"deep/deeper2 folder1 folder2 x" \
 		"before deep/deeper1" &&
 
+	# Replace deep/deeper2 with deep/deeper1
+	# Replace folder1 with folder1/0/0
+	# Replace folder2 with non-existent folder2/2/3
+	# Add non-existent "bogus"
+	test_sparse_checkout_set \
+		"bogus deep/deeper1 folder1/0/0 folder2/2/3 x" \
+		"before deep/deeper2 folder2/0" &&
+
+	# Drop down to only files at root
+	test_sparse_checkout_set \
+		"" \
+		"before deep folder1 folder2 x" &&
+
 	# Disabling the sparse-index replaces tree entries with full ones
 	git -C sparse-index sparse-checkout init --no-sparse-index &&
 	test_sparse_match git ls-files --stage --sparse
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v3 03/10] sparse-index: create expand_index()
  2022-05-23 13:48   ` [PATCH v3 00/10] Sparse index: integrate with sparse-checkout Derrick Stolee via GitGitGadget
  2022-05-23 13:48     ` [PATCH v3 01/10] t1092: refactor 'sparse-index contents' test Derrick Stolee via GitGitGadget
  2022-05-23 13:48     ` [PATCH v3 02/10] t1092: stress test 'git sparse-checkout set' Derrick Stolee via GitGitGadget
@ 2022-05-23 13:48     ` Derrick Stolee via GitGitGadget
  2022-05-23 13:48     ` [PATCH v3 04/10] sparse-index: introduce partially-sparse indexes Derrick Stolee via GitGitGadget
                       ` (6 subsequent siblings)
  9 siblings, 0 replies; 55+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-05-23 13:48 UTC (permalink / raw)
  To: git; +Cc: gitster, vdye, shaoxuan.yuan02, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

This is the first change in a series to allow modifying the
sparse-checkout pattern set without expanding a sparse index to a full
one in the process. Here, we focus on the problem of expanding the
pattern set through a command like 'git sparse-checkout add <path>'
which needs to create new index entries for the paths now being written
to the worktree.

To achieve this, we need to be able to replace sparse directory entries
with their contained files and subdirectories. Once this is complete,
other code paths can discover those cache entries and write the
corresponding files to disk before committing the index.

We already have logic in ensure_full_index() that expands the index
entries, so we will use that as our base. Create a new method,
expand_index(), which takes a pattern list, but for now mostly ignores
it. The current implementation is only correct when the pattern list is
NULL as that does the same as ensure_full_index(). In fact,
ensure_full_index() is converted to a shim over expand_index().

A future update will actually implement expand_index() to its full
capabilities. For now, it is created and documented.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 sparse-index.c | 32 +++++++++++++++++++++++++++++---
 sparse-index.h | 13 +++++++++++++
 2 files changed, 42 insertions(+), 3 deletions(-)

diff --git a/sparse-index.c b/sparse-index.c
index 8636af72de5..a11b5cf1314 100644
--- a/sparse-index.c
+++ b/sparse-index.c
@@ -248,19 +248,40 @@ static int add_path_to_index(const struct object_id *oid,
 	return 0;
 }
 
-void ensure_full_index(struct index_state *istate)
+void expand_index(struct index_state *istate, struct pattern_list *pl)
 {
 	int i;
 	struct index_state *full;
 	struct strbuf base = STRBUF_INIT;
+	const char *tr_region;
 
+	/*
+	 * If the index is already full, then keep it full. We will convert
+	 * it to a sparse index on write, if possible.
+	 */
 	if (!istate || !istate->sparse_index)
 		return;
 
+	/*
+	 * If our index is sparse, but our new pattern set does not use
+	 * cone mode patterns, then we need to expand the index before we
+	 * continue. A NULL pattern set indicates a full expansion to a
+	 * full index.
+	 */
+	if (pl && !pl->use_cone_patterns)
+		pl = NULL;
+
 	if (!istate->repo)
 		istate->repo = the_repository;
 
-	trace2_region_enter("index", "ensure_full_index", istate->repo);
+	/*
+	 * A NULL pattern set indicates we are expanding a full index, so
+	 * we use a special region name that indicates the full expansion.
+	 * This is used by test cases, but also helps to differentiate the
+	 * two cases.
+	 */
+	tr_region = pl ? "expand_index" : "ensure_full_index";
+	trace2_region_enter("index", tr_region, istate->repo);
 
 	/* initialize basics of new index */
 	full = xcalloc(1, sizeof(struct index_state));
@@ -322,7 +343,12 @@ void ensure_full_index(struct index_state *istate)
 	cache_tree_free(&istate->cache_tree);
 	cache_tree_update(istate, 0);
 
-	trace2_region_leave("index", "ensure_full_index", istate->repo);
+	trace2_region_leave("index", tr_region, istate->repo);
+}
+
+void ensure_full_index(struct index_state *istate)
+{
+	expand_index(istate, NULL);
 }
 
 void ensure_correct_sparsity(struct index_state *istate)
diff --git a/sparse-index.h b/sparse-index.h
index 633d4fb7e31..b1f2cdbb164 100644
--- a/sparse-index.h
+++ b/sparse-index.h
@@ -23,4 +23,17 @@ void expand_to_path(struct index_state *istate,
 struct repository;
 int set_sparse_index_config(struct repository *repo, int enable);
 
+struct pattern_list;
+
+/**
+ * Scan the given index and compare its entries to the given pattern list.
+ * If the index is sparse and the pattern list uses cone mode patterns,
+ * then modify the index to contain the all of the file entries within that
+ * new pattern list. This expands sparse directories only as far as needed.
+ *
+ * If the pattern list is NULL or does not use cone mode patterns, then the
+ * index is expanded to a full index.
+ */
+void expand_index(struct index_state *istate, struct pattern_list *pl);
+
 #endif
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v3 04/10] sparse-index: introduce partially-sparse indexes
  2022-05-23 13:48   ` [PATCH v3 00/10] Sparse index: integrate with sparse-checkout Derrick Stolee via GitGitGadget
                       ` (2 preceding siblings ...)
  2022-05-23 13:48     ` [PATCH v3 03/10] sparse-index: create expand_index() Derrick Stolee via GitGitGadget
@ 2022-05-23 13:48     ` Derrick Stolee via GitGitGadget
  2022-05-23 13:48     ` [PATCH v3 05/10] cache-tree: implement cache_tree_find_path() Derrick Stolee via GitGitGadget
                       ` (5 subsequent siblings)
  9 siblings, 0 replies; 55+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-05-23 13:48 UTC (permalink / raw)
  To: git; +Cc: gitster, vdye, shaoxuan.yuan02, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

A future change will present a temporary, in-memory mode where the index
can both contain sparse directory entries but also not be completely
collapsed to the smallest possible sparse directories. This will be
necessary for modifying the sparse-checkout definition while using a
sparse index.

For now, convert the single-bit member 'sparse_index' in 'struct
index_state' to be a an 'enum sparse_index_mode' with three modes:

* INDEX_EXPANDED (0): No sparse directories exist. This is always the
  case for repositories that do not use cone-mode sparse-checkout.

* INDEX_COLLAPSED: Sparse directories may exist. Files outside the
  sparse-checkout cone are reduced to sparse directory entries whenever
  possible.

* INDEX_PARTIALLY_SPARSE: Sparse directories may exist. Some file
  entries outside the sparse-checkout cone may exist. Running
  convert_to_sparse() may further reduce those files to sparse directory
  entries.

The main reason to store this extra information is to allow
convert_to_sparse() to short-circuit when the index is already in
INDEX_EXPANDED mode but to actually do the necessary work when in
INDEX_PARTIALLY_SPARSE mode.

The INDEX_PARTIALLY_SPARSE mode will be used in an upcoming change.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 builtin/sparse-checkout.c |  2 +-
 cache.h                   | 33 +++++++++++++++++++++++++--------
 read-cache.c              |  6 +++---
 sparse-index.c            |  6 +++---
 4 files changed, 32 insertions(+), 15 deletions(-)

diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c
index 0217d44c5b1..5b054400bf3 100644
--- a/builtin/sparse-checkout.c
+++ b/builtin/sparse-checkout.c
@@ -128,7 +128,7 @@ static void clean_tracked_sparse_directories(struct repository *r)
 	 * sparse index will not delete directories that contain
 	 * conflicted entries or submodules.
 	 */
-	if (!r->index->sparse_index) {
+	if (r->index->sparse_index == INDEX_EXPANDED) {
 		/*
 		 * If something, such as a merge conflict or other concern,
 		 * prevents us from converting to a sparse index, then do
diff --git a/cache.h b/cache.h
index 6226f6a8a53..e171ce882a2 100644
--- a/cache.h
+++ b/cache.h
@@ -310,6 +310,29 @@ struct untracked_cache;
 struct progress;
 struct pattern_list;
 
+enum sparse_index_mode {
+	/*
+	 * There are no sparse directories in the index at all.
+	 *
+	 * Repositories that don't use cone-mode sparse-checkout will
+	 * always have their indexes in this mode.
+	 */
+	INDEX_EXPANDED = 0,
+
+	/*
+	 * The index has already been collapsed to sparse directories
+	 * whereever possible.
+	 */
+	INDEX_COLLAPSED,
+
+	/*
+	 * The sparse directories that exist are outside the
+	 * sparse-checkout boundary, but it is possible that some file
+	 * entries could collapse to sparse directory entries.
+	 */
+	INDEX_PARTIALLY_SPARSE,
+};
+
 struct index_state {
 	struct cache_entry **cache;
 	unsigned int version;
@@ -323,14 +346,8 @@ struct index_state {
 		 drop_cache_tree : 1,
 		 updated_workdir : 1,
 		 updated_skipworktree : 1,
-		 fsmonitor_has_run_once : 1,
-
-		 /*
-		  * sparse_index == 1 when sparse-directory
-		  * entries exist. Requires sparse-checkout
-		  * in cone mode.
-		  */
-		 sparse_index : 1;
+		 fsmonitor_has_run_once : 1;
+	enum sparse_index_mode sparse_index;
 	struct hashmap name_hash;
 	struct hashmap dir_hash;
 	struct object_id oid;
diff --git a/read-cache.c b/read-cache.c
index 4df97e185e9..b236042eee1 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -112,7 +112,7 @@ static const char *alternate_index_output;
 static void set_index_entry(struct index_state *istate, int nr, struct cache_entry *ce)
 {
 	if (S_ISSPARSEDIR(ce->ce_mode))
-		istate->sparse_index = 1;
+		istate->sparse_index = INDEX_COLLAPSED;
 
 	istate->cache[nr] = ce;
 	add_name_hash(istate, ce);
@@ -1856,7 +1856,7 @@ static int read_index_extension(struct index_state *istate,
 		break;
 	case CACHE_EXT_SPARSE_DIRECTORIES:
 		/* no content, only an indicator */
-		istate->sparse_index = 1;
+		istate->sparse_index = INDEX_COLLAPSED;
 		break;
 	default:
 		if (*ext < 'A' || 'Z' < *ext)
@@ -3149,7 +3149,7 @@ static int do_write_locked_index(struct index_state *istate, struct lock_file *l
 				 unsigned flags)
 {
 	int ret;
-	int was_full = !istate->sparse_index;
+	int was_full = istate->sparse_index == INDEX_EXPANDED;
 
 	ret = convert_to_sparse(istate, 0);
 
diff --git a/sparse-index.c b/sparse-index.c
index a11b5cf1314..7848910c154 100644
--- a/sparse-index.c
+++ b/sparse-index.c
@@ -173,7 +173,7 @@ int convert_to_sparse(struct index_state *istate, int flags)
 	 * If the index is already sparse, empty, or otherwise
 	 * cannot be converted to sparse, do not convert.
 	 */
-	if (istate->sparse_index || !istate->cache_nr ||
+	if (istate->sparse_index == INDEX_COLLAPSED || !istate->cache_nr ||
 	    !is_sparse_index_allowed(istate, flags))
 		return 0;
 
@@ -214,7 +214,7 @@ int convert_to_sparse(struct index_state *istate, int flags)
 	FREE_AND_NULL(istate->fsmonitor_dirty);
 	FREE_AND_NULL(istate->fsmonitor_last_update);
 
-	istate->sparse_index = 1;
+	istate->sparse_index = INDEX_COLLAPSED;
 	trace2_region_leave("index", "convert_to_sparse", istate->repo);
 	return 0;
 }
@@ -259,7 +259,7 @@ void expand_index(struct index_state *istate, struct pattern_list *pl)
 	 * If the index is already full, then keep it full. We will convert
 	 * it to a sparse index on write, if possible.
 	 */
-	if (!istate || !istate->sparse_index)
+	if (!istate || istate->sparse_index == INDEX_EXPANDED)
 		return;
 
 	/*
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v3 05/10] cache-tree: implement cache_tree_find_path()
  2022-05-23 13:48   ` [PATCH v3 00/10] Sparse index: integrate with sparse-checkout Derrick Stolee via GitGitGadget
                       ` (3 preceding siblings ...)
  2022-05-23 13:48     ` [PATCH v3 04/10] sparse-index: introduce partially-sparse indexes Derrick Stolee via GitGitGadget
@ 2022-05-23 13:48     ` Derrick Stolee via GitGitGadget
  2022-05-23 13:48     ` [PATCH v3 06/10] sparse-checkout: --no-sparse-index needs a full index Derrick Stolee via GitGitGadget
                       ` (4 subsequent siblings)
  9 siblings, 0 replies; 55+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-05-23 13:48 UTC (permalink / raw)
  To: git; +Cc: gitster, vdye, shaoxuan.yuan02, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Given a 'struct cache_tree', it may be beneficial to navigate directly
to a node within that corresponds to a given path name. Create
cache_tree_find_path() for this function. It returns NULL when no such
path exists.

The implementation is adapted from do_invalidate_path() which does a
similar search but also modifies the nodes it finds along the way. The
method could be implemented simply using tail-recursion, but this while
loop does the same thing.

This new method is not currently used, but will be in an upcoming
change.

Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 cache-tree.c | 27 +++++++++++++++++++++++++++
 cache-tree.h |  2 ++
 2 files changed, 29 insertions(+)

diff --git a/cache-tree.c b/cache-tree.c
index 6752f69d515..f42db920d10 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -100,6 +100,33 @@ struct cache_tree_sub *cache_tree_sub(struct cache_tree *it, const char *path)
 	return find_subtree(it, path, pathlen, 1);
 }
 
+struct cache_tree *cache_tree_find_path(struct cache_tree *it, const char *path)
+{
+	const char *slash;
+	int namelen;
+	struct cache_tree_sub it_sub = {
+		.cache_tree = it,
+	};
+	struct cache_tree_sub *down = &it_sub;
+
+	while (down) {
+		slash = strchrnul(path, '/');
+		namelen = slash - path;
+		down->cache_tree->entry_count = -1;
+		if (!*slash) {
+			int pos;
+			pos = cache_tree_subtree_pos(down->cache_tree, path, namelen);
+			if (0 <= pos)
+				return down->cache_tree->down[pos]->cache_tree;
+			return NULL;
+		}
+		down = find_subtree(it, path, namelen, 0);
+		path = slash + 1;
+	}
+
+	return NULL;
+}
+
 static int do_invalidate_path(struct cache_tree *it, const char *path)
 {
 	/* a/b/c
diff --git a/cache-tree.h b/cache-tree.h
index 8efeccebfc9..f75f8e74dcd 100644
--- a/cache-tree.h
+++ b/cache-tree.h
@@ -29,6 +29,8 @@ struct cache_tree_sub *cache_tree_sub(struct cache_tree *, const char *);
 
 int cache_tree_subtree_pos(struct cache_tree *it, const char *path, int pathlen);
 
+struct cache_tree *cache_tree_find_path(struct cache_tree *it, const char *path);
+
 void cache_tree_write(struct strbuf *, struct cache_tree *root);
 struct cache_tree *cache_tree_read(const char *buffer, unsigned long size);
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v3 06/10] sparse-checkout: --no-sparse-index needs a full index
  2022-05-23 13:48   ` [PATCH v3 00/10] Sparse index: integrate with sparse-checkout Derrick Stolee via GitGitGadget
                       ` (4 preceding siblings ...)
  2022-05-23 13:48     ` [PATCH v3 05/10] cache-tree: implement cache_tree_find_path() Derrick Stolee via GitGitGadget
@ 2022-05-23 13:48     ` Derrick Stolee via GitGitGadget
  2022-05-23 13:48     ` [PATCH v3 07/10] sparse-index: partially expand directories Derrick Stolee via GitGitGadget
                       ` (3 subsequent siblings)
  9 siblings, 0 replies; 55+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-05-23 13:48 UTC (permalink / raw)
  To: git; +Cc: gitster, vdye, shaoxuan.yuan02, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

When the --no-sparse-index option is supplied, the sparse-checkout
builtin should explicitly ask to expand a sparse index to a full one.
This is currently done implicitly due to the command_requires_full_index
protection, but that will be removed in an upcoming change.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 builtin/sparse-checkout.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c
index 5b054400bf3..1db51c3fd72 100644
--- a/builtin/sparse-checkout.c
+++ b/builtin/sparse-checkout.c
@@ -413,6 +413,9 @@ static int update_modes(int *cone_mode, int *sparse_index)
 		/* force an index rewrite */
 		repo_read_index(the_repository);
 		the_repository->index->updated_workdir = 1;
+
+		if (!*sparse_index)
+			ensure_full_index(the_repository->index);
 	}
 
 	return 0;
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v3 07/10] sparse-index: partially expand directories
  2022-05-23 13:48   ` [PATCH v3 00/10] Sparse index: integrate with sparse-checkout Derrick Stolee via GitGitGadget
                       ` (5 preceding siblings ...)
  2022-05-23 13:48     ` [PATCH v3 06/10] sparse-checkout: --no-sparse-index needs a full index Derrick Stolee via GitGitGadget
@ 2022-05-23 13:48     ` Derrick Stolee via GitGitGadget
  2022-05-23 13:48     ` [PATCH v3 08/10] sparse-index: complete partial expansion Derrick Stolee via GitGitGadget
                       ` (2 subsequent siblings)
  9 siblings, 0 replies; 55+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-05-23 13:48 UTC (permalink / raw)
  To: git; +Cc: gitster, vdye, shaoxuan.yuan02, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The expand_to_pattern_list() method expands sparse directory entries
to their list of contained files when either the pattern list is NULL or
the directory is contained in the new pattern list's cone mode patterns.

It is possible that the pattern list has a recursive match with a
directory 'A/B/C/' and so an existing sparse directory 'A/B/' would need
to be expanded. If there exists a directory 'A/B/D/', then that
directory should not be expanded and instead we can create a sparse
directory.

To implement this, we plug into the add_path_to_index() callback for the
call to read_tree_at(). Since we now need access to both the index we
are writing and the pattern list we are comparing, create a 'struct
modify_index_context' to use as a data transfer object. It is important
that we use the given pattern list since we will use this pattern list
to change the sparse-checkout patterns and cannot use
istate->sparse_checkout_patterns.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 sparse-index.c | 57 +++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 50 insertions(+), 7 deletions(-)

diff --git a/sparse-index.c b/sparse-index.c
index 7848910c154..a881f851810 100644
--- a/sparse-index.c
+++ b/sparse-index.c
@@ -9,6 +9,11 @@
 #include "dir.h"
 #include "fsmonitor.h"
 
+struct modify_index_context {
+	struct index_state *write;
+	struct pattern_list *pl;
+};
+
 static struct cache_entry *construct_sparse_dir_entry(
 				struct index_state *istate,
 				const char *sparse_dir,
@@ -231,18 +236,52 @@ static int add_path_to_index(const struct object_id *oid,
 			     struct strbuf *base, const char *path,
 			     unsigned int mode, void *context)
 {
-	struct index_state *istate = (struct index_state *)context;
+	struct modify_index_context *ctx = (struct modify_index_context *)context;
 	struct cache_entry *ce;
 	size_t len = base->len;
 
-	if (S_ISDIR(mode))
-		return READ_TREE_RECURSIVE;
+	if (S_ISDIR(mode)) {
+		int dtype;
+		size_t baselen = base->len;
+		if (!ctx->pl)
+			return READ_TREE_RECURSIVE;
 
-	strbuf_addstr(base, path);
+		/*
+		 * Have we expanded to a point outside of the sparse-checkout?
+		 *
+		 * Artificially pad the path name with a slash "/" to
+		 * indicate it as a directory, and add an arbitrary file
+		 * name ("-") so we can consider base->buf as a file name
+		 * to match against the cone-mode patterns.
+		 *
+		 * If we compared just "path", then we would expand more
+		 * than we should. Since every file at root is always
+		 * included, we would expand every directory at root at
+		 * least one level deep instead of using sparse directory
+		 * entries.
+		 */
+		strbuf_addstr(base, path);
+		strbuf_add(base, "/-", 2);
+
+		if (path_matches_pattern_list(base->buf, base->len,
+					      NULL, &dtype,
+					      ctx->pl, ctx->write)) {
+			strbuf_setlen(base, baselen);
+			return READ_TREE_RECURSIVE;
+		}
 
-	ce = make_cache_entry(istate, mode, oid, base->buf, 0, 0);
+		/*
+		 * The path "{base}{path}/" is a sparse directory. Create the correct
+		 * name for inserting the entry into the index.
+		 */
+		strbuf_setlen(base, base->len - 1);
+	} else {
+		strbuf_addstr(base, path);
+	}
+
+	ce = make_cache_entry(ctx->write, mode, oid, base->buf, 0, 0);
 	ce->ce_flags |= CE_SKIP_WORKTREE | CE_EXTENDED;
-	set_index_entry(istate, istate->cache_nr++, ce);
+	set_index_entry(ctx->write, ctx->write->cache_nr++, ce);
 
 	strbuf_setlen(base, len);
 	return 0;
@@ -254,6 +293,7 @@ void expand_index(struct index_state *istate, struct pattern_list *pl)
 	struct index_state *full;
 	struct strbuf base = STRBUF_INIT;
 	const char *tr_region;
+	struct modify_index_context ctx;
 
 	/*
 	 * If the index is already full, then keep it full. We will convert
@@ -293,6 +333,9 @@ void expand_index(struct index_state *istate, struct pattern_list *pl)
 	full->cache_nr = 0;
 	ALLOC_ARRAY(full->cache, full->cache_alloc);
 
+	ctx.write = full;
+	ctx.pl = pl;
+
 	for (i = 0; i < istate->cache_nr; i++) {
 		struct cache_entry *ce = istate->cache[i];
 		struct tree *tree;
@@ -318,7 +361,7 @@ void expand_index(struct index_state *istate, struct pattern_list *pl)
 		strbuf_add(&base, ce->name, strlen(ce->name));
 
 		read_tree_at(istate->repo, tree, &base, &ps,
-			     add_path_to_index, full);
+			     add_path_to_index, &ctx);
 
 		/* free directory entries. full entries are re-used */
 		discard_cache_entry(ce);
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v3 08/10] sparse-index: complete partial expansion
  2022-05-23 13:48   ` [PATCH v3 00/10] Sparse index: integrate with sparse-checkout Derrick Stolee via GitGitGadget
                       ` (6 preceding siblings ...)
  2022-05-23 13:48     ` [PATCH v3 07/10] sparse-index: partially expand directories Derrick Stolee via GitGitGadget
@ 2022-05-23 13:48     ` Derrick Stolee via GitGitGadget
  2022-05-23 13:48     ` [PATCH v3 09/10] p2000: add test for 'git sparse-checkout [add|set]' Derrick Stolee via GitGitGadget
  2022-05-23 13:48     ` [PATCH v3 10/10] sparse-checkout: integrate with sparse index Derrick Stolee via GitGitGadget
  9 siblings, 0 replies; 55+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-05-23 13:48 UTC (permalink / raw)
  To: git; +Cc: gitster, vdye, shaoxuan.yuan02, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

To complete the implementation of expand_to_pattern_list(), we need to
detect when a sparse directory entry should remain sparse. This avoids a
full expansion, so we now need to use the PARTIALLY_SPARSE mode to
indicate this state.

There still are no callers to this method, but we will add one in the
next change.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 sparse-index.c | 39 ++++++++++++++++++++++++++++++++++++---
 1 file changed, 36 insertions(+), 3 deletions(-)

diff --git a/sparse-index.c b/sparse-index.c
index a881f851810..2c0a18380f1 100644
--- a/sparse-index.c
+++ b/sparse-index.c
@@ -308,8 +308,24 @@ void expand_index(struct index_state *istate, struct pattern_list *pl)
 	 * continue. A NULL pattern set indicates a full expansion to a
 	 * full index.
 	 */
-	if (pl && !pl->use_cone_patterns)
+	if (pl && !pl->use_cone_patterns) {
 		pl = NULL;
+	} else {
+		/*
+		 * We might contract file entries into sparse-directory
+		 * entries, and for that we will need the cache tree to
+		 * be recomputed.
+		 */
+		cache_tree_free(&istate->cache_tree);
+
+		/*
+		 * If there is a problem creating the cache tree, then we
+		 * need to expand to a full index since we cannot satisfy
+		 * the current request as a sparse index.
+		 */
+		if (cache_tree_update(istate, 0))
+			pl = NULL;
+	}
 
 	if (!istate->repo)
 		istate->repo = the_repository;
@@ -327,8 +343,14 @@ void expand_index(struct index_state *istate, struct pattern_list *pl)
 	full = xcalloc(1, sizeof(struct index_state));
 	memcpy(full, istate, sizeof(struct index_state));
 
+	/*
+	 * This slightly-misnamed 'full' index might still be sparse if we
+	 * are only modifying the list of sparse directories. This hinges
+	 * on whether we have a non-NULL pattern list.
+	 */
+	full->sparse_index = pl ? INDEX_PARTIALLY_SPARSE : INDEX_EXPANDED;
+
 	/* then change the necessary things */
-	full->sparse_index = 0;
 	full->cache_alloc = (3 * istate->cache_alloc) / 2;
 	full->cache_nr = 0;
 	ALLOC_ARRAY(full->cache, full->cache_alloc);
@@ -340,11 +362,22 @@ void expand_index(struct index_state *istate, struct pattern_list *pl)
 		struct cache_entry *ce = istate->cache[i];
 		struct tree *tree;
 		struct pathspec ps;
+		int dtype;
 
 		if (!S_ISSPARSEDIR(ce->ce_mode)) {
 			set_index_entry(full, full->cache_nr++, ce);
 			continue;
 		}
+
+		/* We now have a sparse directory entry. Should we expand? */
+		if (pl &&
+		    path_matches_pattern_list(ce->name, ce->ce_namelen,
+					      NULL, &dtype,
+					      pl, istate) == NOT_MATCHED) {
+			set_index_entry(full, full->cache_nr++, ce);
+			continue;
+		}
+
 		if (!(ce->ce_flags & CE_SKIP_WORKTREE))
 			warning(_("index entry is a directory, but not sparse (%08x)"),
 				ce->ce_flags);
@@ -370,7 +403,7 @@ void expand_index(struct index_state *istate, struct pattern_list *pl)
 	/* Copy back into original index. */
 	memcpy(&istate->name_hash, &full->name_hash, sizeof(full->name_hash));
 	memcpy(&istate->dir_hash, &full->dir_hash, sizeof(full->dir_hash));
-	istate->sparse_index = 0;
+	istate->sparse_index = pl ? INDEX_PARTIALLY_SPARSE : INDEX_EXPANDED;
 	free(istate->cache);
 	istate->cache = full->cache;
 	istate->cache_nr = full->cache_nr;
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v3 09/10] p2000: add test for 'git sparse-checkout [add|set]'
  2022-05-23 13:48   ` [PATCH v3 00/10] Sparse index: integrate with sparse-checkout Derrick Stolee via GitGitGadget
                       ` (7 preceding siblings ...)
  2022-05-23 13:48     ` [PATCH v3 08/10] sparse-index: complete partial expansion Derrick Stolee via GitGitGadget
@ 2022-05-23 13:48     ` Derrick Stolee via GitGitGadget
  2022-05-23 13:48     ` [PATCH v3 10/10] sparse-checkout: integrate with sparse index Derrick Stolee via GitGitGadget
  9 siblings, 0 replies; 55+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-05-23 13:48 UTC (permalink / raw)
  To: git; +Cc: gitster, vdye, shaoxuan.yuan02, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

The sparse-checkout builtin is almost completely integrated with the
sparse index, allowing the sparse-checkout boundary to be modified
without expanding a sparse index to a full one. Add a test to
p2000-sparse-operations.sh that adds a directory to the sparse-checkout
definition, then removes it. Using both operations is important to
ensure that the operation is doing the same work in each repetition as
well as leaving the test repo in a good state for later tests.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 t/perf/p2000-sparse-operations.sh | 1 +
 1 file changed, 1 insertion(+)

diff --git a/t/perf/p2000-sparse-operations.sh b/t/perf/p2000-sparse-operations.sh
index 382716cfca9..ce5cfac5714 100755
--- a/t/perf/p2000-sparse-operations.sh
+++ b/t/perf/p2000-sparse-operations.sh
@@ -110,6 +110,7 @@ test_perf_on_all git add -A
 test_perf_on_all git add .
 test_perf_on_all git commit -a -m A
 test_perf_on_all git checkout -f -
+test_perf_on_all "git sparse-checkout add f2/f3/f1 && git sparse-checkout set $SPARSE_CONE"
 test_perf_on_all git reset
 test_perf_on_all git reset --hard
 test_perf_on_all git reset -- does-not-exist
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v3 10/10] sparse-checkout: integrate with sparse index
  2022-05-23 13:48   ` [PATCH v3 00/10] Sparse index: integrate with sparse-checkout Derrick Stolee via GitGitGadget
                       ` (8 preceding siblings ...)
  2022-05-23 13:48     ` [PATCH v3 09/10] p2000: add test for 'git sparse-checkout [add|set]' Derrick Stolee via GitGitGadget
@ 2022-05-23 13:48     ` Derrick Stolee via GitGitGadget
  9 siblings, 0 replies; 55+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-05-23 13:48 UTC (permalink / raw)
  To: git; +Cc: gitster, vdye, shaoxuan.yuan02, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

When modifying the sparse-checkout definition, the sparse-checkout
builtin calls update_sparsity() to modify the SKIP_WORKTREE bits of all
cache entries in the index. Before, we needed the index to be fully
expanded in order to ensure we had the full list of files necessary that
match the new patterns.

Insert a call to reset_sparse_directories() that expands sparse
directories that are within the new pattern list, but only far enough
that every necessary file path now exists as a cache entry. The
remaining logic within update_sparsity() will modify the SKIP_WORKTREE
bits appropriately.

This allows us to disable command_requires_full_index within the
sparse-checkout builtin. Add tests that demonstrate that we are not
expanding to a full index unnecessarily.

We can see the improved performance in the p2000 test script:

Test                           HEAD~1            HEAD
------------------------------------------------------------------------
2000.24: git ... (sparse-v3)   2.14(1.55+0.58)   1.57(1.03+0.53) -26.6%
2000.25: git ... (sparse-v4)   2.20(1.62+0.57)   1.58(0.98+0.59) -28.2%

These reductions of 26-28% are small compared to most examples, but the
time is dominated by writing a new copy of the base repository to the
worktree and then deleting it again. The fact that the previous index
expansion was such a large portion of the time is telling how important
it is to complete this sparse index integration.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 builtin/sparse-checkout.c                |  3 +++
 t/t1092-sparse-checkout-compatibility.sh | 25 ++++++++++++++++++++++++
 unpack-trees.c                           |  4 ++++
 3 files changed, 32 insertions(+)

diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c
index 1db51c3fd72..67d1d146de5 100644
--- a/builtin/sparse-checkout.c
+++ b/builtin/sparse-checkout.c
@@ -937,6 +937,9 @@ int cmd_sparse_checkout(int argc, const char **argv, const char *prefix)
 
 	git_config(git_default_config, NULL);
 
+	prepare_repo_settings(the_repository);
+	the_repository->settings.command_requires_full_index = 0;
+
 	if (argc > 0) {
 		if (!strcmp(argv[0], "list"))
 			return sparse_checkout_list(argc, argv);
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 785820f9fd5..73f4cf47314 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -1584,6 +1584,31 @@ test_expect_success 'ls-files' '
 	ensure_not_expanded ls-files --sparse
 '
 
+test_expect_success 'sparse index is not expanded: sparse-checkout' '
+	init_repos &&
+
+	ensure_not_expanded sparse-checkout set deep/deeper2 &&
+	ensure_not_expanded sparse-checkout set deep/deeper1 &&
+	ensure_not_expanded sparse-checkout set deep &&
+	ensure_not_expanded sparse-checkout add folder1 &&
+	ensure_not_expanded sparse-checkout set deep/deeper1 &&
+	ensure_not_expanded sparse-checkout set folder2 &&
+
+	# Demonstrate that the checks that "folder1/a" is a file
+	# do not cause a sparse-index expansion (since it is in the
+	# sparse-checkout cone).
+	echo >>sparse-index/folder2/a &&
+	git -C sparse-index add folder2/a &&
+
+	ensure_not_expanded sparse-checkout add folder1 &&
+
+	# Skip checks here, since deep/deeper1 is inside a sparse directory
+	# that must be expanded to check whether `deep/deeper1` is a file
+	# or not.
+	ensure_not_expanded sparse-checkout set --skip-checks deep/deeper1 &&
+	ensure_not_expanded sparse-checkout set
+'
+
 # NEEDSWORK: a sparse-checkout behaves differently from a full checkout
 # in this scenario, but it shouldn't.
 test_expect_success 'reset mixed and checkout orphan' '
diff --git a/unpack-trees.c b/unpack-trees.c
index 7f528d35cc2..8908b27c03e 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -18,6 +18,7 @@
 #include "promisor-remote.h"
 #include "entry.h"
 #include "parallel-checkout.h"
+#include "sparse-index.h"
 
 /*
  * Error messages expected by scripts out of plumbing commands such as
@@ -2018,6 +2019,9 @@ enum update_sparsity_result update_sparsity(struct unpack_trees_options *o)
 			goto skip_sparse_checkout;
 	}
 
+	/* Expand sparse directories as needed */
+	expand_index(o->src_index, o->pl);
+
 	/* Set NEW_SKIP_WORKTREE on existing entries. */
 	mark_all_ce_unused(o->src_index);
 	mark_new_skip_worktree(o->pl, o->src_index, 0,
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* Re: [PATCH v2 08/10] sparse-index: complete partial expansion
  2022-05-23 13:18         ` Derrick Stolee
@ 2022-05-23 18:01           ` Junio C Hamano
  0 siblings, 0 replies; 55+ messages in thread
From: Junio C Hamano @ 2022-05-23 18:01 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Derrick Stolee via GitGitGadget, git, vdye, shaoxuan.yuan02,
	Derrick Stolee

Derrick Stolee <derrickstolee@github.com> writes:

> As I went to remove these from sparse-index.c, I found another that
> exists, but for good reason. See 8a96b9d0a (sparse-index: use
> WRITE_TREE_MISSING_OK, 2021-09-08), which explains that that instance
> needs the flag because it could be used during 'git add'.

Hmph, re-reading that explanation, it appears to me that it is
working around a lack of another option that could be useful in such
a situation, namely, "update the cache tree, and if a tree object is
missing then create one locally; do not bother asking any promisor
remote".  I am reasonably sure that back when I invented missing-ok
there wasn't any such thing as "promisor remote" or "lazy fetch",
and it sounds like those who added "promissor remote" broke the
cache-tree subsystem by introducing calls to fetch from elsewhere?

How does a normal "new object creation" codepath work when promisor
remotes exist?  Does write_object_file() first ask if any promisor
remote have that object and fetch before creating one locally?  That
sounds absurd (i.e. if we have data to create locally why ask others
if they have one already?).  Why should cache-tree behave any
differently?

All of the above are mere observations on changes that have nothing
to do with the change in this series, but cache_tree_update()
knowing about promisor remotes feels serious breakage to me.

> It still isn't necessary in the instances being added here, so I will
> remove them.

OK.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v2 08/10] sparse-index: complete partial expansion
  2022-05-23 13:13       ` Derrick Stolee
  2022-05-23 13:18         ` Derrick Stolee
@ 2022-05-23 22:48         ` Junio C Hamano
  2022-05-25 14:26           ` Derrick Stolee
  1 sibling, 1 reply; 55+ messages in thread
From: Junio C Hamano @ 2022-05-23 22:48 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Derrick Stolee via GitGitGadget, git, vdye, shaoxuan.yuan02,
	Derrick Stolee

Derrick Stolee <derrickstolee@github.com> writes:

>> I suspect that is a situation that is not so uncommon.  Working
>> inside a narrow cone of a wide tree, performing a merge would
>> hopefully allow many subtrees that are outside of the cones of our
>> interest merged without getting expanded at all (e.g. only the other
>> side touched these subtrees we are not working on, so their version
>> will become the merge result), while changes to some paths in the
>> cone of our interest may result in true conflicts represented as
>> cache entries at higher stages, needing conflict resolution
>> concluded with "git add".  Having to expand these subtrees that we
>> managed to merge while still collapsed, only because we have
>> conflicts in some other parts of the tree, feels somewhat sad.
>
> You are correct that conflicts outside of the sparse-checkout cone will
> cause index expansion. That happens during the 'git merge' command, but
> the index will continue to fail to collapse as long as those conflicts
> still exist in the index.
>
> When there are conflicts like this during the merge, then the index
> expansion is not as large of a portion of the command as normal, because
> the conflict resolution also takes some time to compute. The commands
> afterwards do take longer purely because of the expanded index.

I was imagining a situation more like "tech-writers only have
Documentation/ inside the cone of interest, attempt a pull from
somebody else, have conflicts inside Documentation/, but everything
else could be resolved cleanly without expanding the index".  If the
puller's tree is based on the pristine upstream release tag, and the
pullee's tree is based on a slightly newer version of upstream
snapshot, everything that happened outside Documentation/ in their
trees would fast-forward, so such a merge wouldn't have to expand
directories like "builtin/" or "contrib/" in the index and instead
can merge at the tree level, right?

On the other hand, ...

> However, this state is also not as common as you might think. If a user
> has a sparse-checkout cone specified, then they are unlikely to change
> files outside of the sparse-checkout cone. They would not be the reason
> that those files have a conflict. The conflicts would exist only if they
> are merging branches that had conflicts outside of the cone. Typically,
> any merge of external changes like this are of the form of "git pull" or
> "git rebase", in which case the conflicts are still "local" to the
> developer's changes.

... you seem to be talking about the opposite case (e.g. in the
above paragraph), where a conflict happens outside the cone of
interest of the person who is making a merge.  So, I am a bit
puzzled.

> You are right that there is additional work that could be done here,
> specifically allowing the cache tree to partially succeed and use the
> successfully generated trees to create sparse directory entries where
> possible. This was not viable before because we lacked the "partially
> expanded" index state. This series establishes the necessary vocabulary to
> do such an improvement later.
>> By the way, why are we passing the "--missing-ok" option to "git
>> write-tree" here?
>> 
>>> +	cache_tree_update(istate, WRITE_TREE_MISSING_OK);
>> 
>> The same question here.  We didn't say "missing trees are OK".  What
>> made it OK in this change?
>  
> Both of these additions of WRITE_TREE_MISSING_OK are not needed. I
> think I added them in an earlier version, thinking they were needed
> due to something in the Scalar functional tests. I confirmed just now
> that they are not needed for that. I will remove them.
>
> Thanks,
> -Stolee

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v2 08/10] sparse-index: complete partial expansion
  2022-05-23 22:48         ` Junio C Hamano
@ 2022-05-25 14:26           ` Derrick Stolee
  2022-05-25 16:32             ` Junio C Hamano
  0 siblings, 1 reply; 55+ messages in thread
From: Derrick Stolee @ 2022-05-25 14:26 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Derrick Stolee via GitGitGadget, git, vdye, shaoxuan.yuan02,
	Derrick Stolee

On 5/23/2022 6:48 PM, Junio C Hamano wrote:
> Derrick Stolee <derrickstolee@github.com> writes:
> 
>>> I suspect that is a situation that is not so uncommon.  Working
>>> inside a narrow cone of a wide tree, performing a merge would
>>> hopefully allow many subtrees that are outside of the cones of our
>>> interest merged without getting expanded at all (e.g. only the other
>>> side touched these subtrees we are not working on, so their version
>>> will become the merge result), while changes to some paths in the
>>> cone of our interest may result in true conflicts represented as
>>> cache entries at higher stages, needing conflict resolution
>>> concluded with "git add".  Having to expand these subtrees that we
>>> managed to merge while still collapsed, only because we have
>>> conflicts in some other parts of the tree, feels somewhat sad.
>>
>> You are correct that conflicts outside of the sparse-checkout cone will
>> cause index expansion. That happens during the 'git merge' command, but
>> the index will continue to fail to collapse as long as those conflicts
>> still exist in the index.
>>
>> When there are conflicts like this during the merge, then the index
>> expansion is not as large of a portion of the command as normal, because
>> the conflict resolution also takes some time to compute. The commands
>> afterwards do take longer purely because of the expanded index.
> 
> I was imagining a situation more like "tech-writers only have
> Documentation/ inside the cone of interest, attempt a pull from
> somebody else, have conflicts inside Documentation/, but everything
> else could be resolved cleanly without expanding the index".  If the
> puller's tree is based on the pristine upstream release tag, and the
> pullee's tree is based on a slightly newer version of upstream
> snapshot, everything that happened outside Documentation/ in their
> trees would fast-forward, so such a merge wouldn't have to expand
> directories like "builtin/" or "contrib/" in the index and instead
> can merge at the tree level, right?
> 
> On the other hand, ...
> 
>> However, this state is also not as common as you might think. If a user
>> has a sparse-checkout cone specified, then they are unlikely to change
>> files outside of the sparse-checkout cone. They would not be the reason
>> that those files have a conflict. The conflicts would exist only if they
>> are merging branches that had conflicts outside of the cone. Typically,
>> any merge of external changes like this are of the form of "git pull" or
>> "git rebase", in which case the conflicts are still "local" to the
>> developer's changes.
> 
> ... you seem to be talking about the opposite case (e.g. in the
> above paragraph), where a conflict happens outside the cone of
> interest of the person who is making a merge.  So, I am a bit
> puzzled.

Hm. We must be getting mixed up with each other. Let me try again
from the beginning.

When the conflict happens inside of the sparse-checkout cone,
then the sparse index is not expanded. This is checked by the
test 'sparse-index is not expanded: merge conflict in cone' in
t1092.

Most merges get to "fast forward" changes that are outside of
the sparse-checkout cone because the only interesting changes
that could lead to a conflict are those created by the current
user (and hence within the sparse-checkout cone).

So, the typical case of a tech writer only editing "Documentation/"
should only see conflicts within "Documentation/".

The case where we would see conflicts outside of the cone include
cases where long-lived branches are being merged by someone with
a small cone. I can imagine an automated process using an empty
sparse-checkout cone to occasionally merge a "deployed" and a
"develop" branch, and it gets conflicts when someone ships a
hotfix directly to "deployed" without first going through "develop".
Any conflict is likely to cause index expansion in this case.

Let's re-introduce the patch section we are talking about:

+	if (pl && !pl->use_cone_patterns) {
 		pl = NULL;
+	} else {
+		/*
+		 * We might contract file entries into sparse-directory
+		 * entries, and for that we will need the cache tree to
+		 * be recomputed.
+		 */
+		cache_tree_free(&istate->cache_tree);
+
+		/*
+		 * If there is a problem creating the cache tree, then we
+		 * need to expand to a full index since we cannot satisfy
+		 * the current request as a sparse index.
+		 */
+		if (cache_tree_update(istate, WRITE_TREE_MISSING_OK))
+			pl = NULL;
+	}

If a user is in a conflict state and modifies their sparse-checkout
cone, then we will hit this "recompute the cache-tree" state, fail,
and cause full index expansion. I think that combination (have a
conflict AND modify sparse-checkout cone) is rare enough to not
optimize for (yet).

Does that make the situation more clear?

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v2 08/10] sparse-index: complete partial expansion
  2022-05-25 14:26           ` Derrick Stolee
@ 2022-05-25 16:32             ` Junio C Hamano
  0 siblings, 0 replies; 55+ messages in thread
From: Junio C Hamano @ 2022-05-25 16:32 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Derrick Stolee via GitGitGadget, git, vdye, shaoxuan.yuan02,
	Derrick Stolee

Derrick Stolee <derrickstolee@github.com> writes:

> If a user is in a conflict state and modifies their sparse-checkout
> cone, then we will hit this "recompute the cache-tree" state, fail,
> and cause full index expansion. I think that combination (have a
> conflict AND modify sparse-checkout cone) is rare enough to not
> optimize for (yet).
>
> Does that make the situation more clear?

Yes.  "AND modify sparse-checkout cone" part was what I was missing.
I didn't see it because it wasn't there at all in the code that was
removed or added---it comes from the fact that the user is running
the "sparse-checkout" command in the first place, and that was what
I missed.

Thanks.

^ permalink raw reply	[flat|nested] 55+ messages in thread

end of thread, other threads:[~2022-05-25 16:32 UTC | newest]

Thread overview: 55+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-16 18:11 [PATCH 0/8] Sparse index: integrate with sparse-checkout Derrick Stolee via GitGitGadget
2022-05-16 18:11 ` [PATCH 1/8] sparse-index: create expand_to_pattern_list() Derrick Stolee via GitGitGadget
2022-05-16 20:36   ` Victoria Dye
2022-05-16 20:49     ` Derrick Stolee
2022-05-16 18:11 ` [PATCH 2/8] sparse-index: introduce partially-sparse indexes Derrick Stolee via GitGitGadget
2022-05-16 18:11 ` [PATCH 3/8] cache-tree: implement cache_tree_find_path() Derrick Stolee via GitGitGadget
2022-05-16 18:11 ` [PATCH 4/8] sparse-checkout: --no-sparse-index needs a full index Derrick Stolee via GitGitGadget
2022-05-16 18:11 ` [PATCH 5/8] sparse-index: partially expand directories Derrick Stolee via GitGitGadget
2022-05-16 20:36   ` Victoria Dye
2022-05-16 18:11 ` [PATCH 6/8] sparse-index: complete partial expansion Derrick Stolee via GitGitGadget
2022-05-16 20:38   ` Victoria Dye
2022-05-17 13:23     ` Derrick Stolee
2022-05-16 18:11 ` [PATCH 7/8] p2000: add test for 'git sparse-checkout [add|set]' Derrick Stolee via GitGitGadget
2022-05-16 18:11 ` [PATCH 8/8] sparse-checkout: integrate with sparse index Derrick Stolee via GitGitGadget
2022-05-16 20:38   ` Victoria Dye
2022-05-17 13:28     ` Derrick Stolee
2022-05-19 17:52 ` [PATCH v2 00/10] Sparse index: integrate with sparse-checkout Derrick Stolee via GitGitGadget
2022-05-19 17:52   ` [PATCH v2 01/10] t1092: refactor 'sparse-index contents' test Derrick Stolee via GitGitGadget
2022-05-19 17:52   ` [PATCH v2 02/10] t1092: stress test 'git sparse-checkout set' Derrick Stolee via GitGitGadget
2022-05-19 17:52   ` [PATCH v2 03/10] sparse-index: create expand_to_pattern_list() Derrick Stolee via GitGitGadget
2022-05-19 19:50     ` Junio C Hamano
2022-05-20 18:01       ` Derrick Stolee
2022-05-19 17:52   ` [PATCH v2 04/10] sparse-index: introduce partially-sparse indexes Derrick Stolee via GitGitGadget
2022-05-19 20:05     ` Junio C Hamano
2022-05-20 18:05       ` Derrick Stolee
2022-05-20 18:23         ` Junio C Hamano
2022-05-19 17:52   ` [PATCH v2 05/10] cache-tree: implement cache_tree_find_path() Derrick Stolee via GitGitGadget
2022-05-19 20:14     ` Junio C Hamano
2022-05-20 18:13       ` Derrick Stolee
2022-05-19 17:52   ` [PATCH v2 06/10] sparse-checkout: --no-sparse-index needs a full index Derrick Stolee via GitGitGadget
2022-05-19 20:19     ` Junio C Hamano
2022-05-19 17:52   ` [PATCH v2 07/10] sparse-index: partially expand directories Derrick Stolee via GitGitGadget
2022-05-20 18:17     ` Junio C Hamano
2022-05-20 18:33       ` Derrick Stolee
2022-05-19 17:52   ` [PATCH v2 08/10] sparse-index: complete partial expansion Derrick Stolee via GitGitGadget
2022-05-21  7:45     ` Junio C Hamano
2022-05-23 13:13       ` Derrick Stolee
2022-05-23 13:18         ` Derrick Stolee
2022-05-23 18:01           ` Junio C Hamano
2022-05-23 22:48         ` Junio C Hamano
2022-05-25 14:26           ` Derrick Stolee
2022-05-25 16:32             ` Junio C Hamano
2022-05-19 17:52   ` [PATCH v2 09/10] p2000: add test for 'git sparse-checkout [add|set]' Derrick Stolee via GitGitGadget
2022-05-19 17:52   ` [PATCH v2 10/10] sparse-checkout: integrate with sparse index Derrick Stolee via GitGitGadget
2022-05-23 13:48   ` [PATCH v3 00/10] Sparse index: integrate with sparse-checkout Derrick Stolee via GitGitGadget
2022-05-23 13:48     ` [PATCH v3 01/10] t1092: refactor 'sparse-index contents' test Derrick Stolee via GitGitGadget
2022-05-23 13:48     ` [PATCH v3 02/10] t1092: stress test 'git sparse-checkout set' Derrick Stolee via GitGitGadget
2022-05-23 13:48     ` [PATCH v3 03/10] sparse-index: create expand_index() Derrick Stolee via GitGitGadget
2022-05-23 13:48     ` [PATCH v3 04/10] sparse-index: introduce partially-sparse indexes Derrick Stolee via GitGitGadget
2022-05-23 13:48     ` [PATCH v3 05/10] cache-tree: implement cache_tree_find_path() Derrick Stolee via GitGitGadget
2022-05-23 13:48     ` [PATCH v3 06/10] sparse-checkout: --no-sparse-index needs a full index Derrick Stolee via GitGitGadget
2022-05-23 13:48     ` [PATCH v3 07/10] sparse-index: partially expand directories Derrick Stolee via GitGitGadget
2022-05-23 13:48     ` [PATCH v3 08/10] sparse-index: complete partial expansion Derrick Stolee via GitGitGadget
2022-05-23 13:48     ` [PATCH v3 09/10] p2000: add test for 'git sparse-checkout [add|set]' Derrick Stolee via GitGitGadget
2022-05-23 13:48     ` [PATCH v3 10/10] sparse-checkout: integrate with sparse index Derrick Stolee via GitGitGadget

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).