git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / mirror / code / Atom feed
* [PATCH 00/27] Sparse Index: API protections
@ 2021-03-16 21:16 Derrick Stolee via GitGitGadget
  2021-03-16 21:16 ` [PATCH 01/27] *: remove 'const' qualifier for struct index_state Derrick Stolee via GitGitGadget
                   ` (28 more replies)
  0 siblings, 29 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-03-16 21:16 UTC (permalink / raw)
  To: git; +Cc: newren, gitster, Derrick Stolee

Here is the second patch series submission coming out of the sparse-index
RFC [1].

[1]
https://lore.kernel.org/git/pull.847.git.1611596533.gitgitgadget@gmail.com/

This is based on v3 of the format series [2].

[2]
https://lore.kernel.org/git/pull.883.v3.git.1615912983.gitgitgadget@gmail.com/

The point of this series is to insert protections for the consumers of the
in-memory index. We mark certain regions of code as needing a full index, so
we call ensure_full_index() to expand a sparse index to a full one, if
necessary. These protections are inserted file-by-file in every loop over
all cache entries. Well, "most" loops, because some are going to be handled
in the very next series so I leave them out.

Many callers use index_name_pos() to find a path by name. In these cases, we
can check if that position resolves to a sparse directory instance. In those
cases, we just expand to a full index and run the search again.

The last few patches deal with the name-hash hashtable for doing O(1)
lookups.

These protections don't do much right now, since the previous series created
the_repository->settings.command_requires_full_index to guard all index
reads and writes to ensure the in-memory copy is full for commands that have
not been tested with the sparse index yet.

However, after this series is complete, we now have a straight-forward plan
for making commands "sparse aware" one-by-one:

 1. Disable settings.command_requires_full_index to allow an in-memory
    sparse-index.
 2. Run versions of that command under a debugger, breaking on
    ensure_full_index().
 3. Examine the call stack to determine the context of that expansion, then
    implement the proper behavior in those locations.
 4. Add tests to ensure we are checking this logic in the presence of sparse
    directory entries.

I will admit that mostly it is the writing of the test cases that takes the
most time in the conversions I've done so far.

Thanks, -Stolee

Derrick Stolee (27):
  *: remove 'const' qualifier for struct index_state
  read-cache: expand on query into sparse-directory entry
  sparse-index: API protection strategy
  cache: move ensure_full_index() to cache.h
  add: ensure full index
  checkout-index: ensure full index
  checkout: ensure full index
  commit: ensure full index
  difftool: ensure full index
  fsck: ensure full index
  grep: ensure full index
  ls-files: ensure full index
  merge-index: ensure full index
  rm: ensure full index
  sparse-checkout: ensure full index
  update-index: ensure full index
  diff-lib: ensure full index
  dir: ensure full index
  entry: ensure full index
  merge-ort: ensure full index
  merge-recursive: ensure full index
  pathspec: ensure full index
  read-cache: ensure full index
  resolve-undo: ensure full index
  revision: ensure full index
  sparse-index: expand_to_path()
  name-hash: use expand_to_path()

 Documentation/technical/sparse-index.txt | 32 ++++++++++-
 attr.c                                   | 14 ++---
 attr.h                                   |  4 +-
 builtin/add.c                            |  1 +
 builtin/checkout-index.c                 |  1 +
 builtin/checkout.c                       |  2 +
 builtin/commit.c                         |  2 +
 builtin/difftool.c                       |  2 +
 builtin/fsck.c                           |  1 +
 builtin/grep.c                           |  1 +
 builtin/ls-files.c                       | 12 ++--
 builtin/merge-index.c                    |  3 +
 builtin/rm.c                             |  1 +
 builtin/stash.c                          |  1 +
 builtin/update-index.c                   |  1 +
 cache.h                                  |  7 ++-
 convert.c                                | 26 ++++-----
 convert.h                                | 20 +++----
 diff-lib.c                               |  1 +
 dir.c                                    | 13 +++--
 dir.h                                    |  8 +--
 entry.c                                  |  1 +
 merge-ort.c                              |  1 +
 merge-recursive.c                        |  3 +-
 name-hash.c                              | 10 ++++
 pathspec.c                               |  7 ++-
 pathspec.h                               |  6 +-
 read-cache.c                             | 33 +++++++++--
 resolve-undo.c                           |  2 +
 revision.c                               |  1 +
 sparse-index.c                           | 72 ++++++++++++++++++++++++
 sparse-index.h                           | 14 ++++-
 submodule.c                              |  6 +-
 submodule.h                              |  6 +-
 34 files changed, 243 insertions(+), 72 deletions(-)


base-commit: 3db06ac46dd5c61e83d7fc4747615d616fdbbdda
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-906%2Fderrickstolee%2Fsparse-index%2Fprotections-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-906/derrickstolee/sparse-index/protections-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/906
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 111+ messages in thread

* [PATCH 01/27] *: remove 'const' qualifier for struct index_state
  2021-03-16 21:16 [PATCH 00/27] Sparse Index: API protections Derrick Stolee via GitGitGadget
@ 2021-03-16 21:16 ` Derrick Stolee via GitGitGadget
  2021-03-19 21:01   ` Junio C Hamano
  2021-03-16 21:16 ` [PATCH 02/27] read-cache: expand on query into sparse-directory entry Derrick Stolee via GitGitGadget
                   ` (27 subsequent siblings)
  28 siblings, 1 reply; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-03-16 21:16 UTC (permalink / raw)
  To: git; +Cc: newren, gitster, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Several methods specify that they take a 'struct index_state' pointer
with the 'const' qualifier because they intend to only query the data,
not change it. However, we will be introducing a step very low in the
method stack that might modify a sparse-index to become a full index in
the case that our queries venture inside a sparse-directory entry.

This change only removes the 'const' qualifiers that are necessary for
the following change which will actually modify the implementation of
index_name_stage_pos().

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 attr.c             | 14 +++++++-------
 attr.h             |  4 ++--
 builtin/ls-files.c | 10 +++++-----
 cache.h            |  6 +++---
 convert.c          | 26 +++++++++++++-------------
 convert.h          | 20 ++++++++++----------
 dir.c              | 12 ++++++------
 dir.h              |  8 ++++----
 merge-recursive.c  |  2 +-
 pathspec.c         |  6 +++---
 pathspec.h         |  6 +++---
 read-cache.c       | 10 +++++-----
 submodule.c        |  6 +++---
 submodule.h        |  6 +++---
 14 files changed, 68 insertions(+), 68 deletions(-)

diff --git a/attr.c b/attr.c
index 4ef85d668b54..8de553293ee4 100644
--- a/attr.c
+++ b/attr.c
@@ -718,7 +718,7 @@ static struct attr_stack *read_attr_from_file(const char *path, int macro_ok)
 	return res;
 }
 
-static struct attr_stack *read_attr_from_index(const struct index_state *istate,
+static struct attr_stack *read_attr_from_index(struct index_state *istate,
 					       const char *path,
 					       int macro_ok)
 {
@@ -748,7 +748,7 @@ static struct attr_stack *read_attr_from_index(const struct index_state *istate,
 	return res;
 }
 
-static struct attr_stack *read_attr(const struct index_state *istate,
+static struct attr_stack *read_attr(struct index_state *istate,
 				    const char *path, int macro_ok)
 {
 	struct attr_stack *res = NULL;
@@ -840,7 +840,7 @@ static void push_stack(struct attr_stack **attr_stack_p,
 	}
 }
 
-static void bootstrap_attr_stack(const struct index_state *istate,
+static void bootstrap_attr_stack(struct index_state *istate,
 				 struct attr_stack **stack)
 {
 	struct attr_stack *e;
@@ -878,7 +878,7 @@ static void bootstrap_attr_stack(const struct index_state *istate,
 	push_stack(stack, e, NULL, 0);
 }
 
-static void prepare_attr_stack(const struct index_state *istate,
+static void prepare_attr_stack(struct index_state *istate,
 			       const char *path, int dirlen,
 			       struct attr_stack **stack)
 {
@@ -1078,7 +1078,7 @@ static void determine_macros(struct all_attrs_item *all_attrs,
  * If check->check_nr is non-zero, only attributes in check[] are collected.
  * Otherwise all attributes are collected.
  */
-static void collect_some_attrs(const struct index_state *istate,
+static void collect_some_attrs(struct index_state *istate,
 			       const char *path,
 			       struct attr_check *check)
 {
@@ -1107,7 +1107,7 @@ static void collect_some_attrs(const struct index_state *istate,
 	fill(path, pathlen, basename_offset, check->stack, check->all_attrs, rem);
 }
 
-void git_check_attr(const struct index_state *istate,
+void git_check_attr(struct index_state *istate,
 		    const char *path,
 		    struct attr_check *check)
 {
@@ -1124,7 +1124,7 @@ void git_check_attr(const struct index_state *istate,
 	}
 }
 
-void git_all_attrs(const struct index_state *istate,
+void git_all_attrs(struct index_state *istate,
 		   const char *path, struct attr_check *check)
 {
 	int i;
diff --git a/attr.h b/attr.h
index 404548f028a8..3732505edae8 100644
--- a/attr.h
+++ b/attr.h
@@ -190,14 +190,14 @@ void attr_check_free(struct attr_check *check);
  */
 const char *git_attr_name(const struct git_attr *);
 
-void git_check_attr(const struct index_state *istate,
+void git_check_attr(struct index_state *istate,
 		    const char *path, struct attr_check *check);
 
 /*
  * Retrieve all attributes that apply to the specified path.
  * check holds the attributes and their values.
  */
-void git_all_attrs(const struct index_state *istate,
+void git_all_attrs(struct index_state *istate,
 		   const char *path, struct attr_check *check);
 
 enum git_attr_direction {
diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index 13bcc2d84735..76a0d35c1a01 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -57,7 +57,7 @@ static const char *tag_modified = "";
 static const char *tag_skip_worktree = "";
 static const char *tag_resolve_undo = "";
 
-static void write_eolinfo(const struct index_state *istate,
+static void write_eolinfo(struct index_state *istate,
 			  const struct cache_entry *ce, const char *path)
 {
 	if (show_eol) {
@@ -122,7 +122,7 @@ static void print_debug(const struct cache_entry *ce)
 	}
 }
 
-static void show_dir_entry(const struct index_state *istate,
+static void show_dir_entry(struct index_state *istate,
 			   const char *tag, struct dir_entry *ent)
 {
 	int len = max_prefix_len;
@@ -139,7 +139,7 @@ static void show_dir_entry(const struct index_state *istate,
 	write_name(ent->name);
 }
 
-static void show_other_files(const struct index_state *istate,
+static void show_other_files(struct index_state *istate,
 			     const struct dir_struct *dir)
 {
 	int i;
@@ -152,7 +152,7 @@ static void show_other_files(const struct index_state *istate,
 	}
 }
 
-static void show_killed_files(const struct index_state *istate,
+static void show_killed_files(struct index_state *istate,
 			      const struct dir_struct *dir)
 {
 	int i;
@@ -254,7 +254,7 @@ static void show_ce(struct repository *repo, struct dir_struct *dir,
 	}
 }
 
-static void show_ru_info(const struct index_state *istate)
+static void show_ru_info(struct index_state *istate)
 {
 	struct string_list_item *item;
 
diff --git a/cache.h b/cache.h
index 4ca6cd7f782c..f244e6ac90fd 100644
--- a/cache.h
+++ b/cache.h
@@ -800,7 +800,7 @@ struct cache_entry *index_file_exists(struct index_state *istate, const char *na
  * index_name_pos(&index, "f", 1) -> -3
  * index_name_pos(&index, "g", 1) -> -5
  */
-int index_name_pos(const struct index_state *, const char *name, int namelen);
+int index_name_pos(struct index_state *, const char *name, int namelen);
 
 /*
  * Some functions return the negative complement of an insert position when a
@@ -850,8 +850,8 @@ int add_file_to_index(struct index_state *, const char *path, int flags);
 int chmod_index_entry(struct index_state *, struct cache_entry *ce, char flip);
 int ce_same_name(const struct cache_entry *a, const struct cache_entry *b);
 void set_object_name_for_intent_to_add_entry(struct cache_entry *ce);
-int index_name_is_other(const struct index_state *, const char *, int);
-void *read_blob_data_from_index(const struct index_state *, const char *, unsigned long *);
+int index_name_is_other(struct index_state *, const char *, int);
+void *read_blob_data_from_index(struct index_state *, const char *, unsigned long *);
 
 /* do stat comparison even if CE_VALID is true */
 #define CE_MATCH_IGNORE_VALID		01
diff --git a/convert.c b/convert.c
index ee360c2f07ce..20b19abffa36 100644
--- a/convert.c
+++ b/convert.c
@@ -138,7 +138,7 @@ static const char *gather_convert_stats_ascii(const char *data, unsigned long si
 	}
 }
 
-const char *get_cached_convert_stats_ascii(const struct index_state *istate,
+const char *get_cached_convert_stats_ascii(struct index_state *istate,
 					   const char *path)
 {
 	const char *ret;
@@ -222,7 +222,7 @@ static void check_global_conv_flags_eol(const char *path,
 	}
 }
 
-static int has_crlf_in_index(const struct index_state *istate, const char *path)
+static int has_crlf_in_index(struct index_state *istate, const char *path)
 {
 	unsigned long sz;
 	void *data;
@@ -496,7 +496,7 @@ static int encode_to_worktree(const char *path, const char *src, size_t src_len,
 	return 1;
 }
 
-static int crlf_to_git(const struct index_state *istate,
+static int crlf_to_git(struct index_state *istate,
 		       const char *path, const char *src, size_t len,
 		       struct strbuf *buf,
 		       enum crlf_action crlf_action, int conv_flags)
@@ -1307,7 +1307,7 @@ struct conv_attrs {
 
 static struct attr_check *check;
 
-static void convert_attrs(const struct index_state *istate,
+static void convert_attrs(struct index_state *istate,
 			  struct conv_attrs *ca, const char *path)
 {
 	struct attr_check_item *ccheck = NULL;
@@ -1369,7 +1369,7 @@ void reset_parsed_attributes(void)
 	user_convert_tail = NULL;
 }
 
-int would_convert_to_git_filter_fd(const struct index_state *istate, const char *path)
+int would_convert_to_git_filter_fd(struct index_state *istate, const char *path)
 {
 	struct conv_attrs ca;
 
@@ -1388,7 +1388,7 @@ int would_convert_to_git_filter_fd(const struct index_state *istate, const char
 	return apply_filter(path, NULL, 0, -1, NULL, ca.drv, CAP_CLEAN, NULL, NULL);
 }
 
-const char *get_convert_attr_ascii(const struct index_state *istate, const char *path)
+const char *get_convert_attr_ascii(struct index_state *istate, const char *path)
 {
 	struct conv_attrs ca;
 
@@ -1414,7 +1414,7 @@ const char *get_convert_attr_ascii(const struct index_state *istate, const char
 	return "";
 }
 
-int convert_to_git(const struct index_state *istate,
+int convert_to_git(struct index_state *istate,
 		   const char *path, const char *src, size_t len,
 		   struct strbuf *dst, int conv_flags)
 {
@@ -1448,7 +1448,7 @@ int convert_to_git(const struct index_state *istate,
 	return ret | ident_to_git(src, len, dst, ca.ident);
 }
 
-void convert_to_git_filter_fd(const struct index_state *istate,
+void convert_to_git_filter_fd(struct index_state *istate,
 			      const char *path, int fd, struct strbuf *dst,
 			      int conv_flags)
 {
@@ -1466,7 +1466,7 @@ void convert_to_git_filter_fd(const struct index_state *istate,
 	ident_to_git(dst->buf, dst->len, dst, ca.ident);
 }
 
-static int convert_to_working_tree_internal(const struct index_state *istate,
+static int convert_to_working_tree_internal(struct index_state *istate,
 					    const char *path, const char *src,
 					    size_t len, struct strbuf *dst,
 					    int normalizing,
@@ -1510,7 +1510,7 @@ static int convert_to_working_tree_internal(const struct index_state *istate,
 	return ret | ret_filter;
 }
 
-int async_convert_to_working_tree(const struct index_state *istate,
+int async_convert_to_working_tree(struct index_state *istate,
 				  const char *path, const char *src,
 				  size_t len, struct strbuf *dst,
 				  const struct checkout_metadata *meta,
@@ -1519,7 +1519,7 @@ int async_convert_to_working_tree(const struct index_state *istate,
 	return convert_to_working_tree_internal(istate, path, src, len, dst, 0, meta, dco);
 }
 
-int convert_to_working_tree(const struct index_state *istate,
+int convert_to_working_tree(struct index_state *istate,
 			    const char *path, const char *src,
 			    size_t len, struct strbuf *dst,
 			    const struct checkout_metadata *meta)
@@ -1527,7 +1527,7 @@ int convert_to_working_tree(const struct index_state *istate,
 	return convert_to_working_tree_internal(istate, path, src, len, dst, 0, meta, NULL);
 }
 
-int renormalize_buffer(const struct index_state *istate, const char *path,
+int renormalize_buffer(struct index_state *istate, const char *path,
 		       const char *src, size_t len, struct strbuf *dst)
 {
 	int ret = convert_to_working_tree_internal(istate, path, src, len, dst, 1, NULL, NULL);
@@ -1964,7 +1964,7 @@ static struct stream_filter *ident_filter(const struct object_id *oid)
  * Note that you would be crazy to set CRLF, smudge/clean or ident to a
  * large binary blob you would want us not to slurp into the memory!
  */
-struct stream_filter *get_stream_filter(const struct index_state *istate,
+struct stream_filter *get_stream_filter(struct index_state *istate,
 					const char *path,
 					const struct object_id *oid)
 {
diff --git a/convert.h b/convert.h
index e29d1026a686..0f7c8a1f0446 100644
--- a/convert.h
+++ b/convert.h
@@ -65,41 +65,41 @@ struct checkout_metadata {
 
 extern enum eol core_eol;
 extern char *check_roundtrip_encoding;
-const char *get_cached_convert_stats_ascii(const struct index_state *istate,
+const char *get_cached_convert_stats_ascii(struct index_state *istate,
 					   const char *path);
 const char *get_wt_convert_stats_ascii(const char *path);
-const char *get_convert_attr_ascii(const struct index_state *istate,
+const char *get_convert_attr_ascii(struct index_state *istate,
 				   const char *path);
 
 /* returns 1 if *dst was used */
-int convert_to_git(const struct index_state *istate,
+int convert_to_git(struct index_state *istate,
 		   const char *path, const char *src, size_t len,
 		   struct strbuf *dst, int conv_flags);
-int convert_to_working_tree(const struct index_state *istate,
+int convert_to_working_tree(struct index_state *istate,
 			    const char *path, const char *src,
 			    size_t len, struct strbuf *dst,
 			    const struct checkout_metadata *meta);
-int async_convert_to_working_tree(const struct index_state *istate,
+int async_convert_to_working_tree(struct index_state *istate,
 				  const char *path, const char *src,
 				  size_t len, struct strbuf *dst,
 				  const struct checkout_metadata *meta,
 				  void *dco);
 int async_query_available_blobs(const char *cmd,
 				struct string_list *available_paths);
-int renormalize_buffer(const struct index_state *istate,
+int renormalize_buffer(struct index_state *istate,
 		       const char *path, const char *src, size_t len,
 		       struct strbuf *dst);
-static inline int would_convert_to_git(const struct index_state *istate,
+static inline int would_convert_to_git(struct index_state *istate,
 				       const char *path)
 {
 	return convert_to_git(istate, path, NULL, 0, NULL, 0);
 }
 /* Precondition: would_convert_to_git_filter_fd(path) == true */
-void convert_to_git_filter_fd(const struct index_state *istate,
+void convert_to_git_filter_fd(struct index_state *istate,
 			      const char *path, int fd,
 			      struct strbuf *dst,
 			      int conv_flags);
-int would_convert_to_git_filter_fd(const struct index_state *istate,
+int would_convert_to_git_filter_fd(struct index_state *istate,
 				   const char *path);
 
 /*
@@ -133,7 +133,7 @@ void reset_parsed_attributes(void);
 
 struct stream_filter; /* opaque */
 
-struct stream_filter *get_stream_filter(const struct index_state *istate,
+struct stream_filter *get_stream_filter(struct index_state *istate,
 					const char *path,
 					const struct object_id *);
 void free_stream_filter(struct stream_filter *);
diff --git a/dir.c b/dir.c
index fd8aa7c40faa..5b00dfb5b144 100644
--- a/dir.c
+++ b/dir.c
@@ -306,7 +306,7 @@ static int do_read_blob(const struct object_id *oid, struct oid_stat *oid_stat,
  * [1] Only if DO_MATCH_DIRECTORY is passed; otherwise, this is NOT a match.
  * [2] Only if DO_MATCH_LEADING_PATHSPEC is passed; otherwise, not a match.
  */
-static int match_pathspec_item(const struct index_state *istate,
+static int match_pathspec_item(struct index_state *istate,
 			       const struct pathspec_item *item, int prefix,
 			       const char *name, int namelen, unsigned flags)
 {
@@ -429,7 +429,7 @@ static int match_pathspec_item(const struct index_state *istate,
  * pathspec did not match any names, which could indicate that the
  * user mistyped the nth pathspec.
  */
-static int do_match_pathspec(const struct index_state *istate,
+static int do_match_pathspec(struct index_state *istate,
 			     const struct pathspec *ps,
 			     const char *name, int namelen,
 			     int prefix, char *seen,
@@ -500,7 +500,7 @@ static int do_match_pathspec(const struct index_state *istate,
 	return retval;
 }
 
-static int match_pathspec_with_flags(const struct index_state *istate,
+static int match_pathspec_with_flags(struct index_state *istate,
 				     const struct pathspec *ps,
 				     const char *name, int namelen,
 				     int prefix, char *seen, unsigned flags)
@@ -516,7 +516,7 @@ static int match_pathspec_with_flags(const struct index_state *istate,
 	return negative ? 0 : positive;
 }
 
-int match_pathspec(const struct index_state *istate,
+int match_pathspec(struct index_state *istate,
 		   const struct pathspec *ps,
 		   const char *name, int namelen,
 		   int prefix, char *seen, int is_dir)
@@ -529,7 +529,7 @@ int match_pathspec(const struct index_state *istate,
 /**
  * Check if a submodule is a superset of the pathspec
  */
-int submodule_path_match(const struct index_state *istate,
+int submodule_path_match(struct index_state *istate,
 			 const struct pathspec *ps,
 			 const char *submodule_name,
 			 char *seen)
@@ -892,7 +892,7 @@ void add_pattern(const char *string, const char *base,
 	add_pattern_to_hashsets(pl, pattern);
 }
 
-static int read_skip_worktree_file_from_index(const struct index_state *istate,
+static int read_skip_worktree_file_from_index(struct index_state *istate,
 					      const char *path,
 					      size_t *size_out, char **data_out,
 					      struct oid_stat *oid_stat)
diff --git a/dir.h b/dir.h
index facfae47402a..51cb0e217247 100644
--- a/dir.h
+++ b/dir.h
@@ -354,7 +354,7 @@ int count_slashes(const char *s);
 int simple_length(const char *match);
 int no_wildcard(const char *string);
 char *common_prefix(const struct pathspec *pathspec);
-int match_pathspec(const struct index_state *istate,
+int match_pathspec(struct index_state *istate,
 		   const struct pathspec *pathspec,
 		   const char *name, int namelen,
 		   int prefix, char *seen, int is_dir);
@@ -492,12 +492,12 @@ int git_fnmatch(const struct pathspec_item *item,
 		const char *pattern, const char *string,
 		int prefix);
 
-int submodule_path_match(const struct index_state *istate,
+int submodule_path_match(struct index_state *istate,
 			 const struct pathspec *ps,
 			 const char *submodule_name,
 			 char *seen);
 
-static inline int ce_path_match(const struct index_state *istate,
+static inline int ce_path_match(struct index_state *istate,
 				const struct cache_entry *ce,
 				const struct pathspec *pathspec,
 				char *seen)
@@ -506,7 +506,7 @@ static inline int ce_path_match(const struct index_state *istate,
 			      S_ISDIR(ce->ce_mode) || S_ISGITLINK(ce->ce_mode));
 }
 
-static inline int dir_path_match(const struct index_state *istate,
+static inline int dir_path_match(struct index_state *istate,
 				 const struct dir_entry *ent,
 				 const struct pathspec *pathspec,
 				 int prefix, char *seen)
diff --git a/merge-recursive.c b/merge-recursive.c
index 3d9207455b3a..b8de7a704eae 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -2987,7 +2987,7 @@ static int blob_unchanged(struct merge_options *opt,
 	struct strbuf obuf = STRBUF_INIT;
 	struct strbuf abuf = STRBUF_INIT;
 	int ret = 0; /* assume changed for safety */
-	const struct index_state *idx = opt->repo->index;
+	struct index_state *idx = opt->repo->index;
 
 	if (a->mode != o->mode)
 		return 0;
diff --git a/pathspec.c b/pathspec.c
index 7a229d8d22f2..b6e333965cb4 100644
--- a/pathspec.c
+++ b/pathspec.c
@@ -20,7 +20,7 @@
  * to use find_pathspecs_matching_against_index() instead.
  */
 void add_pathspec_matches_against_index(const struct pathspec *pathspec,
-					const struct index_state *istate,
+					struct index_state *istate,
 					char *seen)
 {
 	int num_unmatched = 0, i;
@@ -51,7 +51,7 @@ void add_pathspec_matches_against_index(const struct pathspec *pathspec,
  * given pathspecs achieves against all items in the index.
  */
 char *find_pathspecs_matching_against_index(const struct pathspec *pathspec,
-					    const struct index_state *istate)
+					    struct index_state *istate)
 {
 	char *seen = xcalloc(pathspec->nr, 1);
 	add_pathspec_matches_against_index(pathspec, istate, seen);
@@ -702,7 +702,7 @@ void clear_pathspec(struct pathspec *pathspec)
 	pathspec->nr = 0;
 }
 
-int match_pathspec_attrs(const struct index_state *istate,
+int match_pathspec_attrs(struct index_state *istate,
 			 const char *name, int namelen,
 			 const struct pathspec_item *item)
 {
diff --git a/pathspec.h b/pathspec.h
index 454ce364fac7..2ccc8080b6cc 100644
--- a/pathspec.h
+++ b/pathspec.h
@@ -150,11 +150,11 @@ static inline int ps_strcmp(const struct pathspec_item *item,
 }
 
 void add_pathspec_matches_against_index(const struct pathspec *pathspec,
-					const struct index_state *istate,
+					struct index_state *istate,
 					char *seen);
 char *find_pathspecs_matching_against_index(const struct pathspec *pathspec,
-					    const struct index_state *istate);
-int match_pathspec_attrs(const struct index_state *istate,
+					    struct index_state *istate);
+int match_pathspec_attrs(struct index_state *istate,
 			 const char *name, int namelen,
 			 const struct pathspec_item *item);
 
diff --git a/read-cache.c b/read-cache.c
index b9c08773466c..dfec221d9128 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -548,7 +548,7 @@ int cache_name_stage_compare(const char *name1, int len1, int stage1, const char
 	return 0;
 }
 
-static int index_name_stage_pos(const struct index_state *istate, const char *name, int namelen, int stage)
+static int index_name_stage_pos(struct index_state *istate, const char *name, int namelen, int stage)
 {
 	int first, last;
 
@@ -569,7 +569,7 @@ static int index_name_stage_pos(const struct index_state *istate, const char *na
 	return -first-1;
 }
 
-int index_name_pos(const struct index_state *istate, const char *name, int namelen)
+int index_name_pos(struct index_state *istate, const char *name, int namelen)
 {
 	return index_name_stage_pos(istate, name, namelen, 0);
 }
@@ -3380,8 +3380,8 @@ int repo_read_index_unmerged(struct repository *repo)
  * We helpfully remove a trailing "/" from directories so that
  * the output of read_directory can be used as-is.
  */
-int index_name_is_other(const struct index_state *istate, const char *name,
-		int namelen)
+int index_name_is_other(struct index_state *istate, const char *name,
+			int namelen)
 {
 	int pos;
 	if (namelen && name[namelen - 1] == '/')
@@ -3399,7 +3399,7 @@ int index_name_is_other(const struct index_state *istate, const char *name,
 	return 1;
 }
 
-void *read_blob_data_from_index(const struct index_state *istate,
+void *read_blob_data_from_index(struct index_state *istate,
 				const char *path, unsigned long *size)
 {
 	int pos, len;
diff --git a/submodule.c b/submodule.c
index 9767ba9893cc..83809a4f7bd2 100644
--- a/submodule.c
+++ b/submodule.c
@@ -33,7 +33,7 @@ static struct oid_array ref_tips_after_fetch;
  * will be disabled because we can't guess what might be configured in
  * .gitmodules unless the user resolves the conflict.
  */
-int is_gitmodules_unmerged(const struct index_state *istate)
+int is_gitmodules_unmerged(struct index_state *istate)
 {
 	int pos = index_name_pos(istate, GITMODULES_FILE, strlen(GITMODULES_FILE));
 	if (pos < 0) { /* .gitmodules not found or isn't merged */
@@ -301,7 +301,7 @@ int is_submodule_populated_gently(const char *path, int *return_error_code)
 /*
  * Dies if the provided 'prefix' corresponds to an unpopulated submodule
  */
-void die_in_unpopulated_submodule(const struct index_state *istate,
+void die_in_unpopulated_submodule(struct index_state *istate,
 				  const char *prefix)
 {
 	int i, prefixlen;
@@ -331,7 +331,7 @@ void die_in_unpopulated_submodule(const struct index_state *istate,
 /*
  * Dies if any paths in the provided pathspec descends into a submodule
  */
-void die_path_inside_submodule(const struct index_state *istate,
+void die_path_inside_submodule(struct index_state *istate,
 			       const struct pathspec *ps)
 {
 	int i, j;
diff --git a/submodule.h b/submodule.h
index 4ac6e31cf1f7..84640c49c114 100644
--- a/submodule.h
+++ b/submodule.h
@@ -39,7 +39,7 @@ struct submodule_update_strategy {
 };
 #define SUBMODULE_UPDATE_STRATEGY_INIT {SM_UPDATE_UNSPECIFIED, NULL}
 
-int is_gitmodules_unmerged(const struct index_state *istate);
+int is_gitmodules_unmerged(struct index_state *istate);
 int is_writing_gitmodules_ok(void);
 int is_staging_gitmodules_ok(struct index_state *istate);
 int update_path_in_gitmodules(const char *oldpath, const char *newpath);
@@ -60,9 +60,9 @@ int is_submodule_active(struct repository *repo, const char *path);
  * Otherwise the return error code is the same as of resolve_gitdir_gently.
  */
 int is_submodule_populated_gently(const char *path, int *return_error_code);
-void die_in_unpopulated_submodule(const struct index_state *istate,
+void die_in_unpopulated_submodule(struct index_state *istate,
 				  const char *prefix);
-void die_path_inside_submodule(const struct index_state *istate,
+void die_path_inside_submodule(struct index_state *istate,
 			       const struct pathspec *ps);
 enum submodule_update_type parse_submodule_update_type(const char *value);
 int parse_submodule_update_strategy(const char *value,
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 111+ messages in thread

* [PATCH 02/27] read-cache: expand on query into sparse-directory entry
  2021-03-16 21:16 [PATCH 00/27] Sparse Index: API protections Derrick Stolee via GitGitGadget
  2021-03-16 21:16 ` [PATCH 01/27] *: remove 'const' qualifier for struct index_state Derrick Stolee via GitGitGadget
@ 2021-03-16 21:16 ` Derrick Stolee via GitGitGadget
  2021-03-16 21:16 ` [PATCH 03/27] sparse-index: API protection strategy Derrick Stolee via GitGitGadget
                   ` (26 subsequent siblings)
  28 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-03-16 21:16 UTC (permalink / raw)
  To: git; +Cc: newren, gitster, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Callers to index_name_pos() or index_name_stage_pos() have a specific
path in mind. If that happens to be a path with an ancestor being a
sparse-directory entry, it can lead to unexpected results.

In the case that we did not find the requested path, check to see if the
position _before_ the inserted position is a sparse directory entry that
matches the initial segment of the input path (including the directory
separator at the end of the directory name). If so, then expand the
index to be a full index and search again. This expansion will only
happen once per index read.

Future enhancements could be more careful to expand only the necessary
sparse directory entry, but then we would have a special "not fully
sparse, but also not fully expanded" mode that could affect writing the
index to file. Since this only occurs if a specific file is requested
outside of the sparse checkout definition, this is unlikely to be a
common situation.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 read-cache.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/read-cache.c b/read-cache.c
index dfec221d9128..b143277ed106 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -566,6 +566,27 @@ static int index_name_stage_pos(struct index_state *istate, const char *name, in
 		}
 		first = next+1;
 	}
+
+	if (istate->sparse_index &&
+	    first > 0) {
+		/* Note: first <= istate->cache_nr */
+		struct cache_entry *ce = istate->cache[first - 1];
+
+		/*
+		 * If we are in a sparse-index _and_ the entry before the
+		 * insertion position is a sparse-directory entry that is
+		 * an ancestor of 'name', then we need to expand the index
+		 * and search again. This will only trigger once, because
+		 * thereafter the index is fully expanded.
+		 */
+		if (S_ISSPARSEDIR(ce->ce_mode) &&
+		    ce_namelen(ce) < namelen &&
+		    !strncmp(name, ce->name, ce_namelen(ce))) {
+			ensure_full_index(istate);
+			return index_name_stage_pos(istate, name, namelen, stage);
+		}
+	}
+
 	return -first-1;
 }
 
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 111+ messages in thread

* [PATCH 03/27] sparse-index: API protection strategy
  2021-03-16 21:16 [PATCH 00/27] Sparse Index: API protections Derrick Stolee via GitGitGadget
  2021-03-16 21:16 ` [PATCH 01/27] *: remove 'const' qualifier for struct index_state Derrick Stolee via GitGitGadget
  2021-03-16 21:16 ` [PATCH 02/27] read-cache: expand on query into sparse-directory entry Derrick Stolee via GitGitGadget
@ 2021-03-16 21:16 ` Derrick Stolee via GitGitGadget
  2021-03-16 21:16 ` [PATCH 04/27] cache: move ensure_full_index() to cache.h Derrick Stolee via GitGitGadget
                   ` (25 subsequent siblings)
  28 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-03-16 21:16 UTC (permalink / raw)
  To: git; +Cc: newren, gitster, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Edit and expand the sparse-index design document with the plan for
guarding index operations with ensure_full_index().

Notably, the plan has changed to not have an expand_to_path() method in
favor of checking for a sparse-directory hit inside of the
index_path_pos() API.

The changes that follow this one will incrementally add
ensure_full_index() guards to iterations over all cache entries. Some
iterations over the cache entries are not protected due to a few
categories listed in the document. Since these are not being modified,
here is a short list of the files and methods that will not receive
these guards:

Looking for non-zero stage:
* builtin/add.c:chmod_pathspec()
* builtin/merge.c:count_unmerged_entries()
* read-cache.c:unmerged_index()
* rerere.c:check_one_conflict(), find_conflict(), rerere_remaining()
* revision.c:prepare_show_merge()
* sequencer.c:append_conflicts_hint()
* wt-status.c:wt_status_collect_changes_initial()

Looking for submodules:
* builtin/submodule--helper.c:module_list_compute()
* submodule.c: several methods
* worktree.c:validate_no_submodules()

Part of the index API:
* name-hash.c: lazy init methods
* preload-index.c:preload_thread(), preload_index()
* read-cache.c: file format methods

Checking for correct order of cache entries:
* read-cache.c:check_ce_order()

Ignores SKIP_WORKTREE entries or already aware:
* unpack-trees.c:mark_new_skip_worktree()
* wt-status.c:wt_status_check_sparse_checkout()

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/technical/sparse-index.txt | 32 +++++++++++++++++++++---
 1 file changed, 29 insertions(+), 3 deletions(-)

diff --git a/Documentation/technical/sparse-index.txt b/Documentation/technical/sparse-index.txt
index aa116406a016..7ab51bf6c441 100644
--- a/Documentation/technical/sparse-index.txt
+++ b/Documentation/technical/sparse-index.txt
@@ -82,9 +82,35 @@ also introduce other features that have been considered for improving the
 index, as well.
 
 Next, consumers of the index will be guarded against operating on a
-sparse-index by inserting calls to `ensure_full_index()` or
-`expand_index_to_path()`. After these guards are in place, we can begin
-leaving sparse-directory entries in the in-memory index structure.
+sparse-index by inserting calls to `ensure_full_index()` before iterating
+over all cache entries. If a specific path is requested, then those will
+be protected from within the `index_file_exists()` and `index_name_pos()`
+API calls: they will call `ensure_full_index()` if necessary.
+
+During a scan of the codebase, not every iteration of the cache entries
+needs an `ensure_full_index()` check. The basic reasons include:
+
+1. The loop is scanning for entries with non-zero stage. These entries
+   are not collapsed into a sparse-directory entry.
+
+2. The loop is scanning for submodules. These entries are not collapsed
+   into a sparse-directory entry.
+
+3. The loop is part of the index API, especially around reading or
+   writing the format.
+
+4. The loop is checking for correct order of cache entries and that is
+   correct if and only if the sparse-directory entries are in the correct
+   location.
+
+5. The loop ignores entries with the `SKIP_WORKTREE` bit set, or is
+   otherwise already aware of sparse directory entries.
+
+6. The sparse-index is disabled at this point when using the split-index
+   feature, so no effort is made to protect the split-index API.
+
+After these guards are in place, we can begin leaving sparse-directory
+entries in the in-memory index structure.
 
 Even after inserting these guards, we will keep expanding sparse-indexes
 for most Git commands using the `command_requires_full_index` repository
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 111+ messages in thread

* [PATCH 04/27] cache: move ensure_full_index() to cache.h
  2021-03-16 21:16 [PATCH 00/27] Sparse Index: API protections Derrick Stolee via GitGitGadget
                   ` (2 preceding siblings ...)
  2021-03-16 21:16 ` [PATCH 03/27] sparse-index: API protection strategy Derrick Stolee via GitGitGadget
@ 2021-03-16 21:16 ` Derrick Stolee via GitGitGadget
  2021-03-16 21:16 ` [PATCH 05/27] add: ensure full index Derrick Stolee via GitGitGadget
                   ` (24 subsequent siblings)
  28 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-03-16 21:16 UTC (permalink / raw)
  To: git; +Cc: newren, gitster, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Soon we will insert ensure_full_index() calls across the codebase.
Instead of also adding include statements for sparse-index.h, let's just
use the fact that anything that cares about the index already has
cache.h in its includes.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 cache.h        | 1 +
 sparse-index.h | 1 -
 2 files changed, 1 insertion(+), 1 deletion(-)

diff --git a/cache.h b/cache.h
index f244e6ac90fd..9c6d254e2bae 100644
--- a/cache.h
+++ b/cache.h
@@ -350,6 +350,7 @@ void add_name_hash(struct index_state *istate, struct cache_entry *ce);
 void remove_name_hash(struct index_state *istate, struct cache_entry *ce);
 void free_name_hash(struct index_state *istate);
 
+void ensure_full_index(struct index_state *istate);
 
 /* Cache entry creation and cleanup */
 
diff --git a/sparse-index.h b/sparse-index.h
index 39dcc859735e..0268f38753c0 100644
--- a/sparse-index.h
+++ b/sparse-index.h
@@ -2,7 +2,6 @@
 #define SPARSE_INDEX_H__
 
 struct index_state;
-void ensure_full_index(struct index_state *istate);
 int convert_to_sparse(struct index_state *istate);
 
 struct repository;
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 111+ messages in thread

* [PATCH 05/27] add: ensure full index
  2021-03-16 21:16 [PATCH 00/27] Sparse Index: API protections Derrick Stolee via GitGitGadget
                   ` (3 preceding siblings ...)
  2021-03-16 21:16 ` [PATCH 04/27] cache: move ensure_full_index() to cache.h Derrick Stolee via GitGitGadget
@ 2021-03-16 21:16 ` Derrick Stolee via GitGitGadget
  2021-03-17 17:35   ` Elijah Newren
  2021-03-16 21:16 ` [PATCH 06/27] checkout-index: " Derrick Stolee via GitGitGadget
                   ` (23 subsequent siblings)
  28 siblings, 1 reply; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-03-16 21:16 UTC (permalink / raw)
  To: git; +Cc: newren, gitster, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Before iterating over all cache entries, ensure that a sparse index is
expanded to a full index to avoid unexpected behavior.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/add.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/builtin/add.c b/builtin/add.c
index ea762a41e3a2..ab2ef4a63530 100644
--- a/builtin/add.c
+++ b/builtin/add.c
@@ -141,6 +141,7 @@ static int renormalize_tracked_files(const struct pathspec *pathspec, int flags)
 {
 	int i, retval = 0;
 
+	ensure_full_index(&the_index);
 	for (i = 0; i < active_nr; i++) {
 		struct cache_entry *ce = active_cache[i];
 
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 111+ messages in thread

* [PATCH 06/27] checkout-index: ensure full index
  2021-03-16 21:16 [PATCH 00/27] Sparse Index: API protections Derrick Stolee via GitGitGadget
                   ` (4 preceding siblings ...)
  2021-03-16 21:16 ` [PATCH 05/27] add: ensure full index Derrick Stolee via GitGitGadget
@ 2021-03-16 21:16 ` Derrick Stolee via GitGitGadget
  2021-03-17 17:50   ` Elijah Newren
  2021-03-16 21:16 ` [PATCH 07/27] checkout: " Derrick Stolee via GitGitGadget
                   ` (22 subsequent siblings)
  28 siblings, 1 reply; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-03-16 21:16 UTC (permalink / raw)
  To: git; +Cc: newren, gitster, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Before we iterate over all cache entries, ensure that the index is not
sparse. This loop in checkout_all() might be safe to iterate over a
sparse index, but let's put this protection here until it can be
carefully tested.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/checkout-index.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/builtin/checkout-index.c b/builtin/checkout-index.c
index 023e49e271c2..ba31a036f193 100644
--- a/builtin/checkout-index.c
+++ b/builtin/checkout-index.c
@@ -119,6 +119,7 @@ static void checkout_all(const char *prefix, int prefix_length)
 	int i, errs = 0;
 	struct cache_entry *last_ce = NULL;
 
+	ensure_full_index(&the_index);
 	for (i = 0; i < active_nr ; i++) {
 		struct cache_entry *ce = active_cache[i];
 		if (ce_stage(ce) != checkout_stage
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 111+ messages in thread

* [PATCH 07/27] checkout: ensure full index
  2021-03-16 21:16 [PATCH 00/27] Sparse Index: API protections Derrick Stolee via GitGitGadget
                   ` (5 preceding siblings ...)
  2021-03-16 21:16 ` [PATCH 06/27] checkout-index: " Derrick Stolee via GitGitGadget
@ 2021-03-16 21:16 ` Derrick Stolee via GitGitGadget
  2021-03-16 21:16 ` [PATCH 08/27] commit: " Derrick Stolee via GitGitGadget
                   ` (21 subsequent siblings)
  28 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-03-16 21:16 UTC (permalink / raw)
  To: git; +Cc: newren, gitster, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Before iterating over all cache entries in the checkout builtin, ensure
that we have a full index to avoid any unexpected behavior.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/checkout.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/builtin/checkout.c b/builtin/checkout.c
index 0e6639052001..2e64569d2e13 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -368,6 +368,7 @@ static int checkout_worktree(const struct checkout_opts *opts,
 			       NULL);
 
 	enable_delayed_checkout(&state);
+	ensure_full_index(&the_index);
 	for (pos = 0; pos < active_nr; pos++) {
 		struct cache_entry *ce = active_cache[pos];
 		if (ce->ce_flags & CE_MATCHED) {
@@ -512,6 +513,7 @@ static int checkout_paths(const struct checkout_opts *opts,
 	 * Make sure all pathspecs participated in locating the paths
 	 * to be checked out.
 	 */
+	ensure_full_index(&the_index);
 	for (pos = 0; pos < active_nr; pos++)
 		if (opts->overlay_mode)
 			mark_ce_for_checkout_overlay(active_cache[pos],
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 111+ messages in thread

* [PATCH 08/27] commit: ensure full index
  2021-03-16 21:16 [PATCH 00/27] Sparse Index: API protections Derrick Stolee via GitGitGadget
                   ` (6 preceding siblings ...)
  2021-03-16 21:16 ` [PATCH 07/27] checkout: " Derrick Stolee via GitGitGadget
@ 2021-03-16 21:16 ` Derrick Stolee via GitGitGadget
  2021-03-16 21:16 ` [PATCH 09/27] difftool: " Derrick Stolee via GitGitGadget
                   ` (20 subsequent siblings)
  28 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-03-16 21:16 UTC (permalink / raw)
  To: git; +Cc: newren, gitster, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

These two loops iterate over all cache entries, so ensure that a sparse
index is expanded to a full index before we do so.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/commit.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/builtin/commit.c b/builtin/commit.c
index 739110c5a7f6..9ea9e7b53390 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -251,6 +251,7 @@ static int list_paths(struct string_list *list, const char *with_tree,
 		free(max_prefix);
 	}
 
+	ensure_full_index(&the_index);
 	for (i = 0; i < active_nr; i++) {
 		const struct cache_entry *ce = active_cache[i];
 		struct string_list_item *item;
@@ -931,6 +932,7 @@ static int prepare_to_commit(const char *index_file, const char *prefix,
 		if (get_oid(parent, &oid)) {
 			int i, ita_nr = 0;
 
+			ensure_full_index(&the_index);
 			for (i = 0; i < active_nr; i++)
 				if (ce_intent_to_add(active_cache[i]))
 					ita_nr++;
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 111+ messages in thread

* [PATCH 09/27] difftool: ensure full index
  2021-03-16 21:16 [PATCH 00/27] Sparse Index: API protections Derrick Stolee via GitGitGadget
                   ` (7 preceding siblings ...)
  2021-03-16 21:16 ` [PATCH 08/27] commit: " Derrick Stolee via GitGitGadget
@ 2021-03-16 21:16 ` Derrick Stolee via GitGitGadget
  2021-03-16 21:16 ` [PATCH 10/27] fsck: " Derrick Stolee via GitGitGadget
                   ` (19 subsequent siblings)
  28 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-03-16 21:16 UTC (permalink / raw)
  To: git; +Cc: newren, gitster, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Before iterating over all cache entries, ensure that a sparse index has
been expanded to a full one to avoid unexpected behavior.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/difftool.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/builtin/difftool.c b/builtin/difftool.c
index 6e18e623fddf..1d62f77efa61 100644
--- a/builtin/difftool.c
+++ b/builtin/difftool.c
@@ -584,6 +584,8 @@ static int run_dir_diff(const char *extcmd, int symlinks, const char *prefix,
 		setenv("GIT_DIFFTOOL_DIRDIFF", "true", 1);
 	rc = run_command_v_opt(helper_argv, flags);
 
+	ensure_full_index(&wtindex);
+
 	/*
 	 * If the diff includes working copy files and those
 	 * files were modified during the diff, then the changes
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 111+ messages in thread

* [PATCH 10/27] fsck: ensure full index
  2021-03-16 21:16 [PATCH 00/27] Sparse Index: API protections Derrick Stolee via GitGitGadget
                   ` (8 preceding siblings ...)
  2021-03-16 21:16 ` [PATCH 09/27] difftool: " Derrick Stolee via GitGitGadget
@ 2021-03-16 21:16 ` Derrick Stolee via GitGitGadget
  2021-03-16 21:16 ` [PATCH 11/27] grep: " Derrick Stolee via GitGitGadget
                   ` (18 subsequent siblings)
  28 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-03-16 21:16 UTC (permalink / raw)
  To: git; +Cc: newren, gitster, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

When verifying all blobs reachable from the index, ensure that a sparse
index has been expanded to a full one to avoid missing some blobs.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/fsck.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index 821e7798c706..524d3fca28f6 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -883,6 +883,7 @@ int cmd_fsck(int argc, const char **argv, const char *prefix)
 		verify_index_checksum = 1;
 		verify_ce_order = 1;
 		read_cache();
+		ensure_full_index(&the_index);
 		for (i = 0; i < active_nr; i++) {
 			unsigned int mode;
 			struct blob *blob;
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 111+ messages in thread

* [PATCH 11/27] grep: ensure full index
  2021-03-16 21:16 [PATCH 00/27] Sparse Index: API protections Derrick Stolee via GitGitGadget
                   ` (9 preceding siblings ...)
  2021-03-16 21:16 ` [PATCH 10/27] fsck: " Derrick Stolee via GitGitGadget
@ 2021-03-16 21:16 ` Derrick Stolee via GitGitGadget
  2021-03-16 21:17 ` [PATCH 12/27] ls-files: " Derrick Stolee via GitGitGadget
                   ` (17 subsequent siblings)
  28 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-03-16 21:16 UTC (permalink / raw)
  To: git; +Cc: newren, gitster, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Before iterating over all cache entries, ensure that a sparse index is
expanded to a full one so we do not miss blobs to scan. Later, this can
integrate more carefully with sparse indexes with proper testing.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/grep.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/builtin/grep.c b/builtin/grep.c
index 4e91a253ac3b..652d6d60f644 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -504,6 +504,7 @@ static int grep_cache(struct grep_opt *opt,
 	if (repo_read_index(repo) < 0)
 		die(_("index file corrupt"));
 
+	ensure_full_index(repo->index);
 	for (nr = 0; nr < repo->index->cache_nr; nr++) {
 		const struct cache_entry *ce = repo->index->cache[nr];
 
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 111+ messages in thread

* [PATCH 12/27] ls-files: ensure full index
  2021-03-16 21:16 [PATCH 00/27] Sparse Index: API protections Derrick Stolee via GitGitGadget
                   ` (10 preceding siblings ...)
  2021-03-16 21:16 ` [PATCH 11/27] grep: " Derrick Stolee via GitGitGadget
@ 2021-03-16 21:17 ` Derrick Stolee via GitGitGadget
  2021-03-16 21:17 ` [PATCH 13/27] merge-index: " Derrick Stolee via GitGitGadget
                   ` (16 subsequent siblings)
  28 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-03-16 21:17 UTC (permalink / raw)
  To: git; +Cc: newren, gitster, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Before iterating over all cache entries, ensure that a sparse index is
expanded to a full one to avoid missing files.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/ls-files.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index 76a0d35c1a01..8fbef2b14f29 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -317,6 +317,7 @@ static void show_files(struct repository *repo, struct dir_struct *dir)
 
 	if (!(show_cached || show_stage || show_deleted || show_modified))
 		return;
+	ensure_full_index(repo->index);
 	for (i = 0; i < repo->index->cache_nr; i++) {
 		const struct cache_entry *ce = repo->index->cache[i];
 		struct stat st;
@@ -495,6 +496,7 @@ void overlay_tree_on_index(struct index_state *istate,
 		die("bad tree-ish %s", tree_name);
 
 	/* Hoist the unmerged entries up to stage #3 to make room */
+	ensure_full_index(istate);
 	for (i = 0; i < istate->cache_nr; i++) {
 		struct cache_entry *ce = istate->cache[i];
 		if (!ce_stage(ce))
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 111+ messages in thread

* [PATCH 13/27] merge-index: ensure full index
  2021-03-16 21:16 [PATCH 00/27] Sparse Index: API protections Derrick Stolee via GitGitGadget
                   ` (11 preceding siblings ...)
  2021-03-16 21:17 ` [PATCH 12/27] ls-files: " Derrick Stolee via GitGitGadget
@ 2021-03-16 21:17 ` Derrick Stolee via GitGitGadget
  2021-03-16 21:17 ` [PATCH 14/27] rm: " Derrick Stolee via GitGitGadget
                   ` (15 subsequent siblings)
  28 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-03-16 21:17 UTC (permalink / raw)
  To: git; +Cc: newren, gitster, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Before iterating over all cache entries, ensure that a sparse index is
expanded to a full one to avoid unexpected behavior.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/merge-index.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/builtin/merge-index.c b/builtin/merge-index.c
index 38ea6ad6ca25..c1c4409b536d 100644
--- a/builtin/merge-index.c
+++ b/builtin/merge-index.c
@@ -58,6 +58,7 @@ static void merge_one_path(const char *path)
 static void merge_all(void)
 {
 	int i;
+	ensure_full_index(&the_index);
 	for (i = 0; i < active_nr; i++) {
 		const struct cache_entry *ce = active_cache[i];
 		if (!ce_stage(ce))
@@ -80,6 +81,8 @@ int cmd_merge_index(int argc, const char **argv, const char *prefix)
 
 	read_cache();
 
+	ensure_full_index(&the_index);
+
 	i = 1;
 	if (!strcmp(argv[i], "-o")) {
 		one_shot = 1;
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 111+ messages in thread

* [PATCH 14/27] rm: ensure full index
  2021-03-16 21:16 [PATCH 00/27] Sparse Index: API protections Derrick Stolee via GitGitGadget
                   ` (12 preceding siblings ...)
  2021-03-16 21:17 ` [PATCH 13/27] merge-index: " Derrick Stolee via GitGitGadget
@ 2021-03-16 21:17 ` Derrick Stolee via GitGitGadget
  2021-03-16 21:17 ` [PATCH 15/27] sparse-checkout: " Derrick Stolee via GitGitGadget
                   ` (14 subsequent siblings)
  28 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-03-16 21:17 UTC (permalink / raw)
  To: git; +Cc: newren, gitster, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Before iterating over all cache entries, ensure that a sparse index is
expanded to a full index to avoid unexpected behavior.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/rm.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/builtin/rm.c b/builtin/rm.c
index 4858631e0f02..99125ccb8424 100644
--- a/builtin/rm.c
+++ b/builtin/rm.c
@@ -293,6 +293,7 @@ int cmd_rm(int argc, const char **argv, const char *prefix)
 
 	seen = xcalloc(pathspec.nr, 1);
 
+	ensure_full_index(&the_index);
 	for (i = 0; i < active_nr; i++) {
 		const struct cache_entry *ce = active_cache[i];
 		if (!ce_path_match(&the_index, ce, &pathspec, seen))
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 111+ messages in thread

* [PATCH 15/27] sparse-checkout: ensure full index
  2021-03-16 21:16 [PATCH 00/27] Sparse Index: API protections Derrick Stolee via GitGitGadget
                   ` (13 preceding siblings ...)
  2021-03-16 21:17 ` [PATCH 14/27] rm: " Derrick Stolee via GitGitGadget
@ 2021-03-16 21:17 ` Derrick Stolee via GitGitGadget
  2021-03-18  5:22   ` Elijah Newren
  2021-03-16 21:17 ` [PATCH 16/27] update-index: " Derrick Stolee via GitGitGadget
                   ` (13 subsequent siblings)
  28 siblings, 1 reply; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-03-16 21:17 UTC (permalink / raw)
  To: git; +Cc: newren, gitster, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

When adjusting the sparse-checkout definition, ensure that a sparse
index is expanded to a full one to avoid unexpected behavior. This is a
top candidate for later integration with the sparse-index, but requires
careful testing.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/stash.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/builtin/stash.c b/builtin/stash.c
index ba774cce674f..b3d4fe814b6b 100644
--- a/builtin/stash.c
+++ b/builtin/stash.c
@@ -1350,6 +1350,7 @@ static int do_push_stash(const struct pathspec *ps, const char *stash_msg, int q
 		int i;
 		char *ps_matched = xcalloc(ps->nr, 1);
 
+		ensure_full_index(&the_index);
 		for (i = 0; i < active_nr; i++)
 			ce_path_match(&the_index, active_cache[i], ps,
 				      ps_matched);
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 111+ messages in thread

* [PATCH 16/27] update-index: ensure full index
  2021-03-16 21:16 [PATCH 00/27] Sparse Index: API protections Derrick Stolee via GitGitGadget
                   ` (14 preceding siblings ...)
  2021-03-16 21:17 ` [PATCH 15/27] sparse-checkout: " Derrick Stolee via GitGitGadget
@ 2021-03-16 21:17 ` Derrick Stolee via GitGitGadget
  2021-03-16 21:17 ` [PATCH 17/27] diff-lib: " Derrick Stolee via GitGitGadget
                   ` (12 subsequent siblings)
  28 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-03-16 21:17 UTC (permalink / raw)
  To: git; +Cc: newren, gitster, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Before iterating over all cache entries, ensure that a sparse index is
expanded to a full index to avoid unexpected behavior.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/update-index.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/builtin/update-index.c b/builtin/update-index.c
index 79087bccea4b..8eb654ddc754 100644
--- a/builtin/update-index.c
+++ b/builtin/update-index.c
@@ -745,6 +745,7 @@ static int do_reupdate(int ac, const char **av,
 		 */
 		has_head = 0;
  redo:
+	ensure_full_index(&the_index);
 	for (pos = 0; pos < active_nr; pos++) {
 		const struct cache_entry *ce = active_cache[pos];
 		struct cache_entry *old = NULL;
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 111+ messages in thread

* [PATCH 17/27] diff-lib: ensure full index
  2021-03-16 21:16 [PATCH 00/27] Sparse Index: API protections Derrick Stolee via GitGitGadget
                   ` (15 preceding siblings ...)
  2021-03-16 21:17 ` [PATCH 16/27] update-index: " Derrick Stolee via GitGitGadget
@ 2021-03-16 21:17 ` Derrick Stolee via GitGitGadget
  2021-03-18  5:24   ` Elijah Newren
  2021-03-16 21:17 ` [PATCH 18/27] dir: " Derrick Stolee via GitGitGadget
                   ` (11 subsequent siblings)
  28 siblings, 1 reply; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-03-16 21:17 UTC (permalink / raw)
  To: git; +Cc: newren, gitster, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Before iterating over all cache entries, ensure that a sparse index is
expanded to a full index to avoid unexpected behavior.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 diff-lib.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/diff-lib.c b/diff-lib.c
index b73cc1859a49..41d6fcec1a81 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -102,6 +102,7 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
 
 	if (diff_unmerged_stage < 0)
 		diff_unmerged_stage = 2;
+
 	entries = istate->cache_nr;
 	for (i = 0; i < entries; i++) {
 		unsigned int oldmode, newmode;
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 111+ messages in thread

* [PATCH 18/27] dir: ensure full index
  2021-03-16 21:16 [PATCH 00/27] Sparse Index: API protections Derrick Stolee via GitGitGadget
                   ` (16 preceding siblings ...)
  2021-03-16 21:17 ` [PATCH 17/27] diff-lib: " Derrick Stolee via GitGitGadget
@ 2021-03-16 21:17 ` Derrick Stolee via GitGitGadget
  2021-03-16 21:17 ` [PATCH 19/27] entry: " Derrick Stolee via GitGitGadget
                   ` (10 subsequent siblings)
  28 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-03-16 21:17 UTC (permalink / raw)
  To: git; +Cc: newren, gitster, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Before iterating over all cache entries, ensure that a sparse index is
expanded to a full index to avoid unexpected behavior.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 dir.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/dir.c b/dir.c
index 5b00dfb5b144..1c9f61566e66 100644
--- a/dir.c
+++ b/dir.c
@@ -3533,6 +3533,7 @@ static void connect_wt_gitdir_in_nested(const char *sub_worktree,
 	if (repo_read_index(&subrepo) < 0)
 		die(_("index file corrupt in repo %s"), subrepo.gitdir);
 
+	ensure_full_index(subrepo.index);
 	for (i = 0; i < subrepo.index->cache_nr; i++) {
 		const struct cache_entry *ce = subrepo.index->cache[i];
 
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 111+ messages in thread

* [PATCH 19/27] entry: ensure full index
  2021-03-16 21:16 [PATCH 00/27] Sparse Index: API protections Derrick Stolee via GitGitGadget
                   ` (17 preceding siblings ...)
  2021-03-16 21:17 ` [PATCH 18/27] dir: " Derrick Stolee via GitGitGadget
@ 2021-03-16 21:17 ` Derrick Stolee via GitGitGadget
  2021-03-16 21:17 ` [PATCH 20/27] merge-ort: " Derrick Stolee via GitGitGadget
                   ` (9 subsequent siblings)
  28 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-03-16 21:17 UTC (permalink / raw)
  To: git; +Cc: newren, gitster, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Before iterating over all cache entries, ensure that a sparse index is
expanded to a full index to avoid unexpected behavior.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 entry.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/entry.c b/entry.c
index 7b9f43716f76..e55a2af867cf 100644
--- a/entry.c
+++ b/entry.c
@@ -412,6 +412,7 @@ static void mark_colliding_entries(const struct checkout *state,
 
 	ce->ce_flags |= CE_MATCHED;
 
+	ensure_full_index(state->istate);
 	for (i = 0; i < state->istate->cache_nr; i++) {
 		struct cache_entry *dup = state->istate->cache[i];
 
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 111+ messages in thread

* [PATCH 20/27] merge-ort: ensure full index
  2021-03-16 21:16 [PATCH 00/27] Sparse Index: API protections Derrick Stolee via GitGitGadget
                   ` (18 preceding siblings ...)
  2021-03-16 21:17 ` [PATCH 19/27] entry: " Derrick Stolee via GitGitGadget
@ 2021-03-16 21:17 ` Derrick Stolee via GitGitGadget
  2021-03-18  5:31   ` Elijah Newren
  2021-03-16 21:17 ` [PATCH 21/27] merge-recursive: " Derrick Stolee via GitGitGadget
                   ` (8 subsequent siblings)
  28 siblings, 1 reply; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-03-16 21:17 UTC (permalink / raw)
  To: git; +Cc: newren, gitster, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Before iterating over all cache entries, ensure that a sparse index is
expanded to a full one to avoid unexpected behavior.

This is a top candidate for updating later with a proper integration
with the sparse index, since we will likely enable the ORT strategy by
default when the sparse index is enabled. This appears to be the only
place where the ORT strategy interacts with the index in such a global
way, so that integration should be clear once the ORT strategy and the
sparse index topics stabilize.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 merge-ort.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/merge-ort.c b/merge-ort.c
index 603d30c52170..9f737212555d 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -3112,6 +3112,7 @@ static int record_conflicted_index_entries(struct merge_options *opt,
 	if (strmap_empty(conflicted))
 		return 0;
 
+	ensure_full_index(index);
 	original_cache_nr = index->cache_nr;
 
 	/* Put every entry from paths into plist, then sort */
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 111+ messages in thread

* [PATCH 21/27] merge-recursive: ensure full index
  2021-03-16 21:16 [PATCH 00/27] Sparse Index: API protections Derrick Stolee via GitGitGadget
                   ` (19 preceding siblings ...)
  2021-03-16 21:17 ` [PATCH 20/27] merge-ort: " Derrick Stolee via GitGitGadget
@ 2021-03-16 21:17 ` Derrick Stolee via GitGitGadget
  2021-03-16 21:17 ` [PATCH 22/27] pathspec: " Derrick Stolee via GitGitGadget
                   ` (7 subsequent siblings)
  28 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-03-16 21:17 UTC (permalink / raw)
  To: git; +Cc: newren, gitster, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Before iterating over all cache entries, ensure that a sparse index is
expanded to a full index to avoid unexpected behavior.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 merge-recursive.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/merge-recursive.c b/merge-recursive.c
index b8de7a704eae..7954c7829d27 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -522,6 +522,7 @@ static struct string_list *get_unmerged(struct index_state *istate)
 
 	unmerged->strdup_strings = 1;
 
+	ensure_full_index(istate);
 	for (i = 0; i < istate->cache_nr; i++) {
 		struct string_list_item *item;
 		struct stage_data *e;
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 111+ messages in thread

* [PATCH 22/27] pathspec: ensure full index
  2021-03-16 21:16 [PATCH 00/27] Sparse Index: API protections Derrick Stolee via GitGitGadget
                   ` (20 preceding siblings ...)
  2021-03-16 21:17 ` [PATCH 21/27] merge-recursive: " Derrick Stolee via GitGitGadget
@ 2021-03-16 21:17 ` Derrick Stolee via GitGitGadget
  2021-03-16 21:17 ` [PATCH 23/27] read-cache: " Derrick Stolee via GitGitGadget
                   ` (6 subsequent siblings)
  28 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-03-16 21:17 UTC (permalink / raw)
  To: git; +Cc: newren, gitster, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Before iterating over all cache entries, ensure that a sparse index is
expanded to a full index to avoid unexpected behavior.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 pathspec.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/pathspec.c b/pathspec.c
index b6e333965cb4..08736ab368d9 100644
--- a/pathspec.c
+++ b/pathspec.c
@@ -36,6 +36,7 @@ void add_pathspec_matches_against_index(const struct pathspec *pathspec,
 			num_unmatched++;
 	if (!num_unmatched)
 		return;
+	ensure_full_index(istate);
 	for (i = 0; i < istate->cache_nr; i++) {
 		const struct cache_entry *ce = istate->cache[i];
 		ce_path_match(istate, ce, pathspec, seen);
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 111+ messages in thread

* [PATCH 23/27] read-cache: ensure full index
  2021-03-16 21:16 [PATCH 00/27] Sparse Index: API protections Derrick Stolee via GitGitGadget
                   ` (21 preceding siblings ...)
  2021-03-16 21:17 ` [PATCH 22/27] pathspec: " Derrick Stolee via GitGitGadget
@ 2021-03-16 21:17 ` Derrick Stolee via GitGitGadget
  2021-03-16 21:17 ` [PATCH 24/27] resolve-undo: " Derrick Stolee via GitGitGadget
                   ` (5 subsequent siblings)
  28 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-03-16 21:17 UTC (permalink / raw)
  To: git; +Cc: newren, gitster, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Before iterating over all cache entries, ensure that a sparse index is
expanded to a full index to avoid unexpected behavior.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 read-cache.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/read-cache.c b/read-cache.c
index b143277ed106..7449f31381cf 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1576,6 +1576,7 @@ int refresh_index(struct index_state *istate, unsigned int flags,
 	 */
 	preload_index(istate, pathspec, 0);
 	trace2_region_enter("index", "refresh", NULL);
+	ensure_full_index(istate);
 	for (i = 0; i < istate->cache_nr; i++) {
 		struct cache_entry *ce, *new_entry;
 		int cache_errno = 0;
@@ -2493,6 +2494,7 @@ int repo_index_has_changes(struct repository *repo,
 		diff_flush(&opt);
 		return opt.flags.has_changes != 0;
 	} else {
+		ensure_full_index(istate);
 		for (i = 0; sb && i < istate->cache_nr; i++) {
 			if (i)
 				strbuf_addch(sb, ' ');
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 111+ messages in thread

* [PATCH 24/27] resolve-undo: ensure full index
  2021-03-16 21:16 [PATCH 00/27] Sparse Index: API protections Derrick Stolee via GitGitGadget
                   ` (22 preceding siblings ...)
  2021-03-16 21:17 ` [PATCH 23/27] read-cache: " Derrick Stolee via GitGitGadget
@ 2021-03-16 21:17 ` Derrick Stolee via GitGitGadget
  2021-03-16 21:17 ` [PATCH 25/27] revision: " Derrick Stolee via GitGitGadget
                   ` (4 subsequent siblings)
  28 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-03-16 21:17 UTC (permalink / raw)
  To: git; +Cc: newren, gitster, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Before iterating over all cache entries, ensure that a sparse index is
expanded to a full index to avoid unexpected behavior.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 resolve-undo.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/resolve-undo.c b/resolve-undo.c
index 236320f179cb..3ccd3f66f80f 100644
--- a/resolve-undo.c
+++ b/resolve-undo.c
@@ -172,6 +172,7 @@ void unmerge_marked_index(struct index_state *istate)
 	if (!istate->resolve_undo)
 		return;
 
+	ensure_full_index(istate);
 	for (i = 0; i < istate->cache_nr; i++) {
 		const struct cache_entry *ce = istate->cache[i];
 		if (ce->ce_flags & CE_MATCHED)
@@ -186,6 +187,7 @@ void unmerge_index(struct index_state *istate, const struct pathspec *pathspec)
 	if (!istate->resolve_undo)
 		return;
 
+	ensure_full_index(istate);
 	for (i = 0; i < istate->cache_nr; i++) {
 		const struct cache_entry *ce = istate->cache[i];
 		if (!ce_path_match(istate, ce, pathspec, NULL))
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 111+ messages in thread

* [PATCH 25/27] revision: ensure full index
  2021-03-16 21:16 [PATCH 00/27] Sparse Index: API protections Derrick Stolee via GitGitGadget
                   ` (23 preceding siblings ...)
  2021-03-16 21:17 ` [PATCH 24/27] resolve-undo: " Derrick Stolee via GitGitGadget
@ 2021-03-16 21:17 ` Derrick Stolee via GitGitGadget
  2021-03-16 21:17 ` [PATCH 26/27] sparse-index: expand_to_path() Derrick Stolee via GitGitGadget
                   ` (3 subsequent siblings)
  28 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-03-16 21:17 UTC (permalink / raw)
  To: git; +Cc: newren, gitster, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Before iterating over all index entries, ensure that a sparse index is
expanded to a full index to avoid unexpected behavior. This case could
be integrated later by ensuring that we walk the tree in the
sparse-directory entry, but the current behavior is only expecting
blobs. Save this integration for later when it can be properly tested.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 revision.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/revision.c b/revision.c
index b78733f5089b..1eee55934c08 100644
--- a/revision.c
+++ b/revision.c
@@ -1680,6 +1680,7 @@ static void do_add_index_objects_to_pending(struct rev_info *revs,
 {
 	int i;
 
+	ensure_full_index(istate);
 	for (i = 0; i < istate->cache_nr; i++) {
 		struct cache_entry *ce = istate->cache[i];
 		struct blob *blob;
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 111+ messages in thread

* [PATCH 26/27] sparse-index: expand_to_path()
  2021-03-16 21:16 [PATCH 00/27] Sparse Index: API protections Derrick Stolee via GitGitGadget
                   ` (24 preceding siblings ...)
  2021-03-16 21:17 ` [PATCH 25/27] revision: " Derrick Stolee via GitGitGadget
@ 2021-03-16 21:17 ` Derrick Stolee via GitGitGadget
  2021-03-16 21:17 ` [PATCH 27/27] name-hash: use expand_to_path() Derrick Stolee via GitGitGadget
                   ` (2 subsequent siblings)
  28 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-03-16 21:17 UTC (permalink / raw)
  To: git; +Cc: newren, gitster, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Some users of the index API have a specific path they are looking for,
but choose to use index_file_exists() to rely on the name-hash hashtable
instead of doing binary search with index_name_pos(). These users only
need to know a yes/no answer, not a position within the cache array.

When the index is sparse, the name-hash hash table does not contain the
full list of paths within sparse directories. It _does_ contain the
directory names for the sparse-directory entries.

Create a helper function, expand_to_path(), for intended use with the
name-hash hashtable functions. The integration with name-hash.c will
follow in a later change.

The solution here is to use ensure_full_index() when we determine that
the requested path is within a sparse directory entry. This will
populate the name-hash hashtable as the index is recomputed from
scratch.

There may be cases where the caller is trying to find an untracked path
that is not in the index but also is not within a sparse directory
entry. We want to minimize the overhead for these requests. If we used
index_name_pos() to find the insertion order of the path, then we could
determine from that position if a sparse-directory exists. (In fact,
just calling index_name_pos() in that case would lead to expanding the
index to a full index.) However, this takes O(log N) time where N is the
number of cache entries.

To keep the performance of this call based mostly on the input string,
use index_file_exists() to look for the ancestors of the path. Using the
heuristic that a sparse directory is likely to have a small number of
parent directories, we start from the bottom and build up. Use a string
buffer to allow mutating the path name to terminate after each slash for
each hashset test.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 sparse-index.c | 72 ++++++++++++++++++++++++++++++++++++++++++++++++++
 sparse-index.h | 13 +++++++++
 2 files changed, 85 insertions(+)

diff --git a/sparse-index.c b/sparse-index.c
index 56313e805d9d..e5712d98d8b9 100644
--- a/sparse-index.c
+++ b/sparse-index.c
@@ -291,3 +291,75 @@ void ensure_full_index(struct index_state *istate)
 
 	trace2_region_leave("index", "ensure_full_index", istate->repo);
 }
+
+/*
+ * This static global helps avoid infinite recursion between
+ * expand_to_path() and index_file_exists().
+ */
+static int in_expand_to_path = 0;
+
+void expand_to_path(struct index_state *istate,
+		    const char *path, size_t pathlen, int icase)
+{
+	struct strbuf path_mutable = STRBUF_INIT;
+	size_t substr_len;
+
+	/* prevent extra recursion */
+	if (in_expand_to_path)
+		return;
+
+	if (!istate || !istate->sparse_index)
+		return;
+
+	if (!istate->repo)
+		istate->repo = the_repository;
+
+	in_expand_to_path = 1;
+
+	/*
+	 * We only need to actually expand a region if the
+	 * following are both true:
+	 *
+	 * 1. 'path' is not already in the index.
+	 * 2. Some parent directory of 'path' is a sparse directory.
+	 */
+
+	if (index_file_exists(istate, path, pathlen, icase))
+		goto cleanup;
+
+	strbuf_add(&path_mutable, path, pathlen);
+	strbuf_addch(&path_mutable, '/');
+
+	/* Check the name hash for all parent directories */
+	substr_len = 0;
+	while (substr_len < pathlen) {
+		char temp;
+		char *replace = strchr(path_mutable.buf + substr_len, '/');
+
+		if (!replace)
+			break;
+
+		/* replace the character _after_ the slash */
+		replace++;
+		temp = *replace;
+		*replace = '\0';
+		if (index_file_exists(istate, path_mutable.buf,
+				      path_mutable.len, icase)) {
+			/*
+			 * We found a parent directory in the name-hash
+			 * hashtable, which means that this entry could
+			 * exist within a sparse-directory entry. Expand
+			 * accordingly.
+			 */
+			ensure_full_index(istate);
+			break;
+		}
+
+		*replace = temp;
+		substr_len = replace - path_mutable.buf;
+	}
+
+cleanup:
+	strbuf_release(&path_mutable);
+	in_expand_to_path = 0;
+}
diff --git a/sparse-index.h b/sparse-index.h
index 0268f38753c0..1115a0d7dd98 100644
--- a/sparse-index.h
+++ b/sparse-index.h
@@ -4,6 +4,19 @@
 struct index_state;
 int convert_to_sparse(struct index_state *istate);
 
+/*
+ * Some places in the codebase expect to search for a specific path.
+ * This path might be outside of the sparse-checkout definition, in
+ * which case a sparse-index may not contain a path for that index.
+ *
+ * Given an index and a path, check to see if a leading directory for
+ * 'path' exists in the index as a sparse directory. In that case,
+ * expand that sparse directory to a full range of cache entries and
+ * populate the index accordingly.
+ */
+void expand_to_path(struct index_state *istate,
+		    const char *path, size_t pathlen, int icase);
+
 struct repository;
 int set_sparse_index_config(struct repository *repo, int enable);
 
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 111+ messages in thread

* [PATCH 27/27] name-hash: use expand_to_path()
  2021-03-16 21:16 [PATCH 00/27] Sparse Index: API protections Derrick Stolee via GitGitGadget
                   ` (25 preceding siblings ...)
  2021-03-16 21:17 ` [PATCH 26/27] sparse-index: expand_to_path() Derrick Stolee via GitGitGadget
@ 2021-03-16 21:17 ` Derrick Stolee via GitGitGadget
  2021-03-17 18:03 ` [PATCH 00/27] Sparse Index: API protections Elijah Newren
  2021-04-01  1:49 ` [PATCH v2 00/25] " Derrick Stolee via GitGitGadget
  28 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-03-16 21:17 UTC (permalink / raw)
  To: git; +Cc: newren, gitster, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

A sparse-index loads the name-hash data for its entries, including the
sparse-directory entries. If a caller asks for a path that is contained
within a sparse-directory entry, we need to expand to a full index and
recalculate the name hash table before returning the result. Insert
calls to expand_to_path() to protect against this case.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 name-hash.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/name-hash.c b/name-hash.c
index 4e03fac9bb12..75c159e06eeb 100644
--- a/name-hash.c
+++ b/name-hash.c
@@ -8,6 +8,7 @@
 #include "cache.h"
 #include "thread-utils.h"
 #include "trace2.h"
+#include "sparse-index.h"
 
 struct dir_entry {
 	struct hashmap_entry ent;
@@ -109,6 +110,12 @@ static void hash_index_entry(struct index_state *istate, struct cache_entry *ce)
 	if (ce->ce_flags & CE_HASHED)
 		return;
 	ce->ce_flags |= CE_HASHED;
+
+	if (S_ISSPARSEDIR(ce->ce_mode)) {
+		add_dir_entry(istate, ce);
+		return;
+	}
+
 	hashmap_entry_init(&ce->ent, memihash(ce->name, ce_namelen(ce)));
 	hashmap_add(&istate->name_hash, &ce->ent);
 
@@ -680,6 +687,7 @@ int index_dir_exists(struct index_state *istate, const char *name, int namelen)
 	struct dir_entry *dir;
 
 	lazy_init_name_hash(istate);
+	expand_to_path(istate, name, namelen, 0);
 	dir = find_dir_entry(istate, name, namelen);
 	return dir && dir->nr;
 }
@@ -690,6 +698,7 @@ void adjust_dirname_case(struct index_state *istate, char *name)
 	const char *ptr = startPtr;
 
 	lazy_init_name_hash(istate);
+	expand_to_path(istate, name, strlen(name), 0);
 	while (*ptr) {
 		while (*ptr && *ptr != '/')
 			ptr++;
@@ -713,6 +722,7 @@ struct cache_entry *index_file_exists(struct index_state *istate, const char *na
 	unsigned int hash = memihash(name, namelen);
 
 	lazy_init_name_hash(istate);
+	expand_to_path(istate, name, namelen, icase);
 
 	ce = hashmap_get_entry_from_hash(&istate->name_hash, hash, NULL,
 					 struct cache_entry, ent);
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH 05/27] add: ensure full index
  2021-03-16 21:16 ` [PATCH 05/27] add: ensure full index Derrick Stolee via GitGitGadget
@ 2021-03-17 17:35   ` Elijah Newren
  2021-03-17 20:35     ` Matheus Tavares Bernardino
  0 siblings, 1 reply; 111+ messages in thread
From: Elijah Newren @ 2021-03-17 17:35 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Derrick Stolee, Derrick Stolee

On Tue, Mar 16, 2021 at 2:17 PM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> Before iterating over all cache entries, ensure that a sparse index is
> expanded to a full index to avoid unexpected behavior.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  builtin/add.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/builtin/add.c b/builtin/add.c
> index ea762a41e3a2..ab2ef4a63530 100644
> --- a/builtin/add.c
> +++ b/builtin/add.c
> @@ -141,6 +141,7 @@ static int renormalize_tracked_files(const struct pathspec *pathspec, int flags)
>  {
>         int i, retval = 0;
>
> +       ensure_full_index(&the_index);
>         for (i = 0; i < active_nr; i++) {
>                 struct cache_entry *ce = active_cache[i];

I'm not so sure about this one; from git-add(1):

       --renormalize
           Apply the "clean" process freshly to all tracked files to forcibly
           add them again to the index. This is useful after changing
           core.autocrlf configuration or the text attribute in order to
           correct files added with wrong CRLF/LF line endings. This option
           implies -u.

... to "all tracked files".  Should that be interpreted as all files
within the sparsity paths, especially considering that we're trying to
enable folks to work within the set of files of interest to them?  If
it doesn't imply that, wouldn't users want an extra option to be able
to behave that way?  And if we add an option, why are we adding a
special option for people wanting to behave sparsely in a
sparse-index-cone-mode-sparse-checkout rather than for the special
case of folks wanting to behave densely in such a setup?

The code below already has the following check below:

    if (!S_ISREG(ce->ce_mode) && !S_ISLNK(ce->ce_mode))
      continue; /* do not touch non blobs */

suggesting that it'd correctly skip over the sparse directories, so I
think this one just isn't needed.

However, on a tangent, that S_ISLNK looks rather suspicious to me.
Why would we renormalize symlinks?  I mean, sure, symlinks aren't
likely to have CRLF/LF characters in them, but if they did, wouldn't
it be wrong to renormalize them?

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH 06/27] checkout-index: ensure full index
  2021-03-16 21:16 ` [PATCH 06/27] checkout-index: " Derrick Stolee via GitGitGadget
@ 2021-03-17 17:50   ` Elijah Newren
  2021-03-17 20:05     ` Derrick Stolee
  0 siblings, 1 reply; 111+ messages in thread
From: Elijah Newren @ 2021-03-17 17:50 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Derrick Stolee, Derrick Stolee

On Tue, Mar 16, 2021 at 2:17 PM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> Before we iterate over all cache entries, ensure that the index is not
> sparse. This loop in checkout_all() might be safe to iterate over a
> sparse index, but let's put this protection here until it can be
> carefully tested.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  builtin/checkout-index.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/builtin/checkout-index.c b/builtin/checkout-index.c
> index 023e49e271c2..ba31a036f193 100644
> --- a/builtin/checkout-index.c
> +++ b/builtin/checkout-index.c
> @@ -119,6 +119,7 @@ static void checkout_all(const char *prefix, int prefix_length)
>         int i, errs = 0;
>         struct cache_entry *last_ce = NULL;
>
> +       ensure_full_index(&the_index);
>         for (i = 0; i < active_nr ; i++) {
>                 struct cache_entry *ce = active_cache[i];
>                 if (ce_stage(ce) != checkout_stage

With the caveat in the commit message, this change looks okay, but
checkout-index may be buggy regardless of the presence of
ensure_full_index().  If ensure_full_index() really is needed here
because it needs to operate on all SKIP_WORKTREE paths and not just
leading directories, that's because it's writing all those
SKIP_WORKTREE entries to the working tree.  When it writes them to the
working tree, is it clearing the SKIP_WORKTREE bit?  If not, we're in
a bit of a pickle...

Might be nice to add a
/* TODO: audit if this is needed; if it is, we may have other bugs... */
or something like that.  But then again, perhaps you're considering
all uses of ensure_full_index() to be need-to-be-reaudited codepaths?
If so, and we determine we really do need one and want to keep it
indefinitely, will we mark those with a comment about why it's
considered correct?

I just want a way to know what still needs to be audited and what
doesn't without doing a lot of history spelunking...

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH 00/27] Sparse Index: API protections
  2021-03-16 21:16 [PATCH 00/27] Sparse Index: API protections Derrick Stolee via GitGitGadget
                   ` (26 preceding siblings ...)
  2021-03-16 21:17 ` [PATCH 27/27] name-hash: use expand_to_path() Derrick Stolee via GitGitGadget
@ 2021-03-17 18:03 ` Elijah Newren
  2021-03-18  6:32   ` Elijah Newren
  2021-04-01  1:49 ` [PATCH v2 00/25] " Derrick Stolee via GitGitGadget
  28 siblings, 1 reply; 111+ messages in thread
From: Elijah Newren @ 2021-03-17 18:03 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Derrick Stolee

On Tue, Mar 16, 2021 at 2:17 PM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> Here is the second patch series submission coming out of the sparse-index
> RFC [1].
>
> [1]
> https://lore.kernel.org/git/pull.847.git.1611596533.gitgitgadget@gmail.com/
>
> This is based on v3 of the format series [2].
>
> [2]
> https://lore.kernel.org/git/pull.883.v3.git.1615912983.gitgitgadget@gmail.com/
>
> The point of this series is to insert protections for the consumers of the
> in-memory index. We mark certain regions of code as needing a full index, so
> we call ensure_full_index() to expand a sparse index to a full one, if
> necessary. These protections are inserted file-by-file in every loop over
> all cache entries. Well, "most" loops, because some are going to be handled
> in the very next series so I leave them out.
>
> Many callers use index_name_pos() to find a path by name. In these cases, we
> can check if that position resolves to a sparse directory instance. In those
> cases, we just expand to a full index and run the search again.
>
> The last few patches deal with the name-hash hashtable for doing O(1)
> lookups.
>
> These protections don't do much right now, since the previous series created
> the_repository->settings.command_requires_full_index to guard all index
> reads and writes to ensure the in-memory copy is full for commands that have
> not been tested with the sparse index yet.
>
> However, after this series is complete, we now have a straight-forward plan
> for making commands "sparse aware" one-by-one:
>
>  1. Disable settings.command_requires_full_index to allow an in-memory
>     sparse-index.
>  2. Run versions of that command under a debugger, breaking on
>     ensure_full_index().
>  3. Examine the call stack to determine the context of that expansion, then
>     implement the proper behavior in those locations.
>  4. Add tests to ensure we are checking this logic in the presence of sparse
>     directory entries.

I started reading the series, then noticed I didn't like the first few
additions of ensure_full_index().  The first was because I thought it
just wasn't needed as per a few lines of code later, but the more
important one that stuck out to me was another where if the
ensure_full_index() call is needed to avoid the code blowing up, then
the code has a good chance that it is doing something inherently wrong
in a sparse-checkout/sparse-index anyway.

So I guess that brings me to the question I asked in 07/27 -- is the
presence of ensure_full_index() a note that the code needs to be later
audited and tweaked to work with sparse-indexes?  If so, then good,
this series may make sense.  However, will that always be the case?
If we think some of these will be left in the code, is there a plan to
annotate each one that has been audited and determined that it needs
to stay?  If not, then each ensure_full_index() might or might not
have been audited for correctness and it becomes a pain to know what's
left to fix.  If the plan is that these are to be audited, and to be
marked if they are truly deemed necessary, then the series makes sense
to me.  If not, then I'm confused about something with the series and
need some help with the goals and plans.

If I'm confused about the goals and the plans, then my reviews will
probably be less than helpful, so I'll suspend reading the series
until I understand the plan a little better.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH 06/27] checkout-index: ensure full index
  2021-03-17 17:50   ` Elijah Newren
@ 2021-03-17 20:05     ` Derrick Stolee
  2021-03-17 21:10       ` Elijah Newren
  0 siblings, 1 reply; 111+ messages in thread
From: Derrick Stolee @ 2021-03-17 20:05 UTC (permalink / raw)
  To: Elijah Newren, Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Derrick Stolee, Derrick Stolee

On 3/17/2021 1:50 PM, Elijah Newren wrote:
> On Tue, Mar 16, 2021 at 2:17 PM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
> With the caveat in the commit message, this change looks okay, but
> checkout-index may be buggy regardless of the presence of
> ensure_full_index().  If ensure_full_index() really is needed here
> because it needs to operate on all SKIP_WORKTREE paths and not just
> leading directories, that's because it's writing all those
> SKIP_WORKTREE entries to the working tree.  When it writes them to the
> working tree, is it clearing the SKIP_WORKTREE bit?  If not, we're in
> a bit of a pickle...

Perhaps I'm unclear in my intentions with this series: _every_
insertion of ensure_full_index() is intended to be audited with
tests in the future. Some might need behavior change, and others
will not. In this series, I'm just putting in the protections so
we don't accidentally trigger unexpected behavior.

Since tests take time to write and review, I was hoping that these
insertions were minimal enough to get us to a safe place where we
can remove the guards carefully.

So with that in mind...

> Might be nice to add a
> /* TODO: audit if this is needed; if it is, we may have other bugs... */
> or something like that.  But then again, perhaps you're considering
> all uses of ensure_full_index() to be need-to-be-reaudited codepaths?
> If so, and we determine we really do need one and want to keep it
> indefinitely, will we mark those with a comment about why it's
> considered correct?
> 
> I just want a way to know what still needs to be audited and what
> doesn't without doing a lot of history spelunking...

...every insertion "needs to be audited" in the future. That's a
big part of the next "phases" in the implementation plan.

As you suggest, it might be a good idea to add a comment to every
insertion, to mark it as un-audited, such as:

	/* TODO: test if ensure_full_index() is necessary */

We can come back later to delete the comment if it truly is
necessary (and add tests to guarantee correct behavior). We can
also remove the comment _and_ the call by modifying the loop
behavior to do the right thing in some cases.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH 05/27] add: ensure full index
  2021-03-17 17:35   ` Elijah Newren
@ 2021-03-17 20:35     ` Matheus Tavares Bernardino
  2021-03-17 20:55       ` Derrick Stolee
  0 siblings, 1 reply; 111+ messages in thread
From: Matheus Tavares Bernardino @ 2021-03-17 20:35 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Derrick Stolee via GitGitGadget, Git Mailing List,
	Junio C Hamano, Derrick Stolee, Derrick Stolee

On Wed, Mar 17, 2021 at 2:36 PM Elijah Newren <newren@gmail.com> wrote:
>
> On Tue, Mar 16, 2021 at 2:17 PM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
> >
> > From: Derrick Stolee <dstolee@microsoft.com>
> >
> > Before iterating over all cache entries, ensure that a sparse index is
> > expanded to a full index to avoid unexpected behavior.
> >
> > Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> > ---
> >  builtin/add.c | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/builtin/add.c b/builtin/add.c
> > index ea762a41e3a2..ab2ef4a63530 100644
> > --- a/builtin/add.c
> > +++ b/builtin/add.c
> > @@ -141,6 +141,7 @@ static int renormalize_tracked_files(const struct pathspec *pathspec, int flags)
> >  {
> >         int i, retval = 0;
> >
> > +       ensure_full_index(&the_index);
> >         for (i = 0; i < active_nr; i++) {
> >                 struct cache_entry *ce = active_cache[i];
>
> I'm not so sure about this one; from git-add(1):
>
>        --renormalize
>            Apply the "clean" process freshly to all tracked files to forcibly
>            add them again to the index. This is useful after changing
>            core.autocrlf configuration or the text attribute in order to
>            correct files added with wrong CRLF/LF line endings. This option
>            implies -u.
>
> ... to "all tracked files".  Should that be interpreted as all files
> within the sparsity paths, especially considering that we're trying to
> enable folks to work within the set of files of interest to them?

I had the same question when working on the add/rm sparse-checkout
series. As seen at [1], --renormalize and --chmod are, currently, the
only options in git-add that do not honor the sparsity patterns and do
update SKIP_WORKTREE paths too.

But is this by design or possibly just an oversight? In my series I
ended up making both options honor the sparsity rules, with a warning
when requested to update any sparse path. (If we are going with this
approach, perhaps should we also amend the docs to remove any
ambiguity as for what "all tracked files" means in this case?)

> The code below already has the following check below:
>
>     if (!S_ISREG(ce->ce_mode) && !S_ISLNK(ce->ce_mode))
>       continue; /* do not touch non blobs */
>
> suggesting that it'd correctly skip over the sparse directories, so I
> think this one just isn't needed.

But if sparse index is not enabled, it does not skip over the sparse
files, right? So isn't the ensure_full_index() call here (and in
chmod_pathspec()) important to be consistent with the case when sparse
index is not in use? Or maybe Stolee could rebase this on top of my
series, where both these places already skip the sparse paths?
(Assuming that's the behavior we are looking for. If not, I can also
fix my series.)

[1]: https://lore.kernel.org/git/d93c3f51465a3e409819db63bd81802e7ef20ea9.1615588108.git.matheus.bernardino@usp.br/

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH 05/27] add: ensure full index
  2021-03-17 20:35     ` Matheus Tavares Bernardino
@ 2021-03-17 20:55       ` Derrick Stolee
  0 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee @ 2021-03-17 20:55 UTC (permalink / raw)
  To: Matheus Tavares Bernardino, Elijah Newren
  Cc: Derrick Stolee via GitGitGadget, Git Mailing List,
	Junio C Hamano, Derrick Stolee, Derrick Stolee

On 3/17/2021 4:35 PM, Matheus Tavares Bernardino wrote:
> On Wed, Mar 17, 2021 at 2:36 PM Elijah Newren <newren@gmail.com> wrote:
>>
>> On Tue, Mar 16, 2021 at 2:17 PM Derrick Stolee via GitGitGadget
>> <gitgitgadget@gmail.com> wrote:>> I'm not so sure about this one; from git-add(1):
>>
>>        --renormalize
>>            Apply the "clean" process freshly to all tracked files to forcibly
>>            add them again to the index. This is useful after changing
>>            core.autocrlf configuration or the text attribute in order to
>>            correct files added with wrong CRLF/LF line endings. This option
>>            implies -u.
>>
>> ... to "all tracked files".  Should that be interpreted as all files
>> within the sparsity paths, especially considering that we're trying to
>> enable folks to work within the set of files of interest to them?

I wrote in reply to another similar comment that I'm being overly
cautious in creating these protections. When I can come back later
with careful tests, we can ensure that everything behaves properly.

> I had the same question when working on the add/rm sparse-checkout
> series. As seen at [1], --renormalize and --chmod are, currently, the
> only options in git-add that do not honor the sparsity patterns and do
> update SKIP_WORKTREE paths too.
> 
> But is this by design or possibly just an oversight? In my series I
> ended up making both options honor the sparsity rules, with a warning
> when requested to update any sparse path. (If we are going with this
> approach, perhaps should we also amend the docs to remove any
> ambiguity as for what "all tracked files" means in this case?)

I'd be happy to see a resolution here, and it can happen in parallel
to what I'm doing here.

>> The code below already has the following check below:
>>
>>     if (!S_ISREG(ce->ce_mode) && !S_ISLNK(ce->ce_mode))
>>       continue; /* do not touch non blobs */
>>
>> suggesting that it'd correctly skip over the sparse directories, so I
>> think this one just isn't needed.
> 
> But if sparse index is not enabled, it does not skip over the sparse
> files, right? So isn't the ensure_full_index() call here (and in
> chmod_pathspec()) important to be consistent with the case when sparse
> index is not in use?

The tests I am adding in t1092-sparse-checkout-compatibility.sh are
focused on ensuring the same behavior across sparse-checkouts with
and without the sparse-index, and also a full checkout (when possible).

Since the sparse-index requires cone-mode sparse-checkout (currently),
and the sparse files to be within a directory (or no index change
happens), the tests you created in t3705-add-sparse-checkout.sh don't
seem to apply. I'll need to find some important scenarios to duplicate
in t1092 without the full depth of t3705.

> Or maybe Stolee could rebase this on top of my
> series, where both these places already skip the sparse paths?
> (Assuming that's the behavior we are looking for. If not, I can also> fix my series.)

I think they can proceed in parallel and then continue more carefully
in the future. The series _after_ this one makes 'git status' and
'git add' work with the sparse-index, so at that point we will be
removing these protections at the right time, with tests. In addition,
those tests will ensure that we don't expand the sparse-index unless
absolutely necessary.

If that work collides with your approach, then I'll rebase onto a
merge of that topic and this one.

> [1]: https://lore.kernel.org/git/d93c3f51465a3e409819db63bd81802e7ef20ea9.1615588108.git.matheus.bernardino@usp.br/
 
(I will go take a look over here. I've been ignoring the thread for
too long.)

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH 06/27] checkout-index: ensure full index
  2021-03-17 20:05     ` Derrick Stolee
@ 2021-03-17 21:10       ` Elijah Newren
  2021-03-17 21:33         ` Derrick Stolee
  0 siblings, 1 reply; 111+ messages in thread
From: Elijah Newren @ 2021-03-17 21:10 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Derrick Stolee via GitGitGadget, Git Mailing List,
	Junio C Hamano, Derrick Stolee, Derrick Stolee

On Wed, Mar 17, 2021 at 1:05 PM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 3/17/2021 1:50 PM, Elijah Newren wrote:
> > On Tue, Mar 16, 2021 at 2:17 PM Derrick Stolee via GitGitGadget
> > <gitgitgadget@gmail.com> wrote:
> > With the caveat in the commit message, this change looks okay, but
> > checkout-index may be buggy regardless of the presence of
> > ensure_full_index().  If ensure_full_index() really is needed here
> > because it needs to operate on all SKIP_WORKTREE paths and not just
> > leading directories, that's because it's writing all those
> > SKIP_WORKTREE entries to the working tree.  When it writes them to the
> > working tree, is it clearing the SKIP_WORKTREE bit?  If not, we're in
> > a bit of a pickle...
>
> Perhaps I'm unclear in my intentions with this series: _every_
> insertion of ensure_full_index() is intended to be audited with
> tests in the future. Some might need behavior change, and others
> will not. In this series, I'm just putting in the protections so
> we don't accidentally trigger unexpected behavior.

I think this may be part of my qualms -- what do you mean by not
accidentally triggering unexpected behavior?  In particular, does your
statement imply that whatever behavior you get after putting in
ensure_full_index() is "expected"?  I think I'm reading that
implication into it, and objecting that the behavior with the
ensure_full_index() still isn't expected.  You've only removed a
certain class of unexpected behavior, namely code that wasn't written
to expect tree entries that suddenly gets them.  You haven't handled
the class of "user wants to work with a subset of files, why are all
these unrelated files being munged/updated/computed/shown/etc."
unexpected behavior.

I'm worrying that expectations are being set up such that working with
just a small section of the code will be unusably hard.  There may be
several commands/flags where it could make sense to operate on either
(a) all files in the repo or (b) just on files within your sparse
paths.  If, though, folks interpret operate-on-all-files as the
"normal" mode (and history suggests they will), then people start
adding all kinds of --no-do-this-sparsely flags to each command, and
then users who want sparse operation have to remember to type such a
flag with each and every command they ever run -- despite having taken
at least three steps already to get a sparse-index.

I believe the extended discussions (for _months_!) on just grep & rm,
plus watching a --sparse patch being floated just in the last day for
ls-files suggest to me that this is a _very_ likely outcome and I'm
worried about it.

> Since tests take time to write and review, I was hoping that these
> insertions were minimal enough to get us to a safe place where we
> can remove the guards carefully.
>
> So with that in mind...
>
> > Might be nice to add a
> > /* TODO: audit if this is needed; if it is, we may have other bugs... */
> > or something like that.  But then again, perhaps you're considering
> > all uses of ensure_full_index() to be need-to-be-reaudited codepaths?
> > If so, and we determine we really do need one and want to keep it
> > indefinitely, will we mark those with a comment about why it's
> > considered correct?
> >
> > I just want a way to know what still needs to be audited and what
> > doesn't without doing a lot of history spelunking...
>
> ...every insertion "needs to be audited" in the future. That's a
> big part of the next "phases" in the implementation plan.
>
> As you suggest, it might be a good idea to add a comment to every
> insertion, to mark it as un-audited, such as:
>
>         /* TODO: test if ensure_full_index() is necessary */
>
> We can come back later to delete the comment if it truly is
> necessary (and add tests to guarantee correct behavior). We can
> also remove the comment _and_ the call by modifying the loop
> behavior to do the right thing in some cases.

If it's "needs to be audited for both performance reasons (can we
operate on fewer entries as an invisible doesn't-change-results
optimization) and correctness reasons (should we operate on fewer
entries and given a modified result within a sparse-index because
users would expect that, but maybe provide a special flag for the
users who want to operate on all files in the repo)" and there's also
an agreement that either audited or unaudited ones will be marked (or
both), then great, I'm happy.  If not, can we discuss which part of my
performance/correctness/marking we aren't in agreement on?


Thanks,
Elijah

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH 06/27] checkout-index: ensure full index
  2021-03-17 21:10       ` Elijah Newren
@ 2021-03-17 21:33         ` Derrick Stolee
  2021-03-17 22:36           ` Elijah Newren
  0 siblings, 1 reply; 111+ messages in thread
From: Derrick Stolee @ 2021-03-17 21:33 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Derrick Stolee via GitGitGadget, Git Mailing List,
	Junio C Hamano, Derrick Stolee, Derrick Stolee

On 3/17/2021 5:10 PM, Elijah Newren wrote:
> On Wed, Mar 17, 2021 at 1:05 PM Derrick Stolee <stolee@gmail.com> wrote:
>>
>> On 3/17/2021 1:50 PM, Elijah Newren wrote:
>>> On Tue, Mar 16, 2021 at 2:17 PM Derrick Stolee via GitGitGadget
>>> <gitgitgadget@gmail.com> wrote:
>>> With the caveat in the commit message, this change looks okay, but
>>> checkout-index may be buggy regardless of the presence of
>>> ensure_full_index().  If ensure_full_index() really is needed here
>>> because it needs to operate on all SKIP_WORKTREE paths and not just
>>> leading directories, that's because it's writing all those
>>> SKIP_WORKTREE entries to the working tree.  When it writes them to the
>>> working tree, is it clearing the SKIP_WORKTREE bit?  If not, we're in
>>> a bit of a pickle...
>>
>> Perhaps I'm unclear in my intentions with this series: _every_
>> insertion of ensure_full_index() is intended to be audited with
>> tests in the future. Some might need behavior change, and others
>> will not. In this series, I'm just putting in the protections so
>> we don't accidentally trigger unexpected behavior.
> 
> I think this may be part of my qualms -- what do you mean by not
> accidentally triggering unexpected behavior?  In particular, does your
> statement imply that whatever behavior you get after putting in
> ensure_full_index() is "expected"?  I think I'm reading that
> implication into it, and objecting that the behavior with the
> ensure_full_index() still isn't expected.  You've only removed a
> certain class of unexpected behavior, namely code that wasn't written
> to expect tree entries that suddenly gets them.  You haven't handled
> the class of "user wants to work with a subset of files, why are all
> these unrelated files being munged/updated/computed/shown/etc."
> unexpected behavior.

My intention is to ensure that (at this moment) choosing to use
the on-disk sparse-index format does not alter Git's end-to-end
behavior.

I want to avoid as much as possible a state where enabling the
sparse-index can start changing how Git commands behave, perhaps
in destructive ways.

By adding these checks, we ensure the in-memory data structure
matches whatever a full index would have created, and then the
behavior matches what Git would do there. It might not be the
"correct" behavior, but it is _consistent_.

> I'm worrying that expectations are being set up such that working with
> just a small section of the code will be unusably hard.  There may be
> several commands/flags where it could make sense to operate on either
> (a) all files in the repo or (b) just on files within your sparse
> paths.  If, though, folks interpret operate-on-all-files as the
> "normal" mode (and history suggests they will), then people start
> adding all kinds of --no-do-this-sparsely flags to each command, and
> then users who want sparse operation have to remember to type such a
> flag with each and every command they ever run -- despite having taken
> at least three steps already to get a sparse-index.
> 
> I believe the extended discussions (for _months_!) on just grep & rm,
> plus watching a --sparse patch being floated just in the last day for
> ls-files suggest to me that this is a _very_ likely outcome and I'm
> worried about it.

It's these behavior changes that I would like to delay as much as
possible and focus on the format and making commands fast that don't
need a change in behavior.

(Yes, there will be exceptions, like when "git add" specifically
adds a file that is in a directory that should be out of the cone,
but the user added it anyway. Atypical behavior like that can be
slow for now.)

>> Since tests take time to write and review, I was hoping that these
>> insertions were minimal enough to get us to a safe place where we
>> can remove the guards carefully.
>>
>> So with that in mind...
>>
>>> Might be nice to add a
>>> /* TODO: audit if this is needed; if it is, we may have other bugs... */
>>> or something like that.  But then again, perhaps you're considering
>>> all uses of ensure_full_index() to be need-to-be-reaudited codepaths?
>>> If so, and we determine we really do need one and want to keep it
>>> indefinitely, will we mark those with a comment about why it's
>>> considered correct?
>>>
>>> I just want a way to know what still needs to be audited and what
>>> doesn't without doing a lot of history spelunking...
>>
>> ...every insertion "needs to be audited" in the future. That's a
>> big part of the next "phases" in the implementation plan.
>>
>> As you suggest, it might be a good idea to add a comment to every
>> insertion, to mark it as un-audited, such as:
>>
>>         /* TODO: test if ensure_full_index() is necessary */
>>
>> We can come back later to delete the comment if it truly is
>> necessary (and add tests to guarantee correct behavior). We can
>> also remove the comment _and_ the call by modifying the loop
>> behavior to do the right thing in some cases.
> 
> If it's "needs to be audited for both performance reasons (can we
> operate on fewer entries as an invisible doesn't-change-results
> optimization) and correctness reasons (should we operate on fewer
> entries and given a modified result within a sparse-index because
> users would expect that, but maybe provide a special flag for the
> users who want to operate on all files in the repo)" and there's also
> an agreement that either audited or unaudited ones will be marked (or
> both), then great, I'm happy.  If not, can we discuss which part of my
> performance/correctness/marking we aren't in agreement on?

I will mark all of the ones I'm inserting. My hope is to eventually
remove it entirely except for when disabling the sparse-index. That
is likely too far out to really hope for, but it is the direction I
am trying to go.

As I indicate that we should carefully test each of these instances
where ensure_full_index() _might_ be necessary before removing them,
it is even more important to test the scenarios where the behavior
changes from a full index with sparse-checkout. Preferably, we just
change the behavior under sparse-checkout and then the sparse-index
can match that (see "test_sparse_match" in t1092).

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH 06/27] checkout-index: ensure full index
  2021-03-17 21:33         ` Derrick Stolee
@ 2021-03-17 22:36           ` Elijah Newren
  2021-03-18  1:17             ` Derrick Stolee
  0 siblings, 1 reply; 111+ messages in thread
From: Elijah Newren @ 2021-03-17 22:36 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Derrick Stolee via GitGitGadget, Git Mailing List,
	Junio C Hamano, Derrick Stolee, Derrick Stolee

On Wed, Mar 17, 2021 at 2:33 PM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 3/17/2021 5:10 PM, Elijah Newren wrote:
> > On Wed, Mar 17, 2021 at 1:05 PM Derrick Stolee <stolee@gmail.com> wrote:
> >>
> >> On 3/17/2021 1:50 PM, Elijah Newren wrote:
> >>> On Tue, Mar 16, 2021 at 2:17 PM Derrick Stolee via GitGitGadget
> >>> <gitgitgadget@gmail.com> wrote:
> >>> With the caveat in the commit message, this change looks okay, but
> >>> checkout-index may be buggy regardless of the presence of
> >>> ensure_full_index().  If ensure_full_index() really is needed here
> >>> because it needs to operate on all SKIP_WORKTREE paths and not just
> >>> leading directories, that's because it's writing all those
> >>> SKIP_WORKTREE entries to the working tree.  When it writes them to the
> >>> working tree, is it clearing the SKIP_WORKTREE bit?  If not, we're in
> >>> a bit of a pickle...
> >>
> >> Perhaps I'm unclear in my intentions with this series: _every_
> >> insertion of ensure_full_index() is intended to be audited with
> >> tests in the future. Some might need behavior change, and others
> >> will not. In this series, I'm just putting in the protections so
> >> we don't accidentally trigger unexpected behavior.
> >
> > I think this may be part of my qualms -- what do you mean by not
> > accidentally triggering unexpected behavior?  In particular, does your
> > statement imply that whatever behavior you get after putting in
> > ensure_full_index() is "expected"?  I think I'm reading that
> > implication into it, and objecting that the behavior with the
> > ensure_full_index() still isn't expected.  You've only removed a
> > certain class of unexpected behavior, namely code that wasn't written
> > to expect tree entries that suddenly gets them.  You haven't handled
> > the class of "user wants to work with a subset of files, why are all
> > these unrelated files being munged/updated/computed/shown/etc."
> > unexpected behavior.
>
> My intention is to ensure that (at this moment) choosing to use
> the on-disk sparse-index format does not alter Git's end-to-end
> behavior.
>
> I want to avoid as much as possible a state where enabling the
> sparse-index can start changing how Git commands behave, perhaps
> in destructive ways.
>
> By adding these checks, we ensure the in-memory data structure
> matches whatever a full index would have created, and then the
> behavior matches what Git would do there. It might not be the
> "correct" behavior, but it is _consistent_.

That sounds good.  Could this be included in
Documentation/technical/sparse-index.txt?  There's only an oblique
reference to it when talking about grep and rm, which incidentally
were already updated by Matheus.  Perhaps also a reference to the
warning in Documentation/git-sparse-checkout.txt would be worthwhile.

> > I'm worrying that expectations are being set up such that working with
> > just a small section of the code will be unusably hard.  There may be
> > several commands/flags where it could make sense to operate on either
> > (a) all files in the repo or (b) just on files within your sparse
> > paths.  If, though, folks interpret operate-on-all-files as the
> > "normal" mode (and history suggests they will), then people start
> > adding all kinds of --no-do-this-sparsely flags to each command, and
> > then users who want sparse operation have to remember to type such a
> > flag with each and every command they ever run -- despite having taken
> > at least three steps already to get a sparse-index.
> >
> > I believe the extended discussions (for _months_!) on just grep & rm,
> > plus watching a --sparse patch being floated just in the last day for
> > ls-files suggest to me that this is a _very_ likely outcome and I'm
> > worried about it.
>
> It's these behavior changes that I would like to delay as much as
> possible and focus on the format and making commands fast that don't
> need a change in behavior.

Delaying sounds great, just so long as that delay doesn't also cement
the behavior and confuse consistency for correctness.

I still think we don't have correct behavior for sparse-checkouts in
many cases (I mean, when "git reset --hard" throws errors about not
removing files and then removes them, we've obviously got some
problems), but we had a decade long cementing of sorts with
SKIP_WORKTREE and now a year-or-two long cementing since
sparse-checkout was introduced and we never went through and cleaned
up the commands. We should at some point, especially since we put the
huge scary warning in Documenation/git-sparse-checkout.txt expressly
for this purpose.

(I would have started this sooner, but trying to feed merge-ort and
keep up with patch review already keeps me at less time on non-git
projects than I think is expected for me.  Once merge-ort is done...)

> (Yes, there will be exceptions, like when "git add" specifically
> adds a file that is in a directory that should be out of the cone,
> but the user added it anyway. Atypical behavior like that can be
> slow for now.)

I agree that there will be exceptions where we can't make the behavior
be fast, but I disagree with that specific example.  "git add" should
just give a warning message and not add any file outside the cone, for
both sparse-index and sparse-checkout.  That can be done quickly.

I know you want to delay the discussion of behavior fixes for specific
commands somewhat, but this particular discussion has already been
ongoing for about 4-12 months now (started at [1] and [2]) and it's
reached a point where I've put my Reviewed-by on it; see [3] for the
git-add piece specifically.

[1] https://lore.kernel.org/git/9f2135f90ffea7f4ccb226f506bf554deab324cc.1605205427.git.matheus.bernardino@usp.br/
[2] https://lore.kernel.org/git/0b9b4c4b414a571877163667694afa3053bf8890.1585027716.git.matheus.bernardino@usp.br/
[3] https://lore.kernel.org/git/66d5c71182274c78e1fcfe84e77deb17e4f0d7e6.1615588109.git.matheus.bernardino@usp.br/

> >> Since tests take time to write and review, I was hoping that these
> >> insertions were minimal enough to get us to a safe place where we
> >> can remove the guards carefully.
> >>
> >> So with that in mind...
> >>
> >>> Might be nice to add a
> >>> /* TODO: audit if this is needed; if it is, we may have other bugs... */
> >>> or something like that.  But then again, perhaps you're considering
> >>> all uses of ensure_full_index() to be need-to-be-reaudited codepaths?
> >>> If so, and we determine we really do need one and want to keep it
> >>> indefinitely, will we mark those with a comment about why it's
> >>> considered correct?
> >>>
> >>> I just want a way to know what still needs to be audited and what
> >>> doesn't without doing a lot of history spelunking...
> >>
> >> ...every insertion "needs to be audited" in the future. That's a
> >> big part of the next "phases" in the implementation plan.
> >>
> >> As you suggest, it might be a good idea to add a comment to every
> >> insertion, to mark it as un-audited, such as:
> >>
> >>         /* TODO: test if ensure_full_index() is necessary */
> >>
> >> We can come back later to delete the comment if it truly is
> >> necessary (and add tests to guarantee correct behavior). We can
> >> also remove the comment _and_ the call by modifying the loop
> >> behavior to do the right thing in some cases.
> >
> > If it's "needs to be audited for both performance reasons (can we
> > operate on fewer entries as an invisible doesn't-change-results
> > optimization) and correctness reasons (should we operate on fewer
> > entries and given a modified result within a sparse-index because
> > users would expect that, but maybe provide a special flag for the
> > users who want to operate on all files in the repo)" and there's also
> > an agreement that either audited or unaudited ones will be marked (or
> > both), then great, I'm happy.  If not, can we discuss which part of my
> > performance/correctness/marking we aren't in agreement on?
>
> I will mark all of the ones I'm inserting. My hope is to eventually
> remove it entirely except for when disabling the sparse-index. That
> is likely too far out to really hope for, but it is the direction I
> am trying to go.
>
> As I indicate that we should carefully test each of these instances
> where ensure_full_index() _might_ be necessary before removing them,
> it is even more important to test the scenarios where the behavior
> changes from a full index with sparse-checkout. Preferably, we just
> change the behavior under sparse-checkout and then the sparse-index
> can match that (see "test_sparse_match" in t1092).

Makes sense.  I agree that it'd be nice to have the two generally
match, though I think we should be open to there being special cases
that differ.  The only one I can think of right now is `git ls-files`
(list the entries in the index) -- since there are tree entries in a
sparse-index, ls-files would naturally show the tree entries.
However, we can discuss that and any other cases -- if there are any
-- later.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH 06/27] checkout-index: ensure full index
  2021-03-17 22:36           ` Elijah Newren
@ 2021-03-18  1:17             ` Derrick Stolee
  0 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee @ 2021-03-18  1:17 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Derrick Stolee via GitGitGadget, Git Mailing List,
	Junio C Hamano, Derrick Stolee, Derrick Stolee

On 3/17/2021 6:36 PM, Elijah Newren wrote:
> On Wed, Mar 17, 2021 at 2:33 PM Derrick Stolee <stolee@gmail.com> wrote:
>>
>> On 3/17/2021 5:10 PM, Elijah Newren wrote:
>>> On Wed, Mar 17, 2021 at 1:05 PM Derrick Stolee <stolee@gmail.com> wrote:
>>>>
>>>> On 3/17/2021 1:50 PM, Elijah Newren wrote:
>>>>> On Tue, Mar 16, 2021 at 2:17 PM Derrick Stolee via GitGitGadget
>>>>> <gitgitgadget@gmail.com> wrote:
>>>>> With the caveat in the commit message, this change looks okay, but
>>>>> checkout-index may be buggy regardless of the presence of
>>>>> ensure_full_index().  If ensure_full_index() really is needed here
>>>>> because it needs to operate on all SKIP_WORKTREE paths and not just
>>>>> leading directories, that's because it's writing all those
>>>>> SKIP_WORKTREE entries to the working tree.  When it writes them to the
>>>>> working tree, is it clearing the SKIP_WORKTREE bit?  If not, we're in
>>>>> a bit of a pickle...
>>>>
>>>> Perhaps I'm unclear in my intentions with this series: _every_
>>>> insertion of ensure_full_index() is intended to be audited with
>>>> tests in the future. Some might need behavior change, and others
>>>> will not. In this series, I'm just putting in the protections so
>>>> we don't accidentally trigger unexpected behavior.
>>>
>>> I think this may be part of my qualms -- what do you mean by not
>>> accidentally triggering unexpected behavior?  In particular, does your
>>> statement imply that whatever behavior you get after putting in
>>> ensure_full_index() is "expected"?  I think I'm reading that
>>> implication into it, and objecting that the behavior with the
>>> ensure_full_index() still isn't expected.  You've only removed a
>>> certain class of unexpected behavior, namely code that wasn't written
>>> to expect tree entries that suddenly gets them.  You haven't handled
>>> the class of "user wants to work with a subset of files, why are all
>>> these unrelated files being munged/updated/computed/shown/etc."
>>> unexpected behavior.
>>
>> My intention is to ensure that (at this moment) choosing to use
>> the on-disk sparse-index format does not alter Git's end-to-end
>> behavior.
>>
>> I want to avoid as much as possible a state where enabling the
>> sparse-index can start changing how Git commands behave, perhaps
>> in destructive ways.
>>
>> By adding these checks, we ensure the in-memory data structure
>> matches whatever a full index would have created, and then the
>> behavior matches what Git would do there. It might not be the
>> "correct" behavior, but it is _consistent_.
> 
> That sounds good.  Could this be included in
> Documentation/technical/sparse-index.txt?  There's only an oblique
> reference to it when talking about grep and rm, which incidentally
> were already updated by Matheus.  Perhaps also a reference to the
> warning in Documentation/git-sparse-checkout.txt would be worthwhile.

As I was thinking about how to make this more clear, I realized
that an update to that doc early in this series would be wise.
Thanks.

>>> I'm worrying that expectations are being set up such that working with
>>> just a small section of the code will be unusably hard.  There may be
>>> several commands/flags where it could make sense to operate on either
>>> (a) all files in the repo or (b) just on files within your sparse
>>> paths.  If, though, folks interpret operate-on-all-files as the
>>> "normal" mode (and history suggests they will), then people start
>>> adding all kinds of --no-do-this-sparsely flags to each command, and
>>> then users who want sparse operation have to remember to type such a
>>> flag with each and every command they ever run -- despite having taken
>>> at least three steps already to get a sparse-index.
>>>
>>> I believe the extended discussions (for _months_!) on just grep & rm,
>>> plus watching a --sparse patch being floated just in the last day for
>>> ls-files suggest to me that this is a _very_ likely outcome and I'm
>>> worried about it.
>>
>> It's these behavior changes that I would like to delay as much as
>> possible and focus on the format and making commands fast that don't
>> need a change in behavior.
> 
> Delaying sounds great, just so long as that delay doesn't also cement
> the behavior and confuse consistency for correctness.
> 
> I still think we don't have correct behavior for sparse-checkouts in
> many cases (I mean, when "git reset --hard" throws errors about not
> removing files and then removes them, we've obviously got some
> problems), but we had a decade long cementing of sorts with
> SKIP_WORKTREE and now a year-or-two long cementing since
> sparse-checkout was introduced and we never went through and cleaned
> up the commands. We should at some point, especially since we put the
> huge scary warning in Documenation/git-sparse-checkout.txt expressly
> for this purpose.
> 
> (I would have started this sooner, but trying to feed merge-ort and
> keep up with patch review already keeps me at less time on non-git
> projects than I think is expected for me.  Once merge-ort is done...)
> 
>> (Yes, there will be exceptions, like when "git add" specifically
>> adds a file that is in a directory that should be out of the cone,
>> but the user added it anyway. Atypical behavior like that can be
>> slow for now.)
> 
> I agree that there will be exceptions where we can't make the behavior
> be fast, but I disagree with that specific example.  "git add" should
> just give a warning message and not add any file outside the cone, for
> both sparse-index and sparse-checkout.  That can be done quickly.

That would be great! I would love behavior changes that make the
performance work I'm interested in easier.

> I know you want to delay the discussion of behavior fixes for specific
> commands somewhat, but this particular discussion has already been
> ongoing for about 4-12 months now (started at [1] and [2]) and it's
> reached a point where I've put my Reviewed-by on it; see [3] for the
> git-add piece specifically.
> 
> [1] https://lore.kernel.org/git/9f2135f90ffea7f4ccb226f506bf554deab324cc.1605205427.git.matheus.bernardino@usp.br/
> [2] https://lore.kernel.org/git/0b9b4c4b414a571877163667694afa3053bf8890.1585027716.git.matheus.bernardino@usp.br/
> [3] https://lore.kernel.org/git/66d5c71182274c78e1fcfe84e77deb17e4f0d7e6.1615588109.git.matheus.bernardino@usp.br/

I'm in full support of these ideas to change the behavior, when
possible. I would love to see that those changes make it really
easy to integrate the sparse-index into the commands.

>>>> Since tests take time to write and review, I was hoping that these
>>>> insertions were minimal enough to get us to a safe place where we
>>>> can remove the guards carefully.
>>>>
>>>> So with that in mind...
>>>>
>>>>> Might be nice to add a
>>>>> /* TODO: audit if this is needed; if it is, we may have other bugs... */
>>>>> or something like that.  But then again, perhaps you're considering
>>>>> all uses of ensure_full_index() to be need-to-be-reaudited codepaths?
>>>>> If so, and we determine we really do need one and want to keep it
>>>>> indefinitely, will we mark those with a comment about why it's
>>>>> considered correct?
>>>>>
>>>>> I just want a way to know what still needs to be audited and what
>>>>> doesn't without doing a lot of history spelunking...
>>>>
>>>> ...every insertion "needs to be audited" in the future. That's a
>>>> big part of the next "phases" in the implementation plan.
>>>>
>>>> As you suggest, it might be a good idea to add a comment to every
>>>> insertion, to mark it as un-audited, such as:
>>>>
>>>>         /* TODO: test if ensure_full_index() is necessary */
>>>>
>>>> We can come back later to delete the comment if it truly is
>>>> necessary (and add tests to guarantee correct behavior). We can
>>>> also remove the comment _and_ the call by modifying the loop
>>>> behavior to do the right thing in some cases.
>>>
>>> If it's "needs to be audited for both performance reasons (can we
>>> operate on fewer entries as an invisible doesn't-change-results
>>> optimization) and correctness reasons (should we operate on fewer
>>> entries and given a modified result within a sparse-index because
>>> users would expect that, but maybe provide a special flag for the
>>> users who want to operate on all files in the repo)" and there's also
>>> an agreement that either audited or unaudited ones will be marked (or
>>> both), then great, I'm happy.  If not, can we discuss which part of my
>>> performance/correctness/marking we aren't in agreement on?
>>
>> I will mark all of the ones I'm inserting. My hope is to eventually
>> remove it entirely except for when disabling the sparse-index. That
>> is likely too far out to really hope for, but it is the direction I
>> am trying to go.
>>
>> As I indicate that we should carefully test each of these instances
>> where ensure_full_index() _might_ be necessary before removing them,
>> it is even more important to test the scenarios where the behavior
>> changes from a full index with sparse-checkout. Preferably, we just
>> change the behavior under sparse-checkout and then the sparse-index
>> can match that (see "test_sparse_match" in t1092).
> 
> Makes sense.  I agree that it'd be nice to have the two generally
> match, though I think we should be open to there being special cases
> that differ.  The only one I can think of right now is `git ls-files`
> (list the entries in the index) -- since there are tree entries in a
> sparse-index, ls-files would naturally show the tree entries.
> However, we can discuss that and any other cases -- if there are any
> -- later.

Here, I at least want to make it very clear that the change is
happening and include updates to the documentation. This is a
case where it is unclear what the "promise" is to _external_
tools that depend on ls-files (somehow). What is their intended
use of ls-files, and how do those expectations change when in a
sparse-checkout?

It's questions like this where I at least want to be making a
clear change that motivates why a change in behavior is merited,
then test that behavior to demonstrate that we expect that to be
the same in perpetuity.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH 15/27] sparse-checkout: ensure full index
  2021-03-16 21:17 ` [PATCH 15/27] sparse-checkout: " Derrick Stolee via GitGitGadget
@ 2021-03-18  5:22   ` Elijah Newren
  2021-03-23 13:13     ` Derrick Stolee
  0 siblings, 1 reply; 111+ messages in thread
From: Elijah Newren @ 2021-03-18  5:22 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Derrick Stolee, Derrick Stolee

On Tue, Mar 16, 2021 at 2:17 PM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> When adjusting the sparse-checkout definition, ensure that a sparse
> index is expanded to a full one to avoid unexpected behavior. This is a
> top candidate for later integration with the sparse-index, but requires
> careful testing.


I was going to comment on most of these collectively, but your commit
message is completely divorced from the patch; it appears you either
place the wrong commit message here or copied it from the one for a
sparse-checkout.  The code modifies stash, though.

> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  builtin/stash.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/builtin/stash.c b/builtin/stash.c
> index ba774cce674f..b3d4fe814b6b 100644
> --- a/builtin/stash.c
> +++ b/builtin/stash.c
> @@ -1350,6 +1350,7 @@ static int do_push_stash(const struct pathspec *ps, const char *stash_msg, int q
>                 int i;
>                 char *ps_matched = xcalloc(ps->nr, 1);
>
> +               ensure_full_index(&the_index);
>                 for (i = 0; i < active_nr; i++)
>                         ce_path_match(&the_index, active_cache[i], ps,
>                                       ps_matched);
> --
> gitgitgadget

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH 17/27] diff-lib: ensure full index
  2021-03-16 21:17 ` [PATCH 17/27] diff-lib: " Derrick Stolee via GitGitGadget
@ 2021-03-18  5:24   ` Elijah Newren
  2021-03-23 13:15     ` Derrick Stolee
  0 siblings, 1 reply; 111+ messages in thread
From: Elijah Newren @ 2021-03-18  5:24 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Derrick Stolee, Derrick Stolee

On Tue, Mar 16, 2021 at 2:17 PM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> Before iterating over all cache entries, ensure that a sparse index is
> expanded to a full index to avoid unexpected behavior.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  diff-lib.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/diff-lib.c b/diff-lib.c
> index b73cc1859a49..41d6fcec1a81 100644
> --- a/diff-lib.c
> +++ b/diff-lib.c
> @@ -102,6 +102,7 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
>
>         if (diff_unmerged_stage < 0)
>                 diff_unmerged_stage = 2;
> +
>         entries = istate->cache_nr;
>         for (i = 0; i < entries; i++) {
>                 unsigned int oldmode, newmode;

I don't think adding a blank newline will ensure the index is expanded.  ;-)

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH 20/27] merge-ort: ensure full index
  2021-03-16 21:17 ` [PATCH 20/27] merge-ort: " Derrick Stolee via GitGitGadget
@ 2021-03-18  5:31   ` Elijah Newren
  2021-03-23 13:26     ` Derrick Stolee
  0 siblings, 1 reply; 111+ messages in thread
From: Elijah Newren @ 2021-03-18  5:31 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Derrick Stolee, Derrick Stolee

On Tue, Mar 16, 2021 at 2:17 PM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> Before iterating over all cache entries, ensure that a sparse index is
> expanded to a full one to avoid unexpected behavior.
>
> This is a top candidate for updating later with a proper integration
> with the sparse index, since we will likely enable the ORT strategy by
> default when the sparse index is enabled. This appears to be the only

s/appears to be/is/   :-)

> place where the ORT strategy interacts with the index in such a global
> way, so that integration should be clear once the ORT strategy and the
> sparse index topics stabilize.

Right, there is one more patch that will touch this function -- patch
7 from the series that marks merge-ort stable over here:
https://lore.kernel.org/git/pull.905.v2.git.1616016485.gitgitgadget@gmail.com/

While I have more optimizations for merge-ort, none of them will touch
this function.

> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  merge-ort.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/merge-ort.c b/merge-ort.c
> index 603d30c52170..9f737212555d 100644
> --- a/merge-ort.c
> +++ b/merge-ort.c
> @@ -3112,6 +3112,7 @@ static int record_conflicted_index_entries(struct merge_options *opt,
>         if (strmap_empty(conflicted))
>                 return 0;
>
> +       ensure_full_index(index);
>         original_cache_nr = index->cache_nr;
>
>         /* Put every entry from paths into plist, then sort */
> --
> gitgitgadget

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH 00/27] Sparse Index: API protections
  2021-03-17 18:03 ` [PATCH 00/27] Sparse Index: API protections Elijah Newren
@ 2021-03-18  6:32   ` Elijah Newren
  0 siblings, 0 replies; 111+ messages in thread
From: Elijah Newren @ 2021-03-18  6:32 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Derrick Stolee,
	Matheus Tavares Bernardino

On Wed, Mar 17, 2021 at 11:03 AM Elijah Newren <newren@gmail.com> wrote:
>
> On Tue, Mar 16, 2021 at 2:17 PM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
> >
> > The point of this series is to insert protections for the consumers of the
> > in-memory index. We mark certain regions of code as needing a full index, so
> > we call ensure_full_index() to expand a sparse index to a full one, if
> > necessary. These protections are inserted file-by-file in every loop over
> > all cache entries. Well, "most" loops, because some are going to be handled
> > in the very next series so I leave them out.
...
> > However, after this series is complete, we now have a straight-forward plan
> > for making commands "sparse aware" one-by-one:
> >
> >  1. Disable settings.command_requires_full_index to allow an in-memory
> >     sparse-index.
> >  2. Run versions of that command under a debugger, breaking on
> >     ensure_full_index().
> >  3. Examine the call stack to determine the context of that expansion, then
> >     implement the proper behavior in those locations.
> >  4. Add tests to ensure we are checking this logic in the presence of sparse
> >     directory entries.
>
...
> If I'm confused about the goals and the plans, then my reviews will
> probably be less than helpful, so I'll suspend reading the series
> until I understand the plan a little better.

Thanks for patiently responding to all my other queries.  I read
through more of the series today.  I inserted comments on a couple
specific patches, but most of patches 6-25 are all along the same
theme.

I have some overall comments on those patches in 6-25 (none of which
need to be addressed in this series, but are meant as hopefully
helpful guides about future work):

add/rm: both of these were just above loops that had a
skip-the-SKIP_WORKTREE entries, at least once Matheus' series is
merged (https://lore.kernel.org/git/cover.1615588108.git.matheus.bernardino@usp.br/).
I think the upshot is just that these become easier to convert.

checkout/commit -- I think these follow the add/rm model and likewise
become easy to convert.

ls-files - I commented elsewhere in this thread about how I think it'd
make sense to have it list the entries in the index, as always.
Obviously, that'd give different output than a full index, because
there are different entries present in the index when using a
sparse-index vs. a full one.

However, all of these, as you say, can wait until later.


I also noted that patches 26 and 27 were based on ones from the RFC
series that I reviewed before, and I noticed you fixed an issue or two
I flagged there, but you also made some other changes and it's too
late at night for my brain to continue thinking and try to compare
them; I may try again tomorrow.

Other than patch 26 which I'm too tired to think through right now,
and the patches I specifically commented on, the rest of the series
looks good to me.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH 01/27] *: remove 'const' qualifier for struct index_state
  2021-03-16 21:16 ` [PATCH 01/27] *: remove 'const' qualifier for struct index_state Derrick Stolee via GitGitGadget
@ 2021-03-19 21:01   ` Junio C Hamano
  2021-03-20  1:45     ` Derrick Stolee
  2021-03-20  1:52     ` Junio C Hamano
  0 siblings, 2 replies; 111+ messages in thread
From: Junio C Hamano @ 2021-03-19 21:01 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, newren, Derrick Stolee, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Derrick Stolee <dstolee@microsoft.com>
>
> Several methods specify that they take a 'struct index_state' pointer
> with the 'const' qualifier because they intend to only query the data,
> not change it. However, we will be introducing a step very low in the
> method stack that might modify a sparse-index to become a full index in
> the case that our queries venture inside a sparse-directory entry.
>
> This change only removes the 'const' qualifiers that are necessary for
> the following change which will actually modify the implementation of
> index_name_stage_pos().

This step has a bit of interaction with Matheus's "git add/rm" work
in sparse checkout (mt/add-rm-in-sparse-checkout), which I believe
is still in a bit of flux.  I didn't check potential conflicts the
remainder of the series may have with other in-flight topics.

So, I may throw review comments at the patches in this topic as if
they are standalone, but please do not be upset if it didn't appear
in the 'seen' topic.

Thanks.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH 01/27] *: remove 'const' qualifier for struct index_state
  2021-03-19 21:01   ` Junio C Hamano
@ 2021-03-20  1:45     ` Derrick Stolee
  2021-03-20  1:52     ` Junio C Hamano
  1 sibling, 0 replies; 111+ messages in thread
From: Derrick Stolee @ 2021-03-20  1:45 UTC (permalink / raw)
  To: Junio C Hamano, Derrick Stolee via GitGitGadget
  Cc: git, newren, Derrick Stolee, Derrick Stolee

On 3/19/2021 5:01 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> From: Derrick Stolee <dstolee@microsoft.com>
>>
>> Several methods specify that they take a 'struct index_state' pointer
>> with the 'const' qualifier because they intend to only query the data,
>> not change it. However, we will be introducing a step very low in the
>> method stack that might modify a sparse-index to become a full index in
>> the case that our queries venture inside a sparse-directory entry.
>>
>> This change only removes the 'const' qualifiers that are necessary for
>> the following change which will actually modify the implementation of
>> index_name_stage_pos().
> 
> This step has a bit of interaction with Matheus's "git add/rm" work
> in sparse checkout (mt/add-rm-in-sparse-checkout), which I believe
> is still in a bit of flux.  I didn't check potential conflicts the
> remainder of the series may have with other in-flight topics.
> 
> So, I may throw review comments at the patches in this topic as if
> they are standalone, but please do not be upset if it didn't appear
> in the 'seen' topic.

Thanks for the point. I need to reset my expectations about how
much I am stomping on other topics in flight. I rarely work so
widely across the source tree.

I could split this patch out on its own so it can fit into a
narrow window between other topics that might collide with it,
then rebase the rest of the series on top after things settle.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH 01/27] *: remove 'const' qualifier for struct index_state
  2021-03-19 21:01   ` Junio C Hamano
  2021-03-20  1:45     ` Derrick Stolee
@ 2021-03-20  1:52     ` Junio C Hamano
  2021-03-30 16:53       ` Derrick Stolee
  1 sibling, 1 reply; 111+ messages in thread
From: Junio C Hamano @ 2021-03-20  1:52 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, newren, Derrick Stolee, Derrick Stolee

Junio C Hamano <gitster@pobox.com> writes:

> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>> From: Derrick Stolee <dstolee@microsoft.com>
>>
>> Several methods specify that they take a 'struct index_state' pointer
>> with the 'const' qualifier because they intend to only query the data,
>> not change it. However, we will be introducing a step very low in the
>> method stack that might modify a sparse-index to become a full index in
>> the case that our queries venture inside a sparse-directory entry.
>>
>> This change only removes the 'const' qualifiers that are necessary for
>> the following change which will actually modify the implementation of
>> index_name_stage_pos().
>
> This step has a bit of interaction with Matheus's "git add/rm" work
> in sparse checkout (mt/add-rm-in-sparse-checkout), which I believe
> is still in a bit of flux.  I didn't check potential conflicts the
> remainder of the series may have with other in-flight topics.
>
> So, I may throw review comments at the patches in this topic as if
> they are standalone, but please do not be upset if it didn't appear
> in the 'seen' topic.

Tonight's pushout will have this topic in 'seen', but I consider the
branch a series of trial merges (there are other first-time topics).
The result seems to compile for me, but other than that, I have not
much confidence in conflict resolution.  Please give them an extra
set of eyeballs.

Thanks.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH 15/27] sparse-checkout: ensure full index
  2021-03-18  5:22   ` Elijah Newren
@ 2021-03-23 13:13     ` Derrick Stolee
  0 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee @ 2021-03-23 13:13 UTC (permalink / raw)
  To: Elijah Newren, Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Derrick Stolee, Derrick Stolee

On 3/18/2021 1:22 AM, Elijah Newren wrote:
> On Tue, Mar 16, 2021 at 2:17 PM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>>
>> From: Derrick Stolee <dstolee@microsoft.com>
>>
>> When adjusting the sparse-checkout definition, ensure that a sparse
>> index is expanded to a full one to avoid unexpected behavior. This is a
>> top candidate for later integration with the sparse-index, but requires
>> careful testing.
> 
> 
> I was going to comment on most of these collectively, but your commit
> message is completely divorced from the patch; it appears you either
> place the wrong commit message here or copied it from the one for a
> sparse-checkout.  The code modifies stash, though.

Thanks for pointing this out. Must have been a bad squash during one
of many rebases.

-Stolee

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH 17/27] diff-lib: ensure full index
  2021-03-18  5:24   ` Elijah Newren
@ 2021-03-23 13:15     ` Derrick Stolee
  0 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee @ 2021-03-23 13:15 UTC (permalink / raw)
  To: Elijah Newren, Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Derrick Stolee, Derrick Stolee

On 3/18/2021 1:24 AM, Elijah Newren wrote:
> On Tue, Mar 16, 2021 at 2:17 PM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>>
>> From: Derrick Stolee <dstolee@microsoft.com>
>>
>> Before iterating over all cache entries, ensure that a sparse index is
>> expanded to a full index to avoid unexpected behavior.
>>
>> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
>> ---
>>  diff-lib.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/diff-lib.c b/diff-lib.c
>> index b73cc1859a49..41d6fcec1a81 100644
>> --- a/diff-lib.c
>> +++ b/diff-lib.c
>> @@ -102,6 +102,7 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
>>
>>         if (diff_unmerged_stage < 0)
>>                 diff_unmerged_stage = 2;
>> +
>>         entries = istate->cache_nr;
>>         for (i = 0; i < entries; i++) {
>>                 unsigned int oldmode, newmode;
> 
> I don't think adding a blank newline will ensure the index is expanded.  ;-)

Oops! This is one where I think we determined the loop doesn't
need the guard, but I didn't remove the newline and hence the
patch wasn't dropped.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH 20/27] merge-ort: ensure full index
  2021-03-18  5:31   ` Elijah Newren
@ 2021-03-23 13:26     ` Derrick Stolee
  0 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee @ 2021-03-23 13:26 UTC (permalink / raw)
  To: Elijah Newren, Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Derrick Stolee, Derrick Stolee

On 3/18/2021 1:31 AM, Elijah Newren wrote:
> On Tue, Mar 16, 2021 at 2:17 PM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>>
>> From: Derrick Stolee <dstolee@microsoft.com>
>>
>> Before iterating over all cache entries, ensure that a sparse index is
>> expanded to a full one to avoid unexpected behavior.
>>
>> This is a top candidate for updating later with a proper integration
>> with the sparse index, since we will likely enable the ORT strategy by
>> default when the sparse index is enabled. This appears to be the only
> 
> s/appears to be/is/   :-)
> 
>> place where the ORT strategy interacts with the index in such a global
>> way, so that integration should be clear once the ORT strategy and the
>> sparse index topics stabilize.
> 
> Right, there is one more patch that will touch this function -- patch
> 7 from the series that marks merge-ort stable over here:
> https://lore.kernel.org/git/pull.905.v2.git.1616016485.gitgitgadget@gmail.com/
> 
> While I have more optimizations for merge-ort, none of them will touch
> this function.

Yes, I noticed that that was one of the conflicts introduced with
this series.

I took a closer look at that patch and method, and it seems like
it only cares about paths with SKIP_WORKTREE _and_ is conflicted.
If it has a conflict, then I believe we won't collapse those
entries to a sparse-directory entry.

I think I will drop this patch for now, and rely on the fact that
I plan to enable merge-ORT directly if sparse-index is enabled.
At that time, I will carefully test this logic.

For now, the 'command_requires_full_index' guard will be
sufficient to preserve behavior.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH 01/27] *: remove 'const' qualifier for struct index_state
  2021-03-20  1:52     ` Junio C Hamano
@ 2021-03-30 16:53       ` Derrick Stolee
  0 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee @ 2021-03-30 16:53 UTC (permalink / raw)
  To: Junio C Hamano, Derrick Stolee via GitGitGadget
  Cc: git, newren, Derrick Stolee, Derrick Stolee

On 3/19/2021 9:52 PM, Junio C Hamano wrote:
> Junio C Hamano <gitster@pobox.com> writes:
> 
>> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>>
>>> From: Derrick Stolee <dstolee@microsoft.com>
>>>
>>> Several methods specify that they take a 'struct index_state' pointer
>>> with the 'const' qualifier because they intend to only query the data,
>>> not change it. However, we will be introducing a step very low in the
>>> method stack that might modify a sparse-index to become a full index in
>>> the case that our queries venture inside a sparse-directory entry.
>>>
>>> This change only removes the 'const' qualifiers that are necessary for
>>> the following change which will actually modify the implementation of
>>> index_name_stage_pos().
>>
>> This step has a bit of interaction with Matheus's "git add/rm" work
>> in sparse checkout (mt/add-rm-in-sparse-checkout), which I believe
>> is still in a bit of flux.  I didn't check potential conflicts the
>> remainder of the series may have with other in-flight topics.
>>
>> So, I may throw review comments at the patches in this topic as if
>> they are standalone, but please do not be upset if it didn't appear
>> in the 'seen' topic.
> 
> Tonight's pushout will have this topic in 'seen', but I consider the
> branch a series of trial merges (there are other first-time topics).
> The result seems to compile for me, but other than that, I have not
> much confidence in conflict resolution.  Please give them an extra
> set of eyeballs.

I looked at your merge, and recreated and equivalent one in my own
merging. Thanks for working through all those details. I trust that
your rerere data will remember those changes and help avoid trouble
when replaying new versions.

Otherwise, I could create a new version of this patch with the
intention of playing directly on top of mt/add-rm-in-sparse-checkout,
then keep future versions of this series on top of a merge of that
series and ds/sparse-index.

I have a v2 of this series ready to send, but I'll give it a day to
see how things shake out on ds/sparse-index and if you have another
suggestion for this conflict.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 111+ messages in thread

* [PATCH v2 00/25] Sparse Index: API protections
  2021-03-16 21:16 [PATCH 00/27] Sparse Index: API protections Derrick Stolee via GitGitGadget
                   ` (27 preceding siblings ...)
  2021-03-17 18:03 ` [PATCH 00/27] Sparse Index: API protections Elijah Newren
@ 2021-04-01  1:49 ` Derrick Stolee via GitGitGadget
  2021-04-01  1:49   ` [PATCH v2 01/25] sparse-index: API protection strategy Derrick Stolee via GitGitGadget
                     ` (27 more replies)
  28 siblings, 28 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-01  1:49 UTC (permalink / raw)
  To: git
  Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
	Derrick Stolee

Here is the second patch series submission coming out of the sparse-index
RFC [1].

[1]
https://lore.kernel.org/git/pull.847.git.1611596533.gitgitgadget@gmail.com/

This is based on v3 of the format series [2].

[2]
https://lore.kernel.org/git/pull.883.v3.git.1615912983.gitgitgadget@gmail.com/

The point of this series is to insert protections for the consumers of the
in-memory index to avoid unintended behavior change when using a sparse
index versus a full one.

We mark certain regions of code as needing a full index, so we call
ensure_full_index() to expand a sparse index to a full one, if necessary.
These protections are inserted file-by-file in every loop over all cache
entries. Well, "most" loops, because some are going to be handled in the
very next series so I leave them out.

Many callers use index_name_pos() to find a path by name. In these cases, we
can check if that position resolves to a sparse directory instance. In those
cases, we just expand to a full index and run the search again.

The last few patches deal with the name-hash hashtable for doing O(1)
lookups.

These protections don't do much right now, since the previous series created
the_repository->settings.command_requires_full_index to guard all index
reads and writes to ensure the in-memory copy is full for commands that have
not been tested with the sparse index yet.

However, after this series is complete, we now have a straight-forward plan
for making commands "sparse aware" one-by-one:

 1. Disable settings.command_requires_full_index to allow an in-memory
    sparse-index.
 2. Run versions of that command under a debugger, breaking on
    ensure_full_index().
 3. Examine the call stack to determine the context of that expansion, then
    implement the proper behavior in those locations.
 4. Add tests to ensure we are checking this logic in the presence of sparse
    directory entries.

I will admit that mostly it is the writing of the test cases that takes the
most time in the conversions I've done so far.


Updates in v2
=============

 * Rebased onto v5 of ds/sparse-index
 * Updated the technical doc to describe how these protections are guards to
   keep behavior consistent between a sparse-index and a full index. Whether
   or not that behavior is "correct" can be interrogated later.
 * Calls to ensure_full_index() are marked with a TODO comment saying these
   calls should be audited later (with tests).
 * Fixed an incorrectly squashed commit message.
 * Dropped the diff-lib.c commit because it was erroneously included in v2.
 * Dropped the merge-ort.c commit because of conflicts with work in flight
   and a quick audit that it is not needed.
 * I reviewed the merge of this topic with mt/add-rm-in-sparse-checkout and
   found it equivalent to what I would have done.

Thanks, -Stolee

Derrick Stolee (25):
  sparse-index: API protection strategy
  *: remove 'const' qualifier for struct index_state
  read-cache: expand on query into sparse-directory entry
  cache: move ensure_full_index() to cache.h
  add: ensure full index
  checkout-index: ensure full index
  checkout: ensure full index
  commit: ensure full index
  difftool: ensure full index
  fsck: ensure full index
  grep: ensure full index
  ls-files: ensure full index
  merge-index: ensure full index
  rm: ensure full index
  stash: ensure full index
  update-index: ensure full index
  dir: ensure full index
  entry: ensure full index
  merge-recursive: ensure full index
  pathspec: ensure full index
  read-cache: ensure full index
  resolve-undo: ensure full index
  revision: ensure full index
  sparse-index: expand_to_path()
  name-hash: use expand_to_path()

 Documentation/technical/sparse-index.txt | 37 +++++++++++-
 attr.c                                   | 14 ++---
 attr.h                                   |  4 +-
 builtin/add.c                            |  2 +
 builtin/checkout-index.c                 |  2 +
 builtin/checkout.c                       |  5 ++
 builtin/commit.c                         |  4 ++
 builtin/difftool.c                       |  3 +
 builtin/fsck.c                           |  2 +
 builtin/grep.c                           |  2 +
 builtin/ls-files.c                       | 14 +++--
 builtin/merge-index.c                    |  5 ++
 builtin/rm.c                             |  2 +
 builtin/stash.c                          |  2 +
 builtin/update-index.c                   |  2 +
 cache.h                                  |  7 ++-
 convert.c                                | 26 ++++-----
 convert.h                                | 20 +++----
 dir.c                                    | 14 +++--
 dir.h                                    |  8 +--
 entry.c                                  |  2 +
 merge-recursive.c                        |  4 +-
 name-hash.c                              | 10 ++++
 pathspec.c                               |  8 ++-
 pathspec.h                               |  6 +-
 read-cache.c                             | 35 ++++++++++--
 resolve-undo.c                           |  4 ++
 revision.c                               |  2 +
 sparse-index.c                           | 72 ++++++++++++++++++++++++
 sparse-index.h                           | 14 ++++-
 submodule.c                              |  6 +-
 submodule.h                              |  6 +-
 32 files changed, 273 insertions(+), 71 deletions(-)


base-commit: 66602733cc95d9a53594520cd8b28d3338e258ea
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-906%2Fderrickstolee%2Fsparse-index%2Fprotections-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-906/derrickstolee/sparse-index/protections-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/906

Range-diff vs v1:

  3:  bbf19f8a2be5 !  1:  7484e085e342 sparse-index: API protection strategy
     @@ Commit message
          Looking for non-zero stage:
          * builtin/add.c:chmod_pathspec()
          * builtin/merge.c:count_unmerged_entries()
     +    * merge-ort.c:record_conflicted_index_entries()
          * read-cache.c:unmerged_index()
          * rerere.c:check_one_conflict(), find_conflict(), rerere_remaining()
          * revision.c:prepare_show_merge()
     @@ Commit message
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
       ## Documentation/technical/sparse-index.txt ##
     -@@ Documentation/technical/sparse-index.txt: also introduce other features that have been considered for improving the
     - index, as well.
     +@@ Documentation/technical/sparse-index.txt: index, as well.
       
       Next, consumers of the index will be guarded against operating on a
     --sparse-index by inserting calls to `ensure_full_index()` or
     + sparse-index by inserting calls to `ensure_full_index()` or
      -`expand_index_to_path()`. After these guards are in place, we can begin
      -leaving sparse-directory entries in the in-memory index structure.
     -+sparse-index by inserting calls to `ensure_full_index()` before iterating
     -+over all cache entries. If a specific path is requested, then those will
     ++`expand_index_to_path()`. If a specific path is requested, then those will
      +be protected from within the `index_file_exists()` and `index_name_pos()`
     -+API calls: they will call `ensure_full_index()` if necessary.
     ++API calls: they will call `ensure_full_index()` if necessary. The
     ++intention here is to preserve existing behavior when interacting with a
     ++sparse-checkout. We don't want a change to happen by accident, without
     ++tests. Many of these locations may not need any change before removing the
     ++guards, but we should not do so without tests to ensure the expected
     ++behavior happens.
     ++
     ++It may be desirable to _change_ the behavior of some commands in the
     ++presence of a sparse index or more generally in any sparse-checkout
     ++scenario. In such cases, these should be carefully communicated and
     ++tested. No such behavior changes are intended during this phase.
      +
      +During a scan of the codebase, not every iteration of the cache entries
      +needs an `ensure_full_index()` check. The basic reasons include:
     @@ Documentation/technical/sparse-index.txt: also introduce other features that hav
      +
      +6. The sparse-index is disabled at this point when using the split-index
      +   feature, so no effort is made to protect the split-index API.
     -+
     -+After these guards are in place, we can begin leaving sparse-directory
     -+entries in the in-memory index structure.
       
       Even after inserting these guards, we will keep expanding sparse-indexes
       for most Git commands using the `command_requires_full_index` repository
  1:  628e16dd3fc7 =  2:  098b2c9ef352 *: remove 'const' qualifier for struct index_state
  2:  8e11e8917019 =  3:  737d27e18d64 read-cache: expand on query into sparse-directory entry
  4:  7e4079c48eb2 =  4:  db5c100f3e2b cache: move ensure_full_index() to cache.h
  5:  142df1ab8e3e !  5:  4a5fc2eb5a9f add: ensure full index
     @@ builtin/add.c: static int renormalize_tracked_files(const struct pathspec *paths
       {
       	int i, retval = 0;
       
     ++	/* TODO: audit for interaction with sparse-index. */
      +	ensure_full_index(&the_index);
       	for (i = 0; i < active_nr; i++) {
       		struct cache_entry *ce = active_cache[i];
  6:  bfa0164cc3c1 !  6:  11c38f7277c5 checkout-index: ensure full index
     @@ builtin/checkout-index.c: static void checkout_all(const char *prefix, int prefi
       	int i, errs = 0;
       	struct cache_entry *last_ce = NULL;
       
     ++	/* TODO: audit for interaction with sparse-index. */
      +	ensure_full_index(&the_index);
       	for (i = 0; i < active_nr ; i++) {
       		struct cache_entry *ce = active_cache[i];
  7:  1bb7b6117e41 !  7:  fd04adbb3f79 checkout: ensure full index
     @@ builtin/checkout.c: static int checkout_worktree(const struct checkout_opts *opt
       			       NULL);
       
       	enable_delayed_checkout(&state);
     ++
     ++	/* TODO: audit for interaction with sparse-index. */
      +	ensure_full_index(&the_index);
       	for (pos = 0; pos < active_nr; pos++) {
       		struct cache_entry *ce = active_cache[pos];
     @@ builtin/checkout.c: static int checkout_paths(const struct checkout_opts *opts,
       	 * Make sure all pathspecs participated in locating the paths
       	 * to be checked out.
       	 */
     ++	/* TODO: audit for interaction with sparse-index. */
      +	ensure_full_index(&the_index);
       	for (pos = 0; pos < active_nr; pos++)
       		if (opts->overlay_mode)
  8:  a59e45c4ae8f !  8:  65704f39edc9 commit: ensure full index
     @@ builtin/commit.c: static int list_paths(struct string_list *list, const char *wi
       		free(max_prefix);
       	}
       
     ++	/* TODO: audit for interaction with sparse-index. */
      +	ensure_full_index(&the_index);
       	for (i = 0; i < active_nr; i++) {
       		const struct cache_entry *ce = active_cache[i];
     @@ builtin/commit.c: static int prepare_to_commit(const char *index_file, const cha
       		if (get_oid(parent, &oid)) {
       			int i, ita_nr = 0;
       
     ++			/* TODO: audit for interaction with sparse-index. */
      +			ensure_full_index(&the_index);
       			for (i = 0; i < active_nr; i++)
       				if (ce_intent_to_add(active_cache[i]))
  9:  83040e7558f3 !  9:  739f3fe9edf2 difftool: ensure full index
     @@ builtin/difftool.c: static int run_dir_diff(const char *extcmd, int symlinks, co
       		setenv("GIT_DIFFTOOL_DIRDIFF", "true", 1);
       	rc = run_command_v_opt(helper_argv, flags);
       
     ++	/* TODO: audit for interaction with sparse-index. */
      +	ensure_full_index(&wtindex);
      +
       	/*
 10:  988f7bd2d736 ! 10:  779a86ad1ec4 fsck: ensure full index
     @@ builtin/fsck.c: int cmd_fsck(int argc, const char **argv, const char *prefix)
       		verify_index_checksum = 1;
       		verify_ce_order = 1;
       		read_cache();
     ++		/* TODO: audit for interaction with sparse-index. */
      +		ensure_full_index(&the_index);
       		for (i = 0; i < active_nr; i++) {
       			unsigned int mode;
 11:  abe189051a0c ! 11:  8c0d377054fa grep: ensure full index
     @@ builtin/grep.c: static int grep_cache(struct grep_opt *opt,
       	if (repo_read_index(repo) < 0)
       		die(_("index file corrupt"));
       
     ++	/* TODO: audit for interaction with sparse-index. */
      +	ensure_full_index(repo->index);
       	for (nr = 0; nr < repo->index->cache_nr; nr++) {
       		const struct cache_entry *ce = repo->index->cache[nr];
 12:  00c8a7e1d119 ! 12:  beaa1467cabb ls-files: ensure full index
     @@ builtin/ls-files.c: static void show_files(struct repository *repo, struct dir_s
       
       	if (!(show_cached || show_stage || show_deleted || show_modified))
       		return;
     ++	/* TODO: audit for interaction with sparse-index. */
      +	ensure_full_index(repo->index);
       	for (i = 0; i < repo->index->cache_nr; i++) {
       		const struct cache_entry *ce = repo->index->cache[i];
     @@ builtin/ls-files.c: void overlay_tree_on_index(struct index_state *istate,
       		die("bad tree-ish %s", tree_name);
       
       	/* Hoist the unmerged entries up to stage #3 to make room */
     ++	/* TODO: audit for interaction with sparse-index. */
      +	ensure_full_index(istate);
       	for (i = 0; i < istate->cache_nr; i++) {
       		struct cache_entry *ce = istate->cache[i];
 13:  1288f02177d2 ! 13:  73684141fcff merge-index: ensure full index
     @@ builtin/merge-index.c: static void merge_one_path(const char *path)
       static void merge_all(void)
       {
       	int i;
     ++	/* TODO: audit for interaction with sparse-index. */
      +	ensure_full_index(&the_index);
       	for (i = 0; i < active_nr; i++) {
       		const struct cache_entry *ce = active_cache[i];
     @@ builtin/merge-index.c: int cmd_merge_index(int argc, const char **argv, const ch
       
       	read_cache();
       
     ++	/* TODO: audit for interaction with sparse-index. */
      +	ensure_full_index(&the_index);
      +
       	i = 1;
 14:  d6816560b32f ! 14:  6ea81a49c6b5 rm: ensure full index
     @@ builtin/rm.c: int cmd_rm(int argc, const char **argv, const char *prefix)
       
       	seen = xcalloc(pathspec.nr, 1);
       
     ++	/* TODO: audit for interaction with sparse-index. */
      +	ensure_full_index(&the_index);
       	for (i = 0; i < active_nr; i++) {
       		const struct cache_entry *ce = active_cache[i];
 15:  92ccd31dd343 ! 15:  49ca5ed05c8d sparse-checkout: ensure full index
     @@ Metadata
      Author: Derrick Stolee <dstolee@microsoft.com>
      
       ## Commit message ##
     -    sparse-checkout: ensure full index
     +    stash: ensure full index
      
     -    When adjusting the sparse-checkout definition, ensure that a sparse
     -    index is expanded to a full one to avoid unexpected behavior. This is a
     -    top candidate for later integration with the sparse-index, but requires
     -    careful testing.
     +    Before iterating over all cache entries, ensure that a sparse index is
     +    expanded to a full index to avoid unexpected behavior.
      
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
     @@ builtin/stash.c: static int do_push_stash(const struct pathspec *ps, const char
       		int i;
       		char *ps_matched = xcalloc(ps->nr, 1);
       
     ++		/* TODO: audit for interaction with sparse-index. */
      +		ensure_full_index(&the_index);
       		for (i = 0; i < active_nr; i++)
       			ce_path_match(&the_index, active_cache[i], ps,
 16:  796449758a08 ! 16:  9c4bb187c15d update-index: ensure full index
     @@ builtin/update-index.c: static int do_reupdate(int ac, const char **av,
       		 */
       		has_head = 0;
        redo:
     ++	/* TODO: audit for interaction with sparse-index. */
      +	ensure_full_index(&the_index);
       	for (pos = 0; pos < active_nr; pos++) {
       		const struct cache_entry *ce = active_cache[pos];
 17:  45bbed6150a2 <  -:  ------------ diff-lib: ensure full index
 18:  8ce1f80985a4 ! 17:  fae4c078c3ef dir: ensure full index
     @@ dir.c: static void connect_wt_gitdir_in_nested(const char *sub_worktree,
       	if (repo_read_index(&subrepo) < 0)
       		die(_("index file corrupt in repo %s"), subrepo.gitdir);
       
     ++	/* TODO: audit for interaction with sparse-index. */
      +	ensure_full_index(subrepo.index);
       	for (i = 0; i < subrepo.index->cache_nr; i++) {
       		const struct cache_entry *ce = subrepo.index->cache[i];
 19:  6e0b452f44c1 ! 18:  2b9180ee77d3 entry: ensure full index
     @@ entry.c: static void mark_colliding_entries(const struct checkout *state,
       
       	ce->ce_flags |= CE_MATCHED;
       
     ++	/* TODO: audit for interaction with sparse-index. */
      +	ensure_full_index(state->istate);
       	for (i = 0; i < state->istate->cache_nr; i++) {
       		struct cache_entry *dup = state->istate->cache[i];
 20:  fb4c7038b746 <  -:  ------------ merge-ort: ensure full index
 21:  57d59825627f ! 19:  1e3f6085a405 merge-recursive: ensure full index
     @@ merge-recursive.c: static struct string_list *get_unmerged(struct index_state *i
       
       	unmerged->strdup_strings = 1;
       
     ++	/* TODO: audit for interaction with sparse-index. */
      +	ensure_full_index(istate);
       	for (i = 0; i < istate->cache_nr; i++) {
       		struct string_list_item *item;
 22:  f4c470686b27 ! 20:  e62a597a9725 pathspec: ensure full index
     @@ pathspec.c: void add_pathspec_matches_against_index(const struct pathspec *paths
       			num_unmatched++;
       	if (!num_unmatched)
       		return;
     ++	/* TODO: audit for interaction with sparse-index. */
      +	ensure_full_index(istate);
       	for (i = 0; i < istate->cache_nr; i++) {
       		const struct cache_entry *ce = istate->cache[i];
 23:  81519782d4b2 ! 21:  ebfffdbdd6ad read-cache: ensure full index
     @@ read-cache.c: int refresh_index(struct index_state *istate, unsigned int flags,
       	 */
       	preload_index(istate, pathspec, 0);
       	trace2_region_enter("index", "refresh", NULL);
     ++	/* TODO: audit for interaction with sparse-index. */
      +	ensure_full_index(istate);
       	for (i = 0; i < istate->cache_nr; i++) {
       		struct cache_entry *ce, *new_entry;
     @@ read-cache.c: int repo_index_has_changes(struct repository *repo,
       		diff_flush(&opt);
       		return opt.flags.has_changes != 0;
       	} else {
     ++		/* TODO: audit for interaction with sparse-index. */
      +		ensure_full_index(istate);
       		for (i = 0; sb && i < istate->cache_nr; i++) {
       			if (i)
 24:  39e8bea0c1ca ! 22:  495b07a87973 resolve-undo: ensure full index
     @@ resolve-undo.c: void unmerge_marked_index(struct index_state *istate)
       	if (!istate->resolve_undo)
       		return;
       
     ++	/* TODO: audit for interaction with sparse-index. */
      +	ensure_full_index(istate);
       	for (i = 0; i < istate->cache_nr; i++) {
       		const struct cache_entry *ce = istate->cache[i];
     @@ resolve-undo.c: void unmerge_index(struct index_state *istate, const struct path
       	if (!istate->resolve_undo)
       		return;
       
     ++	/* TODO: audit for interaction with sparse-index. */
      +	ensure_full_index(istate);
       	for (i = 0; i < istate->cache_nr; i++) {
       		const struct cache_entry *ce = istate->cache[i];
 25:  6abb1813ae10 ! 23:  3144114d1a75 revision: ensure full index
     @@ revision.c: static void do_add_index_objects_to_pending(struct rev_info *revs,
       {
       	int i;
       
     ++	/* TODO: audit for interaction with sparse-index. */
      +	ensure_full_index(istate);
       	for (i = 0; i < istate->cache_nr; i++) {
       		struct cache_entry *ce = istate->cache[i];
 26:  1690d33c11e0 = 24:  d52c72b4a7b9 sparse-index: expand_to_path()
 27:  40cfc59f695a = 25:  7e2d3fae9a2a name-hash: use expand_to_path()

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 111+ messages in thread

* [PATCH v2 01/25] sparse-index: API protection strategy
  2021-04-01  1:49 ` [PATCH v2 00/25] " Derrick Stolee via GitGitGadget
@ 2021-04-01  1:49   ` Derrick Stolee via GitGitGadget
  2021-04-01  1:49   ` [PATCH v2 02/25] *: remove 'const' qualifier for struct index_state Derrick Stolee via GitGitGadget
                     ` (26 subsequent siblings)
  27 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-01  1:49 UTC (permalink / raw)
  To: git
  Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Edit and expand the sparse-index design document with the plan for
guarding index operations with ensure_full_index().

Notably, the plan has changed to not have an expand_to_path() method in
favor of checking for a sparse-directory hit inside of the
index_path_pos() API.

The changes that follow this one will incrementally add
ensure_full_index() guards to iterations over all cache entries. Some
iterations over the cache entries are not protected due to a few
categories listed in the document. Since these are not being modified,
here is a short list of the files and methods that will not receive
these guards:

Looking for non-zero stage:
* builtin/add.c:chmod_pathspec()
* builtin/merge.c:count_unmerged_entries()
* merge-ort.c:record_conflicted_index_entries()
* read-cache.c:unmerged_index()
* rerere.c:check_one_conflict(), find_conflict(), rerere_remaining()
* revision.c:prepare_show_merge()
* sequencer.c:append_conflicts_hint()
* wt-status.c:wt_status_collect_changes_initial()

Looking for submodules:
* builtin/submodule--helper.c:module_list_compute()
* submodule.c: several methods
* worktree.c:validate_no_submodules()

Part of the index API:
* name-hash.c: lazy init methods
* preload-index.c:preload_thread(), preload_index()
* read-cache.c: file format methods

Checking for correct order of cache entries:
* read-cache.c:check_ce_order()

Ignores SKIP_WORKTREE entries or already aware:
* unpack-trees.c:mark_new_skip_worktree()
* wt-status.c:wt_status_check_sparse_checkout()

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/technical/sparse-index.txt | 37 ++++++++++++++++++++++--
 1 file changed, 35 insertions(+), 2 deletions(-)

diff --git a/Documentation/technical/sparse-index.txt b/Documentation/technical/sparse-index.txt
index 8d3d80804604..3b24c1a219f8 100644
--- a/Documentation/technical/sparse-index.txt
+++ b/Documentation/technical/sparse-index.txt
@@ -85,8 +85,41 @@ index, as well.
 
 Next, consumers of the index will be guarded against operating on a
 sparse-index by inserting calls to `ensure_full_index()` or
-`expand_index_to_path()`. After these guards are in place, we can begin
-leaving sparse-directory entries in the in-memory index structure.
+`expand_index_to_path()`. If a specific path is requested, then those will
+be protected from within the `index_file_exists()` and `index_name_pos()`
+API calls: they will call `ensure_full_index()` if necessary. The
+intention here is to preserve existing behavior when interacting with a
+sparse-checkout. We don't want a change to happen by accident, without
+tests. Many of these locations may not need any change before removing the
+guards, but we should not do so without tests to ensure the expected
+behavior happens.
+
+It may be desirable to _change_ the behavior of some commands in the
+presence of a sparse index or more generally in any sparse-checkout
+scenario. In such cases, these should be carefully communicated and
+tested. No such behavior changes are intended during this phase.
+
+During a scan of the codebase, not every iteration of the cache entries
+needs an `ensure_full_index()` check. The basic reasons include:
+
+1. The loop is scanning for entries with non-zero stage. These entries
+   are not collapsed into a sparse-directory entry.
+
+2. The loop is scanning for submodules. These entries are not collapsed
+   into a sparse-directory entry.
+
+3. The loop is part of the index API, especially around reading or
+   writing the format.
+
+4. The loop is checking for correct order of cache entries and that is
+   correct if and only if the sparse-directory entries are in the correct
+   location.
+
+5. The loop ignores entries with the `SKIP_WORKTREE` bit set, or is
+   otherwise already aware of sparse directory entries.
+
+6. The sparse-index is disabled at this point when using the split-index
+   feature, so no effort is made to protect the split-index API.
 
 Even after inserting these guards, we will keep expanding sparse-indexes
 for most Git commands using the `command_requires_full_index` repository
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 111+ messages in thread

* [PATCH v2 02/25] *: remove 'const' qualifier for struct index_state
  2021-04-01  1:49 ` [PATCH v2 00/25] " Derrick Stolee via GitGitGadget
  2021-04-01  1:49   ` [PATCH v2 01/25] sparse-index: API protection strategy Derrick Stolee via GitGitGadget
@ 2021-04-01  1:49   ` Derrick Stolee via GitGitGadget
  2021-04-01  1:49   ` [PATCH v2 03/25] read-cache: expand on query into sparse-directory entry Derrick Stolee via GitGitGadget
                     ` (25 subsequent siblings)
  27 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-01  1:49 UTC (permalink / raw)
  To: git
  Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Several methods specify that they take a 'struct index_state' pointer
with the 'const' qualifier because they intend to only query the data,
not change it. However, we will be introducing a step very low in the
method stack that might modify a sparse-index to become a full index in
the case that our queries venture inside a sparse-directory entry.

This change only removes the 'const' qualifiers that are necessary for
the following change which will actually modify the implementation of
index_name_stage_pos().

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 attr.c             | 14 +++++++-------
 attr.h             |  4 ++--
 builtin/ls-files.c | 10 +++++-----
 cache.h            |  6 +++---
 convert.c          | 26 +++++++++++++-------------
 convert.h          | 20 ++++++++++----------
 dir.c              | 12 ++++++------
 dir.h              |  8 ++++----
 merge-recursive.c  |  2 +-
 pathspec.c         |  6 +++---
 pathspec.h         |  6 +++---
 read-cache.c       | 10 +++++-----
 submodule.c        |  6 +++---
 submodule.h        |  6 +++---
 14 files changed, 68 insertions(+), 68 deletions(-)

diff --git a/attr.c b/attr.c
index 4ef85d668b54..8de553293ee4 100644
--- a/attr.c
+++ b/attr.c
@@ -718,7 +718,7 @@ static struct attr_stack *read_attr_from_file(const char *path, int macro_ok)
 	return res;
 }
 
-static struct attr_stack *read_attr_from_index(const struct index_state *istate,
+static struct attr_stack *read_attr_from_index(struct index_state *istate,
 					       const char *path,
 					       int macro_ok)
 {
@@ -748,7 +748,7 @@ static struct attr_stack *read_attr_from_index(const struct index_state *istate,
 	return res;
 }
 
-static struct attr_stack *read_attr(const struct index_state *istate,
+static struct attr_stack *read_attr(struct index_state *istate,
 				    const char *path, int macro_ok)
 {
 	struct attr_stack *res = NULL;
@@ -840,7 +840,7 @@ static void push_stack(struct attr_stack **attr_stack_p,
 	}
 }
 
-static void bootstrap_attr_stack(const struct index_state *istate,
+static void bootstrap_attr_stack(struct index_state *istate,
 				 struct attr_stack **stack)
 {
 	struct attr_stack *e;
@@ -878,7 +878,7 @@ static void bootstrap_attr_stack(const struct index_state *istate,
 	push_stack(stack, e, NULL, 0);
 }
 
-static void prepare_attr_stack(const struct index_state *istate,
+static void prepare_attr_stack(struct index_state *istate,
 			       const char *path, int dirlen,
 			       struct attr_stack **stack)
 {
@@ -1078,7 +1078,7 @@ static void determine_macros(struct all_attrs_item *all_attrs,
  * If check->check_nr is non-zero, only attributes in check[] are collected.
  * Otherwise all attributes are collected.
  */
-static void collect_some_attrs(const struct index_state *istate,
+static void collect_some_attrs(struct index_state *istate,
 			       const char *path,
 			       struct attr_check *check)
 {
@@ -1107,7 +1107,7 @@ static void collect_some_attrs(const struct index_state *istate,
 	fill(path, pathlen, basename_offset, check->stack, check->all_attrs, rem);
 }
 
-void git_check_attr(const struct index_state *istate,
+void git_check_attr(struct index_state *istate,
 		    const char *path,
 		    struct attr_check *check)
 {
@@ -1124,7 +1124,7 @@ void git_check_attr(const struct index_state *istate,
 	}
 }
 
-void git_all_attrs(const struct index_state *istate,
+void git_all_attrs(struct index_state *istate,
 		   const char *path, struct attr_check *check)
 {
 	int i;
diff --git a/attr.h b/attr.h
index 404548f028a8..3732505edae8 100644
--- a/attr.h
+++ b/attr.h
@@ -190,14 +190,14 @@ void attr_check_free(struct attr_check *check);
  */
 const char *git_attr_name(const struct git_attr *);
 
-void git_check_attr(const struct index_state *istate,
+void git_check_attr(struct index_state *istate,
 		    const char *path, struct attr_check *check);
 
 /*
  * Retrieve all attributes that apply to the specified path.
  * check holds the attributes and their values.
  */
-void git_all_attrs(const struct index_state *istate,
+void git_all_attrs(struct index_state *istate,
 		   const char *path, struct attr_check *check);
 
 enum git_attr_direction {
diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index 60a2913a01e9..4f9ed1fb29b7 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -57,7 +57,7 @@ static const char *tag_modified = "";
 static const char *tag_skip_worktree = "";
 static const char *tag_resolve_undo = "";
 
-static void write_eolinfo(const struct index_state *istate,
+static void write_eolinfo(struct index_state *istate,
 			  const struct cache_entry *ce, const char *path)
 {
 	if (show_eol) {
@@ -122,7 +122,7 @@ static void print_debug(const struct cache_entry *ce)
 	}
 }
 
-static void show_dir_entry(const struct index_state *istate,
+static void show_dir_entry(struct index_state *istate,
 			   const char *tag, struct dir_entry *ent)
 {
 	int len = max_prefix_len;
@@ -139,7 +139,7 @@ static void show_dir_entry(const struct index_state *istate,
 	write_name(ent->name);
 }
 
-static void show_other_files(const struct index_state *istate,
+static void show_other_files(struct index_state *istate,
 			     const struct dir_struct *dir)
 {
 	int i;
@@ -152,7 +152,7 @@ static void show_other_files(const struct index_state *istate,
 	}
 }
 
-static void show_killed_files(const struct index_state *istate,
+static void show_killed_files(struct index_state *istate,
 			      const struct dir_struct *dir)
 {
 	int i;
@@ -254,7 +254,7 @@ static void show_ce(struct repository *repo, struct dir_struct *dir,
 	}
 }
 
-static void show_ru_info(const struct index_state *istate)
+static void show_ru_info(struct index_state *istate)
 {
 	struct string_list_item *item;
 
diff --git a/cache.h b/cache.h
index 8aede373aeb3..5006278c13ca 100644
--- a/cache.h
+++ b/cache.h
@@ -800,7 +800,7 @@ struct cache_entry *index_file_exists(struct index_state *istate, const char *na
  * index_name_pos(&index, "f", 1) -> -3
  * index_name_pos(&index, "g", 1) -> -5
  */
-int index_name_pos(const struct index_state *, const char *name, int namelen);
+int index_name_pos(struct index_state *, const char *name, int namelen);
 
 /*
  * Some functions return the negative complement of an insert position when a
@@ -850,8 +850,8 @@ int add_file_to_index(struct index_state *, const char *path, int flags);
 int chmod_index_entry(struct index_state *, struct cache_entry *ce, char flip);
 int ce_same_name(const struct cache_entry *a, const struct cache_entry *b);
 void set_object_name_for_intent_to_add_entry(struct cache_entry *ce);
-int index_name_is_other(const struct index_state *, const char *, int);
-void *read_blob_data_from_index(const struct index_state *, const char *, unsigned long *);
+int index_name_is_other(struct index_state *, const char *, int);
+void *read_blob_data_from_index(struct index_state *, const char *, unsigned long *);
 
 /* do stat comparison even if CE_VALID is true */
 #define CE_MATCH_IGNORE_VALID		01
diff --git a/convert.c b/convert.c
index ee360c2f07ce..20b19abffa36 100644
--- a/convert.c
+++ b/convert.c
@@ -138,7 +138,7 @@ static const char *gather_convert_stats_ascii(const char *data, unsigned long si
 	}
 }
 
-const char *get_cached_convert_stats_ascii(const struct index_state *istate,
+const char *get_cached_convert_stats_ascii(struct index_state *istate,
 					   const char *path)
 {
 	const char *ret;
@@ -222,7 +222,7 @@ static void check_global_conv_flags_eol(const char *path,
 	}
 }
 
-static int has_crlf_in_index(const struct index_state *istate, const char *path)
+static int has_crlf_in_index(struct index_state *istate, const char *path)
 {
 	unsigned long sz;
 	void *data;
@@ -496,7 +496,7 @@ static int encode_to_worktree(const char *path, const char *src, size_t src_len,
 	return 1;
 }
 
-static int crlf_to_git(const struct index_state *istate,
+static int crlf_to_git(struct index_state *istate,
 		       const char *path, const char *src, size_t len,
 		       struct strbuf *buf,
 		       enum crlf_action crlf_action, int conv_flags)
@@ -1307,7 +1307,7 @@ struct conv_attrs {
 
 static struct attr_check *check;
 
-static void convert_attrs(const struct index_state *istate,
+static void convert_attrs(struct index_state *istate,
 			  struct conv_attrs *ca, const char *path)
 {
 	struct attr_check_item *ccheck = NULL;
@@ -1369,7 +1369,7 @@ void reset_parsed_attributes(void)
 	user_convert_tail = NULL;
 }
 
-int would_convert_to_git_filter_fd(const struct index_state *istate, const char *path)
+int would_convert_to_git_filter_fd(struct index_state *istate, const char *path)
 {
 	struct conv_attrs ca;
 
@@ -1388,7 +1388,7 @@ int would_convert_to_git_filter_fd(const struct index_state *istate, const char
 	return apply_filter(path, NULL, 0, -1, NULL, ca.drv, CAP_CLEAN, NULL, NULL);
 }
 
-const char *get_convert_attr_ascii(const struct index_state *istate, const char *path)
+const char *get_convert_attr_ascii(struct index_state *istate, const char *path)
 {
 	struct conv_attrs ca;
 
@@ -1414,7 +1414,7 @@ const char *get_convert_attr_ascii(const struct index_state *istate, const char
 	return "";
 }
 
-int convert_to_git(const struct index_state *istate,
+int convert_to_git(struct index_state *istate,
 		   const char *path, const char *src, size_t len,
 		   struct strbuf *dst, int conv_flags)
 {
@@ -1448,7 +1448,7 @@ int convert_to_git(const struct index_state *istate,
 	return ret | ident_to_git(src, len, dst, ca.ident);
 }
 
-void convert_to_git_filter_fd(const struct index_state *istate,
+void convert_to_git_filter_fd(struct index_state *istate,
 			      const char *path, int fd, struct strbuf *dst,
 			      int conv_flags)
 {
@@ -1466,7 +1466,7 @@ void convert_to_git_filter_fd(const struct index_state *istate,
 	ident_to_git(dst->buf, dst->len, dst, ca.ident);
 }
 
-static int convert_to_working_tree_internal(const struct index_state *istate,
+static int convert_to_working_tree_internal(struct index_state *istate,
 					    const char *path, const char *src,
 					    size_t len, struct strbuf *dst,
 					    int normalizing,
@@ -1510,7 +1510,7 @@ static int convert_to_working_tree_internal(const struct index_state *istate,
 	return ret | ret_filter;
 }
 
-int async_convert_to_working_tree(const struct index_state *istate,
+int async_convert_to_working_tree(struct index_state *istate,
 				  const char *path, const char *src,
 				  size_t len, struct strbuf *dst,
 				  const struct checkout_metadata *meta,
@@ -1519,7 +1519,7 @@ int async_convert_to_working_tree(const struct index_state *istate,
 	return convert_to_working_tree_internal(istate, path, src, len, dst, 0, meta, dco);
 }
 
-int convert_to_working_tree(const struct index_state *istate,
+int convert_to_working_tree(struct index_state *istate,
 			    const char *path, const char *src,
 			    size_t len, struct strbuf *dst,
 			    const struct checkout_metadata *meta)
@@ -1527,7 +1527,7 @@ int convert_to_working_tree(const struct index_state *istate,
 	return convert_to_working_tree_internal(istate, path, src, len, dst, 0, meta, NULL);
 }
 
-int renormalize_buffer(const struct index_state *istate, const char *path,
+int renormalize_buffer(struct index_state *istate, const char *path,
 		       const char *src, size_t len, struct strbuf *dst)
 {
 	int ret = convert_to_working_tree_internal(istate, path, src, len, dst, 1, NULL, NULL);
@@ -1964,7 +1964,7 @@ static struct stream_filter *ident_filter(const struct object_id *oid)
  * Note that you would be crazy to set CRLF, smudge/clean or ident to a
  * large binary blob you would want us not to slurp into the memory!
  */
-struct stream_filter *get_stream_filter(const struct index_state *istate,
+struct stream_filter *get_stream_filter(struct index_state *istate,
 					const char *path,
 					const struct object_id *oid)
 {
diff --git a/convert.h b/convert.h
index e29d1026a686..0f7c8a1f0446 100644
--- a/convert.h
+++ b/convert.h
@@ -65,41 +65,41 @@ struct checkout_metadata {
 
 extern enum eol core_eol;
 extern char *check_roundtrip_encoding;
-const char *get_cached_convert_stats_ascii(const struct index_state *istate,
+const char *get_cached_convert_stats_ascii(struct index_state *istate,
 					   const char *path);
 const char *get_wt_convert_stats_ascii(const char *path);
-const char *get_convert_attr_ascii(const struct index_state *istate,
+const char *get_convert_attr_ascii(struct index_state *istate,
 				   const char *path);
 
 /* returns 1 if *dst was used */
-int convert_to_git(const struct index_state *istate,
+int convert_to_git(struct index_state *istate,
 		   const char *path, const char *src, size_t len,
 		   struct strbuf *dst, int conv_flags);
-int convert_to_working_tree(const struct index_state *istate,
+int convert_to_working_tree(struct index_state *istate,
 			    const char *path, const char *src,
 			    size_t len, struct strbuf *dst,
 			    const struct checkout_metadata *meta);
-int async_convert_to_working_tree(const struct index_state *istate,
+int async_convert_to_working_tree(struct index_state *istate,
 				  const char *path, const char *src,
 				  size_t len, struct strbuf *dst,
 				  const struct checkout_metadata *meta,
 				  void *dco);
 int async_query_available_blobs(const char *cmd,
 				struct string_list *available_paths);
-int renormalize_buffer(const struct index_state *istate,
+int renormalize_buffer(struct index_state *istate,
 		       const char *path, const char *src, size_t len,
 		       struct strbuf *dst);
-static inline int would_convert_to_git(const struct index_state *istate,
+static inline int would_convert_to_git(struct index_state *istate,
 				       const char *path)
 {
 	return convert_to_git(istate, path, NULL, 0, NULL, 0);
 }
 /* Precondition: would_convert_to_git_filter_fd(path) == true */
-void convert_to_git_filter_fd(const struct index_state *istate,
+void convert_to_git_filter_fd(struct index_state *istate,
 			      const char *path, int fd,
 			      struct strbuf *dst,
 			      int conv_flags);
-int would_convert_to_git_filter_fd(const struct index_state *istate,
+int would_convert_to_git_filter_fd(struct index_state *istate,
 				   const char *path);
 
 /*
@@ -133,7 +133,7 @@ void reset_parsed_attributes(void);
 
 struct stream_filter; /* opaque */
 
-struct stream_filter *get_stream_filter(const struct index_state *istate,
+struct stream_filter *get_stream_filter(struct index_state *istate,
 					const char *path,
 					const struct object_id *);
 void free_stream_filter(struct stream_filter *);
diff --git a/dir.c b/dir.c
index fd8aa7c40faa..5b00dfb5b144 100644
--- a/dir.c
+++ b/dir.c
@@ -306,7 +306,7 @@ static int do_read_blob(const struct object_id *oid, struct oid_stat *oid_stat,
  * [1] Only if DO_MATCH_DIRECTORY is passed; otherwise, this is NOT a match.
  * [2] Only if DO_MATCH_LEADING_PATHSPEC is passed; otherwise, not a match.
  */
-static int match_pathspec_item(const struct index_state *istate,
+static int match_pathspec_item(struct index_state *istate,
 			       const struct pathspec_item *item, int prefix,
 			       const char *name, int namelen, unsigned flags)
 {
@@ -429,7 +429,7 @@ static int match_pathspec_item(const struct index_state *istate,
  * pathspec did not match any names, which could indicate that the
  * user mistyped the nth pathspec.
  */
-static int do_match_pathspec(const struct index_state *istate,
+static int do_match_pathspec(struct index_state *istate,
 			     const struct pathspec *ps,
 			     const char *name, int namelen,
 			     int prefix, char *seen,
@@ -500,7 +500,7 @@ static int do_match_pathspec(const struct index_state *istate,
 	return retval;
 }
 
-static int match_pathspec_with_flags(const struct index_state *istate,
+static int match_pathspec_with_flags(struct index_state *istate,
 				     const struct pathspec *ps,
 				     const char *name, int namelen,
 				     int prefix, char *seen, unsigned flags)
@@ -516,7 +516,7 @@ static int match_pathspec_with_flags(const struct index_state *istate,
 	return negative ? 0 : positive;
 }
 
-int match_pathspec(const struct index_state *istate,
+int match_pathspec(struct index_state *istate,
 		   const struct pathspec *ps,
 		   const char *name, int namelen,
 		   int prefix, char *seen, int is_dir)
@@ -529,7 +529,7 @@ int match_pathspec(const struct index_state *istate,
 /**
  * Check if a submodule is a superset of the pathspec
  */
-int submodule_path_match(const struct index_state *istate,
+int submodule_path_match(struct index_state *istate,
 			 const struct pathspec *ps,
 			 const char *submodule_name,
 			 char *seen)
@@ -892,7 +892,7 @@ void add_pattern(const char *string, const char *base,
 	add_pattern_to_hashsets(pl, pattern);
 }
 
-static int read_skip_worktree_file_from_index(const struct index_state *istate,
+static int read_skip_worktree_file_from_index(struct index_state *istate,
 					      const char *path,
 					      size_t *size_out, char **data_out,
 					      struct oid_stat *oid_stat)
diff --git a/dir.h b/dir.h
index facfae47402a..51cb0e217247 100644
--- a/dir.h
+++ b/dir.h
@@ -354,7 +354,7 @@ int count_slashes(const char *s);
 int simple_length(const char *match);
 int no_wildcard(const char *string);
 char *common_prefix(const struct pathspec *pathspec);
-int match_pathspec(const struct index_state *istate,
+int match_pathspec(struct index_state *istate,
 		   const struct pathspec *pathspec,
 		   const char *name, int namelen,
 		   int prefix, char *seen, int is_dir);
@@ -492,12 +492,12 @@ int git_fnmatch(const struct pathspec_item *item,
 		const char *pattern, const char *string,
 		int prefix);
 
-int submodule_path_match(const struct index_state *istate,
+int submodule_path_match(struct index_state *istate,
 			 const struct pathspec *ps,
 			 const char *submodule_name,
 			 char *seen);
 
-static inline int ce_path_match(const struct index_state *istate,
+static inline int ce_path_match(struct index_state *istate,
 				const struct cache_entry *ce,
 				const struct pathspec *pathspec,
 				char *seen)
@@ -506,7 +506,7 @@ static inline int ce_path_match(const struct index_state *istate,
 			      S_ISDIR(ce->ce_mode) || S_ISGITLINK(ce->ce_mode));
 }
 
-static inline int dir_path_match(const struct index_state *istate,
+static inline int dir_path_match(struct index_state *istate,
 				 const struct dir_entry *ent,
 				 const struct pathspec *pathspec,
 				 int prefix, char *seen)
diff --git a/merge-recursive.c b/merge-recursive.c
index 3d9207455b3a..b8de7a704eae 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -2987,7 +2987,7 @@ static int blob_unchanged(struct merge_options *opt,
 	struct strbuf obuf = STRBUF_INIT;
 	struct strbuf abuf = STRBUF_INIT;
 	int ret = 0; /* assume changed for safety */
-	const struct index_state *idx = opt->repo->index;
+	struct index_state *idx = opt->repo->index;
 
 	if (a->mode != o->mode)
 		return 0;
diff --git a/pathspec.c b/pathspec.c
index 7a229d8d22f2..b6e333965cb4 100644
--- a/pathspec.c
+++ b/pathspec.c
@@ -20,7 +20,7 @@
  * to use find_pathspecs_matching_against_index() instead.
  */
 void add_pathspec_matches_against_index(const struct pathspec *pathspec,
-					const struct index_state *istate,
+					struct index_state *istate,
 					char *seen)
 {
 	int num_unmatched = 0, i;
@@ -51,7 +51,7 @@ void add_pathspec_matches_against_index(const struct pathspec *pathspec,
  * given pathspecs achieves against all items in the index.
  */
 char *find_pathspecs_matching_against_index(const struct pathspec *pathspec,
-					    const struct index_state *istate)
+					    struct index_state *istate)
 {
 	char *seen = xcalloc(pathspec->nr, 1);
 	add_pathspec_matches_against_index(pathspec, istate, seen);
@@ -702,7 +702,7 @@ void clear_pathspec(struct pathspec *pathspec)
 	pathspec->nr = 0;
 }
 
-int match_pathspec_attrs(const struct index_state *istate,
+int match_pathspec_attrs(struct index_state *istate,
 			 const char *name, int namelen,
 			 const struct pathspec_item *item)
 {
diff --git a/pathspec.h b/pathspec.h
index 454ce364fac7..2ccc8080b6cc 100644
--- a/pathspec.h
+++ b/pathspec.h
@@ -150,11 +150,11 @@ static inline int ps_strcmp(const struct pathspec_item *item,
 }
 
 void add_pathspec_matches_against_index(const struct pathspec *pathspec,
-					const struct index_state *istate,
+					struct index_state *istate,
 					char *seen);
 char *find_pathspecs_matching_against_index(const struct pathspec *pathspec,
-					    const struct index_state *istate);
-int match_pathspec_attrs(const struct index_state *istate,
+					    struct index_state *istate);
+int match_pathspec_attrs(struct index_state *istate,
 			 const char *name, int namelen,
 			 const struct pathspec_item *item);
 
diff --git a/read-cache.c b/read-cache.c
index 2410e6e0df13..3ad94578095e 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -549,7 +549,7 @@ int cache_name_stage_compare(const char *name1, int len1, int stage1, const char
 	return 0;
 }
 
-static int index_name_stage_pos(const struct index_state *istate, const char *name, int namelen, int stage)
+static int index_name_stage_pos(struct index_state *istate, const char *name, int namelen, int stage)
 {
 	int first, last;
 
@@ -570,7 +570,7 @@ static int index_name_stage_pos(const struct index_state *istate, const char *na
 	return -first-1;
 }
 
-int index_name_pos(const struct index_state *istate, const char *name, int namelen)
+int index_name_pos(struct index_state *istate, const char *name, int namelen)
 {
 	return index_name_stage_pos(istate, name, namelen, 0);
 }
@@ -3389,8 +3389,8 @@ int repo_read_index_unmerged(struct repository *repo)
  * We helpfully remove a trailing "/" from directories so that
  * the output of read_directory can be used as-is.
  */
-int index_name_is_other(const struct index_state *istate, const char *name,
-		int namelen)
+int index_name_is_other(struct index_state *istate, const char *name,
+			int namelen)
 {
 	int pos;
 	if (namelen && name[namelen - 1] == '/')
@@ -3408,7 +3408,7 @@ int index_name_is_other(const struct index_state *istate, const char *name,
 	return 1;
 }
 
-void *read_blob_data_from_index(const struct index_state *istate,
+void *read_blob_data_from_index(struct index_state *istate,
 				const char *path, unsigned long *size)
 {
 	int pos, len;
diff --git a/submodule.c b/submodule.c
index 9767ba9893cc..83809a4f7bd2 100644
--- a/submodule.c
+++ b/submodule.c
@@ -33,7 +33,7 @@ static struct oid_array ref_tips_after_fetch;
  * will be disabled because we can't guess what might be configured in
  * .gitmodules unless the user resolves the conflict.
  */
-int is_gitmodules_unmerged(const struct index_state *istate)
+int is_gitmodules_unmerged(struct index_state *istate)
 {
 	int pos = index_name_pos(istate, GITMODULES_FILE, strlen(GITMODULES_FILE));
 	if (pos < 0) { /* .gitmodules not found or isn't merged */
@@ -301,7 +301,7 @@ int is_submodule_populated_gently(const char *path, int *return_error_code)
 /*
  * Dies if the provided 'prefix' corresponds to an unpopulated submodule
  */
-void die_in_unpopulated_submodule(const struct index_state *istate,
+void die_in_unpopulated_submodule(struct index_state *istate,
 				  const char *prefix)
 {
 	int i, prefixlen;
@@ -331,7 +331,7 @@ void die_in_unpopulated_submodule(const struct index_state *istate,
 /*
  * Dies if any paths in the provided pathspec descends into a submodule
  */
-void die_path_inside_submodule(const struct index_state *istate,
+void die_path_inside_submodule(struct index_state *istate,
 			       const struct pathspec *ps)
 {
 	int i, j;
diff --git a/submodule.h b/submodule.h
index 4ac6e31cf1f7..84640c49c114 100644
--- a/submodule.h
+++ b/submodule.h
@@ -39,7 +39,7 @@ struct submodule_update_strategy {
 };
 #define SUBMODULE_UPDATE_STRATEGY_INIT {SM_UPDATE_UNSPECIFIED, NULL}
 
-int is_gitmodules_unmerged(const struct index_state *istate);
+int is_gitmodules_unmerged(struct index_state *istate);
 int is_writing_gitmodules_ok(void);
 int is_staging_gitmodules_ok(struct index_state *istate);
 int update_path_in_gitmodules(const char *oldpath, const char *newpath);
@@ -60,9 +60,9 @@ int is_submodule_active(struct repository *repo, const char *path);
  * Otherwise the return error code is the same as of resolve_gitdir_gently.
  */
 int is_submodule_populated_gently(const char *path, int *return_error_code);
-void die_in_unpopulated_submodule(const struct index_state *istate,
+void die_in_unpopulated_submodule(struct index_state *istate,
 				  const char *prefix);
-void die_path_inside_submodule(const struct index_state *istate,
+void die_path_inside_submodule(struct index_state *istate,
 			       const struct pathspec *ps);
 enum submodule_update_type parse_submodule_update_type(const char *value);
 int parse_submodule_update_strategy(const char *value,
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 111+ messages in thread

* [PATCH v2 03/25] read-cache: expand on query into sparse-directory entry
  2021-04-01  1:49 ` [PATCH v2 00/25] " Derrick Stolee via GitGitGadget
  2021-04-01  1:49   ` [PATCH v2 01/25] sparse-index: API protection strategy Derrick Stolee via GitGitGadget
  2021-04-01  1:49   ` [PATCH v2 02/25] *: remove 'const' qualifier for struct index_state Derrick Stolee via GitGitGadget
@ 2021-04-01  1:49   ` Derrick Stolee via GitGitGadget
  2021-04-01  1:49   ` [PATCH v2 04/25] cache: move ensure_full_index() to cache.h Derrick Stolee via GitGitGadget
                     ` (24 subsequent siblings)
  27 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-01  1:49 UTC (permalink / raw)
  To: git
  Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Callers to index_name_pos() or index_name_stage_pos() have a specific
path in mind. If that happens to be a path with an ancestor being a
sparse-directory entry, it can lead to unexpected results.

In the case that we did not find the requested path, check to see if the
position _before_ the inserted position is a sparse directory entry that
matches the initial segment of the input path (including the directory
separator at the end of the directory name). If so, then expand the
index to be a full index and search again. This expansion will only
happen once per index read.

Future enhancements could be more careful to expand only the necessary
sparse directory entry, but then we would have a special "not fully
sparse, but also not fully expanded" mode that could affect writing the
index to file. Since this only occurs if a specific file is requested
outside of the sparse checkout definition, this is unlikely to be a
common situation.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 read-cache.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/read-cache.c b/read-cache.c
index 3ad94578095e..3698bc7bf77d 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -567,6 +567,27 @@ static int index_name_stage_pos(struct index_state *istate, const char *name, in
 		}
 		first = next+1;
 	}
+
+	if (istate->sparse_index &&
+	    first > 0) {
+		/* Note: first <= istate->cache_nr */
+		struct cache_entry *ce = istate->cache[first - 1];
+
+		/*
+		 * If we are in a sparse-index _and_ the entry before the
+		 * insertion position is a sparse-directory entry that is
+		 * an ancestor of 'name', then we need to expand the index
+		 * and search again. This will only trigger once, because
+		 * thereafter the index is fully expanded.
+		 */
+		if (S_ISSPARSEDIR(ce->ce_mode) &&
+		    ce_namelen(ce) < namelen &&
+		    !strncmp(name, ce->name, ce_namelen(ce))) {
+			ensure_full_index(istate);
+			return index_name_stage_pos(istate, name, namelen, stage);
+		}
+	}
+
 	return -first-1;
 }
 
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 111+ messages in thread

* [PATCH v2 04/25] cache: move ensure_full_index() to cache.h
  2021-04-01  1:49 ` [PATCH v2 00/25] " Derrick Stolee via GitGitGadget
                     ` (2 preceding siblings ...)
  2021-04-01  1:49   ` [PATCH v2 03/25] read-cache: expand on query into sparse-directory entry Derrick Stolee via GitGitGadget
@ 2021-04-01  1:49   ` Derrick Stolee via GitGitGadget
  2021-04-01  1:49   ` [PATCH v2 05/25] add: ensure full index Derrick Stolee via GitGitGadget
                     ` (23 subsequent siblings)
  27 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-01  1:49 UTC (permalink / raw)
  To: git
  Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Soon we will insert ensure_full_index() calls across the codebase.
Instead of also adding include statements for sparse-index.h, let's just
use the fact that anything that cares about the index already has
cache.h in its includes.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 cache.h        | 1 +
 sparse-index.h | 1 -
 2 files changed, 1 insertion(+), 1 deletion(-)

diff --git a/cache.h b/cache.h
index 5006278c13ca..b7e20e9778db 100644
--- a/cache.h
+++ b/cache.h
@@ -350,6 +350,7 @@ void add_name_hash(struct index_state *istate, struct cache_entry *ce);
 void remove_name_hash(struct index_state *istate, struct cache_entry *ce);
 void free_name_hash(struct index_state *istate);
 
+void ensure_full_index(struct index_state *istate);
 
 /* Cache entry creation and cleanup */
 
diff --git a/sparse-index.h b/sparse-index.h
index 39dcc859735e..0268f38753c0 100644
--- a/sparse-index.h
+++ b/sparse-index.h
@@ -2,7 +2,6 @@
 #define SPARSE_INDEX_H__
 
 struct index_state;
-void ensure_full_index(struct index_state *istate);
 int convert_to_sparse(struct index_state *istate);
 
 struct repository;
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 111+ messages in thread

* [PATCH v2 05/25] add: ensure full index
  2021-04-01  1:49 ` [PATCH v2 00/25] " Derrick Stolee via GitGitGadget
                     ` (3 preceding siblings ...)
  2021-04-01  1:49   ` [PATCH v2 04/25] cache: move ensure_full_index() to cache.h Derrick Stolee via GitGitGadget
@ 2021-04-01  1:49   ` Derrick Stolee via GitGitGadget
  2021-04-01  1:49   ` [PATCH v2 06/25] checkout-index: " Derrick Stolee via GitGitGadget
                     ` (22 subsequent siblings)
  27 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-01  1:49 UTC (permalink / raw)
  To: git
  Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Before iterating over all cache entries, ensure that a sparse index is
expanded to a full index to avoid unexpected behavior.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/add.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/builtin/add.c b/builtin/add.c
index ea762a41e3a2..afccf2fd5543 100644
--- a/builtin/add.c
+++ b/builtin/add.c
@@ -141,6 +141,8 @@ static int renormalize_tracked_files(const struct pathspec *pathspec, int flags)
 {
 	int i, retval = 0;
 
+	/* TODO: audit for interaction with sparse-index. */
+	ensure_full_index(&the_index);
 	for (i = 0; i < active_nr; i++) {
 		struct cache_entry *ce = active_cache[i];
 
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 111+ messages in thread

* [PATCH v2 06/25] checkout-index: ensure full index
  2021-04-01  1:49 ` [PATCH v2 00/25] " Derrick Stolee via GitGitGadget
                     ` (4 preceding siblings ...)
  2021-04-01  1:49   ` [PATCH v2 05/25] add: ensure full index Derrick Stolee via GitGitGadget
@ 2021-04-01  1:49   ` Derrick Stolee via GitGitGadget
  2021-04-01  1:49   ` [PATCH v2 07/25] checkout: " Derrick Stolee via GitGitGadget
                     ` (21 subsequent siblings)
  27 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-01  1:49 UTC (permalink / raw)
  To: git
  Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Before we iterate over all cache entries, ensure that the index is not
sparse. This loop in checkout_all() might be safe to iterate over a
sparse index, but let's put this protection here until it can be
carefully tested.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/checkout-index.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/builtin/checkout-index.c b/builtin/checkout-index.c
index 023e49e271c2..2c2936a9dae0 100644
--- a/builtin/checkout-index.c
+++ b/builtin/checkout-index.c
@@ -119,6 +119,8 @@ static void checkout_all(const char *prefix, int prefix_length)
 	int i, errs = 0;
 	struct cache_entry *last_ce = NULL;
 
+	/* TODO: audit for interaction with sparse-index. */
+	ensure_full_index(&the_index);
 	for (i = 0; i < active_nr ; i++) {
 		struct cache_entry *ce = active_cache[i];
 		if (ce_stage(ce) != checkout_stage
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 111+ messages in thread

* [PATCH v2 07/25] checkout: ensure full index
  2021-04-01  1:49 ` [PATCH v2 00/25] " Derrick Stolee via GitGitGadget
                     ` (5 preceding siblings ...)
  2021-04-01  1:49   ` [PATCH v2 06/25] checkout-index: " Derrick Stolee via GitGitGadget
@ 2021-04-01  1:49   ` Derrick Stolee via GitGitGadget
  2021-04-01  1:49   ` [PATCH v2 08/25] commit: " Derrick Stolee via GitGitGadget
                     ` (20 subsequent siblings)
  27 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-01  1:49 UTC (permalink / raw)
  To: git
  Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Before iterating over all cache entries in the checkout builtin, ensure
that we have a full index to avoid any unexpected behavior.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/checkout.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/builtin/checkout.c b/builtin/checkout.c
index 0e6639052001..d0dbe63ea119 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -368,6 +368,9 @@ static int checkout_worktree(const struct checkout_opts *opts,
 			       NULL);
 
 	enable_delayed_checkout(&state);
+
+	/* TODO: audit for interaction with sparse-index. */
+	ensure_full_index(&the_index);
 	for (pos = 0; pos < active_nr; pos++) {
 		struct cache_entry *ce = active_cache[pos];
 		if (ce->ce_flags & CE_MATCHED) {
@@ -512,6 +515,8 @@ static int checkout_paths(const struct checkout_opts *opts,
 	 * Make sure all pathspecs participated in locating the paths
 	 * to be checked out.
 	 */
+	/* TODO: audit for interaction with sparse-index. */
+	ensure_full_index(&the_index);
 	for (pos = 0; pos < active_nr; pos++)
 		if (opts->overlay_mode)
 			mark_ce_for_checkout_overlay(active_cache[pos],
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 111+ messages in thread

* [PATCH v2 08/25] commit: ensure full index
  2021-04-01  1:49 ` [PATCH v2 00/25] " Derrick Stolee via GitGitGadget
                     ` (6 preceding siblings ...)
  2021-04-01  1:49   ` [PATCH v2 07/25] checkout: " Derrick Stolee via GitGitGadget
@ 2021-04-01  1:49   ` Derrick Stolee via GitGitGadget
  2021-04-01  1:49   ` [PATCH v2 09/25] difftool: " Derrick Stolee via GitGitGadget
                     ` (19 subsequent siblings)
  27 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-01  1:49 UTC (permalink / raw)
  To: git
  Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

These two loops iterate over all cache entries, so ensure that a sparse
index is expanded to a full index before we do so.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/commit.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/builtin/commit.c b/builtin/commit.c
index 739110c5a7f6..cf0c36d1dcb2 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -251,6 +251,8 @@ static int list_paths(struct string_list *list, const char *with_tree,
 		free(max_prefix);
 	}
 
+	/* TODO: audit for interaction with sparse-index. */
+	ensure_full_index(&the_index);
 	for (i = 0; i < active_nr; i++) {
 		const struct cache_entry *ce = active_cache[i];
 		struct string_list_item *item;
@@ -931,6 +933,8 @@ static int prepare_to_commit(const char *index_file, const char *prefix,
 		if (get_oid(parent, &oid)) {
 			int i, ita_nr = 0;
 
+			/* TODO: audit for interaction with sparse-index. */
+			ensure_full_index(&the_index);
 			for (i = 0; i < active_nr; i++)
 				if (ce_intent_to_add(active_cache[i]))
 					ita_nr++;
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 111+ messages in thread

* [PATCH v2 09/25] difftool: ensure full index
  2021-04-01  1:49 ` [PATCH v2 00/25] " Derrick Stolee via GitGitGadget
                     ` (7 preceding siblings ...)
  2021-04-01  1:49   ` [PATCH v2 08/25] commit: " Derrick Stolee via GitGitGadget
@ 2021-04-01  1:49   ` Derrick Stolee via GitGitGadget
  2021-04-01  1:49   ` [PATCH v2 10/25] fsck: " Derrick Stolee via GitGitGadget
                     ` (18 subsequent siblings)
  27 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-01  1:49 UTC (permalink / raw)
  To: git
  Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Before iterating over all cache entries, ensure that a sparse index has
been expanded to a full one to avoid unexpected behavior.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/difftool.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/builtin/difftool.c b/builtin/difftool.c
index 6e18e623fddf..32c914dde6a0 100644
--- a/builtin/difftool.c
+++ b/builtin/difftool.c
@@ -584,6 +584,9 @@ static int run_dir_diff(const char *extcmd, int symlinks, const char *prefix,
 		setenv("GIT_DIFFTOOL_DIRDIFF", "true", 1);
 	rc = run_command_v_opt(helper_argv, flags);
 
+	/* TODO: audit for interaction with sparse-index. */
+	ensure_full_index(&wtindex);
+
 	/*
 	 * If the diff includes working copy files and those
 	 * files were modified during the diff, then the changes
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 111+ messages in thread

* [PATCH v2 10/25] fsck: ensure full index
  2021-04-01  1:49 ` [PATCH v2 00/25] " Derrick Stolee via GitGitGadget
                     ` (8 preceding siblings ...)
  2021-04-01  1:49   ` [PATCH v2 09/25] difftool: " Derrick Stolee via GitGitGadget
@ 2021-04-01  1:49   ` Derrick Stolee via GitGitGadget
  2021-04-01  1:49   ` [PATCH v2 11/25] grep: " Derrick Stolee via GitGitGadget
                     ` (17 subsequent siblings)
  27 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-01  1:49 UTC (permalink / raw)
  To: git
  Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

When verifying all blobs reachable from the index, ensure that a sparse
index has been expanded to a full one to avoid missing some blobs.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/fsck.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index 821e7798c706..4d7f5c63ce0d 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -883,6 +883,8 @@ int cmd_fsck(int argc, const char **argv, const char *prefix)
 		verify_index_checksum = 1;
 		verify_ce_order = 1;
 		read_cache();
+		/* TODO: audit for interaction with sparse-index. */
+		ensure_full_index(&the_index);
 		for (i = 0; i < active_nr; i++) {
 			unsigned int mode;
 			struct blob *blob;
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 111+ messages in thread

* [PATCH v2 11/25] grep: ensure full index
  2021-04-01  1:49 ` [PATCH v2 00/25] " Derrick Stolee via GitGitGadget
                     ` (9 preceding siblings ...)
  2021-04-01  1:49   ` [PATCH v2 10/25] fsck: " Derrick Stolee via GitGitGadget
@ 2021-04-01  1:49   ` Derrick Stolee via GitGitGadget
  2021-04-01  1:49   ` [PATCH v2 12/25] ls-files: " Derrick Stolee via GitGitGadget
                     ` (16 subsequent siblings)
  27 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-01  1:49 UTC (permalink / raw)
  To: git
  Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Before iterating over all cache entries, ensure that a sparse index is
expanded to a full one so we do not miss blobs to scan. Later, this can
integrate more carefully with sparse indexes with proper testing.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/grep.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/builtin/grep.c b/builtin/grep.c
index 4e91a253ac3b..c2d40414e975 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -504,6 +504,8 @@ static int grep_cache(struct grep_opt *opt,
 	if (repo_read_index(repo) < 0)
 		die(_("index file corrupt"));
 
+	/* TODO: audit for interaction with sparse-index. */
+	ensure_full_index(repo->index);
 	for (nr = 0; nr < repo->index->cache_nr; nr++) {
 		const struct cache_entry *ce = repo->index->cache[nr];
 
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 111+ messages in thread

* [PATCH v2 12/25] ls-files: ensure full index
  2021-04-01  1:49 ` [PATCH v2 00/25] " Derrick Stolee via GitGitGadget
                     ` (10 preceding siblings ...)
  2021-04-01  1:49   ` [PATCH v2 11/25] grep: " Derrick Stolee via GitGitGadget
@ 2021-04-01  1:49   ` Derrick Stolee via GitGitGadget
  2021-04-01  1:49   ` [PATCH v2 13/25] merge-index: " Derrick Stolee via GitGitGadget
                     ` (15 subsequent siblings)
  27 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-01  1:49 UTC (permalink / raw)
  To: git
  Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Before iterating over all cache entries, ensure that a sparse index is
expanded to a full one to avoid missing files.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/ls-files.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index 4f9ed1fb29b7..a0b4e54d1149 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -317,6 +317,8 @@ static void show_files(struct repository *repo, struct dir_struct *dir)
 
 	if (!(show_cached || show_stage || show_deleted || show_modified))
 		return;
+	/* TODO: audit for interaction with sparse-index. */
+	ensure_full_index(repo->index);
 	for (i = 0; i < repo->index->cache_nr; i++) {
 		const struct cache_entry *ce = repo->index->cache[i];
 		struct stat st;
@@ -494,6 +496,8 @@ void overlay_tree_on_index(struct index_state *istate,
 		die("bad tree-ish %s", tree_name);
 
 	/* Hoist the unmerged entries up to stage #3 to make room */
+	/* TODO: audit for interaction with sparse-index. */
+	ensure_full_index(istate);
 	for (i = 0; i < istate->cache_nr; i++) {
 		struct cache_entry *ce = istate->cache[i];
 		if (!ce_stage(ce))
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 111+ messages in thread

* [PATCH v2 13/25] merge-index: ensure full index
  2021-04-01  1:49 ` [PATCH v2 00/25] " Derrick Stolee via GitGitGadget
                     ` (11 preceding siblings ...)
  2021-04-01  1:49   ` [PATCH v2 12/25] ls-files: " Derrick Stolee via GitGitGadget
@ 2021-04-01  1:49   ` Derrick Stolee via GitGitGadget
  2021-04-01  1:49   ` [PATCH v2 14/25] rm: " Derrick Stolee via GitGitGadget
                     ` (14 subsequent siblings)
  27 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-01  1:49 UTC (permalink / raw)
  To: git
  Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Before iterating over all cache entries, ensure that a sparse index is
expanded to a full one to avoid unexpected behavior.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/merge-index.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/builtin/merge-index.c b/builtin/merge-index.c
index 38ea6ad6ca25..c0383fe9df9a 100644
--- a/builtin/merge-index.c
+++ b/builtin/merge-index.c
@@ -58,6 +58,8 @@ static void merge_one_path(const char *path)
 static void merge_all(void)
 {
 	int i;
+	/* TODO: audit for interaction with sparse-index. */
+	ensure_full_index(&the_index);
 	for (i = 0; i < active_nr; i++) {
 		const struct cache_entry *ce = active_cache[i];
 		if (!ce_stage(ce))
@@ -80,6 +82,9 @@ int cmd_merge_index(int argc, const char **argv, const char *prefix)
 
 	read_cache();
 
+	/* TODO: audit for interaction with sparse-index. */
+	ensure_full_index(&the_index);
+
 	i = 1;
 	if (!strcmp(argv[i], "-o")) {
 		one_shot = 1;
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 111+ messages in thread

* [PATCH v2 14/25] rm: ensure full index
  2021-04-01  1:49 ` [PATCH v2 00/25] " Derrick Stolee via GitGitGadget
                     ` (12 preceding siblings ...)
  2021-04-01  1:49   ` [PATCH v2 13/25] merge-index: " Derrick Stolee via GitGitGadget
@ 2021-04-01  1:49   ` Derrick Stolee via GitGitGadget
  2021-04-01  1:49   ` [PATCH v2 15/25] stash: " Derrick Stolee via GitGitGadget
                     ` (13 subsequent siblings)
  27 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-01  1:49 UTC (permalink / raw)
  To: git
  Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Before iterating over all cache entries, ensure that a sparse index is
expanded to a full index to avoid unexpected behavior.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/rm.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/builtin/rm.c b/builtin/rm.c
index 4858631e0f02..5559a0b453a3 100644
--- a/builtin/rm.c
+++ b/builtin/rm.c
@@ -293,6 +293,8 @@ int cmd_rm(int argc, const char **argv, const char *prefix)
 
 	seen = xcalloc(pathspec.nr, 1);
 
+	/* TODO: audit for interaction with sparse-index. */
+	ensure_full_index(&the_index);
 	for (i = 0; i < active_nr; i++) {
 		const struct cache_entry *ce = active_cache[i];
 		if (!ce_path_match(&the_index, ce, &pathspec, seen))
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 111+ messages in thread

* [PATCH v2 15/25] stash: ensure full index
  2021-04-01  1:49 ` [PATCH v2 00/25] " Derrick Stolee via GitGitGadget
                     ` (13 preceding siblings ...)
  2021-04-01  1:49   ` [PATCH v2 14/25] rm: " Derrick Stolee via GitGitGadget
@ 2021-04-01  1:49   ` Derrick Stolee via GitGitGadget
  2021-04-01  1:49   ` [PATCH v2 16/25] update-index: " Derrick Stolee via GitGitGadget
                     ` (12 subsequent siblings)
  27 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-01  1:49 UTC (permalink / raw)
  To: git
  Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Before iterating over all cache entries, ensure that a sparse index is
expanded to a full index to avoid unexpected behavior.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/stash.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/builtin/stash.c b/builtin/stash.c
index ba774cce674f..6fb7178ef2fa 100644
--- a/builtin/stash.c
+++ b/builtin/stash.c
@@ -1350,6 +1350,8 @@ static int do_push_stash(const struct pathspec *ps, const char *stash_msg, int q
 		int i;
 		char *ps_matched = xcalloc(ps->nr, 1);
 
+		/* TODO: audit for interaction with sparse-index. */
+		ensure_full_index(&the_index);
 		for (i = 0; i < active_nr; i++)
 			ce_path_match(&the_index, active_cache[i], ps,
 				      ps_matched);
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 111+ messages in thread

* [PATCH v2 16/25] update-index: ensure full index
  2021-04-01  1:49 ` [PATCH v2 00/25] " Derrick Stolee via GitGitGadget
                     ` (14 preceding siblings ...)
  2021-04-01  1:49   ` [PATCH v2 15/25] stash: " Derrick Stolee via GitGitGadget
@ 2021-04-01  1:49   ` Derrick Stolee via GitGitGadget
  2021-04-01  1:49   ` [PATCH v2 17/25] dir: " Derrick Stolee via GitGitGadget
                     ` (11 subsequent siblings)
  27 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-01  1:49 UTC (permalink / raw)
  To: git
  Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Before iterating over all cache entries, ensure that a sparse index is
expanded to a full index to avoid unexpected behavior.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/update-index.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/builtin/update-index.c b/builtin/update-index.c
index 79087bccea4b..f1f16f2de526 100644
--- a/builtin/update-index.c
+++ b/builtin/update-index.c
@@ -745,6 +745,8 @@ static int do_reupdate(int ac, const char **av,
 		 */
 		has_head = 0;
  redo:
+	/* TODO: audit for interaction with sparse-index. */
+	ensure_full_index(&the_index);
 	for (pos = 0; pos < active_nr; pos++) {
 		const struct cache_entry *ce = active_cache[pos];
 		struct cache_entry *old = NULL;
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 111+ messages in thread

* [PATCH v2 17/25] dir: ensure full index
  2021-04-01  1:49 ` [PATCH v2 00/25] " Derrick Stolee via GitGitGadget
                     ` (15 preceding siblings ...)
  2021-04-01  1:49   ` [PATCH v2 16/25] update-index: " Derrick Stolee via GitGitGadget
@ 2021-04-01  1:49   ` Derrick Stolee via GitGitGadget
  2021-04-01  1:49   ` [PATCH v2 18/25] entry: " Derrick Stolee via GitGitGadget
                     ` (10 subsequent siblings)
  27 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-01  1:49 UTC (permalink / raw)
  To: git
  Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Before iterating over all cache entries, ensure that a sparse index is
expanded to a full index to avoid unexpected behavior.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 dir.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/dir.c b/dir.c
index 5b00dfb5b144..166238e79f52 100644
--- a/dir.c
+++ b/dir.c
@@ -3533,6 +3533,8 @@ static void connect_wt_gitdir_in_nested(const char *sub_worktree,
 	if (repo_read_index(&subrepo) < 0)
 		die(_("index file corrupt in repo %s"), subrepo.gitdir);
 
+	/* TODO: audit for interaction with sparse-index. */
+	ensure_full_index(subrepo.index);
 	for (i = 0; i < subrepo.index->cache_nr; i++) {
 		const struct cache_entry *ce = subrepo.index->cache[i];
 
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 111+ messages in thread

* [PATCH v2 18/25] entry: ensure full index
  2021-04-01  1:49 ` [PATCH v2 00/25] " Derrick Stolee via GitGitGadget
                     ` (16 preceding siblings ...)
  2021-04-01  1:49   ` [PATCH v2 17/25] dir: " Derrick Stolee via GitGitGadget
@ 2021-04-01  1:49   ` Derrick Stolee via GitGitGadget
  2021-04-01  1:49   ` [PATCH v2 19/25] merge-recursive: " Derrick Stolee via GitGitGadget
                     ` (9 subsequent siblings)
  27 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-01  1:49 UTC (permalink / raw)
  To: git
  Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Before iterating over all cache entries, ensure that a sparse index is
expanded to a full index to avoid unexpected behavior.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 entry.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/entry.c b/entry.c
index 7b9f43716f76..891e4ba2b45a 100644
--- a/entry.c
+++ b/entry.c
@@ -412,6 +412,8 @@ static void mark_colliding_entries(const struct checkout *state,
 
 	ce->ce_flags |= CE_MATCHED;
 
+	/* TODO: audit for interaction with sparse-index. */
+	ensure_full_index(state->istate);
 	for (i = 0; i < state->istate->cache_nr; i++) {
 		struct cache_entry *dup = state->istate->cache[i];
 
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 111+ messages in thread

* [PATCH v2 19/25] merge-recursive: ensure full index
  2021-04-01  1:49 ` [PATCH v2 00/25] " Derrick Stolee via GitGitGadget
                     ` (17 preceding siblings ...)
  2021-04-01  1:49   ` [PATCH v2 18/25] entry: " Derrick Stolee via GitGitGadget
@ 2021-04-01  1:49   ` Derrick Stolee via GitGitGadget
  2021-04-01  1:49   ` [PATCH v2 20/25] pathspec: " Derrick Stolee via GitGitGadget
                     ` (8 subsequent siblings)
  27 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-01  1:49 UTC (permalink / raw)
  To: git
  Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Before iterating over all cache entries, ensure that a sparse index is
expanded to a full index to avoid unexpected behavior.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 merge-recursive.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/merge-recursive.c b/merge-recursive.c
index b8de7a704eae..91d8597728c1 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -522,6 +522,8 @@ static struct string_list *get_unmerged(struct index_state *istate)
 
 	unmerged->strdup_strings = 1;
 
+	/* TODO: audit for interaction with sparse-index. */
+	ensure_full_index(istate);
 	for (i = 0; i < istate->cache_nr; i++) {
 		struct string_list_item *item;
 		struct stage_data *e;
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 111+ messages in thread

* [PATCH v2 20/25] pathspec: ensure full index
  2021-04-01  1:49 ` [PATCH v2 00/25] " Derrick Stolee via GitGitGadget
                     ` (18 preceding siblings ...)
  2021-04-01  1:49   ` [PATCH v2 19/25] merge-recursive: " Derrick Stolee via GitGitGadget
@ 2021-04-01  1:49   ` Derrick Stolee via GitGitGadget
  2021-04-01  1:49   ` [PATCH v2 21/25] read-cache: " Derrick Stolee via GitGitGadget
                     ` (7 subsequent siblings)
  27 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-01  1:49 UTC (permalink / raw)
  To: git
  Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Before iterating over all cache entries, ensure that a sparse index is
expanded to a full index to avoid unexpected behavior.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 pathspec.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/pathspec.c b/pathspec.c
index b6e333965cb4..d67688bab74b 100644
--- a/pathspec.c
+++ b/pathspec.c
@@ -36,6 +36,8 @@ void add_pathspec_matches_against_index(const struct pathspec *pathspec,
 			num_unmatched++;
 	if (!num_unmatched)
 		return;
+	/* TODO: audit for interaction with sparse-index. */
+	ensure_full_index(istate);
 	for (i = 0; i < istate->cache_nr; i++) {
 		const struct cache_entry *ce = istate->cache[i];
 		ce_path_match(istate, ce, pathspec, seen);
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 111+ messages in thread

* [PATCH v2 21/25] read-cache: ensure full index
  2021-04-01  1:49 ` [PATCH v2 00/25] " Derrick Stolee via GitGitGadget
                     ` (19 preceding siblings ...)
  2021-04-01  1:49   ` [PATCH v2 20/25] pathspec: " Derrick Stolee via GitGitGadget
@ 2021-04-01  1:49   ` Derrick Stolee via GitGitGadget
  2021-04-01  1:49   ` [PATCH v2 22/25] resolve-undo: " Derrick Stolee via GitGitGadget
                     ` (6 subsequent siblings)
  27 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-01  1:49 UTC (permalink / raw)
  To: git
  Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Before iterating over all cache entries, ensure that a sparse index is
expanded to a full index to avoid unexpected behavior.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 read-cache.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/read-cache.c b/read-cache.c
index 3698bc7bf77d..a9dcf0ab4f78 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1577,6 +1577,8 @@ int refresh_index(struct index_state *istate, unsigned int flags,
 	 */
 	preload_index(istate, pathspec, 0);
 	trace2_region_enter("index", "refresh", NULL);
+	/* TODO: audit for interaction with sparse-index. */
+	ensure_full_index(istate);
 	for (i = 0; i < istate->cache_nr; i++) {
 		struct cache_entry *ce, *new_entry;
 		int cache_errno = 0;
@@ -2498,6 +2500,8 @@ int repo_index_has_changes(struct repository *repo,
 		diff_flush(&opt);
 		return opt.flags.has_changes != 0;
 	} else {
+		/* TODO: audit for interaction with sparse-index. */
+		ensure_full_index(istate);
 		for (i = 0; sb && i < istate->cache_nr; i++) {
 			if (i)
 				strbuf_addch(sb, ' ');
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 111+ messages in thread

* [PATCH v2 22/25] resolve-undo: ensure full index
  2021-04-01  1:49 ` [PATCH v2 00/25] " Derrick Stolee via GitGitGadget
                     ` (20 preceding siblings ...)
  2021-04-01  1:49   ` [PATCH v2 21/25] read-cache: " Derrick Stolee via GitGitGadget
@ 2021-04-01  1:49   ` Derrick Stolee via GitGitGadget
  2021-04-01  1:50   ` [PATCH v2 23/25] revision: " Derrick Stolee via GitGitGadget
                     ` (5 subsequent siblings)
  27 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-01  1:49 UTC (permalink / raw)
  To: git
  Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Before iterating over all cache entries, ensure that a sparse index is
expanded to a full index to avoid unexpected behavior.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 resolve-undo.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/resolve-undo.c b/resolve-undo.c
index 236320f179cb..ee36ca58b135 100644
--- a/resolve-undo.c
+++ b/resolve-undo.c
@@ -172,6 +172,8 @@ void unmerge_marked_index(struct index_state *istate)
 	if (!istate->resolve_undo)
 		return;
 
+	/* TODO: audit for interaction with sparse-index. */
+	ensure_full_index(istate);
 	for (i = 0; i < istate->cache_nr; i++) {
 		const struct cache_entry *ce = istate->cache[i];
 		if (ce->ce_flags & CE_MATCHED)
@@ -186,6 +188,8 @@ void unmerge_index(struct index_state *istate, const struct pathspec *pathspec)
 	if (!istate->resolve_undo)
 		return;
 
+	/* TODO: audit for interaction with sparse-index. */
+	ensure_full_index(istate);
 	for (i = 0; i < istate->cache_nr; i++) {
 		const struct cache_entry *ce = istate->cache[i];
 		if (!ce_path_match(istate, ce, pathspec, NULL))
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 111+ messages in thread

* [PATCH v2 23/25] revision: ensure full index
  2021-04-01  1:49 ` [PATCH v2 00/25] " Derrick Stolee via GitGitGadget
                     ` (21 preceding siblings ...)
  2021-04-01  1:49   ` [PATCH v2 22/25] resolve-undo: " Derrick Stolee via GitGitGadget
@ 2021-04-01  1:50   ` Derrick Stolee via GitGitGadget
  2021-04-01  1:50   ` [PATCH v2 24/25] sparse-index: expand_to_path() Derrick Stolee via GitGitGadget
                     ` (4 subsequent siblings)
  27 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-01  1:50 UTC (permalink / raw)
  To: git
  Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Before iterating over all index entries, ensure that a sparse index is
expanded to a full index to avoid unexpected behavior. This case could
be integrated later by ensuring that we walk the tree in the
sparse-directory entry, but the current behavior is only expecting
blobs. Save this integration for later when it can be properly tested.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 revision.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/revision.c b/revision.c
index b78733f5089b..b72e0ac1bdca 100644
--- a/revision.c
+++ b/revision.c
@@ -1680,6 +1680,8 @@ static void do_add_index_objects_to_pending(struct rev_info *revs,
 {
 	int i;
 
+	/* TODO: audit for interaction with sparse-index. */
+	ensure_full_index(istate);
 	for (i = 0; i < istate->cache_nr; i++) {
 		struct cache_entry *ce = istate->cache[i];
 		struct blob *blob;
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 111+ messages in thread

* [PATCH v2 24/25] sparse-index: expand_to_path()
  2021-04-01  1:49 ` [PATCH v2 00/25] " Derrick Stolee via GitGitGadget
                     ` (22 preceding siblings ...)
  2021-04-01  1:50   ` [PATCH v2 23/25] revision: " Derrick Stolee via GitGitGadget
@ 2021-04-01  1:50   ` Derrick Stolee via GitGitGadget
  2021-04-05 19:32     ` Elijah Newren
  2021-04-01  1:50   ` [PATCH v2 25/25] name-hash: use expand_to_path() Derrick Stolee via GitGitGadget
                     ` (3 subsequent siblings)
  27 siblings, 1 reply; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-01  1:50 UTC (permalink / raw)
  To: git
  Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Some users of the index API have a specific path they are looking for,
but choose to use index_file_exists() to rely on the name-hash hashtable
instead of doing binary search with index_name_pos(). These users only
need to know a yes/no answer, not a position within the cache array.

When the index is sparse, the name-hash hash table does not contain the
full list of paths within sparse directories. It _does_ contain the
directory names for the sparse-directory entries.

Create a helper function, expand_to_path(), for intended use with the
name-hash hashtable functions. The integration with name-hash.c will
follow in a later change.

The solution here is to use ensure_full_index() when we determine that
the requested path is within a sparse directory entry. This will
populate the name-hash hashtable as the index is recomputed from
scratch.

There may be cases where the caller is trying to find an untracked path
that is not in the index but also is not within a sparse directory
entry. We want to minimize the overhead for these requests. If we used
index_name_pos() to find the insertion order of the path, then we could
determine from that position if a sparse-directory exists. (In fact,
just calling index_name_pos() in that case would lead to expanding the
index to a full index.) However, this takes O(log N) time where N is the
number of cache entries.

To keep the performance of this call based mostly on the input string,
use index_file_exists() to look for the ancestors of the path. Using the
heuristic that a sparse directory is likely to have a small number of
parent directories, we start from the bottom and build up. Use a string
buffer to allow mutating the path name to terminate after each slash for
each hashset test.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 sparse-index.c | 72 ++++++++++++++++++++++++++++++++++++++++++++++++++
 sparse-index.h | 13 +++++++++
 2 files changed, 85 insertions(+)

diff --git a/sparse-index.c b/sparse-index.c
index 95ea17174da3..8a1223041296 100644
--- a/sparse-index.c
+++ b/sparse-index.c
@@ -283,3 +283,75 @@ void ensure_full_index(struct index_state *istate)
 
 	trace2_region_leave("index", "ensure_full_index", istate->repo);
 }
+
+/*
+ * This static global helps avoid infinite recursion between
+ * expand_to_path() and index_file_exists().
+ */
+static int in_expand_to_path = 0;
+
+void expand_to_path(struct index_state *istate,
+		    const char *path, size_t pathlen, int icase)
+{
+	struct strbuf path_mutable = STRBUF_INIT;
+	size_t substr_len;
+
+	/* prevent extra recursion */
+	if (in_expand_to_path)
+		return;
+
+	if (!istate || !istate->sparse_index)
+		return;
+
+	if (!istate->repo)
+		istate->repo = the_repository;
+
+	in_expand_to_path = 1;
+
+	/*
+	 * We only need to actually expand a region if the
+	 * following are both true:
+	 *
+	 * 1. 'path' is not already in the index.
+	 * 2. Some parent directory of 'path' is a sparse directory.
+	 */
+
+	if (index_file_exists(istate, path, pathlen, icase))
+		goto cleanup;
+
+	strbuf_add(&path_mutable, path, pathlen);
+	strbuf_addch(&path_mutable, '/');
+
+	/* Check the name hash for all parent directories */
+	substr_len = 0;
+	while (substr_len < pathlen) {
+		char temp;
+		char *replace = strchr(path_mutable.buf + substr_len, '/');
+
+		if (!replace)
+			break;
+
+		/* replace the character _after_ the slash */
+		replace++;
+		temp = *replace;
+		*replace = '\0';
+		if (index_file_exists(istate, path_mutable.buf,
+				      path_mutable.len, icase)) {
+			/*
+			 * We found a parent directory in the name-hash
+			 * hashtable, which means that this entry could
+			 * exist within a sparse-directory entry. Expand
+			 * accordingly.
+			 */
+			ensure_full_index(istate);
+			break;
+		}
+
+		*replace = temp;
+		substr_len = replace - path_mutable.buf;
+	}
+
+cleanup:
+	strbuf_release(&path_mutable);
+	in_expand_to_path = 0;
+}
diff --git a/sparse-index.h b/sparse-index.h
index 0268f38753c0..1115a0d7dd98 100644
--- a/sparse-index.h
+++ b/sparse-index.h
@@ -4,6 +4,19 @@
 struct index_state;
 int convert_to_sparse(struct index_state *istate);
 
+/*
+ * Some places in the codebase expect to search for a specific path.
+ * This path might be outside of the sparse-checkout definition, in
+ * which case a sparse-index may not contain a path for that index.
+ *
+ * Given an index and a path, check to see if a leading directory for
+ * 'path' exists in the index as a sparse directory. In that case,
+ * expand that sparse directory to a full range of cache entries and
+ * populate the index accordingly.
+ */
+void expand_to_path(struct index_state *istate,
+		    const char *path, size_t pathlen, int icase);
+
 struct repository;
 int set_sparse_index_config(struct repository *repo, int enable);
 
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 111+ messages in thread

* [PATCH v2 25/25] name-hash: use expand_to_path()
  2021-04-01  1:49 ` [PATCH v2 00/25] " Derrick Stolee via GitGitGadget
                     ` (23 preceding siblings ...)
  2021-04-01  1:50   ` [PATCH v2 24/25] sparse-index: expand_to_path() Derrick Stolee via GitGitGadget
@ 2021-04-01  1:50   ` Derrick Stolee via GitGitGadget
  2021-04-05 19:53     ` Elijah Newren
  2021-04-01  7:07   ` [PATCH v2 00/25] Sparse Index: API protections Junio C Hamano
                     ` (2 subsequent siblings)
  27 siblings, 1 reply; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-01  1:50 UTC (permalink / raw)
  To: git
  Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

A sparse-index loads the name-hash data for its entries, including the
sparse-directory entries. If a caller asks for a path that is contained
within a sparse-directory entry, we need to expand to a full index and
recalculate the name hash table before returning the result. Insert
calls to expand_to_path() to protect against this case.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 name-hash.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/name-hash.c b/name-hash.c
index 4e03fac9bb12..75c159e06eeb 100644
--- a/name-hash.c
+++ b/name-hash.c
@@ -8,6 +8,7 @@
 #include "cache.h"
 #include "thread-utils.h"
 #include "trace2.h"
+#include "sparse-index.h"
 
 struct dir_entry {
 	struct hashmap_entry ent;
@@ -109,6 +110,12 @@ static void hash_index_entry(struct index_state *istate, struct cache_entry *ce)
 	if (ce->ce_flags & CE_HASHED)
 		return;
 	ce->ce_flags |= CE_HASHED;
+
+	if (S_ISSPARSEDIR(ce->ce_mode)) {
+		add_dir_entry(istate, ce);
+		return;
+	}
+
 	hashmap_entry_init(&ce->ent, memihash(ce->name, ce_namelen(ce)));
 	hashmap_add(&istate->name_hash, &ce->ent);
 
@@ -680,6 +687,7 @@ int index_dir_exists(struct index_state *istate, const char *name, int namelen)
 	struct dir_entry *dir;
 
 	lazy_init_name_hash(istate);
+	expand_to_path(istate, name, namelen, 0);
 	dir = find_dir_entry(istate, name, namelen);
 	return dir && dir->nr;
 }
@@ -690,6 +698,7 @@ void adjust_dirname_case(struct index_state *istate, char *name)
 	const char *ptr = startPtr;
 
 	lazy_init_name_hash(istate);
+	expand_to_path(istate, name, strlen(name), 0);
 	while (*ptr) {
 		while (*ptr && *ptr != '/')
 			ptr++;
@@ -713,6 +722,7 @@ struct cache_entry *index_file_exists(struct index_state *istate, const char *na
 	unsigned int hash = memihash(name, namelen);
 
 	lazy_init_name_hash(istate);
+	expand_to_path(istate, name, namelen, icase);
 
 	ce = hashmap_get_entry_from_hash(&istate->name_hash, hash, NULL,
 					 struct cache_entry, ent);
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH v2 00/25] Sparse Index: API protections
  2021-04-01  1:49 ` [PATCH v2 00/25] " Derrick Stolee via GitGitGadget
                     ` (24 preceding siblings ...)
  2021-04-01  1:50   ` [PATCH v2 25/25] name-hash: use expand_to_path() Derrick Stolee via GitGitGadget
@ 2021-04-01  7:07   ` Junio C Hamano
  2021-04-01 13:32     ` Derrick Stolee
  2021-04-05 19:55   ` Elijah Newren
  2021-04-12 21:07   ` [PATCH v3 00/26] " Derrick Stolee via GitGitGadget
  27 siblings, 1 reply; 111+ messages in thread
From: Junio C Hamano @ 2021-04-01  7:07 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, newren, Derrick Stolee, Matheus Tavares Bernardino, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> Here is the second patch series submission coming out of the sparse-index
> RFC [1].

Queued.  There were a few conflicts with new codepaths and I've
touched up a few of them (Alban's merge-strategies-in-c moves some
code that needs to dehydrate the sparse index entries from
builtin/merge-index.c to a new file, Elijah's ort has "const"
pointer to the istate that you need to be writable), but there may
be more that I missed.  Please check the tip of 'seen' when it gets
pushed out.

Thanks.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH v2 00/25] Sparse Index: API protections
  2021-04-01  7:07   ` [PATCH v2 00/25] Sparse Index: API protections Junio C Hamano
@ 2021-04-01 13:32     ` Derrick Stolee
  0 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee @ 2021-04-01 13:32 UTC (permalink / raw)
  To: Junio C Hamano, Derrick Stolee via GitGitGadget
  Cc: git, newren, Matheus Tavares Bernardino, Derrick Stolee

On 4/1/2021 3:07 AM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> Here is the second patch series submission coming out of the sparse-index
>> RFC [1].
> 
> Queued.  There were a few conflicts with new codepaths and I've
> touched up a few of them (Alban's merge-strategies-in-c moves some
> code that needs to dehydrate the sparse index entries from
> builtin/merge-index.c to a new file, Elijah's ort has "const"
> pointer to the istate that you need to be writable), but there may
> be more that I missed.  Please check the tip of 'seen' when it gets
> pushed out.

Thank you for dealing with these complicated merges. I promise
that the series after this one are much more focused in their
code footprint. Thus, they should be easier to merge with
ongoing topics.

I validated your merge by rebasing the rest of my current work
in this area onto 'seen'. The tests for 'git status' and 'git
add' with sparse-indexes adds to the confidence of the merge.

I did see a small conflict with mt/add-rm-in-sparse-checkout
in one of my future patches because of an adjacent edit. I'll
be sure to base my next submission on a merge with that topic.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH v2 24/25] sparse-index: expand_to_path()
  2021-04-01  1:50   ` [PATCH v2 24/25] sparse-index: expand_to_path() Derrick Stolee via GitGitGadget
@ 2021-04-05 19:32     ` Elijah Newren
  2021-04-06 11:46       ` Derrick Stolee
  0 siblings, 1 reply; 111+ messages in thread
From: Elijah Newren @ 2021-04-05 19:32 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Derrick Stolee,
	Matheus Tavares Bernardino, Derrick Stolee, Derrick Stolee

On Wed, Mar 31, 2021 at 6:50 PM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> Some users of the index API have a specific path they are looking for,
> but choose to use index_file_exists() to rely on the name-hash hashtable
> instead of doing binary search with index_name_pos(). These users only
> need to know a yes/no answer, not a position within the cache array.
>
> When the index is sparse, the name-hash hash table does not contain the
> full list of paths within sparse directories. It _does_ contain the
> directory names for the sparse-directory entries.
>
> Create a helper function, expand_to_path(), for intended use with the
> name-hash hashtable functions. The integration with name-hash.c will
> follow in a later change.
>
> The solution here is to use ensure_full_index() when we determine that
> the requested path is within a sparse directory entry. This will
> populate the name-hash hashtable as the index is recomputed from
> scratch.
>
> There may be cases where the caller is trying to find an untracked path
> that is not in the index but also is not within a sparse directory
> entry. We want to minimize the overhead for these requests. If we used
> index_name_pos() to find the insertion order of the path, then we could
> determine from that position if a sparse-directory exists. (In fact,
> just calling index_name_pos() in that case would lead to expanding the
> index to a full index.) However, this takes O(log N) time where N is the
> number of cache entries.
>
> To keep the performance of this call based mostly on the input string,
> use index_file_exists() to look for the ancestors of the path. Using the
> heuristic that a sparse directory is likely to have a small number of
> parent directories, we start from the bottom and build up. Use a string
> buffer to allow mutating the path name to terminate after each slash for
> each hashset test.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  sparse-index.c | 72 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  sparse-index.h | 13 +++++++++
>  2 files changed, 85 insertions(+)
>
> diff --git a/sparse-index.c b/sparse-index.c
> index 95ea17174da3..8a1223041296 100644
> --- a/sparse-index.c
> +++ b/sparse-index.c
> @@ -283,3 +283,75 @@ void ensure_full_index(struct index_state *istate)
>
>         trace2_region_leave("index", "ensure_full_index", istate->repo);
>  }
> +
> +/*
> + * This static global helps avoid infinite recursion between
> + * expand_to_path() and index_file_exists().
> + */
> +static int in_expand_to_path = 0;
> +
> +void expand_to_path(struct index_state *istate,
> +                   const char *path, size_t pathlen, int icase)
> +{
> +       struct strbuf path_mutable = STRBUF_INIT;
> +       size_t substr_len;
> +
> +       /* prevent extra recursion */
> +       if (in_expand_to_path)
> +               return;
> +
> +       if (!istate || !istate->sparse_index)
> +               return;
> +
> +       if (!istate->repo)
> +               istate->repo = the_repository;
> +
> +       in_expand_to_path = 1;
> +
> +       /*
> +        * We only need to actually expand a region if the
> +        * following are both true:
> +        *
> +        * 1. 'path' is not already in the index.
> +        * 2. Some parent directory of 'path' is a sparse directory.
> +        */
> +
> +       if (index_file_exists(istate, path, pathlen, icase))
> +               goto cleanup;
> +
> +       strbuf_add(&path_mutable, path, pathlen);
> +       strbuf_addch(&path_mutable, '/');
> +
> +       /* Check the name hash for all parent directories */
> +       substr_len = 0;
> +       while (substr_len < pathlen) {
> +               char temp;
> +               char *replace = strchr(path_mutable.buf + substr_len, '/');
> +
> +               if (!replace)
> +                       break;
> +
> +               /* replace the character _after_ the slash */
> +               replace++;
> +               temp = *replace;
> +               *replace = '\0';
> +               if (index_file_exists(istate, path_mutable.buf,
> +                                     path_mutable.len, icase)) {
> +                       /*
> +                        * We found a parent directory in the name-hash
> +                        * hashtable, which means that this entry could
> +                        * exist within a sparse-directory entry. Expand
> +                        * accordingly.

"this entry" is a bit unclear; it might be worth referring to the
"path" variable instead.  I think it'd also help to explain the
reasoning for using the '/' character, because while it's clear when
looking at the series, just running across it in the code it might be
easy to forget or miss.  Perhaps:

                    * We found a parent directory in the name-hash
                    * hashtable, because only sparse directory entries
                    * have a trailing '/' character.  Since "path" wasn't
                    * in the index, perhaps it exists within this
                    * sparse-directory.  Expand accordingly.

> +                        */
> +                       ensure_full_index(istate);
> +                       break;
> +               }
> +
> +               *replace = temp;
> +               substr_len = replace - path_mutable.buf;
> +       }
> +
> +cleanup:
> +       strbuf_release(&path_mutable);
> +       in_expand_to_path = 0;
> +}
> diff --git a/sparse-index.h b/sparse-index.h
> index 0268f38753c0..1115a0d7dd98 100644
> --- a/sparse-index.h
> +++ b/sparse-index.h
> @@ -4,6 +4,19 @@
>  struct index_state;
>  int convert_to_sparse(struct index_state *istate);
>
> +/*
> + * Some places in the codebase expect to search for a specific path.
> + * This path might be outside of the sparse-checkout definition, in
> + * which case a sparse-index may not contain a path for that index.
> + *
> + * Given an index and a path, check to see if a leading directory for
> + * 'path' exists in the index as a sparse directory. In that case,
> + * expand that sparse directory to a full range of cache entries and
> + * populate the index accordingly.
> + */
> +void expand_to_path(struct index_state *istate,
> +                   const char *path, size_t pathlen, int icase);
> +
>  struct repository;
>  int set_sparse_index_config(struct repository *repo, int enable);
>
> --
> gitgitgadget

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH v2 25/25] name-hash: use expand_to_path()
  2021-04-01  1:50   ` [PATCH v2 25/25] name-hash: use expand_to_path() Derrick Stolee via GitGitGadget
@ 2021-04-05 19:53     ` Elijah Newren
  0 siblings, 0 replies; 111+ messages in thread
From: Elijah Newren @ 2021-04-05 19:53 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Derrick Stolee,
	Matheus Tavares Bernardino, Derrick Stolee, Derrick Stolee

On Wed, Mar 31, 2021 at 6:50 PM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> A sparse-index loads the name-hash data for its entries, including the
> sparse-directory entries. If a caller asks for a path that is contained
> within a sparse-directory entry, we need to expand to a full index and
> recalculate the name hash table before returning the result. Insert
> calls to expand_to_path() to protect against this case.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  name-hash.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
>
> diff --git a/name-hash.c b/name-hash.c
> index 4e03fac9bb12..75c159e06eeb 100644
> --- a/name-hash.c
> +++ b/name-hash.c
> @@ -8,6 +8,7 @@
>  #include "cache.h"
>  #include "thread-utils.h"
>  #include "trace2.h"
> +#include "sparse-index.h"
>
>  struct dir_entry {
>         struct hashmap_entry ent;
> @@ -109,6 +110,12 @@ static void hash_index_entry(struct index_state *istate, struct cache_entry *ce)
>         if (ce->ce_flags & CE_HASHED)
>                 return;
>         ce->ce_flags |= CE_HASHED;
> +
> +       if (S_ISSPARSEDIR(ce->ce_mode)) {
> +               add_dir_entry(istate, ce);
> +               return;
> +       }

I don't see how this relates to the commit message.  Was this meant to
be included in another commit?  If it is intended to be included here,
could an explanation be added?

> +
>         hashmap_entry_init(&ce->ent, memihash(ce->name, ce_namelen(ce)));
>         hashmap_add(&istate->name_hash, &ce->ent);
>
> @@ -680,6 +687,7 @@ int index_dir_exists(struct index_state *istate, const char *name, int namelen)
>         struct dir_entry *dir;
>
>         lazy_init_name_hash(istate);
> +       expand_to_path(istate, name, namelen, 0);
>         dir = find_dir_entry(istate, name, namelen);
>         return dir && dir->nr;
>  }
> @@ -690,6 +698,7 @@ void adjust_dirname_case(struct index_state *istate, char *name)
>         const char *ptr = startPtr;
>
>         lazy_init_name_hash(istate);
> +       expand_to_path(istate, name, strlen(name), 0);
>         while (*ptr) {
>                 while (*ptr && *ptr != '/')
>                         ptr++;
> @@ -713,6 +722,7 @@ struct cache_entry *index_file_exists(struct index_state *istate, const char *na
>         unsigned int hash = memihash(name, namelen);
>
>         lazy_init_name_hash(istate);
> +       expand_to_path(istate, name, namelen, icase);
>
>         ce = hashmap_get_entry_from_hash(&istate->name_hash, hash, NULL,
>                                          struct cache_entry, ent);
> --
> gitgitgadget

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH v2 00/25] Sparse Index: API protections
  2021-04-01  1:49 ` [PATCH v2 00/25] " Derrick Stolee via GitGitGadget
                     ` (25 preceding siblings ...)
  2021-04-01  7:07   ` [PATCH v2 00/25] Sparse Index: API protections Junio C Hamano
@ 2021-04-05 19:55   ` Elijah Newren
  2021-04-12 21:07   ` [PATCH v3 00/26] " Derrick Stolee via GitGitGadget
  27 siblings, 0 replies; 111+ messages in thread
From: Elijah Newren @ 2021-04-05 19:55 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Derrick Stolee,
	Matheus Tavares Bernardino, Derrick Stolee

On Wed, Mar 31, 2021 at 6:50 PM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> Here is the second patch series submission coming out of the sparse-index
> RFC [1].
>
> [1]
> https://lore.kernel.org/git/pull.847.git.1611596533.gitgitgadget@gmail.com/
>
> This is based on v3 of the format series [2].
>
> [2]
> https://lore.kernel.org/git/pull.883.v3.git.1615912983.gitgitgadget@gmail.com/
>
> The point of this series is to insert protections for the consumers of the
> in-memory index to avoid unintended behavior change when using a sparse
> index versus a full one.
>
> We mark certain regions of code as needing a full index, so we call
> ensure_full_index() to expand a sparse index to a full one, if necessary.
> These protections are inserted file-by-file in every loop over all cache
> entries. Well, "most" loops, because some are going to be handled in the
> very next series so I leave them out.
>
> Many callers use index_name_pos() to find a path by name. In these cases, we
> can check if that position resolves to a sparse directory instance. In those
> cases, we just expand to a full index and run the search again.
>
> The last few patches deal with the name-hash hashtable for doing O(1)
> lookups.
>
> These protections don't do much right now, since the previous series created
> the_repository->settings.command_requires_full_index to guard all index
> reads and writes to ensure the in-memory copy is full for commands that have
> not been tested with the sparse index yet.
>
> However, after this series is complete, we now have a straight-forward plan
> for making commands "sparse aware" one-by-one:
>
>  1. Disable settings.command_requires_full_index to allow an in-memory
>     sparse-index.
>  2. Run versions of that command under a debugger, breaking on
>     ensure_full_index().
>  3. Examine the call stack to determine the context of that expansion, then
>     implement the proper behavior in those locations.
>  4. Add tests to ensure we are checking this logic in the presence of sparse
>     directory entries.
>
> I will admit that mostly it is the writing of the test cases that takes the
> most time in the conversions I've done so far.
>
>
> Updates in v2
> =============
>
>  * Rebased onto v5 of ds/sparse-index
>  * Updated the technical doc to describe how these protections are guards to
>    keep behavior consistent between a sparse-index and a full index. Whether
>    or not that behavior is "correct" can be interrogated later.
>  * Calls to ensure_full_index() are marked with a TODO comment saying these
>    calls should be audited later (with tests).
>  * Fixed an incorrectly squashed commit message.
>  * Dropped the diff-lib.c commit because it was erroneously included in v2.
>  * Dropped the merge-ort.c commit because of conflicts with work in flight
>    and a quick audit that it is not needed.
>  * I reviewed the merge of this topic with mt/add-rm-in-sparse-checkout and
>    found it equivalent to what I would have done.

You addressed most my concerns moving from the RFC to v1, and this
version addresses most of my remaining comments/questions/suggestions.
I only had two minor comments on the last two patches.  This series is
looking pretty good to me.

> Thanks, -Stolee
>
> Derrick Stolee (25):
>   sparse-index: API protection strategy
>   *: remove 'const' qualifier for struct index_state
>   read-cache: expand on query into sparse-directory entry
>   cache: move ensure_full_index() to cache.h
>   add: ensure full index
>   checkout-index: ensure full index
>   checkout: ensure full index
>   commit: ensure full index
>   difftool: ensure full index
>   fsck: ensure full index
>   grep: ensure full index
>   ls-files: ensure full index
>   merge-index: ensure full index
>   rm: ensure full index
>   stash: ensure full index
>   update-index: ensure full index
>   dir: ensure full index
>   entry: ensure full index
>   merge-recursive: ensure full index
>   pathspec: ensure full index
>   read-cache: ensure full index
>   resolve-undo: ensure full index
>   revision: ensure full index
>   sparse-index: expand_to_path()
>   name-hash: use expand_to_path()
>
>  Documentation/technical/sparse-index.txt | 37 +++++++++++-
>  attr.c                                   | 14 ++---
>  attr.h                                   |  4 +-
>  builtin/add.c                            |  2 +
>  builtin/checkout-index.c                 |  2 +
>  builtin/checkout.c                       |  5 ++
>  builtin/commit.c                         |  4 ++
>  builtin/difftool.c                       |  3 +
>  builtin/fsck.c                           |  2 +
>  builtin/grep.c                           |  2 +
>  builtin/ls-files.c                       | 14 +++--
>  builtin/merge-index.c                    |  5 ++
>  builtin/rm.c                             |  2 +
>  builtin/stash.c                          |  2 +
>  builtin/update-index.c                   |  2 +
>  cache.h                                  |  7 ++-
>  convert.c                                | 26 ++++-----
>  convert.h                                | 20 +++----
>  dir.c                                    | 14 +++--
>  dir.h                                    |  8 +--
>  entry.c                                  |  2 +
>  merge-recursive.c                        |  4 +-
>  name-hash.c                              | 10 ++++
>  pathspec.c                               |  8 ++-
>  pathspec.h                               |  6 +-
>  read-cache.c                             | 35 ++++++++++--
>  resolve-undo.c                           |  4 ++
>  revision.c                               |  2 +
>  sparse-index.c                           | 72 ++++++++++++++++++++++++
>  sparse-index.h                           | 14 ++++-
>  submodule.c                              |  6 +-
>  submodule.h                              |  6 +-
>  32 files changed, 273 insertions(+), 71 deletions(-)
>
>
> base-commit: 66602733cc95d9a53594520cd8b28d3338e258ea
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-906%2Fderrickstolee%2Fsparse-index%2Fprotections-v2
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-906/derrickstolee/sparse-index/protections-v2
> Pull-Request: https://github.com/gitgitgadget/git/pull/906
>
> Range-diff vs v1:
>
>   3:  bbf19f8a2be5 !  1:  7484e085e342 sparse-index: API protection strategy
>      @@ Commit message
>           Looking for non-zero stage:
>           * builtin/add.c:chmod_pathspec()
>           * builtin/merge.c:count_unmerged_entries()
>      +    * merge-ort.c:record_conflicted_index_entries()
>           * read-cache.c:unmerged_index()
>           * rerere.c:check_one_conflict(), find_conflict(), rerere_remaining()
>           * revision.c:prepare_show_merge()
>      @@ Commit message
>           Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
>
>        ## Documentation/technical/sparse-index.txt ##
>      -@@ Documentation/technical/sparse-index.txt: also introduce other features that have been considered for improving the
>      - index, as well.
>      +@@ Documentation/technical/sparse-index.txt: index, as well.
>
>        Next, consumers of the index will be guarded against operating on a
>      --sparse-index by inserting calls to `ensure_full_index()` or
>      + sparse-index by inserting calls to `ensure_full_index()` or
>       -`expand_index_to_path()`. After these guards are in place, we can begin
>       -leaving sparse-directory entries in the in-memory index structure.
>      -+sparse-index by inserting calls to `ensure_full_index()` before iterating
>      -+over all cache entries. If a specific path is requested, then those will
>      ++`expand_index_to_path()`. If a specific path is requested, then those will
>       +be protected from within the `index_file_exists()` and `index_name_pos()`
>      -+API calls: they will call `ensure_full_index()` if necessary.
>      ++API calls: they will call `ensure_full_index()` if necessary. The
>      ++intention here is to preserve existing behavior when interacting with a
>      ++sparse-checkout. We don't want a change to happen by accident, without
>      ++tests. Many of these locations may not need any change before removing the
>      ++guards, but we should not do so without tests to ensure the expected
>      ++behavior happens.
>      ++
>      ++It may be desirable to _change_ the behavior of some commands in the
>      ++presence of a sparse index or more generally in any sparse-checkout
>      ++scenario. In such cases, these should be carefully communicated and
>      ++tested. No such behavior changes are intended during this phase.
>       +
>       +During a scan of the codebase, not every iteration of the cache entries
>       +needs an `ensure_full_index()` check. The basic reasons include:
>      @@ Documentation/technical/sparse-index.txt: also introduce other features that hav
>       +
>       +6. The sparse-index is disabled at this point when using the split-index
>       +   feature, so no effort is made to protect the split-index API.
>      -+
>      -+After these guards are in place, we can begin leaving sparse-directory
>      -+entries in the in-memory index structure.
>
>        Even after inserting these guards, we will keep expanding sparse-indexes
>        for most Git commands using the `command_requires_full_index` repository
>   1:  628e16dd3fc7 =  2:  098b2c9ef352 *: remove 'const' qualifier for struct index_state
>   2:  8e11e8917019 =  3:  737d27e18d64 read-cache: expand on query into sparse-directory entry
>   4:  7e4079c48eb2 =  4:  db5c100f3e2b cache: move ensure_full_index() to cache.h
>   5:  142df1ab8e3e !  5:  4a5fc2eb5a9f add: ensure full index
>      @@ builtin/add.c: static int renormalize_tracked_files(const struct pathspec *paths
>        {
>         int i, retval = 0;
>
>      ++ /* TODO: audit for interaction with sparse-index. */
>       + ensure_full_index(&the_index);
>         for (i = 0; i < active_nr; i++) {
>                 struct cache_entry *ce = active_cache[i];
>   6:  bfa0164cc3c1 !  6:  11c38f7277c5 checkout-index: ensure full index
>      @@ builtin/checkout-index.c: static void checkout_all(const char *prefix, int prefi
>         int i, errs = 0;
>         struct cache_entry *last_ce = NULL;
>
>      ++ /* TODO: audit for interaction with sparse-index. */
>       + ensure_full_index(&the_index);
>         for (i = 0; i < active_nr ; i++) {
>                 struct cache_entry *ce = active_cache[i];
>   7:  1bb7b6117e41 !  7:  fd04adbb3f79 checkout: ensure full index
>      @@ builtin/checkout.c: static int checkout_worktree(const struct checkout_opts *opt
>                                NULL);
>
>         enable_delayed_checkout(&state);
>      ++
>      ++ /* TODO: audit for interaction with sparse-index. */
>       + ensure_full_index(&the_index);
>         for (pos = 0; pos < active_nr; pos++) {
>                 struct cache_entry *ce = active_cache[pos];
>      @@ builtin/checkout.c: static int checkout_paths(const struct checkout_opts *opts,
>          * Make sure all pathspecs participated in locating the paths
>          * to be checked out.
>          */
>      ++ /* TODO: audit for interaction with sparse-index. */
>       + ensure_full_index(&the_index);
>         for (pos = 0; pos < active_nr; pos++)
>                 if (opts->overlay_mode)
>   8:  a59e45c4ae8f !  8:  65704f39edc9 commit: ensure full index
>      @@ builtin/commit.c: static int list_paths(struct string_list *list, const char *wi
>                 free(max_prefix);
>         }
>
>      ++ /* TODO: audit for interaction with sparse-index. */
>       + ensure_full_index(&the_index);
>         for (i = 0; i < active_nr; i++) {
>                 const struct cache_entry *ce = active_cache[i];
>      @@ builtin/commit.c: static int prepare_to_commit(const char *index_file, const cha
>                 if (get_oid(parent, &oid)) {
>                         int i, ita_nr = 0;
>
>      ++                 /* TODO: audit for interaction with sparse-index. */
>       +                 ensure_full_index(&the_index);
>                         for (i = 0; i < active_nr; i++)
>                                 if (ce_intent_to_add(active_cache[i]))
>   9:  83040e7558f3 !  9:  739f3fe9edf2 difftool: ensure full index
>      @@ builtin/difftool.c: static int run_dir_diff(const char *extcmd, int symlinks, co
>                 setenv("GIT_DIFFTOOL_DIRDIFF", "true", 1);
>         rc = run_command_v_opt(helper_argv, flags);
>
>      ++ /* TODO: audit for interaction with sparse-index. */
>       + ensure_full_index(&wtindex);
>       +
>         /*
>  10:  988f7bd2d736 ! 10:  779a86ad1ec4 fsck: ensure full index
>      @@ builtin/fsck.c: int cmd_fsck(int argc, const char **argv, const char *prefix)
>                 verify_index_checksum = 1;
>                 verify_ce_order = 1;
>                 read_cache();
>      ++         /* TODO: audit for interaction with sparse-index. */
>       +         ensure_full_index(&the_index);
>                 for (i = 0; i < active_nr; i++) {
>                         unsigned int mode;
>  11:  abe189051a0c ! 11:  8c0d377054fa grep: ensure full index
>      @@ builtin/grep.c: static int grep_cache(struct grep_opt *opt,
>         if (repo_read_index(repo) < 0)
>                 die(_("index file corrupt"));
>
>      ++ /* TODO: audit for interaction with sparse-index. */
>       + ensure_full_index(repo->index);
>         for (nr = 0; nr < repo->index->cache_nr; nr++) {
>                 const struct cache_entry *ce = repo->index->cache[nr];
>  12:  00c8a7e1d119 ! 12:  beaa1467cabb ls-files: ensure full index
>      @@ builtin/ls-files.c: static void show_files(struct repository *repo, struct dir_s
>
>         if (!(show_cached || show_stage || show_deleted || show_modified))
>                 return;
>      ++ /* TODO: audit for interaction with sparse-index. */
>       + ensure_full_index(repo->index);
>         for (i = 0; i < repo->index->cache_nr; i++) {
>                 const struct cache_entry *ce = repo->index->cache[i];
>      @@ builtin/ls-files.c: void overlay_tree_on_index(struct index_state *istate,
>                 die("bad tree-ish %s", tree_name);
>
>         /* Hoist the unmerged entries up to stage #3 to make room */
>      ++ /* TODO: audit for interaction with sparse-index. */
>       + ensure_full_index(istate);
>         for (i = 0; i < istate->cache_nr; i++) {
>                 struct cache_entry *ce = istate->cache[i];
>  13:  1288f02177d2 ! 13:  73684141fcff merge-index: ensure full index
>      @@ builtin/merge-index.c: static void merge_one_path(const char *path)
>        static void merge_all(void)
>        {
>         int i;
>      ++ /* TODO: audit for interaction with sparse-index. */
>       + ensure_full_index(&the_index);
>         for (i = 0; i < active_nr; i++) {
>                 const struct cache_entry *ce = active_cache[i];
>      @@ builtin/merge-index.c: int cmd_merge_index(int argc, const char **argv, const ch
>
>         read_cache();
>
>      ++ /* TODO: audit for interaction with sparse-index. */
>       + ensure_full_index(&the_index);
>       +
>         i = 1;
>  14:  d6816560b32f ! 14:  6ea81a49c6b5 rm: ensure full index
>      @@ builtin/rm.c: int cmd_rm(int argc, const char **argv, const char *prefix)
>
>         seen = xcalloc(pathspec.nr, 1);
>
>      ++ /* TODO: audit for interaction with sparse-index. */
>       + ensure_full_index(&the_index);
>         for (i = 0; i < active_nr; i++) {
>                 const struct cache_entry *ce = active_cache[i];
>  15:  92ccd31dd343 ! 15:  49ca5ed05c8d sparse-checkout: ensure full index
>      @@ Metadata
>       Author: Derrick Stolee <dstolee@microsoft.com>
>
>        ## Commit message ##
>      -    sparse-checkout: ensure full index
>      +    stash: ensure full index
>
>      -    When adjusting the sparse-checkout definition, ensure that a sparse
>      -    index is expanded to a full one to avoid unexpected behavior. This is a
>      -    top candidate for later integration with the sparse-index, but requires
>      -    careful testing.
>      +    Before iterating over all cache entries, ensure that a sparse index is
>      +    expanded to a full index to avoid unexpected behavior.
>
>           Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
>
>      @@ builtin/stash.c: static int do_push_stash(const struct pathspec *ps, const char
>                 int i;
>                 char *ps_matched = xcalloc(ps->nr, 1);
>
>      ++         /* TODO: audit for interaction with sparse-index. */
>       +         ensure_full_index(&the_index);
>                 for (i = 0; i < active_nr; i++)
>                         ce_path_match(&the_index, active_cache[i], ps,
>  16:  796449758a08 ! 16:  9c4bb187c15d update-index: ensure full index
>      @@ builtin/update-index.c: static int do_reupdate(int ac, const char **av,
>                  */
>                 has_head = 0;
>         redo:
>      ++ /* TODO: audit for interaction with sparse-index. */
>       + ensure_full_index(&the_index);
>         for (pos = 0; pos < active_nr; pos++) {
>                 const struct cache_entry *ce = active_cache[pos];
>  17:  45bbed6150a2 <  -:  ------------ diff-lib: ensure full index
>  18:  8ce1f80985a4 ! 17:  fae4c078c3ef dir: ensure full index
>      @@ dir.c: static void connect_wt_gitdir_in_nested(const char *sub_worktree,
>         if (repo_read_index(&subrepo) < 0)
>                 die(_("index file corrupt in repo %s"), subrepo.gitdir);
>
>      ++ /* TODO: audit for interaction with sparse-index. */
>       + ensure_full_index(subrepo.index);
>         for (i = 0; i < subrepo.index->cache_nr; i++) {
>                 const struct cache_entry *ce = subrepo.index->cache[i];
>  19:  6e0b452f44c1 ! 18:  2b9180ee77d3 entry: ensure full index
>      @@ entry.c: static void mark_colliding_entries(const struct checkout *state,
>
>         ce->ce_flags |= CE_MATCHED;
>
>      ++ /* TODO: audit for interaction with sparse-index. */
>       + ensure_full_index(state->istate);
>         for (i = 0; i < state->istate->cache_nr; i++) {
>                 struct cache_entry *dup = state->istate->cache[i];
>  20:  fb4c7038b746 <  -:  ------------ merge-ort: ensure full index
>  21:  57d59825627f ! 19:  1e3f6085a405 merge-recursive: ensure full index
>      @@ merge-recursive.c: static struct string_list *get_unmerged(struct index_state *i
>
>         unmerged->strdup_strings = 1;
>
>      ++ /* TODO: audit for interaction with sparse-index. */
>       + ensure_full_index(istate);
>         for (i = 0; i < istate->cache_nr; i++) {
>                 struct string_list_item *item;
>  22:  f4c470686b27 ! 20:  e62a597a9725 pathspec: ensure full index
>      @@ pathspec.c: void add_pathspec_matches_against_index(const struct pathspec *paths
>                         num_unmatched++;
>         if (!num_unmatched)
>                 return;
>      ++ /* TODO: audit for interaction with sparse-index. */
>       + ensure_full_index(istate);
>         for (i = 0; i < istate->cache_nr; i++) {
>                 const struct cache_entry *ce = istate->cache[i];
>  23:  81519782d4b2 ! 21:  ebfffdbdd6ad read-cache: ensure full index
>      @@ read-cache.c: int refresh_index(struct index_state *istate, unsigned int flags,
>          */
>         preload_index(istate, pathspec, 0);
>         trace2_region_enter("index", "refresh", NULL);
>      ++ /* TODO: audit for interaction with sparse-index. */
>       + ensure_full_index(istate);
>         for (i = 0; i < istate->cache_nr; i++) {
>                 struct cache_entry *ce, *new_entry;
>      @@ read-cache.c: int repo_index_has_changes(struct repository *repo,
>                 diff_flush(&opt);
>                 return opt.flags.has_changes != 0;
>         } else {
>      ++         /* TODO: audit for interaction with sparse-index. */
>       +         ensure_full_index(istate);
>                 for (i = 0; sb && i < istate->cache_nr; i++) {
>                         if (i)
>  24:  39e8bea0c1ca ! 22:  495b07a87973 resolve-undo: ensure full index
>      @@ resolve-undo.c: void unmerge_marked_index(struct index_state *istate)
>         if (!istate->resolve_undo)
>                 return;
>
>      ++ /* TODO: audit for interaction with sparse-index. */
>       + ensure_full_index(istate);
>         for (i = 0; i < istate->cache_nr; i++) {
>                 const struct cache_entry *ce = istate->cache[i];
>      @@ resolve-undo.c: void unmerge_index(struct index_state *istate, const struct path
>         if (!istate->resolve_undo)
>                 return;
>
>      ++ /* TODO: audit for interaction with sparse-index. */
>       + ensure_full_index(istate);
>         for (i = 0; i < istate->cache_nr; i++) {
>                 const struct cache_entry *ce = istate->cache[i];
>  25:  6abb1813ae10 ! 23:  3144114d1a75 revision: ensure full index
>      @@ revision.c: static void do_add_index_objects_to_pending(struct rev_info *revs,
>        {
>         int i;
>
>      ++ /* TODO: audit for interaction with sparse-index. */
>       + ensure_full_index(istate);
>         for (i = 0; i < istate->cache_nr; i++) {
>                 struct cache_entry *ce = istate->cache[i];
>  26:  1690d33c11e0 = 24:  d52c72b4a7b9 sparse-index: expand_to_path()
>  27:  40cfc59f695a = 25:  7e2d3fae9a2a name-hash: use expand_to_path()
>
> --
> gitgitgadget

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH v2 24/25] sparse-index: expand_to_path()
  2021-04-05 19:32     ` Elijah Newren
@ 2021-04-06 11:46       ` Derrick Stolee
  0 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee @ 2021-04-06 11:46 UTC (permalink / raw)
  To: Elijah Newren, Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee

On 4/5/2021 3:32 PM, Elijah Newren wrote:
> On Wed, Mar 31, 2021 at 6:50 PM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>>
>> From: Derrick Stolee <dstolee@microsoft.com>>> +               if (index_file_exists(istate, path_mutable.buf,
>> +                                     path_mutable.len, icase)) {
>> +                       /*
>> +                        * We found a parent directory in the name-hash
>> +                        * hashtable, which means that this entry could
>> +                        * exist within a sparse-directory entry. Expand
>> +                        * accordingly.
> 
> "this entry" is a bit unclear; it might be worth referring to the
> "path" variable instead.  I think it'd also help to explain the
> reasoning for using the '/' character, because while it's clear when
> looking at the series, just running across it in the code it might be
> easy to forget or miss.  Perhaps:
> 
>                     * We found a parent directory in the name-hash
>                     * hashtable, because only sparse directory entries
>                     * have a trailing '/' character.  Since "path" wasn't
>                     * in the index, perhaps it exists within this
>                     * sparse-directory.  Expand accordingly.

A good recommendation. Thanks.

-Stolee

^ permalink raw reply	[flat|nested] 111+ messages in thread

* [PATCH v3 00/26] Sparse Index: API protections
  2021-04-01  1:49 ` [PATCH v2 00/25] " Derrick Stolee via GitGitGadget
                     ` (26 preceding siblings ...)
  2021-04-05 19:55   ` Elijah Newren
@ 2021-04-12 21:07   ` Derrick Stolee via GitGitGadget
  2021-04-12 21:07     ` [PATCH v3 01/26] sparse-index: API protection strategy Derrick Stolee via GitGitGadget
                       ` (26 more replies)
  27 siblings, 27 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-12 21:07 UTC (permalink / raw)
  To: git
  Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
	Derrick Stolee

Here is the second patch series submission coming out of the sparse-index
RFC [1].

[1]
https://lore.kernel.org/git/pull.847.git.1611596533.gitgitgadget@gmail.com/

This is based on ds/sparse-index.

The point of this series is to insert protections for the consumers of the
in-memory index to avoid unintended behavior change when using a sparse
index versus a full one.

We mark certain regions of code as needing a full index, so we call
ensure_full_index() to expand a sparse index to a full one, if necessary.
These protections are inserted file-by-file in every loop over all cache
entries. Well, "most" loops, because some are going to be handled in the
very next series so I leave them out.

Many callers use index_name_pos() to find a path by name. In these cases, we
can check if that position resolves to a sparse directory instance. In those
cases, we just expand to a full index and run the search again.

The last few patches deal with the name-hash hashtable for doing O(1)
lookups.

These protections don't do much right now, since the previous series created
the_repository->settings.command_requires_full_index to guard all index
reads and writes to ensure the in-memory copy is full for commands that have
not been tested with the sparse index yet.

However, after this series is complete, we now have a straight-forward plan
for making commands "sparse aware" one-by-one:

 1. Disable settings.command_requires_full_index to allow an in-memory
    sparse-index.
 2. Run versions of that command under a debugger, breaking on
    ensure_full_index().
 3. Examine the call stack to determine the context of that expansion, then
    implement the proper behavior in those locations.
 4. Add tests to ensure we are checking this logic in the presence of sparse
    directory entries.

I will admit that mostly it is the writing of the test cases that takes the
most time in the conversions I've done so far.


Updates in v3
=============

 * I updated based on Elijah's feedback.
 * One new patch splits out a change that Elijah (rightfully) pointed out
   did not belong with the patch it was originally in.

I gave it time to see if any other comments came in, but it looks like
review stabilized. I probably waited a bit longer than I should have.


Updates in v2
=============

 * Rebased onto v5 of ds/sparse-index
 * Updated the technical doc to describe how these protections are guards to
   keep behavior consistent between a sparse-index and a full index. Whether
   or not that behavior is "correct" can be interrogated later.
 * Calls to ensure_full_index() are marked with a TODO comment saying these
   calls should be audited later (with tests).
 * Fixed an incorrectly squashed commit message.
 * Dropped the diff-lib.c commit because it was erroneously included in v2.
 * Dropped the merge-ort.c commit because of conflicts with work in flight
   and a quick audit that it is not needed.
 * I reviewed the merge of this topic with mt/add-rm-in-sparse-checkout and
   found it equivalent to what I would have done.

Thanks, -Stolee

Derrick Stolee (26):
  sparse-index: API protection strategy
  *: remove 'const' qualifier for struct index_state
  read-cache: expand on query into sparse-directory entry
  cache: move ensure_full_index() to cache.h
  add: ensure full index
  checkout-index: ensure full index
  checkout: ensure full index
  commit: ensure full index
  difftool: ensure full index
  fsck: ensure full index
  grep: ensure full index
  ls-files: ensure full index
  merge-index: ensure full index
  rm: ensure full index
  stash: ensure full index
  update-index: ensure full index
  dir: ensure full index
  entry: ensure full index
  merge-recursive: ensure full index
  pathspec: ensure full index
  read-cache: ensure full index
  resolve-undo: ensure full index
  revision: ensure full index
  name-hash: don't add directories to name_hash
  sparse-index: expand_to_path()
  name-hash: use expand_to_path()

 Documentation/technical/sparse-index.txt | 37 +++++++++++-
 attr.c                                   | 14 ++---
 attr.h                                   |  4 +-
 builtin/add.c                            |  2 +
 builtin/checkout-index.c                 |  2 +
 builtin/checkout.c                       |  5 ++
 builtin/commit.c                         |  4 ++
 builtin/difftool.c                       |  3 +
 builtin/fsck.c                           |  2 +
 builtin/grep.c                           |  2 +
 builtin/ls-files.c                       | 14 +++--
 builtin/merge-index.c                    |  5 ++
 builtin/rm.c                             |  2 +
 builtin/stash.c                          |  2 +
 builtin/update-index.c                   |  2 +
 cache.h                                  |  7 ++-
 convert.c                                | 26 ++++-----
 convert.h                                | 20 +++----
 dir.c                                    | 14 +++--
 dir.h                                    |  8 +--
 entry.c                                  |  2 +
 merge-recursive.c                        |  4 +-
 name-hash.c                              | 11 +++-
 pathspec.c                               |  8 ++-
 pathspec.h                               |  6 +-
 read-cache.c                             | 35 ++++++++++--
 resolve-undo.c                           |  4 ++
 revision.c                               |  2 +
 sparse-index.c                           | 73 ++++++++++++++++++++++++
 sparse-index.h                           | 14 ++++-
 submodule.c                              |  6 +-
 submodule.h                              |  6 +-
 32 files changed, 273 insertions(+), 73 deletions(-)


base-commit: c9e40ae8ec41c5566e5849a87c969fa81ef49fcd
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-906%2Fderrickstolee%2Fsparse-index%2Fprotections-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-906/derrickstolee/sparse-index/protections-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/906

Range-diff vs v2:

  1:  7484e085e342 =  1:  4731f610ba6e sparse-index: API protection strategy
  2:  098b2c9ef352 =  2:  d3a92538edb6 *: remove 'const' qualifier for struct index_state
  3:  737d27e18d64 =  3:  f4b77aa18b93 read-cache: expand on query into sparse-directory entry
  4:  db5c100f3e2b =  4:  da17774a53c5 cache: move ensure_full_index() to cache.h
  5:  4a5fc2eb5a9f =  5:  b59c9f482828 add: ensure full index
  6:  11c38f7277c5 =  6:  0082855b5961 checkout-index: ensure full index
  7:  fd04adbb3f79 =  7:  e2ac527143ff checkout: ensure full index
  8:  65704f39edc9 =  8:  1a3b51fd3c4b commit: ensure full index
  9:  739f3fe9edf2 =  9:  8c61d40dfe01 difftool: ensure full index
 10:  779a86ad1ec4 = 10:  45b603379422 fsck: ensure full index
 11:  8c0d377054fa = 11:  97124e9fdc7f grep: ensure full index
 12:  beaa1467cabb = 12:  b00e214515e8 ls-files: ensure full index
 13:  73684141fcff = 13:  6497f2ce225b merge-index: ensure full index
 14:  6ea81a49c6b5 = 14:  175f3bc6b336 rm: ensure full index
 15:  49ca5ed05c8d = 15:  daa77e84e0e2 stash: ensure full index
 16:  9c4bb187c15d = 16:  8c5336964d9b update-index: ensure full index
 17:  fae4c078c3ef = 17:  08a62c23c8f7 dir: ensure full index
 18:  2b9180ee77d3 = 18:  825ebceee508 entry: ensure full index
 19:  1e3f6085a405 = 19:  3673db517235 merge-recursive: ensure full index
 20:  e62a597a9725 = 20:  4d3f6de29a63 pathspec: ensure full index
 21:  ebfffdbdd6ad = 21:  bda9cab15966 read-cache: ensure full index
 22:  495b07a87973 = 22:  38f295a41ec1 resolve-undo: ensure full index
 23:  3144114d1a75 = 23:  f928e104f0d3 revision: ensure full index
  -:  ------------ > 24:  5fd83dcf2747 name-hash: don't add directories to name_hash
 24:  d52c72b4a7b9 ! 25:  335fec3676a0 sparse-index: expand_to_path()
     @@ sparse-index.c: void ensure_full_index(struct index_state *istate)
      +				      path_mutable.len, icase)) {
      +			/*
      +			 * We found a parent directory in the name-hash
     -+			 * hashtable, which means that this entry could
     -+			 * exist within a sparse-directory entry. Expand
     -+			 * accordingly.
     ++			 * hashtable, because only sparse directory entries
     ++			 * have a trailing '/' character.  Since "path" wasn't
     ++			 * in the index, perhaps it exists within this
     ++			 * sparse-directory.  Expand accordingly.
      +			 */
      +			ensure_full_index(istate);
      +			break;
 25:  7e2d3fae9a2a ! 26:  1f3af8a886e5 name-hash: use expand_to_path()
     @@ name-hash.c
       
       struct dir_entry {
       	struct hashmap_entry ent;
     -@@ name-hash.c: static void hash_index_entry(struct index_state *istate, struct cache_entry *ce)
     - 	if (ce->ce_flags & CE_HASHED)
     - 		return;
     - 	ce->ce_flags |= CE_HASHED;
     -+
     -+	if (S_ISSPARSEDIR(ce->ce_mode)) {
     -+		add_dir_entry(istate, ce);
     -+		return;
     -+	}
     -+
     - 	hashmap_entry_init(&ce->ent, memihash(ce->name, ce_namelen(ce)));
     - 	hashmap_add(&istate->name_hash, &ce->ent);
     - 
      @@ name-hash.c: int index_dir_exists(struct index_state *istate, const char *name, int namelen)
       	struct dir_entry *dir;
       

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 111+ messages in thread

* [PATCH v3 01/26] sparse-index: API protection strategy
  2021-04-12 21:07   ` [PATCH v3 00/26] " Derrick Stolee via GitGitGadget
@ 2021-04-12 21:07     ` Derrick Stolee via GitGitGadget
  2021-04-12 21:07     ` [PATCH v3 02/26] *: remove 'const' qualifier for struct index_state Derrick Stolee via GitGitGadget
                       ` (25 subsequent siblings)
  26 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-12 21:07 UTC (permalink / raw)
  To: git
  Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Edit and expand the sparse-index design document with the plan for
guarding index operations with ensure_full_index().

Notably, the plan has changed to not have an expand_to_path() method in
favor of checking for a sparse-directory hit inside of the
index_path_pos() API.

The changes that follow this one will incrementally add
ensure_full_index() guards to iterations over all cache entries. Some
iterations over the cache entries are not protected due to a few
categories listed in the document. Since these are not being modified,
here is a short list of the files and methods that will not receive
these guards:

Looking for non-zero stage:
* builtin/add.c:chmod_pathspec()
* builtin/merge.c:count_unmerged_entries()
* merge-ort.c:record_conflicted_index_entries()
* read-cache.c:unmerged_index()
* rerere.c:check_one_conflict(), find_conflict(), rerere_remaining()
* revision.c:prepare_show_merge()
* sequencer.c:append_conflicts_hint()
* wt-status.c:wt_status_collect_changes_initial()

Looking for submodules:
* builtin/submodule--helper.c:module_list_compute()
* submodule.c: several methods
* worktree.c:validate_no_submodules()

Part of the index API:
* name-hash.c: lazy init methods
* preload-index.c:preload_thread(), preload_index()
* read-cache.c: file format methods

Checking for correct order of cache entries:
* read-cache.c:check_ce_order()

Ignores SKIP_WORKTREE entries or already aware:
* unpack-trees.c:mark_new_skip_worktree()
* wt-status.c:wt_status_check_sparse_checkout()

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/technical/sparse-index.txt | 37 ++++++++++++++++++++++--
 1 file changed, 35 insertions(+), 2 deletions(-)

diff --git a/Documentation/technical/sparse-index.txt b/Documentation/technical/sparse-index.txt
index 8d3d80804604..3b24c1a219f8 100644
--- a/Documentation/technical/sparse-index.txt
+++ b/Documentation/technical/sparse-index.txt
@@ -85,8 +85,41 @@ index, as well.
 
 Next, consumers of the index will be guarded against operating on a
 sparse-index by inserting calls to `ensure_full_index()` or
-`expand_index_to_path()`. After these guards are in place, we can begin
-leaving sparse-directory entries in the in-memory index structure.
+`expand_index_to_path()`. If a specific path is requested, then those will
+be protected from within the `index_file_exists()` and `index_name_pos()`
+API calls: they will call `ensure_full_index()` if necessary. The
+intention here is to preserve existing behavior when interacting with a
+sparse-checkout. We don't want a change to happen by accident, without
+tests. Many of these locations may not need any change before removing the
+guards, but we should not do so without tests to ensure the expected
+behavior happens.
+
+It may be desirable to _change_ the behavior of some commands in the
+presence of a sparse index or more generally in any sparse-checkout
+scenario. In such cases, these should be carefully communicated and
+tested. No such behavior changes are intended during this phase.
+
+During a scan of the codebase, not every iteration of the cache entries
+needs an `ensure_full_index()` check. The basic reasons include:
+
+1. The loop is scanning for entries with non-zero stage. These entries
+   are not collapsed into a sparse-directory entry.
+
+2. The loop is scanning for submodules. These entries are not collapsed
+   into a sparse-directory entry.
+
+3. The loop is part of the index API, especially around reading or
+   writing the format.
+
+4. The loop is checking for correct order of cache entries and that is
+   correct if and only if the sparse-directory entries are in the correct
+   location.
+
+5. The loop ignores entries with the `SKIP_WORKTREE` bit set, or is
+   otherwise already aware of sparse directory entries.
+
+6. The sparse-index is disabled at this point when using the split-index
+   feature, so no effort is made to protect the split-index API.
 
 Even after inserting these guards, we will keep expanding sparse-indexes
 for most Git commands using the `command_requires_full_index` repository
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 111+ messages in thread

* [PATCH v3 02/26] *: remove 'const' qualifier for struct index_state
  2021-04-12 21:07   ` [PATCH v3 00/26] " Derrick Stolee via GitGitGadget
  2021-04-12 21:07     ` [PATCH v3 01/26] sparse-index: API protection strategy Derrick Stolee via GitGitGadget
@ 2021-04-12 21:07     ` Derrick Stolee via GitGitGadget
  2021-04-12 21:07     ` [PATCH v3 03/26] read-cache: expand on query into sparse-directory entry Derrick Stolee via GitGitGadget
                       ` (24 subsequent siblings)
  26 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-12 21:07 UTC (permalink / raw)
  To: git
  Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Several methods specify that they take a 'struct index_state' pointer
with the 'const' qualifier because they intend to only query the data,
not change it. However, we will be introducing a step very low in the
method stack that might modify a sparse-index to become a full index in
the case that our queries venture inside a sparse-directory entry.

This change only removes the 'const' qualifiers that are necessary for
the following change which will actually modify the implementation of
index_name_stage_pos().

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 attr.c             | 14 +++++++-------
 attr.h             |  4 ++--
 builtin/ls-files.c | 10 +++++-----
 cache.h            |  6 +++---
 convert.c          | 26 +++++++++++++-------------
 convert.h          | 20 ++++++++++----------
 dir.c              | 12 ++++++------
 dir.h              |  8 ++++----
 merge-recursive.c  |  2 +-
 pathspec.c         |  6 +++---
 pathspec.h         |  6 +++---
 read-cache.c       | 10 +++++-----
 submodule.c        |  6 +++---
 submodule.h        |  6 +++---
 14 files changed, 68 insertions(+), 68 deletions(-)

diff --git a/attr.c b/attr.c
index 4ef85d668b54..8de553293ee4 100644
--- a/attr.c
+++ b/attr.c
@@ -718,7 +718,7 @@ static struct attr_stack *read_attr_from_file(const char *path, int macro_ok)
 	return res;
 }
 
-static struct attr_stack *read_attr_from_index(const struct index_state *istate,
+static struct attr_stack *read_attr_from_index(struct index_state *istate,
 					       const char *path,
 					       int macro_ok)
 {
@@ -748,7 +748,7 @@ static struct attr_stack *read_attr_from_index(const struct index_state *istate,
 	return res;
 }
 
-static struct attr_stack *read_attr(const struct index_state *istate,
+static struct attr_stack *read_attr(struct index_state *istate,
 				    const char *path, int macro_ok)
 {
 	struct attr_stack *res = NULL;
@@ -840,7 +840,7 @@ static void push_stack(struct attr_stack **attr_stack_p,
 	}
 }
 
-static void bootstrap_attr_stack(const struct index_state *istate,
+static void bootstrap_attr_stack(struct index_state *istate,
 				 struct attr_stack **stack)
 {
 	struct attr_stack *e;
@@ -878,7 +878,7 @@ static void bootstrap_attr_stack(const struct index_state *istate,
 	push_stack(stack, e, NULL, 0);
 }
 
-static void prepare_attr_stack(const struct index_state *istate,
+static void prepare_attr_stack(struct index_state *istate,
 			       const char *path, int dirlen,
 			       struct attr_stack **stack)
 {
@@ -1078,7 +1078,7 @@ static void determine_macros(struct all_attrs_item *all_attrs,
  * If check->check_nr is non-zero, only attributes in check[] are collected.
  * Otherwise all attributes are collected.
  */
-static void collect_some_attrs(const struct index_state *istate,
+static void collect_some_attrs(struct index_state *istate,
 			       const char *path,
 			       struct attr_check *check)
 {
@@ -1107,7 +1107,7 @@ static void collect_some_attrs(const struct index_state *istate,
 	fill(path, pathlen, basename_offset, check->stack, check->all_attrs, rem);
 }
 
-void git_check_attr(const struct index_state *istate,
+void git_check_attr(struct index_state *istate,
 		    const char *path,
 		    struct attr_check *check)
 {
@@ -1124,7 +1124,7 @@ void git_check_attr(const struct index_state *istate,
 	}
 }
 
-void git_all_attrs(const struct index_state *istate,
+void git_all_attrs(struct index_state *istate,
 		   const char *path, struct attr_check *check)
 {
 	int i;
diff --git a/attr.h b/attr.h
index 404548f028a8..3732505edae8 100644
--- a/attr.h
+++ b/attr.h
@@ -190,14 +190,14 @@ void attr_check_free(struct attr_check *check);
  */
 const char *git_attr_name(const struct git_attr *);
 
-void git_check_attr(const struct index_state *istate,
+void git_check_attr(struct index_state *istate,
 		    const char *path, struct attr_check *check);
 
 /*
  * Retrieve all attributes that apply to the specified path.
  * check holds the attributes and their values.
  */
-void git_all_attrs(const struct index_state *istate,
+void git_all_attrs(struct index_state *istate,
 		   const char *path, struct attr_check *check);
 
 enum git_attr_direction {
diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index 60a2913a01e9..4f9ed1fb29b7 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -57,7 +57,7 @@ static const char *tag_modified = "";
 static const char *tag_skip_worktree = "";
 static const char *tag_resolve_undo = "";
 
-static void write_eolinfo(const struct index_state *istate,
+static void write_eolinfo(struct index_state *istate,
 			  const struct cache_entry *ce, const char *path)
 {
 	if (show_eol) {
@@ -122,7 +122,7 @@ static void print_debug(const struct cache_entry *ce)
 	}
 }
 
-static void show_dir_entry(const struct index_state *istate,
+static void show_dir_entry(struct index_state *istate,
 			   const char *tag, struct dir_entry *ent)
 {
 	int len = max_prefix_len;
@@ -139,7 +139,7 @@ static void show_dir_entry(const struct index_state *istate,
 	write_name(ent->name);
 }
 
-static void show_other_files(const struct index_state *istate,
+static void show_other_files(struct index_state *istate,
 			     const struct dir_struct *dir)
 {
 	int i;
@@ -152,7 +152,7 @@ static void show_other_files(const struct index_state *istate,
 	}
 }
 
-static void show_killed_files(const struct index_state *istate,
+static void show_killed_files(struct index_state *istate,
 			      const struct dir_struct *dir)
 {
 	int i;
@@ -254,7 +254,7 @@ static void show_ce(struct repository *repo, struct dir_struct *dir,
 	}
 }
 
-static void show_ru_info(const struct index_state *istate)
+static void show_ru_info(struct index_state *istate)
 {
 	struct string_list_item *item;
 
diff --git a/cache.h b/cache.h
index 8aede373aeb3..5006278c13ca 100644
--- a/cache.h
+++ b/cache.h
@@ -800,7 +800,7 @@ struct cache_entry *index_file_exists(struct index_state *istate, const char *na
  * index_name_pos(&index, "f", 1) -> -3
  * index_name_pos(&index, "g", 1) -> -5
  */
-int index_name_pos(const struct index_state *, const char *name, int namelen);
+int index_name_pos(struct index_state *, const char *name, int namelen);
 
 /*
  * Some functions return the negative complement of an insert position when a
@@ -850,8 +850,8 @@ int add_file_to_index(struct index_state *, const char *path, int flags);
 int chmod_index_entry(struct index_state *, struct cache_entry *ce, char flip);
 int ce_same_name(const struct cache_entry *a, const struct cache_entry *b);
 void set_object_name_for_intent_to_add_entry(struct cache_entry *ce);
-int index_name_is_other(const struct index_state *, const char *, int);
-void *read_blob_data_from_index(const struct index_state *, const char *, unsigned long *);
+int index_name_is_other(struct index_state *, const char *, int);
+void *read_blob_data_from_index(struct index_state *, const char *, unsigned long *);
 
 /* do stat comparison even if CE_VALID is true */
 #define CE_MATCH_IGNORE_VALID		01
diff --git a/convert.c b/convert.c
index ee360c2f07ce..20b19abffa36 100644
--- a/convert.c
+++ b/convert.c
@@ -138,7 +138,7 @@ static const char *gather_convert_stats_ascii(const char *data, unsigned long si
 	}
 }
 
-const char *get_cached_convert_stats_ascii(const struct index_state *istate,
+const char *get_cached_convert_stats_ascii(struct index_state *istate,
 					   const char *path)
 {
 	const char *ret;
@@ -222,7 +222,7 @@ static void check_global_conv_flags_eol(const char *path,
 	}
 }
 
-static int has_crlf_in_index(const struct index_state *istate, const char *path)
+static int has_crlf_in_index(struct index_state *istate, const char *path)
 {
 	unsigned long sz;
 	void *data;
@@ -496,7 +496,7 @@ static int encode_to_worktree(const char *path, const char *src, size_t src_len,
 	return 1;
 }
 
-static int crlf_to_git(const struct index_state *istate,
+static int crlf_to_git(struct index_state *istate,
 		       const char *path, const char *src, size_t len,
 		       struct strbuf *buf,
 		       enum crlf_action crlf_action, int conv_flags)
@@ -1307,7 +1307,7 @@ struct conv_attrs {
 
 static struct attr_check *check;
 
-static void convert_attrs(const struct index_state *istate,
+static void convert_attrs(struct index_state *istate,
 			  struct conv_attrs *ca, const char *path)
 {
 	struct attr_check_item *ccheck = NULL;
@@ -1369,7 +1369,7 @@ void reset_parsed_attributes(void)
 	user_convert_tail = NULL;
 }
 
-int would_convert_to_git_filter_fd(const struct index_state *istate, const char *path)
+int would_convert_to_git_filter_fd(struct index_state *istate, const char *path)
 {
 	struct conv_attrs ca;
 
@@ -1388,7 +1388,7 @@ int would_convert_to_git_filter_fd(const struct index_state *istate, const char
 	return apply_filter(path, NULL, 0, -1, NULL, ca.drv, CAP_CLEAN, NULL, NULL);
 }
 
-const char *get_convert_attr_ascii(const struct index_state *istate, const char *path)
+const char *get_convert_attr_ascii(struct index_state *istate, const char *path)
 {
 	struct conv_attrs ca;
 
@@ -1414,7 +1414,7 @@ const char *get_convert_attr_ascii(const struct index_state *istate, const char
 	return "";
 }
 
-int convert_to_git(const struct index_state *istate,
+int convert_to_git(struct index_state *istate,
 		   const char *path, const char *src, size_t len,
 		   struct strbuf *dst, int conv_flags)
 {
@@ -1448,7 +1448,7 @@ int convert_to_git(const struct index_state *istate,
 	return ret | ident_to_git(src, len, dst, ca.ident);
 }
 
-void convert_to_git_filter_fd(const struct index_state *istate,
+void convert_to_git_filter_fd(struct index_state *istate,
 			      const char *path, int fd, struct strbuf *dst,
 			      int conv_flags)
 {
@@ -1466,7 +1466,7 @@ void convert_to_git_filter_fd(const struct index_state *istate,
 	ident_to_git(dst->buf, dst->len, dst, ca.ident);
 }
 
-static int convert_to_working_tree_internal(const struct index_state *istate,
+static int convert_to_working_tree_internal(struct index_state *istate,
 					    const char *path, const char *src,
 					    size_t len, struct strbuf *dst,
 					    int normalizing,
@@ -1510,7 +1510,7 @@ static int convert_to_working_tree_internal(const struct index_state *istate,
 	return ret | ret_filter;
 }
 
-int async_convert_to_working_tree(const struct index_state *istate,
+int async_convert_to_working_tree(struct index_state *istate,
 				  const char *path, const char *src,
 				  size_t len, struct strbuf *dst,
 				  const struct checkout_metadata *meta,
@@ -1519,7 +1519,7 @@ int async_convert_to_working_tree(const struct index_state *istate,
 	return convert_to_working_tree_internal(istate, path, src, len, dst, 0, meta, dco);
 }
 
-int convert_to_working_tree(const struct index_state *istate,
+int convert_to_working_tree(struct index_state *istate,
 			    const char *path, const char *src,
 			    size_t len, struct strbuf *dst,
 			    const struct checkout_metadata *meta)
@@ -1527,7 +1527,7 @@ int convert_to_working_tree(const struct index_state *istate,
 	return convert_to_working_tree_internal(istate, path, src, len, dst, 0, meta, NULL);
 }
 
-int renormalize_buffer(const struct index_state *istate, const char *path,
+int renormalize_buffer(struct index_state *istate, const char *path,
 		       const char *src, size_t len, struct strbuf *dst)
 {
 	int ret = convert_to_working_tree_internal(istate, path, src, len, dst, 1, NULL, NULL);
@@ -1964,7 +1964,7 @@ static struct stream_filter *ident_filter(const struct object_id *oid)
  * Note that you would be crazy to set CRLF, smudge/clean or ident to a
  * large binary blob you would want us not to slurp into the memory!
  */
-struct stream_filter *get_stream_filter(const struct index_state *istate,
+struct stream_filter *get_stream_filter(struct index_state *istate,
 					const char *path,
 					const struct object_id *oid)
 {
diff --git a/convert.h b/convert.h
index e29d1026a686..0f7c8a1f0446 100644
--- a/convert.h
+++ b/convert.h
@@ -65,41 +65,41 @@ struct checkout_metadata {
 
 extern enum eol core_eol;
 extern char *check_roundtrip_encoding;
-const char *get_cached_convert_stats_ascii(const struct index_state *istate,
+const char *get_cached_convert_stats_ascii(struct index_state *istate,
 					   const char *path);
 const char *get_wt_convert_stats_ascii(const char *path);
-const char *get_convert_attr_ascii(const struct index_state *istate,
+const char *get_convert_attr_ascii(struct index_state *istate,
 				   const char *path);
 
 /* returns 1 if *dst was used */
-int convert_to_git(const struct index_state *istate,
+int convert_to_git(struct index_state *istate,
 		   const char *path, const char *src, size_t len,
 		   struct strbuf *dst, int conv_flags);
-int convert_to_working_tree(const struct index_state *istate,
+int convert_to_working_tree(struct index_state *istate,
 			    const char *path, const char *src,
 			    size_t len, struct strbuf *dst,
 			    const struct checkout_metadata *meta);
-int async_convert_to_working_tree(const struct index_state *istate,
+int async_convert_to_working_tree(struct index_state *istate,
 				  const char *path, const char *src,
 				  size_t len, struct strbuf *dst,
 				  const struct checkout_metadata *meta,
 				  void *dco);
 int async_query_available_blobs(const char *cmd,
 				struct string_list *available_paths);
-int renormalize_buffer(const struct index_state *istate,
+int renormalize_buffer(struct index_state *istate,
 		       const char *path, const char *src, size_t len,
 		       struct strbuf *dst);
-static inline int would_convert_to_git(const struct index_state *istate,
+static inline int would_convert_to_git(struct index_state *istate,
 				       const char *path)
 {
 	return convert_to_git(istate, path, NULL, 0, NULL, 0);
 }
 /* Precondition: would_convert_to_git_filter_fd(path) == true */
-void convert_to_git_filter_fd(const struct index_state *istate,
+void convert_to_git_filter_fd(struct index_state *istate,
 			      const char *path, int fd,
 			      struct strbuf *dst,
 			      int conv_flags);
-int would_convert_to_git_filter_fd(const struct index_state *istate,
+int would_convert_to_git_filter_fd(struct index_state *istate,
 				   const char *path);
 
 /*
@@ -133,7 +133,7 @@ void reset_parsed_attributes(void);
 
 struct stream_filter; /* opaque */
 
-struct stream_filter *get_stream_filter(const struct index_state *istate,
+struct stream_filter *get_stream_filter(struct index_state *istate,
 					const char *path,
 					const struct object_id *);
 void free_stream_filter(struct stream_filter *);
diff --git a/dir.c b/dir.c
index fd8aa7c40faa..5b00dfb5b144 100644
--- a/dir.c
+++ b/dir.c
@@ -306,7 +306,7 @@ static int do_read_blob(const struct object_id *oid, struct oid_stat *oid_stat,
  * [1] Only if DO_MATCH_DIRECTORY is passed; otherwise, this is NOT a match.
  * [2] Only if DO_MATCH_LEADING_PATHSPEC is passed; otherwise, not a match.
  */
-static int match_pathspec_item(const struct index_state *istate,
+static int match_pathspec_item(struct index_state *istate,
 			       const struct pathspec_item *item, int prefix,
 			       const char *name, int namelen, unsigned flags)
 {
@@ -429,7 +429,7 @@ static int match_pathspec_item(const struct index_state *istate,
  * pathspec did not match any names, which could indicate that the
  * user mistyped the nth pathspec.
  */
-static int do_match_pathspec(const struct index_state *istate,
+static int do_match_pathspec(struct index_state *istate,
 			     const struct pathspec *ps,
 			     const char *name, int namelen,
 			     int prefix, char *seen,
@@ -500,7 +500,7 @@ static int do_match_pathspec(const struct index_state *istate,
 	return retval;
 }
 
-static int match_pathspec_with_flags(const struct index_state *istate,
+static int match_pathspec_with_flags(struct index_state *istate,
 				     const struct pathspec *ps,
 				     const char *name, int namelen,
 				     int prefix, char *seen, unsigned flags)
@@ -516,7 +516,7 @@ static int match_pathspec_with_flags(const struct index_state *istate,
 	return negative ? 0 : positive;
 }
 
-int match_pathspec(const struct index_state *istate,
+int match_pathspec(struct index_state *istate,
 		   const struct pathspec *ps,
 		   const char *name, int namelen,
 		   int prefix, char *seen, int is_dir)
@@ -529,7 +529,7 @@ int match_pathspec(const struct index_state *istate,
 /**
  * Check if a submodule is a superset of the pathspec
  */
-int submodule_path_match(const struct index_state *istate,
+int submodule_path_match(struct index_state *istate,
 			 const struct pathspec *ps,
 			 const char *submodule_name,
 			 char *seen)
@@ -892,7 +892,7 @@ void add_pattern(const char *string, const char *base,
 	add_pattern_to_hashsets(pl, pattern);
 }
 
-static int read_skip_worktree_file_from_index(const struct index_state *istate,
+static int read_skip_worktree_file_from_index(struct index_state *istate,
 					      const char *path,
 					      size_t *size_out, char **data_out,
 					      struct oid_stat *oid_stat)
diff --git a/dir.h b/dir.h
index facfae47402a..51cb0e217247 100644
--- a/dir.h
+++ b/dir.h
@@ -354,7 +354,7 @@ int count_slashes(const char *s);
 int simple_length(const char *match);
 int no_wildcard(const char *string);
 char *common_prefix(const struct pathspec *pathspec);
-int match_pathspec(const struct index_state *istate,
+int match_pathspec(struct index_state *istate,
 		   const struct pathspec *pathspec,
 		   const char *name, int namelen,
 		   int prefix, char *seen, int is_dir);
@@ -492,12 +492,12 @@ int git_fnmatch(const struct pathspec_item *item,
 		const char *pattern, const char *string,
 		int prefix);
 
-int submodule_path_match(const struct index_state *istate,
+int submodule_path_match(struct index_state *istate,
 			 const struct pathspec *ps,
 			 const char *submodule_name,
 			 char *seen);
 
-static inline int ce_path_match(const struct index_state *istate,
+static inline int ce_path_match(struct index_state *istate,
 				const struct cache_entry *ce,
 				const struct pathspec *pathspec,
 				char *seen)
@@ -506,7 +506,7 @@ static inline int ce_path_match(const struct index_state *istate,
 			      S_ISDIR(ce->ce_mode) || S_ISGITLINK(ce->ce_mode));
 }
 
-static inline int dir_path_match(const struct index_state *istate,
+static inline int dir_path_match(struct index_state *istate,
 				 const struct dir_entry *ent,
 				 const struct pathspec *pathspec,
 				 int prefix, char *seen)
diff --git a/merge-recursive.c b/merge-recursive.c
index 3d9207455b3a..b8de7a704eae 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -2987,7 +2987,7 @@ static int blob_unchanged(struct merge_options *opt,
 	struct strbuf obuf = STRBUF_INIT;
 	struct strbuf abuf = STRBUF_INIT;
 	int ret = 0; /* assume changed for safety */
-	const struct index_state *idx = opt->repo->index;
+	struct index_state *idx = opt->repo->index;
 
 	if (a->mode != o->mode)
 		return 0;
diff --git a/pathspec.c b/pathspec.c
index 7a229d8d22f2..b6e333965cb4 100644
--- a/pathspec.c
+++ b/pathspec.c
@@ -20,7 +20,7 @@
  * to use find_pathspecs_matching_against_index() instead.
  */
 void add_pathspec_matches_against_index(const struct pathspec *pathspec,
-					const struct index_state *istate,
+					struct index_state *istate,
 					char *seen)
 {
 	int num_unmatched = 0, i;
@@ -51,7 +51,7 @@ void add_pathspec_matches_against_index(const struct pathspec *pathspec,
  * given pathspecs achieves against all items in the index.
  */
 char *find_pathspecs_matching_against_index(const struct pathspec *pathspec,
-					    const struct index_state *istate)
+					    struct index_state *istate)
 {
 	char *seen = xcalloc(pathspec->nr, 1);
 	add_pathspec_matches_against_index(pathspec, istate, seen);
@@ -702,7 +702,7 @@ void clear_pathspec(struct pathspec *pathspec)
 	pathspec->nr = 0;
 }
 
-int match_pathspec_attrs(const struct index_state *istate,
+int match_pathspec_attrs(struct index_state *istate,
 			 const char *name, int namelen,
 			 const struct pathspec_item *item)
 {
diff --git a/pathspec.h b/pathspec.h
index 454ce364fac7..2ccc8080b6cc 100644
--- a/pathspec.h
+++ b/pathspec.h
@@ -150,11 +150,11 @@ static inline int ps_strcmp(const struct pathspec_item *item,
 }
 
 void add_pathspec_matches_against_index(const struct pathspec *pathspec,
-					const struct index_state *istate,
+					struct index_state *istate,
 					char *seen);
 char *find_pathspecs_matching_against_index(const struct pathspec *pathspec,
-					    const struct index_state *istate);
-int match_pathspec_attrs(const struct index_state *istate,
+					    struct index_state *istate);
+int match_pathspec_attrs(struct index_state *istate,
 			 const char *name, int namelen,
 			 const struct pathspec_item *item);
 
diff --git a/read-cache.c b/read-cache.c
index 2410e6e0df13..3ad94578095e 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -549,7 +549,7 @@ int cache_name_stage_compare(const char *name1, int len1, int stage1, const char
 	return 0;
 }
 
-static int index_name_stage_pos(const struct index_state *istate, const char *name, int namelen, int stage)
+static int index_name_stage_pos(struct index_state *istate, const char *name, int namelen, int stage)
 {
 	int first, last;
 
@@ -570,7 +570,7 @@ static int index_name_stage_pos(const struct index_state *istate, const char *na
 	return -first-1;
 }
 
-int index_name_pos(const struct index_state *istate, const char *name, int namelen)
+int index_name_pos(struct index_state *istate, const char *name, int namelen)
 {
 	return index_name_stage_pos(istate, name, namelen, 0);
 }
@@ -3389,8 +3389,8 @@ int repo_read_index_unmerged(struct repository *repo)
  * We helpfully remove a trailing "/" from directories so that
  * the output of read_directory can be used as-is.
  */
-int index_name_is_other(const struct index_state *istate, const char *name,
-		int namelen)
+int index_name_is_other(struct index_state *istate, const char *name,
+			int namelen)
 {
 	int pos;
 	if (namelen && name[namelen - 1] == '/')
@@ -3408,7 +3408,7 @@ int index_name_is_other(const struct index_state *istate, const char *name,
 	return 1;
 }
 
-void *read_blob_data_from_index(const struct index_state *istate,
+void *read_blob_data_from_index(struct index_state *istate,
 				const char *path, unsigned long *size)
 {
 	int pos, len;
diff --git a/submodule.c b/submodule.c
index 9767ba9893cc..83809a4f7bd2 100644
--- a/submodule.c
+++ b/submodule.c
@@ -33,7 +33,7 @@ static struct oid_array ref_tips_after_fetch;
  * will be disabled because we can't guess what might be configured in
  * .gitmodules unless the user resolves the conflict.
  */
-int is_gitmodules_unmerged(const struct index_state *istate)
+int is_gitmodules_unmerged(struct index_state *istate)
 {
 	int pos = index_name_pos(istate, GITMODULES_FILE, strlen(GITMODULES_FILE));
 	if (pos < 0) { /* .gitmodules not found or isn't merged */
@@ -301,7 +301,7 @@ int is_submodule_populated_gently(const char *path, int *return_error_code)
 /*
  * Dies if the provided 'prefix' corresponds to an unpopulated submodule
  */
-void die_in_unpopulated_submodule(const struct index_state *istate,
+void die_in_unpopulated_submodule(struct index_state *istate,
 				  const char *prefix)
 {
 	int i, prefixlen;
@@ -331,7 +331,7 @@ void die_in_unpopulated_submodule(const struct index_state *istate,
 /*
  * Dies if any paths in the provided pathspec descends into a submodule
  */
-void die_path_inside_submodule(const struct index_state *istate,
+void die_path_inside_submodule(struct index_state *istate,
 			       const struct pathspec *ps)
 {
 	int i, j;
diff --git a/submodule.h b/submodule.h
index 4ac6e31cf1f7..84640c49c114 100644
--- a/submodule.h
+++ b/submodule.h
@@ -39,7 +39,7 @@ struct submodule_update_strategy {
 };
 #define SUBMODULE_UPDATE_STRATEGY_INIT {SM_UPDATE_UNSPECIFIED, NULL}
 
-int is_gitmodules_unmerged(const struct index_state *istate);
+int is_gitmodules_unmerged(struct index_state *istate);
 int is_writing_gitmodules_ok(void);
 int is_staging_gitmodules_ok(struct index_state *istate);
 int update_path_in_gitmodules(const char *oldpath, const char *newpath);
@@ -60,9 +60,9 @@ int is_submodule_active(struct repository *repo, const char *path);
  * Otherwise the return error code is the same as of resolve_gitdir_gently.
  */
 int is_submodule_populated_gently(const char *path, int *return_error_code);
-void die_in_unpopulated_submodule(const struct index_state *istate,
+void die_in_unpopulated_submodule(struct index_state *istate,
 				  const char *prefix);
-void die_path_inside_submodule(const struct index_state *istate,
+void die_path_inside_submodule(struct index_state *istate,
 			       const struct pathspec *ps);
 enum submodule_update_type parse_submodule_update_type(const char *value);
 int parse_submodule_update_strategy(const char *value,
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 111+ messages in thread

* [PATCH v3 03/26] read-cache: expand on query into sparse-directory entry
  2021-04-12 21:07   ` [PATCH v3 00/26] " Derrick Stolee via GitGitGadget
  2021-04-12 21:07     ` [PATCH v3 01/26] sparse-index: API protection strategy Derrick Stolee via GitGitGadget
  2021-04-12 21:07     ` [PATCH v3 02/26] *: remove 'const' qualifier for struct index_state Derrick Stolee via GitGitGadget
@ 2021-04-12 21:07     ` Derrick Stolee via GitGitGadget
  2021-04-12 21:07     ` [PATCH v3 04/26] cache: move ensure_full_index() to cache.h Derrick Stolee via GitGitGadget
                       ` (23 subsequent siblings)
  26 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-12 21:07 UTC (permalink / raw)
  To: git
  Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Callers to index_name_pos() or index_name_stage_pos() have a specific
path in mind. If that happens to be a path with an ancestor being a
sparse-directory entry, it can lead to unexpected results.

In the case that we did not find the requested path, check to see if the
position _before_ the inserted position is a sparse directory entry that
matches the initial segment of the input path (including the directory
separator at the end of the directory name). If so, then expand the
index to be a full index and search again. This expansion will only
happen once per index read.

Future enhancements could be more careful to expand only the necessary
sparse directory entry, but then we would have a special "not fully
sparse, but also not fully expanded" mode that could affect writing the
index to file. Since this only occurs if a specific file is requested
outside of the sparse checkout definition, this is unlikely to be a
common situation.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 read-cache.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/read-cache.c b/read-cache.c
index 3ad94578095e..3698bc7bf77d 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -567,6 +567,27 @@ static int index_name_stage_pos(struct index_state *istate, const char *name, in
 		}
 		first = next+1;
 	}
+
+	if (istate->sparse_index &&
+	    first > 0) {
+		/* Note: first <= istate->cache_nr */
+		struct cache_entry *ce = istate->cache[first - 1];
+
+		/*
+		 * If we are in a sparse-index _and_ the entry before the
+		 * insertion position is a sparse-directory entry that is
+		 * an ancestor of 'name', then we need to expand the index
+		 * and search again. This will only trigger once, because
+		 * thereafter the index is fully expanded.
+		 */
+		if (S_ISSPARSEDIR(ce->ce_mode) &&
+		    ce_namelen(ce) < namelen &&
+		    !strncmp(name, ce->name, ce_namelen(ce))) {
+			ensure_full_index(istate);
+			return index_name_stage_pos(istate, name, namelen, stage);
+		}
+	}
+
 	return -first-1;
 }
 
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 111+ messages in thread

* [PATCH v3 04/26] cache: move ensure_full_index() to cache.h
  2021-04-12 21:07   ` [PATCH v3 00/26] " Derrick Stolee via GitGitGadget
                       ` (2 preceding siblings ...)
  2021-04-12 21:07     ` [PATCH v3 03/26] read-cache: expand on query into sparse-directory entry Derrick Stolee via GitGitGadget
@ 2021-04-12 21:07     ` Derrick Stolee via GitGitGadget
  2021-04-12 21:07     ` [PATCH v3 05/26] add: ensure full index Derrick Stolee via GitGitGadget
                       ` (22 subsequent siblings)
  26 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-12 21:07 UTC (permalink / raw)
  To: git
  Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Soon we will insert ensure_full_index() calls across the codebase.
Instead of also adding include statements for sparse-index.h, let's just
use the fact that anything that cares about the index already has
cache.h in its includes.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 cache.h        | 1 +
 sparse-index.h | 1 -
 2 files changed, 1 insertion(+), 1 deletion(-)

diff --git a/cache.h b/cache.h
index 5006278c13ca..b7e20e9778db 100644
--- a/cache.h
+++ b/cache.h
@@ -350,6 +350,7 @@ void add_name_hash(struct index_state *istate, struct cache_entry *ce);
 void remove_name_hash(struct index_state *istate, struct cache_entry *ce);
 void free_name_hash(struct index_state *istate);
 
+void ensure_full_index(struct index_state *istate);
 
 /* Cache entry creation and cleanup */
 
diff --git a/sparse-index.h b/sparse-index.h
index 39dcc859735e..0268f38753c0 100644
--- a/sparse-index.h
+++ b/sparse-index.h
@@ -2,7 +2,6 @@
 #define SPARSE_INDEX_H__
 
 struct index_state;
-void ensure_full_index(struct index_state *istate);
 int convert_to_sparse(struct index_state *istate);
 
 struct repository;
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 111+ messages in thread

* [PATCH v3 05/26] add: ensure full index
  2021-04-12 21:07   ` [PATCH v3 00/26] " Derrick Stolee via GitGitGadget
                       ` (3 preceding siblings ...)
  2021-04-12 21:07     ` [PATCH v3 04/26] cache: move ensure_full_index() to cache.h Derrick Stolee via GitGitGadget
@ 2021-04-12 21:07     ` Derrick Stolee via GitGitGadget
  2021-04-12 21:07     ` [PATCH v3 06/26] checkout-index: " Derrick Stolee via GitGitGadget
                       ` (21 subsequent siblings)
  26 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-12 21:07 UTC (permalink / raw)
  To: git
  Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Before iterating over all cache entries, ensure that a sparse index is
expanded to a full index to avoid unexpected behavior.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/add.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/builtin/add.c b/builtin/add.c
index ea762a41e3a2..afccf2fd5543 100644
--- a/builtin/add.c
+++ b/builtin/add.c
@@ -141,6 +141,8 @@ static int renormalize_tracked_files(const struct pathspec *pathspec, int flags)
 {
 	int i, retval = 0;
 
+	/* TODO: audit for interaction with sparse-index. */
+	ensure_full_index(&the_index);
 	for (i = 0; i < active_nr; i++) {
 		struct cache_entry *ce = active_cache[i];
 
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 111+ messages in thread

* [PATCH v3 06/26] checkout-index: ensure full index
  2021-04-12 21:07   ` [PATCH v3 00/26] " Derrick Stolee via GitGitGadget
                       ` (4 preceding siblings ...)
  2021-04-12 21:07     ` [PATCH v3 05/26] add: ensure full index Derrick Stolee via GitGitGadget
@ 2021-04-12 21:07     ` Derrick Stolee via GitGitGadget
  2021-04-12 21:07     ` [PATCH v3 07/26] checkout: " Derrick Stolee via GitGitGadget
                       ` (20 subsequent siblings)
  26 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-12 21:07 UTC (permalink / raw)
  To: git
  Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Before we iterate over all cache entries, ensure that the index is not
sparse. This loop in checkout_all() might be safe to iterate over a
sparse index, but let's put this protection here until it can be
carefully tested.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/checkout-index.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/builtin/checkout-index.c b/builtin/checkout-index.c
index 023e49e271c2..2c2936a9dae0 100644
--- a/builtin/checkout-index.c
+++ b/builtin/checkout-index.c
@@ -119,6 +119,8 @@ static void checkout_all(const char *prefix, int prefix_length)
 	int i, errs = 0;
 	struct cache_entry *last_ce = NULL;
 
+	/* TODO: audit for interaction with sparse-index. */
+	ensure_full_index(&the_index);
 	for (i = 0; i < active_nr ; i++) {
 		struct cache_entry *ce = active_cache[i];
 		if (ce_stage(ce) != checkout_stage
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 111+ messages in thread

* [PATCH v3 07/26] checkout: ensure full index
  2021-04-12 21:07   ` [PATCH v3 00/26] " Derrick Stolee via GitGitGadget
                       ` (5 preceding siblings ...)
  2021-04-12 21:07     ` [PATCH v3 06/26] checkout-index: " Derrick Stolee via GitGitGadget
@ 2021-04-12 21:07     ` Derrick Stolee via GitGitGadget
  2021-04-12 21:07     ` [PATCH v3 08/26] commit: " Derrick Stolee via GitGitGadget
                       ` (19 subsequent siblings)
  26 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-12 21:07 UTC (permalink / raw)
  To: git
  Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Before iterating over all cache entries in the checkout builtin, ensure
that we have a full index to avoid any unexpected behavior.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/checkout.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/builtin/checkout.c b/builtin/checkout.c
index 0e6639052001..d0dbe63ea119 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -368,6 +368,9 @@ static int checkout_worktree(const struct checkout_opts *opts,
 			       NULL);
 
 	enable_delayed_checkout(&state);
+
+	/* TODO: audit for interaction with sparse-index. */
+	ensure_full_index(&the_index);
 	for (pos = 0; pos < active_nr; pos++) {
 		struct cache_entry *ce = active_cache[pos];
 		if (ce->ce_flags & CE_MATCHED) {
@@ -512,6 +515,8 @@ static int checkout_paths(const struct checkout_opts *opts,
 	 * Make sure all pathspecs participated in locating the paths
 	 * to be checked out.
 	 */
+	/* TODO: audit for interaction with sparse-index. */
+	ensure_full_index(&the_index);
 	for (pos = 0; pos < active_nr; pos++)
 		if (opts->overlay_mode)
 			mark_ce_for_checkout_overlay(active_cache[pos],
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 111+ messages in thread

* [PATCH v3 08/26] commit: ensure full index
  2021-04-12 21:07   ` [PATCH v3 00/26] " Derrick Stolee via GitGitGadget
                       ` (6 preceding siblings ...)
  2021-04-12 21:07     ` [PATCH v3 07/26] checkout: " Derrick Stolee via GitGitGadget
@ 2021-04-12 21:07     ` Derrick Stolee via GitGitGadget
  2021-04-12 21:08     ` [PATCH v3 09/26] difftool: " Derrick Stolee via GitGitGadget
                       ` (18 subsequent siblings)
  26 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-12 21:07 UTC (permalink / raw)
  To: git
  Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

These two loops iterate over all cache entries, so ensure that a sparse
index is expanded to a full index before we do so.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/commit.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/builtin/commit.c b/builtin/commit.c
index 739110c5a7f6..cf0c36d1dcb2 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -251,6 +251,8 @@ static int list_paths(struct string_list *list, const char *with_tree,
 		free(max_prefix);
 	}
 
+	/* TODO: audit for interaction with sparse-index. */
+	ensure_full_index(&the_index);
 	for (i = 0; i < active_nr; i++) {
 		const struct cache_entry *ce = active_cache[i];
 		struct string_list_item *item;
@@ -931,6 +933,8 @@ static int prepare_to_commit(const char *index_file, const char *prefix,
 		if (get_oid(parent, &oid)) {
 			int i, ita_nr = 0;
 
+			/* TODO: audit for interaction with sparse-index. */
+			ensure_full_index(&the_index);
 			for (i = 0; i < active_nr; i++)
 				if (ce_intent_to_add(active_cache[i]))
 					ita_nr++;
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 111+ messages in thread

* [PATCH v3 09/26] difftool: ensure full index
  2021-04-12 21:07   ` [PATCH v3 00/26] " Derrick Stolee via GitGitGadget
                       ` (7 preceding siblings ...)
  2021-04-12 21:07     ` [PATCH v3 08/26] commit: " Derrick Stolee via GitGitGadget
@ 2021-04-12 21:08     ` Derrick Stolee via GitGitGadget
  2021-04-12 21:08     ` [PATCH v3 10/26] fsck: " Derrick Stolee via GitGitGadget
                       ` (17 subsequent siblings)
  26 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-12 21:08 UTC (permalink / raw)
  To: git
  Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Before iterating over all cache entries, ensure that a sparse index has
been expanded to a full one to avoid unexpected behavior.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/difftool.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/builtin/difftool.c b/builtin/difftool.c
index 6e18e623fddf..32c914dde6a0 100644
--- a/builtin/difftool.c
+++ b/builtin/difftool.c
@@ -584,6 +584,9 @@ static int run_dir_diff(const char *extcmd, int symlinks, const char *prefix,
 		setenv("GIT_DIFFTOOL_DIRDIFF", "true", 1);
 	rc = run_command_v_opt(helper_argv, flags);
 
+	/* TODO: audit for interaction with sparse-index. */
+	ensure_full_index(&wtindex);
+
 	/*
 	 * If the diff includes working copy files and those
 	 * files were modified during the diff, then the changes
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 111+ messages in thread

* [PATCH v3 10/26] fsck: ensure full index
  2021-04-12 21:07   ` [PATCH v3 00/26] " Derrick Stolee via GitGitGadget
                       ` (8 preceding siblings ...)
  2021-04-12 21:08     ` [PATCH v3 09/26] difftool: " Derrick Stolee via GitGitGadget
@ 2021-04-12 21:08     ` Derrick Stolee via GitGitGadget
  2021-04-12 21:08     ` [PATCH v3 11/26] grep: " Derrick Stolee via GitGitGadget
                       ` (16 subsequent siblings)
  26 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-12 21:08 UTC (permalink / raw)
  To: git
  Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

When verifying all blobs reachable from the index, ensure that a sparse
index has been expanded to a full one to avoid missing some blobs.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/fsck.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index 821e7798c706..4d7f5c63ce0d 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -883,6 +883,8 @@ int cmd_fsck(int argc, const char **argv, const char *prefix)
 		verify_index_checksum = 1;
 		verify_ce_order = 1;
 		read_cache();
+		/* TODO: audit for interaction with sparse-index. */
+		ensure_full_index(&the_index);
 		for (i = 0; i < active_nr; i++) {
 			unsigned int mode;
 			struct blob *blob;
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 111+ messages in thread

* [PATCH v3 11/26] grep: ensure full index
  2021-04-12 21:07   ` [PATCH v3 00/26] " Derrick Stolee via GitGitGadget
                       ` (9 preceding siblings ...)
  2021-04-12 21:08     ` [PATCH v3 10/26] fsck: " Derrick Stolee via GitGitGadget
@ 2021-04-12 21:08     ` Derrick Stolee via GitGitGadget
  2021-04-12 21:08     ` [PATCH v3 12/26] ls-files: " Derrick Stolee via GitGitGadget
                       ` (15 subsequent siblings)
  26 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-12 21:08 UTC (permalink / raw)
  To: git
  Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Before iterating over all cache entries, ensure that a sparse index is
expanded to a full one so we do not miss blobs to scan. Later, this can
integrate more carefully with sparse indexes with proper testing.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/grep.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/builtin/grep.c b/builtin/grep.c
index 4e91a253ac3b..c2d40414e975 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -504,6 +504,8 @@ static int grep_cache(struct grep_opt *opt,
 	if (repo_read_index(repo) < 0)
 		die(_("index file corrupt"));
 
+	/* TODO: audit for interaction with sparse-index. */
+	ensure_full_index(repo->index);
 	for (nr = 0; nr < repo->index->cache_nr; nr++) {
 		const struct cache_entry *ce = repo->index->cache[nr];
 
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 111+ messages in thread

* [PATCH v3 12/26] ls-files: ensure full index
  2021-04-12 21:07   ` [PATCH v3 00/26] " Derrick Stolee via GitGitGadget
                       ` (10 preceding siblings ...)
  2021-04-12 21:08     ` [PATCH v3 11/26] grep: " Derrick Stolee via GitGitGadget
@ 2021-04-12 21:08     ` Derrick Stolee via GitGitGadget
  2021-04-12 21:08     ` [PATCH v3 13/26] merge-index: " Derrick Stolee via GitGitGadget
                       ` (14 subsequent siblings)
  26 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-12 21:08 UTC (permalink / raw)
  To: git
  Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Before iterating over all cache entries, ensure that a sparse index is
expanded to a full one to avoid missing files.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/ls-files.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index 4f9ed1fb29b7..a0b4e54d1149 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -317,6 +317,8 @@ static void show_files(struct repository *repo, struct dir_struct *dir)
 
 	if (!(show_cached || show_stage || show_deleted || show_modified))
 		return;
+	/* TODO: audit for interaction with sparse-index. */
+	ensure_full_index(repo->index);
 	for (i = 0; i < repo->index->cache_nr; i++) {
 		const struct cache_entry *ce = repo->index->cache[i];
 		struct stat st;
@@ -494,6 +496,8 @@ void overlay_tree_on_index(struct index_state *istate,
 		die("bad tree-ish %s", tree_name);
 
 	/* Hoist the unmerged entries up to stage #3 to make room */
+	/* TODO: audit for interaction with sparse-index. */
+	ensure_full_index(istate);
 	for (i = 0; i < istate->cache_nr; i++) {
 		struct cache_entry *ce = istate->cache[i];
 		if (!ce_stage(ce))
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 111+ messages in thread

* [PATCH v3 13/26] merge-index: ensure full index
  2021-04-12 21:07   ` [PATCH v3 00/26] " Derrick Stolee via GitGitGadget
                       ` (11 preceding siblings ...)
  2021-04-12 21:08     ` [PATCH v3 12/26] ls-files: " Derrick Stolee via GitGitGadget
@ 2021-04-12 21:08     ` Derrick Stolee via GitGitGadget
  2021-04-12 21:08     ` [PATCH v3 14/26] rm: " Derrick Stolee via GitGitGadget
                       ` (13 subsequent siblings)
  26 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-12 21:08 UTC (permalink / raw)
  To: git
  Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Before iterating over all cache entries, ensure that a sparse index is
expanded to a full one to avoid unexpected behavior.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/merge-index.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/builtin/merge-index.c b/builtin/merge-index.c
index 38ea6ad6ca25..c0383fe9df9a 100644
--- a/builtin/merge-index.c
+++ b/builtin/merge-index.c
@@ -58,6 +58,8 @@ static void merge_one_path(const char *path)
 static void merge_all(void)
 {
 	int i;
+	/* TODO: audit for interaction with sparse-index. */
+	ensure_full_index(&the_index);
 	for (i = 0; i < active_nr; i++) {
 		const struct cache_entry *ce = active_cache[i];
 		if (!ce_stage(ce))
@@ -80,6 +82,9 @@ int cmd_merge_index(int argc, const char **argv, const char *prefix)
 
 	read_cache();
 
+	/* TODO: audit for interaction with sparse-index. */
+	ensure_full_index(&the_index);
+
 	i = 1;
 	if (!strcmp(argv[i], "-o")) {
 		one_shot = 1;
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 111+ messages in thread

* [PATCH v3 14/26] rm: ensure full index
  2021-04-12 21:07   ` [PATCH v3 00/26] " Derrick Stolee via GitGitGadget
                       ` (12 preceding siblings ...)
  2021-04-12 21:08     ` [PATCH v3 13/26] merge-index: " Derrick Stolee via GitGitGadget
@ 2021-04-12 21:08     ` Derrick Stolee via GitGitGadget
  2021-04-12 21:08     ` [PATCH v3 15/26] stash: " Derrick Stolee via GitGitGadget
                       ` (12 subsequent siblings)
  26 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-12 21:08 UTC (permalink / raw)
  To: git
  Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Before iterating over all cache entries, ensure that a sparse index is
expanded to a full index to avoid unexpected behavior.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/rm.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/builtin/rm.c b/builtin/rm.c
index 4858631e0f02..5559a0b453a3 100644
--- a/builtin/rm.c
+++ b/builtin/rm.c
@@ -293,6 +293,8 @@ int cmd_rm(int argc, const char **argv, const char *prefix)
 
 	seen = xcalloc(pathspec.nr, 1);
 
+	/* TODO: audit for interaction with sparse-index. */
+	ensure_full_index(&the_index);
 	for (i = 0; i < active_nr; i++) {
 		const struct cache_entry *ce = active_cache[i];
 		if (!ce_path_match(&the_index, ce, &pathspec, seen))
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 111+ messages in thread

* [PATCH v3 15/26] stash: ensure full index
  2021-04-12 21:07   ` [PATCH v3 00/26] " Derrick Stolee via GitGitGadget
                       ` (13 preceding siblings ...)
  2021-04-12 21:08     ` [PATCH v3 14/26] rm: " Derrick Stolee via GitGitGadget
@ 2021-04-12 21:08     ` Derrick Stolee via GitGitGadget
  2021-04-12 21:08     ` [PATCH v3 16/26] update-index: " Derrick Stolee via GitGitGadget
                       ` (11 subsequent siblings)
  26 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-12 21:08 UTC (permalink / raw)
  To: git
  Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Before iterating over all cache entries, ensure that a sparse index is
expanded to a full index to avoid unexpected behavior.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/stash.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/builtin/stash.c b/builtin/stash.c
index ba774cce674f..6fb7178ef2fa 100644
--- a/builtin/stash.c
+++ b/builtin/stash.c
@@ -1350,6 +1350,8 @@ static int do_push_stash(const struct pathspec *ps, const char *stash_msg, int q
 		int i;
 		char *ps_matched = xcalloc(ps->nr, 1);
 
+		/* TODO: audit for interaction with sparse-index. */
+		ensure_full_index(&the_index);
 		for (i = 0; i < active_nr; i++)
 			ce_path_match(&the_index, active_cache[i], ps,
 				      ps_matched);
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 111+ messages in thread

* [PATCH v3 16/26] update-index: ensure full index
  2021-04-12 21:07   ` [PATCH v3 00/26] " Derrick Stolee via GitGitGadget
                       ` (14 preceding siblings ...)
  2021-04-12 21:08     ` [PATCH v3 15/26] stash: " Derrick Stolee via GitGitGadget
@ 2021-04-12 21:08     ` Derrick Stolee via GitGitGadget
  2021-04-12 21:08     ` [PATCH v3 17/26] dir: " Derrick Stolee via GitGitGadget
                       ` (10 subsequent siblings)
  26 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-12 21:08 UTC (permalink / raw)
  To: git
  Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Before iterating over all cache entries, ensure that a sparse index is
expanded to a full index to avoid unexpected behavior.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/update-index.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/builtin/update-index.c b/builtin/update-index.c
index 79087bccea4b..f1f16f2de526 100644
--- a/builtin/update-index.c
+++ b/builtin/update-index.c
@@ -745,6 +745,8 @@ static int do_reupdate(int ac, const char **av,
 		 */
 		has_head = 0;
  redo:
+	/* TODO: audit for interaction with sparse-index. */
+	ensure_full_index(&the_index);
 	for (pos = 0; pos < active_nr; pos++) {
 		const struct cache_entry *ce = active_cache[pos];
 		struct cache_entry *old = NULL;
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 111+ messages in thread

* [PATCH v3 17/26] dir: ensure full index
  2021-04-12 21:07   ` [PATCH v3 00/26] " Derrick Stolee via GitGitGadget
                       ` (15 preceding siblings ...)
  2021-04-12 21:08     ` [PATCH v3 16/26] update-index: " Derrick Stolee via GitGitGadget
@ 2021-04-12 21:08     ` Derrick Stolee via GitGitGadget
  2021-04-12 21:08     ` [PATCH v3 18/26] entry: " Derrick Stolee via GitGitGadget
                       ` (9 subsequent siblings)
  26 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-12 21:08 UTC (permalink / raw)
  To: git
  Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Before iterating over all cache entries, ensure that a sparse index is
expanded to a full index to avoid unexpected behavior.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 dir.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/dir.c b/dir.c
index 5b00dfb5b144..166238e79f52 100644
--- a/dir.c
+++ b/dir.c
@@ -3533,6 +3533,8 @@ static void connect_wt_gitdir_in_nested(const char *sub_worktree,
 	if (repo_read_index(&subrepo) < 0)
 		die(_("index file corrupt in repo %s"), subrepo.gitdir);
 
+	/* TODO: audit for interaction with sparse-index. */
+	ensure_full_index(subrepo.index);
 	for (i = 0; i < subrepo.index->cache_nr; i++) {
 		const struct cache_entry *ce = subrepo.index->cache[i];
 
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 111+ messages in thread

* [PATCH v3 18/26] entry: ensure full index
  2021-04-12 21:07   ` [PATCH v3 00/26] " Derrick Stolee via GitGitGadget
                       ` (16 preceding siblings ...)
  2021-04-12 21:08     ` [PATCH v3 17/26] dir: " Derrick Stolee via GitGitGadget
@ 2021-04-12 21:08     ` Derrick Stolee via GitGitGadget
  2021-04-12 21:08     ` [PATCH v3 19/26] merge-recursive: " Derrick Stolee via GitGitGadget
                       ` (8 subsequent siblings)
  26 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-12 21:08 UTC (permalink / raw)
  To: git
  Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Before iterating over all cache entries, ensure that a sparse index is
expanded to a full index to avoid unexpected behavior.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 entry.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/entry.c b/entry.c
index 7b9f43716f76..891e4ba2b45a 100644
--- a/entry.c
+++ b/entry.c
@@ -412,6 +412,8 @@ static void mark_colliding_entries(const struct checkout *state,
 
 	ce->ce_flags |= CE_MATCHED;
 
+	/* TODO: audit for interaction with sparse-index. */
+	ensure_full_index(state->istate);
 	for (i = 0; i < state->istate->cache_nr; i++) {
 		struct cache_entry *dup = state->istate->cache[i];
 
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 111+ messages in thread

* [PATCH v3 19/26] merge-recursive: ensure full index
  2021-04-12 21:07   ` [PATCH v3 00/26] " Derrick Stolee via GitGitGadget
                       ` (17 preceding siblings ...)
  2021-04-12 21:08     ` [PATCH v3 18/26] entry: " Derrick Stolee via GitGitGadget
@ 2021-04-12 21:08     ` Derrick Stolee via GitGitGadget
  2021-04-12 21:08     ` [PATCH v3 20/26] pathspec: " Derrick Stolee via GitGitGadget
                       ` (7 subsequent siblings)
  26 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-12 21:08 UTC (permalink / raw)
  To: git
  Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Before iterating over all cache entries, ensure that a sparse index is
expanded to a full index to avoid unexpected behavior.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 merge-recursive.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/merge-recursive.c b/merge-recursive.c
index b8de7a704eae..91d8597728c1 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -522,6 +522,8 @@ static struct string_list *get_unmerged(struct index_state *istate)
 
 	unmerged->strdup_strings = 1;
 
+	/* TODO: audit for interaction with sparse-index. */
+	ensure_full_index(istate);
 	for (i = 0; i < istate->cache_nr; i++) {
 		struct string_list_item *item;
 		struct stage_data *e;
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 111+ messages in thread

* [PATCH v3 20/26] pathspec: ensure full index
  2021-04-12 21:07   ` [PATCH v3 00/26] " Derrick Stolee via GitGitGadget
                       ` (18 preceding siblings ...)
  2021-04-12 21:08     ` [PATCH v3 19/26] merge-recursive: " Derrick Stolee via GitGitGadget
@ 2021-04-12 21:08     ` Derrick Stolee via GitGitGadget
  2021-04-12 21:08     ` [PATCH v3 21/26] read-cache: " Derrick Stolee via GitGitGadget
                       ` (6 subsequent siblings)
  26 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-12 21:08 UTC (permalink / raw)
  To: git
  Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Before iterating over all cache entries, ensure that a sparse index is
expanded to a full index to avoid unexpected behavior.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 pathspec.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/pathspec.c b/pathspec.c
index b6e333965cb4..d67688bab74b 100644
--- a/pathspec.c
+++ b/pathspec.c
@@ -36,6 +36,8 @@ void add_pathspec_matches_against_index(const struct pathspec *pathspec,
 			num_unmatched++;
 	if (!num_unmatched)
 		return;
+	/* TODO: audit for interaction with sparse-index. */
+	ensure_full_index(istate);
 	for (i = 0; i < istate->cache_nr; i++) {
 		const struct cache_entry *ce = istate->cache[i];
 		ce_path_match(istate, ce, pathspec, seen);
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 111+ messages in thread

* [PATCH v3 21/26] read-cache: ensure full index
  2021-04-12 21:07   ` [PATCH v3 00/26] " Derrick Stolee via GitGitGadget
                       ` (19 preceding siblings ...)
  2021-04-12 21:08     ` [PATCH v3 20/26] pathspec: " Derrick Stolee via GitGitGadget
@ 2021-04-12 21:08     ` Derrick Stolee via GitGitGadget
  2021-04-12 21:08     ` [PATCH v3 22/26] resolve-undo: " Derrick Stolee via GitGitGadget
                       ` (5 subsequent siblings)
  26 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-12 21:08 UTC (permalink / raw)
  To: git
  Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Before iterating over all cache entries, ensure that a sparse index is
expanded to a full index to avoid unexpected behavior.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 read-cache.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/read-cache.c b/read-cache.c
index 3698bc7bf77d..a9dcf0ab4f78 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1577,6 +1577,8 @@ int refresh_index(struct index_state *istate, unsigned int flags,
 	 */
 	preload_index(istate, pathspec, 0);
 	trace2_region_enter("index", "refresh", NULL);
+	/* TODO: audit for interaction with sparse-index. */
+	ensure_full_index(istate);
 	for (i = 0; i < istate->cache_nr; i++) {
 		struct cache_entry *ce, *new_entry;
 		int cache_errno = 0;
@@ -2498,6 +2500,8 @@ int repo_index_has_changes(struct repository *repo,
 		diff_flush(&opt);
 		return opt.flags.has_changes != 0;
 	} else {
+		/* TODO: audit for interaction with sparse-index. */
+		ensure_full_index(istate);
 		for (i = 0; sb && i < istate->cache_nr; i++) {
 			if (i)
 				strbuf_addch(sb, ' ');
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 111+ messages in thread

* [PATCH v3 22/26] resolve-undo: ensure full index
  2021-04-12 21:07   ` [PATCH v3 00/26] " Derrick Stolee via GitGitGadget
                       ` (20 preceding siblings ...)
  2021-04-12 21:08     ` [PATCH v3 21/26] read-cache: " Derrick Stolee via GitGitGadget
@ 2021-04-12 21:08     ` Derrick Stolee via GitGitGadget
  2021-04-12 21:08     ` [PATCH v3 23/26] revision: " Derrick Stolee via GitGitGadget
                       ` (4 subsequent siblings)
  26 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-12 21:08 UTC (permalink / raw)
  To: git
  Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Before iterating over all cache entries, ensure that a sparse index is
expanded to a full index to avoid unexpected behavior.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 resolve-undo.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/resolve-undo.c b/resolve-undo.c
index 236320f179cb..ee36ca58b135 100644
--- a/resolve-undo.c
+++ b/resolve-undo.c
@@ -172,6 +172,8 @@ void unmerge_marked_index(struct index_state *istate)
 	if (!istate->resolve_undo)
 		return;
 
+	/* TODO: audit for interaction with sparse-index. */
+	ensure_full_index(istate);
 	for (i = 0; i < istate->cache_nr; i++) {
 		const struct cache_entry *ce = istate->cache[i];
 		if (ce->ce_flags & CE_MATCHED)
@@ -186,6 +188,8 @@ void unmerge_index(struct index_state *istate, const struct pathspec *pathspec)
 	if (!istate->resolve_undo)
 		return;
 
+	/* TODO: audit for interaction with sparse-index. */
+	ensure_full_index(istate);
 	for (i = 0; i < istate->cache_nr; i++) {
 		const struct cache_entry *ce = istate->cache[i];
 		if (!ce_path_match(istate, ce, pathspec, NULL))
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 111+ messages in thread

* [PATCH v3 23/26] revision: ensure full index
  2021-04-12 21:07   ` [PATCH v3 00/26] " Derrick Stolee via GitGitGadget
                       ` (21 preceding siblings ...)
  2021-04-12 21:08     ` [PATCH v3 22/26] resolve-undo: " Derrick Stolee via GitGitGadget
@ 2021-04-12 21:08     ` Derrick Stolee via GitGitGadget
  2021-04-12 21:08     ` [PATCH v3 24/26] name-hash: don't add directories to name_hash Derrick Stolee via GitGitGadget
                       ` (3 subsequent siblings)
  26 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-12 21:08 UTC (permalink / raw)
  To: git
  Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Before iterating over all index entries, ensure that a sparse index is
expanded to a full index to avoid unexpected behavior. This case could
be integrated later by ensuring that we walk the tree in the
sparse-directory entry, but the current behavior is only expecting
blobs. Save this integration for later when it can be properly tested.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 revision.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/revision.c b/revision.c
index b78733f5089b..b72e0ac1bdca 100644
--- a/revision.c
+++ b/revision.c
@@ -1680,6 +1680,8 @@ static void do_add_index_objects_to_pending(struct rev_info *revs,
 {
 	int i;
 
+	/* TODO: audit for interaction with sparse-index. */
+	ensure_full_index(istate);
 	for (i = 0; i < istate->cache_nr; i++) {
 		struct cache_entry *ce = istate->cache[i];
 		struct blob *blob;
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 111+ messages in thread

* [PATCH v3 24/26] name-hash: don't add directories to name_hash
  2021-04-12 21:07   ` [PATCH v3 00/26] " Derrick Stolee via GitGitGadget
                       ` (22 preceding siblings ...)
  2021-04-12 21:08     ` [PATCH v3 23/26] revision: " Derrick Stolee via GitGitGadget
@ 2021-04-12 21:08     ` Derrick Stolee via GitGitGadget
  2021-04-12 21:08     ` [PATCH v3 25/26] sparse-index: expand_to_path() Derrick Stolee via GitGitGadget
                       ` (2 subsequent siblings)
  26 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-12 21:08 UTC (permalink / raw)
  To: git
  Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Sparse directory entries represent a directory that is outside the
sparse-checkout definition. These are not paths to blobs, so should not
be added to the name_hash table. Instead, they should be added to the
directory hashtable when 'ignore_case' is true.

Add a condition to avoid placing sparse directories into the name_hash
hashtable. This avoids filling the table with extra entries that will
never be queried.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 name-hash.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/name-hash.c b/name-hash.c
index 4e03fac9bb12..d08deaa2c9e7 100644
--- a/name-hash.c
+++ b/name-hash.c
@@ -109,8 +109,11 @@ static void hash_index_entry(struct index_state *istate, struct cache_entry *ce)
 	if (ce->ce_flags & CE_HASHED)
 		return;
 	ce->ce_flags |= CE_HASHED;
-	hashmap_entry_init(&ce->ent, memihash(ce->name, ce_namelen(ce)));
-	hashmap_add(&istate->name_hash, &ce->ent);
+
+	if (!S_ISSPARSEDIR(ce->ce_mode)) {
+		hashmap_entry_init(&ce->ent, memihash(ce->name, ce_namelen(ce)));
+		hashmap_add(&istate->name_hash, &ce->ent);
+	}
 
 	if (ignore_case)
 		add_dir_entry(istate, ce);
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 111+ messages in thread

* [PATCH v3 25/26] sparse-index: expand_to_path()
  2021-04-12 21:07   ` [PATCH v3 00/26] " Derrick Stolee via GitGitGadget
                       ` (23 preceding siblings ...)
  2021-04-12 21:08     ` [PATCH v3 24/26] name-hash: don't add directories to name_hash Derrick Stolee via GitGitGadget
@ 2021-04-12 21:08     ` Derrick Stolee via GitGitGadget
  2021-04-12 21:08     ` [PATCH v3 26/26] name-hash: use expand_to_path() Derrick Stolee via GitGitGadget
  2021-04-13 16:02     ` [PATCH v3 00/26] Sparse Index: API protections Elijah Newren
  26 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-12 21:08 UTC (permalink / raw)
  To: git
  Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Some users of the index API have a specific path they are looking for,
but choose to use index_file_exists() to rely on the name-hash hashtable
instead of doing binary search with index_name_pos(). These users only
need to know a yes/no answer, not a position within the cache array.

When the index is sparse, the name-hash hash table does not contain the
full list of paths within sparse directories. It _does_ contain the
directory names for the sparse-directory entries.

Create a helper function, expand_to_path(), for intended use with the
name-hash hashtable functions. The integration with name-hash.c will
follow in a later change.

The solution here is to use ensure_full_index() when we determine that
the requested path is within a sparse directory entry. This will
populate the name-hash hashtable as the index is recomputed from
scratch.

There may be cases where the caller is trying to find an untracked path
that is not in the index but also is not within a sparse directory
entry. We want to minimize the overhead for these requests. If we used
index_name_pos() to find the insertion order of the path, then we could
determine from that position if a sparse-directory exists. (In fact,
just calling index_name_pos() in that case would lead to expanding the
index to a full index.) However, this takes O(log N) time where N is the
number of cache entries.

To keep the performance of this call based mostly on the input string,
use index_file_exists() to look for the ancestors of the path. Using the
heuristic that a sparse directory is likely to have a small number of
parent directories, we start from the bottom and build up. Use a string
buffer to allow mutating the path name to terminate after each slash for
each hashset test.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 sparse-index.c | 73 ++++++++++++++++++++++++++++++++++++++++++++++++++
 sparse-index.h | 13 +++++++++
 2 files changed, 86 insertions(+)

diff --git a/sparse-index.c b/sparse-index.c
index 95ea17174da3..6f21397e2ee0 100644
--- a/sparse-index.c
+++ b/sparse-index.c
@@ -283,3 +283,76 @@ void ensure_full_index(struct index_state *istate)
 
 	trace2_region_leave("index", "ensure_full_index", istate->repo);
 }
+
+/*
+ * This static global helps avoid infinite recursion between
+ * expand_to_path() and index_file_exists().
+ */
+static int in_expand_to_path = 0;
+
+void expand_to_path(struct index_state *istate,
+		    const char *path, size_t pathlen, int icase)
+{
+	struct strbuf path_mutable = STRBUF_INIT;
+	size_t substr_len;
+
+	/* prevent extra recursion */
+	if (in_expand_to_path)
+		return;
+
+	if (!istate || !istate->sparse_index)
+		return;
+
+	if (!istate->repo)
+		istate->repo = the_repository;
+
+	in_expand_to_path = 1;
+
+	/*
+	 * We only need to actually expand a region if the
+	 * following are both true:
+	 *
+	 * 1. 'path' is not already in the index.
+	 * 2. Some parent directory of 'path' is a sparse directory.
+	 */
+
+	if (index_file_exists(istate, path, pathlen, icase))
+		goto cleanup;
+
+	strbuf_add(&path_mutable, path, pathlen);
+	strbuf_addch(&path_mutable, '/');
+
+	/* Check the name hash for all parent directories */
+	substr_len = 0;
+	while (substr_len < pathlen) {
+		char temp;
+		char *replace = strchr(path_mutable.buf + substr_len, '/');
+
+		if (!replace)
+			break;
+
+		/* replace the character _after_ the slash */
+		replace++;
+		temp = *replace;
+		*replace = '\0';
+		if (index_file_exists(istate, path_mutable.buf,
+				      path_mutable.len, icase)) {
+			/*
+			 * We found a parent directory in the name-hash
+			 * hashtable, because only sparse directory entries
+			 * have a trailing '/' character.  Since "path" wasn't
+			 * in the index, perhaps it exists within this
+			 * sparse-directory.  Expand accordingly.
+			 */
+			ensure_full_index(istate);
+			break;
+		}
+
+		*replace = temp;
+		substr_len = replace - path_mutable.buf;
+	}
+
+cleanup:
+	strbuf_release(&path_mutable);
+	in_expand_to_path = 0;
+}
diff --git a/sparse-index.h b/sparse-index.h
index 0268f38753c0..1115a0d7dd98 100644
--- a/sparse-index.h
+++ b/sparse-index.h
@@ -4,6 +4,19 @@
 struct index_state;
 int convert_to_sparse(struct index_state *istate);
 
+/*
+ * Some places in the codebase expect to search for a specific path.
+ * This path might be outside of the sparse-checkout definition, in
+ * which case a sparse-index may not contain a path for that index.
+ *
+ * Given an index and a path, check to see if a leading directory for
+ * 'path' exists in the index as a sparse directory. In that case,
+ * expand that sparse directory to a full range of cache entries and
+ * populate the index accordingly.
+ */
+void expand_to_path(struct index_state *istate,
+		    const char *path, size_t pathlen, int icase);
+
 struct repository;
 int set_sparse_index_config(struct repository *repo, int enable);
 
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 111+ messages in thread

* [PATCH v3 26/26] name-hash: use expand_to_path()
  2021-04-12 21:07   ` [PATCH v3 00/26] " Derrick Stolee via GitGitGadget
                       ` (24 preceding siblings ...)
  2021-04-12 21:08     ` [PATCH v3 25/26] sparse-index: expand_to_path() Derrick Stolee via GitGitGadget
@ 2021-04-12 21:08     ` Derrick Stolee via GitGitGadget
  2021-04-13 16:02     ` [PATCH v3 00/26] Sparse Index: API protections Elijah Newren
  26 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-12 21:08 UTC (permalink / raw)
  To: git
  Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

A sparse-index loads the name-hash data for its entries, including the
sparse-directory entries. If a caller asks for a path that is contained
within a sparse-directory entry, we need to expand to a full index and
recalculate the name hash table before returning the result. Insert
calls to expand_to_path() to protect against this case.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 name-hash.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/name-hash.c b/name-hash.c
index d08deaa2c9e7..383cf589969c 100644
--- a/name-hash.c
+++ b/name-hash.c
@@ -8,6 +8,7 @@
 #include "cache.h"
 #include "thread-utils.h"
 #include "trace2.h"
+#include "sparse-index.h"
 
 struct dir_entry {
 	struct hashmap_entry ent;
@@ -683,6 +684,7 @@ int index_dir_exists(struct index_state *istate, const char *name, int namelen)
 	struct dir_entry *dir;
 
 	lazy_init_name_hash(istate);
+	expand_to_path(istate, name, namelen, 0);
 	dir = find_dir_entry(istate, name, namelen);
 	return dir && dir->nr;
 }
@@ -693,6 +695,7 @@ void adjust_dirname_case(struct index_state *istate, char *name)
 	const char *ptr = startPtr;
 
 	lazy_init_name_hash(istate);
+	expand_to_path(istate, name, strlen(name), 0);
 	while (*ptr) {
 		while (*ptr && *ptr != '/')
 			ptr++;
@@ -716,6 +719,7 @@ struct cache_entry *index_file_exists(struct index_state *istate, const char *na
 	unsigned int hash = memihash(name, namelen);
 
 	lazy_init_name_hash(istate);
+	expand_to_path(istate, name, namelen, icase);
 
 	ce = hashmap_get_entry_from_hash(&istate->name_hash, hash, NULL,
 					 struct cache_entry, ent);
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH v3 00/26] Sparse Index: API protections
  2021-04-12 21:07   ` [PATCH v3 00/26] " Derrick Stolee via GitGitGadget
                       ` (25 preceding siblings ...)
  2021-04-12 21:08     ` [PATCH v3 26/26] name-hash: use expand_to_path() Derrick Stolee via GitGitGadget
@ 2021-04-13 16:02     ` Elijah Newren
  2021-04-14 20:44       ` Junio C Hamano
  26 siblings, 1 reply; 111+ messages in thread
From: Elijah Newren @ 2021-04-13 16:02 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, Junio C Hamano, Derrick Stolee,
	Matheus Tavares Bernardino, Derrick Stolee

On Mon, Apr 12, 2021 at 2:08 PM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> Here is the second patch series submission coming out of the sparse-index
> RFC [1].
>
> [1]
> https://lore.kernel.org/git/pull.847.git.1611596533.gitgitgadget@gmail.com/
>
> This is based on ds/sparse-index.
>
> The point of this series is to insert protections for the consumers of the
> in-memory index to avoid unintended behavior change when using a sparse
> index versus a full one.
>
> We mark certain regions of code as needing a full index, so we call
> ensure_full_index() to expand a sparse index to a full one, if necessary.
> These protections are inserted file-by-file in every loop over all cache
> entries. Well, "most" loops, because some are going to be handled in the
> very next series so I leave them out.
>
> Many callers use index_name_pos() to find a path by name. In these cases, we
> can check if that position resolves to a sparse directory instance. In those
> cases, we just expand to a full index and run the search again.
>
> The last few patches deal with the name-hash hashtable for doing O(1)
> lookups.
>
> These protections don't do much right now, since the previous series created
> the_repository->settings.command_requires_full_index to guard all index
> reads and writes to ensure the in-memory copy is full for commands that have
> not been tested with the sparse index yet.
>
> However, after this series is complete, we now have a straight-forward plan
> for making commands "sparse aware" one-by-one:
>
>  1. Disable settings.command_requires_full_index to allow an in-memory
>     sparse-index.
>  2. Run versions of that command under a debugger, breaking on
>     ensure_full_index().
>  3. Examine the call stack to determine the context of that expansion, then
>     implement the proper behavior in those locations.
>  4. Add tests to ensure we are checking this logic in the presence of sparse
>     directory entries.
>
> I will admit that mostly it is the writing of the test cases that takes the
> most time in the conversions I've done so far.
>
>
> Updates in v3
> =============
>
>  * I updated based on Elijah's feedback.
>  * One new patch splits out a change that Elijah (rightfully) pointed out
>    did not belong with the patch it was originally in.
>
> I gave it time to see if any other comments came in, but it looks like
> review stabilized. I probably waited a bit longer than I should have.

This round looks good to me.

Reviewed-by: Elijah Newren <newren@gmail.com>

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH v3 00/26] Sparse Index: API protections
  2021-04-13 16:02     ` [PATCH v3 00/26] Sparse Index: API protections Elijah Newren
@ 2021-04-14 20:44       ` Junio C Hamano
  2021-04-15  2:42         ` Derrick Stolee
  0 siblings, 1 reply; 111+ messages in thread
From: Junio C Hamano @ 2021-04-14 20:44 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Derrick Stolee via GitGitGadget, Git Mailing List,
	Derrick Stolee, Matheus Tavares Bernardino, Derrick Stolee

Elijah Newren <newren@gmail.com> writes:

> This round looks good to me.
>
> Reviewed-by: Elijah Newren <newren@gmail.com>

Thanks; this kind of change inevitably would involve semantic
conflicts with topics in flight, but we've seen Derrick works well
together with others in such scenarios already, so let's run with
this version and see what happens.

Will merge to 'next' until there are new issues found, by the end of
the week at the latest.

Thanks.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH v3 00/26] Sparse Index: API protections
  2021-04-14 20:44       ` Junio C Hamano
@ 2021-04-15  2:42         ` Derrick Stolee
  0 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee @ 2021-04-15  2:42 UTC (permalink / raw)
  To: Junio C Hamano, Elijah Newren
  Cc: Derrick Stolee via GitGitGadget, Git Mailing List,
	Matheus Tavares Bernardino, Derrick Stolee

On 4/14/2021 4:44 PM, Junio C Hamano wrote:
> Elijah Newren <newren@gmail.com> writes:
> 
>> This round looks good to me.
>>
>> Reviewed-by: Elijah Newren <newren@gmail.com>
> 
> Thanks; this kind of change inevitably would involve semantic
> conflicts with topics in flight, but we've seen Derrick works well
> together with others in such scenarios already, so let's run with
> this version and see what happens.

Thanks!

Semantic conflicts like the one in mt/add-rm-sparse-checkout won't
show up until I start relaxing command_requires_full_index, so I'll
need to be careful at those points. But I'll keep a lookout for
other issues contributors have when integrating across this change.

> Will merge to 'next' until there are new issues found, by the end of
> the week at the latest.

That timeline works for me.

-Stolee

^ permalink raw reply	[flat|nested] 111+ messages in thread

end of thread, other threads:[~2021-04-15  2:42 UTC | newest]

Thread overview: 111+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-16 21:16 [PATCH 00/27] Sparse Index: API protections Derrick Stolee via GitGitGadget
2021-03-16 21:16 ` [PATCH 01/27] *: remove 'const' qualifier for struct index_state Derrick Stolee via GitGitGadget
2021-03-19 21:01   ` Junio C Hamano
2021-03-20  1:45     ` Derrick Stolee
2021-03-20  1:52     ` Junio C Hamano
2021-03-30 16:53       ` Derrick Stolee
2021-03-16 21:16 ` [PATCH 02/27] read-cache: expand on query into sparse-directory entry Derrick Stolee via GitGitGadget
2021-03-16 21:16 ` [PATCH 03/27] sparse-index: API protection strategy Derrick Stolee via GitGitGadget
2021-03-16 21:16 ` [PATCH 04/27] cache: move ensure_full_index() to cache.h Derrick Stolee via GitGitGadget
2021-03-16 21:16 ` [PATCH 05/27] add: ensure full index Derrick Stolee via GitGitGadget
2021-03-17 17:35   ` Elijah Newren
2021-03-17 20:35     ` Matheus Tavares Bernardino
2021-03-17 20:55       ` Derrick Stolee
2021-03-16 21:16 ` [PATCH 06/27] checkout-index: " Derrick Stolee via GitGitGadget
2021-03-17 17:50   ` Elijah Newren
2021-03-17 20:05     ` Derrick Stolee
2021-03-17 21:10       ` Elijah Newren
2021-03-17 21:33         ` Derrick Stolee
2021-03-17 22:36           ` Elijah Newren
2021-03-18  1:17             ` Derrick Stolee
2021-03-16 21:16 ` [PATCH 07/27] checkout: " Derrick Stolee via GitGitGadget
2021-03-16 21:16 ` [PATCH 08/27] commit: " Derrick Stolee via GitGitGadget
2021-03-16 21:16 ` [PATCH 09/27] difftool: " Derrick Stolee via GitGitGadget
2021-03-16 21:16 ` [PATCH 10/27] fsck: " Derrick Stolee via GitGitGadget
2021-03-16 21:16 ` [PATCH 11/27] grep: " Derrick Stolee via GitGitGadget
2021-03-16 21:17 ` [PATCH 12/27] ls-files: " Derrick Stolee via GitGitGadget
2021-03-16 21:17 ` [PATCH 13/27] merge-index: " Derrick Stolee via GitGitGadget
2021-03-16 21:17 ` [PATCH 14/27] rm: " Derrick Stolee via GitGitGadget
2021-03-16 21:17 ` [PATCH 15/27] sparse-checkout: " Derrick Stolee via GitGitGadget
2021-03-18  5:22   ` Elijah Newren
2021-03-23 13:13     ` Derrick Stolee
2021-03-16 21:17 ` [PATCH 16/27] update-index: " Derrick Stolee via GitGitGadget
2021-03-16 21:17 ` [PATCH 17/27] diff-lib: " Derrick Stolee via GitGitGadget
2021-03-18  5:24   ` Elijah Newren
2021-03-23 13:15     ` Derrick Stolee
2021-03-16 21:17 ` [PATCH 18/27] dir: " Derrick Stolee via GitGitGadget
2021-03-16 21:17 ` [PATCH 19/27] entry: " Derrick Stolee via GitGitGadget
2021-03-16 21:17 ` [PATCH 20/27] merge-ort: " Derrick Stolee via GitGitGadget
2021-03-18  5:31   ` Elijah Newren
2021-03-23 13:26     ` Derrick Stolee
2021-03-16 21:17 ` [PATCH 21/27] merge-recursive: " Derrick Stolee via GitGitGadget
2021-03-16 21:17 ` [PATCH 22/27] pathspec: " Derrick Stolee via GitGitGadget
2021-03-16 21:17 ` [PATCH 23/27] read-cache: " Derrick Stolee via GitGitGadget
2021-03-16 21:17 ` [PATCH 24/27] resolve-undo: " Derrick Stolee via GitGitGadget
2021-03-16 21:17 ` [PATCH 25/27] revision: " Derrick Stolee via GitGitGadget
2021-03-16 21:17 ` [PATCH 26/27] sparse-index: expand_to_path() Derrick Stolee via GitGitGadget
2021-03-16 21:17 ` [PATCH 27/27] name-hash: use expand_to_path() Derrick Stolee via GitGitGadget
2021-03-17 18:03 ` [PATCH 00/27] Sparse Index: API protections Elijah Newren
2021-03-18  6:32   ` Elijah Newren
2021-04-01  1:49 ` [PATCH v2 00/25] " Derrick Stolee via GitGitGadget
2021-04-01  1:49   ` [PATCH v2 01/25] sparse-index: API protection strategy Derrick Stolee via GitGitGadget
2021-04-01  1:49   ` [PATCH v2 02/25] *: remove 'const' qualifier for struct index_state Derrick Stolee via GitGitGadget
2021-04-01  1:49   ` [PATCH v2 03/25] read-cache: expand on query into sparse-directory entry Derrick Stolee via GitGitGadget
2021-04-01  1:49   ` [PATCH v2 04/25] cache: move ensure_full_index() to cache.h Derrick Stolee via GitGitGadget
2021-04-01  1:49   ` [PATCH v2 05/25] add: ensure full index Derrick Stolee via GitGitGadget
2021-04-01  1:49   ` [PATCH v2 06/25] checkout-index: " Derrick Stolee via GitGitGadget
2021-04-01  1:49   ` [PATCH v2 07/25] checkout: " Derrick Stolee via GitGitGadget
2021-04-01  1:49   ` [PATCH v2 08/25] commit: " Derrick Stolee via GitGitGadget
2021-04-01  1:49   ` [PATCH v2 09/25] difftool: " Derrick Stolee via GitGitGadget
2021-04-01  1:49   ` [PATCH v2 10/25] fsck: " Derrick Stolee via GitGitGadget
2021-04-01  1:49   ` [PATCH v2 11/25] grep: " Derrick Stolee via GitGitGadget
2021-04-01  1:49   ` [PATCH v2 12/25] ls-files: " Derrick Stolee via GitGitGadget
2021-04-01  1:49   ` [PATCH v2 13/25] merge-index: " Derrick Stolee via GitGitGadget
2021-04-01  1:49   ` [PATCH v2 14/25] rm: " Derrick Stolee via GitGitGadget
2021-04-01  1:49   ` [PATCH v2 15/25] stash: " Derrick Stolee via GitGitGadget
2021-04-01  1:49   ` [PATCH v2 16/25] update-index: " Derrick Stolee via GitGitGadget
2021-04-01  1:49   ` [PATCH v2 17/25] dir: " Derrick Stolee via GitGitGadget
2021-04-01  1:49   ` [PATCH v2 18/25] entry: " Derrick Stolee via GitGitGadget
2021-04-01  1:49   ` [PATCH v2 19/25] merge-recursive: " Derrick Stolee via GitGitGadget
2021-04-01  1:49   ` [PATCH v2 20/25] pathspec: " Derrick Stolee via GitGitGadget
2021-04-01  1:49   ` [PATCH v2 21/25] read-cache: " Derrick Stolee via GitGitGadget
2021-04-01  1:49   ` [PATCH v2 22/25] resolve-undo: " Derrick Stolee via GitGitGadget
2021-04-01  1:50   ` [PATCH v2 23/25] revision: " Derrick Stolee via GitGitGadget
2021-04-01  1:50   ` [PATCH v2 24/25] sparse-index: expand_to_path() Derrick Stolee via GitGitGadget
2021-04-05 19:32     ` Elijah Newren
2021-04-06 11:46       ` Derrick Stolee
2021-04-01  1:50   ` [PATCH v2 25/25] name-hash: use expand_to_path() Derrick Stolee via GitGitGadget
2021-04-05 19:53     ` Elijah Newren
2021-04-01  7:07   ` [PATCH v2 00/25] Sparse Index: API protections Junio C Hamano
2021-04-01 13:32     ` Derrick Stolee
2021-04-05 19:55   ` Elijah Newren
2021-04-12 21:07   ` [PATCH v3 00/26] " Derrick Stolee via GitGitGadget
2021-04-12 21:07     ` [PATCH v3 01/26] sparse-index: API protection strategy Derrick Stolee via GitGitGadget
2021-04-12 21:07     ` [PATCH v3 02/26] *: remove 'const' qualifier for struct index_state Derrick Stolee via GitGitGadget
2021-04-12 21:07     ` [PATCH v3 03/26] read-cache: expand on query into sparse-directory entry Derrick Stolee via GitGitGadget
2021-04-12 21:07     ` [PATCH v3 04/26] cache: move ensure_full_index() to cache.h Derrick Stolee via GitGitGadget
2021-04-12 21:07     ` [PATCH v3 05/26] add: ensure full index Derrick Stolee via GitGitGadget
2021-04-12 21:07     ` [PATCH v3 06/26] checkout-index: " Derrick Stolee via GitGitGadget
2021-04-12 21:07     ` [PATCH v3 07/26] checkout: " Derrick Stolee via GitGitGadget
2021-04-12 21:07     ` [PATCH v3 08/26] commit: " Derrick Stolee via GitGitGadget
2021-04-12 21:08     ` [PATCH v3 09/26] difftool: " Derrick Stolee via GitGitGadget
2021-04-12 21:08     ` [PATCH v3 10/26] fsck: " Derrick Stolee via GitGitGadget
2021-04-12 21:08     ` [PATCH v3 11/26] grep: " Derrick Stolee via GitGitGadget
2021-04-12 21:08     ` [PATCH v3 12/26] ls-files: " Derrick Stolee via GitGitGadget
2021-04-12 21:08     ` [PATCH v3 13/26] merge-index: " Derrick Stolee via GitGitGadget
2021-04-12 21:08     ` [PATCH v3 14/26] rm: " Derrick Stolee via GitGitGadget
2021-04-12 21:08     ` [PATCH v3 15/26] stash: " Derrick Stolee via GitGitGadget
2021-04-12 21:08     ` [PATCH v3 16/26] update-index: " Derrick Stolee via GitGitGadget
2021-04-12 21:08     ` [PATCH v3 17/26] dir: " Derrick Stolee via GitGitGadget
2021-04-12 21:08     ` [PATCH v3 18/26] entry: " Derrick Stolee via GitGitGadget
2021-04-12 21:08     ` [PATCH v3 19/26] merge-recursive: " Derrick Stolee via GitGitGadget
2021-04-12 21:08     ` [PATCH v3 20/26] pathspec: " Derrick Stolee via GitGitGadget
2021-04-12 21:08     ` [PATCH v3 21/26] read-cache: " Derrick Stolee via GitGitGadget
2021-04-12 21:08     ` [PATCH v3 22/26] resolve-undo: " Derrick Stolee via GitGitGadget
2021-04-12 21:08     ` [PATCH v3 23/26] revision: " Derrick Stolee via GitGitGadget
2021-04-12 21:08     ` [PATCH v3 24/26] name-hash: don't add directories to name_hash Derrick Stolee via GitGitGadget
2021-04-12 21:08     ` [PATCH v3 25/26] sparse-index: expand_to_path() Derrick Stolee via GitGitGadget
2021-04-12 21:08     ` [PATCH v3 26/26] name-hash: use expand_to_path() Derrick Stolee via GitGitGadget
2021-04-13 16:02     ` [PATCH v3 00/26] Sparse Index: API protections Elijah Newren
2021-04-14 20:44       ` Junio C Hamano
2021-04-15  2:42         ` Derrick Stolee

Code repositories for project(s) associated with this inbox:

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).