* [PATCH v2 01/25] sparse-index: API protection strategy
2021-04-01 1:49 ` [PATCH v2 00/25] " Derrick Stolee via GitGitGadget
@ 2021-04-01 1:49 ` Derrick Stolee via GitGitGadget
2021-04-01 1:49 ` [PATCH v2 02/25] *: remove 'const' qualifier for struct index_state Derrick Stolee via GitGitGadget
` (26 subsequent siblings)
27 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-01 1:49 UTC (permalink / raw)
To: git
Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
Derrick Stolee, Derrick Stolee
From: Derrick Stolee <dstolee@microsoft.com>
Edit and expand the sparse-index design document with the plan for
guarding index operations with ensure_full_index().
Notably, the plan has changed to not have an expand_to_path() method in
favor of checking for a sparse-directory hit inside of the
index_path_pos() API.
The changes that follow this one will incrementally add
ensure_full_index() guards to iterations over all cache entries. Some
iterations over the cache entries are not protected due to a few
categories listed in the document. Since these are not being modified,
here is a short list of the files and methods that will not receive
these guards:
Looking for non-zero stage:
* builtin/add.c:chmod_pathspec()
* builtin/merge.c:count_unmerged_entries()
* merge-ort.c:record_conflicted_index_entries()
* read-cache.c:unmerged_index()
* rerere.c:check_one_conflict(), find_conflict(), rerere_remaining()
* revision.c:prepare_show_merge()
* sequencer.c:append_conflicts_hint()
* wt-status.c:wt_status_collect_changes_initial()
Looking for submodules:
* builtin/submodule--helper.c:module_list_compute()
* submodule.c: several methods
* worktree.c:validate_no_submodules()
Part of the index API:
* name-hash.c: lazy init methods
* preload-index.c:preload_thread(), preload_index()
* read-cache.c: file format methods
Checking for correct order of cache entries:
* read-cache.c:check_ce_order()
Ignores SKIP_WORKTREE entries or already aware:
* unpack-trees.c:mark_new_skip_worktree()
* wt-status.c:wt_status_check_sparse_checkout()
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
Documentation/technical/sparse-index.txt | 37 ++++++++++++++++++++++--
1 file changed, 35 insertions(+), 2 deletions(-)
diff --git a/Documentation/technical/sparse-index.txt b/Documentation/technical/sparse-index.txt
index 8d3d80804604..3b24c1a219f8 100644
--- a/Documentation/technical/sparse-index.txt
+++ b/Documentation/technical/sparse-index.txt
@@ -85,8 +85,41 @@ index, as well.
Next, consumers of the index will be guarded against operating on a
sparse-index by inserting calls to `ensure_full_index()` or
-`expand_index_to_path()`. After these guards are in place, we can begin
-leaving sparse-directory entries in the in-memory index structure.
+`expand_index_to_path()`. If a specific path is requested, then those will
+be protected from within the `index_file_exists()` and `index_name_pos()`
+API calls: they will call `ensure_full_index()` if necessary. The
+intention here is to preserve existing behavior when interacting with a
+sparse-checkout. We don't want a change to happen by accident, without
+tests. Many of these locations may not need any change before removing the
+guards, but we should not do so without tests to ensure the expected
+behavior happens.
+
+It may be desirable to _change_ the behavior of some commands in the
+presence of a sparse index or more generally in any sparse-checkout
+scenario. In such cases, these should be carefully communicated and
+tested. No such behavior changes are intended during this phase.
+
+During a scan of the codebase, not every iteration of the cache entries
+needs an `ensure_full_index()` check. The basic reasons include:
+
+1. The loop is scanning for entries with non-zero stage. These entries
+ are not collapsed into a sparse-directory entry.
+
+2. The loop is scanning for submodules. These entries are not collapsed
+ into a sparse-directory entry.
+
+3. The loop is part of the index API, especially around reading or
+ writing the format.
+
+4. The loop is checking for correct order of cache entries and that is
+ correct if and only if the sparse-directory entries are in the correct
+ location.
+
+5. The loop ignores entries with the `SKIP_WORKTREE` bit set, or is
+ otherwise already aware of sparse directory entries.
+
+6. The sparse-index is disabled at this point when using the split-index
+ feature, so no effort is made to protect the split-index API.
Even after inserting these guards, we will keep expanding sparse-indexes
for most Git commands using the `command_requires_full_index` repository
--
gitgitgadget
^ permalink raw reply related [flat|nested] 111+ messages in thread
* [PATCH v2 02/25] *: remove 'const' qualifier for struct index_state
2021-04-01 1:49 ` [PATCH v2 00/25] " Derrick Stolee via GitGitGadget
2021-04-01 1:49 ` [PATCH v2 01/25] sparse-index: API protection strategy Derrick Stolee via GitGitGadget
@ 2021-04-01 1:49 ` Derrick Stolee via GitGitGadget
2021-04-01 1:49 ` [PATCH v2 03/25] read-cache: expand on query into sparse-directory entry Derrick Stolee via GitGitGadget
` (25 subsequent siblings)
27 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-01 1:49 UTC (permalink / raw)
To: git
Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
Derrick Stolee, Derrick Stolee
From: Derrick Stolee <dstolee@microsoft.com>
Several methods specify that they take a 'struct index_state' pointer
with the 'const' qualifier because they intend to only query the data,
not change it. However, we will be introducing a step very low in the
method stack that might modify a sparse-index to become a full index in
the case that our queries venture inside a sparse-directory entry.
This change only removes the 'const' qualifiers that are necessary for
the following change which will actually modify the implementation of
index_name_stage_pos().
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
attr.c | 14 +++++++-------
attr.h | 4 ++--
builtin/ls-files.c | 10 +++++-----
cache.h | 6 +++---
convert.c | 26 +++++++++++++-------------
convert.h | 20 ++++++++++----------
dir.c | 12 ++++++------
dir.h | 8 ++++----
merge-recursive.c | 2 +-
pathspec.c | 6 +++---
pathspec.h | 6 +++---
read-cache.c | 10 +++++-----
submodule.c | 6 +++---
submodule.h | 6 +++---
14 files changed, 68 insertions(+), 68 deletions(-)
diff --git a/attr.c b/attr.c
index 4ef85d668b54..8de553293ee4 100644
--- a/attr.c
+++ b/attr.c
@@ -718,7 +718,7 @@ static struct attr_stack *read_attr_from_file(const char *path, int macro_ok)
return res;
}
-static struct attr_stack *read_attr_from_index(const struct index_state *istate,
+static struct attr_stack *read_attr_from_index(struct index_state *istate,
const char *path,
int macro_ok)
{
@@ -748,7 +748,7 @@ static struct attr_stack *read_attr_from_index(const struct index_state *istate,
return res;
}
-static struct attr_stack *read_attr(const struct index_state *istate,
+static struct attr_stack *read_attr(struct index_state *istate,
const char *path, int macro_ok)
{
struct attr_stack *res = NULL;
@@ -840,7 +840,7 @@ static void push_stack(struct attr_stack **attr_stack_p,
}
}
-static void bootstrap_attr_stack(const struct index_state *istate,
+static void bootstrap_attr_stack(struct index_state *istate,
struct attr_stack **stack)
{
struct attr_stack *e;
@@ -878,7 +878,7 @@ static void bootstrap_attr_stack(const struct index_state *istate,
push_stack(stack, e, NULL, 0);
}
-static void prepare_attr_stack(const struct index_state *istate,
+static void prepare_attr_stack(struct index_state *istate,
const char *path, int dirlen,
struct attr_stack **stack)
{
@@ -1078,7 +1078,7 @@ static void determine_macros(struct all_attrs_item *all_attrs,
* If check->check_nr is non-zero, only attributes in check[] are collected.
* Otherwise all attributes are collected.
*/
-static void collect_some_attrs(const struct index_state *istate,
+static void collect_some_attrs(struct index_state *istate,
const char *path,
struct attr_check *check)
{
@@ -1107,7 +1107,7 @@ static void collect_some_attrs(const struct index_state *istate,
fill(path, pathlen, basename_offset, check->stack, check->all_attrs, rem);
}
-void git_check_attr(const struct index_state *istate,
+void git_check_attr(struct index_state *istate,
const char *path,
struct attr_check *check)
{
@@ -1124,7 +1124,7 @@ void git_check_attr(const struct index_state *istate,
}
}
-void git_all_attrs(const struct index_state *istate,
+void git_all_attrs(struct index_state *istate,
const char *path, struct attr_check *check)
{
int i;
diff --git a/attr.h b/attr.h
index 404548f028a8..3732505edae8 100644
--- a/attr.h
+++ b/attr.h
@@ -190,14 +190,14 @@ void attr_check_free(struct attr_check *check);
*/
const char *git_attr_name(const struct git_attr *);
-void git_check_attr(const struct index_state *istate,
+void git_check_attr(struct index_state *istate,
const char *path, struct attr_check *check);
/*
* Retrieve all attributes that apply to the specified path.
* check holds the attributes and their values.
*/
-void git_all_attrs(const struct index_state *istate,
+void git_all_attrs(struct index_state *istate,
const char *path, struct attr_check *check);
enum git_attr_direction {
diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index 60a2913a01e9..4f9ed1fb29b7 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -57,7 +57,7 @@ static const char *tag_modified = "";
static const char *tag_skip_worktree = "";
static const char *tag_resolve_undo = "";
-static void write_eolinfo(const struct index_state *istate,
+static void write_eolinfo(struct index_state *istate,
const struct cache_entry *ce, const char *path)
{
if (show_eol) {
@@ -122,7 +122,7 @@ static void print_debug(const struct cache_entry *ce)
}
}
-static void show_dir_entry(const struct index_state *istate,
+static void show_dir_entry(struct index_state *istate,
const char *tag, struct dir_entry *ent)
{
int len = max_prefix_len;
@@ -139,7 +139,7 @@ static void show_dir_entry(const struct index_state *istate,
write_name(ent->name);
}
-static void show_other_files(const struct index_state *istate,
+static void show_other_files(struct index_state *istate,
const struct dir_struct *dir)
{
int i;
@@ -152,7 +152,7 @@ static void show_other_files(const struct index_state *istate,
}
}
-static void show_killed_files(const struct index_state *istate,
+static void show_killed_files(struct index_state *istate,
const struct dir_struct *dir)
{
int i;
@@ -254,7 +254,7 @@ static void show_ce(struct repository *repo, struct dir_struct *dir,
}
}
-static void show_ru_info(const struct index_state *istate)
+static void show_ru_info(struct index_state *istate)
{
struct string_list_item *item;
diff --git a/cache.h b/cache.h
index 8aede373aeb3..5006278c13ca 100644
--- a/cache.h
+++ b/cache.h
@@ -800,7 +800,7 @@ struct cache_entry *index_file_exists(struct index_state *istate, const char *na
* index_name_pos(&index, "f", 1) -> -3
* index_name_pos(&index, "g", 1) -> -5
*/
-int index_name_pos(const struct index_state *, const char *name, int namelen);
+int index_name_pos(struct index_state *, const char *name, int namelen);
/*
* Some functions return the negative complement of an insert position when a
@@ -850,8 +850,8 @@ int add_file_to_index(struct index_state *, const char *path, int flags);
int chmod_index_entry(struct index_state *, struct cache_entry *ce, char flip);
int ce_same_name(const struct cache_entry *a, const struct cache_entry *b);
void set_object_name_for_intent_to_add_entry(struct cache_entry *ce);
-int index_name_is_other(const struct index_state *, const char *, int);
-void *read_blob_data_from_index(const struct index_state *, const char *, unsigned long *);
+int index_name_is_other(struct index_state *, const char *, int);
+void *read_blob_data_from_index(struct index_state *, const char *, unsigned long *);
/* do stat comparison even if CE_VALID is true */
#define CE_MATCH_IGNORE_VALID 01
diff --git a/convert.c b/convert.c
index ee360c2f07ce..20b19abffa36 100644
--- a/convert.c
+++ b/convert.c
@@ -138,7 +138,7 @@ static const char *gather_convert_stats_ascii(const char *data, unsigned long si
}
}
-const char *get_cached_convert_stats_ascii(const struct index_state *istate,
+const char *get_cached_convert_stats_ascii(struct index_state *istate,
const char *path)
{
const char *ret;
@@ -222,7 +222,7 @@ static void check_global_conv_flags_eol(const char *path,
}
}
-static int has_crlf_in_index(const struct index_state *istate, const char *path)
+static int has_crlf_in_index(struct index_state *istate, const char *path)
{
unsigned long sz;
void *data;
@@ -496,7 +496,7 @@ static int encode_to_worktree(const char *path, const char *src, size_t src_len,
return 1;
}
-static int crlf_to_git(const struct index_state *istate,
+static int crlf_to_git(struct index_state *istate,
const char *path, const char *src, size_t len,
struct strbuf *buf,
enum crlf_action crlf_action, int conv_flags)
@@ -1307,7 +1307,7 @@ struct conv_attrs {
static struct attr_check *check;
-static void convert_attrs(const struct index_state *istate,
+static void convert_attrs(struct index_state *istate,
struct conv_attrs *ca, const char *path)
{
struct attr_check_item *ccheck = NULL;
@@ -1369,7 +1369,7 @@ void reset_parsed_attributes(void)
user_convert_tail = NULL;
}
-int would_convert_to_git_filter_fd(const struct index_state *istate, const char *path)
+int would_convert_to_git_filter_fd(struct index_state *istate, const char *path)
{
struct conv_attrs ca;
@@ -1388,7 +1388,7 @@ int would_convert_to_git_filter_fd(const struct index_state *istate, const char
return apply_filter(path, NULL, 0, -1, NULL, ca.drv, CAP_CLEAN, NULL, NULL);
}
-const char *get_convert_attr_ascii(const struct index_state *istate, const char *path)
+const char *get_convert_attr_ascii(struct index_state *istate, const char *path)
{
struct conv_attrs ca;
@@ -1414,7 +1414,7 @@ const char *get_convert_attr_ascii(const struct index_state *istate, const char
return "";
}
-int convert_to_git(const struct index_state *istate,
+int convert_to_git(struct index_state *istate,
const char *path, const char *src, size_t len,
struct strbuf *dst, int conv_flags)
{
@@ -1448,7 +1448,7 @@ int convert_to_git(const struct index_state *istate,
return ret | ident_to_git(src, len, dst, ca.ident);
}
-void convert_to_git_filter_fd(const struct index_state *istate,
+void convert_to_git_filter_fd(struct index_state *istate,
const char *path, int fd, struct strbuf *dst,
int conv_flags)
{
@@ -1466,7 +1466,7 @@ void convert_to_git_filter_fd(const struct index_state *istate,
ident_to_git(dst->buf, dst->len, dst, ca.ident);
}
-static int convert_to_working_tree_internal(const struct index_state *istate,
+static int convert_to_working_tree_internal(struct index_state *istate,
const char *path, const char *src,
size_t len, struct strbuf *dst,
int normalizing,
@@ -1510,7 +1510,7 @@ static int convert_to_working_tree_internal(const struct index_state *istate,
return ret | ret_filter;
}
-int async_convert_to_working_tree(const struct index_state *istate,
+int async_convert_to_working_tree(struct index_state *istate,
const char *path, const char *src,
size_t len, struct strbuf *dst,
const struct checkout_metadata *meta,
@@ -1519,7 +1519,7 @@ int async_convert_to_working_tree(const struct index_state *istate,
return convert_to_working_tree_internal(istate, path, src, len, dst, 0, meta, dco);
}
-int convert_to_working_tree(const struct index_state *istate,
+int convert_to_working_tree(struct index_state *istate,
const char *path, const char *src,
size_t len, struct strbuf *dst,
const struct checkout_metadata *meta)
@@ -1527,7 +1527,7 @@ int convert_to_working_tree(const struct index_state *istate,
return convert_to_working_tree_internal(istate, path, src, len, dst, 0, meta, NULL);
}
-int renormalize_buffer(const struct index_state *istate, const char *path,
+int renormalize_buffer(struct index_state *istate, const char *path,
const char *src, size_t len, struct strbuf *dst)
{
int ret = convert_to_working_tree_internal(istate, path, src, len, dst, 1, NULL, NULL);
@@ -1964,7 +1964,7 @@ static struct stream_filter *ident_filter(const struct object_id *oid)
* Note that you would be crazy to set CRLF, smudge/clean or ident to a
* large binary blob you would want us not to slurp into the memory!
*/
-struct stream_filter *get_stream_filter(const struct index_state *istate,
+struct stream_filter *get_stream_filter(struct index_state *istate,
const char *path,
const struct object_id *oid)
{
diff --git a/convert.h b/convert.h
index e29d1026a686..0f7c8a1f0446 100644
--- a/convert.h
+++ b/convert.h
@@ -65,41 +65,41 @@ struct checkout_metadata {
extern enum eol core_eol;
extern char *check_roundtrip_encoding;
-const char *get_cached_convert_stats_ascii(const struct index_state *istate,
+const char *get_cached_convert_stats_ascii(struct index_state *istate,
const char *path);
const char *get_wt_convert_stats_ascii(const char *path);
-const char *get_convert_attr_ascii(const struct index_state *istate,
+const char *get_convert_attr_ascii(struct index_state *istate,
const char *path);
/* returns 1 if *dst was used */
-int convert_to_git(const struct index_state *istate,
+int convert_to_git(struct index_state *istate,
const char *path, const char *src, size_t len,
struct strbuf *dst, int conv_flags);
-int convert_to_working_tree(const struct index_state *istate,
+int convert_to_working_tree(struct index_state *istate,
const char *path, const char *src,
size_t len, struct strbuf *dst,
const struct checkout_metadata *meta);
-int async_convert_to_working_tree(const struct index_state *istate,
+int async_convert_to_working_tree(struct index_state *istate,
const char *path, const char *src,
size_t len, struct strbuf *dst,
const struct checkout_metadata *meta,
void *dco);
int async_query_available_blobs(const char *cmd,
struct string_list *available_paths);
-int renormalize_buffer(const struct index_state *istate,
+int renormalize_buffer(struct index_state *istate,
const char *path, const char *src, size_t len,
struct strbuf *dst);
-static inline int would_convert_to_git(const struct index_state *istate,
+static inline int would_convert_to_git(struct index_state *istate,
const char *path)
{
return convert_to_git(istate, path, NULL, 0, NULL, 0);
}
/* Precondition: would_convert_to_git_filter_fd(path) == true */
-void convert_to_git_filter_fd(const struct index_state *istate,
+void convert_to_git_filter_fd(struct index_state *istate,
const char *path, int fd,
struct strbuf *dst,
int conv_flags);
-int would_convert_to_git_filter_fd(const struct index_state *istate,
+int would_convert_to_git_filter_fd(struct index_state *istate,
const char *path);
/*
@@ -133,7 +133,7 @@ void reset_parsed_attributes(void);
struct stream_filter; /* opaque */
-struct stream_filter *get_stream_filter(const struct index_state *istate,
+struct stream_filter *get_stream_filter(struct index_state *istate,
const char *path,
const struct object_id *);
void free_stream_filter(struct stream_filter *);
diff --git a/dir.c b/dir.c
index fd8aa7c40faa..5b00dfb5b144 100644
--- a/dir.c
+++ b/dir.c
@@ -306,7 +306,7 @@ static int do_read_blob(const struct object_id *oid, struct oid_stat *oid_stat,
* [1] Only if DO_MATCH_DIRECTORY is passed; otherwise, this is NOT a match.
* [2] Only if DO_MATCH_LEADING_PATHSPEC is passed; otherwise, not a match.
*/
-static int match_pathspec_item(const struct index_state *istate,
+static int match_pathspec_item(struct index_state *istate,
const struct pathspec_item *item, int prefix,
const char *name, int namelen, unsigned flags)
{
@@ -429,7 +429,7 @@ static int match_pathspec_item(const struct index_state *istate,
* pathspec did not match any names, which could indicate that the
* user mistyped the nth pathspec.
*/
-static int do_match_pathspec(const struct index_state *istate,
+static int do_match_pathspec(struct index_state *istate,
const struct pathspec *ps,
const char *name, int namelen,
int prefix, char *seen,
@@ -500,7 +500,7 @@ static int do_match_pathspec(const struct index_state *istate,
return retval;
}
-static int match_pathspec_with_flags(const struct index_state *istate,
+static int match_pathspec_with_flags(struct index_state *istate,
const struct pathspec *ps,
const char *name, int namelen,
int prefix, char *seen, unsigned flags)
@@ -516,7 +516,7 @@ static int match_pathspec_with_flags(const struct index_state *istate,
return negative ? 0 : positive;
}
-int match_pathspec(const struct index_state *istate,
+int match_pathspec(struct index_state *istate,
const struct pathspec *ps,
const char *name, int namelen,
int prefix, char *seen, int is_dir)
@@ -529,7 +529,7 @@ int match_pathspec(const struct index_state *istate,
/**
* Check if a submodule is a superset of the pathspec
*/
-int submodule_path_match(const struct index_state *istate,
+int submodule_path_match(struct index_state *istate,
const struct pathspec *ps,
const char *submodule_name,
char *seen)
@@ -892,7 +892,7 @@ void add_pattern(const char *string, const char *base,
add_pattern_to_hashsets(pl, pattern);
}
-static int read_skip_worktree_file_from_index(const struct index_state *istate,
+static int read_skip_worktree_file_from_index(struct index_state *istate,
const char *path,
size_t *size_out, char **data_out,
struct oid_stat *oid_stat)
diff --git a/dir.h b/dir.h
index facfae47402a..51cb0e217247 100644
--- a/dir.h
+++ b/dir.h
@@ -354,7 +354,7 @@ int count_slashes(const char *s);
int simple_length(const char *match);
int no_wildcard(const char *string);
char *common_prefix(const struct pathspec *pathspec);
-int match_pathspec(const struct index_state *istate,
+int match_pathspec(struct index_state *istate,
const struct pathspec *pathspec,
const char *name, int namelen,
int prefix, char *seen, int is_dir);
@@ -492,12 +492,12 @@ int git_fnmatch(const struct pathspec_item *item,
const char *pattern, const char *string,
int prefix);
-int submodule_path_match(const struct index_state *istate,
+int submodule_path_match(struct index_state *istate,
const struct pathspec *ps,
const char *submodule_name,
char *seen);
-static inline int ce_path_match(const struct index_state *istate,
+static inline int ce_path_match(struct index_state *istate,
const struct cache_entry *ce,
const struct pathspec *pathspec,
char *seen)
@@ -506,7 +506,7 @@ static inline int ce_path_match(const struct index_state *istate,
S_ISDIR(ce->ce_mode) || S_ISGITLINK(ce->ce_mode));
}
-static inline int dir_path_match(const struct index_state *istate,
+static inline int dir_path_match(struct index_state *istate,
const struct dir_entry *ent,
const struct pathspec *pathspec,
int prefix, char *seen)
diff --git a/merge-recursive.c b/merge-recursive.c
index 3d9207455b3a..b8de7a704eae 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -2987,7 +2987,7 @@ static int blob_unchanged(struct merge_options *opt,
struct strbuf obuf = STRBUF_INIT;
struct strbuf abuf = STRBUF_INIT;
int ret = 0; /* assume changed for safety */
- const struct index_state *idx = opt->repo->index;
+ struct index_state *idx = opt->repo->index;
if (a->mode != o->mode)
return 0;
diff --git a/pathspec.c b/pathspec.c
index 7a229d8d22f2..b6e333965cb4 100644
--- a/pathspec.c
+++ b/pathspec.c
@@ -20,7 +20,7 @@
* to use find_pathspecs_matching_against_index() instead.
*/
void add_pathspec_matches_against_index(const struct pathspec *pathspec,
- const struct index_state *istate,
+ struct index_state *istate,
char *seen)
{
int num_unmatched = 0, i;
@@ -51,7 +51,7 @@ void add_pathspec_matches_against_index(const struct pathspec *pathspec,
* given pathspecs achieves against all items in the index.
*/
char *find_pathspecs_matching_against_index(const struct pathspec *pathspec,
- const struct index_state *istate)
+ struct index_state *istate)
{
char *seen = xcalloc(pathspec->nr, 1);
add_pathspec_matches_against_index(pathspec, istate, seen);
@@ -702,7 +702,7 @@ void clear_pathspec(struct pathspec *pathspec)
pathspec->nr = 0;
}
-int match_pathspec_attrs(const struct index_state *istate,
+int match_pathspec_attrs(struct index_state *istate,
const char *name, int namelen,
const struct pathspec_item *item)
{
diff --git a/pathspec.h b/pathspec.h
index 454ce364fac7..2ccc8080b6cc 100644
--- a/pathspec.h
+++ b/pathspec.h
@@ -150,11 +150,11 @@ static inline int ps_strcmp(const struct pathspec_item *item,
}
void add_pathspec_matches_against_index(const struct pathspec *pathspec,
- const struct index_state *istate,
+ struct index_state *istate,
char *seen);
char *find_pathspecs_matching_against_index(const struct pathspec *pathspec,
- const struct index_state *istate);
-int match_pathspec_attrs(const struct index_state *istate,
+ struct index_state *istate);
+int match_pathspec_attrs(struct index_state *istate,
const char *name, int namelen,
const struct pathspec_item *item);
diff --git a/read-cache.c b/read-cache.c
index 2410e6e0df13..3ad94578095e 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -549,7 +549,7 @@ int cache_name_stage_compare(const char *name1, int len1, int stage1, const char
return 0;
}
-static int index_name_stage_pos(const struct index_state *istate, const char *name, int namelen, int stage)
+static int index_name_stage_pos(struct index_state *istate, const char *name, int namelen, int stage)
{
int first, last;
@@ -570,7 +570,7 @@ static int index_name_stage_pos(const struct index_state *istate, const char *na
return -first-1;
}
-int index_name_pos(const struct index_state *istate, const char *name, int namelen)
+int index_name_pos(struct index_state *istate, const char *name, int namelen)
{
return index_name_stage_pos(istate, name, namelen, 0);
}
@@ -3389,8 +3389,8 @@ int repo_read_index_unmerged(struct repository *repo)
* We helpfully remove a trailing "/" from directories so that
* the output of read_directory can be used as-is.
*/
-int index_name_is_other(const struct index_state *istate, const char *name,
- int namelen)
+int index_name_is_other(struct index_state *istate, const char *name,
+ int namelen)
{
int pos;
if (namelen && name[namelen - 1] == '/')
@@ -3408,7 +3408,7 @@ int index_name_is_other(const struct index_state *istate, const char *name,
return 1;
}
-void *read_blob_data_from_index(const struct index_state *istate,
+void *read_blob_data_from_index(struct index_state *istate,
const char *path, unsigned long *size)
{
int pos, len;
diff --git a/submodule.c b/submodule.c
index 9767ba9893cc..83809a4f7bd2 100644
--- a/submodule.c
+++ b/submodule.c
@@ -33,7 +33,7 @@ static struct oid_array ref_tips_after_fetch;
* will be disabled because we can't guess what might be configured in
* .gitmodules unless the user resolves the conflict.
*/
-int is_gitmodules_unmerged(const struct index_state *istate)
+int is_gitmodules_unmerged(struct index_state *istate)
{
int pos = index_name_pos(istate, GITMODULES_FILE, strlen(GITMODULES_FILE));
if (pos < 0) { /* .gitmodules not found or isn't merged */
@@ -301,7 +301,7 @@ int is_submodule_populated_gently(const char *path, int *return_error_code)
/*
* Dies if the provided 'prefix' corresponds to an unpopulated submodule
*/
-void die_in_unpopulated_submodule(const struct index_state *istate,
+void die_in_unpopulated_submodule(struct index_state *istate,
const char *prefix)
{
int i, prefixlen;
@@ -331,7 +331,7 @@ void die_in_unpopulated_submodule(const struct index_state *istate,
/*
* Dies if any paths in the provided pathspec descends into a submodule
*/
-void die_path_inside_submodule(const struct index_state *istate,
+void die_path_inside_submodule(struct index_state *istate,
const struct pathspec *ps)
{
int i, j;
diff --git a/submodule.h b/submodule.h
index 4ac6e31cf1f7..84640c49c114 100644
--- a/submodule.h
+++ b/submodule.h
@@ -39,7 +39,7 @@ struct submodule_update_strategy {
};
#define SUBMODULE_UPDATE_STRATEGY_INIT {SM_UPDATE_UNSPECIFIED, NULL}
-int is_gitmodules_unmerged(const struct index_state *istate);
+int is_gitmodules_unmerged(struct index_state *istate);
int is_writing_gitmodules_ok(void);
int is_staging_gitmodules_ok(struct index_state *istate);
int update_path_in_gitmodules(const char *oldpath, const char *newpath);
@@ -60,9 +60,9 @@ int is_submodule_active(struct repository *repo, const char *path);
* Otherwise the return error code is the same as of resolve_gitdir_gently.
*/
int is_submodule_populated_gently(const char *path, int *return_error_code);
-void die_in_unpopulated_submodule(const struct index_state *istate,
+void die_in_unpopulated_submodule(struct index_state *istate,
const char *prefix);
-void die_path_inside_submodule(const struct index_state *istate,
+void die_path_inside_submodule(struct index_state *istate,
const struct pathspec *ps);
enum submodule_update_type parse_submodule_update_type(const char *value);
int parse_submodule_update_strategy(const char *value,
--
gitgitgadget
^ permalink raw reply related [flat|nested] 111+ messages in thread
* [PATCH v2 03/25] read-cache: expand on query into sparse-directory entry
2021-04-01 1:49 ` [PATCH v2 00/25] " Derrick Stolee via GitGitGadget
2021-04-01 1:49 ` [PATCH v2 01/25] sparse-index: API protection strategy Derrick Stolee via GitGitGadget
2021-04-01 1:49 ` [PATCH v2 02/25] *: remove 'const' qualifier for struct index_state Derrick Stolee via GitGitGadget
@ 2021-04-01 1:49 ` Derrick Stolee via GitGitGadget
2021-04-01 1:49 ` [PATCH v2 04/25] cache: move ensure_full_index() to cache.h Derrick Stolee via GitGitGadget
` (24 subsequent siblings)
27 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-01 1:49 UTC (permalink / raw)
To: git
Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
Derrick Stolee, Derrick Stolee
From: Derrick Stolee <dstolee@microsoft.com>
Callers to index_name_pos() or index_name_stage_pos() have a specific
path in mind. If that happens to be a path with an ancestor being a
sparse-directory entry, it can lead to unexpected results.
In the case that we did not find the requested path, check to see if the
position _before_ the inserted position is a sparse directory entry that
matches the initial segment of the input path (including the directory
separator at the end of the directory name). If so, then expand the
index to be a full index and search again. This expansion will only
happen once per index read.
Future enhancements could be more careful to expand only the necessary
sparse directory entry, but then we would have a special "not fully
sparse, but also not fully expanded" mode that could affect writing the
index to file. Since this only occurs if a specific file is requested
outside of the sparse checkout definition, this is unlikely to be a
common situation.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
read-cache.c | 21 +++++++++++++++++++++
1 file changed, 21 insertions(+)
diff --git a/read-cache.c b/read-cache.c
index 3ad94578095e..3698bc7bf77d 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -567,6 +567,27 @@ static int index_name_stage_pos(struct index_state *istate, const char *name, in
}
first = next+1;
}
+
+ if (istate->sparse_index &&
+ first > 0) {
+ /* Note: first <= istate->cache_nr */
+ struct cache_entry *ce = istate->cache[first - 1];
+
+ /*
+ * If we are in a sparse-index _and_ the entry before the
+ * insertion position is a sparse-directory entry that is
+ * an ancestor of 'name', then we need to expand the index
+ * and search again. This will only trigger once, because
+ * thereafter the index is fully expanded.
+ */
+ if (S_ISSPARSEDIR(ce->ce_mode) &&
+ ce_namelen(ce) < namelen &&
+ !strncmp(name, ce->name, ce_namelen(ce))) {
+ ensure_full_index(istate);
+ return index_name_stage_pos(istate, name, namelen, stage);
+ }
+ }
+
return -first-1;
}
--
gitgitgadget
^ permalink raw reply related [flat|nested] 111+ messages in thread
* [PATCH v2 04/25] cache: move ensure_full_index() to cache.h
2021-04-01 1:49 ` [PATCH v2 00/25] " Derrick Stolee via GitGitGadget
` (2 preceding siblings ...)
2021-04-01 1:49 ` [PATCH v2 03/25] read-cache: expand on query into sparse-directory entry Derrick Stolee via GitGitGadget
@ 2021-04-01 1:49 ` Derrick Stolee via GitGitGadget
2021-04-01 1:49 ` [PATCH v2 05/25] add: ensure full index Derrick Stolee via GitGitGadget
` (23 subsequent siblings)
27 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-01 1:49 UTC (permalink / raw)
To: git
Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
Derrick Stolee, Derrick Stolee
From: Derrick Stolee <dstolee@microsoft.com>
Soon we will insert ensure_full_index() calls across the codebase.
Instead of also adding include statements for sparse-index.h, let's just
use the fact that anything that cares about the index already has
cache.h in its includes.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
cache.h | 1 +
sparse-index.h | 1 -
2 files changed, 1 insertion(+), 1 deletion(-)
diff --git a/cache.h b/cache.h
index 5006278c13ca..b7e20e9778db 100644
--- a/cache.h
+++ b/cache.h
@@ -350,6 +350,7 @@ void add_name_hash(struct index_state *istate, struct cache_entry *ce);
void remove_name_hash(struct index_state *istate, struct cache_entry *ce);
void free_name_hash(struct index_state *istate);
+void ensure_full_index(struct index_state *istate);
/* Cache entry creation and cleanup */
diff --git a/sparse-index.h b/sparse-index.h
index 39dcc859735e..0268f38753c0 100644
--- a/sparse-index.h
+++ b/sparse-index.h
@@ -2,7 +2,6 @@
#define SPARSE_INDEX_H__
struct index_state;
-void ensure_full_index(struct index_state *istate);
int convert_to_sparse(struct index_state *istate);
struct repository;
--
gitgitgadget
^ permalink raw reply related [flat|nested] 111+ messages in thread
* [PATCH v2 05/25] add: ensure full index
2021-04-01 1:49 ` [PATCH v2 00/25] " Derrick Stolee via GitGitGadget
` (3 preceding siblings ...)
2021-04-01 1:49 ` [PATCH v2 04/25] cache: move ensure_full_index() to cache.h Derrick Stolee via GitGitGadget
@ 2021-04-01 1:49 ` Derrick Stolee via GitGitGadget
2021-04-01 1:49 ` [PATCH v2 06/25] checkout-index: " Derrick Stolee via GitGitGadget
` (22 subsequent siblings)
27 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-01 1:49 UTC (permalink / raw)
To: git
Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
Derrick Stolee, Derrick Stolee
From: Derrick Stolee <dstolee@microsoft.com>
Before iterating over all cache entries, ensure that a sparse index is
expanded to a full index to avoid unexpected behavior.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
builtin/add.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/builtin/add.c b/builtin/add.c
index ea762a41e3a2..afccf2fd5543 100644
--- a/builtin/add.c
+++ b/builtin/add.c
@@ -141,6 +141,8 @@ static int renormalize_tracked_files(const struct pathspec *pathspec, int flags)
{
int i, retval = 0;
+ /* TODO: audit for interaction with sparse-index. */
+ ensure_full_index(&the_index);
for (i = 0; i < active_nr; i++) {
struct cache_entry *ce = active_cache[i];
--
gitgitgadget
^ permalink raw reply related [flat|nested] 111+ messages in thread
* [PATCH v2 06/25] checkout-index: ensure full index
2021-04-01 1:49 ` [PATCH v2 00/25] " Derrick Stolee via GitGitGadget
` (4 preceding siblings ...)
2021-04-01 1:49 ` [PATCH v2 05/25] add: ensure full index Derrick Stolee via GitGitGadget
@ 2021-04-01 1:49 ` Derrick Stolee via GitGitGadget
2021-04-01 1:49 ` [PATCH v2 07/25] checkout: " Derrick Stolee via GitGitGadget
` (21 subsequent siblings)
27 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-01 1:49 UTC (permalink / raw)
To: git
Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
Derrick Stolee, Derrick Stolee
From: Derrick Stolee <dstolee@microsoft.com>
Before we iterate over all cache entries, ensure that the index is not
sparse. This loop in checkout_all() might be safe to iterate over a
sparse index, but let's put this protection here until it can be
carefully tested.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
builtin/checkout-index.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/builtin/checkout-index.c b/builtin/checkout-index.c
index 023e49e271c2..2c2936a9dae0 100644
--- a/builtin/checkout-index.c
+++ b/builtin/checkout-index.c
@@ -119,6 +119,8 @@ static void checkout_all(const char *prefix, int prefix_length)
int i, errs = 0;
struct cache_entry *last_ce = NULL;
+ /* TODO: audit for interaction with sparse-index. */
+ ensure_full_index(&the_index);
for (i = 0; i < active_nr ; i++) {
struct cache_entry *ce = active_cache[i];
if (ce_stage(ce) != checkout_stage
--
gitgitgadget
^ permalink raw reply related [flat|nested] 111+ messages in thread
* [PATCH v2 07/25] checkout: ensure full index
2021-04-01 1:49 ` [PATCH v2 00/25] " Derrick Stolee via GitGitGadget
` (5 preceding siblings ...)
2021-04-01 1:49 ` [PATCH v2 06/25] checkout-index: " Derrick Stolee via GitGitGadget
@ 2021-04-01 1:49 ` Derrick Stolee via GitGitGadget
2021-04-01 1:49 ` [PATCH v2 08/25] commit: " Derrick Stolee via GitGitGadget
` (20 subsequent siblings)
27 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-01 1:49 UTC (permalink / raw)
To: git
Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
Derrick Stolee, Derrick Stolee
From: Derrick Stolee <dstolee@microsoft.com>
Before iterating over all cache entries in the checkout builtin, ensure
that we have a full index to avoid any unexpected behavior.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
builtin/checkout.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/builtin/checkout.c b/builtin/checkout.c
index 0e6639052001..d0dbe63ea119 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -368,6 +368,9 @@ static int checkout_worktree(const struct checkout_opts *opts,
NULL);
enable_delayed_checkout(&state);
+
+ /* TODO: audit for interaction with sparse-index. */
+ ensure_full_index(&the_index);
for (pos = 0; pos < active_nr; pos++) {
struct cache_entry *ce = active_cache[pos];
if (ce->ce_flags & CE_MATCHED) {
@@ -512,6 +515,8 @@ static int checkout_paths(const struct checkout_opts *opts,
* Make sure all pathspecs participated in locating the paths
* to be checked out.
*/
+ /* TODO: audit for interaction with sparse-index. */
+ ensure_full_index(&the_index);
for (pos = 0; pos < active_nr; pos++)
if (opts->overlay_mode)
mark_ce_for_checkout_overlay(active_cache[pos],
--
gitgitgadget
^ permalink raw reply related [flat|nested] 111+ messages in thread
* [PATCH v2 08/25] commit: ensure full index
2021-04-01 1:49 ` [PATCH v2 00/25] " Derrick Stolee via GitGitGadget
` (6 preceding siblings ...)
2021-04-01 1:49 ` [PATCH v2 07/25] checkout: " Derrick Stolee via GitGitGadget
@ 2021-04-01 1:49 ` Derrick Stolee via GitGitGadget
2021-04-01 1:49 ` [PATCH v2 09/25] difftool: " Derrick Stolee via GitGitGadget
` (19 subsequent siblings)
27 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-01 1:49 UTC (permalink / raw)
To: git
Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
Derrick Stolee, Derrick Stolee
From: Derrick Stolee <dstolee@microsoft.com>
These two loops iterate over all cache entries, so ensure that a sparse
index is expanded to a full index before we do so.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
builtin/commit.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/builtin/commit.c b/builtin/commit.c
index 739110c5a7f6..cf0c36d1dcb2 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -251,6 +251,8 @@ static int list_paths(struct string_list *list, const char *with_tree,
free(max_prefix);
}
+ /* TODO: audit for interaction with sparse-index. */
+ ensure_full_index(&the_index);
for (i = 0; i < active_nr; i++) {
const struct cache_entry *ce = active_cache[i];
struct string_list_item *item;
@@ -931,6 +933,8 @@ static int prepare_to_commit(const char *index_file, const char *prefix,
if (get_oid(parent, &oid)) {
int i, ita_nr = 0;
+ /* TODO: audit for interaction with sparse-index. */
+ ensure_full_index(&the_index);
for (i = 0; i < active_nr; i++)
if (ce_intent_to_add(active_cache[i]))
ita_nr++;
--
gitgitgadget
^ permalink raw reply related [flat|nested] 111+ messages in thread
* [PATCH v2 09/25] difftool: ensure full index
2021-04-01 1:49 ` [PATCH v2 00/25] " Derrick Stolee via GitGitGadget
` (7 preceding siblings ...)
2021-04-01 1:49 ` [PATCH v2 08/25] commit: " Derrick Stolee via GitGitGadget
@ 2021-04-01 1:49 ` Derrick Stolee via GitGitGadget
2021-04-01 1:49 ` [PATCH v2 10/25] fsck: " Derrick Stolee via GitGitGadget
` (18 subsequent siblings)
27 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-01 1:49 UTC (permalink / raw)
To: git
Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
Derrick Stolee, Derrick Stolee
From: Derrick Stolee <dstolee@microsoft.com>
Before iterating over all cache entries, ensure that a sparse index has
been expanded to a full one to avoid unexpected behavior.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
builtin/difftool.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/builtin/difftool.c b/builtin/difftool.c
index 6e18e623fddf..32c914dde6a0 100644
--- a/builtin/difftool.c
+++ b/builtin/difftool.c
@@ -584,6 +584,9 @@ static int run_dir_diff(const char *extcmd, int symlinks, const char *prefix,
setenv("GIT_DIFFTOOL_DIRDIFF", "true", 1);
rc = run_command_v_opt(helper_argv, flags);
+ /* TODO: audit for interaction with sparse-index. */
+ ensure_full_index(&wtindex);
+
/*
* If the diff includes working copy files and those
* files were modified during the diff, then the changes
--
gitgitgadget
^ permalink raw reply related [flat|nested] 111+ messages in thread
* [PATCH v2 10/25] fsck: ensure full index
2021-04-01 1:49 ` [PATCH v2 00/25] " Derrick Stolee via GitGitGadget
` (8 preceding siblings ...)
2021-04-01 1:49 ` [PATCH v2 09/25] difftool: " Derrick Stolee via GitGitGadget
@ 2021-04-01 1:49 ` Derrick Stolee via GitGitGadget
2021-04-01 1:49 ` [PATCH v2 11/25] grep: " Derrick Stolee via GitGitGadget
` (17 subsequent siblings)
27 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-01 1:49 UTC (permalink / raw)
To: git
Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
Derrick Stolee, Derrick Stolee
From: Derrick Stolee <dstolee@microsoft.com>
When verifying all blobs reachable from the index, ensure that a sparse
index has been expanded to a full one to avoid missing some blobs.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
builtin/fsck.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/builtin/fsck.c b/builtin/fsck.c
index 821e7798c706..4d7f5c63ce0d 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -883,6 +883,8 @@ int cmd_fsck(int argc, const char **argv, const char *prefix)
verify_index_checksum = 1;
verify_ce_order = 1;
read_cache();
+ /* TODO: audit for interaction with sparse-index. */
+ ensure_full_index(&the_index);
for (i = 0; i < active_nr; i++) {
unsigned int mode;
struct blob *blob;
--
gitgitgadget
^ permalink raw reply related [flat|nested] 111+ messages in thread
* [PATCH v2 11/25] grep: ensure full index
2021-04-01 1:49 ` [PATCH v2 00/25] " Derrick Stolee via GitGitGadget
` (9 preceding siblings ...)
2021-04-01 1:49 ` [PATCH v2 10/25] fsck: " Derrick Stolee via GitGitGadget
@ 2021-04-01 1:49 ` Derrick Stolee via GitGitGadget
2021-04-01 1:49 ` [PATCH v2 12/25] ls-files: " Derrick Stolee via GitGitGadget
` (16 subsequent siblings)
27 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-01 1:49 UTC (permalink / raw)
To: git
Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
Derrick Stolee, Derrick Stolee
From: Derrick Stolee <dstolee@microsoft.com>
Before iterating over all cache entries, ensure that a sparse index is
expanded to a full one so we do not miss blobs to scan. Later, this can
integrate more carefully with sparse indexes with proper testing.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
builtin/grep.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/builtin/grep.c b/builtin/grep.c
index 4e91a253ac3b..c2d40414e975 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -504,6 +504,8 @@ static int grep_cache(struct grep_opt *opt,
if (repo_read_index(repo) < 0)
die(_("index file corrupt"));
+ /* TODO: audit for interaction with sparse-index. */
+ ensure_full_index(repo->index);
for (nr = 0; nr < repo->index->cache_nr; nr++) {
const struct cache_entry *ce = repo->index->cache[nr];
--
gitgitgadget
^ permalink raw reply related [flat|nested] 111+ messages in thread
* [PATCH v2 12/25] ls-files: ensure full index
2021-04-01 1:49 ` [PATCH v2 00/25] " Derrick Stolee via GitGitGadget
` (10 preceding siblings ...)
2021-04-01 1:49 ` [PATCH v2 11/25] grep: " Derrick Stolee via GitGitGadget
@ 2021-04-01 1:49 ` Derrick Stolee via GitGitGadget
2021-04-01 1:49 ` [PATCH v2 13/25] merge-index: " Derrick Stolee via GitGitGadget
` (15 subsequent siblings)
27 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-01 1:49 UTC (permalink / raw)
To: git
Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
Derrick Stolee, Derrick Stolee
From: Derrick Stolee <dstolee@microsoft.com>
Before iterating over all cache entries, ensure that a sparse index is
expanded to a full one to avoid missing files.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
builtin/ls-files.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index 4f9ed1fb29b7..a0b4e54d1149 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -317,6 +317,8 @@ static void show_files(struct repository *repo, struct dir_struct *dir)
if (!(show_cached || show_stage || show_deleted || show_modified))
return;
+ /* TODO: audit for interaction with sparse-index. */
+ ensure_full_index(repo->index);
for (i = 0; i < repo->index->cache_nr; i++) {
const struct cache_entry *ce = repo->index->cache[i];
struct stat st;
@@ -494,6 +496,8 @@ void overlay_tree_on_index(struct index_state *istate,
die("bad tree-ish %s", tree_name);
/* Hoist the unmerged entries up to stage #3 to make room */
+ /* TODO: audit for interaction with sparse-index. */
+ ensure_full_index(istate);
for (i = 0; i < istate->cache_nr; i++) {
struct cache_entry *ce = istate->cache[i];
if (!ce_stage(ce))
--
gitgitgadget
^ permalink raw reply related [flat|nested] 111+ messages in thread
* [PATCH v2 13/25] merge-index: ensure full index
2021-04-01 1:49 ` [PATCH v2 00/25] " Derrick Stolee via GitGitGadget
` (11 preceding siblings ...)
2021-04-01 1:49 ` [PATCH v2 12/25] ls-files: " Derrick Stolee via GitGitGadget
@ 2021-04-01 1:49 ` Derrick Stolee via GitGitGadget
2021-04-01 1:49 ` [PATCH v2 14/25] rm: " Derrick Stolee via GitGitGadget
` (14 subsequent siblings)
27 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-01 1:49 UTC (permalink / raw)
To: git
Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
Derrick Stolee, Derrick Stolee
From: Derrick Stolee <dstolee@microsoft.com>
Before iterating over all cache entries, ensure that a sparse index is
expanded to a full one to avoid unexpected behavior.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
builtin/merge-index.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/builtin/merge-index.c b/builtin/merge-index.c
index 38ea6ad6ca25..c0383fe9df9a 100644
--- a/builtin/merge-index.c
+++ b/builtin/merge-index.c
@@ -58,6 +58,8 @@ static void merge_one_path(const char *path)
static void merge_all(void)
{
int i;
+ /* TODO: audit for interaction with sparse-index. */
+ ensure_full_index(&the_index);
for (i = 0; i < active_nr; i++) {
const struct cache_entry *ce = active_cache[i];
if (!ce_stage(ce))
@@ -80,6 +82,9 @@ int cmd_merge_index(int argc, const char **argv, const char *prefix)
read_cache();
+ /* TODO: audit for interaction with sparse-index. */
+ ensure_full_index(&the_index);
+
i = 1;
if (!strcmp(argv[i], "-o")) {
one_shot = 1;
--
gitgitgadget
^ permalink raw reply related [flat|nested] 111+ messages in thread
* [PATCH v2 14/25] rm: ensure full index
2021-04-01 1:49 ` [PATCH v2 00/25] " Derrick Stolee via GitGitGadget
` (12 preceding siblings ...)
2021-04-01 1:49 ` [PATCH v2 13/25] merge-index: " Derrick Stolee via GitGitGadget
@ 2021-04-01 1:49 ` Derrick Stolee via GitGitGadget
2021-04-01 1:49 ` [PATCH v2 15/25] stash: " Derrick Stolee via GitGitGadget
` (13 subsequent siblings)
27 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-01 1:49 UTC (permalink / raw)
To: git
Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
Derrick Stolee, Derrick Stolee
From: Derrick Stolee <dstolee@microsoft.com>
Before iterating over all cache entries, ensure that a sparse index is
expanded to a full index to avoid unexpected behavior.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
builtin/rm.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/builtin/rm.c b/builtin/rm.c
index 4858631e0f02..5559a0b453a3 100644
--- a/builtin/rm.c
+++ b/builtin/rm.c
@@ -293,6 +293,8 @@ int cmd_rm(int argc, const char **argv, const char *prefix)
seen = xcalloc(pathspec.nr, 1);
+ /* TODO: audit for interaction with sparse-index. */
+ ensure_full_index(&the_index);
for (i = 0; i < active_nr; i++) {
const struct cache_entry *ce = active_cache[i];
if (!ce_path_match(&the_index, ce, &pathspec, seen))
--
gitgitgadget
^ permalink raw reply related [flat|nested] 111+ messages in thread
* [PATCH v2 15/25] stash: ensure full index
2021-04-01 1:49 ` [PATCH v2 00/25] " Derrick Stolee via GitGitGadget
` (13 preceding siblings ...)
2021-04-01 1:49 ` [PATCH v2 14/25] rm: " Derrick Stolee via GitGitGadget
@ 2021-04-01 1:49 ` Derrick Stolee via GitGitGadget
2021-04-01 1:49 ` [PATCH v2 16/25] update-index: " Derrick Stolee via GitGitGadget
` (12 subsequent siblings)
27 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-01 1:49 UTC (permalink / raw)
To: git
Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
Derrick Stolee, Derrick Stolee
From: Derrick Stolee <dstolee@microsoft.com>
Before iterating over all cache entries, ensure that a sparse index is
expanded to a full index to avoid unexpected behavior.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
builtin/stash.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/builtin/stash.c b/builtin/stash.c
index ba774cce674f..6fb7178ef2fa 100644
--- a/builtin/stash.c
+++ b/builtin/stash.c
@@ -1350,6 +1350,8 @@ static int do_push_stash(const struct pathspec *ps, const char *stash_msg, int q
int i;
char *ps_matched = xcalloc(ps->nr, 1);
+ /* TODO: audit for interaction with sparse-index. */
+ ensure_full_index(&the_index);
for (i = 0; i < active_nr; i++)
ce_path_match(&the_index, active_cache[i], ps,
ps_matched);
--
gitgitgadget
^ permalink raw reply related [flat|nested] 111+ messages in thread
* [PATCH v2 16/25] update-index: ensure full index
2021-04-01 1:49 ` [PATCH v2 00/25] " Derrick Stolee via GitGitGadget
` (14 preceding siblings ...)
2021-04-01 1:49 ` [PATCH v2 15/25] stash: " Derrick Stolee via GitGitGadget
@ 2021-04-01 1:49 ` Derrick Stolee via GitGitGadget
2021-04-01 1:49 ` [PATCH v2 17/25] dir: " Derrick Stolee via GitGitGadget
` (11 subsequent siblings)
27 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-01 1:49 UTC (permalink / raw)
To: git
Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
Derrick Stolee, Derrick Stolee
From: Derrick Stolee <dstolee@microsoft.com>
Before iterating over all cache entries, ensure that a sparse index is
expanded to a full index to avoid unexpected behavior.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
builtin/update-index.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/builtin/update-index.c b/builtin/update-index.c
index 79087bccea4b..f1f16f2de526 100644
--- a/builtin/update-index.c
+++ b/builtin/update-index.c
@@ -745,6 +745,8 @@ static int do_reupdate(int ac, const char **av,
*/
has_head = 0;
redo:
+ /* TODO: audit for interaction with sparse-index. */
+ ensure_full_index(&the_index);
for (pos = 0; pos < active_nr; pos++) {
const struct cache_entry *ce = active_cache[pos];
struct cache_entry *old = NULL;
--
gitgitgadget
^ permalink raw reply related [flat|nested] 111+ messages in thread
* [PATCH v2 17/25] dir: ensure full index
2021-04-01 1:49 ` [PATCH v2 00/25] " Derrick Stolee via GitGitGadget
` (15 preceding siblings ...)
2021-04-01 1:49 ` [PATCH v2 16/25] update-index: " Derrick Stolee via GitGitGadget
@ 2021-04-01 1:49 ` Derrick Stolee via GitGitGadget
2021-04-01 1:49 ` [PATCH v2 18/25] entry: " Derrick Stolee via GitGitGadget
` (10 subsequent siblings)
27 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-01 1:49 UTC (permalink / raw)
To: git
Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
Derrick Stolee, Derrick Stolee
From: Derrick Stolee <dstolee@microsoft.com>
Before iterating over all cache entries, ensure that a sparse index is
expanded to a full index to avoid unexpected behavior.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
dir.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/dir.c b/dir.c
index 5b00dfb5b144..166238e79f52 100644
--- a/dir.c
+++ b/dir.c
@@ -3533,6 +3533,8 @@ static void connect_wt_gitdir_in_nested(const char *sub_worktree,
if (repo_read_index(&subrepo) < 0)
die(_("index file corrupt in repo %s"), subrepo.gitdir);
+ /* TODO: audit for interaction with sparse-index. */
+ ensure_full_index(subrepo.index);
for (i = 0; i < subrepo.index->cache_nr; i++) {
const struct cache_entry *ce = subrepo.index->cache[i];
--
gitgitgadget
^ permalink raw reply related [flat|nested] 111+ messages in thread
* [PATCH v2 18/25] entry: ensure full index
2021-04-01 1:49 ` [PATCH v2 00/25] " Derrick Stolee via GitGitGadget
` (16 preceding siblings ...)
2021-04-01 1:49 ` [PATCH v2 17/25] dir: " Derrick Stolee via GitGitGadget
@ 2021-04-01 1:49 ` Derrick Stolee via GitGitGadget
2021-04-01 1:49 ` [PATCH v2 19/25] merge-recursive: " Derrick Stolee via GitGitGadget
` (9 subsequent siblings)
27 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-01 1:49 UTC (permalink / raw)
To: git
Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
Derrick Stolee, Derrick Stolee
From: Derrick Stolee <dstolee@microsoft.com>
Before iterating over all cache entries, ensure that a sparse index is
expanded to a full index to avoid unexpected behavior.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
entry.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/entry.c b/entry.c
index 7b9f43716f76..891e4ba2b45a 100644
--- a/entry.c
+++ b/entry.c
@@ -412,6 +412,8 @@ static void mark_colliding_entries(const struct checkout *state,
ce->ce_flags |= CE_MATCHED;
+ /* TODO: audit for interaction with sparse-index. */
+ ensure_full_index(state->istate);
for (i = 0; i < state->istate->cache_nr; i++) {
struct cache_entry *dup = state->istate->cache[i];
--
gitgitgadget
^ permalink raw reply related [flat|nested] 111+ messages in thread
* [PATCH v2 19/25] merge-recursive: ensure full index
2021-04-01 1:49 ` [PATCH v2 00/25] " Derrick Stolee via GitGitGadget
` (17 preceding siblings ...)
2021-04-01 1:49 ` [PATCH v2 18/25] entry: " Derrick Stolee via GitGitGadget
@ 2021-04-01 1:49 ` Derrick Stolee via GitGitGadget
2021-04-01 1:49 ` [PATCH v2 20/25] pathspec: " Derrick Stolee via GitGitGadget
` (8 subsequent siblings)
27 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-01 1:49 UTC (permalink / raw)
To: git
Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
Derrick Stolee, Derrick Stolee
From: Derrick Stolee <dstolee@microsoft.com>
Before iterating over all cache entries, ensure that a sparse index is
expanded to a full index to avoid unexpected behavior.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
merge-recursive.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/merge-recursive.c b/merge-recursive.c
index b8de7a704eae..91d8597728c1 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -522,6 +522,8 @@ static struct string_list *get_unmerged(struct index_state *istate)
unmerged->strdup_strings = 1;
+ /* TODO: audit for interaction with sparse-index. */
+ ensure_full_index(istate);
for (i = 0; i < istate->cache_nr; i++) {
struct string_list_item *item;
struct stage_data *e;
--
gitgitgadget
^ permalink raw reply related [flat|nested] 111+ messages in thread
* [PATCH v2 20/25] pathspec: ensure full index
2021-04-01 1:49 ` [PATCH v2 00/25] " Derrick Stolee via GitGitGadget
` (18 preceding siblings ...)
2021-04-01 1:49 ` [PATCH v2 19/25] merge-recursive: " Derrick Stolee via GitGitGadget
@ 2021-04-01 1:49 ` Derrick Stolee via GitGitGadget
2021-04-01 1:49 ` [PATCH v2 21/25] read-cache: " Derrick Stolee via GitGitGadget
` (7 subsequent siblings)
27 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-01 1:49 UTC (permalink / raw)
To: git
Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
Derrick Stolee, Derrick Stolee
From: Derrick Stolee <dstolee@microsoft.com>
Before iterating over all cache entries, ensure that a sparse index is
expanded to a full index to avoid unexpected behavior.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
pathspec.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/pathspec.c b/pathspec.c
index b6e333965cb4..d67688bab74b 100644
--- a/pathspec.c
+++ b/pathspec.c
@@ -36,6 +36,8 @@ void add_pathspec_matches_against_index(const struct pathspec *pathspec,
num_unmatched++;
if (!num_unmatched)
return;
+ /* TODO: audit for interaction with sparse-index. */
+ ensure_full_index(istate);
for (i = 0; i < istate->cache_nr; i++) {
const struct cache_entry *ce = istate->cache[i];
ce_path_match(istate, ce, pathspec, seen);
--
gitgitgadget
^ permalink raw reply related [flat|nested] 111+ messages in thread
* [PATCH v2 21/25] read-cache: ensure full index
2021-04-01 1:49 ` [PATCH v2 00/25] " Derrick Stolee via GitGitGadget
` (19 preceding siblings ...)
2021-04-01 1:49 ` [PATCH v2 20/25] pathspec: " Derrick Stolee via GitGitGadget
@ 2021-04-01 1:49 ` Derrick Stolee via GitGitGadget
2021-04-01 1:49 ` [PATCH v2 22/25] resolve-undo: " Derrick Stolee via GitGitGadget
` (6 subsequent siblings)
27 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-01 1:49 UTC (permalink / raw)
To: git
Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
Derrick Stolee, Derrick Stolee
From: Derrick Stolee <dstolee@microsoft.com>
Before iterating over all cache entries, ensure that a sparse index is
expanded to a full index to avoid unexpected behavior.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
read-cache.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/read-cache.c b/read-cache.c
index 3698bc7bf77d..a9dcf0ab4f78 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1577,6 +1577,8 @@ int refresh_index(struct index_state *istate, unsigned int flags,
*/
preload_index(istate, pathspec, 0);
trace2_region_enter("index", "refresh", NULL);
+ /* TODO: audit for interaction with sparse-index. */
+ ensure_full_index(istate);
for (i = 0; i < istate->cache_nr; i++) {
struct cache_entry *ce, *new_entry;
int cache_errno = 0;
@@ -2498,6 +2500,8 @@ int repo_index_has_changes(struct repository *repo,
diff_flush(&opt);
return opt.flags.has_changes != 0;
} else {
+ /* TODO: audit for interaction with sparse-index. */
+ ensure_full_index(istate);
for (i = 0; sb && i < istate->cache_nr; i++) {
if (i)
strbuf_addch(sb, ' ');
--
gitgitgadget
^ permalink raw reply related [flat|nested] 111+ messages in thread
* [PATCH v2 22/25] resolve-undo: ensure full index
2021-04-01 1:49 ` [PATCH v2 00/25] " Derrick Stolee via GitGitGadget
` (20 preceding siblings ...)
2021-04-01 1:49 ` [PATCH v2 21/25] read-cache: " Derrick Stolee via GitGitGadget
@ 2021-04-01 1:49 ` Derrick Stolee via GitGitGadget
2021-04-01 1:50 ` [PATCH v2 23/25] revision: " Derrick Stolee via GitGitGadget
` (5 subsequent siblings)
27 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-01 1:49 UTC (permalink / raw)
To: git
Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
Derrick Stolee, Derrick Stolee
From: Derrick Stolee <dstolee@microsoft.com>
Before iterating over all cache entries, ensure that a sparse index is
expanded to a full index to avoid unexpected behavior.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
resolve-undo.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/resolve-undo.c b/resolve-undo.c
index 236320f179cb..ee36ca58b135 100644
--- a/resolve-undo.c
+++ b/resolve-undo.c
@@ -172,6 +172,8 @@ void unmerge_marked_index(struct index_state *istate)
if (!istate->resolve_undo)
return;
+ /* TODO: audit for interaction with sparse-index. */
+ ensure_full_index(istate);
for (i = 0; i < istate->cache_nr; i++) {
const struct cache_entry *ce = istate->cache[i];
if (ce->ce_flags & CE_MATCHED)
@@ -186,6 +188,8 @@ void unmerge_index(struct index_state *istate, const struct pathspec *pathspec)
if (!istate->resolve_undo)
return;
+ /* TODO: audit for interaction with sparse-index. */
+ ensure_full_index(istate);
for (i = 0; i < istate->cache_nr; i++) {
const struct cache_entry *ce = istate->cache[i];
if (!ce_path_match(istate, ce, pathspec, NULL))
--
gitgitgadget
^ permalink raw reply related [flat|nested] 111+ messages in thread
* [PATCH v2 23/25] revision: ensure full index
2021-04-01 1:49 ` [PATCH v2 00/25] " Derrick Stolee via GitGitGadget
` (21 preceding siblings ...)
2021-04-01 1:49 ` [PATCH v2 22/25] resolve-undo: " Derrick Stolee via GitGitGadget
@ 2021-04-01 1:50 ` Derrick Stolee via GitGitGadget
2021-04-01 1:50 ` [PATCH v2 24/25] sparse-index: expand_to_path() Derrick Stolee via GitGitGadget
` (4 subsequent siblings)
27 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-01 1:50 UTC (permalink / raw)
To: git
Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
Derrick Stolee, Derrick Stolee
From: Derrick Stolee <dstolee@microsoft.com>
Before iterating over all index entries, ensure that a sparse index is
expanded to a full index to avoid unexpected behavior. This case could
be integrated later by ensuring that we walk the tree in the
sparse-directory entry, but the current behavior is only expecting
blobs. Save this integration for later when it can be properly tested.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
revision.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/revision.c b/revision.c
index b78733f5089b..b72e0ac1bdca 100644
--- a/revision.c
+++ b/revision.c
@@ -1680,6 +1680,8 @@ static void do_add_index_objects_to_pending(struct rev_info *revs,
{
int i;
+ /* TODO: audit for interaction with sparse-index. */
+ ensure_full_index(istate);
for (i = 0; i < istate->cache_nr; i++) {
struct cache_entry *ce = istate->cache[i];
struct blob *blob;
--
gitgitgadget
^ permalink raw reply related [flat|nested] 111+ messages in thread
* [PATCH v2 24/25] sparse-index: expand_to_path()
2021-04-01 1:49 ` [PATCH v2 00/25] " Derrick Stolee via GitGitGadget
` (22 preceding siblings ...)
2021-04-01 1:50 ` [PATCH v2 23/25] revision: " Derrick Stolee via GitGitGadget
@ 2021-04-01 1:50 ` Derrick Stolee via GitGitGadget
2021-04-05 19:32 ` Elijah Newren
2021-04-01 1:50 ` [PATCH v2 25/25] name-hash: use expand_to_path() Derrick Stolee via GitGitGadget
` (3 subsequent siblings)
27 siblings, 1 reply; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-01 1:50 UTC (permalink / raw)
To: git
Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
Derrick Stolee, Derrick Stolee
From: Derrick Stolee <dstolee@microsoft.com>
Some users of the index API have a specific path they are looking for,
but choose to use index_file_exists() to rely on the name-hash hashtable
instead of doing binary search with index_name_pos(). These users only
need to know a yes/no answer, not a position within the cache array.
When the index is sparse, the name-hash hash table does not contain the
full list of paths within sparse directories. It _does_ contain the
directory names for the sparse-directory entries.
Create a helper function, expand_to_path(), for intended use with the
name-hash hashtable functions. The integration with name-hash.c will
follow in a later change.
The solution here is to use ensure_full_index() when we determine that
the requested path is within a sparse directory entry. This will
populate the name-hash hashtable as the index is recomputed from
scratch.
There may be cases where the caller is trying to find an untracked path
that is not in the index but also is not within a sparse directory
entry. We want to minimize the overhead for these requests. If we used
index_name_pos() to find the insertion order of the path, then we could
determine from that position if a sparse-directory exists. (In fact,
just calling index_name_pos() in that case would lead to expanding the
index to a full index.) However, this takes O(log N) time where N is the
number of cache entries.
To keep the performance of this call based mostly on the input string,
use index_file_exists() to look for the ancestors of the path. Using the
heuristic that a sparse directory is likely to have a small number of
parent directories, we start from the bottom and build up. Use a string
buffer to allow mutating the path name to terminate after each slash for
each hashset test.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
sparse-index.c | 72 ++++++++++++++++++++++++++++++++++++++++++++++++++
sparse-index.h | 13 +++++++++
2 files changed, 85 insertions(+)
diff --git a/sparse-index.c b/sparse-index.c
index 95ea17174da3..8a1223041296 100644
--- a/sparse-index.c
+++ b/sparse-index.c
@@ -283,3 +283,75 @@ void ensure_full_index(struct index_state *istate)
trace2_region_leave("index", "ensure_full_index", istate->repo);
}
+
+/*
+ * This static global helps avoid infinite recursion between
+ * expand_to_path() and index_file_exists().
+ */
+static int in_expand_to_path = 0;
+
+void expand_to_path(struct index_state *istate,
+ const char *path, size_t pathlen, int icase)
+{
+ struct strbuf path_mutable = STRBUF_INIT;
+ size_t substr_len;
+
+ /* prevent extra recursion */
+ if (in_expand_to_path)
+ return;
+
+ if (!istate || !istate->sparse_index)
+ return;
+
+ if (!istate->repo)
+ istate->repo = the_repository;
+
+ in_expand_to_path = 1;
+
+ /*
+ * We only need to actually expand a region if the
+ * following are both true:
+ *
+ * 1. 'path' is not already in the index.
+ * 2. Some parent directory of 'path' is a sparse directory.
+ */
+
+ if (index_file_exists(istate, path, pathlen, icase))
+ goto cleanup;
+
+ strbuf_add(&path_mutable, path, pathlen);
+ strbuf_addch(&path_mutable, '/');
+
+ /* Check the name hash for all parent directories */
+ substr_len = 0;
+ while (substr_len < pathlen) {
+ char temp;
+ char *replace = strchr(path_mutable.buf + substr_len, '/');
+
+ if (!replace)
+ break;
+
+ /* replace the character _after_ the slash */
+ replace++;
+ temp = *replace;
+ *replace = '\0';
+ if (index_file_exists(istate, path_mutable.buf,
+ path_mutable.len, icase)) {
+ /*
+ * We found a parent directory in the name-hash
+ * hashtable, which means that this entry could
+ * exist within a sparse-directory entry. Expand
+ * accordingly.
+ */
+ ensure_full_index(istate);
+ break;
+ }
+
+ *replace = temp;
+ substr_len = replace - path_mutable.buf;
+ }
+
+cleanup:
+ strbuf_release(&path_mutable);
+ in_expand_to_path = 0;
+}
diff --git a/sparse-index.h b/sparse-index.h
index 0268f38753c0..1115a0d7dd98 100644
--- a/sparse-index.h
+++ b/sparse-index.h
@@ -4,6 +4,19 @@
struct index_state;
int convert_to_sparse(struct index_state *istate);
+/*
+ * Some places in the codebase expect to search for a specific path.
+ * This path might be outside of the sparse-checkout definition, in
+ * which case a sparse-index may not contain a path for that index.
+ *
+ * Given an index and a path, check to see if a leading directory for
+ * 'path' exists in the index as a sparse directory. In that case,
+ * expand that sparse directory to a full range of cache entries and
+ * populate the index accordingly.
+ */
+void expand_to_path(struct index_state *istate,
+ const char *path, size_t pathlen, int icase);
+
struct repository;
int set_sparse_index_config(struct repository *repo, int enable);
--
gitgitgadget
^ permalink raw reply related [flat|nested] 111+ messages in thread
* Re: [PATCH v2 24/25] sparse-index: expand_to_path()
2021-04-01 1:50 ` [PATCH v2 24/25] sparse-index: expand_to_path() Derrick Stolee via GitGitGadget
@ 2021-04-05 19:32 ` Elijah Newren
2021-04-06 11:46 ` Derrick Stolee
0 siblings, 1 reply; 111+ messages in thread
From: Elijah Newren @ 2021-04-05 19:32 UTC (permalink / raw)
To: Derrick Stolee via GitGitGadget
Cc: Git Mailing List, Junio C Hamano, Derrick Stolee,
Matheus Tavares Bernardino, Derrick Stolee, Derrick Stolee
On Wed, Mar 31, 2021 at 6:50 PM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> Some users of the index API have a specific path they are looking for,
> but choose to use index_file_exists() to rely on the name-hash hashtable
> instead of doing binary search with index_name_pos(). These users only
> need to know a yes/no answer, not a position within the cache array.
>
> When the index is sparse, the name-hash hash table does not contain the
> full list of paths within sparse directories. It _does_ contain the
> directory names for the sparse-directory entries.
>
> Create a helper function, expand_to_path(), for intended use with the
> name-hash hashtable functions. The integration with name-hash.c will
> follow in a later change.
>
> The solution here is to use ensure_full_index() when we determine that
> the requested path is within a sparse directory entry. This will
> populate the name-hash hashtable as the index is recomputed from
> scratch.
>
> There may be cases where the caller is trying to find an untracked path
> that is not in the index but also is not within a sparse directory
> entry. We want to minimize the overhead for these requests. If we used
> index_name_pos() to find the insertion order of the path, then we could
> determine from that position if a sparse-directory exists. (In fact,
> just calling index_name_pos() in that case would lead to expanding the
> index to a full index.) However, this takes O(log N) time where N is the
> number of cache entries.
>
> To keep the performance of this call based mostly on the input string,
> use index_file_exists() to look for the ancestors of the path. Using the
> heuristic that a sparse directory is likely to have a small number of
> parent directories, we start from the bottom and build up. Use a string
> buffer to allow mutating the path name to terminate after each slash for
> each hashset test.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
> sparse-index.c | 72 ++++++++++++++++++++++++++++++++++++++++++++++++++
> sparse-index.h | 13 +++++++++
> 2 files changed, 85 insertions(+)
>
> diff --git a/sparse-index.c b/sparse-index.c
> index 95ea17174da3..8a1223041296 100644
> --- a/sparse-index.c
> +++ b/sparse-index.c
> @@ -283,3 +283,75 @@ void ensure_full_index(struct index_state *istate)
>
> trace2_region_leave("index", "ensure_full_index", istate->repo);
> }
> +
> +/*
> + * This static global helps avoid infinite recursion between
> + * expand_to_path() and index_file_exists().
> + */
> +static int in_expand_to_path = 0;
> +
> +void expand_to_path(struct index_state *istate,
> + const char *path, size_t pathlen, int icase)
> +{
> + struct strbuf path_mutable = STRBUF_INIT;
> + size_t substr_len;
> +
> + /* prevent extra recursion */
> + if (in_expand_to_path)
> + return;
> +
> + if (!istate || !istate->sparse_index)
> + return;
> +
> + if (!istate->repo)
> + istate->repo = the_repository;
> +
> + in_expand_to_path = 1;
> +
> + /*
> + * We only need to actually expand a region if the
> + * following are both true:
> + *
> + * 1. 'path' is not already in the index.
> + * 2. Some parent directory of 'path' is a sparse directory.
> + */
> +
> + if (index_file_exists(istate, path, pathlen, icase))
> + goto cleanup;
> +
> + strbuf_add(&path_mutable, path, pathlen);
> + strbuf_addch(&path_mutable, '/');
> +
> + /* Check the name hash for all parent directories */
> + substr_len = 0;
> + while (substr_len < pathlen) {
> + char temp;
> + char *replace = strchr(path_mutable.buf + substr_len, '/');
> +
> + if (!replace)
> + break;
> +
> + /* replace the character _after_ the slash */
> + replace++;
> + temp = *replace;
> + *replace = '\0';
> + if (index_file_exists(istate, path_mutable.buf,
> + path_mutable.len, icase)) {
> + /*
> + * We found a parent directory in the name-hash
> + * hashtable, which means that this entry could
> + * exist within a sparse-directory entry. Expand
> + * accordingly.
"this entry" is a bit unclear; it might be worth referring to the
"path" variable instead. I think it'd also help to explain the
reasoning for using the '/' character, because while it's clear when
looking at the series, just running across it in the code it might be
easy to forget or miss. Perhaps:
* We found a parent directory in the name-hash
* hashtable, because only sparse directory entries
* have a trailing '/' character. Since "path" wasn't
* in the index, perhaps it exists within this
* sparse-directory. Expand accordingly.
> + */
> + ensure_full_index(istate);
> + break;
> + }
> +
> + *replace = temp;
> + substr_len = replace - path_mutable.buf;
> + }
> +
> +cleanup:
> + strbuf_release(&path_mutable);
> + in_expand_to_path = 0;
> +}
> diff --git a/sparse-index.h b/sparse-index.h
> index 0268f38753c0..1115a0d7dd98 100644
> --- a/sparse-index.h
> +++ b/sparse-index.h
> @@ -4,6 +4,19 @@
> struct index_state;
> int convert_to_sparse(struct index_state *istate);
>
> +/*
> + * Some places in the codebase expect to search for a specific path.
> + * This path might be outside of the sparse-checkout definition, in
> + * which case a sparse-index may not contain a path for that index.
> + *
> + * Given an index and a path, check to see if a leading directory for
> + * 'path' exists in the index as a sparse directory. In that case,
> + * expand that sparse directory to a full range of cache entries and
> + * populate the index accordingly.
> + */
> +void expand_to_path(struct index_state *istate,
> + const char *path, size_t pathlen, int icase);
> +
> struct repository;
> int set_sparse_index_config(struct repository *repo, int enable);
>
> --
> gitgitgadget
^ permalink raw reply [flat|nested] 111+ messages in thread
* Re: [PATCH v2 24/25] sparse-index: expand_to_path()
2021-04-05 19:32 ` Elijah Newren
@ 2021-04-06 11:46 ` Derrick Stolee
0 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee @ 2021-04-06 11:46 UTC (permalink / raw)
To: Elijah Newren, Derrick Stolee via GitGitGadget
Cc: Git Mailing List, Junio C Hamano, Matheus Tavares Bernardino,
Derrick Stolee, Derrick Stolee
On 4/5/2021 3:32 PM, Elijah Newren wrote:
> On Wed, Mar 31, 2021 at 6:50 PM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>>
>> From: Derrick Stolee <dstolee@microsoft.com>>> + if (index_file_exists(istate, path_mutable.buf,
>> + path_mutable.len, icase)) {
>> + /*
>> + * We found a parent directory in the name-hash
>> + * hashtable, which means that this entry could
>> + * exist within a sparse-directory entry. Expand
>> + * accordingly.
>
> "this entry" is a bit unclear; it might be worth referring to the
> "path" variable instead. I think it'd also help to explain the
> reasoning for using the '/' character, because while it's clear when
> looking at the series, just running across it in the code it might be
> easy to forget or miss. Perhaps:
>
> * We found a parent directory in the name-hash
> * hashtable, because only sparse directory entries
> * have a trailing '/' character. Since "path" wasn't
> * in the index, perhaps it exists within this
> * sparse-directory. Expand accordingly.
A good recommendation. Thanks.
-Stolee
^ permalink raw reply [flat|nested] 111+ messages in thread
* [PATCH v2 25/25] name-hash: use expand_to_path()
2021-04-01 1:49 ` [PATCH v2 00/25] " Derrick Stolee via GitGitGadget
` (23 preceding siblings ...)
2021-04-01 1:50 ` [PATCH v2 24/25] sparse-index: expand_to_path() Derrick Stolee via GitGitGadget
@ 2021-04-01 1:50 ` Derrick Stolee via GitGitGadget
2021-04-05 19:53 ` Elijah Newren
2021-04-01 7:07 ` [PATCH v2 00/25] Sparse Index: API protections Junio C Hamano
` (2 subsequent siblings)
27 siblings, 1 reply; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-01 1:50 UTC (permalink / raw)
To: git
Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
Derrick Stolee, Derrick Stolee
From: Derrick Stolee <dstolee@microsoft.com>
A sparse-index loads the name-hash data for its entries, including the
sparse-directory entries. If a caller asks for a path that is contained
within a sparse-directory entry, we need to expand to a full index and
recalculate the name hash table before returning the result. Insert
calls to expand_to_path() to protect against this case.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
name-hash.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/name-hash.c b/name-hash.c
index 4e03fac9bb12..75c159e06eeb 100644
--- a/name-hash.c
+++ b/name-hash.c
@@ -8,6 +8,7 @@
#include "cache.h"
#include "thread-utils.h"
#include "trace2.h"
+#include "sparse-index.h"
struct dir_entry {
struct hashmap_entry ent;
@@ -109,6 +110,12 @@ static void hash_index_entry(struct index_state *istate, struct cache_entry *ce)
if (ce->ce_flags & CE_HASHED)
return;
ce->ce_flags |= CE_HASHED;
+
+ if (S_ISSPARSEDIR(ce->ce_mode)) {
+ add_dir_entry(istate, ce);
+ return;
+ }
+
hashmap_entry_init(&ce->ent, memihash(ce->name, ce_namelen(ce)));
hashmap_add(&istate->name_hash, &ce->ent);
@@ -680,6 +687,7 @@ int index_dir_exists(struct index_state *istate, const char *name, int namelen)
struct dir_entry *dir;
lazy_init_name_hash(istate);
+ expand_to_path(istate, name, namelen, 0);
dir = find_dir_entry(istate, name, namelen);
return dir && dir->nr;
}
@@ -690,6 +698,7 @@ void adjust_dirname_case(struct index_state *istate, char *name)
const char *ptr = startPtr;
lazy_init_name_hash(istate);
+ expand_to_path(istate, name, strlen(name), 0);
while (*ptr) {
while (*ptr && *ptr != '/')
ptr++;
@@ -713,6 +722,7 @@ struct cache_entry *index_file_exists(struct index_state *istate, const char *na
unsigned int hash = memihash(name, namelen);
lazy_init_name_hash(istate);
+ expand_to_path(istate, name, namelen, icase);
ce = hashmap_get_entry_from_hash(&istate->name_hash, hash, NULL,
struct cache_entry, ent);
--
gitgitgadget
^ permalink raw reply related [flat|nested] 111+ messages in thread
* Re: [PATCH v2 25/25] name-hash: use expand_to_path()
2021-04-01 1:50 ` [PATCH v2 25/25] name-hash: use expand_to_path() Derrick Stolee via GitGitGadget
@ 2021-04-05 19:53 ` Elijah Newren
0 siblings, 0 replies; 111+ messages in thread
From: Elijah Newren @ 2021-04-05 19:53 UTC (permalink / raw)
To: Derrick Stolee via GitGitGadget
Cc: Git Mailing List, Junio C Hamano, Derrick Stolee,
Matheus Tavares Bernardino, Derrick Stolee, Derrick Stolee
On Wed, Mar 31, 2021 at 6:50 PM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> A sparse-index loads the name-hash data for its entries, including the
> sparse-directory entries. If a caller asks for a path that is contained
> within a sparse-directory entry, we need to expand to a full index and
> recalculate the name hash table before returning the result. Insert
> calls to expand_to_path() to protect against this case.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
> name-hash.c | 10 ++++++++++
> 1 file changed, 10 insertions(+)
>
> diff --git a/name-hash.c b/name-hash.c
> index 4e03fac9bb12..75c159e06eeb 100644
> --- a/name-hash.c
> +++ b/name-hash.c
> @@ -8,6 +8,7 @@
> #include "cache.h"
> #include "thread-utils.h"
> #include "trace2.h"
> +#include "sparse-index.h"
>
> struct dir_entry {
> struct hashmap_entry ent;
> @@ -109,6 +110,12 @@ static void hash_index_entry(struct index_state *istate, struct cache_entry *ce)
> if (ce->ce_flags & CE_HASHED)
> return;
> ce->ce_flags |= CE_HASHED;
> +
> + if (S_ISSPARSEDIR(ce->ce_mode)) {
> + add_dir_entry(istate, ce);
> + return;
> + }
I don't see how this relates to the commit message. Was this meant to
be included in another commit? If it is intended to be included here,
could an explanation be added?
> +
> hashmap_entry_init(&ce->ent, memihash(ce->name, ce_namelen(ce)));
> hashmap_add(&istate->name_hash, &ce->ent);
>
> @@ -680,6 +687,7 @@ int index_dir_exists(struct index_state *istate, const char *name, int namelen)
> struct dir_entry *dir;
>
> lazy_init_name_hash(istate);
> + expand_to_path(istate, name, namelen, 0);
> dir = find_dir_entry(istate, name, namelen);
> return dir && dir->nr;
> }
> @@ -690,6 +698,7 @@ void adjust_dirname_case(struct index_state *istate, char *name)
> const char *ptr = startPtr;
>
> lazy_init_name_hash(istate);
> + expand_to_path(istate, name, strlen(name), 0);
> while (*ptr) {
> while (*ptr && *ptr != '/')
> ptr++;
> @@ -713,6 +722,7 @@ struct cache_entry *index_file_exists(struct index_state *istate, const char *na
> unsigned int hash = memihash(name, namelen);
>
> lazy_init_name_hash(istate);
> + expand_to_path(istate, name, namelen, icase);
>
> ce = hashmap_get_entry_from_hash(&istate->name_hash, hash, NULL,
> struct cache_entry, ent);
> --
> gitgitgadget
^ permalink raw reply [flat|nested] 111+ messages in thread
* Re: [PATCH v2 00/25] Sparse Index: API protections
2021-04-01 1:49 ` [PATCH v2 00/25] " Derrick Stolee via GitGitGadget
` (24 preceding siblings ...)
2021-04-01 1:50 ` [PATCH v2 25/25] name-hash: use expand_to_path() Derrick Stolee via GitGitGadget
@ 2021-04-01 7:07 ` Junio C Hamano
2021-04-01 13:32 ` Derrick Stolee
2021-04-05 19:55 ` Elijah Newren
2021-04-12 21:07 ` [PATCH v3 00/26] " Derrick Stolee via GitGitGadget
27 siblings, 1 reply; 111+ messages in thread
From: Junio C Hamano @ 2021-04-01 7:07 UTC (permalink / raw)
To: Derrick Stolee via GitGitGadget
Cc: git, newren, Derrick Stolee, Matheus Tavares Bernardino,
Derrick Stolee
"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> Here is the second patch series submission coming out of the sparse-index
> RFC [1].
Queued. There were a few conflicts with new codepaths and I've
touched up a few of them (Alban's merge-strategies-in-c moves some
code that needs to dehydrate the sparse index entries from
builtin/merge-index.c to a new file, Elijah's ort has "const"
pointer to the istate that you need to be writable), but there may
be more that I missed. Please check the tip of 'seen' when it gets
pushed out.
Thanks.
^ permalink raw reply [flat|nested] 111+ messages in thread
* Re: [PATCH v2 00/25] Sparse Index: API protections
2021-04-01 7:07 ` [PATCH v2 00/25] Sparse Index: API protections Junio C Hamano
@ 2021-04-01 13:32 ` Derrick Stolee
0 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee @ 2021-04-01 13:32 UTC (permalink / raw)
To: Junio C Hamano, Derrick Stolee via GitGitGadget
Cc: git, newren, Matheus Tavares Bernardino, Derrick Stolee
On 4/1/2021 3:07 AM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>> Here is the second patch series submission coming out of the sparse-index
>> RFC [1].
>
> Queued. There were a few conflicts with new codepaths and I've
> touched up a few of them (Alban's merge-strategies-in-c moves some
> code that needs to dehydrate the sparse index entries from
> builtin/merge-index.c to a new file, Elijah's ort has "const"
> pointer to the istate that you need to be writable), but there may
> be more that I missed. Please check the tip of 'seen' when it gets
> pushed out.
Thank you for dealing with these complicated merges. I promise
that the series after this one are much more focused in their
code footprint. Thus, they should be easier to merge with
ongoing topics.
I validated your merge by rebasing the rest of my current work
in this area onto 'seen'. The tests for 'git status' and 'git
add' with sparse-indexes adds to the confidence of the merge.
I did see a small conflict with mt/add-rm-in-sparse-checkout
in one of my future patches because of an adjacent edit. I'll
be sure to base my next submission on a merge with that topic.
Thanks,
-Stolee
^ permalink raw reply [flat|nested] 111+ messages in thread
* Re: [PATCH v2 00/25] Sparse Index: API protections
2021-04-01 1:49 ` [PATCH v2 00/25] " Derrick Stolee via GitGitGadget
` (25 preceding siblings ...)
2021-04-01 7:07 ` [PATCH v2 00/25] Sparse Index: API protections Junio C Hamano
@ 2021-04-05 19:55 ` Elijah Newren
2021-04-12 21:07 ` [PATCH v3 00/26] " Derrick Stolee via GitGitGadget
27 siblings, 0 replies; 111+ messages in thread
From: Elijah Newren @ 2021-04-05 19:55 UTC (permalink / raw)
To: Derrick Stolee via GitGitGadget
Cc: Git Mailing List, Junio C Hamano, Derrick Stolee,
Matheus Tavares Bernardino, Derrick Stolee
On Wed, Mar 31, 2021 at 6:50 PM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> Here is the second patch series submission coming out of the sparse-index
> RFC [1].
>
> [1]
> https://lore.kernel.org/git/pull.847.git.1611596533.gitgitgadget@gmail.com/
>
> This is based on v3 of the format series [2].
>
> [2]
> https://lore.kernel.org/git/pull.883.v3.git.1615912983.gitgitgadget@gmail.com/
>
> The point of this series is to insert protections for the consumers of the
> in-memory index to avoid unintended behavior change when using a sparse
> index versus a full one.
>
> We mark certain regions of code as needing a full index, so we call
> ensure_full_index() to expand a sparse index to a full one, if necessary.
> These protections are inserted file-by-file in every loop over all cache
> entries. Well, "most" loops, because some are going to be handled in the
> very next series so I leave them out.
>
> Many callers use index_name_pos() to find a path by name. In these cases, we
> can check if that position resolves to a sparse directory instance. In those
> cases, we just expand to a full index and run the search again.
>
> The last few patches deal with the name-hash hashtable for doing O(1)
> lookups.
>
> These protections don't do much right now, since the previous series created
> the_repository->settings.command_requires_full_index to guard all index
> reads and writes to ensure the in-memory copy is full for commands that have
> not been tested with the sparse index yet.
>
> However, after this series is complete, we now have a straight-forward plan
> for making commands "sparse aware" one-by-one:
>
> 1. Disable settings.command_requires_full_index to allow an in-memory
> sparse-index.
> 2. Run versions of that command under a debugger, breaking on
> ensure_full_index().
> 3. Examine the call stack to determine the context of that expansion, then
> implement the proper behavior in those locations.
> 4. Add tests to ensure we are checking this logic in the presence of sparse
> directory entries.
>
> I will admit that mostly it is the writing of the test cases that takes the
> most time in the conversions I've done so far.
>
>
> Updates in v2
> =============
>
> * Rebased onto v5 of ds/sparse-index
> * Updated the technical doc to describe how these protections are guards to
> keep behavior consistent between a sparse-index and a full index. Whether
> or not that behavior is "correct" can be interrogated later.
> * Calls to ensure_full_index() are marked with a TODO comment saying these
> calls should be audited later (with tests).
> * Fixed an incorrectly squashed commit message.
> * Dropped the diff-lib.c commit because it was erroneously included in v2.
> * Dropped the merge-ort.c commit because of conflicts with work in flight
> and a quick audit that it is not needed.
> * I reviewed the merge of this topic with mt/add-rm-in-sparse-checkout and
> found it equivalent to what I would have done.
You addressed most my concerns moving from the RFC to v1, and this
version addresses most of my remaining comments/questions/suggestions.
I only had two minor comments on the last two patches. This series is
looking pretty good to me.
> Thanks, -Stolee
>
> Derrick Stolee (25):
> sparse-index: API protection strategy
> *: remove 'const' qualifier for struct index_state
> read-cache: expand on query into sparse-directory entry
> cache: move ensure_full_index() to cache.h
> add: ensure full index
> checkout-index: ensure full index
> checkout: ensure full index
> commit: ensure full index
> difftool: ensure full index
> fsck: ensure full index
> grep: ensure full index
> ls-files: ensure full index
> merge-index: ensure full index
> rm: ensure full index
> stash: ensure full index
> update-index: ensure full index
> dir: ensure full index
> entry: ensure full index
> merge-recursive: ensure full index
> pathspec: ensure full index
> read-cache: ensure full index
> resolve-undo: ensure full index
> revision: ensure full index
> sparse-index: expand_to_path()
> name-hash: use expand_to_path()
>
> Documentation/technical/sparse-index.txt | 37 +++++++++++-
> attr.c | 14 ++---
> attr.h | 4 +-
> builtin/add.c | 2 +
> builtin/checkout-index.c | 2 +
> builtin/checkout.c | 5 ++
> builtin/commit.c | 4 ++
> builtin/difftool.c | 3 +
> builtin/fsck.c | 2 +
> builtin/grep.c | 2 +
> builtin/ls-files.c | 14 +++--
> builtin/merge-index.c | 5 ++
> builtin/rm.c | 2 +
> builtin/stash.c | 2 +
> builtin/update-index.c | 2 +
> cache.h | 7 ++-
> convert.c | 26 ++++-----
> convert.h | 20 +++----
> dir.c | 14 +++--
> dir.h | 8 +--
> entry.c | 2 +
> merge-recursive.c | 4 +-
> name-hash.c | 10 ++++
> pathspec.c | 8 ++-
> pathspec.h | 6 +-
> read-cache.c | 35 ++++++++++--
> resolve-undo.c | 4 ++
> revision.c | 2 +
> sparse-index.c | 72 ++++++++++++++++++++++++
> sparse-index.h | 14 ++++-
> submodule.c | 6 +-
> submodule.h | 6 +-
> 32 files changed, 273 insertions(+), 71 deletions(-)
>
>
> base-commit: 66602733cc95d9a53594520cd8b28d3338e258ea
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-906%2Fderrickstolee%2Fsparse-index%2Fprotections-v2
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-906/derrickstolee/sparse-index/protections-v2
> Pull-Request: https://github.com/gitgitgadget/git/pull/906
>
> Range-diff vs v1:
>
> 3: bbf19f8a2be5 ! 1: 7484e085e342 sparse-index: API protection strategy
> @@ Commit message
> Looking for non-zero stage:
> * builtin/add.c:chmod_pathspec()
> * builtin/merge.c:count_unmerged_entries()
> + * merge-ort.c:record_conflicted_index_entries()
> * read-cache.c:unmerged_index()
> * rerere.c:check_one_conflict(), find_conflict(), rerere_remaining()
> * revision.c:prepare_show_merge()
> @@ Commit message
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
>
> ## Documentation/technical/sparse-index.txt ##
> -@@ Documentation/technical/sparse-index.txt: also introduce other features that have been considered for improving the
> - index, as well.
> +@@ Documentation/technical/sparse-index.txt: index, as well.
>
> Next, consumers of the index will be guarded against operating on a
> --sparse-index by inserting calls to `ensure_full_index()` or
> + sparse-index by inserting calls to `ensure_full_index()` or
> -`expand_index_to_path()`. After these guards are in place, we can begin
> -leaving sparse-directory entries in the in-memory index structure.
> -+sparse-index by inserting calls to `ensure_full_index()` before iterating
> -+over all cache entries. If a specific path is requested, then those will
> ++`expand_index_to_path()`. If a specific path is requested, then those will
> +be protected from within the `index_file_exists()` and `index_name_pos()`
> -+API calls: they will call `ensure_full_index()` if necessary.
> ++API calls: they will call `ensure_full_index()` if necessary. The
> ++intention here is to preserve existing behavior when interacting with a
> ++sparse-checkout. We don't want a change to happen by accident, without
> ++tests. Many of these locations may not need any change before removing the
> ++guards, but we should not do so without tests to ensure the expected
> ++behavior happens.
> ++
> ++It may be desirable to _change_ the behavior of some commands in the
> ++presence of a sparse index or more generally in any sparse-checkout
> ++scenario. In such cases, these should be carefully communicated and
> ++tested. No such behavior changes are intended during this phase.
> +
> +During a scan of the codebase, not every iteration of the cache entries
> +needs an `ensure_full_index()` check. The basic reasons include:
> @@ Documentation/technical/sparse-index.txt: also introduce other features that hav
> +
> +6. The sparse-index is disabled at this point when using the split-index
> + feature, so no effort is made to protect the split-index API.
> -+
> -+After these guards are in place, we can begin leaving sparse-directory
> -+entries in the in-memory index structure.
>
> Even after inserting these guards, we will keep expanding sparse-indexes
> for most Git commands using the `command_requires_full_index` repository
> 1: 628e16dd3fc7 = 2: 098b2c9ef352 *: remove 'const' qualifier for struct index_state
> 2: 8e11e8917019 = 3: 737d27e18d64 read-cache: expand on query into sparse-directory entry
> 4: 7e4079c48eb2 = 4: db5c100f3e2b cache: move ensure_full_index() to cache.h
> 5: 142df1ab8e3e ! 5: 4a5fc2eb5a9f add: ensure full index
> @@ builtin/add.c: static int renormalize_tracked_files(const struct pathspec *paths
> {
> int i, retval = 0;
>
> ++ /* TODO: audit for interaction with sparse-index. */
> + ensure_full_index(&the_index);
> for (i = 0; i < active_nr; i++) {
> struct cache_entry *ce = active_cache[i];
> 6: bfa0164cc3c1 ! 6: 11c38f7277c5 checkout-index: ensure full index
> @@ builtin/checkout-index.c: static void checkout_all(const char *prefix, int prefi
> int i, errs = 0;
> struct cache_entry *last_ce = NULL;
>
> ++ /* TODO: audit for interaction with sparse-index. */
> + ensure_full_index(&the_index);
> for (i = 0; i < active_nr ; i++) {
> struct cache_entry *ce = active_cache[i];
> 7: 1bb7b6117e41 ! 7: fd04adbb3f79 checkout: ensure full index
> @@ builtin/checkout.c: static int checkout_worktree(const struct checkout_opts *opt
> NULL);
>
> enable_delayed_checkout(&state);
> ++
> ++ /* TODO: audit for interaction with sparse-index. */
> + ensure_full_index(&the_index);
> for (pos = 0; pos < active_nr; pos++) {
> struct cache_entry *ce = active_cache[pos];
> @@ builtin/checkout.c: static int checkout_paths(const struct checkout_opts *opts,
> * Make sure all pathspecs participated in locating the paths
> * to be checked out.
> */
> ++ /* TODO: audit for interaction with sparse-index. */
> + ensure_full_index(&the_index);
> for (pos = 0; pos < active_nr; pos++)
> if (opts->overlay_mode)
> 8: a59e45c4ae8f ! 8: 65704f39edc9 commit: ensure full index
> @@ builtin/commit.c: static int list_paths(struct string_list *list, const char *wi
> free(max_prefix);
> }
>
> ++ /* TODO: audit for interaction with sparse-index. */
> + ensure_full_index(&the_index);
> for (i = 0; i < active_nr; i++) {
> const struct cache_entry *ce = active_cache[i];
> @@ builtin/commit.c: static int prepare_to_commit(const char *index_file, const cha
> if (get_oid(parent, &oid)) {
> int i, ita_nr = 0;
>
> ++ /* TODO: audit for interaction with sparse-index. */
> + ensure_full_index(&the_index);
> for (i = 0; i < active_nr; i++)
> if (ce_intent_to_add(active_cache[i]))
> 9: 83040e7558f3 ! 9: 739f3fe9edf2 difftool: ensure full index
> @@ builtin/difftool.c: static int run_dir_diff(const char *extcmd, int symlinks, co
> setenv("GIT_DIFFTOOL_DIRDIFF", "true", 1);
> rc = run_command_v_opt(helper_argv, flags);
>
> ++ /* TODO: audit for interaction with sparse-index. */
> + ensure_full_index(&wtindex);
> +
> /*
> 10: 988f7bd2d736 ! 10: 779a86ad1ec4 fsck: ensure full index
> @@ builtin/fsck.c: int cmd_fsck(int argc, const char **argv, const char *prefix)
> verify_index_checksum = 1;
> verify_ce_order = 1;
> read_cache();
> ++ /* TODO: audit for interaction with sparse-index. */
> + ensure_full_index(&the_index);
> for (i = 0; i < active_nr; i++) {
> unsigned int mode;
> 11: abe189051a0c ! 11: 8c0d377054fa grep: ensure full index
> @@ builtin/grep.c: static int grep_cache(struct grep_opt *opt,
> if (repo_read_index(repo) < 0)
> die(_("index file corrupt"));
>
> ++ /* TODO: audit for interaction with sparse-index. */
> + ensure_full_index(repo->index);
> for (nr = 0; nr < repo->index->cache_nr; nr++) {
> const struct cache_entry *ce = repo->index->cache[nr];
> 12: 00c8a7e1d119 ! 12: beaa1467cabb ls-files: ensure full index
> @@ builtin/ls-files.c: static void show_files(struct repository *repo, struct dir_s
>
> if (!(show_cached || show_stage || show_deleted || show_modified))
> return;
> ++ /* TODO: audit for interaction with sparse-index. */
> + ensure_full_index(repo->index);
> for (i = 0; i < repo->index->cache_nr; i++) {
> const struct cache_entry *ce = repo->index->cache[i];
> @@ builtin/ls-files.c: void overlay_tree_on_index(struct index_state *istate,
> die("bad tree-ish %s", tree_name);
>
> /* Hoist the unmerged entries up to stage #3 to make room */
> ++ /* TODO: audit for interaction with sparse-index. */
> + ensure_full_index(istate);
> for (i = 0; i < istate->cache_nr; i++) {
> struct cache_entry *ce = istate->cache[i];
> 13: 1288f02177d2 ! 13: 73684141fcff merge-index: ensure full index
> @@ builtin/merge-index.c: static void merge_one_path(const char *path)
> static void merge_all(void)
> {
> int i;
> ++ /* TODO: audit for interaction with sparse-index. */
> + ensure_full_index(&the_index);
> for (i = 0; i < active_nr; i++) {
> const struct cache_entry *ce = active_cache[i];
> @@ builtin/merge-index.c: int cmd_merge_index(int argc, const char **argv, const ch
>
> read_cache();
>
> ++ /* TODO: audit for interaction with sparse-index. */
> + ensure_full_index(&the_index);
> +
> i = 1;
> 14: d6816560b32f ! 14: 6ea81a49c6b5 rm: ensure full index
> @@ builtin/rm.c: int cmd_rm(int argc, const char **argv, const char *prefix)
>
> seen = xcalloc(pathspec.nr, 1);
>
> ++ /* TODO: audit for interaction with sparse-index. */
> + ensure_full_index(&the_index);
> for (i = 0; i < active_nr; i++) {
> const struct cache_entry *ce = active_cache[i];
> 15: 92ccd31dd343 ! 15: 49ca5ed05c8d sparse-checkout: ensure full index
> @@ Metadata
> Author: Derrick Stolee <dstolee@microsoft.com>
>
> ## Commit message ##
> - sparse-checkout: ensure full index
> + stash: ensure full index
>
> - When adjusting the sparse-checkout definition, ensure that a sparse
> - index is expanded to a full one to avoid unexpected behavior. This is a
> - top candidate for later integration with the sparse-index, but requires
> - careful testing.
> + Before iterating over all cache entries, ensure that a sparse index is
> + expanded to a full index to avoid unexpected behavior.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
>
> @@ builtin/stash.c: static int do_push_stash(const struct pathspec *ps, const char
> int i;
> char *ps_matched = xcalloc(ps->nr, 1);
>
> ++ /* TODO: audit for interaction with sparse-index. */
> + ensure_full_index(&the_index);
> for (i = 0; i < active_nr; i++)
> ce_path_match(&the_index, active_cache[i], ps,
> 16: 796449758a08 ! 16: 9c4bb187c15d update-index: ensure full index
> @@ builtin/update-index.c: static int do_reupdate(int ac, const char **av,
> */
> has_head = 0;
> redo:
> ++ /* TODO: audit for interaction with sparse-index. */
> + ensure_full_index(&the_index);
> for (pos = 0; pos < active_nr; pos++) {
> const struct cache_entry *ce = active_cache[pos];
> 17: 45bbed6150a2 < -: ------------ diff-lib: ensure full index
> 18: 8ce1f80985a4 ! 17: fae4c078c3ef dir: ensure full index
> @@ dir.c: static void connect_wt_gitdir_in_nested(const char *sub_worktree,
> if (repo_read_index(&subrepo) < 0)
> die(_("index file corrupt in repo %s"), subrepo.gitdir);
>
> ++ /* TODO: audit for interaction with sparse-index. */
> + ensure_full_index(subrepo.index);
> for (i = 0; i < subrepo.index->cache_nr; i++) {
> const struct cache_entry *ce = subrepo.index->cache[i];
> 19: 6e0b452f44c1 ! 18: 2b9180ee77d3 entry: ensure full index
> @@ entry.c: static void mark_colliding_entries(const struct checkout *state,
>
> ce->ce_flags |= CE_MATCHED;
>
> ++ /* TODO: audit for interaction with sparse-index. */
> + ensure_full_index(state->istate);
> for (i = 0; i < state->istate->cache_nr; i++) {
> struct cache_entry *dup = state->istate->cache[i];
> 20: fb4c7038b746 < -: ------------ merge-ort: ensure full index
> 21: 57d59825627f ! 19: 1e3f6085a405 merge-recursive: ensure full index
> @@ merge-recursive.c: static struct string_list *get_unmerged(struct index_state *i
>
> unmerged->strdup_strings = 1;
>
> ++ /* TODO: audit for interaction with sparse-index. */
> + ensure_full_index(istate);
> for (i = 0; i < istate->cache_nr; i++) {
> struct string_list_item *item;
> 22: f4c470686b27 ! 20: e62a597a9725 pathspec: ensure full index
> @@ pathspec.c: void add_pathspec_matches_against_index(const struct pathspec *paths
> num_unmatched++;
> if (!num_unmatched)
> return;
> ++ /* TODO: audit for interaction with sparse-index. */
> + ensure_full_index(istate);
> for (i = 0; i < istate->cache_nr; i++) {
> const struct cache_entry *ce = istate->cache[i];
> 23: 81519782d4b2 ! 21: ebfffdbdd6ad read-cache: ensure full index
> @@ read-cache.c: int refresh_index(struct index_state *istate, unsigned int flags,
> */
> preload_index(istate, pathspec, 0);
> trace2_region_enter("index", "refresh", NULL);
> ++ /* TODO: audit for interaction with sparse-index. */
> + ensure_full_index(istate);
> for (i = 0; i < istate->cache_nr; i++) {
> struct cache_entry *ce, *new_entry;
> @@ read-cache.c: int repo_index_has_changes(struct repository *repo,
> diff_flush(&opt);
> return opt.flags.has_changes != 0;
> } else {
> ++ /* TODO: audit for interaction with sparse-index. */
> + ensure_full_index(istate);
> for (i = 0; sb && i < istate->cache_nr; i++) {
> if (i)
> 24: 39e8bea0c1ca ! 22: 495b07a87973 resolve-undo: ensure full index
> @@ resolve-undo.c: void unmerge_marked_index(struct index_state *istate)
> if (!istate->resolve_undo)
> return;
>
> ++ /* TODO: audit for interaction with sparse-index. */
> + ensure_full_index(istate);
> for (i = 0; i < istate->cache_nr; i++) {
> const struct cache_entry *ce = istate->cache[i];
> @@ resolve-undo.c: void unmerge_index(struct index_state *istate, const struct path
> if (!istate->resolve_undo)
> return;
>
> ++ /* TODO: audit for interaction with sparse-index. */
> + ensure_full_index(istate);
> for (i = 0; i < istate->cache_nr; i++) {
> const struct cache_entry *ce = istate->cache[i];
> 25: 6abb1813ae10 ! 23: 3144114d1a75 revision: ensure full index
> @@ revision.c: static void do_add_index_objects_to_pending(struct rev_info *revs,
> {
> int i;
>
> ++ /* TODO: audit for interaction with sparse-index. */
> + ensure_full_index(istate);
> for (i = 0; i < istate->cache_nr; i++) {
> struct cache_entry *ce = istate->cache[i];
> 26: 1690d33c11e0 = 24: d52c72b4a7b9 sparse-index: expand_to_path()
> 27: 40cfc59f695a = 25: 7e2d3fae9a2a name-hash: use expand_to_path()
>
> --
> gitgitgadget
^ permalink raw reply [flat|nested] 111+ messages in thread
* [PATCH v3 00/26] Sparse Index: API protections
2021-04-01 1:49 ` [PATCH v2 00/25] " Derrick Stolee via GitGitGadget
` (26 preceding siblings ...)
2021-04-05 19:55 ` Elijah Newren
@ 2021-04-12 21:07 ` Derrick Stolee via GitGitGadget
2021-04-12 21:07 ` [PATCH v3 01/26] sparse-index: API protection strategy Derrick Stolee via GitGitGadget
` (26 more replies)
27 siblings, 27 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-12 21:07 UTC (permalink / raw)
To: git
Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
Derrick Stolee
Here is the second patch series submission coming out of the sparse-index
RFC [1].
[1]
https://lore.kernel.org/git/pull.847.git.1611596533.gitgitgadget@gmail.com/
This is based on ds/sparse-index.
The point of this series is to insert protections for the consumers of the
in-memory index to avoid unintended behavior change when using a sparse
index versus a full one.
We mark certain regions of code as needing a full index, so we call
ensure_full_index() to expand a sparse index to a full one, if necessary.
These protections are inserted file-by-file in every loop over all cache
entries. Well, "most" loops, because some are going to be handled in the
very next series so I leave them out.
Many callers use index_name_pos() to find a path by name. In these cases, we
can check if that position resolves to a sparse directory instance. In those
cases, we just expand to a full index and run the search again.
The last few patches deal with the name-hash hashtable for doing O(1)
lookups.
These protections don't do much right now, since the previous series created
the_repository->settings.command_requires_full_index to guard all index
reads and writes to ensure the in-memory copy is full for commands that have
not been tested with the sparse index yet.
However, after this series is complete, we now have a straight-forward plan
for making commands "sparse aware" one-by-one:
1. Disable settings.command_requires_full_index to allow an in-memory
sparse-index.
2. Run versions of that command under a debugger, breaking on
ensure_full_index().
3. Examine the call stack to determine the context of that expansion, then
implement the proper behavior in those locations.
4. Add tests to ensure we are checking this logic in the presence of sparse
directory entries.
I will admit that mostly it is the writing of the test cases that takes the
most time in the conversions I've done so far.
Updates in v3
=============
* I updated based on Elijah's feedback.
* One new patch splits out a change that Elijah (rightfully) pointed out
did not belong with the patch it was originally in.
I gave it time to see if any other comments came in, but it looks like
review stabilized. I probably waited a bit longer than I should have.
Updates in v2
=============
* Rebased onto v5 of ds/sparse-index
* Updated the technical doc to describe how these protections are guards to
keep behavior consistent between a sparse-index and a full index. Whether
or not that behavior is "correct" can be interrogated later.
* Calls to ensure_full_index() are marked with a TODO comment saying these
calls should be audited later (with tests).
* Fixed an incorrectly squashed commit message.
* Dropped the diff-lib.c commit because it was erroneously included in v2.
* Dropped the merge-ort.c commit because of conflicts with work in flight
and a quick audit that it is not needed.
* I reviewed the merge of this topic with mt/add-rm-in-sparse-checkout and
found it equivalent to what I would have done.
Thanks, -Stolee
Derrick Stolee (26):
sparse-index: API protection strategy
*: remove 'const' qualifier for struct index_state
read-cache: expand on query into sparse-directory entry
cache: move ensure_full_index() to cache.h
add: ensure full index
checkout-index: ensure full index
checkout: ensure full index
commit: ensure full index
difftool: ensure full index
fsck: ensure full index
grep: ensure full index
ls-files: ensure full index
merge-index: ensure full index
rm: ensure full index
stash: ensure full index
update-index: ensure full index
dir: ensure full index
entry: ensure full index
merge-recursive: ensure full index
pathspec: ensure full index
read-cache: ensure full index
resolve-undo: ensure full index
revision: ensure full index
name-hash: don't add directories to name_hash
sparse-index: expand_to_path()
name-hash: use expand_to_path()
Documentation/technical/sparse-index.txt | 37 +++++++++++-
attr.c | 14 ++---
attr.h | 4 +-
builtin/add.c | 2 +
builtin/checkout-index.c | 2 +
builtin/checkout.c | 5 ++
builtin/commit.c | 4 ++
builtin/difftool.c | 3 +
builtin/fsck.c | 2 +
builtin/grep.c | 2 +
builtin/ls-files.c | 14 +++--
builtin/merge-index.c | 5 ++
builtin/rm.c | 2 +
builtin/stash.c | 2 +
builtin/update-index.c | 2 +
cache.h | 7 ++-
convert.c | 26 ++++-----
convert.h | 20 +++----
dir.c | 14 +++--
dir.h | 8 +--
entry.c | 2 +
merge-recursive.c | 4 +-
name-hash.c | 11 +++-
pathspec.c | 8 ++-
pathspec.h | 6 +-
read-cache.c | 35 ++++++++++--
resolve-undo.c | 4 ++
revision.c | 2 +
sparse-index.c | 73 ++++++++++++++++++++++++
sparse-index.h | 14 ++++-
submodule.c | 6 +-
submodule.h | 6 +-
32 files changed, 273 insertions(+), 73 deletions(-)
base-commit: c9e40ae8ec41c5566e5849a87c969fa81ef49fcd
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-906%2Fderrickstolee%2Fsparse-index%2Fprotections-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-906/derrickstolee/sparse-index/protections-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/906
Range-diff vs v2:
1: 7484e085e342 = 1: 4731f610ba6e sparse-index: API protection strategy
2: 098b2c9ef352 = 2: d3a92538edb6 *: remove 'const' qualifier for struct index_state
3: 737d27e18d64 = 3: f4b77aa18b93 read-cache: expand on query into sparse-directory entry
4: db5c100f3e2b = 4: da17774a53c5 cache: move ensure_full_index() to cache.h
5: 4a5fc2eb5a9f = 5: b59c9f482828 add: ensure full index
6: 11c38f7277c5 = 6: 0082855b5961 checkout-index: ensure full index
7: fd04adbb3f79 = 7: e2ac527143ff checkout: ensure full index
8: 65704f39edc9 = 8: 1a3b51fd3c4b commit: ensure full index
9: 739f3fe9edf2 = 9: 8c61d40dfe01 difftool: ensure full index
10: 779a86ad1ec4 = 10: 45b603379422 fsck: ensure full index
11: 8c0d377054fa = 11: 97124e9fdc7f grep: ensure full index
12: beaa1467cabb = 12: b00e214515e8 ls-files: ensure full index
13: 73684141fcff = 13: 6497f2ce225b merge-index: ensure full index
14: 6ea81a49c6b5 = 14: 175f3bc6b336 rm: ensure full index
15: 49ca5ed05c8d = 15: daa77e84e0e2 stash: ensure full index
16: 9c4bb187c15d = 16: 8c5336964d9b update-index: ensure full index
17: fae4c078c3ef = 17: 08a62c23c8f7 dir: ensure full index
18: 2b9180ee77d3 = 18: 825ebceee508 entry: ensure full index
19: 1e3f6085a405 = 19: 3673db517235 merge-recursive: ensure full index
20: e62a597a9725 = 20: 4d3f6de29a63 pathspec: ensure full index
21: ebfffdbdd6ad = 21: bda9cab15966 read-cache: ensure full index
22: 495b07a87973 = 22: 38f295a41ec1 resolve-undo: ensure full index
23: 3144114d1a75 = 23: f928e104f0d3 revision: ensure full index
-: ------------ > 24: 5fd83dcf2747 name-hash: don't add directories to name_hash
24: d52c72b4a7b9 ! 25: 335fec3676a0 sparse-index: expand_to_path()
@@ sparse-index.c: void ensure_full_index(struct index_state *istate)
+ path_mutable.len, icase)) {
+ /*
+ * We found a parent directory in the name-hash
-+ * hashtable, which means that this entry could
-+ * exist within a sparse-directory entry. Expand
-+ * accordingly.
++ * hashtable, because only sparse directory entries
++ * have a trailing '/' character. Since "path" wasn't
++ * in the index, perhaps it exists within this
++ * sparse-directory. Expand accordingly.
+ */
+ ensure_full_index(istate);
+ break;
25: 7e2d3fae9a2a ! 26: 1f3af8a886e5 name-hash: use expand_to_path()
@@ name-hash.c
struct dir_entry {
struct hashmap_entry ent;
-@@ name-hash.c: static void hash_index_entry(struct index_state *istate, struct cache_entry *ce)
- if (ce->ce_flags & CE_HASHED)
- return;
- ce->ce_flags |= CE_HASHED;
-+
-+ if (S_ISSPARSEDIR(ce->ce_mode)) {
-+ add_dir_entry(istate, ce);
-+ return;
-+ }
-+
- hashmap_entry_init(&ce->ent, memihash(ce->name, ce_namelen(ce)));
- hashmap_add(&istate->name_hash, &ce->ent);
-
@@ name-hash.c: int index_dir_exists(struct index_state *istate, const char *name, int namelen)
struct dir_entry *dir;
--
gitgitgadget
^ permalink raw reply [flat|nested] 111+ messages in thread
* [PATCH v3 01/26] sparse-index: API protection strategy
2021-04-12 21:07 ` [PATCH v3 00/26] " Derrick Stolee via GitGitGadget
@ 2021-04-12 21:07 ` Derrick Stolee via GitGitGadget
2021-04-12 21:07 ` [PATCH v3 02/26] *: remove 'const' qualifier for struct index_state Derrick Stolee via GitGitGadget
` (25 subsequent siblings)
26 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-12 21:07 UTC (permalink / raw)
To: git
Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
Derrick Stolee, Derrick Stolee
From: Derrick Stolee <dstolee@microsoft.com>
Edit and expand the sparse-index design document with the plan for
guarding index operations with ensure_full_index().
Notably, the plan has changed to not have an expand_to_path() method in
favor of checking for a sparse-directory hit inside of the
index_path_pos() API.
The changes that follow this one will incrementally add
ensure_full_index() guards to iterations over all cache entries. Some
iterations over the cache entries are not protected due to a few
categories listed in the document. Since these are not being modified,
here is a short list of the files and methods that will not receive
these guards:
Looking for non-zero stage:
* builtin/add.c:chmod_pathspec()
* builtin/merge.c:count_unmerged_entries()
* merge-ort.c:record_conflicted_index_entries()
* read-cache.c:unmerged_index()
* rerere.c:check_one_conflict(), find_conflict(), rerere_remaining()
* revision.c:prepare_show_merge()
* sequencer.c:append_conflicts_hint()
* wt-status.c:wt_status_collect_changes_initial()
Looking for submodules:
* builtin/submodule--helper.c:module_list_compute()
* submodule.c: several methods
* worktree.c:validate_no_submodules()
Part of the index API:
* name-hash.c: lazy init methods
* preload-index.c:preload_thread(), preload_index()
* read-cache.c: file format methods
Checking for correct order of cache entries:
* read-cache.c:check_ce_order()
Ignores SKIP_WORKTREE entries or already aware:
* unpack-trees.c:mark_new_skip_worktree()
* wt-status.c:wt_status_check_sparse_checkout()
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
Documentation/technical/sparse-index.txt | 37 ++++++++++++++++++++++--
1 file changed, 35 insertions(+), 2 deletions(-)
diff --git a/Documentation/technical/sparse-index.txt b/Documentation/technical/sparse-index.txt
index 8d3d80804604..3b24c1a219f8 100644
--- a/Documentation/technical/sparse-index.txt
+++ b/Documentation/technical/sparse-index.txt
@@ -85,8 +85,41 @@ index, as well.
Next, consumers of the index will be guarded against operating on a
sparse-index by inserting calls to `ensure_full_index()` or
-`expand_index_to_path()`. After these guards are in place, we can begin
-leaving sparse-directory entries in the in-memory index structure.
+`expand_index_to_path()`. If a specific path is requested, then those will
+be protected from within the `index_file_exists()` and `index_name_pos()`
+API calls: they will call `ensure_full_index()` if necessary. The
+intention here is to preserve existing behavior when interacting with a
+sparse-checkout. We don't want a change to happen by accident, without
+tests. Many of these locations may not need any change before removing the
+guards, but we should not do so without tests to ensure the expected
+behavior happens.
+
+It may be desirable to _change_ the behavior of some commands in the
+presence of a sparse index or more generally in any sparse-checkout
+scenario. In such cases, these should be carefully communicated and
+tested. No such behavior changes are intended during this phase.
+
+During a scan of the codebase, not every iteration of the cache entries
+needs an `ensure_full_index()` check. The basic reasons include:
+
+1. The loop is scanning for entries with non-zero stage. These entries
+ are not collapsed into a sparse-directory entry.
+
+2. The loop is scanning for submodules. These entries are not collapsed
+ into a sparse-directory entry.
+
+3. The loop is part of the index API, especially around reading or
+ writing the format.
+
+4. The loop is checking for correct order of cache entries and that is
+ correct if and only if the sparse-directory entries are in the correct
+ location.
+
+5. The loop ignores entries with the `SKIP_WORKTREE` bit set, or is
+ otherwise already aware of sparse directory entries.
+
+6. The sparse-index is disabled at this point when using the split-index
+ feature, so no effort is made to protect the split-index API.
Even after inserting these guards, we will keep expanding sparse-indexes
for most Git commands using the `command_requires_full_index` repository
--
gitgitgadget
^ permalink raw reply related [flat|nested] 111+ messages in thread
* [PATCH v3 02/26] *: remove 'const' qualifier for struct index_state
2021-04-12 21:07 ` [PATCH v3 00/26] " Derrick Stolee via GitGitGadget
2021-04-12 21:07 ` [PATCH v3 01/26] sparse-index: API protection strategy Derrick Stolee via GitGitGadget
@ 2021-04-12 21:07 ` Derrick Stolee via GitGitGadget
2021-04-12 21:07 ` [PATCH v3 03/26] read-cache: expand on query into sparse-directory entry Derrick Stolee via GitGitGadget
` (24 subsequent siblings)
26 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-12 21:07 UTC (permalink / raw)
To: git
Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
Derrick Stolee, Derrick Stolee
From: Derrick Stolee <dstolee@microsoft.com>
Several methods specify that they take a 'struct index_state' pointer
with the 'const' qualifier because they intend to only query the data,
not change it. However, we will be introducing a step very low in the
method stack that might modify a sparse-index to become a full index in
the case that our queries venture inside a sparse-directory entry.
This change only removes the 'const' qualifiers that are necessary for
the following change which will actually modify the implementation of
index_name_stage_pos().
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
attr.c | 14 +++++++-------
attr.h | 4 ++--
builtin/ls-files.c | 10 +++++-----
cache.h | 6 +++---
convert.c | 26 +++++++++++++-------------
convert.h | 20 ++++++++++----------
dir.c | 12 ++++++------
dir.h | 8 ++++----
merge-recursive.c | 2 +-
pathspec.c | 6 +++---
pathspec.h | 6 +++---
read-cache.c | 10 +++++-----
submodule.c | 6 +++---
submodule.h | 6 +++---
14 files changed, 68 insertions(+), 68 deletions(-)
diff --git a/attr.c b/attr.c
index 4ef85d668b54..8de553293ee4 100644
--- a/attr.c
+++ b/attr.c
@@ -718,7 +718,7 @@ static struct attr_stack *read_attr_from_file(const char *path, int macro_ok)
return res;
}
-static struct attr_stack *read_attr_from_index(const struct index_state *istate,
+static struct attr_stack *read_attr_from_index(struct index_state *istate,
const char *path,
int macro_ok)
{
@@ -748,7 +748,7 @@ static struct attr_stack *read_attr_from_index(const struct index_state *istate,
return res;
}
-static struct attr_stack *read_attr(const struct index_state *istate,
+static struct attr_stack *read_attr(struct index_state *istate,
const char *path, int macro_ok)
{
struct attr_stack *res = NULL;
@@ -840,7 +840,7 @@ static void push_stack(struct attr_stack **attr_stack_p,
}
}
-static void bootstrap_attr_stack(const struct index_state *istate,
+static void bootstrap_attr_stack(struct index_state *istate,
struct attr_stack **stack)
{
struct attr_stack *e;
@@ -878,7 +878,7 @@ static void bootstrap_attr_stack(const struct index_state *istate,
push_stack(stack, e, NULL, 0);
}
-static void prepare_attr_stack(const struct index_state *istate,
+static void prepare_attr_stack(struct index_state *istate,
const char *path, int dirlen,
struct attr_stack **stack)
{
@@ -1078,7 +1078,7 @@ static void determine_macros(struct all_attrs_item *all_attrs,
* If check->check_nr is non-zero, only attributes in check[] are collected.
* Otherwise all attributes are collected.
*/
-static void collect_some_attrs(const struct index_state *istate,
+static void collect_some_attrs(struct index_state *istate,
const char *path,
struct attr_check *check)
{
@@ -1107,7 +1107,7 @@ static void collect_some_attrs(const struct index_state *istate,
fill(path, pathlen, basename_offset, check->stack, check->all_attrs, rem);
}
-void git_check_attr(const struct index_state *istate,
+void git_check_attr(struct index_state *istate,
const char *path,
struct attr_check *check)
{
@@ -1124,7 +1124,7 @@ void git_check_attr(const struct index_state *istate,
}
}
-void git_all_attrs(const struct index_state *istate,
+void git_all_attrs(struct index_state *istate,
const char *path, struct attr_check *check)
{
int i;
diff --git a/attr.h b/attr.h
index 404548f028a8..3732505edae8 100644
--- a/attr.h
+++ b/attr.h
@@ -190,14 +190,14 @@ void attr_check_free(struct attr_check *check);
*/
const char *git_attr_name(const struct git_attr *);
-void git_check_attr(const struct index_state *istate,
+void git_check_attr(struct index_state *istate,
const char *path, struct attr_check *check);
/*
* Retrieve all attributes that apply to the specified path.
* check holds the attributes and their values.
*/
-void git_all_attrs(const struct index_state *istate,
+void git_all_attrs(struct index_state *istate,
const char *path, struct attr_check *check);
enum git_attr_direction {
diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index 60a2913a01e9..4f9ed1fb29b7 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -57,7 +57,7 @@ static const char *tag_modified = "";
static const char *tag_skip_worktree = "";
static const char *tag_resolve_undo = "";
-static void write_eolinfo(const struct index_state *istate,
+static void write_eolinfo(struct index_state *istate,
const struct cache_entry *ce, const char *path)
{
if (show_eol) {
@@ -122,7 +122,7 @@ static void print_debug(const struct cache_entry *ce)
}
}
-static void show_dir_entry(const struct index_state *istate,
+static void show_dir_entry(struct index_state *istate,
const char *tag, struct dir_entry *ent)
{
int len = max_prefix_len;
@@ -139,7 +139,7 @@ static void show_dir_entry(const struct index_state *istate,
write_name(ent->name);
}
-static void show_other_files(const struct index_state *istate,
+static void show_other_files(struct index_state *istate,
const struct dir_struct *dir)
{
int i;
@@ -152,7 +152,7 @@ static void show_other_files(const struct index_state *istate,
}
}
-static void show_killed_files(const struct index_state *istate,
+static void show_killed_files(struct index_state *istate,
const struct dir_struct *dir)
{
int i;
@@ -254,7 +254,7 @@ static void show_ce(struct repository *repo, struct dir_struct *dir,
}
}
-static void show_ru_info(const struct index_state *istate)
+static void show_ru_info(struct index_state *istate)
{
struct string_list_item *item;
diff --git a/cache.h b/cache.h
index 8aede373aeb3..5006278c13ca 100644
--- a/cache.h
+++ b/cache.h
@@ -800,7 +800,7 @@ struct cache_entry *index_file_exists(struct index_state *istate, const char *na
* index_name_pos(&index, "f", 1) -> -3
* index_name_pos(&index, "g", 1) -> -5
*/
-int index_name_pos(const struct index_state *, const char *name, int namelen);
+int index_name_pos(struct index_state *, const char *name, int namelen);
/*
* Some functions return the negative complement of an insert position when a
@@ -850,8 +850,8 @@ int add_file_to_index(struct index_state *, const char *path, int flags);
int chmod_index_entry(struct index_state *, struct cache_entry *ce, char flip);
int ce_same_name(const struct cache_entry *a, const struct cache_entry *b);
void set_object_name_for_intent_to_add_entry(struct cache_entry *ce);
-int index_name_is_other(const struct index_state *, const char *, int);
-void *read_blob_data_from_index(const struct index_state *, const char *, unsigned long *);
+int index_name_is_other(struct index_state *, const char *, int);
+void *read_blob_data_from_index(struct index_state *, const char *, unsigned long *);
/* do stat comparison even if CE_VALID is true */
#define CE_MATCH_IGNORE_VALID 01
diff --git a/convert.c b/convert.c
index ee360c2f07ce..20b19abffa36 100644
--- a/convert.c
+++ b/convert.c
@@ -138,7 +138,7 @@ static const char *gather_convert_stats_ascii(const char *data, unsigned long si
}
}
-const char *get_cached_convert_stats_ascii(const struct index_state *istate,
+const char *get_cached_convert_stats_ascii(struct index_state *istate,
const char *path)
{
const char *ret;
@@ -222,7 +222,7 @@ static void check_global_conv_flags_eol(const char *path,
}
}
-static int has_crlf_in_index(const struct index_state *istate, const char *path)
+static int has_crlf_in_index(struct index_state *istate, const char *path)
{
unsigned long sz;
void *data;
@@ -496,7 +496,7 @@ static int encode_to_worktree(const char *path, const char *src, size_t src_len,
return 1;
}
-static int crlf_to_git(const struct index_state *istate,
+static int crlf_to_git(struct index_state *istate,
const char *path, const char *src, size_t len,
struct strbuf *buf,
enum crlf_action crlf_action, int conv_flags)
@@ -1307,7 +1307,7 @@ struct conv_attrs {
static struct attr_check *check;
-static void convert_attrs(const struct index_state *istate,
+static void convert_attrs(struct index_state *istate,
struct conv_attrs *ca, const char *path)
{
struct attr_check_item *ccheck = NULL;
@@ -1369,7 +1369,7 @@ void reset_parsed_attributes(void)
user_convert_tail = NULL;
}
-int would_convert_to_git_filter_fd(const struct index_state *istate, const char *path)
+int would_convert_to_git_filter_fd(struct index_state *istate, const char *path)
{
struct conv_attrs ca;
@@ -1388,7 +1388,7 @@ int would_convert_to_git_filter_fd(const struct index_state *istate, const char
return apply_filter(path, NULL, 0, -1, NULL, ca.drv, CAP_CLEAN, NULL, NULL);
}
-const char *get_convert_attr_ascii(const struct index_state *istate, const char *path)
+const char *get_convert_attr_ascii(struct index_state *istate, const char *path)
{
struct conv_attrs ca;
@@ -1414,7 +1414,7 @@ const char *get_convert_attr_ascii(const struct index_state *istate, const char
return "";
}
-int convert_to_git(const struct index_state *istate,
+int convert_to_git(struct index_state *istate,
const char *path, const char *src, size_t len,
struct strbuf *dst, int conv_flags)
{
@@ -1448,7 +1448,7 @@ int convert_to_git(const struct index_state *istate,
return ret | ident_to_git(src, len, dst, ca.ident);
}
-void convert_to_git_filter_fd(const struct index_state *istate,
+void convert_to_git_filter_fd(struct index_state *istate,
const char *path, int fd, struct strbuf *dst,
int conv_flags)
{
@@ -1466,7 +1466,7 @@ void convert_to_git_filter_fd(const struct index_state *istate,
ident_to_git(dst->buf, dst->len, dst, ca.ident);
}
-static int convert_to_working_tree_internal(const struct index_state *istate,
+static int convert_to_working_tree_internal(struct index_state *istate,
const char *path, const char *src,
size_t len, struct strbuf *dst,
int normalizing,
@@ -1510,7 +1510,7 @@ static int convert_to_working_tree_internal(const struct index_state *istate,
return ret | ret_filter;
}
-int async_convert_to_working_tree(const struct index_state *istate,
+int async_convert_to_working_tree(struct index_state *istate,
const char *path, const char *src,
size_t len, struct strbuf *dst,
const struct checkout_metadata *meta,
@@ -1519,7 +1519,7 @@ int async_convert_to_working_tree(const struct index_state *istate,
return convert_to_working_tree_internal(istate, path, src, len, dst, 0, meta, dco);
}
-int convert_to_working_tree(const struct index_state *istate,
+int convert_to_working_tree(struct index_state *istate,
const char *path, const char *src,
size_t len, struct strbuf *dst,
const struct checkout_metadata *meta)
@@ -1527,7 +1527,7 @@ int convert_to_working_tree(const struct index_state *istate,
return convert_to_working_tree_internal(istate, path, src, len, dst, 0, meta, NULL);
}
-int renormalize_buffer(const struct index_state *istate, const char *path,
+int renormalize_buffer(struct index_state *istate, const char *path,
const char *src, size_t len, struct strbuf *dst)
{
int ret = convert_to_working_tree_internal(istate, path, src, len, dst, 1, NULL, NULL);
@@ -1964,7 +1964,7 @@ static struct stream_filter *ident_filter(const struct object_id *oid)
* Note that you would be crazy to set CRLF, smudge/clean or ident to a
* large binary blob you would want us not to slurp into the memory!
*/
-struct stream_filter *get_stream_filter(const struct index_state *istate,
+struct stream_filter *get_stream_filter(struct index_state *istate,
const char *path,
const struct object_id *oid)
{
diff --git a/convert.h b/convert.h
index e29d1026a686..0f7c8a1f0446 100644
--- a/convert.h
+++ b/convert.h
@@ -65,41 +65,41 @@ struct checkout_metadata {
extern enum eol core_eol;
extern char *check_roundtrip_encoding;
-const char *get_cached_convert_stats_ascii(const struct index_state *istate,
+const char *get_cached_convert_stats_ascii(struct index_state *istate,
const char *path);
const char *get_wt_convert_stats_ascii(const char *path);
-const char *get_convert_attr_ascii(const struct index_state *istate,
+const char *get_convert_attr_ascii(struct index_state *istate,
const char *path);
/* returns 1 if *dst was used */
-int convert_to_git(const struct index_state *istate,
+int convert_to_git(struct index_state *istate,
const char *path, const char *src, size_t len,
struct strbuf *dst, int conv_flags);
-int convert_to_working_tree(const struct index_state *istate,
+int convert_to_working_tree(struct index_state *istate,
const char *path, const char *src,
size_t len, struct strbuf *dst,
const struct checkout_metadata *meta);
-int async_convert_to_working_tree(const struct index_state *istate,
+int async_convert_to_working_tree(struct index_state *istate,
const char *path, const char *src,
size_t len, struct strbuf *dst,
const struct checkout_metadata *meta,
void *dco);
int async_query_available_blobs(const char *cmd,
struct string_list *available_paths);
-int renormalize_buffer(const struct index_state *istate,
+int renormalize_buffer(struct index_state *istate,
const char *path, const char *src, size_t len,
struct strbuf *dst);
-static inline int would_convert_to_git(const struct index_state *istate,
+static inline int would_convert_to_git(struct index_state *istate,
const char *path)
{
return convert_to_git(istate, path, NULL, 0, NULL, 0);
}
/* Precondition: would_convert_to_git_filter_fd(path) == true */
-void convert_to_git_filter_fd(const struct index_state *istate,
+void convert_to_git_filter_fd(struct index_state *istate,
const char *path, int fd,
struct strbuf *dst,
int conv_flags);
-int would_convert_to_git_filter_fd(const struct index_state *istate,
+int would_convert_to_git_filter_fd(struct index_state *istate,
const char *path);
/*
@@ -133,7 +133,7 @@ void reset_parsed_attributes(void);
struct stream_filter; /* opaque */
-struct stream_filter *get_stream_filter(const struct index_state *istate,
+struct stream_filter *get_stream_filter(struct index_state *istate,
const char *path,
const struct object_id *);
void free_stream_filter(struct stream_filter *);
diff --git a/dir.c b/dir.c
index fd8aa7c40faa..5b00dfb5b144 100644
--- a/dir.c
+++ b/dir.c
@@ -306,7 +306,7 @@ static int do_read_blob(const struct object_id *oid, struct oid_stat *oid_stat,
* [1] Only if DO_MATCH_DIRECTORY is passed; otherwise, this is NOT a match.
* [2] Only if DO_MATCH_LEADING_PATHSPEC is passed; otherwise, not a match.
*/
-static int match_pathspec_item(const struct index_state *istate,
+static int match_pathspec_item(struct index_state *istate,
const struct pathspec_item *item, int prefix,
const char *name, int namelen, unsigned flags)
{
@@ -429,7 +429,7 @@ static int match_pathspec_item(const struct index_state *istate,
* pathspec did not match any names, which could indicate that the
* user mistyped the nth pathspec.
*/
-static int do_match_pathspec(const struct index_state *istate,
+static int do_match_pathspec(struct index_state *istate,
const struct pathspec *ps,
const char *name, int namelen,
int prefix, char *seen,
@@ -500,7 +500,7 @@ static int do_match_pathspec(const struct index_state *istate,
return retval;
}
-static int match_pathspec_with_flags(const struct index_state *istate,
+static int match_pathspec_with_flags(struct index_state *istate,
const struct pathspec *ps,
const char *name, int namelen,
int prefix, char *seen, unsigned flags)
@@ -516,7 +516,7 @@ static int match_pathspec_with_flags(const struct index_state *istate,
return negative ? 0 : positive;
}
-int match_pathspec(const struct index_state *istate,
+int match_pathspec(struct index_state *istate,
const struct pathspec *ps,
const char *name, int namelen,
int prefix, char *seen, int is_dir)
@@ -529,7 +529,7 @@ int match_pathspec(const struct index_state *istate,
/**
* Check if a submodule is a superset of the pathspec
*/
-int submodule_path_match(const struct index_state *istate,
+int submodule_path_match(struct index_state *istate,
const struct pathspec *ps,
const char *submodule_name,
char *seen)
@@ -892,7 +892,7 @@ void add_pattern(const char *string, const char *base,
add_pattern_to_hashsets(pl, pattern);
}
-static int read_skip_worktree_file_from_index(const struct index_state *istate,
+static int read_skip_worktree_file_from_index(struct index_state *istate,
const char *path,
size_t *size_out, char **data_out,
struct oid_stat *oid_stat)
diff --git a/dir.h b/dir.h
index facfae47402a..51cb0e217247 100644
--- a/dir.h
+++ b/dir.h
@@ -354,7 +354,7 @@ int count_slashes(const char *s);
int simple_length(const char *match);
int no_wildcard(const char *string);
char *common_prefix(const struct pathspec *pathspec);
-int match_pathspec(const struct index_state *istate,
+int match_pathspec(struct index_state *istate,
const struct pathspec *pathspec,
const char *name, int namelen,
int prefix, char *seen, int is_dir);
@@ -492,12 +492,12 @@ int git_fnmatch(const struct pathspec_item *item,
const char *pattern, const char *string,
int prefix);
-int submodule_path_match(const struct index_state *istate,
+int submodule_path_match(struct index_state *istate,
const struct pathspec *ps,
const char *submodule_name,
char *seen);
-static inline int ce_path_match(const struct index_state *istate,
+static inline int ce_path_match(struct index_state *istate,
const struct cache_entry *ce,
const struct pathspec *pathspec,
char *seen)
@@ -506,7 +506,7 @@ static inline int ce_path_match(const struct index_state *istate,
S_ISDIR(ce->ce_mode) || S_ISGITLINK(ce->ce_mode));
}
-static inline int dir_path_match(const struct index_state *istate,
+static inline int dir_path_match(struct index_state *istate,
const struct dir_entry *ent,
const struct pathspec *pathspec,
int prefix, char *seen)
diff --git a/merge-recursive.c b/merge-recursive.c
index 3d9207455b3a..b8de7a704eae 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -2987,7 +2987,7 @@ static int blob_unchanged(struct merge_options *opt,
struct strbuf obuf = STRBUF_INIT;
struct strbuf abuf = STRBUF_INIT;
int ret = 0; /* assume changed for safety */
- const struct index_state *idx = opt->repo->index;
+ struct index_state *idx = opt->repo->index;
if (a->mode != o->mode)
return 0;
diff --git a/pathspec.c b/pathspec.c
index 7a229d8d22f2..b6e333965cb4 100644
--- a/pathspec.c
+++ b/pathspec.c
@@ -20,7 +20,7 @@
* to use find_pathspecs_matching_against_index() instead.
*/
void add_pathspec_matches_against_index(const struct pathspec *pathspec,
- const struct index_state *istate,
+ struct index_state *istate,
char *seen)
{
int num_unmatched = 0, i;
@@ -51,7 +51,7 @@ void add_pathspec_matches_against_index(const struct pathspec *pathspec,
* given pathspecs achieves against all items in the index.
*/
char *find_pathspecs_matching_against_index(const struct pathspec *pathspec,
- const struct index_state *istate)
+ struct index_state *istate)
{
char *seen = xcalloc(pathspec->nr, 1);
add_pathspec_matches_against_index(pathspec, istate, seen);
@@ -702,7 +702,7 @@ void clear_pathspec(struct pathspec *pathspec)
pathspec->nr = 0;
}
-int match_pathspec_attrs(const struct index_state *istate,
+int match_pathspec_attrs(struct index_state *istate,
const char *name, int namelen,
const struct pathspec_item *item)
{
diff --git a/pathspec.h b/pathspec.h
index 454ce364fac7..2ccc8080b6cc 100644
--- a/pathspec.h
+++ b/pathspec.h
@@ -150,11 +150,11 @@ static inline int ps_strcmp(const struct pathspec_item *item,
}
void add_pathspec_matches_against_index(const struct pathspec *pathspec,
- const struct index_state *istate,
+ struct index_state *istate,
char *seen);
char *find_pathspecs_matching_against_index(const struct pathspec *pathspec,
- const struct index_state *istate);
-int match_pathspec_attrs(const struct index_state *istate,
+ struct index_state *istate);
+int match_pathspec_attrs(struct index_state *istate,
const char *name, int namelen,
const struct pathspec_item *item);
diff --git a/read-cache.c b/read-cache.c
index 2410e6e0df13..3ad94578095e 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -549,7 +549,7 @@ int cache_name_stage_compare(const char *name1, int len1, int stage1, const char
return 0;
}
-static int index_name_stage_pos(const struct index_state *istate, const char *name, int namelen, int stage)
+static int index_name_stage_pos(struct index_state *istate, const char *name, int namelen, int stage)
{
int first, last;
@@ -570,7 +570,7 @@ static int index_name_stage_pos(const struct index_state *istate, const char *na
return -first-1;
}
-int index_name_pos(const struct index_state *istate, const char *name, int namelen)
+int index_name_pos(struct index_state *istate, const char *name, int namelen)
{
return index_name_stage_pos(istate, name, namelen, 0);
}
@@ -3389,8 +3389,8 @@ int repo_read_index_unmerged(struct repository *repo)
* We helpfully remove a trailing "/" from directories so that
* the output of read_directory can be used as-is.
*/
-int index_name_is_other(const struct index_state *istate, const char *name,
- int namelen)
+int index_name_is_other(struct index_state *istate, const char *name,
+ int namelen)
{
int pos;
if (namelen && name[namelen - 1] == '/')
@@ -3408,7 +3408,7 @@ int index_name_is_other(const struct index_state *istate, const char *name,
return 1;
}
-void *read_blob_data_from_index(const struct index_state *istate,
+void *read_blob_data_from_index(struct index_state *istate,
const char *path, unsigned long *size)
{
int pos, len;
diff --git a/submodule.c b/submodule.c
index 9767ba9893cc..83809a4f7bd2 100644
--- a/submodule.c
+++ b/submodule.c
@@ -33,7 +33,7 @@ static struct oid_array ref_tips_after_fetch;
* will be disabled because we can't guess what might be configured in
* .gitmodules unless the user resolves the conflict.
*/
-int is_gitmodules_unmerged(const struct index_state *istate)
+int is_gitmodules_unmerged(struct index_state *istate)
{
int pos = index_name_pos(istate, GITMODULES_FILE, strlen(GITMODULES_FILE));
if (pos < 0) { /* .gitmodules not found or isn't merged */
@@ -301,7 +301,7 @@ int is_submodule_populated_gently(const char *path, int *return_error_code)
/*
* Dies if the provided 'prefix' corresponds to an unpopulated submodule
*/
-void die_in_unpopulated_submodule(const struct index_state *istate,
+void die_in_unpopulated_submodule(struct index_state *istate,
const char *prefix)
{
int i, prefixlen;
@@ -331,7 +331,7 @@ void die_in_unpopulated_submodule(const struct index_state *istate,
/*
* Dies if any paths in the provided pathspec descends into a submodule
*/
-void die_path_inside_submodule(const struct index_state *istate,
+void die_path_inside_submodule(struct index_state *istate,
const struct pathspec *ps)
{
int i, j;
diff --git a/submodule.h b/submodule.h
index 4ac6e31cf1f7..84640c49c114 100644
--- a/submodule.h
+++ b/submodule.h
@@ -39,7 +39,7 @@ struct submodule_update_strategy {
};
#define SUBMODULE_UPDATE_STRATEGY_INIT {SM_UPDATE_UNSPECIFIED, NULL}
-int is_gitmodules_unmerged(const struct index_state *istate);
+int is_gitmodules_unmerged(struct index_state *istate);
int is_writing_gitmodules_ok(void);
int is_staging_gitmodules_ok(struct index_state *istate);
int update_path_in_gitmodules(const char *oldpath, const char *newpath);
@@ -60,9 +60,9 @@ int is_submodule_active(struct repository *repo, const char *path);
* Otherwise the return error code is the same as of resolve_gitdir_gently.
*/
int is_submodule_populated_gently(const char *path, int *return_error_code);
-void die_in_unpopulated_submodule(const struct index_state *istate,
+void die_in_unpopulated_submodule(struct index_state *istate,
const char *prefix);
-void die_path_inside_submodule(const struct index_state *istate,
+void die_path_inside_submodule(struct index_state *istate,
const struct pathspec *ps);
enum submodule_update_type parse_submodule_update_type(const char *value);
int parse_submodule_update_strategy(const char *value,
--
gitgitgadget
^ permalink raw reply related [flat|nested] 111+ messages in thread
* [PATCH v3 03/26] read-cache: expand on query into sparse-directory entry
2021-04-12 21:07 ` [PATCH v3 00/26] " Derrick Stolee via GitGitGadget
2021-04-12 21:07 ` [PATCH v3 01/26] sparse-index: API protection strategy Derrick Stolee via GitGitGadget
2021-04-12 21:07 ` [PATCH v3 02/26] *: remove 'const' qualifier for struct index_state Derrick Stolee via GitGitGadget
@ 2021-04-12 21:07 ` Derrick Stolee via GitGitGadget
2021-04-12 21:07 ` [PATCH v3 04/26] cache: move ensure_full_index() to cache.h Derrick Stolee via GitGitGadget
` (23 subsequent siblings)
26 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-12 21:07 UTC (permalink / raw)
To: git
Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
Derrick Stolee, Derrick Stolee
From: Derrick Stolee <dstolee@microsoft.com>
Callers to index_name_pos() or index_name_stage_pos() have a specific
path in mind. If that happens to be a path with an ancestor being a
sparse-directory entry, it can lead to unexpected results.
In the case that we did not find the requested path, check to see if the
position _before_ the inserted position is a sparse directory entry that
matches the initial segment of the input path (including the directory
separator at the end of the directory name). If so, then expand the
index to be a full index and search again. This expansion will only
happen once per index read.
Future enhancements could be more careful to expand only the necessary
sparse directory entry, but then we would have a special "not fully
sparse, but also not fully expanded" mode that could affect writing the
index to file. Since this only occurs if a specific file is requested
outside of the sparse checkout definition, this is unlikely to be a
common situation.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
read-cache.c | 21 +++++++++++++++++++++
1 file changed, 21 insertions(+)
diff --git a/read-cache.c b/read-cache.c
index 3ad94578095e..3698bc7bf77d 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -567,6 +567,27 @@ static int index_name_stage_pos(struct index_state *istate, const char *name, in
}
first = next+1;
}
+
+ if (istate->sparse_index &&
+ first > 0) {
+ /* Note: first <= istate->cache_nr */
+ struct cache_entry *ce = istate->cache[first - 1];
+
+ /*
+ * If we are in a sparse-index _and_ the entry before the
+ * insertion position is a sparse-directory entry that is
+ * an ancestor of 'name', then we need to expand the index
+ * and search again. This will only trigger once, because
+ * thereafter the index is fully expanded.
+ */
+ if (S_ISSPARSEDIR(ce->ce_mode) &&
+ ce_namelen(ce) < namelen &&
+ !strncmp(name, ce->name, ce_namelen(ce))) {
+ ensure_full_index(istate);
+ return index_name_stage_pos(istate, name, namelen, stage);
+ }
+ }
+
return -first-1;
}
--
gitgitgadget
^ permalink raw reply related [flat|nested] 111+ messages in thread
* [PATCH v3 04/26] cache: move ensure_full_index() to cache.h
2021-04-12 21:07 ` [PATCH v3 00/26] " Derrick Stolee via GitGitGadget
` (2 preceding siblings ...)
2021-04-12 21:07 ` [PATCH v3 03/26] read-cache: expand on query into sparse-directory entry Derrick Stolee via GitGitGadget
@ 2021-04-12 21:07 ` Derrick Stolee via GitGitGadget
2021-04-12 21:07 ` [PATCH v3 05/26] add: ensure full index Derrick Stolee via GitGitGadget
` (22 subsequent siblings)
26 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-12 21:07 UTC (permalink / raw)
To: git
Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
Derrick Stolee, Derrick Stolee
From: Derrick Stolee <dstolee@microsoft.com>
Soon we will insert ensure_full_index() calls across the codebase.
Instead of also adding include statements for sparse-index.h, let's just
use the fact that anything that cares about the index already has
cache.h in its includes.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
cache.h | 1 +
sparse-index.h | 1 -
2 files changed, 1 insertion(+), 1 deletion(-)
diff --git a/cache.h b/cache.h
index 5006278c13ca..b7e20e9778db 100644
--- a/cache.h
+++ b/cache.h
@@ -350,6 +350,7 @@ void add_name_hash(struct index_state *istate, struct cache_entry *ce);
void remove_name_hash(struct index_state *istate, struct cache_entry *ce);
void free_name_hash(struct index_state *istate);
+void ensure_full_index(struct index_state *istate);
/* Cache entry creation and cleanup */
diff --git a/sparse-index.h b/sparse-index.h
index 39dcc859735e..0268f38753c0 100644
--- a/sparse-index.h
+++ b/sparse-index.h
@@ -2,7 +2,6 @@
#define SPARSE_INDEX_H__
struct index_state;
-void ensure_full_index(struct index_state *istate);
int convert_to_sparse(struct index_state *istate);
struct repository;
--
gitgitgadget
^ permalink raw reply related [flat|nested] 111+ messages in thread
* [PATCH v3 05/26] add: ensure full index
2021-04-12 21:07 ` [PATCH v3 00/26] " Derrick Stolee via GitGitGadget
` (3 preceding siblings ...)
2021-04-12 21:07 ` [PATCH v3 04/26] cache: move ensure_full_index() to cache.h Derrick Stolee via GitGitGadget
@ 2021-04-12 21:07 ` Derrick Stolee via GitGitGadget
2021-04-12 21:07 ` [PATCH v3 06/26] checkout-index: " Derrick Stolee via GitGitGadget
` (21 subsequent siblings)
26 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-12 21:07 UTC (permalink / raw)
To: git
Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
Derrick Stolee, Derrick Stolee
From: Derrick Stolee <dstolee@microsoft.com>
Before iterating over all cache entries, ensure that a sparse index is
expanded to a full index to avoid unexpected behavior.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
builtin/add.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/builtin/add.c b/builtin/add.c
index ea762a41e3a2..afccf2fd5543 100644
--- a/builtin/add.c
+++ b/builtin/add.c
@@ -141,6 +141,8 @@ static int renormalize_tracked_files(const struct pathspec *pathspec, int flags)
{
int i, retval = 0;
+ /* TODO: audit for interaction with sparse-index. */
+ ensure_full_index(&the_index);
for (i = 0; i < active_nr; i++) {
struct cache_entry *ce = active_cache[i];
--
gitgitgadget
^ permalink raw reply related [flat|nested] 111+ messages in thread
* [PATCH v3 06/26] checkout-index: ensure full index
2021-04-12 21:07 ` [PATCH v3 00/26] " Derrick Stolee via GitGitGadget
` (4 preceding siblings ...)
2021-04-12 21:07 ` [PATCH v3 05/26] add: ensure full index Derrick Stolee via GitGitGadget
@ 2021-04-12 21:07 ` Derrick Stolee via GitGitGadget
2021-04-12 21:07 ` [PATCH v3 07/26] checkout: " Derrick Stolee via GitGitGadget
` (20 subsequent siblings)
26 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-12 21:07 UTC (permalink / raw)
To: git
Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
Derrick Stolee, Derrick Stolee
From: Derrick Stolee <dstolee@microsoft.com>
Before we iterate over all cache entries, ensure that the index is not
sparse. This loop in checkout_all() might be safe to iterate over a
sparse index, but let's put this protection here until it can be
carefully tested.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
builtin/checkout-index.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/builtin/checkout-index.c b/builtin/checkout-index.c
index 023e49e271c2..2c2936a9dae0 100644
--- a/builtin/checkout-index.c
+++ b/builtin/checkout-index.c
@@ -119,6 +119,8 @@ static void checkout_all(const char *prefix, int prefix_length)
int i, errs = 0;
struct cache_entry *last_ce = NULL;
+ /* TODO: audit for interaction with sparse-index. */
+ ensure_full_index(&the_index);
for (i = 0; i < active_nr ; i++) {
struct cache_entry *ce = active_cache[i];
if (ce_stage(ce) != checkout_stage
--
gitgitgadget
^ permalink raw reply related [flat|nested] 111+ messages in thread
* [PATCH v3 07/26] checkout: ensure full index
2021-04-12 21:07 ` [PATCH v3 00/26] " Derrick Stolee via GitGitGadget
` (5 preceding siblings ...)
2021-04-12 21:07 ` [PATCH v3 06/26] checkout-index: " Derrick Stolee via GitGitGadget
@ 2021-04-12 21:07 ` Derrick Stolee via GitGitGadget
2021-04-12 21:07 ` [PATCH v3 08/26] commit: " Derrick Stolee via GitGitGadget
` (19 subsequent siblings)
26 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-12 21:07 UTC (permalink / raw)
To: git
Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
Derrick Stolee, Derrick Stolee
From: Derrick Stolee <dstolee@microsoft.com>
Before iterating over all cache entries in the checkout builtin, ensure
that we have a full index to avoid any unexpected behavior.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
builtin/checkout.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/builtin/checkout.c b/builtin/checkout.c
index 0e6639052001..d0dbe63ea119 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -368,6 +368,9 @@ static int checkout_worktree(const struct checkout_opts *opts,
NULL);
enable_delayed_checkout(&state);
+
+ /* TODO: audit for interaction with sparse-index. */
+ ensure_full_index(&the_index);
for (pos = 0; pos < active_nr; pos++) {
struct cache_entry *ce = active_cache[pos];
if (ce->ce_flags & CE_MATCHED) {
@@ -512,6 +515,8 @@ static int checkout_paths(const struct checkout_opts *opts,
* Make sure all pathspecs participated in locating the paths
* to be checked out.
*/
+ /* TODO: audit for interaction with sparse-index. */
+ ensure_full_index(&the_index);
for (pos = 0; pos < active_nr; pos++)
if (opts->overlay_mode)
mark_ce_for_checkout_overlay(active_cache[pos],
--
gitgitgadget
^ permalink raw reply related [flat|nested] 111+ messages in thread
* [PATCH v3 08/26] commit: ensure full index
2021-04-12 21:07 ` [PATCH v3 00/26] " Derrick Stolee via GitGitGadget
` (6 preceding siblings ...)
2021-04-12 21:07 ` [PATCH v3 07/26] checkout: " Derrick Stolee via GitGitGadget
@ 2021-04-12 21:07 ` Derrick Stolee via GitGitGadget
2021-04-12 21:08 ` [PATCH v3 09/26] difftool: " Derrick Stolee via GitGitGadget
` (18 subsequent siblings)
26 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-12 21:07 UTC (permalink / raw)
To: git
Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
Derrick Stolee, Derrick Stolee
From: Derrick Stolee <dstolee@microsoft.com>
These two loops iterate over all cache entries, so ensure that a sparse
index is expanded to a full index before we do so.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
builtin/commit.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/builtin/commit.c b/builtin/commit.c
index 739110c5a7f6..cf0c36d1dcb2 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -251,6 +251,8 @@ static int list_paths(struct string_list *list, const char *with_tree,
free(max_prefix);
}
+ /* TODO: audit for interaction with sparse-index. */
+ ensure_full_index(&the_index);
for (i = 0; i < active_nr; i++) {
const struct cache_entry *ce = active_cache[i];
struct string_list_item *item;
@@ -931,6 +933,8 @@ static int prepare_to_commit(const char *index_file, const char *prefix,
if (get_oid(parent, &oid)) {
int i, ita_nr = 0;
+ /* TODO: audit for interaction with sparse-index. */
+ ensure_full_index(&the_index);
for (i = 0; i < active_nr; i++)
if (ce_intent_to_add(active_cache[i]))
ita_nr++;
--
gitgitgadget
^ permalink raw reply related [flat|nested] 111+ messages in thread
* [PATCH v3 09/26] difftool: ensure full index
2021-04-12 21:07 ` [PATCH v3 00/26] " Derrick Stolee via GitGitGadget
` (7 preceding siblings ...)
2021-04-12 21:07 ` [PATCH v3 08/26] commit: " Derrick Stolee via GitGitGadget
@ 2021-04-12 21:08 ` Derrick Stolee via GitGitGadget
2021-04-12 21:08 ` [PATCH v3 10/26] fsck: " Derrick Stolee via GitGitGadget
` (17 subsequent siblings)
26 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-12 21:08 UTC (permalink / raw)
To: git
Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
Derrick Stolee, Derrick Stolee
From: Derrick Stolee <dstolee@microsoft.com>
Before iterating over all cache entries, ensure that a sparse index has
been expanded to a full one to avoid unexpected behavior.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
builtin/difftool.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/builtin/difftool.c b/builtin/difftool.c
index 6e18e623fddf..32c914dde6a0 100644
--- a/builtin/difftool.c
+++ b/builtin/difftool.c
@@ -584,6 +584,9 @@ static int run_dir_diff(const char *extcmd, int symlinks, const char *prefix,
setenv("GIT_DIFFTOOL_DIRDIFF", "true", 1);
rc = run_command_v_opt(helper_argv, flags);
+ /* TODO: audit for interaction with sparse-index. */
+ ensure_full_index(&wtindex);
+
/*
* If the diff includes working copy files and those
* files were modified during the diff, then the changes
--
gitgitgadget
^ permalink raw reply related [flat|nested] 111+ messages in thread
* [PATCH v3 10/26] fsck: ensure full index
2021-04-12 21:07 ` [PATCH v3 00/26] " Derrick Stolee via GitGitGadget
` (8 preceding siblings ...)
2021-04-12 21:08 ` [PATCH v3 09/26] difftool: " Derrick Stolee via GitGitGadget
@ 2021-04-12 21:08 ` Derrick Stolee via GitGitGadget
2021-04-12 21:08 ` [PATCH v3 11/26] grep: " Derrick Stolee via GitGitGadget
` (16 subsequent siblings)
26 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-12 21:08 UTC (permalink / raw)
To: git
Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
Derrick Stolee, Derrick Stolee
From: Derrick Stolee <dstolee@microsoft.com>
When verifying all blobs reachable from the index, ensure that a sparse
index has been expanded to a full one to avoid missing some blobs.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
builtin/fsck.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/builtin/fsck.c b/builtin/fsck.c
index 821e7798c706..4d7f5c63ce0d 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -883,6 +883,8 @@ int cmd_fsck(int argc, const char **argv, const char *prefix)
verify_index_checksum = 1;
verify_ce_order = 1;
read_cache();
+ /* TODO: audit for interaction with sparse-index. */
+ ensure_full_index(&the_index);
for (i = 0; i < active_nr; i++) {
unsigned int mode;
struct blob *blob;
--
gitgitgadget
^ permalink raw reply related [flat|nested] 111+ messages in thread
* [PATCH v3 11/26] grep: ensure full index
2021-04-12 21:07 ` [PATCH v3 00/26] " Derrick Stolee via GitGitGadget
` (9 preceding siblings ...)
2021-04-12 21:08 ` [PATCH v3 10/26] fsck: " Derrick Stolee via GitGitGadget
@ 2021-04-12 21:08 ` Derrick Stolee via GitGitGadget
2021-04-12 21:08 ` [PATCH v3 12/26] ls-files: " Derrick Stolee via GitGitGadget
` (15 subsequent siblings)
26 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-12 21:08 UTC (permalink / raw)
To: git
Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
Derrick Stolee, Derrick Stolee
From: Derrick Stolee <dstolee@microsoft.com>
Before iterating over all cache entries, ensure that a sparse index is
expanded to a full one so we do not miss blobs to scan. Later, this can
integrate more carefully with sparse indexes with proper testing.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
builtin/grep.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/builtin/grep.c b/builtin/grep.c
index 4e91a253ac3b..c2d40414e975 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -504,6 +504,8 @@ static int grep_cache(struct grep_opt *opt,
if (repo_read_index(repo) < 0)
die(_("index file corrupt"));
+ /* TODO: audit for interaction with sparse-index. */
+ ensure_full_index(repo->index);
for (nr = 0; nr < repo->index->cache_nr; nr++) {
const struct cache_entry *ce = repo->index->cache[nr];
--
gitgitgadget
^ permalink raw reply related [flat|nested] 111+ messages in thread
* [PATCH v3 12/26] ls-files: ensure full index
2021-04-12 21:07 ` [PATCH v3 00/26] " Derrick Stolee via GitGitGadget
` (10 preceding siblings ...)
2021-04-12 21:08 ` [PATCH v3 11/26] grep: " Derrick Stolee via GitGitGadget
@ 2021-04-12 21:08 ` Derrick Stolee via GitGitGadget
2021-04-12 21:08 ` [PATCH v3 13/26] merge-index: " Derrick Stolee via GitGitGadget
` (14 subsequent siblings)
26 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-12 21:08 UTC (permalink / raw)
To: git
Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
Derrick Stolee, Derrick Stolee
From: Derrick Stolee <dstolee@microsoft.com>
Before iterating over all cache entries, ensure that a sparse index is
expanded to a full one to avoid missing files.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
builtin/ls-files.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index 4f9ed1fb29b7..a0b4e54d1149 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -317,6 +317,8 @@ static void show_files(struct repository *repo, struct dir_struct *dir)
if (!(show_cached || show_stage || show_deleted || show_modified))
return;
+ /* TODO: audit for interaction with sparse-index. */
+ ensure_full_index(repo->index);
for (i = 0; i < repo->index->cache_nr; i++) {
const struct cache_entry *ce = repo->index->cache[i];
struct stat st;
@@ -494,6 +496,8 @@ void overlay_tree_on_index(struct index_state *istate,
die("bad tree-ish %s", tree_name);
/* Hoist the unmerged entries up to stage #3 to make room */
+ /* TODO: audit for interaction with sparse-index. */
+ ensure_full_index(istate);
for (i = 0; i < istate->cache_nr; i++) {
struct cache_entry *ce = istate->cache[i];
if (!ce_stage(ce))
--
gitgitgadget
^ permalink raw reply related [flat|nested] 111+ messages in thread
* [PATCH v3 13/26] merge-index: ensure full index
2021-04-12 21:07 ` [PATCH v3 00/26] " Derrick Stolee via GitGitGadget
` (11 preceding siblings ...)
2021-04-12 21:08 ` [PATCH v3 12/26] ls-files: " Derrick Stolee via GitGitGadget
@ 2021-04-12 21:08 ` Derrick Stolee via GitGitGadget
2021-04-12 21:08 ` [PATCH v3 14/26] rm: " Derrick Stolee via GitGitGadget
` (13 subsequent siblings)
26 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-12 21:08 UTC (permalink / raw)
To: git
Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
Derrick Stolee, Derrick Stolee
From: Derrick Stolee <dstolee@microsoft.com>
Before iterating over all cache entries, ensure that a sparse index is
expanded to a full one to avoid unexpected behavior.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
builtin/merge-index.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/builtin/merge-index.c b/builtin/merge-index.c
index 38ea6ad6ca25..c0383fe9df9a 100644
--- a/builtin/merge-index.c
+++ b/builtin/merge-index.c
@@ -58,6 +58,8 @@ static void merge_one_path(const char *path)
static void merge_all(void)
{
int i;
+ /* TODO: audit for interaction with sparse-index. */
+ ensure_full_index(&the_index);
for (i = 0; i < active_nr; i++) {
const struct cache_entry *ce = active_cache[i];
if (!ce_stage(ce))
@@ -80,6 +82,9 @@ int cmd_merge_index(int argc, const char **argv, const char *prefix)
read_cache();
+ /* TODO: audit for interaction with sparse-index. */
+ ensure_full_index(&the_index);
+
i = 1;
if (!strcmp(argv[i], "-o")) {
one_shot = 1;
--
gitgitgadget
^ permalink raw reply related [flat|nested] 111+ messages in thread
* [PATCH v3 14/26] rm: ensure full index
2021-04-12 21:07 ` [PATCH v3 00/26] " Derrick Stolee via GitGitGadget
` (12 preceding siblings ...)
2021-04-12 21:08 ` [PATCH v3 13/26] merge-index: " Derrick Stolee via GitGitGadget
@ 2021-04-12 21:08 ` Derrick Stolee via GitGitGadget
2021-04-12 21:08 ` [PATCH v3 15/26] stash: " Derrick Stolee via GitGitGadget
` (12 subsequent siblings)
26 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-12 21:08 UTC (permalink / raw)
To: git
Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
Derrick Stolee, Derrick Stolee
From: Derrick Stolee <dstolee@microsoft.com>
Before iterating over all cache entries, ensure that a sparse index is
expanded to a full index to avoid unexpected behavior.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
builtin/rm.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/builtin/rm.c b/builtin/rm.c
index 4858631e0f02..5559a0b453a3 100644
--- a/builtin/rm.c
+++ b/builtin/rm.c
@@ -293,6 +293,8 @@ int cmd_rm(int argc, const char **argv, const char *prefix)
seen = xcalloc(pathspec.nr, 1);
+ /* TODO: audit for interaction with sparse-index. */
+ ensure_full_index(&the_index);
for (i = 0; i < active_nr; i++) {
const struct cache_entry *ce = active_cache[i];
if (!ce_path_match(&the_index, ce, &pathspec, seen))
--
gitgitgadget
^ permalink raw reply related [flat|nested] 111+ messages in thread
* [PATCH v3 15/26] stash: ensure full index
2021-04-12 21:07 ` [PATCH v3 00/26] " Derrick Stolee via GitGitGadget
` (13 preceding siblings ...)
2021-04-12 21:08 ` [PATCH v3 14/26] rm: " Derrick Stolee via GitGitGadget
@ 2021-04-12 21:08 ` Derrick Stolee via GitGitGadget
2021-04-12 21:08 ` [PATCH v3 16/26] update-index: " Derrick Stolee via GitGitGadget
` (11 subsequent siblings)
26 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-12 21:08 UTC (permalink / raw)
To: git
Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
Derrick Stolee, Derrick Stolee
From: Derrick Stolee <dstolee@microsoft.com>
Before iterating over all cache entries, ensure that a sparse index is
expanded to a full index to avoid unexpected behavior.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
builtin/stash.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/builtin/stash.c b/builtin/stash.c
index ba774cce674f..6fb7178ef2fa 100644
--- a/builtin/stash.c
+++ b/builtin/stash.c
@@ -1350,6 +1350,8 @@ static int do_push_stash(const struct pathspec *ps, const char *stash_msg, int q
int i;
char *ps_matched = xcalloc(ps->nr, 1);
+ /* TODO: audit for interaction with sparse-index. */
+ ensure_full_index(&the_index);
for (i = 0; i < active_nr; i++)
ce_path_match(&the_index, active_cache[i], ps,
ps_matched);
--
gitgitgadget
^ permalink raw reply related [flat|nested] 111+ messages in thread
* [PATCH v3 16/26] update-index: ensure full index
2021-04-12 21:07 ` [PATCH v3 00/26] " Derrick Stolee via GitGitGadget
` (14 preceding siblings ...)
2021-04-12 21:08 ` [PATCH v3 15/26] stash: " Derrick Stolee via GitGitGadget
@ 2021-04-12 21:08 ` Derrick Stolee via GitGitGadget
2021-04-12 21:08 ` [PATCH v3 17/26] dir: " Derrick Stolee via GitGitGadget
` (10 subsequent siblings)
26 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-12 21:08 UTC (permalink / raw)
To: git
Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
Derrick Stolee, Derrick Stolee
From: Derrick Stolee <dstolee@microsoft.com>
Before iterating over all cache entries, ensure that a sparse index is
expanded to a full index to avoid unexpected behavior.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
builtin/update-index.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/builtin/update-index.c b/builtin/update-index.c
index 79087bccea4b..f1f16f2de526 100644
--- a/builtin/update-index.c
+++ b/builtin/update-index.c
@@ -745,6 +745,8 @@ static int do_reupdate(int ac, const char **av,
*/
has_head = 0;
redo:
+ /* TODO: audit for interaction with sparse-index. */
+ ensure_full_index(&the_index);
for (pos = 0; pos < active_nr; pos++) {
const struct cache_entry *ce = active_cache[pos];
struct cache_entry *old = NULL;
--
gitgitgadget
^ permalink raw reply related [flat|nested] 111+ messages in thread
* [PATCH v3 17/26] dir: ensure full index
2021-04-12 21:07 ` [PATCH v3 00/26] " Derrick Stolee via GitGitGadget
` (15 preceding siblings ...)
2021-04-12 21:08 ` [PATCH v3 16/26] update-index: " Derrick Stolee via GitGitGadget
@ 2021-04-12 21:08 ` Derrick Stolee via GitGitGadget
2021-04-12 21:08 ` [PATCH v3 18/26] entry: " Derrick Stolee via GitGitGadget
` (9 subsequent siblings)
26 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-12 21:08 UTC (permalink / raw)
To: git
Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
Derrick Stolee, Derrick Stolee
From: Derrick Stolee <dstolee@microsoft.com>
Before iterating over all cache entries, ensure that a sparse index is
expanded to a full index to avoid unexpected behavior.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
dir.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/dir.c b/dir.c
index 5b00dfb5b144..166238e79f52 100644
--- a/dir.c
+++ b/dir.c
@@ -3533,6 +3533,8 @@ static void connect_wt_gitdir_in_nested(const char *sub_worktree,
if (repo_read_index(&subrepo) < 0)
die(_("index file corrupt in repo %s"), subrepo.gitdir);
+ /* TODO: audit for interaction with sparse-index. */
+ ensure_full_index(subrepo.index);
for (i = 0; i < subrepo.index->cache_nr; i++) {
const struct cache_entry *ce = subrepo.index->cache[i];
--
gitgitgadget
^ permalink raw reply related [flat|nested] 111+ messages in thread
* [PATCH v3 18/26] entry: ensure full index
2021-04-12 21:07 ` [PATCH v3 00/26] " Derrick Stolee via GitGitGadget
` (16 preceding siblings ...)
2021-04-12 21:08 ` [PATCH v3 17/26] dir: " Derrick Stolee via GitGitGadget
@ 2021-04-12 21:08 ` Derrick Stolee via GitGitGadget
2021-04-12 21:08 ` [PATCH v3 19/26] merge-recursive: " Derrick Stolee via GitGitGadget
` (8 subsequent siblings)
26 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-12 21:08 UTC (permalink / raw)
To: git
Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
Derrick Stolee, Derrick Stolee
From: Derrick Stolee <dstolee@microsoft.com>
Before iterating over all cache entries, ensure that a sparse index is
expanded to a full index to avoid unexpected behavior.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
entry.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/entry.c b/entry.c
index 7b9f43716f76..891e4ba2b45a 100644
--- a/entry.c
+++ b/entry.c
@@ -412,6 +412,8 @@ static void mark_colliding_entries(const struct checkout *state,
ce->ce_flags |= CE_MATCHED;
+ /* TODO: audit for interaction with sparse-index. */
+ ensure_full_index(state->istate);
for (i = 0; i < state->istate->cache_nr; i++) {
struct cache_entry *dup = state->istate->cache[i];
--
gitgitgadget
^ permalink raw reply related [flat|nested] 111+ messages in thread
* [PATCH v3 19/26] merge-recursive: ensure full index
2021-04-12 21:07 ` [PATCH v3 00/26] " Derrick Stolee via GitGitGadget
` (17 preceding siblings ...)
2021-04-12 21:08 ` [PATCH v3 18/26] entry: " Derrick Stolee via GitGitGadget
@ 2021-04-12 21:08 ` Derrick Stolee via GitGitGadget
2021-04-12 21:08 ` [PATCH v3 20/26] pathspec: " Derrick Stolee via GitGitGadget
` (7 subsequent siblings)
26 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-12 21:08 UTC (permalink / raw)
To: git
Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
Derrick Stolee, Derrick Stolee
From: Derrick Stolee <dstolee@microsoft.com>
Before iterating over all cache entries, ensure that a sparse index is
expanded to a full index to avoid unexpected behavior.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
merge-recursive.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/merge-recursive.c b/merge-recursive.c
index b8de7a704eae..91d8597728c1 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -522,6 +522,8 @@ static struct string_list *get_unmerged(struct index_state *istate)
unmerged->strdup_strings = 1;
+ /* TODO: audit for interaction with sparse-index. */
+ ensure_full_index(istate);
for (i = 0; i < istate->cache_nr; i++) {
struct string_list_item *item;
struct stage_data *e;
--
gitgitgadget
^ permalink raw reply related [flat|nested] 111+ messages in thread
* [PATCH v3 20/26] pathspec: ensure full index
2021-04-12 21:07 ` [PATCH v3 00/26] " Derrick Stolee via GitGitGadget
` (18 preceding siblings ...)
2021-04-12 21:08 ` [PATCH v3 19/26] merge-recursive: " Derrick Stolee via GitGitGadget
@ 2021-04-12 21:08 ` Derrick Stolee via GitGitGadget
2021-04-12 21:08 ` [PATCH v3 21/26] read-cache: " Derrick Stolee via GitGitGadget
` (6 subsequent siblings)
26 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-12 21:08 UTC (permalink / raw)
To: git
Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
Derrick Stolee, Derrick Stolee
From: Derrick Stolee <dstolee@microsoft.com>
Before iterating over all cache entries, ensure that a sparse index is
expanded to a full index to avoid unexpected behavior.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
pathspec.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/pathspec.c b/pathspec.c
index b6e333965cb4..d67688bab74b 100644
--- a/pathspec.c
+++ b/pathspec.c
@@ -36,6 +36,8 @@ void add_pathspec_matches_against_index(const struct pathspec *pathspec,
num_unmatched++;
if (!num_unmatched)
return;
+ /* TODO: audit for interaction with sparse-index. */
+ ensure_full_index(istate);
for (i = 0; i < istate->cache_nr; i++) {
const struct cache_entry *ce = istate->cache[i];
ce_path_match(istate, ce, pathspec, seen);
--
gitgitgadget
^ permalink raw reply related [flat|nested] 111+ messages in thread
* [PATCH v3 21/26] read-cache: ensure full index
2021-04-12 21:07 ` [PATCH v3 00/26] " Derrick Stolee via GitGitGadget
` (19 preceding siblings ...)
2021-04-12 21:08 ` [PATCH v3 20/26] pathspec: " Derrick Stolee via GitGitGadget
@ 2021-04-12 21:08 ` Derrick Stolee via GitGitGadget
2021-04-12 21:08 ` [PATCH v3 22/26] resolve-undo: " Derrick Stolee via GitGitGadget
` (5 subsequent siblings)
26 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-12 21:08 UTC (permalink / raw)
To: git
Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
Derrick Stolee, Derrick Stolee
From: Derrick Stolee <dstolee@microsoft.com>
Before iterating over all cache entries, ensure that a sparse index is
expanded to a full index to avoid unexpected behavior.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
read-cache.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/read-cache.c b/read-cache.c
index 3698bc7bf77d..a9dcf0ab4f78 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1577,6 +1577,8 @@ int refresh_index(struct index_state *istate, unsigned int flags,
*/
preload_index(istate, pathspec, 0);
trace2_region_enter("index", "refresh", NULL);
+ /* TODO: audit for interaction with sparse-index. */
+ ensure_full_index(istate);
for (i = 0; i < istate->cache_nr; i++) {
struct cache_entry *ce, *new_entry;
int cache_errno = 0;
@@ -2498,6 +2500,8 @@ int repo_index_has_changes(struct repository *repo,
diff_flush(&opt);
return opt.flags.has_changes != 0;
} else {
+ /* TODO: audit for interaction with sparse-index. */
+ ensure_full_index(istate);
for (i = 0; sb && i < istate->cache_nr; i++) {
if (i)
strbuf_addch(sb, ' ');
--
gitgitgadget
^ permalink raw reply related [flat|nested] 111+ messages in thread
* [PATCH v3 22/26] resolve-undo: ensure full index
2021-04-12 21:07 ` [PATCH v3 00/26] " Derrick Stolee via GitGitGadget
` (20 preceding siblings ...)
2021-04-12 21:08 ` [PATCH v3 21/26] read-cache: " Derrick Stolee via GitGitGadget
@ 2021-04-12 21:08 ` Derrick Stolee via GitGitGadget
2021-04-12 21:08 ` [PATCH v3 23/26] revision: " Derrick Stolee via GitGitGadget
` (4 subsequent siblings)
26 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-12 21:08 UTC (permalink / raw)
To: git
Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
Derrick Stolee, Derrick Stolee
From: Derrick Stolee <dstolee@microsoft.com>
Before iterating over all cache entries, ensure that a sparse index is
expanded to a full index to avoid unexpected behavior.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
resolve-undo.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/resolve-undo.c b/resolve-undo.c
index 236320f179cb..ee36ca58b135 100644
--- a/resolve-undo.c
+++ b/resolve-undo.c
@@ -172,6 +172,8 @@ void unmerge_marked_index(struct index_state *istate)
if (!istate->resolve_undo)
return;
+ /* TODO: audit for interaction with sparse-index. */
+ ensure_full_index(istate);
for (i = 0; i < istate->cache_nr; i++) {
const struct cache_entry *ce = istate->cache[i];
if (ce->ce_flags & CE_MATCHED)
@@ -186,6 +188,8 @@ void unmerge_index(struct index_state *istate, const struct pathspec *pathspec)
if (!istate->resolve_undo)
return;
+ /* TODO: audit for interaction with sparse-index. */
+ ensure_full_index(istate);
for (i = 0; i < istate->cache_nr; i++) {
const struct cache_entry *ce = istate->cache[i];
if (!ce_path_match(istate, ce, pathspec, NULL))
--
gitgitgadget
^ permalink raw reply related [flat|nested] 111+ messages in thread
* [PATCH v3 23/26] revision: ensure full index
2021-04-12 21:07 ` [PATCH v3 00/26] " Derrick Stolee via GitGitGadget
` (21 preceding siblings ...)
2021-04-12 21:08 ` [PATCH v3 22/26] resolve-undo: " Derrick Stolee via GitGitGadget
@ 2021-04-12 21:08 ` Derrick Stolee via GitGitGadget
2021-04-12 21:08 ` [PATCH v3 24/26] name-hash: don't add directories to name_hash Derrick Stolee via GitGitGadget
` (3 subsequent siblings)
26 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-12 21:08 UTC (permalink / raw)
To: git
Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
Derrick Stolee, Derrick Stolee
From: Derrick Stolee <dstolee@microsoft.com>
Before iterating over all index entries, ensure that a sparse index is
expanded to a full index to avoid unexpected behavior. This case could
be integrated later by ensuring that we walk the tree in the
sparse-directory entry, but the current behavior is only expecting
blobs. Save this integration for later when it can be properly tested.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
revision.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/revision.c b/revision.c
index b78733f5089b..b72e0ac1bdca 100644
--- a/revision.c
+++ b/revision.c
@@ -1680,6 +1680,8 @@ static void do_add_index_objects_to_pending(struct rev_info *revs,
{
int i;
+ /* TODO: audit for interaction with sparse-index. */
+ ensure_full_index(istate);
for (i = 0; i < istate->cache_nr; i++) {
struct cache_entry *ce = istate->cache[i];
struct blob *blob;
--
gitgitgadget
^ permalink raw reply related [flat|nested] 111+ messages in thread
* [PATCH v3 24/26] name-hash: don't add directories to name_hash
2021-04-12 21:07 ` [PATCH v3 00/26] " Derrick Stolee via GitGitGadget
` (22 preceding siblings ...)
2021-04-12 21:08 ` [PATCH v3 23/26] revision: " Derrick Stolee via GitGitGadget
@ 2021-04-12 21:08 ` Derrick Stolee via GitGitGadget
2021-04-12 21:08 ` [PATCH v3 25/26] sparse-index: expand_to_path() Derrick Stolee via GitGitGadget
` (2 subsequent siblings)
26 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-12 21:08 UTC (permalink / raw)
To: git
Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
Derrick Stolee, Derrick Stolee
From: Derrick Stolee <dstolee@microsoft.com>
Sparse directory entries represent a directory that is outside the
sparse-checkout definition. These are not paths to blobs, so should not
be added to the name_hash table. Instead, they should be added to the
directory hashtable when 'ignore_case' is true.
Add a condition to avoid placing sparse directories into the name_hash
hashtable. This avoids filling the table with extra entries that will
never be queried.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
name-hash.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/name-hash.c b/name-hash.c
index 4e03fac9bb12..d08deaa2c9e7 100644
--- a/name-hash.c
+++ b/name-hash.c
@@ -109,8 +109,11 @@ static void hash_index_entry(struct index_state *istate, struct cache_entry *ce)
if (ce->ce_flags & CE_HASHED)
return;
ce->ce_flags |= CE_HASHED;
- hashmap_entry_init(&ce->ent, memihash(ce->name, ce_namelen(ce)));
- hashmap_add(&istate->name_hash, &ce->ent);
+
+ if (!S_ISSPARSEDIR(ce->ce_mode)) {
+ hashmap_entry_init(&ce->ent, memihash(ce->name, ce_namelen(ce)));
+ hashmap_add(&istate->name_hash, &ce->ent);
+ }
if (ignore_case)
add_dir_entry(istate, ce);
--
gitgitgadget
^ permalink raw reply related [flat|nested] 111+ messages in thread
* [PATCH v3 25/26] sparse-index: expand_to_path()
2021-04-12 21:07 ` [PATCH v3 00/26] " Derrick Stolee via GitGitGadget
` (23 preceding siblings ...)
2021-04-12 21:08 ` [PATCH v3 24/26] name-hash: don't add directories to name_hash Derrick Stolee via GitGitGadget
@ 2021-04-12 21:08 ` Derrick Stolee via GitGitGadget
2021-04-12 21:08 ` [PATCH v3 26/26] name-hash: use expand_to_path() Derrick Stolee via GitGitGadget
2021-04-13 16:02 ` [PATCH v3 00/26] Sparse Index: API protections Elijah Newren
26 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-12 21:08 UTC (permalink / raw)
To: git
Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
Derrick Stolee, Derrick Stolee
From: Derrick Stolee <dstolee@microsoft.com>
Some users of the index API have a specific path they are looking for,
but choose to use index_file_exists() to rely on the name-hash hashtable
instead of doing binary search with index_name_pos(). These users only
need to know a yes/no answer, not a position within the cache array.
When the index is sparse, the name-hash hash table does not contain the
full list of paths within sparse directories. It _does_ contain the
directory names for the sparse-directory entries.
Create a helper function, expand_to_path(), for intended use with the
name-hash hashtable functions. The integration with name-hash.c will
follow in a later change.
The solution here is to use ensure_full_index() when we determine that
the requested path is within a sparse directory entry. This will
populate the name-hash hashtable as the index is recomputed from
scratch.
There may be cases where the caller is trying to find an untracked path
that is not in the index but also is not within a sparse directory
entry. We want to minimize the overhead for these requests. If we used
index_name_pos() to find the insertion order of the path, then we could
determine from that position if a sparse-directory exists. (In fact,
just calling index_name_pos() in that case would lead to expanding the
index to a full index.) However, this takes O(log N) time where N is the
number of cache entries.
To keep the performance of this call based mostly on the input string,
use index_file_exists() to look for the ancestors of the path. Using the
heuristic that a sparse directory is likely to have a small number of
parent directories, we start from the bottom and build up. Use a string
buffer to allow mutating the path name to terminate after each slash for
each hashset test.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
sparse-index.c | 73 ++++++++++++++++++++++++++++++++++++++++++++++++++
sparse-index.h | 13 +++++++++
2 files changed, 86 insertions(+)
diff --git a/sparse-index.c b/sparse-index.c
index 95ea17174da3..6f21397e2ee0 100644
--- a/sparse-index.c
+++ b/sparse-index.c
@@ -283,3 +283,76 @@ void ensure_full_index(struct index_state *istate)
trace2_region_leave("index", "ensure_full_index", istate->repo);
}
+
+/*
+ * This static global helps avoid infinite recursion between
+ * expand_to_path() and index_file_exists().
+ */
+static int in_expand_to_path = 0;
+
+void expand_to_path(struct index_state *istate,
+ const char *path, size_t pathlen, int icase)
+{
+ struct strbuf path_mutable = STRBUF_INIT;
+ size_t substr_len;
+
+ /* prevent extra recursion */
+ if (in_expand_to_path)
+ return;
+
+ if (!istate || !istate->sparse_index)
+ return;
+
+ if (!istate->repo)
+ istate->repo = the_repository;
+
+ in_expand_to_path = 1;
+
+ /*
+ * We only need to actually expand a region if the
+ * following are both true:
+ *
+ * 1. 'path' is not already in the index.
+ * 2. Some parent directory of 'path' is a sparse directory.
+ */
+
+ if (index_file_exists(istate, path, pathlen, icase))
+ goto cleanup;
+
+ strbuf_add(&path_mutable, path, pathlen);
+ strbuf_addch(&path_mutable, '/');
+
+ /* Check the name hash for all parent directories */
+ substr_len = 0;
+ while (substr_len < pathlen) {
+ char temp;
+ char *replace = strchr(path_mutable.buf + substr_len, '/');
+
+ if (!replace)
+ break;
+
+ /* replace the character _after_ the slash */
+ replace++;
+ temp = *replace;
+ *replace = '\0';
+ if (index_file_exists(istate, path_mutable.buf,
+ path_mutable.len, icase)) {
+ /*
+ * We found a parent directory in the name-hash
+ * hashtable, because only sparse directory entries
+ * have a trailing '/' character. Since "path" wasn't
+ * in the index, perhaps it exists within this
+ * sparse-directory. Expand accordingly.
+ */
+ ensure_full_index(istate);
+ break;
+ }
+
+ *replace = temp;
+ substr_len = replace - path_mutable.buf;
+ }
+
+cleanup:
+ strbuf_release(&path_mutable);
+ in_expand_to_path = 0;
+}
diff --git a/sparse-index.h b/sparse-index.h
index 0268f38753c0..1115a0d7dd98 100644
--- a/sparse-index.h
+++ b/sparse-index.h
@@ -4,6 +4,19 @@
struct index_state;
int convert_to_sparse(struct index_state *istate);
+/*
+ * Some places in the codebase expect to search for a specific path.
+ * This path might be outside of the sparse-checkout definition, in
+ * which case a sparse-index may not contain a path for that index.
+ *
+ * Given an index and a path, check to see if a leading directory for
+ * 'path' exists in the index as a sparse directory. In that case,
+ * expand that sparse directory to a full range of cache entries and
+ * populate the index accordingly.
+ */
+void expand_to_path(struct index_state *istate,
+ const char *path, size_t pathlen, int icase);
+
struct repository;
int set_sparse_index_config(struct repository *repo, int enable);
--
gitgitgadget
^ permalink raw reply related [flat|nested] 111+ messages in thread
* [PATCH v3 26/26] name-hash: use expand_to_path()
2021-04-12 21:07 ` [PATCH v3 00/26] " Derrick Stolee via GitGitGadget
` (24 preceding siblings ...)
2021-04-12 21:08 ` [PATCH v3 25/26] sparse-index: expand_to_path() Derrick Stolee via GitGitGadget
@ 2021-04-12 21:08 ` Derrick Stolee via GitGitGadget
2021-04-13 16:02 ` [PATCH v3 00/26] Sparse Index: API protections Elijah Newren
26 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-04-12 21:08 UTC (permalink / raw)
To: git
Cc: newren, gitster, Derrick Stolee, Matheus Tavares Bernardino,
Derrick Stolee, Derrick Stolee
From: Derrick Stolee <dstolee@microsoft.com>
A sparse-index loads the name-hash data for its entries, including the
sparse-directory entries. If a caller asks for a path that is contained
within a sparse-directory entry, we need to expand to a full index and
recalculate the name hash table before returning the result. Insert
calls to expand_to_path() to protect against this case.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
name-hash.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/name-hash.c b/name-hash.c
index d08deaa2c9e7..383cf589969c 100644
--- a/name-hash.c
+++ b/name-hash.c
@@ -8,6 +8,7 @@
#include "cache.h"
#include "thread-utils.h"
#include "trace2.h"
+#include "sparse-index.h"
struct dir_entry {
struct hashmap_entry ent;
@@ -683,6 +684,7 @@ int index_dir_exists(struct index_state *istate, const char *name, int namelen)
struct dir_entry *dir;
lazy_init_name_hash(istate);
+ expand_to_path(istate, name, namelen, 0);
dir = find_dir_entry(istate, name, namelen);
return dir && dir->nr;
}
@@ -693,6 +695,7 @@ void adjust_dirname_case(struct index_state *istate, char *name)
const char *ptr = startPtr;
lazy_init_name_hash(istate);
+ expand_to_path(istate, name, strlen(name), 0);
while (*ptr) {
while (*ptr && *ptr != '/')
ptr++;
@@ -716,6 +719,7 @@ struct cache_entry *index_file_exists(struct index_state *istate, const char *na
unsigned int hash = memihash(name, namelen);
lazy_init_name_hash(istate);
+ expand_to_path(istate, name, namelen, icase);
ce = hashmap_get_entry_from_hash(&istate->name_hash, hash, NULL,
struct cache_entry, ent);
--
gitgitgadget
^ permalink raw reply related [flat|nested] 111+ messages in thread
* Re: [PATCH v3 00/26] Sparse Index: API protections
2021-04-12 21:07 ` [PATCH v3 00/26] " Derrick Stolee via GitGitGadget
` (25 preceding siblings ...)
2021-04-12 21:08 ` [PATCH v3 26/26] name-hash: use expand_to_path() Derrick Stolee via GitGitGadget
@ 2021-04-13 16:02 ` Elijah Newren
2021-04-14 20:44 ` Junio C Hamano
26 siblings, 1 reply; 111+ messages in thread
From: Elijah Newren @ 2021-04-13 16:02 UTC (permalink / raw)
To: Derrick Stolee via GitGitGadget
Cc: Git Mailing List, Junio C Hamano, Derrick Stolee,
Matheus Tavares Bernardino, Derrick Stolee
On Mon, Apr 12, 2021 at 2:08 PM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> Here is the second patch series submission coming out of the sparse-index
> RFC [1].
>
> [1]
> https://lore.kernel.org/git/pull.847.git.1611596533.gitgitgadget@gmail.com/
>
> This is based on ds/sparse-index.
>
> The point of this series is to insert protections for the consumers of the
> in-memory index to avoid unintended behavior change when using a sparse
> index versus a full one.
>
> We mark certain regions of code as needing a full index, so we call
> ensure_full_index() to expand a sparse index to a full one, if necessary.
> These protections are inserted file-by-file in every loop over all cache
> entries. Well, "most" loops, because some are going to be handled in the
> very next series so I leave them out.
>
> Many callers use index_name_pos() to find a path by name. In these cases, we
> can check if that position resolves to a sparse directory instance. In those
> cases, we just expand to a full index and run the search again.
>
> The last few patches deal with the name-hash hashtable for doing O(1)
> lookups.
>
> These protections don't do much right now, since the previous series created
> the_repository->settings.command_requires_full_index to guard all index
> reads and writes to ensure the in-memory copy is full for commands that have
> not been tested with the sparse index yet.
>
> However, after this series is complete, we now have a straight-forward plan
> for making commands "sparse aware" one-by-one:
>
> 1. Disable settings.command_requires_full_index to allow an in-memory
> sparse-index.
> 2. Run versions of that command under a debugger, breaking on
> ensure_full_index().
> 3. Examine the call stack to determine the context of that expansion, then
> implement the proper behavior in those locations.
> 4. Add tests to ensure we are checking this logic in the presence of sparse
> directory entries.
>
> I will admit that mostly it is the writing of the test cases that takes the
> most time in the conversions I've done so far.
>
>
> Updates in v3
> =============
>
> * I updated based on Elijah's feedback.
> * One new patch splits out a change that Elijah (rightfully) pointed out
> did not belong with the patch it was originally in.
>
> I gave it time to see if any other comments came in, but it looks like
> review stabilized. I probably waited a bit longer than I should have.
This round looks good to me.
Reviewed-by: Elijah Newren <newren@gmail.com>
^ permalink raw reply [flat|nested] 111+ messages in thread
* Re: [PATCH v3 00/26] Sparse Index: API protections
2021-04-13 16:02 ` [PATCH v3 00/26] Sparse Index: API protections Elijah Newren
@ 2021-04-14 20:44 ` Junio C Hamano
2021-04-15 2:42 ` Derrick Stolee
0 siblings, 1 reply; 111+ messages in thread
From: Junio C Hamano @ 2021-04-14 20:44 UTC (permalink / raw)
To: Elijah Newren
Cc: Derrick Stolee via GitGitGadget, Git Mailing List, Derrick Stolee,
Matheus Tavares Bernardino, Derrick Stolee
Elijah Newren <newren@gmail.com> writes:
> This round looks good to me.
>
> Reviewed-by: Elijah Newren <newren@gmail.com>
Thanks; this kind of change inevitably would involve semantic
conflicts with topics in flight, but we've seen Derrick works well
together with others in such scenarios already, so let's run with
this version and see what happens.
Will merge to 'next' until there are new issues found, by the end of
the week at the latest.
Thanks.
^ permalink raw reply [flat|nested] 111+ messages in thread
* Re: [PATCH v3 00/26] Sparse Index: API protections
2021-04-14 20:44 ` Junio C Hamano
@ 2021-04-15 2:42 ` Derrick Stolee
0 siblings, 0 replies; 111+ messages in thread
From: Derrick Stolee @ 2021-04-15 2:42 UTC (permalink / raw)
To: Junio C Hamano, Elijah Newren
Cc: Derrick Stolee via GitGitGadget, Git Mailing List,
Matheus Tavares Bernardino, Derrick Stolee
On 4/14/2021 4:44 PM, Junio C Hamano wrote:
> Elijah Newren <newren@gmail.com> writes:
>
>> This round looks good to me.
>>
>> Reviewed-by: Elijah Newren <newren@gmail.com>
>
> Thanks; this kind of change inevitably would involve semantic
> conflicts with topics in flight, but we've seen Derrick works well
> together with others in such scenarios already, so let's run with
> this version and see what happens.
Thanks!
Semantic conflicts like the one in mt/add-rm-sparse-checkout won't
show up until I start relaxing command_requires_full_index, so I'll
need to be careful at those points. But I'll keep a lookout for
other issues contributors have when integrating across this change.
> Will merge to 'next' until there are new issues found, by the end of
> the week at the latest.
That timeline works for me.
-Stolee
^ permalink raw reply [flat|nested] 111+ messages in thread