* [PATCH 0/2] sparse-index: expand/collapse based on 'index.sparse' @ 2021-10-15 19:54 Victoria Dye via GitGitGadget 2021-10-15 19:54 ` [PATCH 1/2] test-read-cache.c: prepare_repo_settings after config init Victoria Dye via GitGitGadget ` (2 more replies) 0 siblings, 3 replies; 44+ messages in thread From: Victoria Dye via GitGitGadget @ 2021-10-15 19:54 UTC (permalink / raw) To: git; +Cc: stolee, gitster, Victoria Dye This series updates do_read_index to use the index.sparse config setting when determining whether the index should be expanded or collapsed. If the command & repo allow use of a sparse index, index.sparse is enabled, and a full index is read from disk, the index is collapsed before returning to the caller. Conversely, if index.sparse is disabled but the index read from disk is sparse, the index is expanded before returning. This allows index.sparse to control use of the sparse index in addition to its existing control over how the index is written to disk. It also introduces the ability to enable/disable the sparse index on a command-by-command basis (e.g., allowing a user to troubleshoot a sparse-aware command with '-c index.sparse=false' [1]). While testing this change, a bug was found in 'test-tool read-cache' in which config settings for the repository were not initialized before preparing the repo settings. This caused index.sparse to always be 'false' when using the test helper in a cone-mode sparse checkout, breaking tests in t1091 and t1092. The issue is fixed by moving prepare_repo_settings after config setup. [1] https://lore.kernel.org/git/cc60c6e7-ecef-ae22-8ec7-ab290ff2b830@gmail.com/ Thanks! -Victoria Victoria Dye (2): test-read-cache.c: prepare_repo_settings after config init sparse-index: update index read to consider index.sparse config read-cache.c | 5 +++- t/helper/test-read-cache.c | 5 ++-- t/t1092-sparse-checkout-compatibility.sh | 31 ++++++++++++++++++++++++ 3 files changed, 38 insertions(+), 3 deletions(-) base-commit: f443b226ca681d87a3a31e245a70e6bc2769123c Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1059%2Fvdye%2Fsparse%2Findex-sparse-config-v1 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1059/vdye/sparse/index-sparse-config-v1 Pull-Request: https://github.com/gitgitgadget/git/pull/1059 -- gitgitgadget ^ permalink raw reply [flat|nested] 44+ messages in thread
* [PATCH 1/2] test-read-cache.c: prepare_repo_settings after config init 2021-10-15 19:54 [PATCH 0/2] sparse-index: expand/collapse based on 'index.sparse' Victoria Dye via GitGitGadget @ 2021-10-15 19:54 ` Victoria Dye via GitGitGadget 2021-10-15 19:54 ` [PATCH 2/2] sparse-index: update index read to consider index.sparse config Victoria Dye via GitGitGadget 2021-10-21 20:48 ` [PATCH v2 0/3] sparse-index: expand/collapse based on 'index.sparse' Victoria Dye via GitGitGadget 2 siblings, 0 replies; 44+ messages in thread From: Victoria Dye via GitGitGadget @ 2021-10-15 19:54 UTC (permalink / raw) To: git; +Cc: stolee, gitster, Victoria Dye, Victoria Dye From: Victoria Dye <vdye@github.com> Move `prepare_repo_settings` after the git directory has been set up in `test-read-cache.c`. The git directory settings must be initialized to properly assign repo settings using the worktree-level git config. Signed-off-by: Victoria Dye <vdye@github.com> --- t/helper/test-read-cache.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/t/helper/test-read-cache.c b/t/helper/test-read-cache.c index b52c174acc7..0d9f08931a1 100644 --- a/t/helper/test-read-cache.c +++ b/t/helper/test-read-cache.c @@ -39,8 +39,6 @@ int cmd__read_cache(int argc, const char **argv) int table = 0, expand = 0; initialize_the_repository(); - prepare_repo_settings(r); - r->settings.command_requires_full_index = 0; for (++argv, --argc; *argv && starts_with(*argv, "--"); ++argv, --argc) { if (skip_prefix(*argv, "--print-and-refresh=", &name)) @@ -56,6 +54,9 @@ int cmd__read_cache(int argc, const char **argv) setup_git_directory(); git_config(git_default_config, NULL); + prepare_repo_settings(r); + r->settings.command_requires_full_index = 0; + for (i = 0; i < cnt; i++) { repo_read_index(r); -- gitgitgadget ^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH 2/2] sparse-index: update index read to consider index.sparse config 2021-10-15 19:54 [PATCH 0/2] sparse-index: expand/collapse based on 'index.sparse' Victoria Dye via GitGitGadget 2021-10-15 19:54 ` [PATCH 1/2] test-read-cache.c: prepare_repo_settings after config init Victoria Dye via GitGitGadget @ 2021-10-15 19:54 ` Victoria Dye via GitGitGadget 2021-10-15 21:53 ` Junio C Hamano 2021-10-21 20:48 ` [PATCH v2 0/3] sparse-index: expand/collapse based on 'index.sparse' Victoria Dye via GitGitGadget 2 siblings, 1 reply; 44+ messages in thread From: Victoria Dye via GitGitGadget @ 2021-10-15 19:54 UTC (permalink / raw) To: git; +Cc: stolee, gitster, Victoria Dye, Victoria Dye From: Victoria Dye <vdye@github.com> Use the index.sparse config setting to expand or collapse the index when read. Previously, index.sparse would determine how the index would be written to disk, but would not enforce whether the index is read into memory as full or sparse. Now, the index is expanded when a sparse index is read with `index.sparse=false` and is collapsed to sparse when a full index is read with `index.sparse=true` (and the command does not require a full index). This makes the behavior of `index.sparse` more intuitive, as it now clearly enables/disables usage of a sparse index. It also provides users with a way to disable the sparse index per-command (e.g., for troubleshooting purposes) without needing to re-initialize the sparse-checkout. Co-authored-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Victoria Dye <vdye@github.com> --- read-cache.c | 5 +++- t/t1092-sparse-checkout-compatibility.sh | 31 ++++++++++++++++++++++++ 2 files changed, 35 insertions(+), 1 deletion(-) diff --git a/read-cache.c b/read-cache.c index a78b88a41bf..c3f31718b19 100644 --- a/read-cache.c +++ b/read-cache.c @@ -2338,8 +2338,11 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist) if (!istate->repo) istate->repo = the_repository; prepare_repo_settings(istate->repo); - if (istate->repo->settings.command_requires_full_index) + if (!istate->repo->settings.sparse_index || + istate->repo->settings.command_requires_full_index) ensure_full_index(istate); + else if (!istate->sparse_index) + convert_to_sparse(istate, 0); return istate->cache_nr; diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh index ca91c6a67f8..59accde1fa3 100755 --- a/t/t1092-sparse-checkout-compatibility.sh +++ b/t/t1092-sparse-checkout-compatibility.sh @@ -694,6 +694,37 @@ test_expect_success 'sparse-index is expanded and converted back' ' test_region index ensure_full_index trace2.txt ' +test_expect_success 'index.sparse disabled inline uses full index' ' + init_repos && + + # When index.sparse is disabled inline with `git status`, the + # index is expanded at the beginning of the execution then never + # converted back to sparse. It is then written to disk as a full index. + rm -f trace2.txt && + GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \ + git -C sparse-index -c index.sparse=false status && + ! test_region index convert_to_sparse trace2.txt && + test_region index ensure_full_index trace2.txt && + + # Since index.sparse is set to true at a repo level, the index + # is converted from full to sparse when read, then never expanded + # over the course of `git status`. It is written to disk as a sparse + # index. + rm -f trace2.txt && + GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \ + git -C sparse-index status && + test_region index convert_to_sparse trace2.txt && + ! test_region index ensure_full_index trace2.txt && + + # Now that the index has been written to disk as sparse, it is not + # converted to sparse (or expanded to full) when read by `git status`. + rm -f trace2.txt && + GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \ + git -C sparse-index status && + ! test_region index convert_to_sparse trace2.txt && + ! test_region index ensure_full_index trace2.txt +' + ensure_not_expanded () { rm -f trace2.txt && echo >>sparse-index/untracked.txt && -- gitgitgadget ^ permalink raw reply related [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] sparse-index: update index read to consider index.sparse config 2021-10-15 19:54 ` [PATCH 2/2] sparse-index: update index read to consider index.sparse config Victoria Dye via GitGitGadget @ 2021-10-15 21:53 ` Junio C Hamano 2021-10-17 1:21 ` Derrick Stolee 0 siblings, 1 reply; 44+ messages in thread From: Junio C Hamano @ 2021-10-15 21:53 UTC (permalink / raw) To: Victoria Dye via GitGitGadget; +Cc: git, stolee, Victoria Dye "Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes: > From: Victoria Dye <vdye@github.com> > > Use the index.sparse config setting to expand or collapse the index when > read. Previously, index.sparse would determine how the index would be > written to disk, but would not enforce whether the index is read into memory > as full or sparse. Now, the index is expanded when a sparse index is read > with `index.sparse=false` and is collapsed to sparse when a full index is > read with `index.sparse=true` (and the command does not require a full > index). Instead of calling both in-core index and on-disk index, perhaps use "the in-core index" as appropriately in the above description and the result would become much less ambigous. My knee-jerk reaction was that it is of dubious value to spend cycles to make the in-core index sparse after reading a non-sparse index from the disk to give to the caller, but this hurts only the commands that are not yet sparse-aware and call ensure_full_index() as the first thing they do. To them, we are wasting cycles to shrink and expand for no good reason, and after they are done, the final writeout would create a sparse on-disk index. Besides, the on-disk index is expected to be sparse most of the time when index.sparse is true, so it is hopefully a one-time waste that corrects by itself. For all commands that are sparse-aware, especially when asked to perform their operation on the paths that are not hidden by a tree-like index entry, it may or may not be a win, but the downside would be much smaller. The cost to shrink a full in-core index before writing out as a sparse one should be comparable to the cost to shrink a full in-core index just read from the disk before the sparse-index-aware caller works on it and leaves a still mostly sparse in-core index to be written out without much extra work to re-shrink it to the disk. > This makes the behavior of `index.sparse` more intuitive, as it now clearly > enables/disables usage of a sparse index. It is a minor thing, so I am willing to let it pass, but I am not sure about this claim. The write-out codepath ensures, independent of this change, that a full on-disk index is corrected to become sparse when the configuration is set to true, and a sparse on-disk index is corrected to become full when the configuration is set to false, no? So the only "intuitive"-ness we may be gaining is that the codepaths that are sparse-aware would work in their "sparse" (non-"sparse") mode when index.sparse is set to true (false), respectively, no matter how sparse (or not sparse) the on-disk index they work on is initially. That might help debuggability (assuming that converting between the full and sparse forms are working correctly), but I am not sure if that is something end users would even care about. > - if (istate->repo->settings.command_requires_full_index) > + if (!istate->repo->settings.sparse_index || > + istate->repo->settings.command_requires_full_index) > ensure_full_index(istate); > + else if (!istate->sparse_index) > + convert_to_sparse(istate, 0); > > return istate->cache_nr; Quite straight-forward. Looking good. ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] sparse-index: update index read to consider index.sparse config 2021-10-15 21:53 ` Junio C Hamano @ 2021-10-17 1:21 ` Derrick Stolee 2021-10-17 5:58 ` Junio C Hamano 0 siblings, 1 reply; 44+ messages in thread From: Derrick Stolee @ 2021-10-17 1:21 UTC (permalink / raw) To: Junio C Hamano, Victoria Dye via GitGitGadget; +Cc: git, Victoria Dye On 10/15/2021 5:53 PM, Junio C Hamano wrote: > "Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes: > >> From: Victoria Dye <vdye@github.com> >> >> Use the index.sparse config setting to expand or collapse the index when >> read. Previously, index.sparse would determine how the index would be >> written to disk, but would not enforce whether the index is read into memory >> as full or sparse. Now, the index is expanded when a sparse index is read >> with `index.sparse=false` and is collapsed to sparse when a full index is >> read with `index.sparse=true` (and the command does not require a full >> index). > > Instead of calling both in-core index and on-disk index, perhaps use > "the in-core index" as appropriately in the above description and > the result would become much less ambigous. > > My knee-jerk reaction was that it is of dubious value to spend > cycles to make the in-core index sparse after reading a non-sparse > index from the disk to give to the caller, but this hurts only the > commands that are not yet sparse-aware and call ensure_full_index() > as the first thing they do. To them, we are wasting cycles to > shrink and expand for no good reason, and after they are done, the > final writeout would create a sparse on-disk index. I think you are slightly mistaken here: If index.sparse=true, then a full index will be converted to one on write, but not immediately upon read. This means that subsequent commands will read a sparse index, and they will either benefit from that or not depending on whether they are integrated with the sparse index or not. The new behavior here is that if index.sparse=false, then we convert a sparse index to a full one upon read, avoiding any chance that a Git command is operating on a sparse index in-memory. The simplest summary I can say is here: * If index.sparse=false, then a sparse index will be converted to full upon read. * If index.sparse=true, then a full index will be converted to sparse on write. > Besides, the on-disk index is expected to be sparse most of the time > when index.sparse is true, so it is hopefully a one-time waste that > corrects by itself. > > For all commands that are sparse-aware, especially when asked to > perform their operation on the paths that are not hidden by a > tree-like index entry, it may or may not be a win, but the downside > would be much smaller. The cost to shrink a full in-core index > before writing out as a sparse one should be comparable to the cost > to shrink a full in-core index just read from the disk before the > sparse-index-aware caller works on it and leaves a still mostly > sparse in-core index to be written out without much extra work to > re-shrink it to the disk. > >> This makes the behavior of `index.sparse` more intuitive, as it now clearly >> enables/disables usage of a sparse index. > > It is a minor thing, so I am willing to let it pass, but I am not > sure about this claim. The write-out codepath ensures, independent > of this change, that a full on-disk index is corrected to become > sparse when the configuration is set to true, and a sparse on-disk > index is corrected to become full when the configuration is set to > false, no? The previous behavior required an index write for the sparse-to-full transition. After this change, an index write is still required for the full-to-sparse transition. > So the only "intuitive"-ness we may be gaining is that the codepaths > that are sparse-aware would work in their "sparse" (non-"sparse") > mode when index.sparse is set to true (false), respectively, no > matter how sparse (or not sparse) the on-disk index they work on is > initially. That might help debuggability (assuming that converting > between the full and sparse forms are working correctly), but I am > not sure if that is something end users would even care about. I think you have a case for concern with the word "intuitive". Even though I agree that the new setup is the best possible interpretation of the setting, its inherit asymmetry can be confusing. Thanks, -Stolee ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] sparse-index: update index read to consider index.sparse config 2021-10-17 1:21 ` Derrick Stolee @ 2021-10-17 5:58 ` Junio C Hamano 2021-10-17 19:33 ` Derrick Stolee 0 siblings, 1 reply; 44+ messages in thread From: Junio C Hamano @ 2021-10-17 5:58 UTC (permalink / raw) To: Derrick Stolee; +Cc: Victoria Dye via GitGitGadget, git, Victoria Dye Derrick Stolee <stolee@gmail.com> writes: > I think you are slightly mistaken here: If index.sparse=true, then a > full index will be converted to one on write, but not immediately upon > read. This means that subsequent commands will read a sparse index, and > they will either benefit from that or not depending on whether they are > integrated with the sparse index or not. > > The new behavior here is that if index.sparse=false, then we convert > a sparse index to a full one upon read, avoiding any chance that a > Git command is operating on a sparse index in-memory. And if index.sparse=true, then we convert a full on-disk index to a sparse one in-core upon reading, right? My comment was solely on that side of the picture, not on the "index.sparse is set to false so we automatically expand" case. > The simplest summary I can say is here: > > * If index.sparse=false, then a sparse index will be converted to > full upon read. > > * If index.sparse=true, then a full index will be converted to sparse > on write. Oh, I see, so yes I was very much misunderstanding what you guys are trying to do. I somehow thought that sparse-to-full and full-to-sparse conversions (1) already happen on the write codepath, and (2) this patch makes them both happen also on the read codepath. IOW: * If index.sparse=false, a sparse index will be written as full, and if it is true, a non-sparse index will be written as sparse, even before these patches. * In addition, with these patches, if index.sparse=false, a sparse index will be expaned to full upon reading, and if it is true, a non-sparse index will be shrunk to sparse upon reading was what I was expecting. What your summary above is saying is very much different. ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] sparse-index: update index read to consider index.sparse config 2021-10-17 5:58 ` Junio C Hamano @ 2021-10-17 19:33 ` Derrick Stolee 2021-10-18 1:15 ` Junio C Hamano 0 siblings, 1 reply; 44+ messages in thread From: Derrick Stolee @ 2021-10-17 19:33 UTC (permalink / raw) To: Junio C Hamano; +Cc: Victoria Dye via GitGitGadget, git, Victoria Dye On 10/17/2021 1:58 AM, Junio C Hamano wrote: > Derrick Stolee <stolee@gmail.com> writes: > >> I think you are slightly mistaken here: If index.sparse=true, then a >> full index will be converted to one on write, but not immediately upon >> read. This means that subsequent commands will read a sparse index, and >> they will either benefit from that or not depending on whether they are >> integrated with the sparse index or not. >> >> The new behavior here is that if index.sparse=false, then we convert >> a sparse index to a full one upon read, avoiding any chance that a >> Git command is operating on a sparse index in-memory. > > And if index.sparse=true, then we convert a full on-disk index to a > sparse one in-core upon reading, right? My comment was solely on > that side of the picture, not on the "index.sparse is set to false > so we automatically expand" case. > >> The simplest summary I can say is here: >> >> * If index.sparse=false, then a sparse index will be converted to >> full upon read. >> >> * If index.sparse=true, then a full index will be converted to sparse >> on write. > > Oh, I see, so yes I was very much misunderstanding what you guys are > trying to do. I somehow thought that sparse-to-full and > full-to-sparse conversions (1) already happen on the write codepath, > and (2) this patch makes them both happen also on the read codepath. > > IOW: > > * If index.sparse=false, a sparse index will be written as full, > and if it is true, a non-sparse index will be written as > sparse, even before these patches. This statement is true. > * In addition, with these patches, if index.sparse=false, a > sparse index will be expaned to full upon reading, and if it > is true, a non-sparse index will be shrunk to sparse upon > reading This is only half true. If "index.sparse=false", then a sparse index will be expanded to full upon reading. If "index.sparse=true" then nothing special will happen to an index on reading. > What your summary above is saying is very much different. Yes, these are different things. It also gets around the concern that "if we have a non-integrated command, then index.sparse=true would convert a full index to a sparse one, only to be expanded again". That doesn't happen if we only convert from full to sparse on write. Thanks, -Stolee ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] sparse-index: update index read to consider index.sparse config 2021-10-17 19:33 ` Derrick Stolee @ 2021-10-18 1:15 ` Junio C Hamano 2021-10-18 13:25 ` Derrick Stolee 0 siblings, 1 reply; 44+ messages in thread From: Junio C Hamano @ 2021-10-18 1:15 UTC (permalink / raw) To: Derrick Stolee; +Cc: Victoria Dye via GitGitGadget, git, Victoria Dye Derrick Stolee <stolee@gmail.com> writes: >> * In addition, with these patches, if index.sparse=false, a >> sparse index will be expaned to full upon reading, and if it >> is true, a non-sparse index will be shrunk to sparse upon >> reading > > This is only half true. If "index.sparse=false", then a sparse > index will be expanded to full upon reading. If "index.sparse=true" > then nothing special will happen to an index on reading. OK. I somehow got the impression that we convert in both ways from the patch text under discussion, specifically this part in do_read_index(): > - if (istate->repo->settings.command_requires_full_index) > + if (!istate->repo->settings.sparse_index || > + istate->repo->settings.command_requires_full_index) > ensure_full_index(istate); > + else if (!istate->sparse_index) > + convert_to_sparse(istate, 0); > > return istate->cache_nr; We used to say "when we know we are running a command that is not sparse ready, expand it here" and nothing else. We now added a bit more condition for expansion, i.e. "when we are told that the repository does not want sparse index, or when the command is not sparse ready". But the patch goes one more step. "If we didn't find a reason to expand to full, and if the in-core index we read is not sparse, then call convert_to_sparse() on it". So if the repo settings tells us that the repository wants a sparse index, and the index we read was not sparse, what does convert_to_sparse() without MEM_ONLY flag bit do? Nothing special? It internally checks the repo->settings.sparse_index again and refrains from shrinking the entries, but we are talking about the case where that repo settings is set to "true", so wouldn't we end up getting the originally full in-core index turned into sparse? It is a confusing piece of code. I see many unconditional calls to convert_to_sparse(istate, 0) on the write code paths, where I instead would expect "if the repo wants sparse, call convert_to_sparse(), and if the repo does not want sparse, call ensure_full_index()", before actualy writing the entries out. These also are setting traps to confuse readers. Perhaps we want a helper function "ensure_right_sparsity(istate)" that would be called where we have if (condition) convert_to_sparse(); else ensure_full_index(); or an unconditonal call to convert_to_sparse() to unconfuse readers? Still puzzled... Thanks. ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] sparse-index: update index read to consider index.sparse config 2021-10-18 1:15 ` Junio C Hamano @ 2021-10-18 13:25 ` Derrick Stolee 2021-10-18 14:14 ` Victoria Dye 2021-10-18 16:09 ` Junio C Hamano 0 siblings, 2 replies; 44+ messages in thread From: Derrick Stolee @ 2021-10-18 13:25 UTC (permalink / raw) To: Junio C Hamano; +Cc: Victoria Dye via GitGitGadget, git, Victoria Dye On 10/17/2021 9:15 PM, Junio C Hamano wrote: > Derrick Stolee <stolee@gmail.com> writes: > >>> * In addition, with these patches, if index.sparse=false, a >>> sparse index will be expaned to full upon reading, and if it >>> is true, a non-sparse index will be shrunk to sparse upon >>> reading >> >> This is only half true. If "index.sparse=false", then a sparse >> index will be expanded to full upon reading. If "index.sparse=true" >> then nothing special will happen to an index on reading. > > OK. I somehow got the impression that we convert in both ways from > the patch text under discussion, specifically this part in > do_read_index(): > >> - if (istate->repo->settings.command_requires_full_index) >> + if (!istate->repo->settings.sparse_index || >> + istate->repo->settings.command_requires_full_index) >> ensure_full_index(istate); >> + else if (!istate->sparse_index) >> + convert_to_sparse(istate, 0); >> >> return istate->cache_nr; > > We used to say "when we know we are running a command that is not > sparse ready, expand it here" and nothing else. > > We now added a bit more condition for expansion, i.e. "when we are > told that the repository does not want sparse index, or when the > command is not sparse ready". > > But the patch goes one more step. "If we didn't find a reason to > expand to full, and if the in-core index we read is not sparse, then > call convert_to_sparse() on it". So if the repo settings tells us > that the repository wants a sparse index, and the index we read was > not sparse, what does convert_to_sparse() without MEM_ONLY flag bit > do? Nothing special? You are absolutely right. I've been talking about what I _thought_ the code does (and should do) but I missed this 'else if' which is in fact doing what you have been claiming the entire time. I should have done a better job double-checking the code before doubling down on my claims. I think the 'else if' should be removed, which would match my expectations. > I see many unconditional calls to convert_to_sparse(istate, 0) on > the write code paths, where I instead would expect "if the repo > wants sparse, call convert_to_sparse(), and if the repo does not > want sparse, call ensure_full_index()", before actualy writing the > entries out. These also are setting traps to confuse readers. > > Perhaps we want a helper function "ensure_right_sparsity(istate)" > that would be called where we have > > if (condition) > convert_to_sparse(); > else > ensure_full_index(); > > or an unconditonal call to convert_to_sparse() to unconfuse readers? convert_to_sparse() has several checks that prevent the conversion from happening, such as having a split index. In particular, it will skip the conversion if the_repository->settings.sparse_index is false. Thus, calling it unconditionally in the write code is correct. Doing these conditional checks within convert_to_sparse() make sense to avoid repeating the conditionals in all callers. ensure_full_index() doesn't do the same because we absolutely want a full index at the end of that method. Perhaps a rename to something like "convert_to_sparse_if_able()" would make sense? But it might be best to leave that one lie. Thanks, -Stolee ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] sparse-index: update index read to consider index.sparse config 2021-10-18 13:25 ` Derrick Stolee @ 2021-10-18 14:14 ` Victoria Dye 2021-10-21 13:26 ` Derrick Stolee 2021-10-18 16:09 ` Junio C Hamano 1 sibling, 1 reply; 44+ messages in thread From: Victoria Dye @ 2021-10-18 14:14 UTC (permalink / raw) To: Derrick Stolee, Junio C Hamano; +Cc: Victoria Dye via GitGitGadget, git Derrick Stolee wrote: > On 10/17/2021 9:15 PM, Junio C Hamano wrote: >> Derrick Stolee <stolee@gmail.com> writes: >> >>>> * In addition, with these patches, if index.sparse=false, a >>>> sparse index will be expaned to full upon reading, and if it >>>> is true, a non-sparse index will be shrunk to sparse upon >>>> reading >>> >>> This is only half true. If "index.sparse=false", then a sparse >>> index will be expanded to full upon reading. If "index.sparse=true" >>> then nothing special will happen to an index on reading. >> >> OK. I somehow got the impression that we convert in both ways from >> the patch text under discussion, specifically this part in >> do_read_index(): >> >>> - if (istate->repo->settings.command_requires_full_index) >>> + if (!istate->repo->settings.sparse_index || >>> + istate->repo->settings.command_requires_full_index) >>> ensure_full_index(istate); >>> + else if (!istate->sparse_index) >>> + convert_to_sparse(istate, 0); >>> >>> return istate->cache_nr; >> >> We used to say "when we know we are running a command that is not >> sparse ready, expand it here" and nothing else. >> >> We now added a bit more condition for expansion, i.e. "when we are >> told that the repository does not want sparse index, or when the >> command is not sparse ready". >> >> But the patch goes one more step. "If we didn't find a reason to >> expand to full, and if the in-core index we read is not sparse, then >> call convert_to_sparse() on it". So if the repo settings tells us >> that the repository wants a sparse index, and the index we read was >> not sparse, what does convert_to_sparse() without MEM_ONLY flag bit >> do? Nothing special? > > You are absolutely right. I've been talking about what I _thought_ > the code does (and should do) but I missed this 'else if' which is > in fact doing what you have been claiming the entire time. I should > have done a better job double-checking the code before doubling > down on my claims. > > I think the 'else if' should be removed, which would match my > expectations. > By leaving that part out, wouldn't you only solve half of the "mismatch" between in-core index and repo setting? Earlier, you described your expectation as: > * If index.sparse=false, then a sparse index will be converted to > full upon read. > > * If index.sparse=true, then a full index will be converted to sparse > on write. Why should the direction of change to the setting value (false->true vs true->false) cause the index to convert at different times? Consider the scenario: # In a cone-mode, sparse index-enabled sparse checkout repo $ git -c index.sparse=false status # 1 $ git status # 2 $ git status # 3 Before this patch, the index has the following states per command: (1) the index is sparse in-core, writes full on-disk (2) the index is full in-core, writes sparse on-disk (3) the index is sparse in-core, writes sparse on-disk Here, I see two mismatches in my expectations: (1) operates on an in-core sparse index, despite `index.sparse=false`, and (2) operates on an in-core full index, despite `index.sparse=true`. What you're suggesting solves only the first mismatch. However, the second mismatch incurs the performance hit of a supposedly-sparse command actually operating on an in-core full index. It also creates an inconsistency between (2) and (3) in their use of the sparse index. What I'd like this patch to do is: (1) the index is full in-core, writes full on-disk (2) the index is sparse in-core, writes sparse on-disk (3) the index is sparse in-core, writes sparse on-disk Here, there are no more mismatches between in-core index usage and what is written to disk, and (2) and (3) use the same index sparsity. >> I see many unconditional calls to convert_to_sparse(istate, 0) on >> the write code paths, where I instead would expect "if the repo >> wants sparse, call convert_to_sparse(), and if the repo does not >> want sparse, call ensure_full_index()", before actualy writing the >> entries out. These also are setting traps to confuse readers. >> >> Perhaps we want a helper function "ensure_right_sparsity(istate)" >> that would be called where we have >> >> if (condition) >> convert_to_sparse(); >> else >> ensure_full_index(); >> >> or an unconditonal call to convert_to_sparse() to unconfuse readers? > > convert_to_sparse() has several checks that prevent the conversion > from happening, such as having a split index. In particular, it will > skip the conversion if the_repository->settings.sparse_index is false. > Thus, calling it unconditionally in the write code is correct. > I may have introduced some confusion by redundantly checking `!istate->sparse_index` before converting to sparse (my goal was readability - showing the index does not need to be converted to sparse if it is already sparse - but I think it ended up doing the opposite). The condition could be removed entirely, thanks to an equivalent check inside `convert_to_sparse`: - if (istate->repo->settings.command_requires_full_index) + if (!istate->repo->settings.sparse_index || + istate->repo->settings.command_requires_full_index) ensure_full_index(istate); + else + convert_to_sparse(istate, 0); ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] sparse-index: update index read to consider index.sparse config 2021-10-18 14:14 ` Victoria Dye @ 2021-10-21 13:26 ` Derrick Stolee 0 siblings, 0 replies; 44+ messages in thread From: Derrick Stolee @ 2021-10-21 13:26 UTC (permalink / raw) To: Victoria Dye, Junio C Hamano; +Cc: Victoria Dye via GitGitGadget, git On 10/18/2021 10:14 AM, Victoria Dye wrote: > Derrick Stolee wrote: >> On 10/17/2021 9:15 PM, Junio C Hamano wrote: >>> OK. I somehow got the impression that we convert in both ways from >>> the patch text under discussion, specifically this part in >>> do_read_index(): >>> >>>> - if (istate->repo->settings.command_requires_full_index) >>>> + if (!istate->repo->settings.sparse_index || >>>> + istate->repo->settings.command_requires_full_index) >>>> ensure_full_index(istate); >>>> + else if (!istate->sparse_index) >>>> + convert_to_sparse(istate, 0); >>>> >>>> return istate->cache_nr; >>> >>> We used to say "when we know we are running a command that is not >>> sparse ready, expand it here" and nothing else. >>> >>> We now added a bit more condition for expansion, i.e. "when we are >>> told that the repository does not want sparse index, or when the >>> command is not sparse ready". >>> >>> But the patch goes one more step. "If we didn't find a reason to >>> expand to full, and if the in-core index we read is not sparse, then >>> call convert_to_sparse() on it". So if the repo settings tells us >>> that the repository wants a sparse index, and the index we read was >>> not sparse, what does convert_to_sparse() without MEM_ONLY flag bit >>> do? Nothing special? >> >> You are absolutely right. I've been talking about what I _thought_ >> the code does (and should do) but I missed this 'else if' which is >> in fact doing what you have been claiming the entire time. I should >> have done a better job double-checking the code before doubling >> down on my claims. >> >> I think the 'else if' should be removed, which would match my >> expectations. >> > > By leaving that part out, wouldn't you only solve half of the "mismatch" > between in-core index and repo setting? Earlier, you described your > expectation as: > >> * If index.sparse=false, then a sparse index will be converted to >> full upon read. >> >> * If index.sparse=true, then a full index will be converted to sparse >> on write. > > Why should the direction of change to the setting value (false->true vs > true->false) cause the index to convert at different times? Consider the > scenario: > > # In a cone-mode, sparse index-enabled sparse checkout repo > $ git -c index.sparse=false status # 1 > $ git status # 2 > $ git status # 3 > > Before this patch, the index has the following states per command: > > (1) the index is sparse in-core, writes full on-disk > (2) the index is full in-core, writes sparse on-disk > (3) the index is sparse in-core, writes sparse on-disk > > Here, I see two mismatches in my expectations: (1) operates on an in-core > sparse index, despite `index.sparse=false`, and (2) operates on an in-core > full index, despite `index.sparse=true`. > > What you're suggesting solves only the first mismatch. However, the second > mismatch incurs the performance hit of a supposedly-sparse command actually > operating on an in-core full index. It also creates an inconsistency between > (2) and (3) in their use of the sparse index. What I'd like this patch to do > is: > > (1) the index is full in-core, writes full on-disk > (2) the index is sparse in-core, writes sparse on-disk > (3) the index is sparse in-core, writes sparse on-disk > > Here, there are no more mismatches between in-core index usage and what is > written to disk, and (2) and (3) use the same index sparsity. I suppose that my perspective is that we always need to handle a full index in-core, because it is a subset of the capabilities of a sparse index. There is not much value in requiring the in-core index be sparse. But also, I'm now less concerned about commands that convert from full-to-sparse on read and expand back to full because of command_requires_full_index. This should be a short-lived issue because the index.sparse config is unlikely to be changing frequently, so once the index is converted to sparse on write, we no longer need to do any work to convert the in-core index to sparse on read. The thing to keep in mind is that not every command that reads the index also writes it. For example, the two 'git status' commands you list might not write the index if there is no new information in the filesystem that would trigger an index write. In short: I've shifted my view and no longer oppose this conversion immediately upon reading. >>> I see many unconditional calls to convert_to_sparse(istate, 0) on >>> the write code paths, where I instead would expect "if the repo >>> wants sparse, call convert_to_sparse(), and if the repo does not >>> want sparse, call ensure_full_index()", before actualy writing the >>> entries out. These also are setting traps to confuse readers. >>> >>> Perhaps we want a helper function "ensure_right_sparsity(istate)" >>> that would be called where we have >>> >>> if (condition) >>> convert_to_sparse(); >>> else >>> ensure_full_index(); >>> >>> or an unconditonal call to convert_to_sparse() to unconfuse readers? >> >> convert_to_sparse() has several checks that prevent the conversion >> from happening, such as having a split index. In particular, it will >> skip the conversion if the_repository->settings.sparse_index is false. >> Thus, calling it unconditionally in the write code is correct. >> > > I may have introduced some confusion by redundantly checking > `!istate->sparse_index` before converting to sparse (my goal was readability > - showing the index does not need to be converted to sparse if it is already > sparse - but I think it ended up doing the opposite). The condition could be > removed entirely, thanks to an equivalent check inside `convert_to_sparse`: > > - if (istate->repo->settings.command_requires_full_index) > + if (!istate->repo->settings.sparse_index || > + istate->repo->settings.command_requires_full_index) > ensure_full_index(istate); > + else > + convert_to_sparse(istate, 0); This is simpler. Thanks, -Stolee ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] sparse-index: update index read to consider index.sparse config 2021-10-18 13:25 ` Derrick Stolee 2021-10-18 14:14 ` Victoria Dye @ 2021-10-18 16:09 ` Junio C Hamano 1 sibling, 0 replies; 44+ messages in thread From: Junio C Hamano @ 2021-10-18 16:09 UTC (permalink / raw) To: Derrick Stolee; +Cc: Victoria Dye via GitGitGadget, git, Victoria Dye Derrick Stolee <stolee@gmail.com> writes: > I think the 'else if' should be removed, which would match my > expectations. > ... > convert_to_sparse() has several checks that prevent the conversion > from happening, such as having a split index. In particular, it will > skip the conversion if the_repository->settings.sparse_index is false. > Thus, calling it unconditionally in the write code is correct. Heh ;-) It was exactly the fact that convert_to_sparse() has many checks inside that confused me into thinking that the new "else" may turn into a no-op and the behaviour of the new code matches your description. It is confusing, and it being one way street does not help readers, either. Which in turn led me into worrying that our unconditional calls to convert_to_sparse(), which are not accompanied by corresponding calls to ensure_full_index() on the write code paths, were problematic. With the patch under discussion, in a repository that does not want to see a sparse index, if we did not have "otherwise, do sparsify the in-core index", lack of ensure_full_index() may not be a problem, because we are guaranteeing that the in-core index is fully expanded by the time the control reaches the write code path. But if we by default shrunk the in-core index upon reading (i.e. with the "else" clause), its correctness becomes muddy. Convert_to_sparse() may not turn a non-sparse in-core index into sparse with these checks, but it would not expand a sparse in-core index before writing out. Which in turn led me to suggest, instead of unconditional calls to convert_to_sparse(), make unconditional calls to a helper function that fixes the sparseness. I knew convert_to_sparse() refrains from touching the in-core index when the repository does not want a sparse index, but I also am reasonably sure that it lacks support to expanding the sparse index into full, so it cannot be the "fix to the correct sparseness" function. If our promise to a repository that does not want a sparse index is that the read code path ensures that the in-core index is fully expanded, and that nobody (including convert_to_sparse()) places a sparse entry in the in-core index, then the promise would not be broken, but I still wonder if the whole code is human-reader friendly. It wasn't to me ;-) > Perhaps a rename to something like "convert_to_sparse_if_able()" would > make sense? But it might be best to leave that one lie. "-if-able" does not make much sense to me, but "-as-needed" may. By not calling it "fix-sparseness", but saying "convert-to-sparse", we are making it clear that it is a one-way street that may make it sparse but never removes sparseness. I do not know if that is the guarantee we want to keep giving the callers (not saying that I have a hunch that it would be better to allow us to optionally expand to full in that function---I genuinely do not know), or we may want to call it a neutral "fix sparseness" to keep the door open for a future where we may expand an already sparse in-core index into full based on the setting. Thanks. ^ permalink raw reply [flat|nested] 44+ messages in thread
* [PATCH v2 0/3] sparse-index: expand/collapse based on 'index.sparse' 2021-10-15 19:54 [PATCH 0/2] sparse-index: expand/collapse based on 'index.sparse' Victoria Dye via GitGitGadget 2021-10-15 19:54 ` [PATCH 1/2] test-read-cache.c: prepare_repo_settings after config init Victoria Dye via GitGitGadget 2021-10-15 19:54 ` [PATCH 2/2] sparse-index: update index read to consider index.sparse config Victoria Dye via GitGitGadget @ 2021-10-21 20:48 ` Victoria Dye via GitGitGadget 2021-10-21 20:48 ` [PATCH v2 1/3] test-read-cache.c: prepare_repo_settings after config init Victoria Dye via GitGitGadget ` (3 more replies) 2 siblings, 4 replies; 44+ messages in thread From: Victoria Dye via GitGitGadget @ 2021-10-21 20:48 UTC (permalink / raw) To: git; +Cc: stolee, gitster, Victoria Dye This series updates do_read_index to use the index.sparse config setting when determining whether the index should be expanded or collapsed. If the command & repo allow use of a sparse index, index.sparse is enabled, and a full index is read from disk, the index is collapsed before returning to the caller. Conversely, if index.sparse is disabled but the index read from disk is sparse, the index is expanded before returning. This allows index.sparse to control use of the sparse index in addition to its existing control over how the index is written to disk. It also introduces the ability to enable/disable the sparse index on a command-by-command basis (e.g., allowing a user to troubleshoot a sparse-aware command with '-c index.sparse=false' [1]). While testing this change, a bug was found in 'test-tool read-cache' in which config settings for the repository were not initialized before preparing the repo settings. This caused index.sparse to always be 'false' when using the test helper in a cone-mode sparse checkout, breaking tests in t1091 and t1092. The issue is fixed by moving prepare_repo_settings after config setup. Changes since V1 ================ * Add ensure_correct_sparsity function that ensures the index is sparse if the repository settings (including index.sparse) allow it, otherwise ensuring the index is expanded to full. * Restructure condition in do_read_index to, rather than check specifically for the index.sparse config setting, call ensure_correct_sparsity unconditionally when command_requires_full_index is false. [1] https://lore.kernel.org/git/cc60c6e7-ecef-ae22-8ec7-ab290ff2b830@gmail.com/ Thanks! -Victoria Victoria Dye (3): test-read-cache.c: prepare_repo_settings after config init sparse-index: add ensure_correct_sparsity function sparse-index: update do_read_index to ensure correct sparsity read-cache.c | 8 +++++ sparse-index.c | 42 ++++++++++++++++++++++-- sparse-index.h | 2 ++ t/helper/test-read-cache.c | 5 +-- t/t1092-sparse-checkout-compatibility.sh | 31 +++++++++++++++++ 5 files changed, 83 insertions(+), 5 deletions(-) base-commit: f443b226ca681d87a3a31e245a70e6bc2769123c Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1059%2Fvdye%2Fsparse%2Findex-sparse-config-v2 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1059/vdye/sparse/index-sparse-config-v2 Pull-Request: https://github.com/gitgitgadget/git/pull/1059 Range-diff vs v1: 1: 6974ce7e7f5 = 1: 6974ce7e7f5 test-read-cache.c: prepare_repo_settings after config init -: ----------- > 2: 0b6e6633bb2 sparse-index: add ensure_correct_sparsity function 2: c6279b22545 ! 3: 437cf398256 sparse-index: update index read to consider index.sparse config @@ Metadata Author: Victoria Dye <vdye@github.com> ## Commit message ## - sparse-index: update index read to consider index.sparse config + sparse-index: update do_read_index to ensure correct sparsity - Use the index.sparse config setting to expand or collapse the index when - read. Previously, index.sparse would determine how the index would be - written to disk, but would not enforce whether the index is read into memory - as full or sparse. Now, the index is expanded when a sparse index is read - with `index.sparse=false` and is collapsed to sparse when a full index is - read with `index.sparse=true` (and the command does not require a full - index). + If `command_requires_full_index` is false, ensure correct in-core index + sparsity on read by calling `ensure_correct_sparsity`. This change is meant + to update the how the index is read in a command after sparse index-related + repository settings are modified. Previously, for example, if `index.sparse` + were changed from `true` to `false`, the in-core index on the next command + would be sparse. The index would only be expanded to full when it was next + written to disk. - This makes the behavior of `index.sparse` more intuitive, as it now clearly - enables/disables usage of a sparse index. It also provides users with a way - to disable the sparse index per-command (e.g., for troubleshooting purposes) - without needing to re-initialize the sparse-checkout. + By adding a call to `ensure_correct_sparsity`, the in-core index now matches + the sparsity dictated by the relevant repository settings as soon as it is + read into memory, rather than when it is later written to disk. + Helped-by: Junio C Hamano <gitster@pobox.com> Co-authored-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Victoria Dye <vdye@github.com> ## read-cache.c ## @@ read-cache.c: int do_read_index(struct index_state *istate, const char *path, int must_exist) + if (!istate->repo) istate->repo = the_repository; ++ ++ /* ++ * If the command explicitly requires a full index, force it ++ * to be full. Otherwise, correct the sparsity based on repository ++ * settings and other properties of the index (if necessary). ++ */ prepare_repo_settings(istate->repo); -- if (istate->repo->settings.command_requires_full_index) -+ if (!istate->repo->settings.sparse_index || -+ istate->repo->settings.command_requires_full_index) + if (istate->repo->settings.command_requires_full_index) ensure_full_index(istate); -+ else if (!istate->sparse_index) -+ convert_to_sparse(istate, 0); ++ else ++ ensure_correct_sparsity(istate); return istate->cache_nr; -- gitgitgadget ^ permalink raw reply [flat|nested] 44+ messages in thread
* [PATCH v2 1/3] test-read-cache.c: prepare_repo_settings after config init 2021-10-21 20:48 ` [PATCH v2 0/3] sparse-index: expand/collapse based on 'index.sparse' Victoria Dye via GitGitGadget @ 2021-10-21 20:48 ` Victoria Dye via GitGitGadget 2021-10-21 20:48 ` [PATCH v2 2/3] sparse-index: add ensure_correct_sparsity function Victoria Dye via GitGitGadget ` (2 subsequent siblings) 3 siblings, 0 replies; 44+ messages in thread From: Victoria Dye via GitGitGadget @ 2021-10-21 20:48 UTC (permalink / raw) To: git; +Cc: stolee, gitster, Victoria Dye, Victoria Dye From: Victoria Dye <vdye@github.com> Move `prepare_repo_settings` after the git directory has been set up in `test-read-cache.c`. The git directory settings must be initialized to properly assign repo settings using the worktree-level git config. Signed-off-by: Victoria Dye <vdye@github.com> --- t/helper/test-read-cache.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/t/helper/test-read-cache.c b/t/helper/test-read-cache.c index b52c174acc7..0d9f08931a1 100644 --- a/t/helper/test-read-cache.c +++ b/t/helper/test-read-cache.c @@ -39,8 +39,6 @@ int cmd__read_cache(int argc, const char **argv) int table = 0, expand = 0; initialize_the_repository(); - prepare_repo_settings(r); - r->settings.command_requires_full_index = 0; for (++argv, --argc; *argv && starts_with(*argv, "--"); ++argv, --argc) { if (skip_prefix(*argv, "--print-and-refresh=", &name)) @@ -56,6 +54,9 @@ int cmd__read_cache(int argc, const char **argv) setup_git_directory(); git_config(git_default_config, NULL); + prepare_repo_settings(r); + r->settings.command_requires_full_index = 0; + for (i = 0; i < cnt; i++) { repo_read_index(r); -- gitgitgadget ^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH v2 2/3] sparse-index: add ensure_correct_sparsity function 2021-10-21 20:48 ` [PATCH v2 0/3] sparse-index: expand/collapse based on 'index.sparse' Victoria Dye via GitGitGadget 2021-10-21 20:48 ` [PATCH v2 1/3] test-read-cache.c: prepare_repo_settings after config init Victoria Dye via GitGitGadget @ 2021-10-21 20:48 ` Victoria Dye via GitGitGadget 2021-10-21 22:20 ` Junio C Hamano 2021-10-21 20:48 ` [PATCH v2 3/3] sparse-index: update do_read_index to ensure correct sparsity Victoria Dye via GitGitGadget 2021-10-27 18:20 ` [PATCH v3 0/3] sparse-index: expand/collapse based on 'index.sparse' Victoria Dye via GitGitGadget 3 siblings, 1 reply; 44+ messages in thread From: Victoria Dye via GitGitGadget @ 2021-10-21 20:48 UTC (permalink / raw) To: git; +Cc: stolee, gitster, Victoria Dye, Victoria Dye From: Victoria Dye <vdye@github.com> The purpose of the `ensure_correct_sparsity` function is to provide a means of aligning the in-core index with the sparsity required by the repository settings and other properties of the index. The function will first attempt to convert the index to sparse, now with a "SPARSE_INDEX_VERIFY_ALLOWED" flag that forces the function to return a nonzero value if repository settings do not allow use of a sparse index. If a nonzero value is returned, the index is expanded to full with `ensure_full_index`. Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Victoria Dye <vdye@github.com> --- sparse-index.c | 42 +++++++++++++++++++++++++++++++++++++++--- sparse-index.h | 2 ++ 2 files changed, 41 insertions(+), 3 deletions(-) diff --git a/sparse-index.c b/sparse-index.c index 7b7ff79e044..4273453e078 100644 --- a/sparse-index.c +++ b/sparse-index.c @@ -122,11 +122,10 @@ static int index_has_unmerged_entries(struct index_state *istate) return 0; } -int convert_to_sparse(struct index_state *istate, int flags) +static int can_convert_to_sparse(struct index_state *istate, int flags) { int test_env; - if (istate->sparse_index || !istate->cache_nr || - !core_apply_sparse_checkout || !core_sparse_checkout_cone) + if (!core_apply_sparse_checkout || !core_sparse_checkout_cone) return 0; if (!istate->repo) @@ -187,6 +186,30 @@ int convert_to_sparse(struct index_state *istate, int flags) if (cache_tree_update(istate, WRITE_TREE_MISSING_OK)) return 0; + return 1; +} + +int convert_to_sparse(struct index_state *istate, int flags) +{ + int verify = flags & SPARSE_INDEX_VERIFY_ALLOWED; + + /* + * If validating with strict checks against whether the sparse index is + * allowed, we want to check `can_convert_to_sparse` *before* exiting + * early due to an already sparse or empty index. + * + * If not performing strict validation, the order is reversed to avoid + * the more expensive checks in `can_convert_to_sparse` whenver possible. + */ + if (verify) { + if (!can_convert_to_sparse(istate, flags)) + return -1; + else if (istate->sparse_index || !istate->cache_nr) + return 0; + } else if (istate->sparse_index || !istate->cache_nr || + !can_convert_to_sparse(istate, flags)) + return 0; + remove_fsmonitor(istate); trace2_region_enter("index", "convert_to_sparse", istate->repo); @@ -313,6 +336,19 @@ void ensure_full_index(struct index_state *istate) trace2_region_leave("index", "ensure_full_index", istate->repo); } +void ensure_correct_sparsity(struct index_state *istate) +{ + /* + * First check whether the index can be converted to sparse by attempting + * to convert it with the SPARSE_INDEX_VERIFY_ALLOWED flag. If the + * SPARSE_INDEX_VERIFY_ALLOWED checks indicate that the index cannot + * be converted because repository settings and/or index properties + * do not allow it, expand the index to full. + */ + if (convert_to_sparse(istate, SPARSE_INDEX_VERIFY_ALLOWED)) + ensure_full_index(istate); +} + /* * This static global helps avoid infinite recursion between * expand_to_path() and index_file_exists(). diff --git a/sparse-index.h b/sparse-index.h index 9f3d7bc7faf..b61754f1f76 100644 --- a/sparse-index.h +++ b/sparse-index.h @@ -3,7 +3,9 @@ struct index_state; #define SPARSE_INDEX_MEMORY_ONLY (1 << 0) +#define SPARSE_INDEX_VERIFY_ALLOWED (1 << 1) int convert_to_sparse(struct index_state *istate, int flags); +void ensure_correct_sparsity(struct index_state *istate); /* * Some places in the codebase expect to search for a specific path. -- gitgitgadget ^ permalink raw reply related [flat|nested] 44+ messages in thread
* Re: [PATCH v2 2/3] sparse-index: add ensure_correct_sparsity function 2021-10-21 20:48 ` [PATCH v2 2/3] sparse-index: add ensure_correct_sparsity function Victoria Dye via GitGitGadget @ 2021-10-21 22:20 ` Junio C Hamano 2021-10-27 17:21 ` Victoria Dye 0 siblings, 1 reply; 44+ messages in thread From: Junio C Hamano @ 2021-10-21 22:20 UTC (permalink / raw) To: Victoria Dye via GitGitGadget; +Cc: git, stolee, Victoria Dye "Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes: > -int convert_to_sparse(struct index_state *istate, int flags) > +static int can_convert_to_sparse(struct index_state *istate, int flags) > { > int test_env; May not be a problem with this patch, but this variable does not need to be in this large a scope. > - if (istate->sparse_index || !istate->cache_nr || > - !core_apply_sparse_checkout || !core_sparse_checkout_cone) > + if (!core_apply_sparse_checkout || !core_sparse_checkout_cone) > return 0; > > if (!istate->repo) > @@ -187,6 +186,30 @@ int convert_to_sparse(struct index_state *istate, int flags) > if (cache_tree_update(istate, WRITE_TREE_MISSING_OK)) > return 0; > > + return 1; > +} > + > +int convert_to_sparse(struct index_state *istate, int flags) > +{ > + int verify = flags & SPARSE_INDEX_VERIFY_ALLOWED; > + > + /* > + * If validating with strict checks against whether the sparse index is > + * allowed, we want to check `can_convert_to_sparse` *before* exiting > + * early due to an already sparse or empty index. > + * > + * If not performing strict validation, the order is reversed to avoid > + * the more expensive checks in `can_convert_to_sparse` whenver possible. > + */ > + if (verify) { > + if (!can_convert_to_sparse(istate, flags)) > + return -1; > + else if (istate->sparse_index || !istate->cache_nr) > + return 0; > + } else if (istate->sparse_index || !istate->cache_nr || > + !can_convert_to_sparse(istate, flags)) > + return 0; > + > remove_fsmonitor(istate); > > trace2_region_enter("index", "convert_to_sparse", istate->repo); > @@ -313,6 +336,19 @@ void ensure_full_index(struct index_state *istate) > trace2_region_leave("index", "ensure_full_index", istate->repo); > } > > +void ensure_correct_sparsity(struct index_state *istate) > +{ > + /* > + * First check whether the index can be converted to sparse by attempting > + * to convert it with the SPARSE_INDEX_VERIFY_ALLOWED flag. If the > + * SPARSE_INDEX_VERIFY_ALLOWED checks indicate that the index cannot > + * be converted because repository settings and/or index properties > + * do not allow it, expand the index to full. > + */ The logic may be OK, but the need to give this long description is a sign that the meaning of the value returned from the function is not clear from the name of the function. > + if (convert_to_sparse(istate, SPARSE_INDEX_VERIFY_ALLOWED)) > + ensure_full_index(istate); It might make it more straight-forward to (1) drop the "if (verify)" part in convert_to_sparse(), which would mean that for all callers convert_to_sparse() will retain the same behaviour as before; (2) make a call to can_convert_to_sparse() here, and if that returns true, make a call to ensure_full_index(); this would behave identically to what this patch does when can_convert is false; and (3) correct the can_convert_to_sparse() function to not blow away the cache_tree unconditionally and recompute, so that calling it twice in a row will not be costly. ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v2 2/3] sparse-index: add ensure_correct_sparsity function 2021-10-21 22:20 ` Junio C Hamano @ 2021-10-27 17:21 ` Victoria Dye 0 siblings, 0 replies; 44+ messages in thread From: Victoria Dye @ 2021-10-27 17:21 UTC (permalink / raw) To: Junio C Hamano, Victoria Dye via GitGitGadget; +Cc: git, stolee Junio C Hamano wrote: > "Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes: > >> @@ -313,6 +336,19 @@ void ensure_full_index(struct index_state *istate) >> trace2_region_leave("index", "ensure_full_index", istate->repo); >> } >> >> +void ensure_correct_sparsity(struct index_state *istate) >> +{ >> + /* >> + * First check whether the index can be converted to sparse by attempting >> + * to convert it with the SPARSE_INDEX_VERIFY_ALLOWED flag. If the >> + * SPARSE_INDEX_VERIFY_ALLOWED checks indicate that the index cannot >> + * be converted because repository settings and/or index properties >> + * do not allow it, expand the index to full. >> + */ > > The logic may be OK, but the need to give this long description is a > sign that the meaning of the value returned from the function is not > clear from the name of the function. > >> + if (convert_to_sparse(istate, SPARSE_INDEX_VERIFY_ALLOWED)) >> + ensure_full_index(istate); > > It might make it more straight-forward to > > (1) drop the "if (verify)" part in convert_to_sparse(), which would > mean that for all callers convert_to_sparse() will retain the > same behaviour as before; > > (2) make a call to can_convert_to_sparse() here, and if that > returns true, make a call to ensure_full_index(); this would > behave identically to what this patch does when can_convert is > false; and > > (3) correct the can_convert_to_sparse() function to not blow away > the cache_tree unconditionally and recompute, so that calling > it twice in a row will not be costly. > Agreed, this approach is a lot easier to follow. I went a bit overboard trying to handle *all* possible cases where `convert_to_sparse` returns before converting, but the primary goal of this series is updating the behavior when config settings change. I'll include the suggested restructuring in my next re-roll. Thanks! ^ permalink raw reply [flat|nested] 44+ messages in thread
* [PATCH v2 3/3] sparse-index: update do_read_index to ensure correct sparsity 2021-10-21 20:48 ` [PATCH v2 0/3] sparse-index: expand/collapse based on 'index.sparse' Victoria Dye via GitGitGadget 2021-10-21 20:48 ` [PATCH v2 1/3] test-read-cache.c: prepare_repo_settings after config init Victoria Dye via GitGitGadget 2021-10-21 20:48 ` [PATCH v2 2/3] sparse-index: add ensure_correct_sparsity function Victoria Dye via GitGitGadget @ 2021-10-21 20:48 ` Victoria Dye via GitGitGadget 2021-10-27 18:20 ` [PATCH v3 0/3] sparse-index: expand/collapse based on 'index.sparse' Victoria Dye via GitGitGadget 3 siblings, 0 replies; 44+ messages in thread From: Victoria Dye via GitGitGadget @ 2021-10-21 20:48 UTC (permalink / raw) To: git; +Cc: stolee, gitster, Victoria Dye, Victoria Dye From: Victoria Dye <vdye@github.com> If `command_requires_full_index` is false, ensure correct in-core index sparsity on read by calling `ensure_correct_sparsity`. This change is meant to update the how the index is read in a command after sparse index-related repository settings are modified. Previously, for example, if `index.sparse` were changed from `true` to `false`, the in-core index on the next command would be sparse. The index would only be expanded to full when it was next written to disk. By adding a call to `ensure_correct_sparsity`, the in-core index now matches the sparsity dictated by the relevant repository settings as soon as it is read into memory, rather than when it is later written to disk. Helped-by: Junio C Hamano <gitster@pobox.com> Co-authored-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Victoria Dye <vdye@github.com> --- read-cache.c | 8 ++++++ t/t1092-sparse-checkout-compatibility.sh | 31 ++++++++++++++++++++++++ 2 files changed, 39 insertions(+) diff --git a/read-cache.c b/read-cache.c index a78b88a41bf..b3772ba70a1 100644 --- a/read-cache.c +++ b/read-cache.c @@ -2337,9 +2337,17 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist) if (!istate->repo) istate->repo = the_repository; + + /* + * If the command explicitly requires a full index, force it + * to be full. Otherwise, correct the sparsity based on repository + * settings and other properties of the index (if necessary). + */ prepare_repo_settings(istate->repo); if (istate->repo->settings.command_requires_full_index) ensure_full_index(istate); + else + ensure_correct_sparsity(istate); return istate->cache_nr; diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh index ca91c6a67f8..59accde1fa3 100755 --- a/t/t1092-sparse-checkout-compatibility.sh +++ b/t/t1092-sparse-checkout-compatibility.sh @@ -694,6 +694,37 @@ test_expect_success 'sparse-index is expanded and converted back' ' test_region index ensure_full_index trace2.txt ' +test_expect_success 'index.sparse disabled inline uses full index' ' + init_repos && + + # When index.sparse is disabled inline with `git status`, the + # index is expanded at the beginning of the execution then never + # converted back to sparse. It is then written to disk as a full index. + rm -f trace2.txt && + GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \ + git -C sparse-index -c index.sparse=false status && + ! test_region index convert_to_sparse trace2.txt && + test_region index ensure_full_index trace2.txt && + + # Since index.sparse is set to true at a repo level, the index + # is converted from full to sparse when read, then never expanded + # over the course of `git status`. It is written to disk as a sparse + # index. + rm -f trace2.txt && + GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \ + git -C sparse-index status && + test_region index convert_to_sparse trace2.txt && + ! test_region index ensure_full_index trace2.txt && + + # Now that the index has been written to disk as sparse, it is not + # converted to sparse (or expanded to full) when read by `git status`. + rm -f trace2.txt && + GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \ + git -C sparse-index status && + ! test_region index convert_to_sparse trace2.txt && + ! test_region index ensure_full_index trace2.txt +' + ensure_not_expanded () { rm -f trace2.txt && echo >>sparse-index/untracked.txt && -- gitgitgadget ^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH v3 0/3] sparse-index: expand/collapse based on 'index.sparse' 2021-10-21 20:48 ` [PATCH v2 0/3] sparse-index: expand/collapse based on 'index.sparse' Victoria Dye via GitGitGadget ` (2 preceding siblings ...) 2021-10-21 20:48 ` [PATCH v2 3/3] sparse-index: update do_read_index to ensure correct sparsity Victoria Dye via GitGitGadget @ 2021-10-27 18:20 ` Victoria Dye via GitGitGadget 2021-10-27 18:20 ` [PATCH v3 1/3] test-read-cache.c: prepare_repo_settings after config init Victoria Dye via GitGitGadget ` (4 more replies) 3 siblings, 5 replies; 44+ messages in thread From: Victoria Dye via GitGitGadget @ 2021-10-27 18:20 UTC (permalink / raw) To: git; +Cc: stolee, gitster, Victoria Dye This series updates do_read_index to use the index.sparse config setting when determining whether the index should be expanded or collapsed. If the command & repo allow use of a sparse index, index.sparse is enabled, and a full index is read from disk, the index is collapsed before returning to the caller. Conversely, if index.sparse is disabled but the index read from disk is sparse, the index is expanded before returning. This allows index.sparse to control use of the sparse index in addition to its existing control over how the index is written to disk. It also introduces the ability to enable/disable the sparse index on a command-by-command basis (e.g., allowing a user to troubleshoot a sparse-aware command with '-c index.sparse=false' [1]). While testing this change, a bug was found in 'test-tool read-cache' in which config settings for the repository were not initialized before preparing the repo settings. This caused index.sparse to always be 'false' when using the test helper in a cone-mode sparse checkout, breaking tests in t1091 and t1092. The issue is fixed by moving prepare_repo_settings after config setup. Changes since V1 ================ * Add ensure_correct_sparsity function that ensures the index is sparse if the repository settings (including index.sparse) allow it, otherwise ensuring the index is expanded to full. * Restructure condition in do_read_index to, rather than check specifically for the index.sparse config setting, call ensure_correct_sparsity unconditionally when command_requires_full_index is false. Changes since V2 ================ * Rename can_convert_to_sparse to is_sparse_index_allowed to more accurately reflect what the function returns. * Remove index-iterating checks from is_sparse_index_allowed, leaving only inexpensive checks on config settings & sparse checkout patterns. Checks are still part of convert_to_sparse to ensure it behaves exactly as it did before this series. * Restructure ensure_correct_sparsity for better readability. * Fix test_env variable scope. [1] https://lore.kernel.org/git/cc60c6e7-ecef-ae22-8ec7-ab290ff2b830@gmail.com/ Thanks! -Victoria Victoria Dye (3): test-read-cache.c: prepare_repo_settings after config init sparse-index: add ensure_correct_sparsity function sparse-index: update do_read_index to ensure correct sparsity read-cache.c | 8 ++++++ sparse-index.c | 33 +++++++++++++++++++++--- sparse-index.h | 1 + t/helper/test-read-cache.c | 5 ++-- t/t1092-sparse-checkout-compatibility.sh | 31 ++++++++++++++++++++++ 5 files changed, 72 insertions(+), 6 deletions(-) base-commit: f443b226ca681d87a3a31e245a70e6bc2769123c Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1059%2Fvdye%2Fsparse%2Findex-sparse-config-v3 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1059/vdye/sparse/index-sparse-config-v3 Pull-Request: https://github.com/gitgitgadget/git/pull/1059 Range-diff vs v2: 1: 6974ce7e7f5 = 1: 6974ce7e7f5 test-read-cache.c: prepare_repo_settings after config init 2: 0b6e6633bb2 ! 2: 9d6511db072 sparse-index: add ensure_correct_sparsity function @@ Metadata ## Commit message ## sparse-index: add ensure_correct_sparsity function - The purpose of the `ensure_correct_sparsity` function is to provide a means - of aligning the in-core index with the sparsity required by the repository - settings and other properties of the index. The function will first attempt - to convert the index to sparse, now with a "SPARSE_INDEX_VERIFY_ALLOWED" - flag that forces the function to return a nonzero value if repository - settings do not allow use of a sparse index. If a nonzero value is returned, - the index is expanded to full with `ensure_full_index`. + The `ensure_correct_sparsity` function is intended to provide a means of + aligning the in-core index with the sparsity required by the repository + settings and other properties of the index. The function first checks + whether a sparse index is allowed (per repository & sparse checkout pattern + settings). If the sparse index may be used, the index is converted to + sparse; otherwise, it is explicitly expanded with `ensure_full_index`. Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Victoria Dye <vdye@github.com> @@ sparse-index.c: static int index_has_unmerged_entries(struct index_state *istate } -int convert_to_sparse(struct index_state *istate, int flags) -+static int can_convert_to_sparse(struct index_state *istate, int flags) ++static int is_sparse_index_allowed(struct index_state *istate, int flags) { - int test_env; +- int test_env; - if (istate->sparse_index || !istate->cache_nr || - !core_apply_sparse_checkout || !core_sparse_checkout_cone) + if (!core_apply_sparse_checkout || !core_sparse_checkout_cone) return 0; if (!istate->repo) + istate->repo = the_repository; + + if (!(flags & SPARSE_INDEX_MEMORY_ONLY)) { ++ int test_env; ++ + /* + * The sparse index is not (yet) integrated with a split index. + */ @@ sparse-index.c: int convert_to_sparse(struct index_state *istate, int flags) - if (cache_tree_update(istate, WRITE_TREE_MISSING_OK)) + if (!istate->sparse_checkout_patterns->use_cone_patterns) return 0; + return 1; @@ sparse-index.c: int convert_to_sparse(struct index_state *istate, int flags) + +int convert_to_sparse(struct index_state *istate, int flags) +{ -+ int verify = flags & SPARSE_INDEX_VERIFY_ALLOWED; -+ + /* -+ * If validating with strict checks against whether the sparse index is -+ * allowed, we want to check `can_convert_to_sparse` *before* exiting -+ * early due to an already sparse or empty index. -+ * -+ * If not performing strict validation, the order is reversed to avoid -+ * the more expensive checks in `can_convert_to_sparse` whenver possible. ++ * If the index is already sparse, empty, or otherwise ++ * cannot be converted to sparse, do not convert. + */ -+ if (verify) { -+ if (!can_convert_to_sparse(istate, flags)) -+ return -1; -+ else if (istate->sparse_index || !istate->cache_nr) -+ return 0; -+ } else if (istate->sparse_index || !istate->cache_nr || -+ !can_convert_to_sparse(istate, flags)) ++ if (istate->sparse_index || !istate->cache_nr || ++ !is_sparse_index_allowed(istate, flags)) + return 0; + - remove_fsmonitor(istate); - - trace2_region_enter("index", "convert_to_sparse", istate->repo); + /* + * NEEDSWORK: If we have unmerged entries, then stay full. + * Unmerged entries prevent the cache-tree extension from working. @@ sparse-index.c: void ensure_full_index(struct index_state *istate) trace2_region_leave("index", "ensure_full_index", istate->repo); } @@ sparse-index.c: void ensure_full_index(struct index_state *istate) +void ensure_correct_sparsity(struct index_state *istate) +{ + /* -+ * First check whether the index can be converted to sparse by attempting -+ * to convert it with the SPARSE_INDEX_VERIFY_ALLOWED flag. If the -+ * SPARSE_INDEX_VERIFY_ALLOWED checks indicate that the index cannot -+ * be converted because repository settings and/or index properties -+ * do not allow it, expand the index to full. ++ * If the index can be sparse, make it sparse. Otherwise, ++ * ensure the index is full. + */ -+ if (convert_to_sparse(istate, SPARSE_INDEX_VERIFY_ALLOWED)) ++ if (is_sparse_index_allowed(istate, 0)) ++ convert_to_sparse(istate, 0); ++ else + ensure_full_index(istate); +} + @@ sparse-index.c: void ensure_full_index(struct index_state *istate) ## sparse-index.h ## @@ - struct index_state; #define SPARSE_INDEX_MEMORY_ONLY (1 << 0) -+#define SPARSE_INDEX_VERIFY_ALLOWED (1 << 1) int convert_to_sparse(struct index_state *istate, int flags); +void ensure_correct_sparsity(struct index_state *istate); 3: 437cf398256 = 3: d6c3c694e1e sparse-index: update do_read_index to ensure correct sparsity -- gitgitgadget ^ permalink raw reply [flat|nested] 44+ messages in thread
* [PATCH v3 1/3] test-read-cache.c: prepare_repo_settings after config init 2021-10-27 18:20 ` [PATCH v3 0/3] sparse-index: expand/collapse based on 'index.sparse' Victoria Dye via GitGitGadget @ 2021-10-27 18:20 ` Victoria Dye via GitGitGadget 2021-10-27 18:20 ` [PATCH v3 2/3] sparse-index: add ensure_correct_sparsity function Victoria Dye via GitGitGadget ` (3 subsequent siblings) 4 siblings, 0 replies; 44+ messages in thread From: Victoria Dye via GitGitGadget @ 2021-10-27 18:20 UTC (permalink / raw) To: git; +Cc: stolee, gitster, Victoria Dye, Victoria Dye From: Victoria Dye <vdye@github.com> Move `prepare_repo_settings` after the git directory has been set up in `test-read-cache.c`. The git directory settings must be initialized to properly assign repo settings using the worktree-level git config. Signed-off-by: Victoria Dye <vdye@github.com> --- t/helper/test-read-cache.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/t/helper/test-read-cache.c b/t/helper/test-read-cache.c index b52c174acc7..0d9f08931a1 100644 --- a/t/helper/test-read-cache.c +++ b/t/helper/test-read-cache.c @@ -39,8 +39,6 @@ int cmd__read_cache(int argc, const char **argv) int table = 0, expand = 0; initialize_the_repository(); - prepare_repo_settings(r); - r->settings.command_requires_full_index = 0; for (++argv, --argc; *argv && starts_with(*argv, "--"); ++argv, --argc) { if (skip_prefix(*argv, "--print-and-refresh=", &name)) @@ -56,6 +54,9 @@ int cmd__read_cache(int argc, const char **argv) setup_git_directory(); git_config(git_default_config, NULL); + prepare_repo_settings(r); + r->settings.command_requires_full_index = 0; + for (i = 0; i < cnt; i++) { repo_read_index(r); -- gitgitgadget ^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH v3 2/3] sparse-index: add ensure_correct_sparsity function 2021-10-27 18:20 ` [PATCH v3 0/3] sparse-index: expand/collapse based on 'index.sparse' Victoria Dye via GitGitGadget 2021-10-27 18:20 ` [PATCH v3 1/3] test-read-cache.c: prepare_repo_settings after config init Victoria Dye via GitGitGadget @ 2021-10-27 18:20 ` Victoria Dye via GitGitGadget 2021-10-27 20:07 ` Derrick Stolee 2021-10-27 18:20 ` [PATCH v3 3/3] sparse-index: update do_read_index to ensure correct sparsity Victoria Dye via GitGitGadget ` (2 subsequent siblings) 4 siblings, 1 reply; 44+ messages in thread From: Victoria Dye via GitGitGadget @ 2021-10-27 18:20 UTC (permalink / raw) To: git; +Cc: stolee, gitster, Victoria Dye, Victoria Dye From: Victoria Dye <vdye@github.com> The `ensure_correct_sparsity` function is intended to provide a means of aligning the in-core index with the sparsity required by the repository settings and other properties of the index. The function first checks whether a sparse index is allowed (per repository & sparse checkout pattern settings). If the sparse index may be used, the index is converted to sparse; otherwise, it is explicitly expanded with `ensure_full_index`. Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Victoria Dye <vdye@github.com> --- sparse-index.c | 33 +++++++++++++++++++++++++++++---- sparse-index.h | 1 + 2 files changed, 30 insertions(+), 4 deletions(-) diff --git a/sparse-index.c b/sparse-index.c index 7b7ff79e044..bc3ee358c63 100644 --- a/sparse-index.c +++ b/sparse-index.c @@ -122,17 +122,17 @@ static int index_has_unmerged_entries(struct index_state *istate) return 0; } -int convert_to_sparse(struct index_state *istate, int flags) +static int is_sparse_index_allowed(struct index_state *istate, int flags) { - int test_env; - if (istate->sparse_index || !istate->cache_nr || - !core_apply_sparse_checkout || !core_sparse_checkout_cone) + if (!core_apply_sparse_checkout || !core_sparse_checkout_cone) return 0; if (!istate->repo) istate->repo = the_repository; if (!(flags & SPARSE_INDEX_MEMORY_ONLY)) { + int test_env; + /* * The sparse index is not (yet) integrated with a split index. */ @@ -168,6 +168,19 @@ int convert_to_sparse(struct index_state *istate, int flags) if (!istate->sparse_checkout_patterns->use_cone_patterns) return 0; + return 1; +} + +int convert_to_sparse(struct index_state *istate, int flags) +{ + /* + * If the index is already sparse, empty, or otherwise + * cannot be converted to sparse, do not convert. + */ + if (istate->sparse_index || !istate->cache_nr || + !is_sparse_index_allowed(istate, flags)) + return 0; + /* * NEEDSWORK: If we have unmerged entries, then stay full. * Unmerged entries prevent the cache-tree extension from working. @@ -313,6 +326,18 @@ void ensure_full_index(struct index_state *istate) trace2_region_leave("index", "ensure_full_index", istate->repo); } +void ensure_correct_sparsity(struct index_state *istate) +{ + /* + * If the index can be sparse, make it sparse. Otherwise, + * ensure the index is full. + */ + if (is_sparse_index_allowed(istate, 0)) + convert_to_sparse(istate, 0); + else + ensure_full_index(istate); +} + /* * This static global helps avoid infinite recursion between * expand_to_path() and index_file_exists(). diff --git a/sparse-index.h b/sparse-index.h index 9f3d7bc7faf..656bd835b25 100644 --- a/sparse-index.h +++ b/sparse-index.h @@ -4,6 +4,7 @@ struct index_state; #define SPARSE_INDEX_MEMORY_ONLY (1 << 0) int convert_to_sparse(struct index_state *istate, int flags); +void ensure_correct_sparsity(struct index_state *istate); /* * Some places in the codebase expect to search for a specific path. -- gitgitgadget ^ permalink raw reply related [flat|nested] 44+ messages in thread
* Re: [PATCH v3 2/3] sparse-index: add ensure_correct_sparsity function 2021-10-27 18:20 ` [PATCH v3 2/3] sparse-index: add ensure_correct_sparsity function Victoria Dye via GitGitGadget @ 2021-10-27 20:07 ` Derrick Stolee 2021-10-27 21:32 ` Junio C Hamano 0 siblings, 1 reply; 44+ messages in thread From: Derrick Stolee @ 2021-10-27 20:07 UTC (permalink / raw) To: Victoria Dye via GitGitGadget, git; +Cc: gitster, Victoria Dye On 10/27/2021 2:20 PM, Victoria Dye via GitGitGadget wrote: > From: Victoria Dye <vdye@github.com> > +static int is_sparse_index_allowed(struct index_state *istate, int flags) I agree this name is better. > { > - int test_env; > - if (istate->sparse_index || !istate->cache_nr || > - !core_apply_sparse_checkout || !core_sparse_checkout_cone) > + if (!core_apply_sparse_checkout || !core_sparse_checkout_cone) > return 0; > > if (!istate->repo) > istate->repo = the_repository; > > if (!(flags & SPARSE_INDEX_MEMORY_ONLY)) { > + int test_env; > + > /* > * The sparse index is not (yet) integrated with a split index. > */ Nice that most of the implementation comes over without showing up in the diff. > if (!istate->sparse_checkout_patterns->use_cone_patterns) > return 0; > > + return 1; > +} > + > +int convert_to_sparse(struct index_state *istate, int flags) > +{ > + /* > + * If the index is already sparse, empty, or otherwise > + * cannot be converted to sparse, do not convert. > + */ > + if (istate->sparse_index || !istate->cache_nr || > + !is_sparse_index_allowed(istate, flags)) > + return 0; > +void ensure_correct_sparsity(struct index_state *istate) > +{ > + /* > + * If the index can be sparse, make it sparse. Otherwise, > + * ensure the index is full. > + */ > + if (is_sparse_index_allowed(istate, 0)) > + convert_to_sparse(istate, 0); > + else > + ensure_full_index(istate); > +} These two methods become very simple. Excellent. -Stolee ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v3 2/3] sparse-index: add ensure_correct_sparsity function 2021-10-27 20:07 ` Derrick Stolee @ 2021-10-27 21:32 ` Junio C Hamano 2021-10-28 1:24 ` Derrick Stolee 0 siblings, 1 reply; 44+ messages in thread From: Junio C Hamano @ 2021-10-27 21:32 UTC (permalink / raw) To: Derrick Stolee; +Cc: Victoria Dye via GitGitGadget, git, Victoria Dye Derrick Stolee <stolee@gmail.com> writes: >> +int convert_to_sparse(struct index_state *istate, int flags) >> +{ >> + /* >> + * If the index is already sparse, empty, or otherwise >> + * cannot be converted to sparse, do not convert. >> + */ >> + if (istate->sparse_index || !istate->cache_nr || >> + !is_sparse_index_allowed(istate, flags)) >> + return 0; Shouldn't we also at least do this? Blindly blowing away the entire cache-tree and rebuilding it from scratch may be hiding a latent bug somewhere else, but is never supposed to be needed, and is a huge waste of computational resources. I say "at least" here, because a cache tree that is partially valid should be safely salvageable---at least that was the intention back when I designed the subsystem. sparse-index.c | 24 +++++++++++++----------- 1 file changed, 13 insertions(+), 11 deletions(-) diff --git c/sparse-index.c w/sparse-index.c index bc3ee358c6..a95c3386f3 100644 --- c/sparse-index.c +++ w/sparse-index.c @@ -188,17 +188,19 @@ int convert_to_sparse(struct index_state *istate, int flags) if (index_has_unmerged_entries(istate)) return 0; - /* Clear and recompute the cache-tree */ - cache_tree_free(&istate->cache_tree); - /* - * Silently return if there is a problem with the cache tree update, - * which might just be due to a conflict state in some entry. - * - * This might create new tree objects, so be sure to use - * WRITE_TREE_MISSING_OK. - */ - if (cache_tree_update(istate, WRITE_TREE_MISSING_OK)) - return 0; + if (!cache_tree_fully_valid(&istate->cache_tree)) { + /* Clear and recompute the cache-tree */ + cache_tree_free(&istate->cache_tree); + /* + * Silently return if there is a problem with the cache tree update, + * which might just be due to a conflict state in some entry. + * + * This might create new tree objects, so be sure to use + * WRITE_TREE_MISSING_OK. + */ + if (cache_tree_update(istate, WRITE_TREE_MISSING_OK)) + return 0; + } remove_fsmonitor(istate); ^ permalink raw reply related [flat|nested] 44+ messages in thread
* Re: [PATCH v3 2/3] sparse-index: add ensure_correct_sparsity function 2021-10-27 21:32 ` Junio C Hamano @ 2021-10-28 1:24 ` Derrick Stolee 2021-10-29 13:43 ` Victoria Dye 0 siblings, 1 reply; 44+ messages in thread From: Derrick Stolee @ 2021-10-28 1:24 UTC (permalink / raw) To: Junio C Hamano; +Cc: Victoria Dye via GitGitGadget, git, Victoria Dye On 10/27/2021 5:32 PM, Junio C Hamano wrote: > Derrick Stolee <stolee@gmail.com> writes: > >>> +int convert_to_sparse(struct index_state *istate, int flags) >>> +{ >>> + /* >>> + * If the index is already sparse, empty, or otherwise >>> + * cannot be converted to sparse, do not convert. >>> + */ >>> + if (istate->sparse_index || !istate->cache_nr || >>> + !is_sparse_index_allowed(istate, flags)) >>> + return 0; > > Shouldn't we also at least do this? Blindly blowing away the entire > cache-tree and rebuilding it from scratch may be hiding a latent bug > somewhere else, but is never supposed to be needed, and is a huge > waste of computational resources. > > I say "at least" here, because a cache tree that is partially valid > should be safely salvageable---at least that was the intention back > when I designed the subsystem. I think you are right, what you propose below. It certainly seems like it would work, and even speed up the conversion from full to sparse. I think I erred on the side of extreme caution and used a hope that converting to sparse would be rare. > sparse-index.c | 24 +++++++++++++----------- > 1 file changed, 13 insertions(+), 11 deletions(-) > > diff --git c/sparse-index.c w/sparse-index.c > index bc3ee358c6..a95c3386f3 100644 > --- c/sparse-index.c > +++ w/sparse-index.c > @@ -188,17 +188,19 @@ int convert_to_sparse(struct index_state *istate, int flags) > if (index_has_unmerged_entries(istate)) > return 0; > > - /* Clear and recompute the cache-tree */ > - cache_tree_free(&istate->cache_tree); > - /* > - * Silently return if there is a problem with the cache tree update, > - * which might just be due to a conflict state in some entry. > - * > - * This might create new tree objects, so be sure to use > - * WRITE_TREE_MISSING_OK. > - */ > - if (cache_tree_update(istate, WRITE_TREE_MISSING_OK)) > - return 0; > + if (!cache_tree_fully_valid(&istate->cache_tree)) { > + /* Clear and recompute the cache-tree */ > + cache_tree_free(&istate->cache_tree); > + /* > + * Silently return if there is a problem with the cache tree update, > + * which might just be due to a conflict state in some entry. > + * > + * This might create new tree objects, so be sure to use > + * WRITE_TREE_MISSING_OK. > + */ > + if (cache_tree_update(istate, WRITE_TREE_MISSING_OK)) > + return 0; > + } I think at this point we have enough tests that check the sparse index and its different conversion points that the test suite might catch if this is a bad idea. Note that this is only a change of behavior if the cache-tree is valid, which I expect to be the case most of the time in the tests. Thanks, -Stolee ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v3 2/3] sparse-index: add ensure_correct_sparsity function 2021-10-28 1:24 ` Derrick Stolee @ 2021-10-29 13:43 ` Victoria Dye 0 siblings, 0 replies; 44+ messages in thread From: Victoria Dye @ 2021-10-29 13:43 UTC (permalink / raw) To: Derrick Stolee, Junio C Hamano; +Cc: Victoria Dye via GitGitGadget, git Derrick Stolee wrote: > On 10/27/2021 5:32 PM, Junio C Hamano wrote: >> Derrick Stolee <stolee@gmail.com> writes: >> >>>> +int convert_to_sparse(struct index_state *istate, int flags) >>>> +{ >>>> + /* >>>> + * If the index is already sparse, empty, or otherwise >>>> + * cannot be converted to sparse, do not convert. >>>> + */ >>>> + if (istate->sparse_index || !istate->cache_nr || >>>> + !is_sparse_index_allowed(istate, flags)) >>>> + return 0; >> >> Shouldn't we also at least do this? Blindly blowing away the entire >> cache-tree and rebuilding it from scratch may be hiding a latent bug >> somewhere else, but is never supposed to be needed, and is a huge >> waste of computational resources. >> >> I say "at least" here, because a cache tree that is partially valid >> should be safely salvageable---at least that was the intention back >> when I designed the subsystem. > > I think you are right, what you propose below. It certainly seems > like it would work, and even speed up the conversion from full to > sparse. I think I erred on the side of extreme caution and used > a hope that converting to sparse would be rare. > >> sparse-index.c | 24 +++++++++++++----------- >> 1 file changed, 13 insertions(+), 11 deletions(-) >> >> diff --git c/sparse-index.c w/sparse-index.c >> index bc3ee358c6..a95c3386f3 100644 >> --- c/sparse-index.c >> +++ w/sparse-index.c >> @@ -188,17 +188,19 @@ int convert_to_sparse(struct index_state *istate, int flags) >> if (index_has_unmerged_entries(istate)) >> return 0; >> >> - /* Clear and recompute the cache-tree */ >> - cache_tree_free(&istate->cache_tree); >> - /* >> - * Silently return if there is a problem with the cache tree update, >> - * which might just be due to a conflict state in some entry. >> - * >> - * This might create new tree objects, so be sure to use >> - * WRITE_TREE_MISSING_OK. >> - */ >> - if (cache_tree_update(istate, WRITE_TREE_MISSING_OK)) >> - return 0; >> + if (!cache_tree_fully_valid(&istate->cache_tree)) { >> + /* Clear and recompute the cache-tree */ >> + cache_tree_free(&istate->cache_tree); >> + /* >> + * Silently return if there is a problem with the cache tree update, >> + * which might just be due to a conflict state in some entry. >> + * >> + * This might create new tree objects, so be sure to use >> + * WRITE_TREE_MISSING_OK. >> + */ >> + if (cache_tree_update(istate, WRITE_TREE_MISSING_OK)) >> + return 0; >> + } > > I think at this point we have enough tests that check the sparse index > and its different conversion points that the test suite might catch if > this is a bad idea. Note that this is only a change of behavior if the > cache-tree is valid, which I expect to be the case most of the time in > the tests. > > Thanks, > -Stolee > This change doesn't appear to introduce any test failures or other unwanted behavior, so I don't see a reason not to include it. I'll add it in a re-roll - thanks! ^ permalink raw reply [flat|nested] 44+ messages in thread
* [PATCH v3 3/3] sparse-index: update do_read_index to ensure correct sparsity 2021-10-27 18:20 ` [PATCH v3 0/3] sparse-index: expand/collapse based on 'index.sparse' Victoria Dye via GitGitGadget 2021-10-27 18:20 ` [PATCH v3 1/3] test-read-cache.c: prepare_repo_settings after config init Victoria Dye via GitGitGadget 2021-10-27 18:20 ` [PATCH v3 2/3] sparse-index: add ensure_correct_sparsity function Victoria Dye via GitGitGadget @ 2021-10-27 18:20 ` Victoria Dye via GitGitGadget 2021-10-27 20:08 ` [PATCH v3 0/3] sparse-index: expand/collapse based on 'index.sparse' Derrick Stolee 2021-10-29 13:51 ` [PATCH v4 0/4] " Victoria Dye via GitGitGadget 4 siblings, 0 replies; 44+ messages in thread From: Victoria Dye via GitGitGadget @ 2021-10-27 18:20 UTC (permalink / raw) To: git; +Cc: stolee, gitster, Victoria Dye, Victoria Dye From: Victoria Dye <vdye@github.com> If `command_requires_full_index` is false, ensure correct in-core index sparsity on read by calling `ensure_correct_sparsity`. This change is meant to update the how the index is read in a command after sparse index-related repository settings are modified. Previously, for example, if `index.sparse` were changed from `true` to `false`, the in-core index on the next command would be sparse. The index would only be expanded to full when it was next written to disk. By adding a call to `ensure_correct_sparsity`, the in-core index now matches the sparsity dictated by the relevant repository settings as soon as it is read into memory, rather than when it is later written to disk. Helped-by: Junio C Hamano <gitster@pobox.com> Co-authored-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Victoria Dye <vdye@github.com> --- read-cache.c | 8 ++++++ t/t1092-sparse-checkout-compatibility.sh | 31 ++++++++++++++++++++++++ 2 files changed, 39 insertions(+) diff --git a/read-cache.c b/read-cache.c index a78b88a41bf..b3772ba70a1 100644 --- a/read-cache.c +++ b/read-cache.c @@ -2337,9 +2337,17 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist) if (!istate->repo) istate->repo = the_repository; + + /* + * If the command explicitly requires a full index, force it + * to be full. Otherwise, correct the sparsity based on repository + * settings and other properties of the index (if necessary). + */ prepare_repo_settings(istate->repo); if (istate->repo->settings.command_requires_full_index) ensure_full_index(istate); + else + ensure_correct_sparsity(istate); return istate->cache_nr; diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh index ca91c6a67f8..59accde1fa3 100755 --- a/t/t1092-sparse-checkout-compatibility.sh +++ b/t/t1092-sparse-checkout-compatibility.sh @@ -694,6 +694,37 @@ test_expect_success 'sparse-index is expanded and converted back' ' test_region index ensure_full_index trace2.txt ' +test_expect_success 'index.sparse disabled inline uses full index' ' + init_repos && + + # When index.sparse is disabled inline with `git status`, the + # index is expanded at the beginning of the execution then never + # converted back to sparse. It is then written to disk as a full index. + rm -f trace2.txt && + GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \ + git -C sparse-index -c index.sparse=false status && + ! test_region index convert_to_sparse trace2.txt && + test_region index ensure_full_index trace2.txt && + + # Since index.sparse is set to true at a repo level, the index + # is converted from full to sparse when read, then never expanded + # over the course of `git status`. It is written to disk as a sparse + # index. + rm -f trace2.txt && + GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \ + git -C sparse-index status && + test_region index convert_to_sparse trace2.txt && + ! test_region index ensure_full_index trace2.txt && + + # Now that the index has been written to disk as sparse, it is not + # converted to sparse (or expanded to full) when read by `git status`. + rm -f trace2.txt && + GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \ + git -C sparse-index status && + ! test_region index convert_to_sparse trace2.txt && + ! test_region index ensure_full_index trace2.txt +' + ensure_not_expanded () { rm -f trace2.txt && echo >>sparse-index/untracked.txt && -- gitgitgadget ^ permalink raw reply related [flat|nested] 44+ messages in thread
* Re: [PATCH v3 0/3] sparse-index: expand/collapse based on 'index.sparse' 2021-10-27 18:20 ` [PATCH v3 0/3] sparse-index: expand/collapse based on 'index.sparse' Victoria Dye via GitGitGadget ` (2 preceding siblings ...) 2021-10-27 18:20 ` [PATCH v3 3/3] sparse-index: update do_read_index to ensure correct sparsity Victoria Dye via GitGitGadget @ 2021-10-27 20:08 ` Derrick Stolee 2021-10-29 13:51 ` [PATCH v4 0/4] " Victoria Dye via GitGitGadget 4 siblings, 0 replies; 44+ messages in thread From: Derrick Stolee @ 2021-10-27 20:08 UTC (permalink / raw) To: Victoria Dye via GitGitGadget, git; +Cc: gitster, Victoria Dye On 10/27/2021 2:20 PM, Victoria Dye via GitGitGadget wrote: > Changes since V2 > ================ > > * Rename can_convert_to_sparse to is_sparse_index_allowed to more > accurately reflect what the function returns. > * Remove index-iterating checks from is_sparse_index_allowed, leaving only > inexpensive checks on config settings & sparse checkout patterns. Checks > are still part of convert_to_sparse to ensure it behaves exactly as it > did before this series. > * Restructure ensure_correct_sparsity for better readability. > * Fix test_env variable scope. I took a full read through this version, taking special look at the changes since v2. I think this version is good to go! Thanks, -Stolee ^ permalink raw reply [flat|nested] 44+ messages in thread
* [PATCH v4 0/4] sparse-index: expand/collapse based on 'index.sparse' 2021-10-27 18:20 ` [PATCH v3 0/3] sparse-index: expand/collapse based on 'index.sparse' Victoria Dye via GitGitGadget ` (3 preceding siblings ...) 2021-10-27 20:08 ` [PATCH v3 0/3] sparse-index: expand/collapse based on 'index.sparse' Derrick Stolee @ 2021-10-29 13:51 ` Victoria Dye via GitGitGadget 2021-10-29 13:51 ` [PATCH v4 1/4] test-read-cache.c: prepare_repo_settings after config init Victoria Dye via GitGitGadget ` (5 more replies) 4 siblings, 6 replies; 44+ messages in thread From: Victoria Dye via GitGitGadget @ 2021-10-29 13:51 UTC (permalink / raw) To: git; +Cc: stolee, gitster, Victoria Dye This series updates do_read_index to use the index.sparse config setting when determining whether the index should be expanded or collapsed. If the command & repo allow use of a sparse index, index.sparse is enabled, and a full index is read from disk, the index is collapsed before returning to the caller. Conversely, if index.sparse is disabled but the index read from disk is sparse, the index is expanded before returning. This allows index.sparse to control use of the sparse index in addition to its existing control over how the index is written to disk. It also introduces the ability to enable/disable the sparse index on a command-by-command basis (e.g., allowing a user to troubleshoot a sparse-aware command with '-c index.sparse=false' [1]). While testing this change, a bug was found in 'test-tool read-cache' in which config settings for the repository were not initialized before preparing the repo settings. This caused index.sparse to always be 'false' when using the test helper in a cone-mode sparse checkout, breaking tests in t1091 and t1092. The issue is fixed by moving prepare_repo_settings after config setup. Changes since V1 ================ * Add ensure_correct_sparsity function that ensures the index is sparse if the repository settings (including index.sparse) allow it, otherwise ensuring the index is expanded to full. * Restructure condition in do_read_index to, rather than check specifically for the index.sparse config setting, call ensure_correct_sparsity unconditionally when command_requires_full_index is false. Changes since V2 ================ * Rename can_convert_to_sparse to is_sparse_index_allowed to more accurately reflect what the function returns. * Remove index-iterating checks from is_sparse_index_allowed, leaving only inexpensive checks on config settings & sparse checkout patterns. Checks are still part of convert_to_sparse to ensure it behaves exactly as it did before this series. * Restructure ensure_correct_sparsity for better readability. * Fix test_env variable scope. Changes since V3 ================ * Add a new patch to avoid unnecessary cache tree free/recreation when possible in convert_to_sparse. [1] https://lore.kernel.org/git/cc60c6e7-ecef-ae22-8ec7-ab290ff2b830@gmail.com/ Thanks! -Victoria Victoria Dye (4): test-read-cache.c: prepare_repo_settings after config init sparse-index: avoid unnecessary cache tree clearing sparse-index: add ensure_correct_sparsity function sparse-index: update do_read_index to ensure correct sparsity read-cache.c | 8 ++++ sparse-index.c | 58 ++++++++++++++++++------ sparse-index.h | 1 + t/helper/test-read-cache.c | 5 +- t/t1092-sparse-checkout-compatibility.sh | 31 +++++++++++++ 5 files changed, 86 insertions(+), 17 deletions(-) base-commit: f443b226ca681d87a3a31e245a70e6bc2769123c Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1059%2Fvdye%2Fsparse%2Findex-sparse-config-v4 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1059/vdye/sparse/index-sparse-config-v4 Pull-Request: https://github.com/gitgitgadget/git/pull/1059 Range-diff vs v3: 1: 6974ce7e7f5 = 1: 6974ce7e7f5 test-read-cache.c: prepare_repo_settings after config init -: ----------- > 2: 91351ac4bde sparse-index: avoid unnecessary cache tree clearing 2: 9d6511db072 = 3: d2133ca1724 sparse-index: add ensure_correct_sparsity function 3: d6c3c694e1e = 4: b67cd9d07f8 sparse-index: update do_read_index to ensure correct sparsity -- gitgitgadget ^ permalink raw reply [flat|nested] 44+ messages in thread
* [PATCH v4 1/4] test-read-cache.c: prepare_repo_settings after config init 2021-10-29 13:51 ` [PATCH v4 0/4] " Victoria Dye via GitGitGadget @ 2021-10-29 13:51 ` Victoria Dye via GitGitGadget 2021-10-29 13:51 ` [PATCH v4 2/4] sparse-index: avoid unnecessary cache tree clearing Victoria Dye via GitGitGadget ` (4 subsequent siblings) 5 siblings, 0 replies; 44+ messages in thread From: Victoria Dye via GitGitGadget @ 2021-10-29 13:51 UTC (permalink / raw) To: git; +Cc: stolee, gitster, Victoria Dye, Victoria Dye From: Victoria Dye <vdye@github.com> Move `prepare_repo_settings` after the git directory has been set up in `test-read-cache.c`. The git directory settings must be initialized to properly assign repo settings using the worktree-level git config. Signed-off-by: Victoria Dye <vdye@github.com> --- t/helper/test-read-cache.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/t/helper/test-read-cache.c b/t/helper/test-read-cache.c index b52c174acc7..0d9f08931a1 100644 --- a/t/helper/test-read-cache.c +++ b/t/helper/test-read-cache.c @@ -39,8 +39,6 @@ int cmd__read_cache(int argc, const char **argv) int table = 0, expand = 0; initialize_the_repository(); - prepare_repo_settings(r); - r->settings.command_requires_full_index = 0; for (++argv, --argc; *argv && starts_with(*argv, "--"); ++argv, --argc) { if (skip_prefix(*argv, "--print-and-refresh=", &name)) @@ -56,6 +54,9 @@ int cmd__read_cache(int argc, const char **argv) setup_git_directory(); git_config(git_default_config, NULL); + prepare_repo_settings(r); + r->settings.command_requires_full_index = 0; + for (i = 0; i < cnt; i++) { repo_read_index(r); -- gitgitgadget ^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH v4 2/4] sparse-index: avoid unnecessary cache tree clearing 2021-10-29 13:51 ` [PATCH v4 0/4] " Victoria Dye via GitGitGadget 2021-10-29 13:51 ` [PATCH v4 1/4] test-read-cache.c: prepare_repo_settings after config init Victoria Dye via GitGitGadget @ 2021-10-29 13:51 ` Victoria Dye via GitGitGadget 2021-10-29 18:45 ` Junio C Hamano 2021-10-29 13:51 ` [PATCH v4 3/4] sparse-index: add ensure_correct_sparsity function Victoria Dye via GitGitGadget ` (3 subsequent siblings) 5 siblings, 1 reply; 44+ messages in thread From: Victoria Dye via GitGitGadget @ 2021-10-29 13:51 UTC (permalink / raw) To: git; +Cc: stolee, gitster, Victoria Dye, Victoria Dye From: Victoria Dye <vdye@github.com> When converting a full index to sparse, clear and recreate the cache tree only if the cache tree is not fully valid. The convert_to_sparse operation should exit silently if a cache tree update cannot be successfully completed (e.g., due to a conflicted entry state). However, because this failure scenario only occurs when at least a portion of the cache tree is invalid, we can save ourselves the cost of clearing and recreating the cache tree by skipping the check when the cache tree is fully valid. Helped-by: Derrick Stolee <dstolee@microsoft.com> Co-authored-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Victoria Dye <vdye@github.com> --- sparse-index.c | 25 ++++++++++++++----------- 1 file changed, 14 insertions(+), 11 deletions(-) diff --git a/sparse-index.c b/sparse-index.c index 7b7ff79e044..85613cd8a3a 100644 --- a/sparse-index.c +++ b/sparse-index.c @@ -175,17 +175,20 @@ int convert_to_sparse(struct index_state *istate, int flags) if (index_has_unmerged_entries(istate)) return 0; - /* Clear and recompute the cache-tree */ - cache_tree_free(&istate->cache_tree); - /* - * Silently return if there is a problem with the cache tree update, - * which might just be due to a conflict state in some entry. - * - * This might create new tree objects, so be sure to use - * WRITE_TREE_MISSING_OK. - */ - if (cache_tree_update(istate, WRITE_TREE_MISSING_OK)) - return 0; + if (!cache_tree_fully_valid(istate->cache_tree)) { + /* Clear and recompute the cache-tree */ + cache_tree_free(&istate->cache_tree); + + /* + * Silently return if there is a problem with the cache tree update, + * which might just be due to a conflict state in some entry. + * + * This might create new tree objects, so be sure to use + * WRITE_TREE_MISSING_OK. + */ + if (cache_tree_update(istate, WRITE_TREE_MISSING_OK)) + return 0; + } remove_fsmonitor(istate); -- gitgitgadget ^ permalink raw reply related [flat|nested] 44+ messages in thread
* Re: [PATCH v4 2/4] sparse-index: avoid unnecessary cache tree clearing 2021-10-29 13:51 ` [PATCH v4 2/4] sparse-index: avoid unnecessary cache tree clearing Victoria Dye via GitGitGadget @ 2021-10-29 18:45 ` Junio C Hamano 2021-10-29 19:00 ` Derrick Stolee 0 siblings, 1 reply; 44+ messages in thread From: Junio C Hamano @ 2021-10-29 18:45 UTC (permalink / raw) To: Victoria Dye via GitGitGadget; +Cc: git, stolee, Victoria Dye "Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes: > From: Victoria Dye <vdye@github.com> > > When converting a full index to sparse, clear and recreate the cache tree > only if the cache tree is not fully valid. The convert_to_sparse operation > should exit silently if a cache tree update cannot be successfully completed > (e.g., due to a conflicted entry state). However, because this failure > scenario only occurs when at least a portion of the cache tree is invalid, > we can save ourselves the cost of clearing and recreating the cache tree by > skipping the check when the cache tree is fully valid. I see in cache-tree.c::update_one() this snippet of code. /* * If the first entry of this region is a sparse directory * entry corresponding exactly to 'base', then this cache_tree * struct is a "leaf" in the data structure, pointing to the * tree OID specified in the entry. */ if (entries > 0) { const struct cache_entry *ce = cache[0]; if (S_ISSPARSEDIR(ce->ce_mode) && ce->ce_namelen == baselen && !strncmp(ce->name, base, baselen)) { it->entry_count = 1; oidcpy(&it->oid, &ce->oid); return 1; } } Sorry for not noticing it earlier, but does this mean that the content of a cache-tree changes shape when sparse-ness of the index changes? Is a cache-tree that knows about all of the subdirectories, even the ones that are descendants of a directory that is represented as a tree-ish entry in the main index array, still valid in a sparse index? If not, then I do not think of a quick and sure way to ensure that the cache-tree is valid when the sparse-ness changes. The earlier suggestion was based on my assumption that even when the main index array becomes sparse, the cache tree is still populated and valid, so that after writing a tree and writing an on-disk index, and then reading the on-disk index back (possibly in another process), would not have to incur the recomputation cost of the full tree when the reading codepath needs to flip the sparseness. But the above code snippet makes me worried a lot. A cache-tree that used to be valid when the corresponding in-core index array was not sparse will become invalid immediately when we decide to make it sparse, right? ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v4 2/4] sparse-index: avoid unnecessary cache tree clearing 2021-10-29 18:45 ` Junio C Hamano @ 2021-10-29 19:00 ` Derrick Stolee 2021-10-29 20:01 ` Junio C Hamano 0 siblings, 1 reply; 44+ messages in thread From: Derrick Stolee @ 2021-10-29 19:00 UTC (permalink / raw) To: Junio C Hamano, Victoria Dye via GitGitGadget; +Cc: git, Victoria Dye On 10/29/2021 2:45 PM, Junio C Hamano wrote: > "Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes: > >> From: Victoria Dye <vdye@github.com> >> >> When converting a full index to sparse, clear and recreate the cache tree >> only if the cache tree is not fully valid. The convert_to_sparse operation >> should exit silently if a cache tree update cannot be successfully completed >> (e.g., due to a conflicted entry state). However, because this failure >> scenario only occurs when at least a portion of the cache tree is invalid, >> we can save ourselves the cost of clearing and recreating the cache tree by >> skipping the check when the cache tree is fully valid. > > I see in cache-tree.c::update_one() this snippet of code. > > /* > * If the first entry of this region is a sparse directory > * entry corresponding exactly to 'base', then this cache_tree > * struct is a "leaf" in the data structure, pointing to the > * tree OID specified in the entry. > */ > if (entries > 0) { > const struct cache_entry *ce = cache[0]; > > if (S_ISSPARSEDIR(ce->ce_mode) && > ce->ce_namelen == baselen && > !strncmp(ce->name, base, baselen)) { > it->entry_count = 1; > oidcpy(&it->oid, &ce->oid); > return 1; > } > } > > Sorry for not noticing it earlier, but does this mean that the > content of a cache-tree changes shape when sparse-ness of the index > changes? Is a cache-tree that knows about all of the > subdirectories, even the ones that are descendants of a directory > that is represented as a tree-ish entry in the main index array, > still valid in a sparse index? > > If not, then I do not think of a quick and sure way to ensure that > the cache-tree is valid when the sparse-ness changes. > > The earlier suggestion was based on my assumption that even when the > main index array becomes sparse, the cache tree is still populated > and valid, so that after writing a tree and writing an on-disk index, > and then reading the on-disk index back (possibly in another process), > would not have to incur the recomputation cost of the full tree when > the reading codepath needs to flip the sparseness. > > But the above code snippet makes me worried a lot. A cache-tree > that used to be valid when the corresponding in-core index array was > not sparse will become invalid immediately when we decide to make it > sparse, right? I think I would argue that the cache-tree is valid in this case. Each node represents the tree that would be created from that region of the index if we called 'git commit'. We can reconstruct the contained subtrees by walking trees if necessary, but that isn't necessary at this point. I am writing a blog post about the sparse index, and I go into detail about how the cache tree is modified. I use the derrickstolee/trace-flamechart repo [1] as an example, so I will use it as a concrete discussion of what is happening here. Here is the cache-tree without sparse-index: [ root (left: 0, size: 16) ] | +---- [ bin/ (left: 2, size: 1) ] | +---- [ examples/ (left: 3, size: 11) ] | +---- [ fetch/ (left: 3, size: 8) ] | +---- [ maintenance/ (left: 11, size: 3) ] Then, I run git sparse-checkout init --cone --sparse-index git sparse-checkout set bin the cache-tree looks like this: [ root (left: 0, size: 6) ] | +---- [ bin/ (left: 2, size: 1) ] | +---- [ examples/ (left: 3, size: 1) ] The tree OIDs at each of "root", "bin" and "examples" match the trees for the nodes in the non-sparse cache tree. By contrast, including any subtrees of "examples" would cause a few things to happen: 1. The size of the cache-tree would increase (significantly in large repos). 2. The contained cache-tree nodes would have size _zero_, since there are no cache entries defined within their directories. I think in this pruned form, the cache-tree is valid and serves all of the roles it was designed for. We can quickly compute the root tree in 'git commit'. We can navigate the cache-tree in traverse_by_cache_tree() to accurately describe the trees in the index. These things are tested in t1092 and the p2000 performance test. Does this satisfy your concern, or is there a larger expectation of the cache-tree extension that I am not taking into account? Thanks, -Stolee ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v4 2/4] sparse-index: avoid unnecessary cache tree clearing 2021-10-29 19:00 ` Derrick Stolee @ 2021-10-29 20:01 ` Junio C Hamano 0 siblings, 0 replies; 44+ messages in thread From: Junio C Hamano @ 2021-10-29 20:01 UTC (permalink / raw) To: Derrick Stolee; +Cc: Victoria Dye via GitGitGadget, git, Victoria Dye Derrick Stolee <stolee@gmail.com> writes: > I think in this pruned form, the cache-tree is valid and serves all of > the roles it was designed for. We can quickly compute the root tree in > 'git commit'. We can navigate the cache-tree in traverse_by_cache_tree() > to accurately describe the trees in the index. These things are tested > in t1092 and the p2000 performance test. OK, then I was worried too much. Thanks for clarifying. ^ permalink raw reply [flat|nested] 44+ messages in thread
* [PATCH v4 3/4] sparse-index: add ensure_correct_sparsity function 2021-10-29 13:51 ` [PATCH v4 0/4] " Victoria Dye via GitGitGadget 2021-10-29 13:51 ` [PATCH v4 1/4] test-read-cache.c: prepare_repo_settings after config init Victoria Dye via GitGitGadget 2021-10-29 13:51 ` [PATCH v4 2/4] sparse-index: avoid unnecessary cache tree clearing Victoria Dye via GitGitGadget @ 2021-10-29 13:51 ` Victoria Dye via GitGitGadget 2021-10-29 13:51 ` [PATCH v4 4/4] sparse-index: update do_read_index to ensure correct sparsity Victoria Dye via GitGitGadget ` (2 subsequent siblings) 5 siblings, 0 replies; 44+ messages in thread From: Victoria Dye via GitGitGadget @ 2021-10-29 13:51 UTC (permalink / raw) To: git; +Cc: stolee, gitster, Victoria Dye, Victoria Dye From: Victoria Dye <vdye@github.com> The `ensure_correct_sparsity` function is intended to provide a means of aligning the in-core index with the sparsity required by the repository settings and other properties of the index. The function first checks whether a sparse index is allowed (per repository & sparse checkout pattern settings). If the sparse index may be used, the index is converted to sparse; otherwise, it is explicitly expanded with `ensure_full_index`. Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Victoria Dye <vdye@github.com> --- sparse-index.c | 33 +++++++++++++++++++++++++++++---- sparse-index.h | 1 + 2 files changed, 30 insertions(+), 4 deletions(-) diff --git a/sparse-index.c b/sparse-index.c index 85613cd8a3a..a1d505d50e9 100644 --- a/sparse-index.c +++ b/sparse-index.c @@ -122,17 +122,17 @@ static int index_has_unmerged_entries(struct index_state *istate) return 0; } -int convert_to_sparse(struct index_state *istate, int flags) +static int is_sparse_index_allowed(struct index_state *istate, int flags) { - int test_env; - if (istate->sparse_index || !istate->cache_nr || - !core_apply_sparse_checkout || !core_sparse_checkout_cone) + if (!core_apply_sparse_checkout || !core_sparse_checkout_cone) return 0; if (!istate->repo) istate->repo = the_repository; if (!(flags & SPARSE_INDEX_MEMORY_ONLY)) { + int test_env; + /* * The sparse index is not (yet) integrated with a split index. */ @@ -168,6 +168,19 @@ int convert_to_sparse(struct index_state *istate, int flags) if (!istate->sparse_checkout_patterns->use_cone_patterns) return 0; + return 1; +} + +int convert_to_sparse(struct index_state *istate, int flags) +{ + /* + * If the index is already sparse, empty, or otherwise + * cannot be converted to sparse, do not convert. + */ + if (istate->sparse_index || !istate->cache_nr || + !is_sparse_index_allowed(istate, flags)) + return 0; + /* * NEEDSWORK: If we have unmerged entries, then stay full. * Unmerged entries prevent the cache-tree extension from working. @@ -316,6 +329,18 @@ void ensure_full_index(struct index_state *istate) trace2_region_leave("index", "ensure_full_index", istate->repo); } +void ensure_correct_sparsity(struct index_state *istate) +{ + /* + * If the index can be sparse, make it sparse. Otherwise, + * ensure the index is full. + */ + if (is_sparse_index_allowed(istate, 0)) + convert_to_sparse(istate, 0); + else + ensure_full_index(istate); +} + /* * This static global helps avoid infinite recursion between * expand_to_path() and index_file_exists(). diff --git a/sparse-index.h b/sparse-index.h index 9f3d7bc7faf..656bd835b25 100644 --- a/sparse-index.h +++ b/sparse-index.h @@ -4,6 +4,7 @@ struct index_state; #define SPARSE_INDEX_MEMORY_ONLY (1 << 0) int convert_to_sparse(struct index_state *istate, int flags); +void ensure_correct_sparsity(struct index_state *istate); /* * Some places in the codebase expect to search for a specific path. -- gitgitgadget ^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH v4 4/4] sparse-index: update do_read_index to ensure correct sparsity 2021-10-29 13:51 ` [PATCH v4 0/4] " Victoria Dye via GitGitGadget ` (2 preceding siblings ...) 2021-10-29 13:51 ` [PATCH v4 3/4] sparse-index: add ensure_correct_sparsity function Victoria Dye via GitGitGadget @ 2021-10-29 13:51 ` Victoria Dye via GitGitGadget 2021-11-22 17:36 ` Elijah Newren 2021-11-22 17:40 ` [PATCH v4 0/4] sparse-index: expand/collapse based on 'index.sparse' Elijah Newren 2021-11-23 0:20 ` [PATCH v5 " Victoria Dye via GitGitGadget 5 siblings, 1 reply; 44+ messages in thread From: Victoria Dye via GitGitGadget @ 2021-10-29 13:51 UTC (permalink / raw) To: git; +Cc: stolee, gitster, Victoria Dye, Victoria Dye From: Victoria Dye <vdye@github.com> If `command_requires_full_index` is false, ensure correct in-core index sparsity on read by calling `ensure_correct_sparsity`. This change is meant to update the how the index is read in a command after sparse index-related repository settings are modified. Previously, for example, if `index.sparse` were changed from `true` to `false`, the in-core index on the next command would be sparse. The index would only be expanded to full when it was next written to disk. By adding a call to `ensure_correct_sparsity`, the in-core index now matches the sparsity dictated by the relevant repository settings as soon as it is read into memory, rather than when it is later written to disk. Helped-by: Junio C Hamano <gitster@pobox.com> Co-authored-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Victoria Dye <vdye@github.com> --- read-cache.c | 8 ++++++ t/t1092-sparse-checkout-compatibility.sh | 31 ++++++++++++++++++++++++ 2 files changed, 39 insertions(+) diff --git a/read-cache.c b/read-cache.c index a78b88a41bf..b3772ba70a1 100644 --- a/read-cache.c +++ b/read-cache.c @@ -2337,9 +2337,17 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist) if (!istate->repo) istate->repo = the_repository; + + /* + * If the command explicitly requires a full index, force it + * to be full. Otherwise, correct the sparsity based on repository + * settings and other properties of the index (if necessary). + */ prepare_repo_settings(istate->repo); if (istate->repo->settings.command_requires_full_index) ensure_full_index(istate); + else + ensure_correct_sparsity(istate); return istate->cache_nr; diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh index ca91c6a67f8..59accde1fa3 100755 --- a/t/t1092-sparse-checkout-compatibility.sh +++ b/t/t1092-sparse-checkout-compatibility.sh @@ -694,6 +694,37 @@ test_expect_success 'sparse-index is expanded and converted back' ' test_region index ensure_full_index trace2.txt ' +test_expect_success 'index.sparse disabled inline uses full index' ' + init_repos && + + # When index.sparse is disabled inline with `git status`, the + # index is expanded at the beginning of the execution then never + # converted back to sparse. It is then written to disk as a full index. + rm -f trace2.txt && + GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \ + git -C sparse-index -c index.sparse=false status && + ! test_region index convert_to_sparse trace2.txt && + test_region index ensure_full_index trace2.txt && + + # Since index.sparse is set to true at a repo level, the index + # is converted from full to sparse when read, then never expanded + # over the course of `git status`. It is written to disk as a sparse + # index. + rm -f trace2.txt && + GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \ + git -C sparse-index status && + test_region index convert_to_sparse trace2.txt && + ! test_region index ensure_full_index trace2.txt && + + # Now that the index has been written to disk as sparse, it is not + # converted to sparse (or expanded to full) when read by `git status`. + rm -f trace2.txt && + GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \ + git -C sparse-index status && + ! test_region index convert_to_sparse trace2.txt && + ! test_region index ensure_full_index trace2.txt +' + ensure_not_expanded () { rm -f trace2.txt && echo >>sparse-index/untracked.txt && -- gitgitgadget ^ permalink raw reply related [flat|nested] 44+ messages in thread
* Re: [PATCH v4 4/4] sparse-index: update do_read_index to ensure correct sparsity 2021-10-29 13:51 ` [PATCH v4 4/4] sparse-index: update do_read_index to ensure correct sparsity Victoria Dye via GitGitGadget @ 2021-11-22 17:36 ` Elijah Newren 2021-11-22 18:59 ` Victoria Dye 0 siblings, 1 reply; 44+ messages in thread From: Elijah Newren @ 2021-11-22 17:36 UTC (permalink / raw) To: Victoria Dye via GitGitGadget Cc: Git Mailing List, Derrick Stolee, Junio C Hamano, Victoria Dye On Fri, Oct 29, 2021 at 6:56 AM Victoria Dye via GitGitGadget <gitgitgadget@gmail.com> wrote: > > From: Victoria Dye <vdye@github.com> > > If `command_requires_full_index` is false, ensure correct in-core index > sparsity on read by calling `ensure_correct_sparsity`. This change is meant > to update the how the index is read in a command after sparse index-related s/update the how/update how/ ? > repository settings are modified. Previously, for example, if `index.sparse` > were changed from `true` to `false`, the in-core index on the next command > would be sparse. The index would only be expanded to full when it was next > written to disk. > > By adding a call to `ensure_correct_sparsity`, the in-core index now matches > the sparsity dictated by the relevant repository settings as soon as it is > read into memory, rather than when it is later written to disk. I split up reading this series across different days, and when I got here, my first reaction was "Okay, but why would you want that? Sounds like extra work for no gain." I went and looked up the cover letter and saw you mentioned that this "introduces the ability to enable/disable the sparse index on a command-by-command basis (e.g., allowing a user to troubleshoot a sparse-aware command with '-c index.sparse=false' [1]). That seems like a good reason to me, and sounds like it belongs in this commit message. But it sounds like you had other reasons in mind. If so, could you share them; I'm having difficulty understanding how this would make a difference other than in the troubleshooting scenario. > > Helped-by: Junio C Hamano <gitster@pobox.com> > Co-authored-by: Derrick Stolee <dstolee@microsoft.com> > Signed-off-by: Victoria Dye <vdye@github.com> > --- > read-cache.c | 8 ++++++ > t/t1092-sparse-checkout-compatibility.sh | 31 ++++++++++++++++++++++++ > 2 files changed, 39 insertions(+) > > diff --git a/read-cache.c b/read-cache.c > index a78b88a41bf..b3772ba70a1 100644 > --- a/read-cache.c > +++ b/read-cache.c > @@ -2337,9 +2337,17 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist) > > if (!istate->repo) > istate->repo = the_repository; > + > + /* > + * If the command explicitly requires a full index, force it > + * to be full. Otherwise, correct the sparsity based on repository > + * settings and other properties of the index (if necessary). > + */ > prepare_repo_settings(istate->repo); > if (istate->repo->settings.command_requires_full_index) > ensure_full_index(istate); > + else > + ensure_correct_sparsity(istate); > > return istate->cache_nr; > > diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh > index ca91c6a67f8..59accde1fa3 100755 > --- a/t/t1092-sparse-checkout-compatibility.sh > +++ b/t/t1092-sparse-checkout-compatibility.sh > @@ -694,6 +694,37 @@ test_expect_success 'sparse-index is expanded and converted back' ' > test_region index ensure_full_index trace2.txt > ' > > +test_expect_success 'index.sparse disabled inline uses full index' ' > + init_repos && > + > + # When index.sparse is disabled inline with `git status`, the > + # index is expanded at the beginning of the execution then never > + # converted back to sparse. It is then written to disk as a full index. > + rm -f trace2.txt && > + GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \ > + git -C sparse-index -c index.sparse=false status && > + ! test_region index convert_to_sparse trace2.txt && > + test_region index ensure_full_index trace2.txt && > + > + # Since index.sparse is set to true at a repo level, the index > + # is converted from full to sparse when read, then never expanded > + # over the course of `git status`. It is written to disk as a sparse > + # index. > + rm -f trace2.txt && > + GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \ > + git -C sparse-index status && > + test_region index convert_to_sparse trace2.txt && > + ! test_region index ensure_full_index trace2.txt && > + > + # Now that the index has been written to disk as sparse, it is not > + # converted to sparse (or expanded to full) when read by `git status`. > + rm -f trace2.txt && > + GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \ > + git -C sparse-index status && > + ! test_region index convert_to_sparse trace2.txt && > + ! test_region index ensure_full_index trace2.txt > +' > + > ensure_not_expanded () { > rm -f trace2.txt && > echo >>sparse-index/untracked.txt && > -- > gitgitgadget The rest of the patch looks fine. ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v4 4/4] sparse-index: update do_read_index to ensure correct sparsity 2021-11-22 17:36 ` Elijah Newren @ 2021-11-22 18:59 ` Victoria Dye 0 siblings, 0 replies; 44+ messages in thread From: Victoria Dye @ 2021-11-22 18:59 UTC (permalink / raw) To: Elijah Newren, Victoria Dye via GitGitGadget Cc: Git Mailing List, Derrick Stolee, Junio C Hamano Elijah Newren wrote: > On Fri, Oct 29, 2021 at 6:56 AM Victoria Dye via GitGitGadget > <gitgitgadget@gmail.com> wrote: >> >> From: Victoria Dye <vdye@github.com> >> >> If `command_requires_full_index` is false, ensure correct in-core index >> sparsity on read by calling `ensure_correct_sparsity`. This change is meant >> to update the how the index is read in a command after sparse index-related > > s/update the how/update how/ ? > This was probably the result of mixing up "update the way" and "update how". I'll go with the latter. >> repository settings are modified. Previously, for example, if `index.sparse` >> were changed from `true` to `false`, the in-core index on the next command >> would be sparse. The index would only be expanded to full when it was next >> written to disk. >> >> By adding a call to `ensure_correct_sparsity`, the in-core index now matches >> the sparsity dictated by the relevant repository settings as soon as it is >> read into memory, rather than when it is later written to disk. > > I split up reading this series across different days, and when I got > here, my first reaction was "Okay, but why would you want that? > Sounds like extra work for no gain." I went and looked up the cover > letter and saw you mentioned that this "introduces the ability to > enable/disable the sparse index on a command-by-command basis (e.g., > allowing a user to troubleshoot a sparse-aware command with '-c > index.sparse=false' [1]). That seems like a good reason to me, and > sounds like it belongs in this commit message. But it sounds like you > had other reasons in mind. If so, could you share them; I'm having > difficulty understanding how this would make a difference other than > in the troubleshooting scenario. > The troubleshooting scenario was what inspired this change in the first place, but it applies any time someone changes repository settings outside of `git sparse-checkout init`. One relevant scenario I can imagine is if a user enables `index.sparse, commands would not use the sparse index until a command that *writes* the index is executed: $ git config index.sparse true $ git diff # read full index, full in-core, no write $ git add . # read full index, full in-core, write sparse $ git diff # read sparse index, sparse in-core, no write Another scenario might be enabling `index.sparse`, then running a command that turns out to be surprisingly slow because it's operating on a full index in-core (i.e., much slower than the convert_to_sparse + command with sparse index would be). These are definitely edge cases - they rely on manual config changes, and they're only really noticeable 1) if the config change is immediately followed by index read-only commands and 2) if the first index-writing command after the config change is slow. That said, on the off chance that a user (or future developer) encounter one of these scenarios, I think it'll be helpful to have `index.sparse` take effect "immediately" after the config change. I'll include these (and the troubleshooting) examples in the updated commit message. >> >> Helped-by: Junio C Hamano <gitster@pobox.com> >> Co-authored-by: Derrick Stolee <dstolee@microsoft.com> >> Signed-off-by: Victoria Dye <vdye@github.com> >> --- >> read-cache.c | 8 ++++++ >> t/t1092-sparse-checkout-compatibility.sh | 31 ++++++++++++++++++++++++ >> 2 files changed, 39 insertions(+) >> >> diff --git a/read-cache.c b/read-cache.c >> index a78b88a41bf..b3772ba70a1 100644 >> --- a/read-cache.c >> +++ b/read-cache.c >> @@ -2337,9 +2337,17 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist) >> >> if (!istate->repo) >> istate->repo = the_repository; >> + >> + /* >> + * If the command explicitly requires a full index, force it >> + * to be full. Otherwise, correct the sparsity based on repository >> + * settings and other properties of the index (if necessary). >> + */ >> prepare_repo_settings(istate->repo); >> if (istate->repo->settings.command_requires_full_index) >> ensure_full_index(istate); >> + else >> + ensure_correct_sparsity(istate); >> >> return istate->cache_nr; >> >> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh >> index ca91c6a67f8..59accde1fa3 100755 >> --- a/t/t1092-sparse-checkout-compatibility.sh >> +++ b/t/t1092-sparse-checkout-compatibility.sh >> @@ -694,6 +694,37 @@ test_expect_success 'sparse-index is expanded and converted back' ' >> test_region index ensure_full_index trace2.txt >> ' >> >> +test_expect_success 'index.sparse disabled inline uses full index' ' >> + init_repos && >> + >> + # When index.sparse is disabled inline with `git status`, the >> + # index is expanded at the beginning of the execution then never >> + # converted back to sparse. It is then written to disk as a full index. >> + rm -f trace2.txt && >> + GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \ >> + git -C sparse-index -c index.sparse=false status && >> + ! test_region index convert_to_sparse trace2.txt && >> + test_region index ensure_full_index trace2.txt && >> + >> + # Since index.sparse is set to true at a repo level, the index >> + # is converted from full to sparse when read, then never expanded >> + # over the course of `git status`. It is written to disk as a sparse >> + # index. >> + rm -f trace2.txt && >> + GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \ >> + git -C sparse-index status && >> + test_region index convert_to_sparse trace2.txt && >> + ! test_region index ensure_full_index trace2.txt && >> + >> + # Now that the index has been written to disk as sparse, it is not >> + # converted to sparse (or expanded to full) when read by `git status`. >> + rm -f trace2.txt && >> + GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \ >> + git -C sparse-index status && >> + ! test_region index convert_to_sparse trace2.txt && >> + ! test_region index ensure_full_index trace2.txt >> +' >> + >> ensure_not_expanded () { >> rm -f trace2.txt && >> echo >>sparse-index/untracked.txt && >> -- >> gitgitgadget > > The rest of the patch looks fine. > ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v4 0/4] sparse-index: expand/collapse based on 'index.sparse' 2021-10-29 13:51 ` [PATCH v4 0/4] " Victoria Dye via GitGitGadget ` (3 preceding siblings ...) 2021-10-29 13:51 ` [PATCH v4 4/4] sparse-index: update do_read_index to ensure correct sparsity Victoria Dye via GitGitGadget @ 2021-11-22 17:40 ` Elijah Newren 2021-11-23 0:20 ` [PATCH v5 " Victoria Dye via GitGitGadget 5 siblings, 0 replies; 44+ messages in thread From: Elijah Newren @ 2021-11-22 17:40 UTC (permalink / raw) To: Victoria Dye via GitGitGadget Cc: Git Mailing List, Derrick Stolee, Junio C Hamano, Victoria Dye On Fri, Oct 29, 2021 at 6:53 AM Victoria Dye via GitGitGadget <gitgitgadget@gmail.com> wrote: > > This series updates do_read_index to use the index.sparse config setting > when determining whether the index should be expanded or collapsed. If the > command & repo allow use of a sparse index, index.sparse is enabled, and a > full index is read from disk, the index is collapsed before returning to the > caller. Conversely, if index.sparse is disabled but the index read from disk > is sparse, the index is expanded before returning. This allows index.sparse > to control use of the sparse index in addition to its existing control over > how the index is written to disk. It also introduces the ability to > enable/disable the sparse index on a command-by-command basis (e.g., > allowing a user to troubleshoot a sparse-aware command with '-c > index.sparse=false' [1]). > > While testing this change, a bug was found in 'test-tool read-cache' in > which config settings for the repository were not initialized before > preparing the repo settings. This caused index.sparse to always be 'false' > when using the test helper in a cone-mode sparse checkout, breaking tests in > t1091 and t1092. The issue is fixed by moving prepare_repo_settings after > config setup. > > > Changes since V1 > ================ > > * Add ensure_correct_sparsity function that ensures the index is sparse if > the repository settings (including index.sparse) allow it, otherwise > ensuring the index is expanded to full. > * Restructure condition in do_read_index to, rather than check specifically > for the index.sparse config setting, call ensure_correct_sparsity > unconditionally when command_requires_full_index is false. > > > Changes since V2 > ================ > > * Rename can_convert_to_sparse to is_sparse_index_allowed to more > accurately reflect what the function returns. > * Remove index-iterating checks from is_sparse_index_allowed, leaving only > inexpensive checks on config settings & sparse checkout patterns. Checks > are still part of convert_to_sparse to ensure it behaves exactly as it > did before this series. > * Restructure ensure_correct_sparsity for better readability. > * Fix test_env variable scope. > > > Changes since V3 > ================ > > * Add a new patch to avoid unnecessary cache tree free/recreation when > possible in convert_to_sparse. I read over this series. I only spotted two minor things, both with the commit message of the final patch. Reviewed-by: Elijah Newren <newren@gmail.com> ^ permalink raw reply [flat|nested] 44+ messages in thread
* [PATCH v5 0/4] sparse-index: expand/collapse based on 'index.sparse' 2021-10-29 13:51 ` [PATCH v4 0/4] " Victoria Dye via GitGitGadget ` (4 preceding siblings ...) 2021-11-22 17:40 ` [PATCH v4 0/4] sparse-index: expand/collapse based on 'index.sparse' Elijah Newren @ 2021-11-23 0:20 ` Victoria Dye via GitGitGadget 2021-11-23 0:20 ` [PATCH v5 1/4] test-read-cache.c: prepare_repo_settings after config init Victoria Dye via GitGitGadget ` (4 more replies) 5 siblings, 5 replies; 44+ messages in thread From: Victoria Dye via GitGitGadget @ 2021-11-23 0:20 UTC (permalink / raw) To: git; +Cc: stolee, gitster, Elijah Newren, Victoria Dye This series updates do_read_index to use the index.sparse config setting when determining whether the index should be expanded or collapsed. If the command & repo allow use of a sparse index, index.sparse is enabled, and a full index is read from disk, the index is collapsed before returning to the caller. Conversely, if index.sparse is disabled but the index read from disk is sparse, the index is expanded before returning. This allows index.sparse to control use of the sparse index in addition to its existing control over how the index is written to disk. It also introduces the ability to enable/disable the sparse index on a command-by-command basis (e.g., allowing a user to troubleshoot a sparse-aware command with '-c index.sparse=false' [1]). While testing this change, a bug was found in 'test-tool read-cache' in which config settings for the repository were not initialized before preparing the repo settings. This caused index.sparse to always be 'false' when using the test helper in a cone-mode sparse checkout, breaking tests in t1091 and t1092. The issue is fixed by moving prepare_repo_settings after config setup. Changes since V1 ================ * Add ensure_correct_sparsity function that ensures the index is sparse if the repository settings (including index.sparse) allow it, otherwise ensuring the index is expanded to full. * Restructure condition in do_read_index to, rather than check specifically for the index.sparse config setting, call ensure_correct_sparsity unconditionally when command_requires_full_index is false. Changes since V2 ================ * Rename can_convert_to_sparse to is_sparse_index_allowed to more accurately reflect what the function returns. * Remove index-iterating checks from is_sparse_index_allowed, leaving only inexpensive checks on config settings & sparse checkout patterns. Checks are still part of convert_to_sparse to ensure it behaves exactly as it did before this series. * Restructure ensure_correct_sparsity for better readability. * Fix test_env variable scope. Changes since V3 ================ * Add a new patch to avoid unnecessary cache tree free/recreation when possible in convert_to_sparse. Changes since V4 ================ * Updated patch 4/4 commit message to better explain practical reasons for making this change. [1] https://lore.kernel.org/git/cc60c6e7-ecef-ae22-8ec7-ab290ff2b830@gmail.com/ Thanks! -Victoria Victoria Dye (4): test-read-cache.c: prepare_repo_settings after config init sparse-index: avoid unnecessary cache tree clearing sparse-index: add ensure_correct_sparsity function sparse-index: update do_read_index to ensure correct sparsity read-cache.c | 8 ++++ sparse-index.c | 58 ++++++++++++++++++------ sparse-index.h | 1 + t/helper/test-read-cache.c | 5 +- t/t1092-sparse-checkout-compatibility.sh | 31 +++++++++++++ 5 files changed, 86 insertions(+), 17 deletions(-) base-commit: f443b226ca681d87a3a31e245a70e6bc2769123c Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1059%2Fvdye%2Fsparse%2Findex-sparse-config-v5 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1059/vdye/sparse/index-sparse-config-v5 Pull-Request: https://github.com/gitgitgadget/git/pull/1059 Range-diff vs v4: 1: 6974ce7e7f5 = 1: 6974ce7e7f5 test-read-cache.c: prepare_repo_settings after config init 2: 91351ac4bde = 2: 91351ac4bde sparse-index: avoid unnecessary cache tree clearing 3: d2133ca1724 = 3: d2133ca1724 sparse-index: add ensure_correct_sparsity function 4: b67cd9d07f8 ! 4: 3237a519c80 sparse-index: update do_read_index to ensure correct sparsity @@ Metadata ## Commit message ## sparse-index: update do_read_index to ensure correct sparsity - If `command_requires_full_index` is false, ensure correct in-core index - sparsity on read by calling `ensure_correct_sparsity`. This change is meant - to update the how the index is read in a command after sparse index-related - repository settings are modified. Previously, for example, if `index.sparse` - were changed from `true` to `false`, the in-core index on the next command - would be sparse. The index would only be expanded to full when it was next - written to disk. + Unless `command_requires_full_index` forces index expansion, ensure in-core + index sparsity matches config settings on read by calling + `ensure_correct_sparsity`. This makes the behavior of the in-core index more + consistent between different methods of updating sparsity: manually changing + the `index.sparse` config setting vs. executing + `git sparse-checkout --[no-]sparse-index init` - By adding a call to `ensure_correct_sparsity`, the in-core index now matches - the sparsity dictated by the relevant repository settings as soon as it is - read into memory, rather than when it is later written to disk. + Although index sparsity is normally updated with `git sparse-checkout init`, + ensuring correct sparsity after a manual `index.sparse` change has some + practical benefits: + + 1. It allows for command-by-command sparsity toggling with + `-c index.sparse=<true|false>`, e.g. when troubleshooting issues with the + sparse index. + 2. It prevents users from experiencing abnormal slowness after setting + `index.sparse` to `true` due to use of a full index in all commands until + the on-disk index is updated. Helped-by: Junio C Hamano <gitster@pobox.com> Co-authored-by: Derrick Stolee <dstolee@microsoft.com> -- gitgitgadget ^ permalink raw reply [flat|nested] 44+ messages in thread
* [PATCH v5 1/4] test-read-cache.c: prepare_repo_settings after config init 2021-11-23 0:20 ` [PATCH v5 " Victoria Dye via GitGitGadget @ 2021-11-23 0:20 ` Victoria Dye via GitGitGadget 2021-11-23 0:20 ` [PATCH v5 2/4] sparse-index: avoid unnecessary cache tree clearing Victoria Dye via GitGitGadget ` (3 subsequent siblings) 4 siblings, 0 replies; 44+ messages in thread From: Victoria Dye via GitGitGadget @ 2021-11-23 0:20 UTC (permalink / raw) To: git; +Cc: stolee, gitster, Elijah Newren, Victoria Dye, Victoria Dye From: Victoria Dye <vdye@github.com> Move `prepare_repo_settings` after the git directory has been set up in `test-read-cache.c`. The git directory settings must be initialized to properly assign repo settings using the worktree-level git config. Signed-off-by: Victoria Dye <vdye@github.com> --- t/helper/test-read-cache.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/t/helper/test-read-cache.c b/t/helper/test-read-cache.c index b52c174acc7..0d9f08931a1 100644 --- a/t/helper/test-read-cache.c +++ b/t/helper/test-read-cache.c @@ -39,8 +39,6 @@ int cmd__read_cache(int argc, const char **argv) int table = 0, expand = 0; initialize_the_repository(); - prepare_repo_settings(r); - r->settings.command_requires_full_index = 0; for (++argv, --argc; *argv && starts_with(*argv, "--"); ++argv, --argc) { if (skip_prefix(*argv, "--print-and-refresh=", &name)) @@ -56,6 +54,9 @@ int cmd__read_cache(int argc, const char **argv) setup_git_directory(); git_config(git_default_config, NULL); + prepare_repo_settings(r); + r->settings.command_requires_full_index = 0; + for (i = 0; i < cnt; i++) { repo_read_index(r); -- gitgitgadget ^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH v5 2/4] sparse-index: avoid unnecessary cache tree clearing 2021-11-23 0:20 ` [PATCH v5 " Victoria Dye via GitGitGadget 2021-11-23 0:20 ` [PATCH v5 1/4] test-read-cache.c: prepare_repo_settings after config init Victoria Dye via GitGitGadget @ 2021-11-23 0:20 ` Victoria Dye via GitGitGadget 2021-11-23 0:20 ` [PATCH v5 3/4] sparse-index: add ensure_correct_sparsity function Victoria Dye via GitGitGadget ` (2 subsequent siblings) 4 siblings, 0 replies; 44+ messages in thread From: Victoria Dye via GitGitGadget @ 2021-11-23 0:20 UTC (permalink / raw) To: git; +Cc: stolee, gitster, Elijah Newren, Victoria Dye, Victoria Dye From: Victoria Dye <vdye@github.com> When converting a full index to sparse, clear and recreate the cache tree only if the cache tree is not fully valid. The convert_to_sparse operation should exit silently if a cache tree update cannot be successfully completed (e.g., due to a conflicted entry state). However, because this failure scenario only occurs when at least a portion of the cache tree is invalid, we can save ourselves the cost of clearing and recreating the cache tree by skipping the check when the cache tree is fully valid. Helped-by: Derrick Stolee <dstolee@microsoft.com> Co-authored-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Victoria Dye <vdye@github.com> --- sparse-index.c | 25 ++++++++++++++----------- 1 file changed, 14 insertions(+), 11 deletions(-) diff --git a/sparse-index.c b/sparse-index.c index 7b7ff79e044..85613cd8a3a 100644 --- a/sparse-index.c +++ b/sparse-index.c @@ -175,17 +175,20 @@ int convert_to_sparse(struct index_state *istate, int flags) if (index_has_unmerged_entries(istate)) return 0; - /* Clear and recompute the cache-tree */ - cache_tree_free(&istate->cache_tree); - /* - * Silently return if there is a problem with the cache tree update, - * which might just be due to a conflict state in some entry. - * - * This might create new tree objects, so be sure to use - * WRITE_TREE_MISSING_OK. - */ - if (cache_tree_update(istate, WRITE_TREE_MISSING_OK)) - return 0; + if (!cache_tree_fully_valid(istate->cache_tree)) { + /* Clear and recompute the cache-tree */ + cache_tree_free(&istate->cache_tree); + + /* + * Silently return if there is a problem with the cache tree update, + * which might just be due to a conflict state in some entry. + * + * This might create new tree objects, so be sure to use + * WRITE_TREE_MISSING_OK. + */ + if (cache_tree_update(istate, WRITE_TREE_MISSING_OK)) + return 0; + } remove_fsmonitor(istate); -- gitgitgadget ^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH v5 3/4] sparse-index: add ensure_correct_sparsity function 2021-11-23 0:20 ` [PATCH v5 " Victoria Dye via GitGitGadget 2021-11-23 0:20 ` [PATCH v5 1/4] test-read-cache.c: prepare_repo_settings after config init Victoria Dye via GitGitGadget 2021-11-23 0:20 ` [PATCH v5 2/4] sparse-index: avoid unnecessary cache tree clearing Victoria Dye via GitGitGadget @ 2021-11-23 0:20 ` Victoria Dye via GitGitGadget 2021-11-23 0:20 ` [PATCH v5 4/4] sparse-index: update do_read_index to ensure correct sparsity Victoria Dye via GitGitGadget 2021-11-23 17:21 ` [PATCH v5 0/4] sparse-index: expand/collapse based on 'index.sparse' Elijah Newren 4 siblings, 0 replies; 44+ messages in thread From: Victoria Dye via GitGitGadget @ 2021-11-23 0:20 UTC (permalink / raw) To: git; +Cc: stolee, gitster, Elijah Newren, Victoria Dye, Victoria Dye From: Victoria Dye <vdye@github.com> The `ensure_correct_sparsity` function is intended to provide a means of aligning the in-core index with the sparsity required by the repository settings and other properties of the index. The function first checks whether a sparse index is allowed (per repository & sparse checkout pattern settings). If the sparse index may be used, the index is converted to sparse; otherwise, it is explicitly expanded with `ensure_full_index`. Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Victoria Dye <vdye@github.com> --- sparse-index.c | 33 +++++++++++++++++++++++++++++---- sparse-index.h | 1 + 2 files changed, 30 insertions(+), 4 deletions(-) diff --git a/sparse-index.c b/sparse-index.c index 85613cd8a3a..a1d505d50e9 100644 --- a/sparse-index.c +++ b/sparse-index.c @@ -122,17 +122,17 @@ static int index_has_unmerged_entries(struct index_state *istate) return 0; } -int convert_to_sparse(struct index_state *istate, int flags) +static int is_sparse_index_allowed(struct index_state *istate, int flags) { - int test_env; - if (istate->sparse_index || !istate->cache_nr || - !core_apply_sparse_checkout || !core_sparse_checkout_cone) + if (!core_apply_sparse_checkout || !core_sparse_checkout_cone) return 0; if (!istate->repo) istate->repo = the_repository; if (!(flags & SPARSE_INDEX_MEMORY_ONLY)) { + int test_env; + /* * The sparse index is not (yet) integrated with a split index. */ @@ -168,6 +168,19 @@ int convert_to_sparse(struct index_state *istate, int flags) if (!istate->sparse_checkout_patterns->use_cone_patterns) return 0; + return 1; +} + +int convert_to_sparse(struct index_state *istate, int flags) +{ + /* + * If the index is already sparse, empty, or otherwise + * cannot be converted to sparse, do not convert. + */ + if (istate->sparse_index || !istate->cache_nr || + !is_sparse_index_allowed(istate, flags)) + return 0; + /* * NEEDSWORK: If we have unmerged entries, then stay full. * Unmerged entries prevent the cache-tree extension from working. @@ -316,6 +329,18 @@ void ensure_full_index(struct index_state *istate) trace2_region_leave("index", "ensure_full_index", istate->repo); } +void ensure_correct_sparsity(struct index_state *istate) +{ + /* + * If the index can be sparse, make it sparse. Otherwise, + * ensure the index is full. + */ + if (is_sparse_index_allowed(istate, 0)) + convert_to_sparse(istate, 0); + else + ensure_full_index(istate); +} + /* * This static global helps avoid infinite recursion between * expand_to_path() and index_file_exists(). diff --git a/sparse-index.h b/sparse-index.h index 9f3d7bc7faf..656bd835b25 100644 --- a/sparse-index.h +++ b/sparse-index.h @@ -4,6 +4,7 @@ struct index_state; #define SPARSE_INDEX_MEMORY_ONLY (1 << 0) int convert_to_sparse(struct index_state *istate, int flags); +void ensure_correct_sparsity(struct index_state *istate); /* * Some places in the codebase expect to search for a specific path. -- gitgitgadget ^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH v5 4/4] sparse-index: update do_read_index to ensure correct sparsity 2021-11-23 0:20 ` [PATCH v5 " Victoria Dye via GitGitGadget ` (2 preceding siblings ...) 2021-11-23 0:20 ` [PATCH v5 3/4] sparse-index: add ensure_correct_sparsity function Victoria Dye via GitGitGadget @ 2021-11-23 0:20 ` Victoria Dye via GitGitGadget 2021-11-23 17:21 ` [PATCH v5 0/4] sparse-index: expand/collapse based on 'index.sparse' Elijah Newren 4 siblings, 0 replies; 44+ messages in thread From: Victoria Dye via GitGitGadget @ 2021-11-23 0:20 UTC (permalink / raw) To: git; +Cc: stolee, gitster, Elijah Newren, Victoria Dye, Victoria Dye From: Victoria Dye <vdye@github.com> Unless `command_requires_full_index` forces index expansion, ensure in-core index sparsity matches config settings on read by calling `ensure_correct_sparsity`. This makes the behavior of the in-core index more consistent between different methods of updating sparsity: manually changing the `index.sparse` config setting vs. executing `git sparse-checkout --[no-]sparse-index init` Although index sparsity is normally updated with `git sparse-checkout init`, ensuring correct sparsity after a manual `index.sparse` change has some practical benefits: 1. It allows for command-by-command sparsity toggling with `-c index.sparse=<true|false>`, e.g. when troubleshooting issues with the sparse index. 2. It prevents users from experiencing abnormal slowness after setting `index.sparse` to `true` due to use of a full index in all commands until the on-disk index is updated. Helped-by: Junio C Hamano <gitster@pobox.com> Co-authored-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Victoria Dye <vdye@github.com> --- read-cache.c | 8 ++++++ t/t1092-sparse-checkout-compatibility.sh | 31 ++++++++++++++++++++++++ 2 files changed, 39 insertions(+) diff --git a/read-cache.c b/read-cache.c index a78b88a41bf..b3772ba70a1 100644 --- a/read-cache.c +++ b/read-cache.c @@ -2337,9 +2337,17 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist) if (!istate->repo) istate->repo = the_repository; + + /* + * If the command explicitly requires a full index, force it + * to be full. Otherwise, correct the sparsity based on repository + * settings and other properties of the index (if necessary). + */ prepare_repo_settings(istate->repo); if (istate->repo->settings.command_requires_full_index) ensure_full_index(istate); + else + ensure_correct_sparsity(istate); return istate->cache_nr; diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh index ca91c6a67f8..59accde1fa3 100755 --- a/t/t1092-sparse-checkout-compatibility.sh +++ b/t/t1092-sparse-checkout-compatibility.sh @@ -694,6 +694,37 @@ test_expect_success 'sparse-index is expanded and converted back' ' test_region index ensure_full_index trace2.txt ' +test_expect_success 'index.sparse disabled inline uses full index' ' + init_repos && + + # When index.sparse is disabled inline with `git status`, the + # index is expanded at the beginning of the execution then never + # converted back to sparse. It is then written to disk as a full index. + rm -f trace2.txt && + GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \ + git -C sparse-index -c index.sparse=false status && + ! test_region index convert_to_sparse trace2.txt && + test_region index ensure_full_index trace2.txt && + + # Since index.sparse is set to true at a repo level, the index + # is converted from full to sparse when read, then never expanded + # over the course of `git status`. It is written to disk as a sparse + # index. + rm -f trace2.txt && + GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \ + git -C sparse-index status && + test_region index convert_to_sparse trace2.txt && + ! test_region index ensure_full_index trace2.txt && + + # Now that the index has been written to disk as sparse, it is not + # converted to sparse (or expanded to full) when read by `git status`. + rm -f trace2.txt && + GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \ + git -C sparse-index status && + ! test_region index convert_to_sparse trace2.txt && + ! test_region index ensure_full_index trace2.txt +' + ensure_not_expanded () { rm -f trace2.txt && echo >>sparse-index/untracked.txt && -- gitgitgadget ^ permalink raw reply related [flat|nested] 44+ messages in thread
* Re: [PATCH v5 0/4] sparse-index: expand/collapse based on 'index.sparse' 2021-11-23 0:20 ` [PATCH v5 " Victoria Dye via GitGitGadget ` (3 preceding siblings ...) 2021-11-23 0:20 ` [PATCH v5 4/4] sparse-index: update do_read_index to ensure correct sparsity Victoria Dye via GitGitGadget @ 2021-11-23 17:21 ` Elijah Newren 4 siblings, 0 replies; 44+ messages in thread From: Elijah Newren @ 2021-11-23 17:21 UTC (permalink / raw) To: Victoria Dye via GitGitGadget Cc: Git Mailing List, Derrick Stolee, Junio C Hamano, Victoria Dye On Mon, Nov 22, 2021 at 4:20 PM Victoria Dye via GitGitGadget <gitgitgadget@gmail.com> wrote: > > This series updates do_read_index to use the index.sparse config setting > when determining whether the index should be expanded or collapsed. If the > command & repo allow use of a sparse index, index.sparse is enabled, and a > full index is read from disk, the index is collapsed before returning to the > caller. Conversely, if index.sparse is disabled but the index read from disk > is sparse, the index is expanded before returning. This allows index.sparse > to control use of the sparse index in addition to its existing control over > how the index is written to disk. It also introduces the ability to > enable/disable the sparse index on a command-by-command basis (e.g., > allowing a user to troubleshoot a sparse-aware command with '-c > index.sparse=false' [1]). > > While testing this change, a bug was found in 'test-tool read-cache' in > which config settings for the repository were not initialized before > preparing the repo settings. This caused index.sparse to always be 'false' > when using the test helper in a cone-mode sparse checkout, breaking tests in > t1091 and t1092. The issue is fixed by moving prepare_repo_settings after > config setup. > > > Changes since V1 > ================ > > * Add ensure_correct_sparsity function that ensures the index is sparse if > the repository settings (including index.sparse) allow it, otherwise > ensuring the index is expanded to full. > * Restructure condition in do_read_index to, rather than check specifically > for the index.sparse config setting, call ensure_correct_sparsity > unconditionally when command_requires_full_index is false. > > > Changes since V2 > ================ > > * Rename can_convert_to_sparse to is_sparse_index_allowed to more > accurately reflect what the function returns. > * Remove index-iterating checks from is_sparse_index_allowed, leaving only > inexpensive checks on config settings & sparse checkout patterns. Checks > are still part of convert_to_sparse to ensure it behaves exactly as it > did before this series. > * Restructure ensure_correct_sparsity for better readability. > * Fix test_env variable scope. > > > Changes since V3 > ================ > > * Add a new patch to avoid unnecessary cache tree free/recreation when > possible in convert_to_sparse. > > > Changes since V4 > ================ > > * Updated patch 4/4 commit message to better explain practical reasons for > making this change. Thanks. This version looks good to me: Reviewed-by: Elijah Newren <newren@gmail.com> ^ permalink raw reply [flat|nested] 44+ messages in thread
end of thread, other threads:[~2021-11-23 17:21 UTC | newest] Thread overview: 44+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2021-10-15 19:54 [PATCH 0/2] sparse-index: expand/collapse based on 'index.sparse' Victoria Dye via GitGitGadget 2021-10-15 19:54 ` [PATCH 1/2] test-read-cache.c: prepare_repo_settings after config init Victoria Dye via GitGitGadget 2021-10-15 19:54 ` [PATCH 2/2] sparse-index: update index read to consider index.sparse config Victoria Dye via GitGitGadget 2021-10-15 21:53 ` Junio C Hamano 2021-10-17 1:21 ` Derrick Stolee 2021-10-17 5:58 ` Junio C Hamano 2021-10-17 19:33 ` Derrick Stolee 2021-10-18 1:15 ` Junio C Hamano 2021-10-18 13:25 ` Derrick Stolee 2021-10-18 14:14 ` Victoria Dye 2021-10-21 13:26 ` Derrick Stolee 2021-10-18 16:09 ` Junio C Hamano 2021-10-21 20:48 ` [PATCH v2 0/3] sparse-index: expand/collapse based on 'index.sparse' Victoria Dye via GitGitGadget 2021-10-21 20:48 ` [PATCH v2 1/3] test-read-cache.c: prepare_repo_settings after config init Victoria Dye via GitGitGadget 2021-10-21 20:48 ` [PATCH v2 2/3] sparse-index: add ensure_correct_sparsity function Victoria Dye via GitGitGadget 2021-10-21 22:20 ` Junio C Hamano 2021-10-27 17:21 ` Victoria Dye 2021-10-21 20:48 ` [PATCH v2 3/3] sparse-index: update do_read_index to ensure correct sparsity Victoria Dye via GitGitGadget 2021-10-27 18:20 ` [PATCH v3 0/3] sparse-index: expand/collapse based on 'index.sparse' Victoria Dye via GitGitGadget 2021-10-27 18:20 ` [PATCH v3 1/3] test-read-cache.c: prepare_repo_settings after config init Victoria Dye via GitGitGadget 2021-10-27 18:20 ` [PATCH v3 2/3] sparse-index: add ensure_correct_sparsity function Victoria Dye via GitGitGadget 2021-10-27 20:07 ` Derrick Stolee 2021-10-27 21:32 ` Junio C Hamano 2021-10-28 1:24 ` Derrick Stolee 2021-10-29 13:43 ` Victoria Dye 2021-10-27 18:20 ` [PATCH v3 3/3] sparse-index: update do_read_index to ensure correct sparsity Victoria Dye via GitGitGadget 2021-10-27 20:08 ` [PATCH v3 0/3] sparse-index: expand/collapse based on 'index.sparse' Derrick Stolee 2021-10-29 13:51 ` [PATCH v4 0/4] " Victoria Dye via GitGitGadget 2021-10-29 13:51 ` [PATCH v4 1/4] test-read-cache.c: prepare_repo_settings after config init Victoria Dye via GitGitGadget 2021-10-29 13:51 ` [PATCH v4 2/4] sparse-index: avoid unnecessary cache tree clearing Victoria Dye via GitGitGadget 2021-10-29 18:45 ` Junio C Hamano 2021-10-29 19:00 ` Derrick Stolee 2021-10-29 20:01 ` Junio C Hamano 2021-10-29 13:51 ` [PATCH v4 3/4] sparse-index: add ensure_correct_sparsity function Victoria Dye via GitGitGadget 2021-10-29 13:51 ` [PATCH v4 4/4] sparse-index: update do_read_index to ensure correct sparsity Victoria Dye via GitGitGadget 2021-11-22 17:36 ` Elijah Newren 2021-11-22 18:59 ` Victoria Dye 2021-11-22 17:40 ` [PATCH v4 0/4] sparse-index: expand/collapse based on 'index.sparse' Elijah Newren 2021-11-23 0:20 ` [PATCH v5 " Victoria Dye via GitGitGadget 2021-11-23 0:20 ` [PATCH v5 1/4] test-read-cache.c: prepare_repo_settings after config init Victoria Dye via GitGitGadget 2021-11-23 0:20 ` [PATCH v5 2/4] sparse-index: avoid unnecessary cache tree clearing Victoria Dye via GitGitGadget 2021-11-23 0:20 ` [PATCH v5 3/4] sparse-index: add ensure_correct_sparsity function Victoria Dye via GitGitGadget 2021-11-23 0:20 ` [PATCH v5 4/4] sparse-index: update do_read_index to ensure correct sparsity Victoria Dye via GitGitGadget 2021-11-23 17:21 ` [PATCH v5 0/4] sparse-index: expand/collapse based on 'index.sparse' Elijah Newren
Code repositories for project(s) associated with this public inbox https://80x24.org/mirrors/git.git This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).