git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Derrick Stolee <stolee@gmail.com>
To: Victoria Dye <vdye@github.com>, Junio C Hamano <gitster@pobox.com>
Cc: Victoria Dye via GitGitGadget <gitgitgadget@gmail.com>,
	git@vger.kernel.org
Subject: Re: [PATCH 2/2] sparse-index: update index read to consider index.sparse config
Date: Thu, 21 Oct 2021 09:26:19 -0400	[thread overview]
Message-ID: <002d7dd3-c41a-87a4-6fc0-ddf8497f4805@gmail.com> (raw)
In-Reply-To: <4656a934-5305-fbdf-76ca-17562fca62ef@github.com>

On 10/18/2021 10:14 AM, Victoria Dye wrote:
> Derrick Stolee wrote:
>> On 10/17/2021 9:15 PM, Junio C Hamano wrote:
>>> OK.  I somehow got the impression that we convert in both ways from
>>> the patch text under discussion, specifically this part in
>>> do_read_index():
>>>
>>>> -	if (istate->repo->settings.command_requires_full_index)
>>>> +	if (!istate->repo->settings.sparse_index ||
>>>> +	    istate->repo->settings.command_requires_full_index)
>>>>  		ensure_full_index(istate);
>>>> +	else if (!istate->sparse_index)
>>>> +		convert_to_sparse(istate, 0);
>>>>  
>>>>  	return istate->cache_nr;
>>>
>>> We used to say "when we know we are running a command that is not
>>> sparse ready, expand it here" and nothing else.
>>>
>>> We now added a bit more condition for expansion, i.e. "when we are
>>> told that the repository does not want sparse index, or when the
>>> command is not sparse ready".
>>>
>>> But the patch goes one more step.  "If we didn't find a reason to
>>> expand to full, and if the in-core index we read is not sparse, then
>>> call convert_to_sparse() on it".  So if the repo settings tells us
>>> that the repository wants a sparse index, and the index we read was
>>> not sparse, what does convert_to_sparse() without MEM_ONLY flag bit
>>> do?  Nothing special?
>>
>> You are absolutely right. I've been talking about what I _thought_
>> the code does (and should do) but I missed this 'else if' which is
>> in fact doing what you have been claiming the entire time. I should
>> have done a better job double-checking the code before doubling
>> down on my claims.
>>
>> I think the 'else if' should be removed, which would match my
>> expectations.
>>
> 
> By leaving that part out, wouldn't you only solve half of the "mismatch"
> between in-core index and repo setting? Earlier, you described your
> expectation as:
> 
>> * If index.sparse=false, then a sparse index will be converted to
>>   full upon read.
>>
>> * If index.sparse=true, then a full index will be converted to sparse
>>   on write.
> 
> Why should the direction of change to the setting value (false->true vs
> true->false) cause the index to convert at different times? Consider the
> scenario:
> 
> # In a cone-mode, sparse index-enabled sparse checkout repo
> $ git -c index.sparse=false status    # 1
> $ git status                          # 2
> $ git status                          # 3
> 
> Before this patch, the index has the following states per command:
> 
> (1) the index is sparse in-core, writes full on-disk
> (2) the index is full in-core, writes sparse on-disk
> (3) the index is sparse in-core, writes sparse on-disk
> 
> Here, I see two mismatches in my expectations: (1) operates on an in-core
> sparse index, despite `index.sparse=false`, and (2) operates on an in-core
> full index, despite `index.sparse=true`. 
> 
> What you're suggesting solves only the first mismatch. However, the second
> mismatch incurs the performance hit of a supposedly-sparse command actually
> operating on an in-core full index. It also creates an inconsistency between
> (2) and (3) in their use of the sparse index. What I'd like this patch to do
> is:
> 
> (1) the index is full in-core, writes full on-disk
> (2) the index is sparse in-core, writes sparse on-disk
> (3) the index is sparse in-core, writes sparse on-disk
> 
> Here, there are no more mismatches between in-core index usage and what is
> written to disk, and (2) and (3) use the same index sparsity.

I suppose that my perspective is that we always need to handle a full
index in-core, because it is a subset of the capabilities of a sparse
index. There is not much value in requiring the in-core index be sparse.

But also, I'm now less concerned about commands that convert from
full-to-sparse on read and expand back to full because of
command_requires_full_index. This should be a short-lived issue because
the index.sparse config is unlikely to be changing frequently, so once
the index is converted to sparse on write, we no longer need to do any
work to convert the in-core index to sparse on read.

The thing to keep in mind is that not every command that reads the index
also writes it. For example, the two 'git status' commands you list might
not write the index if there is no new information in the filesystem that
would trigger an index write.

In short: I've shifted my view and no longer oppose this conversion
immediately upon reading.

>>> I see many unconditional calls to convert_to_sparse(istate, 0) on
>>> the write code paths, where I instead would expect "if the repo
>>> wants sparse, call convert_to_sparse(), and if the repo does not
>>> want sparse, call ensure_full_index()", before actualy writing the
>>> entries out.  These also are setting traps to confuse readers.
>>>
>>> Perhaps we want a helper function "ensure_right_sparsity(istate)"
>>> that would be called where we have
>>>
>>> 	if (condition)
>>> 		convert_to_sparse();
>>> 	else
>>> 		ensure_full_index();
>>>
>>> or an unconditonal call to convert_to_sparse() to unconfuse readers?
>>
>> convert_to_sparse() has several checks that prevent the conversion
>> from happening, such as having a split index. In particular, it will
>> skip the conversion if the_repository->settings.sparse_index is false.
>> Thus, calling it unconditionally in the write code is correct.
>>
> 
> I may have introduced some confusion by redundantly checking
> `!istate->sparse_index` before converting to sparse (my goal was readability
> - showing the index does not need to be converted to sparse if it is already
> sparse - but I think it ended up doing the opposite). The condition could be
> removed entirely, thanks to an equivalent check inside `convert_to_sparse`:
> 
> -       if (istate->repo->settings.command_requires_full_index)
> +       if (!istate->repo->settings.sparse_index ||
> +           istate->repo->settings.command_requires_full_index)
>                 ensure_full_index(istate);
> +       else
> +               convert_to_sparse(istate, 0);
 
This is simpler.

Thanks,
-Stolee

  reply	other threads:[~2021-10-21 13:26 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-10-15 19:54 [PATCH 0/2] sparse-index: expand/collapse based on 'index.sparse' Victoria Dye via GitGitGadget
2021-10-15 19:54 ` [PATCH 1/2] test-read-cache.c: prepare_repo_settings after config init Victoria Dye via GitGitGadget
2021-10-15 19:54 ` [PATCH 2/2] sparse-index: update index read to consider index.sparse config Victoria Dye via GitGitGadget
2021-10-15 21:53   ` Junio C Hamano
2021-10-17  1:21     ` Derrick Stolee
2021-10-17  5:58       ` Junio C Hamano
2021-10-17 19:33         ` Derrick Stolee
2021-10-18  1:15           ` Junio C Hamano
2021-10-18 13:25             ` Derrick Stolee
2021-10-18 14:14               ` Victoria Dye
2021-10-21 13:26                 ` Derrick Stolee [this message]
2021-10-18 16:09               ` Junio C Hamano
2021-10-21 20:48 ` [PATCH v2 0/3] sparse-index: expand/collapse based on 'index.sparse' Victoria Dye via GitGitGadget
2021-10-21 20:48   ` [PATCH v2 1/3] test-read-cache.c: prepare_repo_settings after config init Victoria Dye via GitGitGadget
2021-10-21 20:48   ` [PATCH v2 2/3] sparse-index: add ensure_correct_sparsity function Victoria Dye via GitGitGadget
2021-10-21 22:20     ` Junio C Hamano
2021-10-27 17:21       ` Victoria Dye
2021-10-21 20:48   ` [PATCH v2 3/3] sparse-index: update do_read_index to ensure correct sparsity Victoria Dye via GitGitGadget
2021-10-27 18:20   ` [PATCH v3 0/3] sparse-index: expand/collapse based on 'index.sparse' Victoria Dye via GitGitGadget
2021-10-27 18:20     ` [PATCH v3 1/3] test-read-cache.c: prepare_repo_settings after config init Victoria Dye via GitGitGadget
2021-10-27 18:20     ` [PATCH v3 2/3] sparse-index: add ensure_correct_sparsity function Victoria Dye via GitGitGadget
2021-10-27 20:07       ` Derrick Stolee
2021-10-27 21:32         ` Junio C Hamano
2021-10-28  1:24           ` Derrick Stolee
2021-10-29 13:43             ` Victoria Dye
2021-10-27 18:20     ` [PATCH v3 3/3] sparse-index: update do_read_index to ensure correct sparsity Victoria Dye via GitGitGadget
2021-10-27 20:08     ` [PATCH v3 0/3] sparse-index: expand/collapse based on 'index.sparse' Derrick Stolee
2021-10-29 13:51     ` [PATCH v4 0/4] " Victoria Dye via GitGitGadget
2021-10-29 13:51       ` [PATCH v4 1/4] test-read-cache.c: prepare_repo_settings after config init Victoria Dye via GitGitGadget
2021-10-29 13:51       ` [PATCH v4 2/4] sparse-index: avoid unnecessary cache tree clearing Victoria Dye via GitGitGadget
2021-10-29 18:45         ` Junio C Hamano
2021-10-29 19:00           ` Derrick Stolee
2021-10-29 20:01             ` Junio C Hamano
2021-10-29 13:51       ` [PATCH v4 3/4] sparse-index: add ensure_correct_sparsity function Victoria Dye via GitGitGadget
2021-10-29 13:51       ` [PATCH v4 4/4] sparse-index: update do_read_index to ensure correct sparsity Victoria Dye via GitGitGadget
2021-11-22 17:36         ` Elijah Newren
2021-11-22 18:59           ` Victoria Dye
2021-11-22 17:40       ` [PATCH v4 0/4] sparse-index: expand/collapse based on 'index.sparse' Elijah Newren
2021-11-23  0:20       ` [PATCH v5 " Victoria Dye via GitGitGadget
2021-11-23  0:20         ` [PATCH v5 1/4] test-read-cache.c: prepare_repo_settings after config init Victoria Dye via GitGitGadget
2021-11-23  0:20         ` [PATCH v5 2/4] sparse-index: avoid unnecessary cache tree clearing Victoria Dye via GitGitGadget
2021-11-23  0:20         ` [PATCH v5 3/4] sparse-index: add ensure_correct_sparsity function Victoria Dye via GitGitGadget
2021-11-23  0:20         ` [PATCH v5 4/4] sparse-index: update do_read_index to ensure correct sparsity Victoria Dye via GitGitGadget
2021-11-23 17:21         ` [PATCH v5 0/4] sparse-index: expand/collapse based on 'index.sparse' Elijah Newren

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=002d7dd3-c41a-87a4-6fc0-ddf8497f4805@gmail.com \
    --to=stolee@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=gitster@pobox.com \
    --cc=vdye@github.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).