git@vger.kernel.org mailing list mirror (one of many)
From: Derrick Stolee <derrickstolee@github.com>
To: Junio C Hamano <gitster@pobox.com>,
	Shaoxuan Yuan <shaoxuan.yuan02@gmail.com>
Cc: vdye@github.com, git@vger.kernel.org
Subject: Re: [PATCH v5 3/3] builtin/grep.c: walking tree instead of expanding index with --sparse
Date: Thu, 8 Sep 2022 16:46:54 -0400
Message-ID: <093827ae-41ef-5f7c-7829-647536ce1305@github.com>
In-Reply-To: <xmqqczc5rblr.fsf@gitster.g>

On 9/8/2022 1:59 PM, Junio C Hamano wrote:
> Shaoxuan Yuan <shaoxuan.yuan02@gmail.com> writes:
> 
>> +
>> +	/*
>> +	 * NEEDSWORK: when reading a submodule, the sparsity settings in the
>> +	 * superproject are incorrectly forgotten or misused. For example:
>> +	 *
>> +	 * 1. "command_requires_full_index"
>> +	 * 	When this setting is turned on for `grep`, only the superproject
>> +	 *	knows it. All the submodules are read with their own configs
>> +	 *	and get prepare_repo_settings()'d. Therefore, these submodules
>> +	 *	"forget" the sparse-index feature switch. As a result, the index
>> +	 *	of these submodules are expanded unexpectedly.
> 
> Is this fundamental, or is it just this version of the patch is
> incomplete in that it still does not propagate the bit from
> the_repository->settings to submodule's settings?  Should a change
> to propagate the bit be included for this topic to be complete?
> 
> To put it another way, when grep with this version of the patch
> recurses into a submodule, does it work correctly even without
> flipping command_requires_full_index on in the "struct repository"
> instance for the submodule?  If so, then the NEEDSWORK above may be
> just performance issue.  If it behaves incorrectly, then it means
> we cannot safely make "git grep" aware of sparse index yet.  It is
> hard to tell which one you meant in the above.
> 
> I think the same question needs to be asked for other points
> (omitted from quoting) in this list.

I think this comment is misplaced. It should either be contained in
the commit message or placed closer to this diff hunk:

>> @@ -537,8 +561,20 @@ static int grep_cache(struct grep_opt *opt,
>>  
>>  		strbuf_setlen(&name, name_base_len);
>>  		strbuf_addstr(&name, ce->name);
>> +		if (S_ISSPARSEDIR(ce->ce_mode)) {
>> +			enum object_type type;
>> +			struct tree_desc tree;
>> +			void *data;
>> +			unsigned long size;
>> +
>> +			data = read_object_file(&ce->oid, &type, &size);
>> +			init_tree_desc(&tree, data, size);
>>  
>> -		if (S_ISREG(ce->ce_mode) &&
>> +			hit |= grep_tree(opt, pathspec, &tree, &name, 0, 0);
>> +			strbuf_setlen(&name, name_base_len);
>> +			strbuf_addstr(&name, ce->name);
>> +			free(data);
>> +		} else if (S_ISREG(ce->ce_mode) &&

The conclusion we were trying to reach is that you (Junio) correctly
identified a bug in how we were calling grep_tree() in this hunk in
its v4 form.

HOWEVER: it "doesn't matter" because the sparse index doesn't work
at all within a submodule. Specifically, if a super-repo does not
enable sparse-checkout, but the submodule _does_, then we don't
currently know how Git will behave. Shaoxuan's comment goes on to
explain why the situation is fraught:

* command_requires_full_index is set in a builtin only for the
  top-level project, so when we traverse into a submodule, we don't
  re-check whether the current builtin has integrated with the sparse
  index, and we expand the submodule's sparse index to a full one.

* core_apply_sparse_checkout is a global not even associated with
  a repository struct. What happens when a super project is not
  sparse but a submodule is? Or vice-versa? I honestly don't know,
  and it will require testing to find out.
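
To make the two points above concrete, here is a toy model in C. It is
only a sketch of the shape of the problem as described above: every
name in it (toy_repo, toy_settings, toy_builtin_setup() and so on) is
a hypothetical stand-in, not one of Git's real structs or functions,
and the assumption that a freshly prepared repository defaults to
requiring a full index is part of the model.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins; NOT Git's real data structures. */
struct toy_settings {
	bool command_requires_full_index;
};

struct toy_repo {
	const char *name;
	struct toy_settings settings;
	bool index_is_sparse;
};

/* Models a process-wide flag that is not tied to any repository. */
bool toy_apply_sparse_checkout;

/* Assumed default: a freshly prepared repo requires a full index. */
void toy_prepare_settings(struct toy_repo *r)
{
	r->settings.command_requires_full_index = true;
}

/* The builtin opts in to the sparse index, but only for the top-level repo. */
void toy_builtin_setup(struct toy_repo *top)
{
	toy_prepare_settings(top);
	top->settings.command_requires_full_index = false;
}

/* Reading an index expands it when that repo's own settings demand it. */
void toy_read_index(struct toy_repo *r)
{
	if (r->settings.command_requires_full_index)
		r->index_is_sparse = false;
}

/* Recursing prepares the submodule's own settings; nothing is inherited. */
void toy_recurse_into_submodule(struct toy_repo *sub)
{
	toy_prepare_settings(sub);
	toy_read_index(sub);
}

/* The answer is the same for every repo, because the state is global. */
bool toy_sparse_checkout_enabled(const struct toy_repo *r)
{
	(void)r; /* the repository argument is ignored; that is the problem */
	return toy_apply_sparse_checkout;
}

/* Walks through both bullets; returns 0 if the model behaves as described. */
int toy_demo(void)
{
	struct toy_repo top = { "super", { false }, true };
	struct toy_repo sub = { "sub", { false }, true };

	toy_builtin_setup(&top);
	toy_read_index(&top);
	toy_recurse_into_submodule(&sub);

	assert(top.index_is_sparse);  /* top-level index stays sparse */
	assert(!sub.index_is_sparse); /* submodule index was expanded */

	toy_apply_sparse_checkout = true;
	assert(toy_sparse_checkout_enabled(&top) ==
	       toy_sparse_checkout_enabled(&sub));
	return 0;
}
```

Calling toy_demo() from a main() exercises both cases. In this model,
the first bullet would be addressed by carrying the builtin's opt-in
across the recursion, and the second by moving the flag into the
per-repository struct; whether the real fix looks like either of those
is exactly what needs investigating.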

Shaoxuan's comment is attempting to list the reasons why submodules
do not currently work with sparse-index, and specifically that we
can add tests that _should_ exercise this code in a meaningful way,
but because of the current limitations of the codebase, the code
isn't actually exercised in that scenario.

In order to actually create a test that demonstrates how submodules
and sparse-checkout work with this logic, we need to do some serious
refactoring of the sparse-checkout logic to care about the repository
struct, along with some other concerns specifically around the sparse
index. This doesn't seem appropriate for the GSoC timeline or even for
just this topic.

Victoria and I have noted this issue down and will try to find time
to investigate further, with a target of being able to actually
exercise this grep_tree() call within a sparse index in a submodule,
giving us full confidence that name_base_len is the correct value to
put in that parameter.

Thanks,
-Stolee

