git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Matheus Tavares <matheus.bernardino@usp.br>
Cc: git@vger.kernel.org, gitster@pobox.com, jonathantanmy@google.com
Subject: Re: [RFC PATCH 0/3] grep: don'\''t add subrepos to in-memory alternates
Date: Mon, 27 Sep 2021 19:30:56 +0200	[thread overview]
Message-ID: <87zgryylfx.fsf@evledraar.gmail.com> (raw)
In-Reply-To: <296230071f2cabda213b2a5f6f435f9308718569.1632754555.git.matheus.bernardino@usp.br>


On Mon, Sep 27 2021, Matheus Tavares wrote:

> On Mon, Sep 27, 2021 at 9:09 AM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
>>
>>
>> On Mon, Aug 16 2021, Jonathan Tan wrote:
>>
>> > Record the repository whenever an OID grep source is created, and teach
>> > the worker threads to explicitly provide the repository when accessing
>> > objects.
>> > [...]
>> > diff --git a/grep.h b/grep.h
>> > index 480b3f5bba..128007db65 100644
>> > --- a/grep.h
>> > +++ b/grep.h
>> > @@ -120,7 +120,20 @@ struct grep_opt {
>> >       struct grep_pat *header_list;
>> >       struct grep_pat **header_tail;
>> >       struct grep_expr *pattern_expression;
>> > +
>> > +     /*
>> > +      * NEEDSWORK: See if we can remove this field, because the repository
>> > +      * should probably be per-source. That is, grep.c functions using this
>> > +      * field should probably start using "repo" in "struct grep_source"
>> > +      * instead.
>> > +      *
>> > +      * This is potentially the cause of at least one bug - "git grep"
>> > +      * ignoring the textconv attributes from submodules. See [1] for more
>> > +      * information.
>> > +      * [1] https://lore.kernel.org/git/CAHd-oW5iEQarYVxEXoTG-ua2zdoybTrSjCBKtO0YT292fm0NQQ@mail.gmail.com/
>> > +      */
>> >       struct repository *repo;
>> > +
>>
>> I ran into this comment and read the linked E-Mail, and then the
>> downthread
>> https://lore.kernel.org/git/CAHd-oW6uG1fap-T4UF17bJmjoHAqWCDq9KbY+_8a3cEnnfATxg@mail.gmail.com/;
>>
>> Given Matheus's "I've somehow missed this guard and the..." there I'm
>> not quite sure what/if we should be doing here & what this comment is
>> recommending? I.e. do we still need to adjust the call chains as noted
>> in the E-Mail the comment links to, or not?
>
> I think we should still adjust the call chains, yes. The downthread
> message you mentioned is kind of a tangent about performance, where
> Junio helped me understand something I had previously missed in the
> code, regarding the persistence of the attributes stack.
>
> But the issue that started the thread was about a correctness problem:
> the superproject textconv attributes are being used on submodules'
> files when running `git grep` with `--recurse-submodules --textconv`.
> The three cases to consider are:
>
> - .gitattributes from the working tree
> - .gitattributes from the index
> - .git/info/attributes
>
> On all these cases, the superproject attributes are being used on the
> submodule. Additionally, if the superproject does not define any
> attribute, the submodule attributes are being ignored in all cases
> except by the first one (but that is only because the code sees the
> .gitattributes file on the submodule as if it were a "regular"
> subdirectory of the surperproject. So the submodule's .gitattribures
> takes higher precedence when evaluating the attributes for files in
> that directory).
>
> Another issue is that the textconv cache is always saved to (and read
> from) the superproject gitdir, even for submodules' files.
>
> Here are some test cases that demonstrate these issues:
>
> -- snipsnap --
> diff --git a/t/t7814-grep-recurse-submodules.sh b/t/t7814-grep-recurse-submodules.sh
> index 3172f5b936..d01a3bc5d8 100755
> --- a/t/t7814-grep-recurse-submodules.sh
> +++ b/t/t7814-grep-recurse-submodules.sh
> @@ -441,4 +441,104 @@ test_expect_success 'grep --recurse-submodules with --cached ignores worktree mo
>  	test_must_fail git grep --recurse-submodules --cached "A modified line in submodule" >actual 2>&1 &&
>  	test_must_be_empty actual
>  '
> +
> +test_expect_failure 'grep --textconv: superproject .gitattributes does not affect submodules' '
> +	reset_and_clean &&
> +	test_config_global diff.d2x.textconv "sed -e \"s/d/x/\"" &&
> +	echo "a diff=d2x" >.gitattributes &&
> +
> +	cat >expect <<-\EOF &&
> +	a:(1|2)x(3|4)
> +	EOF
> +	git grep --textconv --recurse-submodules x >actual &&
> +	test_cmp expect actual
> +'
> +
> +test_expect_failure 'grep --textconv: superproject .gitattributes (from index) does not affect submodules' '
> +	reset_and_clean &&
> +	test_config_global diff.d2x.textconv "sed -e \"s/d/x/\"" &&
> +	echo "a diff=d2x" >.gitattributes &&
> +	git add .gitattributes &&
> +	rm .gitattributes &&
> +
> +	cat >expect <<-\EOF &&
> +	a:(1|2)x(3|4)
> +	EOF
> +	git grep --textconv --recurse-submodules x >actual &&
> +	test_cmp expect actual
> +'
> +
> +test_expect_failure 'grep --textconv: superproject .git/info/attributes does not affect submodules' '
> +	reset_and_clean &&
> +	test_config_global diff.d2x.textconv "sed -e \"s/d/x/\"" &&
> +	super_attr="$(git rev-parse --path-format=relative --git-path info/attributes)" &&
> +	test_when_finished rm -f "$super_attr" &&
> +	echo "a diff=d2x" >"$super_attr" &&
> +
> +	cat >expect <<-\EOF &&
> +	a:(1|2)x(3|4)
> +	EOF
> +	git grep --textconv --recurse-submodules x >actual &&
> +	test_cmp expect actual
> +'
> +
> +test_expect_success 'grep --textconv corectly reads submodule .gitattributes' '
> +	reset_and_clean &&
> +	test_config_global diff.d2x.textconv "sed -e \"s/d/x/\"" &&
> +	echo "a diff=d2x" >submodule/.gitattributes &&
> +
> +	cat >expect <<-\EOF &&
> +	submodule/a:(1|2)x(3|4)
> +	EOF
> +	git grep --textconv --recurse-submodules x >actual &&
> +	test_cmp expect actual
> +'
> +
> +test_expect_failure 'grep --textconv corectly reads submodule .gitattributes (from index)' '
> +	reset_and_clean &&
> +	test_config_global diff.d2x.textconv "sed -e \"s/d/x/\"" &&
> +	echo "a diff=d2x" >submodule/.gitattributes &&
> +	git -C submodule add .gitattributes &&
> +	rm submodule/.gitattributes &&
> +
> +	cat >expect <<-\EOF &&
> +	submodule/a:(1|2)x(3|4)
> +	EOF
> +	git grep --textconv --recurse-submodules x >actual &&
> +	test_cmp expect actual
> +'
> +
> +test_expect_failure 'grep --textconv corectly reads submodule .git/info/attributes' '
> +	reset_and_clean &&
> +	test_config_global diff.d2x.textconv "sed -e \"s/d/x/\"" &&
> +
> +	# Workaround: we use --path-format=relative because the absolute path
> +	# contains whitespaces and that seems to confuse test_when_finished
> +	#
> +	submodule_attr="submodule/$(git -C submodule rev-parse --path-format=relative --git-path info/attributes)" &&
> +	test_when_finished rm -f "$submodule_attr" &&
> +	echo "a diff=d2x" >"$submodule_attr" &&
> +
> +	cat >expect <<-\EOF &&
> +	submodule/a:(1|2)x(3|4)
> +	EOF
> +	git grep --textconv --recurse-submodules x >actual &&
> +	test_cmp expect actual
> +'
> +
> +test_expect_failure 'grep saves textconv cache in the appropriated repository' '
> +	reset_and_clean &&
> +	test_config_global diff.d2x_cached.textconv "sed -e \"s/d/x/\"" &&
> +	test_config_global diff.d2x_cached.cachetextconv true &&
> +	echo "a diff=d2x_cached" >submodule/.gitattributes &&
> +
> +	# Note: we only read/write to the textconv cache when grepping from an
> +	# OID as the working tree file might have modifications. That is why
> +	# we use --cached here.
> +	#
> +	git grep --textconv --cached --recurse-submodules x &&
> +	test_path_is_missing "$(git rev-parse --git-path refs/notes/textconv/d2x_cached)" &&
> +	test_path_is_file "$(git -C submodule rev-parse --git-path refs/notes/textconv/d2x_cached)"
> +'
> +
>  test_done

Thanks! I think it would be very good to have these tests in-tree along
with an updated comment pointing to them.


  reply	other threads:[~2021-09-27 17:46 UTC|newest]

Thread overview: 78+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-10 18:28 [PATCH 0/7] In grep, no adding submodule ODB as alternates Jonathan Tan
2021-08-10 18:28 ` [PATCH 1/7] submodule: lazily add submodule ODBs " Jonathan Tan
2021-08-10 21:13   ` Junio C Hamano
2021-08-13 16:53     ` Jonathan Tan
2021-08-11 21:33   ` Emily Shaffer
2021-08-13 16:23     ` Jonathan Tan
2021-08-10 18:28 ` [PATCH 2/7] grep: use submodule-ODB-as-alternate lazy-addition Jonathan Tan
2021-08-11 21:36   ` Emily Shaffer
2021-08-13 16:31     ` Jonathan Tan
2021-08-10 18:28 ` [PATCH 3/7] grep: typesafe versions of grep_source_init Jonathan Tan
2021-08-10 21:38   ` Junio C Hamano
2021-08-11 21:42   ` Emily Shaffer
2021-08-11 23:07     ` Ramsay Jones
2021-08-13 16:32       ` Jonathan Tan
2021-08-11 22:45   ` Matheus Tavares Bernardino
2021-08-12 16:49     ` Junio C Hamano
2021-08-13 16:33       ` Jonathan Tan
2021-08-10 18:28 ` [PATCH 4/7] grep: read submodule entry with explicit repo Jonathan Tan
2021-08-11 21:44   ` Emily Shaffer
2021-08-13 16:39     ` Jonathan Tan
2021-08-10 18:28 ` [PATCH 5/7] grep: allocate subrepos on heap Jonathan Tan
2021-08-11 21:50   ` Emily Shaffer
2021-08-13 16:42     ` Jonathan Tan
2021-08-10 18:28 ` [PATCH 6/7] grep: add repository to OID grep sources Jonathan Tan
2021-08-11 21:52   ` Emily Shaffer
2021-08-13 16:44     ` Jonathan Tan
2021-08-11 23:28   ` Matheus Tavares Bernardino
2021-08-13 16:47     ` Jonathan Tan
2021-08-10 18:28 ` [PATCH 7/7] t7814: show lack of alternate ODB-adding Jonathan Tan
2021-08-11 21:55   ` Emily Shaffer
2021-08-11 22:22     ` Matheus Tavares Bernardino
2021-08-13 16:50       ` Jonathan Tan
2021-08-11 21:29 ` [PATCH 0/7] In grep, no adding submodule ODB as alternates Emily Shaffer
2021-08-11 22:49 ` Josh Steadmon
2021-08-13 21:05 ` [PATCH v2 0/8] " Jonathan Tan
2021-08-13 21:05   ` [PATCH v2 1/8] submodule: lazily add submodule ODBs " Jonathan Tan
2021-08-13 21:05   ` [PATCH v2 2/8] grep: use submodule-ODB-as-alternate lazy-addition Jonathan Tan
2021-08-13 21:05   ` [PATCH v2 3/8] grep: typesafe versions of grep_source_init Jonathan Tan
2021-08-16 15:06     ` Matheus Tavares Bernardino
2021-08-13 21:05   ` [PATCH v2 4/8] grep: read submodule entry with explicit repo Jonathan Tan
2021-08-13 21:05   ` [PATCH v2 5/8] grep: allocate subrepos on heap Jonathan Tan
2021-08-13 21:44     ` Junio C Hamano
2021-08-16 19:42       ` Jonathan Tan
2021-08-13 21:05   ` [PATCH v2 6/8] grep: add repository to OID grep sources Jonathan Tan
2021-08-16 14:48     ` Matheus Tavares Bernardino
2021-08-16 19:44       ` Jonathan Tan
2021-08-13 21:05   ` [PATCH v2 7/8] submodule-config: pass repo upon blob config read Jonathan Tan
2021-08-16 14:32     ` Matheus Tavares Bernardino
2021-08-16 19:57       ` Matheus Tavares Bernardino
2021-08-16 20:02       ` Jonathan Tan
2021-08-16 15:48     ` Matheus Tavares Bernardino
2021-08-16 20:09       ` Jonathan Tan
2021-08-16 20:57         ` Jonathan Tan
2021-08-13 21:05   ` [PATCH v2 8/8] t7814: show lack of alternate ODB-adding Jonathan Tan
2021-08-16 15:14   ` [PATCH v2 0/8] In grep, no adding submodule ODB as alternates Matheus Tavares Bernardino
2021-08-16 21:09 ` [PATCH v3 " Jonathan Tan
2021-08-16 21:09   ` [PATCH v3 1/8] submodule: lazily add submodule ODBs " Jonathan Tan
2021-08-16 21:09   ` [PATCH v3 2/8] grep: use submodule-ODB-as-alternate lazy-addition Jonathan Tan
2021-08-16 21:09   ` [PATCH v3 3/8] grep: typesafe versions of grep_source_init Jonathan Tan
2021-08-16 21:09   ` [PATCH v3 4/8] grep: read submodule entry with explicit repo Jonathan Tan
2021-08-16 21:09   ` [PATCH v3 5/8] grep: allocate subrepos on heap Jonathan Tan
2021-08-16 21:09   ` [PATCH v3 6/8] grep: add repository to OID grep sources Jonathan Tan
2021-09-27 12:08     ` Ævar Arnfjörð Bjarmason
2021-09-27 16:45       ` [RFC PATCH 0/3] grep: don'\''t add subrepos to in-memory alternates Matheus Tavares
2021-09-27 17:30         ` Ævar Arnfjörð Bjarmason [this message]
2021-08-16 21:09   ` [PATCH v3 7/8] submodule-config: pass repo upon blob config read Jonathan Tan
2021-08-16 21:09   ` [PATCH v3 8/8] t7814: show lack of alternate ODB-adding Jonathan Tan
2021-08-17 19:29   ` [PATCH v3 0/8] In grep, no adding submodule ODB as alternates Matheus Tavares Bernardino
2021-09-08  0:26   ` Junio C Hamano
2021-09-08 15:31     ` Matheus Tavares Bernardino
2021-09-08 18:45       ` Junio C Hamano
  -- strict thread matches above, loose matches on Subject: below --
2019-09-18  1:56 [RFC PATCH 0/3] grep: don't add subrepos to in-memory alternates Matheus Tavares
2019-09-18 19:55 ` Junio C Hamano
2019-09-19  5:18   ` Matheus Tavares Bernardino
2019-09-20 16:26     ` Junio C Hamano
2019-09-21 20:34       ` Matheus Tavares Bernardino
2019-09-28  3:24         ` Junio C Hamano
2019-09-28  4:20           ` Matheus Tavares Bernardino

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87zgryylfx.fsf@evledraar.gmail.com \
    --to=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jonathantanmy@google.com \
    --cc=matheus.bernardino@usp.br \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).