git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com>
To: Derrick Stolee <derrickstolee@github.com>, git@vger.kernel.org
Cc: vdye@github.com
Subject: Re: [PATCH v1 1/2] builtin/grep.c: add --sparse option
Date: Thu, 25 Aug 2022 02:20:27 +0800	[thread overview]
Message-ID: <74ff1a97-fa98-7280-9d84-35dafaf3cb3d@gmail.com> (raw)
In-Reply-To: <80f24382-1188-d450-d1e2-42f68c08e60b@github.com>

Hi reviewrs,

I came back from busying with relocation :)

On 8/17/2022 10:12 PM, Derrick Stolee wrote:
 > On 8/17/2022 3:56 AM, Shaoxuan Yuan wrote:
 >> Add a --sparse option to `git-grep`. This option is mainly used to:
 >>
 >> If searching in the index (using --cached):
 >>
 >> With --sparse, proceed the action when the current cache_entry is
 >
 > This phrasing is awkward. It might be better to reframe to describe the
 > _why_ before the _what_
 >
 >   When the '--cached' option is used with the 'git grep' command, the
 >   search is limited to the blobs found in the index, not in the worktree.
 >   If the user has enabled sparse-checkout, this might present more 
results
 >   than they would like, since the files outside of the 
sparse-checkout are
 >   unlikely to be important to them.
 >
 >   Change the default behavior of 'git grep' to focus on the files within
 >   the sparse-checkout definition. To enable the previous behavior, add a
 >   '--sparse' option to 'git grep' that triggers the old behavior that
 >   inspects paths outside of the sparse-checkout definition when paired
 >   with the '--cached' option.

Good suggestion!

 > Or something like that. The documentation updates will also help clarify
 > what happens when '--cached' is not included. I assume '--sparse' is
 > ignored, but perhaps it _could_ allow looking at the cached files outside
 > the sparse-checkout definition, this could make the simpler invocation of
 > 'git grep --sparse <pattern>' be the way that users can search after 
their
 > attempt to search the worktree failed.

This simpler version was in my earlier local branch, but later I
decided not to go with it. I found the difference between these two
approaches, is that "--cached --sparse" is more correct in terms of
how Git actually works (because sparsity is a concept in the index);
and "--sparse" is more comfortable for the end user.

I found the former one better here, because it is more self-explanatory,
and thus more info for the user, i.e. "you are now looking at the
index, and Git will also consider files outside of sparse definition."

To be honest, I don't know which one is "better", but I think I'll
keep the current implementation unless something more convincing shows
up later.

 >> marked with SKIP_WORKTREE bit (the default is to skip this kind of
 >> entry). Before this patch, --cached itself can realize this action.
 >> Adding --sparse here grants the user finer control over sparse
 >> entries. If the user only wants to peak into the index without
 >
 > s/peak/peek/
 >
 >> caring about sparse entries, --cached should suffice; if the user
 >> wants to peak into the index _and_ cares about sparse entries,
 >> combining --sparse with --cached can address this need.
 >>
 >> Suggested-by: Victoria Dye <vdye@github.com>
 >> Signed-off-by: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com>
 >> ---
 >>  builtin/grep.c                  | 10 +++++++++-
 >>  t/t7817-grep-sparse-checkout.sh | 12 ++++++------
 >>  2 files changed, 15 insertions(+), 7 deletions(-)
 >
 > You mentioned in Slack that you missed the documentation of the --sparse
 > option. Just pointing it out here so we don't forget.

Will do.

 >>
 >> diff --git a/builtin/grep.c b/builtin/grep.c
 >> index e6bcdf860c..61402e8084 100644
 >> --- a/builtin/grep.c
 >> +++ b/builtin/grep.c
 >> @@ -96,6 +96,8 @@ static pthread_cond_t cond_result;
 >>
 >>  static int skip_first_line;
 >>
 >> +static int grep_sparse = 0;
 >> +
 >
 > I initially thought it might be good to not define an additional global,
 > but there are many defined in this file outside of the context and they
 > are spread out with extra whitespace like this.
 >
 >>  static void add_work(struct grep_opt *opt, struct grep_source *gs)
 >>  {
 >>      if (opt->binary != GREP_BINARY_TEXT)
 >> @@ -525,7 +527,11 @@ static int grep_cache(struct grep_opt *opt,
 >>      for (nr = 0; nr < repo->index->cache_nr; nr++) {
 >>          const struct cache_entry *ce = repo->index->cache[nr];
 >>
 >> -        if (!cached && ce_skip_worktree(ce))
 >
 > This logic would skip files marked with SKIP_WORKTREE _unless_ --cached
 > was provided.
 >
 >> +        /*
 >> +         * If ce is a SKIP_WORKTREE entry, look into it when both
 >> +         * --sparse and --cached are given.
 >> +         */
 >> +        if (!(grep_sparse && cached) && ce_skip_worktree(ce))
 >>              continue;
 >
 > The logic of this if statement is backwards from the comment because a
 > true statement means "skip the entry" _not_ "look into it".
 >
 >     /*
 >      * Skip entries with SKIP_WORKTREE unless both --sparse and
 >      * --cached are given.
 >      */

Got it.

 > But again, we might want to consider this alternative:
 >
 >     /*
 >      * Skip entries with SKIP_WORKTREE unless --sparse is given.
 >      */
 >     if (!grep_sparse && ce_skip_worktree(ce))
 >         continue;
 >
 > This will require further changes below, specifically this bit:
 >
 >             /*
 >              * If CE_VALID is on, we assume worktree file and its
 >              * cache entry are identical, even if worktree file has
 >              * been modified, so use cache version instead
 >              */
 >             if (cached || (ce->ce_flags & CE_VALID)) {
 >                 if (ce_stage(ce) || ce_intent_to_add(ce))
 >                     continue;
 >                 hit |= grep_oid(opt, &ce->oid, name.buf,
 >                          0, name.buf);
 >             } else {
 >
 > We need to activate this grep_oid() call also when ce_skip_worktree(c) is
 > true. That is, if we want 'git grep --sparse' to extend the search beyond
 > the worktree and into the sparse entries.
 >
 >>
 >>          strbuf_setlen(&name, name_base_len);
 >> @@ -963,6 +969,8 @@ int cmd_grep(int argc, const char **argv, const 
char *prefix)
 >>                 PARSE_OPT_NOCOMPLETE),
 >>          OPT_INTEGER('m', "max-count", &opt.max_count,
 >>              N_("maximum number of results per file")),
 >> +        OPT_BOOL(0, "sparse", &grep_sparse,
 >> +             N_("search sparse contents and expand sparse index")),
 >
 > This "and expand sparse index" is an internal implementation detail, 
not a
 > heplful item for the help text. Instead, perhaps:
 >
 >     "search the contents of files outside the sparse-checkout definition"

Sounds good!

 > (Also, while the sparse index is being expanded right now, I would expect
 > to not expand the sparse index by the end of the series.)
 >
 >> -test_expect_success 'grep --cached searches entries with the 
SKIP_WORKTREE bit' '
 >> +test_expect_success 'grep --cached and --sparse searches entries 
with the SKIP_WORKTREE bit' '
 >>      cat >expect <<-EOF &&
 >>      a:text
 >>      b:text
 >>      dir/c:text
 >>      EOF
 >> -    git grep --cached "text" >actual &&
 >> +    git grep --cached --sparse "text" >actual &&
 >>      test_cmp expect actual
 >>  '
 >
 > Please add a test that demonstrates the change of behavior when only 
--cached
 > is provided, not --sparse.

Sure!

 > (If you take my suggestion to allow 'git grep --sparse' to do something
 > different, then also add a test for that case.)
 >
 >>
 >> @@ -143,7 +143,7 @@ test_expect_success 'grep --recurse-submodules 
honors sparse checkout in submodu
 >>      test_cmp expect actual
 >>  '
 >>
 >> -test_expect_success 'grep --recurse-submodules --cached searches 
entries with the SKIP_WORKTREE bit' '
 >> +test_expect_success 'grep --recurse-submodules --cached and 
--sparse searches entries with the SKIP_WORKTREE bit' '
 >>      cat >expect <<-EOF &&
 >>      a:text
 >>      b:text
 >> @@ -152,7 +152,7 @@ test_expect_success 'grep --recurse-submodules 
--cached searches entries with th
 >>      sub/B/b:text
 >>      sub2/a:text
 >>      EOF
 >> -    git grep --recurse-submodules --cached "text" >actual &&
 >> +    git grep --recurse-submodules --cached --sparse "text" >actual &&
 >>      test_cmp expect actual
 >>  '
 >> @@ -166,7 +166,7 @@ test_expect_success 'working tree grep does not 
search the index with CE_VALID a
 >>      test_cmp expect actual
 >>  '
 >>
 >> -test_expect_success 'grep --cached searches index entries with both 
CE_VALID and SKIP_WORKTREE' '
 >> +test_expect_success 'grep --cached and --sparse searches index 
entries with both CE_VALID and SKIP_WORKTREE' '
 >>      cat >expect <<-EOF &&
 >>      a:text
 >>      b:text
 >> @@ -174,7 +174,7 @@ test_expect_success 'grep --cached searches 
index entries with both CE_VALID and
 >>      EOF
 >>      test_when_finished "git update-index --no-assume-unchanged b" &&
 >>      git update-index --assume-unchanged b &&
 >> -    git grep --cached text >actual &&
 >> +    git grep --cached --sparse text >actual &&
 >>      test_cmp expect actual
 >>  '
 >
 > Same with these two tests. Add additional commands that show the 
change of
 > behavior when only using '--cached'.

--
Thanks,
Shaoxuan


  parent reply	other threads:[~2022-08-24 18:20 UTC|newest]

Thread overview: 69+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-08-17  7:56 [PATCH v1 0/2] grep: integrate with sparse index Shaoxuan Yuan
2022-08-17  7:56 ` [PATCH v1 1/2] builtin/grep.c: add --sparse option Shaoxuan Yuan
2022-08-17 14:12   ` Derrick Stolee
2022-08-17 17:13     ` Junio C Hamano
2022-08-17 17:34       ` Victoria Dye
2022-08-17 17:43         ` Derrick Stolee
2022-08-17 18:47           ` Junio C Hamano
2022-08-17 17:37     ` Elijah Newren
2022-08-24 18:20     ` Shaoxuan Yuan [this message]
2022-08-24 19:08       ` Derrick Stolee
2022-08-17  7:56 ` [PATCH v1 2/2] builtin/grep.c: integrate with sparse index Shaoxuan Yuan
2022-08-17 14:23   ` Derrick Stolee
2022-08-24 21:06     ` Shaoxuan Yuan
2022-08-25  0:39       ` Derrick Stolee
2022-08-17 13:46 ` [PATCH v1 0/2] grep: " Derrick Stolee
2022-08-29 23:28 ` [PATCH v2 " Shaoxuan Yuan
2022-08-29 23:28   ` [PATCH v2 1/2] builtin/grep.c: add --sparse option Shaoxuan Yuan
2022-08-29 23:28   ` [PATCH v2 2/2] builtin/grep.c: integrate with sparse index Shaoxuan Yuan
2022-08-30 13:45     ` Derrick Stolee
2022-09-01  4:57 ` [PATCH v3 0/3] grep: " Shaoxuan Yuan
2022-09-01  4:57   ` [PATCH v3 1/3] builtin/grep.c: add --sparse option Shaoxuan Yuan
2022-09-01  4:57   ` [PATCH v3 2/3] builtin/grep.c: integrate with sparse index Shaoxuan Yuan
2022-09-01  4:57   ` [PATCH v3 3/3] builtin/grep.c: walking tree instead of expanding index with --sparse Shaoxuan Yuan
2022-09-01 17:03     ` Derrick Stolee
2022-09-01 18:31       ` Shaoxuan Yuan
2022-09-01 17:17     ` Junio C Hamano
2022-09-01 17:27       ` Junio C Hamano
2022-09-01 22:49         ` Shaoxuan Yuan
2022-09-01 22:36       ` Shaoxuan Yuan
2022-09-02  3:28     ` Victoria Dye
2022-09-02 18:47       ` Shaoxuan Yuan
2022-09-03  0:36 ` [PATCH v4 0/3] grep: integrate with sparse index Shaoxuan Yuan
2022-09-03  0:36   ` [PATCH v4 1/3] builtin/grep.c: add --sparse option Shaoxuan Yuan
2022-09-03  0:36   ` [PATCH v4 2/3] builtin/grep.c: integrate with sparse index Shaoxuan Yuan
2022-09-03  0:36   ` [PATCH v4 3/3] builtin/grep.c: walking tree instead of expanding index with --sparse Shaoxuan Yuan
2022-09-03  4:39     ` Junio C Hamano
2022-09-08  0:24       ` Shaoxuan Yuan
2022-09-08  0:18 ` [PATCH v5 0/3] grep: integrate with sparse index Shaoxuan Yuan
2022-09-08  0:18   ` [PATCH v5 1/3] builtin/grep.c: add --sparse option Shaoxuan Yuan
2022-09-10  1:07     ` Victoria Dye
2022-09-14  6:08     ` Elijah Newren
2022-09-15  2:57       ` Junio C Hamano
2022-09-18  2:14         ` Elijah Newren
2022-09-18 19:52           ` Victoria Dye
2022-09-19  1:23             ` Junio C Hamano
2022-09-19  4:27             ` Shaoxuan Yuan
2022-09-19 11:03             ` Ævar Arnfjörð Bjarmason
2022-09-20  7:13             ` Elijah Newren
2022-09-17  3:34       ` Shaoxuan Yuan
2022-09-18  4:24         ` Elijah Newren
2022-09-19  4:13           ` Shaoxuan Yuan
2022-09-17  3:45       ` Shaoxuan Yuan
2022-09-08  0:18   ` [PATCH v5 2/3] builtin/grep.c: integrate with sparse index Shaoxuan Yuan
2022-09-08  0:18   ` [PATCH v5 3/3] builtin/grep.c: walking tree instead of expanding index with --sparse Shaoxuan Yuan
2022-09-08 17:59     ` Junio C Hamano
2022-09-08 20:46       ` Derrick Stolee
2022-09-08 20:56         ` Junio C Hamano
2022-09-08 21:06           ` Shaoxuan Yuan
2022-09-09 12:49           ` Derrick Stolee
2022-09-13 17:23         ` Junio C Hamano
2022-09-10  2:04     ` Victoria Dye
2022-09-23  4:18 ` [PATCH v6 0/1] grep: integrate with sparse index Shaoxuan Yuan
2022-09-23  4:18   ` [PATCH v6 1/1] builtin/grep.c: " Shaoxuan Yuan
2022-09-23 16:40     ` Junio C Hamano
2022-09-23 16:58     ` Junio C Hamano
2022-09-26 17:28       ` Junio C Hamano
2022-09-23 14:13   ` [PATCH v6 0/1] grep: " Derrick Stolee
2022-09-23 16:01   ` Victoria Dye
2022-09-23 17:08     ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=74ff1a97-fa98-7280-9d84-35dafaf3cb3d@gmail.com \
    --to=shaoxuan.yuan02@gmail.com \
    --cc=derrickstolee@github.com \
    --cc=git@vger.kernel.org \
    --cc=vdye@github.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).