git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Matheus Tavares Bernardino <matheus.bernardino@usp.br>
To: Junio C Hamano <gitster@pobox.com>
Cc: git <git@vger.kernel.org>, Derrick Stolee <stolee@gmail.com>,
	Elijah Newren <newren@gmail.com>,
	Jonathan Tan <jonathantanmy@google.com>
Subject: Re: [RFC PATCH v2 3/4] grep: honor sparse checkout patterns
Date: Tue, 12 May 2020 21:05:18 -0300	[thread overview]
Message-ID: <CAHd-oW4k-9ZHQJcqEUqzxw0YShPWCaG61=v4wyBvXte2sSTV5g@mail.gmail.com> (raw)
In-Reply-To: <xmqqh7wmqn7k.fsf@gitster.c.googlers.com>

On Mon, May 11, 2020 at 4:35 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Matheus Tavares <matheus.bernardino@usp.br> writes:
>
> > One of the main uses for a sparse checkout is to allow users to focus on
> > the subset of files in a repository in which they are interested. But
> > git-grep currently ignores the sparsity patterns and report all matches
> > found outside this subset, which kind of goes in the opposite direction.
> > Let's fix that, making it honor the sparsity boundaries for every
> > grepping case:
> >
> > - git grep in worktree
> > - git grep --cached
> > - git grep $REVISION
>
> It makes sense for these to be limited within the "sparse" area.
>
> > - git grep --untracked and git grep --no-index (which already respect
> >   sparse checkout boundaries)
>
> I can understand the former; those untracked files are what _could_
> be brought into attention by "git add", so limiting to the same
> "sparse" area may make sense.
>
> I am not sure about the latter, though, as "--no-index" is an
> explicit request to pretend that we are dealing with a random
> collection of files, not managed in a git repository.  But perhaps
> there is a similar justification like how "--untracked" is
> unjustifiable.  I dunno.

Yeah, I think there was no need to mention those two cases here. My
intention was to say that, in these cases, we should stick to the
files that are present in the working tree (which should match the
sparsity patterns + untracked {and ignored, in --no-index}), as
opposed to how the worktree grep used to behave until now, falling
back to the cache on files excluded by the sparse checkout.

> > diff --git a/builtin/grep.c b/builtin/grep.c
> > index a5056f395a..91ee0b2734 100644
> > --- a/builtin/grep.c
> > +++ b/builtin/grep.c
> > @@ -410,7 +410,7 @@ static int grep_cache(struct grep_opt *opt,
> >                     const struct pathspec *pathspec, int cached);
> >  static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec,
> >                    struct tree_desc *tree, struct strbuf *base, int tn_len,
> > -                  int check_attr);
> > +                  int is_root_tree);
> >
> >  static int grep_submodule(struct grep_opt *opt,
> >                         const struct pathspec *pathspec,
> > @@ -508,6 +508,10 @@ static int grep_cache(struct grep_opt *opt,
> >
> >       for (nr = 0; nr < repo->index->cache_nr; nr++) {
> >               const struct cache_entry *ce = repo->index->cache[nr];
> > +
> > +             if (ce_skip_worktree(ce) && !S_ISGITLINK(ce->ce_mode))
> > +                     continue;
>
> Hmph.  Why exclude gitlink from this rule?  If a submodule sits at a
> path that is excluded by the sparse pattern, should we still recurse
> into it?

The idea behind not skipping gitlinks here was to be compliant with
what we have in the working tree. In 4fd683b ("sparse-checkout:
document interactions with submodules"), we decided that, if the
sparse-checkout patterns exclude a submodule, the submodule would
still appear in the working tree. The purpose was to keep these
features (submodules and sparse-checkout) independent. Along the same
lines, I think we should always recurse into initialized submodules in
grep, and then load their own sparsity patterns, to decide what should
be grepped within.

[...]
> > +static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec,
> > +                  struct tree_desc *tree, struct strbuf *base, int tn_len,
> > +                  int is_root_tree)
> > +{
> > +     struct pattern_list *patterns = NULL;
> > +     int ret;
> > +
> > +     if (is_root_tree)
> > +             patterns = get_sparsity_patterns(opt->repo);
> > +
> > +     ret = do_grep_tree(opt, pathspec, tree, base, tn_len, is_root_tree,
> > +                        patterns, 0);
> > +
> > +     if (patterns) {
> > +             clear_pattern_list(patterns);
> > +             free(patterns);
> > +     }
>
> OK, it is not like this codepath is driven by "git log" to grep from
> top-level tree objects of many commits, so it is OK to grab the
> sparsity patterns once before do_grep_tree() and discard it when we
> are done.

Yeah. A possible performance problem here would be when users pass
many trees to git-grep (since we are reloading the pattern lists, from
both the_repository and submodules, for each tree). But, as Elijah
pointed out [1], the cases where this overhead might be somewhat
noticeable should be very rare.

[1]: https://lore.kernel.org/git/CABPp-BGUf-4exGW23xka1twf2D=nFOz1CkD_f-rDX_AGdVEeDA@mail.gmail.com/

  reply	other threads:[~2020-05-13  0:05 UTC|newest]

Thread overview: 123+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-24  6:04 [RFC PATCH 0/3] grep: honor sparse checkout and add option to ignore it Matheus Tavares
2020-03-24  6:11 ` [RFC PATCH 1/3] doc: grep: unify info on configuration variables Matheus Tavares
2020-03-24  7:57   ` Elijah Newren
2020-03-24 21:26     ` Junio C Hamano
2020-03-24 23:38       ` Matheus Tavares
2020-03-24  6:12 ` [RFC PATCH 2/3] grep: honor sparse checkout patterns Matheus Tavares
2020-03-24  7:15   ` Elijah Newren
2020-03-24 15:12     ` Derrick Stolee
2020-03-24 16:16       ` Elijah Newren
2020-03-24 17:02         ` Derrick Stolee
2020-03-24 23:01       ` Matheus Tavares Bernardino
2020-03-24 22:55     ` Matheus Tavares Bernardino
2020-04-21  2:10       ` Matheus Tavares Bernardino
2020-04-21  3:08         ` Elijah Newren
2020-04-22 12:08           ` Derrick Stolee
2020-04-23  6:09           ` Matheus Tavares Bernardino
2020-03-24  6:13 ` [RFC PATCH 3/3] grep: add option to ignore sparsity patterns Matheus Tavares
2020-03-24  7:54   ` Elijah Newren
2020-03-24 18:30     ` Junio C Hamano
2020-03-24 19:07       ` Elijah Newren
2020-03-25 20:18         ` Junio C Hamano
2020-03-30  3:23       ` Matheus Tavares Bernardino
2020-03-31 19:12         ` Elijah Newren
2020-03-31 20:02           ` Derrick Stolee
2020-04-27 17:15             ` Matheus Tavares Bernardino
2020-04-29 16:46               ` Elijah Newren
2020-04-29 17:21             ` Elijah Newren
2020-03-25 23:15     ` Matheus Tavares Bernardino
2020-03-26  6:02       ` Elijah Newren
2020-03-27 15:51         ` Junio C Hamano
2020-03-27 19:01           ` Elijah Newren
2020-03-30  1:12         ` Matheus Tavares Bernardino
2020-03-31 16:48           ` Elijah Newren
2020-05-10  0:41 ` [RFC PATCH v2 0/4] grep: honor sparse checkout and add option to ignore it Matheus Tavares
2020-05-10  0:41   ` [RFC PATCH v2 1/4] doc: grep: unify info on configuration variables Matheus Tavares
2020-05-10  0:41   ` [RFC PATCH v2 2/4] config: load the correct config.worktree file Matheus Tavares
2020-05-11 19:10     ` Junio C Hamano
2020-05-12 22:55       ` Matheus Tavares Bernardino
2020-05-12 23:22         ` Junio C Hamano
2020-05-10  0:41   ` [RFC PATCH v2 3/4] grep: honor sparse checkout patterns Matheus Tavares
2020-05-11 19:35     ` Junio C Hamano
2020-05-13  0:05       ` Matheus Tavares Bernardino [this message]
2020-05-13  0:17         ` Junio C Hamano
2020-05-21  7:26           ` Elijah Newren
2020-05-21 17:35             ` Matheus Tavares Bernardino
2020-05-21 17:52               ` Elijah Newren
2020-05-22  5:49                 ` Matheus Tavares Bernardino
2020-05-22 14:26                   ` Elijah Newren
2020-05-22 15:36                     ` Elijah Newren
2020-05-22 20:54                       ` Matheus Tavares Bernardino
2020-05-22 21:06                         ` Elijah Newren
2020-06-10 11:40                     ` Derrick Stolee
2020-06-10 16:22                       ` Matheus Tavares Bernardino
2020-06-10 17:42                         ` Derrick Stolee
2020-06-10 18:14                           ` Matheus Tavares Bernardino
2020-06-10 20:12                         ` Elijah Newren
2020-06-10 19:58                       ` Elijah Newren
2020-05-21  7:36       ` Elijah Newren
2020-05-10  0:41   ` [RFC PATCH v2 4/4] config: add setting to ignore sparsity patterns in some cmds Matheus Tavares
2020-05-10  4:23     ` Matheus Tavares Bernardino
2020-05-21 17:18       ` Elijah Newren
2020-05-21  7:09     ` Elijah Newren
2020-05-28  1:12   ` [PATCH v3 0/5] grep: honor sparse checkout and add option to ignore it Matheus Tavares
2020-05-28  1:12     ` [PATCH v3 1/5] doc: grep: unify info on configuration variables Matheus Tavares
2020-05-28  1:13     ` [PATCH v3 2/5] t/helper/test-config: return exit codes consistently Matheus Tavares
2020-05-30 14:29       ` Elijah Newren
2020-06-01  4:36         ` Matheus Tavares Bernardino
2020-05-28  1:13     ` [PATCH v3 3/5] config: correctly read worktree configs in submodules Matheus Tavares
2020-05-30 14:49       ` Elijah Newren
2020-06-01  4:38         ` Matheus Tavares Bernardino
2020-05-28  1:13     ` [PATCH v3 4/5] grep: honor sparse checkout patterns Matheus Tavares
2020-05-30 15:48       ` Elijah Newren
2020-06-01  4:44         ` Matheus Tavares Bernardino
2020-06-03  2:38           ` Elijah Newren
2020-06-10 17:08             ` Matheus Tavares Bernardino
2020-05-28  1:13     ` [PATCH v3 5/5] config: add setting to ignore sparsity patterns in some cmds Matheus Tavares
2020-05-30 16:18       ` Elijah Newren
2020-06-01  4:45         ` Matheus Tavares Bernardino
2020-06-03  2:39           ` Elijah Newren
2020-06-10 21:15             ` Matheus Tavares Bernardino
2020-06-11  0:35               ` Elijah Newren
2020-06-12 15:44     ` [PATCH v4 0/6] grep: honor sparse checkout and add option to ignore it Matheus Tavares
2020-06-12 15:44       ` [PATCH v4 1/6] doc: grep: unify info on configuration variables Matheus Tavares
2020-06-12 15:45       ` [PATCH v4 2/6] t/helper/test-config: return exit codes consistently Matheus Tavares
2020-06-12 15:45       ` [PATCH v4 3/6] t/helper/test-config: facilitate addition of new cli options Matheus Tavares
2020-06-12 15:45       ` [PATCH v4 4/6] config: correctly read worktree configs in submodules Matheus Tavares
2020-06-16 19:13         ` Elijah Newren
2020-06-21 16:05           ` Matheus Tavares Bernardino
2020-09-01  2:41         ` Jonathan Nieder
2020-09-01 21:44           ` Matheus Tavares Bernardino
2020-06-12 15:45       ` [PATCH v4 5/6] grep: honor sparse checkout patterns Matheus Tavares
2020-06-12 15:45       ` [PATCH v4 6/6] config: add setting to ignore sparsity patterns in some cmds Matheus Tavares
2020-06-16 22:31       ` [PATCH v4 0/6] grep: honor sparse checkout and add option to ignore it Elijah Newren
2020-09-02  6:17       ` [PATCH v5 0/8] " Matheus Tavares
2020-09-02  6:17         ` [PATCH v5 1/8] doc: grep: unify info on configuration variables Matheus Tavares
2020-09-02  6:17         ` [PATCH v5 2/8] t1308-config-set: avoid false positives when using test-config Matheus Tavares
2020-09-02  6:57           ` Eric Sunshine
2020-09-02 16:16             ` Matheus Tavares Bernardino
2020-09-02 16:38               ` Eric Sunshine
2020-09-02  6:17         ` [PATCH v5 3/8] t/helper/test-config: be consistent with exit codes Matheus Tavares
2020-09-02  6:17         ` [PATCH v5 4/8] t/helper/test-config: check argc before accessing argv Matheus Tavares
2020-09-02  7:18           ` Eric Sunshine
2020-09-02  6:17         ` [PATCH v5 5/8] t/helper/test-config: unify exit labels Matheus Tavares
2020-09-02  7:30           ` Eric Sunshine
2020-09-02  6:17         ` [PATCH v5 6/8] config: correctly read worktree configs in submodules Matheus Tavares
2020-09-02 20:15           ` Jonathan Nieder
2020-09-09 13:04             ` Matheus Tavares Bernardino
2020-09-09 23:32               ` Jonathan Nieder
2020-09-02  6:17         ` [PATCH v5 7/8] grep: honor sparse checkout patterns Matheus Tavares
2020-09-02  6:17         ` [PATCH v5 8/8] config: add setting to ignore sparsity patterns in some cmds Matheus Tavares
2020-09-10 17:21         ` [PATCH v6 0/9] grep: honor sparse checkout and add option to ignore it Matheus Tavares
2020-09-10 17:21           ` [PATCH v6 1/9] doc: grep: unify info on configuration variables Matheus Tavares
2020-09-10 17:21           ` [PATCH v6 2/9] t1308-config-set: avoid false positives when using test-config Matheus Tavares
2020-09-10 17:21           ` [PATCH v6 3/9] t/helper/test-config: be consistent with exit codes Matheus Tavares
2020-09-10 17:21           ` [PATCH v6 4/9] t/helper/test-config: diagnose missing arguments Matheus Tavares
2020-09-10 17:21           ` [PATCH v6 5/9] t/helper/test-config: unify exit labels Matheus Tavares
2020-09-10 17:21           ` [PATCH v6 6/9] config: make do_git_config_sequence receive a 'struct repository' Matheus Tavares
2020-09-10 17:21           ` [PATCH v6 7/9] config: correctly read worktree configs in submodules Matheus Tavares
2020-09-10 17:21           ` [PATCH v6 8/9] grep: honor sparse checkout patterns Matheus Tavares
2020-09-10 17:21           ` [PATCH v6 9/9] config: add setting to ignore sparsity patterns in some cmds Matheus Tavares
2021-02-09 21:33           ` [PATCH v7] grep: honor sparse-checkout on working tree searches Matheus Tavares
2021-02-09 23:23             ` Junio C Hamano
2021-02-10  6:12               ` Elijah Newren

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAHd-oW4k-9ZHQJcqEUqzxw0YShPWCaG61=v4wyBvXte2sSTV5g@mail.gmail.com' \
    --to=matheus.bernardino@usp.br \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jonathantanmy@google.com \
    --cc=newren@gmail.com \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).