git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Elijah Newren <newren@gmail.com>
To: Derrick Stolee <stolee@gmail.com>
Cc: Matheus Tavares <matheus.bernardino@usp.br>,
	Git Mailing List <git@vger.kernel.org>,
	Derrick Stolee <dstolee@microsoft.com>,
	"brian m. carlson" <sandals@crustytoothpaste.net>,
	Stefan Beller <stefanbeller@gmail.com>
Subject: Re: [RFC PATCH 2/3] grep: honor sparse checkout patterns
Date: Tue, 24 Mar 2020 09:16:38 -0700	[thread overview]
Message-ID: <CABPp-BFf-A_a=_02gP_=kBFmQmpM8udFZyrq17Co_u+QWURFKw@mail.gmail.com> (raw)
In-Reply-To: <59c04216-8dd9-cbbf-a869-a65ed8ca6e0a@gmail.com>

On Tue, Mar 24, 2020 at 8:12 AM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 3/24/2020 3:15 AM, Elijah Newren wrote:
> > Hi Matheus,
> >
> > On Mon, Mar 23, 2020 at 11:12 PM Matheus Tavares
...
> >> Something I'm not entirely sure in this patch is how we implement the
> >> mechanism to honor sparsity for the `git grep <commit-ish>` case (which
> >> is treated in the grep_tree() function). Currently, the patch looks for
> >> an index entry that matches the path, and then checks its skip_worktree
> >
> > As you discuss below, checking the index is both wrong _and_ costly.
>
> I'm not sure why checking the index is _wrong_, but I agree about the
> performance cost.

Let's say there are two directories, dir1 and dir2.  Over time, there
have existed a total of six files:
   dir1/{a,b,c}
   dir2/{d,e,f}
At the current time, there are only four files in the index:
   dir1/{a,b}
   dir2/{d,e}
And the user has done a `git sparse-checkout set dir2` and then at
some point later run `git grep OTHERCOMMIT foobar`.  What happens?

Well, since we're in a sparse checkout, we should only search the
relevant paths within OTHERCOMMIT for "foobar".  Let's say we attempt
to figure out the "relevant paths" using the index.  We can tell that
dir1/a and dir2/a are marked as SKIP_WORKTREE so we don't search them.
dir1/c is untracked -- what do we do with it?  Include it?  Exclude
it?  Carrying on with the other files, dir2/d and dir2/e are tracked
and !SKIP_WORKTREE so we search them.  dir2/f is untracked -- what do
we do with it?  Include it?  Exclude it?

We're left without the necessary information to tell whether we should
search OTHERCOMMIT's dir1/c and dir2/f if we consult the index.  Any
decision we make is going to be wrong for one of the two paths.

If we instead do not attempt to consult the index (which corresponds
to a version close to HEAD) in order to ask questions about the
completely different OTHERCOMMIT, but instead use the sparsity
patterns to query whether those files/directories are interesting,
then we get the right answer.  The index can only be consulted for the
right answer in the case of --cached; in all other cases (including
OTHERCOMMIT == HEAD), we should use the sparsity patterns.  In fact,
we could also use the sparsity patterns in the case of --cached, it's
just that for that one particular case consulting the index will also
give the right answer.

  reply	other threads:[~2020-03-24 16:16 UTC|newest]

Thread overview: 123+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-24  6:04 [RFC PATCH 0/3] grep: honor sparse checkout and add option to ignore it Matheus Tavares
2020-03-24  6:11 ` [RFC PATCH 1/3] doc: grep: unify info on configuration variables Matheus Tavares
2020-03-24  7:57   ` Elijah Newren
2020-03-24 21:26     ` Junio C Hamano
2020-03-24 23:38       ` Matheus Tavares
2020-03-24  6:12 ` [RFC PATCH 2/3] grep: honor sparse checkout patterns Matheus Tavares
2020-03-24  7:15   ` Elijah Newren
2020-03-24 15:12     ` Derrick Stolee
2020-03-24 16:16       ` Elijah Newren [this message]
2020-03-24 17:02         ` Derrick Stolee
2020-03-24 23:01       ` Matheus Tavares Bernardino
2020-03-24 22:55     ` Matheus Tavares Bernardino
2020-04-21  2:10       ` Matheus Tavares Bernardino
2020-04-21  3:08         ` Elijah Newren
2020-04-22 12:08           ` Derrick Stolee
2020-04-23  6:09           ` Matheus Tavares Bernardino
2020-03-24  6:13 ` [RFC PATCH 3/3] grep: add option to ignore sparsity patterns Matheus Tavares
2020-03-24  7:54   ` Elijah Newren
2020-03-24 18:30     ` Junio C Hamano
2020-03-24 19:07       ` Elijah Newren
2020-03-25 20:18         ` Junio C Hamano
2020-03-30  3:23       ` Matheus Tavares Bernardino
2020-03-31 19:12         ` Elijah Newren
2020-03-31 20:02           ` Derrick Stolee
2020-04-27 17:15             ` Matheus Tavares Bernardino
2020-04-29 16:46               ` Elijah Newren
2020-04-29 17:21             ` Elijah Newren
2020-03-25 23:15     ` Matheus Tavares Bernardino
2020-03-26  6:02       ` Elijah Newren
2020-03-27 15:51         ` Junio C Hamano
2020-03-27 19:01           ` Elijah Newren
2020-03-30  1:12         ` Matheus Tavares Bernardino
2020-03-31 16:48           ` Elijah Newren
2020-05-10  0:41 ` [RFC PATCH v2 0/4] grep: honor sparse checkout and add option to ignore it Matheus Tavares
2020-05-10  0:41   ` [RFC PATCH v2 1/4] doc: grep: unify info on configuration variables Matheus Tavares
2020-05-10  0:41   ` [RFC PATCH v2 2/4] config: load the correct config.worktree file Matheus Tavares
2020-05-11 19:10     ` Junio C Hamano
2020-05-12 22:55       ` Matheus Tavares Bernardino
2020-05-12 23:22         ` Junio C Hamano
2020-05-10  0:41   ` [RFC PATCH v2 3/4] grep: honor sparse checkout patterns Matheus Tavares
2020-05-11 19:35     ` Junio C Hamano
2020-05-13  0:05       ` Matheus Tavares Bernardino
2020-05-13  0:17         ` Junio C Hamano
2020-05-21  7:26           ` Elijah Newren
2020-05-21 17:35             ` Matheus Tavares Bernardino
2020-05-21 17:52               ` Elijah Newren
2020-05-22  5:49                 ` Matheus Tavares Bernardino
2020-05-22 14:26                   ` Elijah Newren
2020-05-22 15:36                     ` Elijah Newren
2020-05-22 20:54                       ` Matheus Tavares Bernardino
2020-05-22 21:06                         ` Elijah Newren
2020-06-10 11:40                     ` Derrick Stolee
2020-06-10 16:22                       ` Matheus Tavares Bernardino
2020-06-10 17:42                         ` Derrick Stolee
2020-06-10 18:14                           ` Matheus Tavares Bernardino
2020-06-10 20:12                         ` Elijah Newren
2020-06-10 19:58                       ` Elijah Newren
2020-05-21  7:36       ` Elijah Newren
2020-05-10  0:41   ` [RFC PATCH v2 4/4] config: add setting to ignore sparsity patterns in some cmds Matheus Tavares
2020-05-10  4:23     ` Matheus Tavares Bernardino
2020-05-21 17:18       ` Elijah Newren
2020-05-21  7:09     ` Elijah Newren
2020-05-28  1:12   ` [PATCH v3 0/5] grep: honor sparse checkout and add option to ignore it Matheus Tavares
2020-05-28  1:12     ` [PATCH v3 1/5] doc: grep: unify info on configuration variables Matheus Tavares
2020-05-28  1:13     ` [PATCH v3 2/5] t/helper/test-config: return exit codes consistently Matheus Tavares
2020-05-30 14:29       ` Elijah Newren
2020-06-01  4:36         ` Matheus Tavares Bernardino
2020-05-28  1:13     ` [PATCH v3 3/5] config: correctly read worktree configs in submodules Matheus Tavares
2020-05-30 14:49       ` Elijah Newren
2020-06-01  4:38         ` Matheus Tavares Bernardino
2020-05-28  1:13     ` [PATCH v3 4/5] grep: honor sparse checkout patterns Matheus Tavares
2020-05-30 15:48       ` Elijah Newren
2020-06-01  4:44         ` Matheus Tavares Bernardino
2020-06-03  2:38           ` Elijah Newren
2020-06-10 17:08             ` Matheus Tavares Bernardino
2020-05-28  1:13     ` [PATCH v3 5/5] config: add setting to ignore sparsity patterns in some cmds Matheus Tavares
2020-05-30 16:18       ` Elijah Newren
2020-06-01  4:45         ` Matheus Tavares Bernardino
2020-06-03  2:39           ` Elijah Newren
2020-06-10 21:15             ` Matheus Tavares Bernardino
2020-06-11  0:35               ` Elijah Newren
2020-06-12 15:44     ` [PATCH v4 0/6] grep: honor sparse checkout and add option to ignore it Matheus Tavares
2020-06-12 15:44       ` [PATCH v4 1/6] doc: grep: unify info on configuration variables Matheus Tavares
2020-06-12 15:45       ` [PATCH v4 2/6] t/helper/test-config: return exit codes consistently Matheus Tavares
2020-06-12 15:45       ` [PATCH v4 3/6] t/helper/test-config: facilitate addition of new cli options Matheus Tavares
2020-06-12 15:45       ` [PATCH v4 4/6] config: correctly read worktree configs in submodules Matheus Tavares
2020-06-16 19:13         ` Elijah Newren
2020-06-21 16:05           ` Matheus Tavares Bernardino
2020-09-01  2:41         ` Jonathan Nieder
2020-09-01 21:44           ` Matheus Tavares Bernardino
2020-06-12 15:45       ` [PATCH v4 5/6] grep: honor sparse checkout patterns Matheus Tavares
2020-06-12 15:45       ` [PATCH v4 6/6] config: add setting to ignore sparsity patterns in some cmds Matheus Tavares
2020-06-16 22:31       ` [PATCH v4 0/6] grep: honor sparse checkout and add option to ignore it Elijah Newren
2020-09-02  6:17       ` [PATCH v5 0/8] " Matheus Tavares
2020-09-02  6:17         ` [PATCH v5 1/8] doc: grep: unify info on configuration variables Matheus Tavares
2020-09-02  6:17         ` [PATCH v5 2/8] t1308-config-set: avoid false positives when using test-config Matheus Tavares
2020-09-02  6:57           ` Eric Sunshine
2020-09-02 16:16             ` Matheus Tavares Bernardino
2020-09-02 16:38               ` Eric Sunshine
2020-09-02  6:17         ` [PATCH v5 3/8] t/helper/test-config: be consistent with exit codes Matheus Tavares
2020-09-02  6:17         ` [PATCH v5 4/8] t/helper/test-config: check argc before accessing argv Matheus Tavares
2020-09-02  7:18           ` Eric Sunshine
2020-09-02  6:17         ` [PATCH v5 5/8] t/helper/test-config: unify exit labels Matheus Tavares
2020-09-02  7:30           ` Eric Sunshine
2020-09-02  6:17         ` [PATCH v5 6/8] config: correctly read worktree configs in submodules Matheus Tavares
2020-09-02 20:15           ` Jonathan Nieder
2020-09-09 13:04             ` Matheus Tavares Bernardino
2020-09-09 23:32               ` Jonathan Nieder
2020-09-02  6:17         ` [PATCH v5 7/8] grep: honor sparse checkout patterns Matheus Tavares
2020-09-02  6:17         ` [PATCH v5 8/8] config: add setting to ignore sparsity patterns in some cmds Matheus Tavares
2020-09-10 17:21         ` [PATCH v6 0/9] grep: honor sparse checkout and add option to ignore it Matheus Tavares
2020-09-10 17:21           ` [PATCH v6 1/9] doc: grep: unify info on configuration variables Matheus Tavares
2020-09-10 17:21           ` [PATCH v6 2/9] t1308-config-set: avoid false positives when using test-config Matheus Tavares
2020-09-10 17:21           ` [PATCH v6 3/9] t/helper/test-config: be consistent with exit codes Matheus Tavares
2020-09-10 17:21           ` [PATCH v6 4/9] t/helper/test-config: diagnose missing arguments Matheus Tavares
2020-09-10 17:21           ` [PATCH v6 5/9] t/helper/test-config: unify exit labels Matheus Tavares
2020-09-10 17:21           ` [PATCH v6 6/9] config: make do_git_config_sequence receive a 'struct repository' Matheus Tavares
2020-09-10 17:21           ` [PATCH v6 7/9] config: correctly read worktree configs in submodules Matheus Tavares
2020-09-10 17:21           ` [PATCH v6 8/9] grep: honor sparse checkout patterns Matheus Tavares
2020-09-10 17:21           ` [PATCH v6 9/9] config: add setting to ignore sparsity patterns in some cmds Matheus Tavares
2021-02-09 21:33           ` [PATCH v7] grep: honor sparse-checkout on working tree searches Matheus Tavares
2021-02-09 23:23             ` Junio C Hamano
2021-02-10  6:12               ` Elijah Newren

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CABPp-BFf-A_a=_02gP_=kBFmQmpM8udFZyrq17Co_u+QWURFKw@mail.gmail.com' \
    --to=newren@gmail.com \
    --cc=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=matheus.bernardino@usp.br \
    --cc=sandals@crustytoothpaste.net \
    --cc=stefanbeller@gmail.com \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).