git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Matheus Tavares <matheus.bernardino@usp.br>
To: git@vger.kernel.org
Cc: gitster@pobox.com, stolee@gmail.com, newren@gmail.com
Subject: [PATCH v7] grep: honor sparse-checkout on working tree searches
Date: Tue,  9 Feb 2021 18:33:30 -0300	[thread overview]
Message-ID: <5f3f7ac77039d41d1692ceae4b0c5df3bb45b74a.1612901326.git.matheus.bernardino@usp.br> (raw)
In-Reply-To: <cover.1599758167.git.matheus.bernardino@usp.br>

On a sparse checked out repository, `git grep` (without --cached) ends
up searching the cache when an entry matches the search pathspec and has
the SKIP_WORKTREE bit set. This is confusing both because the sparse
paths are not expected to be in a working tree search (as they are not
checked out), and because the output mixes working tree and cache
results without distinguishing them. (Note that grep also resorts to the
cache on working tree searches that include --assume-unchanged paths.
But the whole point in that case is to assume that the contents of the
index entry and the file are the same. This does not apply to the case
of sparse paths, where the file isn't even expected to be present.)

Fix that by teaching grep to honor the sparse-checkout rules for working
tree searches. If the user wants to grep paths outside the current
sparse-checkout definition, they may either update the sparsity rules to
materialize the files, or use --cached to search all blobs registered in
the index.

Note: it might also be interesting to add a configuration option that
allow users to search paths that are present despite having the
SKIP_WORKTREE bit set, and/or to restrict searches in the index and past
revisions too. These ideas are left as future improvements to avoid
conflicting with other sparse-checkout topics currently in flight.

Suggested-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---

This new version includes only the bug fix for the working tree grep, as
discussed in [1]. I think there are a couple other patches that could be
extracted from the previous v6 [2] and sent as standalone topics,
without the risk of conflicting with the sparse-index work. E.g. the
unification of the git-grep.txt and config/grep.txt doc files. I'll look
into that tomorrow.

[1]: https://lore.kernel.org/git/CABPp-BHwNoVnooqDFPAsZxBT9aR5Dwk5D9sDRCvYSb8akxAJgA@mail.gmail.com/
[2]: https://lore.kernel.org/git/cover.1599758167.git.matheus.bernardino@usp.br/

 builtin/grep.c                   |   7 +-
 t/t7011-skip-worktree-reading.sh |   9 --
 t/t7817-grep-sparse-checkout.sh  | 174 +++++++++++++++++++++++++++++++
 3 files changed, 179 insertions(+), 11 deletions(-)
 create mode 100755 t/t7817-grep-sparse-checkout.sh

diff --git a/builtin/grep.c b/builtin/grep.c
index ca259af441..bff214c755 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -508,6 +508,10 @@ static int grep_cache(struct grep_opt *opt,
 
 	for (nr = 0; nr < repo->index->cache_nr; nr++) {
 		const struct cache_entry *ce = repo->index->cache[nr];
+
+		if (!cached && ce_skip_worktree(ce))
+			continue;
+
 		strbuf_setlen(&name, name_base_len);
 		strbuf_addstr(&name, ce->name);
 
@@ -520,8 +524,7 @@ static int grep_cache(struct grep_opt *opt,
 			 * cache entry are identical, even if worktree file has
 			 * been modified, so use cache version instead
 			 */
-			if (cached || (ce->ce_flags & CE_VALID) ||
-			    ce_skip_worktree(ce)) {
+			if (cached || (ce->ce_flags & CE_VALID)) {
 				if (ce_stage(ce) || ce_intent_to_add(ce))
 					continue;
 				hit |= grep_oid(opt, &ce->oid, name.buf,
diff --git a/t/t7011-skip-worktree-reading.sh b/t/t7011-skip-worktree-reading.sh
index 37525cae3a..26852586ac 100755
--- a/t/t7011-skip-worktree-reading.sh
+++ b/t/t7011-skip-worktree-reading.sh
@@ -109,15 +109,6 @@ test_expect_success 'ls-files --modified' '
 	test -z "$(git ls-files -m)"
 '
 
-test_expect_success 'grep with skip-worktree file' '
-	git update-index --no-skip-worktree 1 &&
-	echo test > 1 &&
-	git update-index 1 &&
-	git update-index --skip-worktree 1 &&
-	rm 1 &&
-	test "$(git grep --no-ext-grep test)" = "1:test"
-'
-
 echo ":000000 100644 $ZERO_OID $EMPTY_BLOB A	1" > expected
 test_expect_success 'diff-index does not examine skip-worktree absent entries' '
 	setup_absent &&
diff --git a/t/t7817-grep-sparse-checkout.sh b/t/t7817-grep-sparse-checkout.sh
new file mode 100755
index 0000000000..590b99bbb6
--- /dev/null
+++ b/t/t7817-grep-sparse-checkout.sh
@@ -0,0 +1,174 @@
+#!/bin/sh
+
+test_description='grep in sparse checkout
+
+This test creates a repo with the following structure:
+
+.
+|-- a
+|-- b
+|-- dir
+|   `-- c
+|-- sub
+|   |-- A
+|   |   `-- a
+|   `-- B
+|       `-- b
+`-- sub2
+    `-- a
+
+Where the outer repository has non-cone mode sparsity patterns, sub is a
+submodule with cone mode sparsity patterns and sub2 is a submodule that is
+excluded by the superproject sparsity patterns. The resulting sparse checkout
+should leave the following structure in the working tree:
+
+.
+|-- a
+|-- sub
+|   `-- B
+|       `-- b
+`-- sub2
+    `-- a
+
+But note that sub2 should have the SKIP_WORKTREE bit set.
+'
+
+. ./test-lib.sh
+
+test_expect_success 'setup' '
+	echo "text" >a &&
+	echo "text" >b &&
+	mkdir dir &&
+	echo "text" >dir/c &&
+
+	git init sub &&
+	(
+		cd sub &&
+		mkdir A B &&
+		echo "text" >A/a &&
+		echo "text" >B/b &&
+		git add A B &&
+		git commit -m sub &&
+		git sparse-checkout init --cone &&
+		git sparse-checkout set B
+	) &&
+
+	git init sub2 &&
+	(
+		cd sub2 &&
+		echo "text" >a &&
+		git add a &&
+		git commit -m sub2
+	) &&
+
+	git submodule add ./sub &&
+	git submodule add ./sub2 &&
+	git add a b dir &&
+	git commit -m super &&
+	git sparse-checkout init --no-cone &&
+	git sparse-checkout set "/*" "!b" "!/*/" "sub" &&
+
+	git tag -am tag-to-commit tag-to-commit HEAD &&
+	tree=$(git rev-parse HEAD^{tree}) &&
+	git tag -am tag-to-tree tag-to-tree $tree &&
+
+	test_path_is_missing b &&
+	test_path_is_missing dir &&
+	test_path_is_missing sub/A &&
+	test_path_is_file a &&
+	test_path_is_file sub/B/b &&
+	test_path_is_file sub2/a &&
+	git branch -m main
+'
+
+# The test below covers a special case: the sparsity patterns exclude '/b' and
+# sparse checkout is enabled, but the path exists in the working tree (e.g.
+# manually created after `git sparse-checkout init`). git grep should skip it.
+test_expect_success 'working tree grep honors sparse checkout' '
+	cat >expect <<-EOF &&
+	a:text
+	EOF
+	test_when_finished "rm -f b" &&
+	echo "new-text" >b &&
+	git grep "text" >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'grep searches unmerged file despite not matching sparsity patterns' '
+	cat >expect <<-EOF &&
+	b:modified-b-in-branchX
+	b:modified-b-in-branchY
+	EOF
+	test_when_finished "test_might_fail git merge --abort && \
+			    git checkout main && git sparse-checkout init" &&
+
+	git sparse-checkout disable &&
+	git checkout -b branchY main &&
+	test_commit modified-b-in-branchY b &&
+	git checkout -b branchX main &&
+	test_commit modified-b-in-branchX b &&
+
+	git sparse-checkout init &&
+	test_path_is_missing b &&
+	test_must_fail git merge branchY &&
+	git grep "modified-b" >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'grep --cached searches entries with the SKIP_WORKTREE bit' '
+	cat >expect <<-EOF &&
+	a:text
+	b:text
+	dir/c:text
+	EOF
+	git grep --cached "text" >actual &&
+	test_cmp expect actual
+'
+
+# Note that sub2/ is present in the worktree but it is excluded by the sparsity
+# patterns, so grep should not recurse into it.
+test_expect_success 'grep --recurse-submodules honors sparse checkout in submodule' '
+	cat >expect <<-EOF &&
+	a:text
+	sub/B/b:text
+	EOF
+	git grep --recurse-submodules "text" >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'grep --recurse-submodules --cached searches entries with the SKIP_WORKTREE bit' '
+	cat >expect <<-EOF &&
+	a:text
+	b:text
+	dir/c:text
+	sub/A/a:text
+	sub/B/b:text
+	sub2/a:text
+	EOF
+	git grep --recurse-submodules --cached "text" >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'working tree grep does not search the index with CE_VALID and SKIP_WORKTREE' '
+	cat >expect <<-EOF &&
+	a:text
+	EOF
+	test_when_finished "git update-index --no-assume-unchanged b" &&
+	git update-index --assume-unchanged b &&
+	git grep text >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'grep --cached searches index entries with both CE_VALID and SKIP_WORKTREE' '
+	cat >expect <<-EOF &&
+	a:text
+	b:text
+	dir/c:text
+	EOF
+	test_when_finished "git update-index --no-assume-unchanged b" &&
+	git update-index --assume-unchanged b &&
+	git grep --cached text >actual &&
+	test_cmp expect actual
+'
+
+test_done
-- 
2.29.2


  parent reply	other threads:[~2021-02-09 23:03 UTC|newest]

Thread overview: 123+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-24  6:04 [RFC PATCH 0/3] grep: honor sparse checkout and add option to ignore it Matheus Tavares
2020-03-24  6:11 ` [RFC PATCH 1/3] doc: grep: unify info on configuration variables Matheus Tavares
2020-03-24  7:57   ` Elijah Newren
2020-03-24 21:26     ` Junio C Hamano
2020-03-24 23:38       ` Matheus Tavares
2020-03-24  6:12 ` [RFC PATCH 2/3] grep: honor sparse checkout patterns Matheus Tavares
2020-03-24  7:15   ` Elijah Newren
2020-03-24 15:12     ` Derrick Stolee
2020-03-24 16:16       ` Elijah Newren
2020-03-24 17:02         ` Derrick Stolee
2020-03-24 23:01       ` Matheus Tavares Bernardino
2020-03-24 22:55     ` Matheus Tavares Bernardino
2020-04-21  2:10       ` Matheus Tavares Bernardino
2020-04-21  3:08         ` Elijah Newren
2020-04-22 12:08           ` Derrick Stolee
2020-04-23  6:09           ` Matheus Tavares Bernardino
2020-03-24  6:13 ` [RFC PATCH 3/3] grep: add option to ignore sparsity patterns Matheus Tavares
2020-03-24  7:54   ` Elijah Newren
2020-03-24 18:30     ` Junio C Hamano
2020-03-24 19:07       ` Elijah Newren
2020-03-25 20:18         ` Junio C Hamano
2020-03-30  3:23       ` Matheus Tavares Bernardino
2020-03-31 19:12         ` Elijah Newren
2020-03-31 20:02           ` Derrick Stolee
2020-04-27 17:15             ` Matheus Tavares Bernardino
2020-04-29 16:46               ` Elijah Newren
2020-04-29 17:21             ` Elijah Newren
2020-03-25 23:15     ` Matheus Tavares Bernardino
2020-03-26  6:02       ` Elijah Newren
2020-03-27 15:51         ` Junio C Hamano
2020-03-27 19:01           ` Elijah Newren
2020-03-30  1:12         ` Matheus Tavares Bernardino
2020-03-31 16:48           ` Elijah Newren
2020-05-10  0:41 ` [RFC PATCH v2 0/4] grep: honor sparse checkout and add option to ignore it Matheus Tavares
2020-05-10  0:41   ` [RFC PATCH v2 1/4] doc: grep: unify info on configuration variables Matheus Tavares
2020-05-10  0:41   ` [RFC PATCH v2 2/4] config: load the correct config.worktree file Matheus Tavares
2020-05-11 19:10     ` Junio C Hamano
2020-05-12 22:55       ` Matheus Tavares Bernardino
2020-05-12 23:22         ` Junio C Hamano
2020-05-10  0:41   ` [RFC PATCH v2 3/4] grep: honor sparse checkout patterns Matheus Tavares
2020-05-11 19:35     ` Junio C Hamano
2020-05-13  0:05       ` Matheus Tavares Bernardino
2020-05-13  0:17         ` Junio C Hamano
2020-05-21  7:26           ` Elijah Newren
2020-05-21 17:35             ` Matheus Tavares Bernardino
2020-05-21 17:52               ` Elijah Newren
2020-05-22  5:49                 ` Matheus Tavares Bernardino
2020-05-22 14:26                   ` Elijah Newren
2020-05-22 15:36                     ` Elijah Newren
2020-05-22 20:54                       ` Matheus Tavares Bernardino
2020-05-22 21:06                         ` Elijah Newren
2020-06-10 11:40                     ` Derrick Stolee
2020-06-10 16:22                       ` Matheus Tavares Bernardino
2020-06-10 17:42                         ` Derrick Stolee
2020-06-10 18:14                           ` Matheus Tavares Bernardino
2020-06-10 20:12                         ` Elijah Newren
2020-06-10 19:58                       ` Elijah Newren
2020-05-21  7:36       ` Elijah Newren
2020-05-10  0:41   ` [RFC PATCH v2 4/4] config: add setting to ignore sparsity patterns in some cmds Matheus Tavares
2020-05-10  4:23     ` Matheus Tavares Bernardino
2020-05-21 17:18       ` Elijah Newren
2020-05-21  7:09     ` Elijah Newren
2020-05-28  1:12   ` [PATCH v3 0/5] grep: honor sparse checkout and add option to ignore it Matheus Tavares
2020-05-28  1:12     ` [PATCH v3 1/5] doc: grep: unify info on configuration variables Matheus Tavares
2020-05-28  1:13     ` [PATCH v3 2/5] t/helper/test-config: return exit codes consistently Matheus Tavares
2020-05-30 14:29       ` Elijah Newren
2020-06-01  4:36         ` Matheus Tavares Bernardino
2020-05-28  1:13     ` [PATCH v3 3/5] config: correctly read worktree configs in submodules Matheus Tavares
2020-05-30 14:49       ` Elijah Newren
2020-06-01  4:38         ` Matheus Tavares Bernardino
2020-05-28  1:13     ` [PATCH v3 4/5] grep: honor sparse checkout patterns Matheus Tavares
2020-05-30 15:48       ` Elijah Newren
2020-06-01  4:44         ` Matheus Tavares Bernardino
2020-06-03  2:38           ` Elijah Newren
2020-06-10 17:08             ` Matheus Tavares Bernardino
2020-05-28  1:13     ` [PATCH v3 5/5] config: add setting to ignore sparsity patterns in some cmds Matheus Tavares
2020-05-30 16:18       ` Elijah Newren
2020-06-01  4:45         ` Matheus Tavares Bernardino
2020-06-03  2:39           ` Elijah Newren
2020-06-10 21:15             ` Matheus Tavares Bernardino
2020-06-11  0:35               ` Elijah Newren
2020-06-12 15:44     ` [PATCH v4 0/6] grep: honor sparse checkout and add option to ignore it Matheus Tavares
2020-06-12 15:44       ` [PATCH v4 1/6] doc: grep: unify info on configuration variables Matheus Tavares
2020-06-12 15:45       ` [PATCH v4 2/6] t/helper/test-config: return exit codes consistently Matheus Tavares
2020-06-12 15:45       ` [PATCH v4 3/6] t/helper/test-config: facilitate addition of new cli options Matheus Tavares
2020-06-12 15:45       ` [PATCH v4 4/6] config: correctly read worktree configs in submodules Matheus Tavares
2020-06-16 19:13         ` Elijah Newren
2020-06-21 16:05           ` Matheus Tavares Bernardino
2020-09-01  2:41         ` Jonathan Nieder
2020-09-01 21:44           ` Matheus Tavares Bernardino
2020-06-12 15:45       ` [PATCH v4 5/6] grep: honor sparse checkout patterns Matheus Tavares
2020-06-12 15:45       ` [PATCH v4 6/6] config: add setting to ignore sparsity patterns in some cmds Matheus Tavares
2020-06-16 22:31       ` [PATCH v4 0/6] grep: honor sparse checkout and add option to ignore it Elijah Newren
2020-09-02  6:17       ` [PATCH v5 0/8] " Matheus Tavares
2020-09-02  6:17         ` [PATCH v5 1/8] doc: grep: unify info on configuration variables Matheus Tavares
2020-09-02  6:17         ` [PATCH v5 2/8] t1308-config-set: avoid false positives when using test-config Matheus Tavares
2020-09-02  6:57           ` Eric Sunshine
2020-09-02 16:16             ` Matheus Tavares Bernardino
2020-09-02 16:38               ` Eric Sunshine
2020-09-02  6:17         ` [PATCH v5 3/8] t/helper/test-config: be consistent with exit codes Matheus Tavares
2020-09-02  6:17         ` [PATCH v5 4/8] t/helper/test-config: check argc before accessing argv Matheus Tavares
2020-09-02  7:18           ` Eric Sunshine
2020-09-02  6:17         ` [PATCH v5 5/8] t/helper/test-config: unify exit labels Matheus Tavares
2020-09-02  7:30           ` Eric Sunshine
2020-09-02  6:17         ` [PATCH v5 6/8] config: correctly read worktree configs in submodules Matheus Tavares
2020-09-02 20:15           ` Jonathan Nieder
2020-09-09 13:04             ` Matheus Tavares Bernardino
2020-09-09 23:32               ` Jonathan Nieder
2020-09-02  6:17         ` [PATCH v5 7/8] grep: honor sparse checkout patterns Matheus Tavares
2020-09-02  6:17         ` [PATCH v5 8/8] config: add setting to ignore sparsity patterns in some cmds Matheus Tavares
2020-09-10 17:21         ` [PATCH v6 0/9] grep: honor sparse checkout and add option to ignore it Matheus Tavares
2020-09-10 17:21           ` [PATCH v6 1/9] doc: grep: unify info on configuration variables Matheus Tavares
2020-09-10 17:21           ` [PATCH v6 2/9] t1308-config-set: avoid false positives when using test-config Matheus Tavares
2020-09-10 17:21           ` [PATCH v6 3/9] t/helper/test-config: be consistent with exit codes Matheus Tavares
2020-09-10 17:21           ` [PATCH v6 4/9] t/helper/test-config: diagnose missing arguments Matheus Tavares
2020-09-10 17:21           ` [PATCH v6 5/9] t/helper/test-config: unify exit labels Matheus Tavares
2020-09-10 17:21           ` [PATCH v6 6/9] config: make do_git_config_sequence receive a 'struct repository' Matheus Tavares
2020-09-10 17:21           ` [PATCH v6 7/9] config: correctly read worktree configs in submodules Matheus Tavares
2020-09-10 17:21           ` [PATCH v6 8/9] grep: honor sparse checkout patterns Matheus Tavares
2020-09-10 17:21           ` [PATCH v6 9/9] config: add setting to ignore sparsity patterns in some cmds Matheus Tavares
2021-02-09 21:33           ` Matheus Tavares [this message]
2021-02-09 23:23             ` [PATCH v7] grep: honor sparse-checkout on working tree searches Junio C Hamano
2021-02-10  6:12               ` Elijah Newren

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5f3f7ac77039d41d1692ceae4b0c5df3bb45b74a.1612901326.git.matheus.bernardino@usp.br \
    --to=matheus.bernardino@usp.br \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=newren@gmail.com \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).