git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Emily Shaffer <emilyshaffer@google.com>
To: git@vger.kernel.org
Cc: Emily Shaffer <emilyshaffer@google.com>
Subject: [PATCH] grep: provide pathspecs/patterns via file or stdin
Date: Thu, 21 Nov 2019 17:16:46 -0800	[thread overview]
Message-ID: <20191122011646.218346-1-emilyshaffer@google.com> (raw)

Teach 'git grep' how to receive pathspecs via file, and add convenience
flags for receiving pathspecs or patterns via stdin.

In some cases, the list of pathspecs to 'git grep' is longer than the
command line allows, or long enough that a user becomes fatigued reading
the command.  In other cases, a user may wish to write a script or
oneliner which generates a list of files to 'git grep' against. In both
scenarios, a reasonable workaround is to provide the list of pathspecs
via a file or stdin.

Before this change, it was already possible to receive patterns via
stdin by using the argument '-f -'. Now that there are two arguments
which could both use stdin, help safeguard the user from doing so by
providing convenience '--stdin-patterns' and '--stdin-pathspecs' which
are mutually exclusive.  (It is still possible to ask for '-f -
--pathspecs-file -', though.)

Finally, in order to match '--pathspecs-file', give '-f' a long form
'--patterns-file' too.

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
---
Providing too many pathspecs for the argument list to handle is causing
some gore out in the wild with git-secrets:
https://github.com/awslabs/git-secrets/issues/78

A short arg for --pathspecs-file didn't occur to me because grep has
many short args already, but if anyone has a suggestion I'll happily add
it.

The pathspec list is gathered into an argv_array because parse_pathspec
is only called once, against the entire list of pathspecs. Perhaps it
makes sense to concatenate the paths given on the command line with the
paths given via file, but since it was nontrivial to merge two
argv_arrays and I wasn't certain it was the right choice, I didn't do
so. I'd probably implement that by adding a utility function to
argv-array. ('-f somepatterns.txt -e otherpattern' concatenates the
patterns, by the way.)

 - Emily

 Documentation/git-grep.txt | 18 ++++++++++-
 builtin/grep.c             | 61 +++++++++++++++++++++++++++++++++-----
 t/t7810-grep.sh            | 52 ++++++++++++++++++++++++++++++--
 3 files changed, 121 insertions(+), 10 deletions(-)

diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
index c89fb569e3..fb588a3846 100644
--- a/Documentation/git-grep.txt
+++ b/Documentation/git-grep.txt
@@ -24,7 +24,8 @@ SYNOPSIS
 	   [-A <post-context>] [-B <pre-context>] [-C <context>]
 	   [-W | --function-context]
 	   [--threads <num>]
-	   [-f <file>] [-e] <pattern>
+	   [--pathspecs-file <file>] [--stdin-pathspecs]
+	   [-f | --patterns-file <file>] [--stdin-patterns] [-e] <pattern>
 	   [--and|--or|--not|(|)|-e <pattern>...]
 	   [--recurse-submodules] [--parent-basename <basename>]
 	   [ [--[no-]exclude-standard] [--cached | --no-index | --untracked] | <tree>...]
@@ -270,7 +271,22 @@ providing this option will cause it to die.
 	See `grep.threads` in 'CONFIGURATION' for more information.
 
 -f <file>::
+--patterns-file <file>::
 	Read patterns from <file>, one per line.
+
+--stdin-patterns::
+	Read patterns from stdin, one per line. Equivalent to `-f -`. Cannot be
+	combined with `--stdin-pathspecs`.
+
+--pathspecs-file <file>::
+	Read pathspecs from <file>, one per line. Pathspecs passed in the
+	argument list will be ignored in this mode.
+
+--stdin-pathspecs::
+	Read pathspecs from stdin, one per line. Pathspecs passed in the
+	argument list will be ignored in this mode. Equivalent to
+	`--pathspecs-file -`.
+
 +
 Passing the pattern via <file> allows for providing a search pattern
 containing a \0.
diff --git a/builtin/grep.c b/builtin/grep.c
index 50ce8d9461..f6f9f84002 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -720,16 +720,38 @@ static int context_callback(const struct option *opt, const char *arg,
 	return 0;
 }
 
-static int file_callback(const struct option *opt, const char *arg, int unset)
+static int pathspec_file(struct argv_array *pathspec_argv, const char *arg)
+{
+	struct strbuf sb = STRBUF_INIT;
+
+	if (!strcmp(arg, "-")) {
+		while (strbuf_getline(&sb, stdin) != EOF)
+			argv_array_push(pathspec_argv, sb.buf);
+	}
+	else {
+		if (!strbuf_read_file(&sb, arg, 0))
+			die_errno(_("cannot open '%s'"), arg);
+		argv_array_split(pathspec_argv, sb.buf);
+	}
+
+	strbuf_release(&sb);
+	return 0;
+}
+
+static int pathspec_file_callback(const struct option *opt,
+				  const char *arg, int unset)
+{
+	BUG_ON_OPT_NEG(unset);
+	return pathspec_file(opt->value, arg);
+}
+
+static int pattern_file(struct grep_opt *grep_opt, const char *arg)
 {
-	struct grep_opt *grep_opt = opt->value;
 	int from_stdin;
 	FILE *patterns;
 	int lno = 0;
 	struct strbuf sb = STRBUF_INIT;
 
-	BUG_ON_OPT_NEG(unset);
-
 	from_stdin = !strcmp(arg, "-");
 	patterns = from_stdin ? stdin : fopen(arg, "r");
 	if (!patterns)
@@ -748,6 +770,12 @@ static int file_callback(const struct option *opt, const char *arg, int unset)
 	return 0;
 }
 
+static int pattern_file_callback(const struct option *opt, const char *arg, int unset)
+{
+	BUG_ON_OPT_NEG(unset);
+	return pattern_file(opt->value, arg);
+}
+
 static int not_callback(const struct option *opt, const char *arg, int unset)
 {
 	struct grep_opt *grep_opt = opt->value;
@@ -803,12 +831,15 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 	struct grep_opt opt;
 	struct object_array list = OBJECT_ARRAY_INIT;
 	struct pathspec pathspec;
+	struct argv_array pathspec_argv;
 	struct string_list path_list = STRING_LIST_INIT_NODUP;
 	int i;
 	int dummy;
 	int use_index = 1;
 	int pattern_type_arg = GREP_PATTERN_TYPE_UNSPECIFIED;
 	int allow_revs;
+	int stdin_patterns = 0;
+	int stdin_pathspecs = 0;
 
 	struct option options[] = {
 		OPT_BOOL(0, "cached", &cached,
@@ -896,8 +927,14 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 		OPT_BOOL('W', "function-context", &opt.funcbody,
 			N_("show the surrounding function")),
 		OPT_GROUP(""),
-		OPT_CALLBACK('f', NULL, &opt, N_("file"),
-			N_("read patterns from file"), file_callback),
+		OPT_CALLBACK('f', "patterns-file", &opt, N_("file"),
+			N_("read patterns from file"), pattern_file_callback),
+		OPT_BOOL(0, "stdin-patterns", &stdin_patterns,
+			 N_("read patterns from stdin")),
+		OPT_CALLBACK(0, "pathspecs-file", &pathspec_argv, N_("file"),
+			N_("read pathspecs from file"), pathspec_file_callback),
+		OPT_BOOL(0, "stdin-pathspecs", &stdin_pathspecs,
+			 N_("read pathspecs from stdin")),
 		{ OPTION_CALLBACK, 'e', NULL, &opt, N_("pattern"),
 			N_("match <pattern>"), PARSE_OPT_NONEG, pattern_callback },
 		{ OPTION_CALLBACK, 0, "and", &opt, NULL,
@@ -949,6 +986,15 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 			     PARSE_OPT_STOP_AT_NON_OPTION);
 	grep_commit_pattern_type(pattern_type_arg, &opt);
 
+	if (stdin_patterns && stdin_pathspecs)
+		die(_("cannot use stdin for both patterns and pathspecs"));
+
+	if (stdin_patterns)
+		pattern_file(&opt, "-");
+
+	if (stdin_pathspecs)
+		pathspec_file(&pathspec_argv, "-");
+
 	if (use_index && !startup_info->have_repository) {
 		int fallback = 0;
 		git_config_get_bool("grep.fallbacktonoindex", &fallback);
@@ -1057,7 +1103,8 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 	parse_pathspec(&pathspec, 0,
 		       PATHSPEC_PREFER_CWD |
 		       (opt.max_depth != -1 ? PATHSPEC_MAXDEPTH_VALID : 0),
-		       prefix, argv + i);
+		       prefix,
+		       (pathspec_argv.argc ? pathspec_argv.argv : argv) + i);
 	pathspec.max_depth = opt.max_depth;
 	pathspec.recursive = 1;
 	pathspec.recurse_submodules = !!recurse_submodules;
diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh
index 7d7b396c23..be8e60cfa0 100755
--- a/t/t7810-grep.sh
+++ b/t/t7810-grep.sh
@@ -549,6 +549,10 @@ test_expect_success 'grep -f, non-existent file' '
 	test_must_fail git grep -f patterns
 '
 
+text_expect_success 'grep --pathspec-file, non-existent file' '
+	test_must_fail git grep --pathspecs-file pathspecs
+'
+
 cat >expected <<EOF
 file:foo mmap bar
 file:foo_mmap bar
@@ -582,8 +586,8 @@ mmap
 vvv
 EOF
 
-test_expect_success 'grep -f, multiple patterns' '
-	git grep -f patterns >actual &&
+test_expect_success 'grep --patterns-file, multiple patterns' '
+	git grep --patterns-file patterns >actual &&
 	test_cmp expected actual
 '
 
@@ -621,6 +625,11 @@ test_expect_success 'grep -f, ignore empty lines, read patterns from stdin' '
 	test_cmp expected actual
 '
 
+test_expect_success 'grep --stdin--patterns' '
+	git grep --stdin-patterns <patterns >actual &&
+	test_cmp expected actual
+'
+
 cat >expected <<EOF
 y:y yy
 --
@@ -1125,6 +1134,45 @@ test_expect_success 'grep --no-index descends into repos, but not .git' '
 	)
 '
 
+test_expect_success 'setup pathspecs-file tests' '
+cat >excluded-file <<EOF &&
+bar
+EOF
+cat >pathspec-file <<EOF &&
+foo
+bar
+baz
+EOF
+cat >unrelated-file <<EOF &&
+xyz
+EOF
+git add excluded-file pathspec-file unrelated-file
+'
+
+cat >pathspecs <<EOF
+pathspec-file
+unrelated-file
+EOF
+
+cat >expected <<EOF
+pathspec-file:bar
+EOF
+
+test_expect_success 'grep --stdin-pathspecs' '
+	git grep --stdin-pathspecs "bar" <pathspecs >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'grep --pathspecs-file with file' '
+	git grep --pathspecs-file pathspecs "bar" >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'grep --pathspec-file with stdin' '
+	git grep --pathspecs-file - "bar" <pathspecs >actual &&
+	test_cmp expected actual
+'
+
 test_expect_success 'setup double-dash tests' '
 cat >double-dash <<EOF &&
 --
-- 
2.24.0.432.g9d3f5f5b63-goog


             reply	other threads:[~2019-11-22  1:17 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-11-22  1:16 Emily Shaffer [this message]
2019-11-22  2:14 ` [PATCH] grep: provide pathspecs/patterns via file or stdin Denton Liu
2019-11-22  2:34   ` Junio C Hamano
2019-11-22  3:56     ` Junio C Hamano
2019-11-22 18:52     ` Denton Liu
2019-11-22 22:02     ` Emily Shaffer
2019-11-22 22:06       ` Emily Shaffer
2019-11-23  0:28       ` Junio C Hamano
2019-11-22  2:24 ` Junio C Hamano
2019-12-04 20:39 ` [PATCH v2] grep: support the --pathspec-from-file option Emily Shaffer
2019-12-04 21:05   ` Denton Liu
2019-12-04 21:24     ` Junio C Hamano
2019-12-04 22:24   ` Junio C Hamano
2019-12-13  3:07     ` Emily Shaffer
2019-12-05 11:58   ` Alexandr Miloslavskiy
2019-12-13  4:00     ` Emily Shaffer
2019-12-06 11:22   ` Johannes Schindelin
2019-12-06 11:34   ` SZEDER Gábor
2019-12-13  4:12   ` [PATCH v3] " Emily Shaffer
2019-12-13 13:04     ` Alexandr Miloslavskiy
2019-12-13 18:26     ` Junio C Hamano
2019-12-13 20:13       ` Alexandr Miloslavskiy
2019-12-17  0:33         ` Emily Shaffer

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191122011646.218346-1-emilyshaffer@google.com \
    --to=emilyshaffer@google.com \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).