git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Emily Shaffer <emilyshaffer@google.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org,
	Alexandr Miloslavskiy <alexandr.miloslavskiy@syntevo.com>,
	Denton Liu <liu.denton@gmail.com>
Subject: Re: [PATCH v2] grep: support the --pathspec-from-file option
Date: Thu, 12 Dec 2019 19:07:10 -0800	[thread overview]
Message-ID: <20191213030710.GA135450@google.com> (raw)
In-Reply-To: <xmqq5ziv3f20.fsf@gitster-ct.c.googlers.com>

On Wed, Dec 04, 2019 at 02:24:07PM -0800, Junio C Hamano wrote:
> Emily Shaffer <emilyshaffer@google.com> writes:
> 
> > Teach 'git grep' to use OPT_PATHSPEC_FROM_FILE and update the
> > documentation accordingly.
> >
> > This changes enables 'git grep' to receive the pathspec from a file by
> > specifying the path, or from stdin by specifying '-' as the path. This
> > matches the previous functionality of '-f', so the documentation of '-f'
> > has been expanded to describe this functionality. To let '-f' match the
> > new '--pathspec-from-file' option, also teach a '--patterns-from-file'
> > long name to '-f'.
> >
> > Since there are now two arguments which can attempt to read from stdin,
> > add a safeguard to check whether the user specified '-' for both of
> > them. It is still possible for a user to pass '/dev/stdin' to one or
> > both arguments at present; we do not explicitly check.
> 
> OK.  I guess this is good enough at least for now and possibly
> forever as that falls into the "doctor, it hurts when I do this"
> category.
> 
> > Refactored to use am/pathspec-from-file. This changes the implementation
> > significantly since v1, but the testcases mostly remain the same.
> 
> I am hoping that this topic, and Alexandr's "add" and "checkout"
> stuff, would help establishing a good pattern for other commands to
> follow.
> 
> > diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
> > index c89fb569e3..56b1c5a302 100644
> > --- a/Documentation/git-grep.txt
> > +++ b/Documentation/git-grep.txt
> > @@ -24,7 +24,8 @@ SYNOPSIS
> >  	   [-A <post-context>] [-B <pre-context>] [-C <context>]
> >  	   [-W | --function-context]
> >  	   [--threads <num>]
> > -	   [-f <file>] [-e] <pattern>
> > +	   [-f | --patterns-from-file <file>] [-e] <pattern>
> 
> OK.
> 
> > +	   [--pathspec-from-file=<file> [--pathspec-file-nul]]
> 
> OK.
> 
> > @@ -270,7 +271,10 @@ providing this option will cause it to die.
> >  	See `grep.threads` in 'CONFIGURATION' for more information.
> >  
> >  -f <file>::
> > -	Read patterns from <file>, one per line.
> > +--patterns-from-file <file>::
> > +	Read patterns from <file>, one per line. If `<file>` is exactly `-` then
> > +	standard input is used; standard input cannot be used for both
> > +	--patterns-from-file and --pathspec-from-file.
> 
> Makes sense---otherwise they would compete and we cannot know which
> kind each line in the standard input is ;-)

At one point Jonathan Nieder had suggested to me that it might be nice
to watch for "--" in the standard input. But I think I wasn't confident
in the idea that all commands which take arbitrary number of arguments,
and pathspec expect their arg list to end in <arg...> -- <pathspec>.

> 
> > @@ -289,6 +293,20 @@ In future versions we may learn to support patterns containing \0 for
> >  more search backends, until then we'll die when the pattern type in
> >  question doesn't support them.
> >  
> > +--pathspec-from-file <file>::
> > +	Read pathspec from <file> instead of the command line. If `<file>` is
> > +	exactly `-` then standard input is used; standard input cannot be used
> > +	for both --patterns-from-file and --pathspec-from-file. Pathspec elements
> > +	are separated by LF or CR/LF. Pathspec elements can be quoted as
> > +	explained for the configuration variable `core.quotePath` (see
> > +	linkgit:git-config[1]). See also `--pathspec-file-nul` and global
> > +	`--literal-pathspecs`.
> > +
> > +--pathspec-file-nul::
> > +	Only meaningful with `--pathspec-from-file`. Pathspec elements are
> > +	separated with NUL character and all other characters are taken
> > +	literally (including newlines and quotes).
> > +
> 
> I think these matches the ones in git-reset.txt from the earlier
> work by Alexandr's, which is good.

Yes, I took them verbatim.

> 
> > diff --git a/builtin/grep.c b/builtin/grep.c
> > index 50ce8d9461..54ba991c42 100644
> > --- a/builtin/grep.c
> > +++ b/builtin/grep.c
> > @@ -31,6 +31,7 @@ static char const * const grep_usage[] = {
> >  };
> >  
> >  static int recurse_submodules;
> > +static int patterns_from_stdin, pathspec_from_stdin;
> >  
> >  #define GREP_NUM_THREADS_DEFAULT 8
> >  static int num_threads;
> > @@ -723,15 +724,18 @@ static int context_callback(const struct option *opt, const char *arg,
> >  static int file_callback(const struct option *opt, const char *arg, int unset)
> >  {
> 
> Shouldn't this be renamed?  Earlier, the only thing that can be
> taken from a file was the patterns, but now a file can be given
> to specify a set of pathspec elements.  "patterns_file_callback()"
> or something like taht?

Done. I was hesitant to touch the other code; I guess it is fine.

> 
> >  	struct grep_opt *grep_opt = opt->value;
> > -	int from_stdin;
> >  	FILE *patterns;
> >  	int lno = 0;
> >  	struct strbuf sb = STRBUF_INIT;
> >  
> >  	BUG_ON_OPT_NEG(unset);
> >  
> > -	from_stdin = !strcmp(arg, "-");
> > -	patterns = from_stdin ? stdin : fopen(arg, "r");
> > +	patterns_from_stdin = !strcmp(arg, "-");
> 
> Totally outside the scope of this patch, but we may want to
> introduce
> 
> 	int is_stdin_cmdline_marker(const char *arg)
> 	{
> 		return !strcmp(arg, "-");
> 	}
> 
> which later can be taught also about "/dev/stdin".  Even outside the
> pathspec-from-file option, I think "diff --no-index" also has some
> special-case provision for treating "-" as "input coming from the
> standard input string", which would benefit from such a helper.

Agreed; that's probably a handy thing to put into parse-options.h. I
suggest we think of this next time we are looking for Outreachy
microprojects or similar. I'll write it down somewhere.

> 
> > +	if (patterns_from_stdin && pathspec_from_stdin)
> > +		die(_("cannot specify both patterns and pathspec via stdin"));
> 
> It's kind of sad that we need to check this in this callback and
> also inside cmd_grep() top-level we will see further down...

Another reviewer suggested not using a callback and instead waiting
until after the entire argparse to do file reads; this would let us
avoid the weird double check here. I think I will do it.

> 
> > +	if (pathspec_from_file) {
> > +		if (pathspec.nr)
> > +			die(_("--pathspec-from-file is incompatible with pathspec arguments"));
> > +
> > +		pathspec_from_stdin = !strcmp(pathspec_from_file, "-");
> > +
> > +		if (patterns_from_stdin && pathspec_from_stdin)
> > +			die(_("cannot specify both patterns and pathspec via stdin"));
> 
> ... here.
> 
> Thanks.

  reply	other threads:[~2019-12-13  3:07 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-11-22  1:16 [PATCH] grep: provide pathspecs/patterns via file or stdin Emily Shaffer
2019-11-22  2:14 ` Denton Liu
2019-11-22  2:34   ` Junio C Hamano
2019-11-22  3:56     ` Junio C Hamano
2019-11-22 18:52     ` Denton Liu
2019-11-22 22:02     ` Emily Shaffer
2019-11-22 22:06       ` Emily Shaffer
2019-11-23  0:28       ` Junio C Hamano
2019-11-22  2:24 ` Junio C Hamano
2019-12-04 20:39 ` [PATCH v2] grep: support the --pathspec-from-file option Emily Shaffer
2019-12-04 21:05   ` Denton Liu
2019-12-04 21:24     ` Junio C Hamano
2019-12-04 22:24   ` Junio C Hamano
2019-12-13  3:07     ` Emily Shaffer [this message]
2019-12-05 11:58   ` Alexandr Miloslavskiy
2019-12-13  4:00     ` Emily Shaffer
2019-12-06 11:22   ` Johannes Schindelin
2019-12-06 11:34   ` SZEDER Gábor
2019-12-13  4:12   ` [PATCH v3] " Emily Shaffer
2019-12-13 13:04     ` Alexandr Miloslavskiy
2019-12-13 18:26     ` Junio C Hamano
2019-12-13 20:13       ` Alexandr Miloslavskiy
2019-12-17  0:33         ` Emily Shaffer

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191213030710.GA135450@google.com \
    --to=emilyshaffer@google.com \
    --cc=alexandr.miloslavskiy@syntevo.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=liu.denton@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).