git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / mirror / code / Atom feed
* [PATCH] grep: add --max-count command line option
@ 2022-06-20 15:49 Carlos L. via GitGitGadget
  2022-06-20 15:57 ` Paul Eggert
  2022-06-21  5:36 ` [PATCH v2] " Carlos L. via GitGitGadget
  0 siblings, 2 replies; 14+ messages in thread
From: Carlos L. via GitGitGadget @ 2022-06-20 15:49 UTC (permalink / raw)
  To: git; +Cc: Martin Ågren [ ], Paul Eggert [ ], Carlos L., Carlos López

From: =?UTF-8?q?Carlos=20L=C3=B3pez?= <00xc@protonmail.com>

This patch adds a command line option analogous to that of GNU
grep(1)'s -m / --max-count, which users might already be used to.
This makes it possible to limit the amount of matches shown in the
output while keeping the functionality of other options such as -C
(show code context) or -p (show containing function), which would be
difficult to do with a shell pipeline (e.g. head(1)).

Signed-off-by: Carlos López 00xc@protonmail.com
---
    grep: add --max-count command line option
    
    This patch adds a command line option analogous to that of GNU grep(1)'s
    -m / --max-count, which users might already be used to. This makes it
    possible to limit the amount of matches shown in the output while
    keeping the functionality of other options such as -C (show code
    context) or -p (show containing function), which would be difficult to
    do with a shell pipeline (e.g. head(1)).
    
    Signed-off-by: Carlos López 00xc@protonmail.com

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1278%2F00xc%2Fmaster-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1278/00xc/master-v1
Pull-Request: https://github.com/git/git/pull/1278

 Documentation/git-grep.txt | 8 ++++++++
 builtin/grep.c             | 9 +++++++++
 grep.c                     | 2 ++
 grep.h                     | 2 ++
 4 files changed, 21 insertions(+)

diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
index 3d393fbac1b..19b817d5e58 100644
--- a/Documentation/git-grep.txt
+++ b/Documentation/git-grep.txt
@@ -23,6 +23,7 @@ SYNOPSIS
 	   [--break] [--heading] [-p | --show-function]
 	   [-A <post-context>] [-B <pre-context>] [-C <context>]
 	   [-W | --function-context]
+	   [(-m | --max-count) <num>]
 	   [--threads <num>]
 	   [-f <file>] [-e] <pattern>
 	   [--and|--or|--not|(|)|-e <pattern>...]
@@ -238,6 +239,13 @@ providing this option will cause it to die.
 	`git diff` works out patch hunk headers (see 'Defining a
 	custom hunk-header' in linkgit:gitattributes[5]).
 
+-m <num>::
+--max-count <num>::
+	Limit the amount of matches per file. When using the `-v` or
+	`--invert-match` option, the search stops after the specified
+	number of non-matches. A value of -1 will return unlimited
+	results (the default).
+
 --threads <num>::
 	Number of grep worker threads to use.
 	See `grep.threads` in 'CONFIGURATION' for more information.
diff --git a/builtin/grep.c b/builtin/grep.c
index bcb07ea7f75..4ab28995da0 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -961,6 +961,8 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 		OPT_BOOL_F(0, "ext-grep", &external_grep_allowed__ignored,
 			   N_("allow calling of grep(1) (ignored by this build)"),
 			   PARSE_OPT_NOCOMPLETE),
+		OPT_INTEGER('m', "max-count", &opt.max_count,
+			N_("maximum number of results per file")),
 		OPT_END()
 	};
 	grep_prefix = prefix;
@@ -1101,6 +1103,13 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 	if (recurse_submodules && untracked)
 		die(_("--untracked not supported with --recurse-submodules"));
 
+	/*
+	 * Optimize out the case where the amount of matches is limited to zero.
+	 * We do this to keep results consistent with GNU grep(1).
+	 */
+	if (opt.max_count == 0)
+		exit(EXIT_FAILURE);
+
 	if (show_in_pager) {
 		if (num_threads > 1)
 			warning(_("invalid option combination, ignoring --threads"));
diff --git a/grep.c b/grep.c
index 82eb7da1022..a010f9f4132 100644
--- a/grep.c
+++ b/grep.c
@@ -1686,6 +1686,8 @@ static int grep_source_1(struct grep_opt *opt, struct grep_source *gs, int colle
 		bol = eol + 1;
 		if (!left)
 			break;
+		if (opt->max_count != (unsigned)-1 && count == opt->max_count)
+			break;
 		left--;
 		lno++;
 	}
diff --git a/grep.h b/grep.h
index c722d25ed9d..218585a8679 100644
--- a/grep.h
+++ b/grep.h
@@ -171,6 +171,7 @@ struct grep_opt {
 	int show_hunk_mark;
 	int file_break;
 	int heading;
+	unsigned max_count;
 	void *priv;
 
 	void (*output)(struct grep_opt *opt, const void *data, size_t size);
@@ -181,6 +182,7 @@ struct grep_opt {
 	.relative = 1, \
 	.pathname = 1, \
 	.max_depth = -1, \
+	.max_count = (unsigned)-1, \
 	.pattern_type_option = GREP_PATTERN_TYPE_UNSPECIFIED, \
 	.colors = { \
 		[GREP_COLOR_CONTEXT] = "", \

base-commit: 5b71c59bc3b9365075e2a175aa7b6f2b0c84ce44
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] grep: add --max-count command line option
  2022-06-20 15:49 [PATCH] grep: add --max-count command line option Carlos L. via GitGitGadget
@ 2022-06-20 15:57 ` Paul Eggert
  2022-06-20 16:25   ` Carlos L.
  2022-06-21  5:36 ` [PATCH v2] " Carlos L. via GitGitGadget
  1 sibling, 1 reply; 14+ messages in thread
From: Paul Eggert @ 2022-06-20 15:57 UTC (permalink / raw)
  To: Carlos L. via GitGitGadget; +Cc: Martin Ågren [ ], Carlos L., git

On 6/20/22 10:49, Carlos L. via GitGitGadget wrote:
> +	unsigned max_count;

Why not make this intmax_t? That way, you don't have to worry about 
casting -1 to unsigned. Also on typical 64-bit machines you no longer 
have to worry about mishandling counts greater than 2**32 (the limit 
becomes 2**63 - 1 which is plenty).

These days it's typically better to avoid unsigned types in C when you 
can, as standard tools like 'gcc -fsanitize=undefined' can catch signed 
int overflow whereas unsigned int overflow always wraps around which is 
typically bad news.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] grep: add --max-count command line option
  2022-06-20 15:57 ` Paul Eggert
@ 2022-06-20 16:25   ` Carlos L.
  2022-06-20 16:32     ` Paul Eggert
  0 siblings, 1 reply; 14+ messages in thread
From: Carlos L. @ 2022-06-20 16:25 UTC (permalink / raw)
  To: Paul Eggert; +Cc: Carlos L. via GitGitGadget, Martin Ågren [ ], git

Hi,

On Monday, June 20th, 2022 at 17:57, Paul Eggert <eggert@cs.ucla.edu> wrote:

> On 6/20/22 10:49, Carlos L. via GitGitGadget wrote:
>
> > + unsigned max_count;
>
>
> Why not make this intmax_t? That way, you don't have to worry about
> casting -1 to unsigned. Also on typical 64-bit machines you no longer
> have to worry about mishandling counts greater than 232 (the limit
> becomes 263 - 1 which is plenty).

This does not work well with OPTION_INTEGER, since it assumes the value to be int-sized:

parse-options.c:
 219             *(int *)opt->value = strtol(arg, (char **)&s, 10);

I also wanted to avoid using signed int so both sides of the comparison with `count` in grep_source_1() have the same sign.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] grep: add --max-count command line option
  2022-06-20 16:25   ` Carlos L.
@ 2022-06-20 16:32     ` Paul Eggert
  0 siblings, 0 replies; 14+ messages in thread
From: Paul Eggert @ 2022-06-20 16:32 UTC (permalink / raw)
  To: Carlos L.; +Cc: Carlos L. via GitGitGadget, Martin Ågren [ ], git

On 6/20/22 11:25, Carlos L. wrote:
> This does not work well with OPTION_INTEGER, since it assumes the value to be int-sized:
>
> parse-options.c:
>   219             *(int *)opt->value = strtol(arg, (char **)&s, 10);

OK, so parse-options messes up if the user specifies a count that does 
not fit in 'int'? Although that's a separate bug, let's not make things 
worse here; let's make the new count an 'int'.

In the long run parse-options should be changed to use strtoimax instead 
of strtol, and the corresponding integers should be changed to intmax_t, 
and the proper thing should be done if the string value does not fit 
into intmax_t. But this longer-run fix affects all integer-valued 
options, not just this one.


> I also wanted to avoid using signed int so both sides of the comparison with `count` in grep_source_1() have the same sign.

Such comparisons cannot misfire if both values are nonnegative, and that 
can easily be arranged here.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v2] grep: add --max-count command line option
  2022-06-20 15:49 [PATCH] grep: add --max-count command line option Carlos L. via GitGitGadget
  2022-06-20 15:57 ` Paul Eggert
@ 2022-06-21  5:36 ` Carlos L. via GitGitGadget
  2022-06-21 16:27   ` Junio C Hamano
                     ` (2 more replies)
  1 sibling, 3 replies; 14+ messages in thread
From: Carlos L. via GitGitGadget @ 2022-06-21  5:36 UTC (permalink / raw)
  To: git; +Cc: Martin Ågren [ ], Paul Eggert [ ], Carlos L., Carlos López

From: =?UTF-8?q?Carlos=20L=C3=B3pez?= <00xc@protonmail.com>

This patch adds a command line option analogous to that of GNU
grep(1)'s -m / --max-count, which users might already be used to.
This makes it possible to limit the amount of matches shown in the
output while keeping the functionality of other options such as -C
(show code context) or -p (show containing function), which would be
difficult to do with a shell pipeline (e.g. head(1)).

Signed-off-by: Carlos López 00xc@protonmail.com
---
    grep: add --max-count command line option
    
    This patch adds a command line option analogous to that of GNU grep(1)'s
    -m / --max-count, which users might already be used to. This makes it
    possible to limit the amount of matches shown in the output while
    keeping the functionality of other options such as -C (show code
    context) or -p (show containing function), which would be difficult to
    do with a shell pipeline (e.g. head(1)).
    
    Signed-off-by: Carlos López 00xc@protonmail.com

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1278%2F00xc%2Fmaster-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1278/00xc/master-v2
Pull-Request: https://github.com/git/git/pull/1278

Range-diff vs v1:

 1:  f89c6e244aa ! 1:  ee7eb298854 grep: add --max-count command line option
     @@ grep.c: static int grep_source_1(struct grep_opt *opt, struct grep_source *gs, i
       		bol = eol + 1;
       		if (!left)
       			break;
     -+		if (opt->max_count != (unsigned)-1 && count == opt->max_count)
     ++		if (opt->max_count != -1 && count == opt->max_count)
      +			break;
       		left--;
       		lno++;
     @@ grep.h: struct grep_opt {
       	int show_hunk_mark;
       	int file_break;
       	int heading;
     -+	unsigned max_count;
     ++	int max_count;
       	void *priv;
       
       	void (*output)(struct grep_opt *opt, const void *data, size_t size);
     @@ grep.h: struct grep_opt {
       	.relative = 1, \
       	.pathname = 1, \
       	.max_depth = -1, \
     -+	.max_count = (unsigned)-1, \
     ++	.max_count = -1, \
       	.pattern_type_option = GREP_PATTERN_TYPE_UNSPECIFIED, \
       	.colors = { \
       		[GREP_COLOR_CONTEXT] = "", \


 Documentation/git-grep.txt | 8 ++++++++
 builtin/grep.c             | 9 +++++++++
 grep.c                     | 2 ++
 grep.h                     | 2 ++
 4 files changed, 21 insertions(+)

diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
index 3d393fbac1b..19b817d5e58 100644
--- a/Documentation/git-grep.txt
+++ b/Documentation/git-grep.txt
@@ -23,6 +23,7 @@ SYNOPSIS
 	   [--break] [--heading] [-p | --show-function]
 	   [-A <post-context>] [-B <pre-context>] [-C <context>]
 	   [-W | --function-context]
+	   [(-m | --max-count) <num>]
 	   [--threads <num>]
 	   [-f <file>] [-e] <pattern>
 	   [--and|--or|--not|(|)|-e <pattern>...]
@@ -238,6 +239,13 @@ providing this option will cause it to die.
 	`git diff` works out patch hunk headers (see 'Defining a
 	custom hunk-header' in linkgit:gitattributes[5]).
 
+-m <num>::
+--max-count <num>::
+	Limit the amount of matches per file. When using the `-v` or
+	`--invert-match` option, the search stops after the specified
+	number of non-matches. A value of -1 will return unlimited
+	results (the default).
+
 --threads <num>::
 	Number of grep worker threads to use.
 	See `grep.threads` in 'CONFIGURATION' for more information.
diff --git a/builtin/grep.c b/builtin/grep.c
index bcb07ea7f75..4ab28995da0 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -961,6 +961,8 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 		OPT_BOOL_F(0, "ext-grep", &external_grep_allowed__ignored,
 			   N_("allow calling of grep(1) (ignored by this build)"),
 			   PARSE_OPT_NOCOMPLETE),
+		OPT_INTEGER('m', "max-count", &opt.max_count,
+			N_("maximum number of results per file")),
 		OPT_END()
 	};
 	grep_prefix = prefix;
@@ -1101,6 +1103,13 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 	if (recurse_submodules && untracked)
 		die(_("--untracked not supported with --recurse-submodules"));
 
+	/*
+	 * Optimize out the case where the amount of matches is limited to zero.
+	 * We do this to keep results consistent with GNU grep(1).
+	 */
+	if (opt.max_count == 0)
+		exit(EXIT_FAILURE);
+
 	if (show_in_pager) {
 		if (num_threads > 1)
 			warning(_("invalid option combination, ignoring --threads"));
diff --git a/grep.c b/grep.c
index 82eb7da1022..b32ab75cb6b 100644
--- a/grep.c
+++ b/grep.c
@@ -1686,6 +1686,8 @@ static int grep_source_1(struct grep_opt *opt, struct grep_source *gs, int colle
 		bol = eol + 1;
 		if (!left)
 			break;
+		if (opt->max_count != -1 && count == opt->max_count)
+			break;
 		left--;
 		lno++;
 	}
diff --git a/grep.h b/grep.h
index c722d25ed9d..bdcadce61b8 100644
--- a/grep.h
+++ b/grep.h
@@ -171,6 +171,7 @@ struct grep_opt {
 	int show_hunk_mark;
 	int file_break;
 	int heading;
+	int max_count;
 	void *priv;
 
 	void (*output)(struct grep_opt *opt, const void *data, size_t size);
@@ -181,6 +182,7 @@ struct grep_opt {
 	.relative = 1, \
 	.pathname = 1, \
 	.max_depth = -1, \
+	.max_count = -1, \
 	.pattern_type_option = GREP_PATTERN_TYPE_UNSPECIFIED, \
 	.colors = { \
 		[GREP_COLOR_CONTEXT] = "", \

base-commit: 5b71c59bc3b9365075e2a175aa7b6f2b0c84ce44
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2] grep: add --max-count command line option
  2022-06-21  5:36 ` [PATCH v2] " Carlos L. via GitGitGadget
@ 2022-06-21 16:27   ` Junio C Hamano
  2022-06-22  6:41     ` Carlos L.
       [not found]   ` <220622.86mte5knbe.gmgdl@evledraar.gmail.com>
  2022-06-22 17:07   ` [PATCH v3 0/2] " Carlos L. via GitGitGadget
  2 siblings, 1 reply; 14+ messages in thread
From: Junio C Hamano @ 2022-06-21 16:27 UTC (permalink / raw)
  To: Carlos L. via GitGitGadget
  Cc: git, Martin Ågren [ ], Paul Eggert [ ], Carlos L.

"Carlos L. via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: =?UTF-8?q?Carlos=20L=C3=B3pez?= <00xc@protonmail.com>
>
> This patch adds a command line option analogous to that of GNU
> grep(1)'s -m / --max-count, which users might already be used to.
> This makes it possible to limit the amount of matches shown in the
> output while keeping the functionality of other options such as -C
> (show code context) or -p (show containing function), which would be
> difficult to do with a shell pipeline (e.g. head(1)).
>
> Signed-off-by: Carlos López 00xc@protonmail.com
> ---
> ...
>  Documentation/git-grep.txt | 8 ++++++++
>  builtin/grep.c             | 9 +++++++++
>  grep.c                     | 2 ++
>  grep.h                     | 2 ++
>  4 files changed, 21 insertions(+)

Tests?

> diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
> index 3d393fbac1b..19b817d5e58 100644
> --- a/Documentation/git-grep.txt
> +++ b/Documentation/git-grep.txt
> @@ -23,6 +23,7 @@ SYNOPSIS
>  	   [--break] [--heading] [-p | --show-function]
>  	   [-A <post-context>] [-B <pre-context>] [-C <context>]
>  	   [-W | --function-context]
> +	   [(-m | --max-count) <num>]
>  	   [--threads <num>]
>  	   [-f <file>] [-e] <pattern>
>  	   [--and|--or|--not|(|)|-e <pattern>...]
> @@ -238,6 +239,13 @@ providing this option will cause it to die.
>  	`git diff` works out patch hunk headers (see 'Defining a
>  	custom hunk-header' in linkgit:gitattributes[5]).
>  
> +-m <num>::
> +--max-count <num>::
> +	Limit the amount of matches per file. When using the `-v` or
> +	`--invert-match` option, the search stops after the specified
> +	number of non-matches. A value of -1 will return unlimited
> +	results (the default).

Hmph ...

> +	/*
> +	 * Optimize out the case where the amount of matches is limited to zero.
> +	 * We do this to keep results consistent with GNU grep(1).
> +	 */
> +	if (opt.max_count == 0)
> +		exit(EXIT_FAILURE);
> +

OK, so "stop before seeing any match" logically leads to "we found
nothing, so exit with non-zero".

> diff --git a/grep.c b/grep.c
> index 82eb7da1022..b32ab75cb6b 100644
> --- a/grep.c
> +++ b/grep.c
> @@ -1686,6 +1686,8 @@ static int grep_source_1(struct grep_opt *opt, struct grep_source *gs, int colle
>  		bol = eol + 1;
>  		if (!left)
>  			break;
> +		if (opt->max_count != -1 && count == opt->max_count)
> +			break;

I would have written it "if (0 <= opt->max_count && ...)".  What
happens when a trickster asks you to do "git grep -m -2"?

I guess what I am getting at is if we are better off saying that
negative means unlimited, instead of special casing -1 like this.  I
didn't think it through so it may be perfectly possible that what
you wrote makes more sense than "anything negative is unlimited".

I dunno.

>  		left--;
>  		lno++;
>  	}

Thanks.

> diff --git a/grep.h b/grep.h
> index c722d25ed9d..bdcadce61b8 100644
> --- a/grep.h
> +++ b/grep.h
> @@ -171,6 +171,7 @@ struct grep_opt {
>  	int show_hunk_mark;
>  	int file_break;
>  	int heading;
> +	int max_count;
>  	void *priv;
>  
>  	void (*output)(struct grep_opt *opt, const void *data, size_t size);
> @@ -181,6 +182,7 @@ struct grep_opt {
>  	.relative = 1, \
>  	.pathname = 1, \
>  	.max_depth = -1, \
> +	.max_count = -1, \
>  	.pattern_type_option = GREP_PATTERN_TYPE_UNSPECIFIED, \
>  	.colors = { \
>  		[GREP_COLOR_CONTEXT] = "", \
>
> base-commit: 5b71c59bc3b9365075e2a175aa7b6f2b0c84ce44

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2] grep: add --max-count command line option
  2022-06-21 16:27   ` Junio C Hamano
@ 2022-06-22  6:41     ` Carlos L.
  2022-06-22  6:56       ` Junio C Hamano
  0 siblings, 1 reply; 14+ messages in thread
From: Carlos L. @ 2022-06-22  6:41 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Carlos L. via GitGitGadget, git, Martin Ågren [ ], Paul Eggert [ ]

Hi,

Just a couple of questions.

On Tuesday, June 21st, 2022 at 18:27, Junio C Hamano <gitster@pobox.com> wrote:

> "Carlos L. via GitGitGadget" gitgitgadget@gmail.com writes:
>
> > From: =?UTF-8?q?Carlos=20L=C3=B3pez?= 00xc@protonmail.com
> >
> > This patch adds a command line option analogous to that of GNU
> > grep(1)'s -m / --max-count, which users might already be used to.
> > This makes it possible to limit the amount of matches shown in the
> > output while keeping the functionality of other options such as -C
> > (show code context) or -p (show containing function), which would be
> > difficult to do with a shell pipeline (e.g. head(1)).
> >
> > Signed-off-by: Carlos López 00xc@protonmail.com
> > ---
> > ...
> > Documentation/git-grep.txt | 8 ++++++++
> > builtin/grep.c | 9 +++++++++
> > grep.c | 2 ++
> > grep.h | 2 ++
> > 4 files changed, 21 insertions(+)
>
>
> Tests?

Right. Is it OK if I include my test(s) in t/t7810-grep.sh, or should it be a different/new file?

> > diff --git a/grep.c b/grep.c
> > index 82eb7da1022..b32ab75cb6b 100644
> > --- a/grep.c
> > +++ b/grep.c
> > @@ -1686,6 +1686,8 @@ static int grep_source_1(struct grep_opt *opt, struct grep_source *gs, int colle
> > bol = eol + 1;
> > if (!left)
> > break;
> > + if (opt->max_count != -1 && count == opt->max_count)
> > + break;
>
>
> I would have written it "if (0 <= opt->max_count && ...)". What
>
> happens when a trickster asks you to do "git grep -m -2"?

Fair enough. Since it's already optimized out above, is there any reason we need to include zero (<=)?

> I guess what I am getting at is if we are better off saying that
> negative means unlimited, instead of special casing -1 like this. I
> didn't think it through so it may be perfectly possible that what
> you wrote makes more sense than "anything negative is unlimited".
>
> I dunno.

I think you're right, I'll adjust my patch.

Best,
Carlos

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2] grep: add --max-count command line option
  2022-06-22  6:41     ` Carlos L.
@ 2022-06-22  6:56       ` Junio C Hamano
  0 siblings, 0 replies; 14+ messages in thread
From: Junio C Hamano @ 2022-06-22  6:56 UTC (permalink / raw)
  To: Carlos L.
  Cc: Carlos L. via GitGitGadget, git, Martin Ågren [ ], Paul Eggert [ ]

"Carlos L." <00xc@protonmail.com> writes:

>> Tests?
>
> Right. Is it OK if I include my test(s) in t/t7810-grep.sh, or
> should it be a different/new file?

It is preferrable to add new tests to existing scripts, rather than
adding a new (and short) one.

Thanks.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2] grep: add --max-count command line option
       [not found]   ` <220622.86mte5knbe.gmgdl@evledraar.gmail.com>
@ 2022-06-22 13:23     ` Carlos L.
  0 siblings, 0 replies; 14+ messages in thread
From: Carlos L. @ 2022-06-22 13:23 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Carlos L. via GitGitGadget, git, Martin Ågren [ ], Paul Eggert [ ]

Hi,

On Wednesday, June 22nd, 2022 at 12:22, Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:

> On Tue, Jun 21 2022, Carlos L. via GitGitGadget wrote:
>
> > From: =?UTF-8?q?Carlos=20L=C3=B3pez?= 00xc@protonmail.com
> >
> > This patch adds a command line option analogous to that of GNU
> > grep(1)'s -m / --max-count, which users might already be used to.
>
>
> Thanks, this seems useful.
>
> > This makes it possible to limit the amount of matches shown in the
> > output while keeping the functionality of other options such as -C
> > (show code context) or -p (show containing function), which would be
> > difficult to do with a shell pipeline (e.g. head(1)).
>
>
> We start multi-threaded grep workers, how does this code handle races
> between them finding things, this count being incremented, and the "do
> we have sufficient results?" check?
>
> Is it guarded by the relevant mutexes?

AFAICT only a single thread runs on each file via grep_source_1(), and we check `count`, which is local to this function.

> > + /*
> > + * Optimize out the case where the amount of matches is limited to zero.
> > + * We do this to keep results consistent with GNU grep(1).
> > + */
> > + if (opt.max_count == 0)
> > + exit(EXIT_FAILURE);
>
>
> Don't use exit() in cmd_grep(), you should use "return 1".

I'll use return in my follow-up patch, this can be improved afterwards.

> But even better use usage_msg_opt() here, i.e. inform the user why this
> was bad.
>
> Or hrm, it seems GNU grep silently returns 1 here, perhaps --max-count=0
> is a feature for some?
>
> If this is intentional it's worth documenting and testing it explicitly.

I will add a sentence about this in Documentation/git-grep.txt.

> Re the comments from others about size_t or whatever, it might be better
> here to use OPT_CALLBACK and an unsigned type.
>
> Then just have a "int have_max_count:1", which IMO is more obvious than
> using integer wrap-around to test "didn't provide this flag".

FWIW, I think it's fine to use int and a negative value as an special encoding, max_depth does the same thing. These are per-file matches, so they should not go over 2 billion in reasonable use cases.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v3 0/2] grep: add --max-count command line option
  2022-06-21  5:36 ` [PATCH v2] " Carlos L. via GitGitGadget
  2022-06-21 16:27   ` Junio C Hamano
       [not found]   ` <220622.86mte5knbe.gmgdl@evledraar.gmail.com>
@ 2022-06-22 17:07   ` Carlos L. via GitGitGadget
  2022-06-22 17:07     ` [PATCH v3 1/2] " Carlos López via GitGitGadget
                       ` (2 more replies)
  2 siblings, 3 replies; 14+ messages in thread
From: Carlos L. via GitGitGadget @ 2022-06-22 17:07 UTC (permalink / raw)
  To: git; +Cc: Martin Ågren [ ], Paul Eggert [ ], Carlos L.

This patch adds a command line option analogous to that of GNU grep(1)'s -m
/ --max-count, which users might already be used to. This makes it possible
to limit the amount of matches shown in the output while keeping the
functionality of other options such as -C (show code context) or -p (show
containing function), which would be difficult to do with a shell pipeline
(e.g. head(1)).

Signed-off-by: Carlos López 00xc@protonmail.com

Carlos López (2):
  grep: add --max-count command line option
  tests: add tests for grep --max-count

 Documentation/git-grep.txt |  9 +++++
 builtin/grep.c             |  9 +++++
 grep.c                     |  2 +-
 grep.h                     |  2 +
 t/t7810-grep.sh            | 83 ++++++++++++++++++++++++++++++++++++++
 5 files changed, 104 insertions(+), 1 deletion(-)


base-commit: 5b71c59bc3b9365075e2a175aa7b6f2b0c84ce44
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1278%2F00xc%2Fmaster-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1278/00xc/master-v3
Pull-Request: https://github.com/git/git/pull/1278

Range-diff vs v2:

 1:  ee7eb298854 ! 1:  5bf7244437e grep: add --max-count command line option
     @@ Documentation/git-grep.txt: providing this option will cause it to die.
      +	Limit the amount of matches per file. When using the `-v` or
      +	`--invert-match` option, the search stops after the specified
      +	number of non-matches. A value of -1 will return unlimited
     -+	results (the default).
     ++	results (the default). A value of 0 will exit immediately with
     ++	a non-zero status.
      +
       --threads <num>::
       	Number of grep worker threads to use.
     @@ builtin/grep.c: int cmd_grep(int argc, const char **argv, const char *prefix)
      +	 * We do this to keep results consistent with GNU grep(1).
      +	 */
      +	if (opt.max_count == 0)
     -+		exit(EXIT_FAILURE);
     ++		return 1;
      +
       	if (show_in_pager) {
       		if (num_threads > 1)
     @@ builtin/grep.c: int cmd_grep(int argc, const char **argv, const char *prefix)
      
       ## grep.c ##
      @@ grep.c: static int grep_source_1(struct grep_opt *opt, struct grep_source *gs, int colle
     - 		bol = eol + 1;
     - 		if (!left)
     - 			break;
     -+		if (opt->max_count != -1 && count == opt->max_count)
     -+			break;
     - 		left--;
     - 		lno++;
     - 	}
     + 				return 0;
     + 			goto next_line;
     + 		}
     +-		if (hit) {
     ++		if (hit && (opt->max_count < 0 || count < opt->max_count)) {
     + 			count++;
     + 			if (opt->status_only)
     + 				return 1;
      
       ## grep.h ##
      @@ grep.h: struct grep_opt {
 -:  ----------- > 2:  525958af877 tests: add tests for grep --max-count

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v3 1/2] grep: add --max-count command line option
  2022-06-22 17:07   ` [PATCH v3 0/2] " Carlos L. via GitGitGadget
@ 2022-06-22 17:07     ` Carlos López via GitGitGadget
  2022-06-22 17:07     ` [PATCH v3 2/2] tests: add tests for grep --max-count Carlos López via GitGitGadget
  2022-06-22 19:47     ` [PATCH v4] grep: add --max-count command line option Carlos L. via GitGitGadget
  2 siblings, 0 replies; 14+ messages in thread
From: Carlos López via GitGitGadget @ 2022-06-22 17:07 UTC (permalink / raw)
  To: git; +Cc: Martin Ågren [ ], Paul Eggert [ ], Carlos L., Carlos López

From: =?UTF-8?q?Carlos=20L=C3=B3pez?= <00xc@protonmail.com>

This patch adds a command line option analogous to that of GNU
grep(1)'s -m / --max-count, which users might already be used to.
This makes it possible to limit the amount of matches shown in the
output while keeping the functionality of other options such as -C
(show code context) or -p (show containing function), which would be
difficult to do with a shell pipeline (e.g. head(1)).

Signed-off-by: Carlos López 00xc@protonmail.com
---
 Documentation/git-grep.txt | 9 +++++++++
 builtin/grep.c             | 9 +++++++++
 grep.c                     | 2 +-
 grep.h                     | 2 ++
 4 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
index 3d393fbac1b..58d944bd578 100644
--- a/Documentation/git-grep.txt
+++ b/Documentation/git-grep.txt
@@ -23,6 +23,7 @@ SYNOPSIS
 	   [--break] [--heading] [-p | --show-function]
 	   [-A <post-context>] [-B <pre-context>] [-C <context>]
 	   [-W | --function-context]
+	   [(-m | --max-count) <num>]
 	   [--threads <num>]
 	   [-f <file>] [-e] <pattern>
 	   [--and|--or|--not|(|)|-e <pattern>...]
@@ -238,6 +239,14 @@ providing this option will cause it to die.
 	`git diff` works out patch hunk headers (see 'Defining a
 	custom hunk-header' in linkgit:gitattributes[5]).
 
+-m <num>::
+--max-count <num>::
+	Limit the amount of matches per file. When using the `-v` or
+	`--invert-match` option, the search stops after the specified
+	number of non-matches. A value of -1 will return unlimited
+	results (the default). A value of 0 will exit immediately with
+	a non-zero status.
+
 --threads <num>::
 	Number of grep worker threads to use.
 	See `grep.threads` in 'CONFIGURATION' for more information.
diff --git a/builtin/grep.c b/builtin/grep.c
index bcb07ea7f75..e6bcdf860cc 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -961,6 +961,8 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 		OPT_BOOL_F(0, "ext-grep", &external_grep_allowed__ignored,
 			   N_("allow calling of grep(1) (ignored by this build)"),
 			   PARSE_OPT_NOCOMPLETE),
+		OPT_INTEGER('m', "max-count", &opt.max_count,
+			N_("maximum number of results per file")),
 		OPT_END()
 	};
 	grep_prefix = prefix;
@@ -1101,6 +1103,13 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 	if (recurse_submodules && untracked)
 		die(_("--untracked not supported with --recurse-submodules"));
 
+	/*
+	 * Optimize out the case where the amount of matches is limited to zero.
+	 * We do this to keep results consistent with GNU grep(1).
+	 */
+	if (opt.max_count == 0)
+		return 1;
+
 	if (show_in_pager) {
 		if (num_threads > 1)
 			warning(_("invalid option combination, ignoring --threads"));
diff --git a/grep.c b/grep.c
index 82eb7da1022..52a894c9890 100644
--- a/grep.c
+++ b/grep.c
@@ -1615,7 +1615,7 @@ static int grep_source_1(struct grep_opt *opt, struct grep_source *gs, int colle
 				return 0;
 			goto next_line;
 		}
-		if (hit) {
+		if (hit && (opt->max_count < 0 || count < opt->max_count)) {
 			count++;
 			if (opt->status_only)
 				return 1;
diff --git a/grep.h b/grep.h
index c722d25ed9d..bdcadce61b8 100644
--- a/grep.h
+++ b/grep.h
@@ -171,6 +171,7 @@ struct grep_opt {
 	int show_hunk_mark;
 	int file_break;
 	int heading;
+	int max_count;
 	void *priv;
 
 	void (*output)(struct grep_opt *opt, const void *data, size_t size);
@@ -181,6 +182,7 @@ struct grep_opt {
 	.relative = 1, \
 	.pathname = 1, \
 	.max_depth = -1, \
+	.max_count = -1, \
 	.pattern_type_option = GREP_PATTERN_TYPE_UNSPECIFIED, \
 	.colors = { \
 		[GREP_COLOR_CONTEXT] = "", \
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v3 2/2] tests: add tests for grep --max-count
  2022-06-22 17:07   ` [PATCH v3 0/2] " Carlos L. via GitGitGadget
  2022-06-22 17:07     ` [PATCH v3 1/2] " Carlos López via GitGitGadget
@ 2022-06-22 17:07     ` Carlos López via GitGitGadget
  2022-06-22 18:10       ` Junio C Hamano
  2022-06-22 19:47     ` [PATCH v4] grep: add --max-count command line option Carlos L. via GitGitGadget
  2 siblings, 1 reply; 14+ messages in thread
From: Carlos López via GitGitGadget @ 2022-06-22 17:07 UTC (permalink / raw)
  To: git; +Cc: Martin Ågren [ ], Paul Eggert [ ], Carlos L., Carlos López

From: =?UTF-8?q?Carlos=20L=C3=B3pez?= <00xc@protonmail.com>

Add tests for grep's -m / --max-count to check if the option correctly
outputs limited results, and that it interacts properly with other flags
that could likely be used in conjunction.

Signed-off-by: Carlos López 00xc@protonmail.com
---
 t/t7810-grep.sh | 83 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 83 insertions(+)

diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh
index 69356011713..7b1b8a3cd93 100755
--- a/t/t7810-grep.sh
+++ b/t/t7810-grep.sh
@@ -77,6 +77,7 @@ test_expect_success setup '
 	# Say hello.
 	function hello() {
 	  echo "Hello world."
+	  echo "Hello again."
 	} # hello
 
 	# Still a no-op.
@@ -595,6 +596,88 @@ test_expect_success 'grep --files-without-match --quiet' '
 	test_must_be_empty actual
 '
 
+cat >expected <<EOF &&
+EOF
+
+test_expect_success 'grep --max-count 0 (must exit with non-zero)' '
+	test_must_fail git grep --max-count 0 foo >actual &&
+	test_cmp expected actual
+'
+
+cat >expected <<EOF &&
+file:foo mmap bar
+EOF
+
+test_expect_success 'grep --max-count 1' '
+	git grep --max-count 1 foo >actual &&
+	test_cmp expected actual
+'
+
+cat >expected <<EOF &&
+file:foo mmap bar
+file:foo_mmap bar
+file:foo_mmap bar mmap
+EOF
+
+test_expect_success 'grep --max-count 3' '
+	git grep --max-count 3 foo >actual &&
+	test_cmp expected actual
+'
+
+cat >expected <<EOF &&
+file:foo mmap bar
+file:foo_mmap bar
+file:foo_mmap bar mmap
+file:foo mmap bar_mmap
+file:foo_mmap bar mmap baz
+EOF
+
+test_expect_success 'grep --max-count -1 (no limit)' '
+	git grep --max-count -1 foo >actual &&
+	test_cmp expected actual
+'
+
+cat >expected <<EOF &&
+file-foo mmap bar
+file:foo_mmap bar
+file-foo_mmap bar mmap
+EOF
+
+test_expect_success 'grep --max-count 1 --context 2' '
+	git grep --max-count 1 --context 1 foo_mmap >actual &&
+	test_cmp expected actual
+'
+
+cat >expected <<EOF &&
+hello.ps1=function hello() {
+hello.ps1:  echo "Hello world."
+EOF
+
+test_expect_success 'grep --max-count 1 --show-function' '
+	git grep --max-count 1 --show-function Hello hello.ps1 >actual &&
+	test_cmp expected actual
+'
+
+cat >expected <<EOF &&
+hello.ps1=function hello() {
+hello.ps1:  echo "Hello world."
+hello.ps1:  echo "Hello again."
+EOF
+
+test_expect_success 'grep --max-count 2 --show-function' '
+	git grep --max-count 2 --show-function Hello hello.ps1 >actual &&
+	test_cmp expected actual
+'
+
+cat >expected <<EOF &&
+hello.ps1:1
+EOF
+
+test_expect_success 'grep --max-count 1 -c' '
+	git grep --max-count 1 --count Hello hello.ps1 >actual &&
+	test_cmp expected actual
+'
+
 cat >expected <<EOF
 file:foo mmap bar_mmap
 EOF
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v3 2/2] tests: add tests for grep --max-count
  2022-06-22 17:07     ` [PATCH v3 2/2] tests: add tests for grep --max-count Carlos López via GitGitGadget
@ 2022-06-22 18:10       ` Junio C Hamano
  0 siblings, 0 replies; 14+ messages in thread
From: Junio C Hamano @ 2022-06-22 18:10 UTC (permalink / raw)
  To: Carlos López via GitGitGadget
  Cc: git, Martin Ågren [ ], Paul Eggert [ ], Carlos L.

"Carlos López via GitGitGadget"  <gitgitgadget@gmail.com> writes:

> From: =?UTF-8?q?Carlos=20L=C3=B3pez?= <00xc@protonmail.com>
>
> Add tests for grep's -m / --max-count to check if the option correctly
> outputs limited results, and that it interacts properly with other flags
> that could likely be used in conjunction.
>
> Signed-off-by: Carlos López 00xc@protonmail.com
> ---
>  t/t7810-grep.sh | 83 +++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 83 insertions(+)

This is better done as part of the previous patch.  The new tests
protect the new code from future breakage.

> diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh
> index 69356011713..7b1b8a3cd93 100755
> --- a/t/t7810-grep.sh
> +++ b/t/t7810-grep.sh
> @@ -77,6 +77,7 @@ test_expect_success setup '
>  	# Say hello.
>  	function hello() {
>  	  echo "Hello world."
> +	  echo "Hello again."
>  	} # hello
>  
>  	# Still a no-op.
> @@ -595,6 +596,88 @@ test_expect_success 'grep --files-without-match --quiet' '
>  	test_must_be_empty actual
>  '
>  
> +cat >expected <<EOF &&
> +EOF
> +
> +test_expect_success 'grep --max-count 0 (must exit with non-zero)' '
> +	test_must_fail git grep --max-count 0 foo >actual &&
> +	test_cmp expected actual
> +'

For this particular one, "test_must_be_empty actual" would suffice,
without comparing with the expected output.

> +cat >expected <<EOF &&
> +file:foo mmap bar
> +EOF
> +
> +test_expect_success 'grep --max-count 1' '
> +	git grep --max-count 1 foo >actual &&
> +	test_cmp expected actual
> +'

Writing expected output outside test_expect_success that uses it is
a quite old style but that is because this test script is pretty
much ancient, so mimicking it is OK.  We'd need to come back later
when the tree is quiescent to clean them up, though (#leftoverbits).

> ...
> +	test_cmp expected actual
> +'

The new tests seem to give us a reasonable test coverage.  We could
discard one of the "-m1" vs "-m3" in the early ones, as they do not
give much extra test coverage over the other, to reduce repetition.

We do not test a case where we pick up-to N matches each from
multiple files, though.  Perhaps

    git grep -m1 -e o -- hello.\*

may stop after hitting "No-op." in hello.ps1 and "stdio" in hello.c,
which may make a good test, perhaps?

Thanks.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v4] grep: add --max-count command line option
  2022-06-22 17:07   ` [PATCH v3 0/2] " Carlos L. via GitGitGadget
  2022-06-22 17:07     ` [PATCH v3 1/2] " Carlos López via GitGitGadget
  2022-06-22 17:07     ` [PATCH v3 2/2] tests: add tests for grep --max-count Carlos López via GitGitGadget
@ 2022-06-22 19:47     ` Carlos L. via GitGitGadget
  2 siblings, 0 replies; 14+ messages in thread
From: Carlos L. via GitGitGadget @ 2022-06-22 19:47 UTC (permalink / raw)
  To: git; +Cc: Martin Ågren [ ], Paul Eggert [ ], Carlos L., Carlos López

From: =?UTF-8?q?Carlos=20L=C3=B3pez?= <00xc@protonmail.com>

This patch adds a command line option analogous to that of GNU
grep(1)'s -m / --max-count, which users might already be used to.
This makes it possible to limit the amount of matches shown in the
output while keeping the functionality of other options such as -C
(show code context) or -p (show containing function), which would be
difficult to do with a shell pipeline (e.g. head(1)).

Signed-off-by: Carlos López 00xc@protonmail.com
---
    grep: add --max-count command line option
    
    This patch adds a command line option analogous to that of GNU grep(1)'s
    -m / --max-count, which users might already be used to. This makes it
    possible to limit the amount of matches shown in the output while
    keeping the functionality of other options such as -C (show code
    context) or -p (show containing function), which would be difficult to
    do with a shell pipeline (e.g. head(1)).
    
    Signed-off-by: Carlos López 00xc@protonmail.com

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1278%2F00xc%2Fmaster-v4
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1278/00xc/master-v4
Pull-Request: https://github.com/git/git/pull/1278

Range-diff vs v3:

 1:  5bf7244437e ! 1:  89c0151c164 grep: add --max-count command line option
     @@ grep.h: struct grep_opt {
       	.pattern_type_option = GREP_PATTERN_TYPE_UNSPECIFIED, \
       	.colors = { \
       		[GREP_COLOR_CONTEXT] = "", \
     +
     + ## t/t7810-grep.sh ##
     +@@ t/t7810-grep.sh: test_expect_success setup '
     + 	# Say hello.
     + 	function hello() {
     + 	  echo "Hello world."
     ++	  echo "Hello again."
     + 	} # hello
     + 
     + 	# Still a no-op.
     +@@ t/t7810-grep.sh: test_expect_success 'grep --files-without-match --quiet' '
     + 	test_must_be_empty actual
     + '
     + 
     ++test_expect_success 'grep --max-count 0 (must exit with non-zero)' '
     ++	test_must_fail git grep --max-count 0 foo >actual &&
     ++	test_must_be_empty actual
     ++'
     ++
     ++test_expect_success 'grep --max-count 3' '
     ++	cat >expected <<-EOF &&
     ++	file:foo mmap bar
     ++	file:foo_mmap bar
     ++	file:foo_mmap bar mmap
     ++	EOF
     ++	git grep --max-count 3 foo >actual &&
     ++	test_cmp expected actual
     ++'
     ++
     ++test_expect_success 'grep --max-count -1 (no limit)' '
     ++	cat >expected <<-EOF &&
     ++	file:foo mmap bar
     ++	file:foo_mmap bar
     ++	file:foo_mmap bar mmap
     ++	file:foo mmap bar_mmap
     ++	file:foo_mmap bar mmap baz
     ++	EOF
     ++	git grep --max-count -1 foo >actual &&
     ++	test_cmp expected actual
     ++'
     ++
     ++test_expect_success 'grep --max-count 1 --context 2' '
     ++	cat >expected <<-EOF &&
     ++	file-foo mmap bar
     ++	file:foo_mmap bar
     ++	file-foo_mmap bar mmap
     ++	EOF
     ++	git grep --max-count 1 --context 1 foo_mmap >actual &&
     ++	test_cmp expected actual
     ++'
     ++
     ++test_expect_success 'grep --max-count 1 --show-function' '
     ++	cat >expected <<-EOF &&
     ++	hello.ps1=function hello() {
     ++	hello.ps1:  echo "Hello world."
     ++	EOF
     ++	git grep --max-count 1 --show-function Hello hello.ps1 >actual &&
     ++	test_cmp expected actual
     ++'
     ++
     ++test_expect_success 'grep --max-count 2 --show-function' '
     ++	cat >expected <<-EOF &&
     ++	hello.ps1=function hello() {
     ++	hello.ps1:  echo "Hello world."
     ++	hello.ps1:  echo "Hello again."
     ++	EOF
     ++	git grep --max-count 2 --show-function Hello hello.ps1 >actual &&
     ++	test_cmp expected actual
     ++'
     ++
     ++test_expect_success 'grep --max-count 1 --count' '
     ++	cat >expected <<-EOF &&
     ++	hello.ps1:1
     ++	EOF
     ++	git grep --max-count 1 --count Hello hello.ps1 >actual &&
     ++	test_cmp expected actual
     ++'
     ++
     ++test_expect_success 'grep --max-count 1 (multiple files)' '
     ++	cat >expected <<-EOF &&
     ++	hello.c:#include <stdio.h>
     ++	hello.ps1:# No-op.
     ++	EOF
     ++	git grep --max-count 1 -e o -- hello.\* >actual &&
     ++	test_cmp expected actual
     ++'
     ++
     ++test_expect_success 'grep --max-count 1 --context 1 (multiple files)' '
     ++	cat >expected <<-EOF &&
     ++	hello.c-#include <assert.h>
     ++	hello.c:#include <stdio.h>
     ++	hello.c-
     ++	--
     ++	hello.ps1:# No-op.
     ++	hello.ps1-function dummy() {}
     ++	EOF
     ++	git grep --max-count 1 --context 1 -e o -- hello.\* >actual &&
     ++	test_cmp expected actual
     ++'
     ++
     + cat >expected <<EOF
     + file:foo mmap bar_mmap
     + EOF
 2:  525958af877 < -:  ----------- tests: add tests for grep --max-count


 Documentation/git-grep.txt |  9 ++++
 builtin/grep.c             |  9 ++++
 grep.c                     |  2 +-
 grep.h                     |  2 +
 t/t7810-grep.sh            | 87 ++++++++++++++++++++++++++++++++++++++
 5 files changed, 108 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
index 3d393fbac1b..58d944bd578 100644
--- a/Documentation/git-grep.txt
+++ b/Documentation/git-grep.txt
@@ -23,6 +23,7 @@ SYNOPSIS
 	   [--break] [--heading] [-p | --show-function]
 	   [-A <post-context>] [-B <pre-context>] [-C <context>]
 	   [-W | --function-context]
+	   [(-m | --max-count) <num>]
 	   [--threads <num>]
 	   [-f <file>] [-e] <pattern>
 	   [--and|--or|--not|(|)|-e <pattern>...]
@@ -238,6 +239,14 @@ providing this option will cause it to die.
 	`git diff` works out patch hunk headers (see 'Defining a
 	custom hunk-header' in linkgit:gitattributes[5]).
 
+-m <num>::
+--max-count <num>::
+	Limit the amount of matches per file. When using the `-v` or
+	`--invert-match` option, the search stops after the specified
+	number of non-matches. A value of -1 will return unlimited
+	results (the default). A value of 0 will exit immediately with
+	a non-zero status.
+
 --threads <num>::
 	Number of grep worker threads to use.
 	See `grep.threads` in 'CONFIGURATION' for more information.
diff --git a/builtin/grep.c b/builtin/grep.c
index bcb07ea7f75..e6bcdf860cc 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -961,6 +961,8 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 		OPT_BOOL_F(0, "ext-grep", &external_grep_allowed__ignored,
 			   N_("allow calling of grep(1) (ignored by this build)"),
 			   PARSE_OPT_NOCOMPLETE),
+		OPT_INTEGER('m', "max-count", &opt.max_count,
+			N_("maximum number of results per file")),
 		OPT_END()
 	};
 	grep_prefix = prefix;
@@ -1101,6 +1103,13 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 	if (recurse_submodules && untracked)
 		die(_("--untracked not supported with --recurse-submodules"));
 
+	/*
+	 * Optimize out the case where the amount of matches is limited to zero.
+	 * We do this to keep results consistent with GNU grep(1).
+	 */
+	if (opt.max_count == 0)
+		return 1;
+
 	if (show_in_pager) {
 		if (num_threads > 1)
 			warning(_("invalid option combination, ignoring --threads"));
diff --git a/grep.c b/grep.c
index 82eb7da1022..52a894c9890 100644
--- a/grep.c
+++ b/grep.c
@@ -1615,7 +1615,7 @@ static int grep_source_1(struct grep_opt *opt, struct grep_source *gs, int colle
 				return 0;
 			goto next_line;
 		}
-		if (hit) {
+		if (hit && (opt->max_count < 0 || count < opt->max_count)) {
 			count++;
 			if (opt->status_only)
 				return 1;
diff --git a/grep.h b/grep.h
index c722d25ed9d..bdcadce61b8 100644
--- a/grep.h
+++ b/grep.h
@@ -171,6 +171,7 @@ struct grep_opt {
 	int show_hunk_mark;
 	int file_break;
 	int heading;
+	int max_count;
 	void *priv;
 
 	void (*output)(struct grep_opt *opt, const void *data, size_t size);
@@ -181,6 +182,7 @@ struct grep_opt {
 	.relative = 1, \
 	.pathname = 1, \
 	.max_depth = -1, \
+	.max_count = -1, \
 	.pattern_type_option = GREP_PATTERN_TYPE_UNSPECIFIED, \
 	.colors = { \
 		[GREP_COLOR_CONTEXT] = "", \
diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh
index 69356011713..0f937990a06 100755
--- a/t/t7810-grep.sh
+++ b/t/t7810-grep.sh
@@ -77,6 +77,7 @@ test_expect_success setup '
 	# Say hello.
 	function hello() {
 	  echo "Hello world."
+	  echo "Hello again."
 	} # hello
 
 	# Still a no-op.
@@ -595,6 +596,92 @@ test_expect_success 'grep --files-without-match --quiet' '
 	test_must_be_empty actual
 '
 
+test_expect_success 'grep --max-count 0 (must exit with non-zero)' '
+	test_must_fail git grep --max-count 0 foo >actual &&
+	test_must_be_empty actual
+'
+
+test_expect_success 'grep --max-count 3' '
+	cat >expected <<-EOF &&
+	file:foo mmap bar
+	file:foo_mmap bar
+	file:foo_mmap bar mmap
+	EOF
+	git grep --max-count 3 foo >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'grep --max-count -1 (no limit)' '
+	cat >expected <<-EOF &&
+	file:foo mmap bar
+	file:foo_mmap bar
+	file:foo_mmap bar mmap
+	file:foo mmap bar_mmap
+	file:foo_mmap bar mmap baz
+	EOF
+	git grep --max-count -1 foo >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'grep --max-count 1 --context 2' '
+	cat >expected <<-EOF &&
+	file-foo mmap bar
+	file:foo_mmap bar
+	file-foo_mmap bar mmap
+	EOF
+	git grep --max-count 1 --context 1 foo_mmap >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'grep --max-count 1 --show-function' '
+	cat >expected <<-EOF &&
+	hello.ps1=function hello() {
+	hello.ps1:  echo "Hello world."
+	EOF
+	git grep --max-count 1 --show-function Hello hello.ps1 >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'grep --max-count 2 --show-function' '
+	cat >expected <<-EOF &&
+	hello.ps1=function hello() {
+	hello.ps1:  echo "Hello world."
+	hello.ps1:  echo "Hello again."
+	EOF
+	git grep --max-count 2 --show-function Hello hello.ps1 >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'grep --max-count 1 --count' '
+	cat >expected <<-EOF &&
+	hello.ps1:1
+	EOF
+	git grep --max-count 1 --count Hello hello.ps1 >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'grep --max-count 1 (multiple files)' '
+	cat >expected <<-EOF &&
+	hello.c:#include <stdio.h>
+	hello.ps1:# No-op.
+	EOF
+	git grep --max-count 1 -e o -- hello.\* >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'grep --max-count 1 --context 1 (multiple files)' '
+	cat >expected <<-EOF &&
+	hello.c-#include <assert.h>
+	hello.c:#include <stdio.h>
+	hello.c-
+	--
+	hello.ps1:# No-op.
+	hello.ps1-function dummy() {}
+	EOF
+	git grep --max-count 1 --context 1 -e o -- hello.\* >actual &&
+	test_cmp expected actual
+'
+
 cat >expected <<EOF
 file:foo mmap bar_mmap
 EOF

base-commit: 5b71c59bc3b9365075e2a175aa7b6f2b0c84ce44
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2022-06-22 19:47 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-20 15:49 [PATCH] grep: add --max-count command line option Carlos L. via GitGitGadget
2022-06-20 15:57 ` Paul Eggert
2022-06-20 16:25   ` Carlos L.
2022-06-20 16:32     ` Paul Eggert
2022-06-21  5:36 ` [PATCH v2] " Carlos L. via GitGitGadget
2022-06-21 16:27   ` Junio C Hamano
2022-06-22  6:41     ` Carlos L.
2022-06-22  6:56       ` Junio C Hamano
     [not found]   ` <220622.86mte5knbe.gmgdl@evledraar.gmail.com>
2022-06-22 13:23     ` Carlos L.
2022-06-22 17:07   ` [PATCH v3 0/2] " Carlos L. via GitGitGadget
2022-06-22 17:07     ` [PATCH v3 1/2] " Carlos López via GitGitGadget
2022-06-22 17:07     ` [PATCH v3 2/2] tests: add tests for grep --max-count Carlos López via GitGitGadget
2022-06-22 18:10       ` Junio C Hamano
2022-06-22 19:47     ` [PATCH v4] grep: add --max-count command line option Carlos L. via GitGitGadget

Code repositories for project(s) associated with this inbox:

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).