git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH] grep: add --max-count command line option
@ 2022-05-12 13:20 Carlos L. via GitGitGadget
  2022-05-14 18:16 ` Martin Ågren
  2022-05-16  5:57 ` Junio C Hamano
  0 siblings, 2 replies; 22+ messages in thread
From: Carlos L. via GitGitGadget @ 2022-05-12 13:20 UTC (permalink / raw)
  To: git; +Cc: Carlos L., Carlos López

From: =?UTF-8?q?Carlos=20L=C3=B3pez?= <00xc@protonmail.com>

This patch adds a command line option analogous to that of GNU
grep(1)'s -m / --max-count, which users might already be used to.
This makes it possible to limit the amount of matches shown in the
output while keeping the functionality of other options such as -C
(show code context) or -p (show containing function), which would be
difficult to do with a shell pipeline (e.g. head(1)).

Signed-off-by: Carlos López <00xc@protonmail.com>
---
    grep: add --max-count command line option
    
    This patch adds a command line option analogous to that of GNU grep(1)'s
    -m / --max-count, which users might already be used to. This makes it
    possible to limit the amount of matches shown in the output while
    keeping the functionality of other options such as -C (show code
    context) or -p (show containing function), which would be difficult to
    do with a shell pipeline (e.g. head(1)).
    
    Signed-off-by: Carlos López 00xc@protonmail.com

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1264%2F00xc%2Fmaster-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1264/00xc/master-v1
Pull-Request: https://github.com/git/git/pull/1264

 Documentation/git-grep.txt | 7 +++++++
 builtin/grep.c             | 2 ++
 grep.c                     | 2 ++
 grep.h                     | 1 +
 4 files changed, 12 insertions(+)

diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
index 3d393fbac1b..02b36046475 100644
--- a/Documentation/git-grep.txt
+++ b/Documentation/git-grep.txt
@@ -23,6 +23,7 @@ SYNOPSIS
 	   [--break] [--heading] [-p | --show-function]
 	   [-A <post-context>] [-B <pre-context>] [-C <context>]
 	   [-W | --function-context]
+	   [-m | --max-count <num>]
 	   [--threads <num>]
 	   [-f <file>] [-e] <pattern>
 	   [--and|--or|--not|(|)|-e <pattern>...]
@@ -238,6 +239,12 @@ providing this option will cause it to die.
 	`git diff` works out patch hunk headers (see 'Defining a
 	custom hunk-header' in linkgit:gitattributes[5]).
 
+-m <num>::
+--max-count <num>::
+	Limit the amount of matches per file. When using the -v or
+	--invert-match option, the search stops after the specified
+	number of non-matches. Setting this option to 0 has no effect.
+
 --threads <num>::
 	Number of grep worker threads to use.
 	See `grep.threads` in 'CONFIGURATION' for more information.
diff --git a/builtin/grep.c b/builtin/grep.c
index bcb07ea7f75..ba1894d5675 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -961,6 +961,8 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 		OPT_BOOL_F(0, "ext-grep", &external_grep_allowed__ignored,
 			   N_("allow calling of grep(1) (ignored by this build)"),
 			   PARSE_OPT_NOCOMPLETE),
+		OPT_INTEGER('m', "max-count", &opt.max_count,
+			N_("maximum number of results per file (default: 0, no limit)")),
 		OPT_END()
 	};
 	grep_prefix = prefix;
diff --git a/grep.c b/grep.c
index 82eb7da1022..173b6c27b6e 100644
--- a/grep.c
+++ b/grep.c
@@ -1686,6 +1686,8 @@ static int grep_source_1(struct grep_opt *opt, struct grep_source *gs, int colle
 		bol = eol + 1;
 		if (!left)
 			break;
+		if (opt->max_count && count == opt->max_count)
+			break;
 		left--;
 		lno++;
 	}
diff --git a/grep.h b/grep.h
index c722d25ed9d..25836f34314 100644
--- a/grep.h
+++ b/grep.h
@@ -171,6 +171,7 @@ struct grep_opt {
 	int show_hunk_mark;
 	int file_break;
 	int heading;
+	unsigned max_count;
 	void *priv;
 
 	void (*output)(struct grep_opt *opt, const void *data, size_t size);

base-commit: 277cf0bc36094f6dc4297d8c9cef79df045b735d
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH] grep: add --max-count command line option
  2022-05-12 13:20 [PATCH] " Carlos L. via GitGitGadget
@ 2022-05-14 18:16 ` Martin Ågren
  2022-05-16  5:57 ` Junio C Hamano
  1 sibling, 0 replies; 22+ messages in thread
From: Martin Ågren @ 2022-05-14 18:16 UTC (permalink / raw)
  To: Carlos L. via GitGitGadget; +Cc: Git Mailing List, Carlos López

Hi Carlos,

Welcome to the mailing list. :-)

On Thu, 12 May 2022 at 21:13, Carlos L. via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: =?UTF-8?q?Carlos=20L=C3=B3pez?= <00xc@protonmail.com>
>
> This patch adds a command line option analogous to that of GNU
> grep(1)'s -m / --max-count, which users might already be used to.
> This makes it possible to limit the amount of matches shown in the
> output while keeping the functionality of other options such as -C
> (show code context) or -p (show containing function), which would be
> difficult to do with a shell pipeline (e.g. head(1)).

Makes sense to me.

> diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
> index 3d393fbac1b..02b36046475 100644
> --- a/Documentation/git-grep.txt
> +++ b/Documentation/git-grep.txt
> @@ -23,6 +23,7 @@ SYNOPSIS
>            [--break] [--heading] [-p | --show-function]
>            [-A <post-context>] [-B <pre-context>] [-C <context>]
>            [-W | --function-context]
> +          [-m | --max-count <num>]

I think this should be

    [(-m | --max-count) <num>]

since the short form "-m" also wants to take "<num>".

> +-m <num>::
> +--max-count <num>::
> +       Limit the amount of matches per file. When using the -v or
> +       --invert-match option, the search stops after the specified
> +       number of non-matches. Setting this option to 0 has no effect.

Please use `backticks` with `-v` and `--invert-match` so that they are
set in monospace.

Regarding the special value 0, it's a bit unclear what "has no effect"
means. In particular, it can have an effect in the sense that when it
is used like

  git grep -m 1 -m 0 foo

it undoes the `-m 1`.

But also, that's not how my grep(1) behaves: with `-m 0`, it limits the
number of matches to zero. I don't know how useful that is (can that
zero-case be optimized by exiting with 1 before even trying to find the
needle!?), or if maybe different variants of grep handle this
differently?  If all grep implementations handle 0 by actually only
emitting zero hits, I think it would be wise for us to handle 0 the same
way.

As for overriding an earlier `-m <foo>`, which could be useful, it seems
to me like `--no-max-count` would make sense.

All in all, I would suggest the following documentation:

    -m <num>::
    --max-count <num>::
           Limit the amount of matches per file. When using the `-v` or
           `--invert-match` option, the search stops after the specified
           number of non-matches. Use `--no-max-count` to countermand an
           earlier `--max-count` option on the command line.

... and of course the matching implementation. :-) Maybe you could
achieve that by using -1 to signal that there's no max-count in play?
How does that sound to you?

Even if we want to handle the zero just like you do, I think this patch
needs a few tests. We should make sure to test the 0-case (whatever we
end up wanting it to behave like), and probably the "suppress an earlier
-m by giving --no-max-count" case. It also seems wise to set up some
test scenario where there are several files involved so that we can see
that we don't just print the first m matches *globally*, but that the
counter is really handled *per file*.

I think this `-m` flag would be a nice addition. I know that I've been
missing something like it a few times. As you wrote in your commit
message, `| head -3` can work for some use-cases, but definitely not for
others. This `-m` is a lot more granular than `-l` which can be seen as
a crude `-m 1`. Thanks for posting this patch! I hope you find my
comments useful.

Martin

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] grep: add --max-count command line option
  2022-05-12 13:20 [PATCH] " Carlos L. via GitGitGadget
  2022-05-14 18:16 ` Martin Ågren
@ 2022-05-16  5:57 ` Junio C Hamano
  2022-05-16  7:28   ` Paul Eggert
  1 sibling, 1 reply; 22+ messages in thread
From: Junio C Hamano @ 2022-05-16  5:57 UTC (permalink / raw)
  To: Carlos L. via GitGitGadget; +Cc: git, Carlos L.

"Carlos L. via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: =?UTF-8?q?Carlos=20L=C3=B3pez?= <00xc@protonmail.com>

Offtopic, but I wonder why this line is encoded like so?  The
"Signed-off-by:" line is not, and it is safely transmitted, so
it feels like we do not need to encode the in-body header that
is added only for e-mail but not in the original commit...

> This patch adds a command line option analogous to that of GNU
> grep(1)'s -m / --max-count, which users might already be used to.
> This makes it possible to limit the amount of matches shown in the
> output while keeping the functionality of other options such as -C
> (show code context) or -p (show containing function), which would be
> difficult to do with a shell pipeline (e.g. head(1)).
>
> Signed-off-by: Carlos López <00xc@protonmail.com>
> ---
> ...
> +-m <num>::
> +--max-count <num>::
> +	Limit the amount of matches per file. When using the -v or
> +	--invert-match option, the search stops after the specified
> +	number of non-matches. Setting this option to 0 has no effect.
> +

Good thing that this is defined as "per-file" limit.  If it were a
global limit, the interaction between this one and "--threads=<num>"
would have been interesting.  Perhaps add a test to make sure the
feature continues to work with "--threads=2" (I am assuming that you
have already tested this implementation works with the option).

Martin already commented on the wording "no effect"; I agree it is a
poor choice of words from the point of view of "overriding with 0".

It indeed is curious why GNU grep chose to immediately exit with 1
when "-m 0" was given, but that was decision made more than 20 years
ago (http://gnu.ist.utl.pt/software/grep/changes.html and look for
"2000-03-17").  Between "being consistent even with a seemingly
useless design choice made by somebody else" and "choose to be
different in a corner case where nobody should care and allow us to
be more useful", I am slightly in favor in this particular case.

What "git grep -m -1" should do?  IIRC, OPT_INTEGER is for signed
integer but the new .max_count member, as well as the existing
"count" that is compared with it, are of "unsigned" type.  Either
erroring out or treating it as unlimited is probably fine, but
whatever we do, we should document and have a test for it.

Thanks.




^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] grep: add --max-count command line option
  2022-05-16  5:57 ` Junio C Hamano
@ 2022-05-16  7:28   ` Paul Eggert
  2022-05-16  8:38     ` Carlos L.
  2022-05-16 15:18     ` Junio C Hamano
  0 siblings, 2 replies; 22+ messages in thread
From: Paul Eggert @ 2022-05-16  7:28 UTC (permalink / raw)
  To: Junio C Hamano, Carlos L. via GitGitGadget
  Cc: git, Carlos L., GNU grep developers

On 5/15/22 22:57, Junio C Hamano wrote:

> It indeed is curious why GNU grep chose to immediately exit with 1
> when "-m 0" was given,

As I vaguely recall, if "-m 1" stops before "-m 2" does, then the idea 
was that it's reasonable for "-m 0" to stop before "-m 1" does, and the 
logical place to stop is right at the start, before any matches are 
found (i.e., exit with status 1).

What would be more useful for 'grep -m 0' to do? (Sorry, I came into 
this conversation just now.) Perhaps GNU 'grep -m 0' should change, if 
there's something better for it to do.


> What "git grep -m -1" should do?  IIRC, OPT_INTEGER is for signed
> integer but the new .max_count member, as well as the existing
> "count" that is compared with it, are of "unsigned" type.  Either
> erroring out or treating it as unlimited is probably fine, but
> whatever we do, we should document and have a test for it.

'grep -m -1' treats the count as being unlimited, but this isn't 
documented and (from the code) appears to be accidental. It'd make sense 
for it to be documented.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] grep: add --max-count command line option
  2022-05-16  7:28   ` Paul Eggert
@ 2022-05-16  8:38     ` Carlos L.
  2022-05-16 15:36       ` Junio C Hamano
  2022-05-16 15:18     ` Junio C Hamano
  1 sibling, 1 reply; 22+ messages in thread
From: Carlos L. @ 2022-05-16  8:38 UTC (permalink / raw)
  To: Paul Eggert
  Cc: Junio C Hamano, Carlos L. via GitGitGadget, git,
	GNU grep developers

Hi list,

Thanks to everyone who provided feedback :)

On Saturday, May 14th, 2022 at 20:16, Martin Ågren <martin.agren@gmail.com> wrote:
> I think this should be
>
> [(-m | --max-count) <num>]

> Please use `backticks` with `-v` and `--invert-match` so that they are
> set in monospace.

I will add these suggestions to my patch.

> Regarding the special value 0, it's a bit unclear what "has no effect"
> means. In particular, it can have an effect in the sense that when it
> is used like
>
> git grep -m 1 -m 0 foo
>
> it undoes the `-m 1`.
>
> But also, that's not how my grep(1) behaves: with `-m 0`, it limits the
> number of matches to zero. I don't know how useful that is (can that
> zero-case be optimized by exiting with 1 before even trying to find the
> needle!?), or if maybe different variants of grep handle this
> differently? If all grep implementations handle 0 by actually only
> emitting zero hits, I think it would be wise for us to handle 0 the same
> way.

I agree the wording is not clear. I did not see a good use case for GNU's `-m 0`, which is why I used that value as unlimited. I am not sold on using `--no-max-count` or -1 *just* for consistency, but if someone can point to a good use case of GNU's `-m 0` (especially in git grep), I will gladly concede.

> Even if we want to handle the zero just like you do, I think this patch
> needs a few tests. We should make sure to test the 0-case (whatever we
> end up wanting it to behave like), and probably the "suppress an earlier
> -m by giving --no-max-count" case. It also seems wise to set up some
> test scenario where there are several files involved so that we can see
> that we don't just print the first m matches globally, but that the
> counter is really handled per file.

This seems sound. Is there any documentation on how to write tests for git?

On Monday, May 16th, 2022 at 07:57, Junio C Hamano <gitster@pobox.com> wrote:
> Good thing that this is defined as "per-file" limit. If it were a
> global limit, the interaction between this one and "--threads=<num>"
> would have been interesting. Perhaps add a test to make sure the
> feature continues to work with "--threads=2" (I am assuming that you
> have already tested this implementation works with the option).

I did and I found no unexpected behavior.

> What "git grep -m -1" should do? IIRC, OPT_INTEGER is for signed
> integer but the new .max_count member, as well as the existing
> "count" that is compared with it, are of "unsigned" type. Either
> erroring out or treating it as unlimited is probably fine, but
> whatever we do, we should document and have a test for it.

I would favor treating it as an error. As mentioned above, using 0 to describe "unlimited matches" (e.g. the default) is my preference, but I am willing to concede if someone can think of a good use for `-m 0`. Also, from the implementation side (although not as important) it looks better: if we allow negative values, we need to distinguish between -1 (unlimited) and -4 (display error to user, probably) - the patch is much simpler right now. And just as a side note, we avoid an issue in the pretty much insignificant use case of giving a very big value (UINT_MAX) for `-m` and it overflowing into -1, thus not properly limiting the number of matches.

On Monday, May 16th, 2022 at 09:28, Paul Eggert <eggert@cs.ucla.edu> wrote:
> As I vaguely recall, if "-m 1" stops before "-m 2" does, then the idea
> was that it's reasonable for "-m 0" to stop before "-m 1" does, and the
> logical place to stop is right at the start, before any matches are
> found (i.e., exit with status 1).

As I mentioned above, I do not see what this `-m 0` behavior is useful for, but if someone could show me an use for it I would appreciate it.

Again, thank you everyone for your comments.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] grep: add --max-count command line option
  2022-05-16  7:28   ` Paul Eggert
  2022-05-16  8:38     ` Carlos L.
@ 2022-05-16 15:18     ` Junio C Hamano
  1 sibling, 0 replies; 22+ messages in thread
From: Junio C Hamano @ 2022-05-16 15:18 UTC (permalink / raw)
  To: Paul Eggert
  Cc: Carlos L. via GitGitGadget, git, Carlos L., GNU grep developers

Paul Eggert <eggert@cs.ucla.edu> writes:

> On 5/15/22 22:57, Junio C Hamano wrote:
>
>> It indeed is curious why GNU grep chose to immediately exit with 1
>> when "-m 0" was given,
>
> As I vaguely recall, if "-m 1" stops before "-m 2" does, then the idea
> was that it's reasonable for "-m 0" to stop before "-m 1" does, and
> the logical place to stop is right at the start, before any matches
> are found (i.e., exit with status 1).
>
> What would be more useful for 'grep -m 0' to do? (Sorry, I came into
> this conversation just now.) Perhaps GNU 'grep -m 0' should change, if 
> there's something better for it to do.

"grep -m 0" that declares a failure upfront because it is asked to
stop before finding any match, combined with the fact that the
command is expected to signal a failure after finding no matches, is
an optimization that is mathmatically correct ;-)

It was asked as a part of discussion on a proposed patch to teach
the same "-m <max-number-of-hits>" option to "git grep" what it
ought to mean to give "-m 0".  As we are too accustomed to the "last
command line option wins" behaviour, I initially did not find the
behaviour of the proposed patch, where 0 (or negative) stood for
"unlimited", quite natural and useful (e.g. it allows overriding a
hardcoded default option in aliases, "[alias] gg = grep -m 4"), and
then was surprised by the "'-m 0' is an immediate failure" in GNU
grep.  I would call it mathematically pure and correct but of
dubious utility.

Sorry for not providing enough context.  Full discussion is seen at 
https://lore.kernel.org/git/pull.1264.git.git.1652361610103.gitgitgadget@gmail.com/

>> What "git grep -m -1" should do?  IIRC, OPT_INTEGER is for signed
>> integer but the new .max_count member, as well as the existing
>> "count" that is compared with it, are of "unsigned" type.  Either
>> erroring out or treating it as unlimited is probably fine, but
>> whatever we do, we should document and have a test for it.
>
> 'grep -m -1' treats the count as being unlimited, but this isn't
> documented and (from the code) appears to be accidental. It'd make
> sense for it to be documented.

Thanks.  The question was asked for the proposed addition to "git
grep", but it is funny to see it apply equally well to GNU grep ;-).


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] grep: add --max-count command line option
  2022-05-16  8:38     ` Carlos L.
@ 2022-05-16 15:36       ` Junio C Hamano
  2022-05-17  5:53         ` Paul Eggert
  0 siblings, 1 reply; 22+ messages in thread
From: Junio C Hamano @ 2022-05-16 15:36 UTC (permalink / raw)
  To: Carlos L.
  Cc: Paul Eggert, Carlos L. via GitGitGadget, git, GNU grep developers

"Carlos L." <00xc@protonmail.com> writes:

>> Even if we want to handle the zero just like you do, I think this patch
>> needs a few tests. We should make sure to test the 0-case (whatever we
>> end up wanting it to behave like), and probably the "suppress an earlier
>> -m by giving --no-max-count" case. It also seems wise to set up some
>> test scenario where there are several files involved so that we can see
>> that we don't just print the first m matches globally, but that the
>> counter is really handled per file.
>
> This seems sound. Is there any documentation on how to write tests for git?

t/README and Documentation/MyFirstContribution would be two good
places to start.

>> What "git grep -m -1" should do? IIRC, OPT_INTEGER is for signed
>> integer but the new .max_count member, as well as the existing
>> "count" that is compared with it, are of "unsigned" type. Either
>> erroring out or treating it as unlimited is probably fine, but
>> whatever we do, we should document and have a test for it.
>
> I would favor treating it as an error. As mentioned above, using 0
> to describe "unlimited matches" (e.g. the default) is my
> preference, but I am willing to concede if someone can think of a
> good use for `-m 0`.

With Devil's advocate hat on.

"GNU grep has been doing so for the past 20 years and existing users
of the command expects '-m 0' to behave that way" is a good enough
reason, especially if '-m 0' is not the only possible way to say
"unlimited".

> Also, from the implementation side (although
> not as important) it looks better: if we allow negative values, we
> need to distinguish between -1 (unlimited) and -4 (display error
> to user, probably)

If we are going to document "you can pass a negative value to
explicitly say 'unlimited', which is a useful way to countermand
another `-m <num>` that appear earlier on the command line", then -1
and -4 would equally be 'unlimited' and there is no need to
distinguish anything.

Devil's advocate hat off.

I personally do not mind if "-m <non-positive>" means "unlimited",
as long as that is clearly documented and tested, but for long time
"GNU grep" users "-m 0" might appear surprising (not necessarily
because they would find that the "-m 0" that immediately fails is
useful, but because the behaviour is deliberately made different).

Thanks.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] grep: add --max-count command line option
  2022-05-16 15:36       ` Junio C Hamano
@ 2022-05-17  5:53         ` Paul Eggert
  0 siblings, 0 replies; 22+ messages in thread
From: Paul Eggert @ 2022-05-17  5:53 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Carlos L. via GitGitGadget, git, GNU grep developers, Carlos L.

[-- Attachment #1: Type: text/plain, Size: 736 bytes --]

On 5/16/22 08:36, Junio C Hamano wrote:
> "GNU grep has been doing so for the past 20 years and existing users
> of the command expects '-m 0' to behave that way" is a good enough
> reason, especially if '-m 0' is not the only possible way to say
> "unlimited".

Yes, I'm inclined in the same direction, now that I see more of the 
context. That is, GNU grep can continue what it's long been doing, with 
the only change being to the documentation so that we document -m-1 as 
meaning "unlimited". This minimizes possible disruption to existing 
scripts and satisfies the use case of having a way to turn off any 
previously-appearing -m option.

I installed the attached to the GNU grep master doc to do that. Hope 
this works for you.

[-- Attachment #2: 0001-grep-document-m-better.patch --]
[-- Type: text/x-patch, Size: 1544 bytes --]

From 2deca89cf0c7a99450f88cf0abfadd336511633f Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Mon, 16 May 2022 12:18:26 -0700
Subject: [PATCH] grep: document -m better

* doc/grep.in.1, doc/grep.texi: Document behavior of -m 0 and -m -1.
This documents longstanding behavior, and is consistent with
how git grep -m will likely behave.
---
 doc/grep.in.1 | 10 ++++++++++
 doc/grep.texi |  4 ++++
 2 files changed, 14 insertions(+)

diff --git a/doc/grep.in.1 b/doc/grep.in.1
index aba085a..5ba90ee 100644
--- a/doc/grep.in.1
+++ b/doc/grep.in.1
@@ -321,6 +321,16 @@ Scanning each input file stops upon first match.
 Stop reading a file after
 .I NUM
 matching lines.
+If
+.I NUM
+is zero,
+.B grep
+stops right away without reading input.
+A
+.I NUM
+of \-1 is treated as infinity and
+.B grep
+does not stop; this is the default.
 If the input is standard input from a regular file,
 and
 .I NUM
diff --git a/doc/grep.texi b/doc/grep.texi
index b9688c8..b073fa7 100644
--- a/doc/grep.texi
+++ b/doc/grep.texi
@@ -341,6 +341,10 @@ Scanning each input file stops upon first match.
 @opindex --max-count
 @cindex max-count
 Stop after the first @var{num} selected lines.
+If @var{num} is zero, @command{grep} stops right away without reading input.
+A @var{num} of @minus{}1 is treated as infinity and @command{grep}
+does not stop; this is the default.
+
 If the input is standard input from a regular file,
 and @var{num} selected lines are output,
 @command{grep} ensures that the standard input is positioned
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH] grep: add --max-count command line option
@ 2022-06-20 15:49 Carlos L. via GitGitGadget
  2022-06-20 15:57 ` Paul Eggert
  2022-06-21  5:36 ` [PATCH v2] " Carlos L. via GitGitGadget
  0 siblings, 2 replies; 22+ messages in thread
From: Carlos L. via GitGitGadget @ 2022-06-20 15:49 UTC (permalink / raw)
  To: git; +Cc: Martin Ågren [ ], Paul Eggert [ ], Carlos L.,
	Carlos López

From: =?UTF-8?q?Carlos=20L=C3=B3pez?= <00xc@protonmail.com>

This patch adds a command line option analogous to that of GNU
grep(1)'s -m / --max-count, which users might already be used to.
This makes it possible to limit the amount of matches shown in the
output while keeping the functionality of other options such as -C
(show code context) or -p (show containing function), which would be
difficult to do with a shell pipeline (e.g. head(1)).

Signed-off-by: Carlos López 00xc@protonmail.com
---
    grep: add --max-count command line option
    
    This patch adds a command line option analogous to that of GNU grep(1)'s
    -m / --max-count, which users might already be used to. This makes it
    possible to limit the amount of matches shown in the output while
    keeping the functionality of other options such as -C (show code
    context) or -p (show containing function), which would be difficult to
    do with a shell pipeline (e.g. head(1)).
    
    Signed-off-by: Carlos López 00xc@protonmail.com

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1278%2F00xc%2Fmaster-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1278/00xc/master-v1
Pull-Request: https://github.com/git/git/pull/1278

 Documentation/git-grep.txt | 8 ++++++++
 builtin/grep.c             | 9 +++++++++
 grep.c                     | 2 ++
 grep.h                     | 2 ++
 4 files changed, 21 insertions(+)

diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
index 3d393fbac1b..19b817d5e58 100644
--- a/Documentation/git-grep.txt
+++ b/Documentation/git-grep.txt
@@ -23,6 +23,7 @@ SYNOPSIS
 	   [--break] [--heading] [-p | --show-function]
 	   [-A <post-context>] [-B <pre-context>] [-C <context>]
 	   [-W | --function-context]
+	   [(-m | --max-count) <num>]
 	   [--threads <num>]
 	   [-f <file>] [-e] <pattern>
 	   [--and|--or|--not|(|)|-e <pattern>...]
@@ -238,6 +239,13 @@ providing this option will cause it to die.
 	`git diff` works out patch hunk headers (see 'Defining a
 	custom hunk-header' in linkgit:gitattributes[5]).
 
+-m <num>::
+--max-count <num>::
+	Limit the amount of matches per file. When using the `-v` or
+	`--invert-match` option, the search stops after the specified
+	number of non-matches. A value of -1 will return unlimited
+	results (the default).
+
 --threads <num>::
 	Number of grep worker threads to use.
 	See `grep.threads` in 'CONFIGURATION' for more information.
diff --git a/builtin/grep.c b/builtin/grep.c
index bcb07ea7f75..4ab28995da0 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -961,6 +961,8 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 		OPT_BOOL_F(0, "ext-grep", &external_grep_allowed__ignored,
 			   N_("allow calling of grep(1) (ignored by this build)"),
 			   PARSE_OPT_NOCOMPLETE),
+		OPT_INTEGER('m', "max-count", &opt.max_count,
+			N_("maximum number of results per file")),
 		OPT_END()
 	};
 	grep_prefix = prefix;
@@ -1101,6 +1103,13 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 	if (recurse_submodules && untracked)
 		die(_("--untracked not supported with --recurse-submodules"));
 
+	/*
+	 * Optimize out the case where the amount of matches is limited to zero.
+	 * We do this to keep results consistent with GNU grep(1).
+	 */
+	if (opt.max_count == 0)
+		exit(EXIT_FAILURE);
+
 	if (show_in_pager) {
 		if (num_threads > 1)
 			warning(_("invalid option combination, ignoring --threads"));
diff --git a/grep.c b/grep.c
index 82eb7da1022..a010f9f4132 100644
--- a/grep.c
+++ b/grep.c
@@ -1686,6 +1686,8 @@ static int grep_source_1(struct grep_opt *opt, struct grep_source *gs, int colle
 		bol = eol + 1;
 		if (!left)
 			break;
+		if (opt->max_count != (unsigned)-1 && count == opt->max_count)
+			break;
 		left--;
 		lno++;
 	}
diff --git a/grep.h b/grep.h
index c722d25ed9d..218585a8679 100644
--- a/grep.h
+++ b/grep.h
@@ -171,6 +171,7 @@ struct grep_opt {
 	int show_hunk_mark;
 	int file_break;
 	int heading;
+	unsigned max_count;
 	void *priv;
 
 	void (*output)(struct grep_opt *opt, const void *data, size_t size);
@@ -181,6 +182,7 @@ struct grep_opt {
 	.relative = 1, \
 	.pathname = 1, \
 	.max_depth = -1, \
+	.max_count = (unsigned)-1, \
 	.pattern_type_option = GREP_PATTERN_TYPE_UNSPECIFIED, \
 	.colors = { \
 		[GREP_COLOR_CONTEXT] = "", \

base-commit: 5b71c59bc3b9365075e2a175aa7b6f2b0c84ce44
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH] grep: add --max-count command line option
  2022-06-20 15:49 [PATCH] grep: add --max-count command line option Carlos L. via GitGitGadget
@ 2022-06-20 15:57 ` Paul Eggert
  2022-06-20 16:25   ` Carlos L.
  2022-06-21  5:36 ` [PATCH v2] " Carlos L. via GitGitGadget
  1 sibling, 1 reply; 22+ messages in thread
From: Paul Eggert @ 2022-06-20 15:57 UTC (permalink / raw)
  To: Carlos L. via GitGitGadget; +Cc: Martin Ågren [ ], Carlos L., git

On 6/20/22 10:49, Carlos L. via GitGitGadget wrote:
> +	unsigned max_count;

Why not make this intmax_t? That way, you don't have to worry about 
casting -1 to unsigned. Also on typical 64-bit machines you no longer 
have to worry about mishandling counts greater than 2**32 (the limit 
becomes 2**63 - 1 which is plenty).

These days it's typically better to avoid unsigned types in C when you 
can, as standard tools like 'gcc -fsanitize=undefined' can catch signed 
int overflow whereas unsigned int overflow always wraps around which is 
typically bad news.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] grep: add --max-count command line option
  2022-06-20 15:57 ` Paul Eggert
@ 2022-06-20 16:25   ` Carlos L.
  2022-06-20 16:32     ` Paul Eggert
  0 siblings, 1 reply; 22+ messages in thread
From: Carlos L. @ 2022-06-20 16:25 UTC (permalink / raw)
  To: Paul Eggert; +Cc: Carlos L. via GitGitGadget, Martin Ågren [ ], git

Hi,

On Monday, June 20th, 2022 at 17:57, Paul Eggert <eggert@cs.ucla.edu> wrote:

> On 6/20/22 10:49, Carlos L. via GitGitGadget wrote:
>
> > + unsigned max_count;
>
>
> Why not make this intmax_t? That way, you don't have to worry about
> casting -1 to unsigned. Also on typical 64-bit machines you no longer
> have to worry about mishandling counts greater than 232 (the limit
> becomes 263 - 1 which is plenty).

This does not work well with OPTION_INTEGER, since it assumes the value to be int-sized:

parse-options.c:
 219             *(int *)opt->value = strtol(arg, (char **)&s, 10);

I also wanted to avoid using signed int so both sides of the comparison with `count` in grep_source_1() have the same sign.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] grep: add --max-count command line option
  2022-06-20 16:25   ` Carlos L.
@ 2022-06-20 16:32     ` Paul Eggert
  0 siblings, 0 replies; 22+ messages in thread
From: Paul Eggert @ 2022-06-20 16:32 UTC (permalink / raw)
  To: Carlos L.; +Cc: Carlos L. via GitGitGadget, Martin Ågren [ ], git

On 6/20/22 11:25, Carlos L. wrote:
> This does not work well with OPTION_INTEGER, since it assumes the value to be int-sized:
>
> parse-options.c:
>   219             *(int *)opt->value = strtol(arg, (char **)&s, 10);

OK, so parse-options messes up if the user specifies a count that does 
not fit in 'int'? Although that's a separate bug, let's not make things 
worse here; let's make the new count an 'int'.

In the long run parse-options should be changed to use strtoimax instead 
of strtol, and the corresponding integers should be changed to intmax_t, 
and the proper thing should be done if the string value does not fit 
into intmax_t. But this longer-run fix affects all integer-valued 
options, not just this one.


> I also wanted to avoid using signed int so both sides of the comparison with `count` in grep_source_1() have the same sign.

Such comparisons cannot misfire if both values are nonnegative, and that 
can easily be arranged here.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v2] grep: add --max-count command line option
  2022-06-20 15:49 [PATCH] grep: add --max-count command line option Carlos L. via GitGitGadget
  2022-06-20 15:57 ` Paul Eggert
@ 2022-06-21  5:36 ` Carlos L. via GitGitGadget
  2022-06-21 16:27   ` Junio C Hamano
                     ` (2 more replies)
  1 sibling, 3 replies; 22+ messages in thread
From: Carlos L. via GitGitGadget @ 2022-06-21  5:36 UTC (permalink / raw)
  To: git; +Cc: Martin Ågren [ ], Paul Eggert [ ], Carlos L.,
	Carlos López

From: =?UTF-8?q?Carlos=20L=C3=B3pez?= <00xc@protonmail.com>

This patch adds a command line option analogous to that of GNU
grep(1)'s -m / --max-count, which users might already be used to.
This makes it possible to limit the amount of matches shown in the
output while keeping the functionality of other options such as -C
(show code context) or -p (show containing function), which would be
difficult to do with a shell pipeline (e.g. head(1)).

Signed-off-by: Carlos López 00xc@protonmail.com
---
    grep: add --max-count command line option
    
    This patch adds a command line option analogous to that of GNU grep(1)'s
    -m / --max-count, which users might already be used to. This makes it
    possible to limit the amount of matches shown in the output while
    keeping the functionality of other options such as -C (show code
    context) or -p (show containing function), which would be difficult to
    do with a shell pipeline (e.g. head(1)).
    
    Signed-off-by: Carlos López 00xc@protonmail.com

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1278%2F00xc%2Fmaster-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1278/00xc/master-v2
Pull-Request: https://github.com/git/git/pull/1278

Range-diff vs v1:

 1:  f89c6e244aa ! 1:  ee7eb298854 grep: add --max-count command line option
     @@ grep.c: static int grep_source_1(struct grep_opt *opt, struct grep_source *gs, i
       		bol = eol + 1;
       		if (!left)
       			break;
     -+		if (opt->max_count != (unsigned)-1 && count == opt->max_count)
     ++		if (opt->max_count != -1 && count == opt->max_count)
      +			break;
       		left--;
       		lno++;
     @@ grep.h: struct grep_opt {
       	int show_hunk_mark;
       	int file_break;
       	int heading;
     -+	unsigned max_count;
     ++	int max_count;
       	void *priv;
       
       	void (*output)(struct grep_opt *opt, const void *data, size_t size);
     @@ grep.h: struct grep_opt {
       	.relative = 1, \
       	.pathname = 1, \
       	.max_depth = -1, \
     -+	.max_count = (unsigned)-1, \
     ++	.max_count = -1, \
       	.pattern_type_option = GREP_PATTERN_TYPE_UNSPECIFIED, \
       	.colors = { \
       		[GREP_COLOR_CONTEXT] = "", \


 Documentation/git-grep.txt | 8 ++++++++
 builtin/grep.c             | 9 +++++++++
 grep.c                     | 2 ++
 grep.h                     | 2 ++
 4 files changed, 21 insertions(+)

diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
index 3d393fbac1b..19b817d5e58 100644
--- a/Documentation/git-grep.txt
+++ b/Documentation/git-grep.txt
@@ -23,6 +23,7 @@ SYNOPSIS
 	   [--break] [--heading] [-p | --show-function]
 	   [-A <post-context>] [-B <pre-context>] [-C <context>]
 	   [-W | --function-context]
+	   [(-m | --max-count) <num>]
 	   [--threads <num>]
 	   [-f <file>] [-e] <pattern>
 	   [--and|--or|--not|(|)|-e <pattern>...]
@@ -238,6 +239,13 @@ providing this option will cause it to die.
 	`git diff` works out patch hunk headers (see 'Defining a
 	custom hunk-header' in linkgit:gitattributes[5]).
 
+-m <num>::
+--max-count <num>::
+	Limit the amount of matches per file. When using the `-v` or
+	`--invert-match` option, the search stops after the specified
+	number of non-matches. A value of -1 will return unlimited
+	results (the default).
+
 --threads <num>::
 	Number of grep worker threads to use.
 	See `grep.threads` in 'CONFIGURATION' for more information.
diff --git a/builtin/grep.c b/builtin/grep.c
index bcb07ea7f75..4ab28995da0 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -961,6 +961,8 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 		OPT_BOOL_F(0, "ext-grep", &external_grep_allowed__ignored,
 			   N_("allow calling of grep(1) (ignored by this build)"),
 			   PARSE_OPT_NOCOMPLETE),
+		OPT_INTEGER('m', "max-count", &opt.max_count,
+			N_("maximum number of results per file")),
 		OPT_END()
 	};
 	grep_prefix = prefix;
@@ -1101,6 +1103,13 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 	if (recurse_submodules && untracked)
 		die(_("--untracked not supported with --recurse-submodules"));
 
+	/*
+	 * Optimize out the case where the amount of matches is limited to zero.
+	 * We do this to keep results consistent with GNU grep(1).
+	 */
+	if (opt.max_count == 0)
+		exit(EXIT_FAILURE);
+
 	if (show_in_pager) {
 		if (num_threads > 1)
 			warning(_("invalid option combination, ignoring --threads"));
diff --git a/grep.c b/grep.c
index 82eb7da1022..b32ab75cb6b 100644
--- a/grep.c
+++ b/grep.c
@@ -1686,6 +1686,8 @@ static int grep_source_1(struct grep_opt *opt, struct grep_source *gs, int colle
 		bol = eol + 1;
 		if (!left)
 			break;
+		if (opt->max_count != -1 && count == opt->max_count)
+			break;
 		left--;
 		lno++;
 	}
diff --git a/grep.h b/grep.h
index c722d25ed9d..bdcadce61b8 100644
--- a/grep.h
+++ b/grep.h
@@ -171,6 +171,7 @@ struct grep_opt {
 	int show_hunk_mark;
 	int file_break;
 	int heading;
+	int max_count;
 	void *priv;
 
 	void (*output)(struct grep_opt *opt, const void *data, size_t size);
@@ -181,6 +182,7 @@ struct grep_opt {
 	.relative = 1, \
 	.pathname = 1, \
 	.max_depth = -1, \
+	.max_count = -1, \
 	.pattern_type_option = GREP_PATTERN_TYPE_UNSPECIFIED, \
 	.colors = { \
 		[GREP_COLOR_CONTEXT] = "", \

base-commit: 5b71c59bc3b9365075e2a175aa7b6f2b0c84ce44
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH v2] grep: add --max-count command line option
  2022-06-21  5:36 ` [PATCH v2] " Carlos L. via GitGitGadget
@ 2022-06-21 16:27   ` Junio C Hamano
  2022-06-22  6:41     ` Carlos L.
       [not found]   ` <220622.86mte5knbe.gmgdl@evledraar.gmail.com>
  2022-06-22 17:07   ` [PATCH v3 0/2] " Carlos L. via GitGitGadget
  2 siblings, 1 reply; 22+ messages in thread
From: Junio C Hamano @ 2022-06-21 16:27 UTC (permalink / raw)
  To: Carlos L. via GitGitGadget
  Cc: git, Martin Ågren [ ], Paul Eggert [ ], Carlos L.

"Carlos L. via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: =?UTF-8?q?Carlos=20L=C3=B3pez?= <00xc@protonmail.com>
>
> This patch adds a command line option analogous to that of GNU
> grep(1)'s -m / --max-count, which users might already be used to.
> This makes it possible to limit the amount of matches shown in the
> output while keeping the functionality of other options such as -C
> (show code context) or -p (show containing function), which would be
> difficult to do with a shell pipeline (e.g. head(1)).
>
> Signed-off-by: Carlos López 00xc@protonmail.com
> ---
> ...
>  Documentation/git-grep.txt | 8 ++++++++
>  builtin/grep.c             | 9 +++++++++
>  grep.c                     | 2 ++
>  grep.h                     | 2 ++
>  4 files changed, 21 insertions(+)

Tests?

> diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
> index 3d393fbac1b..19b817d5e58 100644
> --- a/Documentation/git-grep.txt
> +++ b/Documentation/git-grep.txt
> @@ -23,6 +23,7 @@ SYNOPSIS
>  	   [--break] [--heading] [-p | --show-function]
>  	   [-A <post-context>] [-B <pre-context>] [-C <context>]
>  	   [-W | --function-context]
> +	   [(-m | --max-count) <num>]
>  	   [--threads <num>]
>  	   [-f <file>] [-e] <pattern>
>  	   [--and|--or|--not|(|)|-e <pattern>...]
> @@ -238,6 +239,13 @@ providing this option will cause it to die.
>  	`git diff` works out patch hunk headers (see 'Defining a
>  	custom hunk-header' in linkgit:gitattributes[5]).
>  
> +-m <num>::
> +--max-count <num>::
> +	Limit the amount of matches per file. When using the `-v` or
> +	`--invert-match` option, the search stops after the specified
> +	number of non-matches. A value of -1 will return unlimited
> +	results (the default).

Hmph ...

> +	/*
> +	 * Optimize out the case where the amount of matches is limited to zero.
> +	 * We do this to keep results consistent with GNU grep(1).
> +	 */
> +	if (opt.max_count == 0)
> +		exit(EXIT_FAILURE);
> +

OK, so "stop before seeing any match" logically leads to "we found
nothing, so exit with non-zero".

> diff --git a/grep.c b/grep.c
> index 82eb7da1022..b32ab75cb6b 100644
> --- a/grep.c
> +++ b/grep.c
> @@ -1686,6 +1686,8 @@ static int grep_source_1(struct grep_opt *opt, struct grep_source *gs, int colle
>  		bol = eol + 1;
>  		if (!left)
>  			break;
> +		if (opt->max_count != -1 && count == opt->max_count)
> +			break;

I would have written it "if (0 <= opt->max_count && ...)".  What
happens when a trickster asks you to do "git grep -m -2"?

I guess what I am getting at is if we are better off saying that
negative means unlimited, instead of special casing -1 like this.  I
didn't think it through so it may be perfectly possible that what
you wrote makes more sense than "anything negative is unlimited".

I dunno.

>  		left--;
>  		lno++;
>  	}

Thanks.

> diff --git a/grep.h b/grep.h
> index c722d25ed9d..bdcadce61b8 100644
> --- a/grep.h
> +++ b/grep.h
> @@ -171,6 +171,7 @@ struct grep_opt {
>  	int show_hunk_mark;
>  	int file_break;
>  	int heading;
> +	int max_count;
>  	void *priv;
>  
>  	void (*output)(struct grep_opt *opt, const void *data, size_t size);
> @@ -181,6 +182,7 @@ struct grep_opt {
>  	.relative = 1, \
>  	.pathname = 1, \
>  	.max_depth = -1, \
> +	.max_count = -1, \
>  	.pattern_type_option = GREP_PATTERN_TYPE_UNSPECIFIED, \
>  	.colors = { \
>  		[GREP_COLOR_CONTEXT] = "", \
>
> base-commit: 5b71c59bc3b9365075e2a175aa7b6f2b0c84ce44

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2] grep: add --max-count command line option
  2022-06-21 16:27   ` Junio C Hamano
@ 2022-06-22  6:41     ` Carlos L.
  2022-06-22  6:56       ` Junio C Hamano
  0 siblings, 1 reply; 22+ messages in thread
From: Carlos L. @ 2022-06-22  6:41 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Carlos L. via GitGitGadget, git, Martin Ågren [ ],
	Paul Eggert [ ]

Hi,

Just a couple of questions.

On Tuesday, June 21st, 2022 at 18:27, Junio C Hamano <gitster@pobox.com> wrote:

> "Carlos L. via GitGitGadget" gitgitgadget@gmail.com writes:
>
> > From: =?UTF-8?q?Carlos=20L=C3=B3pez?= 00xc@protonmail.com
> >
> > This patch adds a command line option analogous to that of GNU
> > grep(1)'s -m / --max-count, which users might already be used to.
> > This makes it possible to limit the amount of matches shown in the
> > output while keeping the functionality of other options such as -C
> > (show code context) or -p (show containing function), which would be
> > difficult to do with a shell pipeline (e.g. head(1)).
> >
> > Signed-off-by: Carlos López 00xc@protonmail.com
> > ---
> > ...
> > Documentation/git-grep.txt | 8 ++++++++
> > builtin/grep.c | 9 +++++++++
> > grep.c | 2 ++
> > grep.h | 2 ++
> > 4 files changed, 21 insertions(+)
>
>
> Tests?

Right. Is it OK if I include my test(s) in t/t7810-grep.sh, or should it be a different/new file?

> > diff --git a/grep.c b/grep.c
> > index 82eb7da1022..b32ab75cb6b 100644
> > --- a/grep.c
> > +++ b/grep.c
> > @@ -1686,6 +1686,8 @@ static int grep_source_1(struct grep_opt *opt, struct grep_source *gs, int colle
> > bol = eol + 1;
> > if (!left)
> > break;
> > + if (opt->max_count != -1 && count == opt->max_count)
> > + break;
>
>
> I would have written it "if (0 <= opt->max_count && ...)". What
>
> happens when a trickster asks you to do "git grep -m -2"?

Fair enough. Since it's already optimized out above, is there any reason we need to include zero (<=)?

> I guess what I am getting at is if we are better off saying that
> negative means unlimited, instead of special casing -1 like this. I
> didn't think it through so it may be perfectly possible that what
> you wrote makes more sense than "anything negative is unlimited".
>
> I dunno.

I think you're right, I'll adjust my patch.

Best,
Carlos

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2] grep: add --max-count command line option
  2022-06-22  6:41     ` Carlos L.
@ 2022-06-22  6:56       ` Junio C Hamano
  0 siblings, 0 replies; 22+ messages in thread
From: Junio C Hamano @ 2022-06-22  6:56 UTC (permalink / raw)
  To: Carlos L.
  Cc: Carlos L. via GitGitGadget, git, Martin Ågren [ ],
	Paul Eggert [ ]

"Carlos L." <00xc@protonmail.com> writes:

>> Tests?
>
> Right. Is it OK if I include my test(s) in t/t7810-grep.sh, or
> should it be a different/new file?

It is preferrable to add new tests to existing scripts, rather than
adding a new (and short) one.

Thanks.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2] grep: add --max-count command line option
       [not found]   ` <220622.86mte5knbe.gmgdl@evledraar.gmail.com>
@ 2022-06-22 13:23     ` Carlos L.
  0 siblings, 0 replies; 22+ messages in thread
From: Carlos L. @ 2022-06-22 13:23 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Carlos L. via GitGitGadget, git, Martin Ågren [ ],
	Paul Eggert [ ]

Hi,

On Wednesday, June 22nd, 2022 at 12:22, Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:

> On Tue, Jun 21 2022, Carlos L. via GitGitGadget wrote:
>
> > From: =?UTF-8?q?Carlos=20L=C3=B3pez?= 00xc@protonmail.com
> >
> > This patch adds a command line option analogous to that of GNU
> > grep(1)'s -m / --max-count, which users might already be used to.
>
>
> Thanks, this seems useful.
>
> > This makes it possible to limit the amount of matches shown in the
> > output while keeping the functionality of other options such as -C
> > (show code context) or -p (show containing function), which would be
> > difficult to do with a shell pipeline (e.g. head(1)).
>
>
> We start multi-threaded grep workers, how does this code handle races
> between them finding things, this count being incremented, and the "do
> we have sufficient results?" check?
>
> Is it guarded by the relevant mutexes?

AFAICT only a single thread runs on each file via grep_source_1(), and we check `count`, which is local to this function.

> > + /*
> > + * Optimize out the case where the amount of matches is limited to zero.
> > + * We do this to keep results consistent with GNU grep(1).
> > + */
> > + if (opt.max_count == 0)
> > + exit(EXIT_FAILURE);
>
>
> Don't use exit() in cmd_grep(), you should use "return 1".

I'll use return in my follow-up patch, this can be improved afterwards.

> But even better use usage_msg_opt() here, i.e. inform the user why this
> was bad.
>
> Or hrm, it seems GNU grep silently returns 1 here, perhaps --max-count=0
> is a feature for some?
>
> If this is intentional it's worth documenting and testing it explicitly.

I will add a sentence about this in Documentation/git-grep.txt.

> Re the comments from others about size_t or whatever, it might be better
> here to use OPT_CALLBACK and an unsigned type.
>
> Then just have a "int have_max_count:1", which IMO is more obvious than
> using integer wrap-around to test "didn't provide this flag".

FWIW, I think it's fine to use int and a negative value as an special encoding, max_depth does the same thing. These are per-file matches, so they should not go over 2 billion in reasonable use cases.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v3 0/2] grep: add --max-count command line option
  2022-06-21  5:36 ` [PATCH v2] " Carlos L. via GitGitGadget
  2022-06-21 16:27   ` Junio C Hamano
       [not found]   ` <220622.86mte5knbe.gmgdl@evledraar.gmail.com>
@ 2022-06-22 17:07   ` Carlos L. via GitGitGadget
  2022-06-22 17:07     ` [PATCH v3 1/2] " Carlos López via GitGitGadget
                       ` (2 more replies)
  2 siblings, 3 replies; 22+ messages in thread
From: Carlos L. via GitGitGadget @ 2022-06-22 17:07 UTC (permalink / raw)
  To: git; +Cc: Martin Ågren [ ], Paul Eggert [ ], Carlos L.

This patch adds a command line option analogous to that of GNU grep(1)'s -m
/ --max-count, which users might already be used to. This makes it possible
to limit the amount of matches shown in the output while keeping the
functionality of other options such as -C (show code context) or -p (show
containing function), which would be difficult to do with a shell pipeline
(e.g. head(1)).

Signed-off-by: Carlos López 00xc@protonmail.com

Carlos López (2):
  grep: add --max-count command line option
  tests: add tests for grep --max-count

 Documentation/git-grep.txt |  9 +++++
 builtin/grep.c             |  9 +++++
 grep.c                     |  2 +-
 grep.h                     |  2 +
 t/t7810-grep.sh            | 83 ++++++++++++++++++++++++++++++++++++++
 5 files changed, 104 insertions(+), 1 deletion(-)


base-commit: 5b71c59bc3b9365075e2a175aa7b6f2b0c84ce44
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1278%2F00xc%2Fmaster-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1278/00xc/master-v3
Pull-Request: https://github.com/git/git/pull/1278

Range-diff vs v2:

 1:  ee7eb298854 ! 1:  5bf7244437e grep: add --max-count command line option
     @@ Documentation/git-grep.txt: providing this option will cause it to die.
      +	Limit the amount of matches per file. When using the `-v` or
      +	`--invert-match` option, the search stops after the specified
      +	number of non-matches. A value of -1 will return unlimited
     -+	results (the default).
     ++	results (the default). A value of 0 will exit immediately with
     ++	a non-zero status.
      +
       --threads <num>::
       	Number of grep worker threads to use.
     @@ builtin/grep.c: int cmd_grep(int argc, const char **argv, const char *prefix)
      +	 * We do this to keep results consistent with GNU grep(1).
      +	 */
      +	if (opt.max_count == 0)
     -+		exit(EXIT_FAILURE);
     ++		return 1;
      +
       	if (show_in_pager) {
       		if (num_threads > 1)
     @@ builtin/grep.c: int cmd_grep(int argc, const char **argv, const char *prefix)
      
       ## grep.c ##
      @@ grep.c: static int grep_source_1(struct grep_opt *opt, struct grep_source *gs, int colle
     - 		bol = eol + 1;
     - 		if (!left)
     - 			break;
     -+		if (opt->max_count != -1 && count == opt->max_count)
     -+			break;
     - 		left--;
     - 		lno++;
     - 	}
     + 				return 0;
     + 			goto next_line;
     + 		}
     +-		if (hit) {
     ++		if (hit && (opt->max_count < 0 || count < opt->max_count)) {
     + 			count++;
     + 			if (opt->status_only)
     + 				return 1;
      
       ## grep.h ##
      @@ grep.h: struct grep_opt {
 -:  ----------- > 2:  525958af877 tests: add tests for grep --max-count

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v3 1/2] grep: add --max-count command line option
  2022-06-22 17:07   ` [PATCH v3 0/2] " Carlos L. via GitGitGadget
@ 2022-06-22 17:07     ` Carlos López via GitGitGadget
  2022-06-22 17:07     ` [PATCH v3 2/2] tests: add tests for grep --max-count Carlos López via GitGitGadget
  2022-06-22 19:47     ` [PATCH v4] grep: add --max-count command line option Carlos L. via GitGitGadget
  2 siblings, 0 replies; 22+ messages in thread
From: Carlos López via GitGitGadget @ 2022-06-22 17:07 UTC (permalink / raw)
  To: git; +Cc: Martin Ågren [ ], Paul Eggert [ ], Carlos L.,
	Carlos López

From: =?UTF-8?q?Carlos=20L=C3=B3pez?= <00xc@protonmail.com>

This patch adds a command line option analogous to that of GNU
grep(1)'s -m / --max-count, which users might already be used to.
This makes it possible to limit the amount of matches shown in the
output while keeping the functionality of other options such as -C
(show code context) or -p (show containing function), which would be
difficult to do with a shell pipeline (e.g. head(1)).

Signed-off-by: Carlos López 00xc@protonmail.com
---
 Documentation/git-grep.txt | 9 +++++++++
 builtin/grep.c             | 9 +++++++++
 grep.c                     | 2 +-
 grep.h                     | 2 ++
 4 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
index 3d393fbac1b..58d944bd578 100644
--- a/Documentation/git-grep.txt
+++ b/Documentation/git-grep.txt
@@ -23,6 +23,7 @@ SYNOPSIS
 	   [--break] [--heading] [-p | --show-function]
 	   [-A <post-context>] [-B <pre-context>] [-C <context>]
 	   [-W | --function-context]
+	   [(-m | --max-count) <num>]
 	   [--threads <num>]
 	   [-f <file>] [-e] <pattern>
 	   [--and|--or|--not|(|)|-e <pattern>...]
@@ -238,6 +239,14 @@ providing this option will cause it to die.
 	`git diff` works out patch hunk headers (see 'Defining a
 	custom hunk-header' in linkgit:gitattributes[5]).
 
+-m <num>::
+--max-count <num>::
+	Limit the amount of matches per file. When using the `-v` or
+	`--invert-match` option, the search stops after the specified
+	number of non-matches. A value of -1 will return unlimited
+	results (the default). A value of 0 will exit immediately with
+	a non-zero status.
+
 --threads <num>::
 	Number of grep worker threads to use.
 	See `grep.threads` in 'CONFIGURATION' for more information.
diff --git a/builtin/grep.c b/builtin/grep.c
index bcb07ea7f75..e6bcdf860cc 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -961,6 +961,8 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 		OPT_BOOL_F(0, "ext-grep", &external_grep_allowed__ignored,
 			   N_("allow calling of grep(1) (ignored by this build)"),
 			   PARSE_OPT_NOCOMPLETE),
+		OPT_INTEGER('m', "max-count", &opt.max_count,
+			N_("maximum number of results per file")),
 		OPT_END()
 	};
 	grep_prefix = prefix;
@@ -1101,6 +1103,13 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 	if (recurse_submodules && untracked)
 		die(_("--untracked not supported with --recurse-submodules"));
 
+	/*
+	 * Optimize out the case where the amount of matches is limited to zero.
+	 * We do this to keep results consistent with GNU grep(1).
+	 */
+	if (opt.max_count == 0)
+		return 1;
+
 	if (show_in_pager) {
 		if (num_threads > 1)
 			warning(_("invalid option combination, ignoring --threads"));
diff --git a/grep.c b/grep.c
index 82eb7da1022..52a894c9890 100644
--- a/grep.c
+++ b/grep.c
@@ -1615,7 +1615,7 @@ static int grep_source_1(struct grep_opt *opt, struct grep_source *gs, int colle
 				return 0;
 			goto next_line;
 		}
-		if (hit) {
+		if (hit && (opt->max_count < 0 || count < opt->max_count)) {
 			count++;
 			if (opt->status_only)
 				return 1;
diff --git a/grep.h b/grep.h
index c722d25ed9d..bdcadce61b8 100644
--- a/grep.h
+++ b/grep.h
@@ -171,6 +171,7 @@ struct grep_opt {
 	int show_hunk_mark;
 	int file_break;
 	int heading;
+	int max_count;
 	void *priv;
 
 	void (*output)(struct grep_opt *opt, const void *data, size_t size);
@@ -181,6 +182,7 @@ struct grep_opt {
 	.relative = 1, \
 	.pathname = 1, \
 	.max_depth = -1, \
+	.max_count = -1, \
 	.pattern_type_option = GREP_PATTERN_TYPE_UNSPECIFIED, \
 	.colors = { \
 		[GREP_COLOR_CONTEXT] = "", \
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v3 2/2] tests: add tests for grep --max-count
  2022-06-22 17:07   ` [PATCH v3 0/2] " Carlos L. via GitGitGadget
  2022-06-22 17:07     ` [PATCH v3 1/2] " Carlos López via GitGitGadget
@ 2022-06-22 17:07     ` Carlos López via GitGitGadget
  2022-06-22 18:10       ` Junio C Hamano
  2022-06-22 19:47     ` [PATCH v4] grep: add --max-count command line option Carlos L. via GitGitGadget
  2 siblings, 1 reply; 22+ messages in thread
From: Carlos López via GitGitGadget @ 2022-06-22 17:07 UTC (permalink / raw)
  To: git; +Cc: Martin Ågren [ ], Paul Eggert [ ], Carlos L.,
	Carlos López

From: =?UTF-8?q?Carlos=20L=C3=B3pez?= <00xc@protonmail.com>

Add tests for grep's -m / --max-count to check if the option correctly
outputs limited results, and that it interacts properly with other flags
that could likely be used in conjunction.

Signed-off-by: Carlos López 00xc@protonmail.com
---
 t/t7810-grep.sh | 83 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 83 insertions(+)

diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh
index 69356011713..7b1b8a3cd93 100755
--- a/t/t7810-grep.sh
+++ b/t/t7810-grep.sh
@@ -77,6 +77,7 @@ test_expect_success setup '
 	# Say hello.
 	function hello() {
 	  echo "Hello world."
+	  echo "Hello again."
 	} # hello
 
 	# Still a no-op.
@@ -595,6 +596,88 @@ test_expect_success 'grep --files-without-match --quiet' '
 	test_must_be_empty actual
 '
 
+cat >expected <<EOF &&
+EOF
+
+test_expect_success 'grep --max-count 0 (must exit with non-zero)' '
+	test_must_fail git grep --max-count 0 foo >actual &&
+	test_cmp expected actual
+'
+
+cat >expected <<EOF &&
+file:foo mmap bar
+EOF
+
+test_expect_success 'grep --max-count 1' '
+	git grep --max-count 1 foo >actual &&
+	test_cmp expected actual
+'
+
+cat >expected <<EOF &&
+file:foo mmap bar
+file:foo_mmap bar
+file:foo_mmap bar mmap
+EOF
+
+test_expect_success 'grep --max-count 3' '
+	git grep --max-count 3 foo >actual &&
+	test_cmp expected actual
+'
+
+cat >expected <<EOF &&
+file:foo mmap bar
+file:foo_mmap bar
+file:foo_mmap bar mmap
+file:foo mmap bar_mmap
+file:foo_mmap bar mmap baz
+EOF
+
+test_expect_success 'grep --max-count -1 (no limit)' '
+	git grep --max-count -1 foo >actual &&
+	test_cmp expected actual
+'
+
+cat >expected <<EOF &&
+file-foo mmap bar
+file:foo_mmap bar
+file-foo_mmap bar mmap
+EOF
+
+test_expect_success 'grep --max-count 1 --context 2' '
+	git grep --max-count 1 --context 1 foo_mmap >actual &&
+	test_cmp expected actual
+'
+
+cat >expected <<EOF &&
+hello.ps1=function hello() {
+hello.ps1:  echo "Hello world."
+EOF
+
+test_expect_success 'grep --max-count 1 --show-function' '
+	git grep --max-count 1 --show-function Hello hello.ps1 >actual &&
+	test_cmp expected actual
+'
+
+cat >expected <<EOF &&
+hello.ps1=function hello() {
+hello.ps1:  echo "Hello world."
+hello.ps1:  echo "Hello again."
+EOF
+
+test_expect_success 'grep --max-count 2 --show-function' '
+	git grep --max-count 2 --show-function Hello hello.ps1 >actual &&
+	test_cmp expected actual
+'
+
+cat >expected <<EOF &&
+hello.ps1:1
+EOF
+
+test_expect_success 'grep --max-count 1 -c' '
+	git grep --max-count 1 --count Hello hello.ps1 >actual &&
+	test_cmp expected actual
+'
+
 cat >expected <<EOF
 file:foo mmap bar_mmap
 EOF
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 2/2] tests: add tests for grep --max-count
  2022-06-22 17:07     ` [PATCH v3 2/2] tests: add tests for grep --max-count Carlos López via GitGitGadget
@ 2022-06-22 18:10       ` Junio C Hamano
  0 siblings, 0 replies; 22+ messages in thread
From: Junio C Hamano @ 2022-06-22 18:10 UTC (permalink / raw)
  To: Carlos López via GitGitGadget
  Cc: git, Martin Ågren [ ], Paul Eggert [ ], Carlos L.

"Carlos López via GitGitGadget"  <gitgitgadget@gmail.com> writes:

> From: =?UTF-8?q?Carlos=20L=C3=B3pez?= <00xc@protonmail.com>
>
> Add tests for grep's -m / --max-count to check if the option correctly
> outputs limited results, and that it interacts properly with other flags
> that could likely be used in conjunction.
>
> Signed-off-by: Carlos López 00xc@protonmail.com
> ---
>  t/t7810-grep.sh | 83 +++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 83 insertions(+)

This is better done as part of the previous patch.  The new tests
protect the new code from future breakage.

> diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh
> index 69356011713..7b1b8a3cd93 100755
> --- a/t/t7810-grep.sh
> +++ b/t/t7810-grep.sh
> @@ -77,6 +77,7 @@ test_expect_success setup '
>  	# Say hello.
>  	function hello() {
>  	  echo "Hello world."
> +	  echo "Hello again."
>  	} # hello
>  
>  	# Still a no-op.
> @@ -595,6 +596,88 @@ test_expect_success 'grep --files-without-match --quiet' '
>  	test_must_be_empty actual
>  '
>  
> +cat >expected <<EOF &&
> +EOF
> +
> +test_expect_success 'grep --max-count 0 (must exit with non-zero)' '
> +	test_must_fail git grep --max-count 0 foo >actual &&
> +	test_cmp expected actual
> +'

For this particular one, "test_must_be_empty actual" would suffice,
without comparing with the expected output.

> +cat >expected <<EOF &&
> +file:foo mmap bar
> +EOF
> +
> +test_expect_success 'grep --max-count 1' '
> +	git grep --max-count 1 foo >actual &&
> +	test_cmp expected actual
> +'

Writing expected output outside test_expect_success that uses it is
a quite old style but that is because this test script is pretty
much ancient, so mimicking it is OK.  We'd need to come back later
when the tree is quiescent to clean them up, though (#leftoverbits).

> ...
> +	test_cmp expected actual
> +'

The new tests seem to give us a reasonable test coverage.  We could
discard one of the "-m1" vs "-m3" in the early ones, as they do not
give much extra test coverage over the other, to reduce repetition.

We do not test a case where we pick up-to N matches each from
multiple files, though.  Perhaps

    git grep -m1 -e o -- hello.\*

may stop after hitting "No-op." in hello.ps1 and "stdio" in hello.c,
which may make a good test, perhaps?

Thanks.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v4] grep: add --max-count command line option
  2022-06-22 17:07   ` [PATCH v3 0/2] " Carlos L. via GitGitGadget
  2022-06-22 17:07     ` [PATCH v3 1/2] " Carlos López via GitGitGadget
  2022-06-22 17:07     ` [PATCH v3 2/2] tests: add tests for grep --max-count Carlos López via GitGitGadget
@ 2022-06-22 19:47     ` Carlos L. via GitGitGadget
  2 siblings, 0 replies; 22+ messages in thread
From: Carlos L. via GitGitGadget @ 2022-06-22 19:47 UTC (permalink / raw)
  To: git; +Cc: Martin Ågren [ ], Paul Eggert [ ], Carlos L.,
	Carlos López

From: =?UTF-8?q?Carlos=20L=C3=B3pez?= <00xc@protonmail.com>

This patch adds a command line option analogous to that of GNU
grep(1)'s -m / --max-count, which users might already be used to.
This makes it possible to limit the amount of matches shown in the
output while keeping the functionality of other options such as -C
(show code context) or -p (show containing function), which would be
difficult to do with a shell pipeline (e.g. head(1)).

Signed-off-by: Carlos López 00xc@protonmail.com
---
    grep: add --max-count command line option
    
    This patch adds a command line option analogous to that of GNU grep(1)'s
    -m / --max-count, which users might already be used to. This makes it
    possible to limit the amount of matches shown in the output while
    keeping the functionality of other options such as -C (show code
    context) or -p (show containing function), which would be difficult to
    do with a shell pipeline (e.g. head(1)).
    
    Signed-off-by: Carlos López 00xc@protonmail.com

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1278%2F00xc%2Fmaster-v4
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1278/00xc/master-v4
Pull-Request: https://github.com/git/git/pull/1278

Range-diff vs v3:

 1:  5bf7244437e ! 1:  89c0151c164 grep: add --max-count command line option
     @@ grep.h: struct grep_opt {
       	.pattern_type_option = GREP_PATTERN_TYPE_UNSPECIFIED, \
       	.colors = { \
       		[GREP_COLOR_CONTEXT] = "", \
     +
     + ## t/t7810-grep.sh ##
     +@@ t/t7810-grep.sh: test_expect_success setup '
     + 	# Say hello.
     + 	function hello() {
     + 	  echo "Hello world."
     ++	  echo "Hello again."
     + 	} # hello
     + 
     + 	# Still a no-op.
     +@@ t/t7810-grep.sh: test_expect_success 'grep --files-without-match --quiet' '
     + 	test_must_be_empty actual
     + '
     + 
     ++test_expect_success 'grep --max-count 0 (must exit with non-zero)' '
     ++	test_must_fail git grep --max-count 0 foo >actual &&
     ++	test_must_be_empty actual
     ++'
     ++
     ++test_expect_success 'grep --max-count 3' '
     ++	cat >expected <<-EOF &&
     ++	file:foo mmap bar
     ++	file:foo_mmap bar
     ++	file:foo_mmap bar mmap
     ++	EOF
     ++	git grep --max-count 3 foo >actual &&
     ++	test_cmp expected actual
     ++'
     ++
     ++test_expect_success 'grep --max-count -1 (no limit)' '
     ++	cat >expected <<-EOF &&
     ++	file:foo mmap bar
     ++	file:foo_mmap bar
     ++	file:foo_mmap bar mmap
     ++	file:foo mmap bar_mmap
     ++	file:foo_mmap bar mmap baz
     ++	EOF
     ++	git grep --max-count -1 foo >actual &&
     ++	test_cmp expected actual
     ++'
     ++
     ++test_expect_success 'grep --max-count 1 --context 2' '
     ++	cat >expected <<-EOF &&
     ++	file-foo mmap bar
     ++	file:foo_mmap bar
     ++	file-foo_mmap bar mmap
     ++	EOF
     ++	git grep --max-count 1 --context 1 foo_mmap >actual &&
     ++	test_cmp expected actual
     ++'
     ++
     ++test_expect_success 'grep --max-count 1 --show-function' '
     ++	cat >expected <<-EOF &&
     ++	hello.ps1=function hello() {
     ++	hello.ps1:  echo "Hello world."
     ++	EOF
     ++	git grep --max-count 1 --show-function Hello hello.ps1 >actual &&
     ++	test_cmp expected actual
     ++'
     ++
     ++test_expect_success 'grep --max-count 2 --show-function' '
     ++	cat >expected <<-EOF &&
     ++	hello.ps1=function hello() {
     ++	hello.ps1:  echo "Hello world."
     ++	hello.ps1:  echo "Hello again."
     ++	EOF
     ++	git grep --max-count 2 --show-function Hello hello.ps1 >actual &&
     ++	test_cmp expected actual
     ++'
     ++
     ++test_expect_success 'grep --max-count 1 --count' '
     ++	cat >expected <<-EOF &&
     ++	hello.ps1:1
     ++	EOF
     ++	git grep --max-count 1 --count Hello hello.ps1 >actual &&
     ++	test_cmp expected actual
     ++'
     ++
     ++test_expect_success 'grep --max-count 1 (multiple files)' '
     ++	cat >expected <<-EOF &&
     ++	hello.c:#include <stdio.h>
     ++	hello.ps1:# No-op.
     ++	EOF
     ++	git grep --max-count 1 -e o -- hello.\* >actual &&
     ++	test_cmp expected actual
     ++'
     ++
     ++test_expect_success 'grep --max-count 1 --context 1 (multiple files)' '
     ++	cat >expected <<-EOF &&
     ++	hello.c-#include <assert.h>
     ++	hello.c:#include <stdio.h>
     ++	hello.c-
     ++	--
     ++	hello.ps1:# No-op.
     ++	hello.ps1-function dummy() {}
     ++	EOF
     ++	git grep --max-count 1 --context 1 -e o -- hello.\* >actual &&
     ++	test_cmp expected actual
     ++'
     ++
     + cat >expected <<EOF
     + file:foo mmap bar_mmap
     + EOF
 2:  525958af877 < -:  ----------- tests: add tests for grep --max-count


 Documentation/git-grep.txt |  9 ++++
 builtin/grep.c             |  9 ++++
 grep.c                     |  2 +-
 grep.h                     |  2 +
 t/t7810-grep.sh            | 87 ++++++++++++++++++++++++++++++++++++++
 5 files changed, 108 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
index 3d393fbac1b..58d944bd578 100644
--- a/Documentation/git-grep.txt
+++ b/Documentation/git-grep.txt
@@ -23,6 +23,7 @@ SYNOPSIS
 	   [--break] [--heading] [-p | --show-function]
 	   [-A <post-context>] [-B <pre-context>] [-C <context>]
 	   [-W | --function-context]
+	   [(-m | --max-count) <num>]
 	   [--threads <num>]
 	   [-f <file>] [-e] <pattern>
 	   [--and|--or|--not|(|)|-e <pattern>...]
@@ -238,6 +239,14 @@ providing this option will cause it to die.
 	`git diff` works out patch hunk headers (see 'Defining a
 	custom hunk-header' in linkgit:gitattributes[5]).
 
+-m <num>::
+--max-count <num>::
+	Limit the amount of matches per file. When using the `-v` or
+	`--invert-match` option, the search stops after the specified
+	number of non-matches. A value of -1 will return unlimited
+	results (the default). A value of 0 will exit immediately with
+	a non-zero status.
+
 --threads <num>::
 	Number of grep worker threads to use.
 	See `grep.threads` in 'CONFIGURATION' for more information.
diff --git a/builtin/grep.c b/builtin/grep.c
index bcb07ea7f75..e6bcdf860cc 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -961,6 +961,8 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 		OPT_BOOL_F(0, "ext-grep", &external_grep_allowed__ignored,
 			   N_("allow calling of grep(1) (ignored by this build)"),
 			   PARSE_OPT_NOCOMPLETE),
+		OPT_INTEGER('m', "max-count", &opt.max_count,
+			N_("maximum number of results per file")),
 		OPT_END()
 	};
 	grep_prefix = prefix;
@@ -1101,6 +1103,13 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 	if (recurse_submodules && untracked)
 		die(_("--untracked not supported with --recurse-submodules"));
 
+	/*
+	 * Optimize out the case where the amount of matches is limited to zero.
+	 * We do this to keep results consistent with GNU grep(1).
+	 */
+	if (opt.max_count == 0)
+		return 1;
+
 	if (show_in_pager) {
 		if (num_threads > 1)
 			warning(_("invalid option combination, ignoring --threads"));
diff --git a/grep.c b/grep.c
index 82eb7da1022..52a894c9890 100644
--- a/grep.c
+++ b/grep.c
@@ -1615,7 +1615,7 @@ static int grep_source_1(struct grep_opt *opt, struct grep_source *gs, int colle
 				return 0;
 			goto next_line;
 		}
-		if (hit) {
+		if (hit && (opt->max_count < 0 || count < opt->max_count)) {
 			count++;
 			if (opt->status_only)
 				return 1;
diff --git a/grep.h b/grep.h
index c722d25ed9d..bdcadce61b8 100644
--- a/grep.h
+++ b/grep.h
@@ -171,6 +171,7 @@ struct grep_opt {
 	int show_hunk_mark;
 	int file_break;
 	int heading;
+	int max_count;
 	void *priv;
 
 	void (*output)(struct grep_opt *opt, const void *data, size_t size);
@@ -181,6 +182,7 @@ struct grep_opt {
 	.relative = 1, \
 	.pathname = 1, \
 	.max_depth = -1, \
+	.max_count = -1, \
 	.pattern_type_option = GREP_PATTERN_TYPE_UNSPECIFIED, \
 	.colors = { \
 		[GREP_COLOR_CONTEXT] = "", \
diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh
index 69356011713..0f937990a06 100755
--- a/t/t7810-grep.sh
+++ b/t/t7810-grep.sh
@@ -77,6 +77,7 @@ test_expect_success setup '
 	# Say hello.
 	function hello() {
 	  echo "Hello world."
+	  echo "Hello again."
 	} # hello
 
 	# Still a no-op.
@@ -595,6 +596,92 @@ test_expect_success 'grep --files-without-match --quiet' '
 	test_must_be_empty actual
 '
 
+test_expect_success 'grep --max-count 0 (must exit with non-zero)' '
+	test_must_fail git grep --max-count 0 foo >actual &&
+	test_must_be_empty actual
+'
+
+test_expect_success 'grep --max-count 3' '
+	cat >expected <<-EOF &&
+	file:foo mmap bar
+	file:foo_mmap bar
+	file:foo_mmap bar mmap
+	EOF
+	git grep --max-count 3 foo >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'grep --max-count -1 (no limit)' '
+	cat >expected <<-EOF &&
+	file:foo mmap bar
+	file:foo_mmap bar
+	file:foo_mmap bar mmap
+	file:foo mmap bar_mmap
+	file:foo_mmap bar mmap baz
+	EOF
+	git grep --max-count -1 foo >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'grep --max-count 1 --context 2' '
+	cat >expected <<-EOF &&
+	file-foo mmap bar
+	file:foo_mmap bar
+	file-foo_mmap bar mmap
+	EOF
+	git grep --max-count 1 --context 1 foo_mmap >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'grep --max-count 1 --show-function' '
+	cat >expected <<-EOF &&
+	hello.ps1=function hello() {
+	hello.ps1:  echo "Hello world."
+	EOF
+	git grep --max-count 1 --show-function Hello hello.ps1 >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'grep --max-count 2 --show-function' '
+	cat >expected <<-EOF &&
+	hello.ps1=function hello() {
+	hello.ps1:  echo "Hello world."
+	hello.ps1:  echo "Hello again."
+	EOF
+	git grep --max-count 2 --show-function Hello hello.ps1 >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'grep --max-count 1 --count' '
+	cat >expected <<-EOF &&
+	hello.ps1:1
+	EOF
+	git grep --max-count 1 --count Hello hello.ps1 >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'grep --max-count 1 (multiple files)' '
+	cat >expected <<-EOF &&
+	hello.c:#include <stdio.h>
+	hello.ps1:# No-op.
+	EOF
+	git grep --max-count 1 -e o -- hello.\* >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'grep --max-count 1 --context 1 (multiple files)' '
+	cat >expected <<-EOF &&
+	hello.c-#include <assert.h>
+	hello.c:#include <stdio.h>
+	hello.c-
+	--
+	hello.ps1:# No-op.
+	hello.ps1-function dummy() {}
+	EOF
+	git grep --max-count 1 --context 1 -e o -- hello.\* >actual &&
+	test_cmp expected actual
+'
+
 cat >expected <<EOF
 file:foo mmap bar_mmap
 EOF

base-commit: 5b71c59bc3b9365075e2a175aa7b6f2b0c84ce44
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2022-06-22 19:47 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-20 15:49 [PATCH] grep: add --max-count command line option Carlos L. via GitGitGadget
2022-06-20 15:57 ` Paul Eggert
2022-06-20 16:25   ` Carlos L.
2022-06-20 16:32     ` Paul Eggert
2022-06-21  5:36 ` [PATCH v2] " Carlos L. via GitGitGadget
2022-06-21 16:27   ` Junio C Hamano
2022-06-22  6:41     ` Carlos L.
2022-06-22  6:56       ` Junio C Hamano
     [not found]   ` <220622.86mte5knbe.gmgdl@evledraar.gmail.com>
2022-06-22 13:23     ` Carlos L.
2022-06-22 17:07   ` [PATCH v3 0/2] " Carlos L. via GitGitGadget
2022-06-22 17:07     ` [PATCH v3 1/2] " Carlos López via GitGitGadget
2022-06-22 17:07     ` [PATCH v3 2/2] tests: add tests for grep --max-count Carlos López via GitGitGadget
2022-06-22 18:10       ` Junio C Hamano
2022-06-22 19:47     ` [PATCH v4] grep: add --max-count command line option Carlos L. via GitGitGadget
  -- strict thread matches above, loose matches on Subject: below --
2022-05-12 13:20 [PATCH] " Carlos L. via GitGitGadget
2022-05-14 18:16 ` Martin Ågren
2022-05-16  5:57 ` Junio C Hamano
2022-05-16  7:28   ` Paul Eggert
2022-05-16  8:38     ` Carlos L.
2022-05-16 15:36       ` Junio C Hamano
2022-05-17  5:53         ` Paul Eggert
2022-05-16 15:18     ` Junio C Hamano

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).