git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH] grep: add --max-count command line option
@ 2022-05-12 13:20 Carlos L. via GitGitGadget
  2022-05-14 18:16 ` Martin Ågren
  2022-05-16  5:57 ` Junio C Hamano
  0 siblings, 2 replies; 12+ messages in thread
From: Carlos L. via GitGitGadget @ 2022-05-12 13:20 UTC (permalink / raw)
  To: git; +Cc: Carlos L., Carlos López

From: =?UTF-8?q?Carlos=20L=C3=B3pez?= <00xc@protonmail.com>

This patch adds a command line option analogous to that of GNU
grep(1)'s -m / --max-count, which users might already be used to.
This makes it possible to limit the amount of matches shown in the
output while keeping the functionality of other options such as -C
(show code context) or -p (show containing function), which would be
difficult to do with a shell pipeline (e.g. head(1)).

Signed-off-by: Carlos López <00xc@protonmail.com>
---
    grep: add --max-count command line option
    
    This patch adds a command line option analogous to that of GNU grep(1)'s
    -m / --max-count, which users might already be used to. This makes it
    possible to limit the amount of matches shown in the output while
    keeping the functionality of other options such as -C (show code
    context) or -p (show containing function), which would be difficult to
    do with a shell pipeline (e.g. head(1)).
    
    Signed-off-by: Carlos López 00xc@protonmail.com

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1264%2F00xc%2Fmaster-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1264/00xc/master-v1
Pull-Request: https://github.com/git/git/pull/1264

 Documentation/git-grep.txt | 7 +++++++
 builtin/grep.c             | 2 ++
 grep.c                     | 2 ++
 grep.h                     | 1 +
 4 files changed, 12 insertions(+)

diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
index 3d393fbac1b..02b36046475 100644
--- a/Documentation/git-grep.txt
+++ b/Documentation/git-grep.txt
@@ -23,6 +23,7 @@ SYNOPSIS
 	   [--break] [--heading] [-p | --show-function]
 	   [-A <post-context>] [-B <pre-context>] [-C <context>]
 	   [-W | --function-context]
+	   [-m | --max-count <num>]
 	   [--threads <num>]
 	   [-f <file>] [-e] <pattern>
 	   [--and|--or|--not|(|)|-e <pattern>...]
@@ -238,6 +239,12 @@ providing this option will cause it to die.
 	`git diff` works out patch hunk headers (see 'Defining a
 	custom hunk-header' in linkgit:gitattributes[5]).
 
+-m <num>::
+--max-count <num>::
+	Limit the amount of matches per file. When using the -v or
+	--invert-match option, the search stops after the specified
+	number of non-matches. Setting this option to 0 has no effect.
+
 --threads <num>::
 	Number of grep worker threads to use.
 	See `grep.threads` in 'CONFIGURATION' for more information.
diff --git a/builtin/grep.c b/builtin/grep.c
index bcb07ea7f75..ba1894d5675 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -961,6 +961,8 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 		OPT_BOOL_F(0, "ext-grep", &external_grep_allowed__ignored,
 			   N_("allow calling of grep(1) (ignored by this build)"),
 			   PARSE_OPT_NOCOMPLETE),
+		OPT_INTEGER('m', "max-count", &opt.max_count,
+			N_("maximum number of results per file (default: 0, no limit)")),
 		OPT_END()
 	};
 	grep_prefix = prefix;
diff --git a/grep.c b/grep.c
index 82eb7da1022..173b6c27b6e 100644
--- a/grep.c
+++ b/grep.c
@@ -1686,6 +1686,8 @@ static int grep_source_1(struct grep_opt *opt, struct grep_source *gs, int colle
 		bol = eol + 1;
 		if (!left)
 			break;
+		if (opt->max_count && count == opt->max_count)
+			break;
 		left--;
 		lno++;
 	}
diff --git a/grep.h b/grep.h
index c722d25ed9d..25836f34314 100644
--- a/grep.h
+++ b/grep.h
@@ -171,6 +171,7 @@ struct grep_opt {
 	int show_hunk_mark;
 	int file_break;
 	int heading;
+	unsigned max_count;
 	void *priv;
 
 	void (*output)(struct grep_opt *opt, const void *data, size_t size);

base-commit: 277cf0bc36094f6dc4297d8c9cef79df045b735d
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH] grep: add --max-count command line option
  2022-05-12 13:20 [PATCH] grep: add --max-count command line option Carlos L. via GitGitGadget
@ 2022-05-14 18:16 ` Martin Ågren
  2022-05-16  5:57 ` Junio C Hamano
  1 sibling, 0 replies; 12+ messages in thread
From: Martin Ågren @ 2022-05-14 18:16 UTC (permalink / raw)
  To: Carlos L. via GitGitGadget; +Cc: Git Mailing List, Carlos López

Hi Carlos,

Welcome to the mailing list. :-)

On Thu, 12 May 2022 at 21:13, Carlos L. via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: =?UTF-8?q?Carlos=20L=C3=B3pez?= <00xc@protonmail.com>
>
> This patch adds a command line option analogous to that of GNU
> grep(1)'s -m / --max-count, which users might already be used to.
> This makes it possible to limit the amount of matches shown in the
> output while keeping the functionality of other options such as -C
> (show code context) or -p (show containing function), which would be
> difficult to do with a shell pipeline (e.g. head(1)).

Makes sense to me.

> diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
> index 3d393fbac1b..02b36046475 100644
> --- a/Documentation/git-grep.txt
> +++ b/Documentation/git-grep.txt
> @@ -23,6 +23,7 @@ SYNOPSIS
>            [--break] [--heading] [-p | --show-function]
>            [-A <post-context>] [-B <pre-context>] [-C <context>]
>            [-W | --function-context]
> +          [-m | --max-count <num>]

I think this should be

    [(-m | --max-count) <num>]

since the short form "-m" also wants to take "<num>".

> +-m <num>::
> +--max-count <num>::
> +       Limit the amount of matches per file. When using the -v or
> +       --invert-match option, the search stops after the specified
> +       number of non-matches. Setting this option to 0 has no effect.

Please use `backticks` with `-v` and `--invert-match` so that they are
set in monospace.

Regarding the special value 0, it's a bit unclear what "has no effect"
means. In particular, it can have an effect in the sense that when it
is used like

  git grep -m 1 -m 0 foo

it undoes the `-m 1`.

But also, that's not how my grep(1) behaves: with `-m 0`, it limits the
number of matches to zero. I don't know how useful that is (can that
zero-case be optimized by exiting with 1 before even trying to find the
needle!?), or if maybe different variants of grep handle this
differently?  If all grep implementations handle 0 by actually only
emitting zero hits, I think it would be wise for us to handle 0 the same
way.

As for overriding an earlier `-m <foo>`, which could be useful, it seems
to me like `--no-max-count` would make sense.

All in all, I would suggest the following documentation:

    -m <num>::
    --max-count <num>::
           Limit the amount of matches per file. When using the `-v` or
           `--invert-match` option, the search stops after the specified
           number of non-matches. Use `--no-max-count` to countermand an
           earlier `--max-count` option on the command line.

... and of course the matching implementation. :-) Maybe you could
achieve that by using -1 to signal that there's no max-count in play?
How does that sound to you?

Even if we want to handle the zero just like you do, I think this patch
needs a few tests. We should make sure to test the 0-case (whatever we
end up wanting it to behave like), and probably the "suppress an earlier
-m by giving --no-max-count" case. It also seems wise to set up some
test scenario where there are several files involved so that we can see
that we don't just print the first m matches *globally*, but that the
counter is really handled *per file*.

I think this `-m` flag would be a nice addition. I know that I've been
missing something like it a few times. As you wrote in your commit
message, `| head -3` can work for some use-cases, but definitely not for
others. This `-m` is a lot more granular than `-l` which can be seen as
a crude `-m 1`. Thanks for posting this patch! I hope you find my
comments useful.

Martin

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] grep: add --max-count command line option
  2022-05-12 13:20 [PATCH] grep: add --max-count command line option Carlos L. via GitGitGadget
  2022-05-14 18:16 ` Martin Ågren
@ 2022-05-16  5:57 ` Junio C Hamano
  2022-05-16  7:28   ` Paul Eggert
  1 sibling, 1 reply; 12+ messages in thread
From: Junio C Hamano @ 2022-05-16  5:57 UTC (permalink / raw)
  To: Carlos L. via GitGitGadget; +Cc: git, Carlos L.

"Carlos L. via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: =?UTF-8?q?Carlos=20L=C3=B3pez?= <00xc@protonmail.com>

Offtopic, but I wonder why this line is encoded like so?  The
"Signed-off-by:" line is not, and it is safely transmitted, so
it feels like we do not need to encode the in-body header that
is added only for e-mail but not in the original commit...

> This patch adds a command line option analogous to that of GNU
> grep(1)'s -m / --max-count, which users might already be used to.
> This makes it possible to limit the amount of matches shown in the
> output while keeping the functionality of other options such as -C
> (show code context) or -p (show containing function), which would be
> difficult to do with a shell pipeline (e.g. head(1)).
>
> Signed-off-by: Carlos López <00xc@protonmail.com>
> ---
> ...
> +-m <num>::
> +--max-count <num>::
> +	Limit the amount of matches per file. When using the -v or
> +	--invert-match option, the search stops after the specified
> +	number of non-matches. Setting this option to 0 has no effect.
> +

Good thing that this is defined as "per-file" limit.  If it were a
global limit, the interaction between this one and "--threads=<num>"
would have been interesting.  Perhaps add a test to make sure the
feature continues to work with "--threads=2" (I am assuming that you
have already tested this implementation works with the option).

Martin already commented on the wording "no effect"; I agree it is a
poor choice of words from the point of view of "overriding with 0".

It indeed is curious why GNU grep chose to immediately exit with 1
when "-m 0" was given, but that was decision made more than 20 years
ago (http://gnu.ist.utl.pt/software/grep/changes.html and look for
"2000-03-17").  Between "being consistent even with a seemingly
useless design choice made by somebody else" and "choose to be
different in a corner case where nobody should care and allow us to
be more useful", I am slightly in favor in this particular case.

What "git grep -m -1" should do?  IIRC, OPT_INTEGER is for signed
integer but the new .max_count member, as well as the existing
"count" that is compared with it, are of "unsigned" type.  Either
erroring out or treating it as unlimited is probably fine, but
whatever we do, we should document and have a test for it.

Thanks.




^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] grep: add --max-count command line option
  2022-05-16  5:57 ` Junio C Hamano
@ 2022-05-16  7:28   ` Paul Eggert
  2022-05-16  8:38     ` Carlos L.
  2022-05-16 15:18     ` Junio C Hamano
  0 siblings, 2 replies; 12+ messages in thread
From: Paul Eggert @ 2022-05-16  7:28 UTC (permalink / raw)
  To: Junio C Hamano, Carlos L. via GitGitGadget
  Cc: git, Carlos L., GNU grep developers

On 5/15/22 22:57, Junio C Hamano wrote:

> It indeed is curious why GNU grep chose to immediately exit with 1
> when "-m 0" was given,

As I vaguely recall, if "-m 1" stops before "-m 2" does, then the idea 
was that it's reasonable for "-m 0" to stop before "-m 1" does, and the 
logical place to stop is right at the start, before any matches are 
found (i.e., exit with status 1).

What would be more useful for 'grep -m 0' to do? (Sorry, I came into 
this conversation just now.) Perhaps GNU 'grep -m 0' should change, if 
there's something better for it to do.


> What "git grep -m -1" should do?  IIRC, OPT_INTEGER is for signed
> integer but the new .max_count member, as well as the existing
> "count" that is compared with it, are of "unsigned" type.  Either
> erroring out or treating it as unlimited is probably fine, but
> whatever we do, we should document and have a test for it.

'grep -m -1' treats the count as being unlimited, but this isn't 
documented and (from the code) appears to be accidental. It'd make sense 
for it to be documented.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] grep: add --max-count command line option
  2022-05-16  7:28   ` Paul Eggert
@ 2022-05-16  8:38     ` Carlos L.
  2022-05-16 15:36       ` Junio C Hamano
  2022-05-16 15:18     ` Junio C Hamano
  1 sibling, 1 reply; 12+ messages in thread
From: Carlos L. @ 2022-05-16  8:38 UTC (permalink / raw)
  To: Paul Eggert
  Cc: Junio C Hamano, Carlos L. via GitGitGadget, git,
	GNU grep developers

Hi list,

Thanks to everyone who provided feedback :)

On Saturday, May 14th, 2022 at 20:16, Martin Ågren <martin.agren@gmail.com> wrote:
> I think this should be
>
> [(-m | --max-count) <num>]

> Please use `backticks` with `-v` and `--invert-match` so that they are
> set in monospace.

I will add these suggestions to my patch.

> Regarding the special value 0, it's a bit unclear what "has no effect"
> means. In particular, it can have an effect in the sense that when it
> is used like
>
> git grep -m 1 -m 0 foo
>
> it undoes the `-m 1`.
>
> But also, that's not how my grep(1) behaves: with `-m 0`, it limits the
> number of matches to zero. I don't know how useful that is (can that
> zero-case be optimized by exiting with 1 before even trying to find the
> needle!?), or if maybe different variants of grep handle this
> differently? If all grep implementations handle 0 by actually only
> emitting zero hits, I think it would be wise for us to handle 0 the same
> way.

I agree the wording is not clear. I did not see a good use case for GNU's `-m 0`, which is why I used that value as unlimited. I am not sold on using `--no-max-count` or -1 *just* for consistency, but if someone can point to a good use case of GNU's `-m 0` (especially in git grep), I will gladly concede.

> Even if we want to handle the zero just like you do, I think this patch
> needs a few tests. We should make sure to test the 0-case (whatever we
> end up wanting it to behave like), and probably the "suppress an earlier
> -m by giving --no-max-count" case. It also seems wise to set up some
> test scenario where there are several files involved so that we can see
> that we don't just print the first m matches globally, but that the
> counter is really handled per file.

This seems sound. Is there any documentation on how to write tests for git?

On Monday, May 16th, 2022 at 07:57, Junio C Hamano <gitster@pobox.com> wrote:
> Good thing that this is defined as "per-file" limit. If it were a
> global limit, the interaction between this one and "--threads=<num>"
> would have been interesting. Perhaps add a test to make sure the
> feature continues to work with "--threads=2" (I am assuming that you
> have already tested this implementation works with the option).

I did and I found no unexpected behavior.

> What "git grep -m -1" should do? IIRC, OPT_INTEGER is for signed
> integer but the new .max_count member, as well as the existing
> "count" that is compared with it, are of "unsigned" type. Either
> erroring out or treating it as unlimited is probably fine, but
> whatever we do, we should document and have a test for it.

I would favor treating it as an error. As mentioned above, using 0 to describe "unlimited matches" (e.g. the default) is my preference, but I am willing to concede if someone can think of a good use for `-m 0`. Also, from the implementation side (although not as important) it looks better: if we allow negative values, we need to distinguish between -1 (unlimited) and -4 (display error to user, probably) - the patch is much simpler right now. And just as a side note, we avoid an issue in the pretty much insignificant use case of giving a very big value (UINT_MAX) for `-m` and it overflowing into -1, thus not properly limiting the number of matches.

On Monday, May 16th, 2022 at 09:28, Paul Eggert <eggert@cs.ucla.edu> wrote:
> As I vaguely recall, if "-m 1" stops before "-m 2" does, then the idea
> was that it's reasonable for "-m 0" to stop before "-m 1" does, and the
> logical place to stop is right at the start, before any matches are
> found (i.e., exit with status 1).

As I mentioned above, I do not see what this `-m 0` behavior is useful for, but if someone could show me an use for it I would appreciate it.

Again, thank you everyone for your comments.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] grep: add --max-count command line option
  2022-05-16  7:28   ` Paul Eggert
  2022-05-16  8:38     ` Carlos L.
@ 2022-05-16 15:18     ` Junio C Hamano
  1 sibling, 0 replies; 12+ messages in thread
From: Junio C Hamano @ 2022-05-16 15:18 UTC (permalink / raw)
  To: Paul Eggert
  Cc: Carlos L. via GitGitGadget, git, Carlos L., GNU grep developers

Paul Eggert <eggert@cs.ucla.edu> writes:

> On 5/15/22 22:57, Junio C Hamano wrote:
>
>> It indeed is curious why GNU grep chose to immediately exit with 1
>> when "-m 0" was given,
>
> As I vaguely recall, if "-m 1" stops before "-m 2" does, then the idea
> was that it's reasonable for "-m 0" to stop before "-m 1" does, and
> the logical place to stop is right at the start, before any matches
> are found (i.e., exit with status 1).
>
> What would be more useful for 'grep -m 0' to do? (Sorry, I came into
> this conversation just now.) Perhaps GNU 'grep -m 0' should change, if 
> there's something better for it to do.

"grep -m 0" that declares a failure upfront because it is asked to
stop before finding any match, combined with the fact that the
command is expected to signal a failure after finding no matches, is
an optimization that is mathmatically correct ;-)

It was asked as a part of discussion on a proposed patch to teach
the same "-m <max-number-of-hits>" option to "git grep" what it
ought to mean to give "-m 0".  As we are too accustomed to the "last
command line option wins" behaviour, I initially did not find the
behaviour of the proposed patch, where 0 (or negative) stood for
"unlimited", quite natural and useful (e.g. it allows overriding a
hardcoded default option in aliases, "[alias] gg = grep -m 4"), and
then was surprised by the "'-m 0' is an immediate failure" in GNU
grep.  I would call it mathematically pure and correct but of
dubious utility.

Sorry for not providing enough context.  Full discussion is seen at 
https://lore.kernel.org/git/pull.1264.git.git.1652361610103.gitgitgadget@gmail.com/

>> What "git grep -m -1" should do?  IIRC, OPT_INTEGER is for signed
>> integer but the new .max_count member, as well as the existing
>> "count" that is compared with it, are of "unsigned" type.  Either
>> erroring out or treating it as unlimited is probably fine, but
>> whatever we do, we should document and have a test for it.
>
> 'grep -m -1' treats the count as being unlimited, but this isn't
> documented and (from the code) appears to be accidental. It'd make
> sense for it to be documented.

Thanks.  The question was asked for the proposed addition to "git
grep", but it is funny to see it apply equally well to GNU grep ;-).


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] grep: add --max-count command line option
  2022-05-16  8:38     ` Carlos L.
@ 2022-05-16 15:36       ` Junio C Hamano
  2022-05-17  5:53         ` Paul Eggert
  0 siblings, 1 reply; 12+ messages in thread
From: Junio C Hamano @ 2022-05-16 15:36 UTC (permalink / raw)
  To: Carlos L.
  Cc: Paul Eggert, Carlos L. via GitGitGadget, git, GNU grep developers

"Carlos L." <00xc@protonmail.com> writes:

>> Even if we want to handle the zero just like you do, I think this patch
>> needs a few tests. We should make sure to test the 0-case (whatever we
>> end up wanting it to behave like), and probably the "suppress an earlier
>> -m by giving --no-max-count" case. It also seems wise to set up some
>> test scenario where there are several files involved so that we can see
>> that we don't just print the first m matches globally, but that the
>> counter is really handled per file.
>
> This seems sound. Is there any documentation on how to write tests for git?

t/README and Documentation/MyFirstContribution would be two good
places to start.

>> What "git grep -m -1" should do? IIRC, OPT_INTEGER is for signed
>> integer but the new .max_count member, as well as the existing
>> "count" that is compared with it, are of "unsigned" type. Either
>> erroring out or treating it as unlimited is probably fine, but
>> whatever we do, we should document and have a test for it.
>
> I would favor treating it as an error. As mentioned above, using 0
> to describe "unlimited matches" (e.g. the default) is my
> preference, but I am willing to concede if someone can think of a
> good use for `-m 0`.

With Devil's advocate hat on.

"GNU grep has been doing so for the past 20 years and existing users
of the command expects '-m 0' to behave that way" is a good enough
reason, especially if '-m 0' is not the only possible way to say
"unlimited".

> Also, from the implementation side (although
> not as important) it looks better: if we allow negative values, we
> need to distinguish between -1 (unlimited) and -4 (display error
> to user, probably)

If we are going to document "you can pass a negative value to
explicitly say 'unlimited', which is a useful way to countermand
another `-m <num>` that appear earlier on the command line", then -1
and -4 would equally be 'unlimited' and there is no need to
distinguish anything.

Devil's advocate hat off.

I personally do not mind if "-m <non-positive>" means "unlimited",
as long as that is clearly documented and tested, but for long time
"GNU grep" users "-m 0" might appear surprising (not necessarily
because they would find that the "-m 0" that immediately fails is
useful, but because the behaviour is deliberately made different).

Thanks.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] grep: add --max-count command line option
  2022-05-16 15:36       ` Junio C Hamano
@ 2022-05-17  5:53         ` Paul Eggert
  0 siblings, 0 replies; 12+ messages in thread
From: Paul Eggert @ 2022-05-17  5:53 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Carlos L. via GitGitGadget, git, GNU grep developers, Carlos L.

[-- Attachment #1: Type: text/plain, Size: 736 bytes --]

On 5/16/22 08:36, Junio C Hamano wrote:
> "GNU grep has been doing so for the past 20 years and existing users
> of the command expects '-m 0' to behave that way" is a good enough
> reason, especially if '-m 0' is not the only possible way to say
> "unlimited".

Yes, I'm inclined in the same direction, now that I see more of the 
context. That is, GNU grep can continue what it's long been doing, with 
the only change being to the documentation so that we document -m-1 as 
meaning "unlimited". This minimizes possible disruption to existing 
scripts and satisfies the use case of having a way to turn off any 
previously-appearing -m option.

I installed the attached to the GNU grep master doc to do that. Hope 
this works for you.

[-- Attachment #2: 0001-grep-document-m-better.patch --]
[-- Type: text/x-patch, Size: 1544 bytes --]

From 2deca89cf0c7a99450f88cf0abfadd336511633f Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Mon, 16 May 2022 12:18:26 -0700
Subject: [PATCH] grep: document -m better

* doc/grep.in.1, doc/grep.texi: Document behavior of -m 0 and -m -1.
This documents longstanding behavior, and is consistent with
how git grep -m will likely behave.
---
 doc/grep.in.1 | 10 ++++++++++
 doc/grep.texi |  4 ++++
 2 files changed, 14 insertions(+)

diff --git a/doc/grep.in.1 b/doc/grep.in.1
index aba085a..5ba90ee 100644
--- a/doc/grep.in.1
+++ b/doc/grep.in.1
@@ -321,6 +321,16 @@ Scanning each input file stops upon first match.
 Stop reading a file after
 .I NUM
 matching lines.
+If
+.I NUM
+is zero,
+.B grep
+stops right away without reading input.
+A
+.I NUM
+of \-1 is treated as infinity and
+.B grep
+does not stop; this is the default.
 If the input is standard input from a regular file,
 and
 .I NUM
diff --git a/doc/grep.texi b/doc/grep.texi
index b9688c8..b073fa7 100644
--- a/doc/grep.texi
+++ b/doc/grep.texi
@@ -341,6 +341,10 @@ Scanning each input file stops upon first match.
 @opindex --max-count
 @cindex max-count
 Stop after the first @var{num} selected lines.
+If @var{num} is zero, @command{grep} stops right away without reading input.
+A @var{num} of @minus{}1 is treated as infinity and @command{grep}
+does not stop; this is the default.
+
 If the input is standard input from a regular file,
 and @var{num} selected lines are output,
 @command{grep} ensures that the standard input is positioned
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH] grep: add --max-count command line option
@ 2022-06-20 15:49 Carlos L. via GitGitGadget
  2022-06-20 15:57 ` Paul Eggert
  0 siblings, 1 reply; 12+ messages in thread
From: Carlos L. via GitGitGadget @ 2022-06-20 15:49 UTC (permalink / raw)
  To: git; +Cc: Martin Ågren [ ], Paul Eggert [ ], Carlos L.,
	Carlos López

From: =?UTF-8?q?Carlos=20L=C3=B3pez?= <00xc@protonmail.com>

This patch adds a command line option analogous to that of GNU
grep(1)'s -m / --max-count, which users might already be used to.
This makes it possible to limit the amount of matches shown in the
output while keeping the functionality of other options such as -C
(show code context) or -p (show containing function), which would be
difficult to do with a shell pipeline (e.g. head(1)).

Signed-off-by: Carlos López 00xc@protonmail.com
---
    grep: add --max-count command line option
    
    This patch adds a command line option analogous to that of GNU grep(1)'s
    -m / --max-count, which users might already be used to. This makes it
    possible to limit the amount of matches shown in the output while
    keeping the functionality of other options such as -C (show code
    context) or -p (show containing function), which would be difficult to
    do with a shell pipeline (e.g. head(1)).
    
    Signed-off-by: Carlos López 00xc@protonmail.com

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1278%2F00xc%2Fmaster-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1278/00xc/master-v1
Pull-Request: https://github.com/git/git/pull/1278

 Documentation/git-grep.txt | 8 ++++++++
 builtin/grep.c             | 9 +++++++++
 grep.c                     | 2 ++
 grep.h                     | 2 ++
 4 files changed, 21 insertions(+)

diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
index 3d393fbac1b..19b817d5e58 100644
--- a/Documentation/git-grep.txt
+++ b/Documentation/git-grep.txt
@@ -23,6 +23,7 @@ SYNOPSIS
 	   [--break] [--heading] [-p | --show-function]
 	   [-A <post-context>] [-B <pre-context>] [-C <context>]
 	   [-W | --function-context]
+	   [(-m | --max-count) <num>]
 	   [--threads <num>]
 	   [-f <file>] [-e] <pattern>
 	   [--and|--or|--not|(|)|-e <pattern>...]
@@ -238,6 +239,13 @@ providing this option will cause it to die.
 	`git diff` works out patch hunk headers (see 'Defining a
 	custom hunk-header' in linkgit:gitattributes[5]).
 
+-m <num>::
+--max-count <num>::
+	Limit the amount of matches per file. When using the `-v` or
+	`--invert-match` option, the search stops after the specified
+	number of non-matches. A value of -1 will return unlimited
+	results (the default).
+
 --threads <num>::
 	Number of grep worker threads to use.
 	See `grep.threads` in 'CONFIGURATION' for more information.
diff --git a/builtin/grep.c b/builtin/grep.c
index bcb07ea7f75..4ab28995da0 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -961,6 +961,8 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 		OPT_BOOL_F(0, "ext-grep", &external_grep_allowed__ignored,
 			   N_("allow calling of grep(1) (ignored by this build)"),
 			   PARSE_OPT_NOCOMPLETE),
+		OPT_INTEGER('m', "max-count", &opt.max_count,
+			N_("maximum number of results per file")),
 		OPT_END()
 	};
 	grep_prefix = prefix;
@@ -1101,6 +1103,13 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 	if (recurse_submodules && untracked)
 		die(_("--untracked not supported with --recurse-submodules"));
 
+	/*
+	 * Optimize out the case where the amount of matches is limited to zero.
+	 * We do this to keep results consistent with GNU grep(1).
+	 */
+	if (opt.max_count == 0)
+		exit(EXIT_FAILURE);
+
 	if (show_in_pager) {
 		if (num_threads > 1)
 			warning(_("invalid option combination, ignoring --threads"));
diff --git a/grep.c b/grep.c
index 82eb7da1022..a010f9f4132 100644
--- a/grep.c
+++ b/grep.c
@@ -1686,6 +1686,8 @@ static int grep_source_1(struct grep_opt *opt, struct grep_source *gs, int colle
 		bol = eol + 1;
 		if (!left)
 			break;
+		if (opt->max_count != (unsigned)-1 && count == opt->max_count)
+			break;
 		left--;
 		lno++;
 	}
diff --git a/grep.h b/grep.h
index c722d25ed9d..218585a8679 100644
--- a/grep.h
+++ b/grep.h
@@ -171,6 +171,7 @@ struct grep_opt {
 	int show_hunk_mark;
 	int file_break;
 	int heading;
+	unsigned max_count;
 	void *priv;
 
 	void (*output)(struct grep_opt *opt, const void *data, size_t size);
@@ -181,6 +182,7 @@ struct grep_opt {
 	.relative = 1, \
 	.pathname = 1, \
 	.max_depth = -1, \
+	.max_count = (unsigned)-1, \
 	.pattern_type_option = GREP_PATTERN_TYPE_UNSPECIFIED, \
 	.colors = { \
 		[GREP_COLOR_CONTEXT] = "", \

base-commit: 5b71c59bc3b9365075e2a175aa7b6f2b0c84ce44
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH] grep: add --max-count command line option
  2022-06-20 15:49 Carlos L. via GitGitGadget
@ 2022-06-20 15:57 ` Paul Eggert
  2022-06-20 16:25   ` Carlos L.
  0 siblings, 1 reply; 12+ messages in thread
From: Paul Eggert @ 2022-06-20 15:57 UTC (permalink / raw)
  To: Carlos L. via GitGitGadget; +Cc: Martin Ågren [ ], Carlos L., git

On 6/20/22 10:49, Carlos L. via GitGitGadget wrote:
> +	unsigned max_count;

Why not make this intmax_t? That way, you don't have to worry about 
casting -1 to unsigned. Also on typical 64-bit machines you no longer 
have to worry about mishandling counts greater than 2**32 (the limit 
becomes 2**63 - 1 which is plenty).

These days it's typically better to avoid unsigned types in C when you 
can, as standard tools like 'gcc -fsanitize=undefined' can catch signed 
int overflow whereas unsigned int overflow always wraps around which is 
typically bad news.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] grep: add --max-count command line option
  2022-06-20 15:57 ` Paul Eggert
@ 2022-06-20 16:25   ` Carlos L.
  2022-06-20 16:32     ` Paul Eggert
  0 siblings, 1 reply; 12+ messages in thread
From: Carlos L. @ 2022-06-20 16:25 UTC (permalink / raw)
  To: Paul Eggert; +Cc: Carlos L. via GitGitGadget, Martin Ågren [ ], git

Hi,

On Monday, June 20th, 2022 at 17:57, Paul Eggert <eggert@cs.ucla.edu> wrote:

> On 6/20/22 10:49, Carlos L. via GitGitGadget wrote:
>
> > + unsigned max_count;
>
>
> Why not make this intmax_t? That way, you don't have to worry about
> casting -1 to unsigned. Also on typical 64-bit machines you no longer
> have to worry about mishandling counts greater than 232 (the limit
> becomes 263 - 1 which is plenty).

This does not work well with OPTION_INTEGER, since it assumes the value to be int-sized:

parse-options.c:
 219             *(int *)opt->value = strtol(arg, (char **)&s, 10);

I also wanted to avoid using signed int so both sides of the comparison with `count` in grep_source_1() have the same sign.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] grep: add --max-count command line option
  2022-06-20 16:25   ` Carlos L.
@ 2022-06-20 16:32     ` Paul Eggert
  0 siblings, 0 replies; 12+ messages in thread
From: Paul Eggert @ 2022-06-20 16:32 UTC (permalink / raw)
  To: Carlos L.; +Cc: Carlos L. via GitGitGadget, Martin Ågren [ ], git

On 6/20/22 11:25, Carlos L. wrote:
> This does not work well with OPTION_INTEGER, since it assumes the value to be int-sized:
>
> parse-options.c:
>   219             *(int *)opt->value = strtol(arg, (char **)&s, 10);

OK, so parse-options messes up if the user specifies a count that does 
not fit in 'int'? Although that's a separate bug, let's not make things 
worse here; let's make the new count an 'int'.

In the long run parse-options should be changed to use strtoimax instead 
of strtol, and the corresponding integers should be changed to intmax_t, 
and the proper thing should be done if the string value does not fit 
into intmax_t. But this longer-run fix affects all integer-valued 
options, not just this one.


> I also wanted to avoid using signed int so both sides of the comparison with `count` in grep_source_1() have the same sign.

Such comparisons cannot misfire if both values are nonnegative, and that 
can easily be arranged here.


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2022-06-20 16:32 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-12 13:20 [PATCH] grep: add --max-count command line option Carlos L. via GitGitGadget
2022-05-14 18:16 ` Martin Ågren
2022-05-16  5:57 ` Junio C Hamano
2022-05-16  7:28   ` Paul Eggert
2022-05-16  8:38     ` Carlos L.
2022-05-16 15:36       ` Junio C Hamano
2022-05-17  5:53         ` Paul Eggert
2022-05-16 15:18     ` Junio C Hamano
  -- strict thread matches above, loose matches on Subject: below --
2022-06-20 15:49 Carlos L. via GitGitGadget
2022-06-20 15:57 ` Paul Eggert
2022-06-20 16:25   ` Carlos L.
2022-06-20 16:32     ` Paul Eggert

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).