git@vger.kernel.org mailing list mirror (one of many)
* Git bug: Filter ignored when "--invert-grep" option is used.
@ 2021-12-15  9:50 Dotan Cohen
  2021-12-15 22:08 ` Junio C Hamano
  0 siblings, 1 reply; 6+ messages in thread
From: Dotan Cohen @ 2021-12-15  9:50 UTC (permalink / raw)
  To: git

What did you do before the bug happened?
$ git log -8 --author=Shachar --grep=Revert --invert-grep

What did you expect to happen?
I expected to see the last 8 commits from Shachar that did not have
the string "Revert" in the commit message.

What happened instead?
The list of commits included commits by authors other than Shachar.

What's different between what you expected and what actually happened?
The "--author" filter seems to be ignored when the "--invert-grep"
option is used.
I also tried to change the order of the options, but the results
remained the same.

[System Info]
git version:
git version 2.34.1
cpu: x86_64
no commit associated with this build
sizeof-long: 8
sizeof-size_t: 8
shell-path: /bin/sh
uname: Linux 5.11.0-41-generic #45~20.04.1-Ubuntu SMP Wed Nov 10
10:20:10 UTC 2021 x86_64
compiler info: gnuc: 9.3
libc info: glibc: 2.31
$SHELL (typically, interactive shell): /bin/bash


[Enabled Hooks]
pre-commit
pre-push


* Re: Git bug: Filter ignored when "--invert-grep" option is used.
  2021-12-15  9:50 Git bug: Filter ignored when "--invert-grep" option is used Dotan Cohen
@ 2021-12-15 22:08 ` Junio C Hamano
  2021-12-16 14:54   ` Dotan Cohen
  0 siblings, 1 reply; 6+ messages in thread
From: Junio C Hamano @ 2021-12-15 22:08 UTC (permalink / raw)
  To: Dotan Cohen; +Cc: git

Dotan Cohen <dotancohen@gmail.com> writes:

> What did you do before the bug happened?
> $ git log -8 --author=Shachar --grep=Revert --invert-grep
>
> What did you expect to happen?
> I expected to see the last 8 commits from Shachar that did not have
> the string "Revert" in the commit message.
>
> What happened instead?
> The list of commits included commits by authors other than Shachar.
>
> What's different between what you expected and what actually happened?
> The "--author" filter seems to be ignored when the "--invert-grep"
> option is used.
> I also tried to change the order of the options, but the results
> remained the same.

I think --author and --grep use the same internal pattern matching
engine, so with --invert-grep, I would not be surprised if the
command looks for commits that do not have Revert and (or is that
or?  I dunno) not authored by Shachar.
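
A toy model may help with the "and" versus "or" question.  It is not
git's code: the struct and function names are hypothetical, strstr()
stands in for the regex engine, and it assumes (as the removed line in
commit_match() in the patch further down this thread suggests) that
the combined author-and-message result is negated as a whole.  Under
that assumption the inverted filter keeps any commit that fails either
test, i.e. "not by Shachar OR without Revert".

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical, simplified commit: just an author and a message. */
struct toy_commit {
	const char *author;
	const char *message;
};

/* Substring search stands in for git's regex matching. */
static bool toy_match(const char *haystack, const char *needle)
{
	return strstr(haystack, needle) != NULL;
}

/*
 * Model of the reported behavior: the combined (author AND message)
 * result is negated as a whole when invert_grep is set, so a commit
 * by a different author passes the inverted filter.
 */
static bool toy_commit_match_old(const struct toy_commit *c,
				 const char *author, const char *pattern,
				 bool invert_grep)
{
	bool retval = toy_match(c->author, author) &&
		      toy_match(c->message, pattern);

	return invert_grep ? !retval : retval;
}
```

With invert_grep set, a commit by another author is accepted no matter
what its message says, which matches the symptom in the report.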


* Re: Git bug: Filter ignored when "--invert-grep" option is used.
  2021-12-15 22:08 ` Junio C Hamano
@ 2021-12-16 14:54   ` Dotan Cohen
  2021-12-16 19:42     ` Junio C Hamano
  0 siblings, 1 reply; 6+ messages in thread
From: Dotan Cohen @ 2021-12-16 14:54 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

> I think --author and --grep use the same internal pattern matching
> engine, so with --invert-grep, I would not be surprised if the
> command looks for commits that do not have Revert and (or is that
> or?  I dunno) not authored by Shachar.

Possibly, but the flag is called --invert-grep not --invert-matches so
one would expect it to invert grep only. Though behaviour contrary to
user expectations is not an unusual property of git :)

Other than piping to e.g. awk or worse, how would one get the commits
by a particular author that do not have a specific string in the
commit message? Using --pretty=oneline would make the piping easier,
at least to get the commit IDs, but I'd like to see the whole commit
message and affected files.

Thanks, Junio.


* Re: Git bug: Filter ignored when "--invert-grep" option is used.
  2021-12-16 14:54   ` Dotan Cohen
@ 2021-12-16 19:42     ` Junio C Hamano
  2021-12-17 16:48       ` René Scharfe
  0 siblings, 1 reply; 6+ messages in thread
From: Junio C Hamano @ 2021-12-16 19:42 UTC (permalink / raw)
  To: Dotan Cohen; +Cc: git

Dotan Cohen <dotancohen@gmail.com> writes:

>> I think --author and --grep use the same internal pattern matching
>> engine, so with --invert-grep, I would not be surprised if the
>> command looks for commits that do not have Revert and (or is that
>> or?  I dunno) not authored by Shachar.
>
> Possibly, but the flag is called --invert-grep not --invert-matches so
> one would expect it to invert grep only.

That is an actionable improvement idea to introduce a synonym ;-)

But in general, the way the internal "git grep" machinery is exposed
to the commands in the "git log" family is very limited.  With "git
grep", it is quite straightforward to say "report hits for lines
that have this but not that"

    $ git grep -e this --and --not -e that

but because the commands in the "log" family already use
"--not" for a quite different purpose, "git log --grep" cannot even
express something similar, even to find hits on a single line, let
alone finding hits on two different lines (i.e. one on the "author"
header, the other in the message part, of the commit object).
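
As a rough illustration of what such a per-line expression amounts to,
here is a minimal sketch in C.  It is not taken from grep.c: the enum,
struct, and function names are hypothetical (only loosely modeled on
the GREP_NODE_* naming), and fixed-string matching with strstr()
replaces real regex matching.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical node kinds for a tiny boolean expression tree. */
enum toy_node { TOY_ATOM, TOY_NOT, TOY_AND };

struct toy_expr {
	enum toy_node node;
	const char *pattern;          /* TOY_ATOM: fixed string to find */
	const struct toy_expr *left;  /* TOY_NOT operand, or TOY_AND lhs */
	const struct toy_expr *right; /* TOY_AND rhs */
};

/* Evaluate the expression against a single line of text. */
static bool toy_eval(const struct toy_expr *x, const char *line)
{
	switch (x->node) {
	case TOY_ATOM:
		return strstr(line, x->pattern) != NULL;
	case TOY_NOT:
		return !toy_eval(x->left, line);
	case TOY_AND:
		return toy_eval(x->left, line) && toy_eval(x->right, line);
	}
	return false;
}

/* Equivalent of "-e this --and --not -e that" for one line. */
static bool toy_this_but_not_that(const char *line)
{
	static const struct toy_expr this_atom = { TOY_ATOM, "this", NULL, NULL };
	static const struct toy_expr that_atom = { TOY_ATOM, "that", NULL, NULL };
	static const struct toy_expr not_that = { TOY_NOT, NULL, &that_atom, NULL };
	static const struct toy_expr both = { TOY_AND, NULL, &this_atom, &not_that };

	return toy_eval(&both, line);
}
```

The whole expression is evaluated against one line at a time, which is
why a condition spanning the "author" header line and a message line
does not fit this shape.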




* Re: Git bug: Filter ignored when "--invert-grep" option is used.
  2021-12-16 19:42     ` Junio C Hamano
@ 2021-12-17 16:48       ` René Scharfe
  2021-12-17 18:16         ` Junio C Hamano
  0 siblings, 1 reply; 6+ messages in thread
From: René Scharfe @ 2021-12-17 16:48 UTC (permalink / raw)
  To: Junio C Hamano, Dotan Cohen; +Cc: git, Christoph Junghans

On 16.12.21 at 20:42, Junio C Hamano wrote:
> Dotan Cohen <dotancohen@gmail.com> writes:
>
>>> I think --author and --grep use the same internal pattern matching
>>> engine, so with --invert-grep, I would not be surprised if the
>>> command looks for commits that do not have Revert and (or is that
>>> or?  I dunno) not authored by Shachar.
>>
>> Possibly, but the flag is called --invert-grep not --invert-matches so
>> one would expect it to invert grep only.
>
> That is an actionable improvement idea to introduce a synonym ;-)

Documentation/rev-list-options.txt says about --invert-grep:

        Limit the commits output to ones with log message that do not
        match the pattern specified with `--grep=<pattern>`.

Both the option name and this sentence suggest that it only should
invert --grep, which makes sense to me.

> But in general, the way the internal "git grep" machinery is exposed
> to the commands in the "git log" family is very limited.  With "git
> grep", it is quite straightforward to say "report hits for lines
> that have this but not that"
>
>     $ git grep -e this --and --not -e that
>
> but because the commands in the "log" family already use
> "--not" for a quite different purpose, "git log --grep" cannot even
> express something similar, even to find hits on a single line, let
> alone finding hits on two different lines (i.e. one on the "author"
> header, the other in the message part, of the commit object).

Right, but we can pass in the necessary bit via struct grep_opt.
22dfa8a23d (log: teach --invert-grep option, 2015-01-12) even mentions
that this was done in an earlier iteration of that feature.

Representing buffer-level operations like --all-match and this one as
expression nodes would be nice.  At least I suspect that would make
changing the behavior easier, without having to touch as many places.

Anyway, here's a patch that is intended to bring the code in line
with its documentation.  The multiple negations hurt my head, so I
may have snuck in some logic errors, though. :-/

--- >8 ---
Subject: [PATCH] log: let --invert-grep only invert --grep

The option --invert-grep is documented to filter out commits whose
messages match the --grep filters.  However, it also affects the
header matches (--author, --committer), which is not intended.

Move the handling of that option to grep.c, as only the code there can
distinguish between matches in the header from those in the message
body.  If --invert-grep is given then enable extended expressions (not
the regex type, we just need git grep's --not to work), negate the body
patterns and check if any of them match by piggy-backing on the
collect_hits mechanism of grep_source_1().

Collecting the matches in struct grep_opt is a bit iffy, but with
"last_shown" we have a precedent for writing state information to that
struct.

Reported-by: Dotan Cohen <dotancohen@gmail.com>
Signed-off-by: René Scharfe <l.s.r@web.de>
---
 grep.c         | 22 +++++++++++++++++++---
 grep.h         |  2 ++
 revision.c     |  4 ++--
 revision.h     |  2 --
 t/t4202-log.sh | 19 +++++++++++++++++++
 5 files changed, 42 insertions(+), 7 deletions(-)

diff --git a/grep.c b/grep.c
index fe847a0111..beef5fe47e 100644
--- a/grep.c
+++ b/grep.c
@@ -699,6 +699,14 @@ static struct grep_expr *compile_pattern_expr(struct grep_pat **list)
 	return compile_pattern_or(list);
 }

+static struct grep_expr *grep_not_expr(struct grep_expr *expr)
+{
+	struct grep_expr *z = xcalloc(1, sizeof(*z));
+	z->node = GREP_NODE_NOT;
+	z->u.unary = expr;
+	return z;
+}
+
 static struct grep_expr *grep_true_expr(void)
 {
 	struct grep_expr *z = xcalloc(1, sizeof(*z));
@@ -797,7 +805,7 @@ void compile_grep_patterns(struct grep_opt *opt)
 		}
 	}

-	if (opt->all_match || header_expr)
+	if (opt->all_match || opt->no_body_match || header_expr)
 		opt->extended = 1;
 	else if (!opt->extended)
 		return;
@@ -808,6 +816,9 @@ void compile_grep_patterns(struct grep_opt *opt)
 	if (p)
 		die("incomplete pattern expression: %s", p->pattern);

+	if (opt->no_body_match && opt->pattern_expression)
+		opt->pattern_expression = grep_not_expr(opt->pattern_expression);
+
 	if (!header_expr)
 		return;

@@ -1057,6 +1068,8 @@ static int match_expr_eval(struct grep_opt *opt, struct grep_expr *x,
 			if (h && (*col < 0 || tmp.rm_so < *col))
 				*col = tmp.rm_so;
 		}
+		if (x->u.atom->token == GREP_PATTERN_BODY)
+			opt->body_hit |= h;
 		break;
 	case GREP_NODE_NOT:
 		/*
@@ -1825,16 +1838,19 @@ int grep_source(struct grep_opt *opt, struct grep_source *gs)
 	 * we do not have to do the two-pass grep when we do not check
 	 * buffer-wide "all-match".
 	 */
-	if (!opt->all_match)
+	if (!opt->all_match && !opt->no_body_match)
 		return grep_source_1(opt, gs, 0);

 	/* Otherwise the toplevel "or" terms hit a bit differently.
 	 * We first clear hit markers from them.
 	 */
 	clr_hit_marker(opt->pattern_expression);
+	opt->body_hit = 0;
 	grep_source_1(opt, gs, 1);

-	if (!chk_hit_marker(opt->pattern_expression))
+	if (opt->all_match && !chk_hit_marker(opt->pattern_expression))
+		return 0;
+	if (opt->no_body_match && opt->body_hit)
 		return 0;

 	return grep_source_1(opt, gs, 0);
diff --git a/grep.h b/grep.h
index 3e8815c347..6a1f0ab017 100644
--- a/grep.h
+++ b/grep.h
@@ -148,6 +148,8 @@ struct grep_opt {
 	int word_regexp;
 	int fixed;
 	int all_match;
+	int no_body_match;
+	int body_hit;
 #define GREP_BINARY_DEFAULT	0
 #define GREP_BINARY_NOMATCH	1
 #define GREP_BINARY_TEXT	2
diff --git a/revision.c b/revision.c
index 1981a0859f..97a06bc8fe 100644
--- a/revision.c
+++ b/revision.c
@@ -2493,7 +2493,7 @@ static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg
 	} else if (!strcmp(arg, "--all-match")) {
 		revs->grep_filter.all_match = 1;
 	} else if (!strcmp(arg, "--invert-grep")) {
-		revs->invert_grep = 1;
+		revs->grep_filter.no_body_match = 1;
 	} else if ((argcount = parse_long_opt("encoding", argv, &optarg))) {
 		if (strcmp(optarg, "none"))
 			git_log_output_encoding = xstrdup(optarg);
@@ -3778,7 +3778,7 @@ static int commit_match(struct commit *commit, struct rev_info *opt)
 				     (char *)message, strlen(message));
 	strbuf_release(&buf);
 	unuse_commit_buffer(commit, message);
-	return opt->invert_grep ? !retval : retval;
+	return retval;
 }

 static inline int want_ancestry(const struct rev_info *revs)
diff --git a/revision.h b/revision.h
index 5578bb4720..3f66147bfd 100644
--- a/revision.h
+++ b/revision.h
@@ -246,8 +246,6 @@ struct rev_info {

 	/* Filter by commit log message */
 	struct grep_opt	grep_filter;
-	/* Negate the match of grep_filter */
-	int invert_grep;

 	/* Display history graph */
 	struct git_graph *graph;
diff --git a/t/t4202-log.sh b/t/t4202-log.sh
index 7884e3d46b..765742fdbc 100755
--- a/t/t4202-log.sh
+++ b/t/t4202-log.sh
@@ -2010,4 +2010,23 @@ test_expect_success 'log --end-of-options' '
        test_cmp expect actual
 '

+test_expect_success 'set up commits with different authors' '
+	git checkout --orphan authors &&
+	test_commit --author "Jim <jim@example.com>" jim_1 &&
+	test_commit --author "Val <val@example.com>" val_1 &&
+	test_commit --author "Val <val@example.com>" val_2 &&
+	test_commit --author "Jim <jim@example.com>" jim_2 &&
+	test_commit --author "Val <val@example.com>" val_3 &&
+	test_commit --author "Jim <jim@example.com>" jim_3
+'
+
+test_expect_success 'log --invert-grep --grep --author' '
+	cat >expect <<-\EOF &&
+	val_3
+	val_1
+	EOF
+	git log --format=%s --author=Val --grep 2 --invert-grep >actual &&
+	test_cmp expect actual
+'
+
 test_done
--
2.34.0
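
For readers who want the intended semantics without tracing the diff,
here is a toy model in C.  It is a sketch, not the patched code: the
struct and function names are hypothetical, strstr() stands in for
regex matching, and the two-pass collect_hits mechanism is reduced to
two booleans.  The assumption it encodes is that the header (author)
filter is applied as usual and only the message match is inverted.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified commit: just an author and a subject. */
struct toy_commit {
	const char *author;
	const char *message;
};

/*
 * Toy model of the intended behavior: the author filter always
 * applies, and no_body_match inverts only the message match.
 */
static bool toy_commit_match_new(const struct toy_commit *c,
				 const char *author, const char *pattern,
				 bool no_body_match)
{
	bool header_hit = strstr(c->author, author) != NULL;
	bool body_hit = strstr(c->message, pattern) != NULL;

	if (!header_hit)
		return false;
	return no_body_match ? !body_hit : body_hit;
}

/* Count how many of the n commits pass the filter. */
static size_t toy_count_matches(const struct toy_commit *commits, size_t n,
				const char *author, const char *pattern,
				bool no_body_match)
{
	size_t i, count = 0;

	for (i = 0; i < n; i++)
		if (toy_commit_match_new(&commits[i], author, pattern,
					 no_body_match))
			count++;
	return count;
}
```

Fed the six commits from the new t4202 test with author "Val", pattern
"2" and no_body_match set, this model selects val_1 and val_3, the same
two subjects the test expects.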


* Re: Git bug: Filter ignored when "--invert-grep" option is used.
  2021-12-17 16:48       ` René Scharfe
@ 2021-12-17 18:16         ` Junio C Hamano
  0 siblings, 0 replies; 6+ messages in thread
From: Junio C Hamano @ 2021-12-17 18:16 UTC (permalink / raw)
  To: René Scharfe; +Cc: Dotan Cohen, git, Christoph Junghans

René Scharfe <l.s.r@web.de> writes:

> Subject: [PATCH] log: let --invert-grep only invert --grep
>
> The option --invert-grep is documented to filter out commits whose
> messages match the --grep filters.  However, it also affects the
> header matches (--author, --committer), which is not intended.

I re-read the log message that introduced this feature, and I agree
with the "not intended" part.  I do not think the change itself was
even done with awareness that the header matches may also be
affected, and there is no test for it to see the interaction.

> Move the handling of that option to grep.c, as only the code there can
> distinguish between matches in the header from those in the message
> body.  If --invert-grep is given then enable extended expressions (not
> the regex type, we just need git grep's --not to work), negate the body
> patterns and check if any of them match by piggy-backing on the
> collect_hits mechanism of grep_source_1().

Nice.  The original says that --files-without-matches being a
negation of --files-with-matches was what triggered them to have the
bit in the revisions, not in grep_opt, by the way.

> Collecting the matches in struct grep_opt is a bit iffy, but with
> "last_shown" we have a precedent for writing state information to that
> struct.

I think this is perfectly fine.  apply_state, grep_opt,
diff_options, and rev_info are used the same way within their
subsystems to carry in options that affect behaviour, carry around
the state of the machinery, and carry out the result.  The word
"option" does make it sound like it is an input-only thing, but others
are not much better ;-).

> diff --git a/grep.c b/grep.c
> index fe847a0111..beef5fe47e 100644
> --- a/grep.c
> +++ b/grep.c
> @@ -699,6 +699,14 @@ static struct grep_expr *compile_pattern_expr(struct grep_pat **list)
>  	return compile_pattern_or(list);
>  }
>
> +static struct grep_expr *grep_not_expr(struct grep_expr *expr)
> +{
> +	struct grep_expr *z = xcalloc(1, sizeof(*z));
> +	z->node = GREP_NODE_NOT;
> +	z->u.unary = expr;
> +	return z;
> +}

A bit surprising to see that we already had GREP_NODE_NOT without a
helper to create a node.  Not updating compile_pattern_not() to use
this new helper does make this patch simpler to read by allowing
readers to focus on what matters, which is very much appreciated.

The rest of the patch looks good to me, too.

