git@vger.kernel.org mailing list mirror (one of many)
From: "René Scharfe" <l.s.r@web.de>
To: Junio C Hamano <gitster@pobox.com>, Dotan Cohen <dotancohen@gmail.com>
Cc: git@vger.kernel.org, Christoph Junghans <ottxor@gentoo.org>
Subject: Re: Git bug: Filter ignored when "--invert-grep" option is used.
Date: Fri, 17 Dec 2021 17:48:49 +0100
Message-ID: <e2e7759e-aa97-1117-6df2-f93a12afb094@web.de>
In-Reply-To: <xmqqee6cbalb.fsf@gitster.g>

Am 16.12.21 um 20:42 schrieb Junio C Hamano:
> Dotan Cohen <dotancohen@gmail.com> writes:
>
>>> I think --author and --grep uses the same internal pattern matching
>>> engine, so with --invert-grep, I would not be surprised if the
>>> command looks for commits that do not have Revert and (or is that
>>> or?  I dunno) not authored by Shachar.
>>
>> Possibly, but the flag is called --invert-grep not --invert-matches so
>> one would expect it to revert grep only.
>
> That is an actionable improvement idea to introduce a synonym ;-)

Documentation/rev-list-options.txt says about --invert-grep:

        Limit the commits output to ones with log message that do not
        match the pattern specified with `--grep=<pattern>`.

Both the option name and this sentence suggest that it should only
invert --grep, which makes sense to me.

> But in general, the way the internal "git grep" machinery is exposed
> to the commands in the "git log" family is very limited.  With "git
> grep", it is quite straight-forward to say "report hits for lines
> that has this but not that"
>
>     $ git grep -e this --and --not -e that
>
> but because that the commands in the "log" family already use
> "--not" for a quite different purpose, "git log --grep" cannot even
> express something similar, even to find hits on a single line, let
> alone finding hits on two different lines (i.e. one on the "author"
> header, the other in the message part, of the commit object).

Right, but we can pass in the necessary bit via struct grep_opt.
22dfa8a23d (log: teach --invert-grep option, 2015-01-12) even mentions
that this was done in an earlier iteration of that feature.

Representing buffer-level operations like --all-match and this one as
expression nodes would be nice.  At least I suspect that would make
changing the behavior easier, without having to touch as many places.

Anyway, here's a patch that is intended to bring the code in line
with its documentation.  The multiple negations hurt my head, so I
may have snuck in some logic errors, though. :-/

--- >8 ---
Subject: [PATCH] log: let --invert-grep only invert --grep

The option --invert-grep is documented to filter out commits whose
messages match the --grep filters.  However, it also affects the
header matches (--author, --committer), which is not intended.

Move the handling of that option to grep.c, as only the code there can
distinguish matches in the header from those in the message body.  If
--invert-grep is given, enable extended expressions (not the regex type;
we just need git grep's --not to work), negate the body patterns and
check if any of them match by piggy-backing on the collect_hits
mechanism of grep_source_1().

Collecting the matches in struct grep_opt is a bit iffy, but with
"last_shown" we have a precedent for writing state information to that
struct.

Reported-by: Dotan Cohen <dotancohen@gmail.com>
Signed-off-by: René Scharfe <l.s.r@web.de>
---
 grep.c         | 22 +++++++++++++++++++---
 grep.h         |  2 ++
 revision.c     |  4 ++--
 revision.h     |  2 --
 t/t4202-log.sh | 19 +++++++++++++++++++
 5 files changed, 42 insertions(+), 7 deletions(-)

diff --git a/grep.c b/grep.c
index fe847a0111..beef5fe47e 100644
--- a/grep.c
+++ b/grep.c
@@ -699,6 +699,14 @@ static struct grep_expr *compile_pattern_expr(struct grep_pat **list)
 	return compile_pattern_or(list);
 }

+static struct grep_expr *grep_not_expr(struct grep_expr *expr)
+{
+	struct grep_expr *z = xcalloc(1, sizeof(*z));
+	z->node = GREP_NODE_NOT;
+	z->u.unary = expr;
+	return z;
+}
+
 static struct grep_expr *grep_true_expr(void)
 {
 	struct grep_expr *z = xcalloc(1, sizeof(*z));
@@ -797,7 +805,7 @@ void compile_grep_patterns(struct grep_opt *opt)
 		}
 	}

-	if (opt->all_match || header_expr)
+	if (opt->all_match || opt->no_body_match || header_expr)
 		opt->extended = 1;
 	else if (!opt->extended)
 		return;
@@ -808,6 +816,9 @@ void compile_grep_patterns(struct grep_opt *opt)
 	if (p)
 		die("incomplete pattern expression: %s", p->pattern);

+	if (opt->no_body_match && opt->pattern_expression)
+		opt->pattern_expression = grep_not_expr(opt->pattern_expression);
+
 	if (!header_expr)
 		return;

@@ -1057,6 +1068,8 @@ static int match_expr_eval(struct grep_opt *opt, struct grep_expr *x,
 			if (h && (*col < 0 || tmp.rm_so < *col))
 				*col = tmp.rm_so;
 		}
+		if (x->u.atom->token == GREP_PATTERN_BODY)
+			opt->body_hit |= h;
 		break;
 	case GREP_NODE_NOT:
 		/*
@@ -1825,16 +1838,19 @@ int grep_source(struct grep_opt *opt, struct grep_source *gs)
 	 * we do not have to do the two-pass grep when we do not check
 	 * buffer-wide "all-match".
 	 */
-	if (!opt->all_match)
+	if (!opt->all_match && !opt->no_body_match)
 		return grep_source_1(opt, gs, 0);

 	/* Otherwise the toplevel "or" terms hit a bit differently.
 	 * We first clear hit markers from them.
 	 */
 	clr_hit_marker(opt->pattern_expression);
+	opt->body_hit = 0;
 	grep_source_1(opt, gs, 1);

-	if (!chk_hit_marker(opt->pattern_expression))
+	if (opt->all_match && !chk_hit_marker(opt->pattern_expression))
+		return 0;
+	if (opt->no_body_match && opt->body_hit)
 		return 0;

 	return grep_source_1(opt, gs, 0);
diff --git a/grep.h b/grep.h
index 3e8815c347..6a1f0ab017 100644
--- a/grep.h
+++ b/grep.h
@@ -148,6 +148,8 @@ struct grep_opt {
 	int word_regexp;
 	int fixed;
 	int all_match;
+	int no_body_match;
+	int body_hit;
 #define GREP_BINARY_DEFAULT	0
 #define GREP_BINARY_NOMATCH	1
 #define GREP_BINARY_TEXT	2
diff --git a/revision.c b/revision.c
index 1981a0859f..97a06bc8fe 100644
--- a/revision.c
+++ b/revision.c
@@ -2493,7 +2493,7 @@ static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg
 	} else if (!strcmp(arg, "--all-match")) {
 		revs->grep_filter.all_match = 1;
 	} else if (!strcmp(arg, "--invert-grep")) {
-		revs->invert_grep = 1;
+		revs->grep_filter.no_body_match = 1;
 	} else if ((argcount = parse_long_opt("encoding", argv, &optarg))) {
 		if (strcmp(optarg, "none"))
 			git_log_output_encoding = xstrdup(optarg);
@@ -3778,7 +3778,7 @@ static int commit_match(struct commit *commit, struct rev_info *opt)
 				     (char *)message, strlen(message));
 	strbuf_release(&buf);
 	unuse_commit_buffer(commit, message);
-	return opt->invert_grep ? !retval : retval;
+	return retval;
 }

 static inline int want_ancestry(const struct rev_info *revs)
diff --git a/revision.h b/revision.h
index 5578bb4720..3f66147bfd 100644
--- a/revision.h
+++ b/revision.h
@@ -246,8 +246,6 @@ struct rev_info {

 	/* Filter by commit log message */
 	struct grep_opt	grep_filter;
-	/* Negate the match of grep_filter */
-	int invert_grep;

 	/* Display history graph */
 	struct git_graph *graph;
diff --git a/t/t4202-log.sh b/t/t4202-log.sh
index 7884e3d46b..765742fdbc 100755
--- a/t/t4202-log.sh
+++ b/t/t4202-log.sh
@@ -2010,4 +2010,23 @@ test_expect_success 'log --end-of-options' '
        test_cmp expect actual
 '

+test_expect_success 'set up commits with different authors' '
+	git checkout --orphan authors &&
+	test_commit --author "Jim <jim@example.com>" jim_1 &&
+	test_commit --author "Val <val@example.com>" val_1 &&
+	test_commit --author "Val <val@example.com>" val_2 &&
+	test_commit --author "Jim <jim@example.com>" jim_2 &&
+	test_commit --author "Val <val@example.com>" val_3 &&
+	test_commit --author "Jim <jim@example.com>" jim_3
+'
+
+test_expect_success 'log --invert-grep --grep --author' '
+	cat >expect <<-\EOF &&
+	val_3
+	val_1
+	EOF
+	git log --format=%s --author=Val --grep 2 --invert-grep >actual &&
+	test_cmp expect actual
+'
+
 test_done
--
2.34.0
