git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH v6 1/2] grep: refactor next_match() and match_one_pattern() for external use
@ 2021-09-21  0:30 Hamza Mahfooz
  2021-09-21  0:30 ` [PATCH v6 2/2] pretty: colorize pattern matches in commit messages Hamza Mahfooz
  2021-09-21  1:15 ` [PATCH v6 1/2] grep: refactor next_match() and match_one_pattern() for external use Jeff King
  0 siblings, 2 replies; 10+ messages in thread
From: Hamza Mahfooz @ 2021-09-21  0:30 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Hamza Mahfooz

These changes are made in preparation of, the colorization support for the
"git log" subcommands that, rely on regex functionality (i.e. "--author",
"--committer" and "--grep"). These changes are necessary primarily because
the format of "bol" differs in the context that I require to use
match_one_pattern() in and because next_match() doesn't handle the case of
"ctx == GREP_CONTEXT_HEAD" at all. So, teach each function how to handle
the new cases.

Signed-off-by: Hamza Mahfooz <someguy@effective-light.com>
---
v5: separate grep changes from pretty changes.

v6: rescope some variables.
---
 grep.c | 50 +++++++++++++++++++++++++++++++++++---------------
 grep.h |  3 +++
 2 files changed, 38 insertions(+), 15 deletions(-)

diff --git a/grep.c b/grep.c
index 424a39591b..2901233865 100644
--- a/grep.c
+++ b/grep.c
@@ -956,26 +956,34 @@ static int match_one_pattern(struct grep_pat *p, char *bol, char *eol,
 	const char *start = bol;
 
 	if ((p->token != GREP_PATTERN) &&
-	    ((p->token == GREP_PATTERN_HEAD) != (ctx == GREP_CONTEXT_HEAD)))
+	    ((p->token == GREP_PATTERN_HEAD) != (ctx == GREP_CONTEXT_HEAD)) &&
+	    ((p->token == GREP_PATTERN_BODY) != (ctx == GREP_CONTEXT_BODY)))
 		return 0;
 
 	if (p->token == GREP_PATTERN_HEAD) {
 		const char *field;
 		size_t len;
-		assert(p->field < ARRAY_SIZE(header_field));
-		field = header_field[p->field].field;
-		len = header_field[p->field].len;
-		if (strncmp(bol, field, len))
-			return 0;
-		bol += len;
+		const char *end = eol;
+
 		switch (p->field) {
 		case GREP_HEADER_AUTHOR:
 		case GREP_HEADER_COMMITTER:
 			saved_ch = strip_timestamp(bol, &eol);
+			if (eol == end)
+				goto again;
 			break;
 		default:
 			break;
 		}
+
+		assert(p->field < ARRAY_SIZE(header_field));
+		field = header_field[p->field].field;
+		len = header_field[p->field].len;
+
+		if (strncmp(bol, field, len))
+			goto restore;
+
+		bol += len;
 	}
 
  again:
@@ -1021,12 +1029,17 @@ static int match_one_pattern(struct grep_pat *p, char *bol, char *eol,
 				goto again;
 		}
 	}
-	if (p->token == GREP_PATTERN_HEAD && saved_ch)
-		*eol = saved_ch;
+
 	if (hit) {
 		pmatch[0].rm_so += bol - start;
 		pmatch[0].rm_eo += bol - start;
 	}
+
+restore:
+	if (p->token == GREP_PATTERN_HEAD && saved_ch)
+		*eol = saved_ch;
+
+
 	return hit;
 }
 
@@ -1159,21 +1172,27 @@ static int match_next_pattern(struct grep_pat *p, char *bol, char *eol,
 	return 1;
 }
 
-static int next_match(struct grep_opt *opt, char *bol, char *eol,
-		      enum grep_context ctx, regmatch_t *pmatch, int eflags)
+int grep_next_match(struct grep_opt *opt, char *bol, char *eol,
+		    enum grep_context ctx, regmatch_t *pmatch,
+		    enum grep_header_field field, int eflags)
 {
 	struct grep_pat *p;
 	int hit = 0;
 
 	pmatch->rm_so = pmatch->rm_eo = -1;
 	if (bol < eol) {
-		for (p = opt->pattern_list; p; p = p->next) {
+		for (p = ((ctx == GREP_CONTEXT_HEAD)
+			   ? opt->header_list : opt->pattern_list);
+			  p; p = p->next) {
 			switch (p->token) {
 			case GREP_PATTERN: /* atom */
 			case GREP_PATTERN_HEAD:
 			case GREP_PATTERN_BODY:
-				hit |= match_next_pattern(p, bol, eol, ctx,
-							  pmatch, eflags);
+				if ((field == GREP_HEADER_FIELD_MAX) ||
+				    (p->field == field))
+					hit |= match_next_pattern(p, bol, eol,
+								  ctx, pmatch,
+								  eflags);
 				break;
 			default:
 				break;
@@ -1262,7 +1281,8 @@ static void show_line(struct grep_opt *opt, char *bol, char *eol,
 				line_color = opt->colors[GREP_COLOR_FUNCTION];
 		}
 		*eol = '\0';
-		while (next_match(opt, bol, eol, ctx, &match, eflags)) {
+		while (grep_next_match(opt, bol, eol, ctx, &match,
+				       GREP_HEADER_FIELD_MAX, eflags)) {
 			if (match.rm_so == match.rm_eo)
 				break;
 
diff --git a/grep.h b/grep.h
index 72f82b1e30..d2943e29ea 100644
--- a/grep.h
+++ b/grep.h
@@ -177,6 +177,9 @@ void append_header_grep_pattern(struct grep_opt *, enum grep_header_field, const
 void compile_grep_patterns(struct grep_opt *opt);
 void free_grep_patterns(struct grep_opt *opt);
 int grep_buffer(struct grep_opt *opt, char *buf, unsigned long size);
+int grep_next_match(struct grep_opt *opt, char *bol, char *eol,
+		    enum grep_context ctx, regmatch_t *pmatch,
+		    enum grep_header_field field, int eflags);
 
 struct grep_source {
 	char *name;
-- 
2.33.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v6 2/2] pretty: colorize pattern matches in commit messages
  2021-09-21  0:30 [PATCH v6 1/2] grep: refactor next_match() and match_one_pattern() for external use Hamza Mahfooz
@ 2021-09-21  0:30 ` Hamza Mahfooz
  2021-09-21  1:24   ` Jeff King
  2021-09-21  1:15 ` [PATCH v6 1/2] grep: refactor next_match() and match_one_pattern() for external use Jeff King
  1 sibling, 1 reply; 10+ messages in thread
From: Hamza Mahfooz @ 2021-09-21  0:30 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Hamza Mahfooz

The "git log" command limits its output to the commits that contain strings
matched by a pattern when the "--grep=<pattern>" option is used, but unlike
output from "git grep -e <pattern>", the matches are not highlighted,
making them harder to spot.

Teach the pretty-printer code to highlight matches from the
"--grep=<pattern>", "--author=<pattern>" and "--committer=<pattern>"
options (to view the last one, you may have to ask for --pretty=fuller).

Also, it must be noted that we are effectively greping the content twice,
however it only slows down "git log --author=^H" on this repository by
around 1-2% (compared to v2.33.0), so it should be a small enough slow
down to justify the addition of the feature.

Signed-off-by: Hamza Mahfooz <someguy@effective-light.com>
---
v2: make the commit message whole (add the missing ingredients), rename
    append_matched_line() to append_line_with_color(), use
    colors[GREP_COLOR_MATCH_SELECTED] instead of
    colors[GREP_COLOR_MATCH_CONTEXT], allow the background color to be
    customized, don't copy strings to a buffer when not coloring in
    append_line_with_color(), rename next_match() to grep_next_match(),
    repurpose grep_next_match()/match_one_pattern() for use in
    append_line_with_color() (allowing us to remove duplicated matching
    code in append_line_with_color()), document how to customize the
    feature and modify some of the tests to fit the feature better.

v3: fix a formatting issue with the added documentation.

v4: add strbuf_add_with_color(), use the correct color code scheme in the
    unit tests and add more unit tests.

v5: separate grep changes from pretty changes and add some performance
    analysis in the commit message.

v6: put the documentation in the correct place, cleanup pretty.c and
    format the unit tests according to the current convention.
---
 Documentation/config/color.txt |   7 ++-
 pretty.c                       | 107 +++++++++++++++++++++++++++++----
 t/t4202-log.sh                 |  51 ++++++++++++++++
 3 files changed, 151 insertions(+), 14 deletions(-)

diff --git a/Documentation/config/color.txt b/Documentation/config/color.txt
index e05d520a86..91d9a9da32 100644
--- a/Documentation/config/color.txt
+++ b/Documentation/config/color.txt
@@ -104,9 +104,12 @@ color.grep.<slot>::
 `matchContext`;;
 	matching text in context lines
 `matchSelected`;;
-	matching text in selected lines
+	matching text in selected lines. Also, used to customize the following
+	linkgit:git-log[1] subcommands: `--grep`, `--author` and `--committer`.
 `selected`;;
-	non-matching text in selected lines
+	non-matching text in selected lines. Also, used to customize the
+	following linkgit:git-log[1] subcommands: `--grep`, `--author` and
+	`--committer`.
 `separator`;;
 	separators between fields on a line (`:`, `-`, and `=`)
 	and between hunks (`--`)
diff --git a/pretty.c b/pretty.c
index 73b5ead509..943a2d2ee2 100644
--- a/pretty.c
+++ b/pretty.c
@@ -431,6 +431,56 @@ const char *show_ident_date(const struct ident_split *ident,
 	return show_date(date, tz, mode);
 }
 
+static inline void strbuf_add_with_color(struct strbuf *sb, const char *color,
+					 char *buf, size_t buflen)
+{
+	strbuf_addstr(sb, color);
+	strbuf_add(sb, buf, buflen);
+	if (*color)
+		strbuf_addstr(sb, GIT_COLOR_RESET);
+}
+
+static void append_line_with_color(struct strbuf *sb, struct grep_opt *opt,
+				   const char *line, size_t linelen,
+				   int color, enum grep_context ctx,
+				   enum grep_header_field field)
+{
+	char *buf, *eol;
+	const char *line_color, *match_color;
+	regmatch_t match;
+	int eflags = 0;
+
+	if (!opt || !want_color(color) || opt->invert) {
+		strbuf_add(sb, line, linelen);
+		return;
+	}
+
+	buf = (char *)line;
+	eol = buf + linelen;
+
+	line_color = opt->colors[GREP_COLOR_SELECTED];
+	match_color = opt->colors[GREP_COLOR_MATCH_SELECTED];
+
+	while (grep_next_match(opt, buf, eol, ctx, &match, field, eflags)) {
+		if (match.rm_so == match.rm_eo)
+			break;
+
+		strbuf_grow(sb, strlen(line_color) + strlen(match_color) +
+			    (2 * strlen(GIT_COLOR_RESET)));
+		strbuf_add_with_color(sb, line_color, buf, match.rm_so);
+		strbuf_add_with_color(sb, match_color, buf + match.rm_so,
+				      match.rm_eo - match.rm_so);
+		buf += match.rm_eo;
+		eflags = REG_NOTBOL;
+	}
+
+	if (eflags) {
+		strbuf_grow(sb, strlen(line_color) + strlen(GIT_COLOR_RESET));
+		strbuf_add_with_color(sb, line_color, buf, eol - buf);
+	} else
+		strbuf_add(sb, buf, eol - buf);
+}
+
 void pp_user_info(struct pretty_print_context *pp,
 		  const char *what, struct strbuf *sb,
 		  const char *line, const char *encoding)
@@ -496,9 +546,28 @@ void pp_user_info(struct pretty_print_context *pp,
 			strbuf_addch(sb, '\n');
 		strbuf_addf(sb, " <%.*s>\n", (int)maillen, mailbuf);
 	} else {
-		strbuf_addf(sb, "%s: %.*s%.*s <%.*s>\n", what,
-			    (pp->fmt == CMIT_FMT_FULLER) ? 4 : 0, "    ",
-			    (int)namelen, namebuf, (int)maillen, mailbuf);
+		struct strbuf id;
+		enum grep_header_field field = GREP_HEADER_FIELD_MAX;
+		struct grep_opt *opt = pp->rev ? &pp->rev->grep_filter : NULL;
+
+		strbuf_init(&id, namelen + maillen + 4);
+
+		if (!strcmp(what, "Author"))
+			field = GREP_HEADER_AUTHOR;
+		else if (!strcmp(what, "Commit"))
+			field = GREP_HEADER_COMMITTER;
+
+		strbuf_addf(sb, "%s: ", what);
+		if (pp->fmt == CMIT_FMT_FULLER)
+			strbuf_addchars(sb, ' ', 4);
+
+		strbuf_addf(&id, "%.*s <%.*s>", (int)namelen, namebuf,
+			    (int)maillen, mailbuf);
+
+		append_line_with_color(sb, opt, id.buf, id.len, pp->color,
+				       GREP_CONTEXT_HEAD, field);
+		strbuf_addch(sb, '\n');
+		strbuf_release(&id);
 	}
 
 	switch (pp->fmt) {
@@ -1939,8 +2008,9 @@ static int pp_utf8_width(const char *start, const char *end)
 	return width;
 }
 
-static void strbuf_add_tabexpand(struct strbuf *sb, int tabwidth,
-				 const char *line, int linelen)
+static void strbuf_add_tabexpand(struct strbuf *sb, struct grep_opt *opt,
+				 int color, int tabwidth, const char *line,
+				 int linelen)
 {
 	const char *tab;
 
@@ -1957,7 +2027,9 @@ static void strbuf_add_tabexpand(struct strbuf *sb, int tabwidth,
 			break;
 
 		/* Output the data .. */
-		strbuf_add(sb, line, tab - line);
+		append_line_with_color(sb, opt, line, tab - line, color,
+				       GREP_CONTEXT_BODY,
+				       GREP_HEADER_FIELD_MAX);
 
 		/* .. and the de-tabified tab */
 		strbuf_addchars(sb, ' ', tabwidth - (width % tabwidth));
@@ -1972,7 +2044,8 @@ static void strbuf_add_tabexpand(struct strbuf *sb, int tabwidth,
 	 * worrying about width - there's nothing more to
 	 * align.
 	 */
-	strbuf_add(sb, line, linelen);
+	append_line_with_color(sb, opt, line, linelen, color, GREP_CONTEXT_BODY,
+			       GREP_HEADER_FIELD_MAX);
 }
 
 /*
@@ -1984,11 +2057,16 @@ static void pp_handle_indent(struct pretty_print_context *pp,
 			     struct strbuf *sb, int indent,
 			     const char *line, int linelen)
 {
+	struct grep_opt *opt = pp->rev ? &pp->rev->grep_filter : NULL;
+
 	strbuf_addchars(sb, ' ', indent);
 	if (pp->expand_tabs_in_log)
-		strbuf_add_tabexpand(sb, pp->expand_tabs_in_log, line, linelen);
+		strbuf_add_tabexpand(sb, opt, pp->color, pp->expand_tabs_in_log,
+				     line, linelen);
 	else
-		strbuf_add(sb, line, linelen);
+		append_line_with_color(sb, opt, line, linelen, pp->color,
+				       GREP_CONTEXT_BODY,
+				       GREP_HEADER_FIELD_MAX);
 }
 
 static int is_mboxrd_from(const char *line, int len)
@@ -2006,7 +2084,9 @@ void pp_remainder(struct pretty_print_context *pp,
 		  struct strbuf *sb,
 		  int indent)
 {
+	struct grep_opt *opt = pp->rev ? &pp->rev->grep_filter : NULL;
 	int first = 1;
+
 	for (;;) {
 		const char *line = *msg_p;
 		int linelen = get_one_line(line);
@@ -2027,14 +2107,17 @@ void pp_remainder(struct pretty_print_context *pp,
 		if (indent)
 			pp_handle_indent(pp, sb, indent, line, linelen);
 		else if (pp->expand_tabs_in_log)
-			strbuf_add_tabexpand(sb, pp->expand_tabs_in_log,
-					     line, linelen);
+			strbuf_add_tabexpand(sb, opt, pp->color,
+					     pp->expand_tabs_in_log, line,
+					     linelen);
 		else {
 			if (pp->fmt == CMIT_FMT_MBOXRD &&
 					is_mboxrd_from(line, linelen))
 				strbuf_addch(sb, '>');
 
-			strbuf_add(sb, line, linelen);
+			append_line_with_color(sb, opt, line, linelen,
+					       pp->color, GREP_CONTEXT_BODY,
+					       GREP_HEADER_FIELD_MAX);
 		}
 		strbuf_addch(sb, '\n');
 	}
diff --git a/t/t4202-log.sh b/t/t4202-log.sh
index 9dfead936b..3d240bba57 100755
--- a/t/t4202-log.sh
+++ b/t/t4202-log.sh
@@ -449,6 +449,57 @@ test_expect_success !FAIL_PREREQS 'log with various grep.patternType configurati
 	)
 '
 
+test_expect_success 'log --author' '
+	cat >expect <<-\EOF &&
+	Author: <BOLD;RED>A U<RESET> Thor <author@example.com>
+	EOF
+	git log -1 --color=always --author="A U" >log &&
+	grep Author log >actual.raw &&
+	test_decode_color <actual.raw >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'log --committer' '
+	cat >expect <<-\EOF &&
+	Commit:     C O Mitter <committer@<BOLD;RED>example<RESET>.com>
+	EOF
+	git log -1 --color=always --pretty=fuller --committer="example" >log &&
+	grep "Commit:" log >actual.raw &&
+	test_decode_color <actual.raw >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'log -i --grep with color' '
+	cat >expect <<-\EOF &&
+	    <BOLD;RED>Sec<RESET>ond
+	    <BOLD;RED>sec<RESET>ond
+	EOF
+	git log --color=always -i --grep=^sec >log &&
+	grep -i sec log >actual.raw &&
+	test_decode_color <actual.raw >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success '-c color.grep.selected log --grep' '
+	cat >expect <<-\EOF &&
+	    <GREEN>th<RESET><BOLD;RED>ir<RESET><GREEN>d<RESET>
+	EOF
+	git -c color.grep.selected="green" log --color=always --grep=ir >log &&
+	grep ir log >actual.raw &&
+	test_decode_color <actual.raw >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success '-c color.grep.matchSelected log --grep' '
+	cat >expect <<-\EOF &&
+	    <BLUE>i<RESET>n<BLUE>i<RESET>t<BLUE>i<RESET>al
+	EOF
+	git -c color.grep.matchSelected="blue" log --color=always --grep=i >log &&
+	grep al log >actual.raw &&
+	test_decode_color <actual.raw >actual &&
+	test_cmp expect actual
+'
+
 cat > expect <<EOF
 * Second
 * sixth
-- 
2.33.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH v6 1/2] grep: refactor next_match() and match_one_pattern() for external use
  2021-09-21  0:30 [PATCH v6 1/2] grep: refactor next_match() and match_one_pattern() for external use Hamza Mahfooz
  2021-09-21  0:30 ` [PATCH v6 2/2] pretty: colorize pattern matches in commit messages Hamza Mahfooz
@ 2021-09-21  1:15 ` Jeff King
  1 sibling, 0 replies; 10+ messages in thread
From: Jeff King @ 2021-09-21  1:15 UTC (permalink / raw)
  To: Hamza Mahfooz; +Cc: git, Junio C Hamano

On Mon, Sep 20, 2021 at 08:30:49PM -0400, Hamza Mahfooz wrote:

> diff --git a/grep.c b/grep.c
> index 424a39591b..2901233865 100644
> --- a/grep.c
> +++ b/grep.c
> @@ -956,26 +956,34 @@ static int match_one_pattern(struct grep_pat *p, char *bol, char *eol,
>  	const char *start = bol;
>  
>  	if ((p->token != GREP_PATTERN) &&
> -	    ((p->token == GREP_PATTERN_HEAD) != (ctx == GREP_CONTEXT_HEAD)))
> +	    ((p->token == GREP_PATTERN_HEAD) != (ctx == GREP_CONTEXT_HEAD)) &&
> +	    ((p->token == GREP_PATTERN_BODY) != (ctx == GREP_CONTEXT_BODY)))
>  		return 0;
>  
>  	if (p->token == GREP_PATTERN_HEAD) {
>  		const char *field;
>  		size_t len;
> -		assert(p->field < ARRAY_SIZE(header_field));
> -		field = header_field[p->field].field;
> -		len = header_field[p->field].len;
> -		if (strncmp(bol, field, len))
> -			return 0;
> -		bol += len;
> +		const char *end = eol;
> +
>  		switch (p->field) {
>  		case GREP_HEADER_AUTHOR:
>  		case GREP_HEADER_COMMITTER:
>  			saved_ch = strip_timestamp(bol, &eol);
> +			if (eol == end)
> +				goto again;

I'm not sure if this part is right. If we didn't strip any timestamp,
then we jump to the "again" label, where we actually try to match the
pattern.

But that means we skip the part you deleted above, which got moved down
here:

>  			break;
>  		default:
>  			break;
>  		}
> +
> +		assert(p->field < ARRAY_SIZE(header_field));
> +		field = header_field[p->field].field;
> +		len = header_field[p->field].len;
> +
> +		if (strncmp(bol, field, len))
> +			goto restore;
> +
> +		bol += len;
>  	}

And so we do not check that we have the right field at all. And as a
result, we may return nonsense results. For example, try this in
git.git:

  git log -1 --author=junio 1462b67bc893fc845d28e2748c20357cb16a5ce3

It currently returns no results, because the match is case-sensitive (so
it does not match "Junio" in the author field). But with your patch, it
prints t hat commit (1462b67bc), because it matches a line buried in the
mergetag header ("tag post183-for-junio").

That pattern is how I actually stumbled across it, but an even more
obvious version is:

  git log --author=commit

Currently that returns one result (somebody who has the word "commit" in
their email address). But after your patch, it returns a ton of tag
merges (because they all have "type commit" in their mergetag headers).

-Peff

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v6 2/2] pretty: colorize pattern matches in commit messages
  2021-09-21  0:30 ` [PATCH v6 2/2] pretty: colorize pattern matches in commit messages Hamza Mahfooz
@ 2021-09-21  1:24   ` Jeff King
  2021-09-21  1:39     ` Jeff King
  0 siblings, 1 reply; 10+ messages in thread
From: Jeff King @ 2021-09-21  1:24 UTC (permalink / raw)
  To: Hamza Mahfooz; +Cc: git, Junio C Hamano

On Mon, Sep 20, 2021 at 08:30:50PM -0400, Hamza Mahfooz wrote:

> Teach the pretty-printer code to highlight matches from the
> "--grep=<pattern>", "--author=<pattern>" and "--committer=<pattern>"
> options (to view the last one, you may have to ask for --pretty=fuller).
> 
> Also, it must be noted that we are effectively greping the content twice,
> however it only slows down "git log --author=^H" on this repository by
> around 1-2% (compared to v2.33.0), so it should be a small enough slow
> down to justify the addition of the feature.

This might or might not be related, but one thing I noticed is that your
earlier patch causes us to grep a lot more lines than we mean to (even
if we are looking for "author" lines, it greps every header line). That
might contribute to the slowdown. Likewise, it calls strip_timestamp()
on every line, even if it does not start with "author").

> +static inline void strbuf_add_with_color(struct strbuf *sb, const char *color,
> +					 char *buf, size_t buflen)
> +{
> +	strbuf_addstr(sb, color);
> +	strbuf_add(sb, buf, buflen);
> +	if (*color)
> +		strbuf_addstr(sb, GIT_COLOR_RESET);
> +}

You could take "buf" as a "const char *" here. That doesn't matter too
much for now, but see below.

> +static void append_line_with_color(struct strbuf *sb, struct grep_opt *opt,
> +				   const char *line, size_t linelen,
> +				   int color, enum grep_context ctx,
> +				   enum grep_header_field field)
> +{
> +	char *buf, *eol;
> +	const char *line_color, *match_color;
> +	regmatch_t match;
> +	int eflags = 0;
> +
> +	if (!opt || !want_color(color) || opt->invert) {
> +		strbuf_add(sb, line, linelen);
> +		return;
> +	}
> +
> +	buf = (char *)line;
> +	eol = buf + linelen;

OK, so we got rid of the copy of "line", which is nice. But we are
casting away const-ness, which is a potential red flag (is somebody
going to modify this string, even though we promised our caller we would
not?). We'd probably want a comment to explain why we are doing so, and
why it is OK (e.g., if somebody in the call stack modifies it
temporarily).

More on this in a moment.

> +	while (grep_next_match(opt, buf, eol, ctx, &match, field, eflags)) {
> +		if (match.rm_so == match.rm_eo)
> +			break;
> +
> +		strbuf_grow(sb, strlen(line_color) + strlen(match_color) +
> +			    (2 * strlen(GIT_COLOR_RESET)));
> +		strbuf_add_with_color(sb, line_color, buf, match.rm_so);
> +		strbuf_add_with_color(sb, match_color, buf + match.rm_so,
> +				      match.rm_eo - match.rm_so);

As Eric mentioned, these strbuf_grow() calls can go away. The whole
point of strbuf is that we do not have to clutter the code with manual
size computations, because it will do the right thing automatically.

Sometimes you can get extra performance by pre-sizing the strbuf, but:

  1. I'd be surprised if we did in this case. We're writing into a
     strbuf that will receive the whole per-commit output, so any growth
     cost due to a couple of short strings here would be amortized
     anyway.

  2. The computation here doesn't represent the needed growth anyway.
     When we call strbuf_add_with_color(), it's going to add not just
     the colors but all of the data for the line itself.

So at best it's doing nothing, and at worst it is making the code harder
to understand.

> +	if (eflags) {
> +		strbuf_grow(sb, strlen(line_color) + strlen(GIT_COLOR_RESET));
> +		strbuf_add_with_color(sb, line_color, buf, eol - buf);
> +	} else
> +		strbuf_add(sb, buf, eol - buf);
> +}

Ditto here (we grow for the colors, but also end up adding "eol - buf"
bytes).

-Peff

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v6 2/2] pretty: colorize pattern matches in commit messages
  2021-09-21  1:24   ` Jeff King
@ 2021-09-21  1:39     ` Jeff King
  2021-09-21  1:41       ` [PATCH 1/2] grep: stop modifying buffer in strip_timestamp Jeff King
                         ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Jeff King @ 2021-09-21  1:39 UTC (permalink / raw)
  To: Hamza Mahfooz; +Cc: git, Junio C Hamano

On Mon, Sep 20, 2021 at 09:24:08PM -0400, Jeff King wrote:

> > +	buf = (char *)line;
> > +	eol = buf + linelen;
> 
> OK, so we got rid of the copy of "line", which is nice. But we are
> casting away const-ness, which is a potential red flag (is somebody
> going to modify this string, even though we promised our caller we would
> not?). We'd probably want a comment to explain why we are doing so, and
> why it is OK (e.g., if somebody in the call stack modifies it
> temporarily).
> 
> More on this in a moment.

The root of the issue is that grep_next_match() takes a non-const
buffer, and so on. And indeed, it _does_ eventually get modified,
although only temporarily. I think we can clean that up, though.

Here are two patches I prepared on top of your series to show what's
possible, though I think we should do one of:

  - put them at the front of your series (with the appropriate
    adjustments) as preparatory cleanup

  - keep them separate. You can put a comment above the cast to mention
    what's going on and why it's OK for now, and then later when they're
    both merged, we can remove that cast.

The second option creates a little extra work for the maintainer (they
both touch match_one_patter(), so there will be some textual conflicts).
But it does mean we avoid a dependencies; the cleanups don't derail your
series, nor does your series hold up the cleanups. So I could go either
way.

  [1/2]: grep: stop modifying buffer in strip_timestamp
  [2/2]: grep: mark "haystack" buffers as const

 grep.c   | 30 ++++++++++++------------------
 grep.h   |  3 ++-
 pretty.c |  6 +++---
 3 files changed, 17 insertions(+), 22 deletions(-)

-Peff

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 1/2] grep: stop modifying buffer in strip_timestamp
  2021-09-21  1:39     ` Jeff King
@ 2021-09-21  1:41       ` Jeff King
  2021-09-21  1:43       ` [PATCH 2/2] grep: mark "haystack" buffers as const Jeff King
  2021-09-21  2:38       ` [PATCH v6 2/2] pretty: colorize pattern matches in commit messages Hamza Mahfooz
  2 siblings, 0 replies; 10+ messages in thread
From: Jeff King @ 2021-09-21  1:41 UTC (permalink / raw)
  To: Hamza Mahfooz; +Cc: git, Junio C Hamano

When grepping for headers in commit objects, we receive individual
lines (e.g., "author Name <email> 1234 -0000"), and then strip off the
timestamp to do our match. We do so by writing a NUL byte over the
whitespace separator, and then remembering to restore it later.

We had to do it this way when this was added back in a4d7d2c6db (log
--author/--committer: really match only with name part, 2008-09-04),
because we fed the result directly to regexec(), which expects a
NUL-terminated string. But since b7d36ffca0 (regex: use regexec_buf(),
2016-09-21), we have a function which can match on part of a buffer.

So instead of modifying the string, we can instead just move the "eol"
pointer, and the rest of the code will do the right thing. This will let
the next patch make more use of "const" in grep functions.

Signed-off-by: Jeff King <peff@peff.net>
---
I think this is fairly safe. It would fail subtly if somebody _did_ rely
on the NUL, but we just call into patmatch(), which calls functions
which handle the length-delimited buffer.

 grep.c | 18 ++++--------------
 1 file changed, 4 insertions(+), 14 deletions(-)

diff --git a/grep.c b/grep.c
index 35708ce973..3b372ec29d 100644
--- a/grep.c
+++ b/grep.c
@@ -922,20 +922,16 @@ static int patmatch(struct grep_pat *p, char *line, char *eol,
 	return hit;
 }
 
-static int strip_timestamp(char *bol, char **eol_p)
+static void strip_timestamp(char *bol, char **eol_p)
 {
 	char *eol = *eol_p;
-	int ch;
 
 	while (bol < --eol) {
 		if (*eol != '>')
 			continue;
 		*eol_p = ++eol;
-		ch = *eol;
-		*eol = '\0';
-		return ch;
+		break;
 	}
-	return 0;
 }
 
 static struct {
@@ -952,7 +948,6 @@ static int match_one_pattern(struct grep_pat *p, char *bol, char *eol,
 			     regmatch_t *pmatch, int eflags)
 {
 	int hit = 0;
-	int saved_ch = 0;
 	const char *start = bol;
 
 	if ((p->token != GREP_PATTERN) &&
@@ -968,7 +963,7 @@ static int match_one_pattern(struct grep_pat *p, char *bol, char *eol,
 		switch (p->field) {
 		case GREP_HEADER_AUTHOR:
 		case GREP_HEADER_COMMITTER:
-			saved_ch = strip_timestamp(bol, &eol);
+			strip_timestamp(bol, &eol);
 			if (eol == end)
 				goto again;
 			break;
@@ -981,7 +976,7 @@ static int match_one_pattern(struct grep_pat *p, char *bol, char *eol,
 		len = header_field[p->field].len;
 
 		if (strncmp(bol, field, len))
-			goto restore;
+			return 0;
 
 		bol += len;
 	}
@@ -1035,11 +1030,6 @@ static int match_one_pattern(struct grep_pat *p, char *bol, char *eol,
 		pmatch[0].rm_eo += bol - start;
 	}
 
-restore:
-	if (p->token == GREP_PATTERN_HEAD && saved_ch)
-		*eol = saved_ch;
-
-
 	return hit;
 }
 
-- 
2.33.0.1023.gc687d0d3c8


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 2/2] grep: mark "haystack" buffers as const
  2021-09-21  1:39     ` Jeff King
  2021-09-21  1:41       ` [PATCH 1/2] grep: stop modifying buffer in strip_timestamp Jeff King
@ 2021-09-21  1:43       ` Jeff King
  2021-09-21  2:05         ` Jeff King
  2021-09-21  2:38       ` [PATCH v6 2/2] pretty: colorize pattern matches in commit messages Hamza Mahfooz
  2 siblings, 1 reply; 10+ messages in thread
From: Jeff King @ 2021-09-21  1:43 UTC (permalink / raw)
  To: Hamza Mahfooz; +Cc: git, Junio C Hamano

When we're grepping in a buffer, we don't need to modify it. So we can
take "const char *" buffers, rather than "char *". This can avoid some
awkward casts in our callers.

Signed-off-by: Jeff King <peff@peff.net>
---
This step should be quite safe, because we're not changing any behavior,
and the compiler will alert us if we missed any spots.

There may be further cleanup possible that the compiler doesn't tell us
about (i.e., other callers which have "char *" but could themselves
tighten to "const char *"), since that implicit conversion is OK without
a cast.

 grep.c   | 12 ++++++++----
 grep.h   |  3 ++-
 pretty.c |  6 +++---
 3 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/grep.c b/grep.c
index 3b372ec29d..2cb65d191f 100644
--- a/grep.c
+++ b/grep.c
@@ -908,7 +908,8 @@ static void show_name(struct grep_opt *opt, const char *name)
 	opt->output(opt, opt->null_following_name ? "\0" : "\n", 1);
 }
 
-static int patmatch(struct grep_pat *p, char *line, char *eol,
+static int patmatch(struct grep_pat *p,
+		    const char *line, const char *eol,
 		    regmatch_t *match, int eflags)
 {
 	int hit;
@@ -943,7 +944,8 @@ static struct {
 	{ "reflog ", 7 },
 };
 
-static int match_one_pattern(struct grep_pat *p, char *bol, char *eol,
+static int match_one_pattern(struct grep_pat *p,
+			     const char *bol, const char *eol,
 			     enum grep_context ctx,
 			     regmatch_t *pmatch, int eflags)
 {
@@ -1141,7 +1143,8 @@ static int match_line(struct grep_opt *opt, char *bol, char *eol,
 	return hit;
 }
 
-static int match_next_pattern(struct grep_pat *p, char *bol, char *eol,
+static int match_next_pattern(struct grep_pat *p,
+			      const char *bol, const char *eol,
 			      enum grep_context ctx,
 			      regmatch_t *pmatch, int eflags)
 {
@@ -1162,7 +1165,8 @@ static int match_next_pattern(struct grep_pat *p, char *bol, char *eol,
 	return 1;
 }
 
-int grep_next_match(struct grep_opt *opt, char *bol, char *eol,
+int grep_next_match(struct grep_opt *opt,
+		    const char *bol, const char *eol,
 		    enum grep_context ctx, regmatch_t *pmatch,
 		    enum grep_header_field field, int eflags)
 {
diff --git a/grep.h b/grep.h
index b82e5de982..a1880899ba 100644
--- a/grep.h
+++ b/grep.h
@@ -190,7 +190,8 @@ void append_header_grep_pattern(struct grep_opt *, enum grep_header_field, const
 void compile_grep_patterns(struct grep_opt *opt);
 void free_grep_patterns(struct grep_opt *opt);
 int grep_buffer(struct grep_opt *opt, char *buf, unsigned long size);
-int grep_next_match(struct grep_opt *opt, char *bol, char *eol,
+int grep_next_match(struct grep_opt *opt,
+		    const char *bol, const char *eol,
 		    enum grep_context ctx, regmatch_t *pmatch,
 		    enum grep_header_field field, int eflags);
 
diff --git a/pretty.c b/pretty.c
index 943a2d2ee2..be4efd9364 100644
--- a/pretty.c
+++ b/pretty.c
@@ -432,7 +432,7 @@ const char *show_ident_date(const struct ident_split *ident,
 }
 
 static inline void strbuf_add_with_color(struct strbuf *sb, const char *color,
-					 char *buf, size_t buflen)
+					 const char *buf, size_t buflen)
 {
 	strbuf_addstr(sb, color);
 	strbuf_add(sb, buf, buflen);
@@ -445,7 +445,7 @@ static void append_line_with_color(struct strbuf *sb, struct grep_opt *opt,
 				   int color, enum grep_context ctx,
 				   enum grep_header_field field)
 {
-	char *buf, *eol;
+	const char *buf, *eol;
 	const char *line_color, *match_color;
 	regmatch_t match;
 	int eflags = 0;
@@ -455,7 +455,7 @@ static void append_line_with_color(struct strbuf *sb, struct grep_opt *opt,
 		return;
 	}
 
-	buf = (char *)line;
+	buf = line;
 	eol = buf + linelen;
 
 	line_color = opt->colors[GREP_COLOR_SELECTED];
-- 
2.33.0.1023.gc687d0d3c8

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/2] grep: mark "haystack" buffers as const
  2021-09-21  1:43       ` [PATCH 2/2] grep: mark "haystack" buffers as const Jeff King
@ 2021-09-21  2:05         ` Jeff King
  0 siblings, 0 replies; 10+ messages in thread
From: Jeff King @ 2021-09-21  2:05 UTC (permalink / raw)
  To: Hamza Mahfooz; +Cc: git, Junio C Hamano

On Mon, Sep 20, 2021 at 09:43:26PM -0400, Jeff King wrote:

> When we're grepping in a buffer, we don't need to modify it. So we can
> take "const char *" buffers, rather than "char *". This can avoid some
> awkward casts in our callers.

Sorry, this patch should have touched strip_timestamp(), too. I had
originally done it in the earlier patch, but because we pass a
pointer-to-pointer, the compiler got mad about the mis-matched type from
the caller. So I reverted it there, but forgot to add it back in here.

At any rate, these are mostly for illustration. They'd need a little
rebasing if not put on top of your patches anyway.

-Peff

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v6 2/2] pretty: colorize pattern matches in commit messages
  2021-09-21  1:39     ` Jeff King
  2021-09-21  1:41       ` [PATCH 1/2] grep: stop modifying buffer in strip_timestamp Jeff King
  2021-09-21  1:43       ` [PATCH 2/2] grep: mark "haystack" buffers as const Jeff King
@ 2021-09-21  2:38       ` Hamza Mahfooz
  2021-09-21  3:15         ` Jeff King
  2 siblings, 1 reply; 10+ messages in thread
From: Hamza Mahfooz @ 2021-09-21  2:38 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Junio C Hamano


On Mon, Sep 20 2021 at 09:39:27 PM -0400, Jeff King <peff@peff.net> 
wrote:
> Here are two patches I prepared on top of your series to show what's
> possible, though I think we should do one of:
> 
>   - put them at the front of your series (with the appropriate
>     adjustments) as preparatory cleanup
> 
>   - keep them separate. You can put a comment above the cast to 
> mention
>     what's going on and why it's OK for now, and then later when 
> they're
>     both merged, we can remove that cast.

Option 1 is preferable from my perspective, in that case.



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v6 2/2] pretty: colorize pattern matches in commit messages
  2021-09-21  2:38       ` [PATCH v6 2/2] pretty: colorize pattern matches in commit messages Hamza Mahfooz
@ 2021-09-21  3:15         ` Jeff King
  0 siblings, 0 replies; 10+ messages in thread
From: Jeff King @ 2021-09-21  3:15 UTC (permalink / raw)
  To: Hamza Mahfooz; +Cc: git, Junio C Hamano

On Mon, Sep 20, 2021 at 10:38:10PM -0400, Hamza Mahfooz wrote:

> 
> On Mon, Sep 20 2021 at 09:39:27 PM -0400, Jeff King <peff@peff.net> wrote:
> > Here are two patches I prepared on top of your series to show what's
> > possible, though I think we should do one of:
> > 
> >   - put them at the front of your series (with the appropriate
> >     adjustments) as preparatory cleanup
> > 
> >   - keep them separate. You can put a comment above the cast to mention
> >     what's going on and why it's OK for now, and then later when they're
> >     both merged, we can remove that cast.
> 
> Option 1 is preferable from my perspective, in that case.

OK. The patches I showed were the minimum to get your series working,
but there's actually a bit more cleanup we can do. I'll post a new
series in a moment, and then you can build on top of that.

-Peff

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2021-09-21  3:31 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-21  0:30 [PATCH v6 1/2] grep: refactor next_match() and match_one_pattern() for external use Hamza Mahfooz
2021-09-21  0:30 ` [PATCH v6 2/2] pretty: colorize pattern matches in commit messages Hamza Mahfooz
2021-09-21  1:24   ` Jeff King
2021-09-21  1:39     ` Jeff King
2021-09-21  1:41       ` [PATCH 1/2] grep: stop modifying buffer in strip_timestamp Jeff King
2021-09-21  1:43       ` [PATCH 2/2] grep: mark "haystack" buffers as const Jeff King
2021-09-21  2:05         ` Jeff King
2021-09-21  2:38       ` [PATCH v6 2/2] pretty: colorize pattern matches in commit messages Hamza Mahfooz
2021-09-21  3:15         ` Jeff King
2021-09-21  1:15 ` [PATCH v6 1/2] grep: refactor next_match() and match_one_pattern() for external use Jeff King

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).