git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: Antoine Pelisse <apelisse@gmail.com>
Cc: git <git@vger.kernel.org>
Subject: Re: [PATCH v2] log: grep author/committer using mailmap
Date: Thu, 27 Dec 2012 10:48:56 -0800	[thread overview]
Message-ID: <7vwqw3l49z.fsf@alter.siamese.dyndns.org> (raw)
In-Reply-To: <7v1uebmizx.fsf@alter.siamese.dyndns.org> (Junio C. Hamano's message of "Thu, 27 Dec 2012 10:45:38 -0800")

Junio C Hamano <gitster@pobox.com> writes:

> Thanks.  I'll queue this on top.
>
> -- >8 --
> Subject: [PATCH] log --use-mailmap: optimize for cases without --author/--committer search

And this I will *not* queue further on top.

-- >8 --
Subject: [PATCH] [DO NOT USE] log --use-mailmap --author/--committer: further
 optimize identity rewriting

We used to always allocate a temporary buffer to search for
author/committer names even when the author/committer does not
appear in the mailmap.  Update the logic to do the allocation
on demand to reduce needless malloc()/free() calls.

It turns out that this does not affect performance at all; all the
time is spent in map_user() which is a look-up in string_list, so
let's not use this patch in the production, but it illustrates an
interesting point.

In map_identities() function, we already know who the author
recorded in the commit is, either in "author" strbuf, or in buffer
between [a_at..a_len], so we could grep_buffer() the author
regexp(s) specified from the command line right there, and combine
that result with the main grep_buffer() done for the --grep= option
at the end of the commit_match() function.

That may essentially amount to going in the totally opposite
direction from what 2d10c55 (git log: Unify header_filter and
message_filter into one., 2006-09-20) attempted to do.  We used to
have two grep expressions (one for header, the other one for body)
commit_match() runs grep_buffer() on and combined the results.
2d10c55 merged them into one grep expression by introducing a term
that matches only header elements.  But we would instead split the
"header" expression into "author" and "committer" expressions
(making it three from one) if we go that route.

In any case, I *think* the bottleneck is in map_user() so until that
is solved, such an update (or this patch) is not very useful.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 revision.c | 95 +++++++++++++++++++++++++++++++++++++-------------------------
 1 file changed, 57 insertions(+), 38 deletions(-)

diff --git a/revision.c b/revision.c
index 56b72f7..4b66598 100644
--- a/revision.c
+++ b/revision.c
@@ -2220,49 +2220,73 @@ static int rewrite_parents(struct rev_info *revs, struct commit *commit)
 	return 0;
 }
 
-static int commit_rewrite_person(struct strbuf *buf, const char *what, struct string_list *mailmap)
+static void map_person(const char *buf, size_t len, const char *head, int headlen,
+		       struct strbuf *result, struct string_list *mailmap,
+		       int *matchofs, int *matchlen)
 {
-	char *person, *endp;
-	size_t len;
+	struct ident_split ident;
+	const char *endp, *person;
 	struct strbuf name = STRBUF_INIT;
 	struct strbuf mail = STRBUF_INIT;
-	struct ident_split ident;
 
-	person = strstr(buf->buf, what);
+	person = memmem(buf, len, head, headlen);
 	if (!person)
-		goto left_intact;
-
-	person += strlen(what);
+		return;
+	person += headlen;
 	endp = strchr(person, '\n');
 	if (!endp)
-		goto left_intact;
-
-	len = endp - person;
-
-	if (split_ident_line(&ident, person, len))
-		goto left_intact;
-
+		return;
+	*matchofs = person - buf;
+	*matchlen = endp - person;
+	if (split_ident_line(&ident, person, *matchlen))
+		return;
 	strbuf_add(&name, ident.name_begin, ident.name_end - ident.name_begin);
 	strbuf_add(&mail, ident.mail_begin, ident.mail_end - ident.mail_begin);
-
-	if (map_user(mailmap, &mail, &name)) {
-		strbuf_addf(&name, " <%s>", mail.buf);
-
-		strbuf_splice(buf, ident.name_begin - buf->buf,
-			      ident.mail_end - ident.name_begin + 1,
-			      name.buf, name.len);
-
-		strbuf_release(&name);
-		strbuf_release(&mail);
-
-		return 1;
-	}
-
-left_intact:
+	if (map_user(mailmap, &mail, &name))
+		strbuf_addf(result, "%s <%s>", name.buf, mail.buf);
 	strbuf_release(&name);
 	strbuf_release(&mail);
+}
 
-	return 0;
+static void map_identities(struct strbuf *buf, const char *buffer, struct string_list *mailmap)
+{
+	const char *eob;
+	struct strbuf author = STRBUF_INIT;
+	struct strbuf committer = STRBUF_INIT;
+	int a_at = -1, a_len, c_at = -1, c_len;
+
+	eob = strstr(buffer, "\n\n");
+	if (!eob)
+		eob = buffer + strlen(buffer);
+	map_person(buffer, eob - buffer, "\nauthor ", 8, &author, mailmap,
+		   &a_at, &a_len);
+	map_person(buffer, eob - buffer, "\ncommitter ", 11, &committer, mailmap,
+		   &c_at, &c_len);
+	if (!author.len && !committer.len)
+		goto done;
+
+	/* Now, we know we need rewriting */
+	if (!buf->len)
+		strbuf_addstr(buf, buffer);
+
+	if (c_at < 0 || !committer.len) {
+		strbuf_splice(buf, a_at, a_len, author.buf, author.len);
+	} else if (a_at < 0 || !author.len) {
+		strbuf_splice(buf, c_at, c_len, committer.buf, committer.len);
+	} else if (a_at < c_at) {
+		strbuf_splice(buf, c_at, c_len, committer.buf, committer.len);
+		strbuf_splice(buf, a_at, a_len, author.buf, author.len);
+	} else {
+		/*
+		 * The commit records committer before the author, which is malformed,
+		 * which we may want to warn about.
+		 */
+		strbuf_splice(buf, a_at, a_len, author.buf, author.len);
+		strbuf_splice(buf, c_at, c_len, committer.buf, committer.len);
+	}
+ done:
+	strbuf_release(&author);
+	strbuf_release(&committer);
 }
 
 static int commit_match(struct commit *commit, struct rev_info *opt)
@@ -2283,13 +2307,8 @@ static int commit_match(struct commit *commit, struct rev_info *opt)
 	if (buf.len)
 		strbuf_addstr(&buf, commit->buffer);
 
-	if (opt->grep_filter.header_list && opt->mailmap) {
-		if (!buf.len)
-			strbuf_addstr(&buf, commit->buffer);
-
-		commit_rewrite_person(&buf, "\nauthor ", opt->mailmap);
-		commit_rewrite_person(&buf, "\ncommitter ", opt->mailmap);
-	}
+	if (opt->grep_filter.header_list && opt->mailmap)
+		map_identities(&buf, commit->buffer, opt->mailmap);
 
 	/* Append "fake" message parts as needed */
 	if (opt->show_notes) {
-- 
1.8.1.rc3.221.g8d09d94

  reply	other threads:[~2012-12-27 18:49 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-12-22 16:58 [PATCH 0/2] Mailmap in log improvements Antoine Pelisse
2012-12-22 16:58 ` [PATCH 1/2] log: grep author/committer using mailmap Antoine Pelisse
2012-12-26 19:27   ` Junio C Hamano
2012-12-26 21:12     ` Antoine Pelisse
2012-12-26 21:37       ` Junio C Hamano
2012-12-27 15:31         ` [PATCH v2] " Antoine Pelisse
2012-12-27 18:45           ` Junio C Hamano
2012-12-27 18:48             ` Junio C Hamano [this message]
2012-12-28 18:00               ` Antoine Pelisse
2012-12-28 18:43                 ` Junio C Hamano
2012-12-28 20:37                   ` Antoine Pelisse
2012-12-22 16:58 ` [PATCH 2/2] log: add log.mailmap configuration option Antoine Pelisse
2012-12-23  4:26   ` Junio C Hamano
2012-12-26 16:14     ` Junio C Hamano
2012-12-26 16:42       ` Antoine Pelisse

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7vwqw3l49z.fsf@alter.siamese.dyndns.org \
    --to=gitster@pobox.com \
    --cc=apelisse@gmail.com \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).