git@vger.kernel.org mailing list mirror (one of many)
From: Jeff King <peff@peff.net>
To: Junio C Hamano <gitster@pobox.com>
Cc: Rene Scharfe <l.s.r@web.de>, git@vger.kernel.org
Subject: [PATCH] shortlog: skip format/parse roundtrip for internal traversal
Date: Fri, 8 Sep 2017 05:21:27 -0400
Message-ID: <20170908092126.55o3342macegtlga@sigill.intra.peff.net>
In-Reply-To: <xmqqk219zobb.fsf@gitster.mtv.corp.google.com>

On Fri, Sep 08, 2017 at 03:39:36PM +0900, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > IOW, something like the patch below, which pushes the re-parsing out to
> > the stdin code-path, and lets the internal traversal format directly
> > into the final buffer. It seems to be about 3% faster than the existing
> > code, and fixes the leak (by dropping that variable entirely).
> 
> Wow, that is soooo logical a conclusion that I somewhat feel ashamed
> that I didn't think of it myself.
> 
> Nicely done.

Thanks. Here it is with a commit message.

Note that the non-stdin path no longer looks at the "mailmap" entry of
"struct shortlog" (instead we use the one cached inside pretty.c). But
we still waste time loading it. I'm not sure if it's worth addressing
that. It's only once per program invocation, and it's a little tricky to
fix (we do shortlog_init() before we know whether or not we're using
stdin). We could just load it lazily, though, which would cover the
stdin case.
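
For illustration only, the lazy-loading idea might look roughly like the
sketch below. Everything in it is a hypothetical stand-in (struct
fake_mailmap, fake_read_mailmap(), get_mailmap()); it does not use git's
real mailmap API, and it only shows the "load on first use" shape:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the mailmap data; not git's real type. */
struct fake_mailmap {
	int loaded;     /* set once the expensive load has run */
	int load_count; /* how many times the load actually ran */
};

/* Hypothetical stand-in for the expensive read of the mailmap file. */
static void fake_read_mailmap(struct fake_mailmap *map)
{
	map->load_count++;
	map->loaded = 1;
}

/*
 * Accessor that defers the load until the map is first needed.
 * A caller that never asks for the map never pays for loading it.
 */
static struct fake_mailmap *get_mailmap(struct fake_mailmap *map)
{
	assert(map != NULL);
	if (!map->loaded)
		fake_read_mailmap(map);
	return map;
}
```

With something like this, only the stdin code path would ever call the
accessor, so the internal traversal would skip the load entirely.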

-- >8 --
Subject: shortlog: skip format/parse roundtrip for internal traversal

The original git-shortlog command parsed the output of
git-log, and the logic went something like this:

  1. Read stdin looking for "author" lines.

  2. Parse the identity into its name/email bits.

  3. Apply mailmap to the name/email.

  4. Reformat the identity into a single buffer that is our
     "key" for grouping entries (either a name by default,
     or "name <email>" if --email was given).

The first part happens in read_from_stdin(), and the other
three steps are part of insert_one_record().

When we do an internal traversal, we just swap out the stdin
read in step 1 for reading the commit objects ourselves.
Prior to 2db6b83d18 (shortlog: replace hand-parsing of
author with pretty-printer, 2016-01-18), that made sense; we
still had to parse the ident in the commit message.

But after that commit, we use pretty.c's "%an <%ae>" to get
the author ident (for simplicity). Which means that the
pretty printer is doing a parse/format under the hood, and
then we parse the result, apply the mailmap, and format the
result again.

Instead, we can just ask pretty.c to do all of those steps
for us (including the mailmap via "%aN <%aE>", and not
formatting the address when --email is missing).

And then we can push steps 2-4 into read_from_stdin(). This
speeds up "git shortlog -ns" on linux.git by about 3%, and
eliminates a leak in insert_one_record() of the namemailbuf
strbuf.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/shortlog.c | 56 ++++++++++++++++++++++++++++++++++--------------------
 1 file changed, 35 insertions(+), 21 deletions(-)

diff --git a/builtin/shortlog.c b/builtin/shortlog.c
index 43c4799ea9..e29875b843 100644
--- a/builtin/shortlog.c
+++ b/builtin/shortlog.c
@@ -52,26 +52,8 @@ static void insert_one_record(struct shortlog *log,
 			      const char *oneline)
 {
 	struct string_list_item *item;
-	const char *mailbuf, *namebuf;
-	size_t namelen, maillen;
-	struct strbuf namemailbuf = STRBUF_INIT;
-	struct ident_split ident;
 
-	if (split_ident_line(&ident, author, strlen(author)))
-		return;
-
-	namebuf = ident.name_begin;
-	mailbuf = ident.mail_begin;
-	namelen = ident.name_end - ident.name_begin;
-	maillen = ident.mail_end - ident.mail_begin;
-
-	map_user(&log->mailmap, &mailbuf, &maillen, &namebuf, &namelen);
-	strbuf_add(&namemailbuf, namebuf, namelen);
-
-	if (log->email)
-		strbuf_addf(&namemailbuf, " <%.*s>", (int)maillen, mailbuf);
-
-	item = string_list_insert(&log->list, namemailbuf.buf);
+	item = string_list_insert(&log->list, author);
 
 	if (log->summary)
 		item->util = (void *)(UTIL_TO_INT(item) + 1);
@@ -114,9 +96,33 @@ static void insert_one_record(struct shortlog *log,
 	}
 }
 
+static int parse_stdin_author(struct shortlog *log,
+			       struct strbuf *out, const char *in)
+{
+	const char *mailbuf, *namebuf;
+	size_t namelen, maillen;
+	struct ident_split ident;
+
+	if (split_ident_line(&ident, in, strlen(in)))
+		return -1;
+
+	namebuf = ident.name_begin;
+	mailbuf = ident.mail_begin;
+	namelen = ident.name_end - ident.name_begin;
+	maillen = ident.mail_end - ident.mail_begin;
+
+	map_user(&log->mailmap, &mailbuf, &maillen, &namebuf, &namelen);
+	strbuf_add(out, namebuf, namelen);
+	if (log->email)
+		strbuf_addf(out, " <%.*s>", (int)maillen, mailbuf);
+
+	return 0;
+}
+
 static void read_from_stdin(struct shortlog *log)
 {
 	struct strbuf author = STRBUF_INIT;
+	struct strbuf mapped_author = STRBUF_INIT;
 	struct strbuf oneline = STRBUF_INIT;
 	static const char *author_match[2] = { "Author: ", "author " };
 	static const char *committer_match[2] = { "Commit: ", "committer " };
@@ -134,9 +140,15 @@ static void read_from_stdin(struct shortlog *log)
 		while (strbuf_getline_lf(&oneline, stdin) != EOF &&
 		       !oneline.len)
 			; /* discard blanks */
-		insert_one_record(log, v, oneline.buf);
+
+		strbuf_reset(&mapped_author);
+		if (parse_stdin_author(log, &mapped_author, v) < 0)
+			continue;
+
+		insert_one_record(log, mapped_author.buf, oneline.buf);
 	}
 	strbuf_release(&author);
+	strbuf_release(&mapped_author);
 	strbuf_release(&oneline);
 }
 
@@ -153,7 +165,9 @@ void shortlog_add_commit(struct shortlog *log, struct commit *commit)
 	ctx.date_mode.type = DATE_NORMAL;
 	ctx.output_encoding = get_log_output_encoding();
 
-	fmt = log->committer ? "%cn <%ce>" : "%an <%ae>";
+	fmt = log->committer ?
+		(log->email ? "%cN <%cE>" : "%cN") :
+		(log->email ? "%aN <%aE>" : "%aN");
 
 	format_commit_message(commit, fmt, &author, &ctx);
 	if (!log->summary) {
-- 
2.14.1.769.g4f4ea7dfd3
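
As a rough illustration of steps 2 and 4 from the commit message (split
the identity, then reformat it into the grouping key), here is a
simplified sketch. build_key() is a hypothetical helper, not part of
git: it is a much cruder parser than split_ident_line(), and it skips
the mailmap step (step 3) entirely.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical, simplified stand-in for split_ident_line() plus the
 * key formatting. It only understands "Name <email>[ ...]" and does
 * no mailmap lookup. Returns 0 on success, -1 if there is no "<...>".
 */
static int build_key(char *out, size_t outlen, const char *in, int with_email)
{
	const char *lt = strchr(in, '<');
	const char *gt = lt ? strchr(lt, '>') : NULL;
	size_t namelen;

	if (!lt || !gt)
		return -1;

	/* The name is everything before '<', minus trailing spaces. */
	namelen = (size_t)(lt - in);
	while (namelen > 0 && in[namelen - 1] == ' ')
		namelen--;

	if (with_email)
		snprintf(out, outlen, "%.*s <%.*s>",
			 (int)namelen, in, (int)(gt - lt - 1), lt + 1);
	else
		snprintf(out, outlen, "%.*s", (int)namelen, in);
	return 0;
}
```

The with_email flag plays the role of log->email in the patch: without
it the key is just the name, so entries are grouped by name alone.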


