git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/8] parsing trailers with shortlog
@ 2020-09-25  7:01 Jeff King
  2020-09-25  7:01 ` [PATCH 1/8] shortlog: change "author" variables to "ident" Jeff King
                   ` (10 more replies)
  0 siblings, 11 replies; 44+ messages in thread
From: Jeff King @ 2020-09-25  7:01 UTC (permalink / raw)
  To: git

Somebody mentioned at the inclusion summit that it would be nice for
mentoring/pairing relationships if we could more easily give credit to
multiple authors of a commit. When people ask about adding multiple
"author" headers, we usually recommend that they use a "co-authored-by"
trailer. But you can't convince shortlog to count it for anything. :)

So this series adds support for counting trailers to shortlog. You can
do a number of fun things with it, like:

    # credit reviewers
    git shortlog -ns --group=trailer:reviewed-by

    # credit authors and co-authors equally
    git shortlog -ns --group=author \
                     --group=trailer:co-authored-by

    # see who helps whom
    git shortlog --format="...helped %an on %as" \
                 --group=trailer:helped-by

If some of this looks familiar, it's because I sent the early patches
years ago. But I won't even bother linking to it; I cleaned up quite a
few rough edges since then, so it's not really worth looking at.

  [1/8]: shortlog: change "author" variables to "ident"
  [2/8]: shortlog: refactor committer/author grouping
  [3/8]: trailer: add interface for iterating over commit trailers
  [4/8]: shortlog: match commit trailers with --group
  [5/8]: shortlog: de-duplicate trailer values
  [6/8]: shortlog: rename parse_stdin_ident()
  [7/8]: shortlog: parse trailer idents
  [8/8]: shortlog: allow multiple groups to be specified

 Documentation/git-shortlog.txt |  29 +++++
 builtin/log.c                  |   1 +
 builtin/shortlog.c             | 218 +++++++++++++++++++++++++++++----
 shortlog.h                     |   8 +-
 t/t4201-shortlog.sh            | 141 +++++++++++++++++++++
 trailer.c                      |  36 ++++++
 trailer.h                      |  44 ++++++-
 7 files changed, 447 insertions(+), 30 deletions(-)

-Peff

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH 1/8] shortlog: change "author" variables to "ident"
  2020-09-25  7:01 [PATCH 0/8] parsing trailers with shortlog Jeff King
@ 2020-09-25  7:01 ` Jeff King
  2020-09-25  7:02 ` [PATCH 2/8] shortlog: refactor committer/author grouping Jeff King
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 44+ messages in thread
From: Jeff King @ 2020-09-25  7:01 UTC (permalink / raw)
  To: git

We already match "committer", and we're about to start
matching more things. Let's use a more neutral variable to
avoid confusion.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/shortlog.c | 36 ++++++++++++++++++------------------
 1 file changed, 18 insertions(+), 18 deletions(-)

diff --git a/builtin/shortlog.c b/builtin/shortlog.c
index c856c58bb5..edcf2e0d54 100644
--- a/builtin/shortlog.c
+++ b/builtin/shortlog.c
@@ -49,12 +49,12 @@ static int compare_by_list(const void *a1, const void *a2)
 }
 
 static void insert_one_record(struct shortlog *log,
-			      const char *author,
+			      const char *ident,
 			      const char *oneline)
 {
 	struct string_list_item *item;
 
-	item = string_list_insert(&log->list, author);
+	item = string_list_insert(&log->list, ident);
 
 	if (log->summary)
 		item->util = (void *)(UTIL_TO_INT(item) + 1);
@@ -97,8 +97,8 @@ static void insert_one_record(struct shortlog *log,
 	}
 }
 
-static int parse_stdin_author(struct shortlog *log,
-			       struct strbuf *out, const char *in)
+static int parse_stdin_ident(struct shortlog *log,
+			     struct strbuf *out, const char *in)
 {
 	const char *mailbuf, *namebuf;
 	size_t namelen, maillen;
@@ -122,18 +122,18 @@ static int parse_stdin_author(struct shortlog *log,
 
 static void read_from_stdin(struct shortlog *log)
 {
-	struct strbuf author = STRBUF_INIT;
-	struct strbuf mapped_author = STRBUF_INIT;
+	struct strbuf ident = STRBUF_INIT;
+	struct strbuf mapped_ident = STRBUF_INIT;
 	struct strbuf oneline = STRBUF_INIT;
 	static const char *author_match[2] = { "Author: ", "author " };
 	static const char *committer_match[2] = { "Commit: ", "committer " };
 	const char **match;
 
 	match = log->committer ? committer_match : author_match;
-	while (strbuf_getline_lf(&author, stdin) != EOF) {
+	while (strbuf_getline_lf(&ident, stdin) != EOF) {
 		const char *v;
-		if (!skip_prefix(author.buf, match[0], &v) &&
-		    !skip_prefix(author.buf, match[1], &v))
+		if (!skip_prefix(ident.buf, match[0], &v) &&
+		    !skip_prefix(ident.buf, match[1], &v))
 			continue;
 		while (strbuf_getline_lf(&oneline, stdin) != EOF &&
 		       oneline.len)
@@ -142,20 +142,20 @@ static void read_from_stdin(struct shortlog *log)
 		       !oneline.len)
 			; /* discard blanks */
 
-		strbuf_reset(&mapped_author);
-		if (parse_stdin_author(log, &mapped_author, v) < 0)
+		strbuf_reset(&mapped_ident);
+		if (parse_stdin_ident(log, &mapped_ident, v) < 0)
 			continue;
 
-		insert_one_record(log, mapped_author.buf, oneline.buf);
+		insert_one_record(log, mapped_ident.buf, oneline.buf);
 	}
-	strbuf_release(&author);
-	strbuf_release(&mapped_author);
+	strbuf_release(&ident);
+	strbuf_release(&mapped_ident);
 	strbuf_release(&oneline);
 }
 
 void shortlog_add_commit(struct shortlog *log, struct commit *commit)
 {
-	struct strbuf author = STRBUF_INIT;
+	struct strbuf ident = STRBUF_INIT;
 	struct strbuf oneline = STRBUF_INIT;
 	struct pretty_print_context ctx = {0};
 	const char *fmt;
@@ -170,17 +170,17 @@ void shortlog_add_commit(struct shortlog *log, struct commit *commit)
 		(log->email ? "%cN <%cE>" : "%cN") :
 		(log->email ? "%aN <%aE>" : "%aN");
 
-	format_commit_message(commit, fmt, &author, &ctx);
+	format_commit_message(commit, fmt, &ident, &ctx);
 	if (!log->summary) {
 		if (log->user_format)
 			pretty_print_commit(&ctx, commit, &oneline);
 		else
 			format_commit_message(commit, "%s", &oneline, &ctx);
 	}
 
-	insert_one_record(log, author.buf, oneline.len ? oneline.buf : "<none>");
+	insert_one_record(log, ident.buf, oneline.len ? oneline.buf : "<none>");
 
-	strbuf_release(&author);
+	strbuf_release(&ident);
 	strbuf_release(&oneline);
 }
 
-- 
2.28.0.1085.g44a0350633


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH 2/8] shortlog: refactor committer/author grouping
  2020-09-25  7:01 [PATCH 0/8] parsing trailers with shortlog Jeff King
  2020-09-25  7:01 ` [PATCH 1/8] shortlog: change "author" variables to "ident" Jeff King
@ 2020-09-25  7:02 ` Jeff King
  2020-09-25 20:05   ` Eric Sunshine
  2020-09-26 12:31   ` Martin Ågren
  2020-09-25  7:02 ` [PATCH 3/8] trailer: add interface for iterating over commit trailers Jeff King
                   ` (8 subsequent siblings)
  10 siblings, 2 replies; 44+ messages in thread
From: Jeff King @ 2020-09-25  7:02 UTC (permalink / raw)
  To: git

In preparation for adding more grouping types, let's
refactor the committer/author grouping code. In particular:

  - the master option is now "--group", to make it clear
    that the various group types are mutually exclusive. The
    "--committer" option is an alias for "--group=committer".

  - we keep an enum rather than a binary flag, to prepare
    for more values

  - we prefer switch statements to ternary assignment, since
    other group types will need more custom code

Signed-off-by: Jeff King <peff@peff.net>
---
 Documentation/git-shortlog.txt |  9 ++++++
 builtin/shortlog.c             | 59 +++++++++++++++++++++++++++-------
 shortlog.h                     |  6 +++-
 t/t4201-shortlog.sh            |  5 +++
 4 files changed, 67 insertions(+), 12 deletions(-)

diff --git a/Documentation/git-shortlog.txt b/Documentation/git-shortlog.txt
index a72ea7f7ba..41b7679aa1 100644
--- a/Documentation/git-shortlog.txt
+++ b/Documentation/git-shortlog.txt
@@ -47,9 +47,18 @@ OPTIONS
 
 	Each pretty-printed commit will be rewrapped before it is shown.
 
+--group=<type>::
+	By default, `shortlog` collects and collates author identities;
+	using `--group` will collect other fields.
+	`<type>` is one of:
++
+ - `author`, commits are grouped by author (this is the default)
+ - `committer`, commits are grouped by committer (the same as `-c`)
+
 -c::
 --committer::
 	Collect and show committer identities instead of authors.
+	This is an alias for `--group=committer`.
 
 -w[<width>[,<indent1>[,<indent2>]]]::
 	Linewrap the output by wrapping each line at `width`.  The first
diff --git a/builtin/shortlog.c b/builtin/shortlog.c
index edcf2e0d54..880ce19304 100644
--- a/builtin/shortlog.c
+++ b/builtin/shortlog.c
@@ -129,7 +129,17 @@ static void read_from_stdin(struct shortlog *log)
 	static const char *committer_match[2] = { "Commit: ", "committer " };
 	const char **match;
 
-	match = log->committer ? committer_match : author_match;
+	switch (log->group) {
+	case SHORTLOG_GROUP_AUTHOR:
+		match = author_match;
+		break;
+	case SHORTLOG_GROUP_COMMITTER:
+		match = committer_match;
+		break;
+	default:
+		BUG("unhandled shortlog group");
+	}
+
 	while (strbuf_getline_lf(&ident, stdin) != EOF) {
 		const char *v;
 		if (!skip_prefix(ident.buf, match[0], &v) &&
@@ -158,27 +168,36 @@ void shortlog_add_commit(struct shortlog *log, struct commit *commit)
 	struct strbuf ident = STRBUF_INIT;
 	struct strbuf oneline = STRBUF_INIT;
 	struct pretty_print_context ctx = {0};
-	const char *fmt;
+	const char *oneline_str;
 
 	ctx.fmt = CMIT_FMT_USERFORMAT;
 	ctx.abbrev = log->abbrev;
 	ctx.print_email_subject = 1;
 	ctx.date_mode.type = DATE_NORMAL;
 	ctx.output_encoding = get_log_output_encoding();
 
-	fmt = log->committer ?
-		(log->email ? "%cN <%cE>" : "%cN") :
-		(log->email ? "%aN <%aE>" : "%aN");
-
-	format_commit_message(commit, fmt, &ident, &ctx);
 	if (!log->summary) {
 		if (log->user_format)
 			pretty_print_commit(&ctx, commit, &oneline);
 		else
 			format_commit_message(commit, "%s", &oneline, &ctx);
 	}
-
-	insert_one_record(log, ident.buf, oneline.len ? oneline.buf : "<none>");
+	oneline_str = oneline.len ? oneline.buf : "<none>";
+
+	switch (log->group) {
+	case SHORTLOG_GROUP_AUTHOR:
+		format_commit_message(commit,
+				      log->email ? "%aN <%aE>" : "%aN",
+				      &ident, &ctx);
+		insert_one_record(log, ident.buf, oneline_str);
+		break;
+	case SHORTLOG_GROUP_COMMITTER:
+		format_commit_message(commit,
+				      log->email ? "%cN <%cE>" : "%cN",
+				      &ident, &ctx);
+		insert_one_record(log, ident.buf, oneline_str);
+		break;
+	}
 
 	strbuf_release(&ident);
 	strbuf_release(&oneline);
@@ -241,6 +260,21 @@ static int parse_wrap_args(const struct option *opt, const char *arg, int unset)
 	return 0;
 }
 
+static int parse_group_option(const struct option *opt, const char *arg, int unset)
+{
+	struct shortlog *log = opt->value;
+
+	if (unset || !strcasecmp(arg, "author"))
+		log->group = SHORTLOG_GROUP_AUTHOR;
+	else if (!strcasecmp(arg, "committer"))
+		log->group = SHORTLOG_GROUP_COMMITTER;
+	else
+		return error(_("unknown group type: %s"), arg);
+
+	return 0;
+}
+
+
 void shortlog_init(struct shortlog *log)
 {
 	memset(log, 0, sizeof(*log));
@@ -260,8 +294,9 @@ int cmd_shortlog(int argc, const char **argv, const char *prefix)
 	int nongit = !startup_info->have_repository;
 
 	const struct option options[] = {
-		OPT_BOOL('c', "committer", &log.committer,
-			 N_("Group by committer rather than author")),
+		OPT_SET_INT('c', "committer", &log.group,
+			    N_("Group by committer rather than author"),
+			    SHORTLOG_GROUP_COMMITTER),
 		OPT_BOOL('n', "numbered", &log.sort_by_number,
 			 N_("sort output according to the number of commits per author")),
 		OPT_BOOL('s', "summary", &log.summary,
@@ -271,6 +306,8 @@ int cmd_shortlog(int argc, const char **argv, const char *prefix)
 		OPT_CALLBACK_F('w', NULL, &log, N_("<w>[,<i1>[,<i2>]]"),
 			N_("Linewrap output"), PARSE_OPT_OPTARG,
 			&parse_wrap_args),
+		OPT_CALLBACK(0, "group", &log, N_("field"),
+			N_("Group by field"), parse_group_option),
 		OPT_END(),
 	};
 
diff --git a/shortlog.h b/shortlog.h
index 2fa61c4294..9deeeb3ffb 100644
--- a/shortlog.h
+++ b/shortlog.h
@@ -15,7 +15,11 @@ struct shortlog {
 	int in2;
 	int user_format;
 	int abbrev;
-	int committer;
+
+	enum {
+		SHORTLOG_GROUP_AUTHOR = 0,
+		SHORTLOG_GROUP_COMMITTER
+	} group;
 
 	char *common_repo_prefix;
 	int email;
diff --git a/t/t4201-shortlog.sh b/t/t4201-shortlog.sh
index d3a7ce6bbb..65e4468746 100755
--- a/t/t4201-shortlog.sh
+++ b/t/t4201-shortlog.sh
@@ -215,4 +215,9 @@ test_expect_success 'shortlog --committer (external)' '
 	test_cmp expect actual
 '
 
+test_expect_success '--group=committer is the same as --committer' '
+	git shortlog -ns --group=committer HEAD >actual &&
+	test_cmp expect actual
+'
+
 test_done
-- 
2.28.0.1085.g44a0350633


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH 3/8] trailer: add interface for iterating over commit trailers
  2020-09-25  7:01 [PATCH 0/8] parsing trailers with shortlog Jeff King
  2020-09-25  7:01 ` [PATCH 1/8] shortlog: change "author" variables to "ident" Jeff King
  2020-09-25  7:02 ` [PATCH 2/8] shortlog: refactor committer/author grouping Jeff King
@ 2020-09-25  7:02 ` Jeff King
  2020-09-26 12:39   ` Martin Ågren
  2020-09-25  7:03 ` [PATCH 4/8] shortlog: match commit trailers with --group Jeff King
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 44+ messages in thread
From: Jeff King @ 2020-09-25  7:02 UTC (permalink / raw)
  To: git

The trailer code knows how to parse out the trailers and re-format them,
but there's no easy way to iterate over the trailers (you can use
trailer_info, but you have to then do a bunch of extra parsing).

Let's add an iteration interface that makes this easy to do.

Signed-off-by: Jeff King <peff@peff.net>
---
 trailer.c | 36 ++++++++++++++++++++++++++++++++++++
 trailer.h | 44 ++++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 78 insertions(+), 2 deletions(-)

diff --git a/trailer.c b/trailer.c
index 68dabc2556..3f7391d793 100644
--- a/trailer.c
+++ b/trailer.c
@@ -1183,3 +1183,39 @@ void format_trailers_from_commit(struct strbuf *out, const char *msg,
 	format_trailer_info(out, &info, opts);
 	trailer_info_release(&info);
 }
+
+void trailer_iterator_init(struct trailer_iterator *iter, const char *msg)
+{
+	struct process_trailer_options opts = PROCESS_TRAILER_OPTIONS_INIT;
+	strbuf_init(&iter->key, 0);
+	strbuf_init(&iter->val, 0);
+	opts.no_divider = 1;
+	trailer_info_get(&iter->info, msg, &opts);
+	iter->cur = 0;
+}
+
+int trailer_iterator_advance(struct trailer_iterator *iter)
+{
+	while (iter->cur < iter->info.trailer_nr) {
+		char *trailer = iter->info.trailers[iter->cur++];
+		int separator_pos = find_separator(trailer, separators);
+
+		if (separator_pos < 1)
+			continue; /* not a real trailer */
+
+		strbuf_reset(&iter->key);
+		strbuf_reset(&iter->val);
+		parse_trailer(&iter->key, &iter->val, NULL,
+			      trailer, separator_pos);
+		unfold_value(&iter->val);
+		return 1;
+	}
+	return 0;
+}
+
+void trailer_iterator_release(struct trailer_iterator *iter)
+{
+	trailer_info_release(&iter->info);
+	strbuf_release(&iter->val);
+	strbuf_release(&iter->key);
+}
diff --git a/trailer.h b/trailer.h
index 203acf4ee1..af9949363a 100644
--- a/trailer.h
+++ b/trailer.h
@@ -2,8 +2,7 @@
 #define TRAILER_H
 
 #include "list.h"
-
-struct strbuf;
+#include "strbuf.h"
 
 enum trailer_where {
 	WHERE_DEFAULT,
@@ -103,4 +102,45 @@ void trailer_info_release(struct trailer_info *info);
 void format_trailers_from_commit(struct strbuf *out, const char *msg,
 				 const struct process_trailer_options *opts);
 
+/*
+ * An interface for iterating over the trailers found in a particular commit
+ * message. Use like:
+ *
+ *   struct trailer_iterator iter;
+ *   trailer_iterator_init(&iter, msg);
+ *   while (trailer_iterator_advance(&iter))
+ *      ... do something with iter.key and iter.val ...
+ *   trailer_iterator_release(&iter);
+ */
+struct trailer_iterator {
+	struct strbuf key;
+	struct strbuf val;
+
+	/* private */
+	struct trailer_info info;
+	size_t cur;
+};
+
+/*
+ * Initialize "iter" in preparation for walking over the trailers in the commit
+ * message "msg". The "msg" pointer must remain valid until the iterator is
+ * released.
+ *
+ * After initializing, we are not yet pointing
+ */
+void trailer_iterator_init(struct trailer_iterator *iter, const char *msg);
+
+/*
+ * Advance to the next trailer of the iterator. Returns 0 if there is no such
+ * trailer, and 1 otherwise. The key and value of the trailer can be
+ * fetched from the iter->key and iter->value fields (which are valid
+ * only until the next advance).
+ */
+int trailer_iterator_advance(struct trailer_iterator *iter);
+
+/*
+ * Release all resources associated with the trailer iteration.
+ */
+void trailer_iterator_release(struct trailer_iterator *iter);
+
 #endif /* TRAILER_H */
-- 
2.28.0.1085.g44a0350633


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH 4/8] shortlog: match commit trailers with --group
  2020-09-25  7:01 [PATCH 0/8] parsing trailers with shortlog Jeff King
                   ` (2 preceding siblings ...)
  2020-09-25  7:02 ` [PATCH 3/8] trailer: add interface for iterating over commit trailers Jeff King
@ 2020-09-25  7:03 ` Jeff King
  2020-09-25  7:05 ` [PATCH 5/8] shortlog: de-duplicate trailer values Jeff King
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 44+ messages in thread
From: Jeff King @ 2020-09-25  7:03 UTC (permalink / raw)
  To: git

If a project uses commit trailers, this patch lets you use
shortlog to see who is performing each action. For example,
running:

  git shortlog -ns --group=trailer:reviewed-by

in git.git shows who has reviewed. You can even use a custom
format to see things like who has helped whom:

  git shortlog --format="...helped %an (%ad)" \
               --group=trailer:helped-by

Signed-off-by: Jeff King <peff@peff.net>
---
 Documentation/git-shortlog.txt | 13 ++++++++++
 builtin/shortlog.c             | 44 +++++++++++++++++++++++++++++++++-
 shortlog.h                     |  4 +++-
 t/t4201-shortlog.sh            | 14 +++++++++++
 4 files changed, 73 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-shortlog.txt b/Documentation/git-shortlog.txt
index 41b7679aa1..40752bc573 100644
--- a/Documentation/git-shortlog.txt
+++ b/Documentation/git-shortlog.txt
@@ -54,6 +54,19 @@ OPTIONS
 +
  - `author`, commits are grouped by author (this is the default)
  - `committer`, commits are grouped by committer (the same as `-c`)
+ - `trailer:<field>`, the `<field>` is interpreted as a case-insensitive
+   commit message trailer (see linkgit:git-interpret-trailers[1]). For
+   example, if your project uses `Reviewed-by` trailers, you might want
+   to see who has been reviewing with
+   `git shortlog -ns --group=trailer:reviewed-by`.
++
+Note that commits that do not include the trailer will not be counted.
+Likewise, commits with multiple trailers (e.g., multiple signoffs) may
+be counted more than once.
++
+The contents of each trailer value are taken literally and completely.
+No mailmap is applied, and the `-e` option has no effect (if the trailer
+contains a username and email, they are both always shown).
 
 -c::
 --committer::
diff --git a/builtin/shortlog.c b/builtin/shortlog.c
index 880ce19304..e1d9ee909f 100644
--- a/builtin/shortlog.c
+++ b/builtin/shortlog.c
@@ -9,6 +9,7 @@
 #include "mailmap.h"
 #include "shortlog.h"
 #include "parse-options.h"
+#include "trailer.h"
 
 static char const * const shortlog_usage[] = {
 	N_("git shortlog [<options>] [<revision-range>] [[--] <path>...]"),
@@ -136,6 +137,8 @@ static void read_from_stdin(struct shortlog *log)
 	case SHORTLOG_GROUP_COMMITTER:
 		match = committer_match;
 		break;
+	case SHORTLOG_GROUP_TRAILER:
+		die(_("using --group=trailer with stdin is not supported"));
 	default:
 		BUG("unhandled shortlog group");
 	}
@@ -163,6 +166,37 @@ static void read_from_stdin(struct shortlog *log)
 	strbuf_release(&oneline);
 }
 
+static void insert_records_from_trailers(struct shortlog *log,
+					 struct commit *commit,
+					 struct pretty_print_context *ctx,
+					 const char *oneline)
+{
+	struct trailer_iterator iter;
+	const char *commit_buffer, *body;
+
+	/*
+	 * Using format_commit_message("%B") would be simpler here, but
+	 * this saves us copying the message.
+	 */
+	commit_buffer = logmsg_reencode(commit, NULL, ctx->output_encoding);
+	body = strstr(commit_buffer, "\n\n");
+	if (!body)
+		return;
+
+	trailer_iterator_init(&iter, body);
+	while (trailer_iterator_advance(&iter)) {
+		const char *value = iter.val.buf;
+
+		if (strcasecmp(iter.key.buf, log->trailer))
+			continue;
+
+		insert_one_record(log, value, oneline);
+	}
+	trailer_iterator_release(&iter);
+
+	unuse_commit_buffer(commit, commit_buffer);
+}
+
 void shortlog_add_commit(struct shortlog *log, struct commit *commit)
 {
 	struct strbuf ident = STRBUF_INIT;
@@ -197,6 +231,9 @@ void shortlog_add_commit(struct shortlog *log, struct commit *commit)
 				      &ident, &ctx);
 		insert_one_record(log, ident.buf, oneline_str);
 		break;
+	case SHORTLOG_GROUP_TRAILER:
+		insert_records_from_trailers(log, commit, &ctx, oneline_str);
+		break;
 	}
 
 	strbuf_release(&ident);
@@ -263,12 +300,17 @@ static int parse_wrap_args(const struct option *opt, const char *arg, int unset)
 static int parse_group_option(const struct option *opt, const char *arg, int unset)
 {
 	struct shortlog *log = opt->value;
+	const char *field;
 
 	if (unset || !strcasecmp(arg, "author"))
 		log->group = SHORTLOG_GROUP_AUTHOR;
 	else if (!strcasecmp(arg, "committer"))
 		log->group = SHORTLOG_GROUP_COMMITTER;
-	else
+	else if (skip_prefix(arg, "trailer:", &field)) {
+		log->group = SHORTLOG_GROUP_TRAILER;
+		free(log->trailer);
+		log->trailer = xstrdup(field);
+	} else
 		return error(_("unknown group type: %s"), arg);
 
 	return 0;
diff --git a/shortlog.h b/shortlog.h
index 9deeeb3ffb..54ce55e9e9 100644
--- a/shortlog.h
+++ b/shortlog.h
@@ -18,8 +18,10 @@ struct shortlog {
 
 	enum {
 		SHORTLOG_GROUP_AUTHOR = 0,
-		SHORTLOG_GROUP_COMMITTER
+		SHORTLOG_GROUP_COMMITTER,
+		SHORTLOG_GROUP_TRAILER
 	} group;
+	char *trailer;
 
 	char *common_repo_prefix;
 	int email;
diff --git a/t/t4201-shortlog.sh b/t/t4201-shortlog.sh
index 65e4468746..e97d891a71 100755
--- a/t/t4201-shortlog.sh
+++ b/t/t4201-shortlog.sh
@@ -220,4 +220,18 @@ test_expect_success '--group=committer is the same as --committer' '
 	test_cmp expect actual
 '
 
+test_expect_success 'shortlog --group=trailer:signed-off-by' '
+	git commit --allow-empty -m foo -s &&
+	GIT_COMMITTER_NAME="SOB One" \
+	GIT_COMMITTER_EMAIL=sob@example.com \
+		git commit --allow-empty -m foo -s &&
+	git commit --allow-empty --amend --no-edit -s &&
+	cat >expect <<-\EOF &&
+	     2	C O Mitter <committer@example.com>
+	     1	SOB One <sob@example.com>
+	EOF
+	git shortlog -ns --group=trailer:signed-off-by HEAD >actual &&
+	test_cmp expect actual
+'
+
 test_done
-- 
2.28.0.1085.g44a0350633


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH 5/8] shortlog: de-duplicate trailer values
  2020-09-25  7:01 [PATCH 0/8] parsing trailers with shortlog Jeff King
                   ` (3 preceding siblings ...)
  2020-09-25  7:03 ` [PATCH 4/8] shortlog: match commit trailers with --group Jeff King
@ 2020-09-25  7:05 ` Jeff King
  2020-09-25  7:05 ` [PATCH 6/8] shortlog: rename parse_stdin_ident() Jeff King
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 44+ messages in thread
From: Jeff King @ 2020-09-25  7:05 UTC (permalink / raw)
  To: git

The current documentation is vague about what happens with
--group=trailer:signed-off-by when we see a commit with:

  Signed-off-by: One
  Signed-off-by: Two
  Signed-off-by: One

We clearly should credit both "One" and "Two", but should "One" get
credited twice? The current code does so, but mostly because that was
the easiest thing to do. It's probably more useful to count each commit
at most once. This will become especially important when we allow
values from multiple sources in a future patch.

Signed-off-by: Jeff King <peff@peff.net>
---
 Documentation/git-shortlog.txt |  3 +-
 builtin/shortlog.c             | 58 ++++++++++++++++++++++++++++++++++
 t/t4201-shortlog.sh            | 28 ++++++++++++++++
 3 files changed, 88 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-shortlog.txt b/Documentation/git-shortlog.txt
index 40752bc573..29a9b22d3b 100644
--- a/Documentation/git-shortlog.txt
+++ b/Documentation/git-shortlog.txt
@@ -62,7 +62,8 @@ OPTIONS
 +
 Note that commits that do not include the trailer will not be counted.
 Likewise, commits with multiple trailers (e.g., multiple signoffs) may
-be counted more than once.
+be counted more than once (but only once per unique trailer value in
+that commit).
 +
 The contents of each trailer value are taken literally and completely.
 No mailmap is applied, and the `-e` option has no effect (if the trailer
diff --git a/builtin/shortlog.c b/builtin/shortlog.c
index e1d9ee909f..d2d8103dd3 100644
--- a/builtin/shortlog.c
+++ b/builtin/shortlog.c
@@ -166,13 +166,68 @@ static void read_from_stdin(struct shortlog *log)
 	strbuf_release(&oneline);
 }
 
+struct strset_item {
+	struct hashmap_entry ent;
+	char value[FLEX_ARRAY];
+};
+
+struct strset {
+	struct hashmap map;
+};
+
+#define STRSET_INIT { { NULL } }
+
+static int strset_item_hashcmp(const void *hash_data,
+			       const struct hashmap_entry *entry,
+			       const struct hashmap_entry *entry_or_key,
+			       const void *keydata)
+{
+	const struct strset_item *a, *b;
+
+	a = container_of(entry, const struct strset_item, ent);
+	if (keydata)
+		return strcmp(a->value, keydata);
+
+	b = container_of(entry_or_key, const struct strset_item, ent);
+	return strcmp(a->value, b->value);
+}
+
+/*
+ * Adds "str" to the set if it was not already present; returns true if it was
+ * already there.
+ */
+static int strset_check_and_add(struct strset *ss, const char *str)
+{
+	unsigned int hash = strhash(str);
+	struct strset_item *item;
+
+	if (!ss->map.table)
+		hashmap_init(&ss->map, strset_item_hashcmp, NULL, 0);
+
+	if (hashmap_get_from_hash(&ss->map, hash, str))
+		return 1;
+
+	FLEX_ALLOC_STR(item, value, str);
+	hashmap_entry_init(&item->ent, hash);
+	hashmap_add(&ss->map, &item->ent);
+	return 0;
+}
+
+static void strset_clear(struct strset *ss)
+{
+	if (!ss->map.table)
+		return;
+	hashmap_free_entries(&ss->map, struct strset_item, ent);
+}
+
 static void insert_records_from_trailers(struct shortlog *log,
 					 struct commit *commit,
 					 struct pretty_print_context *ctx,
 					 const char *oneline)
 {
 	struct trailer_iterator iter;
 	const char *commit_buffer, *body;
+	struct strset dups = STRSET_INIT;
 
 	/*
 	 * Using format_commit_message("%B") would be simpler here, but
@@ -190,10 +245,13 @@ static void insert_records_from_trailers(struct shortlog *log,
 		if (strcasecmp(iter.key.buf, log->trailer))
 			continue;
 
+		if (strset_check_and_add(&dups, value))
+			continue;
 		insert_one_record(log, value, oneline);
 	}
 	trailer_iterator_release(&iter);
 
+	strset_clear(&dups);
 	unuse_commit_buffer(commit, commit_buffer);
 }
 
diff --git a/t/t4201-shortlog.sh b/t/t4201-shortlog.sh
index e97d891a71..83dbbc44e8 100755
--- a/t/t4201-shortlog.sh
+++ b/t/t4201-shortlog.sh
@@ -234,4 +234,32 @@ test_expect_success 'shortlog --group=trailer:signed-off-by' '
 	test_cmp expect actual
 '
 
+test_expect_success 'shortlog de-duplicates trailers in a single commit' '
+	git commit --allow-empty -F - <<-\EOF &&
+	subject one
+
+	this message has two distinct values, plus a repeat
+
+	Repeated-trailer: Foo
+	Repeated-trailer: Bar
+	Repeated-trailer: Foo
+	EOF
+
+	git commit --allow-empty -F - <<-\EOF &&
+	subject two
+
+	similar to the previous, but without the second distinct value
+
+	Repeated-trailer: Foo
+	Repeated-trailer: Foo
+	EOF
+
+	cat >expect <<-\EOF &&
+	     2	Foo
+	     1	Bar
+	EOF
+	git shortlog -ns --group=trailer:repeated-trailer -2 HEAD >actual &&
+	test_cmp expect actual
+'
+
 test_done
-- 
2.28.0.1085.g44a0350633


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH 6/8] shortlog: rename parse_stdin_ident()
  2020-09-25  7:01 [PATCH 0/8] parsing trailers with shortlog Jeff King
                   ` (4 preceding siblings ...)
  2020-09-25  7:05 ` [PATCH 5/8] shortlog: de-duplicate trailer values Jeff King
@ 2020-09-25  7:05 ` Jeff King
  2020-09-25  7:05 ` [PATCH 7/8] shortlog: parse trailer idents Jeff King
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 44+ messages in thread
From: Jeff King @ 2020-09-25  7:05 UTC (permalink / raw)
  To: git

This function is actually useful for parsing any identity, whether from
stdin or not. We'll need it for handling trailers.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/shortlog.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/builtin/shortlog.c b/builtin/shortlog.c
index d2d8103dd3..e6f4faec7c 100644
--- a/builtin/shortlog.c
+++ b/builtin/shortlog.c
@@ -98,8 +98,8 @@ static void insert_one_record(struct shortlog *log,
 	}
 }
 
-static int parse_stdin_ident(struct shortlog *log,
-			     struct strbuf *out, const char *in)
+static int parse_ident(struct shortlog *log,
+		       struct strbuf *out, const char *in)
 {
 	const char *mailbuf, *namebuf;
 	size_t namelen, maillen;
@@ -156,7 +156,7 @@ static void read_from_stdin(struct shortlog *log)
 			; /* discard blanks */
 
 		strbuf_reset(&mapped_ident);
-		if (parse_stdin_ident(log, &mapped_ident, v) < 0)
+		if (parse_ident(log, &mapped_ident, v) < 0)
 			continue;
 
 		insert_one_record(log, mapped_ident.buf, oneline.buf);
-- 
2.28.0.1085.g44a0350633


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH 7/8] shortlog: parse trailer idents
  2020-09-25  7:01 [PATCH 0/8] parsing trailers with shortlog Jeff King
                   ` (5 preceding siblings ...)
  2020-09-25  7:05 ` [PATCH 6/8] shortlog: rename parse_stdin_ident() Jeff King
@ 2020-09-25  7:05 ` Jeff King
  2020-09-25  7:05 ` [PATCH 8/8] shortlog: allow multiple groups to be specified Jeff King
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 44+ messages in thread
From: Jeff King @ 2020-09-25  7:05 UTC (permalink / raw)
  To: git

Trailers don't necessarily contain name/email identity values, so
shortlog has so far treated them as opaque strings. However, since many
trailers do contain identities, it's useful to treat them as such when
they can be parsed. That lets "-e" work as usual, as well as mailmap.

When they can't be parsed, we'll continue with the old behavior of
treating them as a single string (there's no new test for that here,
since the existing tests cover a trailer like this).

Signed-off-by: Jeff King <peff@peff.net>
---
 Documentation/git-shortlog.txt |  7 ++++---
 builtin/shortlog.c             |  6 ++++++
 t/t4201-shortlog.sh            | 20 ++++++++++++++++++++
 3 files changed, 30 insertions(+), 3 deletions(-)

diff --git a/Documentation/git-shortlog.txt b/Documentation/git-shortlog.txt
index 29a9b22d3b..5f8f918cad 100644
--- a/Documentation/git-shortlog.txt
+++ b/Documentation/git-shortlog.txt
@@ -65,9 +65,10 @@ Likewise, commits with multiple trailers (e.g., multiple signoffs) may
 be counted more than once (but only once per unique trailer value in
 that commit).
 +
-The contents of each trailer value are taken literally and completely.
-No mailmap is applied, and the `-e` option has no effect (if the trailer
-contains a username and email, they are both always shown).
+Shortlog will attempt to parse each trailer value as a `name <email>`
+identity. If successful, the mailmap is applied and the email is omitted
+unless the `--email` option is specified. If the value cannot be parsed
+as an identity, it will be taken literally and completely.
 
 -c::
 --committer::
diff --git a/builtin/shortlog.c b/builtin/shortlog.c
index e6f4faec7c..28133aec68 100644
--- a/builtin/shortlog.c
+++ b/builtin/shortlog.c
@@ -228,6 +228,7 @@ static void insert_records_from_trailers(struct shortlog *log,
 	struct trailer_iterator iter;
 	const char *commit_buffer, *body;
 	struct strset dups = STRSET_INIT;
+	struct strbuf ident = STRBUF_INIT;
 
 	/*
 	 * Using format_commit_message("%B") would be simpler here, but
@@ -245,12 +246,17 @@ static void insert_records_from_trailers(struct shortlog *log,
 		if (strcasecmp(iter.key.buf, log->trailer))
 			continue;
 
+		strbuf_reset(&ident);
+		if (!parse_ident(log, &ident, value))
+			value = ident.buf;
+
 		if (strset_check_and_add(&dups, value))
 			continue;
 		insert_one_record(log, value, oneline);
 	}
 	trailer_iterator_release(&iter);
 
+	strbuf_release(&ident);
 	strset_clear(&dups);
 	unuse_commit_buffer(commit, commit_buffer);
 }
diff --git a/t/t4201-shortlog.sh b/t/t4201-shortlog.sh
index 83dbbc44e8..a62ee9ed55 100755
--- a/t/t4201-shortlog.sh
+++ b/t/t4201-shortlog.sh
@@ -230,10 +230,30 @@ test_expect_success 'shortlog --group=trailer:signed-off-by' '
 	     2	C O Mitter <committer@example.com>
 	     1	SOB One <sob@example.com>
 	EOF
+	git shortlog -nse --group=trailer:signed-off-by HEAD >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'trailer idents are split' '
+	cat >expect <<-\EOF &&
+	     2	C O Mitter
+	     1	SOB One
+	EOF
 	git shortlog -ns --group=trailer:signed-off-by HEAD >actual &&
 	test_cmp expect actual
 '
 
+test_expect_success 'trailer idents are mailmapped' '
+	cat >expect <<-\EOF &&
+	     2	C O Mitter
+	     1	Another Name
+	EOF
+	echo "Another Name <sob@example.com>" >mail.map &&
+	git -c mailmap.file=mail.map shortlog -ns \
+		--group=trailer:signed-off-by HEAD >actual &&
+	test_cmp expect actual
+'
+
 test_expect_success 'shortlog de-duplicates trailers in a single commit' '
 	git commit --allow-empty -F - <<-\EOF &&
 	subject one
-- 
2.28.0.1085.g44a0350633


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH 8/8] shortlog: allow multiple groups to be specified
  2020-09-25  7:01 [PATCH 0/8] parsing trailers with shortlog Jeff King
                   ` (6 preceding siblings ...)
  2020-09-25  7:05 ` [PATCH 7/8] shortlog: parse trailer idents Jeff King
@ 2020-09-25  7:05 ` Jeff King
  2020-09-25 20:23   ` Eric Sunshine
  2020-09-26 12:48   ` Martin Ågren
  2020-09-25 14:27 ` [PATCH 0/8] parsing trailers with shortlog Derrick Stolee
                   ` (2 subsequent siblings)
  10 siblings, 2 replies; 44+ messages in thread
From: Jeff King @ 2020-09-25  7:05 UTC (permalink / raw)
  To: git

Now that shortlog supports reading from trailers, it can be useful to
combine counts from multiple trailers, or between trailers and authors.
This can be done manually by post-processing the output from multiple
runs, but it's non-trivial to make sure that each name/commit pair is
counted only once.

This patch teaches shortlog to accept multiple --group options on the
command line, and pull data from all of them. That makes it possible to
run:

  git shortlog -ns --group=author --group=trailer:co-authored-by

to get a shortlog that counts authors and co-authors equally.

The implementation is mostly straightforward. The "group" enum becomes a
bitfield, and the trailer key becomes a list. I didn't bother
implementing the multi-group semantics for reading from stdin. It would
be possible to do, but the existing matching code makes it awkward, and
I doubt anybody cares.

The duplicate suppression we used for trailers now covers authors and
committers as well (though in non-trailer single-group mode we can skip
the hash insertion and lookup, since we only see one value per commit).

There is one subtlety: we now care about the case when no group bit is
set (in which case we default to showing the author). The caller in
builtin/log.c needs to be adapted to ask explicitly for authors, rather
than relying on shortlog_init(). It would be possible with some
gymnastics to make this keep working as-is, but it's not worth it for a
single caller.

Signed-off-by: Jeff King <peff@peff.net>
---
 Documentation/git-shortlog.txt |  5 +++
 builtin/log.c                  |  1 +
 builtin/shortlog.c             | 69 ++++++++++++++++++++-----------
 shortlog.h                     | 10 ++---
 t/t4201-shortlog.sh            | 74 ++++++++++++++++++++++++++++++++++
 5 files changed, 130 insertions(+), 29 deletions(-)

diff --git a/Documentation/git-shortlog.txt b/Documentation/git-shortlog.txt
index 5f8f918cad..f31770dab5 100644
--- a/Documentation/git-shortlog.txt
+++ b/Documentation/git-shortlog.txt
@@ -69,6 +69,11 @@ Shortlog will attempt to parse each trailer value as a `name <email>`
 identity. If successful, the mailmap is applied and the email is omitted
 unless the `--email` option is specified. If the value cannot be parsed
 as an identity, it will be taken literally and completely.
++
+If `--group` is specified multiple times, commits are counted under each
+value (but again, only once per unique value in that commit). For
+example, `git shortlog --group=author --group=trailer:co-authored-by`
+counts both authors and co-authors.
 
 -c::
 --committer::
diff --git a/builtin/log.c b/builtin/log.c
index b8824d898f..7f27e9eca1 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -1195,6 +1195,7 @@ static void make_cover_letter(struct rev_info *rev, int use_stdout,
 	log.in1 = 2;
 	log.in2 = 4;
 	log.file = rev->diffopt.file;
+	log.groups = SHORTLOG_GROUP_AUTHOR;
 	for (i = 0; i < nr; i++)
 		shortlog_add_commit(&log, list[i]);
 
diff --git a/builtin/shortlog.c b/builtin/shortlog.c
index 28133aec68..a36a8717b7 100644
--- a/builtin/shortlog.c
+++ b/builtin/shortlog.c
@@ -121,6 +121,11 @@ static int parse_ident(struct shortlog *log,
 	return 0;
 }
 
+static inline int has_multi_bits(unsigned n)
+{
+	return (n & (n - 1)) != 0;
+}
+
 static void read_from_stdin(struct shortlog *log)
 {
 	struct strbuf ident = STRBUF_INIT;
@@ -130,7 +135,10 @@ static void read_from_stdin(struct shortlog *log)
 	static const char *committer_match[2] = { "Commit: ", "committer " };
 	const char **match;
 
-	switch (log->group) {
+	if (has_multi_bits(log->groups))
+		die(_("using multiple --group options with stdin is not supported"));
+
+	switch (log->groups) {
 	case SHORTLOG_GROUP_AUTHOR:
 		match = author_match;
 		break;
@@ -221,13 +229,13 @@ static void strset_clear(struct strset *ss)
 }
 
 static void insert_records_from_trailers(struct shortlog *log,
+					 struct strset *dups,
 					 struct commit *commit,
 					 struct pretty_print_context *ctx,
 					 const char *oneline)
 {
 	struct trailer_iterator iter;
 	const char *commit_buffer, *body;
-	struct strset dups = STRSET_INIT;
 	struct strbuf ident = STRBUF_INIT;
 
 	/*
@@ -243,28 +251,28 @@ static void insert_records_from_trailers(struct shortlog *log,
 	while (trailer_iterator_advance(&iter)) {
 		const char *value = iter.val.buf;
 
-		if (strcasecmp(iter.key.buf, log->trailer))
+		if (!string_list_has_string(&log->trailers, iter.key.buf))
 			continue;
 
 		strbuf_reset(&ident);
 		if (!parse_ident(log, &ident, value))
 			value = ident.buf;
 
-		if (strset_check_and_add(&dups, value))
+		if (strset_check_and_add(dups, value))
 			continue;
 		insert_one_record(log, value, oneline);
 	}
 	trailer_iterator_release(&iter);
 
 	strbuf_release(&ident);
-	strset_clear(&dups);
 	unuse_commit_buffer(commit, commit_buffer);
 }
 
 void shortlog_add_commit(struct shortlog *log, struct commit *commit)
 {
 	struct strbuf ident = STRBUF_INIT;
 	struct strbuf oneline = STRBUF_INIT;
+	struct strset dups = STRSET_INIT;
 	struct pretty_print_context ctx = {0};
 	const char *oneline_str;
 
@@ -282,24 +290,29 @@ void shortlog_add_commit(struct shortlog *log, struct commit *commit)
 	}
 	oneline_str = oneline.len ? oneline.buf : "<none>";
 
-	switch (log->group) {
-	case SHORTLOG_GROUP_AUTHOR:
+	if (log->groups & SHORTLOG_GROUP_AUTHOR) {
+		strbuf_reset(&ident);
 		format_commit_message(commit,
 				      log->email ? "%aN <%aE>" : "%aN",
 				      &ident, &ctx);
-		insert_one_record(log, ident.buf, oneline_str);
-		break;
-	case SHORTLOG_GROUP_COMMITTER:
+		if (!has_multi_bits(log->groups) ||
+		    !strset_check_and_add(&dups, ident.buf))
+			insert_one_record(log, ident.buf, oneline_str);
+	}
+	if (log->groups & SHORTLOG_GROUP_COMMITTER) {
+		strbuf_reset(&ident);
 		format_commit_message(commit,
 				      log->email ? "%cN <%cE>" : "%cN",
 				      &ident, &ctx);
-		insert_one_record(log, ident.buf, oneline_str);
-		break;
-	case SHORTLOG_GROUP_TRAILER:
-		insert_records_from_trailers(log, commit, &ctx, oneline_str);
-		break;
+		if (!has_multi_bits(log->groups) ||
+		    !strset_check_and_add(&dups, ident.buf))
+			insert_one_record(log, ident.buf, oneline_str);
+	}
+	if (log->groups & SHORTLOG_GROUP_TRAILER) {
+		insert_records_from_trailers(log, &dups, commit, &ctx, oneline_str);
 	}
 
+	strset_clear(&dups);
 	strbuf_release(&ident);
 	strbuf_release(&oneline);
 }
@@ -366,14 +379,16 @@ static int parse_group_option(const struct option *opt, const char *arg, int uns
 	struct shortlog *log = opt->value;
 	const char *field;
 
-	if (unset || !strcasecmp(arg, "author"))
-		log->group = SHORTLOG_GROUP_AUTHOR;
+	if (unset) {
+		log->groups = 0;
+		string_list_clear(&log->trailers, 0);
+	} else if (!strcasecmp(arg, "author"))
+		log->groups |= SHORTLOG_GROUP_AUTHOR;
 	else if (!strcasecmp(arg, "committer"))
-		log->group = SHORTLOG_GROUP_COMMITTER;
+		log->groups |= SHORTLOG_GROUP_COMMITTER;
 	else if (skip_prefix(arg, "trailer:", &field)) {
-		log->group = SHORTLOG_GROUP_TRAILER;
-		free(log->trailer);
-		log->trailer = xstrdup(field);
+		log->groups |= SHORTLOG_GROUP_TRAILER;
+		string_list_append(&log->trailers, field);
 	} else
 		return error(_("unknown group type: %s"), arg);
 
@@ -391,6 +406,8 @@ void shortlog_init(struct shortlog *log)
 	log->wrap = DEFAULT_WRAPLEN;
 	log->in1 = DEFAULT_INDENT1;
 	log->in2 = DEFAULT_INDENT2;
+	log->trailers.strdup_strings = 1;
+	log->trailers.cmp = strcasecmp;
 }
 
 int cmd_shortlog(int argc, const char **argv, const char *prefix)
@@ -400,9 +417,9 @@ int cmd_shortlog(int argc, const char **argv, const char *prefix)
 	int nongit = !startup_info->have_repository;
 
 	const struct option options[] = {
-		OPT_SET_INT('c', "committer", &log.group,
-			    N_("Group by committer rather than author"),
-			    SHORTLOG_GROUP_COMMITTER),
+		OPT_BIT('c', "committer", &log.groups,
+			N_("Group by committer rather than author"),
+			SHORTLOG_GROUP_COMMITTER),
 		OPT_BOOL('n', "numbered", &log.sort_by_number,
 			 N_("sort output according to the number of commits per author")),
 		OPT_BOOL('s', "summary", &log.summary,
@@ -454,6 +471,10 @@ int cmd_shortlog(int argc, const char **argv, const char *prefix)
 	log.abbrev = rev.abbrev;
 	log.file = rev.diffopt.file;
 
+	if (!log.groups)
+		log.groups = SHORTLOG_GROUP_AUTHOR;
+	string_list_sort(&log.trailers);
+
 	/* assume HEAD if from a tty */
 	if (!nongit && !rev.pending.nr && isatty(0))
 		add_head_to_pending(&rev);
diff --git a/shortlog.h b/shortlog.h
index 54ce55e9e9..030a0b8490 100644
--- a/shortlog.h
+++ b/shortlog.h
@@ -17,11 +17,11 @@ struct shortlog {
 	int abbrev;
 
 	enum {
-		SHORTLOG_GROUP_AUTHOR = 0,
-		SHORTLOG_GROUP_COMMITTER,
-		SHORTLOG_GROUP_TRAILER
-	} group;
-	char *trailer;
+		SHORTLOG_GROUP_AUTHOR = (1 << 0),
+		SHORTLOG_GROUP_COMMITTER = (1 << 1),
+		SHORTLOG_GROUP_TRAILER = (1 << 2)
+	} groups;
+	struct string_list trailers;
 
 	char *common_repo_prefix;
 	int email;
diff --git a/t/t4201-shortlog.sh b/t/t4201-shortlog.sh
index a62ee9ed55..3d5c4a2086 100755
--- a/t/t4201-shortlog.sh
+++ b/t/t4201-shortlog.sh
@@ -282,4 +282,78 @@ test_expect_success 'shortlog de-duplicates trailers in a single commit' '
 	test_cmp expect actual
 '
 
+test_expect_success 'shortlog can match multiple groups' '
+	git commit --allow-empty -F - <<-\EOF &&
+	subject one
+
+	this has two trailers that are distinct from the author; it will count
+	3 times in the output
+
+	Some-trailer: User A <a@example.com>
+	Another-trailer: User B <b@example.com>
+	EOF
+
+	git commit --allow-empty -F - <<-\EOF &&
+	subject two
+
+	this one has two trailers, one of which is a duplicate with the author;
+	it will only be counted once for them
+
+	Another-trailer: A U Thor <author@example.com>
+	Some-trailer: User B <b@example.com>
+	EOF
+
+	cat >expect <<-\EOF &&
+	     2	A U Thor
+	     2	User B
+	     1	User A
+	EOF
+	git shortlog -ns \
+		--group=author \
+		--group=trailer:some-trailer \
+		--group=trailer:another-trailer \
+		-2 HEAD >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'set up option selection tests' '
+	git commit --allow-empty -F - <<-\EOF
+	subject
+
+	body
+
+	Trailer-one: value-one
+	Trailer-two: value-two
+	EOF
+'
+
+test_expect_success '--no-group resets group list to author' '
+	cat >expect <<-\EOF &&
+	     1	A U Thor
+	EOF
+	git shortlog -ns \
+		--group=committer \
+		--group=trailer:trailer-one \
+		--no-group \
+		-1 HEAD >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success '--no-group resets trailer list' '
+	cat >expect <<-\EOF &&
+	     1	value-two
+	EOF
+	git shortlog -ns \
+		--group=trailer:trailer-one \
+		--no-group \
+		--group=trailer:trailer-two \
+		-1 HEAD >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'stdin with multiple groups reports error' '
+	git log >log &&
+	test_must_fail git shortlog --group=author --group=committer <log
+'
+
 test_done
-- 
2.28.0.1085.g44a0350633

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 0/8] parsing trailers with shortlog
  2020-09-25  7:01 [PATCH 0/8] parsing trailers with shortlog Jeff King
                   ` (7 preceding siblings ...)
  2020-09-25  7:05 ` [PATCH 8/8] shortlog: allow multiple groups to be specified Jeff King
@ 2020-09-25 14:27 ` Derrick Stolee
  2020-09-25 16:57 ` Junio C Hamano
  2020-09-27  8:39 ` [PATCH v2 " Jeff King
  10 siblings, 0 replies; 44+ messages in thread
From: Derrick Stolee @ 2020-09-25 14:27 UTC (permalink / raw)
  To: Jeff King, git

On 9/25/2020 3:01 AM, Jeff King wrote:
> Somebody mentioned at the inclusion summit that it would be nice for
> mentoring/pairing relationships if we could more easily give credit to
> multiple authors of a commit. When people ask about adding multiple
> "author" headers, we usually recommend that they use a "co-authored-by"
> trailer. But you can't convince shortlog to count it for anything. :)

This discussion was worthwhile, and could even have measurable effects
on this community, depending on how Junio creates the list of
contributors for the release notes.

> So this series adds support for counting trailers to shortlog. You can
> do a number of fun things with it, like:
> 
>     # credit reviewers
>     git shortlog -ns --group=trailer:reviewed-by
> 
>     # credit authors and co-authors equally
>     git shortlog -ns --group=author \
>                      --group=trailer:co-authored-by
> 
>     # see who helps whom
>     git shortlog --format="...helped %an on %as" \
>                  --group=trailer:helped-by

I built Git with these patches and played around with these and
other options (such as `--group=author --group=committer`). I
could not find any bugs or other ways to improve these patches.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 0/8] parsing trailers with shortlog
  2020-09-25  7:01 [PATCH 0/8] parsing trailers with shortlog Jeff King
                   ` (8 preceding siblings ...)
  2020-09-25 14:27 ` [PATCH 0/8] parsing trailers with shortlog Derrick Stolee
@ 2020-09-25 16:57 ` Junio C Hamano
  2020-09-27  8:39 ` [PATCH v2 " Jeff King
  10 siblings, 0 replies; 44+ messages in thread
From: Junio C Hamano @ 2020-09-25 16:57 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King <peff@peff.net> writes:

> Somebody mentioned at the inclusion summit that it would be nice for
> mentoring/pairing relationships if we could more easily give credit to
> multiple authors of a commit. When people ask about adding multiple
> "author" headers, we usually recommend that they use a "co-authored-by"
> trailer. But you can't convince shortlog to count it for anything. :)
>
> So this series adds support for counting trailers to shortlog. You can
> do a number of fun things with it, like:
>
>     # credit reviewers
>     git shortlog -ns --group=trailer:reviewed-by
>
>     # credit authors and co-authors equally
>     git shortlog -ns --group=author \
>                      --group=trailer:co-authored-by
>
>     # see who helps whom
>     git shortlog --format="...helped %an on %as" \
>                  --group=trailer:helped-by
>
> If some of this looks familiar, it's because I sent the early patches
> years ago. But I won't even bother linking to it; I cleaned up quite a
> few rough edges since then, so it's not really worth looking at.

Fun stuff.  I like the idea.

Thanks.


>
>   [1/8]: shortlog: change "author" variables to "ident"
>   [2/8]: shortlog: refactor committer/author grouping
>   [3/8]: trailer: add interface for iterating over commit trailers
>   [4/8]: shortlog: match commit trailers with --group
>   [5/8]: shortlog: de-duplicate trailer values
>   [6/8]: shortlog: rename parse_stdin_ident()
>   [7/8]: shortlog: parse trailer idents
>   [8/8]: shortlog: allow multiple groups to be specified
>
>  Documentation/git-shortlog.txt |  29 +++++
>  builtin/log.c                  |   1 +
>  builtin/shortlog.c             | 218 +++++++++++++++++++++++++++++----
>  shortlog.h                     |   8 +-
>  t/t4201-shortlog.sh            | 141 +++++++++++++++++++++
>  trailer.c                      |  36 ++++++
>  trailer.h                      |  44 ++++++-
>  7 files changed, 447 insertions(+), 30 deletions(-)
>
> -Peff

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 2/8] shortlog: refactor committer/author grouping
  2020-09-25  7:02 ` [PATCH 2/8] shortlog: refactor committer/author grouping Jeff King
@ 2020-09-25 20:05   ` Eric Sunshine
  2020-09-27  8:03     ` Jeff King
  2020-09-26 12:31   ` Martin Ågren
  1 sibling, 1 reply; 44+ messages in thread
From: Eric Sunshine @ 2020-09-25 20:05 UTC (permalink / raw)
  To: Jeff King; +Cc: Git List

On Fri, Sep 25, 2020 at 3:02 AM Jeff King <peff@peff.net> wrote:
> In preparation for adding more grouping types, let's
> refactor the committer/author grouping code. In particular:
>
>   - the master option is now "--group", to make it clear
>     that the various group types are mutually exclusive. The
>     "--committer" option is an alias for "--group=committer".
>
>   - we keep an enum rather than a binary flag, to prepare
>     for more values
>
>   - we prefer switch statements to ternary assignment, since
>     other group types will need more custom code
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> diff --git a/Documentation/git-shortlog.txt b/Documentation/git-shortlog.txt
> @@ -47,9 +47,18 @@ OPTIONS
> +--group=<type>::
> +       By default, `shortlog` collects and collates author identities;
> +       using `--group` will collect other fields.
> +       `<type>` is one of:
> ++
> + - `author`, commits are grouped by author (this is the default)
> + - `committer`, commits are grouped by committer (the same as `-c`)

I had trouble interpreting "(this is the default)". It made me think
that <type> is optional:

    --group[=<type>]

but that isn't the case at all. Instead, it means that if --group
isn't specified, then grouping is done by `author` by default. It also
repeats what the general description of --group already says with
regard to the default, thus it is redundant to say it again when
describing the `author` type. Therefore, perhaps drop "(this is the
default)" altogether?

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 8/8] shortlog: allow multiple groups to be specified
  2020-09-25  7:05 ` [PATCH 8/8] shortlog: allow multiple groups to be specified Jeff King
@ 2020-09-25 20:23   ` Eric Sunshine
  2020-09-27  8:06     ` Jeff King
  2020-09-26 12:48   ` Martin Ågren
  1 sibling, 1 reply; 44+ messages in thread
From: Eric Sunshine @ 2020-09-25 20:23 UTC (permalink / raw)
  To: Jeff King; +Cc: Git List

On Fri, Sep 25, 2020 at 3:05 AM Jeff King <peff@peff.net> wrote:
> This patch teaches shortlog to accept multiple --group options on the
> command line, and pull data from all of them. That makes it possible to
> run:
> [...]
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> diff --git a/Documentation/git-shortlog.txt b/Documentation/git-shortlog.txt
> @@ -69,6 +69,11 @@ Shortlog will attempt to parse each trailer value as a `name <email>`
>  identity. If successful, the mailmap is applied and the email is omitted
>  unless the `--email` option is specified. If the value cannot be parsed
>  as an identity, it will be taken literally and completely.
> ++
> +If `--group` is specified multiple times, commits are counted under each
> +value (but again, only once per unique value in that commit). For
> +example, `git shortlog --group=author --group=trailer:co-authored-by`
> +counts both authors and co-authors.

Intuitively, I understand (or hope) that the first use of --group
overrides what is otherwise collected by default (authors), however,
would there be value in stating this explicitly?

At this point, the documentation for --committer still says:

    --committer::
        Collect and show committer identities instead of authors.
        This is an alias for `--group=committer`.

which stops feeling accurate now that --group can be specified more
than once. The "instead of authors" bit is particularly jarring. I
wonder if the first sentence can simply be dropped.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 2/8] shortlog: refactor committer/author grouping
  2020-09-25  7:02 ` [PATCH 2/8] shortlog: refactor committer/author grouping Jeff King
  2020-09-25 20:05   ` Eric Sunshine
@ 2020-09-26 12:31   ` Martin Ågren
  2020-09-27  7:59     ` Jeff King
  1 sibling, 1 reply; 44+ messages in thread
From: Martin Ågren @ 2020-09-26 12:31 UTC (permalink / raw)
  To: Jeff King; +Cc: Git Mailing List

Hi Peff,

Some random thoughts follow on this patch and a few others.

On Fri, 25 Sep 2020 at 09:04, Jeff King <peff@peff.net> wrote:
>
> In preparation for adding more grouping types, let's
> refactor the committer/author grouping code. In particular:
>
>   - the master option is now "--group", to make it clear
>     that the various group types are mutually exclusive. The
>     "--committer" option is an alias for "--group=committer".

I think this is more than a refactoring of the code. The user interface
is also refactored (if that's even the right word). From the subject and
the first sentence above, I did not expect this first "-" item, nor that
the patch would touch Documentation/.

> +       enum {
> +               SHORTLOG_GROUP_AUTHOR = 0,
> +               SHORTLOG_GROUP_COMMITTER
> +       } group;

You could reduce the patch noise by adding a trailing comma, see
cc0c42975a ("CodingGuidelines: spell out post-C89 rules", 2019-07-16).
(I do realize that you later redefine all values anyway.)

Martin

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 3/8] trailer: add interface for iterating over commit trailers
  2020-09-25  7:02 ` [PATCH 3/8] trailer: add interface for iterating over commit trailers Jeff King
@ 2020-09-26 12:39   ` Martin Ågren
  2020-09-27  8:20     ` Jeff King
  0 siblings, 1 reply; 44+ messages in thread
From: Martin Ågren @ 2020-09-26 12:39 UTC (permalink / raw)
  To: Jeff King; +Cc: Git Mailing List

On Fri, 25 Sep 2020 at 09:04, Jeff King <peff@peff.net> wrote:
>
> The trailer code knows how to parse out the trailers and re-format them,
> but there's no easy way to iterate over the trailers (you can use
> trailer_info, but you have to then do a bunch of extra parsing).
>
> Let's add an iteration interface that makes this easy to do.

> +void trailer_iterator_init(struct trailer_iterator *iter, const char *msg)
> +{
> +       struct process_trailer_options opts = PROCESS_TRAILER_OPTIONS_INIT;
> +       strbuf_init(&iter->key, 0);
> +       strbuf_init(&iter->val, 0);
> +       opts.no_divider = 1;
> +       trailer_info_get(&iter->info, msg, &opts);
> +       iter->cur = 0;
> +}

Ok, this does initialize everything...

> +void trailer_iterator_release(struct trailer_iterator *iter)
> +{
> +       trailer_info_release(&iter->info);
> +       strbuf_release(&iter->val);
> +       strbuf_release(&iter->key);
> +}

... and this side takes care of everything, too.

>  #define TRAILER_H
>
>  #include "list.h"
> -
> -struct strbuf;
> +#include "strbuf.h"

Spotting and removing the forward declaration, ok.

> +/*
> + * An interface for iterating over the trailers found in a particular commit
> + * message. Use like:
> + *
> + *   struct trailer_iterator iter;
> + *   trailer_iterator_init(&iter, msg);
> + *   while (trailer_iterator_advance(&iter))
> + *      ... do something with iter.key and iter.val ...
> + *   trailer_iterator_release(&iter);
> + */
> +struct trailer_iterator {
> +       struct strbuf key;
> +       struct strbuf val;
> +
> +       /* private */
> +       struct trailer_info info;
> +       size_t cur;
> +};

Ok.

> +/*
> + * Initialize "iter" in preparation for walking over the trailers in the commit
> + * message "msg". The "msg" pointer must remain valid until the iterator is
> + * released.
> + *
> + * After initializing, we are not yet pointing
> + */

Truncated sentence. "... not yet pointing at any trailer"?

Martin

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 8/8] shortlog: allow multiple groups to be specified
  2020-09-25  7:05 ` [PATCH 8/8] shortlog: allow multiple groups to be specified Jeff King
  2020-09-25 20:23   ` Eric Sunshine
@ 2020-09-26 12:48   ` Martin Ågren
  2020-09-27  8:25     ` Jeff King
  1 sibling, 1 reply; 44+ messages in thread
From: Martin Ågren @ 2020-09-26 12:48 UTC (permalink / raw)
  To: Jeff King; +Cc: Git Mailing List

On Fri, 25 Sep 2020 at 09:09, Jeff King <peff@peff.net> wrote:
> +static inline int has_multi_bits(unsigned n)
> +{
> +       return (n & (n - 1)) != 0;
> +}

There's a HAS_MULTI_BITS macro in git-compat-util.h. I like how you came
up with the exact same name. It even makes me wonder if you are actively
avoiding it in favor of a re-implementation, but I can't see why.

>         enum {
> -               SHORTLOG_GROUP_AUTHOR = 0,
> -               SHORTLOG_GROUP_COMMITTER,
> -               SHORTLOG_GROUP_TRAILER
> -       } group;
> +               SHORTLOG_GROUP_AUTHOR = (1 << 0),
> +               SHORTLOG_GROUP_COMMITTER = (1 << 1),
> +               SHORTLOG_GROUP_TRAILER = (1 << 2)
> +       } groups;

Coming back to the comment on 2/8, adding a comma at the end wouldn't
reduce patch noise here and now, but whenever someone touches this place
the next time.

Of my comments in the last few mails, probably the only actionable one
is the truncated sentence in the trailer iterator header file.

Martin

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 2/8] shortlog: refactor committer/author grouping
  2020-09-26 12:31   ` Martin Ågren
@ 2020-09-27  7:59     ` Jeff King
  0 siblings, 0 replies; 44+ messages in thread
From: Jeff King @ 2020-09-27  7:59 UTC (permalink / raw)
  To: Martin Ågren; +Cc: Git Mailing List

On Sat, Sep 26, 2020 at 02:31:32PM +0200, Martin Ågren wrote:

> On Fri, 25 Sep 2020 at 09:04, Jeff King <peff@peff.net> wrote:
> >
> > In preparation for adding more grouping types, let's
> > refactor the committer/author grouping code. In particular:
> >
> >   - the master option is now "--group", to make it clear
> >     that the various group types are mutually exclusive. The
> >     "--committer" option is an alias for "--group=committer".
> 
> I think this is more than a refactoring of the code. The user interface
> is also refactored (if that's even the right word). From the subject and
> the first sentence above, I did not expect this first "-" item, nor that
> the patch would touch Documentation/.

Yeah, I agree it's a bit misleading. I've reworded it (especially the
title) to make the intent more clear.

> > +       enum {
> > +               SHORTLOG_GROUP_AUTHOR = 0,
> > +               SHORTLOG_GROUP_COMMITTER
> > +       } group;
> 
> You could reduce the patch noise by adding a trailing comma, see
> cc0c42975a ("CodingGuidelines: spell out post-C89 rules", 2019-07-16).
> (I do realize that you later redefine all values anyway.)

Thanks, that's worth doing. This part of the series was actually from
2015, so perhaps before we had that guideline. :)

-Peff

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 2/8] shortlog: refactor committer/author grouping
  2020-09-25 20:05   ` Eric Sunshine
@ 2020-09-27  8:03     ` Jeff King
  2020-09-27  8:08       ` Jeff King
  0 siblings, 1 reply; 44+ messages in thread
From: Jeff King @ 2020-09-27  8:03 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: Git List

On Fri, Sep 25, 2020 at 04:05:04PM -0400, Eric Sunshine wrote:

> > +--group=<type>::
> > +       By default, `shortlog` collects and collates author identities;
> > +       using `--group` will collect other fields.
> > +       `<type>` is one of:
> > ++
> > + - `author`, commits are grouped by author (this is the default)
> > + - `committer`, commits are grouped by committer (the same as `-c`)
> 
> I had trouble interpreting "(this is the default)". It made me think
> that <type> is optional:
> 
>     --group[=<type>]
> 
> but that isn't the case at all. Instead, it means that if --group
> isn't specified, then grouping is done by `author` by default.

Right.

> It also
> repeats what the general description of --group already says with
> regard to the default, thus it is redundant to say it again when
> describing the `author` type. Therefore, perhaps drop "(this is the
> default)" altogether?

OK. I was worried somebody wouldn't quite pick up on the default. We
could spell it out as:

  - `author`, commits are grouped by author (this is the default if no
    --group is given)

but that's pretty lengthy. We could also reword the whole thing as:

  --group=<type>::
          Collect and collate values of type `<type>`; if no `--group`
	  option is given, the default is `author`:
   - `author`, commits are grouped by author
   - `committer`, commits are grouped by committer (the same as `-c`)

I'm not sure if that's clearer or not. The first sentence seems a bit
opaque, but I'm not sure how to improve it.

For now (and what plan to send out in v2), I'll take your suggestion to
just drop the phrase.

-Peff

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 8/8] shortlog: allow multiple groups to be specified
  2020-09-25 20:23   ` Eric Sunshine
@ 2020-09-27  8:06     ` Jeff King
  0 siblings, 0 replies; 44+ messages in thread
From: Jeff King @ 2020-09-27  8:06 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: Git List

On Fri, Sep 25, 2020 at 04:23:39PM -0400, Eric Sunshine wrote:

> > diff --git a/Documentation/git-shortlog.txt b/Documentation/git-shortlog.txt
> > @@ -69,6 +69,11 @@ Shortlog will attempt to parse each trailer value as a `name <email>`
> >  identity. If successful, the mailmap is applied and the email is omitted
> >  unless the `--email` option is specified. If the value cannot be parsed
> >  as an identity, it will be taken literally and completely.
> > ++
> > +If `--group` is specified multiple times, commits are counted under each
> > +value (but again, only once per unique value in that commit). For
> > +example, `git shortlog --group=author --group=trailer:co-authored-by`
> > +counts both authors and co-authors.
> 
> Intuitively, I understand (or hope) that the first use of --group
> overrides what is otherwise collected by default (authors), however,
> would there be value in stating this explicitly?

Yes. Maybe my "if no --group is specified" suggestion from the other
part of the thread _is_ worth doing then.

> At this point, the documentation for --committer still says:
> 
>     --committer::
>         Collect and show committer identities instead of authors.
>         This is an alias for `--group=committer`.
> 
> which stops feeling accurate now that --group can be specified more
> than once. The "instead of authors" bit is particularly jarring. I
> wonder if the first sentence can simply be dropped.

Hmm, yeah. I actually kept --committer different at first, having it
override any other --group options. But that seemed confusing, too
(since it's now _almost_ an alias for --group=committer). I think your
suggestion works to just explicitly mark it as an alias and nothing.

-Peff

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 2/8] shortlog: refactor committer/author grouping
  2020-09-27  8:03     ` Jeff King
@ 2020-09-27  8:08       ` Jeff King
  2020-09-27  8:23         ` Eric Sunshine
  0 siblings, 1 reply; 44+ messages in thread
From: Jeff King @ 2020-09-27  8:08 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: Git List

On Sun, Sep 27, 2020 at 04:03:45AM -0400, Jeff King wrote:

> but that's pretty lengthy. We could also reword the whole thing as:
> 
>   --group=<type>::
>           Collect and collate values of type `<type>`; if no `--group`
> 	  option is given, the default is `author`:
>    - `author`, commits are grouped by author
>    - `committer`, commits are grouped by committer (the same as `-c`)
> 
> I'm not sure if that's clearer or not. The first sentence seems a bit
> opaque, but I'm not sure how to improve it.
> 
> For now (and what plan to send out in v2), I'll take your suggestion to
> just drop the phrase.

OK, I lied. After reading your other email, I ended up with:

  --group=<type>::
          Group commits based on `<type>`. If no `--group` option is
          specified, the default is `author`. `<type>` is one of:
  +
   - `author`, commits are grouped by author
   - `committer`, commits are grouped by committer (the same as `-c`)

  -c::
  --committer::
          This is an alias for `--group=committer`.

which I think works by itself, and extends nicely to trailers.

-Peff

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 3/8] trailer: add interface for iterating over commit trailers
  2020-09-26 12:39   ` Martin Ågren
@ 2020-09-27  8:20     ` Jeff King
  0 siblings, 0 replies; 44+ messages in thread
From: Jeff King @ 2020-09-27  8:20 UTC (permalink / raw)
  To: Martin Ågren; +Cc: Git Mailing List

On Sat, Sep 26, 2020 at 02:39:32PM +0200, Martin Ågren wrote:

> > +/*
> > + * Initialize "iter" in preparation for walking over the trailers in the commit
> > + * message "msg". The "msg" pointer must remain valid until the iterator is
> > + * released.
> > + *
> > + * After initializing, we are not yet pointing
> > + */
> 
> Truncated sentence. "... not yet pointing at any trailer"?

Oops. Yes, though I expanded it to:

  * After initializing, note that key/val will not yet point to any trailer.
  * Call advance() to parse the first one (if any).

Obviously another convention would be:

  for (trailer_iterator_init(&iter, msg);
       !trailer_iterator_done();
       trailer_iterator_advance()) {
  }

but I don't think it matters much either way as long as it's clearly
documented.

-Peff

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 2/8] shortlog: refactor committer/author grouping
  2020-09-27  8:08       ` Jeff King
@ 2020-09-27  8:23         ` Eric Sunshine
  0 siblings, 0 replies; 44+ messages in thread
From: Eric Sunshine @ 2020-09-27  8:23 UTC (permalink / raw)
  To: Jeff King; +Cc: Git List

On Sun, Sep 27, 2020 at 4:08 AM Jeff King <peff@peff.net> wrote:
> OK, I lied. After reading your other email, I ended up with:
>
>   --group=<type>::
>           Group commits based on `<type>`. If no `--group` option is
>           specified, the default is `author`. `<type>` is one of:
>   +
>    - `author`, commits are grouped by author
>    - `committer`, commits are grouped by committer (the same as `-c`)

Sounds good. Makes it much more clear to me.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 8/8] shortlog: allow multiple groups to be specified
  2020-09-26 12:48   ` Martin Ågren
@ 2020-09-27  8:25     ` Jeff King
  0 siblings, 0 replies; 44+ messages in thread
From: Jeff King @ 2020-09-27  8:25 UTC (permalink / raw)
  To: Martin Ågren; +Cc: Git Mailing List

On Sat, Sep 26, 2020 at 02:48:44PM +0200, Martin Ågren wrote:

> On Fri, 25 Sep 2020 at 09:09, Jeff King <peff@peff.net> wrote:
> > +static inline int has_multi_bits(unsigned n)
> > +{
> > +       return (n & (n - 1)) != 0;
> > +}
> 
> There's a HAS_MULTI_BITS macro in git-compat-util.h. I like how you came
> up with the exact same name. It even makes me wonder if you are actively
> avoiding it in favor of a re-implementation, but I can't see why.

Heh. I _thought_ we had such a helper, and I even thought I remembered
the name. But when I grepped for it, I couldn't come up with it. Silly
"-i". (So I guess it's no surprise that I came up with the same
implementation, which is the only good one, and the same name, which I
did pull from the back of my mind).

I'll drop this in favor of the macro.

> >         enum {
> > -               SHORTLOG_GROUP_AUTHOR = 0,
> > -               SHORTLOG_GROUP_COMMITTER,
> > -               SHORTLOG_GROUP_TRAILER
> > -       } group;
> > +               SHORTLOG_GROUP_AUTHOR = (1 << 0),
> > +               SHORTLOG_GROUP_COMMITTER = (1 << 1),
> > +               SHORTLOG_GROUP_TRAILER = (1 << 2)
> > +       } groups;
> 
> Coming back to the comment on 2/8, adding a comma at the end wouldn't
> reduce patch noise here and now, but whenever someone touches this place
> the next time.

Agreed, will do.

> Of my comments in the last few mails, probably the only actionable one
> is the truncated sentence in the trailer iterator header file.

I agree none are big, but worth addressing since I was making a re-roll
anyway. Thanks.

-Peff

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH v2 0/8] parsing trailers with shortlog
  2020-09-25  7:01 [PATCH 0/8] parsing trailers with shortlog Jeff King
                   ` (9 preceding siblings ...)
  2020-09-25 16:57 ` Junio C Hamano
@ 2020-09-27  8:39 ` Jeff King
  2020-09-27  8:39   ` [PATCH v2 1/8] shortlog: change "author" variables to "ident" Jeff King
                     ` (8 more replies)
  10 siblings, 9 replies; 44+ messages in thread
From: Jeff King @ 2020-09-27  8:39 UTC (permalink / raw)
  To: git; +Cc: Eric Sunshine, Martin Ågren

On Fri, Sep 25, 2020 at 03:01:20AM -0400, Jeff King wrote:

> Somebody mentioned at the inclusion summit that it would be nice for
> mentoring/pairing relationships if we could more easily give credit to
> multiple authors of a commit. When people ask about adding multiple
> "author" headers, we usually recommend that they use a "co-authored-by"
> trailer. But you can't convince shortlog to count it for anything. :)
> 
> So this series adds support for counting trailers to shortlog. You can
> do a number of fun things with it, like:

Here's a re-roll with some cosmetic fixups. All except the first address
the points raised by Eric and Martin (thanks, both):

  - put the --group=<type> unordered list into a delimited block, so
    that the follow-on paragraph about multiple --group options isn't
    indented with "trailer" item

  - reword --group docs for clarity

  - use existing HAS_MULTI_BITS

  - add trailing commas in enum

  - complete truncated trailer_iterator comment

  - fix misleading "refactor" commit message in patch 2

Range diff is below.

  [1/8]: shortlog: change "author" variables to "ident"
  [2/8]: shortlog: add grouping option
  [3/8]: trailer: add interface for iterating over commit trailers
  [4/8]: shortlog: match commit trailers with --group
  [5/8]: shortlog: de-duplicate trailer values
  [6/8]: shortlog: rename parse_stdin_ident()
  [7/8]: shortlog: parse trailer idents
  [8/8]: shortlog: allow multiple groups to be specified

 Documentation/git-shortlog.txt |  31 ++++-
 builtin/log.c                  |   1 +
 builtin/shortlog.c             | 213 ++++++++++++++++++++++++++++-----
 shortlog.h                     |   8 +-
 t/t4201-shortlog.sh            | 141 ++++++++++++++++++++++
 trailer.c                      |  36 ++++++
 trailer.h                      |  45 ++++++-
 7 files changed, 444 insertions(+), 31 deletions(-)

1:  675e8cea10 = 1:  3b658cba7c shortlog: change "author" variables to "ident"
2:  72d94a2601 ! 2:  3ed8a55597 shortlog: refactor committer/author grouping
    @@ Metadata
     Author: Jeff King <peff@peff.net>
     
      ## Commit message ##
    -    shortlog: refactor committer/author grouping
    +    shortlog: add grouping option
     
    -    In preparation for adding more grouping types, let's
    -    refactor the committer/author grouping code. In particular:
    +    In preparation for adding more grouping types, let's refactor the
    +    committer/author grouping code and add a user-facing option that binds
    +    them together. In particular:
     
    -      - the master option is now "--group", to make it clear
    +      - the main option is now "--group", to make it clear
             that the various group types are mutually exclusive. The
             "--committer" option is an alias for "--group=committer".
     
    @@ Documentation/git-shortlog.txt: OPTIONS
      	Each pretty-printed commit will be rewrapped before it is shown.
      
     +--group=<type>::
    -+	By default, `shortlog` collects and collates author identities;
    -+	using `--group` will collect other fields.
    -+	`<type>` is one of:
    ++	Group commits based on `<type>`. If no `--group` option is
    ++	specified, the default is `author`. `<type>` is one of:
     ++
    -+ - `author`, commits are grouped by author (this is the default)
    ++ - `author`, commits are grouped by author
     + - `committer`, commits are grouped by committer (the same as `-c`)
     +
      -c::
      --committer::
    - 	Collect and show committer identities instead of authors.
    +-	Collect and show committer identities instead of authors.
     +	This is an alias for `--group=committer`.
      
      -w[<width>[,<indent1>[,<indent2>]]]::
    @@ shortlog.h: struct shortlog {
     +
     +	enum {
     +		SHORTLOG_GROUP_AUTHOR = 0,
    -+		SHORTLOG_GROUP_COMMITTER
    ++		SHORTLOG_GROUP_COMMITTER,
     +	} group;
      
      	char *common_repo_prefix;
3:  bec4ec1433 ! 3:  d314578bde trailer: add interface for iterating over commit trailers
    @@ trailer.h: void trailer_info_release(struct trailer_info *info);
     + * message "msg". The "msg" pointer must remain valid until the iterator is
     + * released.
     + *
    -+ * After initializing, we are not yet pointing
    ++ * After initializing, note that key/val will not yet point to any trailer.
    ++ * Call advance() to parse the first one (if any).
     + */
     +void trailer_iterator_init(struct trailer_iterator *iter, const char *msg);
     +
4:  7e6ee8317f ! 4:  fdb1dbd164 shortlog: match commit trailers with --group
    @@ Commit message
      ## Documentation/git-shortlog.txt ##
     @@ Documentation/git-shortlog.txt: OPTIONS
      +
    -  - `author`, commits are grouped by author (this is the default)
    +  - `author`, commits are grouped by author
       - `committer`, commits are grouped by committer (the same as `-c`)
     + - `trailer:<field>`, the `<field>` is interpreted as a case-insensitive
     +   commit message trailer (see linkgit:git-interpret-trailers[1]). For
    @@ builtin/shortlog.c: static int parse_wrap_args(const struct option *opt, const c
     
      ## shortlog.h ##
     @@ shortlog.h: struct shortlog {
    - 
      	enum {
      		SHORTLOG_GROUP_AUTHOR = 0,
    --		SHORTLOG_GROUP_COMMITTER
    -+		SHORTLOG_GROUP_COMMITTER,
    -+		SHORTLOG_GROUP_TRAILER
    + 		SHORTLOG_GROUP_COMMITTER,
    ++		SHORTLOG_GROUP_TRAILER,
      	} group;
     +	char *trailer;
      
5:  8bc6a94239 = 5:  6374604756 shortlog: de-duplicate trailer values
6:  16294b5d16 = 6:  dfabab9957 shortlog: rename parse_stdin_ident()
7:  82c637c0a2 = 7:  bbe87d0e2a shortlog: parse trailer idents
8:  4cacdecaa0 ! 8:  ca782fa92e shortlog: allow multiple groups to be specified
    @@ Commit message
         Signed-off-by: Jeff King <peff@peff.net>
     
      ## Documentation/git-shortlog.txt ##
    +@@ Documentation/git-shortlog.txt: OPTIONS
    + 	Group commits based on `<type>`. If no `--group` option is
    + 	specified, the default is `author`. `<type>` is one of:
    + +
    ++--
    +  - `author`, commits are grouped by author
    +  - `committer`, commits are grouped by committer (the same as `-c`)
    +  - `trailer:<field>`, the `<field>` is interpreted as a case-insensitive
     @@ Documentation/git-shortlog.txt: Shortlog will attempt to parse each trailer value as a `name <email>`
      identity. If successful, the mailmap is applied and the email is omitted
      unless the `--email` option is specified. If the value cannot be parsed
      as an identity, it will be taken literally and completely.
    ++--
     ++
     +If `--group` is specified multiple times, commits are counted under each
     +value (but again, only once per unique value in that commit). For
    @@ builtin/log.c: static void make_cover_letter(struct rev_info *rev, int use_stdou
      
     
      ## builtin/shortlog.c ##
    -@@ builtin/shortlog.c: static int parse_ident(struct shortlog *log,
    - 	return 0;
    - }
    - 
    -+static inline int has_multi_bits(unsigned n)
    -+{
    -+	return (n & (n - 1)) != 0;
    -+}
    -+
    - static void read_from_stdin(struct shortlog *log)
    - {
    - 	struct strbuf ident = STRBUF_INIT;
     @@ builtin/shortlog.c: static void read_from_stdin(struct shortlog *log)
      	static const char *committer_match[2] = { "Commit: ", "committer " };
      	const char **match;
      
     -	switch (log->group) {
    -+	if (has_multi_bits(log->groups))
    ++	if (HAS_MULTI_BITS(log->groups))
     +		die(_("using multiple --group options with stdin is not supported"));
     +
     +	switch (log->groups) {
    @@ builtin/shortlog.c: void shortlog_add_commit(struct shortlog *log, struct commit
     -		insert_one_record(log, ident.buf, oneline_str);
     -		break;
     -	case SHORTLOG_GROUP_COMMITTER:
    -+		if (!has_multi_bits(log->groups) ||
    ++		if (!HAS_MULTI_BITS(log->groups) ||
     +		    !strset_check_and_add(&dups, ident.buf))
     +			insert_one_record(log, ident.buf, oneline_str);
     +	}
    @@ builtin/shortlog.c: void shortlog_add_commit(struct shortlog *log, struct commit
     -	case SHORTLOG_GROUP_TRAILER:
     -		insert_records_from_trailers(log, commit, &ctx, oneline_str);
     -		break;
    -+		if (!has_multi_bits(log->groups) ||
    ++		if (!HAS_MULTI_BITS(log->groups) ||
     +		    !strset_check_and_add(&dups, ident.buf))
     +			insert_one_record(log, ident.buf, oneline_str);
     +	}
    @@ shortlog.h: struct shortlog {
      	enum {
     -		SHORTLOG_GROUP_AUTHOR = 0,
     -		SHORTLOG_GROUP_COMMITTER,
    --		SHORTLOG_GROUP_TRAILER
    +-		SHORTLOG_GROUP_TRAILER,
     -	} group;
     -	char *trailer;
     +		SHORTLOG_GROUP_AUTHOR = (1 << 0),
     +		SHORTLOG_GROUP_COMMITTER = (1 << 1),
    -+		SHORTLOG_GROUP_TRAILER = (1 << 2)
    ++		SHORTLOG_GROUP_TRAILER = (1 << 2),
     +	} groups;
     +	struct string_list trailers;
      

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH v2 1/8] shortlog: change "author" variables to "ident"
  2020-09-27  8:39 ` [PATCH v2 " Jeff King
@ 2020-09-27  8:39   ` Jeff King
  2020-09-27 19:18     ` Junio C Hamano
  2020-09-27  8:39   ` [PATCH v2 2/8] shortlog: add grouping option Jeff King
                     ` (7 subsequent siblings)
  8 siblings, 1 reply; 44+ messages in thread
From: Jeff King @ 2020-09-27  8:39 UTC (permalink / raw)
  To: git; +Cc: Eric Sunshine, Martin Ågren

We already match "committer", and we're about to start
matching more things. Let's use a more neutral variable to
avoid confusion.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/shortlog.c | 36 ++++++++++++++++++------------------
 1 file changed, 18 insertions(+), 18 deletions(-)

diff --git a/builtin/shortlog.c b/builtin/shortlog.c
index c856c58bb5..edcf2e0d54 100644
--- a/builtin/shortlog.c
+++ b/builtin/shortlog.c
@@ -49,12 +49,12 @@ static int compare_by_list(const void *a1, const void *a2)
 }
 
 static void insert_one_record(struct shortlog *log,
-			      const char *author,
+			      const char *ident,
 			      const char *oneline)
 {
 	struct string_list_item *item;
 
-	item = string_list_insert(&log->list, author);
+	item = string_list_insert(&log->list, ident);
 
 	if (log->summary)
 		item->util = (void *)(UTIL_TO_INT(item) + 1);
@@ -97,8 +97,8 @@ static void insert_one_record(struct shortlog *log,
 	}
 }
 
-static int parse_stdin_author(struct shortlog *log,
-			       struct strbuf *out, const char *in)
+static int parse_stdin_ident(struct shortlog *log,
+			     struct strbuf *out, const char *in)
 {
 	const char *mailbuf, *namebuf;
 	size_t namelen, maillen;
@@ -122,18 +122,18 @@ static int parse_stdin_author(struct shortlog *log,
 
 static void read_from_stdin(struct shortlog *log)
 {
-	struct strbuf author = STRBUF_INIT;
-	struct strbuf mapped_author = STRBUF_INIT;
+	struct strbuf ident = STRBUF_INIT;
+	struct strbuf mapped_ident = STRBUF_INIT;
 	struct strbuf oneline = STRBUF_INIT;
 	static const char *author_match[2] = { "Author: ", "author " };
 	static const char *committer_match[2] = { "Commit: ", "committer " };
 	const char **match;
 
 	match = log->committer ? committer_match : author_match;
-	while (strbuf_getline_lf(&author, stdin) != EOF) {
+	while (strbuf_getline_lf(&ident, stdin) != EOF) {
 		const char *v;
-		if (!skip_prefix(author.buf, match[0], &v) &&
-		    !skip_prefix(author.buf, match[1], &v))
+		if (!skip_prefix(ident.buf, match[0], &v) &&
+		    !skip_prefix(ident.buf, match[1], &v))
 			continue;
 		while (strbuf_getline_lf(&oneline, stdin) != EOF &&
 		       oneline.len)
@@ -142,20 +142,20 @@ static void read_from_stdin(struct shortlog *log)
 		       !oneline.len)
 			; /* discard blanks */
 
-		strbuf_reset(&mapped_author);
-		if (parse_stdin_author(log, &mapped_author, v) < 0)
+		strbuf_reset(&mapped_ident);
+		if (parse_stdin_ident(log, &mapped_ident, v) < 0)
 			continue;
 
-		insert_one_record(log, mapped_author.buf, oneline.buf);
+		insert_one_record(log, mapped_ident.buf, oneline.buf);
 	}
-	strbuf_release(&author);
-	strbuf_release(&mapped_author);
+	strbuf_release(&ident);
+	strbuf_release(&mapped_ident);
 	strbuf_release(&oneline);
 }
 
 void shortlog_add_commit(struct shortlog *log, struct commit *commit)
 {
-	struct strbuf author = STRBUF_INIT;
+	struct strbuf ident = STRBUF_INIT;
 	struct strbuf oneline = STRBUF_INIT;
 	struct pretty_print_context ctx = {0};
 	const char *fmt;
@@ -170,17 +170,17 @@ void shortlog_add_commit(struct shortlog *log, struct commit *commit)
 		(log->email ? "%cN <%cE>" : "%cN") :
 		(log->email ? "%aN <%aE>" : "%aN");
 
-	format_commit_message(commit, fmt, &author, &ctx);
+	format_commit_message(commit, fmt, &ident, &ctx);
 	if (!log->summary) {
 		if (log->user_format)
 			pretty_print_commit(&ctx, commit, &oneline);
 		else
 			format_commit_message(commit, "%s", &oneline, &ctx);
 	}
 
-	insert_one_record(log, author.buf, oneline.len ? oneline.buf : "<none>");
+	insert_one_record(log, ident.buf, oneline.len ? oneline.buf : "<none>");
 
-	strbuf_release(&author);
+	strbuf_release(&ident);
 	strbuf_release(&oneline);
 }
 
-- 
2.28.0.1127.ga65787d918


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH v2 2/8] shortlog: add grouping option
  2020-09-27  8:39 ` [PATCH v2 " Jeff King
  2020-09-27  8:39   ` [PATCH v2 1/8] shortlog: change "author" variables to "ident" Jeff King
@ 2020-09-27  8:39   ` Jeff King
  2020-09-27  8:40   ` [PATCH v2 3/8] trailer: add interface for iterating over commit trailers Jeff King
                     ` (6 subsequent siblings)
  8 siblings, 0 replies; 44+ messages in thread
From: Jeff King @ 2020-09-27  8:39 UTC (permalink / raw)
  To: git; +Cc: Eric Sunshine, Martin Ågren

In preparation for adding more grouping types, let's refactor the
committer/author grouping code and add a user-facing option that binds
them together. In particular:

  - the main option is now "--group", to make it clear
    that the various group types are mutually exclusive. The
    "--committer" option is an alias for "--group=committer".

  - we keep an enum rather than a binary flag, to prepare
    for more values

  - we prefer switch statements to ternary assignment, since
    other group types will need more custom code

Signed-off-by: Jeff King <peff@peff.net>
---
 Documentation/git-shortlog.txt |  9 +++++-
 builtin/shortlog.c             | 59 +++++++++++++++++++++++++++-------
 shortlog.h                     |  6 +++-
 t/t4201-shortlog.sh            |  5 +++
 4 files changed, 66 insertions(+), 13 deletions(-)

diff --git a/Documentation/git-shortlog.txt b/Documentation/git-shortlog.txt
index a72ea7f7ba..6496d313c1 100644
--- a/Documentation/git-shortlog.txt
+++ b/Documentation/git-shortlog.txt
@@ -47,9 +47,16 @@ OPTIONS
 
 	Each pretty-printed commit will be rewrapped before it is shown.
 
+--group=<type>::
+	Group commits based on `<type>`. If no `--group` option is
+	specified, the default is `author`. `<type>` is one of:
++
+ - `author`, commits are grouped by author
+ - `committer`, commits are grouped by committer (the same as `-c`)
+
 -c::
 --committer::
-	Collect and show committer identities instead of authors.
+	This is an alias for `--group=committer`.
 
 -w[<width>[,<indent1>[,<indent2>]]]::
 	Linewrap the output by wrapping each line at `width`.  The first
diff --git a/builtin/shortlog.c b/builtin/shortlog.c
index edcf2e0d54..880ce19304 100644
--- a/builtin/shortlog.c
+++ b/builtin/shortlog.c
@@ -129,7 +129,17 @@ static void read_from_stdin(struct shortlog *log)
 	static const char *committer_match[2] = { "Commit: ", "committer " };
 	const char **match;
 
-	match = log->committer ? committer_match : author_match;
+	switch (log->group) {
+	case SHORTLOG_GROUP_AUTHOR:
+		match = author_match;
+		break;
+	case SHORTLOG_GROUP_COMMITTER:
+		match = committer_match;
+		break;
+	default:
+		BUG("unhandled shortlog group");
+	}
+
 	while (strbuf_getline_lf(&ident, stdin) != EOF) {
 		const char *v;
 		if (!skip_prefix(ident.buf, match[0], &v) &&
@@ -158,27 +168,36 @@ void shortlog_add_commit(struct shortlog *log, struct commit *commit)
 	struct strbuf ident = STRBUF_INIT;
 	struct strbuf oneline = STRBUF_INIT;
 	struct pretty_print_context ctx = {0};
-	const char *fmt;
+	const char *oneline_str;
 
 	ctx.fmt = CMIT_FMT_USERFORMAT;
 	ctx.abbrev = log->abbrev;
 	ctx.print_email_subject = 1;
 	ctx.date_mode.type = DATE_NORMAL;
 	ctx.output_encoding = get_log_output_encoding();
 
-	fmt = log->committer ?
-		(log->email ? "%cN <%cE>" : "%cN") :
-		(log->email ? "%aN <%aE>" : "%aN");
-
-	format_commit_message(commit, fmt, &ident, &ctx);
 	if (!log->summary) {
 		if (log->user_format)
 			pretty_print_commit(&ctx, commit, &oneline);
 		else
 			format_commit_message(commit, "%s", &oneline, &ctx);
 	}
-
-	insert_one_record(log, ident.buf, oneline.len ? oneline.buf : "<none>");
+	oneline_str = oneline.len ? oneline.buf : "<none>";
+
+	switch (log->group) {
+	case SHORTLOG_GROUP_AUTHOR:
+		format_commit_message(commit,
+				      log->email ? "%aN <%aE>" : "%aN",
+				      &ident, &ctx);
+		insert_one_record(log, ident.buf, oneline_str);
+		break;
+	case SHORTLOG_GROUP_COMMITTER:
+		format_commit_message(commit,
+				      log->email ? "%cN <%cE>" : "%cN",
+				      &ident, &ctx);
+		insert_one_record(log, ident.buf, oneline_str);
+		break;
+	}
 
 	strbuf_release(&ident);
 	strbuf_release(&oneline);
@@ -241,6 +260,21 @@ static int parse_wrap_args(const struct option *opt, const char *arg, int unset)
 	return 0;
 }
 
+static int parse_group_option(const struct option *opt, const char *arg, int unset)
+{
+	struct shortlog *log = opt->value;
+
+	if (unset || !strcasecmp(arg, "author"))
+		log->group = SHORTLOG_GROUP_AUTHOR;
+	else if (!strcasecmp(arg, "committer"))
+		log->group = SHORTLOG_GROUP_COMMITTER;
+	else
+		return error(_("unknown group type: %s"), arg);
+
+	return 0;
+}
+
+
 void shortlog_init(struct shortlog *log)
 {
 	memset(log, 0, sizeof(*log));
@@ -260,8 +294,9 @@ int cmd_shortlog(int argc, const char **argv, const char *prefix)
 	int nongit = !startup_info->have_repository;
 
 	const struct option options[] = {
-		OPT_BOOL('c', "committer", &log.committer,
-			 N_("Group by committer rather than author")),
+		OPT_SET_INT('c', "committer", &log.group,
+			    N_("Group by committer rather than author"),
+			    SHORTLOG_GROUP_COMMITTER),
 		OPT_BOOL('n', "numbered", &log.sort_by_number,
 			 N_("sort output according to the number of commits per author")),
 		OPT_BOOL('s', "summary", &log.summary,
@@ -271,6 +306,8 @@ int cmd_shortlog(int argc, const char **argv, const char *prefix)
 		OPT_CALLBACK_F('w', NULL, &log, N_("<w>[,<i1>[,<i2>]]"),
 			N_("Linewrap output"), PARSE_OPT_OPTARG,
 			&parse_wrap_args),
+		OPT_CALLBACK(0, "group", &log, N_("field"),
+			N_("Group by field"), parse_group_option),
 		OPT_END(),
 	};
 
diff --git a/shortlog.h b/shortlog.h
index 2fa61c4294..876a52158d 100644
--- a/shortlog.h
+++ b/shortlog.h
@@ -15,7 +15,11 @@ struct shortlog {
 	int in2;
 	int user_format;
 	int abbrev;
-	int committer;
+
+	enum {
+		SHORTLOG_GROUP_AUTHOR = 0,
+		SHORTLOG_GROUP_COMMITTER,
+	} group;
 
 	char *common_repo_prefix;
 	int email;
diff --git a/t/t4201-shortlog.sh b/t/t4201-shortlog.sh
index d3a7ce6bbb..65e4468746 100755
--- a/t/t4201-shortlog.sh
+++ b/t/t4201-shortlog.sh
@@ -215,4 +215,9 @@ test_expect_success 'shortlog --committer (external)' '
 	test_cmp expect actual
 '
 
+test_expect_success '--group=committer is the same as --committer' '
+	git shortlog -ns --group=committer HEAD >actual &&
+	test_cmp expect actual
+'
+
 test_done
-- 
2.28.0.1127.ga65787d918


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH v2 3/8] trailer: add interface for iterating over commit trailers
  2020-09-27  8:39 ` [PATCH v2 " Jeff King
  2020-09-27  8:39   ` [PATCH v2 1/8] shortlog: change "author" variables to "ident" Jeff King
  2020-09-27  8:39   ` [PATCH v2 2/8] shortlog: add grouping option Jeff King
@ 2020-09-27  8:40   ` Jeff King
  2020-09-27  8:40   ` [PATCH v2 4/8] shortlog: match commit trailers with --group Jeff King
                     ` (5 subsequent siblings)
  8 siblings, 0 replies; 44+ messages in thread
From: Jeff King @ 2020-09-27  8:40 UTC (permalink / raw)
  To: git; +Cc: Eric Sunshine, Martin Ågren

The trailer code knows how to parse out the trailers and re-format them,
but there's no easy way to iterate over the trailers (you can use
trailer_info, but you have to then do a bunch of extra parsing).

Let's add an iteration interface that makes this easy to do.

Signed-off-by: Jeff King <peff@peff.net>
---
 trailer.c | 36 ++++++++++++++++++++++++++++++++++++
 trailer.h | 45 +++++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 79 insertions(+), 2 deletions(-)

diff --git a/trailer.c b/trailer.c
index 68dabc2556..3f7391d793 100644
--- a/trailer.c
+++ b/trailer.c
@@ -1183,3 +1183,39 @@ void format_trailers_from_commit(struct strbuf *out, const char *msg,
 	format_trailer_info(out, &info, opts);
 	trailer_info_release(&info);
 }
+
+void trailer_iterator_init(struct trailer_iterator *iter, const char *msg)
+{
+	struct process_trailer_options opts = PROCESS_TRAILER_OPTIONS_INIT;
+	strbuf_init(&iter->key, 0);
+	strbuf_init(&iter->val, 0);
+	opts.no_divider = 1;
+	trailer_info_get(&iter->info, msg, &opts);
+	iter->cur = 0;
+}
+
+int trailer_iterator_advance(struct trailer_iterator *iter)
+{
+	while (iter->cur < iter->info.trailer_nr) {
+		char *trailer = iter->info.trailers[iter->cur++];
+		int separator_pos = find_separator(trailer, separators);
+
+		if (separator_pos < 1)
+			continue; /* not a real trailer */
+
+		strbuf_reset(&iter->key);
+		strbuf_reset(&iter->val);
+		parse_trailer(&iter->key, &iter->val, NULL,
+			      trailer, separator_pos);
+		unfold_value(&iter->val);
+		return 1;
+	}
+	return 0;
+}
+
+void trailer_iterator_release(struct trailer_iterator *iter)
+{
+	trailer_info_release(&iter->info);
+	strbuf_release(&iter->val);
+	strbuf_release(&iter->key);
+}
diff --git a/trailer.h b/trailer.h
index 203acf4ee1..cd93e7ddea 100644
--- a/trailer.h
+++ b/trailer.h
@@ -2,8 +2,7 @@
 #define TRAILER_H
 
 #include "list.h"
-
-struct strbuf;
+#include "strbuf.h"
 
 enum trailer_where {
 	WHERE_DEFAULT,
@@ -103,4 +102,46 @@ void trailer_info_release(struct trailer_info *info);
 void format_trailers_from_commit(struct strbuf *out, const char *msg,
 				 const struct process_trailer_options *opts);
 
+/*
+ * An interface for iterating over the trailers found in a particular commit
+ * message. Use like:
+ *
+ *   struct trailer_iterator iter;
+ *   trailer_iterator_init(&iter, msg);
+ *   while (trailer_iterator_advance(&iter))
+ *      ... do something with iter.key and iter.val ...
+ *   trailer_iterator_release(&iter);
+ */
+struct trailer_iterator {
+	struct strbuf key;
+	struct strbuf val;
+
+	/* private */
+	struct trailer_info info;
+	size_t cur;
+};
+
+/*
+ * Initialize "iter" in preparation for walking over the trailers in the commit
+ * message "msg". The "msg" pointer must remain valid until the iterator is
+ * released.
+ *
+ * After initializing, note that key/val will not yet point to any trailer.
+ * Call advance() to parse the first one (if any).
+ */
+void trailer_iterator_init(struct trailer_iterator *iter, const char *msg);
+
+/*
+ * Advance to the next trailer of the iterator. Returns 0 if there is no such
+ * trailer, and 1 otherwise. The key and value of the trailer can be
+ * fetched from the iter->key and iter->value fields (which are valid
+ * only until the next advance).
+ */
+int trailer_iterator_advance(struct trailer_iterator *iter);
+
+/*
+ * Release all resources associated with the trailer iteration.
+ */
+void trailer_iterator_release(struct trailer_iterator *iter);
+
 #endif /* TRAILER_H */
-- 
2.28.0.1127.ga65787d918


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH v2 4/8] shortlog: match commit trailers with --group
  2020-09-27  8:39 ` [PATCH v2 " Jeff King
                     ` (2 preceding siblings ...)
  2020-09-27  8:40   ` [PATCH v2 3/8] trailer: add interface for iterating over commit trailers Jeff King
@ 2020-09-27  8:40   ` Jeff King
  2020-09-27 19:51     ` Junio C Hamano
  2020-09-27  8:40   ` [PATCH v2 5/8] shortlog: de-duplicate trailer values Jeff King
                     ` (4 subsequent siblings)
  8 siblings, 1 reply; 44+ messages in thread
From: Jeff King @ 2020-09-27  8:40 UTC (permalink / raw)
  To: git; +Cc: Eric Sunshine, Martin Ågren

If a project uses commit trailers, this patch lets you use
shortlog to see who is performing each action. For example,
running:

  git shortlog -ns --group=trailer:reviewed-by

in git.git shows who has reviewed. You can even use a custom
format to see things like who has helped whom:

  git shortlog --format="...helped %an (%ad)" \
               --group=trailer:helped-by

Signed-off-by: Jeff King <peff@peff.net>
---
 Documentation/git-shortlog.txt | 13 ++++++++++
 builtin/shortlog.c             | 44 +++++++++++++++++++++++++++++++++-
 shortlog.h                     |  2 ++
 t/t4201-shortlog.sh            | 14 +++++++++++
 4 files changed, 72 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-shortlog.txt b/Documentation/git-shortlog.txt
index 6496d313c1..edd6cda58a 100644
--- a/Documentation/git-shortlog.txt
+++ b/Documentation/git-shortlog.txt
@@ -53,6 +53,19 @@ OPTIONS
 +
  - `author`, commits are grouped by author
  - `committer`, commits are grouped by committer (the same as `-c`)
+ - `trailer:<field>`, the `<field>` is interpreted as a case-insensitive
+   commit message trailer (see linkgit:git-interpret-trailers[1]). For
+   example, if your project uses `Reviewed-by` trailers, you might want
+   to see who has been reviewing with
+   `git shortlog -ns --group=trailer:reviewed-by`.
++
+Note that commits that do not include the trailer will not be counted.
+Likewise, commits with multiple trailers (e.g., multiple signoffs) may
+be counted more than once.
++
+The contents of each trailer value are taken literally and completely.
+No mailmap is applied, and the `-e` option has no effect (if the trailer
+contains a username and email, they are both always shown).
 
 -c::
 --committer::
diff --git a/builtin/shortlog.c b/builtin/shortlog.c
index 880ce19304..e1d9ee909f 100644
--- a/builtin/shortlog.c
+++ b/builtin/shortlog.c
@@ -9,6 +9,7 @@
 #include "mailmap.h"
 #include "shortlog.h"
 #include "parse-options.h"
+#include "trailer.h"
 
 static char const * const shortlog_usage[] = {
 	N_("git shortlog [<options>] [<revision-range>] [[--] <path>...]"),
@@ -136,6 +137,8 @@ static void read_from_stdin(struct shortlog *log)
 	case SHORTLOG_GROUP_COMMITTER:
 		match = committer_match;
 		break;
+	case SHORTLOG_GROUP_TRAILER:
+		die(_("using --group=trailer with stdin is not supported"));
 	default:
 		BUG("unhandled shortlog group");
 	}
@@ -163,6 +166,37 @@ static void read_from_stdin(struct shortlog *log)
 	strbuf_release(&oneline);
 }
 
+static void insert_records_from_trailers(struct shortlog *log,
+					 struct commit *commit,
+					 struct pretty_print_context *ctx,
+					 const char *oneline)
+{
+	struct trailer_iterator iter;
+	const char *commit_buffer, *body;
+
+	/*
+	 * Using format_commit_message("%B") would be simpler here, but
+	 * this saves us copying the message.
+	 */
+	commit_buffer = logmsg_reencode(commit, NULL, ctx->output_encoding);
+	body = strstr(commit_buffer, "\n\n");
+	if (!body)
+		return;
+
+	trailer_iterator_init(&iter, body);
+	while (trailer_iterator_advance(&iter)) {
+		const char *value = iter.val.buf;
+
+		if (strcasecmp(iter.key.buf, log->trailer))
+			continue;
+
+		insert_one_record(log, value, oneline);
+	}
+	trailer_iterator_release(&iter);
+
+	unuse_commit_buffer(commit, commit_buffer);
+}
+
 void shortlog_add_commit(struct shortlog *log, struct commit *commit)
 {
 	struct strbuf ident = STRBUF_INIT;
@@ -197,6 +231,9 @@ void shortlog_add_commit(struct shortlog *log, struct commit *commit)
 				      &ident, &ctx);
 		insert_one_record(log, ident.buf, oneline_str);
 		break;
+	case SHORTLOG_GROUP_TRAILER:
+		insert_records_from_trailers(log, commit, &ctx, oneline_str);
+		break;
 	}
 
 	strbuf_release(&ident);
@@ -263,12 +300,17 @@ static int parse_wrap_args(const struct option *opt, const char *arg, int unset)
 static int parse_group_option(const struct option *opt, const char *arg, int unset)
 {
 	struct shortlog *log = opt->value;
+	const char *field;
 
 	if (unset || !strcasecmp(arg, "author"))
 		log->group = SHORTLOG_GROUP_AUTHOR;
 	else if (!strcasecmp(arg, "committer"))
 		log->group = SHORTLOG_GROUP_COMMITTER;
-	else
+	else if (skip_prefix(arg, "trailer:", &field)) {
+		log->group = SHORTLOG_GROUP_TRAILER;
+		free(log->trailer);
+		log->trailer = xstrdup(field);
+	} else
 		return error(_("unknown group type: %s"), arg);
 
 	return 0;
diff --git a/shortlog.h b/shortlog.h
index 876a52158d..89c2dbc5e6 100644
--- a/shortlog.h
+++ b/shortlog.h
@@ -19,7 +19,9 @@ struct shortlog {
 	enum {
 		SHORTLOG_GROUP_AUTHOR = 0,
 		SHORTLOG_GROUP_COMMITTER,
+		SHORTLOG_GROUP_TRAILER,
 	} group;
+	char *trailer;
 
 	char *common_repo_prefix;
 	int email;
diff --git a/t/t4201-shortlog.sh b/t/t4201-shortlog.sh
index 65e4468746..e97d891a71 100755
--- a/t/t4201-shortlog.sh
+++ b/t/t4201-shortlog.sh
@@ -220,4 +220,18 @@ test_expect_success '--group=committer is the same as --committer' '
 	test_cmp expect actual
 '
 
+test_expect_success 'shortlog --group=trailer:signed-off-by' '
+	git commit --allow-empty -m foo -s &&
+	GIT_COMMITTER_NAME="SOB One" \
+	GIT_COMMITTER_EMAIL=sob@example.com \
+		git commit --allow-empty -m foo -s &&
+	git commit --allow-empty --amend --no-edit -s &&
+	cat >expect <<-\EOF &&
+	     2	C O Mitter <committer@example.com>
+	     1	SOB One <sob@example.com>
+	EOF
+	git shortlog -ns --group=trailer:signed-off-by HEAD >actual &&
+	test_cmp expect actual
+'
+
 test_done
-- 
2.28.0.1127.ga65787d918


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH v2 5/8] shortlog: de-duplicate trailer values
  2020-09-27  8:39 ` [PATCH v2 " Jeff King
                     ` (3 preceding siblings ...)
  2020-09-27  8:40   ` [PATCH v2 4/8] shortlog: match commit trailers with --group Jeff King
@ 2020-09-27  8:40   ` Jeff King
  2020-09-27 20:23     ` Junio C Hamano
  2020-09-27  8:40   ` [PATCH v2 6/8] shortlog: rename parse_stdin_ident() Jeff King
                     ` (3 subsequent siblings)
  8 siblings, 1 reply; 44+ messages in thread
From: Jeff King @ 2020-09-27  8:40 UTC (permalink / raw)
  To: git; +Cc: Eric Sunshine, Martin Ågren

The current documentation is vague about what happens with
--group=trailer:signed-off-by when we see a commit with:

  Signed-off-by: One
  Signed-off-by: Two
  Signed-off-by: One

We clearly should credit both "One" and "Two", but should "One" get
credited twice? The current code does so, but mostly because that was
the easiest thing to do. It's probably more useful to count each commit
at most once. This will become especially important when we allow
values from multiple sources in a future patch.

Signed-off-by: Jeff King <peff@peff.net>
---
 Documentation/git-shortlog.txt |  3 +-
 builtin/shortlog.c             | 58 ++++++++++++++++++++++++++++++++++
 t/t4201-shortlog.sh            | 28 ++++++++++++++++
 3 files changed, 88 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-shortlog.txt b/Documentation/git-shortlog.txt
index edd6cda58a..9e94613e13 100644
--- a/Documentation/git-shortlog.txt
+++ b/Documentation/git-shortlog.txt
@@ -61,7 +61,8 @@ OPTIONS
 +
 Note that commits that do not include the trailer will not be counted.
 Likewise, commits with multiple trailers (e.g., multiple signoffs) may
-be counted more than once.
+be counted more than once (but only once per unique trailer value in
+that commit).
 +
 The contents of each trailer value are taken literally and completely.
 No mailmap is applied, and the `-e` option has no effect (if the trailer
diff --git a/builtin/shortlog.c b/builtin/shortlog.c
index e1d9ee909f..d2d8103dd3 100644
--- a/builtin/shortlog.c
+++ b/builtin/shortlog.c
@@ -166,13 +166,68 @@ static void read_from_stdin(struct shortlog *log)
 	strbuf_release(&oneline);
 }
 
+struct strset_item {
+	struct hashmap_entry ent;
+	char value[FLEX_ARRAY];
+};
+
+struct strset {
+	struct hashmap map;
+};
+
+#define STRSET_INIT { { NULL } }
+
+static int strset_item_hashcmp(const void *hash_data,
+			       const struct hashmap_entry *entry,
+			       const struct hashmap_entry *entry_or_key,
+			       const void *keydata)
+{
+	const struct strset_item *a, *b;
+
+	a = container_of(entry, const struct strset_item, ent);
+	if (keydata)
+		return strcmp(a->value, keydata);
+
+	b = container_of(entry_or_key, const struct strset_item, ent);
+	return strcmp(a->value, b->value);
+}
+
+/*
+ * Adds "str" to the set if it was not already present; returns true if it was
+ * already there.
+ */
+static int strset_check_and_add(struct strset *ss, const char *str)
+{
+	unsigned int hash = strhash(str);
+	struct strset_item *item;
+
+	if (!ss->map.table)
+		hashmap_init(&ss->map, strset_item_hashcmp, NULL, 0);
+
+	if (hashmap_get_from_hash(&ss->map, hash, str))
+		return 1;
+
+	FLEX_ALLOC_STR(item, value, str);
+	hashmap_entry_init(&item->ent, hash);
+	hashmap_add(&ss->map, &item->ent);
+	return 0;
+}
+
+static void strset_clear(struct strset *ss)
+{
+	if (!ss->map.table)
+		return;
+	hashmap_free_entries(&ss->map, struct strset_item, ent);
+}
+
 static void insert_records_from_trailers(struct shortlog *log,
 					 struct commit *commit,
 					 struct pretty_print_context *ctx,
 					 const char *oneline)
 {
 	struct trailer_iterator iter;
 	const char *commit_buffer, *body;
+	struct strset dups = STRSET_INIT;
 
 	/*
 	 * Using format_commit_message("%B") would be simpler here, but
@@ -190,10 +245,13 @@ static void insert_records_from_trailers(struct shortlog *log,
 		if (strcasecmp(iter.key.buf, log->trailer))
 			continue;
 
+		if (strset_check_and_add(&dups, value))
+			continue;
 		insert_one_record(log, value, oneline);
 	}
 	trailer_iterator_release(&iter);
 
+	strset_clear(&dups);
 	unuse_commit_buffer(commit, commit_buffer);
 }
 
diff --git a/t/t4201-shortlog.sh b/t/t4201-shortlog.sh
index e97d891a71..83dbbc44e8 100755
--- a/t/t4201-shortlog.sh
+++ b/t/t4201-shortlog.sh
@@ -234,4 +234,32 @@ test_expect_success 'shortlog --group=trailer:signed-off-by' '
 	test_cmp expect actual
 '
 
+test_expect_success 'shortlog de-duplicates trailers in a single commit' '
+	git commit --allow-empty -F - <<-\EOF &&
+	subject one
+
+	this message has two distinct values, plus a repeat
+
+	Repeated-trailer: Foo
+	Repeated-trailer: Bar
+	Repeated-trailer: Foo
+	EOF
+
+	git commit --allow-empty -F - <<-\EOF &&
+	subject two
+
+	similar to the previous, but without the second distinct value
+
+	Repeated-trailer: Foo
+	Repeated-trailer: Foo
+	EOF
+
+	cat >expect <<-\EOF &&
+	     2	Foo
+	     1	Bar
+	EOF
+	git shortlog -ns --group=trailer:repeated-trailer -2 HEAD >actual &&
+	test_cmp expect actual
+'
+
 test_done
-- 
2.28.0.1127.ga65787d918


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH v2 6/8] shortlog: rename parse_stdin_ident()
  2020-09-27  8:39 ` [PATCH v2 " Jeff King
                     ` (4 preceding siblings ...)
  2020-09-27  8:40   ` [PATCH v2 5/8] shortlog: de-duplicate trailer values Jeff King
@ 2020-09-27  8:40   ` Jeff King
  2020-09-27  8:40   ` [PATCH v2 7/8] shortlog: parse trailer idents Jeff King
                     ` (2 subsequent siblings)
  8 siblings, 0 replies; 44+ messages in thread
From: Jeff King @ 2020-09-27  8:40 UTC (permalink / raw)
  To: git; +Cc: Eric Sunshine, Martin Ågren

This function is actually useful for parsing any identity, whether from
stdin or not. We'll need it for handling trailers.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/shortlog.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/builtin/shortlog.c b/builtin/shortlog.c
index d2d8103dd3..e6f4faec7c 100644
--- a/builtin/shortlog.c
+++ b/builtin/shortlog.c
@@ -98,8 +98,8 @@ static void insert_one_record(struct shortlog *log,
 	}
 }
 
-static int parse_stdin_ident(struct shortlog *log,
-			     struct strbuf *out, const char *in)
+static int parse_ident(struct shortlog *log,
+		       struct strbuf *out, const char *in)
 {
 	const char *mailbuf, *namebuf;
 	size_t namelen, maillen;
@@ -156,7 +156,7 @@ static void read_from_stdin(struct shortlog *log)
 			; /* discard blanks */
 
 		strbuf_reset(&mapped_ident);
-		if (parse_stdin_ident(log, &mapped_ident, v) < 0)
+		if (parse_ident(log, &mapped_ident, v) < 0)
 			continue;
 
 		insert_one_record(log, mapped_ident.buf, oneline.buf);
-- 
2.28.0.1127.ga65787d918


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH v2 7/8] shortlog: parse trailer idents
  2020-09-27  8:39 ` [PATCH v2 " Jeff King
                     ` (5 preceding siblings ...)
  2020-09-27  8:40   ` [PATCH v2 6/8] shortlog: rename parse_stdin_ident() Jeff King
@ 2020-09-27  8:40   ` Jeff King
  2020-09-27 20:49     ` Junio C Hamano
  2020-09-27  8:40   ` [PATCH v2 8/8] shortlog: allow multiple groups to be specified Jeff King
  2020-09-27 14:38   ` [PATCH v2 0/8] parsing trailers with shortlog Martin Ågren
  8 siblings, 1 reply; 44+ messages in thread
From: Jeff King @ 2020-09-27  8:40 UTC (permalink / raw)
  To: git; +Cc: Eric Sunshine, Martin Ågren

Trailers don't necessarily contain name/email identity values, so
shortlog has so far treated them as opaque strings. However, since many
trailers do contain identities, it's useful to treat them as such when
they can be parsed. That lets "-e" work as usual, as well as mailmap.

When they can't be parsed, we'll continue with the old behavior of
treating them as a single string (there's no new test for that here,
since the existing tests cover a trailer like this).

Signed-off-by: Jeff King <peff@peff.net>
---
 Documentation/git-shortlog.txt |  7 ++++---
 builtin/shortlog.c             |  6 ++++++
 t/t4201-shortlog.sh            | 20 ++++++++++++++++++++
 3 files changed, 30 insertions(+), 3 deletions(-)

diff --git a/Documentation/git-shortlog.txt b/Documentation/git-shortlog.txt
index 9e94613e13..3db0db2da0 100644
--- a/Documentation/git-shortlog.txt
+++ b/Documentation/git-shortlog.txt
@@ -64,9 +64,10 @@ Likewise, commits with multiple trailers (e.g., multiple signoffs) may
 be counted more than once (but only once per unique trailer value in
 that commit).
 +
-The contents of each trailer value are taken literally and completely.
-No mailmap is applied, and the `-e` option has no effect (if the trailer
-contains a username and email, they are both always shown).
+Shortlog will attempt to parse each trailer value as a `name <email>`
+identity. If successful, the mailmap is applied and the email is omitted
+unless the `--email` option is specified. If the value cannot be parsed
+as an identity, it will be taken literally and completely.
 
 -c::
 --committer::
diff --git a/builtin/shortlog.c b/builtin/shortlog.c
index e6f4faec7c..28133aec68 100644
--- a/builtin/shortlog.c
+++ b/builtin/shortlog.c
@@ -228,6 +228,7 @@ static void insert_records_from_trailers(struct shortlog *log,
 	struct trailer_iterator iter;
 	const char *commit_buffer, *body;
 	struct strset dups = STRSET_INIT;
+	struct strbuf ident = STRBUF_INIT;
 
 	/*
 	 * Using format_commit_message("%B") would be simpler here, but
@@ -245,12 +246,17 @@ static void insert_records_from_trailers(struct shortlog *log,
 		if (strcasecmp(iter.key.buf, log->trailer))
 			continue;
 
+		strbuf_reset(&ident);
+		if (!parse_ident(log, &ident, value))
+			value = ident.buf;
+
 		if (strset_check_and_add(&dups, value))
 			continue;
 		insert_one_record(log, value, oneline);
 	}
 	trailer_iterator_release(&iter);
 
+	strbuf_release(&ident);
 	strset_clear(&dups);
 	unuse_commit_buffer(commit, commit_buffer);
 }
diff --git a/t/t4201-shortlog.sh b/t/t4201-shortlog.sh
index 83dbbc44e8..a62ee9ed55 100755
--- a/t/t4201-shortlog.sh
+++ b/t/t4201-shortlog.sh
@@ -230,10 +230,30 @@ test_expect_success 'shortlog --group=trailer:signed-off-by' '
 	     2	C O Mitter <committer@example.com>
 	     1	SOB One <sob@example.com>
 	EOF
+	git shortlog -nse --group=trailer:signed-off-by HEAD >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'trailer idents are split' '
+	cat >expect <<-\EOF &&
+	     2	C O Mitter
+	     1	SOB One
+	EOF
 	git shortlog -ns --group=trailer:signed-off-by HEAD >actual &&
 	test_cmp expect actual
 '
 
+test_expect_success 'trailer idents are mailmapped' '
+	cat >expect <<-\EOF &&
+	     2	C O Mitter
+	     1	Another Name
+	EOF
+	echo "Another Name <sob@example.com>" >mail.map &&
+	git -c mailmap.file=mail.map shortlog -ns \
+		--group=trailer:signed-off-by HEAD >actual &&
+	test_cmp expect actual
+'
+
 test_expect_success 'shortlog de-duplicates trailers in a single commit' '
 	git commit --allow-empty -F - <<-\EOF &&
 	subject one
-- 
2.28.0.1127.ga65787d918


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH v2 8/8] shortlog: allow multiple groups to be specified
  2020-09-27  8:39 ` [PATCH v2 " Jeff King
                     ` (6 preceding siblings ...)
  2020-09-27  8:40   ` [PATCH v2 7/8] shortlog: parse trailer idents Jeff King
@ 2020-09-27  8:40   ` Jeff King
  2020-09-27 21:18     ` Junio C Hamano
  2020-12-28 11:29     ` Junio C Hamano
  2020-09-27 14:38   ` [PATCH v2 0/8] parsing trailers with shortlog Martin Ågren
  8 siblings, 2 replies; 44+ messages in thread
From: Jeff King @ 2020-09-27  8:40 UTC (permalink / raw)
  To: git; +Cc: Eric Sunshine, Martin Ågren

Now that shortlog supports reading from trailers, it can be useful to
combine counts from multiple trailers, or between trailers and authors.
This can be done manually by post-processing the output from multiple
runs, but it's non-trivial to make sure that each name/commit pair is
counted only once.

This patch teaches shortlog to accept multiple --group options on the
command line, and pull data from all of them. That makes it possible to
run:

  git shortlog -ns --group=author --group=trailer:co-authored-by

to get a shortlog that counts authors and co-authors equally.

The implementation is mostly straightforward. The "group" enum becomes a
bitfield, and the trailer key becomes a list. I didn't bother
implementing the multi-group semantics for reading from stdin. It would
be possible to do, but the existing matching code makes it awkward, and
I doubt anybody cares.

The duplicate suppression we used for trailers now covers authors and
committers as well (though in non-trailer single-group mode we can skip
the hash insertion and lookup, since we only see one value per commit).

There is one subtlety: we now care about the case when no group bit is
set (in which case we default to showing the author). The caller in
builtin/log.c needs to be adapted to ask explicitly for authors, rather
than relying on shortlog_init(). It would be possible with some
gymnastics to make this keep working as-is, but it's not worth it for a
single caller.

Signed-off-by: Jeff King <peff@peff.net>
---
 Documentation/git-shortlog.txt |  7 ++++
 builtin/log.c                  |  1 +
 builtin/shortlog.c             | 64 ++++++++++++++++++-----------
 shortlog.h                     | 10 ++---
 t/t4201-shortlog.sh            | 74 ++++++++++++++++++++++++++++++++++
 5 files changed, 127 insertions(+), 29 deletions(-)

diff --git a/Documentation/git-shortlog.txt b/Documentation/git-shortlog.txt
index 3db0db2da0..fd93cd41e9 100644
--- a/Documentation/git-shortlog.txt
+++ b/Documentation/git-shortlog.txt
@@ -51,6 +51,7 @@ OPTIONS
 	Group commits based on `<type>`. If no `--group` option is
 	specified, the default is `author`. `<type>` is one of:
 +
+--
  - `author`, commits are grouped by author
  - `committer`, commits are grouped by committer (the same as `-c`)
  - `trailer:<field>`, the `<field>` is interpreted as a case-insensitive
@@ -68,6 +69,12 @@ Shortlog will attempt to parse each trailer value as a `name <email>`
 identity. If successful, the mailmap is applied and the email is omitted
 unless the `--email` option is specified. If the value cannot be parsed
 as an identity, it will be taken literally and completely.
+--
++
+If `--group` is specified multiple times, commits are counted under each
+value (but again, only once per unique value in that commit). For
+example, `git shortlog --group=author --group=trailer:co-authored-by`
+counts both authors and co-authors.
 
 -c::
 --committer::
diff --git a/builtin/log.c b/builtin/log.c
index b8824d898f..7f27e9eca1 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -1195,6 +1195,7 @@ static void make_cover_letter(struct rev_info *rev, int use_stdout,
 	log.in1 = 2;
 	log.in2 = 4;
 	log.file = rev->diffopt.file;
+	log.groups = SHORTLOG_GROUP_AUTHOR;
 	for (i = 0; i < nr; i++)
 		shortlog_add_commit(&log, list[i]);
 
diff --git a/builtin/shortlog.c b/builtin/shortlog.c
index 28133aec68..0a5c4968f6 100644
--- a/builtin/shortlog.c
+++ b/builtin/shortlog.c
@@ -130,7 +130,10 @@ static void read_from_stdin(struct shortlog *log)
 	static const char *committer_match[2] = { "Commit: ", "committer " };
 	const char **match;
 
-	switch (log->group) {
+	if (HAS_MULTI_BITS(log->groups))
+		die(_("using multiple --group options with stdin is not supported"));
+
+	switch (log->groups) {
 	case SHORTLOG_GROUP_AUTHOR:
 		match = author_match;
 		break;
@@ -221,13 +224,13 @@ static void strset_clear(struct strset *ss)
 }
 
 static void insert_records_from_trailers(struct shortlog *log,
+					 struct strset *dups,
 					 struct commit *commit,
 					 struct pretty_print_context *ctx,
 					 const char *oneline)
 {
 	struct trailer_iterator iter;
 	const char *commit_buffer, *body;
-	struct strset dups = STRSET_INIT;
 	struct strbuf ident = STRBUF_INIT;
 
 	/*
@@ -243,28 +246,28 @@ static void insert_records_from_trailers(struct shortlog *log,
 	while (trailer_iterator_advance(&iter)) {
 		const char *value = iter.val.buf;
 
-		if (strcasecmp(iter.key.buf, log->trailer))
+		if (!string_list_has_string(&log->trailers, iter.key.buf))
 			continue;
 
 		strbuf_reset(&ident);
 		if (!parse_ident(log, &ident, value))
 			value = ident.buf;
 
-		if (strset_check_and_add(&dups, value))
+		if (strset_check_and_add(dups, value))
 			continue;
 		insert_one_record(log, value, oneline);
 	}
 	trailer_iterator_release(&iter);
 
 	strbuf_release(&ident);
-	strset_clear(&dups);
 	unuse_commit_buffer(commit, commit_buffer);
 }
 
 void shortlog_add_commit(struct shortlog *log, struct commit *commit)
 {
 	struct strbuf ident = STRBUF_INIT;
 	struct strbuf oneline = STRBUF_INIT;
+	struct strset dups = STRSET_INIT;
 	struct pretty_print_context ctx = {0};
 	const char *oneline_str;
 
@@ -282,24 +285,29 @@ void shortlog_add_commit(struct shortlog *log, struct commit *commit)
 	}
 	oneline_str = oneline.len ? oneline.buf : "<none>";
 
-	switch (log->group) {
-	case SHORTLOG_GROUP_AUTHOR:
+	if (log->groups & SHORTLOG_GROUP_AUTHOR) {
+		strbuf_reset(&ident);
 		format_commit_message(commit,
 				      log->email ? "%aN <%aE>" : "%aN",
 				      &ident, &ctx);
-		insert_one_record(log, ident.buf, oneline_str);
-		break;
-	case SHORTLOG_GROUP_COMMITTER:
+		if (!HAS_MULTI_BITS(log->groups) ||
+		    !strset_check_and_add(&dups, ident.buf))
+			insert_one_record(log, ident.buf, oneline_str);
+	}
+	if (log->groups & SHORTLOG_GROUP_COMMITTER) {
+		strbuf_reset(&ident);
 		format_commit_message(commit,
 				      log->email ? "%cN <%cE>" : "%cN",
 				      &ident, &ctx);
-		insert_one_record(log, ident.buf, oneline_str);
-		break;
-	case SHORTLOG_GROUP_TRAILER:
-		insert_records_from_trailers(log, commit, &ctx, oneline_str);
-		break;
+		if (!HAS_MULTI_BITS(log->groups) ||
+		    !strset_check_and_add(&dups, ident.buf))
+			insert_one_record(log, ident.buf, oneline_str);
+	}
+	if (log->groups & SHORTLOG_GROUP_TRAILER) {
+		insert_records_from_trailers(log, &dups, commit, &ctx, oneline_str);
 	}
 
+	strset_clear(&dups);
 	strbuf_release(&ident);
 	strbuf_release(&oneline);
 }
@@ -366,14 +374,16 @@ static int parse_group_option(const struct option *opt, const char *arg, int uns
 	struct shortlog *log = opt->value;
 	const char *field;
 
-	if (unset || !strcasecmp(arg, "author"))
-		log->group = SHORTLOG_GROUP_AUTHOR;
+	if (unset) {
+		log->groups = 0;
+		string_list_clear(&log->trailers, 0);
+	} else if (!strcasecmp(arg, "author"))
+		log->groups |= SHORTLOG_GROUP_AUTHOR;
 	else if (!strcasecmp(arg, "committer"))
-		log->group = SHORTLOG_GROUP_COMMITTER;
+		log->groups |= SHORTLOG_GROUP_COMMITTER;
 	else if (skip_prefix(arg, "trailer:", &field)) {
-		log->group = SHORTLOG_GROUP_TRAILER;
-		free(log->trailer);
-		log->trailer = xstrdup(field);
+		log->groups |= SHORTLOG_GROUP_TRAILER;
+		string_list_append(&log->trailers, field);
 	} else
 		return error(_("unknown group type: %s"), arg);
 
@@ -391,6 +401,8 @@ void shortlog_init(struct shortlog *log)
 	log->wrap = DEFAULT_WRAPLEN;
 	log->in1 = DEFAULT_INDENT1;
 	log->in2 = DEFAULT_INDENT2;
+	log->trailers.strdup_strings = 1;
+	log->trailers.cmp = strcasecmp;
 }
 
 int cmd_shortlog(int argc, const char **argv, const char *prefix)
@@ -400,9 +412,9 @@ int cmd_shortlog(int argc, const char **argv, const char *prefix)
 	int nongit = !startup_info->have_repository;
 
 	const struct option options[] = {
-		OPT_SET_INT('c', "committer", &log.group,
-			    N_("Group by committer rather than author"),
-			    SHORTLOG_GROUP_COMMITTER),
+		OPT_BIT('c', "committer", &log.groups,
+			N_("Group by committer rather than author"),
+			SHORTLOG_GROUP_COMMITTER),
 		OPT_BOOL('n', "numbered", &log.sort_by_number,
 			 N_("sort output according to the number of commits per author")),
 		OPT_BOOL('s', "summary", &log.summary,
@@ -454,6 +466,10 @@ int cmd_shortlog(int argc, const char **argv, const char *prefix)
 	log.abbrev = rev.abbrev;
 	log.file = rev.diffopt.file;
 
+	if (!log.groups)
+		log.groups = SHORTLOG_GROUP_AUTHOR;
+	string_list_sort(&log.trailers);
+
 	/* assume HEAD if from a tty */
 	if (!nongit && !rev.pending.nr && isatty(0))
 		add_head_to_pending(&rev);
diff --git a/shortlog.h b/shortlog.h
index 89c2dbc5e6..64be879b24 100644
--- a/shortlog.h
+++ b/shortlog.h
@@ -17,11 +17,11 @@ struct shortlog {
 	int abbrev;
 
 	enum {
-		SHORTLOG_GROUP_AUTHOR = 0,
-		SHORTLOG_GROUP_COMMITTER,
-		SHORTLOG_GROUP_TRAILER,
-	} group;
-	char *trailer;
+		SHORTLOG_GROUP_AUTHOR = (1 << 0),
+		SHORTLOG_GROUP_COMMITTER = (1 << 1),
+		SHORTLOG_GROUP_TRAILER = (1 << 2),
+	} groups;
+	struct string_list trailers;
 
 	char *common_repo_prefix;
 	int email;
diff --git a/t/t4201-shortlog.sh b/t/t4201-shortlog.sh
index a62ee9ed55..3d5c4a2086 100755
--- a/t/t4201-shortlog.sh
+++ b/t/t4201-shortlog.sh
@@ -282,4 +282,78 @@ test_expect_success 'shortlog de-duplicates trailers in a single commit' '
 	test_cmp expect actual
 '
 
+test_expect_success 'shortlog can match multiple groups' '
+	git commit --allow-empty -F - <<-\EOF &&
+	subject one
+
+	this has two trailers that are distinct from the author; it will count
+	3 times in the output
+
+	Some-trailer: User A <a@example.com>
+	Another-trailer: User B <b@example.com>
+	EOF
+
+	git commit --allow-empty -F - <<-\EOF &&
+	subject two
+
+	this one has two trailers, one of which is a duplicate with the author;
+	it will only be counted once for them
+
+	Another-trailer: A U Thor <author@example.com>
+	Some-trailer: User B <b@example.com>
+	EOF
+
+	cat >expect <<-\EOF &&
+	     2	A U Thor
+	     2	User B
+	     1	User A
+	EOF
+	git shortlog -ns \
+		--group=author \
+		--group=trailer:some-trailer \
+		--group=trailer:another-trailer \
+		-2 HEAD >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'set up option selection tests' '
+	git commit --allow-empty -F - <<-\EOF
+	subject
+
+	body
+
+	Trailer-one: value-one
+	Trailer-two: value-two
+	EOF
+'
+
+test_expect_success '--no-group resets group list to author' '
+	cat >expect <<-\EOF &&
+	     1	A U Thor
+	EOF
+	git shortlog -ns \
+		--group=committer \
+		--group=trailer:trailer-one \
+		--no-group \
+		-1 HEAD >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success '--no-group resets trailer list' '
+	cat >expect <<-\EOF &&
+	     1	value-two
+	EOF
+	git shortlog -ns \
+		--group=trailer:trailer-one \
+		--no-group \
+		--group=trailer:trailer-two \
+		-1 HEAD >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'stdin with multiple groups reports error' '
+	git log >log &&
+	test_must_fail git shortlog --group=author --group=committer <log
+'
+
 test_done
-- 
2.28.0.1127.ga65787d918

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 0/8] parsing trailers with shortlog
  2020-09-27  8:39 ` [PATCH v2 " Jeff King
                     ` (7 preceding siblings ...)
  2020-09-27  8:40   ` [PATCH v2 8/8] shortlog: allow multiple groups to be specified Jeff King
@ 2020-09-27 14:38   ` Martin Ågren
  8 siblings, 0 replies; 44+ messages in thread
From: Martin Ågren @ 2020-09-27 14:38 UTC (permalink / raw)
  To: Jeff King; +Cc: Git Mailing List, Eric Sunshine

On Sun, 27 Sep 2020 at 10:39, Jeff King <peff@peff.net> wrote:
>
> Here's a re-roll with some cosmetic fixups. All except the first address
> the points raised by Eric and Martin (thanks, both):
>
>   - put the --group=<type> unordered list into a delimited block, so
>     that the follow-on paragraph about multiple --group options isn't
>     indented with "trailer" item

Yep, that makes sense. The result looks good.

This v2 addresses my comments on v1 and, FWIW, looks good to me.

Martin

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 1/8] shortlog: change "author" variables to "ident"
  2020-09-27  8:39   ` [PATCH v2 1/8] shortlog: change "author" variables to "ident" Jeff King
@ 2020-09-27 19:18     ` Junio C Hamano
  0 siblings, 0 replies; 44+ messages in thread
From: Junio C Hamano @ 2020-09-27 19:18 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Eric Sunshine, Martin Ågren

Jeff King <peff@peff.net> writes:

> We already match "committer", and we're about to start
> matching more things. Let's use a more neutral variable to
> avoid confusion.

Looks good at least when seen alone.

It is curious how we are going to handle less structured nature of
trailers.  The author and committer header fields are required to be
in good shape, but it is perfectly fine for *-by lines to be just
human-readable strings (e.g. without <address@looking.tokens>).

We'll see.

Thanks.

> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  builtin/shortlog.c | 36 ++++++++++++++++++------------------
>  1 file changed, 18 insertions(+), 18 deletions(-)
>
> diff --git a/builtin/shortlog.c b/builtin/shortlog.c
> index c856c58bb5..edcf2e0d54 100644
> --- a/builtin/shortlog.c
> +++ b/builtin/shortlog.c
> @@ -49,12 +49,12 @@ static int compare_by_list(const void *a1, const void *a2)
>  }
>  
>  static void insert_one_record(struct shortlog *log,
> -			      const char *author,
> +			      const char *ident,
>  			      const char *oneline)
>  {
>  	struct string_list_item *item;
>  
> -	item = string_list_insert(&log->list, author);
> +	item = string_list_insert(&log->list, ident);
>  
>  	if (log->summary)
>  		item->util = (void *)(UTIL_TO_INT(item) + 1);
> @@ -97,8 +97,8 @@ static void insert_one_record(struct shortlog *log,
>  	}
>  }
>  
> -static int parse_stdin_author(struct shortlog *log,
> -			       struct strbuf *out, const char *in)
> +static int parse_stdin_ident(struct shortlog *log,
> +			     struct strbuf *out, const char *in)
>  {
>  	const char *mailbuf, *namebuf;
>  	size_t namelen, maillen;
> @@ -122,18 +122,18 @@ static int parse_stdin_author(struct shortlog *log,
>  
>  static void read_from_stdin(struct shortlog *log)
>  {
> -	struct strbuf author = STRBUF_INIT;
> -	struct strbuf mapped_author = STRBUF_INIT;
> +	struct strbuf ident = STRBUF_INIT;
> +	struct strbuf mapped_ident = STRBUF_INIT;
>  	struct strbuf oneline = STRBUF_INIT;
>  	static const char *author_match[2] = { "Author: ", "author " };
>  	static const char *committer_match[2] = { "Commit: ", "committer " };
>  	const char **match;
>  
>  	match = log->committer ? committer_match : author_match;
> -	while (strbuf_getline_lf(&author, stdin) != EOF) {
> +	while (strbuf_getline_lf(&ident, stdin) != EOF) {
>  		const char *v;
> -		if (!skip_prefix(author.buf, match[0], &v) &&
> -		    !skip_prefix(author.buf, match[1], &v))
> +		if (!skip_prefix(ident.buf, match[0], &v) &&
> +		    !skip_prefix(ident.buf, match[1], &v))
>  			continue;
>  		while (strbuf_getline_lf(&oneline, stdin) != EOF &&
>  		       oneline.len)
> @@ -142,20 +142,20 @@ static void read_from_stdin(struct shortlog *log)
>  		       !oneline.len)
>  			; /* discard blanks */
>  
> -		strbuf_reset(&mapped_author);
> -		if (parse_stdin_author(log, &mapped_author, v) < 0)
> +		strbuf_reset(&mapped_ident);
> +		if (parse_stdin_ident(log, &mapped_ident, v) < 0)
>  			continue;
>  
> -		insert_one_record(log, mapped_author.buf, oneline.buf);
> +		insert_one_record(log, mapped_ident.buf, oneline.buf);
>  	}
> -	strbuf_release(&author);
> -	strbuf_release(&mapped_author);
> +	strbuf_release(&ident);
> +	strbuf_release(&mapped_ident);
>  	strbuf_release(&oneline);
>  }
>  
>  void shortlog_add_commit(struct shortlog *log, struct commit *commit)
>  {
> -	struct strbuf author = STRBUF_INIT;
> +	struct strbuf ident = STRBUF_INIT;
>  	struct strbuf oneline = STRBUF_INIT;
>  	struct pretty_print_context ctx = {0};
>  	const char *fmt;
> @@ -170,17 +170,17 @@ void shortlog_add_commit(struct shortlog *log, struct commit *commit)
>  		(log->email ? "%cN <%cE>" : "%cN") :
>  		(log->email ? "%aN <%aE>" : "%aN");
>  
> -	format_commit_message(commit, fmt, &author, &ctx);
> +	format_commit_message(commit, fmt, &ident, &ctx);
>  	if (!log->summary) {
>  		if (log->user_format)
>  			pretty_print_commit(&ctx, commit, &oneline);
>  		else
>  			format_commit_message(commit, "%s", &oneline, &ctx);
>  	}
>  
> -	insert_one_record(log, author.buf, oneline.len ? oneline.buf : "<none>");
> +	insert_one_record(log, ident.buf, oneline.len ? oneline.buf : "<none>");
>  
> -	strbuf_release(&author);
> +	strbuf_release(&ident);
>  	strbuf_release(&oneline);
>  }

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 4/8] shortlog: match commit trailers with --group
  2020-09-27  8:40   ` [PATCH v2 4/8] shortlog: match commit trailers with --group Jeff King
@ 2020-09-27 19:51     ` Junio C Hamano
  2020-09-28  3:17       ` Jeff King
  0 siblings, 1 reply; 44+ messages in thread
From: Junio C Hamano @ 2020-09-27 19:51 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Eric Sunshine, Martin Ågren

Jeff King <peff@peff.net> writes:

> If a project uses commit trailers, this patch lets you use
> shortlog to see who is performing each action. For example,
> running:
>
>   git shortlog -ns --group=trailer:reviewed-by
>
> in git.git shows who has reviewed. You can even use a custom
> format to see things like who has helped whom:
>
>   git shortlog --format="...helped %an (%ad)" \
>                --group=trailer:helped-by

That's a cute example.

> +Note that commits that do not include the trailer will not be counted.

Understandable.

> +Likewise, commits with multiple trailers (e.g., multiple signoffs) may
> +be counted more than once.

Solicits a "is it desirable, or is it just too hard to dedupe?" response.

> +The contents of each trailer value are taken literally and completely.
> +No mailmap is applied, and the `-e` option has no effect (if the trailer
> +contains a username and email, they are both always shown).

OK.  Some users may find that not quite satisfying, though.

But I have a suspicion that the above will be refined in later
steps?  It would have been nicer to see that mentioned in the
proposed log message (e.g. "this step gives the minimum basics and
rough edges like X and Y will be refined with later patches").

Thanks.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 5/8] shortlog: de-duplicate trailer values
  2020-09-27  8:40   ` [PATCH v2 5/8] shortlog: de-duplicate trailer values Jeff King
@ 2020-09-27 20:23     ` Junio C Hamano
  2020-09-28  3:19       ` Jeff King
  0 siblings, 1 reply; 44+ messages in thread
From: Junio C Hamano @ 2020-09-27 20:23 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Eric Sunshine, Martin Ågren

Jeff King <peff@peff.net> writes:

> The current documentation is vague about what happens with
> --group=trailer:signed-off-by when we see a commit with:
>
>   Signed-off-by: One
>   Signed-off-by: Two
>   Signed-off-by: One
>
> We clearly should credit both "One" and "Two", but should "One" get
> credited twice? The current code does so, but mostly because that was
> the easiest thing to do.

I thought that "the current documentation" as of step 4/8 were quite
clear about double counting ;-).

> It's probably more useful to count each commit
> at most once. This will become especially important when we allow
> values from multiple sources in a future patch.

Yup.  Makes readers want to read further ;-)


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 7/8] shortlog: parse trailer idents
  2020-09-27  8:40   ` [PATCH v2 7/8] shortlog: parse trailer idents Jeff King
@ 2020-09-27 20:49     ` Junio C Hamano
  0 siblings, 0 replies; 44+ messages in thread
From: Junio C Hamano @ 2020-09-27 20:49 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Eric Sunshine, Martin Ågren

Jeff King <peff@peff.net> writes:

> +Shortlog will attempt to parse each trailer value as a `name <email>`
> +identity. If successful, the mailmap is applied and the email is omitted
> +unless the `--email` option is specified. If the value cannot be parsed
> +as an identity, it will be taken literally and completely.

Makes sense.

> diff --git a/builtin/shortlog.c b/builtin/shortlog.c
> index e6f4faec7c..28133aec68 100644
> --- a/builtin/shortlog.c
> +++ b/builtin/shortlog.c
> @@ -228,6 +228,7 @@ static void insert_records_from_trailers(struct shortlog *log,
>  	struct trailer_iterator iter;
>  	const char *commit_buffer, *body;
>  	struct strset dups = STRSET_INIT;
> +	struct strbuf ident = STRBUF_INIT;
>  
>  	/*
>  	 * Using format_commit_message("%B") would be simpler here, but
> @@ -245,12 +246,17 @@ static void insert_records_from_trailers(struct shortlog *log,
>  		if (strcasecmp(iter.key.buf, log->trailer))
>  			continue;
>  
> +		strbuf_reset(&ident);
> +		if (!parse_ident(log, &ident, value))
> +			value = ident.buf;

And the implementation is quite straight-forward.

>  		if (strset_check_and_add(&dups, value))
>  			continue;
>  		insert_one_record(log, value, oneline);
>  	}
>  	trailer_iterator_release(&iter);
>  
> +	strbuf_release(&ident);
>  	strset_clear(&dups);
>  	unuse_commit_buffer(commit, commit_buffer);
>  }

Looking good.  Thanks.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 8/8] shortlog: allow multiple groups to be specified
  2020-09-27  8:40   ` [PATCH v2 8/8] shortlog: allow multiple groups to be specified Jeff King
@ 2020-09-27 21:18     ` Junio C Hamano
  2020-09-28  3:25       ` Jeff King
  2020-12-28 11:29     ` Junio C Hamano
  1 sibling, 1 reply; 44+ messages in thread
From: Junio C Hamano @ 2020-09-27 21:18 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Eric Sunshine, Martin Ågren

Jeff King <peff@peff.net> writes:

> -		if (strcasecmp(iter.key.buf, log->trailer))
> +		if (!string_list_has_string(&log->trailers, iter.key.buf))
>  			continue;
...
> +	if (!log.groups)
> +		log.groups = SHORTLOG_GROUP_AUTHOR;
> +	string_list_sort(&log.trailers);

I initially reacted with "oh, why sort?" to this line, before
realizing that the list is used as a look-up table, which, if I
recall correctly, you said you want to see us move off of in the
longer term.  

As we already have the string-set in this series, I am wondering
why we are not using it.  It's not like the code at some point
needs to iterate over log.trailers in some stable order, right?

I do realize that going to hashmap might be overkill, but once we
have an easy-to-use wrapper around it, between one "table of
strings" API and another "table of strings" API, I do not see a
reason why we want to choose the string_list.

Thanks.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 4/8] shortlog: match commit trailers with --group
  2020-09-27 19:51     ` Junio C Hamano
@ 2020-09-28  3:17       ` Jeff King
  2020-09-28 17:01         ` Junio C Hamano
  0 siblings, 1 reply; 44+ messages in thread
From: Jeff King @ 2020-09-28  3:17 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Eric Sunshine, Martin Ågren

On Sun, Sep 27, 2020 at 12:51:43PM -0700, Junio C Hamano wrote:

> > +The contents of each trailer value are taken literally and completely.
> > +No mailmap is applied, and the `-e` option has no effect (if the trailer
> > +contains a username and email, they are both always shown).
> 
> OK.  Some users may find that not quite satisfying, though.
> 
> But I have a suspicion that the above will be refined in later
> steps?  It would have been nicer to see that mentioned in the
> proposed log message (e.g. "this step gives the minimum basics and
> rough edges like X and Y will be refined with later patches").

I wondered if this might confuse people reading the series, and almost
called attention to it in the cover letter. Now that you've presumably
read through and figured it out, is it worth going back and amending the
commit message? It's more of a point for reviewers, I think, but perhaps
somebody reading the commits later would care.

-Peff

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 5/8] shortlog: de-duplicate trailer values
  2020-09-27 20:23     ` Junio C Hamano
@ 2020-09-28  3:19       ` Jeff King
  0 siblings, 0 replies; 44+ messages in thread
From: Jeff King @ 2020-09-28  3:19 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Eric Sunshine, Martin Ågren

On Sun, Sep 27, 2020 at 01:23:25PM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > The current documentation is vague about what happens with
> > --group=trailer:signed-off-by when we see a commit with:
> >
> >   Signed-off-by: One
> >   Signed-off-by: Two
> >   Signed-off-by: One
> >
> > We clearly should credit both "One" and "Two", but should "One" get
> > credited twice? The current code does so, but mostly because that was
> > the easiest thing to do.
> 
> I thought that "the current documentation" as of step 4/8 were quite
> clear about double counting ;-).

It was clear to me that the commit would be counted at least twice: once
for the "one" bucket and once for the "two" bucket, but not that it
would count twice for the "two" bucket. Or at least, I didn't think of
this issue at all back when I wrote the original patch long ago, and
only noticed the problem when I revisited it. :)

Anyway, I think the semantics at the end of the series are quite
sensible.

-Peff

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 8/8] shortlog: allow multiple groups to be specified
  2020-09-27 21:18     ` Junio C Hamano
@ 2020-09-28  3:25       ` Jeff King
  0 siblings, 0 replies; 44+ messages in thread
From: Jeff King @ 2020-09-28  3:25 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Eric Sunshine, Martin Ågren

On Sun, Sep 27, 2020 at 02:18:10PM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > -		if (strcasecmp(iter.key.buf, log->trailer))
> > +		if (!string_list_has_string(&log->trailers, iter.key.buf))
> >  			continue;
> ...
> > +	if (!log.groups)
> > +		log.groups = SHORTLOG_GROUP_AUTHOR;
> > +	string_list_sort(&log.trailers);
> 
> I initially reacted with "oh, why sort?" to this line, before
> realizing that the list is used as a look-up table, which, if I
> recall correctly, you said you want to see us move off of in the
> longer term.
> 
> As we already have the string-set in this series, I am wondering
> why we are not using it.  It's not like the code at some point
> needs to iterate over log.trailers in some stable order, right?

Right, it's purely a lookup, and the new strset or any hashmap would
work.  However, the trailers list is part of the public shortlog.h[1]
interface, so we'd have to move strset to be a public thing, too. I
think that may be where we want to go eventually, but I thought I'd
trial it as something strictly internal to shortlog.c for now.

I'm also not _too_ bothered by this particular use of string-list as a
lookup table. It's one of the "easy" cases: we build up the list in one
phase, and then do lookups in the other, so sorting in between is no big
deal (though I do think a strset is even easier, as you do not have to
remember to sort between the phases).

-Peff

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 4/8] shortlog: match commit trailers with --group
  2020-09-28  3:17       ` Jeff King
@ 2020-09-28 17:01         ` Junio C Hamano
  0 siblings, 0 replies; 44+ messages in thread
From: Junio C Hamano @ 2020-09-28 17:01 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Eric Sunshine, Martin Ågren

Jeff King <peff@peff.net> writes:

> I wondered if this might confuse people reading the series, and almost
> called attention to it in the cover letter. Now that you've presumably
> read through and figured it out, is it worth going back and amending the
> commit message? It's more of a point for reviewers, I think, but perhaps
> somebody reading the commits later would care.

When I read history older than the immediate past, I usuall start at
the point 'git blame' found and then go back---going forward is much
less natural operation.  If my "git blame" session happens to hit an
earlier part of this series, perhaps the future-me would get a wrong
impression that originally the feature was in a limited form and was
extended later in a separate series?  I dunno.

I did wish the log message said what was going on while reading, but
that is a perfectionist in me speaking, and not a complaint that log
messages that did not say the future plans in the series were
offense that deserves a rejection.  A good summary in "What's cooking"
report would become a merge commit log message, and we can teach
users to go from 'git blame', find where the entire series was merged
to the mainline and then read from there---and it would be sufficient
if the summary of the topic described the endgame well.



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 8/8] shortlog: allow multiple groups to be specified
  2020-09-27  8:40   ` [PATCH v2 8/8] shortlog: allow multiple groups to be specified Jeff King
  2020-09-27 21:18     ` Junio C Hamano
@ 2020-12-28 11:29     ` Junio C Hamano
  2021-02-04  6:44       ` Junio C Hamano
  1 sibling, 1 reply; 44+ messages in thread
From: Junio C Hamano @ 2020-12-28 11:29 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Eric Sunshine, Martin Ågren

Jeff King <peff@peff.net> writes:

> Now that shortlog supports reading from trailers, it can be useful to
> combine counts from multiple trailers, or between trailers and authors.
> This can be done manually by post-processing the output from multiple
> runs, but it's non-trivial to make sure that each name/commit pair is
> counted only once.
>
> This patch teaches shortlog to accept multiple --group options on the
> command line, and pull data from all of them. That makes it possible to
> run:
>
>   git shortlog -ns --group=author --group=trailer:co-authored-by
>
> to get a shortlog that counts authors and co-authors equally.

As I recently had a chance to revisit an old thread back in v2.3.3
days, I recalled that I wished something like this was available,

  https://lore.kernel.org/git/xmqqd24g6uf1.fsf@gitster.dls.corp.google.com/

where I wished to have a way to count non-author contributions
easily to list them.

I am not comfortable with the idea of changing the release
announcement script immediately before the new release, so I'll do
v2.30.0 with the old "authorship only" script, but I'll play with
the new shortlog feature plus ideas from the old thread to see if we
can give more credit to non author contributors.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 8/8] shortlog: allow multiple groups to be specified
  2020-12-28 11:29     ` Junio C Hamano
@ 2021-02-04  6:44       ` Junio C Hamano
  0 siblings, 0 replies; 44+ messages in thread
From: Junio C Hamano @ 2021-02-04  6:44 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Eric Sunshine, Martin Ågren

Junio C Hamano <gitster@pobox.com> writes:

> Jeff King <peff@peff.net> writes:
>
>> Now that shortlog supports reading from trailers, it can be useful to
>> combine counts from multiple trailers, or between trailers and authors.
>> This can be done manually by post-processing the output from multiple
>> runs, but it's non-trivial to make sure that each name/commit pair is
>> counted only once.
>>
>> This patch teaches shortlog to accept multiple --group options on the
>> command line, and pull data from all of them. That makes it possible to
>> run:
>>
>>   git shortlog -ns --group=author --group=trailer:co-authored-by
>>
>> to get a shortlog that counts authors and co-authors equally.
>
> As I recently had a chance to revisit an old thread back in v2.3.3
> days, I recalled that I wished something like this was available,
>
>   https://lore.kernel.org/git/xmqqd24g6uf1.fsf@gitster.dls.corp.google.com/
>
> where I wished to have a way to count non-author contributions
> easily to list them.
>
> I am not comfortable with the idea of changing the release
> announcement script immediately before the new release, so I'll do
> v2.30.0 with the old "authorship only" script, but I'll play with
> the new shortlog feature plus ideas from the old thread to see if we
> can give more credit to non author contributors.

But now it is a good time to do so, as it is not immediately before
a new release.

Here is what I am planning to use for the next release due in mid
March.  We list those who had contribution in the part of the
history before the last release and in the part of the history since
the last release, and run "comm" on these two sets to derive "new
folks" vs "continued supporters".  The list of contributors were
only counting the commit authors in the previous announcement, but
with the "shortlog --group" feature, we can give folks credit for
all kinds of *-by roles.

diff --git a/Announce b/Announce
index 491516b..d6ec20c 100755
--- a/Announce
+++ b/Announce
@@ -44,8 +44,22 @@ esac
 
 vername=$(echo "$vername" | tr "-" ".")
 
-git log --use-mailmap --format='%aN,' "$previous" | sort -u >"$tmpbase.prev"
-git log --use-mailmap --format='%aN,' "$previous..$commit" | sort -u >"$tmpbase.this"
+people () {
+	git shortlog -s --no-merges \
+		--group=author \
+		--group=trailer:co-authored-by \
+		--group=trailer:reviewed-by \
+		--group=trailer:mentored-by \
+		--group=trailer:helped-by \
+		--group=trailer:reported-by \
+		"$@" |
+	sed -e 's/^[ 	0-9]*[ 	]//' -e 's/$/,/' |
+	sort -u
+}
+
+people "$previous" >"$tmpbase.prev"
+people "$previous..$commit" >"$tmpbase.this"
+
 comm -12 "$tmpbase.prev" "$tmpbase.this" >"$tmpbase.old"
 comm -13 "$tmpbase.prev" "$tmpbase.this" >"$tmpbase.new"
 


^ permalink raw reply	[flat|nested] 44+ messages in thread

end of thread, other threads:[~2021-02-04  6:46 UTC | newest]

Thread overview: 44+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-25  7:01 [PATCH 0/8] parsing trailers with shortlog Jeff King
2020-09-25  7:01 ` [PATCH 1/8] shortlog: change "author" variables to "ident" Jeff King
2020-09-25  7:02 ` [PATCH 2/8] shortlog: refactor committer/author grouping Jeff King
2020-09-25 20:05   ` Eric Sunshine
2020-09-27  8:03     ` Jeff King
2020-09-27  8:08       ` Jeff King
2020-09-27  8:23         ` Eric Sunshine
2020-09-26 12:31   ` Martin Ågren
2020-09-27  7:59     ` Jeff King
2020-09-25  7:02 ` [PATCH 3/8] trailer: add interface for iterating over commit trailers Jeff King
2020-09-26 12:39   ` Martin Ågren
2020-09-27  8:20     ` Jeff King
2020-09-25  7:03 ` [PATCH 4/8] shortlog: match commit trailers with --group Jeff King
2020-09-25  7:05 ` [PATCH 5/8] shortlog: de-duplicate trailer values Jeff King
2020-09-25  7:05 ` [PATCH 6/8] shortlog: rename parse_stdin_ident() Jeff King
2020-09-25  7:05 ` [PATCH 7/8] shortlog: parse trailer idents Jeff King
2020-09-25  7:05 ` [PATCH 8/8] shortlog: allow multiple groups to be specified Jeff King
2020-09-25 20:23   ` Eric Sunshine
2020-09-27  8:06     ` Jeff King
2020-09-26 12:48   ` Martin Ågren
2020-09-27  8:25     ` Jeff King
2020-09-25 14:27 ` [PATCH 0/8] parsing trailers with shortlog Derrick Stolee
2020-09-25 16:57 ` Junio C Hamano
2020-09-27  8:39 ` [PATCH v2 " Jeff King
2020-09-27  8:39   ` [PATCH v2 1/8] shortlog: change "author" variables to "ident" Jeff King
2020-09-27 19:18     ` Junio C Hamano
2020-09-27  8:39   ` [PATCH v2 2/8] shortlog: add grouping option Jeff King
2020-09-27  8:40   ` [PATCH v2 3/8] trailer: add interface for iterating over commit trailers Jeff King
2020-09-27  8:40   ` [PATCH v2 4/8] shortlog: match commit trailers with --group Jeff King
2020-09-27 19:51     ` Junio C Hamano
2020-09-28  3:17       ` Jeff King
2020-09-28 17:01         ` Junio C Hamano
2020-09-27  8:40   ` [PATCH v2 5/8] shortlog: de-duplicate trailer values Jeff King
2020-09-27 20:23     ` Junio C Hamano
2020-09-28  3:19       ` Jeff King
2020-09-27  8:40   ` [PATCH v2 6/8] shortlog: rename parse_stdin_ident() Jeff King
2020-09-27  8:40   ` [PATCH v2 7/8] shortlog: parse trailer idents Jeff King
2020-09-27 20:49     ` Junio C Hamano
2020-09-27  8:40   ` [PATCH v2 8/8] shortlog: allow multiple groups to be specified Jeff King
2020-09-27 21:18     ` Junio C Hamano
2020-09-28  3:25       ` Jeff King
2020-12-28 11:29     ` Junio C Hamano
2021-02-04  6:44       ` Junio C Hamano
2020-09-27 14:38   ` [PATCH v2 0/8] parsing trailers with shortlog Martin Ågren

git@vger.kernel.org list mirror (unofficial, one of many)

This inbox may be cloned and mirrored by anyone:

	git clone --mirror https://public-inbox.org/git
	git clone --mirror http://ou63pmih66umazou.onion/git
	git clone --mirror http://czquwvybam4bgbro.onion/git
	git clone --mirror http://hjrcffqmbrq6wope.onion/git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V1 git git/ https://public-inbox.org/git \
		git@vger.kernel.org
	public-inbox-index git

Example config snippet for mirrors.
Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.version-control.git
	nntp://ou63pmih66umazou.onion/inbox.comp.version-control.git
	nntp://czquwvybam4bgbro.onion/inbox.comp.version-control.git
	nntp://hjrcffqmbrq6wope.onion/inbox.comp.version-control.git
	nntp://news.gmane.io/gmane.comp.version-control.git
 note: .onion URLs require Tor: https://www.torproject.org/

code repositories for project(s) associated with this inbox:

	https://80x24.org/mirrors/git.git

AGPL code for this site: git clone https://public-inbox.org/public-inbox.git