From: Jeff King <peff@peff.net>
To: git@vger.kernel.org
Cc: "Eric Sunshine" <sunshine@sunshineco.com>,
	"Martin Ågren" <martin.agren@gmail.com>
Subject: [PATCH v2 5/8] shortlog: de-duplicate trailer values
Date: Sun, 27 Sep 2020 04:40:07 -0400
Message-ID: <20200927084007.GE2465761@coredump.intra.peff.net>
In-Reply-To: <20200927083933.GA2222823@coredump.intra.peff.net>

The current documentation is vague about what happens with
--group=trailer:signed-off-by when we see a commit with:

  Signed-off-by: One
  Signed-off-by: Two
  Signed-off-by: One

We clearly should credit both "One" and "Two", but should "One" get
credited twice? The current code does so, but mostly because that was
the easiest thing to do. It's probably more useful to count each commit
at most once per unique trailer value, so do that by remembering the
values already seen for a commit in a small hashmap-backed string set.
This will become especially important when we allow values from
multiple sources in a future patch.

Signed-off-by: Jeff King <peff@peff.net>
---
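A note outside the commit message: the counting rule described above can
be sketched in isolation. This is only an illustration under stated
assumptions, not the code in this patch. seen_before() and
credits_in_commit() are hypothetical helpers, and a linear scan stands
in for the hashmap-backed set added to builtin/shortlog.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if "value" occurs among the first "nr" entries of "values". */
static int seen_before(const char **values, size_t nr, const char *value)
{
	size_t i;

	for (i = 0; i < nr; i++)
		if (!strcmp(values[i], value))
			return 1;
	return 0;
}

/*
 * Hypothetical helper: how many credits does "name" receive from a single
 * commit whose matching trailer values are values[0..nr)? With per-commit
 * de-duplication the answer is always 0 or 1.
 */
static int credits_in_commit(const char **values, size_t nr, const char *name)
{
	size_t i;
	int credits = 0;

	assert(values || !nr);

	for (i = 0; i < nr; i++) {
		if (seen_before(values, i, values[i]))
			continue; /* repeat within this commit; skip it */
		if (!strcmp(values[i], name))
			credits++;
	}
	return credits;
}
```

For the two commits in the new test (values Foo, Bar, Foo and then
Foo, Foo), summing credits_in_commit() over both commits gives 2 for
"Foo" and 1 for "Bar", which is what the expected shortlog output lists.
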
 Documentation/git-shortlog.txt |  3 +-
 builtin/shortlog.c             | 58 ++++++++++++++++++++++++++++++++++
 t/t4201-shortlog.sh            | 28 ++++++++++++++++
 3 files changed, 88 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-shortlog.txt b/Documentation/git-shortlog.txt
index edd6cda58a..9e94613e13 100644
--- a/Documentation/git-shortlog.txt
+++ b/Documentation/git-shortlog.txt
@@ -61,7 +61,8 @@ OPTIONS
 +
 Note that commits that do not include the trailer will not be counted.
 Likewise, commits with multiple trailers (e.g., multiple signoffs) may
-be counted more than once.
+be counted more than once (but only once per unique trailer value in
+that commit).
 +
 The contents of each trailer value are taken literally and completely.
 No mailmap is applied, and the `-e` option has no effect (if the trailer
diff --git a/builtin/shortlog.c b/builtin/shortlog.c
index e1d9ee909f..d2d8103dd3 100644
--- a/builtin/shortlog.c
+++ b/builtin/shortlog.c
@@ -166,13 +166,68 @@ static void read_from_stdin(struct shortlog *log)
 	strbuf_release(&oneline);
 }
 
+struct strset_item {
+	struct hashmap_entry ent;
+	char value[FLEX_ARRAY];
+};
+
+struct strset {
+	struct hashmap map;
+};
+
+#define STRSET_INIT { { NULL } }
+
+static int strset_item_hashcmp(const void *hash_data,
+			       const struct hashmap_entry *entry,
+			       const struct hashmap_entry *entry_or_key,
+			       const void *keydata)
+{
+	const struct strset_item *a, *b;
+
+	a = container_of(entry, const struct strset_item, ent);
+	if (keydata)
+		return strcmp(a->value, keydata);
+
+	b = container_of(entry_or_key, const struct strset_item, ent);
+	return strcmp(a->value, b->value);
+}
+
+/*
+ * Adds "str" to the set if it was not already present; returns true if it was
+ * already there.
+ */
+static int strset_check_and_add(struct strset *ss, const char *str)
+{
+	unsigned int hash = strhash(str);
+	struct strset_item *item;
+
+	if (!ss->map.table)
+		hashmap_init(&ss->map, strset_item_hashcmp, NULL, 0);
+
+	if (hashmap_get_from_hash(&ss->map, hash, str))
+		return 1;
+
+	FLEX_ALLOC_STR(item, value, str);
+	hashmap_entry_init(&item->ent, hash);
+	hashmap_add(&ss->map, &item->ent);
+	return 0;
+}
+
+static void strset_clear(struct strset *ss)
+{
+	if (!ss->map.table)
+		return;
+	hashmap_free_entries(&ss->map, struct strset_item, ent);
+}
+
 static void insert_records_from_trailers(struct shortlog *log,
 					 struct commit *commit,
 					 struct pretty_print_context *ctx,
 					 const char *oneline)
 {
 	struct trailer_iterator iter;
 	const char *commit_buffer, *body;
+	struct strset dups = STRSET_INIT;
 
 	/*
 	 * Using format_commit_message("%B") would be simpler here, but
@@ -190,10 +245,13 @@ static void insert_records_from_trailers(struct shortlog *log,
 		if (strcasecmp(iter.key.buf, log->trailer))
 			continue;
 
+		if (strset_check_and_add(&dups, value))
+			continue;
 		insert_one_record(log, value, oneline);
 	}
 	trailer_iterator_release(&iter);
 
+	strset_clear(&dups);
 	unuse_commit_buffer(commit, commit_buffer);
 }
 
diff --git a/t/t4201-shortlog.sh b/t/t4201-shortlog.sh
index e97d891a71..83dbbc44e8 100755
--- a/t/t4201-shortlog.sh
+++ b/t/t4201-shortlog.sh
@@ -234,4 +234,32 @@ test_expect_success 'shortlog --group=trailer:signed-off-by' '
 	test_cmp expect actual
 '
 
+test_expect_success 'shortlog de-duplicates trailers in a single commit' '
+	git commit --allow-empty -F - <<-\EOF &&
+	subject one
+
+	this message has two distinct values, plus a repeat
+
+	Repeated-trailer: Foo
+	Repeated-trailer: Bar
+	Repeated-trailer: Foo
+	EOF
+
+	git commit --allow-empty -F - <<-\EOF &&
+	subject two
+
+	similar to the previous, but without the second distinct value
+
+	Repeated-trailer: Foo
+	Repeated-trailer: Foo
+	EOF
+
+	cat >expect <<-\EOF &&
+	     2	Foo
+	     1	Bar
+	EOF
+	git shortlog -ns --group=trailer:repeated-trailer -2 HEAD >actual &&
+	test_cmp expect actual
+'
+
 test_done
-- 
2.28.0.1127.ga65787d918

