git@vger.kernel.org mailing list mirror (one of many)
* [PATCH 0/3] Add support for mailmap in cat-file
@ 2022-06-30 14:24 Siddharth Asthana
  2022-06-30 14:24 ` [PATCH 1/3] ident: move commit_rewrite_person() to ident.c Siddharth Asthana
                   ` (4 more replies)
  0 siblings, 5 replies; 68+ messages in thread
From: Siddharth Asthana @ 2022-06-30 14:24 UTC (permalink / raw)
  To: git; +Cc: Siddharth Asthana

Hi Everyone!

I am working as a GSoC mentee with the GitLab organization. My project
is to add mailmap support in GitLab. GitLab uses git cat-file to get
commit and tag contents that are then displayed to the users. This
content, which has author, committer or tagger information, could
benefit from passing through the mailmap mechanism before being sent
or displayed.

So, this patch series adds mailmap support to the git cat-file command.
It does that by adding --[no-]use-mailmap command line option to the
git cat-file command. It also adds --[no-]mailmap option as an alias to
--[no-]use-mailmap.

I would like to thank my mentors, Christian Couder and John Cai, for all
of their help!
Looking forward to the reviews!

Siddharth Asthana (3):
  ident: move commit_rewrite_person() to ident.c
  ident: rename commit_rewrite_person() to rewrite_ident_line()
  cat-file: add mailmap support

 Documentation/git-cat-file.txt |  6 +++++
 builtin/cat-file.c             | 32 +++++++++++++++++++++-
 cache.h                        |  8 ++++++
 ident.c                        | 45 +++++++++++++++++++++++++++++++
 revision.c                     | 49 ++--------------------------------
 t/t4203-mailmap.sh             | 30 +++++++++++++++++++++
 6 files changed, 122 insertions(+), 48 deletions(-)


base-commit: e4a4b31577c7419497ac30cebe30d755b97752c5
-- 
2.37.0.3.g2093cce7fe.dirty



* [PATCH 1/3] ident: move commit_rewrite_person() to ident.c
  2022-06-30 14:24 [PATCH 0/3] Add support for mailmap in cat-file Siddharth Asthana
@ 2022-06-30 14:24 ` Siddharth Asthana
  2022-06-30 16:00   ` Đoàn Trần Công Danh
  2022-06-30 23:22   ` Junio C Hamano
  2022-06-30 14:24 ` [PATCH 2/3] ident: rename commit_rewrite_person() to rewrite_ident_line() Siddharth Asthana
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 68+ messages in thread
From: Siddharth Asthana @ 2022-06-30 14:24 UTC (permalink / raw)
  To: git; +Cc: Siddharth Asthana, Christian Couder, John Cai

commit_rewrite_person() is a static function defined in revision.c.
As the name suggests, this function can be used to replace author's,
committer's or tagger's name in the commit/tag object buffer.

This patch moves this function from revision.c to ident.c which contains
many other functions related to identification like split_ident_line. By
moving this function to ident.c, we intend to use it in git-cat-file to
replace committer's, author's and tagger's names and emails with their
canonical name and email using the mailmap mechanism. The function
is moved as is for now to make it clear that there are no other changes,
but will be renamed in a following commit.

Signed-off-by: Siddharth Asthana <siddharthasthana31@gmail.com>
Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: John Cai <johncai86@gmail.com>
---
 cache.h    |  8 ++++++++
 ident.c    | 45 +++++++++++++++++++++++++++++++++++++++++++++
 revision.c | 45 ---------------------------------------------
 3 files changed, 53 insertions(+), 45 deletions(-)

diff --git a/cache.h b/cache.h
index ac5ab4ef9d..442bfe5f6a 100644
--- a/cache.h
+++ b/cache.h
@@ -1688,6 +1688,14 @@ struct ident_split {
  */
 int split_ident_line(struct ident_split *, const char *, int);
 
+/*
+ * Given a commit or tag object buffer, replaces the person's
+ * (author/committer/tagger) name and email with their canonical
+ * name and email using mailmap mechanism. Signals a success with
+ * 1 and failure with a 0.
+ */
+int commit_rewrite_person(struct strbuf *buf, const char *what, struct string_list *mailmap);
+
 /*
  * Compare split idents for equality or strict ordering. Note that we
  * compare only the ident part of the line, ignoring any timestamp.
diff --git a/ident.c b/ident.c
index 89ca5b4700..8c890bd474 100644
--- a/ident.c
+++ b/ident.c
@@ -8,6 +8,7 @@
 #include "cache.h"
 #include "config.h"
 #include "date.h"
+#include "mailmap.h"
 
 static struct strbuf git_default_name = STRBUF_INIT;
 static struct strbuf git_default_email = STRBUF_INIT;
@@ -346,6 +347,50 @@ int split_ident_line(struct ident_split *split, const char *line, int len)
 	return 0;
 }
 
+int commit_rewrite_person(struct strbuf *buf, const char *what, struct string_list *mailmap)
+{
+	char *person, *endp;
+	size_t len, namelen, maillen;
+	const char *name;
+	const char *mail;
+	struct ident_split ident;
+
+	person = strstr(buf->buf, what);
+	if (!person)
+		return 0;
+
+	person += strlen(what);
+	endp = strchr(person, '\n');
+	if (!endp)
+		return 0;
+
+	len = endp - person;
+
+	if (split_ident_line(&ident, person, len))
+		return 0;
+
+	mail = ident.mail_begin;
+	maillen = ident.mail_end - ident.mail_begin;
+	name = ident.name_begin;
+	namelen = ident.name_end - ident.name_begin;
+
+	if (map_user(mailmap, &mail, &maillen, &name, &namelen)) {
+		struct strbuf namemail = STRBUF_INIT;
+
+		strbuf_addf(&namemail, "%.*s <%.*s>",
+			    (int)namelen, name, (int)maillen, mail);
+
+		strbuf_splice(buf, ident.name_begin - buf->buf,
+			      ident.mail_end - ident.name_begin + 1,
+			      namemail.buf, namemail.len);
+
+		strbuf_release(&namemail);
+
+		return 1;
+	}
+
+	return 0;
+}
 
 static void ident_env_hint(enum want_ident whose_ident)
 {
diff --git a/revision.c b/revision.c
index 211352795c..da49e73cd6 100644
--- a/revision.c
+++ b/revision.c
@@ -3755,51 +3755,6 @@ int rewrite_parents(struct rev_info *revs, struct commit *commit,
 	return 0;
 }
 
-static int commit_rewrite_person(struct strbuf *buf, const char *what, struct string_list *mailmap)
-{
-	char *person, *endp;
-	size_t len, namelen, maillen;
-	const char *name;
-	const char *mail;
-	struct ident_split ident;
-
-	person = strstr(buf->buf, what);
-	if (!person)
-		return 0;
-
-	person += strlen(what);
-	endp = strchr(person, '\n');
-	if (!endp)
-		return 0;
-
-	len = endp - person;
-
-	if (split_ident_line(&ident, person, len))
-		return 0;
-
-	mail = ident.mail_begin;
-	maillen = ident.mail_end - ident.mail_begin;
-	name = ident.name_begin;
-	namelen = ident.name_end - ident.name_begin;
-
-	if (map_user(mailmap, &mail, &maillen, &name, &namelen)) {
-		struct strbuf namemail = STRBUF_INIT;
-
-		strbuf_addf(&namemail, "%.*s <%.*s>",
-			    (int)namelen, name, (int)maillen, mail);
-
-		strbuf_splice(buf, ident.name_begin - buf->buf,
-			      ident.mail_end - ident.name_begin + 1,
-			      namemail.buf, namemail.len);
-
-		strbuf_release(&namemail);
-
-		return 1;
-	}
-
-	return 0;
-}
-
 static int commit_match(struct commit *commit, struct rev_info *opt)
 {
 	int retval;
-- 
2.37.0.3.g2093cce7fe.dirty
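
The flow of the function moved by this patch (find the header, split
the ident, look it up, splice in the canonical form) can be sketched
outside of git. This is only an illustration under stated assumptions:
toy_map_user() and toy_rewrite_person() are hypothetical names, the
one-entry mapping stands in for a real mailmap, and the sketch returns
a new copy (or NULL when nothing changes) instead of splicing a strbuf
in place as the real code does.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for a mailmap lookup: it knows exactly one
 * mapping, keyed on the email only. Returns 1 when the identity was
 * mapped and 0 otherwise.
 */
static int toy_map_user(const char **name, size_t *namelen,
			const char **mail, size_t *maillen)
{
	static const char old_mail[] = "orig@example.com";
	static const char new_name[] = "A U Thor";
	static const char new_mail[] = "author@example.com";

	if (*maillen != strlen(old_mail) ||
	    memcmp(*mail, old_mail, *maillen))
		return 0;
	*name = new_name;
	*namelen = strlen(new_name);
	*mail = new_mail;
	*maillen = strlen(new_mail);
	return 1;
}

/*
 * Search buf for the header `what` (for example "\nauthor "), and if
 * its "Name <mail>" part is mapped, return a malloc'd copy of buf with
 * that part replaced. Returns NULL when nothing was rewritten.
 */
char *toy_rewrite_person(const char *buf, const char *what)
{
	const char *person, *endp, *lt, *gt, *name, *mail, *name_end;
	size_t namelen, maillen, prefix, outlen;
	char *out;

	person = strstr(buf, what);
	if (!person)
		return NULL;
	person += strlen(what);
	endp = strchr(person, '\n');
	if (!endp)
		return NULL;

	/* Simplified split: '<' and '>' must appear on the same line. */
	lt = memchr(person, '<', endp - person);
	if (!lt)
		return NULL;
	gt = memchr(lt, '>', endp - lt);
	if (!gt)
		return NULL;

	name = person;
	name_end = lt;
	while (name_end > name && name_end[-1] == ' ')
		name_end--;
	namelen = name_end - name;
	mail = lt + 1;
	maillen = gt - mail;

	if (!toy_map_user(&name, &namelen, &mail, &maillen))
		return NULL;

	/* prefix + "Name" + " <" + "mail" + ">" + rest of the buffer */
	prefix = person - buf;
	outlen = prefix + namelen + 2 + maillen + 1 + strlen(gt + 1);
	out = malloc(outlen + 1);
	if (!out)
		return NULL;
	snprintf(out, outlen + 1, "%.*s%.*s <%.*s>%s",
		 (int)prefix, buf, (int)namelen, name,
		 (int)maillen, mail, gt + 1);
	return out;
}
```

The timestamp and timezone after the closing '>' are copied through
untouched, which matches the original code replacing only the name and
email part of the line.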



* [PATCH 2/3] ident: rename commit_rewrite_person() to rewrite_ident_line()
  2022-06-30 14:24 [PATCH 0/3] Add support for mailmap in cat-file Siddharth Asthana
  2022-06-30 14:24 ` [PATCH 1/3] ident: move commit_rewrite_person() to ident.c Siddharth Asthana
@ 2022-06-30 14:24 ` Siddharth Asthana
  2022-06-30 15:33   ` Phillip Wood
  2022-06-30 23:31   ` Junio C Hamano
  2022-06-30 14:24 ` [PATCH 3/3] cat-file: add mailmap support Siddharth Asthana
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 68+ messages in thread
From: Siddharth Asthana @ 2022-06-30 14:24 UTC (permalink / raw)
  To: git; +Cc: Siddharth Asthana, Christian Couder, John Cai

We will be using commit_rewrite_person() in git-cat-file to rewrite
ident line in commit/tag object buffers.

The reasons for renaming commit_rewrite_person() are:
- the function can be used not only on a commit buffer, but also on a
  tag object buffer, so having "commit" in its name is misleading.
- the function works on the ident line in the commit/tag object buffers,
  just like "split_ident_line()". Since these functions are related they
  should have similar terms for uniformity.

Signed-off-by: Siddharth Asthana <siddharthasthana31@gmail.com>
Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: John Cai <johncai86@gmail.com>
---
 cache.h    | 2 +-
 ident.c    | 2 +-
 revision.c | 4 ++--
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/cache.h b/cache.h
index 442bfe5f6a..c8a98d8a80 100644
--- a/cache.h
+++ b/cache.h
@@ -1694,7 +1694,7 @@ int split_ident_line(struct ident_split *, const char *, int);
  * name and email using mailmap mechanism. Signals a success with
  * 1 and failure with a 0.
  */
-int commit_rewrite_person(struct strbuf *buf, const char *what, struct string_list *mailmap);
+int rewrite_ident_line(struct strbuf *buf, const char *what, struct string_list *mailmap);
 
 /*
  * Compare split idents for equality or strict ordering. Note that we
diff --git a/ident.c b/ident.c
index 8c890bd474..d15f579fd5 100644
--- a/ident.c
+++ b/ident.c
@@ -347,7 +347,7 @@ int split_ident_line(struct ident_split *split, const char *line, int len)
 	return 0;
 }
 
-int commit_rewrite_person(struct strbuf *buf, const char *what, struct string_list *mailmap)
+int rewrite_ident_line(struct strbuf *buf, const char *what, struct string_list *mailmap)
 {
 	char *person, *endp;
 	size_t len, namelen, maillen;
diff --git a/revision.c b/revision.c
index da49e73cd6..0c8243a8e0 100644
--- a/revision.c
+++ b/revision.c
@@ -3790,8 +3790,8 @@ static int commit_match(struct commit *commit, struct rev_info *opt)
 		if (!buf.len)
 			strbuf_addstr(&buf, message);
 
-		commit_rewrite_person(&buf, "\nauthor ", opt->mailmap);
-		commit_rewrite_person(&buf, "\ncommitter ", opt->mailmap);
+		rewrite_ident_line(&buf, "\nauthor ", opt->mailmap);
+		rewrite_ident_line(&buf, "\ncommitter ", opt->mailmap);
 	}
 
 	/* Append "fake" message parts as needed */
-- 
2.37.0.3.g2093cce7fe.dirty



* [PATCH 3/3] cat-file: add mailmap support
  2022-06-30 14:24 [PATCH 0/3] Add support for mailmap in cat-file Siddharth Asthana
  2022-06-30 14:24 ` [PATCH 1/3] ident: move commit_rewrite_person() to ident.c Siddharth Asthana
  2022-06-30 14:24 ` [PATCH 2/3] ident: rename commit_rewrite_person() to rewrite_ident_line() Siddharth Asthana
@ 2022-06-30 14:24 ` Siddharth Asthana
  2022-06-30 15:50   ` Phillip Wood
                     ` (2 more replies)
  2022-06-30 21:18 ` [PATCH 0/3] Add support for mailmap in cat-file Junio C Hamano
  2022-07-07 16:15 ` [PATCH v2 0/4] " Siddharth Asthana
  4 siblings, 3 replies; 68+ messages in thread
From: Siddharth Asthana @ 2022-06-30 14:24 UTC (permalink / raw)
  To: git; +Cc: Siddharth Asthana, Christian Couder, John Cai

git cat-file is not a plumbing command anymore, especially as it gained
more and more high level features like its `--batch-command` mode. So
tools do use it to get commit and tag contents that are then displayed
to users. This content which has author, committer or tagger
information, could benefit from passing through the mailmap mechanism,
before being sent or displayed.

This patch adds --[no-]use-mailmap command line option to the git
cat-file command. It also adds --[no-]mailmap option as an alias to
--[no-]use-mailmap.

At this time, this patch only adds a command line
option, but perhaps a `cat-file.mailmap` config option could be added as
well in the same way as for `git log`.

Signed-off-by: Siddharth Asthana <siddharthasthana31@gmail.com>
Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: John Cai <johncai86@gmail.com>
---
 Documentation/git-cat-file.txt |  6 ++++++
 builtin/cat-file.c             | 32 +++++++++++++++++++++++++++++++-
 t/t4203-mailmap.sh             | 30 ++++++++++++++++++++++++++++++
 3 files changed, 67 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-cat-file.txt b/Documentation/git-cat-file.txt
index 24a811f0ef..887739c41f 100644
--- a/Documentation/git-cat-file.txt
+++ b/Documentation/git-cat-file.txt
@@ -63,6 +63,12 @@ OPTIONS
 	or to ask for a "blob" with `<object>` being a tag object that
 	points at it.
 
+--[no-]mailmap::
+--[no-]use-mailmap::
+	Use mailmap file to map author, committer and tagger names
+	and email addresses to canonical real names and email addresses.
+	See linkgit:git-shortlog[1].
+
 --textconv::
 	Show the content as transformed by a textconv filter. In this case,
 	`<object>` has to be of the form `<tree-ish>:<path>`, or `:<path>` in
diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 50cf38999d..fc02b9f487 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -16,6 +16,7 @@
 #include "packfile.h"
 #include "object-store.h"
 #include "promisor-remote.h"
+#include "mailmap.h"
 
 enum batch_mode {
 	BATCH_MODE_CONTENTS,
@@ -36,6 +37,20 @@ struct batch_options {
 
 static const char *force_path;
 
+static struct string_list mailmap = STRING_LIST_INIT_NODUP;
+static int use_mailmap;
+
+char *replace_idents_using_mailmap(char *object_buf, size_t *size)
+{
+	struct strbuf sb = STRBUF_INIT;
+	strbuf_attach(&sb, object_buf, *size, *size + 1);
+	rewrite_ident_line(&sb, "\nauthor ", &mailmap);
+	rewrite_ident_line(&sb, "\ncommitter ", &mailmap);
+	rewrite_ident_line(&sb, "\ntagger ", &mailmap);
+	*size = sb.len;
+	return strbuf_detach(&sb, NULL);
+}
+
 static int filter_object(const char *path, unsigned mode,
 			 const struct object_id *oid,
 			 char **buf, unsigned long *size)
@@ -152,6 +167,9 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name,
 		if (!buf)
 			die("Cannot read object %s", obj_name);
 
+		if (use_mailmap)
+			buf = replace_idents_using_mailmap(buf, &size);
+
 		/* otherwise just spit out the data */
 		break;
 
@@ -183,6 +201,9 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name,
 		}
 		buf = read_object_with_reference(the_repository, &oid,
 						 exp_type_id, &size, NULL);
+
+		if (use_mailmap)
+			buf = replace_idents_using_mailmap(buf, &size);
 		break;
 	}
 	default:
@@ -348,11 +369,15 @@ static void print_object_or_die(struct batch_options *opt, struct expand_data *d
 		void *contents;
 
 		contents = read_object_file(oid, &type, &size);
+
+		if (use_mailmap)
+			contents = replace_idents_using_mailmap(contents, &size);
+
 		if (!contents)
 			die("object %s disappeared", oid_to_hex(oid));
 		if (type != data->type)
 			die("object %s changed type!?", oid_to_hex(oid));
-		if (data->info.sizep && size != data->size)
+		if (data->info.sizep && size != data->size && !use_mailmap)
 			die("object %s changed size!?", oid_to_hex(oid));
 
 		batch_write(opt, contents, size);
@@ -843,6 +868,8 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
 		OPT_CMDMODE('s', NULL, &opt, N_("show object size"), 's'),
 		OPT_BOOL(0, "allow-unknown-type", &unknown_type,
 			  N_("allow -s and -t to work with broken/corrupt objects")),
+		OPT_BOOL(0, "use-mailmap", &use_mailmap, N_("use mail map file")),
+		OPT_ALIAS(0, "mailmap", "use-mailmap"),
 		/* Batch mode */
 		OPT_GROUP(N_("Batch objects requested on stdin (or --batch-all-objects)")),
 		OPT_CALLBACK_F(0, "batch", &batch, N_("format"),
@@ -885,6 +912,9 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
 	opt_cw = (opt == 'c' || opt == 'w');
 	opt_epts = (opt == 'e' || opt == 'p' || opt == 't' || opt == 's');
 
+	if (use_mailmap)
+		read_mailmap(&mailmap);
+
 	/* --batch-all-objects? */
 	if (opt == 'b')
 		batch.all_objects = 1;
diff --git a/t/t4203-mailmap.sh b/t/t4203-mailmap.sh
index 0b2d21ec55..bfddc35d9d 100755
--- a/t/t4203-mailmap.sh
+++ b/t/t4203-mailmap.sh
@@ -963,4 +963,34 @@ test_expect_success SYMLINKS 'symlinks not respected in-tree' '
 	test_cmp expect actual
 '
 
+test_expect_success '--no-use-mailmap disables mailmap in cat-file' '
+	test_when_finished "rm .mailmap" &&
+	cat >.mailmap <<-EOF &&
+	A U Thor <author@example.com> Orig <orig@example.com>
+	EOF
+	cat >expect <<-EOF &&
+	author Orig <orig@example.com>
+	EOF
+	git cat-file --no-use-mailmap commit HEAD >log &&
+	grep author log >actual &&
+	sed -e "/^author/q" actual >log &&
+	sed -e "s/ [0-9][0-9]* [-+][0-9][0-9][0-9][0-9]$//" log >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success '--use-mailmap enables mailmap in cat-file' '
+	test_when_finished "rm .mailmap" &&
+	cat >.mailmap <<-EOF &&
+	A U Thor <author@example.com> Orig <orig@example.com>
+	EOF
+	cat >expect <<-EOF &&
+	author A U Thor <author@example.com>
+	EOF
+	git cat-file --use-mailmap commit HEAD >log &&
+	grep author log >actual &&
+	sed -e "/^author/q" actual >log &&
+	sed -e "s/ [0-9][0-9]* [-+][0-9][0-9][0-9][0-9]$//" log >actual &&
+	test_cmp expect actual
+'
+
 test_done
-- 
2.37.0.3.g2093cce7fe.dirty



* Re: [PATCH 2/3] ident: rename commit_rewrite_person() to rewrite_ident_line()
  2022-06-30 14:24 ` [PATCH 2/3] ident: rename commit_rewrite_person() to rewrite_ident_line() Siddharth Asthana
@ 2022-06-30 15:33   ` Phillip Wood
  2022-06-30 16:55     ` Christian Couder
  2022-06-30 23:31   ` Junio C Hamano
  1 sibling, 1 reply; 68+ messages in thread
From: Phillip Wood @ 2022-06-30 15:33 UTC (permalink / raw)
  To: Siddharth Asthana, git; +Cc: Christian Couder, John Cai

Hi Siddharth

Welcome to the list!

On 30/06/2022 15:24, Siddharth Asthana wrote:
> We will be using commit_rewrite_person() in git-cat-file to rewrite
> ident line in commit/tag object buffers.
> 
> The reasons for renaming commit_rewrite_person() are:
> - the function can be used not only on a commit buffer, but also on a
>    tag object buffer, so having "commit" in its name is misleading.
> - the function works on the ident line in the commit/tag object buffers,
>    just like "split_ident_line()". Since these functions are related they
>    should have similar terms for uniformity.

I'm afraid I'm not sure about this change as the interfaces for 
split_ident_line() and commit_rewrite_person() are not uniform. 
split_ident_line() takes a pointer to the beginning of the name in an 
ident line and a length. commit_rewrite_person() takes the whole commit 
buffer and searches for the ident line based on the argument "what". I 
agree that having commit in the name of the function is confusing when 
it can be used for a tag, but having line in the name when it takes a 
whole buffer is also confusing. Maybe buffer_rewrite_person() or 
something like that would be clearer?
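
The difference between the two interfaces can be sketched with two toy
helpers. Both names are hypothetical and neither is git's actual code:
one works on a (pointer, length) pair that already points at an ident,
the other takes a whole object buffer and searches it for a header.

```c
#include <stddef.h>
#include <string.h>

/*
 * Line-level helper: looks only at the `len` bytes starting at `line`.
 * Returns the length of the email between '<' and '>', or -1 if the
 * range does not contain one.
 */
long toy_ident_mail_len(const char *line, size_t len)
{
	const char *lt = memchr(line, '<', len);
	const char *gt = lt ? memchr(lt, '>', len - (size_t)(lt - line)) : NULL;

	return gt ? (long)(gt - lt - 1) : -1;
}

/*
 * Buffer-level helper: searches the whole NUL-terminated object buffer
 * for the header `what` (for example "\ncommitter "). On success it
 * returns a pointer to the ident on that line and stores its length
 * in *len; otherwise it returns NULL.
 */
const char *toy_buffer_find_ident(const char *buf, const char *what,
				  size_t *len)
{
	const char *p = strstr(buf, what);
	const char *end;

	if (!p)
		return NULL;
	p += strlen(what);
	end = strchr(p, '\n');
	if (!end)
		return NULL;
	*len = (size_t)(end - p);
	return p;
}
```

A caller of the buffer-level helper never needs to know where the
ident starts, while a caller of the line-level helper must already
have found it, which is why sharing "line" in both names can mislead.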

Best Wishes

Phillip


> Signed-off-by: Siddharth Asthana <siddharthasthana31@gmail.com>
> Mentored-by: Christian Couder <christian.couder@gmail.com>
> Mentored-by: John Cai <johncai86@gmail.com>
> ---
>   cache.h    | 2 +-
>   ident.c    | 2 +-
>   revision.c | 4 ++--
>   3 files changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/cache.h b/cache.h
> index 442bfe5f6a..c8a98d8a80 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -1694,7 +1694,7 @@ int split_ident_line(struct ident_split *, const char *, int);
>    * name and email using mailmap mechanism. Signals a success with
>    * 1 and failure with a 0.
>    */
> -int commit_rewrite_person(struct strbuf *buf, const char *what, struct string_list *mailmap);
> +int rewrite_ident_line(struct strbuf *buf, const char *what, struct string_list *mailmap);
>   
>   /*
>    * Compare split idents for equality or strict ordering. Note that we
> diff --git a/ident.c b/ident.c
> index 8c890bd474..d15f579fd5 100644
> --- a/ident.c
> +++ b/ident.c
> @@ -347,7 +347,7 @@ int split_ident_line(struct ident_split *split, const char *line, int len)
>   	return 0;
>   }
>   
> -int commit_rewrite_person(struct strbuf *buf, const char *what, struct string_list *mailmap)
> +int rewrite_ident_line(struct strbuf *buf, const char *what, struct string_list *mailmap)
>   {
>   	char *person, *endp;
>   	size_t len, namelen, maillen;
> diff --git a/revision.c b/revision.c
> index da49e73cd6..0c8243a8e0 100644
> --- a/revision.c
> +++ b/revision.c
> @@ -3790,8 +3790,8 @@ static int commit_match(struct commit *commit, struct rev_info *opt)
>   		if (!buf.len)
>   			strbuf_addstr(&buf, message);
>   
> -		commit_rewrite_person(&buf, "\nauthor ", opt->mailmap);
> -		commit_rewrite_person(&buf, "\ncommitter ", opt->mailmap);
> +		rewrite_ident_line(&buf, "\nauthor ", opt->mailmap);
> +		rewrite_ident_line(&buf, "\ncommitter ", opt->mailmap);
>   	}
>   
>   	/* Append "fake" message parts as needed */


* Re: [PATCH 3/3] cat-file: add mailmap support
  2022-06-30 14:24 ` [PATCH 3/3] cat-file: add mailmap support Siddharth Asthana
@ 2022-06-30 15:50   ` Phillip Wood
  2022-06-30 16:36     ` Phillip Wood
  2022-06-30 17:07     ` Christian Couder
  2022-06-30 23:36   ` Ævar Arnfjörð Bjarmason
  2022-06-30 23:41   ` Junio C Hamano
  2 siblings, 2 replies; 68+ messages in thread
From: Phillip Wood @ 2022-06-30 15:50 UTC (permalink / raw)
  To: Siddharth Asthana, git; +Cc: Christian Couder, John Cai

Hi Siddharth

On 30/06/2022 15:24, Siddharth Asthana wrote:
> git cat-file is not a plumbing command anymore, especially as it gained
> more and more high level features like its `--batch-command` mode. 

cat-file is definitely a plumbing command as it is intended to be used 
by scripts. It has a number of features that are used by porcelain 
commands but that does not make cat-file itself porcelain.

> So
> tools do use it to get commit and tag contents that are then displayed
> to users. This content which has author, committer or tagger
> information, could benefit from passing through the mailmap mechanism,
> before being sent or displayed.
> 
> This patch adds --[no-]use-mailmap command line option to the git
> cat-file command. It also adds --[no-]mailmap option as an alias to
> --[no-]use-mailmap.

I don't think we need an alias for this option, it'll just end up 
confusing people.

> At this time, this patch only adds a command line
> option, but perhaps a `cat-file.mailmap` config option could be added as
> well in the same way as for `git log`.

As cat-file is a plumbing command that is used by scripts we should not 
add a config option for this as it would potentially break those scripts.

I like the idea of adding mailmap support to cat-file and I think this 
patch is definitely going in the right direction.

> +char *replace_idents_using_mailmap(char *object_buf, size_t *size)
> +{
> +	struct strbuf sb = STRBUF_INIT;
> +	strbuf_attach(&sb, object_buf, *size, *size + 1);

I'm worried by this as I don't think we really own the buffer returned 
by read_object_file(). git maintains a cache of objects it has loaded 
and if this strbuf grows when the author is rewritten then the pointer 
stored in the cache will become invalid. If you look at the code in 
revision.c you'll see that commit_rewrite_person() is called on a copy 
of the original object.

> +	rewrite_ident_line(&sb, "\nauthor ", &mailmap);
> +	rewrite_ident_line(&sb, "\ncommitter ", &mailmap);
> +	rewrite_ident_line(&sb, "\ntagger ", &mailmap);
> +	*size = sb.len;
> +	return strbuf_detach(&sb, NULL);
> +}
> [...]
> +test_expect_success '--no-use-mailmap disables mailmap in cat-file' '
> +	test_when_finished "rm .mailmap" &&
> +	cat >.mailmap <<-EOF &&
> +	A U Thor <author@example.com> Orig <orig@example.com>
> +	EOF
> +	cat >expect <<-EOF &&
> +	author Orig <orig@example.com>
> +	EOF
> +	git cat-file --no-use-mailmap commit HEAD >log &&
> +	grep author log >actual &&
> +	sed -e "/^author/q" actual >log &&

This line does not have any effect on the contents of log

> +	sed -e "s/ [0-9][0-9]* [-+][0-9][0-9][0-9][0-9]$//" log >actual &&

I think you can simplify this series of commands to do
	git cat-file ... >log
	sed -n "/^author /s/\([^>]*>\).*/\1/p" log >actual

Best Wishes

Phillip


* Re: [PATCH 1/3] ident: move commit_rewrite_person() to ident.c
  2022-06-30 14:24 ` [PATCH 1/3] ident: move commit_rewrite_person() to ident.c Siddharth Asthana
@ 2022-06-30 16:00   ` Đoàn Trần Công Danh
  2022-06-30 23:22   ` Junio C Hamano
  1 sibling, 0 replies; 68+ messages in thread
From: Đoàn Trần Công Danh @ 2022-06-30 16:00 UTC (permalink / raw)
  To: Siddharth Asthana; +Cc: git, Christian Couder, John Cai

On 2022-06-30 19:54:42+0530, Siddharth Asthana <siddharthasthana31@gmail.com> wrote:
> commit_rewrite_person() is a static function defined in revision.c.
> As the name suggests, this function can be used to replace author's,
> committer's or tagger's name in the commit/tag object buffer.
> 
> This patch moves this function from revision.c to ident.c which contains
> many other functions related to identification like split_ident_line. By
> moving this function to ident.c, we intend to use it in git-cat-file to
> replace committer's, author's and tagger's names and emails with their
> canonical name and email using the mailmap mechanism. The function
> is moved as is for now to make it clear that there are no other changes,
> but will be renamed in a following commit.
> 
> Signed-off-by: Siddharth Asthana <siddharthasthana31@gmail.com>
> Mentored-by: Christian Couder <christian.couder@gmail.com>
> Mentored-by: John Cai <johncai86@gmail.com>
> ---
>  cache.h    |  8 ++++++++
>  ident.c    | 45 +++++++++++++++++++++++++++++++++++++++++++++
>  revision.c | 45 ---------------------------------------------
>  3 files changed, 53 insertions(+), 45 deletions(-)
> 
> diff --git a/cache.h b/cache.h
> index ac5ab4ef9d..442bfe5f6a 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -1688,6 +1688,14 @@ struct ident_split {
>   */
>  int split_ident_line(struct ident_split *, const char *, int);
>  
> +/*
> + * Given a commit or tag object buffer, replaces the person's
> + * (author/committer/tagger) name and email with their canonical
> + * name and email using mailmap mechanism. Signals a success with
> + * 1 and failure with a 0.

I'm not sure if this is important or not.
However, in this project, a function which does something usually
returns 0 on success and a negative integer on failure.

However, the old code is there, so ...
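
The two conventions mentioned above can be compared in a small sketch.
Both functions are hypothetical and only illustrate how the return
values read at the call site; they are not part of the patch.

```c
#include <stddef.h>
#include <string.h>

/*
 * "0 on success, negative on failure" style: the result is reported
 * through the out parameters, and the return value is only a status.
 */
int toy_find_mail(const char *line, const char **mail, size_t *len)
{
	const char *lt = strchr(line, '<');
	const char *gt = lt ? strchr(lt, '>') : NULL;

	if (!gt)
		return -1;
	*mail = lt + 1;
	*len = (size_t)(gt - lt - 1);
	return 0;
}

/*
 * "1 on success, 0 on failure" style, as in the comment being
 * reviewed: the return value doubles as a boolean answer.
 */
int toy_has_mail(const char *line)
{
	const char *mail;
	size_t len;

	return !toy_find_mail(line, &mail, &len);
}
```

With the first style a caller writes "if (toy_find_mail(...) < 0)" to
handle the error; with the second it writes "if (toy_has_mail(...))"
to act on success.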

> + */
> +int commit_rewrite_person(struct strbuf *buf, const char *what, struct string_list *mailmap);
> +
>  /*
>   * Compare split idents for equality or strict ordering. Note that we
>   * compare only the ident part of the line, ignoring any timestamp.
> diff --git a/ident.c b/ident.c
> index 89ca5b4700..8c890bd474 100644
> --- a/ident.c
> +++ b/ident.c
> @@ -8,6 +8,7 @@
>  #include "cache.h"
>  #include "config.h"
>  #include "date.h"
> +#include "mailmap.h"
>  
>  static struct strbuf git_default_name = STRBUF_INIT;
>  static struct strbuf git_default_email = STRBUF_INIT;
> @@ -346,6 +347,50 @@ int split_ident_line(struct ident_split *split, const char *line, int len)
>  	return 0;
>  }
>  
> +int commit_rewrite_person(struct strbuf *buf, const char *what, struct string_list *mailmap)
> +{
> +	char *person, *endp;
> +	size_t len, namelen, maillen;
> +	const char *name;
> +	const char *mail;
> +	struct ident_split ident;
> +
> +	person = strstr(buf->buf, what);
> +	if (!person)
> +		return 0;
> +
> +	person += strlen(what);
> +	endp = strchr(person, '\n');
> +	if (!endp)
> +		return 0;
> +
> +	len = endp - person;
> +
> +	if (split_ident_line(&ident, person, len))
> +		return 0;
> +
> +	mail = ident.mail_begin;
> +	maillen = ident.mail_end - ident.mail_begin;
> +	name = ident.name_begin;
> +	namelen = ident.name_end - ident.name_begin;
> +
> +	if (map_user(mailmap, &mail, &maillen, &name, &namelen)) {
> +		struct strbuf namemail = STRBUF_INIT;
> +
> +		strbuf_addf(&namemail, "%.*s <%.*s>",
> +			    (int)namelen, name, (int)maillen, mail);
> +
> +		strbuf_splice(buf, ident.name_begin - buf->buf,
> +			      ident.mail_end - ident.name_begin + 1,
> +			      namemail.buf, namemail.len);
> +
> +		strbuf_release(&namemail);
> +
> +		return 1;
> +	}
> +
> +	return 0;
> +}
>  
>  static void ident_env_hint(enum want_ident whose_ident)
>  {
> diff --git a/revision.c b/revision.c
> index 211352795c..da49e73cd6 100644
> --- a/revision.c
> +++ b/revision.c
> @@ -3755,51 +3755,6 @@ int rewrite_parents(struct rev_info *revs, struct commit *commit,
>  	return 0;
>  }
>  
> -static int commit_rewrite_person(struct strbuf *buf, const char *what, struct string_list *mailmap)
> -{
> -	char *person, *endp;
> -	size_t len, namelen, maillen;
> -	const char *name;
> -	const char *mail;
> -	struct ident_split ident;
> -
> -	person = strstr(buf->buf, what);
> -	if (!person)
> -		return 0;
> -
> -	person += strlen(what);
> -	endp = strchr(person, '\n');
> -	if (!endp)
> -		return 0;
> -
> -	len = endp - person;
> -
> -	if (split_ident_line(&ident, person, len))
> -		return 0;
> -
> -	mail = ident.mail_begin;
> -	maillen = ident.mail_end - ident.mail_begin;
> -	name = ident.name_begin;
> -	namelen = ident.name_end - ident.name_begin;
> -
> -	if (map_user(mailmap, &mail, &maillen, &name, &namelen)) {
> -		struct strbuf namemail = STRBUF_INIT;
> -
> -		strbuf_addf(&namemail, "%.*s <%.*s>",
> -			    (int)namelen, name, (int)maillen, mail);
> -
> -		strbuf_splice(buf, ident.name_begin - buf->buf,
> -			      ident.mail_end - ident.name_begin + 1,
> -			      namemail.buf, namemail.len);
> -
> -		strbuf_release(&namemail);
> -
> -		return 1;
> -	}
> -
> -	return 0;
> -}
> -
>  static int commit_match(struct commit *commit, struct rev_info *opt)
>  {
>  	int retval;
> -- 
> 2.37.0.3.g2093cce7fe.dirty
> 

-- 
Danh


* Re: [PATCH 3/3] cat-file: add mailmap support
  2022-06-30 15:50   ` Phillip Wood
@ 2022-06-30 16:36     ` Phillip Wood
  2022-06-30 17:07     ` Christian Couder
  1 sibling, 0 replies; 68+ messages in thread
From: Phillip Wood @ 2022-06-30 16:36 UTC (permalink / raw)
  To: Siddharth Asthana, git; +Cc: Christian Couder, John Cai

On 30/06/2022 16:50, Phillip Wood wrote:
>> +char *replace_idents_using_mailmap(char *object_buf, size_t *size)
>> +{
>> +    struct strbuf sb = STRBUF_INIT;
>> +    strbuf_attach(&sb, object_buf, *size, *size + 1);
> 
> I'm worried by this as I don't think we really own the buffer returned 
> by read_object_file(). git maintains a cache of objects it has loaded 
> and if this strbuf grows when the author is rewritten then the pointer 
> stored in the cache will become invalid. If you look at the code in 
> revision.c you'll see that commit_rewrite_person() is called on a copy 
> of the original object.

Sorry, ignore that; looking through the code again, we do own 
object_buf, so this is fine.
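
Why ownership matters here can be sketched without git's strbuf API.
The helper below is hypothetical: it takes over a malloc'd buffer, may
move it while growing it, and hands back the possibly new pointer and
the new size. The same pattern is why the patch assigns the return
value back to buf and passes size by address.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Takes ownership of `owned`, which must be a malloc'd buffer holding
 * *size bytes plus a trailing NUL. Appends `suffix` and returns the
 * buffer, which may have moved. On allocation failure the original
 * buffer and size are returned unchanged.
 */
char *toy_grow_owned(char *owned, size_t *size, const char *suffix)
{
	size_t extra = strlen(suffix);
	char *grown = realloc(owned, *size + extra + 1);

	if (!grown)
		return owned;
	/* Copy the suffix including its terminating NUL. */
	memcpy(grown + *size, suffix, extra + 1);
	*size += extra;
	return grown;
}
```

If some other structure still held the old pointer, it could be left
dangling after the realloc(), which is the hazard that would apply to
a buffer the caller does not own.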

Sorry for the confusion

Phillip

>> +    rewrite_ident_line(&sb, "\nauthor ", &mailmap);
>> +    rewrite_ident_line(&sb, "\ncommitter ", &mailmap);
>> +    rewrite_ident_line(&sb, "\ntagger ", &mailmap);
>> +    *size = sb.len;
>> +    return strbuf_detach(&sb, NULL);
>> +}
>> [...]
>> +test_expect_success '--no-use-mailmap disables mailmap in cat-file' '
>> +    test_when_finished "rm .mailmap" &&
>> +    cat >.mailmap <<-EOF &&
>> +    A U Thor <author@example.com> Orig <orig@example.com>
>> +    EOF
>> +    cat >expect <<-EOF &&
>> +    author Orig <orig@example.com>
>> +    EOF
>> +    git cat-file --no-use-mailmap commit HEAD >log &&
>> +    grep author log >actual &&
>> +    sed -e "/^author/q" actual >log &&
> 
> This line does not have any effect on the contents of log
> 
>> +    sed -e "s/ [0-9][0-9]* [-+][0-9][0-9][0-9][0-9]$//" log >actual &&
> 
> I think you can simplify this series of commands to do
>      git cat-file ... >log
>      sed -n "/^author /s/\([^>]*>\).*/\1/p" log >actual
> 
> Best Wishes
> 
> Phillip


* Re: [PATCH 2/3] ident: rename commit_rewrite_person() to rewrite_ident_line()
  2022-06-30 15:33   ` Phillip Wood
@ 2022-06-30 16:55     ` Christian Couder
  0 siblings, 0 replies; 68+ messages in thread
From: Christian Couder @ 2022-06-30 16:55 UTC (permalink / raw)
  To: Phillip Wood; +Cc: Siddharth Asthana, git, John Cai

Hi Phillip,

On Thu, Jun 30, 2022 at 5:33 PM Phillip Wood <phillip.wood123@gmail.com> wrote:

> On 30/06/2022 15:24, Siddharth Asthana wrote:
> > We will be using commit_rewrite_person() in git-cat-file to rewrite
> > ident line in commit/tag object buffers.

s/line/lines/

> > Following are the reason for renaming commit_rewrite_person():
> > - the function can be used not only on a commit buffer, but also on a
> >    tag object buffer, so having "commit" in its name is misleading.
> > - the function works on the ident line in the commit/tag object buffers,
> >    just like "split_ident_line()". Since these functions are related they
> >    should have similar terms for uniformity.
>
> I'm afraid I'm not sure about this change as the interface for
> split_ident_line() and commit_rewrite_person() are not uniform.
> split_ident_line() takes a pointer to the beginning of the name in an
> ident line and a length. commit_rewrite_person() takes the whole commit
> buffer and searches for the ident line based on the argument "what". I
> agree that having commit in the name of the function is confusing when
> it can be used for a tag, but having line in the name when it takes a
> whole buffer is also confusing.

It takes a whole buffer but it rewrites only ident lines, so maybe
"rewrite_ident_lines()" (so with "lines" instead of "line").

> Maybe buffer_rewrite_person() or
> something like that would be clearer?

I don't think "person" is better than "ident" for this, and I think
it's better to use the same name for it in split_ident_line() and the
function we are renaming.

It's true that we are not rewriting the date, so maybe
"rewrite_person_in_ident_lines()".

Thanks,
Christian.

* Re: [PATCH 3/3] cat-file: add mailmap support
  2022-06-30 15:50   ` Phillip Wood
  2022-06-30 16:36     ` Phillip Wood
@ 2022-06-30 17:07     ` Christian Couder
  2022-06-30 21:33       ` Junio C Hamano
  1 sibling, 1 reply; 68+ messages in thread
From: Christian Couder @ 2022-06-30 17:07 UTC (permalink / raw)
  To: Phillip Wood; +Cc: Siddharth Asthana, git, John Cai

Hi Phillip,

On Thu, Jun 30, 2022 at 5:50 PM Phillip Wood <phillip.wood123@gmail.com> wrote:
> On 30/06/2022 15:24, Siddharth Asthana wrote:
> > git cat-file is not a plumbing command anymore, especially as it gained
> > more and more high level features like its `--batch-command` mode.
>
> cat-file is definitely a plumbing command as it is intended to be used
> by scripts. It has a number of features that are used by porcelain
> commands but that does not make cat-file itself porcelain.

Ok, so maybe:

"Even if git cat-file is a plumbing command, it has gained more and
more high level features like its `--batch-command` mode."

> > So
> > tools do use it to get commit and tag contents that are then displayed
> > to users. This content which has author, committer or tagger
> > information, could benefit from passing through the mailmap mechanism,
> > before being sent or displayed.
> >
> > This patch adds --[no-]use-mailmap command line option to the git
> > cat-file command. It also adds --[no-]mailmap option as an alias to
> > --[no-]use-mailmap.
>
> I don't think we need an alias for this option, it'll just end up
> confusing people.

I am not sure if people would be more confused by the alias or by the
fact that the "--[no-]mailmap" alias works for `git log` but not `git
cat-file`.

> > At this time, this patch only adds a command line
> > option, but perhaps a `cat-file.mailmap` config option could be added as
> > well in the same way as for `git log`.
>
> As cat-file is a plumbing command that is used by scripts we should not
> add a config option for this as it would potentially break those scripts.

Yeah, we could either remove this small paragraph or add the
explanation you give.

> I like the idea of adding mailmap support to cat-file and I think this
> patch is definitely going in the right direction.

Thanks!

> > +test_expect_success '--no-use-mailmap disables mailmap in cat-file' '
> > +     test_when_finished "rm .mailmap" &&
> > +     cat >.mailmap <<-EOF &&
> > +     A U Thor <author@example.com> Orig <orig@example.com>
> > +     EOF
> > +     cat >expect <<-EOF &&
> > +     author Orig <orig@example.com>
> > +     EOF
> > +     git cat-file --no-use-mailmap commit HEAD >log &&
> > +     grep author log >actual &&
> > +     sed -e "/^author/q" actual >log &&
>
> This line does not have any effect on the contents of log
>
> > +     sed -e "s/ [0-9][0-9]* [-+][0-9][0-9][0-9][0-9]$//" log >actual &&
>
> I think you can simplify this series of commands to do
>         git cat-file ... >log
>         sed -n "/^author /s/\([^>]*>\).*/\1/p" log >actual

Thanks for the suggestion!

* Re: [PATCH 0/3] Add support for mailmap in cat-file
  2022-06-30 14:24 [PATCH 0/3] Add support for mailmap in cat-file Siddharth Asthana
                   ` (2 preceding siblings ...)
  2022-06-30 14:24 ` [PATCH 3/3] cat-file: add mailmap support Siddharth Asthana
@ 2022-06-30 21:18 ` Junio C Hamano
  2022-07-07 16:15 ` [PATCH v2 0/4] " Siddharth Asthana
  4 siblings, 0 replies; 68+ messages in thread
From: Junio C Hamano @ 2022-06-30 21:18 UTC (permalink / raw)
  To: Siddharth Asthana; +Cc: git

Siddharth Asthana <siddharthasthana31@gmail.com> writes:

> So, this patch series adds mailmap support to the git cat-file command.
> It does that by adding --[no-]use-mailmap command line option to the
> git cat-file command. It also adds --[no-]mailmap option as an alias to
> --[no-]use-mailmap.

So does this kick in only with "git cat-file commit <object>" and
never with "git cat-file $type" for non-commit object types?  For a
payload like CREDITS file, people may want the blob contents
filtered by applying the mailmap, so limiting it to only commits may
or may not be the best idea.

How does/should this interact with "git cat-file -p"?

Does it also work with the batch mode?

For a single-request-single-answer invocation like "git cat-file
commit <object>", I think a "--[no-]use-mailmap" option is OK, but
for something like the batch mode, we may want a way to obtain both
the original and mapped name(s).  E.g. with this option in effect,
in addition to the "author" and "committer" headers of the commit,
the output may get a "mailmap-author" and "mailmap-committer" fake
headers that show the mapped idents.

Soliciting this many questions means the cover letter is doing a good
job of piquing interest from readers, but is not doing a good job of
explaining adequately what it really does ;-)

Let's read on.

* Re: [PATCH 3/3] cat-file: add mailmap support
  2022-06-30 17:07     ` Christian Couder
@ 2022-06-30 21:33       ` Junio C Hamano
  2022-07-07  9:15         ` Christian Couder
  0 siblings, 1 reply; 68+ messages in thread
From: Junio C Hamano @ 2022-06-30 21:33 UTC (permalink / raw)
  To: Christian Couder; +Cc: Phillip Wood, Siddharth Asthana, git, John Cai

Christian Couder <christian.couder@gmail.com> writes:

> "Even if git cat-file is a plumbing command, it has gained more and
> more high level features like its `--batch-command` mode."

What batch-command and batch do does not sound like "high level" at
all.  It is just "instead of invoking separate process to ask about
each individual object, a single process answers the same low level
requests".

Independent of this "The output from 'cat-file commit' is
tweaked" feature, I wonder if we want a command that can be used as
a filter.  Just like "git name-rev --stdin" reads a stream of text,
finds commit-looking references in it, and annotates them, the
command (e.g. "git mailmap") would find ident-looking strings and
replace them with the mapped results, or something.

>> > At this time, this patch only adds a command line
>> > option, but perhaps a `cat-file.mailmap` config option could be added as
>> > well in the same way as for `git log`.
>>
>> As cat-file is a plumbing command that is used by scripts we should not
>> add a config option for this as it would potentially break those scripts.

Absolutely.

* Re: [PATCH 1/3] ident: move commit_rewrite_person() to ident.c
  2022-06-30 14:24 ` [PATCH 1/3] ident: move commit_rewrite_person() to ident.c Siddharth Asthana
  2022-06-30 16:00   ` Đoàn Trần Công Danh
@ 2022-06-30 23:22   ` Junio C Hamano
  1 sibling, 0 replies; 68+ messages in thread
From: Junio C Hamano @ 2022-06-30 23:22 UTC (permalink / raw)
  To: Siddharth Asthana; +Cc: git, Christian Couder, John Cai

Siddharth Asthana <siddharthasthana31@gmail.com> writes:

> +/*
> + * Given a commit or tag object buffer, replaces the person's

I do not think you should refer to "or tag" at this point.  It is
not designed to be used by anything other than a commit, and nobody
passes a tag in the original code, or even after this patch is
applied.

> + * (author/committer/tagger) name and email with their canonical
> + * name and email using mailmap mechanism. Signals a success with

"using" -> "using the"

> + * 1 and failure with a 0.

I do not think 0 signals a failure.  If it makes changes, then the
function returns a non-zero value.  0 only indicates "we made no
modification to the buffer".

> +int commit_rewrite_person(struct strbuf *buf, const char *what, struct string_list *mailmap)
> +{
> +	char *person, *endp;
> +	size_t len, namelen, maillen;
> +	const char *name;
> +	const char *mail;
> +	struct ident_split ident;
> +
> +	person = strstr(buf->buf, what);
> +	if (!person)
> +		return 0;

I do not think it is a good idea to expose this as a public function
as is, especially given that what you have to pass in "what" is a
bit awkward.  The function is designed to find and replace an ident
string in the header part, and the way it avoids a random occurrence
of author Siddharth Asthana <si...@gmail.com> in the text, not
necessarily in the header, is by insisting that "author" appear at
the beginning of a line by passing "\nauthor " as "what".

Also as you can see, the implementation does not make *any* effort
to limit itself to the commit/tag header by locating the blank line
that appears after the header part and stopping the search there.
That may not be *your* bug, but should be fixed before exposing it
as a public function and inflicting its bug on more callers.
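
To make the problem concrete, here is a minimal standalone sketch (plain
C, not git code; found_anywhere() and found_in_header() are hypothetical
names used only for illustration). It assumes the header ends at the
first blank line, and shows how an unrestricted strstr() search for
"\nauthor " can match a line in the body of an object whose header has
no "author" line at all:

```c
#include <stddef.h>
#include <string.h>

/*
 * Returns 1 if `what` (e.g. "\nauthor ") occurs anywhere in buf.
 * This mirrors an unrestricted strstr() search over the whole object.
 */
static int found_anywhere(const char *buf, const char *what)
{
	return strstr(buf, what) != NULL;
}

/*
 * Returns 1 only if the first occurrence of `what` starts before the
 * first blank line ("\n\n"), i.e. inside the header part of the object.
 */
static int found_in_header(const char *buf, const char *what)
{
	const char *hit = strstr(buf, what);
	const char *end = strstr(buf, "\n\n");

	if (!hit)
		return 0;
	return !end || hit < end;
}
```

With a tag buffer whose message happens to contain a line starting with
"author ", found_anywhere() reports a match for "\nauthor " while
found_in_header() does not; for "\ntagger " both report a match.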

Also the interface forces the caller to make multiple calls if it
wants to rewrite idents on multiple headers.  It shouldn't be the
case.

To support the existing caller better, it should be updated to

 * take one or more header names (like "author", "committer"), make
   a single pass in the input buffer to locate these headers and
   replace idents on them;

 * stop at the end of header, ensuring that nothing in the body of
   the commit object is modified.

Alternatively, it may not be a bad idea to simplify the interface so
that _all_ headers are subject to be rewritten, without any need for
the "what" parameter.  If you want to go that route, then you would
probably have a loop over buf->buf that iterates one line at a time,
stopping at the first empty line that denotes the end of header. For
each line, you'd skip to the first SP that is past the header name,
see if split_ident_line() thinks the line it got indeed has an
ident, and apply map_user() to the ident if it found one.  Do that
for each line and you are done when you reached the end of the header.

Once we fix the external interface like so while being a static
function inside revision.c and update its two callers (which will
become just a single caller), we can move and expose it as a public
function.
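
The line-at-a-time loop described above could look roughly like the
following standalone sketch. It is plain C and deliberately does not use
git's strbuf, split_ident_line() or map_user(); the callback type
header_fn and the names for_each_header_line() and count_ident_like()
are hypothetical, and "looks like an ident" is approximated by the
presence of "<...>" in the value:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical callback: receives one header's name and value. */
typedef void (*header_fn)(const char *name, size_t namelen,
			  const char *value, size_t valuelen, void *data);

/*
 * Walk the header part of a commit/tag buffer one line at a time and
 * call fn() for every line that contains a space after the header
 * name. Stops at the first empty line (end of header) or at the end
 * of the buffer. Returns the number of header lines visited.
 */
static int for_each_header_line(const char *buf, header_fn fn, void *data)
{
	int count = 0;

	while (*buf && *buf != '\n') {
		const char *eol = strchr(buf, '\n');
		size_t linelen = eol ? (size_t)(eol - buf) : strlen(buf);
		const char *sp = memchr(buf, ' ', linelen);

		if (sp) {
			size_t namelen = (size_t)(sp - buf);
			fn(buf, namelen, sp + 1, linelen - namelen - 1, data);
		}
		count++;
		if (!eol)
			break;
		buf = eol + 1;
	}
	return count;
}

/* Example callback: count values that look like "Name <email> ...". */
static void count_ident_like(const char *name, size_t namelen,
			     const char *value, size_t valuelen, void *data)
{
	const char *lt = memchr(value, '<', valuelen);

	(void)name;
	(void)namelen;
	if (lt && memchr(lt, '>', valuelen - (size_t)(lt - value)))
		(*(int *)data)++;
}
```

Because the walk stops at the first empty line, an "author ..." line in
the commit message is never visited. A real implementation would replace
the ident in place and account for the buffer growing or shrinking,
which this sketch does not attempt.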

Thanks.

* Re: [PATCH 2/3] ident: rename commit_rewrite_person() to rewrite_ident_line()
  2022-06-30 14:24 ` [PATCH 2/3] ident: rename commit_rewrite_person() to rewrite_ident_line() Siddharth Asthana
  2022-06-30 15:33   ` Phillip Wood
@ 2022-06-30 23:31   ` Junio C Hamano
  1 sibling, 0 replies; 68+ messages in thread
From: Junio C Hamano @ 2022-06-30 23:31 UTC (permalink / raw)
  To: Siddharth Asthana; +Cc: git, Christian Couder, John Cai

Siddharth Asthana <siddharthasthana31@gmail.com> writes:

> We will be using commit_rewrite_person() in git-cat-file to rewrite
> ident line in commit/tag object buffers.
>
> Following are the reason for renaming commit_rewrite_person():
> - the function can be used not only on a commit buffer, but also on a
>   tag object buffer, so having "commit" in its name is misleading.
> - the function works on the ident line in the commit/tag object buffers,
>   just like "split_ident_line()". Since these functions are related they
>   should have similar terms for uniformity.

"ident" is good (so is "person" in the original).

"rewrite" is not quite good, as you do not convey what kind of
rewrite you are doing (it is not likely that you are upcasing their
names, but from "rewrite" you cannot tell that is the case).  We
should have "mailmap" somewhere in its name.

"line" is not good at all.  This function receives an entire buffer,
not just a single line that was located by the caller and rewrites
that line.  "buffer" might be an improvement over "line", but as I
alluded to in my review on [1/3], I think this function should limit
itself only to the header part of the commit/tag buffer, so "header"
would be the word you want in its name.

Once we say "mailmap", there is not much point in saying ident or
person (as "mailmap" is only about the ident/person).  So

  int apply_mailmap_to_header(struct strbuf *, struct string_list *mailmap)

or something?

> Signed-off-by: Siddharth Asthana <siddharthasthana31@gmail.com>
> Mentored-by: Christian Couder <christian.couder@gmail.com>
> Mentored-by: John Cai <johncai86@gmail.com>

I think these are the other way around.  With help from two people,
you wrote the patch and the final step before sending it out to the
list is your sign-off.  Chronologically, in the above, your sign-off
should come at the end.

* Re: [PATCH 3/3] cat-file: add mailmap support
  2022-06-30 14:24 ` [PATCH 3/3] cat-file: add mailmap support Siddharth Asthana
  2022-06-30 15:50   ` Phillip Wood
@ 2022-06-30 23:36   ` Ævar Arnfjörð Bjarmason
  2022-06-30 23:53     ` Junio C Hamano
  2022-07-07  9:02     ` Christian Couder
  2022-06-30 23:41   ` Junio C Hamano
  2 siblings, 2 replies; 68+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-06-30 23:36 UTC (permalink / raw)
  To: Siddharth Asthana; +Cc: git, Christian Couder, John Cai


On Thu, Jun 30 2022, Siddharth Asthana wrote:

> git cat-file is not a plumbing command anymore, especially as it gained
> more and more high level features like its `--batch-command` mode. So
> tools do use it to get commit and tag contents that are then displayed
> to users. This content which has author, committer or tagger
> information, could benefit from passing through the mailmap mechanism,
> before being sent or displayed.
>
> This patch adds --[no-]use-mailmap command line option to the git
> cat-file command. It also adds --[no-]mailmap option as an alias to
> --[no-]use-mailmap.

I think I know the answer, but I think it would be helpful to discuss
the underlying motivations too. I.e. an obvious alternative is "why not
just get this information out of git show/log then?".

The "I think I know the answer" being that I suspect this is to cater to
gitaly having persistent "cat-file" processes around, whereas for "git
log" it would entail spinning up a new process per-request.

But maybe I'm missing something :)

So not as a blocker for this change, which I think can be made small
enough to be justified in cat-file, but just for context: If "git log"
had a similar --batch mode, would there be a need for this change, or is
this just adding a common case to "cat-file" to "tide us over" (as it
were) while that sort of thing doesn't exist yet (and maybe never will
:()?


* Re: [PATCH 3/3] cat-file: add mailmap support
  2022-06-30 14:24 ` [PATCH 3/3] cat-file: add mailmap support Siddharth Asthana
  2022-06-30 15:50   ` Phillip Wood
  2022-06-30 23:36   ` Ævar Arnfjörð Bjarmason
@ 2022-06-30 23:41   ` Junio C Hamano
  2 siblings, 0 replies; 68+ messages in thread
From: Junio C Hamano @ 2022-06-30 23:41 UTC (permalink / raw)
  To: Siddharth Asthana; +Cc: git, Christian Couder, John Cai

Siddharth Asthana <siddharthasthana31@gmail.com> writes:

> +char *replace_idents_using_mailmap(char *object_buf, size_t *size)
> +{
> +	struct strbuf sb = STRBUF_INIT;
> +	strbuf_attach(&sb, object_buf, *size, *size + 1);
> +	rewrite_ident_line(&sb, "\nauthor ", &mailmap);
> +	rewrite_ident_line(&sb, "\ncommitter ", &mailmap);
> +	rewrite_ident_line(&sb, "\ntagger ", &mailmap);

This shows why you want to fix the existing function first before
inflicting its poorly designed API on more callers.  There is no
sensible reason that you have to make three calls to the helper
function.

* Re: [PATCH 3/3] cat-file: add mailmap support
  2022-06-30 23:36   ` Ævar Arnfjörð Bjarmason
@ 2022-06-30 23:53     ` Junio C Hamano
  2022-07-07  9:02     ` Christian Couder
  1 sibling, 0 replies; 68+ messages in thread
From: Junio C Hamano @ 2022-06-30 23:53 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Siddharth Asthana, git, Christian Couder, John Cai

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> So not as a blocker for this change, which I think can be made small
> enough to be justified in cat-file, but just for context: If "git log"
> had a similar --batch mode,

Perhaps "xargs git show" is what you want? ;-)

* Re: [PATCH 3/3] cat-file: add mailmap support
  2022-06-30 23:36   ` Ævar Arnfjörð Bjarmason
  2022-06-30 23:53     ` Junio C Hamano
@ 2022-07-07  9:02     ` Christian Couder
  1 sibling, 0 replies; 68+ messages in thread
From: Christian Couder @ 2022-07-07  9:02 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: Siddharth Asthana, git, John Cai

On Fri, Jul 1, 2022 at 1:39 AM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
> On Thu, Jun 30 2022, Siddharth Asthana wrote:
>
> > git cat-file is not a plumbing command anymore, especially as it gained
> > more and more high level features like its `--batch-command` mode. So
> > tools do use it to get commit and tag contents that are then displayed
> > to users. This content which has author, committer or tagger
> > information, could benefit from passing through the mailmap mechanism,
> > before being sent or displayed.
> >
> > This patch adds --[no-]use-mailmap command line option to the git
> > cat-file command. It also adds --[no-]mailmap option as an alias to
> > --[no-]use-mailmap.
>
> I think I know the answer, but I think it would be helpful to discuss
> the underlying motivations too. I.e. an obvious alternative is "why not
> just get this information out of git show/log then?".
>
> The "I think I know the answer" being that I suspect this is to cater to
> gitaly having persistent "cat-file" processes around, whereas for "git
> log" it would entail spinning up a new process per-request.
>
> But maybe I'm missing something :)

No, you are not missing anything :)

> So not as a blocker for this change, which I think can be made small
> enough to be justified in cat-file, but just for context: If "git log"
> had a similar --batch mode, would there be a need for this change,

One nice thing with `git cat-file` is that it works for any kind of
git object. But yeah we could perhaps use `git show` if it had a
--batch mode, or a mix of `git cat-file` for blobs and trees, and `git
log --batch` for commits and tag objects.

> or is
> this just adding a common case to "cat-file" to "tide us over" (as it
> were) while that sort of thing doesn't exist yet (and maybe never will
> :()?

By the way there have been 2 GSoC contributors working on adding
ref-filter formatting support to `git cat-file` and that hasn't
succeeded yet, mostly for performance reasons. So another way would be
to wait until ref-filter formats support everything that pretty
formats support, which Jaydeep is currently working on, including %aN
(author name respecting .mailmap) and %aE (author email respecting
.mailmap), and then wait until `git cat-file`'s --batch formats
support ref-filter formats.

But yeah, we hope the change can be made small enough to be justified
in cat-file.

* Re: [PATCH 3/3] cat-file: add mailmap support
  2022-06-30 21:33       ` Junio C Hamano
@ 2022-07-07  9:15         ` Christian Couder
  0 siblings, 0 replies; 68+ messages in thread
From: Christian Couder @ 2022-07-07  9:15 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Phillip Wood, Siddharth Asthana, git, John Cai

On Thu, Jun 30, 2022 at 11:33 PM Junio C Hamano <gitster@pobox.com> wrote:

> Independent of this "The output from 'cat-file commit' is
> tweaked" feature, I wonder if we want a command that can be used as
> a filter.  Just like "git name-rev --stdin" reads a stream of text,
> finds commit-looking references in it, and annotates them, the
> command (e.g. "git mailmap") would find ident-looking strings and
> replace them with the mapped results, or something.

Yeah, that might be useful. Especially one issue is that trailers like
"Reviewed-by: ...", "Helped-by: ..." are interesting for statistics
and replacing their content with mapped results could help improve the
stats.

Related to this, I wonder if `git interpret-trailers` should have a
simple way to map trailer content.

* [PATCH v2 0/4] Add support for mailmap in cat-file
  2022-06-30 14:24 [PATCH 0/3] Add support for mailmap in cat-file Siddharth Asthana
                   ` (3 preceding siblings ...)
  2022-06-30 21:18 ` [PATCH 0/3] Add support for mailmap in cat-file Junio C Hamano
@ 2022-07-07 16:15 ` Siddharth Asthana
  2022-07-07 16:15   ` [PATCH v2 1/4] revision: improve commit_rewrite_person() Siddharth Asthana
                     ` (5 more replies)
  4 siblings, 6 replies; 68+ messages in thread
From: Siddharth Asthana @ 2022-07-07 16:15 UTC (permalink / raw)
  To: git
  Cc: Siddharth Asthana, phillip.wood, avarab, congdanhqx, gitster,
	christian.couder, johncai86

Thanks a lot for the review and suggestions Phillip, Danh, Ævar and Junio.
Really grateful for that :)

= Description

This patch series adds mailmap support to the git-cat-file command. It
adds the mailmap support only for the commit and tag objects by
replacing the idents for "author", "committer" and "tagger" headers. The
mailmap only takes effect when the --[no-]use-mailmap or --[no-]mailmap
option is passed to the git cat-file command. The changes will work with
the batch mode as well.

So, if one wants to enable mailmap they can use either of the following
commands:
$ git cat-file --use-mailmap -p <object>
$ git cat-file --use-mailmap <type> <object>

To use it in the batch mode, one can use the following command:
$ git cat-file --use-mailmap --batch

= Patch Organization

- The first patch improves commit_rewrite_person() by restricting it
  to traverse only through the header part of the commit object buffer.
  It also adds an argument called headers which the callers can pass.
  The function will replace idents only on these passed headers.
  Thus, the caller won't have to make repeated calls to the function.
- The second patch moves commit_rewrite_person() to ident.c to expose it
  as a public function so that it can be used to replace idents in the
  headers of desired objects.
- The third patch renames commit_rewrite_person() to a name which
  describes its functionality clearly. It is renamed to
  apply_mailmap_to_header().
- The last patch adds mailmap support to the git cat-file command. It
  adds the required documentation and tests as well.

Changes in v2:
- The commit_rewrite_person() has been improved by restricting it to
  traverse only the header part of the object buffers.
- The callers of commit_rewrite_person() no longer need to call it
  multiple times for different headers. They can pass an array of
  headers and commit_rewrite_person() replaces idents only on those
  headers.
- commit_rewrite_person() has been renamed to a suitable name which
  expresses its functionality clearly.
- More tests have been added to test the --[no-]use-mailmap option for
  the tag objects.
- Redundant operations from the tests have been removed.
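
As a rough illustration of the "array of headers" interface, the
following standalone sketch matches a header line against a
NULL-terminated list of prefixes. It is plain C rather than the code in
this series: match_header() is a hypothetical helper standing in for a
loop around git's skip_prefix(), and the exact prefix strings a caller
would pass (here "author ", "committer ", "tagger ") are an assumption:

```c
#include <stddef.h>
#include <string.h>

/*
 * If line starts with one of the prefixes in the NULL-terminated
 * array `headers`, return a pointer to the text right after the
 * matching prefix (where the ident would begin); otherwise NULL.
 */
static const char *match_header(const char *line, const char *const *headers)
{
	size_t i;

	for (i = 0; headers[i]; i++) {
		size_t len = strlen(headers[i]);

		if (!strncmp(line, headers[i], len))
			return line + len;
	}
	return NULL;
}
```

A caller would build the list once and check each header line against
it in a single pass, instead of making one call per header name.
Including the trailing space in each prefix keeps a line such as
"authored by ..." from matching "author ".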

Siddharth Asthana (4):
  revision: improve commit_rewrite_person()
  ident: move commit_rewrite_person() to ident.c
  ident: rename commit_rewrite_person() to apply_mailmap_to_header()
  cat-file: add mailmap support

 Documentation/git-cat-file.txt |  6 +++
 builtin/cat-file.c             | 31 ++++++++++++++-
 cache.h                        |  6 +++
 ident.c                        | 69 ++++++++++++++++++++++++++++++++++
 revision.c                     | 49 +-----------------------
 t/t4203-mailmap.sh             | 54 ++++++++++++++++++++++++++
 6 files changed, 167 insertions(+), 48 deletions(-)

Range-diff against v1:
-:  ---------- > 1:  64e1f750e1 revision: improve commit_rewrite_person()
1:  8749b6024f ! 2:  b18ced0ece ident: move commit_rewrite_person() to ident.c
    @@ Metadata
      ## Commit message ##
         ident: move commit_rewrite_person() to ident.c
     
    -    commit_rewrite_person() is a static function defined in revision.c.
    -    As the name suggests, this function can be used to replace author's,
    -    committer's or tagger's name in the commit/tag object buffer.
    +    commit_rewrite_person() and rewrite_ident_line() are static functions
    +    defined in revision.c.
     
    -    This patch moves this function from revision.c to ident.c which contains
    -    many other functions related to identification like split_ident_line. By
    -    moving this function to ident.c, we intend to use it in git-cat-file to
    -    replace committer's, author's and tagger's names and emails with their
    -    canonical name and email using the mailmap mechanism. The function
    -    is moved as is for now to make it clear that there are no other changes,
    -    but will be renamed in a following commit.
    +    Their usages are as follows:
    +    - commit_rewrite_person() takes a commit buffer and replaces the author
    +      and committer idents with their canonical versions using the mailmap
    +      mechanism
    +    - rewrite_ident_line() takes author/committer header lines from the
    +      commit buffer and replaces the idents with their canonical versions
    +      using the mailmap mechanism.
     
    +    This patch moves commit_rewrite_person() and rewrite_ident_line() to
    +    ident.c which contains many other functions related to idents like
    +    split_ident_line(). By moving commit_rewrite_person() to ident.c, we
    +    also intend to use it in git-cat-file to replace committer and author
    +    idents from the headers to their canonical versions using the mailmap
    +    mechanism. The function is moved as is for now to make it clear that
    +    there are no other changes, but it will be renamed in a following
    +    commit.
    +
    +    Mentored-by: Christian Couder <christian.couder@gmail.com>
    +    Mentored-by: John Cai <johncai86@gmail.com>
         Signed-off-by: Siddharth Asthana <siddharthasthana31@gmail.com>
     
      ## cache.h ##
    @@ cache.h: struct ident_split {
      int split_ident_line(struct ident_split *, const char *, int);
      
     +/*
    -+ * Given a commit or tag object buffer, replaces the person's
    -+ * (author/committer/tagger) name and email with their canonical
    -+ * name and email using mailmap mechanism. Signals a success with
    -+ * 1 and failure with a 0.
    ++ * Given a commit object buffer and the commit headers, replaces the idents
    ++ * in the headers with their canonical versions using the mailmap mechanism.
     + */
    -+int commit_rewrite_person(struct strbuf *buf, const char *what, struct string_list *mailmap);
    ++void commit_rewrite_person(struct strbuf *buf, const char **commit_headers, struct string_list *mailmap);
     +
      /*
       * Compare split idents for equality or strict ordering. Note that we
    @@ ident.c: int split_ident_line(struct ident_split *split, const char *line, int l
      	return 0;
      }
      
    -+int commit_rewrite_person(struct strbuf *buf, const char *what, struct string_list *mailmap)
    ++/*
    ++ * Returns the difference between the new and old length of the ident line.
    ++ */
    ++static ssize_t rewrite_ident_line(const char* person, struct strbuf *buf, struct string_list *mailmap)
     +{
    -+	char *person, *endp;
    ++	char *endp;
     +	size_t len, namelen, maillen;
     +	const char *name;
     +	const char *mail;
     +	struct ident_split ident;
     +
    -+	person = strstr(buf->buf, what);
    -+	if (!person)
    -+		return 0;
    -+
    -+	person += strlen(what);
     +	endp = strchr(person, '\n');
     +	if (!endp)
     +		return 0;
    @@ ident.c: int split_ident_line(struct ident_split *split, const char *line, int l
     +
     +	if (map_user(mailmap, &mail, &maillen, &name, &namelen)) {
     +		struct strbuf namemail = STRBUF_INIT;
    ++		size_t newlen;
     +
     +		strbuf_addf(&namemail, "%.*s <%.*s>",
     +			    (int)namelen, name, (int)maillen, mail);
    @@ ident.c: int split_ident_line(struct ident_split *split, const char *line, int l
     +			      ident.mail_end - ident.name_begin + 1,
     +			      namemail.buf, namemail.len);
     +
    ++		newlen = namemail.len;
    ++
     +		strbuf_release(&namemail);
     +
    -+		return 1;
    ++		return newlen - (ident.mail_end - ident.name_begin + 1);
     +	}
     +
     +	return 0;
    ++}
    ++
    ++void commit_rewrite_person(struct strbuf *buf, const char **headers, struct string_list *mailmap)
    ++{
    ++	size_t buf_offset = 0;
    ++
    ++	if (!mailmap)
    ++		return;
    ++
    ++	for (;;) {
    ++		const char *person, *line = buf->buf + buf_offset;
    ++		int i, linelen = strchrnul(line, '\n') - line + 1;
    ++
    ++		if (!linelen || linelen == 1)
    ++			/* End of header */
    ++			return;
    ++
    ++		buf_offset += linelen;
    ++
    ++		for (i = 0; headers[i]; i++)
    ++			if (skip_prefix(line, headers[i], &person))
    ++				buf_offset += rewrite_ident_line(person, buf, mailmap);
    ++	}
     +}
      
      static void ident_env_hint(enum want_ident whose_ident)
    @@ revision.c: int rewrite_parents(struct rev_info *revs, struct commit *commit,
      	return 0;
      }
      
    --static int commit_rewrite_person(struct strbuf *buf, const char *what, struct string_list *mailmap)
    +-/*
    +- * Returns the difference between the new and old length of the ident line.
    +- */
    +-static ssize_t rewrite_ident_line(const char* person, struct strbuf *buf, struct string_list *mailmap)
     -{
    --	char *person, *endp;
    +-	char *endp;
     -	size_t len, namelen, maillen;
     -	const char *name;
     -	const char *mail;
     -	struct ident_split ident;
     -
    --	person = strstr(buf->buf, what);
    --	if (!person)
    --		return 0;
    --
    --	person += strlen(what);
     -	endp = strchr(person, '\n');
     -	if (!endp)
     -		return 0;
    @@ revision.c: int rewrite_parents(struct rev_info *revs, struct commit *commit,
     -
     -	if (map_user(mailmap, &mail, &maillen, &name, &namelen)) {
     -		struct strbuf namemail = STRBUF_INIT;
    +-		size_t newlen;
     -
     -		strbuf_addf(&namemail, "%.*s <%.*s>",
     -			    (int)namelen, name, (int)maillen, mail);
    @@ revision.c: int rewrite_parents(struct rev_info *revs, struct commit *commit,
     -			      ident.mail_end - ident.name_begin + 1,
     -			      namemail.buf, namemail.len);
     -
    +-		newlen = namemail.len;
    +-
     -		strbuf_release(&namemail);
     -
    --		return 1;
    +-		return newlen - (ident.mail_end - ident.name_begin + 1);
     -	}
     -
     -	return 0;
     -}
    +-
    +-static void commit_rewrite_person(struct strbuf *buf, const char **headers, struct string_list *mailmap)
    +-{
    +-	size_t buf_offset = 0;
    +-
    +-	if (!mailmap)
    +-		return;
    +-
    +-	for (;;) {
    +-		const char *person, *line = buf->buf + buf_offset;
    +-		int i, linelen = strchrnul(line, '\n') - line + 1;
    +-
    +-		if (!linelen || linelen == 1)
    +-			/* End of header */
    +-			return;
    +-
    +-		buf_offset += linelen;
    +-
    +-		for (i = 0; headers[i]; i++)
    +-			if (skip_prefix(line, headers[i], &person))
    +-				buf_offset += rewrite_ident_line(person, buf, mailmap);
    +-	}
    +-}
     -
      static int commit_match(struct commit *commit, struct rev_info *opt)
      {
2:  aff60f541b < -:  ---------- ident: rename commit_rewrite_person() to rewrite_ident_line()
-:  ---------- > 3:  2494ce1ed2 ident: rename commit_rewrite_person() to apply_mailmap_to_header()
3:  2a697167db ! 4:  94838a2566 cat-file: add mailmap support
    @@ Metadata
      ## Commit message ##
         cat-file: add mailmap support
     
    -    git cat-file is not a plumbing command anymore, especially as it gained
    -    more and more high level features like its `--batch-command` mode. So
    -    tools do use it to get commit and tag contents that are then displayed
    -    to users. This content which has author, committer or tagger
    -    information, could benefit from passing through the mailmap mechanism,
    -    before being sent or displayed.
    +    git-cat-file is used by tools like GitLab to get commit and tag contents
    +    that are then displayed to users. This content, which has author,
    +    committer or tagger information, could benefit from passing through the
    +    mailmap mechanism before being sent or displayed.
     
         This patch adds --[no-]use-mailmap command line option to the git
         cat-file command. It also adds --[no-]mailmap option as an alias to
         --[no-]use-mailmap.
     
    -    At this time, this patch only adds a command line
    -    option, but perhaps a `cat-file.mailmap` config option could be added as
    -    well in the same way as for `git log`.
    -
    +    Mentored-by: Christian Couder <christian.couder@gmail.com>
    +    Mentored-by: John Cai <johncai86@gmail.com>
    +    Helped-by: Phillip Wood <phillip.wood@dunelm.org.uk>
         Signed-off-by: Siddharth Asthana <siddharthasthana31@gmail.com>
     
      ## Documentation/git-cat-file.txt ##
    @@ Documentation/git-cat-file.txt: OPTIONS
      
     +--[no-]mailmap::
     +--[no-]use-mailmap::
    -+	Use mailmap file to map author, committer and tagger names
    -+	and email addresses to canonical real names and email addresses.
    -+	See linkgit:git-shortlog[1].
    ++       Use mailmap file to map author, committer and tagger names
    ++       and email addresses to canonical real names and email addresses.
    ++       See linkgit:git-shortlog[1].
     +
      --textconv::
      	Show the content as transformed by a textconv filter. In this case,
    @@ builtin/cat-file.c: struct batch_options {
     +{
     +	struct strbuf sb = STRBUF_INIT;
     +	strbuf_attach(&sb, object_buf, *size, *size + 1);
    -+	rewrite_ident_line(&sb, "\nauthor ", &mailmap);
    -+	rewrite_ident_line(&sb, "\ncommitter ", &mailmap);
    -+	rewrite_ident_line(&sb, "\ntagger ", &mailmap);
    ++	const char *headers[] = { "author ", "committer ", "tagger ", NULL };
    ++	apply_mailmap_to_header(&sb, headers, &mailmap);
     +	*size = sb.len;
     +	return strbuf_detach(&sb, NULL);
     +}
    @@ t/t4203-mailmap.sh: test_expect_success SYMLINKS 'symlinks not respected in-tree
     +	author Orig <orig@example.com>
     +	EOF
     +	git cat-file --no-use-mailmap commit HEAD >log &&
    -+	grep author log >actual &&
    -+	sed -e "/^author/q" actual >log &&
    -+	sed -e "s/ [0-9][0-9]* [-+][0-9][0-9][0-9][0-9]$//" log >actual &&
    ++	sed -n "/^author /s/\([^>]*>\).*/\1/p" log >actual &&
     +	test_cmp expect actual
     +'
     +
    @@ t/t4203-mailmap.sh: test_expect_success SYMLINKS 'symlinks not respected in-tree
     +	author A U Thor <author@example.com>
     +	EOF
     +	git cat-file --use-mailmap commit HEAD >log &&
    -+	grep author log >actual &&
    -+	sed -e "/^author/q" actual >log &&
    -+	sed -e "s/ [0-9][0-9]* [-+][0-9][0-9][0-9][0-9]$//" log >actual &&
    ++	sed -n "/^author /s/\([^>]*>\).*/\1/p" log >actual &&
    ++	test_cmp expect actual
    ++'
    ++
    ++test_expect_success '--no-mailmap disables mailmap in cat-file for annotated tag objects' '
    ++	test_when_finished "rm .mailmap" &&
    ++	cat >.mailmap <<-EOF &&
    ++	Orig <orig@example.com> C O Mitter <committer@example.com>
    ++	EOF
    ++	cat >expect <<-EOF &&
    ++	tagger C O Mitter <committer@example.com>
    ++	EOF
    ++	git tag -a -m "annotated tag" v1 &&
    ++	git cat-file --no-mailmap -p v1 >log &&
    ++	sed -n "/^tagger /s/\([^>]*>\).*/\1/p" log >actual &&
    ++	test_cmp expect actual
    ++'
    ++
    ++test_expect_success '--mailmap enables mailmap in cat-file for annotated tag objects' '
    ++	test_when_finished "rm .mailmap" &&
    ++	cat >.mailmap <<-EOF &&
    ++	Orig <orig@example.com> C O Mitter <committer@example.com>
    ++	EOF
    ++	cat >expect <<-EOF &&
    ++	tagger Orig <orig@example.com>
    ++	EOF
    ++	git tag -a -m "annotated tag" v2 &&
    ++	git cat-file --mailmap -p v2 >log &&
    ++	sed -n "/^tagger /s/\([^>]*>\).*/\1/p" log >actual &&
     +	test_cmp expect actual
     +'
     +
-- 
2.37.0.6.ga6a61a26c1.dirty



* [PATCH v2 1/4] revision: improve commit_rewrite_person()
  2022-07-07 16:15 ` [PATCH v2 0/4] " Siddharth Asthana
@ 2022-07-07 16:15   ` Siddharth Asthana
  2022-07-07 21:52     ` Junio C Hamano
  2022-07-08 14:50     ` Đoàn Trần Công Danh
  2022-07-07 16:15   ` [PATCH v2 2/4] ident: move commit_rewrite_person() to ident.c Siddharth Asthana
                     ` (4 subsequent siblings)
  5 siblings, 2 replies; 68+ messages in thread
From: Siddharth Asthana @ 2022-07-07 16:15 UTC (permalink / raw)
  To: git; +Cc: Siddharth Asthana, Christian Couder, John Cai

The function, commit_rewrite_person(), is designed to find and replace
an ident string in the header part, and the way it avoids a random
occurrence of "author A U Thor <author@example.com>" in the text is by
insisting that "author" appear at the beginning of a line, which it
does by passing "\nauthor " as "what".

The implementation also doesn't make any effort to limit itself to the
commit header by locating the blank line that appears after the header
part and stopping the search there. Also, the interface forces the
caller to make multiple calls if it wants to rewrite idents on multiple
headers. That shouldn't be necessary.

To support the existing caller better, update commit_rewrite_person()
to:
- Make a single pass in the input buffer to locate headers named
  "author" and "committer" and replace idents on them.
- Stop at the end of the header, ensuring that nothing in the body of
  the commit object is modified.
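
The two points above can be sketched in isolation. The following is
only an illustration in plain C, not the code from this patch:
count_ident_headers() is a hypothetical helper that uses strncmp()
where the patch uses skip_prefix(), and it merely counts matching
header lines instead of rewriting them.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of git: walk the header part of an
 * object buffer once and count the lines that start with one of the
 * NULL-terminated "headers" prefixes. The walk stops at the blank
 * line that separates the header from the body, so nothing in the
 * body is ever looked at.
 */
static int count_ident_headers(const char *buf, const char **headers)
{
	const char *line = buf;
	int found = 0;

	/* An empty line (or the end of the buffer) ends the header. */
	while (*line && *line != '\n') {
		const char *eol = strchr(line, '\n');
		int i;

		for (i = 0; headers[i]; i++)
			if (!strncmp(line, headers[i], strlen(headers[i])))
				found++;

		if (!eol)
			break;
		line = eol + 1;
	}

	return found;
}
```

Called with the prefixes { "author ", "committer ", NULL } on a
buffer that has one author and one committer header followed by a
blank line and a body line that also starts with "author ", the
helper returns 2: the body line is never reached.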

The return type of the function commit_rewrite_person() has also been
changed from int to void. This has been done because the caller of the
function doesn't do anything with the return value of the function.

By simplifying the interface of commit_rewrite_person(), we also
intend to expose it as a public function. We will also rename the
function in a future commit to a name which clearly tells that
the function replaces idents in the header of the commit buffer.

Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: John Cai <johncai86@gmail.com>
Signed-off-by: Siddharth Asthana <siddharthasthana31@gmail.com>
---
 revision.c | 44 ++++++++++++++++++++++++++++++++++----------
 1 file changed, 34 insertions(+), 10 deletions(-)

diff --git a/revision.c b/revision.c
index 211352795c..83e68c1f97 100644
--- a/revision.c
+++ b/revision.c
@@ -3755,19 +3755,17 @@ int rewrite_parents(struct rev_info *revs, struct commit *commit,
 	return 0;
 }
 
-static int commit_rewrite_person(struct strbuf *buf, const char *what, struct string_list *mailmap)
+/*
+ * Returns the difference between the new and old length of the ident line.
+ */
+static ssize_t rewrite_ident_line(const char* person, struct strbuf *buf, struct string_list *mailmap)
 {
-	char *person, *endp;
+	char *endp;
 	size_t len, namelen, maillen;
 	const char *name;
 	const char *mail;
 	struct ident_split ident;
 
-	person = strstr(buf->buf, what);
-	if (!person)
-		return 0;
-
-	person += strlen(what);
 	endp = strchr(person, '\n');
 	if (!endp)
 		return 0;
@@ -3784,6 +3782,7 @@ static int commit_rewrite_person(struct strbuf *buf, const char *what, struct st
 
 	if (map_user(mailmap, &mail, &maillen, &name, &namelen)) {
 		struct strbuf namemail = STRBUF_INIT;
+		size_t newlen;
 
 		strbuf_addf(&namemail, "%.*s <%.*s>",
 			    (int)namelen, name, (int)maillen, mail);
@@ -3792,14 +3791,39 @@ static int commit_rewrite_person(struct strbuf *buf, const char *what, struct st
 			      ident.mail_end - ident.name_begin + 1,
 			      namemail.buf, namemail.len);
 
+		newlen = namemail.len;
+
 		strbuf_release(&namemail);
 
-		return 1;
+		return newlen - (ident.mail_end - ident.name_begin + 1);
 	}
 
 	return 0;
 }
 
+static void commit_rewrite_person(struct strbuf *buf, const char **headers, struct string_list *mailmap)
+{
+	size_t buf_offset = 0;
+
+	if (!mailmap)
+		return;
+
+	for (;;) {
+		const char *person, *line = buf->buf + buf_offset;
+		int i, linelen = strchrnul(line, '\n') - line + 1;
+
+		if (!linelen || linelen == 1)
+			/* End of header */
+			return;
+
+		buf_offset += linelen;
+
+		for (i = 0; headers[i]; i++)
+			if (skip_prefix(line, headers[i], &person))
+				buf_offset += rewrite_ident_line(person, buf, mailmap);
+	}
+}
+
 static int commit_match(struct commit *commit, struct rev_info *opt)
 {
 	int retval;
@@ -3835,8 +3859,8 @@ static int commit_match(struct commit *commit, struct rev_info *opt)
 		if (!buf.len)
 			strbuf_addstr(&buf, message);
 
-		commit_rewrite_person(&buf, "\nauthor ", opt->mailmap);
-		commit_rewrite_person(&buf, "\ncommitter ", opt->mailmap);
+		const char *commit_headers[] = { "author ", "committer ", NULL };
+		commit_rewrite_person(&buf, commit_headers, opt->mailmap);
 	}
 
 	/* Append "fake" message parts as needed */
-- 
2.37.0.6.ga6a61a26c1.dirty



* [PATCH v2 2/4] ident: move commit_rewrite_person() to ident.c
  2022-07-07 16:15 ` [PATCH v2 0/4] " Siddharth Asthana
  2022-07-07 16:15   ` [PATCH v2 1/4] revision: improve commit_rewrite_person() Siddharth Asthana
@ 2022-07-07 16:15   ` Siddharth Asthana
  2022-07-07 16:15   ` [PATCH v2 3/4] ident: rename commit_rewrite_person() to apply_mailmap_to_header() Siddharth Asthana
                     ` (3 subsequent siblings)
  5 siblings, 0 replies; 68+ messages in thread
From: Siddharth Asthana @ 2022-07-07 16:15 UTC (permalink / raw)
  To: git; +Cc: Siddharth Asthana, Christian Couder, John Cai

commit_rewrite_person() and rewrite_ident_line() are static functions
defined in revision.c.

Their usages are as follows:
- commit_rewrite_person() takes a commit buffer and replaces the author
  and committer idents with their canonical versions using the mailmap
  mechanism.
- rewrite_ident_line() takes author/committer header lines from the
  commit buffer and replaces the idents with their canonical versions
  using the mailmap mechanism.

This patch moves commit_rewrite_person() and rewrite_ident_line() to
ident.c which contains many other functions related to idents like
split_ident_line(). By moving commit_rewrite_person() to ident.c, we
also intend to use it in git-cat-file to replace committer and author
idents from the headers to their canonical versions using the mailmap
mechanism. The function is moved as is for now to make it clear that
there are no other changes, but it will be renamed in a following
commit.

Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: John Cai <johncai86@gmail.com>
Signed-off-by: Siddharth Asthana <siddharthasthana31@gmail.com>
---
 cache.h    |  6 +++++
 ident.c    | 69 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 revision.c | 69 ------------------------------------------------------
 3 files changed, 75 insertions(+), 69 deletions(-)

diff --git a/cache.h b/cache.h
index ac5ab4ef9d..c9dbe1c29a 100644
--- a/cache.h
+++ b/cache.h
@@ -1688,6 +1688,12 @@ struct ident_split {
  */
 int split_ident_line(struct ident_split *, const char *, int);
 
+/*
+ * Given a commit object buffer and the commit headers, replaces the idents
+ * in the headers with their canonical versions using the mailmap mechanism.
+ */
+void commit_rewrite_person(struct strbuf *buf, const char **commit_headers, struct string_list *mailmap);
+
 /*
  * Compare split idents for equality or strict ordering. Note that we
  * compare only the ident part of the line, ignoring any timestamp.
diff --git a/ident.c b/ident.c
index 89ca5b4700..26cc60b2e1 100644
--- a/ident.c
+++ b/ident.c
@@ -8,6 +8,7 @@
 #include "cache.h"
 #include "config.h"
 #include "date.h"
+#include "mailmap.h"
 
 static struct strbuf git_default_name = STRBUF_INIT;
 static struct strbuf git_default_email = STRBUF_INIT;
@@ -346,6 +347,74 @@ int split_ident_line(struct ident_split *split, const char *line, int len)
 	return 0;
 }
 
+/*
+ * Returns the difference between the new and old length of the ident line.
+ */
+static ssize_t rewrite_ident_line(const char* person, struct strbuf *buf, struct string_list *mailmap)
+{
+	char *endp;
+	size_t len, namelen, maillen;
+	const char *name;
+	const char *mail;
+	struct ident_split ident;
+
+	endp = strchr(person, '\n');
+	if (!endp)
+		return 0;
+
+	len = endp - person;
+
+	if (split_ident_line(&ident, person, len))
+		return 0;
+
+	mail = ident.mail_begin;
+	maillen = ident.mail_end - ident.mail_begin;
+	name = ident.name_begin;
+	namelen = ident.name_end - ident.name_begin;
+
+	if (map_user(mailmap, &mail, &maillen, &name, &namelen)) {
+		struct strbuf namemail = STRBUF_INIT;
+		size_t newlen;
+
+		strbuf_addf(&namemail, "%.*s <%.*s>",
+			    (int)namelen, name, (int)maillen, mail);
+
+		strbuf_splice(buf, ident.name_begin - buf->buf,
+			      ident.mail_end - ident.name_begin + 1,
+			      namemail.buf, namemail.len);
+
+		newlen = namemail.len;
+
+		strbuf_release(&namemail);
+
+		return newlen - (ident.mail_end - ident.name_begin + 1);
+	}
+
+	return 0;
+}
+
+void commit_rewrite_person(struct strbuf *buf, const char **headers, struct string_list *mailmap)
+{
+	size_t buf_offset = 0;
+
+	if (!mailmap)
+		return;
+
+	for (;;) {
+		const char *person, *line = buf->buf + buf_offset;
+		int i, linelen = strchrnul(line, '\n') - line + 1;
+
+		if (!linelen || linelen == 1)
+			/* End of header */
+			return;
+
+		buf_offset += linelen;
+
+		for (i = 0; headers[i]; i++)
+			if (skip_prefix(line, headers[i], &person))
+				buf_offset += rewrite_ident_line(person, buf, mailmap);
+	}
+}
 
 static void ident_env_hint(enum want_ident whose_ident)
 {
diff --git a/revision.c b/revision.c
index 83e68c1f97..49d15e74ff 100644
--- a/revision.c
+++ b/revision.c
@@ -3755,75 +3755,6 @@ int rewrite_parents(struct rev_info *revs, struct commit *commit,
 	return 0;
 }
 
-/*
- * Returns the difference between the new and old length of the ident line.
- */
-static ssize_t rewrite_ident_line(const char* person, struct strbuf *buf, struct string_list *mailmap)
-{
-	char *endp;
-	size_t len, namelen, maillen;
-	const char *name;
-	const char *mail;
-	struct ident_split ident;
-
-	endp = strchr(person, '\n');
-	if (!endp)
-		return 0;
-
-	len = endp - person;
-
-	if (split_ident_line(&ident, person, len))
-		return 0;
-
-	mail = ident.mail_begin;
-	maillen = ident.mail_end - ident.mail_begin;
-	name = ident.name_begin;
-	namelen = ident.name_end - ident.name_begin;
-
-	if (map_user(mailmap, &mail, &maillen, &name, &namelen)) {
-		struct strbuf namemail = STRBUF_INIT;
-		size_t newlen;
-
-		strbuf_addf(&namemail, "%.*s <%.*s>",
-			    (int)namelen, name, (int)maillen, mail);
-
-		strbuf_splice(buf, ident.name_begin - buf->buf,
-			      ident.mail_end - ident.name_begin + 1,
-			      namemail.buf, namemail.len);
-
-		newlen = namemail.len;
-
-		strbuf_release(&namemail);
-
-		return newlen - (ident.mail_end - ident.name_begin + 1);
-	}
-
-	return 0;
-}
-
-static void commit_rewrite_person(struct strbuf *buf, const char **headers, struct string_list *mailmap)
-{
-	size_t buf_offset = 0;
-
-	if (!mailmap)
-		return;
-
-	for (;;) {
-		const char *person, *line = buf->buf + buf_offset;
-		int i, linelen = strchrnul(line, '\n') - line + 1;
-
-		if (!linelen || linelen == 1)
-			/* End of header */
-			return;
-
-		buf_offset += linelen;
-
-		for (i = 0; headers[i]; i++)
-			if (skip_prefix(line, headers[i], &person))
-				buf_offset += rewrite_ident_line(person, buf, mailmap);
-	}
-}
-
 static int commit_match(struct commit *commit, struct rev_info *opt)
 {
 	int retval;
-- 
2.37.0.6.ga6a61a26c1.dirty



* [PATCH v2 3/4] ident: rename commit_rewrite_person() to apply_mailmap_to_header()
  2022-07-07 16:15 ` [PATCH v2 0/4] " Siddharth Asthana
  2022-07-07 16:15   ` [PATCH v2 1/4] revision: improve commit_rewrite_person() Siddharth Asthana
  2022-07-07 16:15   ` [PATCH v2 2/4] ident: move commit_rewrite_person() to ident.c Siddharth Asthana
@ 2022-07-07 16:15   ` Siddharth Asthana
  2022-07-07 16:15   ` [PATCH v2 4/4] cat-file: add mailmap support Siddharth Asthana
                     ` (2 subsequent siblings)
  5 siblings, 0 replies; 68+ messages in thread
From: Siddharth Asthana @ 2022-07-07 16:15 UTC (permalink / raw)
  To: git; +Cc: Siddharth Asthana, Christian Couder, John Cai

commit_rewrite_person() takes a commit buffer and replaces the idents
in the header with their canonical versions using the mailmap mechanism.
The name "commit_rewrite_person()" is misleading as it doesn't convey
what kind of rewrite we are going to do to the buffer. It also doesn't
clearly mention that the function will limit itself to the header part
of the buffer. The new name, "apply_mailmap_to_header()", expresses the
functionality of the function pretty clearly.

We intend to use apply_mailmap_to_header() in git-cat-file to replace
idents in the headers of commit and tag object buffers. So, we will be
extending this function to take a tag object buffer as well and replace
idents on the tagger header using the mailmap mechanism.

Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: John Cai <johncai86@gmail.com>
Signed-off-by: Siddharth Asthana <siddharthasthana31@gmail.com>
---
 cache.h    | 6 +++---
 ident.c    | 2 +-
 revision.c | 2 +-
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/cache.h b/cache.h
index c9dbe1c29a..9edb7fefd3 100644
--- a/cache.h
+++ b/cache.h
@@ -1689,10 +1689,10 @@ struct ident_split {
 int split_ident_line(struct ident_split *, const char *, int);
 
 /*
- * Given a commit object buffer and the commit headers, replaces the idents
- * in the headers with their canonical versions using the mailmap mechanism.
+ * Given a commit or tag object buffer and the commit or tag headers, replaces
+ * the idents in the headers with their canonical versions using the mailmap mechanism.
  */
-void commit_rewrite_person(struct strbuf *buf, const char **commit_headers, struct string_list *mailmap);
+void apply_mailmap_to_header(struct strbuf *buf, const char **headers, struct string_list *mailmap);
 
 /*
  * Compare split idents for equality or strict ordering. Note that we
diff --git a/ident.c b/ident.c
index 26cc60b2e1..8503098f29 100644
--- a/ident.c
+++ b/ident.c
@@ -393,7 +393,7 @@ static ssize_t rewrite_ident_line(const char* person, struct strbuf *buf, struct
 	return 0;
 }
 
-void commit_rewrite_person(struct strbuf *buf, const char **headers, struct string_list *mailmap)
+void apply_mailmap_to_header(struct strbuf *buf, const char **headers, struct string_list *mailmap)
 {
 	size_t buf_offset = 0;
 
diff --git a/revision.c b/revision.c
index 49d15e74ff..b561d6b5b5 100644
--- a/revision.c
+++ b/revision.c
@@ -3791,7 +3791,7 @@ static int commit_match(struct commit *commit, struct rev_info *opt)
 			strbuf_addstr(&buf, message);
 
 		const char *commit_headers[] = { "author ", "committer ", NULL };
-		commit_rewrite_person(&buf, commit_headers, opt->mailmap);
+		apply_mailmap_to_header(&buf, commit_headers, opt->mailmap);
 	}
 
 	/* Append "fake" message parts as needed */
-- 
2.37.0.6.ga6a61a26c1.dirty



* [PATCH v2 4/4] cat-file: add mailmap support
  2022-07-07 16:15 ` [PATCH v2 0/4] " Siddharth Asthana
                     ` (2 preceding siblings ...)
  2022-07-07 16:15   ` [PATCH v2 3/4] ident: rename commit_rewrite_person() to apply_mailmap_to_header() Siddharth Asthana
@ 2022-07-07 16:15   ` Siddharth Asthana
  2022-07-07 21:55     ` Junio C Hamano
  2022-07-08 11:53     ` Johannes Schindelin
  2022-07-07 22:06   ` [PATCH v2 0/4] Add support for mailmap in cat-file Junio C Hamano
  2022-07-09 15:41   ` [PATCH v3 " Siddharth Asthana
  5 siblings, 2 replies; 68+ messages in thread
From: Siddharth Asthana @ 2022-07-07 16:15 UTC (permalink / raw)
  To: git; +Cc: Siddharth Asthana, Christian Couder, John Cai, Phillip Wood

git-cat-file is used by tools like GitLab to get commit and tag contents
that are then displayed to users. This content, which has author,
committer or tagger information, could benefit from passing through the
mailmap mechanism before being sent or displayed.

This patch adds --[no-]use-mailmap command line option to the git
cat-file command. It also adds --[no-]mailmap option as an alias to
--[no-]use-mailmap.

Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: John Cai <johncai86@gmail.com>
Helped-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Siddharth Asthana <siddharthasthana31@gmail.com>
---
 Documentation/git-cat-file.txt |  6 ++++
 builtin/cat-file.c             | 31 ++++++++++++++++++-
 t/t4203-mailmap.sh             | 54 ++++++++++++++++++++++++++++++++++
 3 files changed, 90 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-cat-file.txt b/Documentation/git-cat-file.txt
index 24a811f0ef..1880e9bba1 100644
--- a/Documentation/git-cat-file.txt
+++ b/Documentation/git-cat-file.txt
@@ -63,6 +63,12 @@ OPTIONS
 	or to ask for a "blob" with `<object>` being a tag object that
 	points at it.
 
+--[no-]mailmap::
+--[no-]use-mailmap::
+       Use mailmap file to map author, committer and tagger names
+       and email addresses to canonical real names and email addresses.
+       See linkgit:git-shortlog[1].
+
 --textconv::
 	Show the content as transformed by a textconv filter. In this case,
 	`<object>` has to be of the form `<tree-ish>:<path>`, or `:<path>` in
diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 50cf38999d..6dc750a367 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -16,6 +16,7 @@
 #include "packfile.h"
 #include "object-store.h"
 #include "promisor-remote.h"
+#include "mailmap.h"
 
 enum batch_mode {
 	BATCH_MODE_CONTENTS,
@@ -36,6 +37,19 @@ struct batch_options {
 
 static const char *force_path;
 
+static struct string_list mailmap = STRING_LIST_INIT_NODUP;
+static int use_mailmap;
+
+char *replace_idents_using_mailmap(char *object_buf, size_t *size)
+{
+	struct strbuf sb = STRBUF_INIT;
+	strbuf_attach(&sb, object_buf, *size, *size + 1);
+	const char *headers[] = { "author ", "committer ", "tagger ", NULL };
+	apply_mailmap_to_header(&sb, headers, &mailmap);
+	*size = sb.len;
+	return strbuf_detach(&sb, NULL);
+}
+
 static int filter_object(const char *path, unsigned mode,
 			 const struct object_id *oid,
 			 char **buf, unsigned long *size)
@@ -152,6 +166,9 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name,
 		if (!buf)
 			die("Cannot read object %s", obj_name);
 
+		if (use_mailmap)
+			buf = replace_idents_using_mailmap(buf, &size);
+
 		/* otherwise just spit out the data */
 		break;
 
@@ -183,6 +200,9 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name,
 		}
 		buf = read_object_with_reference(the_repository, &oid,
 						 exp_type_id, &size, NULL);
+
+		if (use_mailmap)
+			buf = replace_idents_using_mailmap(buf, &size);
 		break;
 	}
 	default:
@@ -348,11 +368,15 @@ static void print_object_or_die(struct batch_options *opt, struct expand_data *d
 		void *contents;
 
 		contents = read_object_file(oid, &type, &size);
+
+		if (use_mailmap)
+			contents = replace_idents_using_mailmap(contents, &size);
+
 		if (!contents)
 			die("object %s disappeared", oid_to_hex(oid));
 		if (type != data->type)
 			die("object %s changed type!?", oid_to_hex(oid));
-		if (data->info.sizep && size != data->size)
+		if (data->info.sizep && size != data->size && !use_mailmap)
 			die("object %s changed size!?", oid_to_hex(oid));
 
 		batch_write(opt, contents, size);
@@ -843,6 +867,8 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
 		OPT_CMDMODE('s', NULL, &opt, N_("show object size"), 's'),
 		OPT_BOOL(0, "allow-unknown-type", &unknown_type,
 			  N_("allow -s and -t to work with broken/corrupt objects")),
+		OPT_BOOL(0, "use-mailmap", &use_mailmap, N_("use mail map file")),
+		OPT_ALIAS(0, "mailmap", "use-mailmap"),
 		/* Batch mode */
 		OPT_GROUP(N_("Batch objects requested on stdin (or --batch-all-objects)")),
 		OPT_CALLBACK_F(0, "batch", &batch, N_("format"),
@@ -885,6 +911,9 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
 	opt_cw = (opt == 'c' || opt == 'w');
 	opt_epts = (opt == 'e' || opt == 'p' || opt == 't' || opt == 's');
 
+	if (use_mailmap)
+		read_mailmap(&mailmap);
+
 	/* --batch-all-objects? */
 	if (opt == 'b')
 		batch.all_objects = 1;
diff --git a/t/t4203-mailmap.sh b/t/t4203-mailmap.sh
index 0b2d21ec55..c60a90615c 100755
--- a/t/t4203-mailmap.sh
+++ b/t/t4203-mailmap.sh
@@ -963,4 +963,58 @@ test_expect_success SYMLINKS 'symlinks not respected in-tree' '
 	test_cmp expect actual
 '
 
+test_expect_success '--no-use-mailmap disables mailmap in cat-file' '
+	test_when_finished "rm .mailmap" &&
+	cat >.mailmap <<-EOF &&
+	A U Thor <author@example.com> Orig <orig@example.com>
+	EOF
+	cat >expect <<-EOF &&
+	author Orig <orig@example.com>
+	EOF
+	git cat-file --no-use-mailmap commit HEAD >log &&
+	sed -n "/^author /s/\([^>]*>\).*/\1/p" log >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success '--use-mailmap enables mailmap in cat-file' '
+	test_when_finished "rm .mailmap" &&
+	cat >.mailmap <<-EOF &&
+	A U Thor <author@example.com> Orig <orig@example.com>
+	EOF
+	cat >expect <<-EOF &&
+	author A U Thor <author@example.com>
+	EOF
+	git cat-file --use-mailmap commit HEAD >log &&
+	sed -n "/^author /s/\([^>]*>\).*/\1/p" log >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success '--no-mailmap disables mailmap in cat-file for annotated tag objects' '
+	test_when_finished "rm .mailmap" &&
+	cat >.mailmap <<-EOF &&
+	Orig <orig@example.com> C O Mitter <committer@example.com>
+	EOF
+	cat >expect <<-EOF &&
+	tagger C O Mitter <committer@example.com>
+	EOF
+	git tag -a -m "annotated tag" v1 &&
+	git cat-file --no-mailmap -p v1 >log &&
+	sed -n "/^tagger /s/\([^>]*>\).*/\1/p" log >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success '--mailmap enables mailmap in cat-file for annotated tag objects' '
+	test_when_finished "rm .mailmap" &&
+	cat >.mailmap <<-EOF &&
+	Orig <orig@example.com> C O Mitter <committer@example.com>
+	EOF
+	cat >expect <<-EOF &&
+	tagger Orig <orig@example.com>
+	EOF
+	git tag -a -m "annotated tag" v2 &&
+	git cat-file --mailmap -p v2 >log &&
+	sed -n "/^tagger /s/\([^>]*>\).*/\1/p" log >actual &&
+	test_cmp expect actual
+'
+
 test_done
-- 
2.37.0.6.ga6a61a26c1.dirty



* Re: [PATCH v2 1/4] revision: improve commit_rewrite_person()
  2022-07-07 16:15   ` [PATCH v2 1/4] revision: improve commit_rewrite_person() Siddharth Asthana
@ 2022-07-07 21:52     ` Junio C Hamano
  2022-07-08 14:50     ` Đoàn Trần Công Danh
  1 sibling, 0 replies; 68+ messages in thread
From: Junio C Hamano @ 2022-07-07 21:52 UTC (permalink / raw)
  To: Siddharth Asthana; +Cc: git, Christian Couder, John Cai

Siddharth Asthana <siddharthasthana31@gmail.com> writes:

> @@ -3835,8 +3859,8 @@ static int commit_match(struct commit *commit, struct rev_info *opt)
>  		if (!buf.len)
>  			strbuf_addstr(&buf, message);
>  
> -		commit_rewrite_person(&buf, "\nauthor ", opt->mailmap);
> -		commit_rewrite_person(&buf, "\ncommitter ", opt->mailmap);
> +		const char *commit_headers[] = { "author ", "committer ", NULL };

This is a decl-after-statement, which our codebase avoids.

> +		commit_rewrite_person(&buf, commit_headers, opt->mailmap);
>  	}
>  
>  	/* Append "fake" message parts as needed */
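
For illustration only, the usual way to avoid this is to place every
declaration at the top of its block, before the first statement. The
sketch below is a stand-alone function, not the code from the patch;
sum_prefix_lengths() is a made-up name.

```c
#include <stddef.h>
#include <string.h>

/*
 * Illustrative only: a stand-alone function, not code from the patch.
 * Every declaration sits at the top of its block, before the first
 * statement, so -Wdeclaration-after-statement has nothing to flag.
 */
static size_t sum_prefix_lengths(void)
{
	const char *headers[] = { "author ", "committer ", NULL };
	size_t total = 0;
	int i;

	for (i = 0; headers[i]; i++)
		total += strlen(headers[i]);

	return total;
}
```

Applied to the hunk quoted above, the same idea would presumably mean
declaring commit_headers at the start of the enclosing block, but how
the patch is actually reworked is up to its author.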


* Re: [PATCH v2 4/4] cat-file: add mailmap support
  2022-07-07 16:15   ` [PATCH v2 4/4] cat-file: add mailmap support Siddharth Asthana
@ 2022-07-07 21:55     ` Junio C Hamano
  2022-07-08 11:53     ` Johannes Schindelin
  1 sibling, 0 replies; 68+ messages in thread
From: Junio C Hamano @ 2022-07-07 21:55 UTC (permalink / raw)
  To: Siddharth Asthana; +Cc: git, Christian Couder, John Cai, Phillip Wood

Siddharth Asthana <siddharthasthana31@gmail.com> writes:

> +char *replace_idents_using_mailmap(char *object_buf, size_t *size)

Does this function need to be extern?  If nobody other than callers
in cat-file would call it, perhaps it should be file-scope static.

> +{
> +	struct strbuf sb = STRBUF_INIT;
> +	strbuf_attach(&sb, object_buf, *size, *size + 1);
> +	const char *headers[] = { "author ", "committer ", "tagger ", NULL };

This is decl-after-statement.

> +	apply_mailmap_to_header(&sb, headers, &mailmap);
> +	*size = sb.len;
> +	return strbuf_detach(&sb, NULL);
> +}



* Re: [PATCH v2 0/4] Add support for mailmap in cat-file
  2022-07-07 16:15 ` [PATCH v2 0/4] " Siddharth Asthana
                     ` (3 preceding siblings ...)
  2022-07-07 16:15   ` [PATCH v2 4/4] cat-file: add mailmap support Siddharth Asthana
@ 2022-07-07 22:06   ` Junio C Hamano
  2022-07-07 22:58     ` Junio C Hamano
  2022-07-09 15:41   ` [PATCH v3 " Siddharth Asthana
  5 siblings, 1 reply; 68+ messages in thread
From: Junio C Hamano @ 2022-07-07 22:06 UTC (permalink / raw)
  To: Siddharth Asthana
  Cc: git, phillip.wood, avarab, congdanhqx, christian.couder,
	johncai86

Siddharth Asthana <siddharthasthana31@gmail.com> writes:

> Changes in v2:
> - The commit_rewrite_person() has been improved by restricting it to
>   traverse only the header part of the object buffers.
> - The callers of commit_rewrite_person() now don't require to call it
>   multiple times for different headers. They can pass an array of
>   headers and commit_rewrite_person() replaces idents only on those
>   headers.
> - commit_rewrite_person() has been renamed to a suitable name which
>   expresses its functionality clearly.
> - More tests have been added to test the --[no-]use-mailmap option for
>   the tag objects.
> - Redundant operations from the tests have been removed.

I agree with the general direction and the implementation strategy.
I've noticed a few decl-after-statement and also at least one public
helper function that does not need to be public.  Are you building
with "make DEVELOPER=YesPlease"?  It enables -pedantic and -Werror,
-Wdeclaration-after-statement, among other options (see the
config.mak.dev file for the complete list) to help you catch these
locally before sending your patches to the list.
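
As a rough illustration of what that option catches (a standalone
sketch, not code from the series; count_ident_headers() is a made-up
name and the strstr() matching is deliberately naive), declarations go
at the top of the block, before the first statement:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, for illustration only: count how many of the
 * ident header prefixes occur anywhere in buf.
 */
static size_t count_ident_headers(const char *buf)
{
	/* Declarations first ... */
	const char *headers[] = { "author ", "committer ", "tagger ", NULL };
	size_t i;
	size_t n = 0;

	/* ... statements after. */
	assert(buf != NULL);
	for (i = 0; headers[i]; i++)
		if (strstr(buf, headers[i]))
			n++;
	return n;
}
```

Moving the headers[] declaration below the assert() call would be a
declaration after a statement, which -Wdeclaration-after-statement
reports and -Werror turns into a build failure.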

Thanks.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 0/4] Add support for mailmap in cat-file
  2022-07-07 22:06   ` [PATCH v2 0/4] Add support for mailmap in cat-file Junio C Hamano
@ 2022-07-07 22:58     ` Junio C Hamano
  0 siblings, 0 replies; 68+ messages in thread
From: Junio C Hamano @ 2022-07-07 22:58 UTC (permalink / raw)
  To: Siddharth Asthana
  Cc: git, phillip.wood, avarab, congdanhqx, christian.couder,
	johncai86

Junio C Hamano <gitster@pobox.com> writes:

> Siddharth Asthana <siddharthasthana31@gmail.com> writes:
>
>> Changes in v2:
>> - The commit_rewrite_person() has been improved by restricting it to
>>   traverse only the header part of the object buffers.
>> - The callers of commit_rewrite_person() now don't require to call it
>>   multiple times for different headers. They can pass an array of
>>   headers and commit_rewrite_person() replaces idents only on those
>>   headers.
>> - commit_rewrite_person() has been renamed to a suitable name which
>>   expresses its functionality clearly.
>> - More tests have been added to test the --[no-]use-mailmap option for
>>   the tag objects.
>> - Redundant operations from the tests have been removed.
>
> I agree with the general direction and the implementation strategy.
> I've noticed a few decl-after-statement and also at least one public
> helper function that does not need to be public.  Are you building
> with "make DEVELOPER=YesPlease"?  It enables -pedantic and -Werror,
> -Wdeclaration-after-statement, among other options (see the
> config.mak.dev file for the complete list) to help you catch these
> locally before sending your patches to the list.

Here is what I prepared on top of your series to make them compile
while queuing them on a topic branch.

 builtin/cat-file.c | 7 +++++--
 revision.c         | 4 ++--
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 6dc750a367..4ca024a018 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -40,11 +40,14 @@ static const char *force_path;
 static struct string_list mailmap = STRING_LIST_INIT_NODUP;
 static int use_mailmap;
 
-char *replace_idents_using_mailmap(char *object_buf, size_t *size)
+static char *replace_idents_using_mailmap(char *, size_t *);
+
+static char *replace_idents_using_mailmap(char *object_buf, size_t *size)
 {
 	struct strbuf sb = STRBUF_INIT;
-	strbuf_attach(&sb, object_buf, *size, *size + 1);
 	const char *headers[] = { "author ", "committer ", "tagger ", NULL };
+
+	strbuf_attach(&sb, object_buf, *size, *size + 1);
 	apply_mailmap_to_header(&sb, headers, &mailmap);
 	*size = sb.len;
 	return strbuf_detach(&sb, NULL);
diff --git a/revision.c b/revision.c
index b561d6b5b5..767c6225df 100644
--- a/revision.c
+++ b/revision.c
@@ -3787,10 +3787,10 @@ static int commit_match(struct commit *commit, struct rev_info *opt)
 		strbuf_addstr(&buf, message);
 
 	if (opt->grep_filter.header_list && opt->mailmap) {
+		const char *commit_headers[] = { "author ", "committer ", NULL };
+
 		if (!buf.len)
 			strbuf_addstr(&buf, message);
-
-		const char *commit_headers[] = { "author ", "committer ", NULL };
 		apply_mailmap_to_header(&buf, commit_headers, opt->mailmap);
 	}
 
-- 
2.37.0-211-gafcdf5f063


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 4/4] cat-file: add mailmap support
  2022-07-07 16:15   ` [PATCH v2 4/4] cat-file: add mailmap support Siddharth Asthana
  2022-07-07 21:55     ` Junio C Hamano
@ 2022-07-08 11:53     ` Johannes Schindelin
  1 sibling, 0 replies; 68+ messages in thread
From: Johannes Schindelin @ 2022-07-08 11:53 UTC (permalink / raw)
  To: Siddharth Asthana; +Cc: git, Christian Couder, John Cai, Phillip Wood

Hi Siddharth,

On Thu, 7 Jul 2022, Siddharth Asthana wrote:

> diff --git a/builtin/cat-file.c b/builtin/cat-file.c
> index 50cf38999d..6dc750a367 100644
> --- a/builtin/cat-file.c
> +++ b/builtin/cat-file.c
> @@ -36,6 +37,19 @@ struct batch_options {
>
>  static const char *force_path;
>
> +static struct string_list mailmap = STRING_LIST_INIT_NODUP;
> +static int use_mailmap;
> +
> +char *replace_idents_using_mailmap(char *object_buf, size_t *size)

Here, we declare the `size` parameter as a pointer to a `size_t`.

> +{
> +	struct strbuf sb = STRBUF_INIT;
> +	strbuf_attach(&sb, object_buf, *size, *size + 1);
> +	const char *headers[] = { "author ", "committer ", "tagger ", NULL };
> +	apply_mailmap_to_header(&sb, headers, &mailmap);
> +	*size = sb.len;
> +	return strbuf_detach(&sb, NULL);
> +}
> +
>  static int filter_object(const char *path, unsigned mode,
>  			 const struct object_id *oid,
>  			 char **buf, unsigned long *size)
> @@ -152,6 +166,9 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name,
>  		if (!buf)
>  			die("Cannot read object %s", obj_name);
>
> +		if (use_mailmap)
> +			buf = replace_idents_using_mailmap(buf, &size);

But here, we are once more bitten by Git's usage of last century's data
types: the `size` variable is of type `unsigned long`.

Now, you are probably developing this patch on 64-bit Linux or macOS,
where it just so happens that `size_t` is equivalent to `unsigned long`.

But that is not the case on 32-bit Linux nor on Windows, and therefore the
build fails with this patch. I need this to get the build to pass:

-- snipsnap --
From 237c783705b30ed4bcce81aeb860dc7e152fc8bf Mon Sep 17 00:00:00 2001
From: Johannes Schindelin <johannes.schindelin@gmx.de>
Date: Fri, 8 Jul 2022 13:47:52 +0200
Subject: [PATCH] fixup??? cat-file: add mailmap support

This is needed whenever `unsigned long` is different from `size_t`, e.g.
on 32-bit Linux and on Windows.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 builtin/cat-file.c | 27 +++++++++++++++++++++------
 1 file changed, 21 insertions(+), 6 deletions(-)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index ac852087a74..baa6aca53ce 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -185,8 +185,13 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name,
 		if (!buf)
 			die("Cannot read object %s", obj_name);

-		if (use_mailmap)
-			buf = replace_idents_using_mailmap(buf, &size);
+		if (use_mailmap) {
+			size_t s = size;
+
+			buf = replace_idents_using_mailmap(buf, &s);
+
+			size = cast_size_t_to_ulong(s);
+		}

 		/* otherwise just spit out the data */
 		break;
@@ -222,8 +227,13 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name,
 		buf = read_object_with_reference(the_repository, &oid,
 						 exp_type_id, &size, NULL);

-		if (use_mailmap)
-			buf = replace_idents_using_mailmap(buf, &size);
+		if (use_mailmap) {
+			size_t s = size;
+
+			buf = replace_idents_using_mailmap(buf, &s);
+
+			size = cast_size_t_to_ulong(s);
+		}
 		break;
 	}
 	default:
@@ -392,8 +402,13 @@ static void print_object_or_die(struct batch_options *opt, struct expand_data *d

 		contents = read_object_file(oid, &type, &size);

-		if (use_mailmap)
-			contents = replace_idents_using_mailmap(contents, &size);
+		if (use_mailmap) {
+			size_t s = size;
+
+			contents = replace_idents_using_mailmap(contents, &s);
+
+			size = cast_size_t_to_ulong(s);
+		}

 		if (!contents)
 			die("object %s disappeared", oid_to_hex(oid));
--
2.37.0.windows.1


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 1/4] revision: improve commit_rewrite_person()
  2022-07-07 16:15   ` [PATCH v2 1/4] revision: improve commit_rewrite_person() Siddharth Asthana
  2022-07-07 21:52     ` Junio C Hamano
@ 2022-07-08 14:50     ` Đoàn Trần Công Danh
       [not found]       ` <CAP8UFD116xMnp27pxW8WNDf6PRJxnnwWtcy2TNHU_KyV2ZVA1g@mail.gmail.com>
  1 sibling, 1 reply; 68+ messages in thread
From: Đoàn Trần Công Danh @ 2022-07-08 14:50 UTC (permalink / raw)
  To: Siddharth Asthana; +Cc: git, Christian Couder, John Cai

On 2022-07-07 21:45:51+0530, Siddharth Asthana <siddharthasthana31@gmail.com> wrote:
> The function, commit_rewrite_person(), is designed to find and replace
> an ident string in the header part, and the way it avoids a random
> occuranace of "author A U Thor <author@example.com" in the text is by

s/occuranace/occurrence/

> insisting "author" to appear at the beginning of line by passing
> "\nauthor " as "what".
> 
> The implementation also doesn't make any effort to limit itself to the
> commit header by locating the blank line that appears after the header
> part and stopping the search there. Also, the interface forces the
> caller to make multiple calls if it wants to rewrite idents on multiple
> headers. It shouldn't be the case.
> 
> To support the existing caller better, update commit_rewrite_person()
> to:
> - Make a single pass in the input buffer to locate headers named
>   "author" and "committer" and replace idents on them.
> - Stop at the end of the header, ensuring that nothing in the body of
>   the commit object is modified.
> 
> The return type of the function commit_rewrite_person() has also been
> changed from int to void. This has been done because the caller of the
> function doesn't do anything with the return value of the function.
> 
> By simplyfying the interface of the commit_rewrite_person(), we also

s/simplyfying/simplifying/

> intend to expose it as a public function. We will also be renaming the
> function in a future commit to a different name which clearly tells that
> the function replaces idents in the header of the commit buffer.
> 
> Mentored-by: Christian Couder <christian.couder@gmail.com>
> Mentored-by: John Cai <johncai86@gmail.com>
> Signed-off-by: Siddharth Asthana <siddharthasthana31@gmail.com>
> ---
>  revision.c | 44 ++++++++++++++++++++++++++++++++++----------
>  1 file changed, 34 insertions(+), 10 deletions(-)
> 
> diff --git a/revision.c b/revision.c
> index 211352795c..83e68c1f97 100644
> --- a/revision.c
> +++ b/revision.c
> @@ -3755,19 +3755,17 @@ int rewrite_parents(struct rev_info *revs, struct commit *commit,
>  	return 0;
>  }
>  
> -static int commit_rewrite_person(struct strbuf *buf, const char *what, struct string_list *mailmap)
> +/*
> + * Returns the difference between the new and old length of the ident line.
> + */
> +static ssize_t rewrite_ident_line(const char* person, struct strbuf *buf, struct string_list *mailmap)
>  {
> -	char *person, *endp;
> +	char *endp;
>  	size_t len, namelen, maillen;
>  	const char *name;
>  	const char *mail;
>  	struct ident_split ident;
>  
> -	person = strstr(buf->buf, what);
> -	if (!person)
> -		return 0;
> -
> -	person += strlen(what);
>  	endp = strchr(person, '\n');
>  	if (!endp)
>  		return 0;
> @@ -3784,6 +3782,7 @@ static int commit_rewrite_person(struct strbuf *buf, const char *what, struct st
>  
>  	if (map_user(mailmap, &mail, &maillen, &name, &namelen)) {
>  		struct strbuf namemail = STRBUF_INIT;
> +		size_t newlen;
>  
>  		strbuf_addf(&namemail, "%.*s <%.*s>",
>  			    (int)namelen, name, (int)maillen, mail);
> @@ -3792,14 +3791,39 @@ static int commit_rewrite_person(struct strbuf *buf, const char *what, struct st
>  			      ident.mail_end - ident.name_begin + 1,
>  			      namemail.buf, namemail.len);
>  
> +		newlen = namemail.len;
> +
>  		strbuf_release(&namemail);
>  
> -		return 1;
> +		return newlen - (ident.mail_end - ident.name_begin + 1);
>  	}
>  
>  	return 0;
>  }
>  
> +static void commit_rewrite_person(struct strbuf *buf, const char **headers, struct string_list *mailmap)
> +{
> +	size_t buf_offset = 0;
> +
> +	if (!mailmap)
> +		return;
> +
> +	for (;;) {
> +		const char *person, *line = buf->buf + buf_offset;
> +		int i, linelen = strchrnul(line, '\n') - line + 1;

Would you mind changing those lines to avoid mixing declarations and
expressions? Also, I think i and linelen should be ssize_t instead.
Something like:

		const char *person, *line;
		ssize_t i, linelen;

		line = buf->buf + buf_offset;
		linelen = strchrnul(line, '\n') - line + 1;

> +
> +		if (!linelen || linelen == 1)
> +			/* End of header */
> +			return;

And I think linelen will never be 0 or negative. Even if linelen
could be 0, I think we want an explicit "linelen == 0" for integer
comparison.

> +
> +		buf_offset += linelen;
> +
> +		for (i = 0; headers[i]; i++)
> +			if (skip_prefix(line, headers[i], &person))
> +				buf_offset += rewrite_ident_line(person, buf, mailmap);
> +	}
> +}
> +
>  static int commit_match(struct commit *commit, struct rev_info *opt)
>  {
>  	int retval;
> @@ -3835,8 +3859,8 @@ static int commit_match(struct commit *commit, struct rev_info *opt)
>  		if (!buf.len)
>  			strbuf_addstr(&buf, message);
>  
> -		commit_rewrite_person(&buf, "\nauthor ", opt->mailmap);
> -		commit_rewrite_person(&buf, "\ncommitter ", opt->mailmap);
> +		const char *commit_headers[] = { "author ", "committer ", NULL };
> +		commit_rewrite_person(&buf, commit_headers, opt->mailmap);
>  	}
>  
>  	/* Append "fake" message parts as needed */
> -- 
> 2.37.0.6.ga6a61a26c1.dirty
> 

-- 
Danh

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 1/4] revision: improve commit_rewrite_person()
       [not found]       ` <CAP8UFD116xMnp27pxW8WNDf6PRJxnnwWtcy2TNHU_KyV2ZVA1g@mail.gmail.com>
@ 2022-07-09  1:02         ` Đoàn Trần Công Danh
  2022-07-09  5:04           ` Christian Couder
  0 siblings, 1 reply; 68+ messages in thread
From: Đoàn Trần Công Danh @ 2022-07-09  1:02 UTC (permalink / raw)
  To: Christian Couder, siddharthasthana31; +Cc: siddharthasthana31, git

Add list back to cc

On 2022-07-08 23:23:07+0200, Christian Couder <christian.couder@gmail.com> wrote:
> On Fri, Jul 8, 2022 at 4:50 PM Đoàn Trần Công Danh <congdanhqx@gmail.com> wrote:
> >
> > On 2022-07-07 21:45:51+0530, Siddharth Asthana <siddharthasthana31@gmail.com> wrote:
>
> > > By simplyfying the interface of the commit_rewrite_person(), we also
> >
> > s/simplyfying/simplifying/
>
> Thanks for noticing the typos!
>
> > > +static void commit_rewrite_person(struct strbuf *buf, const char **headers, struct string_list *mailmap)
> > > +{
> > > +     size_t buf_offset = 0;
> > > +
> > > +     if (!mailmap)
> > > +             return;
> > > +
> > > +     for (;;) {
> > > +             const char *person, *line = buf->buf + buf_offset;
> > > +             int i, linelen = strchrnul(line, '\n') - line + 1;
> >
> > Would you mind changing those lines to avoid mixing declarations and
> > expressions?
>
> I am not sure we have some clear guidelines on this.

Yes, we don't have a clear guideline on this, but this surely counts
as mixing declaration and expression. Also, some variables are
initialized and some aren't on the same line. I was confused at
first glance.

>
> > Also, I think i and linelen should be ssize_t instead.
>
> Could you explain why?
>
> I think 'i' is changed only in:
>
> for (i = 0; headers[i]; i++)
>
> and therefore cannot be negative.
>
> While linelen is set only in:
>
> linelen = strchrnul(line, '\n') - line + 1;
>
> and therefore cannot be negative either.


Yes, neither of them can be negative, as I explained in the part you
removed.  However, I chose ssize_t in my reply because the value is
a pointer difference (a ptrdiff_t).

So size_t is obviously a better choice.
Either size_t or ssize_t could be used in this case, but not int.


>>> +
>>> +             if (!linelen || linelen == 1)
>>> +                     /* End of header */
>>> +                     return;
>>
>>And I think linelen will never be 0 or negative. Even if linelen
>>could be 0, I think we want an explicit "linelen == 0" for integer
>>comparison.


-- 
Danh

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 1/4] revision: improve commit_rewrite_person()
  2022-07-09  1:02         ` Đoàn Trần Công Danh
@ 2022-07-09  5:04           ` Christian Couder
  0 siblings, 0 replies; 68+ messages in thread
From: Christian Couder @ 2022-07-09  5:04 UTC (permalink / raw)
  To: Đoàn Trần Công Danh; +Cc: Siddharth Asthana, git

On Sat, Jul 9, 2022 at 3:02 AM Đoàn Trần Công Danh <congdanhqx@gmail.com> wrote:
>
> Add list back to cc

Sorry for not keeping the list in Cc by mistake.

> On 2022-07-08 23:23:07+0200, Christian Couder <christian.couder@gmail.com> wrote:
> > On Fri, Jul 8, 2022 at 4:50 PM Đoàn Trần Công Danh <congdanhqx@gmail.com> wrote:
> > >
> > > On 2022-07-07 21:45:51+0530, Siddharth Asthana <siddharthasthana31@gmail.com> wrote:
> >
> > > > By simplyfying the interface of the commit_rewrite_person(), we also
> > >
> > > s/simplyfying/simplifying/
> >
> > Thanks for noticing the typos!
> >
> > > > +static void commit_rewrite_person(struct strbuf *buf, const char **headers, struct string_list *mailmap)
> > > > +{
> > > > +     size_t buf_offset = 0;
> > > > +
> > > > +     if (!mailmap)
> > > > +             return;
> > > > +
> > > > +     for (;;) {
> > > > +             const char *person, *line = buf->buf + buf_offset;
> > > > +             int i, linelen = strchrnul(line, '\n') - line + 1;
> > >
> > > Would you mind changing those lines to avoid mixing declarations and
> > > expressions?
> >
> > I am not sure we have some clear guidelines on this.
>
> Yes, we don't have a clear guideline on this, but this surely counts
> as mixing declaration and expression. Also, some variables are
> initialized and some aren't on the same line. I was confused at
> first glance.

Yeah, it might be clearer to avoid having some variables both declared
and initialized while others are only declared on the same line.

> > > Also, I think i and linelen should be ssize_t instead.
> >
> > Could you explain why?
> >
> > I think 'i' is changed only in:
> >
> > for (i = 0; headers[i]; i++)
> >
> > and therefore cannot be negative.
> >
> > While linelen is set only in:
> >
> > linelen = strchrnul(line, '\n') - line + 1;
> >
> > and therefore cannot be negative either.
>
> Yes, neither of them can be negative, as I explained in the part you
> removed.  However, I chose ssize_t in my reply because the value is
> a pointer difference (a ptrdiff_t).
>
> So size_t is obviously a better choice.
> Either size_t or ssize_t could be used in this case, but not int.

I am Ok with size_t. Thanks for the explanations!

> >>> +
> >>> +             if (!linelen || linelen == 1)
> >>> +                     /* End of header */
> >>> +                     return;
> >>
> >>And I think linelen will never be 0 or negative. Even if linelen
> >>could be 0, I think we want an explicit "linelen == 0" for integer
> >>comparison.

Maybe `if (linelen <= 1)` is just a simpler check.
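
For what it's worth, here is a small standalone sketch of that check.
It is not the code from the patch: count_header_lines() is a made-up
name, and plain strchr() is used so that the example does not depend
on strchrnul().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Illustration only: count the lines in the header part of an object
 * buffer, stopping at the blank line that ends the header or at the
 * end of the buffer, whichever comes first.
 */
static size_t count_header_lines(const char *buf)
{
	size_t offset = 0;
	size_t count = 0;

	assert(buf != NULL);

	for (;;) {
		const char *line = buf + offset;
		const char *eol = strchr(line, '\n');
		size_t linelen;

		/* Line length, including the newline when there is one. */
		linelen = eol ? (size_t)(eol - line) + 1 : strlen(line);

		/*
		 * 1 is the blank separator line ("\n"), 0 is the end of
		 * the buffer. Real header lines are always longer.
		 */
		if (linelen <= 1)
			return count;

		count++;
		offset += linelen;
	}
}
```

With size_t for the length there is nothing to say about negative
values, and the single comparison covers both ways the header can end.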

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [PATCH v3 0/4] Add support for mailmap in cat-file
  2022-07-07 16:15 ` [PATCH v2 0/4] " Siddharth Asthana
                     ` (4 preceding siblings ...)
  2022-07-07 22:06   ` [PATCH v2 0/4] Add support for mailmap in cat-file Junio C Hamano
@ 2022-07-09 15:41   ` Siddharth Asthana
  2022-07-09 15:41     ` [PATCH v3 1/4] revision: improve commit_rewrite_person() Siddharth Asthana
                       ` (5 more replies)
  5 siblings, 6 replies; 68+ messages in thread
From: Siddharth Asthana @ 2022-07-09 15:41 UTC (permalink / raw)
  To: git
  Cc: phillip.wood123, congdanhqx, christian.couder, avarab, gitster,
	Johannes.Schindelin, johncai86, Siddharth Asthana

Thanks a lot for the review and suggestions Junio, Danh and Johannes.
Really grateful for that :)

= Description

This patch series adds mailmap support to the git-cat-file command. It
adds the mailmap support only for the commit and tag objects by
replacing the idents for "author", "committer" and "tagger" headers. The
mailmap only takes effect when the --use-mailmap option (or its alias,
--mailmap) is passed to the git cat-file command. The changes will work
with the batch mode as well.

So, if one wants to enable mailmap they can use either of the following
commands:
$ git cat-file --use-mailmap -p <object>
$ git cat-file --use-mailmap <type> <object>

To use it in the batch mode, one can use the following command:
$ git cat-file --use-mailmap --batch

= Patch Organization

- The first patch improves commit_rewrite_person() by restricting it
  to traverse only the header part of the commit object buffer.
  It also adds an argument called headers which the callers can pass.
  The function will replace idents only on the passed headers.
  Thus, the caller won't have to make repeated calls to the function.
- The second patch moves commit_rewrite_person() to ident.c to expose it
  as a public function so that it can be used to replace idents in the
  headers of desired objects.
- The third patch renames commit_rewrite_person() to a name which
  describes its functionality clearly. It is renamed to
  apply_mailmap_to_header().
- The last patch adds mailmap support to the git cat-file command. It
  adds the required documentation and tests as well.
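
To give a rough idea of the single-pass approach described in the first
item, here is a simplified, standalone sketch. It is not the code from
the series: rewrite_header_values() is a hypothetical name, it swaps one
fixed string for another instead of consulting a mailmap, and it works
on a plain char buffer rather than a strbuf.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Illustration only. On every header line that starts with one of the
 * NULL-terminated `headers` prefixes and whose value is exactly `from`,
 * replace the value with `to`. Stops at the blank line ending the
 * header, so the body is never touched. `buf` is NUL-terminated and has
 * `cap` bytes of storage. Returns 0 on success, -1 if `to` does not fit.
 */
static int rewrite_header_values(char *buf, size_t cap, const char **headers,
				 const char *from, const char *to)
{
	size_t fromlen = strlen(from);
	size_t tolen = strlen(to);
	size_t offset = 0;

	assert(buf && headers);

	for (;;) {
		char *line = buf + offset;
		char *eol = strchr(line, '\n');
		size_t linelen = eol ? (size_t)(eol - line) + 1 : strlen(line);
		size_t i;

		if (linelen <= 1)
			return 0; /* blank line or end of buffer */

		for (i = 0; headers[i]; i++) {
			size_t preflen = strlen(headers[i]);
			size_t vallen;
			char *value;

			if (strncmp(line, headers[i], preflen))
				continue;

			value = line + preflen;
			vallen = linelen - preflen - (eol ? 1 : 0);
			if (vallen != fromlen || memcmp(value, from, fromlen))
				break;
			if (strlen(buf) - fromlen + tolen + 1 > cap)
				return -1;

			/* Shift the tail, then drop the new value in place. */
			memmove(value + tolen, value + fromlen,
				strlen(value + fromlen) + 1);
			memcpy(value, to, tolen);

			/* Account for the change in line length. */
			linelen = linelen - fromlen + tolen;
			break;
		}

		offset += linelen;
	}
}
```

The caller passes all the headers it cares about in one array, e.g.
{ "author ", "committer ", NULL }, and makes a single call. Adjusting
the offset by the change in line length plays the same role as the
value returned by rewrite_ident_line() in the first patch.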

Changes in v3:
- The decl-after-statement warnings have been fixed in all the patches.
- In commit_rewrite_person(), the type of the linelen and i variables
  has been changed from int to size_t.
- The size_t value that replace_idents_using_mailmap() updates through
  its size argument is now explicitly converted to unsigned long using
  the cast_size_t_to_ulong() helper.
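
The size handling in the last item can be sketched as follows. This is
a standalone illustration, not Git's code: checked_size_to_ulong() is a
hypothetical stand-in for cast_size_t_to_ulong() that returns an error
instead of dying, and grow_by_three() is a toy callee that only mimics
a helper with a size_t in/out parameter.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical checked conversion. Returns 0 and stores the value in
 * *out if it survives the round trip, -1 if it would be truncated
 * (possible where size_t is wider than unsigned long).
 */
static int checked_size_to_ulong(size_t in, unsigned long *out)
{
	if (in != (size_t)(unsigned long)in)
		return -1;
	*out = (unsigned long)in;
	return 0;
}

/* Toy callee that reads and updates the size through a size_t pointer. */
static void grow_by_three(size_t *size)
{
	*size += 3;
}

/* Caller that, like cat-file, keeps its size in an unsigned long. */
static int resize_via_size_t(unsigned long *size)
{
	size_t s;

	assert(size != NULL);

	s = *size;		/* the callee reads this, so set it first */
	grow_by_three(&s);
	return checked_size_to_ulong(s, size);
}
```

Going through a local size_t avoids passing an unsigned long pointer
where a size_t pointer is expected, which is what fails to build on
platforms where the two types differ.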

Siddharth Asthana (4):
  revision: improve commit_rewrite_person()
  ident: move commit_rewrite_person() to ident.c
  ident: rename commit_rewrite_person() to apply_mailmap_to_header()
  cat-file: add mailmap support

 Documentation/git-cat-file.txt |  6 +++
 builtin/cat-file.c             | 43 +++++++++++++++++++-
 cache.h                        |  6 +++
 ident.c                        | 72 ++++++++++++++++++++++++++++++++++
 revision.c                     | 50 ++---------------------
 t/t4203-mailmap.sh             | 54 +++++++++++++++++++++++++
 6 files changed, 183 insertions(+), 48 deletions(-)

Range-diff against v2:
1:  64e1f750e1 ! 1:  9e95326c58 revision: improve commit_rewrite_person()
    @@ Commit message
     
         The function, commit_rewrite_person(), is designed to find and replace
         an ident string in the header part, and the way it avoids a random
    -    occuranace of "author A U Thor <author@example.com" in the text is by
    +    occurrence of "author A U Thor <author@example.com" in the text is by
         insisting "author" to appear at the beginning of line by passing
         "\nauthor " as "what".
     
    @@ Commit message
         changed from int to void. This has been done because the caller of the
         function doesn't do anything with the return value of the function.
     
    -    By simplyfying the interface of the commit_rewrite_person(), we also
    +    By simplifying the interface of the commit_rewrite_person(), we also
         intend to expose it as a public function. We will also be renaming the
         function in a future commit to a different name which clearly tells that
         the function replaces idents in the header of the commit buffer.
     
         Mentored-by: Christian Couder <christian.couder@gmail.com>
         Mentored-by: John Cai <johncai86@gmail.com>
    +    Helped-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
         Signed-off-by: Siddharth Asthana <siddharthasthana31@gmail.com>
     
      ## revision.c ##
    @@ revision.c: static int commit_rewrite_person(struct strbuf *buf, const char *wha
     +		return;
     +
     +	for (;;) {
    -+		const char *person, *line = buf->buf + buf_offset;
    -+		int i, linelen = strchrnul(line, '\n') - line + 1;
    ++		const char *person, *line;
    ++		size_t i, linelen;
     +
    -+		if (!linelen || linelen == 1)
    ++		line = buf->buf + buf_offset;
    ++		linelen = strchrnul(line, '\n') - line + 1;
    ++
    ++		if (linelen <= 1)
     +			/* End of header */
     +			return;
     +
    @@ revision.c: static int commit_rewrite_person(struct strbuf *buf, const char *wha
      {
      	int retval;
     @@ revision.c: static int commit_match(struct commit *commit, struct rev_info *opt)
    + 		strbuf_addstr(&buf, message);
    + 
    + 	if (opt->grep_filter.header_list && opt->mailmap) {
    ++		const char *commit_headers[] = { "author ", "committer ", NULL };
    ++
      		if (!buf.len)
      			strbuf_addstr(&buf, message);
      
     -		commit_rewrite_person(&buf, "\nauthor ", opt->mailmap);
     -		commit_rewrite_person(&buf, "\ncommitter ", opt->mailmap);
    -+		const char *commit_headers[] = { "author ", "committer ", NULL };
     +		commit_rewrite_person(&buf, commit_headers, opt->mailmap);
      	}
      
2:  b18ced0ece ! 2:  d9395cb8b2 ident: move commit_rewrite_person() to ident.c
    @@ ident.c: int split_ident_line(struct ident_split *split, const char *line, int l
     +		return;
     +
     +	for (;;) {
    -+		const char *person, *line = buf->buf + buf_offset;
    -+		int i, linelen = strchrnul(line, '\n') - line + 1;
    ++		const char *person, *line;
    ++		size_t i, linelen;
     +
    -+		if (!linelen || linelen == 1)
    ++		line = buf->buf + buf_offset;
    ++		linelen = strchrnul(line, '\n') - line + 1;
    ++
    ++		if (linelen <= 1)
     +			/* End of header */
     +			return;
     +
    @@ revision.c: int rewrite_parents(struct rev_info *revs, struct commit *commit,
     -		return;
     -
     -	for (;;) {
    --		const char *person, *line = buf->buf + buf_offset;
    --		int i, linelen = strchrnul(line, '\n') - line + 1;
    +-		const char *person, *line;
    +-		size_t i, linelen;
    +-
    +-		line = buf->buf + buf_offset;
    +-		linelen = strchrnul(line, '\n') - line + 1;
     -
    --		if (!linelen || linelen == 1)
    +-		if (linelen <= 1)
     -			/* End of header */
     -			return;
     -
3:  2494ce1ed2 ! 3:  355bbda25e ident: rename commit_rewrite_person() to apply_mailmap_to_header()
    @@ ident.c: static ssize_t rewrite_ident_line(const char* person, struct strbuf *bu
     
      ## revision.c ##
     @@ revision.c: static int commit_match(struct commit *commit, struct rev_info *opt)
    + 		if (!buf.len)
      			strbuf_addstr(&buf, message);
      
    - 		const char *commit_headers[] = { "author ", "committer ", NULL };
     -		commit_rewrite_person(&buf, commit_headers, opt->mailmap);
     +		apply_mailmap_to_header(&buf, commit_headers, opt->mailmap);
      	}
4:  94838a2566 ! 4:  69b7ad898b cat-file: add mailmap support
    @@ Commit message
         Mentored-by: Christian Couder <christian.couder@gmail.com>
         Mentored-by: John Cai <johncai86@gmail.com>
         Helped-by: Phillip Wood <phillip.wood@dunelm.org.uk>
    +    Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
         Signed-off-by: Siddharth Asthana <siddharthasthana31@gmail.com>
     
      ## Documentation/git-cat-file.txt ##
    @@ builtin/cat-file.c: struct batch_options {
     +static struct string_list mailmap = STRING_LIST_INIT_NODUP;
     +static int use_mailmap;
     +
    -+char *replace_idents_using_mailmap(char *object_buf, size_t *size)
    ++static char *replace_idents_using_mailmap(char *, size_t *);
    ++
    ++static char *replace_idents_using_mailmap(char *object_buf, size_t *size)
     +{
     +	struct strbuf sb = STRBUF_INIT;
    -+	strbuf_attach(&sb, object_buf, *size, *size + 1);
     +	const char *headers[] = { "author ", "committer ", "tagger ", NULL };
    ++
    ++	strbuf_attach(&sb, object_buf, *size, *size + 1);
     +	apply_mailmap_to_header(&sb, headers, &mailmap);
     +	*size = sb.len;
     +	return strbuf_detach(&sb, NULL);
    @@ builtin/cat-file.c: static int cat_one_file(int opt, const char *exp_type, const
      		if (!buf)
      			die("Cannot read object %s", obj_name);
      
    -+		if (use_mailmap)
    -+			buf = replace_idents_using_mailmap(buf, &size);
    ++		if (use_mailmap) {
    ++			size_t s = size;
    ++			buf = replace_idents_using_mailmap(buf, &s);
    ++			size = cast_size_t_to_ulong(s);
    ++		}
     +
      		/* otherwise just spit out the data */
      		break;
    @@ builtin/cat-file.c: static int cat_one_file(int opt, const char *exp_type, const
      		buf = read_object_with_reference(the_repository, &oid,
      						 exp_type_id, &size, NULL);
     +
    -+		if (use_mailmap)
    -+			buf = replace_idents_using_mailmap(buf, &size);
    ++		if (use_mailmap) {
    ++			size_t s = size;
    ++			buf = replace_idents_using_mailmap(buf, &s);
    ++			size = cast_size_t_to_ulong(s);
    ++		}
      		break;
      	}
      	default:
    @@ builtin/cat-file.c: static void print_object_or_die(struct batch_options *opt, s
      
      		contents = read_object_file(oid, &type, &size);
     +
    -+		if (use_mailmap)
    -+			contents = replace_idents_using_mailmap(contents, &size);
    ++		if (use_mailmap) {
    ++			size_t s = size;
    ++			contents = replace_idents_using_mailmap(contents, &s);
    ++			size = cast_size_t_to_ulong(s);
    ++		}
     +
      		if (!contents)
      			die("object %s disappeared", oid_to_hex(oid));
-- 
2.37.0.6.g69b7ad898b


^ permalink raw reply	[flat|nested] 68+ messages in thread

* [PATCH v3 1/4] revision: improve commit_rewrite_person()
  2022-07-09 15:41   ` [PATCH v3 " Siddharth Asthana
@ 2022-07-09 15:41     ` Siddharth Asthana
  2022-07-12 16:29       ` Johannes Schindelin
  2022-07-09 15:41     ` [PATCH v3 2/4] ident: move commit_rewrite_person() to ident.c Siddharth Asthana
                       ` (4 subsequent siblings)
  5 siblings, 1 reply; 68+ messages in thread
From: Siddharth Asthana @ 2022-07-09 15:41 UTC (permalink / raw)
  To: git
  Cc: phillip.wood123, congdanhqx, christian.couder, avarab, gitster,
	Johannes.Schindelin, johncai86, Siddharth Asthana

The function, commit_rewrite_person(), is designed to find and replace
an ident string in the header part, and the way it avoids a random
occurrence of "author A U Thor <author@example.com>" in the text is by
insisting that "author" appear at the beginning of a line, which the
caller arranges by passing "\nauthor " as "what".

The implementation also doesn't make any effort to limit itself to the
commit header by locating the blank line that appears after the header
part and stopping the search there. Also, the interface forces the
caller to make multiple calls if it wants to rewrite idents on multiple
headers. It shouldn't be the case.

To support the existing caller better, update commit_rewrite_person()
to:
- Make a single pass in the input buffer to locate headers named
  "author" and "committer" and replace idents on them.
- Stop at the end of the header, ensuring that nothing in the body of
  the commit object is modified.

The return type of the function commit_rewrite_person() has also been
changed from int to void. This has been done because the caller of the
function doesn't do anything with the return value of the function.

By simplifying the interface of the commit_rewrite_person(), we also
intend to expose it as a public function. We will also be renaming the
function in a future commit to a different name which clearly tells that
the function replaces idents in the header of the commit buffer.

Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: John Cai <johncai86@gmail.com>
Helped-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
Signed-off-by: Siddharth Asthana <siddharthasthana31@gmail.com>
---
 revision.c | 48 ++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 38 insertions(+), 10 deletions(-)

diff --git a/revision.c b/revision.c
index 211352795c..1939c56c67 100644
--- a/revision.c
+++ b/revision.c
@@ -3755,19 +3755,17 @@ int rewrite_parents(struct rev_info *revs, struct commit *commit,
 	return 0;
 }
 
-static int commit_rewrite_person(struct strbuf *buf, const char *what, struct string_list *mailmap)
+/*
+ * Returns the difference between the new and old length of the ident line.
+ */
+static ssize_t rewrite_ident_line(const char* person, struct strbuf *buf, struct string_list *mailmap)
 {
-	char *person, *endp;
+	char *endp;
 	size_t len, namelen, maillen;
 	const char *name;
 	const char *mail;
 	struct ident_split ident;
 
-	person = strstr(buf->buf, what);
-	if (!person)
-		return 0;
-
-	person += strlen(what);
 	endp = strchr(person, '\n');
 	if (!endp)
 		return 0;
@@ -3784,6 +3782,7 @@ static int commit_rewrite_person(struct strbuf *buf, const char *what, struct st
 
 	if (map_user(mailmap, &mail, &maillen, &name, &namelen)) {
 		struct strbuf namemail = STRBUF_INIT;
+		size_t newlen;
 
 		strbuf_addf(&namemail, "%.*s <%.*s>",
 			    (int)namelen, name, (int)maillen, mail);
@@ -3792,14 +3791,42 @@ static int commit_rewrite_person(struct strbuf *buf, const char *what, struct st
 			      ident.mail_end - ident.name_begin + 1,
 			      namemail.buf, namemail.len);
 
+		newlen = namemail.len;
+
 		strbuf_release(&namemail);
 
-		return 1;
+		return newlen - (ident.mail_end - ident.name_begin + 1);
 	}
 
 	return 0;
 }
 
+static void commit_rewrite_person(struct strbuf *buf, const char **headers, struct string_list *mailmap)
+{
+	size_t buf_offset = 0;
+
+	if (!mailmap)
+		return;
+
+	for (;;) {
+		const char *person, *line;
+		size_t i, linelen;
+
+		line = buf->buf + buf_offset;
+		linelen = strchrnul(line, '\n') - line + 1;
+
+		if (linelen <= 1)
+			/* End of header */
+			return;
+
+		buf_offset += linelen;
+
+		for (i = 0; headers[i]; i++)
+			if (skip_prefix(line, headers[i], &person))
+				buf_offset += rewrite_ident_line(person, buf, mailmap);
+	}
+}
+
 static int commit_match(struct commit *commit, struct rev_info *opt)
 {
 	int retval;
@@ -3832,11 +3859,12 @@ static int commit_match(struct commit *commit, struct rev_info *opt)
 		strbuf_addstr(&buf, message);
 
 	if (opt->grep_filter.header_list && opt->mailmap) {
+		const char *commit_headers[] = { "author ", "committer ", NULL };
+
 		if (!buf.len)
 			strbuf_addstr(&buf, message);
 
-		commit_rewrite_person(&buf, "\nauthor ", opt->mailmap);
-		commit_rewrite_person(&buf, "\ncommitter ", opt->mailmap);
+		commit_rewrite_person(&buf, commit_headers, opt->mailmap);
 	}
 
 	/* Append "fake" message parts as needed */
-- 
2.37.0.6.g69b7ad898b


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v3 2/4] ident: move commit_rewrite_person() to ident.c
  2022-07-09 15:41   ` [PATCH v3 " Siddharth Asthana
  2022-07-09 15:41     ` [PATCH v3 1/4] revision: improve commit_rewrite_person() Siddharth Asthana
@ 2022-07-09 15:41     ` Siddharth Asthana
  2022-07-09 15:41     ` [PATCH v3 3/4] ident: rename commit_rewrite_person() to apply_mailmap_to_header() Siddharth Asthana
                       ` (3 subsequent siblings)
  5 siblings, 0 replies; 68+ messages in thread
From: Siddharth Asthana @ 2022-07-09 15:41 UTC (permalink / raw)
  To: git
  Cc: phillip.wood123, congdanhqx, christian.couder, avarab, gitster,
	Johannes.Schindelin, johncai86, Siddharth Asthana

commit_rewrite_person() and rewrite_ident_line() are static functions
defined in revision.c.

Their usages are as follows:
- commit_rewrite_person() takes a commit buffer and replaces the author
  and committer idents with their canonical versions using the mailmap
  mechanism
- rewrite_ident_line() takes author/committer header lines from the
  commit buffer and replaces the idents with their canonical versions
  using the mailmap mechanism.
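
As an illustration of why rewrite_ident_line() reports a length
difference, here is a minimal sketch of an in-place replacement that
returns the change in length. replace_span() is a hypothetical helper
written for this note; it works on a fixed-capacity C string and is not
Git's strbuf_splice().

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not Git's strbuf_splice(): replaces `oldlen`
 * bytes at offset `pos` in the NUL-terminated string `buf` with
 * `repl`. `cap` is the total capacity of `buf`. Returns the change
 * in length (new minus old); returns 0 and leaves `buf` untouched if
 * the span is out of range or the result would not fit.
 */
static ptrdiff_t replace_span(char *buf, size_t cap, size_t pos,
			      size_t oldlen, const char *repl)
{
	size_t len = strlen(buf);
	size_t newlen = strlen(repl);

	if (pos + oldlen > len || len - oldlen + newlen + 1 > cap)
		return 0;

	/* Shift the tail (including the NUL) to its new position. */
	memmove(buf + pos + newlen, buf + pos + oldlen,
		len - pos - oldlen + 1);
	memcpy(buf + pos, repl, newlen);

	return (ptrdiff_t)newlen - (ptrdiff_t)oldlen;
}
```

A caller that has already computed the offset of the next line can add
the returned difference to that offset, so it keeps pointing at the
start of the next line after the ident has grown or shrunk.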

This patch moves commit_rewrite_person() and rewrite_ident_line() to
ident.c which contains many other functions related to idents like
split_ident_line(). By moving commit_rewrite_person() to ident.c, we
also intend to use it in git-cat-file to replace committer and author
idents from the headers to their canonical versions using the mailmap
mechanism. The function is moved as is for now to make it clear that
there are no other changes, but it will be renamed in a following
commit.

Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: John Cai <johncai86@gmail.com>
Signed-off-by: Siddharth Asthana <siddharthasthana31@gmail.com>
---
 cache.h    |  6 +++++
 ident.c    | 72 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 revision.c | 72 ------------------------------------------------------
 3 files changed, 78 insertions(+), 72 deletions(-)

diff --git a/cache.h b/cache.h
index ac5ab4ef9d..c9dbe1c29a 100644
--- a/cache.h
+++ b/cache.h
@@ -1688,6 +1688,12 @@ struct ident_split {
  */
 int split_ident_line(struct ident_split *, const char *, int);
 
+/*
+ * Given a commit object buffer and the commit headers, replaces the idents
+ * in the headers with their canonical versions using the mailmap mechanism.
+ */
+void commit_rewrite_person(struct strbuf *buf, const char **commit_headers, struct string_list *mailmap);
+
 /*
  * Compare split idents for equality or strict ordering. Note that we
  * compare only the ident part of the line, ignoring any timestamp.
diff --git a/ident.c b/ident.c
index 89ca5b4700..9f4f6e9071 100644
--- a/ident.c
+++ b/ident.c
@@ -8,6 +8,7 @@
 #include "cache.h"
 #include "config.h"
 #include "date.h"
+#include "mailmap.h"
 
 static struct strbuf git_default_name = STRBUF_INIT;
 static struct strbuf git_default_email = STRBUF_INIT;
@@ -346,6 +347,77 @@ int split_ident_line(struct ident_split *split, const char *line, int len)
 	return 0;
 }
 
+/*
+ * Returns the difference between the new and old length of the ident line.
+ */
+static ssize_t rewrite_ident_line(const char* person, struct strbuf *buf, struct string_list *mailmap)
+{
+	char *endp;
+	size_t len, namelen, maillen;
+	const char *name;
+	const char *mail;
+	struct ident_split ident;
+
+	endp = strchr(person, '\n');
+	if (!endp)
+		return 0;
+
+	len = endp - person;
+
+	if (split_ident_line(&ident, person, len))
+		return 0;
+
+	mail = ident.mail_begin;
+	maillen = ident.mail_end - ident.mail_begin;
+	name = ident.name_begin;
+	namelen = ident.name_end - ident.name_begin;
+
+	if (map_user(mailmap, &mail, &maillen, &name, &namelen)) {
+		struct strbuf namemail = STRBUF_INIT;
+		size_t newlen;
+
+		strbuf_addf(&namemail, "%.*s <%.*s>",
+			    (int)namelen, name, (int)maillen, mail);
+
+		strbuf_splice(buf, ident.name_begin - buf->buf,
+			      ident.mail_end - ident.name_begin + 1,
+			      namemail.buf, namemail.len);
+
+		newlen = namemail.len;
+
+		strbuf_release(&namemail);
+
+		return newlen - (ident.mail_end - ident.name_begin + 1);
+	}
+
+	return 0;
+}
+
+void commit_rewrite_person(struct strbuf *buf, const char **headers, struct string_list *mailmap)
+{
+	size_t buf_offset = 0;
+
+	if (!mailmap)
+		return;
+
+	for (;;) {
+		const char *person, *line;
+		size_t i, linelen;
+
+		line = buf->buf + buf_offset;
+		linelen = strchrnul(line, '\n') - line + 1;
+
+		if (linelen <= 1)
+			/* End of header */
+			return;
+
+		buf_offset += linelen;
+
+		for (i = 0; headers[i]; i++)
+			if (skip_prefix(line, headers[i], &person))
+				buf_offset += rewrite_ident_line(person, buf, mailmap);
+	}
+}
 
 static void ident_env_hint(enum want_ident whose_ident)
 {
diff --git a/revision.c b/revision.c
index 1939c56c67..14dca903b6 100644
--- a/revision.c
+++ b/revision.c
@@ -3755,78 +3755,6 @@ int rewrite_parents(struct rev_info *revs, struct commit *commit,
 	return 0;
 }
 
-/*
- * Returns the difference between the new and old length of the ident line.
- */
-static ssize_t rewrite_ident_line(const char* person, struct strbuf *buf, struct string_list *mailmap)
-{
-	char *endp;
-	size_t len, namelen, maillen;
-	const char *name;
-	const char *mail;
-	struct ident_split ident;
-
-	endp = strchr(person, '\n');
-	if (!endp)
-		return 0;
-
-	len = endp - person;
-
-	if (split_ident_line(&ident, person, len))
-		return 0;
-
-	mail = ident.mail_begin;
-	maillen = ident.mail_end - ident.mail_begin;
-	name = ident.name_begin;
-	namelen = ident.name_end - ident.name_begin;
-
-	if (map_user(mailmap, &mail, &maillen, &name, &namelen)) {
-		struct strbuf namemail = STRBUF_INIT;
-		size_t newlen;
-
-		strbuf_addf(&namemail, "%.*s <%.*s>",
-			    (int)namelen, name, (int)maillen, mail);
-
-		strbuf_splice(buf, ident.name_begin - buf->buf,
-			      ident.mail_end - ident.name_begin + 1,
-			      namemail.buf, namemail.len);
-
-		newlen = namemail.len;
-
-		strbuf_release(&namemail);
-
-		return newlen - (ident.mail_end - ident.name_begin + 1);
-	}
-
-	return 0;
-}
-
-static void commit_rewrite_person(struct strbuf *buf, const char **headers, struct string_list *mailmap)
-{
-	size_t buf_offset = 0;
-
-	if (!mailmap)
-		return;
-
-	for (;;) {
-		const char *person, *line;
-		size_t i, linelen;
-
-		line = buf->buf + buf_offset;
-		linelen = strchrnul(line, '\n') - line + 1;
-
-		if (linelen <= 1)
-			/* End of header */
-			return;
-
-		buf_offset += linelen;
-
-		for (i = 0; headers[i]; i++)
-			if (skip_prefix(line, headers[i], &person))
-				buf_offset += rewrite_ident_line(person, buf, mailmap);
-	}
-}
-
 static int commit_match(struct commit *commit, struct rev_info *opt)
 {
 	int retval;
-- 
2.37.0.6.g69b7ad898b


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v3 3/4] ident: rename commit_rewrite_person() to apply_mailmap_to_header()
  2022-07-09 15:41   ` [PATCH v3 " Siddharth Asthana
  2022-07-09 15:41     ` [PATCH v3 1/4] revision: improve commit_rewrite_person() Siddharth Asthana
  2022-07-09 15:41     ` [PATCH v3 2/4] ident: move commit_rewrite_person() to ident.c Siddharth Asthana
@ 2022-07-09 15:41     ` Siddharth Asthana
  2022-07-09 15:41     ` [PATCH v3 4/4] cat-file: add mailmap support Siddharth Asthana
                       ` (2 subsequent siblings)
  5 siblings, 0 replies; 68+ messages in thread
From: Siddharth Asthana @ 2022-07-09 15:41 UTC (permalink / raw)
  To: git
  Cc: phillip.wood123, congdanhqx, christian.couder, avarab, gitster,
	Johannes.Schindelin, johncai86, Siddharth Asthana

commit_rewrite_person() takes a commit buffer and replaces the idents
in the header with their canonical versions using the mailmap mechanism.
The name "commit_rewrite_person()" is misleading as it doesn't convey
what kind of rewrite we are going to do to the buffer. It also doesn't
clearly mention that the function will limit itself to the header part
of the buffer. The new name, "apply_mailmap_to_header()", expresses the
functionality of the function pretty clearly.

We intend to use apply_mailmap_to_header() in git-cat-file to replace
idents in the headers of commit and tag object buffers. So, we will be
extending this function to take tag object buffers as well and replace
idents on the tagger header using the mailmap mechanism.

Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: John Cai <johncai86@gmail.com>
Signed-off-by: Siddharth Asthana <siddharthasthana31@gmail.com>
---
 cache.h    | 6 +++---
 ident.c    | 2 +-
 revision.c | 2 +-
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/cache.h b/cache.h
index c9dbe1c29a..9edb7fefd3 100644
--- a/cache.h
+++ b/cache.h
@@ -1689,10 +1689,10 @@ struct ident_split {
 int split_ident_line(struct ident_split *, const char *, int);
 
 /*
- * Given a commit object buffer and the commit headers, replaces the idents
- * in the headers with their canonical versions using the mailmap mechanism.
+ * Given a commit or tag object buffer and the commit or tag headers, replaces
+ * the idents in the headers with their canonical versions using the mailmap mechanism.
  */
-void commit_rewrite_person(struct strbuf *buf, const char **commit_headers, struct string_list *mailmap);
+void apply_mailmap_to_header(struct strbuf *buf, const char **headers, struct string_list *mailmap);
 
 /*
  * Compare split idents for equality or strict ordering. Note that we
diff --git a/ident.c b/ident.c
index 9f4f6e9071..5f17bd607d 100644
--- a/ident.c
+++ b/ident.c
@@ -393,7 +393,7 @@ static ssize_t rewrite_ident_line(const char* person, struct strbuf *buf, struct
 	return 0;
 }
 
-void commit_rewrite_person(struct strbuf *buf, const char **headers, struct string_list *mailmap)
+void apply_mailmap_to_header(struct strbuf *buf, const char **headers, struct string_list *mailmap)
 {
 	size_t buf_offset = 0;
 
diff --git a/revision.c b/revision.c
index 14dca903b6..6ad3665204 100644
--- a/revision.c
+++ b/revision.c
@@ -3792,7 +3792,7 @@ static int commit_match(struct commit *commit, struct rev_info *opt)
 		if (!buf.len)
 			strbuf_addstr(&buf, message);
 
-		commit_rewrite_person(&buf, commit_headers, opt->mailmap);
+		apply_mailmap_to_header(&buf, commit_headers, opt->mailmap);
 	}
 
 	/* Append "fake" message parts as needed */
-- 
2.37.0.6.g69b7ad898b


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v3 4/4] cat-file: add mailmap support
  2022-07-09 15:41   ` [PATCH v3 " Siddharth Asthana
                       ` (2 preceding siblings ...)
  2022-07-09 15:41     ` [PATCH v3 3/4] ident: rename commit_rewrite_person() to apply_mailmap_to_header() Siddharth Asthana
@ 2022-07-09 15:41     ` Siddharth Asthana
  2022-07-10  5:34     ` [PATCH v3 0/4] Add support for mailmap in cat-file Junio C Hamano
  2022-07-12 16:06     ` [PATCH v4 " Siddharth Asthana
  5 siblings, 0 replies; 68+ messages in thread
From: Siddharth Asthana @ 2022-07-09 15:41 UTC (permalink / raw)
  To: git
  Cc: phillip.wood123, congdanhqx, christian.couder, avarab, gitster,
	Johannes.Schindelin, johncai86, Siddharth Asthana, Phillip Wood

git-cat-file is used by tools like GitLab to get commit and tag contents
that are then displayed to users. This content, which has author,
committer or tagger information, could benefit from passing through the
mailmap mechanism before being sent or displayed.

This patch adds the --[no-]use-mailmap command line option to the git
cat-file command. It also adds the --[no-]mailmap option as an alias to
--[no-]use-mailmap.

Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: John Cai <johncai86@gmail.com>
Helped-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Siddharth Asthana <siddharthasthana31@gmail.com>
---
 Documentation/git-cat-file.txt |  6 ++++
 builtin/cat-file.c             | 43 ++++++++++++++++++++++++++-
 t/t4203-mailmap.sh             | 54 ++++++++++++++++++++++++++++++++++
 3 files changed, 102 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-cat-file.txt b/Documentation/git-cat-file.txt
index 24a811f0ef..1880e9bba1 100644
--- a/Documentation/git-cat-file.txt
+++ b/Documentation/git-cat-file.txt
@@ -63,6 +63,12 @@ OPTIONS
 	or to ask for a "blob" with `<object>` being a tag object that
 	points at it.
 
+--[no-]mailmap::
+--[no-]use-mailmap::
+       Use mailmap file to map author, committer and tagger names
+       and email addresses to canonical real names and email addresses.
+       See linkgit:git-shortlog[1].
+
 --textconv::
 	Show the content as transformed by a textconv filter. In this case,
 	`<object>` has to be of the form `<tree-ish>:<path>`, or `:<path>` in
diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 50cf38999d..4b68216b51 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -16,6 +16,7 @@
 #include "packfile.h"
 #include "object-store.h"
 #include "promisor-remote.h"
+#include "mailmap.h"
 
 enum batch_mode {
 	BATCH_MODE_CONTENTS,
@@ -36,6 +37,22 @@ struct batch_options {
 
 static const char *force_path;
 
+static struct string_list mailmap = STRING_LIST_INIT_NODUP;
+static int use_mailmap;
+
+static char *replace_idents_using_mailmap(char *, size_t *);
+
+static char *replace_idents_using_mailmap(char *object_buf, size_t *size)
+{
+	struct strbuf sb = STRBUF_INIT;
+	const char *headers[] = { "author ", "committer ", "tagger ", NULL };
+
+	strbuf_attach(&sb, object_buf, *size, *size + 1);
+	apply_mailmap_to_header(&sb, headers, &mailmap);
+	*size = sb.len;
+	return strbuf_detach(&sb, NULL);
+}
+
 static int filter_object(const char *path, unsigned mode,
 			 const struct object_id *oid,
 			 char **buf, unsigned long *size)
@@ -152,6 +169,12 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name,
 		if (!buf)
 			die("Cannot read object %s", obj_name);
 
+		if (use_mailmap) {
+			size_t s = size;
+			buf = replace_idents_using_mailmap(buf, &s);
+			size = cast_size_t_to_ulong(s);
+		}
+
 		/* otherwise just spit out the data */
 		break;
 
@@ -183,6 +206,12 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name,
 		}
 		buf = read_object_with_reference(the_repository, &oid,
 						 exp_type_id, &size, NULL);
+
+		if (use_mailmap) {
+			size_t s = size;
+			buf = replace_idents_using_mailmap(buf, &s);
+			size = cast_size_t_to_ulong(s);
+		}
 		break;
 	}
 	default:
@@ -348,11 +377,18 @@ static void print_object_or_die(struct batch_options *opt, struct expand_data *d
 		void *contents;
 
 		contents = read_object_file(oid, &type, &size);
+
+		if (use_mailmap) {
+			size_t s = size;
+			contents = replace_idents_using_mailmap(contents, &s);
+			size = cast_size_t_to_ulong(s);
+		}
+
 		if (!contents)
 			die("object %s disappeared", oid_to_hex(oid));
 		if (type != data->type)
 			die("object %s changed type!?", oid_to_hex(oid));
-		if (data->info.sizep && size != data->size)
+		if (data->info.sizep && size != data->size && !use_mailmap)
 			die("object %s changed size!?", oid_to_hex(oid));
 
 		batch_write(opt, contents, size);
@@ -843,6 +879,8 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
 		OPT_CMDMODE('s', NULL, &opt, N_("show object size"), 's'),
 		OPT_BOOL(0, "allow-unknown-type", &unknown_type,
 			  N_("allow -s and -t to work with broken/corrupt objects")),
+		OPT_BOOL(0, "use-mailmap", &use_mailmap, N_("use mail map file")),
+		OPT_ALIAS(0, "mailmap", "use-mailmap"),
 		/* Batch mode */
 		OPT_GROUP(N_("Batch objects requested on stdin (or --batch-all-objects)")),
 		OPT_CALLBACK_F(0, "batch", &batch, N_("format"),
@@ -885,6 +923,9 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
 	opt_cw = (opt == 'c' || opt == 'w');
 	opt_epts = (opt == 'e' || opt == 'p' || opt == 't' || opt == 's');
 
+	if (use_mailmap)
+		read_mailmap(&mailmap);
+
 	/* --batch-all-objects? */
 	if (opt == 'b')
 		batch.all_objects = 1;
diff --git a/t/t4203-mailmap.sh b/t/t4203-mailmap.sh
index 0b2d21ec55..c60a90615c 100755
--- a/t/t4203-mailmap.sh
+++ b/t/t4203-mailmap.sh
@@ -963,4 +963,58 @@ test_expect_success SYMLINKS 'symlinks not respected in-tree' '
 	test_cmp expect actual
 '
 
+test_expect_success '--no-use-mailmap disables mailmap in cat-file' '
+	test_when_finished "rm .mailmap" &&
+	cat >.mailmap <<-EOF &&
+	A U Thor <author@example.com> Orig <orig@example.com>
+	EOF
+	cat >expect <<-EOF &&
+	author Orig <orig@example.com>
+	EOF
+	git cat-file --no-use-mailmap commit HEAD >log &&
+	sed -n "/^author /s/\([^>]*>\).*/\1/p" log >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success '--use-mailmap enables mailmap in cat-file' '
+	test_when_finished "rm .mailmap" &&
+	cat >.mailmap <<-EOF &&
+	A U Thor <author@example.com> Orig <orig@example.com>
+	EOF
+	cat >expect <<-EOF &&
+	author A U Thor <author@example.com>
+	EOF
+	git cat-file --use-mailmap commit HEAD >log &&
+	sed -n "/^author /s/\([^>]*>\).*/\1/p" log >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success '--no-mailmap disables mailmap in cat-file for annotated tag objects' '
+	test_when_finished "rm .mailmap" &&
+	cat >.mailmap <<-EOF &&
+	Orig <orig@example.com> C O Mitter <committer@example.com>
+	EOF
+	cat >expect <<-EOF &&
+	tagger C O Mitter <committer@example.com>
+	EOF
+	git tag -a -m "annotated tag" v1 &&
+	git cat-file --no-mailmap -p v1 >log &&
+	sed -n "/^tagger /s/\([^>]*>\).*/\1/p" log >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success '--mailmap enables mailmap in cat-file for annotated tag objects' '
+	test_when_finished "rm .mailmap" &&
+	cat >.mailmap <<-EOF &&
+	Orig <orig@example.com> C O Mitter <committer@example.com>
+	EOF
+	cat >expect <<-EOF &&
+	tagger Orig <orig@example.com>
+	EOF
+	git tag -a -m "annotated tag" v2 &&
+	git cat-file --mailmap -p v2 >log &&
+	sed -n "/^tagger /s/\([^>]*>\).*/\1/p" log >actual &&
+	test_cmp expect actual
+'
+
 test_done
-- 
2.37.0.6.g69b7ad898b


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 0/4] Add support for mailmap in cat-file
  2022-07-09 15:41   ` [PATCH v3 " Siddharth Asthana
                       ` (3 preceding siblings ...)
  2022-07-09 15:41     ` [PATCH v3 4/4] cat-file: add mailmap support Siddharth Asthana
@ 2022-07-10  5:34     ` Junio C Hamano
  2022-07-12 12:34       ` Johannes Schindelin
  2022-07-12 16:06     ` [PATCH v4 " Siddharth Asthana
  5 siblings, 1 reply; 68+ messages in thread
From: Junio C Hamano @ 2022-07-10  5:34 UTC (permalink / raw)
  To: Siddharth Asthana
  Cc: git, phillip.wood123, congdanhqx, christian.couder, avarab,
	Johannes.Schindelin, johncai86

Siddharth Asthana <siddharthasthana31@gmail.com> writes:

> = Patch Organization
>
> - The first patch improves the commit_rewrite_person() by restricting it 
>   to traverse only through the header part of the commit object buffer.
>   It also adds an argument called headers which the callers can pass. 
>   The function will replace idents only on these  passed headers. 
>   Thus, the caller won't have to make repeated calls to the function.
> - The second patch moves commit_rewrite_person() to ident.c to expose it
>   as a public function so that it can be used to replace idents in the
>   headers of desired objects.
> - The third patch renames commit_rewrite_person() to a name which
>   describes its functionality clearly. It is renamed to
>   apply_mailmap_to_header().
> - The last patch adds mailmap support to the git cat-file command. It
>   adds the required documentation and tests as well.
>
> Changes in v3:
> - The decl-after-statement warnings have been fixed in all the patches.
> - In commit_rewrite_person(), the data type of linelen and i variables
>   have been changed from int to size_t.
> - The return type of replace_idents_using_mailmap() function, size_t,
>   has been explicitly typecasted to unsigned long using the
>   cast_size_t_to_ulong() helper method.

https://github.com/git/git/actions/runs/2642867380 seems to tell us
that tests added by this series are broken on Windows.  I am not
sure what exactly in this series depends on being LF-only system,
but the symptom makes me suspect that is the cause of the problem.


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 0/4] Add support for mailmap in cat-file
  2022-07-10  5:34     ` [PATCH v3 0/4] Add support for mailmap in cat-file Junio C Hamano
@ 2022-07-12 12:34       ` Johannes Schindelin
  2022-07-12 14:16         ` Junio C Hamano
  0 siblings, 1 reply; 68+ messages in thread
From: Johannes Schindelin @ 2022-07-12 12:34 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Siddharth Asthana, git, phillip.wood123, congdanhqx,
	christian.couder, avarab, johncai86

Hi Junio & Siddharth,

On Sat, 9 Jul 2022, Junio C Hamano wrote:

> Siddharth Asthana <siddharthasthana31@gmail.com> writes:
>
> > = Patch Organization
> >
> > - The first patch improves the commit_rewrite_person() by restricting it
> >   to traverse only through the header part of the commit object buffer.
> >   It also adds an argument called headers which the callers can pass.
> >   The function will replace idents only on these  passed headers.
> >   Thus, the caller won't have to make repeated calls to the function.
> > - The second patch moves commit_rewrite_person() to ident.c to expose it
> >   as a public function so that it can be used to replace idents in the
> >   headers of desired objects.
> > - The third patch renames commit_rewrite_person() to a name which
> >   describes its functionality clearly. It is renamed to
> >   apply_mailmap_to_header().
> > - The last patch adds mailmap support to the git cat-file command. It
> >   adds the required documentation and tests as well.
> >
> > Changes in v3:
> > - The decl-after-statement warnings have been fixed in all the patches.
> > - In commit_rewrite_person(), the data type of linelen and i variables
> >   have been changed from int to size_t.
> > - The return type of replace_idents_using_mailmap() function, size_t,
> >   has been explicitly typecasted to unsigned long using the
> >   cast_size_t_to_ulong() helper method.
>
> https://github.com/git/git/actions/runs/2642867380 seems to tell us
> that tests added by this series are broken on Windows.  I am not
> sure what exactly in this series depends on being LF-only system,
> but the symptom makes me suspect that is the cause of the problem.

It has nothing to do with LF-only, but everything to do with symlinks (I
suspected that when I saw the skipped test cases at
https://github.com/git/git/runs/7292710632?check_suite_focus=true#step:5:195,
and could validate that suspicion via disabling the test cases by
inverting the prereq: this caused the very same symptoms even in a Linux
setup):

-- snipsnap --
From 5bc6d52c95401f60e67312823ed406bd5a3c1026 Mon Sep 17 00:00:00 2001
From: Johannes Schindelin <johannes.schindelin@gmx.de>
Date: Tue, 12 Jul 2022 14:28:05 +0200
Subject: [PATCH] fixup??? cat-file: add mailmap support

This patch introduced new test cases that rely on the side effects of
the earlier test case `set up symlink tests`. However, that test case is
guarded behind the `SYMLINKS` prereq, therefore it is not run e.g. on
Windows.

Let's fix that by removing the prereq from the `set up` test case, and
adjusting its title to reflect its broadened responsibility.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 t/t4203-mailmap.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/t/t4203-mailmap.sh b/t/t4203-mailmap.sh
index c60a90615cc..94afd4717a2 100755
--- a/t/t4203-mailmap.sh
+++ b/t/t4203-mailmap.sh
@@ -932,7 +932,7 @@ test_expect_success 'find top-level mailmap from subdir' '
 	test_cmp expect actual
 '

-test_expect_success SYMLINKS 'set up symlink tests' '
+test_expect_success 'set up symlink/--use-mailmap tests' '
 	git commit --allow-empty -m foo --author="Orig <orig@example.com>" &&
 	echo "New <new@example.com> <orig@example.com>" >map &&
 	rm -f .mailmap
--
2.37.0.rc2.windows.1.7.g45a475aeb84


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 0/4] Add support for mailmap in cat-file
  2022-07-12 12:34       ` Johannes Schindelin
@ 2022-07-12 14:16         ` Junio C Hamano
  2022-07-12 16:01           ` Siddharth Asthana
  2022-07-12 16:06           ` Junio C Hamano
  0 siblings, 2 replies; 68+ messages in thread
From: Junio C Hamano @ 2022-07-12 14:16 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Siddharth Asthana, git, phillip.wood123, congdanhqx,
	christian.couder, avarab, johncai86

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> This patch introduced new test cases that rely on the side effects of
> the earlier test case `set up symlink tests`. However, that test case is
> guarded behind the `SYMLINKS` prereq, therefore it is not run e.g. on
> Windows.

Ah, that explains why it only fails there.

> Let's fix that by removing the prereq from the `set up` test case, and
> adjusting its title to reflect its broadened responsibility.
>
> -test_expect_success SYMLINKS 'set up symlink tests' '
> +test_expect_success 'set up symlink/--use-mailmap tests' '
>  	git commit --allow-empty -m foo --author="Orig <orig@example.com>" &&
>  	echo "New <new@example.com> <orig@example.com>" >map &&
>  	rm -f .mailmap

OK, this sets up

 * one commit that can be used in a test, authored by "Orig" person;
 * the "map" file that maps the "Orig" person; and
 * ensures .mailmap is not there.

with the intention to make a symbolic link that points at the "map"
to use as the mailmap file in later tests.  This step does not require
symbolic links at all, but because the point of this set-up is to serve
the later tests, all requiring symbolic link support, it was OK to have
the prerequisite.

The cat-file tests do not have to use the "map" file to do their
thing at all.  In fact, these tests prepare their own .mailmap file
inside them.  But because they chose to run in the history prepared by
previous tests, they broke, because without SYMLINKS, the sought-for
commit does not get created.

Makes sense.  I would have retitled it to s/set up/prepare for/ but
that is minor.

Thanks.  Siddharth, please squash the fix in when rerolling.


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 0/4] Add support for mailmap in cat-file
  2022-07-12 14:16         ` Junio C Hamano
@ 2022-07-12 16:01           ` Siddharth Asthana
  2022-07-12 16:06           ` Junio C Hamano
  1 sibling, 0 replies; 68+ messages in thread
From: Siddharth Asthana @ 2022-07-12 16:01 UTC (permalink / raw)
  To: Junio C Hamano, Johannes Schindelin
  Cc: git, phillip.wood123, congdanhqx, christian.couder, avarab,
	johncai86



On 12/07/22 19:46, Junio C Hamano wrote:
> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> 
>> This patch introduced new test cases that rely on the side effects of
>> the earlier test case `set up symlink tests`. However, that test case is
>> guarded behind the `SYMLINKS` prereq, therefore it is not run e.g. on
>> Windows.
> 
> Ah, that explains why it only fails there.
> 
>> Let's fix that by removing the prereq from the `set up` test case, and
>> adjusting its title to reflect its broadened responsibility.
>>
>> -test_expect_success SYMLINKS 'set up symlink tests' '
>> +test_expect_success 'set up symlink/--use-mailmap tests' '
>>   	git commit --allow-empty -m foo --author="Orig <orig@example.com>" &&
>>   	echo "New <new@example.com> <orig@example.com>" >map &&
>>   	rm -f .mailmap
> 
> OK, this sets up
> 
>   * one commit that can be used in a test, authored by "Orig" person;
>   * the "map" file that maps the "Orig" person; and
>   * ensures .mailmap is not there.
> 
> with the intention to make a symbolic link that points at the "map"
> to use as the mailmap file in later tests.  This step does not require
> symbolic links at all, but because the point of this set-up is to serve
> the later tests, all requiring symbolic link support, it was OK to have
> the prerequisite.
> 
> The cat-file tests does not have to use the "map" file to do its
> thing at all.  In fact, these tests prepare their own .mailmap file
> inside them.  But because it chose to run in the history prepared by
> previous tests, it broke, because without SYMLINKS, the sought-for
> commit does not get created.
> 
> Makes sense.  I would have retitled it to s/set up/prepare for/ but
> that is minor.
> 
> Thanks.  Siddharth, please squash the fix in when rerolling.
> 
Thanks a ton Johannes and Junio for helping me fix the test :D
Will squash the fix in v4!

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [PATCH v4 0/4] Add support for mailmap in cat-file
  2022-07-09 15:41   ` [PATCH v3 " Siddharth Asthana
                       ` (4 preceding siblings ...)
  2022-07-10  5:34     ` [PATCH v3 0/4] Add support for mailmap in cat-file Junio C Hamano
@ 2022-07-12 16:06     ` Siddharth Asthana
  2022-07-12 16:06       ` [PATCH v4 1/4] revision: improve commit_rewrite_person() Siddharth Asthana
                         ` (4 more replies)
  5 siblings, 5 replies; 68+ messages in thread
From: Siddharth Asthana @ 2022-07-12 16:06 UTC (permalink / raw)
  To: git
  Cc: phillip.wood123, congdanhqx, christian.couder, avarab, gitster,
	Johannes.Schindelin, johncai86, Siddharth Asthana

Thanks a lot for helping me fix the broken test Johannes and Junio! :D

= Description

This patch series adds mailmap support to the git-cat-file command. It
adds the mailmap support only for the commit and tag objects by
replacing the idents for "author", "committer" and "tagger" headers. The
mailmap only takes effect when --[no-]use-mailmap or --[no-]mailmap
option is passed to the git cat-file command. The changes will work with
the batch mode as well.

So, if one wants to enable mailmap they can use either of the following
commands:
$ git cat-file --use-mailmap -p <object>
$ git cat-file --use-mailmap <type> <object>

To use it in the batch mode, one can use the following command:
$ git cat-file --use-mailmap --batch

= Patch Organization

- The first patch improves commit_rewrite_person() by restricting it
  to traverse only the header part of the commit object buffer.
  It also adds an argument called headers which the callers can pass.
  The function will replace idents only on these passed headers.
  Thus, the caller won't have to make repeated calls to the function.
- The second patch moves commit_rewrite_person() to ident.c to expose it
  as a public function so that it can be used to replace idents in the
  headers of desired objects.
- The third patch renames commit_rewrite_person() to a name which
  describes its functionality clearly. It is renamed to
  apply_mailmap_to_header().
- The last patch adds mailmap support to the git cat-file command. It
  adds the required documentation and tests as well.
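
The header-only scan described for the first patch can be sketched
roughly as follows. This is a minimal standalone illustration in plain
C, not the code from the series: count_ident_headers() is a
hypothetical helper that only counts matching header lines instead of
rewriting them, and it uses strncmp() where git would use
skip_prefix().

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of git): count the lines in an object
 * buffer that start with one of the NULL-terminated `headers` prefixes.
 * Scanning stops at the first empty line, which ends the header part,
 * so text in the message body is never examined.
 */
static size_t count_ident_headers(const char *buf, const char **headers)
{
	size_t count = 0;
	const char *line = buf;

	while (*line && *line != '\n') {
		const char *eol = strchr(line, '\n');
		size_t i;

		for (i = 0; headers[i]; i++) {
			if (!strncmp(line, headers[i], strlen(headers[i]))) {
				count++;
				break; /* a line matches at most one header */
			}
		}

		if (!eol)
			break; /* unterminated last line */
		line = eol + 1;
	}
	return count;
}
```

With a headers array of { "author ", "committer ", "tagger ", NULL },
a line such as "author C <c@x>" that appears after the blank line is
not counted, because the loop has already stopped at the end of the
header.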

Changes in v4:
- This patch series introduces new test cases for testing the mailmap
  mechanism in git-cat-file command. These tests rely on the side
  effects of the earlier test case `set up symlink tests`. However, that
  test case is guarded behind the `SYMLINKS` prereq, therefore it is not
  run e.g. on Windows. This caused the --use-mailmap tests to fail on
  Windows. So, that has been fixed by removing the prereq from the
  `set up` test case.
- The `set up symlink tests` has also been renamed to `prepare for
  symlink/--use-mailmap tests` to reflect its broadened responsibility.

Siddharth Asthana (4):
  revision: improve commit_rewrite_person()
  ident: move commit_rewrite_person() to ident.c
  ident: rename commit_rewrite_person() to apply_mailmap_to_header()
  cat-file: add mailmap support

 Documentation/git-cat-file.txt |  6 +++
 builtin/cat-file.c             | 43 +++++++++++++++++++-
 cache.h                        |  6 +++
 ident.c                        | 72 ++++++++++++++++++++++++++++++++++
 revision.c                     | 50 ++---------------------
 t/t4203-mailmap.sh             | 56 +++++++++++++++++++++++++-
 6 files changed, 184 insertions(+), 49 deletions(-)

Range-diff against v3:
1:  9e95326c58 = 1:  9e95326c58 revision: improve commit_rewrite_person()
2:  d9395cb8b2 = 2:  d9395cb8b2 ident: move commit_rewrite_person() to ident.c
3:  355bbda25e = 3:  355bbda25e ident: rename commit_rewrite_person() to apply_mailmap_to_header()
4:  69b7ad898b ! 4:  ac532965b4 cat-file: add mailmap support
    @@ Commit message
         cat-file command. It also adds --[no-]mailmap option as an alias to
         --[no-]use-mailmap.
     
    +    This patch also introduces new test cases to test the mailmap mechanism in
    +    git cat-file command.
    +
    +    The tests added in this patch series rely on the side effects of the earlier
    +    test case `set up symlink tests`. However, that test case is guarded behind the
    +    `SYMLINKS` prereq, therefore it is not run e.g. on Windows which can cause the
    +    added tests to fail on Windows. So, fix that by removing the prereq from the
    +    `set up` test case, and adjusting its title to reflect its broadened responsibility.
    +
         Mentored-by: Christian Couder <christian.couder@gmail.com>
         Mentored-by: John Cai <johncai86@gmail.com>
         Helped-by: Phillip Wood <phillip.wood@dunelm.org.uk>
    @@ builtin/cat-file.c: int cmd_cat_file(int argc, const char **argv, const char *pr
      		batch.all_objects = 1;
     
      ## t/t4203-mailmap.sh ##
    +@@ t/t4203-mailmap.sh: test_expect_success 'find top-level mailmap from subdir' '
    + 	test_cmp expect actual
    + '
    + 
    +-test_expect_success SYMLINKS 'set up symlink tests' '
    ++test_expect_success 'prepare for symlink/--use-mailmap tests' '
    + 	git commit --allow-empty -m foo --author="Orig <orig@example.com>" &&
    + 	echo "New <new@example.com> <orig@example.com>" >map &&
    + 	rm -f .mailmap
     @@ t/t4203-mailmap.sh: test_expect_success SYMLINKS 'symlinks not respected in-tree' '
      	test_cmp expect actual
      '
-- 
2.37.0.6.g69b7ad898b


^ permalink raw reply	[flat|nested] 68+ messages in thread

* [PATCH v4 1/4] revision: improve commit_rewrite_person()
  2022-07-12 16:06     ` [PATCH v4 " Siddharth Asthana
@ 2022-07-12 16:06       ` Siddharth Asthana
  2022-07-13  1:25         ` Ævar Arnfjörð Bjarmason
  2022-07-14 21:02         ` Junio C Hamano
  2022-07-12 16:06       ` [PATCH v4 2/4] ident: move commit_rewrite_person() to ident.c Siddharth Asthana
                         ` (3 subsequent siblings)
  4 siblings, 2 replies; 68+ messages in thread
From: Siddharth Asthana @ 2022-07-12 16:06 UTC (permalink / raw)
  To: git
  Cc: phillip.wood123, congdanhqx, christian.couder, avarab, gitster,
	Johannes.Schindelin, johncai86, Siddharth Asthana

The function, commit_rewrite_person(), is designed to find and replace
an ident string in the header part, and the way it avoids a random
occurrence of "author A U Thor <author@example.com>" in the text is by
insisting that "author" appear at the beginning of a line by passing
"\nauthor " as "what".

The implementation also doesn't make any effort to limit itself to the
commit header by locating the blank line that appears after the header
part and stopping the search there. Also, the interface forces the
caller to make multiple calls if it wants to rewrite idents on multiple
headers. It shouldn't be the case.

To support the existing caller better, update commit_rewrite_person()
to:
- Make a single pass in the input buffer to locate headers named
  "author" and "committer" and replace idents on them.
- Stop at the end of the header, ensuring that nothing in the body of
  the commit object is modified.

The return type of the function commit_rewrite_person() has also been
changed from int to void. This has been done because the caller of the
function doesn't do anything with the return value of the function.

By simplifying the interface of the commit_rewrite_person(), we also
intend to expose it as a public function. We will also be renaming the
function in a future commit to a different name which clearly tells that
the function replaces idents in the header of the commit buffer.

Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: John Cai <johncai86@gmail.com>
Helped-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
Signed-off-by: Siddharth Asthana <siddharthasthana31@gmail.com>
---
 revision.c | 48 ++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 38 insertions(+), 10 deletions(-)

diff --git a/revision.c b/revision.c
index 211352795c..1939c56c67 100644
--- a/revision.c
+++ b/revision.c
@@ -3755,19 +3755,17 @@ int rewrite_parents(struct rev_info *revs, struct commit *commit,
 	return 0;
 }
 
-static int commit_rewrite_person(struct strbuf *buf, const char *what, struct string_list *mailmap)
+/*
+ * Returns the difference between the new and old length of the ident line.
+ */
+static ssize_t rewrite_ident_line(const char* person, struct strbuf *buf, struct string_list *mailmap)
 {
-	char *person, *endp;
+	char *endp;
 	size_t len, namelen, maillen;
 	const char *name;
 	const char *mail;
 	struct ident_split ident;
 
-	person = strstr(buf->buf, what);
-	if (!person)
-		return 0;
-
-	person += strlen(what);
 	endp = strchr(person, '\n');
 	if (!endp)
 		return 0;
@@ -3784,6 +3782,7 @@ static int commit_rewrite_person(struct strbuf *buf, const char *what, struct st
 
 	if (map_user(mailmap, &mail, &maillen, &name, &namelen)) {
 		struct strbuf namemail = STRBUF_INIT;
+		size_t newlen;
 
 		strbuf_addf(&namemail, "%.*s <%.*s>",
 			    (int)namelen, name, (int)maillen, mail);
@@ -3792,14 +3791,42 @@ static int commit_rewrite_person(struct strbuf *buf, const char *what, struct st
 			      ident.mail_end - ident.name_begin + 1,
 			      namemail.buf, namemail.len);
 
+		newlen = namemail.len;
+
 		strbuf_release(&namemail);
 
-		return 1;
+		return newlen - (ident.mail_end - ident.name_begin + 1);
 	}
 
 	return 0;
 }
 
+static void commit_rewrite_person(struct strbuf *buf, const char **headers, struct string_list *mailmap)
+{
+	size_t buf_offset = 0;
+
+	if (!mailmap)
+		return;
+
+	for (;;) {
+		const char *person, *line;
+		size_t i, linelen;
+
+		line = buf->buf + buf_offset;
+		linelen = strchrnul(line, '\n') - line + 1;
+
+		if (linelen <= 1)
+			/* End of header */
+			return;
+
+		buf_offset += linelen;
+
+		for (i = 0; headers[i]; i++)
+			if (skip_prefix(line, headers[i], &person))
+				buf_offset += rewrite_ident_line(person, buf, mailmap);
+	}
+}
+
 static int commit_match(struct commit *commit, struct rev_info *opt)
 {
 	int retval;
@@ -3832,11 +3859,12 @@ static int commit_match(struct commit *commit, struct rev_info *opt)
 		strbuf_addstr(&buf, message);
 
 	if (opt->grep_filter.header_list && opt->mailmap) {
+		const char *commit_headers[] = { "author ", "committer ", NULL };
+
 		if (!buf.len)
 			strbuf_addstr(&buf, message);
 
-		commit_rewrite_person(&buf, "\nauthor ", opt->mailmap);
-		commit_rewrite_person(&buf, "\ncommitter ", opt->mailmap);
+		commit_rewrite_person(&buf, commit_headers, opt->mailmap);
 	}
 
 	/* Append "fake" message parts as needed */
-- 
2.37.0.6.g69b7ad898b


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v4 2/4] ident: move commit_rewrite_person() to ident.c
  2022-07-12 16:06     ` [PATCH v4 " Siddharth Asthana
  2022-07-12 16:06       ` [PATCH v4 1/4] revision: improve commit_rewrite_person() Siddharth Asthana
@ 2022-07-12 16:06       ` Siddharth Asthana
  2022-07-12 16:06       ` [PATCH v4 3/4] ident: rename commit_rewrite_person() to apply_mailmap_to_header() Siddharth Asthana
                         ` (2 subsequent siblings)
  4 siblings, 0 replies; 68+ messages in thread
From: Siddharth Asthana @ 2022-07-12 16:06 UTC (permalink / raw)
  To: git
  Cc: phillip.wood123, congdanhqx, christian.couder, avarab, gitster,
	Johannes.Schindelin, johncai86, Siddharth Asthana

commit_rewrite_person() and rewrite_ident_line() are static functions
defined in revision.c.

Their usages are as follows:
- commit_rewrite_person() takes a commit buffer and replaces the author
  and committer idents with their canonical versions using the mailmap
  mechanism
- rewrite_ident_line() takes author/committer header lines from the
  commit buffer and replaces the idents with their canonical versions
  using the mailmap mechanism.

This patch moves commit_rewrite_person() and rewrite_ident_line() to
ident.c which contains many other functions related to idents like
split_ident_line(). By moving commit_rewrite_person() to ident.c, we
also intend to use it in git-cat-file to replace committer and author
idents from the headers to their canonical versions using the mailmap
mechanism. The function is moved as is for now to make it clear that
there are no other changes, but it will be renamed in a following
commit.

Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: John Cai <johncai86@gmail.com>
Signed-off-by: Siddharth Asthana <siddharthasthana31@gmail.com>
---
 cache.h    |  6 +++++
 ident.c    | 72 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 revision.c | 72 ------------------------------------------------------
 3 files changed, 78 insertions(+), 72 deletions(-)

diff --git a/cache.h b/cache.h
index ac5ab4ef9d..c9dbe1c29a 100644
--- a/cache.h
+++ b/cache.h
@@ -1688,6 +1688,12 @@ struct ident_split {
  */
 int split_ident_line(struct ident_split *, const char *, int);
 
+/*
+ * Given a commit object buffer and the commit headers, replaces the idents
+ * in the headers with their canonical versions using the mailmap mechanism.
+ */
+void commit_rewrite_person(struct strbuf *buf, const char **commit_headers, struct string_list *mailmap);
+
 /*
  * Compare split idents for equality or strict ordering. Note that we
  * compare only the ident part of the line, ignoring any timestamp.
diff --git a/ident.c b/ident.c
index 89ca5b4700..9f4f6e9071 100644
--- a/ident.c
+++ b/ident.c
@@ -8,6 +8,7 @@
 #include "cache.h"
 #include "config.h"
 #include "date.h"
+#include "mailmap.h"
 
 static struct strbuf git_default_name = STRBUF_INIT;
 static struct strbuf git_default_email = STRBUF_INIT;
@@ -346,6 +347,77 @@ int split_ident_line(struct ident_split *split, const char *line, int len)
 	return 0;
 }
 
+/*
+ * Returns the difference between the new and old length of the ident line.
+ */
+static ssize_t rewrite_ident_line(const char* person, struct strbuf *buf, struct string_list *mailmap)
+{
+	char *endp;
+	size_t len, namelen, maillen;
+	const char *name;
+	const char *mail;
+	struct ident_split ident;
+
+	endp = strchr(person, '\n');
+	if (!endp)
+		return 0;
+
+	len = endp - person;
+
+	if (split_ident_line(&ident, person, len))
+		return 0;
+
+	mail = ident.mail_begin;
+	maillen = ident.mail_end - ident.mail_begin;
+	name = ident.name_begin;
+	namelen = ident.name_end - ident.name_begin;
+
+	if (map_user(mailmap, &mail, &maillen, &name, &namelen)) {
+		struct strbuf namemail = STRBUF_INIT;
+		size_t newlen;
+
+		strbuf_addf(&namemail, "%.*s <%.*s>",
+			    (int)namelen, name, (int)maillen, mail);
+
+		strbuf_splice(buf, ident.name_begin - buf->buf,
+			      ident.mail_end - ident.name_begin + 1,
+			      namemail.buf, namemail.len);
+
+		newlen = namemail.len;
+
+		strbuf_release(&namemail);
+
+		return newlen - (ident.mail_end - ident.name_begin + 1);
+	}
+
+	return 0;
+}
+
+void commit_rewrite_person(struct strbuf *buf, const char **headers, struct string_list *mailmap)
+{
+	size_t buf_offset = 0;
+
+	if (!mailmap)
+		return;
+
+	for (;;) {
+		const char *person, *line;
+		size_t i, linelen;
+
+		line = buf->buf + buf_offset;
+		linelen = strchrnul(line, '\n') - line + 1;
+
+		if (linelen <= 1)
+			/* End of header */
+			return;
+
+		buf_offset += linelen;
+
+		for (i = 0; headers[i]; i++)
+			if (skip_prefix(line, headers[i], &person))
+				buf_offset += rewrite_ident_line(person, buf, mailmap);
+	}
+}
 
 static void ident_env_hint(enum want_ident whose_ident)
 {
diff --git a/revision.c b/revision.c
index 1939c56c67..14dca903b6 100644
--- a/revision.c
+++ b/revision.c
@@ -3755,78 +3755,6 @@ int rewrite_parents(struct rev_info *revs, struct commit *commit,
 	return 0;
 }
 
-/*
- * Returns the difference between the new and old length of the ident line.
- */
-static ssize_t rewrite_ident_line(const char* person, struct strbuf *buf, struct string_list *mailmap)
-{
-	char *endp;
-	size_t len, namelen, maillen;
-	const char *name;
-	const char *mail;
-	struct ident_split ident;
-
-	endp = strchr(person, '\n');
-	if (!endp)
-		return 0;
-
-	len = endp - person;
-
-	if (split_ident_line(&ident, person, len))
-		return 0;
-
-	mail = ident.mail_begin;
-	maillen = ident.mail_end - ident.mail_begin;
-	name = ident.name_begin;
-	namelen = ident.name_end - ident.name_begin;
-
-	if (map_user(mailmap, &mail, &maillen, &name, &namelen)) {
-		struct strbuf namemail = STRBUF_INIT;
-		size_t newlen;
-
-		strbuf_addf(&namemail, "%.*s <%.*s>",
-			    (int)namelen, name, (int)maillen, mail);
-
-		strbuf_splice(buf, ident.name_begin - buf->buf,
-			      ident.mail_end - ident.name_begin + 1,
-			      namemail.buf, namemail.len);
-
-		newlen = namemail.len;
-
-		strbuf_release(&namemail);
-
-		return newlen - (ident.mail_end - ident.name_begin + 1);
-	}
-
-	return 0;
-}
-
-static void commit_rewrite_person(struct strbuf *buf, const char **headers, struct string_list *mailmap)
-{
-	size_t buf_offset = 0;
-
-	if (!mailmap)
-		return;
-
-	for (;;) {
-		const char *person, *line;
-		size_t i, linelen;
-
-		line = buf->buf + buf_offset;
-		linelen = strchrnul(line, '\n') - line + 1;
-
-		if (linelen <= 1)
-			/* End of header */
-			return;
-
-		buf_offset += linelen;
-
-		for (i = 0; headers[i]; i++)
-			if (skip_prefix(line, headers[i], &person))
-				buf_offset += rewrite_ident_line(person, buf, mailmap);
-	}
-}
-
 static int commit_match(struct commit *commit, struct rev_info *opt)
 {
 	int retval;
-- 
2.37.0.6.g69b7ad898b


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v4 3/4] ident: rename commit_rewrite_person() to apply_mailmap_to_header()
  2022-07-12 16:06     ` [PATCH v4 " Siddharth Asthana
  2022-07-12 16:06       ` [PATCH v4 1/4] revision: improve commit_rewrite_person() Siddharth Asthana
  2022-07-12 16:06       ` [PATCH v4 2/4] ident: move commit_rewrite_person() to ident.c Siddharth Asthana
@ 2022-07-12 16:06       ` Siddharth Asthana
  2022-07-13  1:25         ` Ævar Arnfjörð Bjarmason
  2022-07-12 16:06       ` [PATCH v4 4/4] cat-file: add mailmap support Siddharth Asthana
  2022-07-16  7:40       ` [PATCH v5 0/4] Add support for mailmap in cat-file Siddharth Asthana
  4 siblings, 1 reply; 68+ messages in thread
From: Siddharth Asthana @ 2022-07-12 16:06 UTC (permalink / raw)
  To: git
  Cc: phillip.wood123, congdanhqx, christian.couder, avarab, gitster,
	Johannes.Schindelin, johncai86, Siddharth Asthana

commit_rewrite_person() takes a commit buffer and replaces the idents
in the header with their canonical versions using the mailmap mechanism.
The name "commit_rewrite_person()" is misleading as it doesn't convey
what kind of rewrite we are going to do to the buffer. It also doesn't
clearly mention that the function will limit itself to the header part
of the buffer. The new name, "apply_mailmap_to_header()", expresses the
functionality of the function pretty clearly.

We intend to use apply_mailmap_to_header() in git-cat-file to replace
idents in the headers of commit and tag object buffers. So, we will be
extending this function to take tag object buffers as well and replace
idents on the tagger header using the mailmap mechanism.

Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: John Cai <johncai86@gmail.com>
Signed-off-by: Siddharth Asthana <siddharthasthana31@gmail.com>
---
 cache.h    | 6 +++---
 ident.c    | 2 +-
 revision.c | 2 +-
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/cache.h b/cache.h
index c9dbe1c29a..9edb7fefd3 100644
--- a/cache.h
+++ b/cache.h
@@ -1689,10 +1689,10 @@ struct ident_split {
 int split_ident_line(struct ident_split *, const char *, int);
 
 /*
- * Given a commit object buffer and the commit headers, replaces the idents
- * in the headers with their canonical versions using the mailmap mechanism.
+ * Given a commit or tag object buffer and the commit or tag headers, replaces
+ * the idents in the headers with their canonical versions using the mailmap mechanism.
  */
-void commit_rewrite_person(struct strbuf *buf, const char **commit_headers, struct string_list *mailmap);
+void apply_mailmap_to_header(struct strbuf *buf, const char **headers, struct string_list *mailmap);
 
 /*
  * Compare split idents for equality or strict ordering. Note that we
diff --git a/ident.c b/ident.c
index 9f4f6e9071..5f17bd607d 100644
--- a/ident.c
+++ b/ident.c
@@ -393,7 +393,7 @@ static ssize_t rewrite_ident_line(const char* person, struct strbuf *buf, struct
 	return 0;
 }
 
-void commit_rewrite_person(struct strbuf *buf, const char **headers, struct string_list *mailmap)
+void apply_mailmap_to_header(struct strbuf *buf, const char **headers, struct string_list *mailmap)
 {
 	size_t buf_offset = 0;
 
diff --git a/revision.c b/revision.c
index 14dca903b6..6ad3665204 100644
--- a/revision.c
+++ b/revision.c
@@ -3792,7 +3792,7 @@ static int commit_match(struct commit *commit, struct rev_info *opt)
 		if (!buf.len)
 			strbuf_addstr(&buf, message);
 
-		commit_rewrite_person(&buf, commit_headers, opt->mailmap);
+		apply_mailmap_to_header(&buf, commit_headers, opt->mailmap);
 	}
 
 	/* Append "fake" message parts as needed */
-- 
2.37.0.6.g69b7ad898b


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v4 4/4] cat-file: add mailmap support
  2022-07-12 16:06     ` [PATCH v4 " Siddharth Asthana
                         ` (2 preceding siblings ...)
  2022-07-12 16:06       ` [PATCH v4 3/4] ident: rename commit_rewrite_person() to apply_mailmap_to_header() Siddharth Asthana
@ 2022-07-12 16:06       ` Siddharth Asthana
  2022-07-16  7:40       ` [PATCH v5 0/4] Add support for mailmap in cat-file Siddharth Asthana
  4 siblings, 0 replies; 68+ messages in thread
From: Siddharth Asthana @ 2022-07-12 16:06 UTC (permalink / raw)
  To: git
  Cc: phillip.wood123, congdanhqx, christian.couder, avarab, gitster,
	Johannes.Schindelin, johncai86, Siddharth Asthana, Phillip Wood

git-cat-file is used by tools like GitLab to get commit and tag contents
that are then displayed to users. This content, which has author,
committer or tagger information, could benefit from passing through the
mailmap mechanism before being sent or displayed.

This patch adds --[no-]use-mailmap command line option to the git
cat-file command. It also adds --[no-]mailmap option as an alias to
--[no-]use-mailmap.

This patch also introduces new test cases to test the mailmap mechanism in
git cat-file command.

The tests added in this patch series rely on the side effects of the earlier
test case `set up symlink tests`. However, that test case is guarded behind the
`SYMLINKS` prereq, therefore it is not run e.g. on Windows which can cause the
added tests to fail on Windows. So, fix that by removing the prereq from the
`set up` test case, and adjusting its title to reflect its broadened responsibility.

Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: John Cai <johncai86@gmail.com>
Helped-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Siddharth Asthana <siddharthasthana31@gmail.com>
---
 Documentation/git-cat-file.txt |  6 ++++
 builtin/cat-file.c             | 43 +++++++++++++++++++++++++-
 t/t4203-mailmap.sh             | 56 +++++++++++++++++++++++++++++++++-
 3 files changed, 103 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-cat-file.txt b/Documentation/git-cat-file.txt
index 24a811f0ef..1880e9bba1 100644
--- a/Documentation/git-cat-file.txt
+++ b/Documentation/git-cat-file.txt
@@ -63,6 +63,12 @@ OPTIONS
 	or to ask for a "blob" with `<object>` being a tag object that
 	points at it.
 
+--[no-]mailmap::
+--[no-]use-mailmap::
+       Use mailmap file to map author, committer and tagger names
+       and email addresses to canonical real names and email addresses.
+       See linkgit:git-shortlog[1].
+
 --textconv::
 	Show the content as transformed by a textconv filter. In this case,
 	`<object>` has to be of the form `<tree-ish>:<path>`, or `:<path>` in
diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 50cf38999d..4b68216b51 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -16,6 +16,7 @@
 #include "packfile.h"
 #include "object-store.h"
 #include "promisor-remote.h"
+#include "mailmap.h"
 
 enum batch_mode {
 	BATCH_MODE_CONTENTS,
@@ -36,6 +37,22 @@ struct batch_options {
 
 static const char *force_path;
 
+static struct string_list mailmap = STRING_LIST_INIT_NODUP;
+static int use_mailmap;
+
+static char *replace_idents_using_mailmap(char *, size_t *);
+
+static char *replace_idents_using_mailmap(char *object_buf, size_t *size)
+{
+	struct strbuf sb = STRBUF_INIT;
+	const char *headers[] = { "author ", "committer ", "tagger ", NULL };
+
+	strbuf_attach(&sb, object_buf, *size, *size + 1);
+	apply_mailmap_to_header(&sb, headers, &mailmap);
+	*size = sb.len;
+	return strbuf_detach(&sb, NULL);
+}
+
 static int filter_object(const char *path, unsigned mode,
 			 const struct object_id *oid,
 			 char **buf, unsigned long *size)
@@ -152,6 +169,12 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name,
 		if (!buf)
 			die("Cannot read object %s", obj_name);
 
+		if (use_mailmap) {
+			size_t s = size;
+			buf = replace_idents_using_mailmap(buf, &s);
+			size = cast_size_t_to_ulong(s);
+		}
+
 		/* otherwise just spit out the data */
 		break;
 
@@ -183,6 +206,12 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name,
 		}
 		buf = read_object_with_reference(the_repository, &oid,
 						 exp_type_id, &size, NULL);
+
+		if (use_mailmap) {
+			size_t s = size;
+			buf = replace_idents_using_mailmap(buf, &s);
+			size = cast_size_t_to_ulong(s);
+		}
 		break;
 	}
 	default:
@@ -348,11 +377,18 @@ static void print_object_or_die(struct batch_options *opt, struct expand_data *d
 		void *contents;
 
 		contents = read_object_file(oid, &type, &size);
+
+		if (use_mailmap) {
+			size_t s = size;
+			contents = replace_idents_using_mailmap(contents, &s);
+			size = cast_size_t_to_ulong(s);
+		}
+
 		if (!contents)
 			die("object %s disappeared", oid_to_hex(oid));
 		if (type != data->type)
 			die("object %s changed type!?", oid_to_hex(oid));
-		if (data->info.sizep && size != data->size)
+		if (data->info.sizep && size != data->size && !use_mailmap)
 			die("object %s changed size!?", oid_to_hex(oid));
 
 		batch_write(opt, contents, size);
@@ -843,6 +879,8 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
 		OPT_CMDMODE('s', NULL, &opt, N_("show object size"), 's'),
 		OPT_BOOL(0, "allow-unknown-type", &unknown_type,
 			  N_("allow -s and -t to work with broken/corrupt objects")),
+		OPT_BOOL(0, "use-mailmap", &use_mailmap, N_("use mail map file")),
+		OPT_ALIAS(0, "mailmap", "use-mailmap"),
 		/* Batch mode */
 		OPT_GROUP(N_("Batch objects requested on stdin (or --batch-all-objects)")),
 		OPT_CALLBACK_F(0, "batch", &batch, N_("format"),
@@ -885,6 +923,9 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
 	opt_cw = (opt == 'c' || opt == 'w');
 	opt_epts = (opt == 'e' || opt == 'p' || opt == 't' || opt == 's');
 
+	if (use_mailmap)
+		read_mailmap(&mailmap);
+
 	/* --batch-all-objects? */
 	if (opt == 'b')
 		batch.all_objects = 1;
diff --git a/t/t4203-mailmap.sh b/t/t4203-mailmap.sh
index 0b2d21ec55..dba0a8ac56 100755
--- a/t/t4203-mailmap.sh
+++ b/t/t4203-mailmap.sh
@@ -932,7 +932,7 @@ test_expect_success 'find top-level mailmap from subdir' '
 	test_cmp expect actual
 '
 
-test_expect_success SYMLINKS 'set up symlink tests' '
+test_expect_success 'prepare for symlink/--use-mailmap tests' '
 	git commit --allow-empty -m foo --author="Orig <orig@example.com>" &&
 	echo "New <new@example.com> <orig@example.com>" >map &&
 	rm -f .mailmap
@@ -963,4 +963,58 @@ test_expect_success SYMLINKS 'symlinks not respected in-tree' '
 	test_cmp expect actual
 '
 
+test_expect_success '--no-use-mailmap disables mailmap in cat-file' '
+	test_when_finished "rm .mailmap" &&
+	cat >.mailmap <<-EOF &&
+	A U Thor <author@example.com> Orig <orig@example.com>
+	EOF
+	cat >expect <<-EOF &&
+	author Orig <orig@example.com>
+	EOF
+	git cat-file --no-use-mailmap commit HEAD >log &&
+	sed -n "/^author /s/\([^>]*>\).*/\1/p" log >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success '--use-mailmap enables mailmap in cat-file' '
+	test_when_finished "rm .mailmap" &&
+	cat >.mailmap <<-EOF &&
+	A U Thor <author@example.com> Orig <orig@example.com>
+	EOF
+	cat >expect <<-EOF &&
+	author A U Thor <author@example.com>
+	EOF
+	git cat-file --use-mailmap commit HEAD >log &&
+	sed -n "/^author /s/\([^>]*>\).*/\1/p" log >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success '--no-mailmap disables mailmap in cat-file for annotated tag objects' '
+	test_when_finished "rm .mailmap" &&
+	cat >.mailmap <<-EOF &&
+	Orig <orig@example.com> C O Mitter <committer@example.com>
+	EOF
+	cat >expect <<-EOF &&
+	tagger C O Mitter <committer@example.com>
+	EOF
+	git tag -a -m "annotated tag" v1 &&
+	git cat-file --no-mailmap -p v1 >log &&
+	sed -n "/^tagger /s/\([^>]*>\).*/\1/p" log >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success '--mailmap enables mailmap in cat-file for annotated tag objects' '
+	test_when_finished "rm .mailmap" &&
+	cat >.mailmap <<-EOF &&
+	Orig <orig@example.com> C O Mitter <committer@example.com>
+	EOF
+	cat >expect <<-EOF &&
+	tagger Orig <orig@example.com>
+	EOF
+	git tag -a -m "annotated tag" v2 &&
+	git cat-file --mailmap -p v2 >log &&
+	sed -n "/^tagger /s/\([^>]*>\).*/\1/p" log >actual &&
+	test_cmp expect actual
+'
+
 test_done
-- 
2.37.0.6.g69b7ad898b


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 0/4] Add support for mailmap in cat-file
  2022-07-12 14:16         ` Junio C Hamano
  2022-07-12 16:01           ` Siddharth Asthana
@ 2022-07-12 16:06           ` Junio C Hamano
  1 sibling, 0 replies; 68+ messages in thread
From: Junio C Hamano @ 2022-07-12 16:06 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Siddharth Asthana, git, phillip.wood123, congdanhqx,
	christian.couder, avarab, johncai86

Junio C Hamano <gitster@pobox.com> writes:

> The cat-file tests do not have to use the "map" file to do their
> thing at all.  In fact, these tests prepare their own .mailmap file
> inside them.  But because they chose to run in the history prepared by
> previous tests, they broke, because without SYMLINKS, the sought-for
> commit does not get created.

So, an alternative solution is to keep the existing tests on
symlinks totally unrelated to these new tests.  These cat-file tests
can prepare the commit to munge at the beginning of the sequence,
and then do its thing.  This way, a platform without symlink support
does not have to create the "map" file that nobody uses, something
like the attached.

I do not have strong preference either way, though.

 t/t4203-mailmap.sh | 5 +++++
 1 file changed, 5 insertions(+)

diff --git c/t/t4203-mailmap.sh w/t/t4203-mailmap.sh
index c60a90615c..cd1cab3e54 100755
--- c/t/t4203-mailmap.sh
+++ w/t/t4203-mailmap.sh
@@ -963,6 +963,11 @@ test_expect_success SYMLINKS 'symlinks not respected in-tree' '
 	test_cmp expect actual
 '
 
+test_expect_success 'prepare for cat-file --mailmap' '
+	rm -f .mailmap &&
+	git commit --allow-empty -m foo --author="Orig <orig@example.com>"
+'
+
 test_expect_success '--no-use-mailmap disables mailmap in cat-file' '
 	test_when_finished "rm .mailmap" &&
 	cat >.mailmap <<-EOF &&

^ permalink raw reply related	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 1/4] revision: improve commit_rewrite_person()
  2022-07-09 15:41     ` [PATCH v3 1/4] revision: improve commit_rewrite_person() Siddharth Asthana
@ 2022-07-12 16:29       ` Johannes Schindelin
  0 siblings, 0 replies; 68+ messages in thread
From: Johannes Schindelin @ 2022-07-12 16:29 UTC (permalink / raw)
  To: Siddharth Asthana
  Cc: git, phillip.wood123, congdanhqx, christian.couder, avarab,
	gitster, johncai86

Hi Siddharth,

On Sat, 9 Jul 2022, Siddharth Asthana wrote:

> diff --git a/revision.c b/revision.c
> index 211352795c..1939c56c67 100644
> --- a/revision.c
> +++ b/revision.c
> @@ -3792,14 +3791,42 @@ static int commit_rewrite_person(struct strbuf *buf, const char *what, struct st
>  			      ident.mail_end - ident.name_begin + 1,
>  			      namemail.buf, namemail.len);
>
> +		newlen = namemail.len;
> +
>  		strbuf_release(&namemail);
>
> -		return 1;
> +		return newlen - (ident.mail_end - ident.name_begin + 1);
>  	}
>
>  	return 0;
>  }
>
> +static void commit_rewrite_person(struct strbuf *buf, const char **headers, struct string_list *mailmap)
> +{
> +	size_t buf_offset = 0;
> +
> +	if (!mailmap)
> +		return;
> +
> +	for (;;) {
> +		const char *person, *line;
> +		size_t i, linelen;
> +
> +		line = buf->buf + buf_offset;
> +		linelen = strchrnul(line, '\n') - line + 1;
> +
> +		if (linelen <= 1)
> +			/* End of header */
> +			return;

This conditional would probably read much better if it was moved up a few
lines and rewritten like this:

	if (!*line || *line == '\n')
		return; /* End of headers */

or even turning the `for (;;)` into

	while (buf->buf[buf_offset] && buf->buf[buf_offset] != '\n')

> +
> +		buf_offset += linelen;

I would prefer to avoid having `linelen` altogether, and instead move the
`buf_offset` assignment _after_ the loop that handles the current header
line (and _not_ modify `buf_offset` inside):

	buf_offset = strchrnul(buf->buf + buf_offset, '\n') - buf->buf;
	if (buf->buf[buf_offset] == '\n')
		buf_offset++;

> +
> +		for (i = 0; headers[i]; i++)
> +			if (skip_prefix(line, headers[i], &person))
> +				buf_offset += rewrite_ident_line(person, buf, mailmap);

At this point, we have handled the header and should _not_ continue the
(inner) `for` loop. This is important because `line` is potentially
invalidated by that `rewrite_ident_line()` call. See below for a patch
(which is on top of `shears/seen`, but you get the idea).

This issue could also be avoided by consistently using `buf->buf +
buf_offset` instead of `line`.
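
To make the pointer-versus-offset point concrete, here is a minimal,
self-contained sketch. It does not use Git's strbuf: `struct minibuf`,
`minibuf_splice()` and `rewrite_headers()` are hypothetical stand-ins
invented for this illustration, and the "ident" is just a literal substring.
The only point is that pointers taken before a reallocating splice are
dropped, and the position is recomputed from the saved offset.

```c
#include <stdlib.h>
#include <string.h>

/* A tiny stand-in for a growable string buffer (not Git's strbuf). */
struct minibuf {
	char *buf;	/* heap-allocated, NUL-terminated */
	size_t len;	/* length without the NUL */
};

/* Replace `old_len` bytes at `pos` with `repl`; may move b->buf. */
static int minibuf_splice(struct minibuf *b, size_t pos, size_t old_len,
			  const char *repl)
{
	size_t repl_len = strlen(repl);
	size_t new_len = b->len - old_len + repl_len;

	if (new_len > b->len) {
		char *p = realloc(b->buf, new_len + 1);

		if (!p)
			return -1;
		b->buf = p;	/* pointers into the old block are now stale */
	}
	/* Shift the tail (including the NUL), then copy in the replacement. */
	memmove(b->buf + pos + repl_len, b->buf + pos + old_len,
		b->len - pos - old_len + 1);
	memcpy(b->buf + pos, repl, repl_len);
	b->len = new_len;
	return 0;
}

/*
 * Replace the first `from` on every header line with `to`. Only the
 * offset `off` survives a splice; pointers are re-derived afterwards.
 */
static int rewrite_headers(struct minibuf *b, const char *from, const char *to)
{
	size_t off = 0;

	while (b->buf[off] && b->buf[off] != '\n') {
		const char *line = b->buf + off;
		const char *eol = strchr(line, '\n');
		const char *hit = strstr(line, from);

		if (hit && (!eol || hit < eol)) {
			if (minibuf_splice(b, (size_t)(hit - b->buf),
					   strlen(from), to))
				return -1;
			/* `line`, `eol` and `hit` must not be used from here on. */
		}

		/* Recompute from the (possibly moved) buffer and the offset. */
		eol = strchr(b->buf + off, '\n');
		off = eol ? (size_t)(eol - b->buf) + 1 : b->len;
	}
	return 0;
}
```

With a replacement longer than the original, the realloc() is free to move
the block, yet the loop stays valid because it never reuses `line`.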

> +	}
> +}
> +
>  static int commit_match(struct commit *commit, struct rev_info *opt)
>  {
>  	int retval;

-- snipsnap --
From 61dd169def195eee9827a9a670f8dab606279cea Mon Sep 17 00:00:00 2001
From: Johannes Schindelin <johannes.schindelin@gmx.de>
Date: Tue, 12 Jul 2022 15:10:35 +0200
Subject: [PATCH] fixup??? revision: improve commit_rewrite_person()

When the `linux-musl` job failed in t4203.44 with a segmentation fault,
I became suspicious. From
https://github.com/git-for-windows/git/runs/7301741954?check_suite_focus=true#step:5:1775:

  + test_config mailmap.file complex.map
  + config_dir=
  + test mailmap.file '=' -C
  + test_when_finished 'test_unconfig  '"'"'mailmap.file'"'"
  + test 0 '=' 0
  + test_cleanup='{ test_unconfig  '"'"'mailmap.file'"'"'
  		} && (exit "$eval_ret"); eval_ret=$?; :'
  + git config mailmap.file complex.map
  + git log --use-mailmap --author '<cto@coompany.xx>'
  Segmentation fault (core dumped)
  error: last command exited with $?=139

I suspected a memory corruption, and my go-to tool to investigate those
is `valgrind`, so I ran `t4203-*.sh -ivx --valgrind-only=44`. It
reported the following:

-- snip --
[...]
expecting success of 4203.44 'Only grep replaced author with --use-mailmap':
	test_config mailmap.file complex.map &&
	git log --use-mailmap --author "<cto@coompany.xx>" >actual &&
	test_must_be_empty actual

+ test_config mailmap.file complex.map
+ config_dir=
+ test mailmap.file = -C
+ test_when_finished test_unconfig  'mailmap.file'
+ test 0 = 0
+ test_cleanup={ test_unconfig  'mailmap.file'
		} && (exit "$eval_ret"); eval_ret=$?; :
+ git config mailmap.file complex.map
+ git log --use-mailmap --author <cto@coompany.xx>
==14374== Invalid read of size 1
==14374==    at 0x2EE384: skip_prefix (git-compat-util.h:676)
==14374==    by 0x2EF31D: apply_mailmap_to_header (ident.c:417)
==14374==    by 0x3BB045: commit_match (revision.c:3831)
==14374==    by 0x3BB389: get_commit_action (revision.c:3917)
==14374==    by 0x3BB934: simplify_commit (revision.c:4005)
==14374==    by 0x3BBCAD: get_revision_1 (revision.c:4083)
==14374==    by 0x3BBEF0: get_revision_internal (revision.c:4184)
==14374==    by 0x3BC192: get_revision (revision.c:4262)
==14374==    by 0x1A0B05: cmd_log_walk_no_free (log.c:454)
==14374==    by 0x1A0BCD: cmd_log_walk (log.c:496)
==14374==    by 0x1A1E90: cmd_log (log.c:818)
==14374==    by 0x129A04: run_builtin (git.c:466)
==14374==  Address 0x4b76f4e is 94 bytes inside a block of size 210 free'd
==14374==    at 0x483DFAF: realloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==14374==    by 0x42F908: xrealloc (wrapper.c:136)
==14374==    by 0x3E6742: strbuf_grow (strbuf.c:99)
==14374==    by 0x3E6F1F: strbuf_splice (strbuf.c:242)
==14374==    by 0x2EF220: rewrite_ident_line (ident.c:382)
==14374==    by 0x2EF338: apply_mailmap_to_header (ident.c:418)
==14374==    by 0x3BB045: commit_match (revision.c:3831)
==14374==    by 0x3BB389: get_commit_action (revision.c:3917)
==14374==    by 0x3BB934: simplify_commit (revision.c:4005)
==14374==    by 0x3BBCAD: get_revision_1 (revision.c:4083)
==14374==    by 0x3BBEF0: get_revision_internal (revision.c:4184)
==14374==    by 0x3BC192: get_revision (revision.c:4262)
==14374==  Block was alloc'd at
==14374==    at 0x483B723: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==14374==    by 0x483E017: realloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==14374==    by 0x42F908: xrealloc (wrapper.c:136)
==14374==    by 0x3E6742: strbuf_grow (strbuf.c:99)
==14374==    by 0x3E733D: strbuf_add (strbuf.c:298)
==14374==    by 0x3AF420: strbuf_addstr (strbuf.h:305)
==14374==    by 0x3BB027: commit_match (revision.c:3829)
==14374==    by 0x3BB389: get_commit_action (revision.c:3917)
==14374==    by 0x3BB934: simplify_commit (revision.c:4005)
==14374==    by 0x3BBCAD: get_revision_1 (revision.c:4083)
==14374==    by 0x3BBEF0: get_revision_internal (revision.c:4184)
==14374==    by 0x3BC192: get_revision (revision.c:4262)
==14374==
{
   <insert_a_suppression_name_here>
   Memcheck:Addr1
   fun:skip_prefix
   fun:apply_mailmap_to_header
   fun:commit_match
   fun:get_commit_action
   fun:simplify_commit
   fun:get_revision_1
   fun:get_revision_internal
   fun:get_revision
   fun:cmd_log_walk_no_free
   fun:cmd_log_walk
   fun:cmd_log
   fun:run_builtin
}
error: last command exited with $?=126
not ok 44 - Only grep replaced author with --use-mailmap
1..44
-- snap --

And indeed, we see that the `rewrite_ident_line()` function grows the
strbuf, which can (and does, in this instance) move the buffer to a new
address, which invalidates the `line` pointer, which still points at the
old address.

It might actually make sense to rewrite the entire part of the original
patch where it looks for the end of the header line, so that it avoids
working on pointers altogether, and uses offsets instead.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 ident.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/ident.c b/ident.c
index 5f17bd607dd..fbcf7250aab 100644
--- a/ident.c
+++ b/ident.c
@@ -414,8 +414,10 @@ void apply_mailmap_to_header(struct strbuf *buf, const char **headers, struct st
 		buf_offset += linelen;

 		for (i = 0; headers[i]; i++)
-			if (skip_prefix(line, headers[i], &person))
+			if (skip_prefix(line, headers[i], &person)) {
 				buf_offset += rewrite_ident_line(person, buf, mailmap);
+				break;
+			}
 	}
 }

--
2.37.0.rc2.windows.1.7.g45a475aeb84



* Re: [PATCH v4 1/4] revision: improve commit_rewrite_person()
  2022-07-12 16:06       ` [PATCH v4 1/4] revision: improve commit_rewrite_person() Siddharth Asthana
@ 2022-07-13  1:25         ` Ævar Arnfjörð Bjarmason
  2022-07-13 12:18           ` Christian Couder
  2022-07-14 21:02         ` Junio C Hamano
  1 sibling, 1 reply; 68+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-07-13  1:25 UTC (permalink / raw)
  To: Siddharth Asthana
  Cc: git, phillip.wood123, congdanhqx, christian.couder, gitster,
	Johannes.Schindelin, johncai86


On Tue, Jul 12 2022, Siddharth Asthana wrote:

> -static int commit_rewrite_person(struct strbuf *buf, const char *what, struct string_list *mailmap)
> +/*
> + * Returns the difference between the new and old length of the ident line.
> + */
> +static ssize_t rewrite_ident_line(const char* person, struct strbuf *buf, struct string_list *mailmap)

All tests pass with this as size_t instead of ssize_t. Let's use that
here instead?


* Re: [PATCH v4 3/4] ident: rename commit_rewrite_person() to apply_mailmap_to_header()
  2022-07-12 16:06       ` [PATCH v4 3/4] ident: rename commit_rewrite_person() to apply_mailmap_to_header() Siddharth Asthana
@ 2022-07-13  1:25         ` Ævar Arnfjörð Bjarmason
  2022-07-13 13:29           ` Christian Couder
  0 siblings, 1 reply; 68+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-07-13  1:25 UTC (permalink / raw)
  To: Siddharth Asthana
  Cc: git, phillip.wood123, congdanhqx, christian.couder, gitster,
	Johannes.Schindelin, johncai86


On Tue, Jul 12 2022, Siddharth Asthana wrote:

> commit_rewrite_person() takes a commit buffer and replaces the idents
> in the header with their canonical versions using the mailmap mechanism.
> The name "commit_rewrite_person()" is misleading as it doesn't convey
> what kind of rewrite we are going to do to the buffer. It also doesn't
> clearly mention that the function will limit itself to the header part
> of the buffer. The new name, "apply_mailmap_to_header()", expresses the
> functionality of the function pretty clearly.
>
> We intend to use apply_mailmap_to_header() in git-cat-file to replace
> idents in the headers of commit and tag object buffers. So, we will be
> extending this function to take tag objects buffer as well and replace
> idents on the tagger header using the mailmap mechanism.
>
> Mentored-by: Christian Couder <christian.couder@gmail.com>
> Mentored-by: John Cai <johncai86@gmail.com>
> Signed-off-by: Siddharth Asthana <siddharthasthana31@gmail.com>
> ---
>  cache.h    | 6 +++---
>  ident.c    | 2 +-
>  revision.c | 2 +-
>  3 files changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/cache.h b/cache.h
> index c9dbe1c29a..9edb7fefd3 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -1689,10 +1689,10 @@ struct ident_split {
>  int split_ident_line(struct ident_split *, const char *, int);
>  
>  /*
> - * Given a commit object buffer and the commit headers, replaces the idents
> - * in the headers with their canonical versions using the mailmap mechanism.
> + * Given a commit or tag object buffer and the commit or tag headers, replaces
> + * the idents in the headers with their canonical versions using the mailmap mechanism.
>   */
> -void commit_rewrite_person(struct strbuf *buf, const char **commit_headers, struct string_list *mailmap);
> +void apply_mailmap_to_header(struct strbuf *buf, const char **headers, struct string_list *mailmap);
>  
>  /*
>   * Compare split idents for equality or strict ordering. Note that we
> diff --git a/ident.c b/ident.c
> index 9f4f6e9071..5f17bd607d 100644
> --- a/ident.c
> +++ b/ident.c
> @@ -393,7 +393,7 @@ static ssize_t rewrite_ident_line(const char* person, struct strbuf *buf, struct
>  	return 0;
>  }
>  
> -void commit_rewrite_person(struct strbuf *buf, const char **headers, struct string_list *mailmap)
> +void apply_mailmap_to_header(struct strbuf *buf, const char **headers, struct string_list *mailmap)
>  {
>  	size_t buf_offset = 0;
>  
> diff --git a/revision.c b/revision.c
> index 14dca903b6..6ad3665204 100644
> --- a/revision.c
> +++ b/revision.c
> @@ -3792,7 +3792,7 @@ static int commit_match(struct commit *commit, struct rev_info *opt)
>  		if (!buf.len)
>  			strbuf_addstr(&buf, message);
>  
> -		commit_rewrite_person(&buf, commit_headers, opt->mailmap);
> +		apply_mailmap_to_header(&buf, commit_headers, opt->mailmap);
>  	}
>  
>  	/* Append "fake" message parts as needed */

I can live with this so far, but I really think this is cementing the
wrong approach into place here.

We only use commit_match() to feed a commit to grep.c, which if you look
at the "header_field" struct there we take this pre-formatted output and
parse this out *again*, i.e. find "author", "reflog", "committer" etc.,
and eventually point the regex engine at that buffer.

So we really don't need to get a strbuf here, and munge the whole thing
in place to feed it to grep.c, instead we can:

 1. Not munge it at all, pass it as-is
 2. Pass the mailmap along to grep.c itself
 3. It's already parsing out the headers, so at some point it will have
    "author foo <bar>\n"
 4. In that code, we can just consult the mailmap, and then map the "foo
   <bar>" part to "Baz <bar>" or whatever
 5. Then search that string.

So no need for any in-place rewriting, or no?

Even with this approach this seems a bit odd, e.g. isn't your
commit_rewrite_person() largely a re-invention of find_commit_header()
in commit.c, can't we use that function there?

The replace_idents_using_mailmap() in 4/4 seems like it could be
improved in a similar way.

I.e. can't we just loop over the object, then as we find "author"
consult the mailmap, and potentially emit a replacement, otherwise the
existing content as-is up until the next \n etc.

We should be able to "stream" all of this, instead of in-place modifying
a potentially large commit buffer, which involves memmove() etc.
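
As a rough illustration of what such a streaming pass could look like, here
is a self-contained sketch. It is not Git code: `stream_with_mailmap()`,
`lookup_fn` and `demo_lookup()` are hypothetical names, the lookup maps a
single hard-coded ident instead of consulting a real mailmap, and the ident
is assumed to end at the first '>' on the line.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical lookup: canonical "Name <mail>" for an ident, or NULL. */
typedef const char *(*lookup_fn)(const char *ident, size_t len);

/* Stand-in for a real mailmap query; maps exactly one hard-coded ident. */
static const char *demo_lookup(const char *ident, size_t len)
{
	static const char from[] = "Orig <orig@example.com>";

	if (len == sizeof(from) - 1 && !memcmp(ident, from, len))
		return "New <new@example.com>";
	return NULL;
}

/*
 * Copy `obj` to `out` line by line. On header lines that start with one
 * of the NULL-terminated `header` prefixes, emit the mapped ident instead
 * of the original one. The input buffer is never modified.
 */
static void stream_with_mailmap(const char *obj, const char **header,
				lookup_fn lookup, FILE *out)
{
	const char *line = obj;

	while (*line && *line != '\n') {
		const char *eol = strchr(line, '\n');
		size_t linelen = eol ? (size_t)(eol - line) + 1 : strlen(line);
		const char *ident = NULL, *gt = NULL, *repl = NULL;
		size_t i;

		for (i = 0; header[i]; i++) {
			size_t plen = strlen(header[i]);

			if (strncmp(line, header[i], plen))
				continue;
			ident = line + plen;
			gt = memchr(ident, '>', linelen - plen);
			if (gt)
				repl = lookup(ident, (size_t)(gt - ident) + 1);
			break;	/* at most one prefix matches a line */
		}

		if (repl) {
			fwrite(line, 1, (size_t)(ident - line), out);
			fputs(repl, out);
			fwrite(gt + 1, 1, linelen - (size_t)(gt + 1 - line), out);
		} else {
			fwrite(line, 1, linelen, out);
		}
		line += linelen;
	}
	fputs(line, out);	/* blank line and body pass through untouched */
}
```

The trade-off is that the caller gets an output stream rather than a
rewritten buffer, so code that wants the result in memory still has to
collect it somewhere.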


* Re: [PATCH v4 1/4] revision: improve commit_rewrite_person()
  2022-07-13  1:25         ` Ævar Arnfjörð Bjarmason
@ 2022-07-13 12:18           ` Christian Couder
  0 siblings, 0 replies; 68+ messages in thread
From: Christian Couder @ 2022-07-13 12:18 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Siddharth Asthana, git, Phillip Wood,
	Đoàn Trần Công Danh, Junio C Hamano,
	Johannes Schindelin, John Cai

On Wed, Jul 13, 2022 at 3:25 AM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
>
> On Tue, Jul 12 2022, Siddharth Asthana wrote:
>
> > -static int commit_rewrite_person(struct strbuf *buf, const char *what, struct string_list *mailmap)
> > +/*
> > + * Returns the difference between the new and old length of the ident line.
> > + */
> > +static ssize_t rewrite_ident_line(const char* person, struct strbuf *buf, struct string_list *mailmap)
>
> All tests pass with this as size_t instead of ssize_t. Let's use that
> here instead?

Do you mean you would like to use size_t instead of ssize_t for the
type of the value returned by the function?

I think it can return a negative value if the new length of the ident
line is shorter than the old one though.
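
A small sketch of why the sign matters, assuming a POSIX system that
provides ssize_t. `ident_len_delta()` and `adjust_offset()` are
hypothetical helpers written only for this example; they are not part of
the patch.

```c
#include <stddef.h>
#include <sys/types.h>	/* ssize_t (POSIX) */

/* Length change when an ident of old_len bytes becomes new_len bytes. */
static ssize_t ident_len_delta(size_t old_len, size_t new_len)
{
	return (ssize_t)new_len - (ssize_t)old_len;
}

/* Apply a signed delta to an unsigned buffer offset. */
static size_t adjust_offset(size_t off, ssize_t delta)
{
	return delta < 0 ? off - (size_t)(-delta) : off + (size_t)delta;
}
```

If a 20-byte ident is replaced by a 12-byte one, the delta is -8. Stored in
a size_t, the same value would turn into a very large positive number.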


* Re: [PATCH v4 3/4] ident: rename commit_rewrite_person() to apply_mailmap_to_header()
  2022-07-13  1:25         ` Ævar Arnfjörð Bjarmason
@ 2022-07-13 13:29           ` Christian Couder
  0 siblings, 0 replies; 68+ messages in thread
From: Christian Couder @ 2022-07-13 13:29 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Siddharth Asthana, git, Phillip Wood,
	Đoàn Trần Công Danh, Junio C Hamano,
	Johannes Schindelin, John Cai

On Wed, Jul 13, 2022 at 3:39 AM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
> On Tue, Jul 12 2022, Siddharth Asthana wrote:

> > diff --git a/revision.c b/revision.c
> > index 14dca903b6..6ad3665204 100644
> > --- a/revision.c
> > +++ b/revision.c
> > @@ -3792,7 +3792,7 @@ static int commit_match(struct commit *commit, struct rev_info *opt)
> >               if (!buf.len)
> >                       strbuf_addstr(&buf, message);
> >
> > -             commit_rewrite_person(&buf, commit_headers, opt->mailmap);
> > +             apply_mailmap_to_header(&buf, commit_headers, opt->mailmap);
> >       }
> >
> >       /* Append "fake" message parts as needed */
>
> I can live with this so far, but I really think this is cementing the
> wrong approach into place here.
>
> We only use commit_match() to feed a commit to grep.c, which if you look
> at the "header_field" struct there we take this pre-formatted output and
> parse this out *again*, i.e. find "author", "reflog", "committer" etc.,
> and eventually point the regex engine at that buffer.
>
> So we really don't need to get a strbuf here, and munge the whole thing
> in place to feed it to grep.c, instead we can:
>
>  1. Not munge it at all, pass it as-is
>  2. Pass the mailmap along to grep.c itself
>  3. It's already parsing out the headers, so at some point it will have
>     "author foo <bar>\n"
>  4. In that code, we can just consult the mailmap, and then map the "foo
>    <bar>" part to "Baz <bar>" or whatever
>  5. Then search that string.
>
> So no need for any in-place rewriting, or no?

This patch series is about improving `git cat-file`, and it seems
far-fetched to ask it to rewrite how grep handles mailmap first.

> Even with this approach this seems a bit odd, e.g. isn't your
> commit_rewrite_person() largely a re-invention of find_commit_header()
> in commit.c, can't we use that function there?

find_commit_header() seems to be searching for only one header, while
we want to search for more than one. Also we want only one pass to be
made over the object buffer. So I think we cannot really reuse
find_commit_header().

> The replace_idents_using_mailmap() in 4/4 seems like it could be
> improved in a similar way.
>
> I.e. can't we just loop over the object, then as we find "author"
> consult the mailmap, and potentially emit a replacement, otherwise the
> existing content as-is up until the next \n etc.

That's what we do except that we replace the existing ident instead of
emitting a replacement.

> We should be able to "stream" all of this, instead of in-place modifying
> a potentially large commit buffer, which involves memmove() etc.

I am not sure if streaming is really much better, especially if there
are a small number of commit or tag objects where an ident must be
replaced.


* Re: [PATCH v4 1/4] revision: improve commit_rewrite_person()
  2022-07-12 16:06       ` [PATCH v4 1/4] revision: improve commit_rewrite_person() Siddharth Asthana
  2022-07-13  1:25         ` Ævar Arnfjörð Bjarmason
@ 2022-07-14 21:02         ` Junio C Hamano
  1 sibling, 0 replies; 68+ messages in thread
From: Junio C Hamano @ 2022-07-14 21:02 UTC (permalink / raw)
  To: Siddharth Asthana
  Cc: git, phillip.wood123, congdanhqx, christian.couder, avarab,
	Johannes.Schindelin, johncai86

Siddharth Asthana <siddharthasthana31@gmail.com> writes:

> +/*
> + * Returns the difference between the new and old length of the ident line.
> + */
> +static ssize_t rewrite_ident_line(const char* person, struct strbuf *buf, struct string_list *mailmap)

"const char *person".  Asterisk sticks to the variable, not to the
type.

Wrap such an overly long line.

> +static void commit_rewrite_person(struct strbuf *buf, const char **headers, struct string_list *mailmap)

Likewise.

The second parameter should be called "header", not "headers",
because the way this array is used is primarily to access its
individual members, i.e. header[i] and it is more grammatical
to say that header[0] is the zero-th header, not headers[0].

> +{
> +	size_t buf_offset = 0;
> +
> +	if (!mailmap)
> +		return;
> +
> +	for (;;) {
> +		const char *person, *line;
> +		size_t i, linelen;
> +
> +		line = buf->buf + buf_offset;

buf_offset is initialized to 0.  The idea is to keep track of the
position of the current line in buf->buf as the byte offset from the
beginning.  This makes it safe to let rewrite_ident_line()
reallocate buf->buf in the loop.  So "line" points at the beginning
of the current line.

> +		linelen = strchrnul(line, '\n') - line + 1;

This "linelen" counts the LF (or NUL) termination.  Hence ...

> +		if (linelen <= 1)
> +			/* End of header */
> +			return;

... linelen == 1 means we are at an empty line (or at the end of the buffer).

> +		buf_offset += linelen;

And by adding linelen to buf_offset, we prepare buf_offset to point
at the beginning of the next line.

BUT.

What happens when the buffer has only headers, no body, without an
empty line after the header?  buf_offset will be pointing at the
byte past the final NUL at the end of the buffer.  The next round of
the loop will point line at an invalid piece of memory.

I _think_ that the current generation of high-level tools like "git
commit" and "git tag" may leave a blank line at the end even when
they are told to create a message with no body, but this code should
not depend on that.  An object with no body and without a blank line
after the header is a valid object (cf. fsck.c::verify_headers()).
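
The off-by-one can be shown with plain offset arithmetic, without reading
any invalid memory. The sketch below is standalone C, not the patch itself;
`advance_v4()` mirrors the "linelen" computation quoted above and
`advance_guarded()` is one possible fix, both names being hypothetical. The
case that misbehaves is a last header line that is not LF-terminated.

```c
#include <stddef.h>
#include <string.h>

/* v4-style advance: always adds one for the terminator, even for a NUL. */
static size_t advance_v4(const char *buf, size_t off)
{
	const char *line = buf + off;
	const char *eol = strchr(line, '\n');
	size_t linelen = (eol ? (size_t)(eol - line) : strlen(line)) + 1;

	return off + linelen;
}

/* Guarded advance: skips the LF if present, otherwise stops at the NUL. */
static size_t advance_guarded(const char *buf, size_t off)
{
	const char *eol = strchr(buf + off, '\n');

	return eol ? (size_t)(eol - buf) + 1 : off + strlen(buf + off);
}
```

For a buffer such as "tagger T <t@example.com> 1 +0000" with no trailing
LF, advance_v4() returns strlen(buf) + 1, which is one past the NUL, while
advance_guarded() returns strlen(buf) and lets the loop stop there.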

> +		for (i = 0; headers[i]; i++)
> +			if (skip_prefix(line, headers[i], &person))
> +				buf_offset += rewrite_ident_line(person, buf, mailmap);
> +	}

As the commit_headers[] array the caller gives us does not have
duplicates, as soon as we find a match, we should break out of the
loop, i.e.

		for (i = 0; header[i]; i++) {
			if (skip_prefix(line, header[i], &person)) {
				buf_offset += rewrite_ident_line(person, buf, mailmap);
				break;
			}
		}

> +}
> +
>  static int commit_match(struct commit *commit, struct rev_info *opt)
>  {
>  	int retval;
> @@ -3832,11 +3859,12 @@ static int commit_match(struct commit *commit, struct rev_info *opt)
>  		strbuf_addstr(&buf, message);
>  
>  	if (opt->grep_filter.header_list && opt->mailmap) {
> +		const char *commit_headers[] = { "author ", "committer ", NULL };

It is OK to call the array "commit_headers", because the way we use
the identifier is only to refer to the collection of headers as a
whole.

>  		if (!buf.len)
>  			strbuf_addstr(&buf, message);
>  
> -		commit_rewrite_person(&buf, "\nauthor ", opt->mailmap);
> -		commit_rewrite_person(&buf, "\ncommitter ", opt->mailmap);
> +		commit_rewrite_person(&buf, commit_headers, opt->mailmap);
>  	}
>  
>  	/* Append "fake" message parts as needed */

Thanks.


* [PATCH v5 0/4] Add support for mailmap in cat-file
  2022-07-12 16:06     ` [PATCH v4 " Siddharth Asthana
                         ` (3 preceding siblings ...)
  2022-07-12 16:06       ` [PATCH v4 4/4] cat-file: add mailmap support Siddharth Asthana
@ 2022-07-16  7:40       ` Siddharth Asthana
  2022-07-16  7:40         ` [PATCH v5 1/4] revision: improve commit_rewrite_person() Siddharth Asthana
                           ` (4 more replies)
  4 siblings, 5 replies; 68+ messages in thread
From: Siddharth Asthana @ 2022-07-16  7:40 UTC (permalink / raw)
  To: git
  Cc: phillip.wood123, congdanhqx, christian.couder, avarab, gitster,
	Johannes.Schindelin, johncai86, Siddharth Asthana

Thanks a lot Johannes and Junio for helping me identify and fix the memory
corruption in commit_rewrite_person()!  :)

= Description

This patch series adds mailmap support to the git-cat-file command. It
adds the mailmap support only for the commit and tag objects by
replacing the idents for "author", "committer" and "tagger" headers. The
mailmap only takes effect when the --[no-]use-mailmap or --[no-]mailmap
option is passed to the git cat-file command. The changes will work with
the batch mode as well.

So, if one wants to enable mailmap they can use either of the following
commands:
$ git cat-file --use-mailmap -p <object>
$ git cat-file --use-mailmap <type> <object>

To use it in the batch mode, one can use the following command:
$ git cat-file --use-mailmap --batch

= Patch Organization

- The first patch improves commit_rewrite_person() by restricting it
  to traverse only through the header part of the commit object buffer.
  It also adds an argument called headers which the callers can pass.
  The function will replace idents only on these passed headers.
  Thus, the caller won't have to make repeated calls to the function.
- The second patch moves commit_rewrite_person() to ident.c to expose it
  as a public function so that it can be used to replace idents in the
  headers of desired objects.
- The third patch renames commit_rewrite_person() to a name which
  describes its functionality clearly. It is renamed to
  apply_mailmap_to_header().
- The last patch adds mailmap support to the git cat-file command. It
  adds the required documentation and tests as well.

Changes in v5:
- In commit_rewrite_person(), we make calls to rewrite_ident_line(),
  where the strbuf can grow. This can move the buffer to a new address,
  which invalidates the `line` pointer, which still points at the old
  address. This issue has been fixed by breaking out of the inner for
  loop as soon as we find a match for any of the commit headers that we
  are passing to the function.
- The commit_rewrite_person() no longer has a `linelen` variable and
  instead we now rely on `buf_offset` for navigating through the buffer.
- Some overly long lines have been wrapped.

Siddharth Asthana (4):
  revision: improve commit_rewrite_person()
  ident: move commit_rewrite_person() to ident.c
  ident: rename commit_rewrite_person() to apply_mailmap_to_header()
  cat-file: add mailmap support

 Documentation/git-cat-file.txt |  6 +++
 builtin/cat-file.c             | 43 ++++++++++++++++++-
 cache.h                        |  6 +++
 ident.c                        | 75 ++++++++++++++++++++++++++++++++++
 revision.c                     | 50 ++---------------------
 t/t4203-mailmap.sh             | 59 ++++++++++++++++++++++++++
 6 files changed, 191 insertions(+), 48 deletions(-)

Range-diff against v4:
1:  9e95326c58 ! 1:  8c29ad9351 revision: improve commit_rewrite_person()
    @@ Commit message
         Mentored-by: Christian Couder <christian.couder@gmail.com>
         Mentored-by: John Cai <johncai86@gmail.com>
         Helped-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
    +    Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
         Signed-off-by: Siddharth Asthana <siddharthasthana31@gmail.com>
     
      ## revision.c ##
    @@ revision.c: int rewrite_parents(struct rev_info *revs, struct commit *commit,
     +/*
     + * Returns the difference between the new and old length of the ident line.
     + */
    -+static ssize_t rewrite_ident_line(const char* person, struct strbuf *buf, struct string_list *mailmap)
    ++static ssize_t rewrite_ident_line(const char *person, struct strbuf *buf,
    ++								  struct string_list *mailmap)
      {
     -	char *person, *endp;
     +	char *endp;
    @@ revision.c: static int commit_rewrite_person(struct strbuf *buf, const char *wha
      	return 0;
      }
      
    -+static void commit_rewrite_person(struct strbuf *buf, const char **headers, struct string_list *mailmap)
    ++static void commit_rewrite_person(struct strbuf *buf, const char **header,
    ++								  struct string_list *mailmap)
     +{
     +	size_t buf_offset = 0;
     +
    @@ revision.c: static int commit_rewrite_person(struct strbuf *buf, const char *wha
     +
     +	for (;;) {
     +		const char *person, *line;
    -+		size_t i, linelen;
    ++		size_t i;
     +
     +		line = buf->buf + buf_offset;
    -+		linelen = strchrnul(line, '\n') - line + 1;
    ++		if (!*line || *line == '\n')
    ++			return; /* End of header */
     +
    -+		if (linelen <= 1)
    -+			/* End of header */
    -+			return;
    ++		for (i = 0; header[i]; i++)
    ++			if (skip_prefix(line, header[i], &person)) {
    ++				rewrite_ident_line(person, buf, mailmap);
    ++				break;
    ++			}
     +
    -+		buf_offset += linelen;
    -+
    -+		for (i = 0; headers[i]; i++)
    -+			if (skip_prefix(line, headers[i], &person))
    -+				buf_offset += rewrite_ident_line(person, buf, mailmap);
    ++		buf_offset = strchrnul(buf->buf + buf_offset, '\n') - buf->buf;
    ++		if (buf->buf[buf_offset] == '\n')
    ++			++buf_offset;
     +	}
     +}
     +
2:  d9395cb8b2 ! 2:  ccb7f72fcb ident: move commit_rewrite_person() to ident.c
    @@ cache.h: struct ident_split {
     + * Given a commit object buffer and the commit headers, replaces the idents
     + * in the headers with their canonical versions using the mailmap mechanism.
     + */
    -+void commit_rewrite_person(struct strbuf *buf, const char **commit_headers, struct string_list *mailmap);
    ++void commit_rewrite_person(struct strbuf *, const char **, struct string_list *);
     +
      /*
       * Compare split idents for equality or strict ordering. Note that we
    @@ ident.c: int split_ident_line(struct ident_split *split, const char *line, int l
     +/*
     + * Returns the difference between the new and old length of the ident line.
     + */
    -+static ssize_t rewrite_ident_line(const char* person, struct strbuf *buf, struct string_list *mailmap)
    ++static ssize_t rewrite_ident_line(const char *person, struct strbuf *buf,
    ++								  struct string_list *mailmap)
     +{
     +	char *endp;
     +	size_t len, namelen, maillen;
    @@ ident.c: int split_ident_line(struct ident_split *split, const char *line, int l
     +	return 0;
     +}
     +
    -+void commit_rewrite_person(struct strbuf *buf, const char **headers, struct string_list *mailmap)
    ++void commit_rewrite_person(struct strbuf *buf, const char **header,
    ++						   struct string_list *mailmap)
     +{
     +	size_t buf_offset = 0;
     +
    @@ ident.c: int split_ident_line(struct ident_split *split, const char *line, int l
     +
     +	for (;;) {
     +		const char *person, *line;
    -+		size_t i, linelen;
    ++		size_t i;
     +
     +		line = buf->buf + buf_offset;
    -+		linelen = strchrnul(line, '\n') - line + 1;
    -+
    -+		if (linelen <= 1)
    -+			/* End of header */
    -+			return;
    -+
    -+		buf_offset += linelen;
    -+
    -+		for (i = 0; headers[i]; i++)
    -+			if (skip_prefix(line, headers[i], &person))
    -+				buf_offset += rewrite_ident_line(person, buf, mailmap);
    ++		if (!*line || *line == '\n')
    ++			return; /* End of header */
    ++
    ++		for (i = 0; header[i]; i++)
    ++			if (skip_prefix(line, header[i], &person)) {
    ++				rewrite_ident_line(person, buf, mailmap);
    ++				break;
    ++			}
    ++
    ++		buf_offset = strchrnul(buf->buf + buf_offset, '\n') - buf->buf;
    ++		if (buf->buf[buf_offset] == '\n')
    ++			++buf_offset;
     +	}
     +}
      
    @@ revision.c: int rewrite_parents(struct rev_info *revs, struct commit *commit,
     -/*
     - * Returns the difference between the new and old length of the ident line.
     - */
    --static ssize_t rewrite_ident_line(const char* person, struct strbuf *buf, struct string_list *mailmap)
    +-static ssize_t rewrite_ident_line(const char *person, struct strbuf *buf,
    +-								  struct string_list *mailmap)
     -{
     -	char *endp;
     -	size_t len, namelen, maillen;
    @@ revision.c: int rewrite_parents(struct rev_info *revs, struct commit *commit,
     -	return 0;
     -}
     -
    --static void commit_rewrite_person(struct strbuf *buf, const char **headers, struct string_list *mailmap)
    +-static void commit_rewrite_person(struct strbuf *buf, const char **header,
    +-								  struct string_list *mailmap)
     -{
     -	size_t buf_offset = 0;
     -
    @@ revision.c: int rewrite_parents(struct rev_info *revs, struct commit *commit,
     -
     -	for (;;) {
     -		const char *person, *line;
    --		size_t i, linelen;
    +-		size_t i;
     -
     -		line = buf->buf + buf_offset;
    --		linelen = strchrnul(line, '\n') - line + 1;
    --
    --		if (linelen <= 1)
    --			/* End of header */
    --			return;
    --
    --		buf_offset += linelen;
    --
    --		for (i = 0; headers[i]; i++)
    --			if (skip_prefix(line, headers[i], &person))
    --				buf_offset += rewrite_ident_line(person, buf, mailmap);
    +-		if (!*line || *line == '\n')
    +-			return; /* End of header */
    +-
    +-		for (i = 0; header[i]; i++)
    +-			if (skip_prefix(line, header[i], &person)) {
    +-				rewrite_ident_line(person, buf, mailmap);
    +-				break;
    +-			}
    +-
    +-		buf_offset = strchrnul(buf->buf + buf_offset, '\n') - buf->buf;
    +-		if (buf->buf[buf_offset] == '\n')
    +-			++buf_offset;
     -	}
     -}
     -
3:  355bbda25e ! 3:  38c18fd10d ident: rename commit_rewrite_person() to apply_mailmap_to_header()
    @@ cache.h: struct ident_split {
     + * Given a commit or tag object buffer and the commit or tag headers, replaces
     + * the idents in the headers with their canonical versions using the mailmap mechanism.
       */
    --void commit_rewrite_person(struct strbuf *buf, const char **commit_headers, struct string_list *mailmap);
    -+void apply_mailmap_to_header(struct strbuf *buf, const char **headers, struct string_list *mailmap);
    +-void commit_rewrite_person(struct strbuf *, const char **, struct string_list *);
    ++void apply_mailmap_to_header(struct strbuf *, const char **, struct string_list *);
      
      /*
       * Compare split idents for equality or strict ordering. Note that we
     
      ## ident.c ##
    -@@ ident.c: static ssize_t rewrite_ident_line(const char* person, struct strbuf *buf, struct
    +@@ ident.c: static ssize_t rewrite_ident_line(const char *person, struct strbuf *buf,
      	return 0;
      }
      
    --void commit_rewrite_person(struct strbuf *buf, const char **headers, struct string_list *mailmap)
    -+void apply_mailmap_to_header(struct strbuf *buf, const char **headers, struct string_list *mailmap)
    +-void commit_rewrite_person(struct strbuf *buf, const char **header,
    +-						   struct string_list *mailmap)
    ++void apply_mailmap_to_header(struct strbuf *buf, const char **header,
    ++							 struct string_list *mailmap)
      {
      	size_t buf_offset = 0;
      
4:  ac532965b4 ! 4:  0a459d4c53 cat-file: add mailmap support
    @@ Commit message
         This patch also introduces new test cases to test the mailmap mechanism in
         git cat-file command.
     
    -    The tests added in this patch series rely on the side effects of the earlier
    -    test case `set up symlink tests`. However, that test case is guarded behind the
    -    `SYMLINKS` prereq, therefore it is not run e.g. on Windows which can cause the
    -    added tests to fail on Windows. So, fix that by removing the prereq from the
    -    `set up` test case, and adjusting its title to reflect its broadened responsibility.
    -
         Mentored-by: Christian Couder <christian.couder@gmail.com>
         Mentored-by: John Cai <johncai86@gmail.com>
         Helped-by: Phillip Wood <phillip.wood@dunelm.org.uk>
    @@ builtin/cat-file.c: int cmd_cat_file(int argc, const char **argv, const char *pr
      		batch.all_objects = 1;
     
      ## t/t4203-mailmap.sh ##
    -@@ t/t4203-mailmap.sh: test_expect_success 'find top-level mailmap from subdir' '
    - 	test_cmp expect actual
    - '
    - 
    --test_expect_success SYMLINKS 'set up symlink tests' '
    -+test_expect_success 'prepare for symlink/--use-mailmap tests' '
    - 	git commit --allow-empty -m foo --author="Orig <orig@example.com>" &&
    - 	echo "New <new@example.com> <orig@example.com>" >map &&
    - 	rm -f .mailmap
     @@ t/t4203-mailmap.sh: test_expect_success SYMLINKS 'symlinks not respected in-tree' '
      	test_cmp expect actual
      '
      
    ++test_expect_success 'prepare for cat-file --mailmap' '
    ++	rm -f .mailmap &&
    ++	git commit --allow-empty -m foo --author="Orig <orig@example.com>"
    ++'
    ++
     +test_expect_success '--no-use-mailmap disables mailmap in cat-file' '
     +	test_when_finished "rm .mailmap" &&
     +	cat >.mailmap <<-EOF &&
-- 
2.37.1.120.g001f220fb8


^ permalink raw reply	[flat|nested] 68+ messages in thread

* [PATCH v5 1/4] revision: improve commit_rewrite_person()
  2022-07-16  7:40       ` [PATCH v5 0/4] Add support for mailmap in cat-file Siddharth Asthana
@ 2022-07-16  7:40         ` Siddharth Asthana
  2022-07-17 22:11           ` Junio C Hamano
  2022-07-16  7:40         ` [PATCH v5 2/4] ident: move commit_rewrite_person() to ident.c Siddharth Asthana
                           ` (3 subsequent siblings)
  4 siblings, 1 reply; 68+ messages in thread
From: Siddharth Asthana @ 2022-07-16  7:40 UTC (permalink / raw)
  To: git
  Cc: phillip.wood123, congdanhqx, christian.couder, avarab, gitster,
	Johannes.Schindelin, johncai86, Siddharth Asthana

The function, commit_rewrite_person(), is designed to find and replace
an ident string in the header part, and the way it avoids a random
occurrence of "author A U Thor <author@example.com>" in the text is by
insisting that "author" appear at the beginning of a line by passing
"\nauthor " as "what".

The implementation also doesn't make any effort to limit itself to the
commit header by locating the blank line that appears after the header
part and stopping the search there. Also, the interface forces the
caller to make multiple calls if it wants to rewrite idents on multiple
headers. That should not be necessary.

To support the existing caller better, update commit_rewrite_person()
to:
- Make a single pass in the input buffer to locate headers named
  "author" and "committer" and replace idents on them.
- Stop at the end of the header, ensuring that nothing in the body of
  the commit object is modified.

The return type of the function commit_rewrite_person() has also been
changed from int to void. This has been done because the caller of the
function doesn't do anything with the return value of the function.

By simplifying the interface of commit_rewrite_person(), we also
intend to expose it as a public function. We will also be renaming the
function in a future commit to a different name which clearly tells that
the function replaces idents in the header of the commit buffer.

Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: John Cai <johncai86@gmail.com>
Helped-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Siddharth Asthana <siddharthasthana31@gmail.com>
---
 revision.c | 51 +++++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 41 insertions(+), 10 deletions(-)

diff --git a/revision.c b/revision.c
index 211352795c..9909da928e 100644
--- a/revision.c
+++ b/revision.c
@@ -3755,19 +3755,18 @@ int rewrite_parents(struct rev_info *revs, struct commit *commit,
 	return 0;
 }
 
-static int commit_rewrite_person(struct strbuf *buf, const char *what, struct string_list *mailmap)
+/*
+ * Returns the difference between the new and old length of the ident line.
+ */
+static ssize_t rewrite_ident_line(const char *person, struct strbuf *buf,
+								  struct string_list *mailmap)
 {
-	char *person, *endp;
+	char *endp;
 	size_t len, namelen, maillen;
 	const char *name;
 	const char *mail;
 	struct ident_split ident;
 
-	person = strstr(buf->buf, what);
-	if (!person)
-		return 0;
-
-	person += strlen(what);
 	endp = strchr(person, '\n');
 	if (!endp)
 		return 0;
@@ -3784,6 +3783,7 @@ static int commit_rewrite_person(struct strbuf *buf, const char *what, struct st
 
 	if (map_user(mailmap, &mail, &maillen, &name, &namelen)) {
 		struct strbuf namemail = STRBUF_INIT;
+		size_t newlen;
 
 		strbuf_addf(&namemail, "%.*s <%.*s>",
 			    (int)namelen, name, (int)maillen, mail);
@@ -3792,14 +3792,44 @@ static int commit_rewrite_person(struct strbuf *buf, const char *what, struct st
 			      ident.mail_end - ident.name_begin + 1,
 			      namemail.buf, namemail.len);
 
+		newlen = namemail.len;
+
 		strbuf_release(&namemail);
 
-		return 1;
+		return newlen - (ident.mail_end - ident.name_begin + 1);
 	}
 
 	return 0;
 }
 
+static void commit_rewrite_person(struct strbuf *buf, const char **header,
+								  struct string_list *mailmap)
+{
+	size_t buf_offset = 0;
+
+	if (!mailmap)
+		return;
+
+	for (;;) {
+		const char *person, *line;
+		size_t i;
+
+		line = buf->buf + buf_offset;
+		if (!*line || *line == '\n')
+			return; /* End of header */
+
+		for (i = 0; header[i]; i++)
+			if (skip_prefix(line, header[i], &person)) {
+				rewrite_ident_line(person, buf, mailmap);
+				break;
+			}
+
+		buf_offset = strchrnul(buf->buf + buf_offset, '\n') - buf->buf;
+		if (buf->buf[buf_offset] == '\n')
+			++buf_offset;
+	}
+}
+
 static int commit_match(struct commit *commit, struct rev_info *opt)
 {
 	int retval;
@@ -3832,11 +3862,12 @@ static int commit_match(struct commit *commit, struct rev_info *opt)
 		strbuf_addstr(&buf, message);
 
 	if (opt->grep_filter.header_list && opt->mailmap) {
+		const char *commit_headers[] = { "author ", "committer ", NULL };
+
 		if (!buf.len)
 			strbuf_addstr(&buf, message);
 
-		commit_rewrite_person(&buf, "\nauthor ", opt->mailmap);
-		commit_rewrite_person(&buf, "\ncommitter ", opt->mailmap);
+		commit_rewrite_person(&buf, commit_headers, opt->mailmap);
 	}
 
 	/* Append "fake" message parts as needed */
-- 
2.37.1.120.g001f220fb8


^ permalink raw reply related	[flat|nested] 68+ messages in thread
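
The header walk described in the patch above can be illustrated outside
of git with a minimal, self-contained sketch. It assumes a plain
NUL-terminated buffer instead of a struct strbuf, and the helper name
count_ident_headers() is hypothetical (it is not part of git). It only
counts matching header lines rather than rewriting them, to show that
the loop stops at the first blank line and never looks at the body.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of git): count the lines in the header
 * part of an object buffer that start with one of the NULL-terminated
 * `headers` prefixes.  The walk stops at the first empty line, so text
 * in the body is never inspected.
 */
static size_t count_ident_headers(const char *buf, const char **headers)
{
	size_t count = 0;
	const char *line = buf;

	assert(buf && headers);

	while (*line && *line != '\n') {	/* blank line ends the header */
		const char *eol = strchr(line, '\n');
		size_t i;

		for (i = 0; headers[i]; i++)
			if (!strncmp(line, headers[i], strlen(headers[i]))) {
				count++;
				break;	/* a line matches at most one prefix */
			}

		if (!eol)
			break;		/* last line has no trailing newline */
		line = eol + 1;
	}
	return count;
}
```

Passing the prefixes as one NULL-terminated array is what lets a single
pass handle "author " and "committer " together, instead of one search
per header.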

* [PATCH v5 2/4] ident: move commit_rewrite_person() to ident.c
  2022-07-16  7:40       ` [PATCH v5 0/4] Add support for mailmap in cat-file Siddharth Asthana
  2022-07-16  7:40         ` [PATCH v5 1/4] revision: improve commit_rewrite_person() Siddharth Asthana
@ 2022-07-16  7:40         ` Siddharth Asthana
  2022-07-16  7:40         ` [PATCH v5 3/4] ident: rename commit_rewrite_person() to apply_mailmap_to_header() Siddharth Asthana
                           ` (2 subsequent siblings)
  4 siblings, 0 replies; 68+ messages in thread
From: Siddharth Asthana @ 2022-07-16  7:40 UTC (permalink / raw)
  To: git
  Cc: phillip.wood123, congdanhqx, christian.couder, avarab, gitster,
	Johannes.Schindelin, johncai86, Siddharth Asthana

commit_rewrite_person() and rewrite_ident_line() are static functions
defined in revision.c.

Their usages are as follows:
- commit_rewrite_person() takes a commit buffer and replaces the author
  and committer idents with their canonical versions using the mailmap
  mechanism
- rewrite_ident_line() takes author/committer header lines from the
  commit buffer and replaces the idents with their canonical versions
  using the mailmap mechanism.

This patch moves commit_rewrite_person() and rewrite_ident_line() to
ident.c which contains many other functions related to idents like
split_ident_line(). By moving commit_rewrite_person() to ident.c, we
also intend to use it in git-cat-file to replace committer and author
idents from the headers to their canonical versions using the mailmap
mechanism. The function is moved as is for now to make it clear that
there are no other changes, but it will be renamed in a following
commit.

Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: John Cai <johncai86@gmail.com>
Signed-off-by: Siddharth Asthana <siddharthasthana31@gmail.com>
---
 cache.h    |  6 +++++
 ident.c    | 75 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 revision.c | 75 ------------------------------------------------------
 3 files changed, 81 insertions(+), 75 deletions(-)

diff --git a/cache.h b/cache.h
index ac5ab4ef9d..16a08aada2 100644
--- a/cache.h
+++ b/cache.h
@@ -1688,6 +1688,12 @@ struct ident_split {
  */
 int split_ident_line(struct ident_split *, const char *, int);
 
+/*
+ * Given a commit object buffer and the commit headers, replaces the idents
+ * in the headers with their canonical versions using the mailmap mechanism.
+ */
+void commit_rewrite_person(struct strbuf *, const char **, struct string_list *);
+
 /*
  * Compare split idents for equality or strict ordering. Note that we
  * compare only the ident part of the line, ignoring any timestamp.
diff --git a/ident.c b/ident.c
index 89ca5b4700..83007e3e5d 100644
--- a/ident.c
+++ b/ident.c
@@ -8,6 +8,7 @@
 #include "cache.h"
 #include "config.h"
 #include "date.h"
+#include "mailmap.h"
 
 static struct strbuf git_default_name = STRBUF_INIT;
 static struct strbuf git_default_email = STRBUF_INIT;
@@ -346,6 +347,80 @@ int split_ident_line(struct ident_split *split, const char *line, int len)
 	return 0;
 }
 
+/*
+ * Returns the difference between the new and old length of the ident line.
+ */
+static ssize_t rewrite_ident_line(const char *person, struct strbuf *buf,
+								  struct string_list *mailmap)
+{
+	char *endp;
+	size_t len, namelen, maillen;
+	const char *name;
+	const char *mail;
+	struct ident_split ident;
+
+	endp = strchr(person, '\n');
+	if (!endp)
+		return 0;
+
+	len = endp - person;
+
+	if (split_ident_line(&ident, person, len))
+		return 0;
+
+	mail = ident.mail_begin;
+	maillen = ident.mail_end - ident.mail_begin;
+	name = ident.name_begin;
+	namelen = ident.name_end - ident.name_begin;
+
+	if (map_user(mailmap, &mail, &maillen, &name, &namelen)) {
+		struct strbuf namemail = STRBUF_INIT;
+		size_t newlen;
+
+		strbuf_addf(&namemail, "%.*s <%.*s>",
+			    (int)namelen, name, (int)maillen, mail);
+
+		strbuf_splice(buf, ident.name_begin - buf->buf,
+			      ident.mail_end - ident.name_begin + 1,
+			      namemail.buf, namemail.len);
+
+		newlen = namemail.len;
+
+		strbuf_release(&namemail);
+
+		return newlen - (ident.mail_end - ident.name_begin + 1);
+	}
+
+	return 0;
+}
+
+void commit_rewrite_person(struct strbuf *buf, const char **header,
+						   struct string_list *mailmap)
+{
+	size_t buf_offset = 0;
+
+	if (!mailmap)
+		return;
+
+	for (;;) {
+		const char *person, *line;
+		size_t i;
+
+		line = buf->buf + buf_offset;
+		if (!*line || *line == '\n')
+			return; /* End of header */
+
+		for (i = 0; header[i]; i++)
+			if (skip_prefix(line, header[i], &person)) {
+				rewrite_ident_line(person, buf, mailmap);
+				break;
+			}
+
+		buf_offset = strchrnul(buf->buf + buf_offset, '\n') - buf->buf;
+		if (buf->buf[buf_offset] == '\n')
+			++buf_offset;
+	}
+}
 
 static void ident_env_hint(enum want_ident whose_ident)
 {
diff --git a/revision.c b/revision.c
index 9909da928e..14dca903b6 100644
--- a/revision.c
+++ b/revision.c
@@ -3755,81 +3755,6 @@ int rewrite_parents(struct rev_info *revs, struct commit *commit,
 	return 0;
 }
 
-/*
- * Returns the difference between the new and old length of the ident line.
- */
-static ssize_t rewrite_ident_line(const char *person, struct strbuf *buf,
-								  struct string_list *mailmap)
-{
-	char *endp;
-	size_t len, namelen, maillen;
-	const char *name;
-	const char *mail;
-	struct ident_split ident;
-
-	endp = strchr(person, '\n');
-	if (!endp)
-		return 0;
-
-	len = endp - person;
-
-	if (split_ident_line(&ident, person, len))
-		return 0;
-
-	mail = ident.mail_begin;
-	maillen = ident.mail_end - ident.mail_begin;
-	name = ident.name_begin;
-	namelen = ident.name_end - ident.name_begin;
-
-	if (map_user(mailmap, &mail, &maillen, &name, &namelen)) {
-		struct strbuf namemail = STRBUF_INIT;
-		size_t newlen;
-
-		strbuf_addf(&namemail, "%.*s <%.*s>",
-			    (int)namelen, name, (int)maillen, mail);
-
-		strbuf_splice(buf, ident.name_begin - buf->buf,
-			      ident.mail_end - ident.name_begin + 1,
-			      namemail.buf, namemail.len);
-
-		newlen = namemail.len;
-
-		strbuf_release(&namemail);
-
-		return newlen - (ident.mail_end - ident.name_begin + 1);
-	}
-
-	return 0;
-}
-
-static void commit_rewrite_person(struct strbuf *buf, const char **header,
-								  struct string_list *mailmap)
-{
-	size_t buf_offset = 0;
-
-	if (!mailmap)
-		return;
-
-	for (;;) {
-		const char *person, *line;
-		size_t i;
-
-		line = buf->buf + buf_offset;
-		if (!*line || *line == '\n')
-			return; /* End of header */
-
-		for (i = 0; header[i]; i++)
-			if (skip_prefix(line, header[i], &person)) {
-				rewrite_ident_line(person, buf, mailmap);
-				break;
-			}
-
-		buf_offset = strchrnul(buf->buf + buf_offset, '\n') - buf->buf;
-		if (buf->buf[buf_offset] == '\n')
-			++buf_offset;
-	}
-}
-
 static int commit_match(struct commit *commit, struct rev_info *opt)
 {
 	int retval;
-- 
2.37.1.120.g001f220fb8


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v5 3/4] ident: rename commit_rewrite_person() to apply_mailmap_to_header()
  2022-07-16  7:40       ` [PATCH v5 0/4] Add support for mailmap in cat-file Siddharth Asthana
  2022-07-16  7:40         ` [PATCH v5 1/4] revision: improve commit_rewrite_person() Siddharth Asthana
  2022-07-16  7:40         ` [PATCH v5 2/4] ident: move commit_rewrite_person() to ident.c Siddharth Asthana
@ 2022-07-16  7:40         ` Siddharth Asthana
  2022-07-16  7:40         ` [PATCH v5 4/4] cat-file: add mailmap support Siddharth Asthana
  2022-07-18 19:50         ` [PATCH v6 0/4] Add support for mailmap in cat-file Siddharth Asthana
  4 siblings, 0 replies; 68+ messages in thread
From: Siddharth Asthana @ 2022-07-16  7:40 UTC (permalink / raw)
  To: git
  Cc: phillip.wood123, congdanhqx, christian.couder, avarab, gitster,
	Johannes.Schindelin, johncai86, Siddharth Asthana

commit_rewrite_person() takes a commit buffer and replaces the idents
in the header with their canonical versions using the mailmap mechanism.
The name "commit_rewrite_person()" is misleading as it doesn't convey
what kind of rewrite we are going to do to the buffer. It also doesn't
clearly mention that the function will limit itself to the header part
of the buffer. The new name, "apply_mailmap_to_header()", expresses the
functionality of the function pretty clearly.

We intend to use apply_mailmap_to_header() in git-cat-file to replace
idents in the headers of commit and tag object buffers. So, we will be
extending this function to take tag object buffers as well and replace
idents on the tagger header using the mailmap mechanism.

Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: John Cai <johncai86@gmail.com>
Signed-off-by: Siddharth Asthana <siddharthasthana31@gmail.com>
---
 cache.h    | 6 +++---
 ident.c    | 4 ++--
 revision.c | 2 +-
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/cache.h b/cache.h
index 16a08aada2..4aa1bd079d 100644
--- a/cache.h
+++ b/cache.h
@@ -1689,10 +1689,10 @@ struct ident_split {
 int split_ident_line(struct ident_split *, const char *, int);
 
 /*
- * Given a commit object buffer and the commit headers, replaces the idents
- * in the headers with their canonical versions using the mailmap mechanism.
+ * Given a commit or tag object buffer and the commit or tag headers, replaces
+ * the idents in the headers with their canonical versions using the mailmap mechanism.
  */
-void commit_rewrite_person(struct strbuf *, const char **, struct string_list *);
+void apply_mailmap_to_header(struct strbuf *, const char **, struct string_list *);
 
 /*
  * Compare split idents for equality or strict ordering. Note that we
diff --git a/ident.c b/ident.c
index 83007e3e5d..c0f56bc752 100644
--- a/ident.c
+++ b/ident.c
@@ -394,8 +394,8 @@ static ssize_t rewrite_ident_line(const char *person, struct strbuf *buf,
 	return 0;
 }
 
-void commit_rewrite_person(struct strbuf *buf, const char **header,
-						   struct string_list *mailmap)
+void apply_mailmap_to_header(struct strbuf *buf, const char **header,
+							 struct string_list *mailmap)
 {
 	size_t buf_offset = 0;
 
diff --git a/revision.c b/revision.c
index 14dca903b6..6ad3665204 100644
--- a/revision.c
+++ b/revision.c
@@ -3792,7 +3792,7 @@ static int commit_match(struct commit *commit, struct rev_info *opt)
 		if (!buf.len)
 			strbuf_addstr(&buf, message);
 
-		commit_rewrite_person(&buf, commit_headers, opt->mailmap);
+		apply_mailmap_to_header(&buf, commit_headers, opt->mailmap);
 	}
 
 	/* Append "fake" message parts as needed */
-- 
2.37.1.120.g001f220fb8


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v5 4/4] cat-file: add mailmap support
  2022-07-16  7:40       ` [PATCH v5 0/4] Add support for mailmap in cat-file Siddharth Asthana
                           ` (2 preceding siblings ...)
  2022-07-16  7:40         ` [PATCH v5 3/4] ident: rename commit_rewrite_person() to apply_mailmap_to_header() Siddharth Asthana
@ 2022-07-16  7:40         ` Siddharth Asthana
  2022-07-18 19:50         ` [PATCH v6 0/4] Add support for mailmap in cat-file Siddharth Asthana
  4 siblings, 0 replies; 68+ messages in thread
From: Siddharth Asthana @ 2022-07-16  7:40 UTC (permalink / raw)
  To: git
  Cc: phillip.wood123, congdanhqx, christian.couder, avarab, gitster,
	Johannes.Schindelin, johncai86, Siddharth Asthana, Phillip Wood

git-cat-file is used by tools like GitLab to get commit and tag contents
that are then displayed to users. This content, which has author,
committer or tagger information, could benefit from passing through the
mailmap mechanism before being sent or displayed.

This patch adds the --[no-]use-mailmap command line option to the git
cat-file command. It also adds the --[no-]mailmap option as an alias for
--[no-]use-mailmap.

This patch also introduces new test cases to test the mailmap mechanism
in the git cat-file command.

Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: John Cai <johncai86@gmail.com>
Helped-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Siddharth Asthana <siddharthasthana31@gmail.com>
---
 Documentation/git-cat-file.txt |  6 ++++
 builtin/cat-file.c             | 43 ++++++++++++++++++++++++-
 t/t4203-mailmap.sh             | 59 ++++++++++++++++++++++++++++++++++
 3 files changed, 107 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-cat-file.txt b/Documentation/git-cat-file.txt
index 24a811f0ef..1880e9bba1 100644
--- a/Documentation/git-cat-file.txt
+++ b/Documentation/git-cat-file.txt
@@ -63,6 +63,12 @@ OPTIONS
 	or to ask for a "blob" with `<object>` being a tag object that
 	points at it.
 
+--[no-]mailmap::
+--[no-]use-mailmap::
+       Use mailmap file to map author, committer and tagger names
+       and email addresses to canonical real names and email addresses.
+       See linkgit:git-shortlog[1].
+
 --textconv::
 	Show the content as transformed by a textconv filter. In this case,
 	`<object>` has to be of the form `<tree-ish>:<path>`, or `:<path>` in
diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 50cf38999d..4b68216b51 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -16,6 +16,7 @@
 #include "packfile.h"
 #include "object-store.h"
 #include "promisor-remote.h"
+#include "mailmap.h"
 
 enum batch_mode {
 	BATCH_MODE_CONTENTS,
@@ -36,6 +37,22 @@ struct batch_options {
 
 static const char *force_path;
 
+static struct string_list mailmap = STRING_LIST_INIT_NODUP;
+static int use_mailmap;
+
+static char *replace_idents_using_mailmap(char *, size_t *);
+
+static char *replace_idents_using_mailmap(char *object_buf, size_t *size)
+{
+	struct strbuf sb = STRBUF_INIT;
+	const char *headers[] = { "author ", "committer ", "tagger ", NULL };
+
+	strbuf_attach(&sb, object_buf, *size, *size + 1);
+	apply_mailmap_to_header(&sb, headers, &mailmap);
+	*size = sb.len;
+	return strbuf_detach(&sb, NULL);
+}
+
 static int filter_object(const char *path, unsigned mode,
 			 const struct object_id *oid,
 			 char **buf, unsigned long *size)
@@ -152,6 +169,12 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name,
 		if (!buf)
 			die("Cannot read object %s", obj_name);
 
+		if (use_mailmap) {
+			size_t s = size;
+			buf = replace_idents_using_mailmap(buf, &s);
+			size = cast_size_t_to_ulong(s);
+		}
+
 		/* otherwise just spit out the data */
 		break;
 
@@ -183,6 +206,12 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name,
 		}
 		buf = read_object_with_reference(the_repository, &oid,
 						 exp_type_id, &size, NULL);
+
+		if (use_mailmap) {
+			size_t s = size;
+			buf = replace_idents_using_mailmap(buf, &s);
+			size = cast_size_t_to_ulong(s);
+		}
 		break;
 	}
 	default:
@@ -348,11 +377,18 @@ static void print_object_or_die(struct batch_options *opt, struct expand_data *d
 		void *contents;
 
 		contents = read_object_file(oid, &type, &size);
+
+		if (use_mailmap) {
+			size_t s = size;
+			contents = replace_idents_using_mailmap(contents, &s);
+			size = cast_size_t_to_ulong(s);
+		}
+
 		if (!contents)
 			die("object %s disappeared", oid_to_hex(oid));
 		if (type != data->type)
 			die("object %s changed type!?", oid_to_hex(oid));
-		if (data->info.sizep && size != data->size)
+		if (data->info.sizep && size != data->size && !use_mailmap)
 			die("object %s changed size!?", oid_to_hex(oid));
 
 		batch_write(opt, contents, size);
@@ -843,6 +879,8 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
 		OPT_CMDMODE('s', NULL, &opt, N_("show object size"), 's'),
 		OPT_BOOL(0, "allow-unknown-type", &unknown_type,
 			  N_("allow -s and -t to work with broken/corrupt objects")),
+		OPT_BOOL(0, "use-mailmap", &use_mailmap, N_("use mail map file")),
+		OPT_ALIAS(0, "mailmap", "use-mailmap"),
 		/* Batch mode */
 		OPT_GROUP(N_("Batch objects requested on stdin (or --batch-all-objects)")),
 		OPT_CALLBACK_F(0, "batch", &batch, N_("format"),
@@ -885,6 +923,9 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
 	opt_cw = (opt == 'c' || opt == 'w');
 	opt_epts = (opt == 'e' || opt == 'p' || opt == 't' || opt == 's');
 
+	if (use_mailmap)
+		read_mailmap(&mailmap);
+
 	/* --batch-all-objects? */
 	if (opt == 'b')
 		batch.all_objects = 1;
diff --git a/t/t4203-mailmap.sh b/t/t4203-mailmap.sh
index 0b2d21ec55..cd1cab3e54 100755
--- a/t/t4203-mailmap.sh
+++ b/t/t4203-mailmap.sh
@@ -963,4 +963,63 @@ test_expect_success SYMLINKS 'symlinks not respected in-tree' '
 	test_cmp expect actual
 '
 
+test_expect_success 'prepare for cat-file --mailmap' '
+	rm -f .mailmap &&
+	git commit --allow-empty -m foo --author="Orig <orig@example.com>"
+'
+
+test_expect_success '--no-use-mailmap disables mailmap in cat-file' '
+	test_when_finished "rm .mailmap" &&
+	cat >.mailmap <<-EOF &&
+	A U Thor <author@example.com> Orig <orig@example.com>
+	EOF
+	cat >expect <<-EOF &&
+	author Orig <orig@example.com>
+	EOF
+	git cat-file --no-use-mailmap commit HEAD >log &&
+	sed -n "/^author /s/\([^>]*>\).*/\1/p" log >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success '--use-mailmap enables mailmap in cat-file' '
+	test_when_finished "rm .mailmap" &&
+	cat >.mailmap <<-EOF &&
+	A U Thor <author@example.com> Orig <orig@example.com>
+	EOF
+	cat >expect <<-EOF &&
+	author A U Thor <author@example.com>
+	EOF
+	git cat-file --use-mailmap commit HEAD >log &&
+	sed -n "/^author /s/\([^>]*>\).*/\1/p" log >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success '--no-mailmap disables mailmap in cat-file for annotated tag objects' '
+	test_when_finished "rm .mailmap" &&
+	cat >.mailmap <<-EOF &&
+	Orig <orig@example.com> C O Mitter <committer@example.com>
+	EOF
+	cat >expect <<-EOF &&
+	tagger C O Mitter <committer@example.com>
+	EOF
+	git tag -a -m "annotated tag" v1 &&
+	git cat-file --no-mailmap -p v1 >log &&
+	sed -n "/^tagger /s/\([^>]*>\).*/\1/p" log >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success '--mailmap enables mailmap in cat-file for annotated tag objects' '
+	test_when_finished "rm .mailmap" &&
+	cat >.mailmap <<-EOF &&
+	Orig <orig@example.com> C O Mitter <committer@example.com>
+	EOF
+	cat >expect <<-EOF &&
+	tagger Orig <orig@example.com>
+	EOF
+	git tag -a -m "annotated tag" v2 &&
+	git cat-file --mailmap -p v2 >log &&
+	sed -n "/^tagger /s/\([^>]*>\).*/\1/p" log >actual &&
+	test_cmp expect actual
+'
+
 test_done
-- 
2.37.1.120.g001f220fb8


^ permalink raw reply related	[flat|nested] 68+ messages in thread
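
The mapping that the tests above expect ("Orig <orig@example.com>"
becoming "A U Thor <author@example.com>") can be sketched with a toy
lookup table. This is an illustration under loud assumptions, not git's
mailmap code: struct toy_map_entry and toy_map_ident() are hypothetical
names, entries are keyed on the old email address only, and the
comparison is case-sensitive, whereas a real mailmap entry may also
match on the name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified mailmap entry: keyed on the old email only. */
struct toy_map_entry {
	const char *old_email;
	const char *new_name;
	const char *new_email;
};

/*
 * Write the canonical "Name <email>" for `ident` into `out`.
 * If no entry matches (or the ident has no "<...>" part), `ident` is
 * copied unchanged.  Returns 1 if a mapping was applied, 0 otherwise.
 */
static int toy_map_ident(const struct toy_map_entry *map, size_t nr,
			 const char *ident, char *out, size_t outlen)
{
	const char *lt = strchr(ident, '<');
	const char *gt = lt ? strchr(lt, '>') : NULL;
	size_t i;

	assert(out && outlen);

	if (lt && gt) {
		size_t maillen = gt - lt - 1;

		for (i = 0; i < nr; i++)
			if (strlen(map[i].old_email) == maillen &&
			    !strncmp(map[i].old_email, lt + 1, maillen)) {
				snprintf(out, outlen, "%s <%s>",
					 map[i].new_name, map[i].new_email);
				return 1;
			}
	}
	snprintf(out, outlen, "%s", ident);
	return 0;
}
```

An ident with no matching entry is passed through untouched, which is
why the --no-use-mailmap and --use-mailmap tests differ only in the
expected author line.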

* Re: [PATCH v5 1/4] revision: improve commit_rewrite_person()
  2022-07-16  7:40         ` [PATCH v5 1/4] revision: improve commit_rewrite_person() Siddharth Asthana
@ 2022-07-17 22:11           ` Junio C Hamano
  0 siblings, 0 replies; 68+ messages in thread
From: Junio C Hamano @ 2022-07-17 22:11 UTC (permalink / raw)
  To: Siddharth Asthana
  Cc: git, phillip.wood123, congdanhqx, christian.couder, avarab,
	Johannes.Schindelin, johncai86

Siddharth Asthana <siddharthasthana31@gmail.com> writes:

> +/*
> + * Returns the difference between the new and old length of the ident line.
> + */
> +static ssize_t rewrite_ident_line(const char *person, struct strbuf *buf,
> +								  struct string_list *mailmap)

Line-folding is a good idea, but why do we use such a deep
indentation?  In this project, tab-width is 8.

> +static void commit_rewrite_person(struct strbuf *buf, const char **header,
> +								  struct string_list *mailmap)

Likewise.

> +{
> +	size_t buf_offset = 0;
> +
> +	if (!mailmap)
> +		return;
> +
> +	for (;;) {
> +		const char *person, *line;
> +		size_t i;
> +
> +		line = buf->buf + buf_offset;
> +		if (!*line || *line == '\n')
> +			return; /* End of header */
> +
> +		for (i = 0; header[i]; i++)
> +			if (skip_prefix(line, header[i], &person)) {
> +				rewrite_ident_line(person, buf, mailmap);

If the return value of rewrite_ident_line() is never used, perhaps
stop computing the return value in that function and make it return
"void".  I personally thought it was clever to return "how much did
the ident part grow/shrink?" from the helper and use it to adjust,
but I do not mind scrapping the cleverness if some folks may find it
harder to understand.

> +				break;
> +			}
> +
> +		buf_offset = strchrnul(buf->buf + buf_offset, '\n') - buf->buf;

And this is an "easier-to-understand but needs to scan the buffer once
again, only to figure out what we ought to already know" version.

> +		if (buf->buf[buf_offset] == '\n')
> +			++buf_offset;

Prefer post-increment when there is no reason to favor pre-increment
over post-increment, i.e. write this as "buf_offset++".

> +	}
> +}
> +
>  static int commit_match(struct commit *commit, struct rev_info *opt)
>  {
>  	int retval;
> @@ -3832,11 +3862,12 @@ static int commit_match(struct commit *commit, struct rev_info *opt)
>  		strbuf_addstr(&buf, message);
>  
>  	if (opt->grep_filter.header_list && opt->mailmap) {
> +		const char *commit_headers[] = { "author ", "committer ", NULL };
> +
>  		if (!buf.len)
>  			strbuf_addstr(&buf, message);
>  
> -		commit_rewrite_person(&buf, "\nauthor ", opt->mailmap);
> -		commit_rewrite_person(&buf, "\ncommitter ", opt->mailmap);
> +		commit_rewrite_person(&buf, commit_headers, opt->mailmap);
>  	}
>  
>  	/* Append "fake" message parts as needed */

^ permalink raw reply	[flat|nested] 68+ messages in thread
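
The trade-off raised in the reply above (re-scanning for the newline
versus using the length difference returned by the helper) can be
sketched as follows. This is an illustration under assumptions, not
git's code: it uses a fixed-capacity char array instead of a strbuf,
replace_span() is a hypothetical stand-in for strbuf_splice() that
returns the signed change in length, and ptrdiff_t is used in place of
ssize_t to stay within standard C.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for strbuf_splice(): replace `oldlen` bytes at
 * `pos` in the NUL-terminated `buf` (capacity `cap`) with `repl`.
 * Returns the signed change in length; 0 if the result would not fit,
 * in which case `buf` is left untouched.
 */
static ptrdiff_t replace_span(char *buf, size_t cap, size_t pos,
			      size_t oldlen, const char *repl)
{
	size_t len = strlen(buf);
	size_t newlen = strlen(repl);

	assert(pos + oldlen <= len);
	if (len - oldlen + newlen + 1 > cap)
		return 0;

	/* shift the tail (including the NUL), then copy the replacement */
	memmove(buf + pos + newlen, buf + pos + oldlen,
		len - pos - oldlen + 1);
	memcpy(buf + pos, repl, newlen);
	return (ptrdiff_t)newlen - (ptrdiff_t)oldlen;
}

/*
 * Given the offset of a line and its length including the '\n',
 * replace a span inside that line and return the offset of the next
 * line without searching for the newline a second time.
 */
static size_t next_line_offset(char *buf, size_t cap, size_t line_off,
			       size_t linelen, size_t pos, size_t oldlen,
			       const char *repl)
{
	ptrdiff_t delta = replace_span(buf, cap, pos, oldlen, repl);

	return (size_t)((ptrdiff_t)(line_off + linelen) + delta);
}
```

The caller already knows where the line ended before the replacement,
so adding the signed delta is enough to find the next line whether the
ident grew or shrank.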

* [PATCH v6 0/4] Add support for mailmap in cat-file
  2022-07-16  7:40       ` [PATCH v5 0/4] Add support for mailmap in cat-file Siddharth Asthana
                           ` (3 preceding siblings ...)
  2022-07-16  7:40         ` [PATCH v5 4/4] cat-file: add mailmap support Siddharth Asthana
@ 2022-07-18 19:50         ` Siddharth Asthana
  2022-07-18 19:50           ` [PATCH v6 1/4] revision: improve commit_rewrite_person() Siddharth Asthana
                             ` (4 more replies)
  4 siblings, 5 replies; 68+ messages in thread
From: Siddharth Asthana @ 2022-07-18 19:50 UTC (permalink / raw)
  To: git
  Cc: phillip.wood123, congdanhqx, christian.couder, avarab, gitster,
	Johannes.Schindelin, johncai86, Siddharth Asthana

Thanks a lot Junio for your suggestion :) I have made the suggested
changes.

= Description

This patch series adds mailmap support to the git-cat-file command. It
adds the mailmap support only for the commit and tag objects by
replacing the idents for "author", "committer" and "tagger" headers. The
mailmap only takes effect when the --[no-]use-mailmap or --[no-]mailmap
option is passed to the git cat-file command. The changes will work with
the batch mode as well.

So, if one wants to enable mailmap they can use either of the following
commands:
$ git cat-file --use-mailmap -p <object>
$ git cat-file --use-mailmap <type> <object>

To use it in the batch mode, one can use the following command:
$ git cat-file --use-mailmap --batch

= Patch Organization

- The first patch improves commit_rewrite_person() by restricting it
  to traverse only through the header part of the commit object buffer.
  It also adds an argument called headers which the callers can pass.
  The function will replace idents only on these passed headers.
  Thus, the caller won't have to make repeated calls to the function.
- The second patch moves commit_rewrite_person() to ident.c to expose it
  as a public function so that it can be used to replace idents in the
  headers of desired objects.
- The third patch renames commit_rewrite_person() to a name which
  describes its functionality clearly. It is renamed to
  apply_mailmap_to_header().
- The last patch adds mailmap support to the git cat-file command. It
  adds the required documentation and tests as well.

Changes in v6:
- The function rewrite_ident_line() returns the difference between the
  new and the old length of the ident line. We were not using this
  information and instead parsing the buffer again to look for the line
  ending. This patch set starts using that information to update the
  buf_offset value in commit_rewrite_person().
- This patch set also tweaks commit_rewrite_person() so that it is
  easier to understand and avoids unnecessary parsing of the buffer
  wherever possible.
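
The idea of reusing the length difference, rather than searching for the
line ending again, can be sketched outside of git as follows. The names
splice_span() and line_end_after_splice() are hypothetical, and a plain
fixed-capacity char array stands in for struct strbuf; this is only an
illustration of the bookkeeping, not the code in the patches.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for a strbuf splice: replace "oldlen" bytes at
 * "pos" in the NUL-terminated string "buf" (capacity "cap", including the
 * NUL) with "repl". Returns the change in length; returns 0 and leaves
 * "buf" untouched if the arguments are out of range or the result would
 * not fit.
 */
static ptrdiff_t splice_span(char *buf, size_t cap, size_t pos,
			     size_t oldlen, const char *repl)
{
	size_t len = strlen(buf);
	size_t newlen = strlen(repl);

	if (pos + oldlen > len || len - oldlen + newlen + 1 > cap)
		return 0;

	/* Shift the tail (including the NUL), then copy in the replacement. */
	memmove(buf + pos + newlen, buf + pos + oldlen, len - pos - oldlen + 1);
	memcpy(buf + pos, repl, newlen);
	return (ptrdiff_t)newlen - (ptrdiff_t)oldlen;
}

/*
 * Given the offset of a line's end before the splice, compute where that
 * line ends afterwards without scanning for '\n' a second time.
 */
static size_t line_end_after_splice(size_t old_line_end, ptrdiff_t delta)
{
	return (size_t)((ptrdiff_t)old_line_end + delta);
}
```

A caller would record the offset of the newline before rewriting the
ident, then add the returned difference to it to find the same newline in
the modified buffer.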

Siddharth Asthana (4):
  revision: improve commit_rewrite_person()
  ident: move commit_rewrite_person() to ident.c
  ident: rename commit_rewrite_person() to apply_mailmap_to_header()
  cat-file: add mailmap support

 Documentation/git-cat-file.txt |  6 +++
 builtin/cat-file.c             | 43 +++++++++++++++++++-
 cache.h                        |  6 +++
 ident.c                        | 74 ++++++++++++++++++++++++++++++++++
 revision.c                     | 50 ++---------------------
 t/t4203-mailmap.sh             | 59 +++++++++++++++++++++++++++
 6 files changed, 190 insertions(+), 48 deletions(-)

Range-diff against v5:
1:  8c29ad9351 ! 1:  7c086e4c8a revision: improve commit_rewrite_person()
    @@ revision.c: int rewrite_parents(struct rev_info *revs, struct commit *commit,
     +/*
     + * Returns the difference between the new and old length of the ident line.
     + */
    -+static ssize_t rewrite_ident_line(const char *person, struct strbuf *buf,
    -+								  struct string_list *mailmap)
    ++static ssize_t rewrite_ident_line(const char *person, size_t len,
    ++				   struct strbuf *buf,
    ++				   struct string_list *mailmap)
      {
     -	char *person, *endp;
    -+	char *endp;
    - 	size_t len, namelen, maillen;
    +-	size_t len, namelen, maillen;
    ++	size_t namelen, maillen;
      	const char *name;
      	const char *mail;
      	struct ident_split ident;
    @@ revision.c: int rewrite_parents(struct rev_info *revs, struct commit *commit,
     -		return 0;
     -
     -	person += strlen(what);
    - 	endp = strchr(person, '\n');
    - 	if (!endp)
    +-	endp = strchr(person, '\n');
    +-	if (!endp)
    +-		return 0;
    +-
    +-	len = endp - person;
    +-
    + 	if (split_ident_line(&ident, person, len))
      		return 0;
    + 
     @@ revision.c: static int commit_rewrite_person(struct strbuf *buf, const char *what, struct st
      
      	if (map_user(mailmap, &mail, &maillen, &name, &namelen)) {
    @@ revision.c: static int commit_rewrite_person(struct strbuf *buf, const char *wha
      		strbuf_addf(&namemail, "%.*s <%.*s>",
      			    (int)namelen, name, (int)maillen, mail);
     @@ revision.c: static int commit_rewrite_person(struct strbuf *buf, const char *what, struct st
    + 		strbuf_splice(buf, ident.name_begin - buf->buf,
      			      ident.mail_end - ident.name_begin + 1,
      			      namemail.buf, namemail.len);
    - 
     +		newlen = namemail.len;
    -+
    + 
      		strbuf_release(&namemail);
      
     -		return 1;
    -+		return newlen - (ident.mail_end - ident.name_begin + 1);
    ++		return newlen - (ident.mail_end - ident.name_begin);
      	}
      
      	return 0;
      }
      
     +static void commit_rewrite_person(struct strbuf *buf, const char **header,
    -+								  struct string_list *mailmap)
    ++				   struct string_list *mailmap)
     +{
     +	size_t buf_offset = 0;
     +
    @@ revision.c: static int commit_rewrite_person(struct strbuf *buf, const char *wha
     +	for (;;) {
     +		const char *person, *line;
     +		size_t i;
    ++		int found_header = 0;
     +
     +		line = buf->buf + buf_offset;
     +		if (!*line || *line == '\n')
    -+			return; /* End of header */
    ++			return; /* End of headers */
     +
     +		for (i = 0; header[i]; i++)
     +			if (skip_prefix(line, header[i], &person)) {
    -+				rewrite_ident_line(person, buf, mailmap);
    ++				const char *endp = strchrnul(person, '\n');
    ++				found_header = 1;
    ++				buf_offset += endp - line;
    ++				buf_offset += rewrite_ident_line(person, endp - person, buf, mailmap);
     +				break;
     +			}
     +
    -+		buf_offset = strchrnul(buf->buf + buf_offset, '\n') - buf->buf;
    -+		if (buf->buf[buf_offset] == '\n')
    -+			++buf_offset;
    ++		if (!found_header) {
    ++			buf_offset = strchrnul(line, '\n') - buf->buf;
    ++			if (buf->buf[buf_offset] == '\n')
    ++				buf_offset++;
    ++		}
     +	}
     +}
     +
2:  ccb7f72fcb ! 2:  2f8fba7f57 ident: move commit_rewrite_person() to ident.c
    @@ ident.c: int split_ident_line(struct ident_split *split, const char *line, int l
     +/*
     + * Returns the difference between the new and old length of the ident line.
     + */
    -+static ssize_t rewrite_ident_line(const char *person, struct strbuf *buf,
    -+								  struct string_list *mailmap)
    ++static ssize_t rewrite_ident_line(const char *person, size_t len,
    ++				   struct strbuf *buf,
    ++				   struct string_list *mailmap)
     +{
    -+	char *endp;
    -+	size_t len, namelen, maillen;
    ++	size_t namelen, maillen;
     +	const char *name;
     +	const char *mail;
     +	struct ident_split ident;
     +
    -+	endp = strchr(person, '\n');
    -+	if (!endp)
    -+		return 0;
    -+
    -+	len = endp - person;
    -+
     +	if (split_ident_line(&ident, person, len))
     +		return 0;
     +
    @@ ident.c: int split_ident_line(struct ident_split *split, const char *line, int l
     +		strbuf_splice(buf, ident.name_begin - buf->buf,
     +			      ident.mail_end - ident.name_begin + 1,
     +			      namemail.buf, namemail.len);
    -+
     +		newlen = namemail.len;
     +
     +		strbuf_release(&namemail);
     +
    -+		return newlen - (ident.mail_end - ident.name_begin + 1);
    ++		return newlen - (ident.mail_end - ident.name_begin);
     +	}
     +
     +	return 0;
     +}
     +
     +void commit_rewrite_person(struct strbuf *buf, const char **header,
    -+						   struct string_list *mailmap)
    ++			    struct string_list *mailmap)
     +{
     +	size_t buf_offset = 0;
     +
    @@ ident.c: int split_ident_line(struct ident_split *split, const char *line, int l
     +	for (;;) {
     +		const char *person, *line;
     +		size_t i;
    ++		int found_header = 0;
     +
     +		line = buf->buf + buf_offset;
     +		if (!*line || *line == '\n')
    -+			return; /* End of header */
    ++			return; /* End of headers */
     +
     +		for (i = 0; header[i]; i++)
     +			if (skip_prefix(line, header[i], &person)) {
    -+				rewrite_ident_line(person, buf, mailmap);
    ++				const char *endp = strchrnul(person, '\n');
    ++				found_header = 1;
    ++				buf_offset += endp - line;
    ++				buf_offset += rewrite_ident_line(person, endp - person, buf, mailmap);
     +				break;
     +			}
     +
    -+		buf_offset = strchrnul(buf->buf + buf_offset, '\n') - buf->buf;
    -+		if (buf->buf[buf_offset] == '\n')
    -+			++buf_offset;
    ++		if (!found_header) {
    ++			buf_offset = strchrnul(line, '\n') - buf->buf;
    ++			if (buf->buf[buf_offset] == '\n')
    ++				buf_offset++;
    ++		}
     +	}
     +}
      
    @@ revision.c: int rewrite_parents(struct rev_info *revs, struct commit *commit,
     -/*
     - * Returns the difference between the new and old length of the ident line.
     - */
    --static ssize_t rewrite_ident_line(const char *person, struct strbuf *buf,
    --								  struct string_list *mailmap)
    +-static ssize_t rewrite_ident_line(const char *person, size_t len,
    +-				   struct strbuf *buf,
    +-				   struct string_list *mailmap)
     -{
    --	char *endp;
    --	size_t len, namelen, maillen;
    +-	size_t namelen, maillen;
     -	const char *name;
     -	const char *mail;
     -	struct ident_split ident;
     -
    --	endp = strchr(person, '\n');
    --	if (!endp)
    --		return 0;
    --
    --	len = endp - person;
    --
     -	if (split_ident_line(&ident, person, len))
     -		return 0;
     -
    @@ revision.c: int rewrite_parents(struct rev_info *revs, struct commit *commit,
     -		strbuf_splice(buf, ident.name_begin - buf->buf,
     -			      ident.mail_end - ident.name_begin + 1,
     -			      namemail.buf, namemail.len);
    --
     -		newlen = namemail.len;
     -
     -		strbuf_release(&namemail);
     -
    --		return newlen - (ident.mail_end - ident.name_begin + 1);
    +-		return newlen - (ident.mail_end - ident.name_begin);
     -	}
     -
     -	return 0;
     -}
     -
     -static void commit_rewrite_person(struct strbuf *buf, const char **header,
    --								  struct string_list *mailmap)
    +-				   struct string_list *mailmap)
     -{
     -	size_t buf_offset = 0;
     -
    @@ revision.c: int rewrite_parents(struct rev_info *revs, struct commit *commit,
     -	for (;;) {
     -		const char *person, *line;
     -		size_t i;
    +-		int found_header = 0;
     -
     -		line = buf->buf + buf_offset;
     -		if (!*line || *line == '\n')
    --			return; /* End of header */
    +-			return; /* End of headers */
     -
     -		for (i = 0; header[i]; i++)
     -			if (skip_prefix(line, header[i], &person)) {
    --				rewrite_ident_line(person, buf, mailmap);
    +-				const char *endp = strchrnul(person, '\n');
    +-				found_header = 1;
    +-				buf_offset += endp - line;
    +-				buf_offset += rewrite_ident_line(person, endp - person, buf, mailmap);
     -				break;
     -			}
     -
    --		buf_offset = strchrnul(buf->buf + buf_offset, '\n') - buf->buf;
    --		if (buf->buf[buf_offset] == '\n')
    --			++buf_offset;
    +-		if (!found_header) {
    +-			buf_offset = strchrnul(line, '\n') - buf->buf;
    +-			if (buf->buf[buf_offset] == '\n')
    +-				buf_offset++;
    +-		}
     -	}
     -}
     -
3:  38c18fd10d ! 3:  b4f2477b14 ident: rename commit_rewrite_person() to apply_mailmap_to_header()
    @@ cache.h: struct ident_split {
       * Compare split idents for equality or strict ordering. Note that we
     
      ## ident.c ##
    -@@ ident.c: static ssize_t rewrite_ident_line(const char *person, struct strbuf *buf,
    +@@ ident.c: static ssize_t rewrite_ident_line(const char *person, size_t len,
      	return 0;
      }
      
     -void commit_rewrite_person(struct strbuf *buf, const char **header,
    --						   struct string_list *mailmap)
    +-			    struct string_list *mailmap)
     +void apply_mailmap_to_header(struct strbuf *buf, const char **header,
    -+							 struct string_list *mailmap)
    ++			       struct string_list *mailmap)
      {
      	size_t buf_offset = 0;
      
4:  0a459d4c53 = 4:  63d6f8c201 cat-file: add mailmap support
-- 
2.37.1.120.g63d6f8c201


^ permalink raw reply	[flat|nested] 68+ messages in thread

* [PATCH v6 1/4] revision: improve commit_rewrite_person()
  2022-07-18 19:50         ` [PATCH v6 0/4] Add support for mailmap in cat-file Siddharth Asthana
@ 2022-07-18 19:50           ` Siddharth Asthana
  2022-07-18 19:51           ` [PATCH v6 2/4] ident: move commit_rewrite_person() to ident.c Siddharth Asthana
                             ` (3 subsequent siblings)
  4 siblings, 0 replies; 68+ messages in thread
From: Siddharth Asthana @ 2022-07-18 19:50 UTC (permalink / raw)
  To: git
  Cc: phillip.wood123, congdanhqx, christian.couder, avarab, gitster,
	Johannes.Schindelin, johncai86, Siddharth Asthana

The function, commit_rewrite_person(), is designed to find and replace
an ident string in the header part, and the way it avoids a random
occurrence of "author A U Thor <author@example.com>" in the text is by
requiring "author" to appear at the beginning of a line, which it does
by passing "\nauthor " as "what".

The implementation also doesn't make any effort to limit itself to the
commit header by locating the blank line that appears after the header
part and stopping the search there. Also, the interface forces the
caller to make multiple calls if it wants to rewrite idents on multiple
headers. It shouldn't be the case.
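
To make the concern concrete, here is a minimal standalone sketch (plain
C, not git code) of how a search for "\nauthor " over the whole buffer
can land in the body when no such header exists. match_is_in_body() is a
hypothetical helper written only for this illustration.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: returns 1 if the first occurrence of "what" in
 * "buf" lies after the start of the blank line that separates the headers
 * from the body, 0 otherwise (including when there is no match at all).
 */
static int match_is_in_body(const char *buf, const char *what)
{
	const char *hit = strstr(buf, what);
	const char *body = strstr(buf, "\n\n");

	return hit && body && hit > body;
}
```

For a buffer that has a "tagger " header but no "author " header, and
whose message begins with a line starting with "author ", the helper
returns 1: an unrestricted strstr() would have matched text in the body.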

To support the existing caller better, update commit_rewrite_person()
to:
- Make a single pass in the input buffer to locate headers named
  "author" and "committer" and replace idents on them.
- Stop at the end of the header, ensuring that nothing in the body of
  the commit object is modified.

The return type of the function commit_rewrite_person() has also been
changed from int to void. This has been done because the caller of the
function doesn't do anything with the return value of the function.

By simplifying the interface of commit_rewrite_person(), we also
intend to expose it as a public function. We will also be renaming the
function in a future commit to a different name which clearly tells that
the function replaces idents in the header of the commit buffer.

Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: John Cai <johncai86@gmail.com>
Helped-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Siddharth Asthana <siddharthasthana31@gmail.com>
---
 revision.c | 64 +++++++++++++++++++++++++++++++++++++++---------------
 1 file changed, 47 insertions(+), 17 deletions(-)

diff --git a/revision.c b/revision.c
index 211352795c..3418a1b7f1 100644
--- a/revision.c
+++ b/revision.c
@@ -3755,25 +3755,18 @@ int rewrite_parents(struct rev_info *revs, struct commit *commit,
 	return 0;
 }
 
-static int commit_rewrite_person(struct strbuf *buf, const char *what, struct string_list *mailmap)
+/*
+ * Returns the difference between the new and old length of the ident line.
+ */
+static ssize_t rewrite_ident_line(const char *person, size_t len,
+				   struct strbuf *buf,
+				   struct string_list *mailmap)
 {
-	char *person, *endp;
-	size_t len, namelen, maillen;
+	size_t namelen, maillen;
 	const char *name;
 	const char *mail;
 	struct ident_split ident;
 
-	person = strstr(buf->buf, what);
-	if (!person)
-		return 0;
-
-	person += strlen(what);
-	endp = strchr(person, '\n');
-	if (!endp)
-		return 0;
-
-	len = endp - person;
-
 	if (split_ident_line(&ident, person, len))
 		return 0;
 
@@ -3784,6 +3777,7 @@ static int commit_rewrite_person(struct strbuf *buf, const char *what, struct st
 
 	if (map_user(mailmap, &mail, &maillen, &name, &namelen)) {
 		struct strbuf namemail = STRBUF_INIT;
+		size_t newlen;
 
 		strbuf_addf(&namemail, "%.*s <%.*s>",
 			    (int)namelen, name, (int)maillen, mail);
@@ -3791,15 +3785,50 @@ static int commit_rewrite_person(struct strbuf *buf, const char *what, struct st
 		strbuf_splice(buf, ident.name_begin - buf->buf,
 			      ident.mail_end - ident.name_begin + 1,
 			      namemail.buf, namemail.len);
+		newlen = namemail.len;
 
 		strbuf_release(&namemail);
 
-		return 1;
+		return newlen - (ident.mail_end - ident.name_begin);
 	}
 
 	return 0;
 }
 
+static void commit_rewrite_person(struct strbuf *buf, const char **header,
+				   struct string_list *mailmap)
+{
+	size_t buf_offset = 0;
+
+	if (!mailmap)
+		return;
+
+	for (;;) {
+		const char *person, *line;
+		size_t i;
+		int found_header = 0;
+
+		line = buf->buf + buf_offset;
+		if (!*line || *line == '\n')
+			return; /* End of headers */
+
+		for (i = 0; header[i]; i++)
+			if (skip_prefix(line, header[i], &person)) {
+				const char *endp = strchrnul(person, '\n');
+				found_header = 1;
+				buf_offset += endp - line;
+				buf_offset += rewrite_ident_line(person, endp - person, buf, mailmap);
+				break;
+			}
+
+		if (!found_header) {
+			buf_offset = strchrnul(line, '\n') - buf->buf;
+			if (buf->buf[buf_offset] == '\n')
+				buf_offset++;
+		}
+	}
+}
+
 static int commit_match(struct commit *commit, struct rev_info *opt)
 {
 	int retval;
@@ -3832,11 +3861,12 @@ static int commit_match(struct commit *commit, struct rev_info *opt)
 		strbuf_addstr(&buf, message);
 
 	if (opt->grep_filter.header_list && opt->mailmap) {
+		const char *commit_headers[] = { "author ", "committer ", NULL };
+
 		if (!buf.len)
 			strbuf_addstr(&buf, message);
 
-		commit_rewrite_person(&buf, "\nauthor ", opt->mailmap);
-		commit_rewrite_person(&buf, "\ncommitter ", opt->mailmap);
+		commit_rewrite_person(&buf, commit_headers, opt->mailmap);
 	}
 
 	/* Append "fake" message parts as needed */
-- 
2.37.1.120.g63d6f8c201


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v6 2/4] ident: move commit_rewrite_person() to ident.c
  2022-07-18 19:50         ` [PATCH v6 0/4] Add support for mailmap in cat-file Siddharth Asthana
  2022-07-18 19:50           ` [PATCH v6 1/4] revision: improve commit_rewrite_person() Siddharth Asthana
@ 2022-07-18 19:51           ` Siddharth Asthana
  2022-07-18 19:51           ` [PATCH v6 3/4] ident: rename commit_rewrite_person() to apply_mailmap_to_header() Siddharth Asthana
                             ` (2 subsequent siblings)
  4 siblings, 0 replies; 68+ messages in thread
From: Siddharth Asthana @ 2022-07-18 19:51 UTC (permalink / raw)
  To: git
  Cc: phillip.wood123, congdanhqx, christian.couder, avarab, gitster,
	Johannes.Schindelin, johncai86, Siddharth Asthana

commit_rewrite_person() and rewrite_ident_line() are static functions
defined in revision.c.

Their usages are as follows:
- commit_rewrite_person() takes a commit buffer and replaces the author
  and committer idents with their canonical versions using the mailmap
  mechanism
- rewrite_ident_line() takes author/committer header lines from the
  commit buffer and replaces the idents with their canonical versions
  using the mailmap mechanism.
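
As a rough illustration of what "canonical versions" means here, the
following toy lookup maps an email address found in an object to a
canonical name and email. It is deliberately simplified and is not git's
map_user(): the struct, the function toy_map_user() and the exact-match
rule on the email are all assumptions made for this sketch.

```c
#include <stddef.h>
#include <string.h>

/* One toy mapping entry: email as recorded -> canonical identity. */
struct toy_mailmap_entry {
	const char *old_email;
	const char *new_name;
	const char *new_email;
};

/*
 * Toy stand-in for a mailmap lookup: if *email matches an entry, point
 * *name and *email at the canonical values and return 1; otherwise leave
 * both untouched and return 0.
 */
static int toy_map_user(const struct toy_mailmap_entry *map, size_t nr,
			const char **name, const char **email)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		if (!strcmp(*email, map[i].old_email)) {
			*name = map[i].new_name;
			*email = map[i].new_email;
			return 1;
		}
	}
	return 0;
}
```

A nonzero return value is the signal to rebuild the "Name <email>" part
of the header line; a zero return means the line is left as it is.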

This patch moves commit_rewrite_person() and rewrite_ident_line() to
ident.c which contains many other functions related to idents like
split_ident_line(). By moving commit_rewrite_person() to ident.c, we
also intend to use it in git-cat-file to replace committer and author
idents from the headers to their canonical versions using the mailmap
mechanism. The function is moved as is for now to make it clear that
there are no other changes, but it will be renamed in a following
commit.

Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: John Cai <johncai86@gmail.com>
Signed-off-by: Siddharth Asthana <siddharthasthana31@gmail.com>
---
 cache.h    |  6 +++++
 ident.c    | 74 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 revision.c | 74 ------------------------------------------------------
 3 files changed, 80 insertions(+), 74 deletions(-)

diff --git a/cache.h b/cache.h
index ac5ab4ef9d..16a08aada2 100644
--- a/cache.h
+++ b/cache.h
@@ -1688,6 +1688,12 @@ struct ident_split {
  */
 int split_ident_line(struct ident_split *, const char *, int);
 
+/*
+ * Given a commit object buffer and the commit headers, replaces the idents
+ * in the headers with their canonical versions using the mailmap mechanism.
+ */
+void commit_rewrite_person(struct strbuf *, const char **, struct string_list *);
+
 /*
  * Compare split idents for equality or strict ordering. Note that we
  * compare only the ident part of the line, ignoring any timestamp.
diff --git a/ident.c b/ident.c
index 89ca5b4700..1eee4fd0e3 100644
--- a/ident.c
+++ b/ident.c
@@ -8,6 +8,7 @@
 #include "cache.h"
 #include "config.h"
 #include "date.h"
+#include "mailmap.h"
 
 static struct strbuf git_default_name = STRBUF_INIT;
 static struct strbuf git_default_email = STRBUF_INIT;
@@ -346,6 +347,79 @@ int split_ident_line(struct ident_split *split, const char *line, int len)
 	return 0;
 }
 
+/*
+ * Returns the difference between the new and old length of the ident line.
+ */
+static ssize_t rewrite_ident_line(const char *person, size_t len,
+				   struct strbuf *buf,
+				   struct string_list *mailmap)
+{
+	size_t namelen, maillen;
+	const char *name;
+	const char *mail;
+	struct ident_split ident;
+
+	if (split_ident_line(&ident, person, len))
+		return 0;
+
+	mail = ident.mail_begin;
+	maillen = ident.mail_end - ident.mail_begin;
+	name = ident.name_begin;
+	namelen = ident.name_end - ident.name_begin;
+
+	if (map_user(mailmap, &mail, &maillen, &name, &namelen)) {
+		struct strbuf namemail = STRBUF_INIT;
+		size_t newlen;
+
+		strbuf_addf(&namemail, "%.*s <%.*s>",
+			    (int)namelen, name, (int)maillen, mail);
+
+		strbuf_splice(buf, ident.name_begin - buf->buf,
+			      ident.mail_end - ident.name_begin + 1,
+			      namemail.buf, namemail.len);
+		newlen = namemail.len;
+
+		strbuf_release(&namemail);
+
+		return newlen - (ident.mail_end - ident.name_begin);
+	}
+
+	return 0;
+}
+
+void commit_rewrite_person(struct strbuf *buf, const char **header,
+			    struct string_list *mailmap)
+{
+	size_t buf_offset = 0;
+
+	if (!mailmap)
+		return;
+
+	for (;;) {
+		const char *person, *line;
+		size_t i;
+		int found_header = 0;
+
+		line = buf->buf + buf_offset;
+		if (!*line || *line == '\n')
+			return; /* End of headers */
+
+		for (i = 0; header[i]; i++)
+			if (skip_prefix(line, header[i], &person)) {
+				const char *endp = strchrnul(person, '\n');
+				found_header = 1;
+				buf_offset += endp - line;
+				buf_offset += rewrite_ident_line(person, endp - person, buf, mailmap);
+				break;
+			}
+
+		if (!found_header) {
+			buf_offset = strchrnul(line, '\n') - buf->buf;
+			if (buf->buf[buf_offset] == '\n')
+				buf_offset++;
+		}
+	}
+}
 
 static void ident_env_hint(enum want_ident whose_ident)
 {
diff --git a/revision.c b/revision.c
index 3418a1b7f1..14dca903b6 100644
--- a/revision.c
+++ b/revision.c
@@ -3755,80 +3755,6 @@ int rewrite_parents(struct rev_info *revs, struct commit *commit,
 	return 0;
 }
 
-/*
- * Returns the difference between the new and old length of the ident line.
- */
-static ssize_t rewrite_ident_line(const char *person, size_t len,
-				   struct strbuf *buf,
-				   struct string_list *mailmap)
-{
-	size_t namelen, maillen;
-	const char *name;
-	const char *mail;
-	struct ident_split ident;
-
-	if (split_ident_line(&ident, person, len))
-		return 0;
-
-	mail = ident.mail_begin;
-	maillen = ident.mail_end - ident.mail_begin;
-	name = ident.name_begin;
-	namelen = ident.name_end - ident.name_begin;
-
-	if (map_user(mailmap, &mail, &maillen, &name, &namelen)) {
-		struct strbuf namemail = STRBUF_INIT;
-		size_t newlen;
-
-		strbuf_addf(&namemail, "%.*s <%.*s>",
-			    (int)namelen, name, (int)maillen, mail);
-
-		strbuf_splice(buf, ident.name_begin - buf->buf,
-			      ident.mail_end - ident.name_begin + 1,
-			      namemail.buf, namemail.len);
-		newlen = namemail.len;
-
-		strbuf_release(&namemail);
-
-		return newlen - (ident.mail_end - ident.name_begin);
-	}
-
-	return 0;
-}
-
-static void commit_rewrite_person(struct strbuf *buf, const char **header,
-				   struct string_list *mailmap)
-{
-	size_t buf_offset = 0;
-
-	if (!mailmap)
-		return;
-
-	for (;;) {
-		const char *person, *line;
-		size_t i;
-		int found_header = 0;
-
-		line = buf->buf + buf_offset;
-		if (!*line || *line == '\n')
-			return; /* End of headers */
-
-		for (i = 0; header[i]; i++)
-			if (skip_prefix(line, header[i], &person)) {
-				const char *endp = strchrnul(person, '\n');
-				found_header = 1;
-				buf_offset += endp - line;
-				buf_offset += rewrite_ident_line(person, endp - person, buf, mailmap);
-				break;
-			}
-
-		if (!found_header) {
-			buf_offset = strchrnul(line, '\n') - buf->buf;
-			if (buf->buf[buf_offset] == '\n')
-				buf_offset++;
-		}
-	}
-}
-
 static int commit_match(struct commit *commit, struct rev_info *opt)
 {
 	int retval;
-- 
2.37.1.120.g63d6f8c201


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v6 3/4] ident: rename commit_rewrite_person() to apply_mailmap_to_header()
  2022-07-18 19:50         ` [PATCH v6 0/4] Add support for mailmap in cat-file Siddharth Asthana
  2022-07-18 19:50           ` [PATCH v6 1/4] revision: improve commit_rewrite_person() Siddharth Asthana
  2022-07-18 19:51           ` [PATCH v6 2/4] ident: move commit_rewrite_person() to ident.c Siddharth Asthana
@ 2022-07-18 19:51           ` Siddharth Asthana
  2022-07-18 19:51           ` [PATCH v6 4/4] cat-file: add mailmap support Siddharth Asthana
  2022-07-25 18:58           ` [PATCH v6 0/4] Add support for mailmap in cat-file Junio C Hamano
  4 siblings, 0 replies; 68+ messages in thread
From: Siddharth Asthana @ 2022-07-18 19:51 UTC (permalink / raw)
  To: git
  Cc: phillip.wood123, congdanhqx, christian.couder, avarab, gitster,
	Johannes.Schindelin, johncai86, Siddharth Asthana

commit_rewrite_person() takes a commit buffer and replaces the idents
in the header with their canonical versions using the mailmap mechanism.
The name "commit_rewrite_person()" is misleading as it doesn't convey
what kind of rewrite we are going to do to the buffer. It also doesn't
clearly mention that the function will limit itself to the header part
of the buffer. The new name, "apply_mailmap_to_header()", expresses the
functionality of the function pretty clearly.

We intend to use apply_mailmap_to_header() in git-cat-file to replace
idents in the headers of commit and tag object buffers. So, we will be
extending this function to take tag object buffers as well and replace
idents on the tagger header using the mailmap mechanism.

Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: John Cai <johncai86@gmail.com>
Signed-off-by: Siddharth Asthana <siddharthasthana31@gmail.com>
---
 cache.h    | 6 +++---
 ident.c    | 4 ++--
 revision.c | 2 +-
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/cache.h b/cache.h
index 16a08aada2..4aa1bd079d 100644
--- a/cache.h
+++ b/cache.h
@@ -1689,10 +1689,10 @@ struct ident_split {
 int split_ident_line(struct ident_split *, const char *, int);
 
 /*
- * Given a commit object buffer and the commit headers, replaces the idents
- * in the headers with their canonical versions using the mailmap mechanism.
+ * Given a commit or tag object buffer and the commit or tag headers, replaces
+ * the idents in the headers with their canonical versions using the mailmap mechanism.
  */
-void commit_rewrite_person(struct strbuf *, const char **, struct string_list *);
+void apply_mailmap_to_header(struct strbuf *, const char **, struct string_list *);
 
 /*
  * Compare split idents for equality or strict ordering. Note that we
diff --git a/ident.c b/ident.c
index 1eee4fd0e3..7f66beda42 100644
--- a/ident.c
+++ b/ident.c
@@ -387,8 +387,8 @@ static ssize_t rewrite_ident_line(const char *person, size_t len,
 	return 0;
 }
 
-void commit_rewrite_person(struct strbuf *buf, const char **header,
-			    struct string_list *mailmap)
+void apply_mailmap_to_header(struct strbuf *buf, const char **header,
+			       struct string_list *mailmap)
 {
 	size_t buf_offset = 0;
 
diff --git a/revision.c b/revision.c
index 14dca903b6..6ad3665204 100644
--- a/revision.c
+++ b/revision.c
@@ -3792,7 +3792,7 @@ static int commit_match(struct commit *commit, struct rev_info *opt)
 		if (!buf.len)
 			strbuf_addstr(&buf, message);
 
-		commit_rewrite_person(&buf, commit_headers, opt->mailmap);
+		apply_mailmap_to_header(&buf, commit_headers, opt->mailmap);
 	}
 
 	/* Append "fake" message parts as needed */
-- 
2.37.1.120.g63d6f8c201


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v6 4/4] cat-file: add mailmap support
  2022-07-18 19:50         ` [PATCH v6 0/4] Add support for mailmap in cat-file Siddharth Asthana
                             ` (2 preceding siblings ...)
  2022-07-18 19:51           ` [PATCH v6 3/4] ident: rename commit_rewrite_person() to apply_mailmap_to_header() Siddharth Asthana
@ 2022-07-18 19:51           ` Siddharth Asthana
  2022-07-25 18:58           ` [PATCH v6 0/4] Add support for mailmap in cat-file Junio C Hamano
  4 siblings, 0 replies; 68+ messages in thread
From: Siddharth Asthana @ 2022-07-18 19:51 UTC (permalink / raw)
  To: git
  Cc: phillip.wood123, congdanhqx, christian.couder, avarab, gitster,
	Johannes.Schindelin, johncai86, Siddharth Asthana, Phillip Wood

git-cat-file is used by tools like GitLab to get commit and tag contents
that are then displayed to users. This content, which has author,
committer or tagger information, could benefit from passing through the
mailmap mechanism before being sent or displayed.

This patch adds the --[no-]use-mailmap command line option to the git
cat-file command. It also adds the --[no-]mailmap option as an alias to
--[no-]use-mailmap.

This patch also introduces new test cases to test the mailmap mechanism in
the git cat-file command.

Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: John Cai <johncai86@gmail.com>
Helped-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Siddharth Asthana <siddharthasthana31@gmail.com>
---
 Documentation/git-cat-file.txt |  6 ++++
 builtin/cat-file.c             | 43 ++++++++++++++++++++++++-
 t/t4203-mailmap.sh             | 59 ++++++++++++++++++++++++++++++++++
 3 files changed, 107 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-cat-file.txt b/Documentation/git-cat-file.txt
index 24a811f0ef..1880e9bba1 100644
--- a/Documentation/git-cat-file.txt
+++ b/Documentation/git-cat-file.txt
@@ -63,6 +63,12 @@ OPTIONS
 	or to ask for a "blob" with `<object>` being a tag object that
 	points at it.
 
+--[no-]mailmap::
+--[no-]use-mailmap::
+       Use mailmap file to map author, committer and tagger names
+       and email addresses to canonical real names and email addresses.
+       See linkgit:git-shortlog[1].
+
 --textconv::
 	Show the content as transformed by a textconv filter. In this case,
 	`<object>` has to be of the form `<tree-ish>:<path>`, or `:<path>` in
diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 50cf38999d..4b68216b51 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -16,6 +16,7 @@
 #include "packfile.h"
 #include "object-store.h"
 #include "promisor-remote.h"
+#include "mailmap.h"
 
 enum batch_mode {
 	BATCH_MODE_CONTENTS,
@@ -36,6 +37,22 @@ struct batch_options {
 
 static const char *force_path;
 
+static struct string_list mailmap = STRING_LIST_INIT_NODUP;
+static int use_mailmap;
+
+static char *replace_idents_using_mailmap(char *, size_t *);
+
+static char *replace_idents_using_mailmap(char *object_buf, size_t *size)
+{
+	struct strbuf sb = STRBUF_INIT;
+	const char *headers[] = { "author ", "committer ", "tagger ", NULL };
+
+	strbuf_attach(&sb, object_buf, *size, *size + 1);
+	apply_mailmap_to_header(&sb, headers, &mailmap);
+	*size = sb.len;
+	return strbuf_detach(&sb, NULL);
+}
+
 static int filter_object(const char *path, unsigned mode,
 			 const struct object_id *oid,
 			 char **buf, unsigned long *size)
@@ -152,6 +169,12 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name,
 		if (!buf)
 			die("Cannot read object %s", obj_name);
 
+		if (use_mailmap) {
+			size_t s = size;
+			buf = replace_idents_using_mailmap(buf, &s);
+			size = cast_size_t_to_ulong(s);
+		}
+
 		/* otherwise just spit out the data */
 		break;
 
@@ -183,6 +206,12 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name,
 		}
 		buf = read_object_with_reference(the_repository, &oid,
 						 exp_type_id, &size, NULL);
+
+		if (use_mailmap) {
+			size_t s = size;
+			buf = replace_idents_using_mailmap(buf, &s);
+			size = cast_size_t_to_ulong(s);
+		}
 		break;
 	}
 	default:
@@ -348,11 +377,18 @@ static void print_object_or_die(struct batch_options *opt, struct expand_data *d
 		void *contents;
 
 		contents = read_object_file(oid, &type, &size);
+
+		if (use_mailmap) {
+			size_t s = size;
+			contents = replace_idents_using_mailmap(contents, &s);
+			size = cast_size_t_to_ulong(s);
+		}
+
 		if (!contents)
 			die("object %s disappeared", oid_to_hex(oid));
 		if (type != data->type)
 			die("object %s changed type!?", oid_to_hex(oid));
-		if (data->info.sizep && size != data->size)
+		if (data->info.sizep && size != data->size && !use_mailmap)
 			die("object %s changed size!?", oid_to_hex(oid));
 
 		batch_write(opt, contents, size);
@@ -843,6 +879,8 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
 		OPT_CMDMODE('s', NULL, &opt, N_("show object size"), 's'),
 		OPT_BOOL(0, "allow-unknown-type", &unknown_type,
 			  N_("allow -s and -t to work with broken/corrupt objects")),
+		OPT_BOOL(0, "use-mailmap", &use_mailmap, N_("use mail map file")),
+		OPT_ALIAS(0, "mailmap", "use-mailmap"),
 		/* Batch mode */
 		OPT_GROUP(N_("Batch objects requested on stdin (or --batch-all-objects)")),
 		OPT_CALLBACK_F(0, "batch", &batch, N_("format"),
@@ -885,6 +923,9 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
 	opt_cw = (opt == 'c' || opt == 'w');
 	opt_epts = (opt == 'e' || opt == 'p' || opt == 't' || opt == 's');
 
+	if (use_mailmap)
+		read_mailmap(&mailmap);
+
 	/* --batch-all-objects? */
 	if (opt == 'b')
 		batch.all_objects = 1;
diff --git a/t/t4203-mailmap.sh b/t/t4203-mailmap.sh
index 0b2d21ec55..cd1cab3e54 100755
--- a/t/t4203-mailmap.sh
+++ b/t/t4203-mailmap.sh
@@ -963,4 +963,63 @@ test_expect_success SYMLINKS 'symlinks not respected in-tree' '
 	test_cmp expect actual
 '
 
+test_expect_success 'prepare for cat-file --mailmap' '
+	rm -f .mailmap &&
+	git commit --allow-empty -m foo --author="Orig <orig@example.com>"
+'
+
+test_expect_success '--no-use-mailmap disables mailmap in cat-file' '
+	test_when_finished "rm .mailmap" &&
+	cat >.mailmap <<-EOF &&
+	A U Thor <author@example.com> Orig <orig@example.com>
+	EOF
+	cat >expect <<-EOF &&
+	author Orig <orig@example.com>
+	EOF
+	git cat-file --no-use-mailmap commit HEAD >log &&
+	sed -n "/^author /s/\([^>]*>\).*/\1/p" log >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success '--use-mailmap enables mailmap in cat-file' '
+	test_when_finished "rm .mailmap" &&
+	cat >.mailmap <<-EOF &&
+	A U Thor <author@example.com> Orig <orig@example.com>
+	EOF
+	cat >expect <<-EOF &&
+	author A U Thor <author@example.com>
+	EOF
+	git cat-file --use-mailmap commit HEAD >log &&
+	sed -n "/^author /s/\([^>]*>\).*/\1/p" log >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success '--no-mailmap disables mailmap in cat-file for annotated tag objects' '
+	test_when_finished "rm .mailmap" &&
+	cat >.mailmap <<-EOF &&
+	Orig <orig@example.com> C O Mitter <committer@example.com>
+	EOF
+	cat >expect <<-EOF &&
+	tagger C O Mitter <committer@example.com>
+	EOF
+	git tag -a -m "annotated tag" v1 &&
+	git cat-file --no-mailmap -p v1 >log &&
+	sed -n "/^tagger /s/\([^>]*>\).*/\1/p" log >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success '--mailmap enables mailmap in cat-file for annotated tag objects' '
+	test_when_finished "rm .mailmap" &&
+	cat >.mailmap <<-EOF &&
+	Orig <orig@example.com> C O Mitter <committer@example.com>
+	EOF
+	cat >expect <<-EOF &&
+	tagger Orig <orig@example.com>
+	EOF
+	git tag -a -m "annotated tag" v2 &&
+	git cat-file --mailmap -p v2 >log &&
+	sed -n "/^tagger /s/\([^>]*>\).*/\1/p" log >actual &&
+	test_cmp expect actual
+'
+
 test_done
-- 
2.37.1.120.g63d6f8c201
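
For readers following along without the git tree at hand, here is a
minimal, self-contained sketch of the idea behind the header rewrite:
walk the header lines up to the first blank line, rewrite a matching
ident, and reuse the returned length difference to advance the offset
instead of scanning for the newline again. It is only an illustration
under stated assumptions, not the code in ident.c: struct map_entry,
rewrite_ident() and apply_map_to_headers() are hypothetical names, the
"mailmap" is a single exact-match entry, and a plain malloc'd buffer
stands in for git's strbuf.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical one-entry "mailmap": exact old ident -> new ident. */
struct map_entry {
	const char *old_ident;
	const char *new_ident;
};

/*
 * If the ident at (*buf)[at] equals m->old_ident, replace it with
 * m->new_ident. Returns the change in length (0 if nothing matched).
 */
static ptrdiff_t rewrite_ident(char **buf, size_t *len, size_t at,
			       const struct map_entry *m)
{
	size_t old_len = strlen(m->old_ident);
	size_t new_len = strlen(m->new_ident);
	char *out;

	if (*len - at < old_len || memcmp(*buf + at, m->old_ident, old_len))
		return 0;

	out = malloc(*len - old_len + new_len + 1);
	assert(out);
	memcpy(out, *buf, at);
	memcpy(out + at, m->new_ident, new_len);
	memcpy(out + at + new_len, *buf + at + old_len, *len - at - old_len);
	*len = *len - old_len + new_len;
	out[*len] = '\0';
	free(*buf);
	*buf = out;
	return (ptrdiff_t)new_len - (ptrdiff_t)old_len;
}

/* Rewrite idents on the listed header lines; stop at the blank line. */
static void apply_map_to_headers(char **buf, size_t *len,
				 const char *const *headers,
				 const struct map_entry *m)
{
	size_t off = 0;

	while (off < *len && (*buf)[off] != '\n') {
		char *eol = memchr(*buf + off, '\n', *len - off);
		size_t line_len = eol ? (size_t)(eol - (*buf + off)) + 1
				      : *len - off;

		for (const char *const *h = headers; *h; h++) {
			size_t hlen = strlen(*h);

			if (line_len >= hlen && !memcmp(*buf + off, *h, hlen)) {
				/* Adjust by the delta; no second newline scan. */
				line_len = (size_t)((ptrdiff_t)line_len +
					rewrite_ident(buf, len, off + hlen, m));
				break;
			}
		}
		off += line_len;
	}
}
```

A caller would pass a NUL-terminated, malloc'd copy of the object
contents together with a header list such as { "author ",
"committer ", "tagger ", NULL }. Lines after the first blank line (the
commit or tag message) are left untouched, which mirrors the intent of
only mapping identities in the object header.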



* Re: [PATCH v6 0/4] Add support for mailmap in cat-file
  2022-07-18 19:50         ` [PATCH v6 0/4] Add support for mailmap in cat-file Siddharth Asthana
                             ` (3 preceding siblings ...)
  2022-07-18 19:51           ` [PATCH v6 4/4] cat-file: add mailmap support Siddharth Asthana
@ 2022-07-25 18:58           ` Junio C Hamano
  2022-07-28 19:07             ` Christian Couder
  4 siblings, 1 reply; 68+ messages in thread
From: Junio C Hamano @ 2022-07-25 18:58 UTC (permalink / raw)
  To: Siddharth Asthana
  Cc: git, phillip.wood123, congdanhqx, christian.couder, avarab,
	Johannes.Schindelin, johncai86

Siddharth Asthana <siddharthasthana31@gmail.com> writes:

> Changes in v6:
> - The function rewrite_ident_line() returns the difference between the
>   new and the old length of the ident line. We were not using this
>   information and instead parsing the buffer again to look for the line
>   ending. This patch set starts using that information to update the
>   buf_offset value in commit_rewrite_person().
> - This patch set also tweaks the commit_rewrite_person() so that it is
>   easier to understand and avoids unnecessary parsing of the buffer
>   wherever possible.
>
> Siddharth Asthana (4):
>   revision: improve commit_rewrite_person()
>   ident: move commit_rewrite_person() to ident.c
>   ident: rename commit_rewrite_person() to apply_mailmap_to_header()
>   cat-file: add mailmap support
>
>  Documentation/git-cat-file.txt |  6 +++
>  builtin/cat-file.c             | 43 +++++++++++++++++++-
>  cache.h                        |  6 +++
>  ident.c                        | 74 ++++++++++++++++++++++++++++++++++
>  revision.c                     | 50 ++---------------------
>  t/t4203-mailmap.sh             | 59 +++++++++++++++++++++++++++
>  6 files changed, 190 insertions(+), 48 deletions(-)

I haven't seen any comments or objections to this round.  Are people
happy about it going forward?  I am planning to merge it to 'next'
and down to 'master' soonish.

Thanks.


* Re: [PATCH v6 0/4] Add support for mailmap in cat-file
  2022-07-25 18:58           ` [PATCH v6 0/4] Add support for mailmap in cat-file Junio C Hamano
@ 2022-07-28 19:07             ` Christian Couder
  2022-07-28 19:32               ` Junio C Hamano
  0 siblings, 1 reply; 68+ messages in thread
From: Christian Couder @ 2022-07-28 19:07 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Siddharth Asthana, git, Phillip Wood,
	Đoàn Trần Công Danh,
	Ævar Arnfjörð Bjarmason, Johannes Schindelin,
	John Cai

On Mon, Jul 25, 2022 at 8:58 PM Junio C Hamano <gitster@pobox.com> wrote:
> Siddharth Asthana <siddharthasthana31@gmail.com> writes:
>
> > Changes in v6:
> > - The function rewrite_ident_line() returns the difference between the
> >   new and the old length of the ident line. We were not using this
> >   information and instead parsing the buffer again to look for the line
> >   ending. This patch set starts using that information to update the
> >   buf_offset value in commit_rewrite_person().
> > - This patch set also tweaks the commit_rewrite_person() so that it is
> >   easier to understand and avoids unnecessary parsing of the buffer
> >   wherever possible.
> >
> > Siddharth Asthana (4):
> >   revision: improve commit_rewrite_person()
> >   ident: move commit_rewrite_person() to ident.c
> >   ident: rename commit_rewrite_person() to apply_mailmap_to_header()
> >   cat-file: add mailmap support
> >
> >  Documentation/git-cat-file.txt |  6 +++
> >  builtin/cat-file.c             | 43 +++++++++++++++++++-
> >  cache.h                        |  6 +++
> >  ident.c                        | 74 ++++++++++++++++++++++++++++++++++
> >  revision.c                     | 50 ++---------------------
> >  t/t4203-mailmap.sh             | 59 +++++++++++++++++++++++++++
> >  6 files changed, 190 insertions(+), 48 deletions(-)
>
> I haven't seen any comments or objections to this round.  Are people
> happy about it going forward?  I am planning to merge it to 'next'
> and down to 'master' soonish.

I am biased, but I am happy with the current state of this patch
series. During the last versions of this patch series there were only
comments related to the first patch in the series (revision: improve
commit_rewrite_person()). It seems to me that they were all properly
taken into account, and that the code in that patch is now correct and
relatively simple to understand.

Thanks,
Christian.


* Re: [PATCH v6 0/4] Add support for mailmap in cat-file
  2022-07-28 19:07             ` Christian Couder
@ 2022-07-28 19:32               ` Junio C Hamano
  2022-07-30  7:50                 ` Siddharth Asthana
  0 siblings, 1 reply; 68+ messages in thread
From: Junio C Hamano @ 2022-07-28 19:32 UTC (permalink / raw)
  To: Christian Couder
  Cc: Siddharth Asthana, git, Phillip Wood,
	Đoàn Trần Công Danh,
	Ævar Arnfjörð Bjarmason, Johannes Schindelin,
	John Cai

Christian Couder <christian.couder@gmail.com> writes:

> On Mon, Jul 25, 2022 at 8:58 PM Junio C Hamano <gitster@pobox.com> wrote:
>> Siddharth Asthana <siddharthasthana31@gmail.com> writes:
>>
>> > Changes in v6:
>> > - The function rewrite_ident_line() returns the difference between the
>> >   new and the old length of the ident line. We were not using this
>> >   information and instead parsing the buffer again to look for the line
>> >   ending. This patch set starts using that information to update the
>> >   buf_offset value in commit_rewrite_person().
>> > - This patch set also tweaks the commit_rewrite_person() so that it is
>> >   easier to understand and avoids unnecessary parsing of the buffer
>> >   wherever possible.
>> >
>> > Siddharth Asthana (4):
>> >   revision: improve commit_rewrite_person()
>> >   ident: move commit_rewrite_person() to ident.c
>> >   ident: rename commit_rewrite_person() to apply_mailmap_to_header()
>> >   cat-file: add mailmap support
>> >
>> >  Documentation/git-cat-file.txt |  6 +++
>> >  builtin/cat-file.c             | 43 +++++++++++++++++++-
>> >  cache.h                        |  6 +++
>> >  ident.c                        | 74 ++++++++++++++++++++++++++++++++++
>> >  revision.c                     | 50 ++---------------------
>> >  t/t4203-mailmap.sh             | 59 +++++++++++++++++++++++++++
>> >  6 files changed, 190 insertions(+), 48 deletions(-)
>>
>> I haven't seen any comments or objections to this round.  Are people
>> happy about it going forward?  I am planning to merge it to 'next'
>> and down to 'master' soonish.
>
> I am biased, but I am happy with the current state of this patch
> series. During the last versions of this patch series there were only
> comments related to the first patch in the series (revision: improve
> commit_rewrite_person()). It seems to me that they were all properly
> taken into account, and that the code in that patch is now correct and
> relatively simple to understand.

Thanks, let's move it forward.


* Re: [PATCH v6 0/4] Add support for mailmap in cat-file
  2022-07-28 19:32               ` Junio C Hamano
@ 2022-07-30  7:50                 ` Siddharth Asthana
  0 siblings, 0 replies; 68+ messages in thread
From: Siddharth Asthana @ 2022-07-30  7:50 UTC (permalink / raw)
  To: Junio C Hamano, Christian Couder
  Cc: git, Phillip Wood, Đoàn Trần Công Danh,
	Ævar Arnfjörð Bjarmason, Johannes Schindelin,
	John Cai



On 29/07/22 01:02, Junio C Hamano wrote:
> Christian Couder <christian.couder@gmail.com> writes:
> 
>> On Mon, Jul 25, 2022 at 8:58 PM Junio C Hamano <gitster@pobox.com> wrote:
>>> Siddharth Asthana <siddharthasthana31@gmail.com> writes:
>>>
>>>> Changes in v6:
>>>> - The function rewrite_ident_line() returns the difference between the
>>>>    new and the old length of the ident line. We were not using this
>>>>    information and instead parsing the buffer again to look for the line
>>>>    ending. This patch set starts using that information to update the
>>>>    buf_offset value in commit_rewrite_person().
>>>> - This patch set also tweaks the commit_rewrite_person() so that it is
>>>>    easier to understand and avoids unnecessary parsing of the buffer
>>>>    wherever possible.
>>>>
>>>> Siddharth Asthana (4):
>>>>    revision: improve commit_rewrite_person()
>>>>    ident: move commit_rewrite_person() to ident.c
>>>>    ident: rename commit_rewrite_person() to apply_mailmap_to_header()
>>>>    cat-file: add mailmap support
>>>>
>>>>   Documentation/git-cat-file.txt |  6 +++
>>>>   builtin/cat-file.c             | 43 +++++++++++++++++++-
>>>>   cache.h                        |  6 +++
>>>>   ident.c                        | 74 ++++++++++++++++++++++++++++++++++
>>>>   revision.c                     | 50 ++---------------------
>>>>   t/t4203-mailmap.sh             | 59 +++++++++++++++++++++++++++
>>>>   6 files changed, 190 insertions(+), 48 deletions(-)
>>>
>>> I haven't seen any comments or objections to this round.  Are people
>>> happy about it going forward?  I am planning to merge it to 'next'
>>> and down to 'master' soonish.
>>
>> I am biased, but I am happy with the current state of this patch
>> series. During the last versions of this patch series there were only
>> comments related to the first patch in the series (revision: improve
>> commit_rewrite_person()). It seems to me that they were all properly
>> taken into account, and that the code in that patch is now correct and
>> relatively simple to understand.
> 
> Thanks, let's move it forward.

Thanks a lot, Junio, Phillip, Đoàn, Ævar, Johannes, Christian and John,
for helping me with the reviews and making this patch series better.
Thanks a lot for accepting it :)


Thread overview: 68+ messages
2022-06-30 14:24 [PATCH 0/3] Add support for mailmap in cat-file Siddharth Asthana
2022-06-30 14:24 ` [PATCH 1/3] ident: move commit_rewrite_person() to ident.c Siddharth Asthana
2022-06-30 16:00   ` Đoàn Trần Công Danh
2022-06-30 23:22   ` Junio C Hamano
2022-06-30 14:24 ` [PATCH 2/3] ident: rename commit_rewrite_person() to rewrite_ident_line() Siddharth Asthana
2022-06-30 15:33   ` Phillip Wood
2022-06-30 16:55     ` Christian Couder
2022-06-30 23:31   ` Junio C Hamano
2022-06-30 14:24 ` [PATCH 3/3] cat-file: add mailmap support Siddharth Asthana
2022-06-30 15:50   ` Phillip Wood
2022-06-30 16:36     ` Phillip Wood
2022-06-30 17:07     ` Christian Couder
2022-06-30 21:33       ` Junio C Hamano
2022-07-07  9:15         ` Christian Couder
2022-06-30 23:36   ` Ævar Arnfjörð Bjarmason
2022-06-30 23:53     ` Junio C Hamano
2022-07-07  9:02     ` Christian Couder
2022-06-30 23:41   ` Junio C Hamano
2022-06-30 21:18 ` [PATCH 0/3] Add support for mailmap in cat-file Junio C Hamano
2022-07-07 16:15 ` [PATCH v2 0/4] " Siddharth Asthana
2022-07-07 16:15   ` [PATCH v2 1/4] revision: improve commit_rewrite_person() Siddharth Asthana
2022-07-07 21:52     ` Junio C Hamano
2022-07-08 14:50     ` Đoàn Trần Công Danh
     [not found]       ` <CAP8UFD116xMnp27pxW8WNDf6PRJxnnwWtcy2TNHU_KyV2ZVA1g@mail.gmail.com>
2022-07-09  1:02         ` Đoàn Trần Công Danh
2022-07-09  5:04           ` Christian Couder
2022-07-07 16:15   ` [PATCH v2 2/4] ident: move commit_rewrite_person() to ident.c Siddharth Asthana
2022-07-07 16:15   ` [PATCH v2 3/4] ident: rename commit_rewrite_person() to apply_mailmap_to_header() Siddharth Asthana
2022-07-07 16:15   ` [PATCH v2 4/4] cat-file: add mailmap support Siddharth Asthana
2022-07-07 21:55     ` Junio C Hamano
2022-07-08 11:53     ` Johannes Schindelin
2022-07-07 22:06   ` [PATCH v2 0/4] Add support for mailmap in cat-file Junio C Hamano
2022-07-07 22:58     ` Junio C Hamano
2022-07-09 15:41   ` [PATCH v3 " Siddharth Asthana
2022-07-09 15:41     ` [PATCH v3 1/4] revision: improve commit_rewrite_person() Siddharth Asthana
2022-07-12 16:29       ` Johannes Schindelin
2022-07-09 15:41     ` [PATCH v3 2/4] ident: move commit_rewrite_person() to ident.c Siddharth Asthana
2022-07-09 15:41     ` [PATCH v3 3/4] ident: rename commit_rewrite_person() to apply_mailmap_to_header() Siddharth Asthana
2022-07-09 15:41     ` [PATCH v3 4/4] cat-file: add mailmap support Siddharth Asthana
2022-07-10  5:34     ` [PATCH v3 0/4] Add support for mailmap in cat-file Junio C Hamano
2022-07-12 12:34       ` Johannes Schindelin
2022-07-12 14:16         ` Junio C Hamano
2022-07-12 16:01           ` Siddharth Asthana
2022-07-12 16:06           ` Junio C Hamano
2022-07-12 16:06     ` [PATCH v4 " Siddharth Asthana
2022-07-12 16:06       ` [PATCH v4 1/4] revision: improve commit_rewrite_person() Siddharth Asthana
2022-07-13  1:25         ` Ævar Arnfjörð Bjarmason
2022-07-13 12:18           ` Christian Couder
2022-07-14 21:02         ` Junio C Hamano
2022-07-12 16:06       ` [PATCH v4 2/4] ident: move commit_rewrite_person() to ident.c Siddharth Asthana
2022-07-12 16:06       ` [PATCH v4 3/4] ident: rename commit_rewrite_person() to apply_mailmap_to_header() Siddharth Asthana
2022-07-13  1:25         ` Ævar Arnfjörð Bjarmason
2022-07-13 13:29           ` Christian Couder
2022-07-12 16:06       ` [PATCH v4 4/4] cat-file: add mailmap support Siddharth Asthana
2022-07-16  7:40       ` [PATCH v5 0/4] Add support for mailmap in cat-file Siddharth Asthana
2022-07-16  7:40         ` [PATCH v5 1/4] revision: improve commit_rewrite_person() Siddharth Asthana
2022-07-17 22:11           ` Junio C Hamano
2022-07-16  7:40         ` [PATCH v5 2/4] ident: move commit_rewrite_person() to ident.c Siddharth Asthana
2022-07-16  7:40         ` [PATCH v5 3/4] ident: rename commit_rewrite_person() to apply_mailmap_to_header() Siddharth Asthana
2022-07-16  7:40         ` [PATCH v5 4/4] cat-file: add mailmap support Siddharth Asthana
2022-07-18 19:50         ` [PATCH v6 0/4] Add support for mailmap in cat-file Siddharth Asthana
2022-07-18 19:50           ` [PATCH v6 1/4] revision: improve commit_rewrite_person() Siddharth Asthana
2022-07-18 19:51           ` [PATCH v6 2/4] ident: move commit_rewrite_person() to ident.c Siddharth Asthana
2022-07-18 19:51           ` [PATCH v6 3/4] ident: rename commit_rewrite_person() to apply_mailmap_to_header() Siddharth Asthana
2022-07-18 19:51           ` [PATCH v6 4/4] cat-file: add mailmap support Siddharth Asthana
2022-07-25 18:58           ` [PATCH v6 0/4] Add support for mailmap in cat-file Junio C Hamano
2022-07-28 19:07             ` Christian Couder
2022-07-28 19:32               ` Junio C Hamano
2022-07-30  7:50                 ` Siddharth Asthana
