git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [RFC PATCH] gitweb: Map names/emails with mailmap
@ 2020-07-30  4:12 Emma Brooks
  2020-07-30 16:20 ` Junio C Hamano
                   ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Emma Brooks @ 2020-07-30  4:12 UTC (permalink / raw)
  To: git; +Cc: Emma Brooks

Add an option to map names and emails to their canonical forms via a
.mailmap file. This is enabled by default, consistent with the behavior
of Git itself.

Signed-off-by: Emma Brooks <me@pluvano.com>
---

This works, but needs some polish. The read_mailmap code is not
particularly clever.

 Documentation/gitweb.conf.txt |  5 +++
 gitweb/gitweb.perl            | 79 +++++++++++++++++++++++++++++++++--
 2 files changed, 80 insertions(+), 4 deletions(-)

diff --git a/Documentation/gitweb.conf.txt b/Documentation/gitweb.conf.txt
index 7963a79ba9..2d7551a6a5 100644
--- a/Documentation/gitweb.conf.txt
+++ b/Documentation/gitweb.conf.txt
@@ -751,6 +751,11 @@ default font sizes or lineheights are changed (e.g. via adding extra
 CSS stylesheet in `@stylesheets`), it may be appropriate to change
 these values.
 
+mailmap::
+	Use mailmap to find the canonical name/email for
+	committers/authors (see linkgit:git-shortlog[1]). Enabled by
+	default.
+
 highlight::
 	Server-side syntax highlight support in "blob" view.  It requires
 	`$highlight_bin` program to be available (see the description of
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 0959a782ec..00256704a7 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -505,6 +505,12 @@ sub evaluate_uri {
 		'override' => 0,
 		'default' => ['']},
 
+	# Enable reading mailmap to determine canonical author
+	# information. Enabled by default.
+	'mailmap' => {
+		'override' => 0,
+		'default' => [1]},
+
 	# Enable displaying how much time and how many git commands
 	# it took to generate and display page.  Disabled by default.
 	# Project specific override is not supported.
@@ -3490,6 +3496,61 @@ sub parse_tag {
 	return %tag
 }
 
+# Contents of mailmap stored as a referance to a hash with keys in the format
+# of "name <email>" or "<email>", and values that are hashes containing a
+# replacement "name" and/or "email". If set (even if empty) the mailmap has
+# already been read.
+my $mailmap;
+
+sub read_mailmap {
+	my %mailmap = ();
+	open my $fd, '-|', git_cmd(), 'cat-file', 'blob', 'HEAD:.mailmap'
+		or die_error(500, 'Failed to read mailmap');
+	foreach (split '\n', <$fd>) {
+		next if (/^#/);
+		if (/(.*)\s+ <([^<>]+)>\s+ ((?:.*\s+)? <[^<>]+>) (?:\s+\#)/x ||
+		    /(.*)\s+ <([^<>]+)>\s+ ((?:.*\s+)? <[^<>]+>)/x) {
+			# New Name <new@email> <old@email>
+			# New Name <new@email> Old Name <old@email>
+			$mailmap{$3} = ();
+			$mailmap{$3}{name} = $1;
+			$mailmap{$3}{email} = $2;
+		} elsif (/(?: <([^<>]+)>\s+ | (.+)\s+ ) (<[^<>]+>) (?:\s+\#)/x ||
+		         /(?: <([^<>]+)>\s+ | (.+)\s+ ) (<[^<>]+>)/x) {
+			# New Name <old@email>
+			# <new@email> <old@email>
+			$mailmap{$3} = ();
+			if ($1) {
+				$mailmap{$3}{email} = $1;
+			} else {
+				$mailmap{$3}{name} = $2;
+			}
+		}
+	}
+	return \%mailmap;
+}
+
+# Map author name and email based on mailmap. A more specific match
+# ("name <email>") is preferred to a less specific one ("<email>").
+sub map_author {
+	my $name = shift;
+	my $email = shift;
+
+	if (!$mailmap) {
+		$mailmap = read_mailmap;
+	}
+
+	if ($mailmap->{"$name <$email>"}) {
+		$name = $mailmap->{"$name <$email>"}{name} || $name;
+		$email = $mailmap->{"$name <$email>"}{email} || $email;
+	} elsif ($mailmap->{"<$email>"}) {
+		$name = $mailmap->{"<$email>"}{name} || $name;
+		$email = $mailmap->{"<$email>"}{email} || $email;
+	}
+
+	return ($name, $email);
+}
+
 sub parse_commit_text {
 	my ($commit_text, $withparents) = @_;
 	my @commit_lines = split '\n', $commit_text;
@@ -3517,8 +3578,13 @@ sub parse_commit_text {
 			$co{'author_epoch'} = $2;
 			$co{'author_tz'} = $3;
 			if ($co{'author'} =~ m/^([^<]+) <([^>]*)>/) {
-				$co{'author_name'}  = $1;
-				$co{'author_email'} = $2;
+				my ($name, $email) = @_;
+				if (gitweb_check_feature('mailmap')) {
+					($name, $email) = map_author($1, $2);
+					$co{'author'} = "$name <$email>";
+				}
+				$co{'author_name'}  = $name;
+				$co{'author_email'} = $email;
 			} else {
 				$co{'author_name'} = $co{'author'};
 			}
@@ -3527,8 +3593,13 @@ sub parse_commit_text {
 			$co{'committer_epoch'} = $2;
 			$co{'committer_tz'} = $3;
 			if ($co{'committer'} =~ m/^([^<]+) <([^>]*)>/) {
-				$co{'committer_name'}  = $1;
-				$co{'committer_email'} = $2;
+				my ($name, $email) = @_;
+				if (gitweb_check_feature('mailmap')) {
+					($name, $email) = map_author($1, $2);
+					$co{'committer'} = "$name <$email>";
+				}
+				$co{'committer_name'}  = $name;
+				$co{'committer_email'} = $email;
 			} else {
 				$co{'committer_name'} = $co{'committer'};
 			}

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH] gitweb: Map names/emails with mailmap
  2020-07-30  4:12 [RFC PATCH] gitweb: Map names/emails with mailmap Emma Brooks
@ 2020-07-30 16:20 ` Junio C Hamano
  2020-07-31  1:01 ` Jeff King
  2020-08-08 21:34 ` [PATCH] " Emma Brooks
  2 siblings, 0 replies; 18+ messages in thread
From: Junio C Hamano @ 2020-07-30 16:20 UTC (permalink / raw)
  To: Jakub Narębski; +Cc: git, Emma Brooks

[jc: Cc'ing Jakub, hoping he's still our resident gitweb expert, as
an "RFC" requests help from experts]

Emma Brooks <me@pluvano.com> writes:

> Add an option to map names and emails to their canonical forms via a
> .mailmap file. This is enabled by default, consistent with the behavior
> of Git itself.
>
> Signed-off-by: Emma Brooks <me@pluvano.com>
> ---
>
> This works, but needs some polish. The read_mailmap code is not
> particularly clever.
>
>  Documentation/gitweb.conf.txt |  5 +++
>  gitweb/gitweb.perl            | 79 +++++++++++++++++++++++++++++++++--
>  2 files changed, 80 insertions(+), 4 deletions(-)
>
> diff --git a/Documentation/gitweb.conf.txt b/Documentation/gitweb.conf.txt
> index 7963a79ba9..2d7551a6a5 100644
> --- a/Documentation/gitweb.conf.txt
> +++ b/Documentation/gitweb.conf.txt
> @@ -751,6 +751,11 @@ default font sizes or lineheights are changed (e.g. via adding extra
>  CSS stylesheet in `@stylesheets`), it may be appropriate to change
>  these values.
>  
> +mailmap::
> +	Use mailmap to find the canonical name/email for
> +	committers/authors (see linkgit:git-shortlog[1]). Enabled by
> +	default.
> +
>  highlight::
>  	Server-side syntax highlight support in "blob" view.  It requires
>  	`$highlight_bin` program to be available (see the description of
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index 0959a782ec..00256704a7 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -505,6 +505,12 @@ sub evaluate_uri {
>  		'override' => 0,
>  		'default' => ['']},
>  
> +	# Enable reading mailmap to determine canonical author
> +	# information. Enabled by default.
> +	'mailmap' => {
> +		'override' => 0,
> +		'default' => [1]},
> +
>  	# Enable displaying how much time and how many git commands
>  	# it took to generate and display page.  Disabled by default.
>  	# Project specific override is not supported.
> @@ -3490,6 +3496,61 @@ sub parse_tag {
>  	return %tag
>  }
>  
> +# Contents of mailmap stored as a referance to a hash with keys in the format
> +# of "name <email>" or "<email>", and values that are hashes containing a
> +# replacement "name" and/or "email". If set (even if empty) the mailmap has
> +# already been read.
> +my $mailmap;
> +
> +sub read_mailmap {
> +	my %mailmap = ();
> +	open my $fd, '-|', git_cmd(), 'cat-file', 'blob', 'HEAD:.mailmap'
> +		or die_error(500, 'Failed to read mailmap');
> +	foreach (split '\n', <$fd>) {
> +		next if (/^#/);
> +		if (/(.*)\s+ <([^<>]+)>\s+ ((?:.*\s+)? <[^<>]+>) (?:\s+\#)/x ||
> +		    /(.*)\s+ <([^<>]+)>\s+ ((?:.*\s+)? <[^<>]+>)/x) {
> +			# New Name <new@email> <old@email>
> +			# New Name <new@email> Old Name <old@email>
> +			$mailmap{$3} = ();
> +			$mailmap{$3}{name} = $1;
> +			$mailmap{$3}{email} = $2;
> +		} elsif (/(?: <([^<>]+)>\s+ | (.+)\s+ ) (<[^<>]+>) (?:\s+\#)/x ||
> +		         /(?: <([^<>]+)>\s+ | (.+)\s+ ) (<[^<>]+>)/x) {
> +			# New Name <old@email>
> +			# <new@email> <old@email>
> +			$mailmap{$3} = ();
> +			if ($1) {
> +				$mailmap{$3}{email} = $1;
> +			} else {
> +				$mailmap{$3}{name} = $2;
> +			}
> +		}
> +	}
> +	return \%mailmap;
> +}
> +
> +# Map author name and email based on mailmap. A more specific match
> +# ("name <email>") is preferred to a less specific one ("<email>").
> +sub map_author {
> +	my $name = shift;
> +	my $email = shift;
> +
> +	if (!$mailmap) {
> +		$mailmap = read_mailmap;
> +	}
> +
> +	if ($mailmap->{"$name <$email>"}) {
> +		$name = $mailmap->{"$name <$email>"}{name} || $name;
> +		$email = $mailmap->{"$name <$email>"}{email} || $email;
> +	} elsif ($mailmap->{"<$email>"}) {
> +		$name = $mailmap->{"<$email>"}{name} || $name;
> +		$email = $mailmap->{"<$email>"}{email} || $email;
> +	}
> +
> +	return ($name, $email);
> +}
> +
>  sub parse_commit_text {
>  	my ($commit_text, $withparents) = @_;
>  	my @commit_lines = split '\n', $commit_text;
> @@ -3517,8 +3578,13 @@ sub parse_commit_text {
>  			$co{'author_epoch'} = $2;
>  			$co{'author_tz'} = $3;
>  			if ($co{'author'} =~ m/^([^<]+) <([^>]*)>/) {
> -				$co{'author_name'}  = $1;
> -				$co{'author_email'} = $2;
> +				my ($name, $email) = @_;
> +				if (gitweb_check_feature('mailmap')) {
> +					($name, $email) = map_author($1, $2);
> +					$co{'author'} = "$name <$email>";
> +				}
> +				$co{'author_name'}  = $name;
> +				$co{'author_email'} = $email;
>  			} else {
>  				$co{'author_name'} = $co{'author'};
>  			}
> @@ -3527,8 +3593,13 @@ sub parse_commit_text {
>  			$co{'committer_epoch'} = $2;
>  			$co{'committer_tz'} = $3;
>  			if ($co{'committer'} =~ m/^([^<]+) <([^>]*)>/) {
> -				$co{'committer_name'}  = $1;
> -				$co{'committer_email'} = $2;
> +				my ($name, $email) = @_;
> +				if (gitweb_check_feature('mailmap')) {
> +					($name, $email) = map_author($1, $2);
> +					$co{'committer'} = "$name <$email>";
> +				}
> +				$co{'committer_name'}  = $name;
> +				$co{'committer_email'} = $email;
>  			} else {
>  				$co{'committer_name'} = $co{'committer'};
>  			}

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH] gitweb: Map names/emails with mailmap
  2020-07-30  4:12 [RFC PATCH] gitweb: Map names/emails with mailmap Emma Brooks
  2020-07-30 16:20 ` Junio C Hamano
@ 2020-07-31  1:01 ` Jeff King
  2020-07-31  2:10   ` Junio C Hamano
  2020-08-08 21:34 ` [PATCH] " Emma Brooks
  2 siblings, 1 reply; 18+ messages in thread
From: Jeff King @ 2020-07-31  1:01 UTC (permalink / raw)
  To: Emma Brooks; +Cc: git

On Thu, Jul 30, 2020 at 04:12:17AM +0000, Emma Brooks wrote:

> Add an option to map names and emails to their canonical forms via a
> .mailmap file. This is enabled by default, consistent with the behavior
> of Git itself.

I'm quite far from an expert in gitweb, but this seems like a good
feature to have.

Having a separate implementation to read and apply mailmaps makes me
worried that it will behave slightly differently than the C code,
especially around corner cases. Is it possible for us to ask git
programs that are called by gitweb to do the conversion for us (e.g.,
by passing "--use-mailmap" or using "%aE" and "%aN" formatters)?
I won't be surprised if the answer is "no, we access commits using
lower-level plumbing". But it's worth looking into, I think, if you
didn't already.

-Peff

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH] gitweb: Map names/emails with mailmap
  2020-07-31  1:01 ` Jeff King
@ 2020-07-31  2:10   ` Junio C Hamano
  0 siblings, 0 replies; 18+ messages in thread
From: Junio C Hamano @ 2020-07-31  2:10 UTC (permalink / raw)
  To: Jeff King; +Cc: Emma Brooks, git

Jeff King <peff@peff.net> writes:

> On Thu, Jul 30, 2020 at 04:12:17AM +0000, Emma Brooks wrote:
>
>> Add an option to map names and emails to their canonical forms via a
>> .mailmap file. This is enabled by default, consistent with the behavior
>> of Git itself.
>
> I'm quite far from an expert in gitweb, but this seems like a good
> feature to have.
>
> Having a separate implementation to read and apply mailmaps makes me
> worried that it will behave slightly differently than the C code,
> especially around corner cases. Is it possible for us to ask git
> programs that are called by gitweb to do the conversion for us (e.g.,
> by passing "--use-mailmap" or using "%aE" and "%aN" formatters)?
> I won't be surprised if the answer is "no, we access commits using
> lower-level plumbing". But it's worth looking into, I think, if you
> didn't already.

I briefly looked at tweaking "rev-list --header" but because it ends
up calling pretty.c::pp_header() for obvious reasons since we are
doing as little processing as possible in CMIT_FMT_RAW format, we do
not get to pretty.c::pp_user_info() which is where the mailmap
conversion happens for the normal "log" output.

It is tempting to split pp_user_info() into two parts (i.e. the
first few lines up to where map_user() is optionally called, and the
remainder), so that the CMIT_FMT_RAW users can optionally ask for
mailmap to kick in, but I doubt that it is worth it, if the only
potential benefitter is gitweb (which I consider is purely
maintenance mode---I am surprised the world hasn't yet switched to
gitiles, cgit and others).

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH] gitweb: Map names/emails with mailmap
  2020-07-30  4:12 [RFC PATCH] gitweb: Map names/emails with mailmap Emma Brooks
  2020-07-30 16:20 ` Junio C Hamano
  2020-07-31  1:01 ` Jeff King
@ 2020-08-08 21:34 ` Emma Brooks
  2020-08-09 23:04   ` [PATCH v2] gitweb: map " Emma Brooks
  2 siblings, 1 reply; 18+ messages in thread
From: Emma Brooks @ 2020-08-08 21:34 UTC (permalink / raw)
  To: git; +Cc: Emma Brooks, Jakub Narębski

Add an option to map names and emails to their canonical forms via a
.mailmap file. This is enabled by default, consistent with the behavior
of Git itself.

Signed-off-by: Emma Brooks <me@pluvano.com>
---
 Documentation/gitweb.conf.txt |  5 +++
 gitweb/gitweb.perl            | 81 +++++++++++++++++++++++++++++++++--
 2 files changed, 82 insertions(+), 4 deletions(-)

diff --git a/Documentation/gitweb.conf.txt b/Documentation/gitweb.conf.txt
index 7963a79ba9..2d7551a6a5 100644
--- a/Documentation/gitweb.conf.txt
+++ b/Documentation/gitweb.conf.txt
@@ -751,6 +751,11 @@ default font sizes or lineheights are changed (e.g. via adding extra
 CSS stylesheet in `@stylesheets`), it may be appropriate to change
 these values.
 
+mailmap::
+	Use mailmap to find the canonical name/email for
+	committers/authors (see linkgit:git-shortlog[1]). Enabled by
+	default.
+
 highlight::
 	Server-side syntax highlight support in "blob" view.  It requires
 	`$highlight_bin` program to be available (see the description of
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 0959a782ec..1ca495b8b4 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -505,6 +505,12 @@ sub evaluate_uri {
 		'override' => 0,
 		'default' => ['']},
 
+	# Enable reading mailmap to determine canonical author
+	# information. Enabled by default.
+	'mailmap' => {
+		'override' => 0,
+		'default' => [1]},
+
 	# Enable displaying how much time and how many git commands
 	# it took to generate and display page.  Disabled by default.
 	# Project specific override is not supported.
@@ -3490,6 +3496,63 @@ sub parse_tag {
 	return %tag
 }
 
+# Contents of mailmap stored as a referance to a hash with keys in the format
+# of "name <email>" or "<email>", and values that are hashes containing a
+# replacement "name" and/or "email". If set (even if empty) the mailmap has
+# already been read.
+my $mailmap;
+
+sub read_mailmap {
+	my %mailmap = ();
+	open my $fd, '-|', quote_command(
+		git_cmd(), 'cat-file', 'blob', 'HEAD:.mailmap') . ' 2> /dev/null'
+		or die_error(500, 'Failed to read mailmap');
+	return \%mailmap if eof $fd;
+	foreach (split '\n', <$fd>) {
+		next if (/^#/);
+		if (/(.*)\s+ <([^<>]+)>\s+ ((?:.*\s+)? <[^<>]+>) (?:\s+\#)/x ||
+		    /(.*)\s+ <([^<>]+)>\s+ ((?:.*\s+)? <[^<>]+>)/x) {
+			# New Name <new@email> <old@email>
+			# New Name <new@email> Old Name <old@email>
+			$mailmap{$3} = ();
+			($mailmap{$3}{name} = $1) =~ s/^\s+|\s+$//g;
+			$mailmap{$3}{email} = $2;
+		} elsif (/(?: <([^<>]+)>\s+ | (.+)\s+ ) (<[^<>]+>) (?:\s+\#)/x ||
+		         /(?: <([^<>]+)>\s+ | (.+)\s+ ) (<[^<>]+>)/x) {
+			# New Name <old@email>
+			# <new@email> <old@email>
+			$mailmap{$3} = ();
+			if ($1) {
+				$mailmap{$3}{email} = $1;
+			} else {
+				($mailmap{$3}{name} = $2) =~ s/^\s+|\s+$//g;
+			}
+		}
+	}
+	return \%mailmap;
+}
+
+# Map author name and email based on mailmap. A more specific match
+# ("name <email>") is preferred to a less specific one ("<email>").
+sub map_author {
+	my $name = shift;
+	my $email = shift;
+
+	if (!$mailmap) {
+		$mailmap = read_mailmap;
+	}
+
+	if ($mailmap->{"$name <$email>"}) {
+		$name = $mailmap->{"$name <$email>"}{name} || $name;
+		$email = $mailmap->{"$name <$email>"}{email} || $email;
+	} elsif ($mailmap->{"<$email>"}) {
+		$name = $mailmap->{"<$email>"}{name} || $name;
+		$email = $mailmap->{"<$email>"}{email} || $email;
+	}
+
+	return ($name, $email);
+}
+
 sub parse_commit_text {
 	my ($commit_text, $withparents) = @_;
 	my @commit_lines = split '\n', $commit_text;
@@ -3517,8 +3580,13 @@ sub parse_commit_text {
 			$co{'author_epoch'} = $2;
 			$co{'author_tz'} = $3;
 			if ($co{'author'} =~ m/^([^<]+) <([^>]*)>/) {
-				$co{'author_name'}  = $1;
-				$co{'author_email'} = $2;
+				my ($name, $email) = @_;
+				if (gitweb_check_feature('mailmap')) {
+					($name, $email) = map_author($1, $2);
+					$co{'author'} = "$name <$email>";
+				}
+				$co{'author_name'}  = $name;
+				$co{'author_email'} = $email;
 			} else {
 				$co{'author_name'} = $co{'author'};
 			}
@@ -3527,8 +3595,13 @@ sub parse_commit_text {
 			$co{'committer_epoch'} = $2;
 			$co{'committer_tz'} = $3;
 			if ($co{'committer'} =~ m/^([^<]+) <([^>]*)>/) {
-				$co{'committer_name'}  = $1;
-				$co{'committer_email'} = $2;
+				my ($name, $email) = @_;
+				if (gitweb_check_feature('mailmap')) {
+					($name, $email) = map_author($1, $2);
+					$co{'committer'} = "$name <$email>";
+				}
+				$co{'committer_name'}  = $name;
+				$co{'committer_email'} = $email;
 			} else {
 				$co{'committer_name'} = $co{'committer'};
 			}

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v2] gitweb: map names/emails with mailmap
  2020-08-08 21:34 ` [PATCH] " Emma Brooks
@ 2020-08-09 23:04   ` Emma Brooks
  2020-08-10  0:49     ` Eric Sunshine
  2020-08-10 10:02     ` Jeff King
  0 siblings, 2 replies; 18+ messages in thread
From: Emma Brooks @ 2020-08-09 23:04 UTC (permalink / raw)
  To: git; +Cc: Emma Brooks, Jakub Narębski

Add an option to map names and emails to their canonical forms via a
.mailmap file. This is enabled by default, consistent with the behavior
of Git itself.

Signed-off-by: Emma Brooks <me@pluvano.com>
---

No code changes. I just fixed a typo in the commit subject (made "map"
lower-case).

 Documentation/gitweb.conf.txt |  5 +++
 gitweb/gitweb.perl            | 81 +++++++++++++++++++++++++++++++++--
 2 files changed, 82 insertions(+), 4 deletions(-)

diff --git a/Documentation/gitweb.conf.txt b/Documentation/gitweb.conf.txt
index 7963a79ba9..2d7551a6a5 100644
--- a/Documentation/gitweb.conf.txt
+++ b/Documentation/gitweb.conf.txt
@@ -751,6 +751,11 @@ default font sizes or lineheights are changed (e.g. via adding extra
 CSS stylesheet in `@stylesheets`), it may be appropriate to change
 these values.
 
+mailmap::
+	Use mailmap to find the canonical name/email for
+	committers/authors (see linkgit:git-shortlog[1]). Enabled by
+	default.
+
 highlight::
 	Server-side syntax highlight support in "blob" view.  It requires
 	`$highlight_bin` program to be available (see the description of
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 0959a782ec..1ca495b8b4 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -505,6 +505,12 @@ sub evaluate_uri {
 		'override' => 0,
 		'default' => ['']},
 
+	# Enable reading mailmap to determine canonical author
+	# information. Enabled by default.
+	'mailmap' => {
+		'override' => 0,
+		'default' => [1]},
+
 	# Enable displaying how much time and how many git commands
 	# it took to generate and display page.  Disabled by default.
 	# Project specific override is not supported.
@@ -3490,6 +3496,63 @@ sub parse_tag {
 	return %tag
 }
 
+# Contents of mailmap stored as a referance to a hash with keys in the format
+# of "name <email>" or "<email>", and values that are hashes containing a
+# replacement "name" and/or "email". If set (even if empty) the mailmap has
+# already been read.
+my $mailmap;
+
+sub read_mailmap {
+	my %mailmap = ();
+	open my $fd, '-|', quote_command(
+		git_cmd(), 'cat-file', 'blob', 'HEAD:.mailmap') . ' 2> /dev/null'
+		or die_error(500, 'Failed to read mailmap');
+	return \%mailmap if eof $fd;
+	foreach (split '\n', <$fd>) {
+		next if (/^#/);
+		if (/(.*)\s+ <([^<>]+)>\s+ ((?:.*\s+)? <[^<>]+>) (?:\s+\#)/x ||
+		    /(.*)\s+ <([^<>]+)>\s+ ((?:.*\s+)? <[^<>]+>)/x) {
+			# New Name <new@email> <old@email>
+			# New Name <new@email> Old Name <old@email>
+			$mailmap{$3} = ();
+			($mailmap{$3}{name} = $1) =~ s/^\s+|\s+$//g;
+			$mailmap{$3}{email} = $2;
+		} elsif (/(?: <([^<>]+)>\s+ | (.+)\s+ ) (<[^<>]+>) (?:\s+\#)/x ||
+		         /(?: <([^<>]+)>\s+ | (.+)\s+ ) (<[^<>]+>)/x) {
+			# New Name <old@email>
+			# <new@email> <old@email>
+			$mailmap{$3} = ();
+			if ($1) {
+				$mailmap{$3}{email} = $1;
+			} else {
+				($mailmap{$3}{name} = $2) =~ s/^\s+|\s+$//g;
+			}
+		}
+	}
+	return \%mailmap;
+}
+
+# Map author name and email based on mailmap. A more specific match
+# ("name <email>") is preferred to a less specific one ("<email>").
+sub map_author {
+	my $name = shift;
+	my $email = shift;
+
+	if (!$mailmap) {
+		$mailmap = read_mailmap;
+	}
+
+	if ($mailmap->{"$name <$email>"}) {
+		$name = $mailmap->{"$name <$email>"}{name} || $name;
+		$email = $mailmap->{"$name <$email>"}{email} || $email;
+	} elsif ($mailmap->{"<$email>"}) {
+		$name = $mailmap->{"<$email>"}{name} || $name;
+		$email = $mailmap->{"<$email>"}{email} || $email;
+	}
+
+	return ($name, $email);
+}
+
 sub parse_commit_text {
 	my ($commit_text, $withparents) = @_;
 	my @commit_lines = split '\n', $commit_text;
@@ -3517,8 +3580,13 @@ sub parse_commit_text {
 			$co{'author_epoch'} = $2;
 			$co{'author_tz'} = $3;
 			if ($co{'author'} =~ m/^([^<]+) <([^>]*)>/) {
-				$co{'author_name'}  = $1;
-				$co{'author_email'} = $2;
+				my ($name, $email) = @_;
+				if (gitweb_check_feature('mailmap')) {
+					($name, $email) = map_author($1, $2);
+					$co{'author'} = "$name <$email>";
+				}
+				$co{'author_name'}  = $name;
+				$co{'author_email'} = $email;
 			} else {
 				$co{'author_name'} = $co{'author'};
 			}
@@ -3527,8 +3595,13 @@ sub parse_commit_text {
 			$co{'committer_epoch'} = $2;
 			$co{'committer_tz'} = $3;
 			if ($co{'committer'} =~ m/^([^<]+) <([^>]*)>/) {
-				$co{'committer_name'}  = $1;
-				$co{'committer_email'} = $2;
+				my ($name, $email) = @_;
+				if (gitweb_check_feature('mailmap')) {
+					($name, $email) = map_author($1, $2);
+					$co{'committer'} = "$name <$email>";
+				}
+				$co{'committer_name'}  = $name;
+				$co{'committer_email'} = $email;
 			} else {
 				$co{'committer_name'} = $co{'committer'};
 			}

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH v2] gitweb: map names/emails with mailmap
  2020-08-09 23:04   ` [PATCH v2] gitweb: map " Emma Brooks
@ 2020-08-10  0:49     ` Eric Sunshine
  2020-08-10  3:12       ` Emma Brooks
  2020-08-10 10:02     ` Jeff King
  1 sibling, 1 reply; 18+ messages in thread
From: Eric Sunshine @ 2020-08-10  0:49 UTC (permalink / raw)
  To: Emma Brooks; +Cc: Git List, Jakub Narębski

On Sun, Aug 9, 2020 at 7:06 PM Emma Brooks <me@pluvano.com> wrote:
> Add an option to map names and emails to their canonical forms via a
> .mailmap file. This is enabled by default, consistent with the behavior
> of Git itself.
>
> Signed-off-by: Emma Brooks <me@pluvano.com>
> ---
> diff --git a/Documentation/gitweb.conf.txt b/Documentation/gitweb.conf.txt
> @@ -751,6 +751,11 @@ default font sizes or lineheights are changed (e.g. via adding extra
> +mailmap::
> +       Use mailmap to find the canonical name/email for
> +       committers/authors (see linkgit:git-shortlog[1]). Enabled by
> +       default.

Is this setting global or per-repository? (I ask because documentation
for other options in this section document whether they can be set
per-repository.)

Should there be any sort of support for functionality similar to the
"mailmap.file" and "mailmap.blob" configuration options in Git itself?
(Genuine question, not a demand for you to implement such support.)

> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> +# Contents of mailmap stored as a referance to a hash with keys in the format

s/referance/reference/

> +# of "name <email>" or "<email>", and values that are hashes containing a
> +# replacement "name" and/or "email". If set (even if empty) the mailmap has
> +# already been read.
> +my $mailmap;
> +
> +sub read_mailmap {
> +       my %mailmap = ();
> +       open my $fd, '-|', quote_command(
> +               git_cmd(), 'cat-file', 'blob', 'HEAD:.mailmap') . ' 2> /dev/null'
> +               or die_error(500, 'Failed to read mailmap');

Am I reading this correctly that this will die if the project does not
have a .mailmap file? If so, that seems like harsh behavior since
there are many projects in the wild lacking a .mailmap file.

> +       return \%mailmap if eof $fd;
> +       foreach (split '\n', <$fd>) {

If the .mailmap has no content, then the 'foreach' loop won't be
entered, which means the early 'return' above it is unneeded, correct?
(Not necessarily asking for the early 'return' to be removed, but more
a case of checking that I'm understanding the logic.)

> +               next if (/^#/);
> +               if (/(.*)\s+ <([^<>]+)>\s+ ((?:.*\s+)? <[^<>]+>) (?:\s+\#)/x ||
> +                   /(.*)\s+ <([^<>]+)>\s+ ((?:.*\s+)? <[^<>]+>)/x) {
> +                       # New Name <new@email> <old@email>
> +                       # New Name <new@email> Old Name <old@email>

The first regex is intended to handle a trailing "# comment", whereas
the second regex is for lines lacking a comment, correct? However,
because neither of these expressions are anchored, the second regex
will match both types of lines, thus the first regex is redundant. I'm
guessing, therefore, that your intent was actually to anchor the
expressions, perhaps like this:

    if (/^\s* (.*)\s+ <([^<>]+)>\s+ ((?:.*\s+)? <[^<>]+>) (?:\s+\#)/x ||
        /^\s* (.*)\s+ <([^<>]+)>\s+ ((?:.*\s+)? <[^<>]+>) \s*$/x) {

Also, if you're matching lines of the form:

    name1 <email1> [optional-name] <email2>

in which you expect to see "name1", then is the loose "(.*)\s+"
desirable? Shouldn't it be tighter "(.+)\s+"? For instance:

    if (/^\s* (.+)\s+ <([^<>]+)>\s+ ((?:.*\s+)? <[^<>]+>) (?:\s+\#)/x ||
        /^\s* (.+)\s+ <([^<>]+)>\s+ ((?:.*\s+)? <[^<>]+>) \s*$/x) {

> +                       $mailmap{$3} = ();

I wonder if you should be doing some sort of whitespace normalization
on $3 before using it as a hash key. For instance, if someone has a
.mailmap that looks like this (where I've used "." to represent
space):

    name1.<email1>.name2...<email2>

then $3 will have three spaces between 'name2' and '<email2>' when
used as a key, and that won't match later when you construct a "name
<email>" key later in map_author() with a single space.

> +                       ($mailmap{$3}{name} = $1) =~ s/^\s+|\s+$//g;
> +                       $mailmap{$3}{email} = $2;
> +               } elsif (/(?: <([^<>]+)>\s+ | (.+)\s+ ) (<[^<>]+>) (?:\s+\#)/x ||
> +                        /(?: <([^<>]+)>\s+ | (.+)\s+ ) (<[^<>]+>)/x) {

Same comment as above about anchoring these patterns...

> +                       # New Name <old@email>
> +                       # <new@email> <old@email>
> +                       $mailmap{$3} = ();
> +                       if ($1) {
> +                               $mailmap{$3}{email} = $1;
> +                       } else {
> +                               ($mailmap{$3}{name} = $2) =~ s/^\s+|\s+$//g;
> +                       }
> +               }
> +       }
> +       return \%mailmap;
> +}

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2] gitweb: map names/emails with mailmap
  2020-08-10  0:49     ` Eric Sunshine
@ 2020-08-10  3:12       ` Emma Brooks
  2020-08-10  5:41         ` Eric Sunshine
  0 siblings, 1 reply; 18+ messages in thread
From: Emma Brooks @ 2020-08-10  3:12 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: Git List, Jakub Narębski

On 2020-08-09 20:49:59-0400, Eric Sunshine wrote:
> On Sun, Aug 9, 2020 at 7:06 PM Emma Brooks <me@pluvano.com> wrote:
> > Add an option to map names and emails to their canonical forms via a
> > .mailmap file. This is enabled by default, consistent with the behavior
> > of Git itself.
> >
> > Signed-off-by: Emma Brooks <me@pluvano.com>
> > ---
> > diff --git a/Documentation/gitweb.conf.txt b/Documentation/gitweb.conf.txt
> > @@ -751,6 +751,11 @@ default font sizes or lineheights are changed (e.g. via adding extra
> > +mailmap::
> > +       Use mailmap to find the canonical name/email for
> > +       committers/authors (see linkgit:git-shortlog[1]). Enabled by
> > +       default.
> 
> Is this setting global or per-repository? (I ask because documentation
> for other options in this section document whether they can be set
> per-repository.)

Global. I'll add a note that it cannot be set per-project, or I could
add support for setting it per-project if that's wanted.

> Should there be any sort of support for functionality similar to the
> "mailmap.file" and "mailmap.blob" configuration options in Git itself?
> (Genuine question, not a demand for you to implement such support.)

Yes, that would be useful and should probably be supported.

> > diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> > +# Contents of mailmap stored as a referance to a hash with keys in the format
> 
> s/referance/reference/

OK.

> > +# of "name <email>" or "<email>", and values that are hashes containing a
> > +# replacement "name" and/or "email". If set (even if empty) the mailmap has
> > +# already been read.
> > +my $mailmap;
> > +
> > +sub read_mailmap {
> > +       my %mailmap = ();
> > +       open my $fd, '-|', quote_command(
> > +               git_cmd(), 'cat-file', 'blob', 'HEAD:.mailmap') . ' 2> /dev/null'
> > +               or die_error(500, 'Failed to read mailmap');
> 
> Am I reading this correctly that this will die if the project does not
> have a .mailmap file? If so, that seems like harsh behavior since
> there are many projects in the wild lacking a .mailmap file.

No, this error message is misleading. The die_error is called if there
is a problem executing git cat-file, but not if cat-file returns an
error. I'll revise this message to be more accurate.

> > +       return \%mailmap if eof $fd;
> > +       foreach (split '\n', <$fd>) {
> 
> If the .mailmap has no content, then the 'foreach' loop won't be
> entered, which means the early 'return' above it is unneeded, correct?
> (Not necessarily asking for the early 'return' to be removed, but more
> a case of checking that I'm understanding the logic.)

The early return is intended to catch when there is no mailmap, so $fd
does not get initialized. Without it, you would get an error when you
try to split $fd's content:

    Use of uninitialized value $fd in split at [the foreach]

> > +               next if (/^#/);
> > +               if (/(.*)\s+ <([^<>]+)>\s+ ((?:.*\s+)? <[^<>]+>) (?:\s+\#)/x ||
> > +                   /(.*)\s+ <([^<>]+)>\s+ ((?:.*\s+)? <[^<>]+>)/x) {
> > +                       # New Name <new@email> <old@email>
> > +                       # New Name <new@email> Old Name <old@email>
> 
> The first regex is intended to handle a trailing "# comment", whereas
> the second regex is for lines lacking a comment, correct? However,
> because neither of these expressions are anchored, the second regex
> will match both types of lines, thus the first regex is redundant. I'm
> guessing, therefore, that your intent was actually to anchor the
> expressions, perhaps like this:
> 
>     if (/^\s* (.*)\s+ <([^<>]+)>\s+ ((?:.*\s+)? <[^<>]+>) (?:\s+\#)/x ||
>         /^\s* (.*)\s+ <([^<>]+)>\s+ ((?:.*\s+)? <[^<>]+>) \s*$/x) {
> 
> Also, if you're matching lines of the form:
> 
>     name1 <email1> [optional-name] <email2>
> 
> in which you expect to see "name1", then is the loose "(.*)\s+"
> desirable? Shouldn't it be tighter "(.+)\s+"? For instance:
> 
>     if (/^\s* (.+)\s+ <([^<>]+)>\s+ ((?:.*\s+)? <[^<>]+>) (?:\s+\#)/x ||
>         /^\s* (.+)\s+ <([^<>]+)>\s+ ((?:.*\s+)? <[^<>]+>) \s*$/x) {

Yes and yes. I'll update those.

> > +                       $mailmap{$3} = ();
> 
> I wonder if you should be doing some sort of whitespace normalization
> on $3 before using it as a hash key. For instance, if someone has a
> .mailmap that looks like this (where I've used "." to represent
> space):
> 
>     name1.<email1>.name2...<email2>
> 
> then $3 will have three spaces between 'name2' and '<email2>' when
> used as a key, and that won't match later when you construct a "name
> <email>" key later in map_author() with a single space.

Yes, I hadn't considered that case.

Thanks.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2] gitweb: map names/emails with mailmap
  2020-08-10  3:12       ` Emma Brooks
@ 2020-08-10  5:41         ` Eric Sunshine
  0 siblings, 0 replies; 18+ messages in thread
From: Eric Sunshine @ 2020-08-10  5:41 UTC (permalink / raw)
  To: Emma Brooks; +Cc: Git List, Jakub Narębski

On Sun, Aug 9, 2020 at 11:12 PM Emma Brooks <me@pluvano.com> wrote:
> On 2020-08-09 20:49:59-0400, Eric Sunshine wrote:
> > On Sun, Aug 9, 2020 at 7:06 PM Emma Brooks <me@pluvano.com> wrote:
> > > +mailmap::
> >
> > Is this setting global or per-repository? (I ask because documentation
> > for other options in this section document whether they can be set
> > per-repository.)
>
> Global. I'll add a note that it cannot be set per-project, or I could
> add support for setting it per-project if that's wanted.

If it's not much extra work, it might make sense to support
per-project, if for no other reason, to be consistent with other
nearby options.

> > Should there be any sort of support for functionality similar to the
> > "mailmap.file" and "mailmap.blob" configuration options in Git itself?
> > (Genuine question, not a demand for you to implement such support.)
>
> Yes, that would be useful and should probably be supported.

I don't insist upon it. It can always be added later if someone needs
it. I was asking about it now because it might have an affect on the
design or type of value which the 'mailmap' option you added above can
accept, and I was concerned about getting locked into a design without
taking these other possibilities into account. For instance, rather
than being a simple boolean, perhaps the 'mailmap' option you added
could be more expressive, eventually allowing support for an explicit
file or blob. This is another reason why I asked if 'mailmap' can be
per-project, since an explicit mailmap file, and especially a blob,
would belong to a particular project. It's just something to think
about. (Then again, I'm not a gitweb user, nor am I familiar with its
configuration, so take my observations with a grain of salt.)

> > > +       open my $fd, '-|', quote_command(
> > > +               git_cmd(), 'cat-file', 'blob', 'HEAD:.mailmap') . ' 2> /dev/null'
> > > +               or die_error(500, 'Failed to read mailmap');
> >
> > Am I reading this correctly that this will die if the project does not
> > have a .mailmap file? If so, that seems like harsh behavior since
> > there are many projects in the wild lacking a .mailmap file.
>
> No, this error message is misleading. The die_error is called if there
> is a problem executing git cat-file, but not if cat-file returns an
> error. I'll revise this message to be more accurate.

Okay, that makes sense.

> > > +       return \%mailmap if eof $fd;
> > > +       foreach (split '\n', <$fd>) {
> >
> > If the .mailmap has no content, then the 'foreach' loop won't be
> > entered, which means the early 'return' above it is unneeded, correct?
> > (Not necessarily asking for the early 'return' to be removed, but more
> > a case of checking that I'm understanding the logic.)
>
> The early return is intended to catch when there is no mailmap, so $fd
> does not get initialized. Without it, you would get an error when you
> try to split $fd's content:
>
>     Use of uninitialized value $fd in split at [the foreach]

Right. This follows from my misunderstanding what happened if .mailmap
was missing.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2] gitweb: map names/emails with mailmap
  2020-08-09 23:04   ` [PATCH v2] gitweb: map " Emma Brooks
  2020-08-10  0:49     ` Eric Sunshine
@ 2020-08-10 10:02     ` Jeff King
  2020-08-11  4:17       ` Emma Brooks
  1 sibling, 1 reply; 18+ messages in thread
From: Jeff King @ 2020-08-10 10:02 UTC (permalink / raw)
  To: Emma Brooks; +Cc: git, Jakub Narębski

On Sun, Aug 09, 2020 at 11:04:37PM +0000, Emma Brooks wrote:

> Add an option to map names and emails to their canonical forms via a
> .mailmap file. This is enabled by default, consistent with the behavior
> of Git itself.
> 
> Signed-off-by: Emma Brooks <me@pluvano.com>
> ---
> 
> No code changes. I just fixed a typo in the commit subject (made "map"
> lower-case).

There was a little discussion in response to v1 on whether we could
reuse the existing C mailmap code:

  https://lore.kernel.org/git/20200731010129.GD240563@coredump.intra.peff.net/

Did you have any thoughts on that?

-Peff

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2] gitweb: map names/emails with mailmap
  2020-08-10 10:02     ` Jeff King
@ 2020-08-11  4:17       ` Emma Brooks
  2020-08-11  4:48         ` Eric Sunshine
                           ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Emma Brooks @ 2020-08-11  4:17 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Jakub Narębski

On 2020-08-10 06:02:49-0400, Jeff King wrote:
> There was a little discussion in response to v1 on whether we could
> reuse the existing C mailmap code:
> 
>   https://lore.kernel.org/git/20200731010129.GD240563@coredump.intra.peff.net/
> 
> Did you have any thoughts on that?

I think it's probably not worth the effort to make the necessary changes
to "rev-list --header" Junio mentioned, just for gitweb.

I agree it's a bit worrisome to have a second parser that could
potentially behave slightly differently than the main implementation.
What if we added tests for gitweb's mailmap parsing based on the same
cases used for Git itself?

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2] gitweb: map names/emails with mailmap
  2020-08-11  4:17       ` Emma Brooks
@ 2020-08-11  4:48         ` Eric Sunshine
  2020-08-11  4:55         ` Jeff King
  2020-08-11  6:17         ` Eric Wong
  2 siblings, 0 replies; 18+ messages in thread
From: Eric Sunshine @ 2020-08-11  4:48 UTC (permalink / raw)
  To: Emma Brooks; +Cc: Jeff King, Git List, Jakub Narębski

On Tue, Aug 11, 2020 at 12:17 AM Emma Brooks <me@pluvano.com> wrote:
> On 2020-08-10 06:02:49-0400, Jeff King wrote:
> > There was a little discussion in response to v1 on whether we could
> > reuse the existing C mailmap code:
>
> I think it's probably not worth the effort to make the necessary changes
> to "rev-list --header" Junio mentioned, just for gitweb.
>
> I agree it's a bit worrisome to have a second parser that could
> potentially behave slightly differently than the main implementation.
> What if we added tests for gitweb's mailmap parsing based on the same
> cases used for Git itself?

Another option which people probably won't like is to have gitweb
start "git check-mailmap --stdin" in the background, leave it running,
and just feed it author/commit info as needed and read back its
replies. The benefit is that you get the .mailmap parsing and
resolution built into Git itself without needing any extra
parsing/resolution Perl code or tests. The downside is that people
might balk at an extra process hanging around for the duration of
gitweb itself. (You could also start up "git check-mailmap" repeatedly
on-demand, but that would probably be too slow and resource intensive
for real-world use.)

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2] gitweb: map names/emails with mailmap
  2020-08-11  4:17       ` Emma Brooks
  2020-08-11  4:48         ` Eric Sunshine
@ 2020-08-11  4:55         ` Jeff King
  2020-09-05  2:55           ` Emma Brooks
  2020-08-11  6:17         ` Eric Wong
  2 siblings, 1 reply; 18+ messages in thread
From: Jeff King @ 2020-08-11  4:55 UTC (permalink / raw)
  To: Emma Brooks; +Cc: git, Jakub Narębski

On Tue, Aug 11, 2020 at 04:17:28AM +0000, Emma Brooks wrote:

> On 2020-08-10 06:02:49-0400, Jeff King wrote:
> > There was a little discussion in response to v1 on whether we could
> > reuse the existing C mailmap code:
> > 
> >   https://lore.kernel.org/git/20200731010129.GD240563@coredump.intra.peff.net/
> > 
> > Did you have any thoughts on that?
> 
> I think it's probably not worth the effort to make the necessary changes
> to "rev-list --header" Junio mentioned, just for gitweb.

Yeah, I agree that probably doesn't make sense to change "rev-list
--header". I wonder if git could be using "rev-list --format" instead,
though, and asking for the specific things it wants. That could improve
more than just this case, too (e.g., the C code would be parsing and
normalizing author/committer idents, which could make handling of badly
formatted ones more consistent with other Git tools).

It may be a big change, though. I don't know the gitweb code very well.

> I agree it's a bit worrisome to have a second parser that could
> potentially behave slightly differently than the main implementation.
> What if we added tests for gitweb's mailmap parsing based on the same
> cases used for Git itself?

That would certainly help, though I don't know how easy it would be to
replicate all of the tests in a maintainable way.

-Peff

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2] gitweb: map names/emails with mailmap
  2020-08-11  4:17       ` Emma Brooks
  2020-08-11  4:48         ` Eric Sunshine
  2020-08-11  4:55         ` Jeff King
@ 2020-08-11  6:17         ` Eric Wong
  2020-08-11  6:33           ` Joe Perches
  2 siblings, 1 reply; 18+ messages in thread
From: Eric Wong @ 2020-08-11  6:17 UTC (permalink / raw)
  To: Emma Brooks; +Cc: Jeff King, git, Jakub Narębski, Joe Perches

Emma Brooks <me@pluvano.com> wrote:
> On 2020-08-10 06:02:49-0400, Jeff King wrote:
> > There was a little discussion in response to v1 on whether we could
> > reuse the existing C mailmap code:
> > 
> >   https://lore.kernel.org/git/20200731010129.GD240563@coredump.intra.peff.net/
> > 
> > Did you have any thoughts on that?
> 
> I think it's probably not worth the effort to make the necessary changes
> to "rev-list --header" Junio mentioned, just for gitweb.
> 
> I agree it's a bit worrisome to have a second parser that could
> potentially behave slightly differently than the main implementation.

+Cc Joe Perches

Fwiw, there's already a GPL-2.0 Perl .mailmap parser in
scripts/get_maintainer.pl of the Linux kernel which Joe
maintains:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/scripts/get_maintainer.pl

Been thinking about adding mailmap support to public-inbox in
the send-email reply instructions, too.  (but public-inbox is
AGPL-3+, so I can't steal the code w/o permission)

> What if we added tests for gitweb's mailmap parsing based on the same
> cases used for Git itself?

That's probably fine IMHO; especially if it's just for gitweb display
(and not writing anything that's meant to be stored forever).

There's already dozens of different parsers for email addresses,
MIME, mailbox formats, etc. all with slightly different edge cases;
things still mostly work well enough to not be a huge problem.
(Same goes for Markdown, HTML, formats and even JSON :x)

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2] gitweb: map names/emails with mailmap
  2020-08-11  6:17         ` Eric Wong
@ 2020-08-11  6:33           ` Joe Perches
  0 siblings, 0 replies; 18+ messages in thread
From: Joe Perches @ 2020-08-11  6:33 UTC (permalink / raw)
  To: Eric Wong, Emma Brooks, Florian Mickler
  Cc: Jeff King, git, Jakub Narębski

On Tue, 2020-08-11 at 06:17 +0000, Eric Wong wrote:
> Emma Brooks <me@pluvano.com> wrote:
> > On 2020-08-10 06:02:49-0400, Jeff King wrote:
> > > There was a little discussion in response to v1 on whether we could
> > > reuse the existing C mailmap code:
> > > 
> > >   https://lore.kernel.org/git/20200731010129.GD240563@coredump.intra.peff.net/
> > > 
> > > Did you have any thoughts on that?
> > 
> > I think it's probably not worth the effort to make the necessary changes
> > to "rev-list --header" Junio mentioned, just for gitweb.
> > 
> > I agree it's a bit worrisome to have a second parser that could
> > potentially behave slightly differently than the main implementation.
> 
> +Cc Joe Perches
> 
> Fwiw, there's already a GPL-2.0 Perl .mailmap parser in
> scripts/get_maintainer.pl of the Linux kernel which Joe
> maintains:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/scripts/get_maintainer.pl

+cc Florian Mickler

Might be different behavior, dunno.

Florian Mickler wrote most of that and I
believe I rewrote it a bit, mostly for style.

If the perl code is useful to you, do what
you will with it, I give you my permission.

I don't believe get_maintainer needs to be
changed unless it's shown to be different
than what git does already.  I think it's
the same output.

> Been thinking about adding mailmap support to public-inbox in
> the send-email reply instructions, too.  (but public-inbox is
> AGPL-3+, so I can't steal the code w/o permission)
> 
> > What if we added tests for gitweb's mailmap parsing based on the same
> > cases used for Git itself?
> 
> That's probably fine IMHO; especially if it's just for gitweb display
> (and not writing anything that's meant to be stored forever).
> 
> There's already dozens of different parsers for email addresses,
> MIME, mailbox formats, etc. all with slightly different edge cases;
> things still mostly work well enough to not be a huge problem.
> (Same goes for Markdown, HTML, formats and even JSON :x)


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2] gitweb: map names/emails with mailmap
  2020-08-11  4:55         ` Jeff King
@ 2020-09-05  2:55           ` Emma Brooks
  2020-09-05  3:26             ` Junio C Hamano
  0 siblings, 1 reply; 18+ messages in thread
From: Emma Brooks @ 2020-09-05  2:55 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Jakub Narębski

On 2020-08-11 00:55:09-0400, Jeff King wrote:
> On Tue, Aug 11, 2020 at 04:17:28AM +0000, Emma Brooks wrote:
> 
> > On 2020-08-10 06:02:49-0400, Jeff King wrote:
> > > There was a little discussion in response to v1 on whether we could
> > > reuse the existing C mailmap code:
> > > 
> > >   https://lore.kernel.org/git/20200731010129.GD240563@coredump.intra.peff.net/
> > > 
> > > Did you have any thoughts on that?
> > 
> > I think it's probably not worth the effort to make the necessary changes
> > to "rev-list --header" Junio mentioned, just for gitweb.
> 
> Yeah, I agree that probably doesn't make sense to change "rev-list
> --header". I wonder if git could be using "rev-list --format" instead,
> though, and asking for the specific things it wants. That could improve
> more than just this case, too (e.g., the C code would be parsing and
> normalizing author/committer idents, which could make handling of badly
> formatted ones more consistent with other Git tools).
> 
> It may be a big change, though. I don't know the gitweb code very well.

This idea works in my testing, and it should be a small change.

However, I couldn't find a way to get "rev-list --format" to separate
commits with NULs. Is there a way to do this that I'm missing? I was
able to try the concept in gitweb by switching the "rev-list --format"
call I would've used to a similar log call, and it seems like a fairly
small change.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2] gitweb: map names/emails with mailmap
  2020-09-05  2:55           ` Emma Brooks
@ 2020-09-05  3:26             ` Junio C Hamano
  2020-09-07 22:10               ` Emma Brooks
  0 siblings, 1 reply; 18+ messages in thread
From: Junio C Hamano @ 2020-09-05  3:26 UTC (permalink / raw)
  To: Emma Brooks; +Cc: Jeff King, git, Jakub Narębski

Emma Brooks <me@pluvano.com> writes:

> However, I couldn't find a way to get "rev-list --format" to separate
> commits with NULs.

A workaround would be "git rev-list --format='%s%x00'", iow,
manually insert NUL

I would have expected "-z" to replace LF with NUL, but that does not
appear to work X-<.



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2] gitweb: map names/emails with mailmap
  2020-09-05  3:26             ` Junio C Hamano
@ 2020-09-07 22:10               ` Emma Brooks
  0 siblings, 0 replies; 18+ messages in thread
From: Emma Brooks @ 2020-09-07 22:10 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, git, Jakub Narębski

On 2020-09-04 20:26:11-0700, Junio C Hamano wrote:
> Emma Brooks <me@pluvano.com> writes:
> 
> > However, I couldn't find a way to get "rev-list --format" to separate
> > commits with NULs.
> 
> A workaround would be "git rev-list --format='%s%x00'", iow,
> manually insert NUL
> 
> I would have expected "-z" to replace LF with NUL, but that does not
> appear to work X-<.

Thanks. I'll need to ignore the extra LF when parsing then. Later, "-z"
support could be added/fixed in rev-list (#leftoverbits?) and gitweb
could be updated to use that instead.

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2020-09-07 22:12 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-30  4:12 [RFC PATCH] gitweb: Map names/emails with mailmap Emma Brooks
2020-07-30 16:20 ` Junio C Hamano
2020-07-31  1:01 ` Jeff King
2020-07-31  2:10   ` Junio C Hamano
2020-08-08 21:34 ` [PATCH] " Emma Brooks
2020-08-09 23:04   ` [PATCH v2] gitweb: map " Emma Brooks
2020-08-10  0:49     ` Eric Sunshine
2020-08-10  3:12       ` Emma Brooks
2020-08-10  5:41         ` Eric Sunshine
2020-08-10 10:02     ` Jeff King
2020-08-11  4:17       ` Emma Brooks
2020-08-11  4:48         ` Eric Sunshine
2020-08-11  4:55         ` Jeff King
2020-09-05  2:55           ` Emma Brooks
2020-09-05  3:26             ` Junio C Hamano
2020-09-07 22:10               ` Emma Brooks
2020-08-11  6:17         ` Eric Wong
2020-08-11  6:33           ` Joe Perches

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).