git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Eric Sunshine <sunshine@sunshineco.com>
To: Emma Brooks <me@pluvano.com>
Cc: "Git List" <git@vger.kernel.org>, "Jakub Narębski" <jnareb@gmail.com>
Subject: Re: [PATCH v2] gitweb: map names/emails with mailmap
Date: Sun, 9 Aug 2020 20:49:59 -0400	[thread overview]
Message-ID: <CAPig+cT3aMUQapU7i0C3jZaLd2XJwP9SbxFEm_tG_1wj8w1PKw@mail.gmail.com> (raw)
In-Reply-To: <20200809230436.2152-1-me@pluvano.com>

On Sun, Aug 9, 2020 at 7:06 PM Emma Brooks <me@pluvano.com> wrote:
> Add an option to map names and emails to their canonical forms via a
> .mailmap file. This is enabled by default, consistent with the behavior
> of Git itself.
>
> Signed-off-by: Emma Brooks <me@pluvano.com>
> ---
> diff --git a/Documentation/gitweb.conf.txt b/Documentation/gitweb.conf.txt
> @@ -751,6 +751,11 @@ default font sizes or lineheights are changed (e.g. via adding extra
> +mailmap::
> +       Use mailmap to find the canonical name/email for
> +       committers/authors (see linkgit:git-shortlog[1]). Enabled by
> +       default.

Is this setting global or per-repository? (I ask because documentation
for other options in this section document whether they can be set
per-repository.)

Should there be any sort of support for functionality similar to the
"mailmap.file" and "mailmap.blob" configuration options in Git itself?
(Genuine question, not a demand for you to implement such support.)

> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> +# Contents of mailmap stored as a referance to a hash with keys in the format

s/referance/reference/

> +# of "name <email>" or "<email>", and values that are hashes containing a
> +# replacement "name" and/or "email". If set (even if empty) the mailmap has
> +# already been read.
> +my $mailmap;
> +
> +sub read_mailmap {
> +       my %mailmap = ();
> +       open my $fd, '-|', quote_command(
> +               git_cmd(), 'cat-file', 'blob', 'HEAD:.mailmap') . ' 2> /dev/null'
> +               or die_error(500, 'Failed to read mailmap');

Am I reading this correctly that this will die if the project does not
have a .mailmap file? If so, that seems like harsh behavior since
there are many projects in the wild lacking a .mailmap file.

> +       return \%mailmap if eof $fd;
> +       foreach (split '\n', <$fd>) {

If the .mailmap has no content, then the 'foreach' loop won't be
entered, which means the early 'return' above it is unneeded, correct?
(Not necessarily asking for the early 'return' to be removed, but more
a case of checking that I'm understanding the logic.)

> +               next if (/^#/);
> +               if (/(.*)\s+ <([^<>]+)>\s+ ((?:.*\s+)? <[^<>]+>) (?:\s+\#)/x ||
> +                   /(.*)\s+ <([^<>]+)>\s+ ((?:.*\s+)? <[^<>]+>)/x) {
> +                       # New Name <new@email> <old@email>
> +                       # New Name <new@email> Old Name <old@email>

The first regex is intended to handle a trailing "# comment", whereas
the second regex is for lines lacking a comment, correct? However,
because neither of these expressions are anchored, the second regex
will match both types of lines, thus the first regex is redundant. I'm
guessing, therefore, that your intent was actually to anchor the
expressions, perhaps like this:

    if (/^\s* (.*)\s+ <([^<>]+)>\s+ ((?:.*\s+)? <[^<>]+>) (?:\s+\#)/x ||
        /^\s* (.*)\s+ <([^<>]+)>\s+ ((?:.*\s+)? <[^<>]+>) \s*$/x) {

Also, if you're matching lines of the form:

    name1 <email1> [optional-name] <email2>

in which you expect to see "name1", then is the loose "(.*)\s+"
desirable? Shouldn't it be tighter "(.+)\s+"? For instance:

    if (/^\s* (.+)\s+ <([^<>]+)>\s+ ((?:.*\s+)? <[^<>]+>) (?:\s+\#)/x ||
        /^\s* (.+)\s+ <([^<>]+)>\s+ ((?:.*\s+)? <[^<>]+>) \s*$/x) {

> +                       $mailmap{$3} = ();

I wonder if you should be doing some sort of whitespace normalization
on $3 before using it as a hash key. For instance, if someone has a
.mailmap that looks like this (where I've used "." to represent
space):

    name1.<email1>.name2...<email2>

then $3 will have three spaces between 'name2' and '<email2>' when
used as a key, and that won't match later when you construct a "name
<email>" key later in map_author() with a single space.

> +                       ($mailmap{$3}{name} = $1) =~ s/^\s+|\s+$//g;
> +                       $mailmap{$3}{email} = $2;
> +               } elsif (/(?: <([^<>]+)>\s+ | (.+)\s+ ) (<[^<>]+>) (?:\s+\#)/x ||
> +                        /(?: <([^<>]+)>\s+ | (.+)\s+ ) (<[^<>]+>)/x) {

Same comment as above about anchoring these patterns...

> +                       # New Name <old@email>
> +                       # <new@email> <old@email>
> +                       $mailmap{$3} = ();
> +                       if ($1) {
> +                               $mailmap{$3}{email} = $1;
> +                       } else {
> +                               ($mailmap{$3}{name} = $2) =~ s/^\s+|\s+$//g;
> +                       }
> +               }
> +       }
> +       return \%mailmap;
> +}

  reply	other threads:[~2020-08-10  0:50 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-30  4:12 [RFC PATCH] gitweb: Map names/emails with mailmap Emma Brooks
2020-07-30 16:20 ` Junio C Hamano
2020-07-31  1:01 ` Jeff King
2020-07-31  2:10   ` Junio C Hamano
2020-08-08 21:34 ` [PATCH] " Emma Brooks
2020-08-09 23:04   ` [PATCH v2] gitweb: map " Emma Brooks
2020-08-10  0:49     ` Eric Sunshine [this message]
2020-08-10  3:12       ` Emma Brooks
2020-08-10  5:41         ` Eric Sunshine
2020-08-10 10:02     ` Jeff King
2020-08-11  4:17       ` Emma Brooks
2020-08-11  4:48         ` Eric Sunshine
2020-08-11  4:55         ` Jeff King
2020-09-05  2:55           ` Emma Brooks
2020-09-05  3:26             ` Junio C Hamano
2020-09-07 22:10               ` Emma Brooks
2020-08-11  6:17         ` Eric Wong
2020-08-11  6:33           ` Joe Perches

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAPig+cT3aMUQapU7i0C3jZaLd2XJwP9SbxFEm_tG_1wj8w1PKw@mail.gmail.com \
    --to=sunshine@sunshineco.com \
    --cc=git@vger.kernel.org \
    --cc=jnareb@gmail.com \
    --cc=me@pluvano.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).