git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Emma Brooks <me@pluvano.com>
To: Eric Sunshine <sunshine@sunshineco.com>
Cc: "Git List" <git@vger.kernel.org>, "Jakub Narębski" <jnareb@gmail.com>
Subject: Re: [PATCH v2] gitweb: map names/emails with mailmap
Date: Mon, 10 Aug 2020 03:12:00 +0000	[thread overview]
Message-ID: <20200810031123.GA3565@pluvano.com> (raw)
In-Reply-To: <CAPig+cT3aMUQapU7i0C3jZaLd2XJwP9SbxFEm_tG_1wj8w1PKw@mail.gmail.com>

On 2020-08-09 20:49:59-0400, Eric Sunshine wrote:
> On Sun, Aug 9, 2020 at 7:06 PM Emma Brooks <me@pluvano.com> wrote:
> > Add an option to map names and emails to their canonical forms via a
> > .mailmap file. This is enabled by default, consistent with the behavior
> > of Git itself.
> >
> > Signed-off-by: Emma Brooks <me@pluvano.com>
> > ---
> > diff --git a/Documentation/gitweb.conf.txt b/Documentation/gitweb.conf.txt
> > @@ -751,6 +751,11 @@ default font sizes or lineheights are changed (e.g. via adding extra
> > +mailmap::
> > +       Use mailmap to find the canonical name/email for
> > +       committers/authors (see linkgit:git-shortlog[1]). Enabled by
> > +       default.
> 
> Is this setting global or per-repository? (I ask because documentation
> for other options in this section document whether they can be set
> per-repository.)

Global. I'll add a note that it cannot be set per-project, or I could
add support for setting it per-project if that's wanted.

> Should there be any sort of support for functionality similar to the
> "mailmap.file" and "mailmap.blob" configuration options in Git itself?
> (Genuine question, not a demand for you to implement such support.)

Yes, that would be useful and should probably be supported.

> > diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> > +# Contents of mailmap stored as a referance to a hash with keys in the format
> 
> s/referance/reference/

OK.

> > +# of "name <email>" or "<email>", and values that are hashes containing a
> > +# replacement "name" and/or "email". If set (even if empty) the mailmap has
> > +# already been read.
> > +my $mailmap;
> > +
> > +sub read_mailmap {
> > +       my %mailmap = ();
> > +       open my $fd, '-|', quote_command(
> > +               git_cmd(), 'cat-file', 'blob', 'HEAD:.mailmap') . ' 2> /dev/null'
> > +               or die_error(500, 'Failed to read mailmap');
> 
> Am I reading this correctly that this will die if the project does not
> have a .mailmap file? If so, that seems like harsh behavior since
> there are many projects in the wild lacking a .mailmap file.

No, this error message is misleading. The die_error is called if there
is a problem executing git cat-file, but not if cat-file returns an
error. I'll revise this message to be more accurate.

> > +       return \%mailmap if eof $fd;
> > +       foreach (split '\n', <$fd>) {
> 
> If the .mailmap has no content, then the 'foreach' loop won't be
> entered, which means the early 'return' above it is unneeded, correct?
> (Not necessarily asking for the early 'return' to be removed, but more
> a case of checking that I'm understanding the logic.)

The early return is intended to catch when there is no mailmap, so $fd
does not get initialized. Without it, you would get an error when you
try to split $fd's content:

    Use of uninitialized value $fd in split at [the foreach]

> > +               next if (/^#/);
> > +               if (/(.*)\s+ <([^<>]+)>\s+ ((?:.*\s+)? <[^<>]+>) (?:\s+\#)/x ||
> > +                   /(.*)\s+ <([^<>]+)>\s+ ((?:.*\s+)? <[^<>]+>)/x) {
> > +                       # New Name <new@email> <old@email>
> > +                       # New Name <new@email> Old Name <old@email>
> 
> The first regex is intended to handle a trailing "# comment", whereas
> the second regex is for lines lacking a comment, correct? However,
> because neither of these expressions are anchored, the second regex
> will match both types of lines, thus the first regex is redundant. I'm
> guessing, therefore, that your intent was actually to anchor the
> expressions, perhaps like this:
> 
>     if (/^\s* (.*)\s+ <([^<>]+)>\s+ ((?:.*\s+)? <[^<>]+>) (?:\s+\#)/x ||
>         /^\s* (.*)\s+ <([^<>]+)>\s+ ((?:.*\s+)? <[^<>]+>) \s*$/x) {
> 
> Also, if you're matching lines of the form:
> 
>     name1 <email1> [optional-name] <email2>
> 
> in which you expect to see "name1", then is the loose "(.*)\s+"
> desirable? Shouldn't it be tighter "(.+)\s+"? For instance:
> 
>     if (/^\s* (.+)\s+ <([^<>]+)>\s+ ((?:.*\s+)? <[^<>]+>) (?:\s+\#)/x ||
>         /^\s* (.+)\s+ <([^<>]+)>\s+ ((?:.*\s+)? <[^<>]+>) \s*$/x) {

Yes and yes. I'll update those.

> > +                       $mailmap{$3} = ();
> 
> I wonder if you should be doing some sort of whitespace normalization
> on $3 before using it as a hash key. For instance, if someone has a
> .mailmap that looks like this (where I've used "." to represent
> space):
> 
>     name1.<email1>.name2...<email2>
> 
> then $3 will have three spaces between 'name2' and '<email2>' when
> used as a key, and that won't match later when you construct a "name
> <email>" key later in map_author() with a single space.

Yes, I hadn't considered that case.

Thanks.

  reply	other threads:[~2020-08-10  3:12 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-30  4:12 [RFC PATCH] gitweb: Map names/emails with mailmap Emma Brooks
2020-07-30 16:20 ` Junio C Hamano
2020-07-31  1:01 ` Jeff King
2020-07-31  2:10   ` Junio C Hamano
2020-08-08 21:34 ` [PATCH] " Emma Brooks
2020-08-09 23:04   ` [PATCH v2] gitweb: map " Emma Brooks
2020-08-10  0:49     ` Eric Sunshine
2020-08-10  3:12       ` Emma Brooks [this message]
2020-08-10  5:41         ` Eric Sunshine
2020-08-10 10:02     ` Jeff King
2020-08-11  4:17       ` Emma Brooks
2020-08-11  4:48         ` Eric Sunshine
2020-08-11  4:55         ` Jeff King
2020-09-05  2:55           ` Emma Brooks
2020-09-05  3:26             ` Junio C Hamano
2020-09-07 22:10               ` Emma Brooks
2020-08-11  6:17         ` Eric Wong
2020-08-11  6:33           ` Joe Perches

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200810031123.GA3565@pluvano.com \
    --to=me@pluvano.com \
    --cc=git@vger.kernel.org \
    --cc=jnareb@gmail.com \
    --cc=sunshine@sunshineco.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).