From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on dcvr.yhbt.net X-Spam-Level: X-Spam-ASN: AS31976 209.132.180.0/23 X-Spam-Status: No, score=-3.1 required=3.0 tests=AWL,BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,RCVD_IN_DNSWL_HI,T_RP_MATCHES_RCVD shortcircuit=no autolearn=ham autolearn_force=no version=3.4.0 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by dcvr.yhbt.net (Postfix) with ESMTP id BB74B1F404 for ; Thu, 4 Jan 2018 19:15:38 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751957AbeADTPg (ORCPT ); Thu, 4 Jan 2018 14:15:36 -0500 Received: from mut-mta1-se01a-zose1-fr.yulpa.io ([185.49.21.248]:42026 "EHLO mut-mta1-se01a-zose1-fr.yulpa.io" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750990AbeADTPf (ORCPT ); Thu, 4 Jan 2018 14:15:35 -0500 X-Greylist: delayed 1130 seconds by postgrey-1.27 at vger.kernel.org; Thu, 04 Jan 2018 14:15:34 EST Received: from zose-mx-out01.web4all.fr ([185.49.20.46] helo=zose-mta-hub-out-mua-02.web4all.fr) by mut-mta1-se01b-fr.yulpa.io with esmtps (TLSv1.2:ECDHE-RSA-AES256-GCM-SHA384:256) (Exim 4.89) (envelope-from ) id 1eXAgx-000TAe-UT; Thu, 04 Jan 2018 19:56:42 +0100 Received: from localhost (localhost [127.0.0.1]) by zose-mta-hub-out-mua-02.web4all.fr (Postfix) with ESMTP id D976E60E7D; Thu, 4 Jan 2018 19:56:38 +0100 (CET) Received: from zose-mta-hub-out-mua-02.web4all.fr ([127.0.0.1]) by localhost (zose-mta-hub-out-mua-02.web4all.fr [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id dYzlTxKLTnIN; Thu, 4 Jan 2018 19:56:37 +0100 (CET) Received: from localhost (localhost [127.0.0.1]) by zose-mta-hub-out-mua-02.web4all.fr (Postfix) with ESMTP id 7016F60EAF; Thu, 4 Jan 2018 19:56:37 +0100 (CET) X-Virus-Scanned: amavisd-new at zose1.web4all.fr Received: from zose-mta-hub-out-mua-02.web4all.fr ([127.0.0.1]) by localhost (zose-mta-hub-out-mua-02.web4all.fr [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id LwrTGY47QarT; Thu, 4 Jan 2018 19:56:37 +0100 (CET) Received: from gaming.numericable.fr (3.136.192.77.rev.sfr.net [77.192.136.3]) (Authenticated sender: matthieu.moy@matthieu-moy.fr) by zose-mta-hub-out-mua-02.web4all.fr (Postfix) with ESMTPA id B425160E7D; Thu, 4 Jan 2018 19:56:35 +0100 (CET) From: Matthieu Moy To: git@vger.kernel.org Cc: gitster@pobox.com, =?UTF-8?q?Alex=20Benn=C3=A9e?= , =?UTF-8?q?=C3=86var=20Arnfj=C3=B6r=C3=B0=20Bjarmason?= , Thomas Adam , Matthieu Moy Subject: [RFC PATCH 1/2] add a local copy of Mail::Address from CPAN Date: Thu, 4 Jan 2018 19:55:50 +0100 Message-Id: <1515092151-14423-1-git-send-email-git@matthieu-moy.fr> X-Mailer: git-send-email 2.8.1.116.g7b0d47b X-Originating-IP: 185.49.20.46 X-SpamExperts-Domain: zose1.web4all.fr X-SpamExperts-Username: 185.49.20.46 Authentication-Results: yulpa.io; auth=pass smtp.auth=185.49.20.46@zose1.web4all.fr X-SpamExperts-Outgoing-Class: ham X-SpamExperts-Outgoing-Evidence: SB/global_tokens (1.95671994341e-05) X-Recommended-Action: accept X-Filter-ID: EX5BVjFpneJeBchSMxfU5jvaUHvW3mZArwLaomc+64Dj1g3/PwYZaTCzSym8uE9HPImhfA8k5/zW UdbqUfA/+atZAIrjsCRyglG39ay7s40zkSakW8+kgCjGkeOx1qmz3ywmUacsjxjAsJqLMS98bO5z ycA319z5HV3CMmoHwxoemuNNL7GGp+zW501rEhQ5LqisX5+l+fRB3wYlhH1KyOfHzJ6mVE7ewsip SVIfs4aYzdyjA+zpECS+EnOlqhZVZoJMI94JyDNTYfA50Otxn/jk0+YoVjbPYBbwrj5eWUnn05F9 RrS4QR520oMvm0hNEPKqj4wpSPprxfaL4KUPa+d7ax+3+Jf/9S3laQAPwF4OOPS2BLJ6Do0LA2bx tTWVqi2fY8oW0hCeiKrXsIHuXDVcbwW2vZRbwlTsaou4YdkkTMgPdccxEqDs1YXBmdsdph+zJwql Pd7TjGfWxGLgmXGIZgivQPDIYm4WSM8blQV7LjJv6mzjqKuJJhZmAO87f5jxqIuHmumvO07FITnq MT3dKxLhoxcmaInYbR5vlqHMyA/lkPBbC6c0vlSRGDQGZ79loFWGs6aHjUJ9h7NVQthDlN3ZFexZ fYgAG9qTPTpp5PS1j/T/np9d21JHIowhCFJI24wA8tcqtUtSzhOuQi6pUSf119XuQl7jsFm5sSZN eqci/aawdVLtSFQgz7/W9sJScQuIblFoteKPB32SihYyV5KHfiKBno8OsMogRnWlQ9fDAeXihC1X mRU9VPw70z6bhalFEM/pjPCQA+BAlngHvY0q9Gw5NOloxVW0K/tO3saCFey4hE3JznGRzrizu129 acHYDxLXrLFZ27x/iYY+ffS7L/J3xoHIIC0If0Zc9aUV1oY4fX3W5eOCNA39aaEeNVeYHTQ4qc4p rBw6OWxEnZ+B9jntbFMsB5h9L3Pcq6EXKX3cZxLzrCA9pYo1HvE2dZcCZBCEfYvQUHiJ0yEw/ggH k6NxP0mFXr1FslXeNHk15VolAGHS5rCXQKDyHv3TiDImoWFJv2x8Vhm7/dDfeKdHWm+j+q2hZS63 ia/tIkSE0ngbX0T3LvR1Khr2Eo2bTut7ztDdNHV3VH6Kds23J7LHDHZzGBAiCeuHbrg= X-Report-Abuse-To: spam@mut-mta1-se01a-fr.yulpa.io Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org We used to have two versions of the email parsing code. Our parse_mailboxes (in Git.pm), and Mail::Address which we used if installed. Unfortunately, both versions have different sets of bugs, and changing the behavior of git depending on whether Mail::Address is installed was a bad idea. A first attempt to solve this was cc90750 (send-email: don't use Mail::Address, even if available, 2017-08-23), but it turns out our parse_mailboxes is too buggy for some uses. For example the lack of about nested comments support breaks get_maintainer.pl in the Linux kernel tree: https://public-inbox.org/git/20171116154814.23785-1-alex.bennee@linaro.org/ This patch goes the other way: use Mail::Address anyway, but have a local copy as a fallback, when the system one is not available. The duplicated script is small (276 lines of code) and stable in time. Maintaining the local copy should not be an issue, and will certainly be less burden than maintaining our own parse_mailboxes. Another option would be to consider Mail::Address as a hard dependency, but it's easy enough to save the trouble of extra-dependency to the end user or packager. Signed-off-by: Matthieu Moy --- I looked at the perl/Git/Error.pm wrapper, and ended up writting a different, much simpler version. I'm not sure the same approach would apply to Error.pm, but my straightforward version does the job for Mail/Address.pm. I would also be fine with using our local copy unconditionaly. git-send-email.perl | 3 +- perl/Git/FromCPAN/Mail/Address.pm | 276 ++++++++++++++++++++++++++++++++++++++ perl/Git/Mail/Address.pm | 24 ++++ 3 files changed, 302 insertions(+), 1 deletion(-) create mode 100644 perl/Git/FromCPAN/Mail/Address.pm create mode 100755 perl/Git/Mail/Address.pm diff --git a/git-send-email.perl b/git-send-email.perl index 02747b6..d0dcc6d 100755 --- a/git-send-email.perl +++ b/git-send-email.perl @@ -30,6 +30,7 @@ use Git::Error qw(:try); use Cwd qw(abs_path cwd); use Git; use Git::I18N; +use Git::Mail::Address; Getopt::Long::Configure qw/ pass_through /; @@ -489,7 +490,7 @@ my ($repoauthor, $repocommitter); ($repocommitter) = Git::ident_person(@repo, 'committer'); sub parse_address_line { - return Git::parse_mailboxes($_[0]); + return map { $_->format } Mail::Address->parse($_[0]); } sub split_addrs { diff --git a/perl/Git/FromCPAN/Mail/Address.pm b/perl/Git/FromCPAN/Mail/Address.pm new file mode 100644 index 0000000..13b2ff7 --- /dev/null +++ b/perl/Git/FromCPAN/Mail/Address.pm @@ -0,0 +1,276 @@ +# Copyrights 1995-2017 by [Mark Overmeer ]. +# For other contributors see ChangeLog. +# See the manual pages for details on the licensing terms. +# Pod stripped from pm file by OODoc 2.02. +package Mail::Address; +use vars '$VERSION'; +$VERSION = '2.19'; + +use strict; + +use Carp; + +# use locale; removed in version 1.78, because it causes taint problems + +sub Version { our $VERSION } + + + +# given a comment, attempt to extract a person's name +sub _extract_name +{ # This function can be called as method as well + my $self = @_ && ref $_[0] ? shift : undef; + + local $_ = shift + or return ''; + + # Using encodings, too hard. See Mail::Message::Field::Full. + return '' if m/\=\?.*?\?\=/; + + # trim whitespace + s/^\s+//; + s/\s+$//; + s/\s+/ /; + + # Disregard numeric names (e.g. 123456.1234@compuserve.com) + return "" if /^[\d ]+$/; + + s/^\((.*)\)$/$1/; # remove outermost parenthesis + s/^"(.*)"$/$1/; # remove outer quotation marks + s/\(.*?\)//g; # remove minimal embedded comments + s/\\//g; # remove all escapes + s/^"(.*)"$/$1/; # remove internal quotation marks + s/^([^\s]+) ?, ?(.*)$/$2 $1/; # reverse "Last, First M." if applicable + s/,.*//; + + # Change casing only when the name contains only upper or only + # lower cased characters. + unless( m/[A-Z]/ && m/[a-z]/ ) + { # Set the case of the name to first char upper rest lower + s/\b(\w+)/\L\u$1/igo; # Upcase first letter on name + s/\bMc(\w)/Mc\u$1/igo; # Scottish names such as 'McLeod' + s/\bo'(\w)/O'\u$1/igo; # Irish names such as 'O'Malley, O'Reilly' + s/\b(x*(ix)?v*(iv)?i*)\b/\U$1/igo; # Roman numerals, eg 'Level III Support' + } + + # some cleanup + s/\[[^\]]*\]//g; + s/(^[\s'"]+|[\s'"]+$)//g; + s/\s{2,}/ /g; + + $_; +} + +sub _tokenise +{ local $_ = join ',', @_; + my (@words,$snippet,$field); + + s/\A\s+//; + s/[\r\n]+/ /g; + + while ($_ ne '') + { $field = ''; + if(s/^\s*\(/(/ ) # (...) + { my $depth = 0; + + PAREN: while(s/^(\(([^\(\)\\]|\\.)*)//) + { $field .= $1; + $depth++; + while(s/^(([^\(\)\\]|\\.)*\)\s*)//) + { $field .= $1; + last PAREN unless --$depth; + $field .= $1 if s/^(([^\(\)\\]|\\.)+)//; + } + } + + carp "Unmatched () '$field' '$_'" + if $depth; + + $field =~ s/\s+\Z//; + push @words, $field; + + next; + } + + if( s/^("(?:[^"\\]+|\\.)*")\s*// # "..." + || s/^(\[(?:[^\]\\]+|\\.)*\])\s*// # [...] + || s/^([^\s()<>\@,;:\\".[\]]+)\s*// + || s/^([()<>\@,;:\\".[\]])\s*// + ) + { push @words, $1; + next; + } + + croak "Unrecognised line: $_"; + } + + push @words, ","; + \@words; +} + +sub _find_next +{ my ($idx, $tokens, $len) = @_; + + while($idx < $len) + { my $c = $tokens->[$idx]; + return $c if $c eq ',' || $c eq ';' || $c eq '<'; + $idx++; + } + + ""; +} + +sub _complete +{ my ($class, $phrase, $address, $comment) = @_; + + @$phrase || @$comment || @$address + or return undef; + + my $o = $class->new(join(" ",@$phrase), join("",@$address), join(" ",@$comment)); + @$phrase = @$address = @$comment = (); + $o; +} + +#------------ + +sub new(@) +{ my $class = shift; + bless [@_], $class; +} + + +sub parse(@) +{ my $class = shift; + my @line = grep {defined} @_; + my $line = join '', @line; + + my (@phrase, @comment, @address, @objs); + my ($depth, $idx) = (0, 0); + + my $tokens = _tokenise @line; + my $len = @$tokens; + my $next = _find_next $idx, $tokens, $len; + + local $_; + for(my $idx = 0; $idx < $len; $idx++) + { $_ = $tokens->[$idx]; + + if(substr($_,0,1) eq '(') { push @comment, $_ } + elsif($_ eq '<') { $depth++ } + elsif($_ eq '>') { $depth-- if $depth } + elsif($_ eq ',' || $_ eq ';') + { warn "Unmatched '<>' in $line" if $depth; + my $o = $class->_complete(\@phrase, \@address, \@comment); + push @objs, $o if defined $o; + $depth = 0; + $next = _find_next $idx+1, $tokens, $len; + } + elsif($depth) { push @address, $_ } + elsif($next eq '<') { push @phrase, $_ } + elsif( /^[.\@:;]$/ || !@address || $address[-1] =~ /^[.\@:;]$/ ) + { push @address, $_ } + else + { warn "Unmatched '<>' in $line" if $depth; + my $o = $class->_complete(\@phrase, \@address, \@comment); + push @objs, $o if defined $o; + $depth = 0; + push @address, $_; + } + } + @objs; +} + +#------------ + +sub phrase { shift->set_or_get(0, @_) } +sub address { shift->set_or_get(1, @_) } +sub comment { shift->set_or_get(2, @_) } + +sub set_or_get($) +{ my ($self, $i) = (shift, shift); + @_ or return $self->[$i]; + + my $val = $self->[$i]; + $self->[$i] = shift if @_; + $val; +} + + +my $atext = '[\-\w !#$%&\'*+/=?^`{|}~]'; +sub format +{ my @addrs; + + foreach (@_) + { my ($phrase, $email, $comment) = @$_; + my @addr; + + if(defined $phrase && length $phrase) + { push @addr + , $phrase =~ /^(?:\s*$atext\s*)+$/o ? $phrase + : $phrase =~ /(?" + if defined $email && length $email; + } + elsif(defined $email && length $email) + { push @addr, $email; + } + + if(defined $comment && $comment =~ /\S/) + { $comment =~ s/^\s*\(?/(/; + $comment =~ s/\)?\s*$/)/; + } + + push @addr, $comment + if defined $comment && length $comment; + + push @addrs, join(" ", @addr) + if @addr; + } + + join ", ", @addrs; +} + +#------------ + +sub name +{ my $self = shift; + my $phrase = $self->phrase; + my $addr = $self->address; + + $phrase = $self->comment + unless defined $phrase && length $phrase; + + my $name = $self->_extract_name($phrase); + + # first.last@domain address + if($name eq '' && $addr =~ /([^\%\.\@_]+([\._][^\%\.\@_]+)+)[\@\%]/) + { ($name = $1) =~ s/[\._]+/ /g; + $name = _extract_name $name; + } + + if($name eq '' && $addr =~ m#/g=#i) # X400 style address + { my ($f) = $addr =~ m#g=([^/]*)#i; + my ($l) = $addr =~ m#s=([^/]*)#i; + $name = _extract_name "$f $l"; + } + + length $name ? $name : undef; +} + + +sub host +{ my $addr = shift->address || ''; + my $i = rindex $addr, '@'; + $i >= 0 ? substr($addr, $i+1) : undef; +} + + +sub user +{ my $addr = shift->address || ''; + my $i = rindex $addr, '@'; + $i >= 0 ? substr($addr,0,$i) : $addr; +} + +1; diff --git a/perl/Git/Mail/Address.pm b/perl/Git/Mail/Address.pm new file mode 100755 index 0000000..2ce3e84 --- /dev/null +++ b/perl/Git/Mail/Address.pm @@ -0,0 +1,24 @@ +package Git::Mail::Address; +use 5.008; +use strict; +use warnings; + +=head1 NAME + +Git::Mail::Address - Wrapper for the L module, in case it's not installed + +=head1 DESCRIPTION + +This module is only intended to be used for code shipping in the +C repository. Use it for anything else at your peril! + +=cut + +eval { + require Mail::Address; + 1; +} or do { + require Git::FromCPAN::Mail::Address; +}; + +1; -- 2.8.1.116.g7b0d47b