From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on dcvr.yhbt.net
X-Spam-Level:
X-Spam-ASN:
X-Spam-Status: No, score=-4.0 required=3.0 tests=ALL_TRUSTED,AWL,BAYES_00,
	T_FILL_THIS_FORM_SHORT shortcircuit=no autolearn=ham
	autolearn_force=no version=3.4.0
Received: from localhost (dcvr.yhbt.net [127.0.0.1])
	by dcvr.yhbt.net (Postfix) with ESMTP id 198831FAF4
	for ; Thu, 15 Feb 2018 11:08:46 +0000 (UTC)
From: "Eric Wong (Contractor, The Linux Foundation)"
To: meta@public-inbox.org
Subject: [WIP 17/17] import: quiet down warnings from bogus From: lines
Date: Thu, 15 Feb 2018 11:08:40 +0000
Message-Id: <20180215110840.30413-18-e@80x24.org>
In-Reply-To: <20180215110840.30413-1-e@80x24.org>
References: <20180215105509.GA22409@dcvr>
	<20180215110840.30413-1-e@80x24.org>
List-Id:

There's a lot of crap in archives and git-fast-import accepts
empty names and email addresses for authors just fine.
---
 lib/PublicInbox/Import.pm | 27 +++++++++++++++++++--------
 1 file changed, 19 insertions(+), 8 deletions(-)

diff --git a/lib/PublicInbox/Import.pm b/lib/PublicInbox/Import.pm
index 845fbb6..f8d1003 100644
--- a/lib/PublicInbox/Import.pm
+++ b/lib/PublicInbox/Import.pm
@@ -246,11 +246,6 @@ sub add {
 	my $from = $mime->header('From');
 	my ($email) = PublicInbox::Address::emails($from);
 	my ($name) = PublicInbox::Address::names($from);
-	# git gets confused with:
-	# "'A U Thor ' via foo"
-	# ref:
-	#
-	$name =~ tr/<>//d;

 	my $date_raw = parse_date($mime);
 	my $subject = $mime->header('Subject');
@@ -297,10 +292,26 @@ sub add {
 		print $w "reset $ref\n" or wfail;
 	}

-	utf8::encode($email);
-	utf8::encode($name);
+	# quiet down wide character warnings with utf8::encode
+	if (defined $email) {
+		utf8::encode($email);
+	} else {
+		$email = '';
+		warn "no email in From: $from\n";
+	}
+
+	# git gets confused with:
+	# "'A U Thor ' via foo"
+	# ref:
+	#
+	if (defined $name) {
+		$name =~ tr/<>//d;
+		utf8::encode($name);
+	} else {
+		$name = '';
+		warn "no name in From: $from\n";
+	}
 	utf8::encode($subject);
-	# quiet down wide character warnings:
 	print $w "commit $ref\nmark :$commit\n",
 		"author $name <$email> $date_raw\n",
 		"committer $self->{ident} ", now_raw(), "\n" or wfail;
-- 
EW