From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on dcvr.yhbt.net
X-Spam-Level:
X-Spam-ASN:
X-Spam-Status: No, score=-4.0 required=3.0 tests=ALL_TRUSTED,BAYES_00
	shortcircuit=no autolearn=ham autolearn_force=no version=3.4.2
Received: from localhost (dcvr.yhbt.net [127.0.0.1])
	by dcvr.yhbt.net (Postfix) with ESMTP id B8B5C1F619;
	Wed, 26 Feb 2020 10:21:12 +0000 (UTC)
Date: Wed, 26 Feb 2020 10:21:12 +0000
From: Eric Wong
To: Leah Neukirchen
Cc: meta@public-inbox.org
Subject: [PATCH] import: drop '<' and '>' characters in addresses
Message-ID: <20200226102112.GA28763@dcvr>
References: <87h7zfemur.fsf@vuxu.org> <20200225092806.GB382@dcvr>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
In-Reply-To: <20200225092806.GB382@dcvr>
List-Id:

Eric Wong wrote:
> Leah Neukirchen wrote:
> > 2) Weird From: lines crash the whole import
> >
> > From: "=?iso-8859-1?Q?Jochen_K=FCpper?=
> > > This funny line broke import_maildir:
> >
> > fatal: Missing > in ident string: =?iso-8859-1?Q?Jochen_K=FCpper?= usenet <"=?iso-8859-1?Q?Jochen_K=FCpper?= 1101853296 +0100
> > fast-import: dumping crash report to /var/lib/public-inbox/repositories/ding.git/fast_import_crash_31402
> > EOF from fast-import: at /usr/share/perl5/vendor_perl/PublicInbox/Import.pm line 96, <$r> line 54681.
> >
> > I fixed it manually. (But I think it's actually a valid mail address,
> > even in this botched state.) I'm not sure what added the ">", it's
> > not in the original mail.
> >
> > (I use public-inbox-1.3.0/git-2.25.0 on Void Linux.)
>
> Gah, this looks like it's because Email::Address::XS leaves a
> "<" in the name... Perhaps Import should delete all [<>]
> characters unconditionally? (or swap in appropriate Unicode
> homographs and assume users have the necessary glyphs...)

So we already do `$name =~ tr/<>//d', so I think doing the same
with `$email' is appropriate for fast-import.
The "correct" address featuring '<' will still be indexed in Xapian, at least.

-------------8<-------------
Subject: [PATCH] import: drop '<' and '>' characters in addresses

Some strange "From:" lines will cause Email::Address::XS to
leave '<' (and presumably '>') in the address which
git-fast-import won't accept even if quoted.

Work around this problem by deleting '<' and '>' the same
way we delete them for the ident name.

Reported-by: Leah Neukirchen
Link: https://public-inbox.org/meta/87h7zfemur.fsf@vuxu.org/
---
 lib/PublicInbox/Import.pm | 4 ++++
 t/import.t                | 2 ++
 2 files changed, 6 insertions(+)

diff --git a/lib/PublicInbox/Import.pm b/lib/PublicInbox/Import.pm
index d8dc49b8..68dc0c7e 100644
--- a/lib/PublicInbox/Import.pm
+++ b/lib/PublicInbox/Import.pm
@@ -293,6 +293,10 @@ sub extract_cmt_info ($) {
 		}
 	}
 	if (defined $email) {
+		# Email::Address::XS may leave quoted '<' in addresses,
+		# which git-fast-import doesn't like
+		$email =~ tr/<>//d;
+
 		# quiet down wide character warnings with utf8::encode
 		utf8::encode($email);
 	} else {
diff --git a/t/import.t b/t/import.t
index e71dd714..b88d308e 100644
--- a/t/import.t
+++ b/t/import.t
@@ -55,6 +55,8 @@ $im->done;
 my @revs = $git->qx(qw(rev-list HEAD));
 is(scalar @revs, 1, 'one revision created');
 
+my $odd = '"=?iso-8859-1?Q?J_K=FCpper?= header_set('From', $odd);
 $mime->header_set('Message-ID', '');
 $mime->header_set('Subject', 'msg2');
 like($im->add($mime, sub { $mime }), qr/\A:\d+\z/, 'added 2nd message');
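
For anyone wanting to see the effect in isolation, here is a minimal standalone sketch of the same tr/<>//d cleanup applied to both halves of an ident. It is not code from PublicInbox::Import: the helper name sanitize_ident and the sample name and address are hypothetical, and only the timestamp is borrowed from the crash report quoted above.

```perl
use strict;
use warnings;

# Hypothetical helper for illustration; not part of PublicInbox::Import.
sub sanitize_ident {
	my ($name, $email) = @_;

	# tr/<>//d deletes every '<' and '>' from the string.
	# git-fast-import uses these two characters to delimit the
	# address, so neither may appear inside the name or the address.
	$name =~ tr/<>//d;
	$email =~ tr/<>//d;
	return ($name, $email);
}

# Made-up input with a stray '<' inside the quoted local part
my ($name, $email) = sanitize_ident('J <Doe>', '"<odd"@example.com');

# The ident line has the shape: author NAME <EMAIL> WHEN
print "author $name <$email> 1101853296 +0100\n";
```

With the made-up input above, the only '<' and '>' left on the printed line are the two delimiters around the address, which is the shape fast-import expects. The address stored in git is lossy as a result; the original header is untouched in the message itself.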