user/dev discussion of public-inbox itself
From: Eric Wong <e@yhbt.net>
To: Leah Neukirchen <leah@vuxu.org>
Cc: meta@public-inbox.org
Subject: [PATCH] import: drop '<' and '>' characters in addresses
Date: Wed, 26 Feb 2020 10:21:12 +0000
Message-ID: <20200226102112.GA28763@dcvr>
In-Reply-To: <20200225092806.GB382@dcvr>

Eric Wong <e@yhbt.net> wrote:
> Leah Neukirchen <leah@vuxu.org> wrote:
> > 2) Weird From: lines crash the whole import
> > 
> > From: "=?iso-8859-1?Q?Jochen_K=FCpper?= <usenet"@jochen-kuepper.de
> > 
> > This funny line broke import_maildir:
> > 
> > fatal: Missing > in ident string: =?iso-8859-1?Q?Jochen_K=FCpper?= usenet <"=?iso-8859-1?Q?Jochen_K=FCpper?= <usenet"@jochen-kuepper.de> 1101853296 +0100
> > fast-import: dumping crash report to /var/lib/public-inbox/repositories/ding.git/fast_import_crash_31402
> > EOF from fast-import:  at /usr/share/perl5/vendor_perl/PublicInbox/Import.pm line 96, <$r> line 54681.
> > 
> > I fixed it manually.  (But I think it's actually a valid mail address,
> > even in this botched state.)  I'm not sure what added the ">", it's
> > not in the original mail.
> > 
> > (I use public-inbox-1.3.0/git-2.25.0 on Void Linux.)
> 
> Gah, this looks like it's because Email::Address::XS leaves a
> "<" in the name...   Perhaps Import should delete all [<>]
> characters unconditionally? (or swap in appropriate Unicode
> homographs and assume users have the necessary glyphs...)

So we already do `$name =~ tr/<>//d', so I think doing the same
with `$email' is appropriate for fast-import.  The "correct"
address featuring '<' will still be indexed in Xapian, at least.
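
For illustration, here is a minimal standalone sketch of what that
deletion does to the address from the report.  The helper name
`strip_angle_brackets' is hypothetical and not part of Import.pm;
the actual change is the one-line tr/// in the patch below.

```perl
use strict;
use warnings;

# Hypothetical helper, for illustration only; the real code applies
# the same tr/// directly inside extract_cmt_info().
sub strip_angle_brackets {
	my ($s) = @_;
	# tr/<>//d deletes every '<' and '>' in place
	$s =~ tr/<>//d;
	return $s;
}

# Address as it appears in the reported From: line
my $email = '"=?iso-8859-1?Q?Jochen_K=FCpper?= <usenet"@jochen-kuepper.de';
my $clean = strip_angle_brackets($email);
print "$clean\n";
```

The result no longer contains a stray '<', so wrapping it in a
single '<' ... '>' pair for the fast-import ident line is unambiguous.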

-------------8<-------------
Subject: [PATCH] import: drop '<' and '>' characters in addresses

Some strange "From:" lines will cause Email::Address::XS to
leave '<' (and presumably '>') in the address which
git-fast-import won't accept even if quoted.  Work around this
problem by deleting '<' and '>' the same way we delete them for
the ident name.

Reported-by: Leah Neukirchen <leah@vuxu.org>
Link: https://public-inbox.org/meta/87h7zfemur.fsf@vuxu.org/
---
 lib/PublicInbox/Import.pm | 4 ++++
 t/import.t                | 2 ++
 2 files changed, 6 insertions(+)

diff --git a/lib/PublicInbox/Import.pm b/lib/PublicInbox/Import.pm
index d8dc49b8..68dc0c7e 100644
--- a/lib/PublicInbox/Import.pm
+++ b/lib/PublicInbox/Import.pm
@@ -293,6 +293,10 @@ sub extract_cmt_info ($) {
 		}
 	}
 	if (defined $email) {
+		# Email::Address::XS may leave quoted '<' in addresses,
+		# which git-fast-import doesn't like
+		$email =~ tr/<>//d;
+
 		# quiet down wide character warnings with utf8::encode
 		utf8::encode($email);
 	} else {
diff --git a/t/import.t b/t/import.t
index e71dd714..b88d308e 100644
--- a/t/import.t
+++ b/t/import.t
@@ -55,6 +55,8 @@ $im->done;
 my @revs = $git->qx(qw(rev-list HEAD));
 is(scalar @revs, 1, 'one revision created');
 
+my $odd = '"=?iso-8859-1?Q?J_K=FCpper?= <usenet"@example.de';
+$mime->header_set('From', $odd);
 $mime->header_set('Message-ID', '<b@example.com>');
 $mime->header_set('Subject', 'msg2');
 like($im->add($mime, sub { $mime }), qr/\A:\d+\z/, 'added 2nd message');

Thread overview: 5+ messages
2020-02-24 20:45 Two small issues when importing old archives Leah Neukirchen
2020-02-25  9:23 ` [RFC] msgtime: do not require tz offset with Date::Parse fallback Eric Wong
2020-03-01 23:31   ` [pushed] msgtime: assume +0000 if TZ missing when using Date::Parse Eric Wong
2020-02-25  9:28 ` weird From: lines [was: Two small issues when importing old archives] Eric Wong
2020-02-26 10:21   ` Eric Wong [this message]