From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on dcvr.yhbt.net
X-Spam-Level:
X-Spam-ASN:
X-Spam-Status: No, score=-4.0 required=3.0 tests=ALL_TRUSTED,BAYES_00
	shortcircuit=no autolearn=ham autolearn_force=no version=3.4.0
Received: from localhost (dcvr.yhbt.net [127.0.0.1])
	by dcvr.yhbt.net (Postfix) with ESMTP id 759711FB04
	for ; Tue, 6 Mar 2018 08:42:44 +0000 (UTC)
From: "Eric Wong (Contractor, The Linux Foundation)"
To: meta@public-inbox.org
Subject: [PATCH 33/34] scripts/import_vger_from_mbox: perform mboxrd or mboxo escaping
Date: Tue, 6 Mar 2018 08:42:41 +0000
Message-Id: <20180306084242.19988-34-e@80x24.org>
In-Reply-To: <20180306084242.19988-1-e@80x24.org>
References: <20180306084242.19988-1-e@80x24.org>
List-Id:

It appears most of the mboxes in the archive I've been given
are mboxrd (despite having Content-Length:) and need the
escaping.
---
 scripts/import_vger_from_mbox | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/scripts/import_vger_from_mbox b/scripts/import_vger_from_mbox
index 4469887..6a00fae 100644
--- a/scripts/import_vger_from_mbox
+++ b/scripts/import_vger_from_mbox
@@ -11,11 +11,16 @@ use PublicInbox::Import;
 my $usage = "usage: $0 NAME EMAIL DIR \$dry_run,
 	'V|version=i' => \$version,
+	'F|format=s' => \$variant,
 );
 GetOptions(%opts) or die $usage;
+if ($variant ne 'mboxrd' && $variant ne 'mboxo') {
+	die "Unsupported mbox variant: $variant\n";
+}
 my $name = shift or die $usage; # git
 my $email = shift or die $usage; # git@vger.kernel.org
 my $mainrepo = shift or die $usage; # /path/to/v2/repo
@@ -45,6 +50,11 @@ sub do_add ($$) {
 	my ($im, $msg) = @_;
 	$$msg =~ s/(\r?\n)+\z/$1/s;
 	my $mime = PublicInbox::MIME->new($msg);
+	if ($variant eq 'mboxrd') {
+		$$msg =~ s/^>(>*From )/$1/sm;
+	} elsif ($variant eq 'mboxo') {
+		$$msg =~ s/^>From /From /sm;
+	}
 	$mime = $vger->scrub($mime);
 	return unless $im;
 	$im->add($mime) or
-- 
EW
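
For reference, the difference between the two variants can be sketched as a standalone script. This is only an illustration under stated assumptions, not the code in the patch: unescape_from_lines() is a hypothetical helper name, and the sketch uses /g so that every escaped line in the message is handled, whereas the substitutions in the diff above have no /g.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the patch): undo "From " line
# escaping in a message read from an mbox file.
# $msg is a scalar ref, matching how do_add receives the message.
sub unescape_from_lines {
	my ($msg, $variant) = @_;
	if ($variant eq 'mboxrd') {
		# mboxrd: every line matching /^>+From / had one '>'
		# prepended by the writer, so strip exactly one.
		$$msg =~ s/^>(>*From )/$1/gm;
	} elsif ($variant eq 'mboxo') {
		# mboxo: only a bare "From " line was escaped, so only
		# ">From " is touched; ">>From " is left alone.
		$$msg =~ s/^>From /From /gm;
	} else {
		die "Unsupported mbox variant: $variant\n";
	}
	return $msg;
}

my $body = "Hi,\n>From here on\n>>From a quoted line\n";

my $rd = $body;
unescape_from_lines(\$rd, 'mboxrd');
print "mboxrd:\n$rd";

my $o = $body;
unescape_from_lines(\$o, 'mboxo');
print "mboxo:\n$o";
```

With mboxrd, both ">From here on" and ">>From a quoted line" lose one leading '>'. With mboxo, only ">From here on" changes, which is why picking the wrong variant on import alters quoted text.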