From: Leah Neukirchen
To: meta@public-inbox.org
Subject: Two small issues when importing old archives
Date: Mon, 24 Feb 2020 21:45:00 +0100
Message-ID: <87h7zfemur.fsf@vuxu.org>

Hi,

I've recently imported some sizable archives (~100k messages) of old
mailing lists and noticed some slight inconveniences:

1) RFC 5322/822-invalid Date: headers should be parsed more gracefully

Some old mails had Date: headers without a time zone, e.g.

    Date: Sat, 27 Sep 1997 10:02:32

public-inbox then assumes such a message was sent at the current date.
This assumption makes no sense (literally any other guess would be more
likely), and it also puts these messages on the first page of the
archive.  Furthermore, the sort order is then not stable: pressing F5
makes the threads jump around.

I'd recommend falling back to +0000 instead.

2) Weird From: lines crash the whole import

    From: "=?iso-8859-1?Q?Jochen_K=FCpper?= in ident string: =?iso-8859-1?Q?Jochen_K=FCpper?= usenet <"=?iso-8859-1?Q?Jochen_K=FCpper?= 1101853296 +0100
    fast-import: dumping crash report to /var/lib/public-inbox/repositories/ding.git/fast_import_crash_31402
    EOF from fast-import: at /usr/share/perl5/vendor_perl/PublicInbox/Import.pm line 96, <$r> line 54681.

I fixed it manually.  (But I think it's actually a valid mail address,
even in this botched state.)  I'm not sure what added the ">"; it's not
in the original mail.

(I use public-inbox-1.3.0/git-2.25.0 on Void Linux.)

thx,
-- 
Leah Neukirchen  https://leahneukirchen.org/
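A minimal Python sketch of the +0000 fallback suggested in point 1. It is hypothetical: public-inbox itself is written in Perl, and `parse_date_utc_fallback` is an illustrative name, not part of its API. It relies on the standard library's `email.utils.parsedate_to_datetime`, which returns a naive `datetime` when the header carries no zone (or carries "-0000"):

```python
from datetime import timezone
from email.utils import parsedate_to_datetime


def parse_date_utc_fallback(value):
    """Hypothetical helper: turn an RFC 5322/822 Date: header value into a
    POSIX timestamp, assuming +0000 (UTC) when the header has no zone.

    Returns None if the value cannot be parsed as a date at all.
    """
    try:
        dt = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        # Python 3.10+ raises ValueError for unparseable input;
        # older versions fail with TypeError instead.
        return None
    if dt.tzinfo is None:
        # No zone given (or "-0000"): pin it to UTC instead of using the
        # current time, so the same header always yields the same value.
        dt = dt.replace(tzinfo=timezone.utc)
    return int(dt.timestamp())
```

Because the fallback is a fixed offset rather than the wall clock, the same header always maps to the same timestamp, so the message keeps its place in the archive across page reloads.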
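For point 2, a similar hypothetical sketch (Python; `make_git_ident`, `clean_ident_part` and `decode_rfc2047` are illustrative names, not public-inbox code) of turning a From: header into the `Name <email> when tz` ident that git fast-import expects. The git-fast-import documentation states that the name and email fields may contain any bytes except `<`, `>` and LF, so the sketch decodes RFC 2047 encoded-words and then drops those characters. Whether dropping them is the right repair for archives like this one is an assumption:

```python
import re
from email.errors import HeaderParseError
from email.header import decode_header, make_header
from email.utils import parseaddr

# git-fast-import: <name> and <email> may not contain '<', '>' or LF.
_IDENT_FORBIDDEN = re.compile(r"[<>\n]")


def clean_ident_part(text):
    """Drop the characters fast-import rejects and trim surrounding blanks."""
    return _IDENT_FORBIDDEN.sub("", text).strip()


def decode_rfc2047(text):
    """Decode RFC 2047 encoded-words, keeping the raw text if that fails."""
    try:
        return str(make_header(decode_header(text)))
    except (HeaderParseError, LookupError, UnicodeDecodeError):
        return text


def make_git_ident(from_header, when, tz):
    """Hypothetical helper: build a 'Name <email> when tz' ident string."""
    name, email = parseaddr(from_header)
    name = clean_ident_part(decode_rfc2047(name))
    email = clean_ident_part(email)
    return f"{name} <{email}> {when} {tz}"
```

Stripping rather than rejecting keeps one malformed header from aborting a ~100k-message import, at the cost of silently altering that one ident.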