From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path: 
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on dcvr.yhbt.net
X-Spam-Level: 
X-Spam-ASN: 
X-Spam-Status: No, score=-4.0 required=3.0 tests=ALL_TRUSTED,BAYES_00 shortcircuit=no autolearn=ham autolearn_force=no version=3.4.2
Received: from localhost (dcvr.yhbt.net [127.0.0.1]) by dcvr.yhbt.net (Postfix) with ESMTP id 1CD981F55B for ; Sun, 10 May 2020 06:59:59 +0000 (UTC)
From: Eric Wong 
To: meta@public-inbox.org
Subject: [PATCH] various doc updates ahead of 1.5.0
Date: Sun, 10 May 2020 06:59:59 +0000
Message-Id: <20200510065959.19072-1-e@yhbt.net>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
List-Id: 

---
 Documentation/RelNotes/v1.5.0.eml           | 40 +++++++++++++++++++--
 Documentation/technical/data_structures.txt | 17 +++++----
 TODO                                        |  7 ++--
 3 files changed, 53 insertions(+), 11 deletions(-)

diff --git a/Documentation/RelNotes/v1.5.0.eml b/Documentation/RelNotes/v1.5.0.eml
index c9108c15..a9d8b241 100644
--- a/Documentation/RelNotes/v1.5.0.eml
+++ b/Documentation/RelNotes/v1.5.0.eml
@@ -5,21 +5,57 @@ MIME-Version: 1.0
 Content-Type: text/plain; charset=utf-8
 Content-Disposition: inline
 
+This release introduces a new pure-Perl lazy email parser,
+PublicInbox::Eml, which uses roughly 10% less memory and
+is up to 2x faster than Email::MIME. This is a major
+internal change.
+
+Limits commonly enforced by MTAs are also enforced in the
+new parser, as messages may bypass MTA transports.
+
+Email::MIME and other Email::* modules are no longer
+dependencies and are not used at all outside of maintainer
+validation tests.
+
 * public-inbox-index
 
   - `--max-size=SIZE' CLI switch and `publicinbox.indexMaxSize'
-    config file option added
+    config file option added to prevent indexing of overly
+    large messages.
+
+  - List-Id headers are indexed in new messages; old messages
+    can be found after `--reindex'.
 
 * public-inbox-watch
 
   - multiple values of `publicinbox..watchheader' are
-    supported, thanks to Kyle Meyer
+    now supported, thanks to Kyle Meyer
+
+  - List-Id headers are matched case-insensitively as specified
+    by RFC 2919
 
 * PublicInbox::WWW
 
   - $INBOX_DIR/description and $INBOX_DIR/cloneurl are not
     memoized if missing
 
+  - improved display of threads, thanks to Kyle Meyer
+
+  - search for List-Id is available via `l:' prefix if indexed
+
+  - all encodings are preloaded at startup to reduce fragmentation
+
+  - diffstat linkification and highlighting are stricter and
+    less likely to linkify tables in cover letters
+
+  - fix hunk header links to solver, which were off by one line,
+    thanks again to Kyle Meyer
+
+The release tarball is available for download over HTTPS or Tor .onion:
+
+https://yhbt.net/public-inbox.git/snapshot/public-inbox-1.5.0.tar.gz
+http://ou63pmih66umazou.onion/public-inbox.git/snapshot/public-inbox-1.5.0.tar.gz
+
 Please report bugs via plain-text mail to: meta@public-inbox.org
 
 See archives at https://public-inbox.org/meta/ for all history.
diff --git a/Documentation/technical/data_structures.txt b/Documentation/technical/data_structures.txt
index 46d5acff..8776a67b 100644
--- a/Documentation/technical/data_structures.txt
+++ b/Documentation/technical/data_structures.txt
@@ -28,14 +28,13 @@ Outside of tests, this is typically a singleton.
 Per-message classes
 -------------------
 
-* PublicInbox::MIME - Email::MIME subclass
-  Common abbreviation: $mime
+* PublicInbox::Eml - Email::MIME-like class
+  Common abbreviation: $mime, $eml
   Used by: PublicInbox::WWW, PublicInbox::SearchIdx
 
-  An representation of an entire email, multipart or not. It's
-  a subclass of Email::MIME to workaround bugs in old
-  Email::MIME versions. An option to use libgmime or libmailutils
-  may be supported in the future for performance and memory use.
+  A representation of an entire email, multipart or not.
+  An option to use libgmime or libmailutils may be supported
+  in the future for performance and memory use.
 
   This can be a memory hog with big messages and giant attachments,
   so our PublicInbox::WWW interface only keeps
@@ -47,6 +46,12 @@ Per-message classes
   Our PublicInbox::V2Writable class may have two objects of this
   type in memory at-a-time for deduplication.
 
+  In public-inbox 1.4 and earlier, Email::MIME and its subclass,
+  PublicInbox::MIME, were used. Despite still slurping whole
+  messages, PublicInbox::Eml is faster and uses less memory due
+  to lazy header parsing and lazy subpart instantiation with
+  shorter object lifetimes.
+
 * PublicInbox::Smsg - small message skeleton
   Used by: PublicInbox::{NNTP,WWW,SearchIdx}
   Common abbreviation: $smsg
diff --git a/TODO b/TODO
index 4c4e8e00..16de36bf 100644
--- a/TODO
+++ b/TODO
@@ -42,6 +42,7 @@ all need to be considered for everything we introduce)
   while retaining compatibility with old versions.
 
 * Support more of RFC 3977 (NNTP)
+  Is there anything left for read-only support?
 
 * Combined "super server" for NNTP/HTTP/POP3 to reduce memory overhead
 
@@ -75,9 +76,9 @@ all need to be considered for everything we introduce)
 * linkify thread skeletons better
   https://public-inbox.org/git/6E3699DEA672430CAEA6DEFEDE6918F4@PhilipOakley/
 
-* low-memory Email::MIME replacement: currently we generate many
-  allocations/strings for headers we never look at and slurp
-  entire message bodies into memory. GMime+Inline::C could work.
+* Further reduce mail parser memory usage. We still slurp entire
+  message bodies into memory and incur 2-3x overhead on
+  multipart messages. Inline::C (and maybe gmime) could work.
 
 * use REQUEST_URI properly for CGI / mod_perl2 compatibility
   with Message-IDs which include '%' (done?)
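The data_structures.txt hunk credits lazy header parsing for much of PublicInbox::Eml's speed and memory savings. A minimal sketch of that idea, written in Python rather than the project's Perl, keeps the raw header block as bytes and splits it into fields only on the first lookup. The `LazyMessage` class and its methods are hypothetical and are not PublicInbox::Eml's API. The sketch assumes LF line endings and ignores MIME parts entirely.

```python
import re


class LazyMessage:
    """Hypothetical sketch of lazy header parsing (not PublicInbox::Eml).

    Assumes LF line endings; MIME multipart bodies are not handled.
    """

    def __init__(self, raw: bytes) -> None:
        # Headers end at the first empty line; keep both halves unparsed.
        head, sep, body = raw.partition(b"\n\n")
        self._raw_head = head
        self.body = body if sep else b""
        self._fields = None  # filled in on the first header() call

    @property
    def parsed(self) -> bool:
        return self._fields is not None

    def _parse(self) -> dict:
        # Unfold continuation lines: drop a newline followed by space/tab.
        unfolded = re.sub(rb"\n(?=[ \t])", b"", self._raw_head)
        fields = {}
        for line in unfolded.split(b"\n"):
            name, colon, value = line.partition(b":")
            if not colon:
                continue  # skip lines that are not "Name: value"
            key = name.strip().decode("ascii", "replace").lower()
            text = value.strip().decode("utf-8", "replace")
            fields.setdefault(key, []).append(text)
        return fields

    def header(self, name: str) -> list:
        # Parse all headers once, on first use, then serve from the cache.
        if self._fields is None:
            self._fields = self._parse()
        return self._fields.get(name.lower(), [])


msg = LazyMessage(b"Subject: hello\n world\nList-Id: <meta.public-inbox.org>\n\nbody\n")
print(msg.parsed)             # False: nothing parsed yet
print(msg.header("LIST-ID"))  # ['<meta.public-inbox.org>']
print(msg.parsed)             # True
```

With this layout, code that never asks for a header never pays for parsing one, and a lookup such as `header("LIST-ID")` matches regardless of case, since RFC 5322 header field names are case-insensitive.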