author Eric Wong <e@yhbt.net> 2020-05-09 09:09:00 +0000
committer Eric Wong <e@yhbt.net> 2020-05-10 07:00:16 +0000
commit cc5d9ec286f758de07b57087cfd537759b93dabe
parent 8b44e99ec009508d7e050ee44d34a1cf0f111dd5
3 files changed, 53 insertions, 11 deletions
diff --git a/Documentation/RelNotes/v1.5.0.eml b/Documentation/RelNotes/v1.5.0.eml
index c9108c15..a9d8b241 100644
--- a/Documentation/RelNotes/v1.5.0.eml
+++ b/Documentation/RelNotes/v1.5.0.eml
@@ -5,21 +5,57 @@ MIME-Version: 1.0
 Content-Type: text/plain; charset=utf-8
 Content-Disposition: inline
+This release introduces a new pure-Perl lazy email parser,
+PublicInbox::Eml, which uses roughly 10% less memory and
+is up to 2x faster than Email::MIME.   This is a major
+internal change
+Limits commonly enforced by MTAs are also enforced in the
+new parser, as messages may bypass MTA transports.
+Email::MIME and other Email::* modules are no longer
+dependencies nor used at all outside of maintainer validation
 * public-inbox-index
   - `--max-size=SIZE' CLI switch and `publicinbox.indexMaxSize'
-     config file option added
+    config file option added to prevent indexing of overly
+    large messages.
+  - List-Id headers are indexed in new messages, old messages
+    can be found after `--reindex'.
 * public-inbox-watch
   - multiple values of `publicinbox.<name>.watchheader' are
-    supported, thanks to Kyle Meyer
+    now supported, thanks to Kyle Meyer
+  - List-Id headers are matched case-insensitively as specified
+    by RFC 2919
 * PublicInbox::WWW
   - $INBOX_DIR/description and $INBOX_DIR/cloneurl are not
     memoized if missing
+  - improved display of threads, thanks to Kyle Meyer
+  - search for List-Id is available via `l:' prefix if indexed
+  - all encodings are preloaded at startup to reduce fragmentation
+  - diffstat linkification and highlighting are stricter and
+    less likely to linkify tables in cover letters
+  - fix hunk header links to solver which were off-by-one line,
+    thanks again to Kyle Meyer
+Release tarball available for download over HTTPS or Tor .onion:
 Please report bugs via plain-text mail to: meta@public-inbox.org
 See archives at https://public-inbox.org/meta/ for all history.
diff --git a/Documentation/technical/data_structures.txt b/Documentation/technical/data_structures.txt
index 46d5acff..8776a67b 100644
--- a/Documentation/technical/data_structures.txt
+++ b/Documentation/technical/data_structures.txt
@@ -28,14 +28,13 @@ Outside of tests, this is typically a singleton.
 Per-message classes
-* PublicInbox::MIME - Email::MIME subclass
-  Common abbreviation: $mime
+* PublicInbox::Eml - Email::MIME-like class
+  Common abbreviation: $mime, $eml
   Used by: PublicInbox::WWW, PublicInbox::SearchIdx
-  An representation of an entire email, multipart or not.  It's
-  a subclass of Email::MIME to workaround bugs in old
-  Email::MIME versions.  An option to use libgmime or libmailutils
-  may be supported in the future for performance and memory use.
+  An representation of an entire email, multipart or not.
+  An option to use libgmime or libmailutils may be supported
+  in the future for performance and memory use.
   This can be a memory hog with big messages and giant
   attachments, so our PublicInbox::WWW interface only keeps
@@ -47,6 +46,12 @@ Per-message classes
   Our PublicInbox::V2Writable class may have two objects of this
   type in memory at-a-time for deduplication.
+  In public-inbox 1.4 and earlier, Email::MIME and its subclass,
+  PublicInbox::MIME were used.  Despite still slurping,
+  PublicInbox::Eml is faster and uses less memory due to
+  lazy header parsing and lazy subpart instantiation with
+  shorter object lifetimes.
 * PublicInbox::Smsg - small message skeleton
   Used by: PublicInbox::{NNTP,WWW,SearchIdx}
   Common abbreviation: $smsg
diff --git a/TODO b/TODO
index 4c4e8e00..16de36bf 100644
--- a/TODO
+++ b/TODO
@@ -42,6 +42,7 @@ all need to be considered for everything we introduce)
   while retaining compatibility with old versions.
 * Support more of RFC 3977 (NNTP)
+  Is there anything left for read-only support?
 * Combined "super server" for NNTP/HTTP/POP3 to reduce memory overhead
@@ -75,9 +76,9 @@ all need to be considered for everything we introduce)
 * linkify thread skeletons better
-* low-memory Email::MIME replacement: currently we generate many
-  allocations/strings for headers we never look at and slurp
-  entire message bodies into memory.  GMime+Inline::C could work.
+* Further lower mail parser memory usage.  We still slurp entire
+  message bodies into memory and incur 2-3x overhead on
+  multipart messages.  Inline::C (and maybe gmime) could work.
 * use REQUEST_URI properly for CGI / mod_perl2 compatibility
   with Message-IDs which include '%' (done?)