user/dev discussion of public-inbox itself
From: Eric Wong <>
Subject: [PATCH] various doc updates ahead of 1.5.0
Date: Sun, 10 May 2020 06:59:59 +0000
Message-ID: <>

 Documentation/RelNotes/v1.5.0.eml           | 40 +++++++++++++++++++--
 Documentation/technical/data_structures.txt | 17 +++++----
 TODO                                        |  7 ++--
 3 files changed, 53 insertions(+), 11 deletions(-)

diff --git a/Documentation/RelNotes/v1.5.0.eml b/Documentation/RelNotes/v1.5.0.eml
index c9108c15..a9d8b241 100644
--- a/Documentation/RelNotes/v1.5.0.eml
+++ b/Documentation/RelNotes/v1.5.0.eml
@@ -5,21 +5,57 @@ MIME-Version: 1.0
 Content-Type: text/plain; charset=utf-8
 Content-Disposition: inline
+This release introduces a new pure-Perl lazy email parser,
+PublicInbox::Eml, which uses roughly 10% less memory and
+is up to 2x faster than Email::MIME.  This is a major
+internal change.
+Limits commonly enforced by MTAs are also enforced in the
+new parser, as messages may bypass MTA transports.
+Email::MIME and other Email::* modules are no longer
+dependencies nor used at all outside of maintainer validation
 * public-inbox-index
   - `--max-size=SIZE' CLI switch and `publicinbox.indexMaxSize'
-     config file option added
+    config file option added to prevent indexing of overly
+    large messages.
+  - List-Id headers are indexed in new messages; old messages
+    can be found after `--reindex'.
 * public-inbox-watch
   - multiple values of `publicinbox.<name>.watchheader' are
-    supported, thanks to Kyle Meyer
+    now supported, thanks to Kyle Meyer
+  - List-Id headers are matched case-insensitively as specified
+    by RFC 2919
 * PublicInbox::WWW
   - $INBOX_DIR/description and $INBOX_DIR/cloneurl are not
     memoized if missing
+  - improved display of threads, thanks to Kyle Meyer
+  - search for List-Id is available via `l:' prefix if indexed
+  - all encodings are preloaded at startup to reduce fragmentation
+  - diffstat linkification and highlighting are stricter and
+    less likely to linkify tables in cover letters
+  - fix hunk header links to solver which were off-by-one line,
+    thanks again to Kyle Meyer
+Release tarball available for download over HTTPS or Tor .onion:
 Please report bugs via plain-text mail to:
 See archives at for all history.
diff --git a/Documentation/technical/data_structures.txt b/Documentation/technical/data_structures.txt
index 46d5acff..8776a67b 100644
--- a/Documentation/technical/data_structures.txt
+++ b/Documentation/technical/data_structures.txt
@@ -28,14 +28,13 @@ Outside of tests, this is typically a singleton.
 Per-message classes
-* PublicInbox::MIME - Email::MIME subclass
-  Common abbreviation: $mime
+* PublicInbox::Eml - Email::MIME-like class
+  Common abbreviation: $mime, $eml
   Used by: PublicInbox::WWW, PublicInbox::SearchIdx
-  An representation of an entire email, multipart or not.  It's
-  a subclass of Email::MIME to workaround bugs in old
-  Email::MIME versions.  An option to use libgmime or libmailutils
-  may be supported in the future for performance and memory use.
+  A representation of an entire email, multipart or not.
+  An option to use libgmime or libmailutils may be supported
+  in the future for performance and memory use.
   This can be a memory hog with big messages and giant
   attachments, so our PublicInbox::WWW interface only keeps
@@ -47,6 +46,12 @@ Per-message classes
   Our PublicInbox::V2Writable class may have two objects of this
   type in memory at-a-time for deduplication.
+  In public-inbox 1.4 and earlier, Email::MIME and its subclass,
+  PublicInbox::MIME, were used.  Despite still slurping,
+  PublicInbox::Eml is faster and uses less memory due to
+  lazy header parsing and lazy subpart instantiation with
+  shorter object lifetimes.
 * PublicInbox::Smsg - small message skeleton
   Used by: PublicInbox::{NNTP,WWW,SearchIdx}
   Common abbreviation: $smsg
diff --git a/TODO b/TODO
index 4c4e8e00..16de36bf 100644
--- a/TODO
+++ b/TODO
@@ -42,6 +42,7 @@ all need to be considered for everything we introduce)
   while retaining compatibility with old versions.
 * Support more of RFC 3977 (NNTP)
+  Is there anything left for read-only support?
 * Combined "super server" for NNTP/HTTP/POP3 to reduce memory overhead
@@ -75,9 +76,9 @@ all need to be considered for everything we introduce)
 * linkify thread skeletons better
-* low-memory Email::MIME replacement: currently we generate many
-  allocations/strings for headers we never look at and slurp
-  entire message bodies into memory.  GMime+Inline::C could work.
+* Further lower mail parser memory usage.  We still slurp entire
+  message bodies into memory and incur 2-3x overhead on
+  multipart messages.  Inline::C (and maybe gmime) could work.
 * use REQUEST_URI properly for CGI / mod_perl2 compatibility
   with Message-IDs which include '%' (done?)
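
The data_structures.txt hunk above credits the savings to lazy header
parsing on a message that is still slurped whole.  As a rough sketch of
that idea only (this is not PublicInbox::Eml, which is Perl; the class
name LazyMessage, its methods, and the parse_count counter are all
hypothetical, and LF line endings are assumed), one might defer header
parsing until the first lookup:

```python
from email.parser import HeaderParser  # Python stdlib


class LazyMessage:
    """Hypothetical illustration of lazy header parsing.

    Not the PublicInbox::Eml API; names here are made up for the sketch.
    """

    def __init__(self, raw: str):
        # The whole message is still slurped into memory.  We only split
        # it once at the first blank line (assumes LF line endings) and
        # parse nothing yet.
        head, _sep, body = raw.partition("\n\n")
        self._raw_head = head
        self._body = body
        self._headers = None   # parsed on first header() call
        self.parse_count = 0   # instrumentation for the example only

    def header(self, name: str):
        # Parse the header block at most once, and only when asked.
        if self._headers is None:
            self._headers = HeaderParser().parsestr(self._raw_head)
            self.parse_count += 1
        # email.message.Message lookups are case-insensitive.
        return self._headers.get(name)

    def body(self) -> str:
        # Reading the body never triggers header parsing.
        return self._body
```

A caller that only needs the body pays no header-parsing cost, and
repeated header lookups reuse the single parsed result.  Lazy subpart
instantiation for multipart messages is not shown here.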
