* [PATCH] various doc updates ahead of 1.5.0
@ 2020-05-10 6:59 Eric Wong
From: Eric Wong @ 2020-05-10 6:59 UTC
To: meta
---
Documentation/RelNotes/v1.5.0.eml | 40 +++++++++++++++++++--
Documentation/technical/data_structures.txt | 17 +++++----
TODO | 7 ++--
3 files changed, 53 insertions(+), 11 deletions(-)
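
The release notes in this patch say that public-inbox-watch matches
List-Id headers case-insensitively, as specified by RFC 2919. A minimal
sketch of that comparison follows, in Python rather than the project's
Perl. The function names `list_id` and `same_list` are hypothetical and
are not part of public-inbox; the sketch only assumes the identifier is
the text inside the final angle brackets of the header value.

```python
import re


def list_id(header_value):
    """Return the lowercased identifier from a List-Id header value.

    Hypothetical helper: takes whatever sits inside the final <...>
    pair and lowercases it so comparisons ignore case.
    """
    m = re.search(r"<([^<>]*)>\s*$", header_value)
    return m.group(1).strip().lower() if m else None


def same_list(a, b):
    """True when both header values carry the same identifier."""
    ida, idb = list_id(a), list_id(b)
    return ida is not None and ida == idb
```

With this sketch, `same_list("Meta <Meta.Public-Inbox.ORG>", "<meta.public-inbox.org>")`
is true, while a value without angle brackets never matches anything.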
diff --git a/Documentation/RelNotes/v1.5.0.eml b/Documentation/RelNotes/v1.5.0.eml
index c9108c15..a9d8b241 100644
--- a/Documentation/RelNotes/v1.5.0.eml
+++ b/Documentation/RelNotes/v1.5.0.eml
@@ -5,21 +5,57 @@ MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
+This release introduces a new pure-Perl lazy email parser,
+PublicInbox::Eml, which uses roughly 10% less memory and
+is up to 2x faster than Email::MIME. This is a major
+internal change.
+
+Limits commonly enforced by MTAs are also enforced in the
+new parser, as messages may bypass MTA transports.
+
+Email::MIME and other Email::* modules are no longer
+dependencies nor used at all outside of maintainer validation
+tests.
+
* public-inbox-index
- `--max-size=SIZE' CLI switch and `publicinbox.indexMaxSize'
- config file option added
+ config file option added to prevent indexing of overly
+ large messages.
+
+ - List-Id headers are indexed in new messages; old messages
+ can be found after `--reindex'.
* public-inbox-watch
- multiple values of `publicinbox.<name>.watchheader' are
- supported, thanks to Kyle Meyer
+ now supported, thanks to Kyle Meyer
+
+ - List-Id headers are matched case-insensitively as specified
+ by RFC 2919
* PublicInbox::WWW
- $INBOX_DIR/description and $INBOX_DIR/cloneurl are not
memoized if missing
+ - improved display of threads, thanks to Kyle Meyer
+
+ - search for List-Id is available via `l:' prefix if indexed
+
+ - all encodings are preloaded at startup to reduce fragmentation
+
+ - diffstat linkification and highlighting are stricter and
+ less likely to linkify tables in cover letters
+
+ - fix hunk header links to solver which were off-by-one line,
+ thanks again to Kyle Meyer
+
+Release tarball available for download over HTTPS or Tor .onion:
+
+https://yhbt.net/public-inbox.git/snapshot/public-inbox-1.5.0.tar.gz
+http://ou63pmih66umazou.onion/public-inbox.git/snapshot/public-inbox-1.5.0.tar.gz
+
Please report bugs via plain-text mail to: meta@public-inbox.org
See archives at https://public-inbox.org/meta/ for all history.
diff --git a/Documentation/technical/data_structures.txt b/Documentation/technical/data_structures.txt
index 46d5acff..8776a67b 100644
--- a/Documentation/technical/data_structures.txt
+++ b/Documentation/technical/data_structures.txt
@@ -28,14 +28,13 @@ Outside of tests, this is typically a singleton.
Per-message classes
-------------------
-* PublicInbox::MIME - Email::MIME subclass
- Common abbreviation: $mime
+* PublicInbox::Eml - Email::MIME-like class
+ Common abbreviation: $mime, $eml
Used by: PublicInbox::WWW, PublicInbox::SearchIdx
- An representation of an entire email, multipart or not. It's
- a subclass of Email::MIME to workaround bugs in old
- Email::MIME versions. An option to use libgmime or libmailutils
- may be supported in the future for performance and memory use.
+ A representation of an entire email, multipart or not.
+ An option to use libgmime or libmailutils may be supported
+ in the future for performance and memory use.
This can be a memory hog with big messages and giant
attachments, so our PublicInbox::WWW interface only keeps
@@ -47,6 +46,12 @@ Per-message classes
Our PublicInbox::V2Writable class may have two objects of this
type in memory at-a-time for deduplication.
+ In public-inbox 1.4 and earlier, Email::MIME and its subclass,
+ PublicInbox::MIME, were used. Despite still slurping,
+ PublicInbox::Eml is faster and uses less memory due to
+ lazy header parsing and lazy subpart instantiation with
+ shorter object lifetimes.
+
* PublicInbox::Smsg - small message skeleton
Used by: PublicInbox::{NNTP,WWW,SearchIdx}
Common abbreviation: $smsg
diff --git a/TODO b/TODO
index 4c4e8e00..16de36bf 100644
--- a/TODO
+++ b/TODO
@@ -42,6 +42,7 @@ all need to be considered for everything we introduce)
while retaining compatibility with old versions.
* Support more of RFC 3977 (NNTP)
+ Is there anything left for read-only support?
* Combined "super server" for NNTP/HTTP/POP3 to reduce memory overhead
@@ -75,9 +76,9 @@ all need to be considered for everything we introduce)
* linkify thread skeletons better
https://public-inbox.org/git/6E3699DEA672430CAEA6DEFEDE6918F4@PhilipOakley/
-* low-memory Email::MIME replacement: currently we generate many
- allocations/strings for headers we never look at and slurp
- entire message bodies into memory. GMime+Inline::C could work.
+* Further lower mail parser memory usage. We still slurp entire
+ message bodies into memory and incur 2-3x overhead on
+ multipart messages. Inline::C (and maybe gmime) could work.
* use REQUEST_URI properly for CGI / mod_perl2 compatibility
with Message-IDs which include '%' (done?)
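
The data_structures.txt hunk above credits lazy header parsing for part
of the memory savings, while the TODO notes that entire message bodies
are still slurped. The general idea can be sketched as follows. This is
an illustration in Python, not the PublicInbox::Eml implementation; the
class name `LazyMessage` and its methods are hypothetical.

```python
class LazyMessage:
    """Toy message that keeps the raw text and parses headers on demand."""

    def __init__(self, raw):
        # Split once at the first blank line; the whole message is still
        # held in memory (slurped), but no header is parsed yet.
        head, _, body = raw.partition("\n\n")
        self._head = head
        self.body = body
        self._headers = None  # filled in on the first header() call

    def _parse_headers(self):
        headers = {}
        name = None
        for line in self._head.split("\n"):
            if line[:1] in (" ", "\t") and name is not None:
                # Folded continuation line: append to the previous value.
                headers[name][-1] += " " + line.strip()
            elif ":" in line:
                key, _, value = line.partition(":")
                name = key.strip().lower()
                headers.setdefault(name, []).append(value.strip())
        return headers

    def header(self, name):
        """Return the first value of a header, parsing on first use."""
        if self._headers is None:
            self._headers = self._parse_headers()
        values = self._headers.get(name.lower())
        return values[0] if values else None
```

A caller that only needs the body never pays for header parsing, and a
caller that asks for one header triggers a single parse whose result is
reused afterwards.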
Code repositories for project(s) associated with this public inbox
https://80x24.org/public-inbox.git