about summary refs log tree commit homepage
path: root/lib/PublicInbox/MsgIter.pm
DateCommit message (Collapse)
2020-02-06treewide: run update-copyrights from gnulib for 2019
I didn't wait until September to do it, this year!
2020-01-12www: discard multipart parent on iteration
We're often iterating through messages while writing to another buffer in our WWW interface, causing memory usage to multiply. Since we know we won't need to keep the MIME object around in some cases, and can tell msg_iter to clobber the on-stack variable while it operates on subparts of multipart messages. With xt/mem-msgview.t switched to multipart from the previous commit, this shows a 13 MB memory reduction on that test.
2019-12-26msg_iter: provide means to stop using anonymous subs
And remove the last anonymous sub in SolverGit itself.
2019-12-19msgiter: msg_part_text returns undef on text/html
We want HTML parts to be downloadable, but not displayed as unreadable (but injection-safe) HTML source in our own web and Atom interfaces. This affects indexing, too, as HTML tags/comments won't be indexed anymore, but existing indices are only cleaned after --reindex. HTML-only mail won't be indexed at all, but we won't cross that bridge until somebody cares about that crap. We'll continue to actively discourage such waste of CPU cycles, bandwidth, cache and storage. Fixes: 7d82a8bc04ce2e68 (handle "multipart/mixed" messages which are not multipart')
2019-10-31msgiter: attempt to decode all text/* bodies
We want to index text/x-patch and text/x-diff, at least, since "git format-patch" can generate a patch series as attachments using --attach.
2019-10-31msgiter: do not assume UTF-8 if Email::MIME->body_str succeeds
ISO-2202-JP and other non-UTF-8 messages need to be displayed correctly. Fixes: 7d82a8bc04ce ('handle "multipart/mixed" messages which are not multipart')
2019-09-09run update-copyrights from gnulib for 2019
2019-01-09doc: various overview-level module comments
Hopefully this helps people familiarize themselves with the source code.
2018-12-30handle "multipart/mixed" messages which are not multipart
I've found two examples on https://lore.kernel.org/lkml/ where the messages declared themselves to be "multipart/mixed" but were actually plain text: <87llgalspt.fsf@free.fr> <200308111450.h7BEoOu20077@mail.osdl.org> With the mboxrd downloaded, mutt is able to view them without difficulty. Note: this change would require reindexing of Xapian to pick up the changes. But it's only two ancient messages, the first was resent by the original sender and the second is too old to be relevant.
2018-02-07update copyrights for 2018
Using update-copyrights from gnulib While we're at it, use the SPDX identifier for AGPL-3.0+ to ease mechanical processing.
2017-01-10introduce PublicInbox::MIME wrapper class
This should fix problems with multipart messages where text/plain parts lack a header. cf. git clone --mirror https://github.com/rjbs/Email-MIME.git refs/pull/28/head In the future, we may still introduce as streaming interface to reduce memory usage on large emails.
2016-06-17msg_iter: support read-only elements
Apparently, it's possible to have read-only bodies in Email::MIME objects. Haven't gotten a chance to reliably reproduce it, though...
2016-05-19msg_iter: workaround broken Email::MIME versions
Email::MIME >= 1.923 and < 1.935 would drop too many newlines in attachments. This would lead to ugly text files without a proper trailing newline if using quoted-printable, 7bit, or 8bit. Attachments encoded with base64 were not affected. These versions of Email::MIME are widely available in Debian 8 (Jessie) and even Ubuntu LTS distros so we will need to support this workaround for a while.
2016-05-19msg_iter: new internal API for iterating through MIME
Unlike Email::MIME::walk_parts, this is non-recursive and gives depth + index offset information about the part for creating links for later retrieval It is intended for read-only access and changes are not propagated to the parent; however future versions of it may clobber bodies or the original version as it iterates to reduce memory overhead. It is intended for making it easy to locate attachments within a message in the WWW view.