Date | Commit message (Collapse) |
|
The regexp in split_quotes relies on the presence of a
final "\n", so add it wherever we need to instead of
making it the responsibility of every caller.
This probably doesn't matter in practice since every
email seems to have a "\n" as the final byte (due to
the way SMTP works), but maybe there's some odd ones
that'll get imported via lei.
|
|
Some poorly-configured MUAs will send application/octet-stream
even for text-only attachments. We can't make expect all MUAs
are configured with proper MIME types, and there is plenty of
historical mail that falls into this unfortunate criteria.
v2: simplify the check and ensures returned text is Perl "utf8"
|
|
Using "make update-copyrights" after setting GNULIB_PATH in my
config.mak
|
|
Instead of counts starting at 0, we start the single-part
message at 1 like we do with subparts of a multipart message.
This will make it easier to map offsets for "BODY[$SECTION]"
when using IMAP FETCH, since $SECTION must contain non-zero
numbers according to RFC 3501.
This doesn't make any difference for WWW URLs, since single part
messages cannot have downloadable attachments.
|
|
Email::MIME never supported this properly, but there's real
instances of forwarded messages as message/rfc822 attachments.
message/news is legacy thing which we'll see in archives, and
message/global appears to be the new thing.
gmime also supports message/rfc2822, so we'll support it anyways
despite lacking other evidence of its existence.
Existing attachments remain downloadable as a whole message,
but individual attachments of subparts are now downloadable
and can be displayed in HTML, too.
Furthermore, ensure Xapian can now search for common headers
inside those messages as well as the message bodies.
|
|
This doesn't make any difference for most multipart
messages (or any single part messages). However,
this starts having space savings when parts start
nesting.
It also slightly simplifies callers.
|
|
The reliance on Email::MIME->subparts is a tad inefficient with
a work-in-progress module to replace Email::MIME. So move
towards using ->each_part as a class-specific iterator which can
take advantage of more class-specific optimizations in the
yet-to-be-revealed PublicInbox::Eml and PublicInbox::Gmime
classes.
The msg_iter() sub remains for compatibility with existing
3rd-party scripts/modules which use our small public Perl API
and Email::MIME.
|
|
These seem mostly harmless since Perl will just truncate the
match and start a new one on a newline boundary in our case.
The only downside is we'd end up with redundant <span> tags in
HTML.
Limiting the number of line matched ourselves with `{1,$NUM}'
doesn't seem prudent since lines vary in length, so we continue
to defer the job of limiting matches to the Perl regexp engine.
I've noticed this warning in practice on 100K+ line patches to
locale data.
|
|
I didn't wait until September to do it, this year!
|
|
We're often iterating through messages while writing to another
buffer in our WWW interface, causing memory usage to multiply.
Since we know we won't need to keep the MIME object around in
some cases, and can tell msg_iter to clobber the on-stack
variable while it operates on subparts of multipart messages.
With xt/mem-msgview.t switched to multipart from the previous
commit, this shows a 13 MB memory reduction on that test.
|
|
And remove the last anonymous sub in SolverGit itself.
|
|
We want HTML parts to be downloadable, but not displayed as
unreadable (but injection-safe) HTML source in our own web
and Atom interfaces.
This affects indexing, too, as HTML tags/comments won't be
indexed anymore, but existing indices are only cleaned after
--reindex. HTML-only mail won't be indexed at all, but we won't
cross that bridge until somebody cares about that crap. We'll
continue to actively discourage such waste of CPU cycles,
bandwidth, cache and storage.
Fixes: 7d82a8bc04ce2e68 (handle "multipart/mixed" messages which are not multipart')
|
|
We want to index text/x-patch and text/x-diff, at least,
since "git format-patch" can generate a patch series as
attachments using --attach.
|
|
ISO-2202-JP and other non-UTF-8 messages need to be displayed
correctly.
Fixes: 7d82a8bc04ce ('handle "multipart/mixed" messages which are not multipart')
|
|
|
|
Hopefully this helps people familiarize themselves with
the source code.
|
|
I've found two examples on https://lore.kernel.org/lkml/
where the messages declared themselves to be "multipart/mixed"
but were actually plain text:
<87llgalspt.fsf@free.fr>
<200308111450.h7BEoOu20077@mail.osdl.org>
With the mboxrd downloaded, mutt is able to view them without
difficulty.
Note: this change would require reindexing of Xapian to pick up
the changes. But it's only two ancient messages, the first was
resent by the original sender and the second is too old to be
relevant.
|
|
Using update-copyrights from gnulib
While we're at it, use the SPDX identifier for AGPL-3.0+ to
ease mechanical processing.
|
|
This should fix problems with multipart messages where
text/plain parts lack a header.
cf. git clone --mirror https://github.com/rjbs/Email-MIME.git
refs/pull/28/head
In the future, we may still introduce as streaming
interface to reduce memory usage on large emails.
|
|
Apparently, it's possible to have read-only bodies in
Email::MIME objects. Haven't gotten a chance to reliably
reproduce it, though...
|
|
Email::MIME >= 1.923 and < 1.935 would drop too many newlines
in attachments. This would lead to ugly text files without
a proper trailing newline if using quoted-printable, 7bit, or
8bit. Attachments encoded with base64 were not affected.
These versions of Email::MIME are widely available in Debian 8
(Jessie) and even Ubuntu LTS distros so we will need to support
this workaround for a while.
|
|
Unlike Email::MIME::walk_parts, this is non-recursive and gives
depth + index offset information about the part for creating
links for later retrieval
It is intended for read-only access and changes are not
propagated to the parent; however future versions of it
may clobber bodies or the original version as it iterates
to reduce memory overhead.
It is intended for making it easy to locate attachments within a
message in the WWW view.
|