user/dev discussion of public-inbox itself
From: Eric Wong <e@80x24.org>
To: meta@public-inbox.org
Subject: [PATCH] msgiter: msg_part_text returns undef on text/html
Date: Wed, 18 Dec 2019 09:14:43 +0000
Message-ID: <20191218091443.12551-1-e@80x24.org>

We want HTML parts to be downloadable, but not displayed as
unreadable (but injection-safe) HTML source in our own web
and Atom interfaces.

This affects indexing, too, as HTML tags/comments won't be
indexed anymore, but existing indices are only cleaned after
--reindex.  HTML-only mail won't be indexed at all, but we won't
cross that bridge until somebody cares about that crap.  We'll
continue to actively discourage such waste of CPU cycles,
bandwidth, cache and storage.

Fixes: 7d82a8bc04ce2e68 ('handle "multipart/mixed" messages which are not multipart')
---
 lib/PublicInbox/MsgIter.pm | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/lib/PublicInbox/MsgIter.pm b/lib/PublicInbox/MsgIter.pm
index d9df32ab..6453d9f1 100644
--- a/lib/PublicInbox/MsgIter.pm
+++ b/lib/PublicInbox/MsgIter.pm
@@ -38,6 +38,11 @@ sub msg_iter ($$) {
 sub msg_part_text ($$) {
 	my ($part, $ct) = @_;
 
+	# TODO: we may offer a separate sub for people who need to index
+	# HTML-only mail, but the majority of HTML mail is multipart/alternative
+	# with a text part which we don't have to waste cycles decoding
+	return if $ct =~ m!\btext/x?html\b!;
+
 	my $s = eval { $part->body_str };
 	my $err = $@;
 

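To make the effect of the new guard concrete, here is a minimal standalone
sketch, not the actual msg_part_text.  part_text_sketch is a hypothetical
stand-in that reuses only the pattern from the patch and replaces the real
body decoding with a fixed string.  It relies on the fact that a bare
`return` yields an empty list in list context, so a caller unpacking two
values sees both as undef.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the patched sub: only the early return is
# taken from the patch; the "decoding" is faked with a fixed string.
sub part_text_sketch {
	my ($ct) = @_;
	return if $ct =~ m!\btext/x?html\b!; # same pattern as the patch
	return ('decoded body', undef);
}

for my $ct ('text/html; charset=UTF-8', 'text/xhtml', 'text/plain',
		'application/xhtml+xml', 'TEXT/HTML') {
	my ($s, $err) = part_text_sketch($ct);
	printf "%-26s %s\n", $ct, defined($s) ? 'decoded' : 'skipped';
}
```

In this sketch, text/html and text/xhtml are skipped, while text/plain and
application/xhtml+xml are not.  The pattern as written is case-sensitive,
so an uppercase "TEXT/HTML" would also fall through; whether callers
normalize the case of $ct beforehand is not shown in this patch.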