From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on dcvr.yhbt.net
X-Spam-Level:
X-Spam-ASN:
X-Spam-Status: No, score=-4.0 required=3.0 tests=ALL_TRUSTED,BAYES_00
	shortcircuit=no autolearn=ham autolearn_force=no version=3.4.2
Received: from localhost (dcvr.yhbt.net [127.0.0.1])
	by dcvr.yhbt.net (Postfix) with ESMTP id 26E4C1F4B5;
	Tue, 12 Nov 2019 22:29:33 +0000 (UTC)
Date: Tue, 12 Nov 2019 22:29:32 +0000
From: Eric Wong
To: Florian Weimer
Cc: meta@public-inbox.org
Subject: Re: Archiving HTML mail
Message-ID: <20191112222932.GA9643@dcvr>
References: <87r22ddxly.fsf@mid.deneb.enyo.de>
	<20191112210923.GA9729@dcvr>
	<874kz8eqwf.fsf@mid.deneb.enyo.de>
	<20191112215307.GA20307@dcvr>
	<871rucda03.fsf@mid.deneb.enyo.de>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
In-Reply-To: <871rucda03.fsf@mid.deneb.enyo.de>
List-Id:

Florian Weimer wrote:
> * Eric Wong:
>
> >> My feeling is that it would need some post-processing, maybe stripping
> >> image links and forms (and Javascript of course). Plus the separate
> >> domain thing for additional XSS protection (like bugzilla.mozilla.org
> >> does, IIRC). But presumably you could put the entire list archive
> >> under its own domain to avoid having to write code for that.
> >
> > That would mess up DKIM verifications if somebody is trying to
> > verify archives.
>
> You have to rewrite the HTML parts anyway, to resolve RFC 2392 cid:
> links, prior to handing them to web browsers. I don't think web
> browsers support them. Neither over HTTP, nor browsing locally.

Yeah. I guess it could be done on-the-fly at the WWW layer.
Parsing HTML is crazy expensive, though :<

> >> > Also, public-inbox-watch is designed to work in parallel with
> >> > existing mailing lists. I archive several lists (including
> >> > libc-alpha@sourceware and git@vger) this way with no special
> >> > permissions or access aside from being a regular subscriber.
> >>
> >> I feel we need to change libc-alpha to accept text/html email.
> >
> > Given there's some cross-posting to vger lists which reject HTML,
> > that could do more harm than good.
>
> Maybe. But do newcomers tend to cross-post that heavily? If they do,
> that's probably another problem.

*shrug* But I do wish it were easier to work and share ideas
across different projects and loop in folks as needed.

> > My goal is not just to get hackers into using plain-text mail,
> > but having them influence non-hackers into using plain-text
> > mail, too.
>
> On the other hand, if we reject their email, we lose a chance to
> interact with them directly and influence them.

FWIW, the admins of that server do get the original HTML
messages in ~/.public-inbox/emergency/ (or whatever PI_EMERGENCY
is).

emergency/ could be considered a "moderation queue" so the
admins could send personalized replies to legitimate senders who
got rejected. Such a message could be easier to digest than
whatever postfix sends, even with the PublicInbox::Filter::Base
rejection message.

The emergency/ for public-inbox.org is 99.9% spam and I have a
cronjob that removes messages after a few days.

When somebody does send an HTML message to meta or
test@public-inbox.org or another one of the lists I run, they
usually figure out HTML is rejected and follow up with a text
message after a few minutes.

That said, I don't attract a lot of users to any of my projects
(I hate marketing and evangelism), so the folks that show up
tend to be like-minded and willing to look past things like the
"homepage" :>