user/dev discussion of public-inbox itself
From: "Thomas Weißschuh" <thomas@t-8ch.de>
To: Eric Wong <e@80x24.org>
Cc: meta@public-inbox.org
Subject: Re: Add "generator" information to HTML pages
Date: Sun, 8 Jan 2023 21:54:19 +0000
Message-ID: <20230108215419.gcnbpk7er7f7davy@snowball.t-8ch.de>
In-Reply-To: <20230108205804.M144044@dcvr>

On Sun, Jan 08, 2023 at 08:58:04PM +0000, Eric Wong wrote:
> Thomas Weißschuh <thomas@t-8ch.de> wrote:
> > On Sun, Jan 08, 2023 at 07:47:38PM +0000, Eric Wong wrote:
> > > Thomas Weißschuh <thomas@t-8ch.de> wrote:
> > > > it would be nice if public-inbox could extend the HTML pages it
> > > > generates with the "generator" meta tag [0].
> > > > Especially the version would be useful.
> > > > 
> > > > This would help users during debugging to see the specific version of
> > > > public-inbox they are looking at.
> > > 
> > > What would users be debugging?
> > > Admins would be the only ones who care, I think...
> > 
> > Since recently my mails to linux-kernel@vger.kernel.org that should end
> > up on public-inbox on https://lore.kernel.org/lkml/ don't do so.
> > They are accepted by the mail server on vger.kernel.org but never end up
> > in the archives.
> > I suspect some interactions between b4 which is used to generate the
> > mails, the unicode characters in my name and public-inbox to be the
> > culprit.
> 
> Your mail seems fine to my server, but coming from an IPv6
> address has caused problems with some other servers in the past.
> Another potential thing might be your use of utf-8 in the From:
> header, while your Content-Type: is iso-8859-1 for the body.

I think I found the culprit, and it is indeed the b4 tool, or rather the
Python email library it uses.
I'm posting it here because you might know whether this is
standards-conformant, or whether it would be reasonable to carry a
workaround inside public-inbox.

When b4 passes the message to Python's email.message.EmailMessage, the
'To' header is just one long, unencoded string containing all recipients
and their Unicode names.
EmailMessage then makes sure that this string forms a legal email
header value: it wraps long lines and applies the special header
encoding for UTF-8 text (RFC 2047 encoded-words such as =?utf-8?q?...?=).
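To reproduce this outside b4, here is a minimal sketch. It assumes that b4 assigns the whole recipient string to msg["To"] on an EmailMessage using the default policy. That is an assumption: b4's exact policy settings are not shown in this thread. fold_header() is a hypothetical helper built only on stdlib calls (EmailMessage item assignment and the header object's fold(policy=...) method).

```python
from email import policy
from email.message import EmailMessage

# The example input from the report: 74 ASCII digits, then ", ä".
SAMPLE = "0123456789" * 7 + "0123" + ", ä"


def fold_header(name: str, value: str) -> str:
    """Serialize one header the way EmailMessage does under policy.default.

    Assigning a str to msg[name] parses it with the policy's header
    registry (an address list for To/Cc); fold() then wraps it to
    max_line_length (78 by default) and, because utf8 is False, emits
    non-ASCII text as RFC 2047 encoded-words where the syntax allows it.
    """
    msg = EmailMessage(policy=policy.default)
    msg[name] = value
    return msg[name].fold(policy=policy.default)
```

Running print(fold_header("To", SAMPLE)) shows how the installed Python folds the example. Whether the continuation line begins with =?utf-8?q?=2C?= may depend on the Python version, so no output is claimed here.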

However, if (and only if) a header line contains a Unicode character
and the first character of a wrapped line is a comma (,), then that
comma is also UTF-8 encoded.

Example input:
01234567890123456789012345678901234567890123456789012345678901234567890123, ä

Example output:
01234567890123456789012345678901234567890123456789012345678901234567890123
 =?utf-8?q?=2C?= =?utf-8?q?=C3=A4?=

I expect this is a bug in the Python library, but maybe it is
correct.
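Here is a minimal sketch for spotting this pattern in a raw header. Both helpers are hypothetical and use only email.header.decode_header and email.utils.getaddresses.

- encoded_commas() lists the encoded-words whose decoded content is a bare comma.
- lenient_addresses() illustrates the kind of workaround asked about above. It is not something public-inbox or b4 actually does. It decodes every encoded-word before splitting the address list, which recovers the separator. However, it would also split a display name that legitimately contains an encoded comma, so it trades one misparse for another.

```python
import re
from email.header import decode_header
from email.utils import getaddresses

# An RFC 2047 encoded-word: =?charset?Q-or-B?encoded-text?=
ENCODED_WORD = re.compile(r"=\?[^?]+\?[qQbB]\?[^?]*\?=")


def encoded_commas(raw_header: str) -> list[str]:
    """Return the encoded-words in a raw header that decode to a lone comma."""
    hits = []
    for word in ENCODED_WORD.findall(raw_header):
        # A single encoded-word decodes to exactly one (bytes, charset) pair.
        [(data, charset)] = decode_header(word)
        if data.decode(charset or "ascii", errors="replace") == ",":
            hits.append(word)
    return hits


def lenient_addresses(raw_header: str) -> list[tuple[str, str]]:
    """Hypothetical workaround: decode all encoded-words, then split addresses.

    Caution: commas inside encoded display names also become separators.
    """
    decoded = "".join(
        # decode_header yields str parts only when nothing was encoded.
        part.decode(charset or "ascii", errors="replace")
        if isinstance(part, bytes) else part
        for part, charset in decode_header(raw_header)
    )
    return getaddresses([decoded])
```

For the example output above, encoded_commas() flags only =?utf-8?q?=2C?=, since =?utf-8?q?=C3=A4?= decodes to "ä".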

> > This is what I wanted to reproduce locally, for which exact versions
> > would have been nice.
> 
> I remember Konstantin has cherry-picked some commits from
> public-inbox.git in the past, and I suspect he already
> has https://public-inbox.org/meta/20221124213155.M736847@dcvr/
> ("eml: header_raw converts octets to Perl UTF-8") for SMTPUTF8
> 
> One thing I wouldn't be opposed to doing is adding a way to
> download all loaded files in a tarball as a means for AGPL
> enforcement.  The tricky thing is those files may change on disk
> after loading (and often does in my case :x), so they'd need to
> be copied into stable storage at startup (and updated if there's
> lazy-loading).  Same security caveats apply, though.
> 
> > > I also don't like wasting memory+bandwidth on things most users
> > > won't see or care about.  This is especially true for stuff at
> > > the beginning of the output since that's most likely to succeed
> > > in being transferred.
> > 
> > Fair enough.
> > The loading speed of public-inbox is really great, let's keep it that
> > way.
> 
> Good to know it's great for you.  It's still too slow for me,
> but I'm anti-consumerist and refuse to follow Moore's law :x
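The idea quoted above is to copy every loaded file into stable storage at startup, so that a tarball can be offered for AGPL enforcement. It could be sketched roughly as follows. public-inbox itself is written in Perl, where %INC records the files that have been loaded. This Python analog walks sys.modules instead and keeps the copies in memory. All function names here are hypothetical.

```python
import io
import sys
import tarfile
import time
from typing import Dict, Optional


def snapshot_loaded_sources(snap: Optional[Dict[str, bytes]] = None) -> Dict[str, bytes]:
    """Copy the source of every loaded .py module into memory.

    Pass an existing snapshot to add only modules loaded since it was taken;
    files already captured are not re-read, so later edits on disk are ignored.
    """
    snap = {} if snap is None else snap
    for mod in list(sys.modules.values()):  # copy: sys.modules may change meanwhile
        path = getattr(mod, "__file__", None)
        if not path or not path.endswith(".py") or path in snap:
            continue
        try:
            with open(path, "rb") as f:
                snap[path] = f.read()
        except OSError:
            continue  # e.g. imported from a zip, or removed since import
    return snap


def tarball_from_snapshot(snap: Dict[str, bytes]) -> bytes:
    """Build a .tar.gz from the in-memory snapshot, not from the live files."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for path, data in sorted(snap.items()):
            info = tarfile.TarInfo(name=path.lstrip("/"))
            info.size = len(data)
            info.mtime = int(time.time())
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()
```

Calling snapshot_loaded_sources(snap) again after lazy imports adds only the newly loaded files. Taking the first snapshot as early as possible narrows, but does not close, the window in which a file could change between being loaded and being copied.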


Thread overview: 5+ messages
2023-01-08 19:04 Add "generator" information to HTML pages Thomas Weißschuh
2023-01-08 19:47 ` Eric Wong
2023-01-08 20:02   ` Thomas Weißschuh
2023-01-08 20:58     ` Eric Wong
2023-01-08 21:54       ` Thomas Weißschuh [this message]
