user/dev discussion of public-inbox itself
* Add "generator" information to HTML pages
@ 2023-01-08 19:04 Thomas Weißschuh
  2023-01-08 19:47 ` Eric Wong
  0 siblings, 1 reply; 5+ messages in thread
From: Thomas Weißschuh @ 2023-01-08 19:04 UTC
  To: meta

Hi,

it would be nice if public-inbox could extend the HTML pages it
generates with the "generator" meta tag [0].
Especially the version would be useful.

This would help users during debugging to see the specific version of
public-inbox they are looking at.

For example:

<head>
  <title>Some page</title>
  <meta name="generator" content="public-inbox 1.9.0" />
</head>

[0] https://html.spec.whatwg.org/multipage/semantics.html#meta-generator
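A minimal sketch of how a client could read such a tag, assuming a page carried it as in the example above. It uses only Python's standard html.parser module; the names GeneratorFinder and find_generator are made up for illustration and are not part of public-inbox.

```python
from html.parser import HTMLParser


class GeneratorFinder(HTMLParser):
    """Remember the content of the first <meta name="generator"> tag."""

    def __init__(self):
        super().__init__()
        self.generator = None

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> are routed here as well.
        if tag != "meta" or self.generator is not None:
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "generator":
            self.generator = attributes.get("content")


def find_generator(html):
    """Return the generator string of an HTML document, or None."""
    finder = GeneratorFinder()
    finder.feed(html)
    finder.close()
    return finder.generator


# The example page from the proposal above.
page = """<head>
  <title>Some page</title>
  <meta name="generator" content="public-inbox 1.9.0" />
</head>"""

print(find_generator(page))
```

A page without the tag yields None, so a caller can tell "no version advertised" apart from a concrete version string.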

Thanks,
Thomas

* Re: Add "generator" information to HTML pages
  2023-01-08 19:04 Add "generator" information to HTML pages Thomas Weißschuh
@ 2023-01-08 19:47 ` Eric Wong
  2023-01-08 20:02   ` Thomas Weißschuh
  0 siblings, 1 reply; 5+ messages in thread
From: Eric Wong @ 2023-01-08 19:47 UTC
  To: Thomas Weißschuh; +Cc: meta

Thomas Weißschuh <thomas@t-8ch.de> wrote:
> Hi,
> 
> it would be nice if public-inbox could extend the HTML pages it
> generates with the "generator" meta tag [0].
> Especially the version would be useful.
> 
> This would help users during debugging to see the specific version of
> public-inbox they are looking at.

What would users be debugging?
Admins would be the only ones who care, I think...

Version info becomes worthless if an admin blocks/alters certain
endpoints via nginx/varnish or just editing the code.

> For example:
> 
> <head>
>   <title>Some page</title>
>   <meta name="generator" content="public-inbox 1.9.0" />
> </head>

I prefer to disclose as little information as possible in case
vulnerabilities are found.  Alone, security by obscurity doesn't work,
but obscurity does make things more difficult for attackers
(same reason camouflage exists).

I also don't like wasting memory+bandwidth on things most users
won't see or care about.  This is especially true for stuff at
the beginning of the output since that's most likely to succeed
in being transferred.

> [0] https://html.spec.whatwg.org/multipage/semantics.html#meta-generator
> 
> Thanks,
> Thomas

* Re: Add "generator" information to HTML pages
  2023-01-08 19:47 ` Eric Wong
@ 2023-01-08 20:02   ` Thomas Weißschuh
  2023-01-08 20:58     ` Eric Wong
  0 siblings, 1 reply; 5+ messages in thread
From: Thomas Weißschuh @ 2023-01-08 20:02 UTC
  To: Eric Wong; +Cc: meta

Hi Eric,

On Sun, Jan 08, 2023 at 07:47:38PM +0000, Eric Wong wrote:
> Thomas Weißschuh <thomas@t-8ch.de> wrote:
> > Hi,
> > 
> > it would be nice if public-inbox could extend the HTML pages it
> > generates with the "generator" meta tag [0].
> > Especially the version would be useful.
> > 
> > This would help users during debugging to see the specific version of
> > public-inbox they are looking at.
> 
> What would users be debugging?
> Admins would be the only ones who care, I think...

Since recently my mails to linux-kernel@vger.kernel.org that should end
up on public-inbox on https://lore.kernel.org/lkml/ don't do so.
They are accepted by the mail server on vger.kernel.org but never end up
in the archives.
I suspect some interactions between b4 which is used to generate the
mails, the unicode characters in my name and public-inbox to be the
culprit.

This is what I wanted to reproduce locally, for which exact versions
would have been nice.

> Version info becomes worthless if an admin blocks/alters certain
> endpoints via nginx/varnish or just editing the code.
> 
> > For example:
> > 
> > <head>
> >   <title>Some page</title>
> >   <meta name="generator" content="public-inbox 1.9.0" />
> > </head>
> 
> I prefer to disclose as little information as possible in case
> vulnerabilities are found.  Alone, security by obscurity doesn't work,
> but obscurity does make things more difficult for attackers
> (same reason camouflage exists).
> 
> I also don't like wasting memory+bandwidth on things most users
> won't see or care about.  This is especially true for stuff at
> the beginning of the output since that's most likely to succeed
> in being transferred.

Fair enough.
The loading speed of public-inbox is really great, let's keep it that
way.

> > [0] https://html.spec.whatwg.org/multipage/semantics.html#meta-generator

@Konstantin, if you read this:
I'll send a proper bug report to tools@linux.kernel.org soonish.

Thanks,
Thomas

* Re: Add "generator" information to HTML pages
  2023-01-08 20:02   ` Thomas Weißschuh
@ 2023-01-08 20:58     ` Eric Wong
  2023-01-08 21:54       ` Thomas Weißschuh
  0 siblings, 1 reply; 5+ messages in thread
From: Eric Wong @ 2023-01-08 20:58 UTC
  To: Thomas Weißschuh; +Cc: meta

Thomas Weißschuh <thomas@t-8ch.de> wrote:
> On Sun, Jan 08, 2023 at 07:47:38PM +0000, Eric Wong wrote:
> > Thomas Weißschuh <thomas@t-8ch.de> wrote:
> > > Hi,
> > > 
> > > it would be nice if public-inbox could extend the HTML pages it
> > > generates with the "generator" meta tag [0].
> > > Especially the version would be useful.
> > > 
> > > This would help users during debugging to see the specific version of
> > > public-inbox they are looking at.
> > 
> > What would users be debugging?
> > Admins would be the only ones who care, I think...
> 
> Since recently my mails to linux-kernel@vger.kernel.org that should end
> up on public-inbox on https://lore.kernel.org/lkml/ don't do so.
> They are accepted by the mail server on vger.kernel.org but never end up
> in the archives.
> I suspect some interactions between b4 which is used to generate the
> mails, the unicode characters in my name and public-inbox to be the
> culprit.

Your mail seems fine to my server, but coming from an IPv6
address has caused problems with some other servers in the past.
Another potential thing might be your use of utf-8 in the From:
header, while your Content-Type: is iso-8859-1 for the body.
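To illustrate the charset question: an RFC 2047 encoded word in a header carries its own charset label, separate from the charset parameter of the body's Content-Type. The sketch below uses Python's standard email package and a made-up message (the address and body text are assumptions, not taken from the mail in question) to decode both independently. It does not cover raw, unencoded UTF-8 in headers (SMTPUTF8), which individual servers may treat differently.

```python
from email import message_from_bytes
from email.policy import default

# Hypothetical message: UTF-8 encoded word in From:, ISO-8859-1 body.
raw = (
    b"From: =?utf-8?q?Thomas_Wei=C3=9Fschuh?= <thomas@example.org>\n"
    b"Content-Type: text/plain; charset=iso-8859-1\n"
    b"Content-Transfer-Encoding: 8bit\n"
    b"\n"
    b"Gr\xfc\xdfe\n"
)

msg = message_from_bytes(raw, policy=default)

# The header is decoded using the charset named inside the encoded word.
display_name = msg["From"].addresses[0].display_name

# The body is decoded using the charset from Content-Type.
body = msg.get_content()

print(display_name)
print(body)
```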

> This is what I wanted to reproduce locally, for which exact versions
> would have been nice.

I remember Konstantin has cherry-picked some commits from
public-inbox.git in the past, and I suspect he already
has https://public-inbox.org/meta/20221124213155.M736847@dcvr/
("eml: header_raw converts octets to Perl UTF-8") for SMTPUTF8

One thing I wouldn't be opposed to doing is adding a way to
download all loaded files in a tarball as a means for AGPL
enforcement.  The tricky thing is those files may change on disk
after loading (and often do in my case :x), so they'd need to
be copied into stable storage at startup (and updated if there's
lazy-loading).  Same security caveats apply, though.
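The "copy into stable storage at startup" idea can be sketched roughly as follows. public-inbox itself is written in Perl; Python is used here only for illustration, and the function names loaded_source_files and snapshot are hypothetical. The sketch packs the files into an in-memory tarball once, so later changes on disk do not affect what would be served; it does not handle lazy-loading, which would need the snapshot to be refreshed.

```python
import io
import sys
import tarfile
from pathlib import Path


def loaded_source_files():
    """List the .py files behind the modules loaded so far."""
    files = set()
    for module in list(sys.modules.values()):
        name = getattr(module, "__file__", None)
        if name and name.endswith(".py") and Path(name).is_file():
            files.add(Path(name).resolve())
    return sorted(files)


def snapshot(paths):
    """Copy the given files into an in-memory .tar.gz and return its bytes."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for path in paths:
            # tarfile strips the leading "/" from absolute member names.
            tar.add(str(path))
    return buf.getvalue()


# Taken once at startup; later edits on disk do not change this copy.
startup_tarball = snapshot(loaded_source_files())
print(len(startup_tarball), "bytes in snapshot")
```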

> > I also don't like wasting memory+bandwidth on things most users
> > won't see or care about.  This is especially true for stuff at
> > the beginning of the output since that's most likely to succeed
> > in being transferred.
> 
> Fair enough.
> The loading speed of public-inbox is really great, let's keep it that
> way.

Good to know it's great for you.  It's still too slow for me,
but I'm anti-consumerist and refuse to follow Moore's law :x

* Re: Add "generator" information to HTML pages
  2023-01-08 20:58     ` Eric Wong
@ 2023-01-08 21:54       ` Thomas Weißschuh
  0 siblings, 0 replies; 5+ messages in thread
From: Thomas Weißschuh @ 2023-01-08 21:54 UTC
  To: Eric Wong; +Cc: meta

On Sun, Jan 08, 2023 at 08:58:04PM +0000, Eric Wong wrote:
> Thomas Weißschuh <thomas@t-8ch.de> wrote:
> > On Sun, Jan 08, 2023 at 07:47:38PM +0000, Eric Wong wrote:
> > > Thomas Weißschuh <thomas@t-8ch.de> wrote:
> > > > it would be nice if public-inbox could extend the HTML pages it
> > > > generates with the "generator" meta tag [0].
> > > > Especially the version would be useful.
> > > > 
> > > > This would help users during debugging to see the specific version of
> > > > public-inbox they are looking at.
> > > 
> > > What would users be debugging?
> > > Admins would be the only ones who care, I think...
> > 
> > Since recently my mails to linux-kernel@vger.kernel.org that should end
> > up on public-inbox on https://lore.kernel.org/lkml/ don't do so.
> > They are accepted by the mail server on vger.kernel.org but never end up
> > in the archives.
> > I suspect some interactions between b4 which is used to generate the
> > mails, the unicode characters in my name and public-inbox to be the
> > culprit.
> 
> Your mail seems fine to my server, but coming from an IPv6
> address has caused problems with some other servers in the past.
> Another potential thing might be your use of utf-8 in the From:
> header, while your Content-Type: is iso-8859-1 for the body.

I think I found the culprit. It is indeed the b4 tool, or rather the
Python email library it uses.
I'm posting it here because you might know whether this behavior conforms
to the standards, or whether it would be reasonable to carry a workaround
in public-inbox.

When b4 passes the message to Python's email.message.EmailMessage, the
'To' header is just a long, unencoded string containing all recipients
and their Unicode names.
EmailMessage then makes sure that this string conforms to legal email
header values. It performs line wrapping and the special header UTF-8
encoding/escaping.

However, if a header line contains a Unicode character and if the first
character of a wrapped line is a comma (,), then that comma is also
UTF-8 escaped.

Example input:
01234567890123456789012345678901234567890123456789012345678901234567890123, ä

Example output:
01234567890123456789012345678901234567890123456789012345678901234567890123
 =?utf-8?q?=2C?= =?utf-8?q?=C3=A4?=

I expect this to be a bug in the Python library, but maybe it is
correct.
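A minimal sketch for checking this behavior locally, assuming only Python's standard email package. The recipient list is made up, and the helper names fold_to_header and has_encoded_comma are hypothetical. Whether a comma actually ends up inside an encoded word depends on the Python version and on where the line happens to wrap, so the sketch prints the folded header for inspection rather than assuming a result.

```python
from email.message import EmailMessage
from email.policy import default


def fold_to_header(recipients, max_line_length=78):
    """Return the 'To' header as the email library would write it out."""
    policy = default.clone(max_line_length=max_line_length)
    msg = EmailMessage(policy=policy)
    msg["To"] = recipients
    # as_string() emits the headers followed by an empty line;
    # keep only the header part.
    return msg.as_string().split("\n\n", 1)[0]


def has_encoded_comma(folded):
    """True if a continuation line starts with a comma in an encoded word."""
    return any(
        line.lstrip().lower().startswith("=?utf-8?q?=2c")
        for line in folded.splitlines()[1:]
    )


# Hypothetical recipients; two display names contain non-ASCII characters.
recipients = ", ".join([
    "Alice Example <alice@example.org>",
    "Thomas Weißschuh <thomas@example.org>",
    "Björn Beispiel <bjoern@example.org>",
    "meta@example.org",
])

folded = fold_to_header(recipients)
print(folded)
print("encoded comma on a continuation line:", has_encoded_comma(folded))
```

Varying max_line_length or the length of the names moves the wrap point, which makes it easier to hit the case where a comma starts a continuation line.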

> > This is what I wanted to reproduce locally, for which exact versions
> > would have been nice.
> 
> I remember Konstantin has cherry-picked some commits from
> public-inbox.git in the past, and I suspect he already
> has https://public-inbox.org/meta/20221124213155.M736847@dcvr/
> ("eml: header_raw converts octets to Perl UTF-8") for SMTPUTF8
> 
> One thing I wouldn't be opposed to doing is adding a way to
> download all loaded files in a tarball as a means for AGPL
> enforcement.  The tricky thing is those files may change on disk
> after loading (and often do in my case :x), so they'd need to
> be copied into stable storage at startup (and updated if there's
> lazy-loading).  Same security caveats apply, though.
> 
> > > I also don't like wasting memory+bandwidth on things most users
> > > won't see or care about.  This is especially true for stuff at
> > > the beginning of the output since that's most likely to succeed
> > > in being transferred.
> > 
> > Fair enough.
> > The loading speed of public-inbox is really great, let's keep it that
> > way.
> 
> Good to know it's great for you.  It's still too slow for me,
> but I'm anti-consumerist and refuse to follow Moore's law :x


Code repositories for project(s) associated with this public inbox

	https://80x24.org/public-inbox.git
