user/dev discussion of public-inbox itself
From: Luke Kenneth Casson Leighton <>
To: Eric Wong <>
Subject: Re: setting up mailman-to-atom-converter then atom-to-public-inbox
Date: Tue, 4 Feb 2020 21:49:56 +0000
Message-ID: <>
In-Reply-To: <20200204205541.GB27797@dcvr>

On Tue, Feb 4, 2020 at 9:05 PM Eric Wong <> wrote:

> Luke Kenneth Casson Leighton <> wrote:
> > hi, just as the subject says, i'm currently modifying mailman_rss to
> > support atom and would like to set it up on shortly.
> >
> > firstly: very grateful that public-inbox even exists, it is kinda
> > important to have really, really simple offline archives of project
> > mailing lists.
> You're welcome :>
> > second: i have no idea how to go about setting it up :)
> Once installed, "public-inbox-init" should get you started.
> From there, you can decide how you want to inject mail into
> it...

ahh excellent....  err... err.... man public-inbox-config only lists
Maildir, not mbox?

> We should be able to clarify anything else here, just ask,
> and we can try to make the docs better :>
> Fwiw, I also started working on a mail flow diagram yesterday,
> which may help:

excellent.  very useful.

> > third: sigh, i have two unknowns (three), because i am actually
> > modifying mailman_rss to support atom, *and* i would prefer not to
> > overload my server by splitting up the creation of atom feeds into
> > multiple separate processing sections (by month) *and* i have no idea
> > if public-inbox can support feeds-of-feeds.
> This is your Mailman server?


> If so, mbox or Maildir archives
> would be MUCH easier to convert and it would preserve
> Message-Id, References, and In-Reply-To headers for proper
> message threading.
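
To illustrate the point about threading headers, here is a minimal sketch (not from the thread) that reads them straight out of an mbox file with Python's standard `mailbox` module. The function name `threading_headers` and the header tuple are assumptions made for illustration only.

```python
import mailbox

# Headers that mail threading relies on, as named in the quoted text above.
THREAD_HEADERS = ("Message-ID", "In-Reply-To", "References")


def threading_headers(mbox_path):
    """Yield one dict of threading headers per message in an mbox file.

    Hypothetical helper: a header that is absent from a message maps to None.
    """
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for msg in box:
            # Header lookup is case-insensitive in the email package.
            yield {name: msg.get(name) for name in THREAD_HEADERS}
    finally:
        box.close()
```

Because these headers survive intact in mbox (and Maildir), no lossy mapping through a feed format is needed.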

errr... errr doh!  ok so the mbox archives are private under one
account and i need to publish them via... gitweb, so that's ok.

> public-inbox doesn't have any ability to parse Atom or RSS right
> now, it only generates Atom.

aw doh!  that's where i got the impression i had to *read* the atom
feed (doh).  well, i have some nice modifications to mailman_rss which
uses a generic "Feed" python module i found, i will publish later :)

> Parsing Atom (or RSS) would not preserve headers necessary for
> proper threading, since Atom threading headers (RFC4685) don't
> reliably map back to the aforementioned mail headers.

red herring....

> > to explain / unpack that: here's how i would envisage the workflow so
> > as to minimise the server load:
> >
> > * cron job goes through the monthly mailman archives *by month*
> > performing a re-creation *only* of the latest month's atom feed
> > * same cron job adds to a "global" atom file containing "links to the
> > monthly atom files"
> > * public-inbox sees that list-of-monthly-atom-files
> > * public-inbox walks the "tree" of monthly atom files, grabbing each one in turn
> > * public-inbox loads all messages from all monthly atom files.
> s/atom/mbox/ and that's close to a planned feature.

oh superb.

> I'm not sure why the global index file is necessary, though,
> since the tree structure is predictable (YYYY/MM or similar)

i was imagining that there would be a way to reduce network traffic,
however i realise now that you're actually running the cron job on the
machine itself, directly on the .mbox file.
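
As a sketch of why a predictable tree makes the global index unnecessary: the monthly paths can simply be computed from a date range. The `YYYY/MM` layout follows the "YYYY/MM or similar" wording above; the `.mbox` suffix and the function name `monthly_paths` are assumptions, not the layout of any real archive.

```python
from datetime import date


def monthly_paths(start, end, suffix=".mbox"):
    """Return 'YYYY/MM<suffix>' for every month from start to end, inclusive.

    Hypothetical helper: the naming scheme is assumed for illustration.
    """
    paths = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        paths.append(f"{year:04d}/{month:02d}{suffix}")
        month += 1
        if month > 12:
            # Roll over into January of the next year.
            year, month = year + 1, 1
    return paths
```

A cron job that only needs the latest month could call this with `start` and `end` both set to today's date.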

> public-inbox itself uses the Email::MIME module, which
> unfortunately requires reading an entire RFC-2822 message into
> memory (and we only work on one full message at a time).

*shudder* :)

> Beyond that, the message threading in the HTML output
> (non-recursive JWZ-variant) works on a batch of 1000 message
> skeletons (subset of headers), and few threads are that big.


okaay, so i'm looking at man public-inbox-config, it says "only
supports Maildir".  grep the source, there's something about

ngggh how am i going to get mbox files in / watched?
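
One possible workaround, which is not something proposed in this thread, is to convert the mbox file into a Maildir first, since the man page mentioned above only lists Maildir. The sketch below uses Python's standard `mailbox` module; the function name `mbox_to_maildir` and both paths are hypothetical.

```python
import mailbox


def mbox_to_maildir(mbox_path, maildir_path):
    """Copy every message from an mbox file into a Maildir; return the count.

    Hypothetical helper: the Maildir (with cur/, new/, tmp/) is created if
    it does not exist yet. The source mbox is left unmodified.
    """
    src = mailbox.mbox(mbox_path, create=False)
    dst = mailbox.Maildir(maildir_path, create=True)
    count = 0
    try:
        for msg in src:
            # add() converts the mbox message to Maildir form, keeping headers.
            dst.add(msg)
            count += 1
    finally:
        dst.close()
        src.close()
    return count
```

Whether the resulting Maildir is picked up as desired still depends on how the watch is configured, so this should be treated as a starting point only.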

thanks eric.


Thread overview: 12+ messages
2020-02-04 18:42 Luke Kenneth Casson Leighton
2020-02-04 20:55 ` Eric Wong
2020-02-04 21:49   ` Luke Kenneth Casson Leighton [this message]
2020-02-04 22:14     ` Eric Wong
     [not found]       ` <>
2020-02-05  0:10         ` Eric Wong
     [not found]           ` <>
2020-02-05  0:43             ` Eric Wong
2020-02-05  1:02               ` Kyle Meyer
2020-02-05  1:04                 ` Eric Wong
2020-03-10  0:07   ` setting up mailman2 and public-inbox Luke Kenneth Casson Leighton
2020-03-11 10:33     ` Eric Wong
2020-03-11 11:58       ` Luke Kenneth Casson Leighton
2020-03-11 12:47         ` Luke Kenneth Casson Leighton
