From: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
To: Eric Wong <e@yhbt.net>
Cc: meta@public-inbox.org
Subject: Re: setting up mailman-to-atom-converter then atom-to-public-inbox
Date: Tue, 4 Feb 2020 21:49:56 +0000
Message-ID: <CAPweEDw0yRq2tKRcumuxH523PLkvRb3saXVzM86LXYQReYKPaQ@mail.gmail.com>
In-Reply-To: <20200204205541.GB27797@dcvr>

On Tue, Feb 4, 2020 at 9:05 PM Eric Wong <e@yhbt.net> wrote:
> Luke Kenneth Casson Leighton <lkcl@lkcl.net> wrote:
> > hi, just as the subject says, i'm currently modifying mailman_rss to
> > support atom and would like to set it up on libre-soc.org shortly.
> >
> > firstly: very grateful that public-inbox even exists, it is kinda
> > important to have really, really simple offline archives of project
> > mailing lists.
>
> You're welcome :>
>
> > second: i have no idea how to go about setting it up :)
>
> Once installed, "public-inbox-init" should get you started.
> From there, you can decide how you want to inject mail into
> it...
ahh excellent.... err... err.... man public-inbox-config only lists
Maildir, not mbox?
> We should be able to clarify anything else here, just ask,
> and we can try to make the docs better :>
> Fwiw, I also started working on a mail flow diagram yesterday,
> which may help:
>
> https://public-inbox.org/flow.txt
excellent. very useful.
> > third: sigh, i have two unknowns (three), because i am actually
> > modifying mailman_rss to support atom, *and* i would prefer not to
> > overload my server by splitting up the creation of atom feeds into
> > multiple separate processing sections (by month) *and* i have no idea
> > if public-inbox can support feeds-of-feeds.
>
> This is your Mailman server?
yes
> If so, mbox or Maildir archives
> would be MUCH easier to convert and it would preserve
> Message-Id, References, and In-Reply-To headers for proper
> message threading.
errr... errr doh! ok so the mbox archives are private under one
account and i need to publish them via... gitweb, so that's ok.
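To illustrate the point about mbox archives keeping the threading headers, here is a minimal sketch using Python's standard `mailbox` module. The function name `threading_headers` and the path passed to it are hypothetical; this is not part of public-inbox or Mailman, only a way to check what an archive actually contains.

```python
import mailbox


def threading_headers(mbox_path):
    """Yield (Message-ID, In-Reply-To, References) for each message in an mbox.

    Missing headers come back as None. The mbox is opened without
    creating it, so a wrong path raises instead of making an empty file.
    """
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for msg in box:
            yield (
                msg.get("Message-ID"),
                msg.get("In-Reply-To"),
                msg.get("References"),
            )
    finally:
        box.close()
```

Running it over one monthly archive and printing the tuples is a quick way to confirm that the three headers survived, which an Atom or RSS round trip would not guarantee.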
> public-inbox doesn't have any ability to parse Atom or RSS right
> now, it only generates Atom.
aw doh! that's where i got the impression i had to *read* the atom
feed (doh). well, i have some nice modifications to mailman_rss which
uses a generic "Feed" python module i found, i will publish later :)
> Parsing Atom (or RSS) would not preserve headers necessary for
> proper threading, since Atom threading headers (RFC4685) don't
> reliably map back to the aforementioned mail headers.
red herring....
> > to explain / unpack that: here's how i would envisage the workflow so
> > as to minimise the server load:
> >
> > * cron job goes through the monthly mailman archives *by month*
> > performing a re-creation *only* of the latest month's atom feed
> > * same cron job adds to a "global" atom file containing "links to the
> > monthly atom files"
> > * public-inbox sees that list-of-monthly-atom-files
> > * public-inbox walks the "tree" of monthly atom files, grabbing each one in turn
> > * public-inbox loads all messages from all monthly atom files.
>
> s/atom/mbox/ and that's close to a planned feature.
oh superb.
> I'm not sure why the global index file is necessary, though,
> since the tree structure is predictable (YYYY/MM or similar)
i was imagining that there would be a way to reduce network traffic
however i realise now that you're running the cron job actually on the
machine, directly on the .mbox file.
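The "predictable tree structure" argument can be sketched as follows: if the monthly files follow a fixed naming scheme, a cron job can compute every path from the dates alone and needs no global index. The `archives/YYYY/MM.mbox` layout and both function names below are assumptions made for illustration; a real Mailman installation may name its monthly archives differently.

```python
from datetime import date


def month_paths(start, end, root="archives"):
    """Return 'root/YYYY/MM.mbox' for every month from start to end inclusive."""
    paths = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        paths.append(f"{root}/{year:04d}/{month:02d}.mbox")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return paths


def latest_month_path(today, root="archives"):
    """Return the path for the current month, the only file a cron run must redo."""
    return f"{root}/{today.year:04d}/{today.month:02d}.mbox"
```

A one-off import would walk `month_paths(first_post_date, date.today())`, while the recurring job would only touch `latest_month_path(date.today())`.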
> public-inbox itself uses the Email::MIME module, which
> unfortunately requires reading an entire RFC-2822 message into
> memory (and we only work on one full message at a time).
*shudder* :)
> Beyond that, the message threading in the HTML output
> (non-recursive JWZ-variant) works on a batch of 1000 message
> skeletons (subset of headers), and few threads are that big.
yehyeh.
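For readers unfamiliar with threading on "message skeletons", the sketch below shows the general idea on a small batch: link each message to a parent using only its References and In-Reply-To values, then walk the result with an explicit stack rather than recursion. This is a simplified parent-linking scheme, not the full JWZ algorithm and not public-inbox's implementation; the dict keys (`mid`, `in_reply_to`, `references`) and function names are hypothetical.

```python
def link_parents(skeletons):
    """Link each skeleton to a parent that is present in the same batch.

    skeletons: list of dicts with 'mid', 'in_reply_to' (str or None) and
    'references' (list of Message-IDs, oldest first).
    Returns (roots, children), where children maps a parent mid to child mids.
    """
    known = {s["mid"] for s in skeletons}
    roots, children = [], {}
    for s in skeletons:
        # Prefer the last References entry, then fall back to In-Reply-To.
        candidates = list(reversed(s.get("references") or []))
        if s.get("in_reply_to"):
            candidates.append(s["in_reply_to"])
        parent = next(
            (c for c in candidates if c in known and c != s["mid"]), None
        )
        if parent is None:
            roots.append(s["mid"])
        else:
            children.setdefault(parent, []).append(s["mid"])
    return roots, children


def flatten(roots, children):
    """Return (depth, mid) pairs in thread order, using a stack, not recursion.

    Limitation of this sketch: messages that only reference each other
    in a cycle never become roots and are therefore not emitted.
    """
    out = []
    stack = [(r, 0) for r in reversed(roots)]
    while stack:
        mid, depth = stack.pop()
        out.append((depth, mid))
        for child in reversed(children.get(mid, [])):
            stack.append((child, depth + 1))
    return out
```

Because only a handful of header values per message are needed, a batch of skeletons stays small even when the full messages are large.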
okaay, so i'm looking at man public-inbox-config, it says "only
supports Maildir". grep the source, there's something about
PublicInbox::Import.pm?
ngggh how am i going to get mbox files in / watched?
thanks eric.
l.
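One possible answer to the mbox question above, offered as a sketch rather than as the project's recommended route: since the man page only mentions Maildir, the mbox archive could be copied into a Maildir first and that directory watched instead. The code uses only Python's standard `mailbox` module; the function name `mbox_to_maildir` and both paths are hypothetical, and nothing here calls into public-inbox itself.

```python
import mailbox


def mbox_to_maildir(mbox_path, maildir_path):
    """Copy every message from an mbox file into a Maildir; return the count.

    The Maildir (with its tmp/new/cur subdirectories) is created if it
    does not exist. The source mbox is opened without creating it, so a
    wrong path raises instead of silently producing an empty mailbox.
    """
    src = mailbox.mbox(mbox_path, create=False)
    dst = mailbox.Maildir(maildir_path, create=True)
    count = 0
    try:
        for msg in src:
            dst.add(msg)  # headers such as Message-ID are copied as-is
            count += 1
    finally:
        src.close()
        dst.close()
    return count
```

Running this twice on the same mbox would add every message to the Maildir twice, so a real cron job would need its own record of what has already been copied.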