user/dev discussion of public-inbox itself
From: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
To: Eric Wong <e@yhbt.net>
Cc: meta@public-inbox.org
Subject: Re: setting up mailman-to-atom-converter then atom-to-public-inbox
Date: Tue, 4 Feb 2020 21:49:56 +0000
Message-ID: <CAPweEDw0yRq2tKRcumuxH523PLkvRb3saXVzM86LXYQReYKPaQ@mail.gmail.com>
In-Reply-To: <20200204205541.GB27797@dcvr>

On Tue, Feb 4, 2020 at 9:05 PM Eric Wong <e@yhbt.net> wrote:

> Luke Kenneth Casson Leighton <lkcl@lkcl.net> wrote:
> > hi, just as the subject says, i'm currently modifying mailman_rss to
> > support atom and would like to set it up on libre-soc.org shortly.
> >
> > firstly: very grateful that public-inbox even exists, it is kinda
> > important to have really, really simple offline archives of project
> > mailing lists.
>
> You're welcome :>
>
> > second: i have no idea how to go about setting it up :)
>
> Once installed, "public-inbox-init" should get you started.
> From there, you can decide how you want to inject mail into
> it...

ahh excellent....  err... err.... man public-inbox-config only lists
Maildir, not mbox?

> We should be able to clarify anything else here, just ask,
> and we can try to make the docs better :>
> Fwiw, I also started working on a mail flow diagram yesterday,
> which may help:
>
>         https://public-inbox.org/flow.txt

excellent.  very useful.

> > third: sigh, i have two unknowns (three), because i am actually
> > modifying mailman_rss to support atom, *and* i would prefer not to
> > overload my server by splitting up the creation of atom feeds into
> > multiple separate processing sections (by month) *and* i have no idea
> > if public-inbox can support feeds-of-feeds.
>
> This is your Mailman server?

yes

> If so, mbox or Maildir archives
> would be MUCH easier to convert and it would preserve
> Message-Id, References, and In-Reply-To headers for proper
> message threading.

errr... errr doh!  ok so the mbox archives are private under one
account and i need to publish them via... gitweb, so that's ok.
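To illustrate the point about mbox preserving the threading headers, here is a minimal sketch using Python's standard `mailbox` module. The function name `threading_headers` and the file path are hypothetical; nothing here is part of public-inbox or Mailman.

```python
import mailbox


def threading_headers(mbox_path):
    """Return (Message-ID, In-Reply-To, References) for each message in an mbox.

    Messages are parsed one at a time; a missing header is reported as None.
    """
    # create=False: fail loudly instead of silently creating an empty file.
    box = mailbox.mbox(mbox_path, create=False)
    try:
        return [
            (msg.get("Message-ID"), msg.get("In-Reply-To"), msg.get("References"))
            for msg in box
        ]
    finally:
        box.close()
```

Calling `threading_headers("/path/to/list.mbox")` on a real archive shows whether the three headers survived, which is exactly what an Atom or RSS round trip would put at risk.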

> public-inbox doesn't have any ability to parse Atom or RSS right
> now, it only generates Atom.

aw doh!  that's where i got the impression i had to *read* the atom
feed (doh).  well, i have some nice modifications to mailman_rss which
uses a generic "Feed" python module i found, i will publish later :)

> Parsing Atom (or RSS) would not preserve headers necessary for
> proper threading, since Atom threading headers (RFC4685) don't
> reliably map back to the aforementioned mail headers.

red herring....

> > to explain / unpack that: here's how i would envisage the workflow so
> > as to minimise the server load:
> >
> > * cron job goes through the monthly mailman archives *by month*
> > performing a re-creation *only* of the latest month's atom feed
> > * same cron job adds to a "global" atom file containing "links to the
> > monthly atom files"
> > * public-inbox sees that list-of-monthly-atom-files
> > * public-inbox walks the "tree" of monthly atom files, grabbing each one in turn
> > * public-inbox loads all messages from all monthly atom files.
>
> s/atom/mbox/ and that's close to a planned feature.

oh superb.

> I'm not sure why the global index file is necessary, though,
> since the tree structure is predictable (YYYY/MM or similar)

i was imagining that there would be a way to reduce network traffic
however i realise now that you're running the cron job actually on the
machine, directly on the .mbox file.
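A rough sketch of the "only touch the latest month" idea, run locally against a predictable tree. The `YYYY/MM.mbox` layout and the names `monthly_mboxes` and `latest_month` are assumptions made for illustration; a real Mailman archive directory may well be laid out differently.

```python
import re
from pathlib import Path

# Assumed (hypothetical) layout: <root>/YYYY/MM.mbox
MONTH_FILE = re.compile(r"^(\d{4})/(\d{2})\.mbox$")


def monthly_mboxes(root):
    """Return [(year, month, path)] for every monthly mbox, oldest first."""
    root = Path(root)
    found = []
    for path in root.glob("*/*.mbox"):
        match = MONTH_FILE.match(path.relative_to(root).as_posix())
        if match:
            found.append((int(match.group(1)), int(match.group(2)), path))
    # Sort on (year, month) only; the path is just carried along.
    return sorted(found, key=lambda item: item[:2])


def latest_month(root):
    """Return the path of the newest monthly mbox, or None if there is none."""
    months = monthly_mboxes(root)
    return months[-1][2] if months else None
```

Because the names sort predictably, a cron job can find the newest file without a separate global index, which matches Eric's remark about the tree structure.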

> public-inbox itself uses the Email::MIME module, which
> unfortunately requires reading an entire RFC-2822 message into
> memory (and we only work on one full message at a time).

*shudder* :)

> Beyond that, the message threading in the HTML output
> (non-recursive JWZ-variant) works on a batch of 1000 message
> skeletons (subset of headers), and few threads are that big.

yehyeh.
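For readers unfamiliar with the "skeleton" idea, here is a much-simplified sketch: keep only the threading headers of each message, then order a batch without recursion. This is not the JWZ algorithm and not public-inbox's code; the names `skeleton` and `thread_order` are hypothetical, and the parent is simply taken from In-Reply-To, falling back to the last References entry.

```python
import re
from email import message_from_string

MSGID = re.compile(r"<[^<>\s]+>")


def skeleton(raw):
    """Reduce a raw message to the small subset of headers needed for threading."""
    msg = message_from_string(raw)
    mids = MSGID.findall(msg.get("Message-ID", ""))
    irt = MSGID.findall(msg.get("In-Reply-To", ""))
    refs = MSGID.findall(msg.get("References", ""))
    parent = irt[0] if irt else (refs[-1] if refs else None)
    return {"mid": mids[0] if mids else None, "parent": parent}


def thread_order(skeletons):
    """Return [(depth, mid)] in thread order, using an explicit stack.

    Simplification: reference loops are not handled; messages caught in a
    loop have no root and are left out of the result.
    """
    known = {s["mid"] for s in skeletons}
    children, roots = {}, []
    for s in skeletons:
        if s["parent"] in known and s["parent"] != s["mid"]:
            children.setdefault(s["parent"], []).append(s["mid"])
        else:
            roots.append(s["mid"])  # no parent in this batch: treat as a root
    ordered, seen = [], set()
    stack = [(0, mid) for mid in reversed(roots)]
    while stack:
        depth, mid = stack.pop()
        if mid in seen:
            continue
        seen.add(mid)
        ordered.append((depth, mid))
        for child in reversed(children.get(mid, [])):
            stack.append((depth + 1, child))
    return ordered
```

Working on skeletons rather than full messages keeps the per-batch memory small, which is the reason a subset of headers is enough for the threaded view.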

okaay, so i'm looking at man public-inbox-config, it says "only
supports Maildir".  grep the source, there's something about
PublicInbox::Import.pm?

ngggh how am i going to get mbox files in / watched?
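One possible way around the Maildir-only limitation, sketched under assumptions: copy the mbox into a Maildir with Python's standard `mailbox` module and point the Maildir-based tooling at the result. The function name `mbox_to_maildir` and both paths are hypothetical, and this is not a public-inbox feature.

```python
import mailbox


def mbox_to_maildir(mbox_path, maildir_path):
    """Copy every message from an mbox into a Maildir; return the number copied.

    Headers are carried over as-is, so Message-ID, References and
    In-Reply-To survive the conversion.
    """
    src = mailbox.mbox(mbox_path, create=False)
    dst = mailbox.Maildir(maildir_path, create=True)
    copied = 0
    try:
        for msg in src:
            dst.add(msg)  # new deliveries land in the Maildir's new/ directory
            copied += 1
    finally:
        dst.close()
        src.close()
    return copied
```

Note that running this twice copies every message again, so a cron job would need to track what it has already converted (for example by Message-ID) or convert only the newest monthly file.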

thanks eric.

l.


Thread overview: 12+ messages
2020-02-04 18:42 setting up mailman-to-atom-converter then atom-to-public-inbox Luke Kenneth Casson Leighton
2020-02-04 20:55 ` Eric Wong
2020-02-04 21:49   ` Luke Kenneth Casson Leighton [this message]
2020-02-04 22:14     ` Eric Wong
     [not found]       ` <CAPweEDy1qTK93pXDKdbT-HqJV184fH7x0hqqJYDTMv_nxvoKqQ@mail.gmail.com>
2020-02-05  0:10         ` Eric Wong
     [not found]           ` <CAPweEDyYA+38B4uc+stMpZ9q6CrHaaAAkkorCuH4ONHmhBXbXg@mail.gmail.com>
2020-02-05  0:43             ` Eric Wong
2020-02-05  1:02               ` Kyle Meyer
2020-02-05  1:04                 ` Eric Wong
2020-03-10  0:07   ` setting up mailman2 and public-inbox Luke Kenneth Casson Leighton
2020-03-11 10:33     ` Eric Wong
2020-03-11 11:58       ` Luke Kenneth Casson Leighton
2020-03-11 12:47         ` Luke Kenneth Casson Leighton
