From: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
To: Eric Wong <e@yhbt.net>
Cc: meta@public-inbox.org
Subject: Re: setting up mailman-to-atom-converter then atom-to-public-inbox
Date: Tue, 4 Feb 2020 21:49:56 +0000
Message-ID: <CAPweEDw0yRq2tKRcumuxH523PLkvRb3saXVzM86LXYQReYKPaQ@mail.gmail.com>
In-Reply-To: <20200204205541.GB27797@dcvr>

On Tue, Feb 4, 2020 at 9:05 PM Eric Wong <e@yhbt.net> wrote:
> Luke Kenneth Casson Leighton <lkcl@lkcl.net> wrote:
> > hi, just as the subject says, i'm currently modifying mailman_rss to
> > support atom and would like to set it up on libre-soc.org shortly.
> >
> > firstly: very grateful that public-inbox even exists, it is kinda
> > important to have really, really simple offline archives of project
> > mailing lists.
>
> You're welcome :>
>
> > second: i have no idea how to go about setting it up :)
>
> Once installed, "public-inbox-init" should get you started.
> From there, you can decide how you want to inject mail into
> it...

ahh exxcellent.... err... err.... man public-inbox-config only lists
Maildir not mbox?

> We should be able to clarify anything else here, just ask,
> and we can try to make the docs better :>
> Fwiw, I also started working on a mail flow diagram yesterday,
> which may help:
>
> https://public-inbox.org/flow.txt

excellent. very useful.

> > third: sigh, i have two unknowns (three), because i am actually
> > modifying mailman_rss to support atom, *and* i would prefer not to
> > overload my server by splitting up the creation of atom feeds into
> > multiple separate processing sections (by month) *and* i have no idea
> > if public-inbox can support feeds-of-feeds.
>
> This is your Mailman server?

yes

> If so, mbox or Maildir archives
> would be MUCH easier to convert and it would preserve
> Message-Id, References, and In-Reply-To headers for proper
> message threading.

errr... errr doh! ok so the mbox archives are private under one
account and i need to publish them via... gitweb, so that's ok.

> public-inbox doesn't have any ability to parse Atom or RSS right
> now, it only generates Atom.

aw doh! that's where i got the impression i had to *read* the atom
feed (doh). well, i have some nice modifications to mailman_rss which
uses a generic "Feed" python module i found, i will publish later :)

> Parsing Atom (or RSS) would not preserve headers necessary for
> proper threading, since Atom threading headers (RFC4685) don't
> reliably map back to the aforementioned mail headers.

red herring....

> > to explain / unpack that: here's how i would envisage the workflow so
> > as to minimise the server load:
> >
> > * cron job goes through the monthly mailman archives *by month*
> > performing a re-creation *only* of the latest month's atom feed
> > * same cron job adds to a "global" atom file containing "links to the
> > monthly atom files"
> > * public-inbox sees that list-of-monthly-atom-files
> > * public-inbox walks the "tree" of monthly atom files, grabbing each one in turn
> > * public-inbox loads all messages from all monthly atom files.
>
> s/atom/mbox/ and that's close to a planned feature.

oh superb.

> I'm not sure why the global index file is necessary, though,
> since the tree structure is predictable (YYYY/MM or similar)

i was imagining that there would be a way to reduce network traffic
however i realise now that you're running the cron job actually on the
machine, directly on the .mbox file.

> public-inbox itself uses the Email::MIME module, which
> unfortunately requires reading an entire RFC-2822 message into
> memory (and we only work on one full message at a time).

*shudder* :)

> Beyond that, the message threading in the HTML output
> (non-recursive JWZ-variant) works on a batch of 1000 message
> skeletons (subset of headers), and few threads are that big.

yehyeh. okaay, so i'm looking at man public-inbox-config, it says
"only supports Maildir". grep the source, there's something about
PublicInbox::Import.pm?

ngggh how am i going to get mbox files in / watched?

thanks eric.

l.
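
One possible way to bridge the gap raised above (mbox archives on the Mailman side, Maildir-only watching on the public-inbox side) is to copy each mbox into a Maildir first. The sketch below is only an illustration using Python's standard `mailbox` module; it is not part of public-inbox or mailman_rss, and the function name `mbox_to_maildir` and all paths are hypothetical. Messages are copied whole, so Message-ID, References and In-Reply-To should come through unchanged.

```python
import mailbox


def mbox_to_maildir(mbox_path, maildir_path):
    """Copy every message of one mbox file into a Maildir.

    Hypothetical helper for illustration only. Returns the Message-ID
    values of the copied messages, in mbox order.
    """
    # create=False: fail loudly if the mbox archive is missing
    src = mailbox.mbox(str(mbox_path), create=False)
    # create=True: make the Maildir (tmp/new/cur) if it does not exist
    dest = mailbox.Maildir(str(maildir_path), create=True)
    copied = []
    try:
        for msg in src:
            # the full message, headers included, is written out as-is
            dest.add(msg)
            copied.append(msg.get("Message-ID"))
    finally:
        dest.close()
        src.close()
    return copied
```

A cron job could call this on the latest month's mbox only, which would keep the per-run work small, with the resulting Maildir being the directory that is watched. Running it twice on the same mbox would add the messages twice, so a real job would need to track what it has already copied.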