From mboxrd@z Thu Jan 1 00:00:00 1970
MIME-Version: 1.0
References: <20200204205541.GB27797@dcvr>
In-Reply-To: <20200204205541.GB27797@dcvr>
From: Luke Kenneth Casson Leighton
Date: Tue, 4 Feb 2020 21:49:56 +0000
Subject: Re: setting up mailman-to-atom-converter then
 atom-to-public-inbox
To: Eric Wong
Cc: meta@public-inbox.org
Content-Type: text/plain; charset="UTF-8"

On Tue, Feb 4, 2020 at 9:05 PM Eric Wong wrote:
> Luke Kenneth Casson Leighton wrote:
> > hi, just as the subject says, i'm currently modifying mailman_rss to
> > support atom and would like to set it up on libre-soc.org shortly.
> >
> > firstly: very grateful that public-inbox even exists, it is kinda
> > important to have really, really simple offline archives of project
> > mailing lists.
>
> You're welcome :>
>
> > second: i have no idea how to go about setting it up :)
>
> Once installed, "public-inbox-init" should get you started.
> From there, you can decide how you want to inject mail into
> it...

ahh exxcellent.... err... err.... man public-inbox-config only lists
Maildir not mbox?

> We should be able to clarify anything else here, just ask,
> and we can try to make the docs better :>
> Fwiw, I also started working on a mail flow diagram yesterday,
> which may help:
>
> https://public-inbox.org/flow.txt

excellent. very useful.

> > third: sigh, i have two unknowns (three), because i am actually
> > modifying mailman_rss to support atom, *and* i would prefer not to
> > overload my server by splitting up the creation of atom feeds into
> > multiple separate processing sections (by month) *and* i have no idea
> > if public-inbox can support feeds-of-feeds.
>
> This is your Mailman server?

yes

> If so, mbox or Maildir archives
> would be MUCH easier to convert and it would preserve
> Message-Id, References, and In-Reply-To headers for proper
> message threading.

errr... errr doh! ok so the mbox archives are private under one
account and i need to publish them via... gitweb, so that's ok.

> public-inbox doesn't have any ability to parse Atom or RSS right
> now, it only generates Atom.

aw doh! that's where i got the impression i had to *read* the atom
feed (doh).
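
(a minimal sketch of the point about mbox preserving threading headers: python's standard "mailbox" module can read an mbox and hand back Message-ID, In-Reply-To and References untouched. the file name, addresses and message ids below are made up for the demo, and none of this is public-inbox's own code.)

```python
import mailbox
import os
import tempfile
from email.message import EmailMessage


def make_demo_mbox(path):
    """Write a two-message thread with made-up IDs so the sketch is self-contained."""
    root = EmailMessage()
    root["From"] = "alice@example.org"
    root["Subject"] = "hello"
    root["Message-ID"] = "<root@example.org>"
    root.set_content("first post\n")

    reply = EmailMessage()
    reply["From"] = "bob@example.org"
    reply["Subject"] = "Re: hello"
    reply["Message-ID"] = "<reply@example.org>"
    reply["In-Reply-To"] = "<root@example.org>"
    reply["References"] = "<root@example.org>"
    reply.set_content("a reply\n")

    box = mailbox.mbox(path)
    box.add(root)
    box.add(reply)
    box.flush()
    box.close()


def threading_headers(mbox_path):
    """Return (Message-ID, In-Reply-To, References) for every message in an mbox."""
    box = mailbox.mbox(mbox_path, create=False)
    try:
        return [
            (msg.get("Message-ID"), msg.get("In-Reply-To"), msg.get("References"))
            for msg in box
        ]
    finally:
        box.close()


if __name__ == "__main__":
    demo_path = os.path.join(tempfile.mkdtemp(), "demo.mbox")
    make_demo_mbox(demo_path)
    for message_id, in_reply_to, references in threading_headers(demo_path):
        print(message_id, "replies to", in_reply_to)
```

the reply still points at its parent by Message-ID after the round trip, which is the information a threader needs and which an Atom feed would not reliably carry.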
well, i have some nice modifications to mailman_rss which uses a generic
"Feed" python module i found, i will publish later :)

> Parsing Atom (or RSS) would not preserve headers necessary for
> proper threading, since Atom threading headers (RFC4685) don't
> reliably map back to the aforementioned mail headers.

red herring....

> > to explain / unpack that: here's how i would envisage the workflow so
> > as to minimise the server load:
> >
> > * cron job goes through the monthly mailman archives *by month*
> >   performing a re-creation *only* of the latest month's atom feed
> > * same cron job adds to a "global" atom file containing "links to the
> >   monthly atom files"
> > * public-inbox sees that list-of-monthly-atom-files
> > * public-inbox walks the "tree" of monthly atom files, grabbing each
> >   one in turn
> > * public-inbox loads all messages from all monthly atom files.
>
> s/atom/mbox/ and that's close to a planned feature.

oh superb.

> I'm not sure why the global index file is necessary, though,
> since the tree structure is predictable (YYYY/MM or similar)

i was imagining that there would be a way to reduce network traffic
however i realise now that you're running the cron job actually on the
machine, directly on the .mbox file.

> public-inbox itself uses the Email::MIME module, which
> unfortunately requires reading an entire RFC-2822 message into
> memory (and we only work on one full message at a time).

*shudder* :)

> Beyond that, the message threading in the HTML output
> (non-recursive JWZ-variant) works on a batch of 1000 message
> skeletons (subset of headers), and few threads are that big.

yehyeh.

okaay, so i'm looking at man public-inbox-config, it says "only
supports Maildir". grep the source, there's something about
PublicInbox::Import.pm? ngggh how am i going to get mbox files in /
watched?

thanks eric.

l.
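
(one possible stopgap for the mbox-vs-Maildir question, as a hedged sketch only: since the man page is said above to list only Maildir, a cron job could copy the monthly mbox files into a Maildir using python's standard "mailbox" module. the YYYY/MM.mbox layout, the function names and the demo message ids are assumptions made up for illustration, not Mailman's or public-inbox's actual layout or API. whether the watcher leaves messages in the Maildir after importing them is not stated in this thread, so the duplicate check below only holds while they stay there.)

```python
import mailbox
import tempfile
from email.message import EmailMessage
from pathlib import Path

# Hypothetical archive layout: <archive_root>/YYYY/MM.mbox
MONTH_GLOB = "[0-9][0-9][0-9][0-9]/[0-9][0-9].mbox"


def mbox_tree_to_maildir(archive_root, maildir_path):
    """Copy messages from every monthly mbox into one Maildir.

    Messages whose Message-ID is already present in the Maildir are skipped,
    so the function can be re-run without duplicating mail.
    Returns the number of messages copied on this run.
    """
    dest = mailbox.Maildir(str(maildir_path), create=True)
    seen = {msg.get("Message-ID") for msg in dest}
    seen.discard(None)
    copied = 0
    for mbox_file in sorted(Path(archive_root).glob(MONTH_GLOB)):
        src = mailbox.mbox(str(mbox_file), create=False)
        try:
            for msg in src:
                message_id = msg.get("Message-ID")
                if message_id in seen:
                    continue  # already delivered on an earlier run
                dest.add(msg)  # new mail lands in the Maildir's new/ directory
                if message_id is not None:
                    seen.add(message_id)
                copied += 1
        finally:
            src.close()
    dest.close()
    return copied


def write_demo_archive(root):
    """Create two one-message monthly mbox files with made-up contents."""
    year_dir = Path(root) / "2020"
    year_dir.mkdir(parents=True, exist_ok=True)
    for month, message_id in (("01", "<jan@example.org>"), ("02", "<feb@example.org>")):
        msg = EmailMessage()
        msg["From"] = "list@example.org"
        msg["Subject"] = "post for month " + month
        msg["Message-ID"] = message_id
        msg.set_content("demo body\n")
        box = mailbox.mbox(str(year_dir / (month + ".mbox")))
        box.add(msg)
        box.flush()
        box.close()


if __name__ == "__main__":
    demo_root = tempfile.mkdtemp()
    write_demo_archive(demo_root)
    print(mbox_tree_to_maildir(demo_root, Path(demo_root) / "Maildir"), "messages copied")
```

this handles one message at a time, so memory use stays bounded by the largest single message, and no "global" index file is needed because the glob follows the predictable tree.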