From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on dcvr.yhbt.net
X-Spam-Level:
X-Spam-ASN:
X-Spam-Status: No, score=-4.0 required=3.0 tests=ALL_TRUSTED,BAYES_00
    shortcircuit=no autolearn=ham autolearn_force=no version=3.4.0
Received: from localhost (dcvr.yhbt.net [127.0.0.1])
    by dcvr.yhbt.net (Postfix) with ESMTP id 3EEDC1F6C1;
    Sun, 21 Aug 2016 12:08:52 +0000 (UTC)
Date: Sun, 21 Aug 2016 12:08:52 +0000
From: Eric Wong
To: "W. Trevor King"
Cc: notmuch@notmuchmail.org, David Bremner, Steven Allen,
    Tomi Ollila, Carl Worth, meta@public-inbox.org
Subject: Re: Mail archives in Git using ssoma (Docker image)
Message-ID: <20160821120852.GA12964@dcvr>
References: <20141107190321.GL23609@odin.tremily.us>
    <20160821043631.GA2338@odin.tremily.us>
    <20160821094833.GB2338@odin.tremily.us>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <20160821094833.GB2338@odin.tremily.us>
List-Id:

+Cc meta@public-inbox.org

"W. Trevor King" wrote:
> On Sat, Aug 20, 2016 at 09:36:31PM -0700, W. Trevor King wrote:
> > [2]: git://tremily.us/notmuch-archives.git

Cool!

> This is the ssoma archive (with the data in it). I just set up a
> basic HTTP archive (following [1]) based on a Docker image [2] (Gentoo
> doesn't package all the Perl dependencies public-inbox needs).

Ugh, that sucks (sorry, not a fan of Docker). What's missing from
Gentoo?

It should be easy to copy just the necessary .pm files and use the
PERL5LIB environment variable to point to the correct path (man perlrun).
I'm consciously avoiding XS (compiled) extensions to make
installation/distribution easier.

> Dockerfile for rebuilding the image is in [2]. I'm currently hosting
> the archives (HTTP only) at [3].
> Spinning that up from the Docker image looks like:
>
>   $ mkdir srv
>   $ git clone --bare git://tremily.us/notmuch-archives.git srv/notmuch.git
>   $ echo 'Notmuch -- Just an email system' >srv/notmuch.git/description
>   $ git config -f srv/notmuch.git/config publicinbox.http http://tremily.us
>   $ git config -f srv/notmuch.git/config publicinbox.email notmuch@notmuchmail.org

That should probably be:

    # based on your [3]
    git config -f srv/notmuch.git/config \
        publicinbox.notmuch.url http://tremily.us/notmuch
    git config -f srv/notmuch.git/config \
        publicinbox.notmuch.address notmuch@notmuchmail.org

    # this is crucial for all the public-inbox-* tools
    git config -f srv/notmuch.git/config \
        publicinbox.notmuch.mainrepo /path/to/notmuch.git

I'm sorry that most of this is still undocumented at the moment,
but documenting it is my first priority once I'm done sorting out
some non-computing-related stuff.

>   $ docker run --name notmuch-archives -d -p 80:8080 -v ${PWD}/srv/:/srv/ wking/public-inbox
>
> (although I'm using -p ###:8080 and have an Nginx reverse-proxy in
> front). It's not updating automatically yet, but that will probably
> look like:
>
> 1. Pull new mbox [4].
> 2. Import into notmuch-archives [5].
> 3. Re-run public-inbox-index (this could probably be via ‘docker exec …’).
>
> But I'll have to test that to confirm. And ideally we'd be using
> ssoma-mda or similar directly, instead of going through mbox, but I'd
> rather get the official headers on the stored mail than be efficient
> ;).

For mirroring existing lists, I started using public-inbox-watch,
which currently watches Maildirs.
The config knobs are sorta documented in my announcement to git@vger:

    https://public-inbox.org/git/20160710004813.GA20210@dcvr.yhbt.net/
    http://hjrcffqmbrq6wope.onion/git/20160710004813.GA20210@dcvr.yhbt.net/

The initial import (w/o spamassassin) was done with
scripts/import_vger_from_mbox in the source:

    torsocks git clone http://hjrcffqmbrq6wope.onion/public-inbox
    git clone https://public-inbox.org/ public-inbox
    git clone git://repo.or.cz/public-inbox

I recommend public-inbox-watch for mirroring existing lists (such
as what I did with git@vger), but public-inbox-mda for self-hosted
lists (such as meta@public-inbox.org).

> One shift from Gmane's mid.gmane.org/… is that the public-inbox UI
> Message-ID lookup is per-bucket, and public-inbox seems to be
> encouraging per-list buckets.

The public-inbox-nntpd interface supports Message-ID lookups across
all inboxes in that instance, so it should be doable in the WWW
interface, too. Either way, I think it has to be O(n), where n is
the number of Xapian DBs, though.

I already have news.public-inbox.org hooked up to both NNTP and
HTTP(*), so I plan on making http://news.public-inbox.org/ work
like:

    nntp://news.public-inbox.org/

(*) Right now, it just redirects $NEWSGROUP to the HTTP interface:

    http://news.public-inbox.org/$NEWSGROUP -> http://...

And the WWW interface already has fallbacks to scan + link across
inboxes, so s/git/meta/ the above URLs and you'll get a link to
the message on /git/ instead of /meta/:

    http://hjrcffqmbrq6wope.onion/meta/20160710004813.GA20210@dcvr.yhbt.net/

> And while I feel like I had a good grasp of the ssoma format two years
> ago, I know very little about Perl and public-inbox. I'm sure you
> could set up a public-inbox host that is more efficient than what's
> currently in my Docker image.

Feel free to ask me + meta@public-inbox.org if you have any
questions or need help. Writing documentation doesn't come
naturally to me, so it's easier for me to answer emails.

I try to make it not very Perly.
I don't think I'll bother with CPAN, for example (I don't think I
successfully got my PAUSE account activated; not a fan of
registrations, either). But there will definitely be tarball
releases for distros soonish (mainly targeting Debian at the moment,
but FreeBSD is on the table).

> Cheers,
> Trevor
>
> [1]: http://public-inbox.org/INSTALL
> [2]: https://hub.docker.com/r/wking/public-inbox/
> [3]: http://tremily.us/notmuch/
> [4]: https://notmuchmail.org/archives/notmuch.mbox
> [5]: id:20160821043631.GA2338@odin.tremily.us
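On the three update steps listed above (pull the mbox [4], import it [5], re-run public-inbox-index via docker exec), here is an untested sketch of a wrapper you could drop into cron. The container name and the srv/notmuch.git path come from the docker run and git clone commands quoted above; IMPORT_CMD and its ./import-mbox default are hypothetical placeholders for whatever [5] actually does, and I'm assuming public-inbox-index accepts the inbox's git directory as its argument. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Untested sketch: refresh the Docker-hosted notmuch archive.
# Assumptions (not part of public-inbox itself):
#  - container "notmuch-archives" and srv/notmuch.git come from the
#    docker run / git clone commands quoted above;
#  - IMPORT_CMD (default ./import-mbox) is a hypothetical stand-in
#    for the import step described in [5];
#  - public-inbox-index takes the inbox's git dir as its argument.
set -eu

MBOX_URL=${MBOX_URL:-https://notmuchmail.org/archives/notmuch.mbox}
CONTAINER=${CONTAINER:-notmuch-archives}
IMPORT_CMD=${IMPORT_CMD:-./import-mbox}
DRY_RUN=${DRY_RUN:-1}    # 1 = only print the commands

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

refresh_archive() {
    # 1. Pull the new mbox [4].
    run curl -fsSL -o notmuch.mbox "$MBOX_URL"
    # 2. Import it into the bare repo (hypothetical importer).
    run "$IMPORT_CMD" notmuch.mbox srv/notmuch.git
    # 3. Re-index inside the container; srv/ is mounted at /srv/.
    run docker exec "$CONTAINER" public-inbox-index /srv/notmuch.git
}

refresh_archive
```

Running it with DRY_RUN=0 would execute the commands for real. Keeping the importer in a variable leaves room to swap out the mbox route later without touching the rest.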