From: Ralf Ramsauer <ralf.ramsauer@oth-regensburg.de>
To: <meta@public-inbox.org>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Subject: Usage of public-inbox with maildirs
Date: Wed, 20 Mar 2019 23:28:30 +0100
Message-ID: <745d6a8e-7e7c-8c61-336b-105cf9570ab7@oth-regensburg.de>
Hi,
we want to archive a fair number of mailing lists (~160 lists) with
public-inbox.
Therefore, we subscribed to all of those lists with a single email
address. Mails are periodically fetched via IMAP and stored in a local
maildir. Mails are currently not pre-filtered or sorted; all of them
end up in a single maildir.
So every [publicinbox] config entry has the same 'watch' entry for the
maildir, but each has its own watchheader so that it matches a
different list.
Is this the intended way to use public-inbox, or should we rather place
mails from different lists in different maildirs before processing them
with public-inbox?
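For illustration, the pre-sorting alternative could be sketched roughly
as follows. This is a minimal sketch in Python using only the standard
library; it is not part of public-inbox, and the names list_id_of and
split_maildir as well as the one-maildir-per-List-Id layout are
assumptions made for this example.

```python
import mailbox
import re
from pathlib import Path

# Matches the bare id inside the angle brackets of a List-Id header.
LIST_ID_RE = re.compile(r"<([^<>]+)>")


def list_id_of(msg):
    """Return the bare list id from a List-Id header, or None if absent."""
    header = msg.get("List-Id")
    if header is None:
        return None
    match = LIST_ID_RE.search(str(header))
    return match.group(1).strip().lower() if match else None


def split_maildir(src, dst_root):
    """Copy each mail in src into dst_root/<list-id>/ and return counts per list."""
    counts = {}
    root = Path(dst_root)
    root.mkdir(parents=True, exist_ok=True)
    source = mailbox.Maildir(src, create=False)
    for msg in source:
        list_id = list_id_of(msg)
        if list_id is None:
            continue  # no List-Id header: skipped in this sketch
        # Keep the id usable as a single directory name.
        target_dir = root / list_id.replace("/", "_")
        target = mailbox.Maildir(str(target_dir), create=True)
        target.add(msg)
        target.close()
        counts[list_id] = counts.get(list_id, 0) + 1
    return counts
```

With such a layout, each [publicinbox] entry would point its 'watch'
at its own maildir instead of sharing one.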
Secondly, I wrote a script that automatically creates the
public-inbox config together with empty, bare git repositories for every
list.
A config entry looks like:
[publicinbox "listid"]
	address = post@listid.org
	mainrepo = /path/to/repo
	watch = maildir:/path/to/maildir
	watchheader = List-Id:<listid>
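A generator for entries of this shape could be sketched as below. This
is a hypothetical illustration, not the script mentioned above: the
function names, the (list id, address) input format and the
"<repo_root>/<listid>.git" repository naming are assumptions. Only the
four keys shown in the entry above are emitted, and creating the bare
git repositories is left out.

```python
def render_entry(name, address, mainrepo, maildir, list_id):
    """Render one [publicinbox] section in git-config syntax."""
    lines = [
        f'[publicinbox "{name}"]',
        f"\taddress = {address}",
        f"\tmainrepo = {mainrepo}",
        f"\twatch = maildir:{maildir}",
        f"\twatchheader = List-Id:<{list_id}>",
    ]
    return "\n".join(lines) + "\n"


def render_config(lists, repo_root, maildir):
    """Render one section per (list_id, address) pair, all watching one maildir."""
    sections = []
    for list_id, address in lists:
        # Assumed layout: one repository per list below repo_root.
        mainrepo = f"{repo_root}/{list_id}.git"
        sections.append(render_entry(list_id, address, mainrepo, maildir, list_id))
    return "\n".join(sections)
```

The resulting text can be written to the config file in one go, so all
~160 sections are produced from the same list of ids.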
Our maildir currently contains ~120k mails for the initial import, and
this raised some new questions:
1. It appears that the initial import with public-inbox-watch is very
slow. After stracing the perl script, it looks like
public-inbox-watch lstats every single mail. After an hour of not
inserting any mail into a repo, I canceled the process and restarted
it on a smaller initial subset. This works better, but is still slow.
(~4k mails in 10 minutes, and it feels like it keeps getting slower)
If public-inbox-watch is restarted for some reason (e.g., system
reboot), will it stat every single mail again on startup?
IOW, should old mails be removed from the maildir, or will they
otherwise hurt performance? Is there a way to automatically delete
processed mails?
2. public-inbox-watch seems to fill the repositories with the 'old' v1
layout, and I don't know how to switch to v2. Is there a config
parameter for that?
I found the v1-v2 convert script, but I'd like to directly initialise
it with the newer version, if possible.
3. On the initial import, public-inbox-watch seems to insert mails
into the repositories in random order. In the end, coverage matters
more than hierarchy, but is there a way to do the initial import
sorted by date?
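To make question 3 concrete: a date-sorted order of the maildir can be
computed outside of public-inbox, for example as in the sketch below.
It only determines the order of the maildir keys by their Date header;
how such an order could be fed to public-inbox is exactly the open
question. The names date_key and keys_by_date are made up for this
example, and mails without a parsable Date header are placed last.

```python
import mailbox
from email.utils import mktime_tz, parsedate_tz


def date_key(msg):
    """Return the Date header as a UTC timestamp, or +inf if it cannot be parsed."""
    parsed = parsedate_tz(str(msg.get("Date", "")))
    if parsed is None:
        return float("inf")
    try:
        return mktime_tz(parsed)
    except (OverflowError, ValueError):
        return float("inf")


def keys_by_date(maildir_path):
    """Return the maildir keys ordered by Date header, oldest first."""
    box = mailbox.Maildir(maildir_path, create=False)
    # Keep only (timestamp, key) pairs so messages are not held in memory.
    stamped = sorted((date_key(box[key]), key) for key in box.keys())
    return [key for _, key in stamped]
```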
Thanks a lot!
Ralf