user/dev discussion of public-inbox itself
* Usage of public-inbox with maildirs
@ 2019-03-20 22:28 Ralf Ramsauer
  2019-03-21  3:35 ` Eric Wong
  0 siblings, 1 reply; 2+ messages in thread
From: Ralf Ramsauer @ 2019-03-20 22:28 UTC
  To: meta; +Cc: Lukas Bulwahn

Hi,

we want to archive a fair number of mailing lists (~160 lists) with
public-inbox.

Therefore, we subscribed to all of those lists with a single email
address. Mails are periodically fetched via IMAP and stored in a local
maildir. They are currently not pre-filtered or sorted; all of them end
up in a single maildir.

So every [publicinbox] config entry has the same 'watch' entry for the
maildir, but each has its own watchheader to match a different list.

Is this the intended way to use public-inbox, or should we rather place
mails from different lists in different maildirs before processing them
with public-inbox?
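
For reference, pre-sorting would not be much work on our side. A minimal
sketch, assuming Python's standard mailbox module and hypothetical paths
and names (split_by_list_id and the "unsorted" fallback are made up for
illustration, not anything public-inbox provides). It copies each mail
into a per-list maildir named after its List-Id and leaves the source
maildir untouched:

```python
import mailbox
import re
from pathlib import Path

# Matches the identifier between angle brackets, e.g. "Foo <foo.example.org>".
LIST_ID_RE = re.compile(r"<([^<>]+)>")


def list_id_of(msg):
    """Return the lowercased List-Id identifier, or None if absent."""
    raw = str(msg.get("List-Id", ""))
    match = LIST_ID_RE.search(raw)
    return match.group(1).strip().lower() if match else None


def split_by_list_id(src_dir, dest_root):
    """Copy every mail in src_dir into dest_root/<list-id>/ (a maildir).

    Mails without a usable List-Id go to dest_root/unsorted/.
    Returns a dict mapping directory name to the number of mails copied.
    """
    src = mailbox.Maildir(str(src_dir), create=False)
    root = Path(dest_root)
    root.mkdir(parents=True, exist_ok=True)

    boxes = {}
    counts = {}
    for key in src.keys():
        msg = src[key]
        name = list_id_of(msg) or "unsorted"
        # Keep directory names harmless even for odd List-Id values.
        name = re.sub(r"[^a-z0-9._-]", "_", name)
        if name not in boxes:
            boxes[name] = mailbox.Maildir(str(root / name), create=True)
        boxes[name].add(msg)
        counts[name] = counts.get(name, 0) + 1
    return counts
```

Each [publicinbox] entry would then get its own 'watch' path instead of
sharing one.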

Secondly, I wrote a script that automatically creates the public-inbox
config together with an empty, bare git repository for every list.

A config entry looks like:

    [publicinbox "listid"]
        address = post@listid.org
        mainrepo = /path/to/repo
        watch = maildir:/path/to/maildir
        watchheader = List-Id:<listid>
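
The config-generating part of such a script can be quite small. A rough
sketch, not our actual script: the function names, the tuple layout and
the paths are hypothetical, and only the four keys shown above are
emitted. Creating the bare repositories is left out here.

```python
import posixpath


def render_entry(name, address, list_id, repo_root, maildir):
    """Render one [publicinbox] section with the four keys shown above."""
    repo = posixpath.join(repo_root, name + ".git")
    return (
        f'[publicinbox "{name}"]\n'
        f"\taddress = {address}\n"
        f"\tmainrepo = {repo}\n"
        f"\twatch = maildir:{maildir}\n"
        f"\twatchheader = List-Id:<{list_id}>\n"
    )


def render_config(lists, repo_root, maildir):
    """Render the sections for all lists, separated by blank lines.

    `lists` is an iterable of (name, address, list_id) tuples; every
    section shares the same maildir, as in our current setup.
    """
    return "\n".join(
        render_entry(name, address, list_id, repo_root, maildir)
        for name, address, list_id in lists
    )
```

Calling render_config() with a list of (name, address, list_id) tuples
returns the text of the config file, one section per list.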

Our maildir currently contains ~120k mails for the initial import, and
this raised some new questions:

1. It appears that the initial import with public-inbox-watch is very
   slow. After stracing the perl script, it looks like
   public-inbox-watch lstats every single mail. After an hour without a
   single mail being inserted into a repo, I canceled the process and
   restarted it on a smaller initial subset. This works better, but is
   still slow (~4k mails in 10 minutes, and it seems to keep getting
   slower).

   If public-inbox-watch is restarted for some reason (e.g., system
   reboot), will it stat every single mail again on startup?
   IOW, should old mails be removed from the maildir, and will they
   hurt performance if they stay? Is there a way to automatically
   delete processed mails?

2. public-inbox-watch seems to fill the repositories with the 'old' v1
   layout, and I don't know how to switch to v2. Is there a config
   parameter for that?

   I found the v1-v2 convert script, but I'd like to directly initialise
   it with the newer version, if possible.

3. On the initial import, public-inbox-watch seems to insert mails into
   the repositories in random order. In the end, coverage matters more
   than hierarchy, but is there a way to do the initial import sorted by
   date?
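
To illustrate what "sorted by date" means for question 3: the sketch
below only computes the order, using Python's standard mailbox and
email.utils modules. The function names are hypothetical, and how to
feed mails to public-inbox in that order is exactly the open question.
Mails with a missing or unparseable Date header sort first.

```python
import email.utils
import mailbox
from datetime import datetime, timezone

# Fallback for mails whose Date header is missing or unparseable.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def message_date(msg):
    """Return the Date header as an aware datetime, or EPOCH on failure."""
    raw = msg.get("Date")
    if raw is None:
        return EPOCH
    try:
        parsed = email.utils.parsedate_to_datetime(str(raw))
    except (TypeError, ValueError):
        return EPOCH
    if parsed is None:
        return EPOCH
    if parsed.tzinfo is None:
        # Treat dates without a usable zone as UTC so they stay comparable.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def keys_by_date(maildir_path):
    """Return the maildir keys ordered by Date header, oldest first."""
    box = mailbox.Maildir(str(maildir_path), create=False)
    dated = [(message_date(box[key]), key) for key in box.keys()]
    dated.sort()
    return [key for _, key in dated]
```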

Thanks a lot!
  Ralf




