user/dev discussion of public-inbox itself
* mailman mbox migration
@ 2019-02-13 14:48 Ali Alnubani
  2019-02-13 22:31 ` Eric Wong
  0 siblings, 1 reply; 4+ messages in thread
From: Ali Alnubani @ 2019-02-13 14:48 UTC
  To: meta@public-inbox.org

Hi,

Hope this is the right place to post this.

I'm trying to migrate archives from a Mailman instance (v2.1.15). The archives are in mbox format.
How do you suggest I do that? It seems that public-inbox only supports the Maildir format. Do I need to convert my mbox files to Maildir before I can import them?
Or is there an easier way to achieve this (i.e. mbox support in public-inbox)?

Thanks,
Ali

* Re: mailman mbox migration
  2019-02-13 14:48 mailman mbox migration Ali Alnubani
@ 2019-02-13 22:31 ` Eric Wong
  2019-03-21 13:23   ` Ali Alnubani
  0 siblings, 1 reply; 4+ messages in thread
From: Eric Wong @ 2019-02-13 22:31 UTC
  To: Ali Alnubani; +Cc: meta

Ali Alnubani <alialnu@mellanox.com> wrote:
> Hi,
> 
> Hope this is the right place to post this.

Of course :>

> I'm trying to migrate archives from a Mailman instance (v2.1.15). The archives are in mbox format.

For Mailman, Konstantin posted some scripts he used for kernel.org + Mailman
the other day:
https://git.kernel.org/pub/scm/linux/kernel/git/mricon/korg-helpers.git

Specifically:
https://git.kernel.org/pub/scm/linux/kernel/git/mricon/korg-helpers.git/tree/list-archive-maker.py

> How do you suggest I do that? It seems that public-inbox only supports the Maildir format. Do I need to convert my mbox files to Maildir before I can import them?
> Or is there an easier way to achieve this (i.e. mbox support in public-inbox)?

For regular mboxrd and mboxo (not mangled by Mailman), you can
look into adapting scripts/import_vger_from_mbox.

Apparently, I did add support for importing mboxrd/mboxo formats
in the PublicInbox::InboxWritable::import_mbox subroutine (which
is used by import_vger_from_mbox).

mboxcl isn't supported yet (and I've seen some really scary
mboxcl with multiple Content-Length headers, and >From escaping
to boot, so I'm not sure if that's a route I want to take...)
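
To make the mboxrd convention concrete, here is a minimal Python
sketch of a reader. It is only an illustration and is not the code
used by PublicInbox::InboxWritable::import_mbox; the function name
split_mboxrd is hypothetical. It assumes every message starts with
an unescaped "From " separator line and that body lines matching
">*From " were escaped with one extra ">" when written.

```python
import re

# mboxrd escapes body lines that match ^>*From  by prepending one ">".
# Reading reverses that by stripping exactly one leading ">".
_ESCAPED_FROM = re.compile(rb"^>(>*From )")


def _finish(lines):
    """Join collected lines, dropping the blank line that ends a message."""
    if lines and lines[-1] in (b"\n", b"\r\n"):
        lines = lines[:-1]
    return b"".join(lines)


def split_mboxrd(data: bytes) -> list[bytes]:
    """Split raw mboxrd bytes into messages, undoing >From escaping."""
    messages = []
    current = None  # lines of the message being collected, if any
    for line in data.splitlines(keepends=True):
        if line.startswith(b"From "):
            # An unescaped "From " line separates messages.
            if current is not None:
                messages.append(_finish(current))
            current = []
            continue
        if current is None:
            continue  # ignore anything before the first separator
        current.append(_ESCAPED_FROM.sub(rb"\1", line))
    if current is not None:
        messages.append(_finish(current))
    return messages
```

An mboxo reader would differ only in the pattern: it would unescape
">From " alone, which is why ">>From " lines are ambiguous there.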

* RE: mailman mbox migration
  2019-02-13 22:31 ` Eric Wong
@ 2019-03-21 13:23   ` Ali Alnubani
  2019-03-21 16:05     ` Eric Wong
  0 siblings, 1 reply; 4+ messages in thread
From: Ali Alnubani @ 2019-03-21 13:23 UTC
  To: Eric Wong; +Cc: meta@public-inbox.org

Hi Eric,

Thanks for the help, and I apologize for the late reply.

The script import_vger_from_mbox worked very well.

Could it be a problem if a few messages get imported twice, once by import_vger_from_mbox and once by public-inbox-watch? The lists I'm migrating are very busy, and there will be a delay between importing with the script and starting public-inbox-watch.

Thanks,
Ali


* Re: mailman mbox migration
  2019-03-21 13:23   ` Ali Alnubani
@ 2019-03-21 16:05     ` Eric Wong
  0 siblings, 0 replies; 4+ messages in thread
From: Eric Wong @ 2019-03-21 16:05 UTC
  To: Ali Alnubani; +Cc: meta

Ali Alnubani <alialnu@mellanox.com> wrote:
> Hi Eric,
> 
> Thanks for the help, and I apologize for the late reply.
> 
> The script import_vger_from_mbox worked very well.

Good to know :>

> Could it be a problem if a few messages get imported twice,
> once by import_vger_from_mbox and once by public-inbox-watch?
> The lists I'm migrating are very busy, and there will be a
> delay between importing with the script and starting
> public-inbox-watch.

Messages are deduped by Message-ID and content.  However, V2
allows different messages to share the same Message-ID (because
some non-spam-but-buggy bots/mailers do it).  So if Mailman
mangles the message going into the mbox differently than the one
going into the Maildir for -watch, you can get duplicates.
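
As a rough illustration of why mangling defeats deduplication, here
is a small Python sketch. It is not public-inbox's actual logic: the
names dedup_key and import_unique are hypothetical, and hashing the
body with SHA-256 is an assumption made only for this example.

```python
import hashlib
from email import message_from_bytes


def dedup_key(raw: bytes) -> tuple[str, str]:
    """Return (Message-ID, body digest) for one raw message."""
    raw = raw.replace(b"\r\n", b"\n")  # treat CRLF and LF copies alike
    msg = message_from_bytes(raw)
    mid = str(msg.get("Message-ID", "")).strip()
    # Everything after the first blank line is the body.
    body = raw.partition(b"\n\n")[2]
    return mid, hashlib.sha256(body).hexdigest()


def import_unique(raw_messages):
    """Keep the first copy of each (Message-ID, content) pair."""
    seen = set()
    kept = []
    for raw in raw_messages:
        key = dedup_key(raw)
        if key in seen:
            continue  # same Message-ID and same content: a true duplicate
        seen.add(key)
        kept.append(raw)
    return kept
```

Under this scheme, two copies with the same Message-ID are both kept
as soon as their content differs, for example when one copy carries
an appended list footer and the other does not.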

Fwiw, mass imports are much faster if you use "eatmydata", an
LD_PRELOAD library which disables fsync.  On a reasonably fast VM
with a good, TRIM-ed SSD ("fstrim -a" first) and lots of RAM,
importing 2000-2017 LKML history took around 3-4 hours.  More
cores only help if your SSD can keep up, and I seem to remember
NPROC=4 (set via env) being the point of diminishing returns for
the VM I used.
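
If the import is driven from a script, the same setup can be
sketched in Python. This is only an illustration: the helper names
are hypothetical, and the import command itself is left to the
caller because its arguments are not covered here.

```python
import os
import shutil
import subprocess


def fsync_free_command(cmd, nproc=4, environ=None):
    """Return (argv, env) for running cmd under eatmydata with NPROC set."""
    env = dict(os.environ if environ is None else environ)
    env["NPROC"] = str(nproc)  # worker count, passed via the environment
    return ["eatmydata", *cmd], env


def run_import(cmd, nproc=4):
    """Run cmd with fsync disabled; raise if eatmydata is unavailable."""
    if shutil.which("eatmydata") is None:
        raise RuntimeError("eatmydata is not installed")
    argv, env = fsync_free_command(cmd, nproc)
    return subprocess.run(argv, env=env, check=True)
```

Since fsync is disabled, a crash or power loss during the import can
leave the data incomplete, so this only makes sense for a bulk import
that can be redone from the mbox.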

Code repositories for project(s) associated with this public inbox

	https://80x24.org/public-inbox.git