From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on dcvr.yhbt.net
X-Spam-Level:
X-Spam-ASN:
X-Spam-Status: No, score=-4.0 required=3.0 tests=ALL_TRUSTED,BAYES_00
	shortcircuit=no autolearn=ham autolearn_force=no version=3.4.2
Received: from localhost (dcvr.yhbt.net [127.0.0.1])
	by dcvr.yhbt.net (Postfix) with ESMTP id 6B93420248;
	Thu, 21 Mar 2019 16:05:42 +0000 (UTC)
Date: Thu, 21 Mar 2019 16:05:42 +0000
From: Eric Wong
To: Ali Alnubani
Cc: meta@public-inbox.org
Subject: Re: mailman mbox migration
Message-ID: <20190321160542.fzuqgwdx5qrmfrwr@dcvr>
References: <20190213223147.gkutd24zxjpmmj43@dcvr>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
In-Reply-To:
List-Id:

Ali Alnubani wrote:
> Hi Eric,
>
> Thanks for help and I apologize for replying quite late.
>
> The script import_vger_from_mbox worked very well.

Good to know :>

> Do you think that there might be an issue in a few messages
> being imported twice by both import_vger_from_mbox and
> public-inbox-watch? Since the lists I'm migrating are very
> busy, and there will be a delay between importing with the
> script and running public-inbox-watch.

Messages are deduped by Message-ID and content. However, V2
allows different messages to use the same Message-ID
(because some non-spam-but-buggy bots/mailers do it).

So if Mailman mangles the message going into the mbox
differently than the one going into the Maildir for -watch,
then you can get duplicates.

Fwiw, mass imports are much faster if you use "eatmydata",
an LD_PRELOAD which disables fsync. On a reasonably fast VM
with a good, TRIM-ed SSD ("fstrim -a" first), and lots of RAM,
importing 2000-2017 LKML history took around 3-4 hours.

More cores only help if your SSD can keep up, and I seem to
remember using NPROC=4 (via env) was the point of diminishing
returns for that VM I used.
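
To illustrate the dedup point above, here is a toy sketch in Python. It is
not public-inbox's code (public-inbox is written in Perl and its real
content comparison is more involved); the function names, the sample
messages, and the choice of SHA-256 over the raw body are all assumptions
made for illustration. It only shows why keying on both Message-ID and
content lets an exact re-import collapse, while a copy that was mangled on
the way in (for example, with a list footer appended) survives as a
second message.

```python
import hashlib
from email import message_from_string

# Hypothetical sample messages: same Message-ID, but the second copy
# has a footer appended, as a list manager might do.
ORIGINAL = "Message-ID: <a@example>\nSubject: hi\n\nhello\n"
MANGLED = "Message-ID: <a@example>\nSubject: hi\n\nhello\n-- \nlist footer\n"


def dedup_key(raw):
    """Return a (Message-ID, body digest) pair for one raw message."""
    msg = message_from_string(raw)
    mid = (msg.get("Message-ID") or "").strip()
    # Everything after the first blank line is treated as the body.
    _, _, body = raw.partition("\n\n")
    digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
    return mid, digest


def import_all(raws):
    """Keep the first message seen for each (Message-ID, content) key."""
    seen = set()
    kept = []
    for raw in raws:
        key = dedup_key(raw)
        if key in seen:
            continue  # same Message-ID and same content: a true duplicate
        seen.add(key)
        kept.append(raw)
    return kept


if __name__ == "__main__":
    print(len(import_all([ORIGINAL, ORIGINAL])))  # exact re-import
    print(len(import_all([ORIGINAL, MANGLED])))   # mangled copy
```

In this sketch the exact re-import is dropped, but the mangled copy is kept
because its content differs even though the Message-ID matches. That is the
mbox-versus-Maildir situation described above.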