From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on dcvr.yhbt.net
X-Spam-Level:
X-Spam-ASN:
X-Spam-Status: No, score=-4.0 required=3.0 tests=ALL_TRUSTED,BAYES_00
 shortcircuit=no autolearn=ham autolearn_force=no version=3.4.2
Received: from localhost (dcvr.yhbt.net [127.0.0.1])
 by dcvr.yhbt.net (Postfix) with ESMTP id A4E531F5AE;
 Thu, 3 Jun 2021 10:05:34 +0000 (UTC)
Date: Thu, 3 Jun 2021 10:05:34 +0000
From: Eric Wong
To: meta@public-inbox.org
Subject: TODO: Maildir import speedups
Message-ID: <20210603100534.GA8833@dcvr>
References: <20210603001737.31369-1-e@80x24.org>
 <20210603010520.GA13508@dcvr>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
In-Reply-To: <20210603010520.GA13508@dcvr>
List-Id:

Eric Wong wrote:
> On a 4-core CPU, this speeds up "lei import" on a largish IMAP
> inbox with 75K messages from ~21 minutes down to 40s.

Maildir with 75K messages is nearly 8 minutes, right now :<

> Parallelizing with the new LeiImportKw WQ worker class gives a
> near-linear speedup and brought the runtime down to ~5:40.

Maildir can be completely parallelized since iteration has
absolutely no ordering; that could bring it to around 2 minutes
on a 4-core machine w/o SQLite optimization.

But "--new-only"/"--quick" will require us to store a ctime
in mail_sync.sqlite3 for Maildirs.  See also
("lei timestamp resolution for mail synchronization"):
https://public-inbox.org/meta/20210507001332.GA18490@dcvr/

> The new idx_fid_uid index on the "fid" and "uid" columns of
> blob2num in mail_sync.sqlite3 brought us the final speedup.

blob2name (fid,name) may need a similar index.
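
The two ideas above (order-free parallel Maildir iteration, and an
index on blob2name (fid,name)) can be illustrated with a rough sketch.
This is Python rather than the project's own code, and it is not how
lei implements anything: the blob2name column layout, the index name
idx_fid_name, and the helpers maildir_entries, shard, import_shard and
parallel_import are all assumptions made up for illustration.  A thread
pool stands in for real worker processes.  Each entry carries the
file's ctime, which is the value a "--new-only"/"--quick" check would
presumably need to have stored.

```python
import os
import sqlite3
from concurrent.futures import ThreadPoolExecutor


def maildir_entries(maildir):
    """Yield (path, ctime) for each message file under new/ and cur/."""
    for sub in ("new", "cur"):
        subdir = os.path.join(maildir, sub)
        if not os.path.isdir(subdir):
            continue
        with os.scandir(subdir) as it:
            for ent in it:
                if ent.is_file():
                    yield ent.path, ent.stat().st_ctime


def shard(items, nworkers):
    """Split items round-robin; Maildir iteration needs no ordering."""
    shards = [[] for _ in range(nworkers)]
    for i, item in enumerate(items):
        shards[i % nworkers].append(item)
    return shards


def import_shard(entries):
    """Hypothetical stand-in for per-message work (parse, hash, store)."""
    return len(entries)


def parallel_import(maildir, nworkers=4):
    """Hand each shard to its own worker; return the messages handled."""
    shards = shard(list(maildir_entries(maildir)), nworkers)
    with ThreadPoolExecutor(max_workers=nworkers) as pool:
        return sum(pool.map(import_shard, shards))


def add_blob2name_index(db):
    """Create an index for (fid, name) lookups.

    The table definition here is an assumed layout, not the real
    mail_sync.sqlite3 schema.
    """
    db.execute(
        "CREATE TABLE IF NOT EXISTS blob2name ("
        "oidbin BLOB NOT NULL, fid INTEGER NOT NULL, name TEXT NOT NULL)"
    )
    db.execute(
        "CREATE INDEX IF NOT EXISTS idx_fid_name ON blob2name (fid, name)"
    )
```

Calling parallel_import("/path/to/Maildir") walks new/ and cur/ once
and splits the files evenly across four workers; since no worker
depends on another's position in the directory, the split can be
arbitrary.  add_blob2name_index(sqlite3.connect(path)) is idempotent,
so it could run on every open of the database.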