From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on dcvr.yhbt.net
X-Spam-Level:
X-Spam-Status: No, score=-4.0 required=3.0 tests=ALL_TRUSTED,AWL,BAYES_00
	shortcircuit=no autolearn=ham autolearn_force=no version=3.4.2
Received: from localhost (dcvr.yhbt.net [127.0.0.1])
	by dcvr.yhbt.net (Postfix) with ESMTP id 890981F66F;
	Tue, 3 Nov 2020 09:08:28 +0000 (UTC)
Date: Tue, 3 Nov 2020 09:08:28 +0000
From: Eric Wong
To: meta@public-inbox.org
Subject: Re: [PATCH 51/52] extsearchidx: support --batch-size checkpoints
Message-ID: <20201103090828.GA18503@dcvr>
References: <20201027075453.19163-1-e@80x24.org>
	<20201027075453.19163-52-e@80x24.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
In-Reply-To: <20201027075453.19163-52-e@80x24.org>
List-Id:

Eric Wong wrote:
> This is needed to limit the RSS of processes and ensure the
> stored data in over.sqlite3 and Xapian DBs are consistent if
> interrupted. Without checkpoints, indexing lore causes shard
> workers to take several GB of memory and thrash/OOM smaller
> systems.

Ugh, the ~30 hours in the cover letter was without this patch.
Using even a 100m batch size(*) makes lore/* take ~70 hours :<

(*) 100m works fine for lore/lkml and even >10m is diminishing
    returns...
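
For illustration only, the idea of batch-size checkpoints can be sketched
roughly as below. This is not the extsearchidx code (which is Perl); the
names parse_size, index_with_checkpoints, index_one and commit are all
hypothetical, and it assumes the batch size counts raw message bytes and
that a commit both flushes pending data and releases the memory holding it.

```python
def parse_size(text):
    """Parse a size such as '10m' or '512k' into bytes (hypothetical helper)."""
    units = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}
    text = text.strip().lower()
    if text and text[-1] in units:
        return int(text[:-1]) * units[text[-1]]
    return int(text)


def index_with_checkpoints(messages, batch_size, index_one, commit):
    """Index each message, committing once about batch_size bytes are pending.

    index_one and commit are caller-supplied callbacks (assumptions, standing
    in for the real indexing and DB commit steps).
    Returns the number of checkpoints taken.
    """
    pending = 0       # bytes indexed since the last checkpoint
    checkpoints = 0
    for msg in messages:
        index_one(msg)
        pending += len(msg)
        if pending >= batch_size:
            commit()  # checkpoint: stored data is consistent up to here
            checkpoints += 1
            pending = 0
    if pending:
        commit()      # final checkpoint for the trailing partial batch
        checkpoints += 1
    return checkpoints
```

In this sketch a smaller batch size means more frequent commits, so less
uncommitted data is held in memory at once and less work is lost on
interruption, at the cost of more commit overhead.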