From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on dcvr.yhbt.net
X-Spam-Level:
X-Spam-ASN:
X-Spam-Status: No, score=-4.0 required=3.0 tests=ALL_TRUSTED,AWL,BAYES_00
	shortcircuit=no autolearn=ham autolearn_force=no version=3.4.1
Received: from localhost (dcvr.yhbt.net [127.0.0.1])
	by dcvr.yhbt.net (Postfix) with ESMTP id 992C31F597
	for ; Thu, 19 Jul 2018 03:21:38 +0000 (UTC)
From: Eric Wong
To: meta@public-inbox.org
Subject: [PATCH] searchidx: respect XAPIAN_FLUSH_THRESHOLD env if set
Date: Thu, 19 Jul 2018 03:21:38 +0000
Message-Id: <20180719032138.6329-1-e@80x24.org>
List-Id:

Xapian documents and respects XAPIAN_FLUSH_THRESHOLD to define
the interval in documents to flush, so don't override it with
our own BATCH_BYTES.

This is helpful for initial indexing for those on slower storage
but with enough RAM.  It is unnecessary for -watch and frequent
incremental indexing, and it increases transaction times when
-watch is playing "catch-up" after being stopped for a while.

The original BATCH_BYTES was tuned for a machine with little
memory, as the default XAPIAN_FLUSH_THRESHOLD of 10000 documents
was causing swap storms.  Using document counts also proved an
inaccurate estimator of RAM usage compared to the actual bytes
processed.
---
 lib/PublicInbox/SearchIdx.pm | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
index 0e0796c..5ba326b 100644
--- a/lib/PublicInbox/SearchIdx.pm
+++ b/lib/PublicInbox/SearchIdx.pm
@@ -22,7 +22,8 @@ require PublicInbox::Git;
 use Compress::Zlib qw(compress);
 
 use constant {
-	BATCH_BYTES => 1_000_000,
+	BATCH_BYTES => defined($ENV{XAPIAN_FLUSH_THRESHOLD}) ?
+			0x7fffffff : 1_000_000,
 	DEBUG => !!$ENV{DEBUG},
 };
 
-- 
EW