From mboxrd@z Thu Jan 1 00:00:00 1970
From: Eric Wong
To: meta@public-inbox.org
Subject: [PATCH 0/4] xcpdb: support resharding Xapian DBs
Date: Fri, 14 Jun 2019 03:03:14 +0000
Message-Id: <20190614030318.17216-1-e@80x24.org>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

Defaulting the number of Xapian shards to the number of CPUs can
be detrimental to performance because common storage systems are
slow; NVMe-class speeds are not yet common.

To help public-inbox users recover from this inefficiency while
allowing continuous email archival, we can support arbitrary
resharding to have fewer shards (or more, if doing hardware
upgrades).

Note: I'm also going to move the documentation towards using the
word "shard" (instead of "partition") to be consistent with
current Xapian documentation (1.4+, and "master"). Xapian 1.2
did not use the word "shard" at all, but in my interactions with
non-Xapian search engine folks, the word "shard" is pretty
common.

Eric Wong (4):
  v2writable: use a smaller default for Xapian partitions
  xapcmd: preserve indexlevel based on the destination
  xcpdb: use destination shard as progress prefix
  xcpdb: support resharding v2 repos

 Documentation/public-inbox-xcpdb.pod |  11 ++
 MANIFEST                             |   1 +
 lib/PublicInbox/V2Writable.pm        |  18 ++-
 lib/PublicInbox/Xapcmd.pm            | 222 +++++++++++++++++++++------
 script/public-inbox-xcpdb            |   4 +-
 t/xcpdb-reshard.t                    |  83 ++++++++++
 6 files changed, 286 insertions(+), 53 deletions(-)
 create mode 100644 t/xcpdb-reshard.t

-- 
EW