From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on dcvr.yhbt.net
X-Spam-Level:
X-Spam-ASN:
X-Spam-Status: No, score=-4.0 required=3.0 tests=ALL_TRUSTED,BAYES_00
	shortcircuit=no autolearn=ham autolearn_force=no version=3.4.2
Received: from localhost (dcvr.yhbt.net [127.0.0.1])
	by dcvr.yhbt.net (Postfix) with ESMTP id 87B051F8C6
	for ; Tue,  6 Jul 2021 12:42:03 +0000 (UTC)
From: Eric Wong
To: meta@public-inbox.org
Subject: [PATCH 0/3] extindex: dedupe support, + gc fix
Date: Tue,  6 Jul 2021 12:42:00 +0000
Message-Id: <20210706124203.17745-1-e@80x24.org>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
List-Id:

I'm still not sure how the duplicates got into my extindices,
but the problem doesn't seem reproducible at the moment, so
maybe the original bug was fixed.

Since there are already dedupe failures from past indexing, the
--dedupe switch here should help us get rid of them.  It's only
lightly tested, but it seems to be working.

There's also a minor fix for --gc.

Eric Wong (3):
  eml: relax warn_ignore regexps for current Email::Address::XS
  extindex: implement --dedupe to fix old extindices
  extindex: --gc: avoid SQLite lock conflict on shard cleanup

 lib/PublicInbox/Eml.pm            |  4 +-
 lib/PublicInbox/ExtSearchIdx.pm   | 96 +++++++++++++++++++++++++++++++
 lib/PublicInbox/OverIdx.pm        | 20 +++++++
 lib/PublicInbox/SearchIdxShard.pm |  5 +-
 script/public-inbox-extindex      | 13 ++++-
 t/extsearch.t                     | 11 ++++
 6 files changed, 142 insertions(+), 7 deletions(-)