From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on dcvr.yhbt.net
X-Spam-Level:
X-Spam-ASN:
X-Spam-Status: No, score=-4.2 required=3.0 tests=ALL_TRUSTED,AWL,BAYES_00,
	DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,
	T_SCC_BODY_TEXT_LINE shortcircuit=no autolearn=ham
	autolearn_force=no version=3.4.6
Received: from localhost (dcvr.yhbt.net [127.0.0.1])
	by dcvr.yhbt.net (Postfix) with ESMTP id 3447C1F406;
	Sat, 25 Nov 2023 20:27:51 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=80x24.org;
	s=selector1; t=1700944071;
	bh=fO5hdGqa+eiVuxmRYI8X93TQuy3k9d28Nw/R+pu5QW0=;
	h=Date:From:To:Cc:Subject:References:In-Reply-To:From;
	b=UYBuGZ+pyWIGrpQTB7frKYIfD6GTidTLj5cwsLYTW0sgQ5EXVYhgY+iMuZeO+M16d
	 4kIZDaN/VK+nBXQhCywpiWQIxnhGQ/+FStq/9Qq8tN8d5ZWDBH0UyQh48K3GgezcQG
	 F8etX0cTBfL0yer3RZxZtnBZw69X0prmLNzI28Hs=
Date: Sat, 25 Nov 2023 20:25:20 +0000
From: Eric Wong
To: =?utf-8?B?xaB0xJtww6FuIE7Em21lYw==?=
Cc: meta@public-inbox.org
Subject: [PATCH v3] doc/extindex: document --dedupe switch
Message-ID: <20231125202520.M343969@dcvr>
References: <20231124041819.1979651-1-e@80x24.org>
 <20231124135059+0100.879284-stepnem@smrk.net>
 <20231124235829.M382392@dcvr>
 <20231125093637+0100.776019-stepnem@smrk.net>
 <20231125114939.M737325@dcvr>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <20231125114939.M737325@dcvr>
List-Id:

Eric Wong wrote:
> Štěpán Němec wrote:
> > Eric Wong wrote:
> > > +Rerun deduplication on messages of with the given Message-ID or
> >                                    ^^^^^^^
> > not so fast :-P
>
> Thanks. Will s/of // when I commit when more awake.
> Getting even more scatter-brained :x

OK, will probably push this out:

--------8<--------
Subject: [PATCH] doc/extindex: document --dedupe switch

We've had it since v1.7.0 when -extindex was introduced, but it
was never documented outside of commit messages.
---
 Documentation/public-inbox-extindex.pod | 26 +++++++++++++++++++++----
 1 file changed, 22 insertions(+), 4 deletions(-)

diff --git a/Documentation/public-inbox-extindex.pod b/Documentation/public-inbox-extindex.pod
index be4ea4de..b53e45ed 100644
--- a/Documentation/public-inbox-extindex.pod
+++ b/Documentation/public-inbox-extindex.pod
@@ -47,6 +47,20 @@ C set to C and their respective Xapian
 public-inboxes where cross-posting is common, this allows
 significant space savings on Xapian indices.
 
+=item --dedupe=MSGID
+
+=item --dedupe
+
+Rerun deduplication on messages with the given Message-ID or
+all messages if no Message-ID is specified. Deduplication rules may
+change and evolve over time, especially if filters are involved.
+
+C<--dedupe=MSGID> may be specified multiple times to deduplicate
+multiple Message-IDs.
+
+Use this if you see C
+warnings from WWW logs.
+
 =item --gc
 
 Perform garbage collection instead of indexing. Use this if
@@ -61,10 +75,6 @@ used for in-place upgrades and bugfixes while read-only server
 processes are utilizing the index. Keep in mind this roughly
 doubles the size of the already-large Xapian database.
 
-The extindex locks will be released roughly every 10s to
-allow L and L
-processes to write to the extindex.
-
 =item --fast
 
 Used with C<--reindex>, it will only look for new and stale
@@ -131,6 +141,14 @@ Default: none, uses C
 Occasionally, public-inbox will update its schema version and
 require a full index by running this command.
 
+=head1 LOCKING
+
+It is safe to use C<--dedupe>, C<--gc> and C<--reindex> while
+other processes are writing to covered inboxes or extindex.
+The extindex locks will be released roughly every 10s to
+allow L and L
+processes to write to the extindex.
+
 =head1 CONTACT
 
 Feedback welcome via plain-text mail to L