user/dev discussion of public-inbox itself
From: Eric Wong <e@80x24.org>
To: meta@public-inbox.org
Subject: [PATCH] doc/extindex: document --dedupe switch
Date: Fri, 24 Nov 2023 04:18:19 +0000
Message-ID: <20231124041819.1979651-1-e@80x24.org>

We've had it since v1.7.0 when -extindex was introduced,
but it was never documented outside of commit messages.
---
 Documentation/public-inbox-extindex.pod | 26 +++++++++++++++++++++----
 1 file changed, 22 insertions(+), 4 deletions(-)

diff --git a/Documentation/public-inbox-extindex.pod b/Documentation/public-inbox-extindex.pod
index be4ea4de..361eb43f 100644
--- a/Documentation/public-inbox-extindex.pod
+++ b/Documentation/public-inbox-extindex.pod
@@ -47,6 +47,20 @@ C<indexlevel> set to C<basic> and their respective Xapian
 public-inboxes where cross-posting is common, this allows
 significant space savings on Xapian indices.
 
+=item --dedupe=MSGID
+
+=item --dedupe
+
+Rerun deduplication on messages with the given Message-ID, or on
+all messages if no Message-ID is specified.  Deduplication rules may change
+and evolve over time, especially if filters are involved.
+
+C<--dedupe=MSGID> may be specified multiple times to deduplicate
+multiple Message-IDs.
+
+Use this if you see C<W: BUG? $MSGID not deduplicated properly>
+warnings from WWW logs.
+
 =item --gc
 
 Perform garbage collection instead of indexing.  Use this if
@@ -61,10 +75,6 @@ used for in-place upgrades and bugfixes while read-only server
 processes are utilizing the index.  Keep in mind this roughly
 doubles the size of the already-large Xapian database.
 
-The extindex locks will be released roughly every 10s to
-allow L<public-inbox-mda(1)> and L<public-inbox-watch(1)>
-processes to write to the extindex.
-
 =item --fast
 
 Used with C<--reindex>, it will only look for new and stale
@@ -131,6 +141,14 @@ Default: none, uses C<publicinbox.indexBatchSize>
 Occasionally, public-inbox will update its schema version and
 require a full index by running this command.
 
+=head1 LOCKING
+
+It is safe to use C<--dedupe>, C<--gc> and C<--reindex> while
+other processes are writing to covered inboxes or the extindex.
+The extindex locks will be released roughly every 10s to
+allow L<public-inbox-mda(1)> and L<public-inbox-watch(1)>
+processes to write to the extindex.
+
 =head1 CONTACT
 
 Feedback welcome via plain-text mail to L<mailto:meta@public-inbox.org>

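A minimal shell sketch (not part of the patch) for turning the "W: BUG? $MSGID not deduplicated properly" warnings from WWW logs into repeated --dedupe=MSGID options, which the documentation above says may be given multiple times. It assumes the warning text appears in the log exactly as quoted in the documentation, that MSGID contains no whitespace, and that it is logged in a form --dedupe= accepts; check one of your own log lines first. The function name dedupe_args and the log path in the comment are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: scan log files (or stdin) for the warning quoted
# in the extindex docs and print one --dedupe=MSGID option per unique
# Message-ID.  ASSUMPTIONS: the warning text appears verbatim and MSGID
# contains no spaces.
dedupe_args() {
	# In POSIX basic regexes "?" is literal; \( \) captures the MSGID.
	sed -n 's/.*W: BUG? \([^ ]*\) not deduplicated properly.*/--dedupe=\1/p' "$@" |
		sort -u
}

# Example (log path is hypothetical):
#   dedupe_args /path/to/www-error.log
```

The helper only prints options; review them, then pass them to public-inbox-extindex along with whatever arguments you normally use. Printing rather than executing avoids shell word-splitting and globbing surprises with unusual Message-IDs.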

Thread overview: 7+ messages
2023-11-24  4:18 Eric Wong [this message]
2023-11-24 12:50 ` [PATCH] doc/extindex: document --dedupe switch Štěpán Němec
2023-11-24 23:58   ` Eric Wong
2023-11-25  8:36     ` Štěpán Němec
2023-11-25 11:49       ` Eric Wong
2023-11-25 20:25         ` [PATCH v3] " Eric Wong
2023-11-25 21:35         ` [PATCH] " Štěpán Němec

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://public-inbox.org/README

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20231124041819.1979651-1-e@80x24.org \
    --to=e@80x24.org \
    --cc=meta@public-inbox.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

Code repositories for project(s) associated with this public inbox

	https://80x24.org/public-inbox.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).