* [PATCH 3/3] extsearchidx: use more appropriate max for dedupe
2021-07-25 0:11 5% ` [PATCH 0/3] extindex dedupe improvements Eric Wong
@ 2021-07-25 0:11 7% ` Eric Wong
0 siblings, 0 replies; 2+ results
From: Eric Wong @ 2021-07-25 0:11 UTC (permalink / raw)
To: meta
The over.msgid table may contain ghost Message-IDs and also
Message-IDs of deleted spam messages, so over->max isn't a
good approximation of dedupe progress.
---
lib/PublicInbox/ExtSearchIdx.pm | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/lib/PublicInbox/ExtSearchIdx.pm b/lib/PublicInbox/ExtSearchIdx.pm
index 1c2a9758..51dbf54f 100644
--- a/lib/PublicInbox/ExtSearchIdx.pm
+++ b/lib/PublicInbox/ExtSearchIdx.pm
@@ -896,7 +896,10 @@ sub eidx_dedupe ($$$) {
my ($iter, $cur_mid);
my $min_id = 0;
my $idx = 0;
- local $sync->{-regen_fmt} = "dedupe %u/".$self->{oidx}->max."\n";
+ my ($max_id) = $self->{oidx}->dbh->selectrow_array(<<EOS);
+SELECT MAX(id) FROM msgid
+EOS
+ local $sync->{-regen_fmt} = "dedupe %u/$max_id\n";
# note: we could write this query more intelligently,
# but that causes lock contention with read-only processes
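To illustrate the reasoning in the commit message above, here is a minimal sketch in plain Perl. It uses in-memory arrays in place of the SQLite tables. The row contents, the variable names, and the assumption that over->max reports the highest article number in the "over" table are all hypothetical; this is not the actual public-inbox schema or code path.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical stand-in for the "over" table: article numbers of
# messages that still exist (made-up data).
my @over_nums = (1, 2, 3, 4);

# Hypothetical stand-in for the "msgid" table: [id, Message-ID].
# Ghosts and Message-IDs of deleted spam still occupy ids here.
my @msgid_rows = (
	[ 1, 'a@example' ],
	[ 2, 'b@example' ],
	[ 3, 'ghost@example' ],	# ghost: referenced, but never seen
	[ 4, 'spam@example' ],	# message deleted, Message-ID remains
	[ 5, 'c@example' ],
	[ 6, 'd@example' ],
);

# Assumed behavior of the old denominator (over->max)
my $over_max = max(@over_nums);

# Equivalent of: SELECT MAX(id) FROM msgid
my $msgid_max = max(map { $_->[0] } @msgid_rows);

# dedupe walks msgid ids, so the last id visited is the numerator
my $last_id = $msgid_rows[-1]->[0];
my $old = sprintf("dedupe %u/$over_max\n", $last_id);
my $new = sprintf("dedupe %u/$msgid_max\n", $last_id);
print $old, $new;
```

With this made-up data the old format overshoots its denominator ("dedupe 6/4") while the new one ends at "dedupe 6/6", which is the mismatch the patch is meant to avoid.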
* [PATCH 0/3] extindex dedupe improvements
@ 2021-07-25 0:11 5% ` Eric Wong
2021-07-25 0:11 7% ` [PATCH 3/3] extsearchidx: use more appropriate max for dedupe Eric Wong
0 siblings, 1 reply; 2+ results
From: Eric Wong @ 2021-07-25 0:11 UTC (permalink / raw)
To: meta
It's still slow as hell due to I/O latency from Xapian(glass)
on RAM-starved systems; but at least dedupe seems correct, now.
Eric Wong (3):
extindex: support --dedupe[=MSGID]
extindex: improve comment around git->async_wait_all
extsearchidx: use more appropriate max for dedupe
lib/PublicInbox/ExtSearchIdx.pm | 34 ++++++++++++++++++++++++---------
script/public-inbox-extindex | 4 ++--
2 files changed, 27 insertions(+), 11 deletions(-)
2021-07-24 6:34 [WIP] extindex: support --dedupe[=MSGID] Eric Wong
2021-07-25 0:11 5% ` [PATCH 0/3] extindex dedupe improvements Eric Wong
2021-07-25 0:11 7% ` [PATCH 3/3] extsearchidx: use more appropriate max for dedupe Eric Wong
Code repositories for project(s) associated with this public inbox
https://80x24.org/public-inbox.git