user/dev discussion of public-inbox itself
* [PATCH 3/3] extsearchidx: use more appropriate max for dedupe
  2021-07-25  0:11  5% ` [PATCH 0/3] extindex dedupe improvements Eric Wong
@ 2021-07-25  0:11  7%   ` Eric Wong
  0 siblings, 0 replies; 2+ results
From: Eric Wong @ 2021-07-25  0:11 UTC (permalink / raw)
  To: meta

The over.msgid table may contain ghost Message-IDs and also
Message-IDs of deleted spam messages, so over->max isn't a
good approximation of dedupe progress.
---
 lib/PublicInbox/ExtSearchIdx.pm | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/lib/PublicInbox/ExtSearchIdx.pm b/lib/PublicInbox/ExtSearchIdx.pm
index 1c2a9758..51dbf54f 100644
--- a/lib/PublicInbox/ExtSearchIdx.pm
+++ b/lib/PublicInbox/ExtSearchIdx.pm
@@ -896,7 +896,10 @@ sub eidx_dedupe ($$$) {
 	my ($iter, $cur_mid);
 	my $min_id = 0;
 	my $idx = 0;
-	local $sync->{-regen_fmt} = "dedupe %u/".$self->{oidx}->max."\n";
+	my ($max_id) = $self->{oidx}->dbh->selectrow_array(<<EOS);
+SELECT MAX(id) FROM msgid
+EOS
+	local $sync->{-regen_fmt} = "dedupe %u/$max_id\n";
 
 	# note: we could write this query more intelligently,
 	# but that causes lock contention with read-only processes
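
To illustrate the reasoning in the commit message, here is a minimal sketch in Python using the standard sqlite3 module (not the project's Perl). The two-table schema is a simplified assumption for illustration and is not the actual over.sqlite3 layout; only the `SELECT MAX(id) FROM msgid` query comes from the patch. It shows how the highest indexed article number can lag behind the highest msgid row when ghosts and deleted spam are present.

```python
import sqlite3


def build_demo_db():
    """Build an in-memory database with a hypothetical, simplified schema."""
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE over (num INTEGER PRIMARY KEY, subject TEXT);
        CREATE TABLE msgid (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            mid TEXT UNIQUE
        );
        """
    )
    # Three messages that are actually indexed.
    db.executemany(
        "INSERT INTO over (num, subject) VALUES (?, ?)",
        [(1, "a"), (2, "b"), (3, "c")],
    )
    # Five Message-IDs: three live ones, one ghost (referenced but never
    # seen), and one left behind by a deleted spam message.
    db.executemany(
        "INSERT INTO msgid (mid) VALUES (?)",
        [
            ("a@example",),
            ("b@example",),
            ("ghost@example",),
            ("c@example",),
            ("spam@example",),
        ],
    )
    return db


def progress_denominators(db):
    """Return (max article number, max msgid row id)."""
    (over_max,) = db.execute("SELECT MAX(num) FROM over").fetchone()
    (msgid_max,) = db.execute("SELECT MAX(id) FROM msgid").fetchone()
    return over_max, msgid_max


db = build_demo_db()
over_max, msgid_max = progress_denominators(db)

# A loop that walks msgid rows would reach id 5, so a progress line
# built from over_max would undercount the total work.
print("using over max:  dedupe %d/%d" % (msgid_max, over_max))
print("using msgid max: dedupe %d/%d" % (msgid_max, msgid_max))
```

Under these assumptions, a progress counter that advances by msgid row overshoots a denominator taken from the article numbers, which is the mismatch the patch addresses by querying the msgid table directly.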

* [PATCH 0/3] extindex dedupe improvements
  @ 2021-07-25  0:11  5% ` Eric Wong
  2021-07-25  0:11  7%   ` [PATCH 3/3] extsearchidx: use more appropriate max for dedupe Eric Wong
  0 siblings, 1 reply; 2+ results
From: Eric Wong @ 2021-07-25  0:11 UTC (permalink / raw)
  To: meta

It's still slow as hell due to I/O latency from Xapian (glass)
on RAM-starved systems, but at least dedupe seems correct now.

Eric Wong (3):
  extindex: support --dedupe[=MSGID]
  extindex: improve comment around git->async_wait_all
  extsearchidx: use more appropriate max for dedupe

 lib/PublicInbox/ExtSearchIdx.pm | 34 ++++++++++++++++++++++++---------
 script/public-inbox-extindex    |  4 ++--
 2 files changed, 27 insertions(+), 11 deletions(-)

2021-07-24  6:34     [WIP] extindex: support --dedupe[=MSGID] Eric Wong
2021-07-25  0:11  5% ` [PATCH 0/3] extindex dedupe improvements Eric Wong
2021-07-25  0:11  7%   ` [PATCH 3/3] extsearchidx: use more appropriate max for dedupe Eric Wong

Code repositories for project(s) associated with this public inbox

	https://80x24.org/public-inbox.git
