user/dev discussion of public-inbox itself
* [PATCH 5/9] extsearchidx: reindex works on Xapian, too
  2020-12-15  2:02  5% ` [PATCH 0/9] extindex: " Eric Wong
@ 2020-12-15  2:02  7%   ` Eric Wong
  0 siblings, 0 replies; 2+ results
From: Eric Wong @ 2020-12-15  2:02 UTC (permalink / raw)
  To: meta

Instead of only working on over.sqlite3, --reindex needs to work
on the Xapian DBs as well.  While no changes to our Xapian use
have taken place recently, they could in the future, and
--reindex exists to account for that.
---
 lib/PublicInbox/ExtSearchIdx.pm | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/lib/PublicInbox/ExtSearchIdx.pm b/lib/PublicInbox/ExtSearchIdx.pm
index c77fb197..f29a84e3 100644
--- a/lib/PublicInbox/ExtSearchIdx.pm
+++ b/lib/PublicInbox/ExtSearchIdx.pm
@@ -404,13 +404,18 @@ sub _reindex_finalize ($$$) {
 	my $orig_smsg = $req->{orig_smsg} // die 'BUG: no {orig_smsg}';
 	my $docid = $smsg->{num} = $orig_smsg->{num};
 	$self->{oidx}->add_overview($eml, $smsg); # may rethread
-	return if $nr == 1; # likely, all good
-
+	$self->{transact_bytes} += $smsg->{bytes};
+	if ($nr == 1) { # likely, all good
+		$self->idx_shard($docid)->shard_reindex_docid($docid);
+		return;
+	}
 	warn "W: #$docid split into $nr due to deduplication change\n";
 	my $chash0 = $smsg->{chash} // die "BUG: $smsg->{blob} no {chash}";
 	delete($by_chash->{$chash0}) // die "BUG: $smsg->{blob} chash missing";
+	my @todo;
 	for my $ary (values %$by_chash) {
 		for my $x (reverse @$ary) {
+			warn "removing #$docid xref3 $x->{blob}\n";
 			my $n = $self->{oidx}->remove_xref3($docid, $x->{blob});
 			die "BUG: $x->{blob} invalidated #$docid" if $n == 0;
 		}
@@ -424,6 +429,12 @@ sub _reindex_finalize ($$$) {
 		$e->{blob} eq $x->{blob} or die <<EOF;
 $x->{blob} != $e->{blob} (${\$ibx->eidx_key}:$e->{num});
 EOF
+		push @todo, $ibx, $e;
+	}
+	$self->{oidx}->commit_lazy; # ensure shard workers can see xref removals
+	$self->{oidx}->begin_lazy;
+	$self->idx_shard($docid)->shard_reindex_docid($docid);
+	while (my ($ibx, $e) = splice(@todo, 0, 2)) {
 		reindex_unseen($self, $sync, $ibx, $e);
 	}
 }
@@ -531,11 +542,12 @@ sub eidxq_process ($$) { # for reindexing
 
 		# shards flush on their own, just don't queue up too many
 		# deletes
-		if (($cur % 1000) == 0) {
+		if ($self->{transact_bytes} >= $self->{batch_bytes}) {
 			$self->git->async_wait_all;
 			$self->{oidx}->commit_lazy;
 			$self->{oidx}->begin_lazy;
 			$pr->("reindexed $cur/$tot\n") if $pr;
+			$self->{transact_bytes} = 0;
 		}
 		# this is only for SIGUSR1, shards do their own accounting:
 		reindex_checkpoint($self, $sync) if ${$sync->{need_checkpoint}};
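
The last hunk above replaces a fixed "every 1000 messages" commit with one that fires once the bytes indexed since the previous commit reach a threshold. A minimal standalone sketch of that idea follows; FakeStore and index_all are hypothetical names invented for illustration and are not part of public-inbox, and the unconditional final commit is an assumption of this sketch rather than something the patch shows.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a transactional store; it only counts commits.
package FakeStore;
sub new { bless { commits => 0 }, shift }
sub commit_lazy { $_[0]->{commits}++ }
sub begin_lazy { }

package main;

# Commit whenever the bytes handled since the last commit reach
# $batch_bytes, then reset the counter (mirrors {transact_bytes}).
sub index_all {
	my ($store, $batch_bytes, @sizes) = @_;
	my $transact_bytes = 0;
	$store->begin_lazy;
	for my $bytes (@sizes) {
		$transact_bytes += $bytes;
		if ($transact_bytes >= $batch_bytes) {
			$store->commit_lazy;
			$store->begin_lazy;
			$transact_bytes = 0;
		}
	}
	$store->commit_lazy; # assumption: always flush once at the end
	$store->{commits};
}

my $store = FakeStore->new;
my $commits = index_all($store, 1000, 400, 400, 400, 50, 2000);
print "commits: $commits\n"; # prints "commits: 3"
```

Counting bytes rather than messages keeps each transaction roughly the same size whether the messages are tiny or carry large attachments.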

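
In _reindex_finalize, the patch queues ($ibx, $e) pairs on a flat @todo array and only processes them after the commit, pulling two elements at a time with splice. A small sketch of that Perl idiom in isolation is below; the inbox names and numbers are made up for illustration, and the comment marks where a commit would sit in the real code.

```perl
use strict;
use warnings;

my @todo;
for my $pair (['inbox.a', 3], ['inbox.b', 7]) {
	push @todo, @$pair; # flatten into (name, num, name, num, ...)
}

# ... in the patch, the commit happens here, before deferred work runs ...

my @done;
# The list assignment is false once splice returns an empty list,
# so the loop ends when @todo is drained.
while (my ($name, $num) = splice(@todo, 0, 2)) {
	push @done, "$name:$num";
}
print join(' ', @done), "\n"; # prints "inbox.a:3 inbox.b:7"
```

Deferring the pairs this way lets the xref3 removals become visible to other readers before any follow-up reindexing starts, which matches the comment on the commit_lazy call in the diff.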

* [PATCH 0/9] extindex: --reindex support
  @ 2020-12-15  2:02  5% ` Eric Wong
  2020-12-15  2:02  7%   ` [PATCH 5/9] extsearchidx: reindex works on Xapian, too Eric Wong
  0 siblings, 1 reply; 2+ results
From: Eric Wong @ 2020-12-15  2:02 UTC (permalink / raw)
  To: meta

Patches 1 and 2 are resends; the rest have gone through a lot of
changes, and I'm probably ready to run this live on the
extindex which holds my lore mirror onion:
<http://rskvuqcfnfizkjg6h5jvovwb3wkikzcwskf54lfpymus6mxrzw67b5ad.onion/all/>

Eric Wong (9):
  extindex: preliminary --reindex support
  extindex: delete stale messages from over.sqlite3
  over: sort xref3 by xnum if ibx_id repeats
  extindex: support --rethread and content bifurcation
  extsearchidx: reindex works on Xapian, too
  extsearchidx: checkpoint releases locks
  extsearchidx: simplify reindex code paths
  extsearchidx: reindex releases over.sqlite3 handles properly
  searchidxshard: simplify newline elimination

 lib/PublicInbox/ExtSearchIdx.pm   | 369 +++++++++++++++++++++++++++++-
 lib/PublicInbox/Over.pm           |   5 +-
 lib/PublicInbox/OverIdx.pm        |  23 ++
 lib/PublicInbox/SearchIdx.pm      |  13 +-
 lib/PublicInbox/SearchIdxShard.pm |  20 +-
 lib/PublicInbox/V2Writable.pm     |  11 +-
 t/extsearch.t                     | 133 ++++++++++-
 7 files changed, 542 insertions(+), 32 deletions(-)


2020-12-11  3:37     [PATCH] extindex: preliminary --reindex support Eric Wong
2020-12-15  2:02  5% ` [PATCH 0/9] extindex: " Eric Wong
2020-12-15  2:02  7%   ` [PATCH 5/9] extsearchidx: reindex works on Xapian, too Eric Wong

Code repositories for project(s) associated with this public inbox

	https://80x24.org/public-inbox.git
