user/dev discussion of public-inbox itself
From: Eric Wong <e@80x24.org>
To: meta@public-inbox.org
Subject: [PATCH 5/6] extindex: more consistent doc removal
Date: Mon, 11 Oct 2021 08:06:19 +0000
Message-ID: <20211011080620.27478-6-e@80x24.org>
In-Reply-To: <20211011080620.27478-1-e@80x24.org>

We need to ensure a message is consistently removed from eidxq,
over, and Xapian in all cases.  Removing it from eidxq as well
spares users some noisy error messages.
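As a rough illustration of the invariant described above, here is a minimal core-Perl sketch. It assumes plain hashes stand in for the over table, the eidxq reindex queue, and a Xapian shard. The hash names and the `remove_doc_sketch` helper are hypothetical and are not public-inbox APIs. The point is that routing every removal through one helper makes it hard for a caller to forget one of the three stores.

```perl
use strict;
use warnings;

# Hypothetical in-memory stand-ins for the three places a message lives:
# the "over" overview table, the "eidxq" reindex queue, and a Xapian shard.
our %over  = (1 => 'msg-a', 2 => 'msg-b');
our %eidxq = (1 => 1, 2 => 1);
our %xdb   = (1 => 'terms-a', 2 => 'terms-b');

# One helper so every caller removes a document from all three stores;
# skipping any one of them would leave a dangling entry behind.
sub remove_doc_sketch {
	my ($docid) = @_;
	delete $over{$docid};
	delete $eidxq{$docid};
	delete $xdb{$docid};
}

remove_doc_sketch(2);
printf "over=%s eidxq=%s xdb=%s\n",
	join(',', sort keys %over),
	join(',', sort keys %eidxq),
	join(',', sort keys %xdb); # prints "over=1 eidxq=1 xdb=1"
```

Before this patch, several callers removed a document from only one or two of these places. The diff below replaces those call sites with a single `remove_doc` helper.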
---
 lib/PublicInbox/ExtSearchIdx.pm | 32 +++++++++++++++++++++-----------
 1 file changed, 21 insertions(+), 11 deletions(-)

diff --git a/lib/PublicInbox/ExtSearchIdx.pm b/lib/PublicInbox/ExtSearchIdx.pm
index c0fd282358f9..ce9cea25da5e 100644
--- a/lib/PublicInbox/ExtSearchIdx.pm
+++ b/lib/PublicInbox/ExtSearchIdx.pm
@@ -129,6 +129,13 @@ sub apply_boost ($$) {
 	$req->{self}->{oidx}->add_overview($req->{eml}, $new_smsg);
 }
 
+sub remove_doc ($$) {
+	my ($self, $docid) = @_;
+	$self->{oidx}->delete_by_num($docid);
+	$self->{oidx}->eidxq_del($docid);
+	$self->idx_shard($docid)->ipc_do('xdb_remove', $docid);
+}
+
 sub _unref_doc ($$$$$;$) {
 	my ($sync, $docid, $ibx, $xnum, $oidbin, $eml) = @_;
 	my $s = 'DELETE FROM xref3 WHERE ibx_id = ? AND oidbin = ?';
@@ -139,13 +146,11 @@ sub _unref_doc ($$$$$;$) {
 	$del->bind_param(3, $xnum) if defined($xnum);
 	$del->execute;
 	my $xr3 = $sync->{self}->{oidx}->get_xref3($docid, 1);
-	my $idx = $sync->{self}->idx_shard($docid);
 	if (scalar(@$xr3) == 0) { # all gone
-		$sync->{self}->{oidx}->delete_by_num($docid);
-		$sync->{self}->{oidx}->eidxq_del($docid);
-		$idx->ipc_do('xdb_remove', $docid);
+		remove_doc($sync->{self}, $docid);
 	} else { # enqueue for reindex of remaining messages
 		my $ekey = $ibx->{-gc_eidx_key} // $ibx->eidx_key;
+		my $idx = $sync->{self}->idx_shard($docid);
 		$idx->ipc_do('remove_eidx_info', $docid, $ekey, $eml);
 		$sync->{self}->{oidx}->eidxq_add($docid); # yes, add
 	}
@@ -246,7 +251,7 @@ E: #$smsg->{num} gone ($smsg->{blob} => $oidhex)
 EOM
 	} else {
 		warn "E: $smsg->{blob} gone, removing #$smsg->{num}\n";
-		$self->{oidx}->delete_by_num($smsg->{num});
+		remove_doc($self, $smsg->{num});
 	}
 }
 
@@ -424,6 +429,12 @@ DELETE FROM over WHERE num > 0 AND num NOT IN (SELECT docid FROM xref3)
 	warn "I: eliminated $nr stale over entries\n" if $nr != 0;
 	reindex_checkpoint($self, $sync) if checkpoint_due($sync);
 
+	$nr = $self->{oidx}->dbh->do(<<'');
+DELETE FROM eidxq WHERE docid NOT IN (SELECT num FROM over)
+
+	warn "I: eliminated $nr stale reindex queue entries\n" if $nr != 0;
+	reindex_checkpoint($self, $sync) if checkpoint_due($sync);
+
 	my ($cur) = $self->{oidx}->dbh->selectrow_array(<<EOM);
 SELECT MIN(num) FROM over WHERE num > 0
 EOM
@@ -571,12 +582,13 @@ sub _reindex_oid { # git->cat_async callback
 		my $remain = $self->{oidx}->remove_xref3($docid, $expect_oid);
 		if ($remain == 0) {
 			warn "W: #$docid gone or corrupted\n";
-			$self->idx_shard($docid)->ipc_do('xdb_remove', $docid);
+			remove_doc($self, $docid);
 		} elsif (my $next_oid = $req->{xr3r}->[++$req->{ix}]->[2]) {
+			# n.b. we can't remove_eidx_info here
 			$self->git->cat_async($next_oid, \&_reindex_oid, $req);
 		} else {
 			warn "BUG: #$docid gone (UNEXPECTED)\n";
-			$self->idx_shard($docid)->ipc_do('xdb_remove', $docid);
+			remove_doc($self, $docid);
 		}
 		return;
 	}
@@ -609,8 +621,7 @@ sub _reindex_smsg ($$$) {
 		warn <<"";
 BUG? #$docid $smsg->{blob} is not referenced by inboxes during reindex
 
-		$self->{oidx}->delete_by_num($docid);
-		$self->idx_shard($docid)->ipc_do('xdb_remove', $docid);
+		remove_doc($self, $docid);
 		return;
 	}
 
@@ -957,8 +968,7 @@ sub dd_smsg { # git->cat_async callback
 		for my $smsg (@$ary) {
 			my $gone = $smsg->{num};
 			$oidx->merge_xref3($keep->{num}, $gone, $smsg->{blob});
-			$self->idx_shard($gone)->ipc_do('xdb_remove', $gone);
-			$oidx->delete_by_num($gone);
+			remove_doc($self, $gone);
 		}
 	}
 }
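The new reindex step `DELETE FROM eidxq WHERE docid NOT IN (SELECT num FROM over)` drops queued docids that no longer have an over row. The following core-Perl sketch shows only the effect of that statement. It assumes a hash of over `num` values and an array of queued docids, and both are hypothetical stand-ins for the SQLite tables, not the real DBI call.

```perl
use strict;
use warnings;

# Hypothetical stand-ins: keys of %over are "num" values in the over table,
# @eidxq holds docids waiting in the reindex queue.
our %over  = map { $_ => 1 } (1, 2, 4);
our @eidxq = (1, 3, 4, 5);

# Equivalent in effect to:
#   DELETE FROM eidxq WHERE docid NOT IN (SELECT num FROM over)
# i.e. keep only queued docids that still have an over row.
my @kept = grep { exists $over{$_} } @eidxq;
our $nr = @eidxq - @kept; # number of rows "deleted"
@eidxq = @kept;

warn "I: eliminated $nr stale reindex queue entries\n" if $nr != 0;
print "@eidxq\n"; # prints "1 4"
```

Here docids 3 and 5 are dropped because they have no over row, so `$nr` is 2.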

Thread overview: 7+ messages
2021-10-11  8:06 [PATCH 0/6] extindex: --reindex --fast gets faster Eric Wong
2021-10-11  8:06 ` [PATCH 1/6] extindex: speed up --reindex --fast Eric Wong
2021-10-11  8:06 ` [PATCH 2/6] sqlite: PRAGMA optimize on close Eric Wong
2021-10-11  8:06 ` [PATCH 3/6] extindex: rename var: active => active_shards Eric Wong
2021-10-11  8:06 ` [PATCH 4/6] extindex: share unref logic in more places Eric Wong
2021-10-11  8:06 ` [PATCH 5/6] extindex: more consistent doc removal Eric Wong [this message]
2021-10-11  8:06 ` [PATCH 6/6] extindex: avoid invalid blobs after unref Eric Wong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save this message as an mbox file, import it into your mail client,
  and reply-to-all from there.

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://public-inbox.org/README

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20211011080620.27478-6-e@80x24.org \
    --to=e@80x24.org \
    --cc=meta@public-inbox.org \
    --subject='Re: [PATCH 5/6] extindex: more consistent doc removal' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

Code repositories for project(s) associated with this inbox:

	https://80x24.org/public-inbox.git

This is a public inbox; see the mirroring instructions for how to
clone and mirror all data and code used for this inbox, as well as
URLs for read-only IMAP folder(s) and NNTP newsgroup(s).