user/dev discussion of public-inbox itself
* [PATCH 0/4] extindex tweaks and small fixes
@ 2021-10-17  9:52 Eric Wong
  2021-10-17  9:52 ` [PATCH 1/4] extindex: use localtime to display lock time Eric Wong
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: Eric Wong @ 2021-10-17  9:52 UTC
  To: meta

Probably nothing of note: some extra safety checks and
elimination of redundant work.

Eric Wong (4):
  extindex: use localtime to display lock time
  extindex: retry sync_inbox before reindex
  extindex: guard against false mismatch unrefs
  extindex: better locations for {quit} checks

 lib/PublicInbox/ExtSearchIdx.pm | 31 ++++++++++++++++++++++++-------
 1 file changed, 24 insertions(+), 7 deletions(-)



* [PATCH 1/4] extindex: use localtime to display lock time
  2021-10-17  9:52 [PATCH 0/4] extindex tweaks and small fixes Eric Wong
@ 2021-10-17  9:52 ` Eric Wong
  2021-10-17  9:52 ` [PATCH 2/4] extindex: retry sync_inbox before reindex Eric Wong
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: Eric Wong @ 2021-10-17  9:52 UTC
  To: meta

Since this message is intended for command-line use, show
the lock time in local time with its TZ offset.  Drop the
seconds and the word "Xapian" so the message wraps less on
a terminal.
---
 lib/PublicInbox/ExtSearchIdx.pm | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/lib/PublicInbox/ExtSearchIdx.pm b/lib/PublicInbox/ExtSearchIdx.pm
index 69d048fb7342..67d720368922 100644
--- a/lib/PublicInbox/ExtSearchIdx.pm
+++ b/lib/PublicInbox/ExtSearchIdx.pm
@@ -719,11 +719,12 @@ sub eidxq_lock_acquire ($) {
 		return $locked if $locked eq $cur;
 	}
 	my ($pid, $time, $euid, $ident) = split(/-/, $cur, 4);
-	my $t = strftime('%Y-%m-%d %k:%M:%S', gmtime($time));
+	my $t = strftime('%Y-%m-%d %k:%M %z', localtime($time));
+	local $self->{current_info} = 'eidxq';
 	if ($euid == $> && $ident eq host_ident) {
 		if (kill(0, $pid)) {
 			warn <<EOM; return;
-I: PID:$pid (re)indexing Xapian since $t, it will continue our work
+I: PID:$pid (re)indexing since $t, it will continue our work
 EOM
 		}
 		if ($!{ESRCH}) {
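
For readers following along, here is a minimal standalone sketch of the
formatting change, not the code from ExtSearchIdx.pm.  The lock value
below is made up; its "PID-TIME-EUID-IDENT" layout is only inferred from
the split() call in the hunk above.  The sketch uses %H rather than the
patch's %k because %k is an extension that POSIX does not specify.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical lock value; the field layout is assumed from the
# split() call in eidxq_lock_acquire and the values are invented.
my $cur = '12345-1634464320-1000-host-with-dashes';

# A limit of 4 keeps any '-' inside the trailing identifier intact.
my ($pid, $time, $euid, $ident) = split(/-/, $cur, 4);

# Old display: UTC with seconds and no zone marker.
my $utc = strftime('%Y-%m-%d %H:%M:%S', gmtime($time));

# New display: local time, minutes only, numeric UTC offset (%z).
my $local = strftime('%Y-%m-%d %H:%M %z', localtime($time));

print "pid:   $pid\n";
print "ident: $ident\n";
print "old:   $utc\n";
print "new:   $local\n";
```

The "new" line depends on the TZ setting of whoever runs it, which is
the point of the change: the offset makes the local timestamp
unambiguous.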


* [PATCH 2/4] extindex: retry sync_inbox before reindex
  2021-10-17  9:52 [PATCH 0/4] extindex tweaks and small fixes Eric Wong
  2021-10-17  9:52 ` [PATCH 1/4] extindex: use localtime to display lock time Eric Wong
@ 2021-10-17  9:52 ` Eric Wong
  2021-10-17  9:52 ` [PATCH 3/4] extindex: guard against false mismatch unrefs Eric Wong
  2021-10-17  9:52 ` [PATCH 4/4] extindex: better locations for {quit} checks Eric Wong
  3 siblings, 0 replies; 5+ messages in thread
From: Eric Wong @ 2021-10-17  9:52 UTC
  To: meta

Re-run sync_inbox until the num_highwater mark of the target
inbox stops advancing, so the mark is stable before we use
it.  Otherwise we may end up repeating work already done to
index a message.
---
 lib/PublicInbox/ExtSearchIdx.pm | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/lib/PublicInbox/ExtSearchIdx.pm b/lib/PublicInbox/ExtSearchIdx.pm
index 67d720368922..daff656d1ac5 100644
--- a/lib/PublicInbox/ExtSearchIdx.pm
+++ b/lib/PublicInbox/ExtSearchIdx.pm
@@ -859,14 +859,20 @@ sub _reindex_check_ibx ($$$) {
 	my $slice = 10000;
 	my $opt = { limit => $slice };
 	my ($beg, $end) = (1, $slice);
-	my $err = sync_inbox($self, $sync, $ibx) and return;
-	my $max = $ibx->mm->num_highwater;
+	my $ekey = $ibx->eidx_key;
+	my ($max, $max0);
+	do {
+		$max0 = $ibx->mm->num_highwater;
+		sync_inbox($self, $sync, $ibx) and return; # warned
+		$max = $ibx->mm->num_highwater;
+		return if $sync->{quit};
+	} while ($max > $max0 &&
+		warn("# $ekey moved $max0..$max, resyncing..\n"));
 	$end = $max if $end > $max;
 
 	# first, check if we missed any messages in target $ibx
 	my $msgs;
 	my $pr = $sync->{-opt}->{-progress};
-	my $ekey = $ibx->eidx_key;
 	local $sync->{-regen_fmt} = "$ekey checking %u/$max\n";
 	${$sync->{nr}} = 0;
 	my $fast = $sync->{-opt}->{fast};
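
The retry loop in this patch can be tried in isolation with the sketch
below.  Everything in it is a hypothetical stand-in: $highwater plays
the role of $ibx->mm->num_highwater, and fake_sync() imitates a
sync_inbox call during which new messages arrive.  Like the hunk above,
it treats a true return value from the sync step as an error.

```perl
use strict;
use warnings;

# Hypothetical state: messages arriving during each successive sync pass.
my @arrivals = (3, 2, 0);
my $highwater = 100;

# Stand-in for sync_inbox: returns false on success.
sub fake_sync {
	$highwater += shift(@arrivals) // 0;
	return;
}

# Repeat the sync until the mark read before it equals the mark read
# after it, i.e. nothing new arrived while we were syncing.
sub stable_highwater {
	my ($max, $max0);
	my $passes = 0;
	do {
		$max0 = $highwater;
		fake_sync() and return; # error: give up
		$max = $highwater;
		$passes++;
	} while ($max > $max0);
	return ($max, $passes);
}

my ($max, $passes) = stable_highwater();
print "stable at $max after $passes passes\n";
```

In the patch itself, the loop condition also calls warn(), which
returns true, so the "resyncing" notice is printed only when another
pass is actually going to happen.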


* [PATCH 3/4] extindex: guard against false mismatch unrefs
  2021-10-17  9:52 [PATCH 0/4] extindex tweaks and small fixes Eric Wong
  2021-10-17  9:52 ` [PATCH 1/4] extindex: use localtime to display lock time Eric Wong
  2021-10-17  9:52 ` [PATCH 2/4] extindex: retry sync_inbox before reindex Eric Wong
@ 2021-10-17  9:52 ` Eric Wong
  2021-10-17  9:52 ` [PATCH 4/4] extindex: better locations for {quit} checks Eric Wong
  3 siblings, 0 replies; 5+ messages in thread
From: Eric Wong @ 2021-10-17  9:52 UTC
  To: meta

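Before unref-ing a document on a blob mismatch, re-read the
message from over and skip the unref if the blob still
matches.  I'm not sure whether this is a bug at all; it
could also be an old bug in the v2 indexing code.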
---
 lib/PublicInbox/ExtSearchIdx.pm | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/lib/PublicInbox/ExtSearchIdx.pm b/lib/PublicInbox/ExtSearchIdx.pm
index daff656d1ac5..cb5256a2c562 100644
--- a/lib/PublicInbox/ExtSearchIdx.pm
+++ b/lib/PublicInbox/ExtSearchIdx.pm
@@ -921,6 +921,16 @@ ibx_id = ? AND xnum >= ? AND xnum <= ?
 			my ($xnum, $hex) = unpack('JH*', $k);
 			my $bin = pack('H*', $hex);
 			my $exp = $mismatch{$xnum};
+			if (defined $exp) {
+				my $smsg = $ibx->over->get_art($xnum) // next;
+				# $xnum may be expired by another process
+				if ($smsg->{blob} eq $hex) {
+					warn <<"";
+BUG: (non-fatal) $ekey #$xnum $smsg->{blob} still matches (old exp: $exp)
+
+					next;
+				} # else: continue to unref
+			}
 			my $m = defined($exp) ? "mismatch (!= $exp)" : 'stale';
 			warn("# $xnum:$hex (#@$docids): $m\n");
 			for my $i (@$docids) {
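
A rough sketch of the guard follows, using a plain hash in place of
$ibx->over->get_art.  The names %current_blob and should_unref are
invented for illustration, and the assumption that the keys are built
with pack('JH*', ...) is inferred from the unpack() call in the hunk
above.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the over index: xnum => current hex blob OID.
my %current_blob = (42 => 'deadbeef' x 5);

# Decide whether a (xnum, blob) key should still be unref-ed.
# $mismatch maps xnum => expected OID, as %mismatch does in the patch.
sub should_unref {
	my ($k, $mismatch) = @_;
	my ($xnum, $hex) = unpack('JH*', $k);
	my $exp = $mismatch->{$xnum};
	if (defined $exp) {
		# the article may have been expired by another process
		my $cur = $current_blob{$xnum} // return 0;
		return 0 if $cur eq $hex; # false mismatch: keep the doc
	}
	return 1; # real mismatch, or a stale entry
}

my $same = pack('JH*', 42, 'deadbeef' x 5);
my $diff = pack('JH*', 42, 'cafebabe' x 5);
my %mismatch = (42 => '0' x 40);

printf "same blob: %d\n", should_unref($same, \%mismatch);
printf "diff blob: %d\n", should_unref($diff, \%mismatch);
printf "stale:     %d\n", should_unref($diff, {});
```

Stale entries, meaning those with no expected OID recorded, skip the
extra lookup and are unref-ed as before; only the mismatch path pays
for the re-check.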


* [PATCH 4/4] extindex: better locations for {quit} checks
  2021-10-17  9:52 [PATCH 0/4] extindex tweaks and small fixes Eric Wong
                   ` (2 preceding siblings ...)
  2021-10-17  9:52 ` [PATCH 3/4] extindex: guard against false mismatch unrefs Eric Wong
@ 2021-10-17  9:52 ` Eric Wong
  3 siblings, 0 replies; 5+ messages in thread
From: Eric Wong @ 2021-10-17  9:52 UTC
  To: meta

Check for graceful termination at every message since it's
a fairly inexpensive check.
---
 lib/PublicInbox/ExtSearchIdx.pm | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/PublicInbox/ExtSearchIdx.pm b/lib/PublicInbox/ExtSearchIdx.pm
index cb5256a2c562..f479cf9e1a3f 100644
--- a/lib/PublicInbox/ExtSearchIdx.pm
+++ b/lib/PublicInbox/ExtSearchIdx.pm
@@ -908,10 +908,9 @@ ibx_id = ? AND xnum >= ? AND xnum <= ?
 				for my $num (@$docids) {
 					$self->{oidx}->eidxq_add($num);
 				}
-				return if $sync->{quit};
 			}
+			return if $sync->{quit};
 		}
-		return if $sync->{quit};
 		next unless scalar keys %x3m;
 		$self->git->async_wait_all; # wait for reindex_unseen
 
@@ -936,6 +935,7 @@ BUG: (non-fatal) $ekey #$xnum $smsg->{blob} still matches (old exp: $exp)
 			for my $i (@$docids) {
 				_unref_doc($sync, $i, $ibx, $xnum, $bin);
 			}
+			return if $sync->{quit};
 		}
 	}
 	defined($hi) and ($hi < $max) and
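
To illustrate why a per-message check is cheap, here is a toy loop that
tests a {quit} flag after every item.  The $sync hashref and the
process_all helper are hypothetical; in this sketch the flag is set
from a callback rather than from a real signal handler.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the $sync hashref used in the patch.
my $sync = { quit => 0 };

# Handle one item per iteration and test {quit} right after it, so a
# termination request is honored before the next item starts.
sub process_all {
	my ($sync, $items, $on_item) = @_;
	my $done = 0;
	for my $item (@$items) {
		$on_item->($item);
		$done++;
		return $done if $sync->{quit}; # a single hash lookup
	}
	return $done;
}

my $n = process_all($sync, [1..10], sub {
	my ($item) = @_;
	$sync->{quit} = 1 if $item == 3; # pretend a signal arrived here
});
print "stopped after $n items\n";
```

Checking once per item bounds the delay before exit to the time needed
for a single item, rather than for a whole batch.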


Code repositories for project(s) associated with this inbox:

	https://80x24.org/public-inbox.git