* [PATCH 0/5] fix extindex reindex harder
@ 2021-10-12 22:44 Eric Wong
2021-10-12 22:44 ` [PATCH 1/5] extindex: flush pending reindex before unref Eric Wong
` (4 more replies)
0 siblings, 5 replies; 6+ messages in thread
From: Eric Wong @ 2021-10-12 22:44 UTC
To: meta
1/5 may affect some users: it prevents unnecessary message
renumbering and I/O during extindex reindexing.
3/5 quiets things down for users on SQLite <3.18.
5/5 is a good usability fix for me.
Eric Wong (5):
extindex: flush pending reindex before unref
lei/store: use remove_doc to save some LoC
index: optimize after all SQLite DB commits
doc: relnotes: note some recent improvements
lei up --all: show output for warnings
Documentation/RelNotes/v1.7.0.wip | 14 +++++++++++++-
lib/PublicInbox/ExtSearchIdx.pm | 4 ++++
lib/PublicInbox/LEI.pm | 12 ++++++++----
lib/PublicInbox/LeiMailSync.pm | 2 +-
lib/PublicInbox/LeiStore.pm | 3 +--
lib/PublicInbox/LeiUp.pm | 7 +++++++
lib/PublicInbox/OverIdx.pm | 1 +
lib/PublicInbox/SearchIdx.pm | 1 +
lib/PublicInbox/V2Writable.pm | 16 +++++-----------
9 files changed, 41 insertions(+), 19 deletions(-)
* [PATCH 1/5] extindex: flush pending reindex before unref
2021-10-12 22:44 [PATCH 0/5] fix extindex reindex harder Eric Wong
@ 2021-10-12 22:44 ` Eric Wong
2021-10-12 22:44 ` [PATCH 2/5] lei/store: use remove_doc to save some LoC Eric Wong
` (3 subsequent siblings)
4 siblings, 0 replies; 6+ messages in thread
From: Eric Wong @ 2021-10-12 22:44 UTC
To: meta
This prevents unnecessary message renumbering and I/O.
Without this change, there is a small window for long-running
WWW streaming requests to miss a message that was unref-ed
before reindexing. If we expose an "All Mail" mailbox via
IMAP/JMAP, this will save client traffic.
---
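Note for readers: the ordering problem described above can be shown
with a toy model. This is only a sketch under stated assumptions, not
the ExtSearchIdx code: %docs, $add_ref, $unref and the @pending queue
are hypothetical names, and the queue merely stands in for whatever
async_wait_all drains. It assumes a document is dropped once its last
reference is removed, so a queued reindex that runs too late has to
allocate a new number.

```perl
use strict;
use warnings;

# Toy model only: every name here is hypothetical, not the ExtSearchIdx API.
sub run {
	my ($flush_first) = @_;
	my %docs = (1 => { blob => 'abc', refs => { inbox_a => 1 } });
	my $next_num = 2;
	my @pending; # queued callbacks, standing in for async work

	my $add_ref = sub {
		my ($blob, $ref) = @_;
		my ($num) = grep { $docs{$_}->{blob} eq $blob } keys %docs;
		$num //= $next_num++; # document vanished: allocate a new number
		$docs{$num}->{blob} = $blob;
		$docs{$num}->{refs}->{$ref} = 1;
	};
	my $unref = sub {
		my ($num, $ref) = @_;
		delete $docs{$num}->{refs}->{$ref};
		# drop the document once nothing references it anymore
		delete $docs{$num} unless %{$docs{$num}->{refs}};
	};

	push @pending, sub { $add_ref->('abc', 'inbox_b') }; # pending reindex
	if ($flush_first) { $_->() for splice(@pending) } # drain before unref
	$unref->(1, 'inbox_a');
	$_->() for splice(@pending); # anything still queued runs afterwards
	join(',', sort keys %docs);
}

print 'flushed first: doc #', run(1), "\n";
print 'not flushed:   doc #', run(0), "\n";
```

In this model, draining the queue first keeps the document at its
original number, while unref-ing first makes it reappear under a new
one.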
lib/PublicInbox/ExtSearchIdx.pm | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/lib/PublicInbox/ExtSearchIdx.pm b/lib/PublicInbox/ExtSearchIdx.pm
index c2ab0447e176..40489eab4c66 100644
--- a/lib/PublicInbox/ExtSearchIdx.pm
+++ b/lib/PublicInbox/ExtSearchIdx.pm
@@ -193,6 +193,7 @@ sub do_xpost ($$) {
$idx->ipc_do('add_eidx_info', $docid, $eidx_key, $eml);
apply_boost($req, $smsg) if $req->{boost_in_use};
} else { # 'd' no {xnum}
+ $self->git->async_wait_all;
$oid = pack('H*', $oid);
_unref_doc($req, $docid, $xibx, undef, $oid, $eml);
}
@@ -261,6 +262,7 @@ sub _blob_missing ($$) { # called when a known $smsg->{blob} is gone
# xnum and ibx are unknown, we only call this when an entry from
# /ei*/over.sqlite3 is bad, not on entries from xap*/over.sqlite3
my $oidbin = pack('H*', $smsg->{blob});
+ $req->{self}->git->async_wait_all;
_unref_doc($req, $smsg, undef, undef, $oidbin);
}
@@ -552,6 +554,7 @@ sub _reindex_finalize ($$$) {
}
return if $nr == 1; # likely, all good
+ $self->git->async_wait_all;
warn "W: #$docid split into $nr due to deduplication change\n";
my @todo;
for my $ary (values %$by_chash) {
@@ -896,6 +899,7 @@ ibx_id = ? AND xnum >= ? AND xnum <= ?
}
return if $sync->{quit};
next unless scalar keys %x3m;
+ $self->git->async_wait_all; # wait for reindex_unseen
# eliminate stale/mismatched entries
my %mismatch = map { $_->{num} => $_->{blob} } @$msgs;
* [PATCH 2/5] lei/store: use remove_doc to save some LoC
2021-10-12 22:44 [PATCH 0/5] fix extindex reindex harder Eric Wong
2021-10-12 22:44 ` [PATCH 1/5] extindex: flush pending reindex before unref Eric Wong
@ 2021-10-12 22:44 ` Eric Wong
2021-10-12 22:44 ` [PATCH 3/5] index: optimize after all SQLite DB commits Eric Wong
` (2 subsequent siblings)
4 siblings, 0 replies; 6+ messages in thread
From: Eric Wong @ 2021-10-12 22:44 UTC
To: meta
---
lib/PublicInbox/LeiStore.pm | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/lib/PublicInbox/LeiStore.pm b/lib/PublicInbox/LeiStore.pm
index 613d1d31f581..bf41dcf53094 100644
--- a/lib/PublicInbox/LeiStore.pm
+++ b/lib/PublicInbox/LeiStore.pm
@@ -281,8 +281,7 @@ sub remove_docids ($;@) {
my ($self, @docids) = @_;
my $eidx = eidx_init($self);
for my $docid (@docids) {
- $eidx->idx_shard($docid)->ipc_do('xdb_remove', $docid);
- $eidx->{oidx}->delete_by_num($docid);
+ $eidx->remove_doc($docid);
$eidx->{oidx}->{dbh}->do(<<EOF, undef, $docid);
DELETE FROM xref3 WHERE docid = ?
EOF
* [PATCH 3/5] index: optimize after all SQLite DB commits
2021-10-12 22:44 [PATCH 0/5] fix extindex reindex harder Eric Wong
2021-10-12 22:44 ` [PATCH 1/5] extindex: flush pending reindex before unref Eric Wong
2021-10-12 22:44 ` [PATCH 2/5] lei/store: use remove_doc to save some LoC Eric Wong
@ 2021-10-12 22:44 ` Eric Wong
2021-10-12 22:44 ` [PATCH 4/5] doc: relnotes: note some recent improvements Eric Wong
2021-10-12 22:45 ` [PATCH 5/5] lei up --all: show output for warnings Eric Wong
4 siblings, 0 replies; 6+ messages in thread
From: Eric Wong @ 2021-10-12 22:44 UTC
To: meta
This covers v1 inboxes as well. We also wrap the call in eval,
since "PRAGMA optimize" was only introduced in SQLite 3.18.0
(2017-03-30).
---
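Note for readers: the guard used throughout this patch is a plain
best-effort eval after the commit. The sketch below shows that pattern
in isolation. FakeDbh is a hypothetical stand-in for a DBI handle (so
no SQLite is needed to run it), and its "old" flag simply makes the
statement die; that is an assumption made for illustration, not a
claim about how any particular SQLite version reacts.

```perl
use strict;
use warnings;

package FakeDbh; # hypothetical stand-in for a DBI database handle
sub new { my ($class, %opt) = @_; bless { %opt, log => [] }, $class }
sub commit { push @{$_[0]->{log}}, 'commit'; 1 }
sub do {
	my ($self, $sql) = @_;
	# simulate a handle on which the statement fails
	die "unsupported: $sql\n" if $self->{old} && $sql eq 'PRAGMA optimize';
	push @{$self->{log}}, $sql;
	1;
}

package main;

sub commit_and_optimize {
	my ($dbh) = @_;
	$dbh->commit; # not wrapped: commit errors must still propagate
	eval { $dbh->do('PRAGMA optimize') }; # best-effort, failure ignored
	$dbh->{log};
}

my $new = commit_and_optimize(FakeDbh->new);
my $old = commit_and_optimize(FakeDbh->new(old => 1));
print "new: @$new\n";
print "old: @$old\n";
```

The commit itself stays outside the eval, so only the optional
optimization step is allowed to fail silently.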
lib/PublicInbox/LeiMailSync.pm | 2 +-
lib/PublicInbox/OverIdx.pm | 1 +
lib/PublicInbox/SearchIdx.pm | 1 +
lib/PublicInbox/V2Writable.pm | 16 +++++-----------
4 files changed, 8 insertions(+), 12 deletions(-)
diff --git a/lib/PublicInbox/LeiMailSync.pm b/lib/PublicInbox/LeiMailSync.pm
index c6cd1bc58d0a..f7e37ad9ca80 100644
--- a/lib/PublicInbox/LeiMailSync.pm
+++ b/lib/PublicInbox/LeiMailSync.pm
@@ -48,7 +48,7 @@ sub lms_pause {
my ($self) = @_;
$self->{fmap} = {};
my $dbh = delete $self->{dbh};
- $dbh->do('PRAGMA optimize') if $dbh;
+ eval { $dbh->do('PRAGMA optimize') } if $dbh;
}
sub create_tables {
diff --git a/lib/PublicInbox/OverIdx.pm b/lib/PublicInbox/OverIdx.pm
index d6d706f7fed0..9fdb26c0d5c2 100644
--- a/lib/PublicInbox/OverIdx.pm
+++ b/lib/PublicInbox/OverIdx.pm
@@ -434,6 +434,7 @@ sub commit_lazy {
my ($self) = @_;
delete $self->{txn} or return;
$self->{dbh}->commit;
+ eval { $self->{dbh}->do('PRAGMA optimize') };
}
sub begin_lazy {
diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
index a2ed94993993..928152ec4df4 100644
--- a/lib/PublicInbox/SearchIdx.pm
+++ b/lib/PublicInbox/SearchIdx.pm
@@ -793,6 +793,7 @@ sub v1_checkpoint ($$;$) {
${$sync->{max}} = $self->{batch_bytes};
$self->{mm}->{dbh}->commit;
+ eval { $self->{mm}->{dbh}->do('PRAGMA optimize') };
my $xdb = $self->{xdb};
if ($newest && $xdb) {
my $cur = $xdb->get_metadata('last_commit');
diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm
index efcc1fc21a18..3914383cc9d3 100644
--- a/lib/PublicInbox/V2Writable.pm
+++ b/lib/PublicInbox/V2Writable.pm
@@ -547,11 +547,11 @@ sub checkpoint ($;$) {
}
my $shards = $self->{idx_shards};
if ($shards) {
- my $mm = $self->{mm};
- my $dbh = $mm->{dbh} if $mm;
+ my $dbh = $self->{mm}->{dbh} if $self->{mm};
# SQLite msgmap data is second in importance
$dbh->commit if $dbh;
+ eval { $dbh->do('PRAGMA optimize') };
# SQLite overview is third
$self->{oidx}->commit_lazy;
@@ -620,16 +620,10 @@ sub done {
my $m = $err ? 'rollback' : 'commit';
eval { $mm->{dbh}->$m };
$err .= "msgmap $m: $@\n" if $@;
- eval { $mm->{dbh}->do('PRAGMA optimize') };
- $err .= "msgmap optimize: $@\n" if $@;
}
- if ($self->{oidx} && $self->{oidx}->{dbh}) {
- if ($err) {
- eval { $self->{oidx}->rollback_lazy };
- $err .= "overview rollback: $@\n" if $@;
- }
- eval { $self->{oidx}->{dbh}->do('PRAGMA optimize') };
- $err .= "overview optimize: $@\n" if $@;
+ if ($self->{oidx} && $self->{oidx}->{dbh} && $err) {
+ eval { $self->{oidx}->rollback_lazy };
+ $err .= "overview rollback: $@\n" if $@;
}
my $shards = delete $self->{idx_shards};
* [PATCH 4/5] doc: relnotes: note some recent improvements
2021-10-12 22:44 [PATCH 0/5] fix extindex reindex harder Eric Wong
` (2 preceding siblings ...)
2021-10-12 22:44 ` [PATCH 3/5] index: optimize after all SQLite DB commits Eric Wong
@ 2021-10-12 22:44 ` Eric Wong
2021-10-12 22:45 ` [PATCH 5/5] lei up --all: show output for warnings Eric Wong
4 siblings, 0 replies; 6+ messages in thread
From: Eric Wong @ 2021-10-12 22:44 UTC
To: meta
---
Documentation/RelNotes/v1.7.0.wip | 14 +++++++++++++-
1 file changed, 13 insertions(+), 1 deletion(-)
diff --git a/Documentation/RelNotes/v1.7.0.wip b/Documentation/RelNotes/v1.7.0.wip
index f71f447feb4d..854c2fce7c88 100644
--- a/Documentation/RelNotes/v1.7.0.wip
+++ b/Documentation/RelNotes/v1.7.0.wip
@@ -8,7 +8,11 @@ Another big release focused on multi-inbox search and scalability.
* general changes
- config file parsing is 2x faster with 50K inboxes
+ - config file parsing is 2x faster with 50K inboxes
+
+ - deduplication ignores whitespace differences within address fields
+
+ - "PRAGMA optimize" is now issued on commits for SQLite 3.18+
* read-only public-inbox-daemon (-httpd, -nntpd, -imapd):
@@ -47,6 +51,14 @@ Another big release focused on multi-inbox search and scalability.
filesystem or over HTTP(S). See lei(1), lei-overview(7), and other
lei-* manpages for details.
+* public-inbox-index
+
+ - non-strict (Subject-based) threading supports non-ASCII characters,
+ reindexing is necessary for old messages with non-ASCII subjects.
+
+ - --batch-size is now 8M on 64-bit systems for throughput improvements,
+ higher values are still advised for more powerful hardware.
+
* public-inbox-watch
- IMAP and NNTP code shared with lei, fixing an off-by-one error
* [PATCH 5/5] lei up --all: show output for warnings
2021-10-12 22:44 [PATCH 0/5] fix extindex reindex harder Eric Wong
` (3 preceding siblings ...)
2021-10-12 22:44 ` [PATCH 4/5] doc: relnotes: note some recent improvements Eric Wong
@ 2021-10-12 22:45 ` Eric Wong
4 siblings, 0 replies; 6+ messages in thread
From: Eric Wong @ 2021-10-12 22:45 UTC
To: meta
This helps users tell which saved search a warning came from.
Since I often create and discard externals, some warnings
from saved searches were confusing to me without output context:
	"`$FOO' is unknown"
	"$FOO not indexed by Xapian"
---
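Note for readers: the LeiUp.pm hunk below chains a temporary
$SIG{__WARN__} handler in front of the existing one. Here is a
standalone sketch of the same idea. with_output_context, @seen and the
/tmp/example-maildir path are hypothetical names for illustration, and
the fallback prints to STDERR directly instead of using the
\&CORE::warn reference from the patch.

```perl
use strict;
use warnings;

my @seen; # captured warnings, for demonstration only
$SIG{__WARN__} = sub { push @seen, join('', @_) };

sub with_output_context {
	my ($out, $code) = @_;
	# remember the handler that was active before we install ours
	my $cb = $SIG{__WARN__} // sub { print STDERR @_ };
	my $o = " (output: $out)";
	local $SIG{__WARN__} = sub {
		my @m = @_;
		# insert the suffix before a trailing newline,
		# or append it when there is none
		push(@m, $o) if !@m || $m[-1] !~ s/\n\z/$o\n/;
		$cb->(@m);
	};
	$code->(); # the previous handler is restored on return
}

with_output_context('/tmp/example-maildir', sub {
	warn "`foo' is unknown\n";
});
print @seen;
```

Because the handler is installed with "local", warnings emitted after
the call returns no longer carry the suffix.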
lib/PublicInbox/LEI.pm | 12 ++++++++----
lib/PublicInbox/LeiUp.pm | 7 +++++++
2 files changed, 15 insertions(+), 4 deletions(-)
diff --git a/lib/PublicInbox/LEI.pm b/lib/PublicInbox/LEI.pm
index 51b0e95e1728..183cb545fe55 100644
--- a/lib/PublicInbox/LEI.pm
+++ b/lib/PublicInbox/LEI.pm
@@ -522,7 +522,7 @@ sub sigint_reap {
sub fail ($$;$) {
my ($self, $buf, $exit_code) = @_;
$self->{failed}++;
- err($self, $buf) if defined $buf;
+ warn($buf, "\n") if defined $buf;
$self->{pkt_op_p}->pkt_do('fail_handler') if $self->{pkt_op_p};
x_it($self, ($exit_code // 1) << 8);
undef;
@@ -542,7 +542,7 @@ sub puts ($;@) { out(shift, map { "$_\n" } @_) }
sub child_error { # passes non-fatal curl exit codes to user
my ($self, $child_error, $msg) = @_; # child_error is $?
$child_error ||= 1 << 8;
- $self->err($msg) if $msg;
+ warn($msg, "\n") if defined $msg;
if ($self->{pkt_op_p}) { # to top lei-daemon
$self->{pkt_op_p}->pkt_do('child_error', $child_error);
} elsif ($self->{sock}) { # to lei(1) client
@@ -588,8 +588,12 @@ sub _lei_atfork_child {
eval 'no warnings; undef $PublicInbox::LeiNoteEvent::to_flush';
undef $errors_log;
$quit = \&CORE::exit;
- $self->{-eml_noisy} or # only "lei import" sets this atm
- $SIG{__WARN__} = PublicInbox::Eml::warn_ignore_cb();
+ if (!$self->{-eml_noisy}) { # only "lei import" sets this atm
+ my $cb = $SIG{__WARN__} // \&CORE::warn;
+ $SIG{__WARN__} = sub {
+ $cb->(@_) unless PublicInbox::Eml::warn_ignore(@_)
+ };
+ }
$current_lei = $persist ? undef : $self; # for SIG{__WARN__}
}
diff --git a/lib/PublicInbox/LeiUp.pm b/lib/PublicInbox/LeiUp.pm
index 3e1ca21e29e7..3011300dd836 100644
--- a/lib/PublicInbox/LeiUp.pm
+++ b/lib/PublicInbox/LeiUp.pm
@@ -159,6 +159,13 @@ sub event_step { # runs via PublicInbox::DS::requeue
delete $l->{opt}->{all};
$l->qerr("# updating $self->{out}");
$l->{up_op_p} = $self->{op_p}; # ($l => $lei => script/lei)
+ my $cb = $SIG{__WARN__} // \&CORE::warn;
+ my $o = " (output: $self->{out})";
+ local $SIG{__WARN__} = sub {
+ my @m = @_;
+ push(@m, $o) if !@m || $m[-1] !~ s/\n\z/$o\n/;
+ $cb->(@m);
+ };
eval { $l->dispatch('up', $self->{out}) };
$lei->child_error(0, $@) if $@ || $l->{failed}; # lei->fail()
Code repositories for project(s) associated with this public inbox
https://80x24.org/public-inbox.git