* [PATCH 08/26] xcpdb: new tool which wraps Xapian's copydatabase(1)
2019-05-23 9:36 7% [PATCH 00/26] xcpdb: ease Xapian DB format migrations Eric Wong
@ 2019-05-23 9:36 6% ` Eric Wong
0 siblings, 0 replies; 2+ results
From: Eric Wong @ 2019-05-23 9:36 UTC (permalink / raw)
To: meta
copydatabase(1) is an existing Xapian tool which is the
recommended way to upgrade existing DBs to the latest Xapian
database format (currently "glass" for stable/released
versions). Our use of Xapian relies on preserving document IDs,
so we'll wrap it like we do xapian-compact(1) and use the
"--no-renumber" switch.
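
For reference, the underlying invocation can be sketched as a small shell function. This is only an illustration of the switch described above, not the wrapper added by this patch: the function name `copy_keep_docids`, the `COPYDATABASE` override variable, and the paths are all hypothetical.

```shell
#!/bin/sh
# Sketch: copy one Xapian DB into a new directory while preserving
# document IDs.  COPYDATABASE is a hypothetical override (not something
# public-inbox reads); it only exists so the function can be dry-run
# on a machine without Xapian installed.
copy_keep_docids() {
    src=$1 dst=$2
    # --no-renumber keeps each docid unchanged in the destination,
    # which matters when external data refers to docids.
    "${COPYDATABASE:-copydatabase}" --no-renumber "$src" "$dst"
}
```

With Xapian installed, `copy_keep_docids /path/to/old-db /path/to/new-db` would run the real tool; both paths are placeholders.
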
I could not name the tool "public-inbox-copydatabase" since it
would be ambiguous as to which DB it's actually copying. So, I
abbreviated the suffix to "xcpdb" (Xapian CoPy DataBase), which
I hope is acceptable and unambiguous.
---
Documentation/include.mk | 6 ++--
Documentation/public-inbox-xcpdb.pod | 51 ++++++++++++++++++++++++++++
MANIFEST | 2 ++
script/public-inbox-xcpdb | 18 ++++++++++
t/indexlevels-mirror.t | 22 ++++++++++++
5 files changed, 97 insertions(+), 2 deletions(-)
create mode 100644 Documentation/public-inbox-xcpdb.pod
create mode 100755 script/public-inbox-xcpdb
diff --git a/Documentation/include.mk b/Documentation/include.mk
index 6415338..27d6ea6 100644
--- a/Documentation/include.mk
+++ b/Documentation/include.mk
@@ -26,11 +26,13 @@ podtext = $(PODTEXT) $(PODTEXT_OPTS)
# MakeMaker only seems to support manpage sections 1 and 3...
m1 =
-m1 += public-inbox-mda
+m1 += public-inbox-compact
m1 += public-inbox-httpd
+m1 += public-inbox-index
+m1 += public-inbox-mda
m1 += public-inbox-nntpd
m1 += public-inbox-watch
-m1 += public-inbox-index
+m1 += public-inbox-xcpdb
m5 =
m5 += public-inbox-config
m5 += public-inbox-v1-format
diff --git a/Documentation/public-inbox-xcpdb.pod b/Documentation/public-inbox-xcpdb.pod
new file mode 100644
index 0000000..4ff5186
--- /dev/null
+++ b/Documentation/public-inbox-xcpdb.pod
@@ -0,0 +1,51 @@
+=head1 NAME
+
+public-inbox-xcpdb - copy Xapian DBs (for format upgrades)
+
+=head1 SYNOPSIS
+
+ public-inbox-xcpdb INBOX_DIR
+
+=head1 DESCRIPTION
+
+public-inbox-xcpdb is a wrapper for L<copydatabase(1)> for
+upgrading to the latest database format supported by Xapian
+(e.g. "glass" or "honey").
+
+It locks the inbox and prevents other processes such as
+L<public-inbox-watch(1)> and L<public-inbox-mda(1)> from
+writing while it operates.
+
+This is intended for upgrading the database format used by
+Xapian. It DOES NOT upgrade the schema used by the
+public-inbox search interface (see L<public-inbox-index(1)>).
+
+=head1 ENVIRONMENT
+
+=over 8
+
+=item PI_CONFIG
+
+The default config file, normally "~/.public-inbox/config".
+See L<public-inbox-config(5)>
+
+=back
+
+=head1 UPGRADING
+
+=head1 CONTACT
+
+Feedback welcome via plain-text mail to L<mailto:meta@public-inbox.org>
+
+The mail archives are hosted at L<https://public-inbox.org/meta/>
+and L<http://hjrcffqmbrq6wope.onion/meta/>
+
+=head1 COPYRIGHT
+
+Copyright 2019 all contributors L<mailto:meta@public-inbox.org>
+
+License: AGPL-3.0+ L<https://www.gnu.org/licenses/agpl-3.0.txt>
+
+=head1 SEE ALSO
+
+L<copydatabase(1)>, L<public-inbox-index(1)>
diff --git a/MANIFEST b/MANIFEST
index dfc1f66..efd5658 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -20,6 +20,7 @@ Documentation/public-inbox-overview.pod
Documentation/public-inbox-v1-format.pod
Documentation/public-inbox-v2-format.pod
Documentation/public-inbox-watch.pod
+Documentation/public-inbox-xcpdb.pod
Documentation/standards.perl
Documentation/txt2pre
HACKING
@@ -154,6 +155,7 @@ script/public-inbox-mda
script/public-inbox-nntpd
script/public-inbox-purge
script/public-inbox-watch
+script/public-inbox-xcpdb
script/public-inbox.cgi
scripts/dc-dlvr
scripts/dc-dlvr.pre
diff --git a/script/public-inbox-xcpdb b/script/public-inbox-xcpdb
new file mode 100755
index 0000000..cbf9f55
--- /dev/null
+++ b/script/public-inbox-xcpdb
@@ -0,0 +1,18 @@
+#!/usr/bin/perl -w
+# Copyright (C) 2019 all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+# xcpdb: Xapian copy database, a wrapper around Xapian's copydatabase(1)
+use PublicInbox::InboxWritable;
+use PublicInbox::Xapcmd;
+use PublicInbox::Admin;
+PublicInbox::Admin::require_or_die('-search');
+my $usage = "Usage: public-inbox-xcpdb INBOX_DIR\n";
+my @ibxs = PublicInbox::Admin::resolve_inboxes(\@ARGV) or die $usage;
+my $cmd = [qw(copydatabase --no-renumber)];
+open my $null, '>', '/dev/null' or die "failed to open /dev/null: $!\n";
+my $rdr = { 1 => fileno($null) };
+foreach (@ibxs) {
+ my $ibx = PublicInbox::InboxWritable->new($_);
+ # we rely on --no-renumber to keep docids synched to NNTP
+ PublicInbox::Xapcmd::run($ibx, $cmd, undef, $rdr);
+}
diff --git a/t/indexlevels-mirror.t b/t/indexlevels-mirror.t
index d124c75..61053b6 100644
--- a/t/indexlevels-mirror.t
+++ b/t/indexlevels-mirror.t
@@ -18,6 +18,7 @@ foreach my $mod (qw(DBD::SQLite)) {
my $path = 'blib/script';
my $index = "$path/public-inbox-index";
+my $xcpdb = "$path/public-inbox-xcpdb";
my $mime = PublicInbox::MIME->create(
header => [
@@ -108,6 +109,13 @@ sub import_index_incremental {
ok($im->remove($mime), '2nd message removed');
$im->done;
+ if ($level ne 'basic') {
+ is(system($xcpdb, $mirror), 0, "v$v xcpdb OK");
+ delete $ro_mirror->{$_} for (qw(over search));
+ ($nr, $msgs) = $ro_mirror->search->query('m:m@2');
+ is($nr, 1, "v$v found m\@2 via Xapian on $level");
+ }
+
# sync the mirror
is(system('git', "--git-dir=$fetch_dir", qw(fetch -q)), 0, 'fetch OK');
is(system($index, $mirror), 0, "v$v index mirror again OK");
@@ -120,6 +128,10 @@ sub import_index_incremental {
is_deeply([glob("$ibx->{mainrepo}/xap*/?/")], [],
'no Xapian partition directories for v2 basic');
}
+ if ($level ne 'basic') {
+ ($nr, $msgs) = $ro_mirror->search->reopen->query('m:m@2');
+ is($nr, 0, "v$v m\@2 gone from Xapian in mirror on $level");
+ }
}
# we can probably cull some other tests and put full/medium tests, here
@@ -131,4 +143,14 @@ for my $level (qw(basic)) {
}
}
+SKIP: {
+ require PublicInbox::Search;
+ PublicInbox::Search::load_xapian() or skip 'Search::Xapian missing', 2;
+ for my $v (1..2) {
+ subtest("v$v indexlevel=medium" => sub {
+ import_index_incremental($v, 'medium');
+ })
+ }
+}
+
done_testing();
--
EW
* [PATCH 00/26] xcpdb: ease Xapian DB format migrations
@ 2019-05-23 9:36 7% Eric Wong
2019-05-23 9:36 6% ` [PATCH 08/26] xcpdb: new tool which wraps Xapian's copydatabase(1) Eric Wong
0 siblings, 1 reply; 2+ results
From: Eric Wong @ 2019-05-23 9:36 UTC (permalink / raw)
To: meta
I've noticed performance problems in Xapian's old chert
backend, particularly with phrase searches, which seem
alleviated with the new glass backend.
Unfortunately, the tool distributed with Xapian for updating DB
formats, copydatabase(1), is extremely slow, and blocking updates
for hours at a time to perform the migration is not acceptable.
(That's right, "copydatabase" is NOT a Postgres command!)
So, I've written "public-inbox-xcpdb" and gotten it to perform
the bulk copy operation without holding inbox.lock and to
deal gracefully with Xapian DB modifications made in the
meantime. xcpdb is still slow, but I've (finally!) implemented
partial reindexing to allow it to minimize the lock time and not
stall -mda or -watch processes while it is working.
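
The general pattern (slow copy without the lock, then a short catch-up under the lock) can be sketched with a toy model. This is an assumption-laden illustration, not the code in this series: the "DB" is an append-only text file, the lock is a directory made with mkdir(1), and `bulk_copy` and `finish_copy` are hypothetical names. Deletions are not modeled.

```shell
#!/bin/sh
# Toy model only: one record per line, append-only writers.
# None of this is public-inbox code.

# Phase 1: slow bulk copy, done WITHOUT the lock so writers can
# keep appending to the source while it runs.
bulk_copy() {
    src=$1
    # remember how many records existed when the copy started
    seen=$(awk 'END { print NR }' "$src")
    awk -v n="$seen" 'NR <= n' "$src" > "$src.tmp"
    echo "$seen" > "$src.seen"
}

# Phase 2: hold the lock briefly, copy only the records added since
# phase 1, then swap the finished copy into place.
finish_copy() {
    src=$1 lock=$2
    until mkdir "$lock" 2>/dev/null; do sleep 1; done
    seen=$(cat "$src.seen")
    awk -v n="$seen" 'NR > n' "$src" >> "$src.tmp"
    mv "$src.tmp" "$src"
    rm -f "$src.seen"
    rmdir "$lock"
}
```

The time spent holding the lock scales with the number of records added during phase 1 rather than with the size of the whole DB, which is the property the paragraph above is after.
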
There's a bunch of cleanups along the way, too; and it should
make future changes to repartition the Xapian DB on existing v2
inboxes easier.
Eric Wong (26):
t/convert-compact: skip on missing xapian-compact(1)
v1writable: retire in favor of InboxWritable
doc: document the reason for --no-renumber
search: reenable phrase search on non-chert Xapian
xapcmd: new module for wrapping Xapian commands
admin: hoist out resolve_inboxes for -compact and -index
xapcmd: support spawn options
xcpdb: new tool which wraps Xapian's copydatabase(1)
xapcmd: do not cleanup on errors
admin: move index_inbox over
xcpdb: implement using Perl bindings
xapcmd: xcpdb supports compaction
v2writable: hoist out log_range sub for readability
xcpdb: use fine-grained locking
xcpdb: implement progress reporting
xcpdb: cleanup error handling and diagnosis
xapcmd: avoid EXDEV when finalizing changes
doc: xcpdb: update to reflect the current state
xapcmd: use "print STDERR" for progress reporting
xcpdb: show re-indexing progress
xcpdb: remove temporary directories on aborts
compact: reuse infrastructure from xcpdb
xcpdb|compact: support some xapian-compact switches
xapcmd: cleanup on interrupted xcpdb "--compact"
xcpdb|compact: support --jobs/-j flag like gmake(1)
xapcmd: do not reset %SIG until last Xtmpdir is done
Documentation/include.mk | 6 +-
Documentation/public-inbox-v1-format.pod | 4 +
Documentation/public-inbox-v2-format.pod | 4 +
Documentation/public-inbox-xcpdb.pod | 57 ++++
MANIFEST | 4 +-
lib/PublicInbox/Admin.pm | 66 ++++
lib/PublicInbox/InboxWritable.pm | 35 ++-
lib/PublicInbox/Search.pm | 48 +--
lib/PublicInbox/SearchIdx.pm | 34 ++-
lib/PublicInbox/V1Writable.pm | 34 ---
lib/PublicInbox/V2Writable.pm | 109 ++++---
lib/PublicInbox/Xapcmd.pm | 370 +++++++++++++++++++++++
script/public-inbox-compact | 102 +------
script/public-inbox-index | 102 +------
script/public-inbox-init | 13 +-
script/public-inbox-xcpdb | 19 ++
t/cgi.t | 4 +-
t/convert-compact.t | 4 +
t/indexlevels-mirror.t | 27 +-
t/init.t | 4 +-
t/nntpd.t | 15 +-
t/search.t | 1 +
t/v2mirror.t | 1 +
23 files changed, 740 insertions(+), 323 deletions(-)
create mode 100644 Documentation/public-inbox-xcpdb.pod
delete mode 100644 lib/PublicInbox/V1Writable.pm
create mode 100644 lib/PublicInbox/Xapcmd.pm
create mode 100755 script/public-inbox-xcpdb
--
EW
Code repositories for project(s) associated with this public inbox
https://80x24.org/public-inbox.git