* [PATCH 00/26] xcpdb: ease Xapian DB format migrations
@ 2019-05-23 9:36 7% Eric Wong
2019-05-23 9:36 5% ` [PATCH 02/26] v1writable: retire in favor of InboxWritable Eric Wong
From: Eric Wong @ 2019-05-23 9:36 UTC
To: meta
I've noticed performance problems in Xapian's old chert
backend, particularly with phrase searches, which seem to be
alleviated by the new glass backend.

Unfortunately, the tool distributed with Xapian for updating DB
formats, copydatabase(1), is extremely slow, and blocking updates
for hours at a time to perform the migration is not acceptable.
(That's right, "copydatabase" is NOT a Postgres command!)

So, I've written "public-inbox-xcpdb" and gotten it to perform
the bulk copy operation without holding inbox.lock and to
deal gracefully with Xapian DB modifications. xcpdb is still
slow, but I've (finally!) implemented partial reindexing so
that it minimizes the lock time and does not stall -mda or
-watch processes while it is working.

There are a bunch of cleanups along the way, too, and the series
should make future changes to repartition the Xapian DB on
existing v2 inboxes easier.
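
For illustration only (this is not code from the series), the "copy without the lock, then catch up under the lock" idea described above can be sketched in plain Perl. Everything here is an assumption made for the example: the hashes stand in for Xapian databases, `bulk_copy` and `catch_up` are hypothetical names, and only newly added documents are handled, whereas a real tool would also have to cope with deletions and edits.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);
use File::Temp qw(tempdir);

# Hypothetical stand-in for the slow bulk copy: runs WITHOUT any lock.
# Returns the highest document ID it copied.
sub bulk_copy {
    my ($src, $dst) = @_;
    my $max = 0;
    for my $id (sort { $a <=> $b } keys %$src) {
        $dst->{$id} = $src->{$id};
        $max = $id;
    }
    return $max;
}

# Hypothetical catch-up step: takes the lock only long enough to copy
# documents added after the bulk copy took its snapshot.
sub catch_up {
    my ($src, $dst, $lock_path, $since) = @_;
    open(my $lk, '>>', $lock_path) or die "open $lock_path: $!\n";
    flock($lk, LOCK_EX) or die "flock $lock_path: $!\n";
    my $n = 0;
    for my $id (grep { $_ > $since } keys %$src) {
        $dst->{$id} = $src->{$id};
        $n++;
    }
    flock($lk, LOCK_UN) or die "unlock $lock_path: $!\n";
    close($lk) or die "close $lock_path: $!\n";
    return $n;
}

my $dir = tempdir(CLEANUP => 1);
my %old = (1 => 'a', 2 => 'b', 3 => 'c');
my %new;
my $seen = bulk_copy(\%old, \%new);
$old{4} = 'd'; # simulate a writer adding a document during the copy
my $n = catch_up(\%old, \%new, "$dir/inbox.lock", $seen);
print "caught up $n document(s)\n";
```

The point of the split is that the time spent holding the lock scales with the number of changes made during the copy, not with the size of the whole database.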
Eric Wong (26):
t/convert-compact: skip on missing xapian-compact(1)
v1writable: retire in favor of InboxWritable
doc: document the reason for --no-renumber
search: reenable phrase search on non-chert Xapian
xapcmd: new module for wrapping Xapian commands
admin: hoist out resolve_inboxes for -compact and -index
xapcmd: support spawn options
xcpdb: new tool which wraps Xapian's copydatabase(1)
xapcmd: do not cleanup on errors
admin: move index_inbox over
xcpdb: implement using Perl bindings
xapcmd: xcpdb supports compaction
v2writable: hoist out log_range sub for readability
xcpdb: use fine-grained locking
xcpdb: implement progress reporting
xcpdb: cleanup error handling and diagnosis
xapcmd: avoid EXDEV when finalizing changes
doc: xcpdb: update to reflect the current state
xapcmd: use "print STDERR" for progress reporting
xcpdb: show re-indexing progress
xcpdb: remove temporary directories on aborts
compact: reuse infrastructure from xcpdb
xcpdb|compact: support some xapian-compact switches
xapcmd: cleanup on interrupted xcpdb "--compact"
xcpdb|compact: support --jobs/-j flag like gmake(1)
xapcmd: do not reset %SIG until last Xtmpdir is done
Documentation/include.mk | 6 +-
Documentation/public-inbox-v1-format.pod | 4 +
Documentation/public-inbox-v2-format.pod | 4 +
Documentation/public-inbox-xcpdb.pod | 57 ++++
MANIFEST | 4 +-
lib/PublicInbox/Admin.pm | 66 ++++
lib/PublicInbox/InboxWritable.pm | 35 ++-
lib/PublicInbox/Search.pm | 48 +--
lib/PublicInbox/SearchIdx.pm | 34 ++-
lib/PublicInbox/V1Writable.pm | 34 ---
lib/PublicInbox/V2Writable.pm | 109 ++++---
lib/PublicInbox/Xapcmd.pm | 370 +++++++++++++++++++++++
script/public-inbox-compact | 102 +------
script/public-inbox-index | 102 +------
script/public-inbox-init | 13 +-
script/public-inbox-xcpdb | 19 ++
t/cgi.t | 4 +-
t/convert-compact.t | 4 +
t/indexlevels-mirror.t | 27 +-
t/init.t | 4 +-
t/nntpd.t | 15 +-
t/search.t | 1 +
t/v2mirror.t | 1 +
23 files changed, 740 insertions(+), 323 deletions(-)
create mode 100644 Documentation/public-inbox-xcpdb.pod
delete mode 100644 lib/PublicInbox/V1Writable.pm
create mode 100644 lib/PublicInbox/Xapcmd.pm
create mode 100755 script/public-inbox-xcpdb
--
EW
* [PATCH 02/26] v1writable: retire in favor of InboxWritable
2019-05-23 9:36 7% [PATCH 00/26] xcpdb: ease Xapian DB format migrations Eric Wong
@ 2019-05-23 9:36 5% ` Eric Wong
From: Eric Wong @ 2019-05-23 9:36 UTC
To: meta
In retrospect, introducing V1Writable was unnecessary and
InboxWritable->importer is in a better position to abstract
away differences between v1 and v2 writers.
So teach InboxWritable to initialize inboxes and get rid
of V1Writable.
---
MANIFEST | 1 -
lib/PublicInbox/InboxWritable.pm | 35 ++++++++++++++++++++++++--------
lib/PublicInbox/V1Writable.pm | 34 -------------------------------
lib/PublicInbox/V2Writable.pm | 6 +++---
script/public-inbox-init | 13 +++---------
t/cgi.t | 4 ++--
t/indexlevels-mirror.t | 5 ++---
t/init.t | 4 ++--
t/nntpd.t | 15 +++-----------
t/v2mirror.t | 1 +
10 files changed, 43 insertions(+), 75 deletions(-)
delete mode 100644 lib/PublicInbox/V1Writable.pm
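
As a rough, self-contained illustration of the dispatch described in the commit message (not the actual PublicInbox code), the sketch below shows one writable wrapper choosing a v1 or v2 importer from the inbox's `version` field and caching it. The `Sketch::*` packages are hypothetical stand-ins; only the `version`, `-creat_opt` and `-importer` keys and the error message mirror names that appear in the diff.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the real v1 and v2 writer classes.
package Sketch::V1Importer;
sub new {
    my ($class, $ibx) = @_;
    return bless { ibx => $ibx }, $class;
}

package Sketch::V2Importer;
sub new {
    my ($class, $ibx, $opt) = @_;
    return bless { ibx => $ibx, opt => $opt }, $class;
}

# One wrapper hides the v1/v2 difference from callers.
package Sketch::Writable;
sub new {
    my ($class, $ibx, $creat_opt) = @_;
    my %copy = %$ibx; # copy so the caller's hash is left alone
    my $self = bless \%copy, $class;
    $self->{-creat_opt} = $creat_opt if $creat_opt;
    return $self;
}

sub importer {
    my ($self) = @_;
    # Build the importer once, then reuse it on later calls.
    $self->{-importer} ||= do {
        my $v = $self->{version} || 1; # a missing version means v1
        if ($v == 2) {
            Sketch::V2Importer->new($self, $self->{-creat_opt});
        } elsif ($v == 1) {
            Sketch::V1Importer->new($self);
        } else {
            die "unsupported inbox version: $v\n";
        }
    };
}

package main;
my $v1 = Sketch::Writable->new({ name => 'a' });
my $v2 = Sketch::Writable->new({ name => 'b', version => 2 }, { nproc => 1 });
print ref($v1->importer), "\n";
print ref($v2->importer), "\n";
```

Callers such as the tests then need only one code path, `Writable->new($ibx)->importer`, regardless of the inbox version.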
diff --git a/MANIFEST b/MANIFEST
index 2c356c6..2b101fa 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -126,7 +126,6 @@ lib/PublicInbox/SpawnPP.pm
lib/PublicInbox/Syscall.pm
lib/PublicInbox/Unsubscribe.pm
lib/PublicInbox/UserContent.pm
-lib/PublicInbox/V1Writable.pm
lib/PublicInbox/V2Writable.pm
lib/PublicInbox/View.pm
lib/PublicInbox/ViewDiff.pm
diff --git a/lib/PublicInbox/InboxWritable.pm b/lib/PublicInbox/InboxWritable.pm
index 2f1ca6f..116f423 100644
--- a/lib/PublicInbox/InboxWritable.pm
+++ b/lib/PublicInbox/InboxWritable.pm
@@ -19,25 +19,44 @@ use constant {
};
sub new {
- my ($class, $ibx) = @_;
- bless $ibx, $class;
+ my ($class, $ibx, $creat_opt) = @_;
+ my $self = bless $ibx, $class;
+
+ # TODO: maybe stop supporting this
+ if ($creat_opt) { # for { nproc => $N }
+ $self->{-creat_opt} = $creat_opt;
+ init_inbox($self) if ($self->{version} || 1) == 1;
+ }
+ $self;
+}
+
+sub init_inbox {
+ my ($self, $partitions, $skip_epoch, $skip_artnum) = @_;
+ # TODO: honor skip_artnum
+ my $v = $self->{version} || 1;
+ if ($v == 1) {
+ my $dir = $self->{mainrepo} or die "no mainrepo in inbox\n";
+ PublicInbox::Import::init_bare($dir);
+ } else {
+ my $v2w = importer($self);
+ $v2w->init_inbox($partitions, $skip_epoch, $skip_artnum);
+ }
}
sub importer {
my ($self, $parallel) = @_;
- $self->{-importer} ||= eval {
+ $self->{-importer} ||= do {
my $v = $self->{version} || 1;
if ($v == 2) {
eval { require PublicInbox::V2Writable };
die "v2 not supported: $@\n" if $@;
- my $v2w = PublicInbox::V2Writable->new($self);
+ my $opt = $self->{-creat_opt};
+ my $v2w = PublicInbox::V2Writable->new($self, $opt);
$v2w->{parallel} = $parallel;
$v2w;
} elsif ($v == 1) {
- my $git = $self->git;
- my $name = $self->{name};
- my $addr = $self->{-primary_address};
- PublicInbox::Import->new($git, $name, $addr, $self);
+ my @arg = (undef, undef, undef, $self);
+ PublicInbox::Import->new(@arg);
} else {
$! = 78; # EX_CONFIG 5.3.5 local configuration error
die "unsupported inbox version: $v\n";
diff --git a/lib/PublicInbox/V1Writable.pm b/lib/PublicInbox/V1Writable.pm
deleted file mode 100644
index 6ca5db4..0000000
--- a/lib/PublicInbox/V1Writable.pm
+++ /dev/null
@@ -1,34 +0,0 @@
-# Copyright (C) 2019 all contributors <meta@public-inbox.org>
-# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
-
-# This interface wraps PublicInbox::Import and makes it closer
-# to V2Writable
-# Used to write to V1 inboxes (see L<public-inbox-v1-format(5)>).
-package PublicInbox::V1Writable;
-use strict;
-use warnings;
-use base qw(PublicInbox::Import);
-use PublicInbox::InboxWritable;
-
-sub new {
- my ($class, $ibx, $creat) = @_;
- my $dir = $ibx->{mainrepo} or die "no mainrepo in inbox\n";
- unless (-d $dir) {
- if ($creat) {
- PublicInbox::Import::init_bare($dir);
- } else {
- die "$dir does not exist\n";
- }
- }
- $ibx = PublicInbox::InboxWritable->new($ibx);
- $class->SUPER::new(undef, undef, undef, $ibx);
-}
-
-sub init_inbox {
- my ($self, $partitions, $skip_epoch, $skip_artnum) = @_;
- # TODO: honor skip_artnum
- my $dir = $self->{-inbox}->{mainrepo} or die "no mainrepo in inbox\n";
- PublicInbox::Import::init_bare($dir);
-}
-
-1;
diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm
index afcac4d..c476cb3 100644
--- a/lib/PublicInbox/V2Writable.pm
+++ b/lib/PublicInbox/V2Writable.pm
@@ -94,13 +94,13 @@ sub new {
}
sub init_inbox {
- my ($self, $parallel, $skip) = @_;
+ my ($self, $parallel, $skip_epoch) = @_;
$self->{parallel} = $parallel;
$self->idx_init;
my $epoch_max = -1;
git_dir_latest($self, \$epoch_max);
- if (defined $skip && $epoch_max == -1) {
- $epoch_max = $skip;
+ if (defined $skip_epoch && $epoch_max == -1) {
+ $epoch_max = $skip_epoch;
}
$self->git_init($epoch_max >= 0 ? $epoch_max : 0);
$self->done;
diff --git a/script/public-inbox-init b/script/public-inbox-init
index 2cc704c..5724c52 100755
--- a/script/public-inbox-init
+++ b/script/public-inbox-init
@@ -10,7 +10,7 @@ use Getopt::Long qw/:config gnu_getopt no_ignore_case auto_abbrev/;
use PublicInbox::Admin;
PublicInbox::Admin::require_or_die('-base');
require PublicInbox::Config;
-require PublicInbox::Inbox;
+require PublicInbox::InboxWritable;
use File::Temp qw/tempfile/;
use File::Basename qw/dirname/;
use File::Path qw/mkpath/;
@@ -116,15 +116,8 @@ my $ibx = PublicInbox::Inbox->new({
indexlevel => $indexlevel,
});
-if ($version >= 2) {
- require PublicInbox::V2Writable;
- PublicInbox::V2Writable->new($ibx, 1)->init_inbox(0, $skip);
-} elsif ($version == 1) {
- require PublicInbox::V1Writable;
- PublicInbox::V1Writable->new($ibx, 1)->init_inbox(0, $skip);
-} else {
- die "Unsupported -V/--version: $version\n";
-}
+my $creat_opt = {};
+PublicInbox::InboxWritable->new($ibx, $creat_opt)->init_inbox(0, $skip);
# needed for git prior to v2.1.0
umask(0077) if defined $perm;
diff --git a/t/cgi.t b/t/cgi.t
index d3172bf..81130df 100644
--- a/t/cgi.t
+++ b/t/cgi.t
@@ -41,11 +41,11 @@ my $cfgpfx = "publicinbox.test";
use_ok 'PublicInbox::Git';
use_ok 'PublicInbox::Import';
use_ok 'PublicInbox::Inbox';
-use_ok 'PublicInbox::V1Writable';
+use_ok 'PublicInbox::InboxWritable';
use_ok 'PublicInbox::Config';
my $cfg = PublicInbox::Config->new($pi_config);
my $ibx = $cfg->lookup_name('test');
-my $im = PublicInbox::V1Writable->new($ibx);
+my $im = PublicInbox::InboxWritable->new($ibx)->importer;
{
local $ENV{HOME} = $home;
diff --git a/t/indexlevels-mirror.t b/t/indexlevels-mirror.t
index 3dd4323..d124c75 100644
--- a/t/indexlevels-mirror.t
+++ b/t/indexlevels-mirror.t
@@ -5,6 +5,7 @@ use warnings;
use Test::More;
use PublicInbox::MIME;
use PublicInbox::Inbox;
+use PublicInbox::InboxWritable;
use File::Temp qw/tempdir/;
require './t/common.perl';
require_git(2.6);
@@ -38,9 +39,7 @@ sub import_index_incremental {
-primary_address => 'test@example.com',
indexlevel => $level,
});
- my $cls = "PublicInbox::V${v}Writable";
- use_ok $cls;
- my $im = $cls->new($ibx, {nproc=>1});
+ my $im = PublicInbox::InboxWritable->new($ibx, {nproc=>1})->importer;
$mime->header_set('Message-ID', '<m@1>');
ok($im->add($mime), 'first message added');
$im->done;
diff --git a/t/init.t b/t/init.t
index 86b4eb5..79dcad1 100644
--- a/t/init.t
+++ b/t/init.t
@@ -88,7 +88,7 @@ SKIP: {
qw(http://example.com/skip1 skip1@example.com));
is(system(@cmd), 0, "--skip 1");
my $gits = [ glob("$tmpdir/skip1/git/*.git") ];
- is_deeply(["$tmpdir/skip1/git/1.git"], $gits, 'skip OK');
+ is_deeply($gits, ["$tmpdir/skip1/git/1.git"], 'skip OK');
}
@@ -96,7 +96,7 @@ SKIP: {
qw(http://example.com/skip2 skip2@example.com));
is(system(@cmd), 0, "--skip 2");
my $gits = [ glob("$tmpdir/skip2/git/*.git") ];
- is_deeply(["$tmpdir/skip2/git/2.git"], $gits, 'skipping 2 works, too');
+ is_deeply($gits, ["$tmpdir/skip2/git/2.git"], 'skipping 2 works, too');
}
done_testing();
diff --git a/t/nntpd.t b/t/nntpd.t
index c7ea319..aa62ff6 100644
--- a/t/nntpd.t
+++ b/t/nntpd.t
@@ -9,6 +9,7 @@ foreach my $mod (qw(DBD::SQLite)) {
}
require PublicInbox::SearchIdx;
require PublicInbox::Msgmap;
+require PublicInbox::InboxWritable;
use Email::Simple;
use IO::Socket;
use Socket qw(IPPROTO_TCP TCP_NODELAY);
@@ -30,9 +31,6 @@ my $group = 'test-nntpd';
my $addr = $group . '@example.com';
my $nntpd = 'blib/script/public-inbox-nntpd';
my $init = 'blib/script/public-inbox-init';
-use_ok 'PublicInbox::Import';
-use_ok 'PublicInbox::Inbox';
-use_ok 'PublicInbox::Git';
SKIP: {
skip "git 2.6+ required for V2Writable", 1 if $version == 1;
use_ok 'PublicInbox::V2Writable';
@@ -68,15 +66,8 @@ $ibx = PublicInbox::Inbox->new($ibx);
0, 'enabled newsgroup');
my $len;
- my $im;
- if ($version == 2) {
- $im = PublicInbox::V2Writable->new($ibx);
- } elsif ($version == 1) {
- use_ok 'PublicInbox::V1Writable';
- $im = PublicInbox::V1Writable->new($ibx);
- } else {
- die "unsupported version: $version";
- }
+ $ibx = PublicInbox::InboxWritable->new($ibx);
+ my $im = $ibx->importer;
# ensure successful message delivery
{
diff --git a/t/v2mirror.t b/t/v2mirror.t
index 441e36d..fe05ec4 100644
--- a/t/v2mirror.t
+++ b/t/v2mirror.t
@@ -17,6 +17,7 @@ use File::Temp qw/tempdir/;
use IO::Socket;
use POSIX qw(dup2);
use_ok 'PublicInbox::V2Writable';
+use PublicInbox::InboxWritable;
use PublicInbox::MIME;
use PublicInbox::Config;
# FIXME: too much setup
--
EW
Code repositories for project(s) associated with this public inbox
https://80x24.org/public-inbox.git