user/dev discussion of public-inbox itself
From: Eric Wong <e@80x24.org>
To: meta@public-inbox.org
Subject: [PATCH 11/26] xcpdb: implement using Perl bindings
Date: Thu, 23 May 2019 09:36:49 +0000
Message-ID: <20190523093704.18367-12-e@80x24.org>
In-Reply-To: <20190523093704.18367-1-e@80x24.org>

By avoiding copydatabase(1) entirely, we can make further changes
that stop locking the entire inbox for a long operation and
switch to fine-grained locking instead.
---
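
The dispatch this patch adds (run a Perl callback in a forked child
instead of exec-ing an external command) can be sketched on its own
as below. This is only an illustration under stated assumptions:
run_in_child is a hypothetical name, not part of PublicInbox::Xapcmd,
and unlike xspawn() it always exits explicitly so the child never
unwinds back into the caller's stack.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of public-inbox): run $cb in a forked
# child and return the child's PID to the parent.
sub run_in_child {
	my ($cb, @args) = @_;
	defined(my $pid = fork) or die "fork: $!";
	return $pid if $pid > 0; # parent: caller is expected to waitpid()

	# child: report failure via the exit status instead of die-ing
	my $ok = eval { $cb->(@args); 1 };
	warn $@ unless $ok;
	exit($ok ? 0 : 1);
}

my @exit_codes;
for my $n (1, -1) {
	my $pid = run_in_child(sub { die "negative input\n" if $_[0] < 0 }, $n);
	waitpid($pid, 0);
	push @exit_codes, $? >> 8;
}
print "exit codes: @exit_codes\n";
```

Because the parent only sees a PID either way, the existing waitpid
loop in run() does not need to know whether a job was a CODE ref or
an external command.
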
 lib/PublicInbox/Xapcmd.pm | 77 +++++++++++++++++++++++++++++++++++++--
 script/public-inbox-xcpdb |  2 +-
 2 files changed, 75 insertions(+), 4 deletions(-)

diff --git a/lib/PublicInbox/Xapcmd.pm b/lib/PublicInbox/Xapcmd.pm
index 81e2f10..ca74ea0 100644
--- a/lib/PublicInbox/Xapcmd.pm
+++ b/lib/PublicInbox/Xapcmd.pm
@@ -24,15 +24,36 @@ sub commit_changes ($$$) {
 	remove_tree("$old/old") or die "failed to remove $old/old: $!\n";
 }
 
+sub xspawn {
+	my ($cmd, $env, $opt) = @_;
+	if (ref($cmd->[0]) eq 'CODE') {
+		my $cb = shift(@$cmd); # $cb = cpdb()
+		defined(my $pid = fork) or die "fork: $!";
+		return $pid if $pid > 0;
+		eval { $cb->($cmd, $env, $opt) };
+		die $@ if $@;
+		exit 0;
+	} else {
+		spawn($cmd, $env, $opt);
+	}
+}
+
 sub run {
 	my ($ibx, $cmd, $env, $opt) = @_;
 	$opt ||= {};
 	my $dir = $ibx->{mainrepo} or die "no mainrepo in inbox\n";
-	which($cmd->[0]) or die "$cmd->[0] not found in PATH\n";
+	my $exe = $cmd->[0];
+	my $pfx = $exe;
+	if (ref($exe) eq 'CODE') {
+		$pfx = 'CODE';
+		require Search::Xapian::WritableDatabase;
+	} else {
+		which($exe) or die "$exe not found in PATH\n";
+	}
 	$ibx->umask_prepare;
 	my $old = $ibx->search->xdir(1);
 	-d $old or die "$old does not exist\n";
-	my $new = tempdir($cmd->[0].'-XXXXXXXX', DIR => $dir);
+	my $new = tempdir("$pfx-XXXXXXXX", DIR => $dir);
 	my $v = $ibx->{version} || 1;
 	my @cmds;
 	if ($v == 1) {
@@ -58,7 +79,7 @@ sub run {
 		while (@cmds) {
 			while (scalar(keys(%pids)) < $max && scalar(@cmds)) {
 				my $x = shift @cmds;
-				$pids{spawn($x, $env, $opt)} = $x;
+				$pids{xspawn($x, $env, $opt)} = $x;
 			}
 
 			while (scalar keys %pids) {
@@ -71,4 +92,54 @@ sub run {
 	});
 }
 
+sub cpdb_retryable ($$) {
+	my ($src, $err) = @_;
+	if (ref($err) eq 'Search::Xapian::DatabaseModifiedError') {
+		warn "$err, reopening and retrying\n";
+		$src->reopen;
+		return 1;
+	}
+	die $err if $err;
+	0;
+}
+
+sub cpdb {
+	my ($args, $env, $opt) = @_;
+	my ($old, $new) = @$args;
+	my $src = Search::Xapian::Database->new($old);
+
+	# like copydatabase(1), be sure we don't overwrite anything in case
+	# of other bugs:
+	my $creat = Search::Xapian::DB_CREATE();
+	my $dst = Search::Xapian::WritableDatabase->new($new, $creat);
+	my ($it, $end);
+
+	do {
+		eval {
+			# update the only metadata key for v1:
+			my $lc = $src->get_metadata('last_commit');
+			$dst->set_metadata('last_commit', $lc) if $lc;
+
+			$it = $src->postlist_begin('');
+			$end = $src->postlist_end('');
+		};
+	} while (cpdb_retryable($src, $@));
+
+	do {
+		eval {
+			while ($it != $end) {
+				my $docid = $it->get_docid;
+				my $doc = $src->get_document($docid);
+				$dst->replace_document($docid, $doc);
+				$it->inc;
+			}
+
+			# unlike copydatabase(1), we don't copy spelling
+			# and synonym data (or other user metadata) since
+			# the Perl APIs don't expose iterators for them
+			# (and public-inbox does not use those features)
+		};
+	} while (cpdb_retryable($src, $@));
+}
+
 1;
diff --git a/script/public-inbox-xcpdb b/script/public-inbox-xcpdb
index cbf9f55..d494991 100755
--- a/script/public-inbox-xcpdb
+++ b/script/public-inbox-xcpdb
@@ -8,7 +8,7 @@ use PublicInbox::Admin;
 PublicInbox::Admin::require_or_die('-search');
 my $usage = "Usage: public-inbox-xcpdb INBOX_DIR\n";
 my @ibxs = PublicInbox::Admin::resolve_inboxes(\@ARGV) or die $usage;
-my $cmd = [qw(copydatabase --no-renumber)];
+my $cmd = [ \&PublicInbox::Xapcmd::cpdb ];
 open my $null, '>', '/dev/null' or die "failed to open /dev/null: $!\n";
 my $rdr = { 1 => fileno($null) };
 foreach (@ibxs) {
-- 
EW
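
The reopen-and-retry loop used by cpdb() can be tried without Xapian
installed. The sketch below is an assumption-laden stand-in, not the
real API: Demo::ModifiedError and Demo::Source are hypothetical
classes that mimic Search::Xapian::DatabaseModifiedError and a
read-only database handle, and retryable() mirrors cpdb_retryable().

```perl
use strict;
use warnings;

# Hypothetical stand-in for Search::Xapian::DatabaseModifiedError
package Demo::ModifiedError;
sub new { my ($class, $msg) = @_; bless { msg => $msg }, $class }

# Hypothetical stand-in for a read-only database handle which fails
# a fixed number of times before a read succeeds
package Demo::Source;
sub new { my ($class, $fail) = @_; bless { fail => $fail, reopened => 0 }, $class }
sub reopen { $_[0]->{reopened}++ }
sub read_all {
	my ($self) = @_;
	die Demo::ModifiedError->new('modified') if $self->{fail}-- > 0;
	return [ 1, 2, 3 ];
}

package main;

# Returns true if the caller should retry; rethrows any other error.
sub retryable {
	my ($src, $err) = @_;
	if (ref($err) eq 'Demo::ModifiedError') {
		$src->reopen;
		return 1;
	}
	die $err if $err;
	0;
}

my $src = Demo::Source->new(2); # first two reads fail
my $docs;
do {
	eval { $docs = $src->read_all };
} while (retryable($src, $@));
print "copied ", scalar(@$docs), " docs after $src->{reopened} reopen(s)\n";
```

Only the one known-transient error class is retried; anything else
is rethrown, so unrelated bugs are not masked by the loop.
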



Thread overview: 28+ messages
2019-05-23  9:36 [PATCH 00/26] xcpdb: ease Xapian DB format migrations Eric Wong
2019-05-23  9:36 ` [PATCH 01/26] t/convert-compact: skip on missing xapian-compact(1) Eric Wong
2019-05-23  9:36 ` [PATCH 02/26] v1writable: retire in favor of InboxWritable Eric Wong
2019-05-23  9:36 ` [PATCH 03/26] doc: document the reason for --no-renumber Eric Wong
2019-05-23  9:36 ` [PATCH 04/26] search: reenable phrase search on non-chert Xapian Eric Wong
2019-05-23  9:36 ` [PATCH 05/26] xapcmd: new module for wrapping Xapian commands Eric Wong
2019-05-23  9:36 ` [PATCH 06/26] admin: hoist out resolve_inboxes for -compact and -index Eric Wong
2019-05-23  9:36 ` [PATCH 07/26] xapcmd: support spawn options Eric Wong
2019-05-23  9:36 ` [PATCH 08/26] xcpdb: new tool which wraps Xapian's copydatabase(1) Eric Wong
2019-05-23  9:36 ` [PATCH 09/26] xapcmd: do not cleanup on errors Eric Wong
2019-05-23  9:36 ` [PATCH 10/26] admin: move index_inbox over Eric Wong
2019-05-23  9:36 ` Eric Wong [this message]
2019-05-23  9:36 ` [PATCH 12/26] xapcmd: xcpdb supports compaction Eric Wong
2019-05-23  9:36 ` [PATCH 13/26] v2writable: hoist out log_range sub for readability Eric Wong
2019-05-23  9:36 ` [PATCH 14/26] xcpdb: use fine-grained locking Eric Wong
2019-05-23  9:36 ` [PATCH 15/26] xcpdb: implement progress reporting Eric Wong
2019-05-23  9:36 ` [PATCH 16/26] xcpdb: cleanup error handling and diagnosis Eric Wong
2019-05-23  9:36 ` [PATCH 17/26] xapcmd: avoid EXDEV when finalizing changes Eric Wong
2019-05-23  9:36 ` [PATCH 18/26] doc: xcpdb: update to reflect the current state Eric Wong
2019-05-23  9:36 ` [PATCH 19/26] xapcmd: use "print STDERR" for progress reporting Eric Wong
2019-05-23  9:36 ` [PATCH 20/26] xcpdb: show re-indexing progress Eric Wong
2019-05-23  9:36 ` [PATCH 21/26] xcpdb: remove temporary directories on aborts Eric Wong
2019-05-23  9:37 ` [PATCH 22/26] compact: reuse infrastructure from xcpdb Eric Wong
2019-05-23  9:37 ` [PATCH 23/26] xcpdb|compact: support some xapian-compact switches Eric Wong
2019-05-23  9:37 ` [PATCH 24/26] xapcmd: cleanup on interrupted xcpdb "--compact" Eric Wong
2019-05-23  9:37 ` [PATCH 25/26] xcpdb|compact: support --jobs/-j flag like gmake(1) Eric Wong
2019-05-23  9:37 ` [PATCH 26/26] xapcmd: do not reset %SIG until last Xtmpdir is done Eric Wong
2019-05-23 10:37 ` [PATCH 27/26] doc: various updates to reflect current state Eric Wong
