user/dev discussion of public-inbox itself
From: Eric Wong <e@80x24.org>
To: meta@public-inbox.org
Subject: [PATCH 12/26] xapcmd: xcpdb supports compaction
Date: Thu, 23 May 2019 09:36:50 +0000
Message-ID: <20190523093704.18367-13-e@80x24.org>
In-Reply-To: <20190523093704.18367-1-e@80x24.org>

To minimize the delay on active inboxes, it's ideal to run
xapian-compact at the end of the per-partition cpdb process:
the new DB isn't accessible yet, so we don't have to deal with
lock contention with -mda or -watch processes.  The downside is
the temporary file overhead required (3x instead of 2x).
---
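
A rough standalone sketch of the compaction step described above, for
readers who want to try the idea outside of public-inbox.  It assumes
plain fork/exec in place of the spawn() helper the patch uses, and the
names run_or_die and compact_into are hypothetical.  The staging
directory (what the patch calls "$new.compact") is compacted into its
final location and then removed:

```perl
use strict;
use warnings;
use POSIX ();
use File::Path qw(remove_tree);

# Fork, exec, and wait for @cmd; die unless it exits with status 0.
# (The patch uses public-inbox's own spawn(); fork/exec is a stand-in.)
sub run_or_die {
	my (@cmd) = @_;
	my $pid = fork;
	defined($pid) or die "fork failed: $!\n";
	if ($pid == 0) { # child
		exec { $cmd[0] } @cmd;
		warn "exec $cmd[0] failed: $!\n";
		POSIX::_exit(127);
	}
	my $r = waitpid($pid, 0);
	if ($? || $r != $pid) {
		die "@cmd failed: $? (pid=$pid, reaped=$r)\n";
	}
}

# Hypothetical helper: compact the staging DB at $tmp into $new,
# then drop the staging directory.
sub compact_into {
	my ($tmp, $new) = @_;
	my $compact = $ENV{XAPIAN_COMPACT} || 'xapian-compact';
	run_or_die($compact, '--no-renumber', $tmp, $new);
	remove_tree($tmp) or die "failed to remove $tmp: $!\n";
}

# usage: perl sketch.pl STAGING_DB NEW_DB
compact_into(@ARGV[0, 1]) if @ARGV == 2;
```

Since $new does not exist until xapian-compact writes it, readers never
see a half-copied DB; the cost is that the old DB, the staging copy and
the compacted copy all exist on disk at the same time.
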
 lib/PublicInbox/Xapcmd.pm | 34 ++++++++++++++++++++++++++++++++--
 script/public-inbox-xcpdb |  8 ++++++--
 2 files changed, 38 insertions(+), 4 deletions(-)

diff --git a/lib/PublicInbox/Xapcmd.pm b/lib/PublicInbox/Xapcmd.pm
index ca74ea0..d2de874 100644
--- a/lib/PublicInbox/Xapcmd.pm
+++ b/lib/PublicInbox/Xapcmd.pm
@@ -8,6 +8,10 @@ use PublicInbox::Over;
 use File::Temp qw(tempdir);
 use File::Path qw(remove_tree);
 
+# support testing with dev versions of Xapian, which install
+# commands with a version number suffix (e.g. "xapian-compact-1.5")
+our $XAPIAN_COMPACT = $ENV{XAPIAN_COMPACT} || 'xapian-compact';
+
 sub commit_changes ($$$) {
 	my ($im, $old, $new) = @_;
 	my @st = stat($old) or die "failed to stat($old): $!\n";
@@ -38,17 +42,23 @@ sub xspawn {
 	}
 }
 
+sub runnable_or_die ($) {
+	my ($exe) = @_;
+	which($exe) or die "$exe not found in PATH\n";
+}
+
 sub run {
 	my ($ibx, $cmd, $env, $opt) = @_;
 	$opt ||= {};
 	my $dir = $ibx->{mainrepo} or die "no mainrepo in inbox\n";
 	my $exe = $cmd->[0];
 	my $pfx = $exe;
+	runnable_or_die($XAPIAN_COMPACT) if $opt->{compact};
 	if (ref($exe) eq 'CODE') {
 		$pfx = 'CODE';
 		require Search::Xapian::WritableDatabase;
 	} else {
-		which($exe) or die "$exe not found in PATH\n";
+		runnable_or_die($exe);
 	}
 	$ibx->umask_prepare;
 	my $old = $ibx->search->xdir(1);
@@ -107,11 +117,12 @@ sub cpdb {
 	my ($args, $env, $opt) = @_;
 	my ($old, $new) = @$args;
 	my $src = Search::Xapian::Database->new($old);
+	my $tmp = $opt->{compact} ? "$new.compact" : $new;
 
 	# like copydatabase(1), be sure we don't overwrite anything in case
 	# of other bugs:
 	my $creat = Search::Xapian::DB_CREATE();
-	my $dst = Search::Xapian::WritableDatabase->new($new, $creat);
+	my $dst = Search::Xapian::WritableDatabase->new($tmp, $creat);
 	my ($it, $end);
 
 	do {
@@ -140,6 +151,25 @@ sub cpdb {
 			# (and public-inbox does not use those features)
 		};
 	} while (cpdb_retryable($src, $@));
+
+	return unless $opt->{compact};
+
+	$src = $dst = undef; # flushes and closes
+
+	# this is probably the best place to do xapian-compact
+	# since $dst isn't readable by HTTP or NNTP clients, yet:
+	my $cmd = [ $XAPIAN_COMPACT, '--no-renumber', $tmp, $new ];
+	my $rdr = {};
+	foreach my $fd (0..2) {
+		defined(my $dst = $opt->{$fd}) or next;
+		$rdr->{$fd} = $dst;
+	}
+	my $pid = spawn($cmd, $env, $rdr);
+	my $r = waitpid($pid, 0);
+	if ($? || $r != $pid) {
+		die join(' ', @$cmd)." failed: $? (pid=$pid, reaped=$r)\n";
+	}
+	remove_tree($tmp) or die "failed to remove $tmp: $!\n";
 }
 
 1;
diff --git a/script/public-inbox-xcpdb b/script/public-inbox-xcpdb
index d494991..78d37da 100755
--- a/script/public-inbox-xcpdb
+++ b/script/public-inbox-xcpdb
@@ -2,17 +2,21 @@
 # Copyright (C) 2019 all contributors <meta@public-inbox.org>
 # License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
 # xcpdb: Xapian copy database, a wrapper around Xapian's copydatabase(1)
+use Getopt::Long qw(:config gnu_getopt no_ignore_case auto_abbrev);
 use PublicInbox::InboxWritable;
 use PublicInbox::Xapcmd;
 use PublicInbox::Admin;
 PublicInbox::Admin::require_or_die('-search');
 my $usage = "Usage: public-inbox-xcpdb INBOX_DIR\n";
+my $opt = {};
+GetOptions($opt, qw(compact)) or die "bad command-line args\n$usage";
 my @ibxs = PublicInbox::Admin::resolve_inboxes(\@ARGV) or die $usage;
+
 my $cmd = [ \&PublicInbox::Xapcmd::cpdb ];
 open my $null, '>', '/dev/null' or die "failed to open /dev/null: $!\n";
-my $rdr = { 1 => fileno($null) };
+$opt->{1} = fileno($null);
 foreach (@ibxs) {
 	my $ibx = PublicInbox::InboxWritable->new($_);
 	# we rely on --no-renumber to keep docids synched to NNTP
-	PublicInbox::Xapcmd::run($ibx, $cmd, undef, $rdr);
+	PublicInbox::Xapcmd::run($ibx, $cmd, undef, $opt);
 }
-- 
EW
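
The script change above stores command-line switches and fd redirects
in the same $opt hash, with redirects under numeric keys.  A minimal
standalone sketch of that pattern, assuming a made-up command line
(the real script reads @ARGV via GetOptions, and /path/to/inbox is a
placeholder):

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray
	:config gnu_getopt no_ignore_case auto_abbrev);

# Hypothetical command line; the real script parses @ARGV.
my @argv = ('--compact', '/path/to/inbox');
my $opt = {};
GetOptionsFromArray(\@argv, $opt, qw(compact))
	or die "bad command-line args\n";

# fd redirects share the hash under numeric keys, as in the patch
open my $null, '>', '/dev/null' or die "failed to open /dev/null: $!\n";
$opt->{1} = fileno($null);

# what cpdb does before spawning: keep only the fd 0..2 entries
my $rdr = {};
foreach my $fd (0..2) {
	defined(my $dst = $opt->{$fd}) or next;
	$rdr->{$fd} = $dst;
}

print "compact: ", ($opt->{compact} ? 'yes' : 'no'), "\n";
print "inbox args: @argv\n";
print "redirected fds: ", join(',', sort keys %$rdr), "\n";
```

Named switches and numeric fd keys cannot collide, so one hash can be
passed through run() to cpdb, which then picks out only the redirects
when it spawns xapian-compact.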

