user/dev discussion of public-inbox itself
From: Eric Wong <e@yhbt.net>
To: meta@public-inbox.org
Subject: [TESTING] WIP - parallel shards on HDD with sequential flush
Date: Wed, 12 Aug 2020 09:34:37 +0000
Message-ID: <20200812093437.9147-1-e@yhbt.net> (raw)

Frequent flushing to save RAM is horrible on HDDs given the random
writes Xapian tends to do, especially when parallelized.

I think it's doable to make just the Xapian commits sequential
while the random reads + in-memory changes stay parallelized,
though...

With this, --no-fsync may even be detrimental because Linux
defaults to excessive writeback buffering.
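For reference, the writeback buffering in question can be inspected (and,
if needed, capped) via the vm.dirty_* sysctls; the byte values below are
only illustrative, not a recommendation:

```shell
# Inspect the Linux writeback buffering defaults (percent of RAM):
cat /proc/sys/vm/dirty_ratio /proc/sys/vm/dirty_background_ratio

# Capping dirty memory in absolute bytes can smooth out writeback
# bursts on HDDs (needs root; 128M/32M are illustrative values):
sysctl -w vm.dirty_bytes=$((128 * 1024 * 1024))
sysctl -w vm.dirty_background_bytes=$((32 * 1024 * 1024))
```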

This could take some time to test (using ext4 on an HDD).  So
far it's showing decent CPU usage on a 4-core system; with
sequential shard commits, only one process is in D state at a
time, and indexing seems to be going at a decent rate...

Waiting to see if it slows down as the Xapian DBs get bigger...
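For anyone following along, the idea of the patch below (shards commit
one at a time over a shared acknowledgement pipe, while the workers
themselves stay parallel) can be sketched as a standalone script.  The
worker protocol here is illustrative and heavily simplified, not the
actual public-inbox shard code:

```perl
#!/usr/bin/perl
# sketch: sequential per-shard flush over pipes (illustrative only)
use strict;
use warnings;

my $nshards = 3;
pipe(my $ack_r, my $ack_w) or die "pipe: $!"; # shared ack channel
my @cmd_w; # per-shard command pipes (parent write ends)

for my $i (0 .. $nshards - 1) {
	pipe(my $cmd_r, my $cmd_w) or die "pipe: $!";
	defined(my $pid = fork) or die "fork: $!";
	if ($pid == 0) { # shard worker
		close $ack_r;
		close $_ for @cmd_w; # earlier shards' command pipes
		close $cmd_w;
		while (defined(my $cmd = readline($cmd_r))) {
			if ($cmd =~ /\Abarrier/) {
				# a real shard would commit its Xapian DB here
				syswrite($ack_w, "barrier $i\n");
			}
		}
		exit 0;
	}
	close $cmd_r;
	push @cmd_w, $cmd_w;
}
close $ack_w; # parent only reads acks

# sequential flush: tell one shard to commit, wait for its ack,
# then move on to the next shard, so only one commits at a time
for my $i (0 .. $nshards - 1) {
	syswrite($cmd_w[$i], "barrier\n");
	defined(my $line = readline($ack_r)) or die "EOF on barrier: $!";
	$line =~ /\Abarrier (\d+)/ or die "bad line: $line";
	$1 == $i or die "BUG: shard[$1] on barrier wait (expected $i)";
	print "shard[$i] committed\n";
}
close $_ for @cmd_w; # EOF lets workers exit
wait for 1 .. $nshards;
```

Running that prints "shard[0] committed" through "shard[2] committed" in
order, since the parent only issues the next barrier after the previous
shard acknowledges.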
---
 lib/PublicInbox/V2Writable.pm | 34 +++++++++++++++++++++++++++-------
 1 file changed, 27 insertions(+), 7 deletions(-)

diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm
index d99e476a..51985a58 100644
--- a/lib/PublicInbox/V2Writable.pm
+++ b/lib/PublicInbox/V2Writable.pm
@@ -20,6 +20,7 @@ use PublicInbox::Spawn qw(spawn popen_rd);
 use PublicInbox::SearchIdx qw(log2stack crlf_adjust is_ancestor check_size);
 use IO::Handle; # ->autoflush
 use File::Temp ();
+use Carp qw(croak);
 
 my $OID = qr/[a-f0-9]{40,}/;
 # an estimate of the post-packed size to the raw uncompressed size
@@ -114,6 +115,7 @@ sub new {
 		total_bytes => 0,
 		current_info => '',
 		xpfx => $xpfx,
+		seq_flush => 1,
 		over => PublicInbox::OverIdx->new("$xpfx/over.sqlite3"),
 		lock_path => "$dir/inbox.lock",
 		# limit each git repo (epoch) to 1GB or so
@@ -593,14 +595,20 @@ sub barrier_init {
 	my $barrier = { map { $_ => 1 } (0..$n) };
 }
 
+sub barrier_wait_1 ($) {
+	my ($r) = @_;
+	defined(my $line = readline($r)) or croak "EOF on barrier_wait: $!";
+	$line =~ /\Abarrier (\d+)/ or croak "bad line on barrier_wait: $line";
+	$1;
+}
+
 sub barrier_wait {
 	my ($self, $barrier) = @_;
 	my $bnote = $self->{bnote} or return;
 	my $r = $bnote->[0];
 	while (scalar keys %$barrier) {
-		defined(my $l = readline($r)) or die "EOF on barrier_wait: $!";
-		$l =~ /\Abarrier (\d+)/ or die "bad line on barrier_wait: $l";
-		delete $barrier->{$1} or die "bad shard[$1] on barrier wait";
+		my $i = barrier_wait_1($r);
+		delete($barrier->{$i}) or die "bad shard[$i] on barrier wait";
 	}
 }
 
@@ -609,7 +617,7 @@ sub checkpoint ($;$) {
 	my ($self, $wait) = @_;
 
 	if (my $im = $self->{im}) {
-		if ($wait) {
+		if ($wait || $self->{seq_flush}) {
 			$im->barrier;
 		} else {
 			$im->checkpoint;
@@ -626,15 +634,27 @@ sub checkpoint ($;$) {
 		$self->{over}->commit_lazy;
 
 		# Now deal with Xapian
-		if ($wait) {
+		if ($self->{seq_flush} && $self->{parallel}) {
+			my $r = $self->{bnote}->[0] or
+					die "BUG: {bnote} missing";
+			my $i = 0;
+			for (@$shards) {
+				$_->remote_barrier;
+				my $j = barrier_wait_1($r);
+				$i == $j or die <<EOF;
+BUG: bad shard[$j] on barrier wait (expected $i)
+EOF
 +			$i++;
+			}
+		} elsif ($wait) { # parallel flush w/ shared barrier
 			my $barrier = $self->barrier_init(scalar @$shards);
 
 			# each shard needs to issue a barrier command
 			$_->remote_barrier for @$shards;
 
 			# wait for each Xapian shard
-			$self->barrier_wait($barrier);
-		} else {
+			barrier_wait($self, $barrier);
+		} else { # sequential flush with 1 process
 			$_->remote_commit for @$shards;
 		}
 

Thread overview: 2+ messages
2020-08-12  9:34 Eric Wong [this message]
2020-08-12 20:26 ` Eric Wong
