user/dev discussion of public-inbox itself
* [PATCH 0/3] avoid msgmap reopens in long-lived processes
@ 2020-07-14  2:14 Eric Wong
  2020-07-14  2:14 ` [PATCH 1/3] over: unset sqlite_unicode attribute Eric Wong
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Eric Wong @ 2020-07-14  2:14 UTC
  To: meta

As with commit 2a717d13f10fcdc69921d80cf94c47a694a175d4
("nntpd+imapd: detect replaced over.sqlite3"), this is
another step towards eliminating needless wakeups on
systems with inotify or kqueue.

To save memory, we'll also stop storing {filename} in Perl once
the SQLite DB is open, since we expect to have thousands of
inboxes soon.

Eric Wong (3):
  over: unset sqlite_unicode attribute
  nntpd+imapd: detect unlinked msgmap
  over+msgmap: do not store filename after DBI->connect

 lib/PublicInbox/Inbox.pm   | 11 +++----
 lib/PublicInbox/Msgmap.pm  | 67 ++++++++++++++++++++------------------
 lib/PublicInbox/Over.pm    | 31 +++++++++++++-----
 lib/PublicInbox/OverIdx.pm |  6 ++--
 t/nntpd.t                  |  8 +++++
 5 files changed, 74 insertions(+), 49 deletions(-)

* [PATCH 1/3] over: unset sqlite_unicode attribute
  2020-07-14  2:14 [PATCH 0/3] avoid msgmap reopens in long-lived processes Eric Wong
@ 2020-07-14  2:14 ` Eric Wong
  2020-07-14  2:14 ` [PATCH 2/3] nntpd+imapd: detect unlinked msgmap Eric Wong
  2020-07-14  2:14 ` [PATCH 3/3] over+msgmap: do not store filename after DBI->connect Eric Wong
  2 siblings, 0 replies; 4+ messages in thread
From: Eric Wong @ 2020-07-14  2:14 UTC
  To: meta

None of the human-readable strings stored in over.sqlite3
require UTF-8.  Message-IDs do not, nor do the compressed
Subject IDs (sid) we use for Subject-based threading.  And the
`ddd' (doc-data-deflated) column is of course binary data.

This frees us from having to use SQL_BLOB for the `ddd' column,
and opens the door for us to use dbh_new for Msgmap, too.
---
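Not part of the patch: a minimal sketch of why the `ddd' column is
binary rather than text.  It only uses Compress::Zlib, which OverIdx.pm
already imports; the sample header text and the variable names are
assumptions made for illustration, and the real `ddd' payload layout is
not shown in this series.

```perl
use strict;
use warnings;
use Compress::Zlib qw(compress uncompress);

# Hypothetical sample data; not the actual `ddd' layout.
my $doc = "Subject: hello\nFrom: a\@example.com\n" x 4;
my $ddd = compress($doc);

# utf8::decode returns false when the octets are not well-formed UTF-8,
# so work on a copy to leave $ddd untouched.
my $copy = $ddd;
my $is_utf8 = utf8::decode($copy) ? 1 : 0;

# The compressed octets must survive storage byte-for-byte.
my $roundtrip_ok = (uncompress($ddd) eq $doc) ? 1 : 0;
print "valid UTF-8: $is_utf8, round-trip intact: $roundtrip_ok\n";
```

Deflated output is an opaque octet string, so treating it as UTF-8
text gains nothing; it only has to come back out unchanged.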
 lib/PublicInbox/Over.pm    | 1 -
 lib/PublicInbox/OverIdx.pm | 4 ++--
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/lib/PublicInbox/Over.pm b/lib/PublicInbox/Over.pm
index e5a980d5..5d285057 100644
--- a/lib/PublicInbox/Over.pm
+++ b/lib/PublicInbox/Over.pm
@@ -36,7 +36,6 @@ sub dbh_new {
 		$st = pack('dd', $st[0], $st[1]);
 	} while ($st ne $self->{st} && $tries++ < 3);
 	warn "W: $f: .st_dev, .st_ino unstable\n" if $st ne $self->{st};
-	$dbh->{sqlite_unicode} = 1;
 	$dbh;
 }
 
diff --git a/lib/PublicInbox/OverIdx.pm b/lib/PublicInbox/OverIdx.pm
index 008a5d1a..13aa2d74 100644
--- a/lib/PublicInbox/OverIdx.pm
+++ b/lib/PublicInbox/OverIdx.pm
@@ -12,7 +12,7 @@ use strict;
 use warnings;
 use base qw(PublicInbox::Over);
 use IO::Handle;
-use DBI qw(:sql_types); # SQL_BLOB
+use DBI;
 use PublicInbox::MID qw/id_compress mids_for_index references/;
 use PublicInbox::Smsg qw(subject_normalized);
 use Compress::Zlib qw(compress);
@@ -309,7 +309,7 @@ VALUES (?,?,?,?,?,?)
 	my $n = 0;
 	my @v = ($num, $tid, $sid, $ts, $ds);
 	foreach (@v) { $sth->bind_param(++$n, $_) }
-	$sth->bind_param(++$n, $ddd, SQL_BLOB);
+	$sth->bind_param(++$n, $ddd);
 	$sth->execute;
 	$sth = $dbh->prepare_cached(<<'');
 INSERT INTO id2num (id, num) VALUES (?,?)

* [PATCH 2/3] nntpd+imapd: detect unlinked msgmap
  2020-07-14  2:14 [PATCH 0/3] avoid msgmap reopens in long-lived processes Eric Wong
  2020-07-14  2:14 ` [PATCH 1/3] over: unset sqlite_unicode attribute Eric Wong
@ 2020-07-14  2:14 ` Eric Wong
  2020-07-14  2:14 ` [PATCH 3/3] over+msgmap: do not store filename after DBI->connect Eric Wong
  2 siblings, 0 replies; 4+ messages in thread
From: Eric Wong @ 2020-07-14  2:14 UTC
  To: meta

While it's even less common to experience a replaced
msgmap.sqlite3 file, BOFHs may do the darndest things.  This is
another step towards reducing the number of needless wakeups
in long-lived read-only daemons.
---
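Not part of the patch: a minimal, core-Perl sketch of the st_dev/st_ino
comparison that check_inodes relies on.  The helper names (dev_ino_key,
was_replaced, touch) are hypothetical, the files are empty stand-ins
rather than real SQLite DBs, and it assumes a POSIX filesystem where
st_ino is meaningful.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: pack st_dev and st_ino into one comparable string,
# mirroring the pack('dd', ...) idiom used in the diff below.
sub dev_ino_key {
	my ($path) = @_;
	my @st = stat($path) or return; # empty list/undef if stat fails
	pack('dd', $st[0], $st[1]);
}

# Returns 1 if $path now refers to a different file than $old_key did,
# 0 if it is the same file, undef if stat failed.
sub was_replaced {
	my ($path, $old_key) = @_;
	my $cur = dev_ino_key($path) // return;
	$cur ne $old_key ? 1 : 0;
}

# Hypothetical helper: create an empty file.
sub touch {
	my ($path) = @_;
	open(my $fh, '>', $path) or die "open $path: $!";
	close($fh) or die "close $path: $!";
}

my $dir = tempdir(CLEANUP => 1);
my $f = "$dir/msgmap.sqlite3"; # empty stand-in, not a real DB
touch($f);
my $key = dev_ino_key($f);
my $before = was_replaced($f, $key);

# Replace the file the way the test in t/nntpd.t does: rename over it.
my $tmp = "$dir/tmp.sqlite3";
touch($tmp);
rename($tmp, $f) or die "rename($tmp, $f): $!";
my $after = was_replaced($f, $key);
print "before rename: $before, after rename: $after\n";
```

Both files exist at the same time before the rename, so they cannot
share an inode; that is why the comparison reliably notices the swap.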
 lib/PublicInbox/Inbox.pm  |  7 ++---
 lib/PublicInbox/Msgmap.pm | 59 ++++++++++++++++++++-------------------
 t/nntpd.t                 |  8 ++++++
 3 files changed, 42 insertions(+), 32 deletions(-)

diff --git a/lib/PublicInbox/Inbox.pm b/lib/PublicInbox/Inbox.pm
index 02186dac..3d9754dc 100644
--- a/lib/PublicInbox/Inbox.pm
+++ b/lib/PublicInbox/Inbox.pm
@@ -31,7 +31,7 @@ sub cleanup_task () {
 	for my $ibx (values %$CLEANUP) {
 		my $again;
 		if ($have_devel_peek) {
-			foreach my $f (qw(mm search)) {
+			foreach my $f (qw(search)) {
 				# we bump refcnt by assigning tmp, here:
 				my $tmp = $ibx->{$f} or next;
 				next if Devel::Peek::SvREFCNT($tmp) > 2;
@@ -47,7 +47,7 @@ sub cleanup_task () {
 		}
 		check_inodes($ibx);
 		if ($have_devel_peek) {
-			$again ||= !!($ibx->{mm} || $ibx->{search});
+			$again ||= !!$ibx->{search};
 		}
 		$next->{"$ibx"} = $ibx if $again;
 	}
@@ -182,7 +182,6 @@ sub mm {
 	my ($self) = @_;
 	$self->{mm} ||= eval {
 		require PublicInbox::Msgmap;
-		_cleanup_later($self);
 		my $dir = $self->{inboxdir};
 		if ($self->version >= 2) {
 			PublicInbox::Msgmap->new_file("$dir/msgmap.sqlite3");
@@ -409,7 +408,7 @@ sub unsubscribe_unlock {
 
 sub check_inodes ($) {
 	my ($self) = @_;
-	for (qw(over)) { # TODO: search, mm
+	for (qw(over mm)) { # TODO: search
 		$self->{$_}->check_inodes if $self->{$_};
 	}
 }
diff --git a/lib/PublicInbox/Msgmap.pm b/lib/PublicInbox/Msgmap.pm
index aa07e344..e86fb854 100644
--- a/lib/PublicInbox/Msgmap.pm
+++ b/lib/PublicInbox/Msgmap.pm
@@ -13,6 +13,7 @@ use warnings;
 use DBI;
 use DBD::SQLite;
 use File::Temp qw(tempfile);
+use PublicInbox::Over;
 
 sub new {
 	my ($class, $git_dir, $writable) = @_;
@@ -24,29 +25,13 @@ sub new {
 	new_file($class, "$d/msgmap.sqlite3", $writable);
 }
 
-sub dbh_new {
-	my ($f, $writable) = @_;
-	if ($writable && !-f $f) { # SQLite defaults mode to 0644, we want 0666
-		open my $fh, '+>>', $f or die "failed to open $f: $!";
-	}
-	my $dbh = DBI->connect("dbi:SQLite:dbname=$f",'','', {
-		AutoCommit => 1,
-		RaiseError => 1,
-		PrintError => 0,
-		ReadOnly => !$writable,
-		sqlite_use_immediate_transaction => 1,
-	});
-	$dbh;
-}
-
 sub new_file {
-	my ($class, $f, $writable) = @_;
-	return if !$writable && !-r $f;
+	my ($class, $f, $rw) = @_;
+	return if !$rw && !-r $f;
 
-	my $dbh = dbh_new($f, $writable);
-	my $self = bless { dbh => $dbh }, $class;
-
-	if ($writable) {
+	my $self = bless { filename => $f }, $class;
+	my $dbh = $self->{dbh} = PublicInbox::Over::dbh_new($self, $rw);
+	if ($rw) {
 		create_tables($dbh);
 
 		# TRUNCATE reduces I/O compared to the default (DELETE)
@@ -70,7 +55,6 @@ sub tmp_clone {
 	my $tmp = ref($self)->new_file($fn, 1);
 	$tmp->{dbh}->do('PRAGMA synchronous = OFF');
 	$tmp->{dbh}->do('PRAGMA journal_mode = MEMORY');
-	$tmp->{tmp_name} = $fn; # SQLite won't work if unlinked, apparently
 	$tmp->{pid} = $$;
 	close $fh or die "failed to close $fn: $!";
 	$tmp;
@@ -246,28 +230,28 @@ sub mid_set {
 sub DESTROY {
 	my ($self) = @_;
 	delete $self->{dbh};
-	my $f = delete $self->{tmp_name};
-	if (defined $f && $self->{pid} == $$) {
+	my $f = $self->{filename};
+	if (($self->{pid} // 0) == $$) {
 		unlink $f or warn "failed to unlink $f: $!\n";
 	}
 }
 
 sub atfork_parent {
 	my ($self) = @_;
-	my $f = $self->{tmp_name} or die "not a temporary clone\n";
+	$self->{pid} or die "not a temporary clone\n";
 	delete $self->{dbh} and die "tmp_clone dbh not prepared for parent";
-	my $dbh = $self->{dbh} = dbh_new($f, 1);
+	my $dbh = $self->{dbh} = PublicInbox::Over::dbh_new($self, 1);
 	$dbh->do('PRAGMA synchronous = OFF');
 }
 
 sub atfork_prepare {
 	my ($self) = @_;
-	my $f = $self->{tmp_name} or die "not a temporary clone\n";
+	$self->{pid} or die "not a temporary clone\n";
 	$self->{pid} == $$ or
 		die "BUG: atfork_prepare not called from $self->{pid}\n";
 	$self->{dbh} or die "temporary clone not open\n";
 	# must clobber prepared statements
-	%$self = (tmp_name => $f, pid => $$);
+	%$self = (filename => $self->{filename}, pid => $$);
 }
 
 sub skip_artnum {
@@ -296,4 +280,23 @@ sub skip_artnum {
 	}
 }
 
+sub check_inodes {
+	my ($self) = @_;
+	# no filename if in-:memory:
+	my $f = $self->{dbh}->sqlite_db_filename // return;
+	if (my @st = stat($f)) { # did st_dev, st_ino change?
+		my $st = pack('dd', $st[0], $st[1]);
+		if ($st ne ($self->{st} // $st)) {
+			my $tmp = eval { ref($self)->new_file($f) };
+			if ($@) {
+				warn "E: DBI->connect($f): $@\n";
+			} else {
+				%$self = %$tmp;
+			}
+		}
+	} else {
+		warn "W: stat $f: $!\n";
+	}
+}
+
 1;
diff --git a/t/nntpd.t b/t/nntpd.t
index 28008ec1..954e6e75 100644
--- a/t/nntpd.t
+++ b/t/nntpd.t
@@ -14,6 +14,7 @@ use Net::NNTP;
 use Sys::Hostname;
 use POSIX qw(_exit);
 use Digest::SHA;
+use_ok 'PublicInbox::Msgmap';
 
 # FIXME: make easier to test both versions
 my $version = $ENV{PI_TEST_VERSION} || 1;
@@ -341,6 +342,13 @@ Date: Fri, 02 Oct 1993 00:00:00 +0000
 			'article did not exist');
 		$im->add($ex);
 		$im->done;
+		{
+			my $f = $ibx->mm->{filename};
+			my $tmp = "$tmpdir/tmp.sqlite3";
+			$ibx->mm->{dbh}->sqlite_backup_to_file($tmp);
+			delete $ibx->{mm};
+			rename($tmp, $f) or BAIL_OUT "rename($tmp, $f): $!";
+		}
 		ok(run_script([qw(-index --reindex -c), $ibx->{inboxdir}],
 				undef, $noerr), '-compacted');
 		select(undef, undef, undef, $fast_idle ? 0.1 : 2.1);

* [PATCH 3/3] over+msgmap: do not store filename after DBI->connect
  2020-07-14  2:14 [PATCH 0/3] avoid msgmap reopens in long-lived processes Eric Wong
  2020-07-14  2:14 ` [PATCH 1/3] over: unset sqlite_unicode attribute Eric Wong
  2020-07-14  2:14 ` [PATCH 2/3] nntpd+imapd: detect unlinked msgmap Eric Wong
@ 2020-07-14  2:14 ` Eric Wong
  2 siblings, 0 replies; 4+ messages in thread
From: Eric Wong @ 2020-07-14  2:14 UTC
  To: meta

SQLite already knows the filename internally, so avoid having it
as a long-lived Perl SV to save some bytes when there are many
inboxes and open DBs.
---
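Not part of the patch: a minimal sketch of the lifecycle this change
aims for, where {filename} lives in the Perl object only while no
handle is open.  FakeDbh and LazyDb are hypothetical stand-ins so the
sketch runs without DBD::SQLite; FakeDbh merely mimics the
sqlite_db_filename method and, unlike SQLite, still keeps the path in
Perl.  The path is made up and never opened.

```perl
use strict;
use warnings;

# Stand-in for a DBD::SQLite handle; only mimics sqlite_db_filename.
package FakeDbh {
	sub new { my ($class, $f) = @_; bless { f => $f }, $class }
	sub sqlite_db_filename { $_[0]->{f} }
}

# Hypothetical wrapper following the connect/disconnect shape in Over.pm.
package LazyDb {
	sub new { my ($class, $f) = @_; bless { filename => $f }, $class }

	# the path leaves the wrapper object once the handle is open
	sub connect {
		my ($self) = @_;
		$self->{dbh} //= FakeDbh->new(delete $self->{filename});
	}

	# recover the path from the handle before dropping it
	sub disconnect {
		my ($self) = @_;
		if (my $dbh = delete $self->{dbh}) {
			$self->{filename} = $dbh->sqlite_db_filename;
		}
	}
}

my $db = LazyDb->new('/path/to/over.sqlite3'); # made-up path
$db->connect;
my $held_while_open = exists $db->{filename} ? 1 : 0;
$db->disconnect;
my $restored = $db->{filename};
print "held while open: $held_while_open, restored: $restored\n";
```

The point of the shape is that a reconnect after disconnect still
works, since the path is put back before the handle goes away.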
 lib/PublicInbox/Inbox.pm   |  4 ++--
 lib/PublicInbox/Msgmap.pm  | 14 ++++++++------
 lib/PublicInbox/Over.pm    | 30 ++++++++++++++++++++++--------
 lib/PublicInbox/OverIdx.pm |  2 +-
 t/nntpd.t                  |  2 +-
 5 files changed, 34 insertions(+), 18 deletions(-)

diff --git a/lib/PublicInbox/Inbox.pm b/lib/PublicInbox/Inbox.pm
index 3d9754dc..267be4e3 100644
--- a/lib/PublicInbox/Inbox.pm
+++ b/lib/PublicInbox/Inbox.pm
@@ -204,9 +204,9 @@ sub search ($;$) {
 sub over ($) {
 	my ($self) = @_;
 	my $srch = search($self, 1) or return;
-	$self->{over} ||= eval {
+	$self->{over} //= eval {
 		my $over = $srch->{over_ro};
-		$over->dbh_new; # may fail
+		$over->connect; # may fail
 		$over;
 	}
 }
diff --git a/lib/PublicInbox/Msgmap.pm b/lib/PublicInbox/Msgmap.pm
index e86fb854..38ec7858 100644
--- a/lib/PublicInbox/Msgmap.pm
+++ b/lib/PublicInbox/Msgmap.pm
@@ -229,9 +229,9 @@ sub mid_set {
 
 sub DESTROY {
 	my ($self) = @_;
-	delete $self->{dbh};
-	my $f = $self->{filename};
+	my $dbh = $self->{dbh} or return;
 	if (($self->{pid} // 0) == $$) {
+		my $f = $dbh->sqlite_db_filename;
 		unlink $f or warn "failed to unlink $f: $!\n";
 	}
 }
@@ -239,8 +239,9 @@ sub DESTROY {
 sub atfork_parent {
 	my ($self) = @_;
 	$self->{pid} or die "not a temporary clone\n";
-	delete $self->{dbh} and die "tmp_clone dbh not prepared for parent";
-	my $dbh = $self->{dbh} = PublicInbox::Over::dbh_new($self, 1);
+	$self->{dbh} and die "tmp_clone dbh not prepared for parent";
+	# {filename} was already saved by atfork_prepare; {dbh} is unset here
+	my $dbh = $self->{dbh} = PublicInbox::Over::dbh_new($self, 1);
 	$dbh->do('PRAGMA synchronous = OFF');
 }
 
@@ -249,9 +250,10 @@ sub atfork_prepare {
 	$self->{pid} or die "not a temporary clone\n";
 	$self->{pid} == $$ or
 		die "BUG: atfork_prepare not called from $self->{pid}\n";
-	$self->{dbh} or die "temporary clone not open\n";
+	my $dbh = $self->{dbh} or die "temporary clone not open\n";
+
 	# must clobber prepared statements
-	%$self = (filename => $self->{filename}, pid => $$);
+	%$self = (filename => $dbh->sqlite_db_filename, pid => $$);
 }
 
 sub skip_artnum {
diff --git a/lib/PublicInbox/Over.pm b/lib/PublicInbox/Over.pm
index 5d285057..e3f26456 100644
--- a/lib/PublicInbox/Over.pm
+++ b/lib/PublicInbox/Over.pm
@@ -15,9 +15,13 @@ use constant DEFAULT_LIMIT => 1000;
 
 sub dbh_new {
 	my ($self, $rw) = @_;
-	my $f = $self->{filename};
-	if ($rw && !-f $f) { # SQLite defaults mode to 0644, we want 0666
-		open my $fh, '+>>', $f or die "failed to open $f: $!";
+	my $f = delete $self->{filename};
+	if (!-f $f) { # SQLite defaults mode to 0644, we want 0666
+		if ($rw) {
+			open my $fh, '+>>', $f or die "failed to open $f: $!";
+		} else {
+			$self->{filename} = $f; # die on stat() below:
+		}
 	}
 	my (@st, $st, $dbh);
 	my $tries = 0;
@@ -44,9 +48,14 @@ sub new {
 	bless { filename => $f }, $class;
 }
 
-sub disconnect { $_[0]->{dbh} = undef }
+sub disconnect {
+	my ($self) = @_;
+	if (my $dbh = delete $self->{dbh}) {
+		$self->{filename} = $dbh->sqlite_db_filename;
+	}
+}
 
-sub connect { $_[0]->{dbh} ||= $_[0]->dbh_new }
+sub connect { $_[0]->{dbh} //= $_[0]->dbh_new }
 
 sub load_from_row ($;$) {
 	my ($smsg, $cull) = @_;
@@ -258,13 +267,18 @@ SELECT COUNT(num) FROM over WHERE num > ? AND num <= ?
 
 sub check_inodes {
 	my ($self) = @_;
-	if (my @st = stat($self->{filename})) { # did st_dev, st_ino change?
+	my $dbh = $self->{dbh} or return;
+	my $f = $dbh->sqlite_db_filename;
+	if (my @st = stat($f)) { # did st_dev, st_ino change?
 		my $st = pack('dd', $st[0], $st[1]);
 
 		# don't actually reopen, just let {dbh} be recreated later
-		delete($self->{dbh}) if ($st ne ($self->{st} // $st));
+		if ($st ne ($self->{st} // $st)) {
+			delete($self->{dbh});
+			$self->{filename} = $f;
+		}
 	} else {
-		warn "W: stat $self->{filename}: $!\n";
+		warn "W: stat $f: $!\n";
 	}
 }
 
diff --git a/lib/PublicInbox/OverIdx.pm b/lib/PublicInbox/OverIdx.pm
index 13aa2d74..ea8da723 100644
--- a/lib/PublicInbox/OverIdx.pm
+++ b/lib/PublicInbox/OverIdx.pm
@@ -431,7 +431,7 @@ sub rollback_lazy {
 sub disconnect {
 	my ($self) = @_;
 	die "in transaction" if $self->{txn};
-	$self->{dbh} = undef;
+	$self->SUPER::disconnect;
 }
 
 sub create {
diff --git a/t/nntpd.t b/t/nntpd.t
index 954e6e75..aaf6661d 100644
--- a/t/nntpd.t
+++ b/t/nntpd.t
@@ -343,7 +343,7 @@ Date: Fri, 02 Oct 1993 00:00:00 +0000
 		$im->add($ex);
 		$im->done;
 		{
-			my $f = $ibx->mm->{filename};
+			my $f = $ibx->mm->{dbh}->sqlite_db_filename;
 			my $tmp = "$tmpdir/tmp.sqlite3";
 			$ibx->mm->{dbh}->sqlite_backup_to_file($tmp);
 			delete $ibx->{mm};


Archives are clonable:
	git clone --mirror https://public-inbox.org/meta
	git clone --mirror http://czquwvybam4bgbro.onion/meta
	git clone --mirror http://hjrcffqmbrq6wope.onion/meta
	git clone --mirror http://ou63pmih66umazou.onion/meta

Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.mail.public-inbox.meta
	nntp://ou63pmih66umazou.onion/inbox.comp.mail.public-inbox.meta
	nntp://czquwvybam4bgbro.onion/inbox.comp.mail.public-inbox.meta
	nntp://hjrcffqmbrq6wope.onion/inbox.comp.mail.public-inbox.meta
	nntp://news.gmane.io/gmane.mail.public-inbox.general

 note: .onion URLs require Tor: https://www.torproject.org/

AGPL code for this site: git clone https://public-inbox.org/public-inbox.git