user/dev discussion of public-inbox itself
 help / color / mirror / Atom feed
* [PATCH 0/3] SQLite-related things
@ 2020-08-25  3:02 Eric Wong
  2020-08-25  3:02 ` [PATCH 1/3] over: skip nodatacow on the journal Eric Wong
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Eric Wong @ 2020-08-25  3:02 UTC (permalink / raw)
  To: meta

While I'm considering options for dealing with SQLite locking
problems <https://public-inbox.org/meta/20200825001204.GA840@dcvr/>,
some obvious cleanups and improvements can be made, here.

We'll now support WAL if (and only if) a user chooses to enable
it; since we can't do it by default (see 3/3 for explanation).

Eric Wong (3):
  over: skip nodatacow on the journal
  msgmap: use "CREATE TABLE IF NOT EXISTS"
  over+msgmap: respect WAL journal_mode if set

 lib/PublicInbox/Msgmap.pm  | 29 ++++++++++++++---------------
 lib/PublicInbox/Over.pm    | 26 ++++++++++++++++++++++----
 lib/PublicInbox/OverIdx.pm |  5 -----
 t/over.t                   | 10 ++++++++++
 4 files changed, 46 insertions(+), 24 deletions(-)

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCH 1/3] over: skip nodatacow on the journal
  2020-08-25  3:02 [PATCH 0/3] SQLite-related things Eric Wong
@ 2020-08-25  3:02 ` Eric Wong
  2020-08-25  3:02 ` [PATCH 2/3] msgmap: use "CREATE TABLE IF NOT EXISTS" Eric Wong
  2020-08-25  3:02 ` [PATCH 3/3] over+msgmap: respect WAL journal_mode if set Eric Wong
  2 siblings, 0 replies; 4+ messages in thread
From: Eric Wong @ 2020-08-25  3:02 UTC (permalink / raw)
  To: meta

This file gets truncated anyhow, so it won't fragment.
---
 lib/PublicInbox/Over.pm | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/lib/PublicInbox/Over.pm b/lib/PublicInbox/Over.pm
index fba58d17..a2f04117 100644
--- a/lib/PublicInbox/Over.pm
+++ b/lib/PublicInbox/Over.pm
@@ -21,9 +21,6 @@ sub dbh_new {
 			require PublicInbox::Spawn;
 			open my $fh, '+>>', $f or die "failed to open $f: $!";
 			PublicInbox::Spawn::nodatacow_fd(fileno($fh));
-			my $j = "$f-journal";
-			open $fh, '+>>', $j or die "failed to open $j: $!";
-			PublicInbox::Spawn::nodatacow_fd(fileno($fh));
 		} else {
 			$self->{filename} = $f; # die on stat() below:
 		}

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCH 2/3] msgmap: use "CREATE TABLE IF NOT EXISTS"
  2020-08-25  3:02 [PATCH 0/3] SQLite-related things Eric Wong
  2020-08-25  3:02 ` [PATCH 1/3] over: skip nodatacow on the journal Eric Wong
@ 2020-08-25  3:02 ` Eric Wong
  2020-08-25  3:02 ` [PATCH 3/3] over+msgmap: respect WAL journal_mode if set Eric Wong
  2 siblings, 0 replies; 4+ messages in thread
From: Eric Wong @ 2020-08-25  3:02 UTC (permalink / raw)
  To: meta

It's fewer queries and matches what we do in OverIdx.
---
 lib/PublicInbox/Msgmap.pm | 26 ++++++++++++++------------
 1 file changed, 14 insertions(+), 12 deletions(-)

diff --git a/lib/PublicInbox/Msgmap.pm b/lib/PublicInbox/Msgmap.pm
index 7290959d..5b4cebc1 100644
--- a/lib/PublicInbox/Msgmap.pm
+++ b/lib/PublicInbox/Msgmap.pm
@@ -175,18 +175,20 @@ sub num_delete {
 
 sub create_tables {
 	my ($dbh) = @_;
-	my $e;
-
-	$e = eval { $dbh->selectrow_array('EXPLAIN SELECT * FROM msgmap;') };
-	defined $e or $dbh->do('CREATE TABLE msgmap (' .
-			'num INTEGER PRIMARY KEY AUTOINCREMENT, '.
-			'mid VARCHAR(1000) NOT NULL, ' .
-			'UNIQUE (mid) )');
-
-	$e = eval { $dbh->selectrow_array('EXPLAIN SELECT * FROM meta') };
-	defined $e or $dbh->do('CREATE TABLE meta (' .
-			'key VARCHAR(32) PRIMARY KEY, '.
-			'val VARCHAR(255) NOT NULL)');
+
+	$dbh->do(<<'');
+CREATE TABLE IF NOT EXISTS msgmap (
+	num INTEGER PRIMARY KEY AUTOINCREMENT,
+	mid VARCHAR(1000) NOT NULL,
+	UNIQUE (mid)
+)
+
+	$dbh->do(<<'');
+CREATE TABLE IF NOT EXISTS meta (
+	key VARCHAR(32) PRIMARY KEY,
+	val VARCHAR(255) NOT NULL
+)
+
 }
 
 # used by NNTP.pm

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCH 3/3] over+msgmap: respect WAL journal_mode if set
  2020-08-25  3:02 [PATCH 0/3] SQLite-related things Eric Wong
  2020-08-25  3:02 ` [PATCH 1/3] over: skip nodatacow on the journal Eric Wong
  2020-08-25  3:02 ` [PATCH 2/3] msgmap: use "CREATE TABLE IF NOT EXISTS" Eric Wong
@ 2020-08-25  3:02 ` Eric Wong
  2 siblings, 0 replies; 4+ messages in thread
From: Eric Wong @ 2020-08-25  3:02 UTC (permalink / raw)
  To: meta

WAL actually seems to have ideal locking characteristics given
concurrency problems I'm experiencing with --reindex running
in parallel with expensive read-only SQLite queries:
<https://public-inbox.org/meta/20200825001204.GA840@dcvr/>

Unfortunately, we cannot blindly use WAL while preserving
compatibility with existing setups nor our guarantees that
read-only daemons are indeed "read-only".

However, respect an user's the choice to set WAL on their
own if they're comfortable with giving -nntpd/-httpd/-imapd
processes write permission to the directory storing SQLite DBs.
---
 lib/PublicInbox/Msgmap.pm  |  3 ---
 lib/PublicInbox/Over.pm    | 23 ++++++++++++++++++++++-
 lib/PublicInbox/OverIdx.pm |  5 -----
 t/over.t                   | 10 ++++++++++
 4 files changed, 32 insertions(+), 9 deletions(-)

diff --git a/lib/PublicInbox/Msgmap.pm b/lib/PublicInbox/Msgmap.pm
index 5b4cebc1..d28e96c8 100644
--- a/lib/PublicInbox/Msgmap.pm
+++ b/lib/PublicInbox/Msgmap.pm
@@ -31,9 +31,6 @@ sub new_file {
 	my $self = bless { filename => $f }, $class;
 	my $dbh = $self->{dbh} = PublicInbox::Over::dbh_new($self, $rw);
 	if ($rw) {
-		# TRUNCATE reduces I/O compared to the default (DELETE)
-		$dbh->do('PRAGMA journal_mode = TRUNCATE');
-
 		$dbh->begin_work;
 		create_tables($dbh);
 		$self->created_at(time) unless $self->created_at;
diff --git a/lib/PublicInbox/Over.pm b/lib/PublicInbox/Over.pm
index a2f04117..3e74b7a6 100644
--- a/lib/PublicInbox/Over.pm
+++ b/lib/PublicInbox/Over.pm
@@ -42,7 +42,28 @@ sub dbh_new {
 		$st = pack('dd', $st[0], $st[1]);
 	} while ($st ne $self->{st} && $tries++ < 3);
 	warn "W: $f: .st_dev, .st_ino unstable\n" if $st ne $self->{st};
-	$dbh->do('PRAGMA synchronous = OFF') if ($rw // 0) > 1;
+
+	if ($rw) {
+		# TRUNCATE reduces I/O compared to the default (DELETE).
+		#
+		# Do not use WAL by default since we expect the case
+		# where any users may read via read-only daemons
+		# (-httpd/-imapd/-nntpd); but only a single user has
+		# write permissions for -watch/-mda.
+		#
+		# Read-only WAL support in SQLite 3.22.0 (2018-01-22)
+		# doesn't do what we need: it is only intended for
+		# immutable read-only media (e.g. CD-ROM) and not
+		# usable for our use case described above.
+		#
+		# If an admin is willing to give read-only daemons R/W
+		# permissions; they can enable WAL manually and we will
+		# respect that by not clobbering it.
+		my $jm = $dbh->selectrow_array('PRAGMA journal_mode');
+		$dbh->do('PRAGMA journal_mode = TRUNCATE') if $jm ne 'wal';
+
+		$dbh->do('PRAGMA synchronous = OFF') if $rw > 1;
+	}
 	$dbh;
 }
 
diff --git a/lib/PublicInbox/OverIdx.pm b/lib/PublicInbox/OverIdx.pm
index d42d6fe7..9f4a56fb 100644
--- a/lib/PublicInbox/OverIdx.pm
+++ b/lib/PublicInbox/OverIdx.pm
@@ -23,11 +23,6 @@ sub dbh_new {
 	my ($self) = @_;
 	my $dbh = $self->SUPER::dbh_new($self->{-no_fsync} ? 2 : 1);
 
-	# TRUNCATE reduces I/O compared to the default (DELETE)
-	# We do not use WAL since we're optimized for read-only ops,
-	# (and read-only requires SQLite 3.22.0 (2018-01-22)).
-	$dbh->do('PRAGMA journal_mode = TRUNCATE');
-
 	# 80000 pages (80MiB on SQLite <3.12.0, 320MiB on 3.12.0+)
 	# was found to be good in 2018 during the large LKML import
 	# at the time.  This ought to be configurable based on HW
diff --git a/t/over.t b/t/over.t
index 07672aa7..8bf64ecb 100644
--- a/t/over.t
+++ b/t/over.t
@@ -65,4 +65,14 @@ isnt($over->max, 0, 'max is non-zero');
 
 $over->rollback_lazy;
 
+# L<perldata/"Version Strings">
+my $v = eval 'v'.$over->{dbh}->{sqlite_version};
+SKIP: {
+	skip("no WAL in SQLite version $v < 3.7.0", 1) if $v lt v3.7.0;
+	$over->{dbh}->do('PRAGMA journal_mode = WAL');
+	$over = PublicInbox::OverIdx->new("$tmpdir/over.sqlite3");
+	is($over->connect->selectrow_array('PRAGMA journal_mode'), 'wal',
+		'WAL journal_mode not clobbered if manually set');
+}
+
 done_testing();

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2020-08-25  3:02 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-25  3:02 [PATCH 0/3] SQLite-related things Eric Wong
2020-08-25  3:02 ` [PATCH 1/3] over: skip nodatacow on the journal Eric Wong
2020-08-25  3:02 ` [PATCH 2/3] msgmap: use "CREATE TABLE IF NOT EXISTS" Eric Wong
2020-08-25  3:02 ` [PATCH 3/3] over+msgmap: respect WAL journal_mode if set Eric Wong

user/dev discussion of public-inbox itself

This inbox may be cloned and mirrored by anyone:

	git clone --mirror https://public-inbox.org/meta
	git clone --mirror http://czquwvybam4bgbro.onion/meta
	git clone --mirror http://hjrcffqmbrq6wope.onion/meta
	git clone --mirror http://ou63pmih66umazou.onion/meta

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V1 meta meta/ https://public-inbox.org/meta \
		meta@public-inbox.org
	public-inbox-index meta

Example config snippet for mirrors.
Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.mail.public-inbox.meta
	nntp://ou63pmih66umazou.onion/inbox.comp.mail.public-inbox.meta
	nntp://czquwvybam4bgbro.onion/inbox.comp.mail.public-inbox.meta
	nntp://hjrcffqmbrq6wope.onion/inbox.comp.mail.public-inbox.meta
	nntp://news.gmane.io/gmane.mail.public-inbox.general
 note: .onion URLs require Tor: https://www.torproject.org/

code repositories for the project(s) associated with this inbox:

	https://80x24.org/public-inbox.git

AGPL code for this site: git clone https://public-inbox.org/public-inbox.git