From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on dcvr.yhbt.net
X-Spam-Level:
X-Spam-ASN:
X-Spam-Status: No, score=-4.0 required=3.0 tests=ALL_TRUSTED,BAYES_00
	shortcircuit=no autolearn=ham autolearn_force=no version=3.4.2
Received: from localhost (dcvr.yhbt.net [127.0.0.1])
	by dcvr.yhbt.net (Postfix) with ESMTP id B4B541F670
	for ; Sun, 10 Oct 2021 09:14:43 +0000 (UTC)
Date: Sun, 10 Oct 2021 09:14:43 +0000
From: Eric Wong
To: meta@public-inbox.org
Subject: [CFT] SQLite and mmap...
Message-ID: <20211010091443.M599445@dcvr>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
List-Id:

Anybody feel like benchmarking the below patch?

I've been trying it out a bit with indexing/reindexing/gc and
read-only daemons, but I haven't noticed an improvement on my
old AMD CPUs[1].  If anything, it's maybe <0.5% slower with mmap
for me, but probably within the margin of error on noisy machines.

[1] only affected by Spectre v1/v2, but not other CPU bugs AFAIK

DBI and Perl have their own overheads, so it's probably masked
somewhat.  I do notice huge differences between different SSDs,
though...

Theoretically, mmap-ing regular files is nice since it avoids
data copies, mallocs (from SQLite's page cache) and syscalls.
However, throughout the years (not just this project nor SQLite)
I haven't seen meaningful speedups from mmap over pread in most
workloads.  Maybe I'm working on the wrong projects :P

There's also caveats listed at: .

By default, the SQLite compile-time limit is only 2GB (which
doesn't make sense to me on 64-bit Linux, at least), and
over.sqlite3 is ~12G for lore/all; so that could also have
something to do with it.

With more mmap use, there's also /proc/sys/vm/max_map_count
limits which I already bump into when testing git.  So I'm not
exactly thrilled with the possibility of users being more likely
to bump into that...

Anyways, SQLite will parse the "PRAGMA mmap_size" value as a
signed 64-bit int; the change below uses the largest possible
value and lets SQLite clamp it to whatever the compile-time
limit is.
---
 lib/PublicInbox/LeiMailSync.pm | 1 +
 lib/PublicInbox/Over.pm        | 1 +
 lib/PublicInbox/OverIdx.pm     | 4 ++--
 3 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/lib/PublicInbox/LeiMailSync.pm b/lib/PublicInbox/LeiMailSync.pm
index 91cd1c934a1f..56cd14e6ef02 100644
--- a/lib/PublicInbox/LeiMailSync.pm
+++ b/lib/PublicInbox/LeiMailSync.pm
@@ -30,6 +30,7 @@ sub dbh_new {
 	create_tables($self, $dbh) if $rw;
 	$dbh->do('PRAGMA journal_mode = WAL') if $creat;
 	$dbh->do('PRAGMA case_sensitive_like = ON');
+	$dbh->do('PRAGMA mmap_size = 0x7'.('f' x 15)); # SQLite will clamp
 	$dbh;
 }
 
diff --git a/lib/PublicInbox/Over.pm b/lib/PublicInbox/Over.pm
index 19da056a10af..bcd91067e267 100644
--- a/lib/PublicInbox/Over.pm
+++ b/lib/PublicInbox/Over.pm
@@ -43,6 +43,7 @@ sub dbh_new {
 	} while ($st ne $self->{st} && $tries++ < 3);
 	warn "W: $f: .st_dev, .st_ino unstable\n" if $st ne $self->{st};
 
+	$dbh->do('PRAGMA mmap_size = 0x7'.('f' x 15)); # SQLite will clamp
 	if ($rw) {
 		# TRUNCATE reduces I/O compared to the default (DELETE).
 		#
diff --git a/lib/PublicInbox/OverIdx.pm b/lib/PublicInbox/OverIdx.pm
index 985abbf4e693..ce9b86616594 100644
--- a/lib/PublicInbox/OverIdx.pm
+++ b/lib/PublicInbox/OverIdx.pm
@@ -24,8 +24,8 @@ sub dbh_new {
 
 	# 80000 pages (80MiB on SQLite <3.12.0, 320MiB on 3.12.0+)
 	# was found to be good in 2018 during the large LKML import
-	# at the time.  This ought to be configurable based on HW
-	# and inbox size; I suspect it's overkill for many inboxes.
+	# at the time.  SQLite will only use as much as it needs,
+	# and maybe it's irrelevant since we use mmap nowadays.
 	$dbh->do('PRAGMA cache_size = 80000');
 	create_tables($dbh);
 
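
For anyone who wants to see what the "largest possible value" request
actually turns into on their SQLite build before benchmarking, here is
a minimal sketch.  It uses Python's stdlib sqlite3 module rather than
Perl DBI purely so it runs standalone; the function name
effective_mmap_size and the file name demo.sqlite3 are made up for
illustration and are not part of the patch.  It assumes the value
SQLite returns from the PRAGMA is the limit it settled on after
clamping, and that a build without mmap support may return 0 or no row.

```python
import os
import sqlite3
import tempfile

# Largest signed 64-bit value, built the same way as the Perl expression
# '0x7'.('f' x 15) in the patch.
MMAP_REQUEST = "0x7" + "f" * 15


def effective_mmap_size(path):
    """Request the maximum mmap_size and return what SQLite reports back.

    Hypothetical helper: returns None if this SQLite build yields no row
    for the PRAGMA (assumed to mean mmap is unavailable for this file).
    """
    conn = sqlite3.connect(path)
    try:
        row = conn.execute(f"PRAGMA mmap_size = {MMAP_REQUEST}").fetchone()
        return None if row is None else row[0]
    finally:
        conn.close()


if __name__ == "__main__":
    # Use an on-disk file: in-memory databases have nothing to mmap.
    with tempfile.TemporaryDirectory() as tmp:
        print(effective_mmap_size(os.path.join(tmp, "demo.sqlite3")))
```

If the printed number is far smaller than your over.sqlite3, only the
first part of the file is going to be mapped, which is worth knowing
when comparing benchmark numbers.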