author    Eric Wong <e@yhbt.net>  2020-07-28 22:21:58 +0000
committer Eric Wong <e@yhbt.net>  2020-07-29 11:32:57 +0000
commit    c106504309621b662ce6c7cd914718f7045edca4
tree      c569ad56cfd8e192c9f087faf9a5e13482dcd27f /lib/PublicInbox/Over.pm
parent    a3391407c960e4bbd825a34b87d053de6ef3767a
SQLite and Xapian files are written at random offsets, so they become
fragmented under btrfs with copy-on-write.  This leads to
noticeable performance problems (and probably ENOSPC) as these
files get big.

lore/git (v2, <1GB) indexes around 20% faster with this on an
ancient SSD.  lore/lkml seems to be taking forever and I'll
probably cancel it to save wear on my SSD.

Unfortunately, disabling CoW also means disabling checksumming
(and compression), so we'll be careful to only set the No_COW
attribute on regenerable data.  We want to keep CoW (and
checksums+compression) on git storage because current ref
storage is neither checksummed nor compressed, and git streams
pack output.
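The patch calls PublicInbox::Spawn::set_nodatacow() without showing how it works.  On Linux, the No_COW attribute (what `chattr +C` sets) is the FS_NOCOW_FL inode flag, read and written with the FS_IOC_GETFLAGS/FS_IOC_SETFLAGS ioctls.  Below is a minimal C sketch of that idea, assuming Linux kernel headers; it is not the actual public-inbox code, and the helper names set_nocow_if_btrfs() and open_new_nocow() are hypothetical.  It skips files that are not on btrfs, and only sets the flag on an empty file, because chattr(1) notes that on btrfs the 'C' flag should be set on new or empty files.  That is presumably why the patch only sets it when the database file does not exist yet, and pre-creates the "-journal" file before SQLite writes to it.

```c
#include <errno.h>
#include <fcntl.h>
#include <linux/fs.h>    /* FS_IOC_GETFLAGS, FS_IOC_SETFLAGS, FS_NOCOW_FL */
#include <linux/magic.h> /* BTRFS_SUPER_MAGIC */
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <sys/vfs.h>     /* fstatfs */
#include <unistd.h>

/*
 * Hypothetical helper: set No_COW on fd if it lives on btrfs.
 * Returns 1 if the flag was set, 0 if skipped (not btrfs, or the file
 * already has data), -1 on error with errno set.
 */
static int set_nocow_if_btrfs(int fd)
{
	struct statfs sfs;
	struct stat st;
	int flags = 0;

	if (fstatfs(fd, &sfs) < 0)
		return -1;
	if ((unsigned long)sfs.f_type != BTRFS_SUPER_MAGIC)
		return 0; /* leave other filesystems untouched */
	if (fstat(fd, &st) < 0)
		return -1;
	if (st.st_size != 0)
		return 0; /* flag is only reliable on empty files */
	if (ioctl(fd, FS_IOC_GETFLAGS, &flags) < 0)
		return -1;
	flags |= FS_NOCOW_FL;
	if (ioctl(fd, FS_IOC_SETFLAGS, &flags) < 0)
		return -1;
	return 1;
}

/*
 * Hypothetical helper mirroring the patch's '+>>' open: create the file
 * (mode 0666, subject to umask), then mark it No_COW before any data is
 * written.  Failing to set the flag is reported but not fatal.
 * Returns the fd, or -1 if the file cannot be opened.
 */
static int open_new_nocow(const char *path)
{
	int fd = open(path, O_RDWR | O_CREAT | O_APPEND, 0666);

	if (fd < 0)
		return -1;
	if (set_nocow_if_btrfs(fd) < 0)
		fprintf(stderr, "%s: cannot set No_COW: %s\n",
			path, strerror(errno));
	return fd;
}
```

Calling open_new_nocow() on a database path and on its "-journal" path before SQLite first opens the database gives the same ordering as the patch: create the empty file, set the flag, then let SQLite write.  Treating a failure as non-fatal matches the commit's intent that No_COW is only an optimization for regenerable data.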
diff --git a/lib/PublicInbox/Over.pm b/lib/PublicInbox/Over.pm
index f32743c0..0146414c 100644
--- a/lib/PublicInbox/Over.pm
+++ b/lib/PublicInbox/Over.pm
@@ -18,7 +18,12 @@ sub dbh_new {
         my $f = delete $self->{filename};
         if (!-f $f) { # SQLite defaults mode to 0644, we want 0666
                 if ($rw) {
+                        require PublicInbox::Spawn;
                         open my $fh, '+>>', $f or die "failed to open $f: $!";
+                        PublicInbox::Spawn::set_nodatacow(fileno($fh));
+                        my $j = "$f-journal";
+                        open $fh, '+>>', $j or die "failed to open $j: $!";
+                        PublicInbox::Spawn::set_nodatacow(fileno($fh));
                 } else {
                         $self->{filename} = $f; # die on stat() below: