author     Eric Wong <e@80x24.org>  2021-06-03 01:05:20 +0000
committer  Eric Wong <e@80x24.org>  2021-06-03 01:09:43 +0000
commit     bdecd7ed8e0dcf0b45491b947cd737ba8cfe38a3
tree       33616d6248bf6b8d2a78d2a609f5ef8389b36b47 /lib/PublicInbox/Over.pm
parent     6ff03ba2be9247f1ead26c2524fadc789de558f1

On a 4-core CPU, this speeds up "lei import" on a largish IMAP
inbox with 75K messages from ~21 minutes down to 40s.

Parallelizing with the new LeiImportKw WQ worker class gave a
near-linear speedup, bringing the runtime down to ~5:40 (~21 minutes
divided evenly over 4 cores would be ~5:15).
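
One way to picture this kind of fan-out is a static split of the message list across forked workers. The sketch below is a generic fork/pipe pattern, not the actual WQ implementation (which this page does not show); `import_kw_one` and `parallel_import` are hypothetical names, and the round-robin split is an assumption.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical stand-in for the per-message keyword import work.
sub import_kw_one { my ($msg) = @_; return length $msg }

# Fan @items out to $nproc forked children round-robin (child $i takes
# items $i, $i+$nproc, ...).  Each child reports how many items it
# handled over a pipe; the parent sums the reports and reaps children.
sub parallel_import {
    my ($nproc, @items) = @_;
    my @kids;
    for my $i (0 .. $nproc - 1) {
        pipe(my $r, my $w) or die "pipe: $!";
        my $pid = fork();
        die "fork: $!" unless defined $pid;
        if ($pid == 0) {                  # child
            close $r;
            my $n = 0;
            for (my $j = $i; $j < @items; $j += $nproc) {
                import_kw_one($items[$j]);
                $n++;
            }
            print $w "$n\n";
            close $w;
            POSIX::_exit(0);              # skip END blocks and inherited stdio buffers
        }
        close $w;                         # parent keeps only the read end
        push @kids, [ $pid, $r ];
    }
    my $total = 0;
    for my $kid (@kids) {
        my ($pid, $r) = @$kid;
        my $line = <$r>;
        close $r;
        waitpid($pid, 0);
        die "worker $pid failed\n" if $? != 0;
        die "worker $pid sent nothing\n" unless defined $line;
        chomp $line;
        $total += $line;
    }
    return $total;
}

my @msgs = map { "msg$_" } 1 .. 10;
my $handled = parallel_import(4, @msgs);
print "$handled\n";    # prints 10
```

A static split is the simplest arrangement but can leave cores idle when some slices take longer than others; a shared queue from which idle workers pull the next item avoids that.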

The new idx_fid_uid index on the "fid" and "uid" columns of
blob2num in mail_sync.sqlite3 provided the final speedup, from
~5:40 down to 40s.
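
Why a composite (fid, uid) index matters can be illustrated in plain Perl. This is an in-memory analogy, not how SQLite stores indexes (SQLite uses B-trees, so an indexed lookup is O(log n) rather than a single hash probe). Only the table and column names come from the message above; the row layout and the helpers `lookup_scan`, `build_fid_uid_index` and `lookup_indexed` are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical rows shaped like [oidbin, fid, uid]; the real blob2num
# table in mail_sync.sqlite3 may differ.
my @rows = (
    [ "\x01" x 20, 1, 100 ],
    [ "\x02" x 20, 1, 101 ],
    [ "\x03" x 20, 2, 100 ],
);

# Without an index: each (fid, uid) lookup scans every row, O(n).
sub lookup_scan {
    my ($rows, $fid, $uid) = @_;
    return map { $_->[0] } grep { $_->[1] == $fid && $_->[2] == $uid } @$rows;
}

# With a composite index: build it once, keyed on (fid, uid) packed
# into 8 bytes, so each lookup is one hash probe.
sub build_fid_uid_index {
    my ($rows) = @_;
    my %idx;
    push @{ $idx{ pack('NN', $_->[1], $_->[2]) } }, $_->[0] for @$rows;
    return \%idx;
}

sub lookup_indexed {
    my ($idx, $fid, $uid) = @_;
    my $hits = $idx->{ pack('NN', $fid, $uid) } or return;
    return @$hits;
}

my $idx = build_fid_uid_index(\@rows);
my @oids = lookup_indexed($idx, 1, 101);
print scalar(@oids), "\n";    # prints 1
```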

An additional index on over.sqlite3#xref3(oidbin) did not help,
since idx_nntp already exists and speeds up the new ->oidbin_exists
internal API.
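
The `->oidbin_exists` API takes the object ID as raw bytes rather than as hex. A minimal sketch of the conversion, using only core `pack`/`unpack` and assuming a 40-character SHA-1 hex OID (the example value is git's well-known empty-blob ID):

```perl
use strict;
use warnings;

# A git SHA-1 object ID in hex (40 characters).
my $oidhex = 'e69de29bb2d1d6434b8b29ae775ad8c2e48c5391';

# pack('H*') turns the hex string into raw bytes, high nybble first;
# this is the form the diff binds as an SQL_BLOB for xref3.oidbin.
my $oidbin = pack('H*', $oidhex);
print length($oidbin), "\n";   # prints 20

# unpack('H*') reverses it, so storing binary loses nothing.
print unpack('H*', $oidbin) eq $oidhex ? "round-trip ok\n" : "mismatch\n";
```

The binary form is half the size of the hex string, and a caller that already holds binary OIDs can presumably skip the `pack` step entirely.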

I initially experimented with a separate "lei import-kw" command
but decided against it since it's useless outside of IMAP+JMAP
and would add cognitive overhead for both users and
hackers.  So LeiImportKw is just a WQ worker used by "lei import"
and not its own user-visible command.

v2: fix ikw_done_wait arg handling (ugh, confusing API :x)
Diffstat (limited to 'lib/PublicInbox/Over.pm')
1 files changed, 6 insertions, 4 deletions
diff --git a/lib/PublicInbox/Over.pm b/lib/PublicInbox/Over.pm
index 0e191c47..58fdea0e 100644
--- a/lib/PublicInbox/Over.pm
+++ b/lib/PublicInbox/Over.pm
@@ -349,13 +349,13 @@ sub check_inodes {
-sub blob_exists {
-        my ($self, $oidhex) = @_;
+sub oidbin_exists {
+        my ($self, $oidbin) = @_;
         if (wantarray) {
                 my $sth = $self->dbh->prepare_cached(<<'', undef, 1);
 SELECT docid FROM xref3 WHERE oidbin = ? ORDER BY docid ASC
-                $sth->bind_param(1, pack('H*', $oidhex), SQL_BLOB);
+                $sth->bind_param(1, $oidbin, SQL_BLOB);
                 my $tmp = $sth->fetchall_arrayref;
                 map { $_->[0] } @$tmp;
@@ -363,10 +363,12 @@ SELECT docid FROM xref3 WHERE oidbin = ? ORDER BY docid ASC
                 my $sth = $self->dbh->prepare_cached(<<'', undef, 1);
 SELECT COUNT(*) FROM xref3 WHERE oidbin = ?
-                $sth->bind_param(1, pack('H*', $oidhex), SQL_BLOB);
+                $sth->bind_param(1, $oidbin, SQL_BLOB);
+sub blob_exists { oidbin_exists($_[0], pack('H*', $_[1])) }
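
The refactor keeps `blob_exists` as a thin hex-taking wrapper around the new binary-taking `oidbin_exists`, and both stay context-sensitive via `wantarray`. The sketch below reproduces that shape against a hypothetical in-memory hash standing in for the `xref3` table, so it runs without SQLite; it is an illustration, not the real `PublicInbox::Over` code.

```perl
use strict;
use warnings;

# Hypothetical in-memory stand-in for over.sqlite3's xref3 table:
# maps a binary OID to the docids that reference it.
my %xref3 = (
    pack('H*', 'e69de29bb2d1d6434b8b29ae775ad8c2e48c5391') => [ 7, 3 ],
);

# Mirrors the context-sensitive shape of oidbin_exists in the diff:
# list context returns docids sorted ascending (like ORDER BY docid ASC),
# scalar context returns only a count (like SELECT COUNT(*)).
sub oidbin_exists {
    my ($self, $oidbin) = @_;
    my $docids = $xref3{$oidbin} // [];
    return sort { $a <=> $b } @$docids if wantarray;
    return scalar @$docids;
}

# Compatibility wrapper taking a hex OID, as in the new blob_exists.
sub blob_exists { oidbin_exists($_[0], pack('H*', $_[1])) }

my $hex = 'e69de29bb2d1d6434b8b29ae775ad8c2e48c5391';
my @docids = blob_exists(undef, $hex);   # list context
my $count  = blob_exists(undef, $hex);   # scalar context
print "@docids / $count\n";              # prints "3 7 / 2"
```

Because `blob_exists` calls `oidbin_exists` as its last expression, the caller's list or scalar context carries through to `wantarray`, so existing hex-based callers need no changes.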