author     Eric Wong <e@80x24.org>  2021-06-08 09:50:21 +0000
committer  Eric Wong <e@80x24.org>  2021-06-08 16:50:47 +0000
commit     10b523eb017162240b1ac3647f8dcbbf2be348a7
tree       9ea63ea4c4919556a1bf5b335f365372dfa1c84a /lib/PublicInbox/MdirReader.pm
parent     ba34a69490dce6ea3ba85ee5416b6590fa0c0a39
On a 4-core CPU, this speeds up "lei import" on a largish
Maildir inbox with 75K messages from ~8 minutes down to ~40s.

Parallelizing alone brought no improvement and may even hurt
performance slightly, depending on CPU availability.
However, creating an index on the "fid" and "name" columns in
blob2name yields the same speedup we got.

Parallelizing IMAP makes more sense because most IMAP stores
are remote and subject to network latency.

Followup-to: bdecd7ed8e0dcf0b45491b947cd737ba8cfe38a3 ("lei import: speed up kw updates for old IMAP messages")
Diffstat (limited to 'lib/PublicInbox/MdirReader.pm')
-rw-r--r-- lib/PublicInbox/MdirReader.pm 22
1 files changed, 13 insertions, 9 deletions
diff --git a/lib/PublicInbox/MdirReader.pm b/lib/PublicInbox/MdirReader.pm
index 304be63d..484bf0a8 100644
--- a/lib/PublicInbox/MdirReader.pm
+++ b/lib/PublicInbox/MdirReader.pm
@@ -87,17 +87,21 @@ sub maildir_each_eml {
 sub new { bless {}, __PACKAGE__ }
 
 sub flags2kw ($) {
-        my @unknown;
-        my %kw;
-        for (split(//, $_[0])) {
-                my $k = $c2kw{$_};
-                if (defined($k)) {
-                        $kw{$k} = 1;
-                } else {
-                        push @unknown, $_;
+        if (wantarray) {
+                my @unknown;
+                my %kw;
+                for (split(//, $_[0])) {
+                        my $k = $c2kw{$_};
+                        if (defined($k)) {
+                                $kw{$k} = 1;
+                        } else {
+                                push @unknown, $_;
+                        }
                 }
+                (\%kw, \@unknown);
+        } else {
+                [ sort(map { $c2kw{$_} // () } split(//, $_[0])) ];
         }
-        (\%kw, \@unknown);
 }
 
 1;
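The patch makes flags2kw context-sensitive. In list context it still returns a hashref of keywords plus an arrayref of unrecognized flag characters. In scalar context it skips building both and instead returns a sorted arrayref of the known keywords, silently dropping unknown flags. The standalone Perl sketch below exercises both paths. It rests on assumptions: the %c2kw table here (D, F, P, R, S mapped to draft, flagged, forwarded, answered, seen) is an illustrative stand-in for the real table defined earlier in MdirReader.pm, which this diff does not show, and the sample flag string "FRSx" is made up.

```perl
#!/usr/bin/perl
# Standalone sketch of the context-sensitive flags2kw() from the patch.
use v5.10.1; # for the // (defined-or) operator
use strict;
use warnings;

# ASSUMPTION: illustrative stand-in for the %c2kw table defined earlier
# in MdirReader.pm (not shown in this diff).
my %c2kw = (
	D => 'draft',
	F => 'flagged',
	P => 'forwarded',
	R => 'answered',
	S => 'seen',
);

sub flags2kw ($) {
	if (wantarray) {
		# list context: hashref of keywords + arrayref of unknown flags
		my (%kw, @unknown);
		for (split(//, $_[0])) {
			my $k = $c2kw{$_};
			if (defined($k)) {
				$kw{$k} = 1;
			} else {
				push @unknown, $_;
			}
		}
		(\%kw, \@unknown);
	} else {
		# scalar context: sorted arrayref of known keywords only;
		# `// ()` maps unknown flag letters to an empty list
		[ sort(map { $c2kw{$_} // () } split(//, $_[0])) ];
	}
}

# Flags as they might appear after ":2," in a Maildir filename; the
# lowercase "x" is deliberately absent from %c2kw.
my ($kw, $unknown) = flags2kw('FRSx'); # list context
my $sorted = flags2kw('FRSx');         # scalar context

print join(',', sort keys %$kw), "\n"; # answered,flagged,seen
print join(',', @$unknown), "\n";      # x
print join(',', @$sorted), "\n";       # answered,flagged,seen
```

Because the scalar form sorts its result, any two flag strings containing the same known letters produce the same keyword list, presumably so callers can compare or store it directly without a separate normalization step.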