From: Eric Wong <e@yhbt.net>
To: meta@public-inbox.org
Subject: [PATCH 22/34] watch: support imap.fetchBatchSize parameter
Date: Sat, 27 Jun 2020 10:03:48 +0000
Message-ID: <20200627100400.9871-23-e@yhbt.net>
In-Reply-To: <20200627100400.9871-1-e@yhbt.net>

IMAP allows retrieving multiple messages with a single command,
and Mail::IMAPClient supports that.  Unfortunately, it means we
slurp every message in a batch into memory at once.  This option
lets users trade higher memory usage for fewer network round-trips.
The batch size defaults to 1 (one message per UID FETCH).

Ideally, we'd support pipelining, but AFAIK no widely installed
Perl IMAP library supports it.
---
lib/PublicInbox/WatchMaildir.pm | 47 ++++++++++++++++++++++++---------
1 file changed, 34 insertions(+), 13 deletions(-)
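
A minimal configuration sketch, assuming the usual git-config(1)
syntax of the public-inbox config file.  imap_common_init() looks up
"imap.$sec.fetchBatchSize" first and falls back to ->urlmatch on
"imap.fetchBatchSize", so both a global and a per-server value should
be possible.  The host name and the numbers below are placeholders
and do not come from this patch.  A value that is not a plain decimal
integer triggers the "is not an integer" warning and is ignored.

```ini
# fetch up to 10 messages per UID FETCH for every IMAP server
[imap]
	fetchBatchSize = 10

# hypothetical per-server override; larger batches use more memory
[imap "imaps://mail.example.com"]
	fetchBatchSize = 50
```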
diff --git a/lib/PublicInbox/WatchMaildir.pm b/lib/PublicInbox/WatchMaildir.pm
index d492e5d65b7..05aa6594147 100644
--- a/lib/PublicInbox/WatchMaildir.pm
+++ b/lib/PublicInbox/WatchMaildir.pm
@@ -274,6 +274,15 @@ sub imap_common_init ($) {
 		$self->{imap_opt}->{$sec}->{poll_intvl} = $to if $to;
 		$to = cfg_intvl($cfg, 'imap', 'IdleInterval', $sec, $url);
 		$self->{imap_opt}->{$sec}->{idle_intvl} = $to if $to;
+
+		my $key = lc("imap.$sec.fetchBatchSize");
+		my $bs = $cfg->{lc($key)} //
+			$cfg->urlmatch('imap.fetchBatchSize', $url) // next;
+		if ($bs =~ /\A([0-9]+)\z/) {
+			$self->{imap_opt}->{$sec}->{batch_size} = $bs;
+		} else {
+			warn "W: $key=$bs is not an integer\n";
+		}
 	}
 	$mic_args;
 }
@@ -389,25 +398,31 @@ sub imap_fetch_all ($$$) {
 	warn "I: $url fetching UID $l_uid:$r_uid\n";
 	$mic->Uid(1); # the default, we hope
-	my $uids;
+	my $bs = $self->{imap_opt}->{$sec}->{batch_size} // 1;
 	my $req = $mic->imap4rev1 ? 'BODY.PEEK[]' : 'RFC822.PEEK';
+
+	# TODO: FLAGS may be useful for personal use
 	my $key = $req;
 	$key =~ s/\.PEEK//;
-	my $uid;
+	my ($uids, $batch);
 	my $warn_cb = $SIG{__WARN__} || sub { print STDERR @_ };
 	local $SIG{__WARN__} = sub {
-		$uid //= -1;
-		$warn_cb->("$url UID:$uid\n");
+		$batch //= '?';
+		$warn_cb->("$url UID:$batch\n");
 		$warn_cb->(@_);
 	};
 	my $err;
 	do {
+		# I wish "UID FETCH $START:*" could work, but:
+		# 1) servers do not need to return results in any order
+		# 2) Mail::IMAPClient doesn't offer a streaming API
 		$uids = $mic->search("UID $l_uid:*") or
 			return "E: $url UID SEARCH $l_uid:* error: $!";
 		return if scalar(@$uids) == 0;
 		# RFC 3501 doesn't seem to indicate order of UID SEARCH
-		# responses, so sort it ourselves
+		# responses, so sort it ourselves. Order matters so
+		# IMAPTracker can store the newest UID.
 		@$uids = sort { $a <=> $b } @$uids;
 		# Did we actually get new messages?
@@ -416,17 +431,23 @@ sub imap_fetch_all ($$$) {
 		$l_uid = $uids->[-1] + 1; # for next search
 		my $last_uid;
-		while (defined(($uid = shift(@$uids)))) {
-			local $0 = "UID:$uid $mbx $sec";
-			my $r = $mic->fetch_hash($uid, $req);
+		while (scalar @$uids) {
+			my @batch = splice(@$uids, 0, $bs);
+			$batch = join(',', @batch);
+			local $0 = "UID:$batch $mbx $sec";
+			my $r = $mic->fetch_hash($batch, $req);
 			unless ($r) { # network error?
-				$err = "E: $url UID FETCH $uid error: $!";
+				$err = "E: $url UID FETCH $batch error: $!";
 				last;
 			}
-			# messages get deleted, so holes appear
-			defined(my $raw = delete $r->{$uid}->{$key}) or next;
-			imap_import_msg($self, $url, $uid, \$raw);
-			$last_uid = $uid;
+			for my $uid (@batch) {
+				# messages get deleted, so holes appear
+				my $per_uid = delete $r->{$uid} // next;
+				my $raw = delete($per_uid->{$key}) // next;
+				imap_import_msg($self, $url, $uid, \$raw);
+				$last_uid = $uid;
+				last if $self->{quit};
+			}
 			last if $self->{quit};
 		}
 		_done_for_now($self);
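
For illustration only, the splice()-based batching in the loop above
can be tried in isolation.  This is a sketch under stated assumptions,
not code from the patch: uid_batches() is a hypothetical helper name,
and it clamps the batch size to at least 1, because splice() with a
length of 0 removes nothing and the loop would never advance.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the patch): turn a list of UIDs
# into comma-separated UID sets holding at most $bs UIDs each.
sub uid_batches {
	my ($uids, $bs) = @_;
	# assumption: treat a missing or zero batch size as 1
	$bs = 1 if !defined($bs) || $bs < 1;
	# sort a copy numerically; splice() below is destructive
	my @todo = sort { $a <=> $b } @$uids;
	my @sets;
	while (scalar @todo) {
		# the final batch may hold fewer than $bs UIDs
		my @batch = splice(@todo, 0, $bs);
		push @sets, join(',', @batch);
	}
	\@sets;
}

# show the commands a batch size of 2 would generate for five UIDs
print "UID FETCH $_ BODY.PEEK[]\n" for @{uid_batches([ 9, 3, 4, 8, 7 ], 2)};
```

With a batch size of 2, the five UIDs above are grouped into three
UID FETCH commands instead of five, at the cost of holding up to two
messages in memory at once.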