Date | Commit message (Collapse) |
|
We should not waste memory for IDLE unless it's used on the most
recent inbox slice. We also need to keep the IDLE connection
alive regardless of $PublicInbox::DS::EXPTIME.
|
|
We can speed up this common mutt request by another 2-3x by not
loading the entire smsg from SQLite, just the UID.
|
|
We can avoid loading the entire message from git when mutt makes
a "UID FETCH" request for "(UID FLAGS)". This speeds mutt up by
more than an order-of-magnitude in informal measurements.
|
|
This is just a hair faster and cacheable in the future, if we
need it. Most notably, this avoids doing PublicInbox::Eml->new
for simple "RFC822", "BODY[]", and "RFC822.SIZE" requests.
|
|
Dummy messages make for bad user experience with MUAs which
still use sequence numbers. Not being able to fetch a message
doesn't seem fatal in mutt, so just ignore (sometimes large)
gaps.
|
|
We'll need to support searching UID ranges for IMAP,
so make sure it's indexed, too.
|
|
Searching for messages smaller than a certain size is allowed by
offlineimap(1), mbsync(1), and possibly other tools. Maybe
public-inbox-watch will support it, too.
I don't see a reason to expose searching by size via WWW search
right now (but maybe in the future, I could be convinced to).
Note: we only store the byte-size of the message in git,
this is typically LF-only and we won't have the correct
size after CRLF conversion for NNTP or IMAP.
|
|
This speeds up xt/imapd-validate.t by around 10% when used with
an abandoned patch to remove ->query_xover. We may also depend
on this further if we abandon storing doc_data in Xapian to save
disk space.
|
|
Since it seems somewhat common for IMAP clients to limit
searches by sent Date: or INTERNALDATE, we can rely on
the NNTP/WWW-optimized overview DB.
For other queries, we'll have to depend on the Xapian DB.
|
|
We won't support searching across mailboxes, just yet;
but maybe in the future.
|
|
None of the new cases are wired up, yet, but existing cases
still work.
|
|
No point in spewing "uninitialized" warnings into logs when
the cat jumps on the Enter key.
|
|
We can share code between them and account for each 50K
mailbox slice. However, we must overreport these for
non-zero slices and just return lots of empty data for
high-numbered slices because some MUAs still insist
on non-UID fetches.
|
|
Some clients insist on sending "INBOX" in all caps,
since it's special in RFC 3501.
|
|
This will make it easier to show parameters used for testing
and potential tweaks to be made.
|
|
We'll use the xqx() to avoid losing too much performance
compared to normal `backtick` (qx) when testing using
"make check-run" + Inline::C.
|
|
Having two large numbers separated by a dash can make visual
comparisons difficult when numbers are in the 3,000,000 range
for LKML. So avoid the $UID_END value, since it can be
calculated from $UID_MIN. And we can avoid large values of
$UID_MIN, too, by instead storing the block index and just
multiplying it by 50000 (and adding 1) on the server side.
Of course, LKML still goes up to 72, at the moment.
|
|
I'm not sure this matters, and it could be a waste of
CPU cycles if no real clients care. However, it does
make debugging over telnet or s_client a bit easier.
|
|
Finish up the IMAP-only portion of iterative config reloading,
which allows us to create all sub-ranges of an inbox up front.
The InboxIdler still uses ->each_inbox which will struggle with
100K inboxes.
Having messages in the top-level newsgroup name of an inbox will
still waste bandwidth for clients which want to do full syncs
once there's a rollover to a new 50K range. So instead, make
every inbox accessible exclusively via 50K slices in the form of
"$NEWSGROUP.$UID_MIN-$UID_END".
This introduces the DummyInbox, which makes $NEWSGROUP
and every parent component a selectable, empty inbox.
This aids navigation with mutt and possibly other MUAs.
Finally, the xt/perf-imap-list maintainer test is broken, now,
so remove it. The grep perlfunc is already proven effective,
and we'll have separate tests for mocking out ~100k inboxes.
|
|
This will be used to prevent reloading a giant config with
tens/hundreds of thousands of inboxes from blocking the event
loop.
|
|
This limit on mailbox size should keep users of tools like
mbsync (isync) and offlineimap happy, since typical filesystems
struggle with giant Maildirs.
I chose 50K since it's a bit more than what LKML typically sees
in a month and still manages to give acceptable performance on
my ancient Centrino laptop.
There were also no responses to my original proposal at:
<https://public-inbox.org/meta/20200519090000.GA24273@dcvr/>
so no objections, either :>
|
|
IMAP RFC 3501 stipulates case-insensitive comparisons, and so
does RFC 977 (NNTP). However, INN (nnrpd) uses case-sensitive
comparisons, so we've always used case-sensitive comparisons for
NNTP to match nnrpd behavior.
Unfortunately, some IMAP clients insist on sending "INBOX" with
caps, which causes problems for us. Since NNTP group names are
typically all lowercase anyways, just force all comparisons to
lowercase for IMAP and warn admins if uppercase-containing
newsgroups won't be accessible over IMAP.
This ensures our existing -nntpd behavior remains unchanged
while being compatible with the expectations of real-world IMAP
clients.
|
|
It's useful to know how fast SIGHUP can be handled, too.
|
|
"$UID_START:*" needs to return at least one message according
to RFC 3501 section 6.4.8.
While we're in the area, coerce ranges to (unsigned) integers by
adding zero ("+ 0") to reduce memory overhead.
|
|
imapd-validate is a beefed up version of our nntpd-validate test
which hammers the server with parallel connections over regular
IMAP, IMAPS, IMAP+STARTTLS; and COMPRESS=DEFLATE variants of
each of those. It uses $START_UID:$END_UID fetch ranges to
reduce requests and slurp many responses at once to saturate
"git cat-file --batch" processes.
mbsync(1) also uses pipelining extensively (but IMHO
unnecessarily), so it was able to shake out some bugs in
the async git code.
Finally, we remove xt/cmp-imapd-compress.t since it's
redundant now that we have PublicInbox::IMAPClient to work
around bugs in Mail::IMAPClient.
|
|
We'll be using this wrapper class to workaround some upstream
bugs in Mail::IMAPClient. There may also be experiments with
new APIs for more performance.
|
|
This matches the behavior of the existing synchronous ->cat_file
method. In fact, ->cat_file now becomes a small wrapper around
the ->cat_async method.
|
|
Trying to avoid a circular reference by relying on $ibx object
here makes no sense, since skipping GitCatAsync::close will
result in an FD leak, anyways. So keep GitAsyncCat contained to
git-only operations, since we'll be using it for Solver in the
distant feature.
|
|
This will make it easier to implement the retries on
alternates_changed() of the synchronous ->cat_file API.
|
|
Since IMAP yields control to GitAsyncCat, IMAP->event_step may
be invoked with {long_cb} still active. We must be sure to
bail out of IMAP->event_step if that happens and continue to let
GitAsyncCat drive IMAP.
This also improves fairness by never processing more than one
request per ->event_step.
|
|
It must be a scalar reference, unlike ->write
|
|
None of our tests rely on this failing, so just bail out
if the system is out of resources.
|
|
Include a test for Mail::IMAPTalk, here, since Mail::IMAPClient
stalls with compression enabled:
https://rt.cpan.org/Ticket/Display.html?id=132720
|
|
The RFC 3501 `sequence-set' definition allows comma-delimited
ranges, so we'll support it in case clients send them.
Coalescing overlapping ranges isn't required, so we won't
support it as such an attempt to save bandwidth would waste
memory on the server, instead.
|
|
Since we only support read-only operation, we can't save
subscriptions requested by clients. So just list no inboxes as
subscribed, some MUAs may blindly try to fetch everything its
subscribed to.
|
|
We do this for the C10K-oriented HTTP/NNTP/IMAP processes, and
we may support thousands of git-cat-file processes in the
future.
|
|
This ought to improve overall performance with multiple clients.
Single client performance suffers a tiny bit due to extra
syscall overhead from epoll.
This also makes the existing async interface easier-to-use,
since calling cat_async_begin is no longer required.
|
|
To work with our event loop, we must perform read buffering
ourselves or risk starvation, as there doesn't appear to be
a way to check the amount of data buffered in userspace by
by the PerlIO layers without resorting to C or XS.
This lets us perform fewer syscalls at the expense of more Perl
ops. As it stands, there seems to be a tiny performance
improvement, but more will be possible in the future.
|
|
Small array refs have considerable overhead in Perl, so reduce
AV/SV overhead and instead allow the inflight array to grow
twice as large.
|
|
While we can't memoize the regexp forever like we do with other
Eml users, we can still benefit from caching regexp compilation
on a per-request basis.
A FETCH request from mutt on a 4K message inbox is around 8%
faster after this. Since regexp compilation via qr// isn't
unbearably slow, a shared cache probably isn't worth the
trouble of implementing. A per-request cache seems enough.
|
|
It seems worthless to support CLOSE for read-only inboxes, but
mutt sends it, so don't return a BAD error with proper use.
|
|
They're not specified in RFC 3501 for responses, and at least
mutt fails to handle it.
|
|
We'll return dummy messages for now when sequence numbers go
missing, in case clients can't handle missing messages.
|
|
While the contents of normal %want hash keys are bounded in
size, %partial can cause more overhead and lead to repeated sort
calls on multi-message fetches. So sort it once and use
arrayrefs to make the data structure more compact.
|
|
We must keep the contents of {-partial} around when handling
a request to fetch multiple messages.
|
|
This makes the test code easier-to-manage and allows us to run
faster unit tests which don't involve loading Mail::IMAPClient.
|
|
Mail::IMAPClient doesn't seem to mind the lack of `resp-text';
but it's required by RFC 3501. Preliminary tests with
offlineimap(1) indicates the presence of `resp-text' is
necessary, even if it's just the freeform `text'.
And make the `text' more consistent, favoring "done" over
"complete" or "completed"; while we're at it.
|
|
IMAP supports a high level of granularity when it comes to
fetching, but fortunately Perl makes it fairly easy to support.
|
|
Instead of counts starting at 0, we start the single-part
message at 1 like we do with subparts of a multipart message.
This will make it easier to map offsets for "BODY[$SECTION]"
when using IMAP FETCH, since $SECTION must contain non-zero
numbers according to RFC 3501.
This doesn't make any difference for WWW URLs, since single part
messages cannot have downloadable attachments.
|
|
I'm not sure which clients use these, but it could be useful
down the line.
|