about summary refs log tree commit homepage
path: root/lib/PublicInbox/IMAP.pm
DateCommit message (Collapse)
2023-12-10imap: replace Mail::Address fallback with AddressPP
Our pure-Perl (PublicInbox::AddressPP) fallback is closer to the preferred Email::Address::XS (EAX) behavior than Mail::Address is for ->name support. EAX tends to be overkill with good spam filtering, and using our own fallback means life is easier for users with neither C/XS build tools nor a pre-built EAX package.
2023-10-11treewide: consolidate "From " line removal
Aside from our prior import bugs (fixed in a0c07cba0e5d8b6a (mda: drop leading "From " lines again, 2016-06-26)), we'll always have to be dealing with mutt piping messages to us and `git format-patch' output. So just share the regexp so we can use it everywhere. In may be desirable to allow importing messages with a leading "From " line for FUSE, even. Additionally, some instances of this regexp needlessly added optional `\r?' (CR) checks ahead of the `\n' (LF) element; but they're pointless anyways since [^\n]* is enough to exclude all non-LF bytes.
2023-09-09imapd: lazy-load IMAPsearchqp for Parse::RecDescent
This enables the t/pop3d.t test to pass when Parse::RecDescent is not available.
2023-05-02daemon: improve handling of Git->async_abort
The $oid arg for Git->cat_async is defined on async_abort using the original request, so use undefined $type to distinguish that case in caller-supplied callbacks. async_abort isn't common, of course, but sometimes git subprocesses can die unexpectedly.
2022-11-26eml: header_raw converts octets to Perl UTF-8
This fixes the display of raw (non-RFC 2047) names and subjects in HTML message views. SMTPUTF8 (RFC 6531) allows raw UTF-8 in headers without RFC 2047 encoding, so let Perl handle it as a character sequence for the rest of our consumers. Thus, the old special case in PublicInbox::Smsg->populate is no longer necessary and gone. The one regression notice so far (and fixed here) is compressed IMAP envelope responses still needs raw bytes since the zlib wrapper is designed for octets, not Perl UTF-8 chars. Thus we reverse utf8::decode with utf8::encode in PublicInbox::IMAP::_esc. ->header_set also forces encoding to bytes, since all existing callers would either be dealing with ->header_raw results or be RFC-2047-encoded anyways. Reindexing is not necessary with this change due to the prior PublicInbox::Smsg->populate special case. Reported-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org> Link: https://public-inbox.org/meta/20221124153715.3nenjpjzj43vqxr2@meerkat.local/
2022-08-20imap: remove some intermediate arrays
We can assign arrays directly without `[]' creating an extra immortal pad allocation.
2022-08-10daemon: rely on $SIG{__WARN__} for error output
warn/carp usage is unavoidable given Perl itself and standard libraries, so just rely on localized $SIG{__WARN__} from 60d262483a4d6ddf (daemon: use per-listener SIG{__WARN__} callbacks, 2022-08-08) for all error reporting. While we're in the area, make some of the error handling more consistent between IMAP/NNTP/POP3.
2022-08-09daemon: use per-listener SIG{__WARN__} callbacks
This allows "-l $ADDRESS?err=/path/to/err.log to isolate normal warn() (and carp()) messages for a particular listen address to track down errors more easily.
2022-08-09imap: prioritize AUTH=ANONYMOUS clients
...by deprioritizing clients using a username + password. As IMAP provides AUTH=ANONYMOUS for designating anonymous access, we'll rely on it as a heuristic for favoring "good" clients. Clients using a username + password seem to (more often than not) be malicious and looking for info which doesn't belong in public inboxes. This copies the technique used by WWW + -httpd to deprioritize expensive mbox.gz downloads.
2022-08-09imap: only give AUTH=ANONYMOUS clients prefetch
Looking at IMAP traffic on public-inbox.org, it seems there is a fair amount of traffic coming from malicious clients assuming the IMAP server is compromised and searching for private information. Since AUTH=ANONYMOUS clients are more likely to be legitimate clients looking for publicly-archived mail, give them priority.
2022-08-04imap: ensure_slices_exist: drop needless map and array
We can reduce ops and temporary objects here by folding the stringification into the `for' loop and push directly into the {mailboxlist} array; relying on autovivification to turn it into a noop for the initial population.
2022-08-04imapd: use nntpd_cache to speed up startup/reload time
ConfigIter was still too slow despite being fair. The addition of ART_MIN in ALL->misc means it can be used as a startup/reload cache for -imapd, too. This results in a ~3x faster startup for -imapd with 50K inboxes.
2022-08-03daemon: reload TLS certs and keys on SIGHUP
This allows new TLS certificates to be loaded for new clients without having to timeout nor drop existing clients with established connections made with the old certs. This should benefit users with admins who expire certificates frequently (as encouraged by Let's Encrypt).
2022-08-03ds: use ->dflush to distinguish from ->zflush
->zflush is already for GzipFilter in PublicInbox::WWW, while we use DEFLATE for NNTP and IMAP. This ought to make the code easier-to-follow.
2022-07-23imap+nntp: share COMPRESS implementation
Their code was nearly identical to begin with, so save some memory in -netd and disk space for all of our tarball/distro users, at least. And I seem to have used multiple inheritance successfully, here, maybe...
2022-07-23ds: share long_step between NNTP and IMAP
It's not actually used by our POP3 code at the moment, but it may be soon to reduce memory usage when loading 50K smsg objects into memory.
2022-07-23ds: move requeue_once
It's the same subroutine everywhere.
2022-07-23ds: move no-op ->zflush to common base class
More deduplication, and POP3 never needed it.
2022-07-23ds: support greeting protocols
We can share some common code between IMAP, NNTP, and POP3 without too much trouble, so cut down our LoC.
2022-07-08imap: STATUS: count messages properly
This only affects the rarely-used STATUS command, our message count was consistely zero due to misusing ->imap_exists. Noticed while implementing POP3 server.
2022-05-16imap: remove unused args_ok sub
Noticed while reviewing pieces for POP3.
2021-11-01treewide: kill problematic "$h->{k} //= do {" assignments
As stated in the previous change, conditional hash assignments which trigger other hash assignments seem problematic, at times. So replace: $h->{k} //= do { $h->{x} = ...; $val }; $h->{k} // do { $h->{x} = ...; $hk->{k} = $val }; "||=" is affected the same way, and some instances of "||=" are replaced with "//=" or "// do {", now.
2021-10-17msgmap: do not cache num_highwater
Caching the value doesn't seem necessary from a performance perspective, and it adds a caveat for read-only users which may lead to bugs in future code.
2021-10-16imapd+nntpd: drop timer-based expiration
It's needlessly complex and O(n), so it doesn't scale well to a high number of clients nor is it easy-to-scale with the data structures available to us in pure Perl. In any case, I see no evidence of either -imapd nor -nntpd experiencing high connection loads on public-facing sites. -httpd has never had its own timer-based expiration, either. Fwiw, public-inbox.org itself has been running a public-facing HTTP/HTTPS server with no userspace idle client expiration for the past 8 years or with no ill effect. Clients can come and go as they wish, and SO_KEEPALIVE takes care of truly broken connections if they're gone for ~2 hours. Internet connections drop all time, so it should be harmless to drop connections w/o warning since both NNTP and IMAP protocols have well-defined semantics for determining if a message was truncated (as does HTTP/1.1+).
2021-09-29ds: drop ::later support
add_uniq_timer seems sufficient, and we'll drop the last user of ::later (IMAP) and switch to unique timers.
2021-09-16imapd: sort LIST response
While RFC 3501 doesn't require LIST responses be sorted, it makes reading protocol dumps easier and we memoize it once per-refresh, so it shouldn't be too expensive even with thousands of folders.
2021-08-25imap+nntp: die loudly if ->mm or ->over disappear
While the WWW front-end can gracefully handle ->mm and ->over disappearing (in most cases), IMAP+NNTP front-ends are completely dependent on these and failed mysteriously when they go missing after startup. These will hopefully make issues like what Konstantin encountered more obvious: Link: https://public-inbox.org/meta/20210824204855.ejspej4z7r2rpu63@nitro.local/
2021-06-24favor git(1) rather than libgit2 for ExtSearch
While both git and libgit2 take around 16 minutes to load 100K alternates there's already a proposed patch to make git faster: <https://lore.kernel.org/git/20210624005806.12079-1-e@80x24.org/> It's also easier to patch and install git locally since the git.git build system defaults to prefix=$HOME and dealing with dynamic linking with libgit2 is more difficult for end users relying on Inline::C. libgit2 remains in use for the non-ALL.git case, but maybe it's not necessary (libgit2 is significantly slower than git in Debian 10 due to SHA-1 collision checking).
2021-02-07imap: avoid unnecessary on-stack delete
None of the Content-Type attributes are long-lived (and unlikely to be memory intensive). While these callsites won't trigger $DB::args segfaults via confess or longmess, it'll make future code audits easier. cf. commit 0795b0906cc81f40 ("ds: guard against stack-not-refcounted quirk of Perl 5")
2021-01-05imap: fix uninitialized var on MSN search miss
It seems only triggered by bots trying to steal information.
2021-01-01update copyrights for 2021
Using "make update-copyrights" after setting GNULIB_PATH in my config.mak
2020-12-28search: remove {mset} option for ->mset method
The ->mset method always returns a Xapian mset nowadays, so naming a parameter {mset} is too confusing. As it does with MiscSearch, setting the {relevance} parameter to -1 now sorts by ascending docid order. -2 is now supported for descending docid order, too, since it may be useful for lei users.
2020-12-17imap: rename parse_query => parse_imap_query
Avoid confusing hackers since this conflicts with a method name provided by (Search::)Xapian::QueryParser.
2020-12-05imap: support isearch and reduce Xapian queries
Since IMAP search (either with Isearch or traditional per-Inbox search) only returns UIDs, we can safely set the limit to the UID slice size(*). With isearch, we can also trust the Xapian result to fit any docid range we specify. Limiting Xapian results to 1000 was making ->ALL docid <=> per-Inbox UID impossible since results could overlap between ranges unpredictably. Finally, we can map the ->ALL docids into per-Inbox UIDs and show them to the client in the UID order of the Inbox, not the docid order of the ->ALL extindex. This also lets us get rid of the "uid:" query parser prefix and use the Xapian::Query API directly to reduce our search prefix footprint. For mbox.gz downloads in WWW, we'll also make a best effort to preserve the order from the Inbox, not the order of extindex; though it's possible large result sets can have non-overlapping windows. (*) by definition, UID slice size is a "safe" value which shouldn't OOM either the server or clients.
2020-10-30tls: epollbit: account for miscellaneous OpenSSL errors
Apparently they happen (triggered by my -imapd instance), so bail out by closing the underlying socket rather than stopping the event loop and daemon process.
2020-09-26imap: avoid raising exception if client disconnects
This ought to save a few cycles if a client disconnects while in the middle of a (UID) FETCH. This avoids: Can't call method "git" on an undefined value at .../PublicInbox/IMAP.pm errors in stderr.
2020-09-19gcf2: wire up read-only daemons and rm -gcf2 script
It seems easiest to have a singleton Gcf2Client client object per daemon worker for all inboxes to use. This reduces overall FD usage from pipes. The `public-inbox-gcf2' command + manpage are gone and a `$^X' one-liner is used, instead. This saves inodes for internal commands and hopefully makes it easier to avoid mismatched PERL5LIB include paths (as noticed during development :x). We'll also make the existing cat-file process management infrastructure more resilient to BOFHs on process killing sprees (or in case our libgit2-based code fails on us). (Rare) PublicInbox::WWW PSGI users NOT using public-inbox-httpd won't automatically benefit from this change, and extra configuration will be required (to be documented later).
2020-09-15imap: quiet uninitialized variable warning on FETCH
This was triggered by blindly trying to FETCH an MSN (not "UID FETCH") on an empty dummy inbox. It's harmless, and probably triggered by a wayward client or misbehaving bot.
2020-09-03imap: drop old, pre-Parse::RecDescent search parser
We switched to Parse::RecDescent during development and left some dead code behind.
2020-09-03search: replace ->query with ->mset
Nearly all of the search uses in the production code rely on a Xapian mset iterator being returned (instead of an array of $smsg objects). So default to returning the mset and move the burden of smsg array conversion into the test cases.
2020-08-20search: add mset_to_artnums method
We can avoid importing mdocid() in several places by using this method, simplifying callers.
2020-08-20search: export mdocid subroutine
No need to have awkward globrefs for this.
2020-07-26imap: introduce and use Git->async_prefetch
We can keep the git process more active by sending another request to it while fetch_run_ops() is running. This parallelization speeds up mutt's initial FETCH for headers by around ~35%(!).
2020-07-10imap: avoid warnings on non-slice mailboxes
Non-slice mailboxes never have messages themselves, so we must not assume a message exists when sending untagged EXISTS messages.
2020-07-06daemon: warn on missing blobs
Since -edit and -purge should be rare and TOCTOU around them rarer still; missing {blobs} could be indicative of a real bug elsewhere. Warn on them. And I somehow ended up with 3 different field names for Inbox objects. Perhaps they'll be made consistent in the future.
2020-06-28ds: remove fields.pm usage
Since the removal of pseudo-hash support in Perl 5.10, the "fields" module no longer provides the space or speed benefits it did in 5.8. It also does not allow for compile-time checks, only run-time checks. To me, the extra developer overhead in maintaining "use fields" args has become a hassle. None of our non-DS-related code uses fields.pm, nor do any of our current dependencies. In fact, Danga::Socket (which DS was originally forked from) and its subclasses are the only fields.pm users I've ever encountered in the wild. Removing fields may make our code more approachable to other Perl hackers. So stop using fields.pm and locked hashes, but continue to document what fields do for non-trivial classes.
2020-06-27imap: EXAMINE: avoid potential race conditions
We need to rely on num_highwater for UIDNEXT since the highest `num' stored in over.sqlite3 may be rolled back if the most recent messages were spam. We also need to load the uo2m immediately on EXAMINE to ensure EXISTS responses are always consistent with regard to future updates.
2020-06-27imap: always send EXISTS on uo2m_extend
Clients which are NOT in an IDLE state still need to be notified of message existence. Unlike the EXPUNGE response, untagged EXISTS responses seem to be allowed at any time according to RFC 3501. We'll also perform uo2m_extend on the NOOP command, since NOOP is the recommended command for message polling.
2020-06-25git_async_cat: remove circular reference
While this circular reference was carefully managed to not leak memory; it was still triggering a warning at -imapd/-nntpd shutdown due to the EPOLL_CTL_DEL op failing after the $Epoll FD gets closed. So remove the circular reference by providing a ref to `undef', instead.
2020-06-23imap: refill_xap: remove needless loop
There's no need to loop when the first iteration guarantees a `return'.