about summary refs log tree commit homepage
path: root/t
DateCommit message (Collapse)
2020-06-13imap: LIST shows "INBOX" in all caps
While selecting a mailbox is done case-insensitively, "INBOX" is special for the LIST command, according to RFC 3501 6.3.8: > The special name INBOX is included in the output from LIST, if > INBOX is supported by this server for this user and if the > uppercase string "INBOX" matches the interpreted reference and > mailbox name arguments with wildcards as described above. The > criteria for omitting INBOX is whether SELECT INBOX will > return failure; it is not relevant whether the user's real > INBOX resides on this or some other server. Thus, the existing news.public-inbox.org convention of naming newsgroups starting with "inbox." needs to be special-cased to not confuse clients. While we're at it, do not create ".0" for dummy newsgroups if they're selected, either.
2020-06-13index: account for CRLF conversion when storing bytes
NNTP and IMAP both require CRLF conversions on the wire. They're also the only components which care about $smsg->{bytes}, so store the CRLF-adjusted value in over.sqlite3 and Xapian DBs.. This will allow us to optimize RFC822.SIZE fetch item in IMAP without triggering size mismatch errors in some clients' default configurations (e.g. Mail::IMAPClient), but not most others. It could also fix hypothetical problems with NNTP clients that report discrepancies between overview and article data.
2020-06-13imap: compile UID FETCH to opcodes
This is just a hair faster and cacheable in the future, if we need it. Most notably, this avoids doing PublicInbox::Eml->new for simple "RFC822", "BODY[]", and "RFC822.SIZE" requests.
2020-06-13imap: remove dummies from sequence number FETCH
Dummy messages make for bad user experience with MUAs which still use sequence numbers. Not being able to fetch a message doesn't seem fatal in mutt, so just ignore (sometimes large) gaps.
2020-06-13search: index UID for IMAP search, too
We'll need to support searching UID ranges for IMAP, so make sure it's indexed, too.
2020-06-13search: index byte size of a message for IMAP search
Searching for messages smaller than a certain size is allowed by offlineimap(1), mbsync(1), and possibly other tools. Maybe public-inbox-watch will support it, too. I don't see a reason to expose searching by size via WWW search right now (but maybe in the future, I could be convinced to). Note: we only store the byte-size of the message in git, this is typically LF-only and we won't have the correct size after CRLF conversion for NNTP or IMAP.
2020-06-13imap: allow UID range search on timestamps
Since it seems somewhat common for IMAP clients to limit searches by sent Date: or INTERNALDATE, we can rely on the NNTP/WWW-optimized overview DB. For other queries, we'll have to depend on the Xapian DB.
2020-06-13imap: start parsing out queries for SQLite and Xapian
None of the new cases are wired up, yet, but existing cases still work.
2020-06-13imap: avoid uninitialized warnings on incomplete commands
No point in spewing "uninitialized" warnings into logs when the cat jumps on the Enter key.
2020-06-13t/config.t: always compare against git bool behavior
We'll use the xqx() to avoid losing too much performance compared to normal `backtick` (qx) when testing using "make check-run" + Inline::C.
2020-06-13imap: omit $UID_END from mailbox name, use index
Having two large numbers separated by a dash can make visual comparisons difficult when numbers are in the 3,000,000 range for LKML. So avoid the $UID_END value, since it can be calculated from $UID_MIN. And we can avoid large values of $UID_MIN, too, by instead storing the block index and just multiplying it by 50000 (and adding 1) on the server side. Of course, LKML still goes up to 72, at the moment.
2020-06-13imapd: ensure LIST is sorted alphabetically, for now
I'm not sure this matters, and it could be a waste of CPU cycles if no real clients care. However, it does make debugging over telnet or s_client a bit easier.
2020-06-13imap: require ".$UID_MIN-$UID_END" suffix
Finish up the IMAP-only portion of iterative config reloading, which allows us to create all sub-ranges of an inbox up front. The InboxIdler still uses ->each_inbox which will struggle with 100K inboxes. Having messages in the top-level newsgroup name of an inbox will still waste bandwidth for clients which want to do full syncs once there's a rollover to a new 50K range. So instead, make every inbox accessible exclusively via 50K slices in the form of "$NEWSGROUP.$UID_MIN-$UID_END". This introduces the DummyInbox, which makes $NEWSGROUP and every parent component a selectable, empty inbox. This aids navigation with mutt and possibly other MUAs. Finally, the xt/perf-imap-list maintainer test is broken, now, so remove it. The grep perlfunc is already proven effective, and we'll have separate tests for mocking out ~100k inboxes.
2020-06-13imap: case-insensitive mailbox name comparisons
IMAP RFC 3501 stipulates case-insensitive comparisons, and so does RFC 977 (NNTP). However, INN (nnrpd) uses case-sensitive comparisons, so we've always used case-sensitive comparisons for NNTP to match nnrpd behavior. Unfortunately, some IMAP clients insist on sending "INBOX" with caps, which causes problems for us. Since NNTP group names are typically all lowercase anyways, just force all comparisons to lowercase for IMAP and warn admins if uppercase-containing newsgroups won't be accessible over IMAP. This ensures our existing -nntpd behavior remains unchanged while being compatible with the expectations of real-world IMAP clients.
2020-06-13imap: support out-of-bounds ranges
"$UID_START:*" needs to return at least one message according to RFC 3501 section 6.4.8. While we're in the area, coerce ranges to (unsigned) integers by adding zero ("+ 0") to reduce memory overhead.
2020-06-13imapclient: wrapper for Mail::IMAPClient
We'll be using this wrapper class to workaround some upstream bugs in Mail::IMAPClient. There may also be experiments with new APIs for more performance.
2020-06-13git: async: automatic retry on alternates change
This matches the behavior of the existing synchronous ->cat_file method. In fact, ->cat_file now becomes a small wrapper around the ->cat_async method.
2020-06-13git: cat_async: provide requested OID + "missing" on missing blobs
This will make it easier to implement the retries on alternates_changed() of the synchronous ->cat_file API.
2020-06-13imap: FETCH: support comma-delimited ranges
The RFC 3501 `sequence-set' definition allows comma-delimited ranges, so we'll support it in case clients send them. Coalescing overlapping ranges isn't required, so we won't support it as such an attempt to save bandwidth would waste memory on the server, instead.
2020-06-13imap: use git-cat-file asynchronously
This ought to improve overall performance with multiple clients. Single client performance suffers a tiny bit due to extra syscall overhead from epoll. This also makes the existing async interface easier-to-use, since calling cat_async_begin is no longer required.
2020-06-13imap: speed up HEADER.FIELDS[.NOT] range fetches
While we can't memoize the regexp forever like we do with other Eml users, we can still benefit from caching regexp compilation on a per-request basis. A FETCH request from mutt on a 4K message inbox is around 8% faster after this. Since regexp compilation via qr// isn't unbearably slow, a shared cache probably isn't worth the trouble of implementing. A per-request cache seems enough.
2020-06-13imap: support the CLOSE command
It seems worthless to support CLOSE for read-only inboxes, but mutt sends it, so don't return a BAD error with proper use.
2020-06-13imap: do not include ".PEEK" in responses
They're not specified in RFC 3501 for responses, and at least mutt fails to handle it.
2020-06-13imap: support sequence number FETCH
We'll return dummy messages for now when sequence numbers go missing, in case clients can't handle missing messages.
2020-06-13imap: fix multi-message partial header fetches
We must keep the contents of {-partial} around when handling a request to fetch multiple messages.
2020-06-13imap: split out unit tests and benchmarks
This makes the test code easier-to-manage and allows us to run faster unit tests which don't involve loading Mail::IMAPClient.
2020-06-13imap: allow fetch of partial of BODY[...] and headers
IMAP supports a high level of granularity when it comes to fetching, but fortunately Perl makes it fairly easy to support.
2020-06-13eml: each_part: single part $idx is 1
Instead of counts starting at 0, we start the single-part message at 1 like we do with subparts of a multipart message. This will make it easier to map offsets for "BODY[$SECTION]" when using IMAP FETCH, since $SECTION must contain non-zero numbers according to RFC 3501. This doesn't make any difference for WWW URLs, since single part messages cannot have downloadable attachments.
2020-06-13imap: support fetch for BODYSTRUCTURE and BODY
I'm not sure which clients use these, but it could be useful down the line.
2020-06-13t/imapd: support FakeInotify and KQNotify
We can fill in some missing pieces from the emulation APIs to enable IMAP IDLE tests on non-Linux platforms.
2020-06-13imap: support LIST command
We'll optimize for the common case of: $TAG LIST "" * and rely on the grep perlfunc to handle trickier cases.
2020-06-13imap: implement STATUS command
I'm not sure if there's much use for this command, but it's part of RFC3501 and works read-only.
2020-06-13imap: delay InboxIdle start, support refresh
InboxIdle should not be holding onto Inbox objects after the Config object they came from expires, and Config objects may expire on SIGHUP. Old Inbox objects still persist due to IMAP clients holding onto them, but that's a concern we'll deal with at another time, or not at all, since all clients expire, eventually. Regardless, stale inotify watch descriptors should not be left hanging after SIGHUP refreshes.
2020-06-13imap: support IDLE
It seems to be working as far as Mail::IMAPClient is concerned.
2020-06-13inboxidle: new class to detect inbox changes
This will be used to implement IMAP IDLE, first. Eventually, it may be used to trigger other things: * incremental internal updates for manifest.js.gz * restart `git cat-file' processes on pack index unlink * IMAP IDLE-like long-polling HTTP endpoint And maybe more things we haven't thought of, yet. It uses Linux::Inotify2 or IO::KQueue depending on what packages are installed and what the kernel supports. It falls back to nanosecond-aware Time::HiRes::stat() (available with Perl 5.10.0+) on systems lacking Linux::Inotify2 and IO::KQueue. In the future, a pure Perl alternative to Linux::Inotify2 may be supplied for users of architectures we already support signalfd and epoll on. v2 changes: - avoid O_TRUNC on lock file - change ctime on Linux systems w/o inotify - fix naming of comments and fields
2020-06-13preliminary imap server implementation
It shares a bit of code with NNTP. It's copy+pasted for now since this provides new ground to experiment with APIs for dealing with slow storage and many inboxes.
2020-06-08index: v2: parallel by default
InboxWritable should only set $v2w->{parallel} if the $parallel flag is defined to 0 or 1. We want indexing a new inbox to utilize SMP, just like --reindex. -index once again allows -j0/--jobs=0 to force single-process use, and we'll be ensuring that works in tests to maintain performance on small systems. Fixes: 61a2fff5b34a3e32 ("admin: move index_inbox over")
2020-06-03smsg: remove remaining accessor methods
We'll continue to favor simpler data models that can be used directly rather than wasting time and memory with accessor APIs. The ->from, ->to, -cc, ->mid, ->subject, >references methods can all be trivially replaced by hash lookups since all their values are stored in doc_data. Most remaining callers of those methods were test cases, anyways. ->from_name is only used in the PSGI code, so we can just use ->psgi_cull to take care of populating the {from_name} field.
2020-06-03www: remove smsg_mime API and adjust callers
To further simplify callers and avoid embarrasing memory explosions[1], we can finally eliminate this method in favor of smsg_eml. [1] commit 7d02b9e64455831d3bda20cd2e64e0c15dc07df5 ("view: stop storing all MIME objects on large threads") fixed a huge memory blowup.
2020-06-03smsg: introduce ->populate method
This will eventually replace the __hdr() calling methods and eradicate {mime} usage from Smsg. For now, we can eliminate PublicInbox::Smsg->new since most callers already rely on an open `bless' to avoid the old {mime} arg.
2020-05-29treat $INBOX_DIR/description and gitweb.owner as UTF-8
gitweb does the same with $GIT_DIR/description and gitweb.owner. Allowing UTF-8 description should not cause problems when used in responses for to the NNTP "LIST NEWSGROUPS" request, either, since RFC 3977 section 7.6.6 recommends the description be UTF-8 (but does not require it). Link: https://public-inbox.org/meta/20200528151216.l7vmnmrs4ojw372g@sourcephile.fr/
2020-05-27learn: fix buggy typo on List-ID mapping
There is obviously a typo here, so fix it and add a test case to guard against future regressions. Fixes: 74a3206babe0572a ("mda: support multiple List-ID matches")
2020-05-26view: do not offer links to 0-byte multipart attachments
Offering links to download 0-byte files is useless. We could waste memory by preserving $eml->{bdy} during iteration, but offering attachments of type "multipart" is not very useful, as users are usually interested in decoded attachments or the entire raw message. Fixes: e60231148eb604a3 ("descend into message/(rfc822|news|global) parts")
2020-05-24t/eml.t: favor ->header over ->header_str
This test may still run against ancient versions of Email::MIME for comparisons.
2020-05-20t/edit: use eml_load here, too
I missed this instance of file slurping into an Email::MIME-like object the other week when tearing Email::MIME usage out.
2020-05-17confine Email::MIME use even further
To avoid confusing future readers and users, recommend PublicInbox::Eml in our Import POD and refer to PublicInbox::Eml comments at the top of PublicInbox::MIME. mime_load() confined to t/eml.t, since we won't be using it anywhere else in our tests.
2020-05-17descend into message/(rfc822|news|global) parts
Email::MIME never supported this properly, but there's real instances of forwarded messages as message/rfc822 attachments. message/news is legacy thing which we'll see in archives, and message/global appears to be the new thing. gmime also supports message/rfc2822, so we'll support it anyways despite lacking other evidence of its existence. Existing attachments remain downloadable as a whole message, but individual attachments of subparts are now downloadable and can be displayed in HTML, too. Furthermore, ensure Xapian can now search for common headers inside those messages as well as the message bodies.
2020-05-17t/psgi_attach: assert message/* parts are downloadable
We'll be adding support to descend into message/rfc822 (and legacy message/news) attachments. First, we must ensure existing message/rfc822 attachments can be downloaded and remain downloadable in future commits.
2020-05-12rename "ContentId" to "ContentHash"
The old name may be confused with "Content-ID" as described in RFC 2392, so use an alternate name to avoid confusing future readers.
2020-05-10emlcontentfoo: drop the {discrete} and {composite} fields
We don't have to worry about compatibility with old installations of Email::MIME::ContentType any longer, so save some space.