public-inbox.git - an "archives first" approach to mailing lists

Date	Commit message
2020-06-13	t/config.t: always compare against git bool behavior
	We'll use the xqx() to avoid losing too much performance compared to normal `backtick` (qx) when testing using "make check-run" + Inline::C.
2020-06-13	imap: omit $UID_END from mailbox name, use index
	Having two large numbers separated by a dash can make visual comparisons difficult when numbers are in the 3,000,000 range for LKML. So avoid the $UID_END value, since it can be calculated from $UID_MIN. And we can avoid large values of $UID_MIN, too, by instead storing the block index and just multiplying it by 50000 (and adding 1) on the server side. Of course, LKML still goes up to 72, at the moment.
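	The slice arithmetic described above can be sketched as follows. This is a minimal illustration, assuming a zero-based block index and the 50000-message slice width from the commit message; the function names are hypothetical and do not come from the public-inbox code.

```python
SLICE_SIZE = 50000  # slice width named in the commit message


def slice_bounds(index):
    """Return (uid_min, uid_end) for a zero-based block index."""
    uid_min = index * SLICE_SIZE + 1      # multiply by 50000, then add 1
    uid_end = uid_min + SLICE_SIZE - 1    # derivable, so it need not be stored
    return uid_min, uid_end


def slice_index(uid):
    """Return the zero-based block index of the slice holding this UID."""
    return (uid - 1) // SLICE_SIZE


def mailbox_name(newsgroup, index):
    """Build a "$NEWSGROUP.$INDEX" style name (hypothetical helper)."""
    return "%s.%d" % (newsgroup, index)
```

	With these assumptions, block index 72 covers UIDs 3600001 through 3650000.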
2020-06-13	imapd: ensure LIST is sorted alphabetically, for now
	I'm not sure this matters, and it could be a waste of CPU cycles if no real clients care. However, it does make debugging over telnet or s_client a bit easier.
2020-06-13	imap: require ".$UID_MIN-$UID_END" suffix
	Finish up the IMAP-only portion of iterative config reloading, which allows us to create all sub-ranges of an inbox up front. The InboxIdler still uses ->each_inbox which will struggle with 100K inboxes. Having messages in the top-level newsgroup name of an inbox will still waste bandwidth for clients which want to do full syncs once there's a rollover to a new 50K range. So instead, make every inbox accessible exclusively via 50K slices in the form of "$NEWSGROUP.$UID_MIN-$UID_END". This introduces the DummyInbox, which makes $NEWSGROUP and every parent component a selectable, empty inbox. This aids navigation with mutt and possibly other MUAs. Finally, the xt/perf-imap-list maintainer test is broken, now, so remove it. The grep perlfunc is already proven effective, and we'll have separate tests for mocking out ~100k inboxes.
2020-06-13	imap: case-insensitive mailbox name comparisons
	IMAP RFC 3501 stipulates case-insensitive comparisons, and so does RFC 977 (NNTP). However, INN (nnrpd) uses case-sensitive comparisons, so we've always used case-sensitive comparisons for NNTP to match nnrpd behavior. Unfortunately, some IMAP clients insist on sending "INBOX" with caps, which causes problems for us. Since NNTP group names are typically all lowercase anyways, just force all comparisons to lowercase for IMAP and warn admins if uppercase-containing newsgroups won't be accessible over IMAP. This ensures our existing -nntpd behavior remains unchanged while being compatible with the expectations of real-world IMAP clients.
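	A minimal sketch of the lowercase-only lookup idea, assuming newsgroup names are plain strings. The helper names are hypothetical, and returning skipped names in a list stands in for the admin warning.

```python
def build_imap_lookup(newsgroups):
    """Index newsgroups for IMAP; return (lookup, skipped).

    Names containing uppercase characters are left out of the lookup
    and reported in `skipped`, so the caller can warn the admin.
    """
    lookup, skipped = {}, []
    for name in newsgroups:
        if name != name.lower():
            skipped.append(name)
            continue
        lookup[name] = name
    return lookup, skipped


def find_mailbox(lookup, requested):
    """Lowercase the client-supplied name before the lookup."""
    return lookup.get(requested.lower())
```

	A client that sends "INBOX.COMP.TEST" then resolves to the same entry as one that sends "inbox.comp.test".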
2020-06-13	imap: support out-of-bounds ranges
	"$UID_START:*" needs to return at least one message according to RFC 3501 section 6.4.8. While we're in the area, coerce ranges to (unsigned) integers by adding zero ("+ 0") to reduce memory overhead.
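	The out-of-bounds rule can be illustrated with a small sketch. It assumes that "*" stands for the highest UID in the mailbox and that the two ends of a range may appear in either order, as RFC 3501 allows; the function names are hypothetical.

```python
def resolve_uid_range(spec, uid_max):
    """Resolve "A", "A:B" or "A:*" to an inclusive (low, high) pair."""
    left, _, right = spec.partition(":")
    if not right:
        right = left  # a single number is a one-element range

    def to_int(token):
        # "*" means the largest UID in use; everything else becomes an int
        return uid_max if token == "*" else int(token)

    a, b = to_int(left), to_int(right)
    return (a, b) if a <= b else (b, a)


def select_uids(uids, spec):
    """Return the UIDs from `uids` that fall inside the range."""
    if not uids:
        return []
    low, high = resolve_uid_range(spec, max(uids))
    return [u for u in uids if low <= u <= high]
```

	Because the ends are swapped when the start exceeds the highest UID, a request such as "10:*" on a three-message mailbox still yields the last message.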
2020-06-13	imapclient: wrapper for Mail::IMAPClient
	We'll be using this wrapper class to work around some upstream bugs in Mail::IMAPClient. There may also be experiments with new APIs for more performance.
2020-06-13	git: async: automatic retry on alternates change
	This matches the behavior of the existing synchronous ->cat_file method. In fact, ->cat_file now becomes a small wrapper around the ->cat_async method.
2020-06-13	git: cat_async: provide requested OID + "missing" on missing blobs
	This will make it easier to implement the retries on alternates_changed() of the synchronous ->cat_file API.
2020-06-13	imap: FETCH: support comma-delimited ranges
	The RFC 3501 `sequence-set' definition allows comma-delimited ranges, so we'll support it in case clients send them. Coalescing overlapping ranges isn't required, so we won't support it: such an attempt to save bandwidth would waste memory on the server instead.
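	A minimal sketch of parsing a comma-delimited sequence set without coalescing, assuming "*" maps to the highest number in the mailbox. The function name is hypothetical.

```python
def parse_sequence_set(seq_set, max_num):
    """Split "1:3,5,7:*" into inclusive (low, high) pairs.

    Ranges are returned in the order the client sent them; overlapping
    ranges are deliberately not merged.
    """
    ranges = []
    for item in seq_set.split(","):
        left, _, right = item.partition(":")
        if not right:
            right = left  # a lone number is a one-element range
        a = max_num if left == "*" else int(left)
        b = max_num if right == "*" else int(right)
        ranges.append((a, b) if a <= b else (b, a))
    return ranges
```

	Each pair can be handled one after another, so the server never has to hold a merged set in memory.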
2020-06-13	imap: use git-cat-file asynchronously
	This ought to improve overall performance with multiple clients. Single client performance suffers a tiny bit due to extra syscall overhead from epoll. This also makes the existing async interface easier-to-use, since calling cat_async_begin is no longer required.
2020-06-13	imap: speed up HEADER.FIELDS[.NOT] range fetches
	While we can't memoize the regexp forever like we do with other Eml users, we can still benefit from caching regexp compilation on a per-request basis. A FETCH request from mutt on a 4K message inbox is around 8% faster after this. Since regexp compilation via qr// isn't unbearably slow, a shared cache probably isn't worth the trouble of implementing. A per-request cache seems enough.
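	The per-request cache idea can be sketched like this. It is an illustration only: the function name and the regular expression are assumptions, and Python's re module already keeps its own internal compile cache, so the explicit dict is there only to show the lifetime of the cache.

```python
import re


def fetch_header_fields(header_blocks, field_names):
    """Return only the requested header lines from each raw header block."""
    cache = {}  # created per call, so it lives for a single request
    key = tuple(name.lower() for name in field_names)
    results = []
    for hdr in header_blocks:
        rx = cache.get(key)
        if rx is None:
            alt = "|".join(re.escape(name) for name in field_names)
            # a header line plus any folded continuation lines
            rx = cache[key] = re.compile(
                r"^(?:%s):[^\n]*\n(?:[ \t][^\n]*\n)*" % alt,
                re.I | re.M,
            )
        results.append("".join(rx.findall(hdr)))
    return results
```

	The pattern is compiled once for the first message and reused for every later message in the same request.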
2020-06-13	imap: support the CLOSE command
	It seems worthless to support CLOSE for read-only inboxes, but mutt sends it, so don't return a BAD error with proper use.
2020-06-13	imap: do not include ".PEEK" in responses
	They're not specified in RFC 3501 for responses, and at least mutt fails to handle them.
2020-06-13	imap: support sequence number FETCH
	We'll return dummy messages for now when sequence numbers go missing, in case clients can't handle missing messages.
2020-06-13	imap: fix multi-message partial header fetches
	We must keep the contents of {-partial} around when handling a request to fetch multiple messages.
2020-06-13	imap: split out unit tests and benchmarks
	This makes the test code easier-to-manage and allows us to run faster unit tests which don't involve loading Mail::IMAPClient.
2020-06-13	imap: allow fetch of partial of BODY[...] and headers
	IMAP supports a high level of granularity when it comes to fetching, but fortunately Perl makes it fairly easy to support.
2020-06-13	eml: each_part: single part $idx is 1
	Instead of counts starting at 0, we start the single-part message at 1 like we do with subparts of a multipart message. This will make it easier to map offsets for "BODY[$SECTION]" when using IMAP FETCH, since $SECTION must contain non-zero numbers according to RFC 3501. This doesn't make any difference for WWW URLs, since single part messages cannot have downloadable attachments.
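	The numbering convention can be illustrated with Python's standard email package. This is a sketch of IMAP-style section numbers under simplifying assumptions: it is not the each_part implementation, and it does not descend into message/rfc822 parts.

```python
from email import message_from_string


def each_part(msg, prefix=""):
    """Yield (section, part) pairs with 1-based, dot-separated sections."""
    if msg.get_content_maintype() == "multipart" and msg.is_multipart():
        for i, sub in enumerate(msg.get_payload(), start=1):
            section = "%s.%d" % (prefix, i) if prefix else str(i)
            if sub.get_content_maintype() == "multipart":
                # nested multipart: its children extend the section path
                for pair in each_part(sub, section):
                    yield pair
            else:
                yield section, sub
    else:
        # a single-part message is section "1", never "0"
        yield (prefix or "1"), msg
```

	A single-part message therefore maps directly onto "BODY[1]", with no special case for a zero index.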
2020-06-13	imap: support fetch for BODYSTRUCTURE and BODY
	I'm not sure which clients use these, but it could be useful down the line.
2020-06-13	t/imapd: support FakeInotify and KQNotify
	We can fill in some missing pieces from the emulation APIs to enable IMAP IDLE tests on non-Linux platforms.
2020-06-13	imap: support LIST command
	We'll optimize for the common case of: $TAG LIST "" * and rely on the grep perlfunc to handle trickier cases.
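	A minimal sketch of the fast path plus a filter for trickier patterns, assuming "." as the hierarchy delimiter and ignoring the reference argument. The function name is hypothetical, and a list comprehension stands in for the grep perlfunc.

```python
import re


def list_mailboxes(all_names, pattern, delim="."):
    """Return the mailbox names matching an IMAP LIST pattern."""
    if pattern == "*":
        return list(all_names)  # common case: no matching work at all
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")  # matches anything, delimiter included
        elif ch == "%":
            parts.append("[^%s]*" % re.escape(delim))  # stops at delimiter
        else:
            parts.append(re.escape(ch))
    rx = re.compile("".join(parts))
    return [name for name in all_names if rx.fullmatch(name)]
```

	Only patterns other than a bare "*" pay for the regexp translation and the per-name match.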
2020-06-13	imap: implement STATUS command
	I'm not sure if there's much use for this command, but it's part of RFC3501 and works read-only.
2020-06-13	imap: delay InboxIdle start, support refresh
	InboxIdle should not be holding onto Inbox objects after the Config object they came from expires, and Config objects may expire on SIGHUP. Old Inbox objects still persist due to IMAP clients holding onto them, but that's a concern we'll deal with at another time, or not at all, since all clients expire, eventually. Regardless, stale inotify watch descriptors should not be left hanging after SIGHUP refreshes.
2020-06-13	imap: support IDLE
	It seems to be working as far as Mail::IMAPClient is concerned.
2020-06-13	inboxidle: new class to detect inbox changes
	This will be used to implement IMAP IDLE, first. Eventually, it may be used to trigger other things:
	* incremental internal updates for manifest.js.gz
	* restart `git cat-file' processes on pack index unlink
	* IMAP IDLE-like long-polling HTTP endpoint
	And maybe more things we haven't thought of, yet. It uses Linux::Inotify2 or IO::KQueue depending on what packages are installed and what the kernel supports. It falls back to nanosecond-aware Time::HiRes::stat() (available with Perl 5.10.0+) on systems lacking Linux::Inotify2 and IO::KQueue. In the future, a pure Perl alternative to Linux::Inotify2 may be supplied for users of architectures we already support signalfd and epoll on.
	v2 changes:
	- avoid O_TRUNC on lock file
	- change ctime on Linux systems w/o inotify
	- fix naming of comments and fields
2020-06-13	preliminary imap server implementation
	It shares a bit of code with NNTP. It's copy+pasted for now since this provides new ground to experiment with APIs for dealing with slow storage and many inboxes.
2020-06-08	index: v2: parallel by default
	InboxWritable should only set $v2w->{parallel} if the $parallel flag is defined to 0 or 1. We want indexing a new inbox to utilize SMP, just like --reindex. -index once again allows -j0/--jobs=0 to force single-process use, and we'll be ensuring that works in tests to maintain performance on small systems. Fixes: 61a2fff5b34a3e32 ("admin: move index_inbox over")
2020-06-03	smsg: remove remaining accessor methods
	We'll continue to favor simpler data models that can be used directly rather than wasting time and memory with accessor APIs. The ->from, ->to, ->cc, ->mid, ->subject, ->references methods can all be trivially replaced by hash lookups since all their values are stored in doc_data. Most remaining callers of those methods were test cases, anyways. ->from_name is only used in the PSGI code, so we can just use ->psgi_cull to take care of populating the {from_name} field.
2020-06-03	www: remove smsg_mime API and adjust callers
	To further simplify callers and avoid embarrassing memory explosions[1], we can finally eliminate this method in favor of smsg_eml. [1] commit 7d02b9e64455831d3bda20cd2e64e0c15dc07df5 ("view: stop storing all MIME objects on large threads") fixed a huge memory blowup.
2020-06-03	smsg: introduce ->populate method
	This will eventually replace the __hdr() calling methods and eradicate {mime} usage from Smsg. For now, we can eliminate PublicInbox::Smsg->new since most callers already rely on an open `bless' to avoid the old {mime} arg.
2020-05-29	treat $INBOX_DIR/description and gitweb.owner as UTF-8
	gitweb does the same with $GIT_DIR/description and gitweb.owner. Allowing UTF-8 description should not cause problems when used in responses to the NNTP "LIST NEWSGROUPS" request, either, since RFC 3977 section 7.6.6 recommends the description be UTF-8 (but does not require it). Link: https://public-inbox.org/meta/20200528151216.l7vmnmrs4ojw372g@sourcephile.fr/
2020-05-27	learn: fix buggy typo on List-ID mapping
	There is obviously a typo here, so fix it and add a test case to guard against future regressions. Fixes: 74a3206babe0572a ("mda: support multiple List-ID matches")
2020-05-26	view: do not offer links to 0-byte multipart attachments
	Offering links to download 0-byte files is useless. We could waste memory by preserving $eml->{bdy} during iteration, but offering attachments of type "multipart" is not very useful, as users are usually interested in decoded attachments or the entire raw message. Fixes: e60231148eb604a3 ("descend into message/(rfc822\|news\|global) parts")
2020-05-24	t/eml.t: favor ->header over ->header_str
	This test may still run against ancient versions of Email::MIME for comparisons.
2020-05-20	t/edit: use eml_load here, too
	I missed this instance of file slurping into an Email::MIME-like object the other week when tearing Email::MIME usage out.
2020-05-17	confine Email::MIME use even further
	To avoid confusing future readers and users, recommend PublicInbox::Eml in our Import POD and refer to PublicInbox::Eml comments at the top of PublicInbox::MIME. mime_load() is now confined to t/eml.t, since we won't be using it anywhere else in our tests.
2020-05-17	descend into message/(rfc822\|news\|global) parts
	Email::MIME never supported this properly, but there are real instances of forwarded messages as message/rfc822 attachments. message/news is a legacy thing which we'll see in archives, and message/global appears to be the new thing. gmime also supports message/rfc2822, so we'll support it anyways despite lacking other evidence of its existence. Existing attachments remain downloadable as a whole message, but individual attachments of subparts are now downloadable and can be displayed in HTML, too. Furthermore, ensure Xapian can now search for common headers inside those messages as well as the message bodies.
2020-05-17	t/psgi_attach: assert message/* parts are downloadable
	We'll be adding support to descend into message/rfc822 (and legacy message/news) attachments. First, we must ensure existing message/rfc822 attachments can be downloaded and remain downloadable in future commits.
2020-05-12	rename "ContentId" to "ContentHash"
	The old name may be confused with "Content-ID" as described in RFC 2392, so use an alternate name to avoid confusing future readers.
2020-05-10	emlcontentfoo: drop the {discrete} and {composite} fields
	We don't have to worry about compatibility with old installations of Email::MIME::ContentType any longer, so save some space.
2020-05-10	t/mime: fix test to work w/o Email::MIME
	Although the lazy loading changes were correct, the code was still using PublicInbox::MIME as a fixed class. Use the `$cls' variable from the loop. Favor ->subparts over ->parts, too, since ->parts is discouraged by the Email::MIME manpage and not implemented for Eml.
2020-05-10	eml: rename limits to match postfix names
	They're still part of our internal API at this point, but reusing the same names as those used by postfix makes sense for now to reduce cognitive overheads of learning new things. There's no "mime_parts_limit", but the name is consistent with "mime_nesting_limit".
2020-05-10	eml: enforce a maximum header length
	While our header processing is more efficient than Email::*::Header, capping the maximum size for a `m//g' match still limits memory growth on a header we care for. Use the same limit as postfix (header_size_limit=102400), since messages fetched via git/HTTP/NNTP/etc can bypass MTA limits.
2020-05-09	remove most internal Email::MIME usage
	We no longer load or use Email::MIME outside of comparison tests.
2020-05-09	EmlContentFoo: Email::MIME::ContentType replacement
	Since we're getting rid of Email::MIME, get rid of Email::MIME::ContentType, too, since we may introduce speedups down the line specific to our codebase.
2020-05-09	replace most uses of PublicInbox::MIME with Eml
	PublicInbox::Eml has enough functionality to replace the Email::MIME-based PublicInbox::MIME.
2020-05-09	eml: pure-Perl replacement for Email::MIME
	Email::MIME eats memory, wastes time parsing out all the headers, and some problems can't be fixed without breaking compatibility for other projects which depend on it. Informal benchmarks show a ~2x improvement in general stats gathering scripts and ~10% improvement in HTML view rendering. We also don't need the ability to create MIME messages, just parse them and maybe drop an attachment. While this isn't the zero-copy or streaming MIME parser of my dreams, it's still an improvement in that it doesn't keep a scalar copy of the raw body around along with subparts. It also doesn't parse subparts up front, so it can also replace our uses of Email::Simple.
2020-05-09	msg_iter: pass $idx as a scalar, not array
	This doesn't make any difference for most multipart messages (or any single part messages). However, this starts having space savings when parts start nesting. It also slightly simplifies callers.
2020-05-09	search: support searching on List-Id
	We'll support both probabilistic matches via `l:' and boolean matches via `lid:' for exact matches, similar to how both `m:' and `mid:' are supported. Only text inside angle brackets (`<' and `>') is supported, since I'm not sure if there's value in searching on the optional phrases (which would require decoding with ->header_str instead of ->header_raw).