path: root/MANIFEST
Date Commit message
2020-06-28 watch: use our own "git credential" wrapper
Git.pm may not be installed on some systems; or some users have multiple Perl installations and Git.pm is not available to the Perl running -watch. Accommodate both those types of users by providing our own "git credential" wrapper.
2020-06-28 watch: add NNTP support
This is similar to IMAP support, but only supports polling. Automatic altid support is not yet available, but may be added in the future.
v2: small grammar fix by Kyle Meyer
Link: https://public-inbox.org/meta/87sgeg5nxf.fsf@kyleam.com/
2020-06-28 watch: remove Filesys::Notify::Simple dependency
Since we already use inotify and EVFILT_VNODE (kqueue) in -imapd, we might as well use them directly in -watch, too. This will allow public-inbox-watch to use PublicInbox::DS for timers to watch newsgroups/mailboxes and have saner signal handling in future commits.
2020-06-28 kqnotify|fake_inotify: detect Maildir write ops
We need to detect link(2) and rename(2) in other apps writing to the Maildir. We'll be removing the Filesys::Notify::Simple dependency from -watch in favor of using IO::KQueue or Linux::Inotify2 directly. Ensure non-inotify emulations can support everything we expect for Maildir writers.
2020-06-28 watch: preliminary IMAP support
Only servers with IDLE are supported, for now. Polling will be needed since users may need to watch many inboxes with a few active connections due to IMAP server limitations.
2020-06-28 URI IMAP support
We'll be supporting the IMAP URL scheme described in RFC 5092 for -watch, so add this module to fill in what the `URI' package lacks.
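To make the URL shape concrete, here is a minimal Python sketch of pulling the server, mailbox and UIDVALIDITY out of an RFC 5092 style URL. It is not the Perl module this commit adds: the function name `parse_imap_url`, the returned dict keys, and the acceptance of an `imaps` scheme are assumptions for illustration, and the `/;UID=` message part of the URL syntax is ignored.

```python
from urllib.parse import urlsplit, unquote

DEFAULT_PORTS = {"imap": 143, "imaps": 993}  # "imaps" is an assumption here


def parse_imap_url(url):
    """Split an IMAP URL into host, port, user, mailbox and UIDVALIDITY."""
    u = urlsplit(url)
    if u.scheme not in DEFAULT_PORTS:
        raise ValueError("not an IMAP URL: %r" % url)
    # The path looks like "/MAILBOX;UIDVALIDITY=123"
    mailbox, _, params = u.path.lstrip("/").partition(";")
    uidvalidity = None
    for param in params.split(";"):
        key, _, value = param.partition("=")
        if key.upper() == "UIDVALIDITY":
            uidvalidity = int(value)
    # The userinfo may carry ";AUTH=mechanism"; keep only the user name
    user = (u.username or "").partition(";")[0] or None
    return {
        "host": u.hostname,
        "port": u.port or DEFAULT_PORTS[u.scheme],
        "user": unquote(user) if user else None,
        "mailbox": unquote(mailbox),
        "uidvalidity": uidvalidity,
    }
```

For example, `parse_imap_url("imap://mail.example.com/INBOX;UIDVALIDITY=123")` yields mailbox `INBOX` on port 143 with UIDVALIDITY 123.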
2020-06-28 imaptracker: use ~/.local/share/public-inbox/imap.sqlite3
Respect XDG_DATA_HOME to avoid cluttering ~/.public-inbox/. Existing users of ~/.public-inbox/imap.sqlite3 will remain supported, but the preference for new data is to use ~/.local/share and other paths standardized by XDG.
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
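The lookup order described above can be sketched roughly as follows, in Python rather than the project's Perl. The function name `imap_tracker_path` and its `env`/`home` parameters are hypothetical, and the rule "an existing legacy file wins" is an assumption drawn from the phrase "will remain supported".

```python
import os
from pathlib import Path


def imap_tracker_path(env=None, home=None):
    """Choose where imap.sqlite3 lives (illustrative, not the project's code)."""
    env = os.environ if env is None else env
    home = Path(home) if home is not None else Path.home()
    legacy = home / ".public-inbox" / "imap.sqlite3"
    # Assumption: an existing legacy database keeps being used
    if legacy.exists():
        return legacy
    # XDG Base Directory default: ~/.local/share when the variable is unset or empty
    data_home = env.get("XDG_DATA_HOME") or str(home / ".local" / "share")
    return Path(data_home) / "public-inbox" / "imap.sqlite3"
```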
2020-06-28 IMAPTracker: Add a helper to track our place in reading imap mailboxes
This removes the need to delete from an imap mailbox when downloading its messages.
[ew: minor style changes]
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2020-06-16 imap: *SEARCH: use Parse::RecDescent
For properly parsing IMAP search requests, it's easier to use a recursive descent parser generator to deal with subqueries and the "OR" statement. Parse::RecDescent was chosen since it's mature, well-known, widely available and already used by our optional dependencies: Inline::C and Mail::IMAPClient. While it's possible to build Xapian queries without using the Xapian string query parser, this iteration of the IMAP parser still builds a string which is passed to Xapian's query parser for ease-of-diagnostics. Since this is a recursive descent parser dealing with untrusted inputs, subqueries have a nesting limit of 10. I expect that is more than adequate for real-world use.
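The idea of a recursive descent parser with a nesting cap can be sketched by hand in Python. This is not the Parse::RecDescent grammar from the commit: the grammar below is heavily simplified (every bare word is treated as a complete search key with no arguments), the names `tokenize` and `parse` are hypothetical, and counting "OR" operands as a nesting level is a choice made here so that recursion stays bounded on hostile input.

```python
import re

MAX_DEPTH = 10  # the nesting limit mentioned in the commit message


def tokenize(query):
    """Split a query into "(", ")" and bare words."""
    return re.findall(r"\(|\)|[^\s()]+", query)


def parse(tokens):
    """Return a nested ("AND"|"OR", [items]) tree for a simplified grammar."""
    pos = 0

    def term(depth):
        nonlocal pos
        if depth > MAX_DEPTH:
            raise ValueError("query nested too deeply")
        if pos >= len(tokens):
            raise ValueError("unexpected end of query")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            items = []
            while pos < len(tokens) and tokens[pos] != ")":
                items.append(term(depth + 1))
            if pos >= len(tokens):
                raise ValueError("missing ')'")
            pos += 1  # consume ")"
            return ("AND", items)
        if tok == ")":
            raise ValueError("unexpected ')'")
        if tok.upper() == "OR":
            # "OR" takes exactly two operands
            return ("OR", [term(depth + 1), term(depth + 1)])
        return tok

    items = []
    while pos < len(tokens):
        items.append(term(0))
    return ("AND", items)
```

A key nested inside more than ten levels of parentheses raises `ValueError` instead of recursing further.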
2020-06-16 MANIFEST: add missing 1.6.0 release notes entry
2020-06-13 imap: introduce memory-efficient uo2m mapping
Since we limit our mailbox slices to 50K and can guarantee a contiguous UID space for those mailboxes, we can store a mapping of "UID offsets" (not full UIDs) to Message Sequence Numbers as an array of 16-bit unsigned integers in a 100K scalar. For UID-only FETCH responses, we can momentarily unpack the compact 100K representation to a ~1.6M Perl array of IV/UV elements for a slight speedup. Furthermore, we can (ab)use hash key deduplication in Perl5 to deduplicate this 100K scalar across all clients with the same mailbox slice open. Technically we can increase our slice size to 64K w/o increasing our storage overhead, but I suspect humans are more accustomed to slices easily divisible by 10.
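The size arithmetic (50,000 entries of 2 bytes each is 100,000 bytes) can be checked with a small Python sketch. It mirrors the idea only, not the Perl implementation: the names `pack_uo2m` and `msn_for_uid`, the little-endian layout, and the `uid_base` parameter (the first UID of the slice) are assumptions.

```python
import struct

SLICE_SIZE = 50_000  # slice size from the commit message


def pack_uo2m(msns):
    """Pack Message Sequence Numbers, indexed by UID offset, as 16-bit unsigned ints."""
    return struct.pack("<%dH" % len(msns), *msns)


def msn_for_uid(blob, uid, uid_base):
    """Look up one MSN without unpacking the whole blob."""
    offset = uid - uid_base
    return struct.unpack_from("<H", blob, offset * 2)[0]


# A full slice with no gaps: offset i maps to sequence number i + 1
blob = pack_uo2m(range(1, SLICE_SIZE + 1))
```

Every sequence number in a 50K slice fits in 16 bits because 50,000 is below 65,536, which is also why the commit notes that a 64K slice would cost nothing extra per entry.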
2020-06-13 imap: require ".$UID_MIN-$UID_END" suffix
Finish up the IMAP-only portion of iterative config reloading, which allows us to create all sub-ranges of an inbox up front. The InboxIdler still uses ->each_inbox which will struggle with 100K inboxes. Having messages in the top-level newsgroup name of an inbox will still waste bandwidth for clients which want to do full syncs once there's a rollover to a new 50K range. So instead, make every inbox accessible exclusively via 50K slices in the form of "$NEWSGROUP.$UID_MIN-$UID_END". This introduces the DummyInbox, which makes $NEWSGROUP and every parent component a selectable, empty inbox. This aids navigation with mutt and possibly other MUAs. Finally, the xt/perf-imap-list maintainer test is broken, now, so remove it. The grep perlfunc is already proven effective, and we'll have separate tests for mocking out ~100k inboxes.
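The naming scheme can be sketched in Python as below. The commit message does not state where slice boundaries fall, so the 1-based, inclusive ranges (1-50000, 50001-100000, ...) are an assumption, as are the function names `slice_name` and `parent_mailboxes`.

```python
SLICE_SIZE = 50_000  # slice size from the commit message


def slice_name(newsgroup, uid):
    """Name of the mailbox slice holding `uid` (boundaries are assumed)."""
    if uid < 1:
        raise ValueError("UIDs start at 1")
    index = (uid - 1) // SLICE_SIZE
    uid_min = index * SLICE_SIZE + 1
    uid_end = uid_min + SLICE_SIZE - 1
    return "%s.%d-%d" % (newsgroup, uid_min, uid_end)


def parent_mailboxes(newsgroup):
    """The newsgroup and each parent component, which become empty, selectable inboxes."""
    parts = newsgroup.split(".")
    return [".".join(parts[: i + 1]) for i in range(len(parts))]
```

With these assumptions, `parent_mailboxes("inbox.comp.test")` lists `inbox`, `inbox.comp` and `inbox.comp.test`, the names a client such as mutt could navigate through before reaching a slice.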
2020-06-13 xt: add imapd-validate and imapd-mbsync-oimap
imapd-validate is a beefed up version of our nntpd-validate test which hammers the server with parallel connections over regular IMAP, IMAPS, IMAP+STARTTLS; and COMPRESS=DEFLATE variants of each of those. It uses $START_UID:$END_UID fetch ranges to reduce requests and slurp many responses at once to saturate "git cat-file --batch" processes. mbsync(1) also uses pipelining extensively (but IMHO unnecessarily), so it was able to shake out some bugs in the async git code. Finally, we remove xt/cmp-imapd-compress.t since it's redundant now that we have PublicInbox::IMAPClient to work around bugs in Mail::IMAPClient.
2020-06-13 imapclient: wrapper for Mail::IMAPClient
We'll be using this wrapper class to work around some upstream bugs in Mail::IMAPClient. There may also be experiments with new APIs for more performance.
2020-06-13 add imapd compression test
Include a test for Mail::IMAPTalk, here, since Mail::IMAPClient stalls with compression enabled: https://rt.cpan.org/Ticket/Display.html?id=132720
2020-06-13 imap: use git-cat-file asynchronously
This ought to improve overall performance with multiple clients. Single client performance suffers a tiny bit due to extra syscall overhead from epoll. This also makes the existing async interface easier-to-use, since calling cat_async_begin is no longer required.
2020-06-13 imap: split out unit tests and benchmarks
This makes the test code easier-to-manage and allows us to run faster unit tests which don't involve loading Mail::IMAPClient.
2020-06-13 imap: allow fetch of partial of BODY[...] and headers
IMAP supports a high level of granularity when it comes to fetching, but fortunately Perl makes it fairly easy to support.
2020-06-13 inboxidle: new class to detect inbox changes
This will be used to implement IMAP IDLE, first. Eventually, it may be used to trigger other things:
* incremental internal updates for manifest.js.gz
* restart `git cat-file' processes on pack index unlink
* IMAP IDLE-like long-polling HTTP endpoint
And maybe more things we haven't thought of, yet.
It uses Linux::Inotify2 or IO::KQueue depending on what packages are installed and what the kernel supports. It falls back to nanosecond-aware Time::HiRes::stat() (available with Perl 5.10.0+) on systems lacking Linux::Inotify2 and IO::KQueue. In the future, a pure Perl alternative to Linux::Inotify2 may be supplied for users of architectures we already support signalfd and epoll on.
v2 changes:
- avoid O_TRUNC on lock file
- change ctime on Linux systems w/o inotify
- fix naming of comments and fields
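The stat()-based fallback can be pictured with a small Python sketch that polls nanosecond timestamps. It is an analogue, not the project's Perl class: `StatWatcher` and its methods are hypothetical names, and comparing mtime, ctime and size together is a choice made here.

```python
import os


class StatWatcher:
    """Detect changes by polling stat() when inotify or kqueue are unavailable."""

    def __init__(self, paths):
        self._seen = {path: self._signature(path) for path in paths}

    @staticmethod
    def _signature(path):
        try:
            st = os.stat(path)
        except FileNotFoundError:
            return None  # a missing file is a state of its own
        # Nanosecond-resolution timestamps, plus size as a cheap extra signal
        return (st.st_mtime_ns, st.st_ctime_ns, st.st_size)

    def poll(self):
        """Return the paths whose signature changed since the previous poll."""
        changed = []
        for path, old in self._seen.items():
            new = self._signature(path)
            if new != old:
                self._seen[path] = new
                changed.append(path)
        return changed
```

A caller would invoke `poll()` from a timer; unlike inotify or kqueue, nothing is reported between polls.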
2020-06-13 preliminary imap server implementation
It shares a bit of code with NNTP. It's copy+pasted for now since this provides new ground to experiment with APIs for dealing with slow storage and many inboxes.
2020-05-17 t/psgi_attach: assert message/* parts are downloadable
We'll be adding support to descend into message/rfc822 (and legacy message/news) attachments. First, we must ensure existing message/rfc822 attachments can be downloaded and remain downloadable in future commits.
2020-05-12 rename "ContentId" to "ContentHash"
The old name may be confused with "Content-ID" as described in RFC 2392, so use an alternate name to avoid confusing future readers.
2020-05-12 xt/eml_check_limits: check limits against an inbox
This allows maintainers to easily check limits against the contents of existing inboxes. This script covers most of the new limits enforced by PublicInbox::Eml. Usage is similar to most xt/*.t scripts:
  GIANT_INBOX_DIR=/path/to/inbox prove -bvw xt/eml_check_limits.t
Setting `TEST_CLASS=PublicInbox::MIME' allows us to check performance and memory use against the old subclass of Email::MIME.
2020-05-09 xt: eml comparison tests
While our codebase can still work with either MIME implementation, add comparison tests to ensure we handle corner cases in existing archives.
2020-05-09 EmlContentFoo: Email::MIME::ContentType replacement
Since we're getting rid of Email::MIME, get rid of Email::MIME::ContentType, too; since we may introduce speedups down the line specific to our codebase.
2020-05-09 eml: pure-Perl replacement for Email::MIME
Email::MIME eats memory, wastes time parsing out all the headers, and some problems can't be fixed without breaking compatibility for other projects which depend on it. Informal benchmarks show a ~2x improvement in general stats gathering scripts and ~10% improvement in HTML view rendering. We also don't need the ability to create MIME messages, just parse them and maybe drop an attachment. While this isn't the zero-copy or streaming MIME parser of my dreams; it's still an improvement in that it doesn't keep a scalar copy of the raw body around along with subparts. It also doesn't parse subparts up front, so it can also replace our uses of Email::Simple.
2020-04-27 doc: add clients.txt
Since some client tools exist for dealing with public-inbox specifically, it seems like a good idea to list some of them.
Cc: Danh Doan <congdanhqx@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Cc: Leah Neukirchen <leah@vuxu.org>
2020-04-26 tests: remove Email::MIME->create use entirely
Replace them with .eml files generated with the help of Email::MIME, but without some extraneous and unnecessary headers, and strip mime_load down to just loading files. This will give us more freedom to experiment with other mail libraries which may be more correct, better maintained, use less memory and/or be faster than Email::MIME.
2020-04-20 watchmaildir: support multiple watchheader values
The watchheader key supports only a single value. Supporting multiple watchheader values was mentioned in discussion [1] of 8d3e3bd8 (doc: explain publicinbox.<name>.watchheader, 2019-10-09), and it wasn't clear if there was a need.
One scenario in which matching multiple headers would be convenient is when someone wants to set up public-inbox archives for some small projects but does _not_ want to run mailing lists for them, instead allowing others to follow the project by any of the pull mechanisms. Using a common underlying address, an address alias for each project is configured via a third-party email provider, with messages for each alias being exposed as a separate public-inbox archive. In this setup, messages for an inbox cannot be selected by a List-ID header but can be identified by the inbox's address in either the To or Cc header.
To support such a use case, update the watchheader handling to consider multiple values, accepting a message if it matches any value. While selecting a message based on matching _any_ rather than _all_ values is motivated by the above scenario, it's worth noting that the "any" behavior is consistent with how multiple listid config values are handled.
[1] https://public-inbox.org/meta/20191010085118.r3amey4cayazfycb@dcvr/
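The "any value matches" rule can be sketched in Python with the standard email module. This is an analogue of the behaviour described, not the Perl code: the function name `matches_any`, the `Header:value` spec strings, and the case-insensitive substring comparison are assumptions for illustration.

```python
from email import message_from_string


def matches_any(msg, watchheaders):
    """Accept a message if any "Header:value" spec matches one of its headers."""
    for spec in watchheaders:
        name, _, needle = spec.partition(":")
        for value in msg.get_all(name, []):
            # Assumption: case-insensitive substring match
            if needle.lower() in value.lower():
                return True
    return False


# One inbox identified by its alias appearing in either To or Cc
specs = ["To:proj-a@example.com", "Cc:proj-a@example.com"]
msg = message_from_string(
    "From: dev@example.org\nCc: Project A <proj-a@example.com>\n\nhello\n"
)
accepted = matches_any(msg, specs)
```

Here the message is accepted through its Cc header even though the To spec does not match, which is the scenario the commit message describes.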
2020-04-19 doc: start writeup on semi-automatic memory management
I don't consider Perl's memory management "automatic". Instead, having an extra bit of control as a hacker is nice and there's no need to burden ordinary users with GC tuning knobs.
2020-04-19 inboxwritable: mime_from_path: reuse in more places
There's nothing Maildir-specific about the function, so `maildir_path_load' was a bad name. So give it a more appropriate name and use it in our tests. This saves us some code and avoids inconsistency by reusing an existing internal library routine in more places. We can drop the "From_" line in some of our (formerly) mbox sample files.
2020-04-17 doc: update 1.4.0 relnotes with date, start 1.5.0
2020-04-15 MANIFEST update
2020-03-25 www: add endpoint to retrieve altid dumps
This ensures all our indexed data, including data from altid searches (e.g. "gmane:$ARTNUM") is retrievable. It uses a "POST" request to avoid wasting cycles when invoked by crawlers, since it could potentially be several megabytes of data not indexable by search engines.
2020-03-25 qspawn: reinstate filter support, add gzip filter
We'll be supporting gzipped dumps from sqlite3(1) for altid files in future commits. In the future (and if we survive), we may replace Plack::Middleware::Deflater with our own GzipFilter to work better with asynchronous responses without relying on memory-intensive anonymous subs.
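What a streaming gzip filter does can be shown with a short Python sketch using zlib. It illustrates the general technique (compress each chunk as it arrives, emit the trailer at the end) and is not the project's GzipFilter; the generator name `gzip_chunks` is hypothetical.

```python
import gzip
import zlib


def gzip_chunks(chunks):
    """Compress an iterable of byte chunks into one gzip stream, piece by piece."""
    # wbits=31 (16 + 15) asks zlib for a gzip header and trailer
    compressor = zlib.compressobj(wbits=31)
    for chunk in chunks:
        out = compressor.compress(chunk)
        if out:  # zlib may buffer small inputs and return nothing yet
            yield out
    yield compressor.flush()


# Round trip: the joined output is a regular gzip stream
dump = [b"CREATE TABLE t (id INTEGER);\n", b"INSERT INTO t VALUES (1);\n"]
restored = gzip.decompress(b"".join(gzip_chunks(dump)))
```

Because output is produced per chunk, the whole dump never has to sit in memory at once.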
2020-03-22 v2: SDBM-based multi Message-ID queue
This lets us store author and committer times for deferred indexing messages with ambiguous Message-IDs. This allows us to reproducibly reindex messages with the git commit and author times when a rare message lacks Received and/or Date headers while having ambiguous Message-IDs.
2020-03-22 rename PublicInbox::SearchMsg => PublicInbox::Smsg
Since the introduction of over.sqlite3, SearchMsg is not tied to our search functionality in any way, so stop confusing ourselves and future hackers by just calling it "PublicInbox::Smsg". Add a missing "use" in ExtMsg while we're at it.
2020-03-22 index: use git commit times on missing Date/Received
When indexing messages without Date: and/or Received: headers, fall back to using timestamps originally recorded by git in the commit object. This allows git mirrors to preserve the import datestamp and timestamp of a message according to what was fed into git, instead of blindly falling back to the current time.
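The fallback can be sketched in Python as follows. It is a simplification, not the indexing code: only the Date: header is consulted (Received: handling is omitted), and the function name `message_timestamp` and its `git_commit_time` parameter (epoch seconds taken from the commit object) are hypothetical.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime


def message_timestamp(raw, git_commit_time):
    """Epoch seconds from Date:, else the time git recorded for the commit."""
    msg = message_from_string(raw)
    date = msg.get("Date")
    if date:
        try:
            return int(parsedate_to_datetime(date).timestamp())
        except (TypeError, ValueError):
            pass  # unparsable Date: is treated like a missing one
    # Fall back to the commit time instead of the current time
    return git_commit_time
```

Using the commit time keeps reindexing reproducible across mirrors, whereas "now" would differ on every run.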
2020-02-24 doc: technical: document data structures
Can't code without data structures, and we emphasize data over code just about everywhere.
2020-02-15 t/msg_iter: test for X-UNKNOWN charset from Alpine
A long overdue test for behavior established in 2016.
Fixes: 1b28cc7f00a866cb ("view: try assuming UTF-8 for bogus charsets")
2020-02-09 doc: update v1.3.0.eml with actual headers, start v1.4.0
Bigger changes coming :>
2020-02-07 syscall: support Linux x32 ABI
The x32 ABI allows users to take advantage of the extra registers on x86-64 without the bloat of 64-bit pointers and longs. This ought to be significant since Perl was designed when 32-bit was prevalent; and the common structs for ops, hashes, scalars, and arrays use longs (SSize_t/Size_t) for things which should never need 64-bits when processing emails. Debian's x32 port seems to work quite nicely under a chroot on an amd64 Linux system. All tests pass under x32, now.
2020-02-06 MANIFEST: add flow.{ge,txt}
Oops :x
2020-02-02 t/multi-mid.t: extra test for -convert highwater mark
This is derived from a real-world test case where I encountered multiple Message-IDs in a v1 inbox causing regen problems.
Fixes: eea47b676127bcdb ("convert: preserve highwater mark from v1 msgmap")
2020-01-11 doc: technical/ds.txt: describe PublicInbox::DS divergences
Danga::Socket 1.62 was released a few months back and the maintainer indicated it would be the last release. We've diverged significantly in incompatible ways... While most of this should've already been documented in commit messages, putting it all into one document could make it easier-to-digest. It's also a strange design for anybody used to conventional event loops. Maybe this is an unconventional project :P
2020-01-05 view: msg_html: reduce memory use on reused MIDs
In rare cases where Message-IDs get reused, we do not want to hold onto the large Email::MIME objects in memory after showing the first message. So discard each message as soon as we're done using it so we can save memory for the next message. The new and expensive xt/mem-msgview.t test shows a nearly 14MB reduction for two ~7MB messages. run_script() also gets upgraded to make it easier to pass large inputs via IO GLOBs.
2020-01-04 xt/solver.t: real-world regression tests
There's a lot of test cases which we should probably make self-contained at some point, but right now it's easier to just mark them off in a maintainer test.
2020-01-03 examples: add empty "lib" dir to placate plackup
This is necessary for Filesys::Notify::Simple 0.13 using Linux::Inotify2, since 0.13 started croaking on inotify_add_watch failures.
2020-01-02 doc: release notes: set Date for 1.2.0, start 1.3.0
Seems like a lot's happened since 1.2, but it's mostly internal stuff...
2020-01-01 wwwstatic: add directory listing + index.html support
It's now possible to use WwwStatic as a standalone PSGI app to serve static files and recreate the award-winning web design of https://public-inbox.org/ :>