path: root/xt
Date Commit message
2021-04-22 lei: flesh out `forwarded' kw support for Maildir and IMAP
Maildir and IMAP can both handle `forwarded'. Ensure we don't lose `forwarded' when reading from stores which do not support it, but ensure we can set it when reading from IMAP and Maildir stores.
2021-04-20 lei-sigpipe: update and move test from xt => t
We have "lei import" and better test infrastructure for lei, now, so we can more easily test SIGPIPE without relying on an already-configured instance.
2021-04-11 www: do not obfuscate addresses in URLs
As they are likely Message-IDs. If an email address ends up in a URL, then it's likely public, so there's even less reason to obfuscate that particular address.
[km: add xt/perf-obfuscate.t]
[ew: modernize perf test (5.10.1), use diag instead of print]
This version of the patch avoids the massive slowdown noted by Kyle in <https://public-inbox.org/meta/87wnt9or6t.fsf@kyleam.com/>. Performance remains roughly the same, if not slightly faster (which may be due to me testing this on a busy server).
Results from xt/perf-obfuscate.t against 6078 messages on a local mirror of <https://public-inbox.org/meta/>:
before: 6.67 usr + 0.04 sys = 6.71 CPU
after: 6.64 usr + 0.04 sys = 6.68 CPU
Reported-by: Kyle Meyer <kyle@kyleam.com>
Helped-by: Kyle Meyer <kyle@kyleam.com>
Link: https://public-inbox.org/meta/87a6q8p5qa.fsf@kyleam.com/
2021-04-03 xt/lei-auth-fail: test more failure cases
Because failures are often overlooked, unfortunately.
2021-03-31 lei: fix IMAP auth failure handling
We must use the $ops hashref returned by lei->workers_start, since it's modified to include extra handlers for auth failures and whatnot. Fixes: 954581b8e575966a ("lei: simplify PktOp callers")
2021-03-28 test_common: require_mods bundles
This makes it easier to manage test dependencies on systems where optional stuff isn't installed. This fixes some lei tests which didn't check for Plack before starting -httpd, and ensures Parse::RecDescent is available for -imapd in case Mail::IMAPClient stops using it.
2021-03-19 xt/create-many-inboxes: adjust for detect_nproc, no fsync
detect_nproc is in the IPC module, now; and we can safely disable fsync when creating test data. And "modernize" up to 5.10.1 while we're at it. The use of fsync was causing this to run for hours instead of minutes since I forgot to use eatmydata.
2021-03-12 msg_part_text: discover text in application/octet-stream
Some poorly-configured MUAs will send application/octet-stream even for text-only attachments. We can't expect all MUAs to be configured with proper MIME types, and there is plenty of historical mail that falls into this unfortunate category. v2: simplify the check and ensure returned text is Perl "utf8"
2021-03-10 watch: IMAP: ignore \Deleted and \Draft messages
This matches existing Maildir behavior, as trash and draft messages have little reason to be exposed publicly.
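A minimal sketch of the filtering idea, written in Python rather than the project's Perl. The function name `should_import` and the flag list format are hypothetical and not taken from the watch code:

```python
# Flags that mark a message as trash or an unfinished draft.
# Compared in lower case on the assumption that IMAP system flags
# should be matched case-insensitively.
IGNORED_FLAGS = {"\\deleted", "\\draft"}


def should_import(flags):
    """Return False when any flag marks the message as deleted or a draft."""
    return not any(flag.lower() in IGNORED_FLAGS for flag in flags)
```

With this, a message flagged only \Seen would be kept, while one carrying \Deleted or \Draft would be skipped.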
2021-03-09 lei q: remove angle brackets around Message-IDs
They're unnecessary visual noise, and angle brackets don't always work as intended when going through Xapian's query parser. Since we already use "m:" and "refs:" instead of the actual header names, it should be obvious we're at liberty to abbreviate such things. Link: https://public-inbox.org/meta/20210304184348.GA19350@dcvr/
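As a rough illustration in Python (not the project's Perl), a raw Message-ID header value could be reduced to a bare "m:" term like this. The helper names `strip_mid` and `mid_query` are hypothetical:

```python
def strip_mid(mid: str) -> str:
    """Remove one pair of surrounding angle brackets, if present."""
    mid = mid.strip()
    if mid.startswith("<") and mid.endswith(">"):
        return mid[1:-1]
    return mid


def mid_query(mid: str) -> str:
    """Build an "m:" query term from a raw Message-ID header value."""
    return "m:" + strip_mid(mid)
```

A value that already lacks brackets passes through unchanged, so the helper is safe to apply twice.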
2021-03-05 search: use "z:" instead of "bytes:" prefix
So far, searching by size has never been publicly documented, and IMHO, of questionable utility. In any case, "z:" is what mairix(1) uses, so it may be familiar to existing mairix users (I've never used this prefix myself). So far, this prefix is only used internally in tests and in auto-translated queries from IMAP; thus this incompatible change is unlikely to affect anyone.
2021-03-04 lei q: support --import-augment for IMAP
IMAP is similar to Maildir and we can now preserve keyword updates done on IMAP folders.
2021-02-26 lei convert: support IMAP output and "-F eml" inputs
eml ("message/rfc822" MIME type) is supported by "lei import", so it probably makes sense to support it via convert, at least for tests. And IMAP output is already supported in "lei q -o $MFOLDER", so this only required renaming {nrd} => {net} and initializing outputs before augment preparation (creating the IMAP folder).
2021-02-21 net_reader: use and accept URIimap objects in more places
This flexibility should save us some code down-the-line.
2021-02-21 lei q: support IMAP/IMAPS --output destinations
Augment (and dedupe) aren't parallel, yet, so it's more sensitive to high-latency networks.
2021-02-19 URIimap: overload "" to ->as_string
This interpolation is used by the upstream URI package and we rely on it elsewhere for HTTP(S) URIs, so save ourselves some surprises down the line.
2021-02-19 net_writer: start implementing IMAP write support
Requiring TEST_IMAP_WRITE_URL to be set to a writable IMAP server URL isn't ideal, but it works for now until we have time to set up a mock dovecot/cyrus/etc... instance for testing.
2021-02-19 tests: require Mail::IMAPClient for IMAP tests
All of our current IMAP code relies on Mail::IMAPClient at the moment, so ensure we skip those tests on systems without that module.
2021-02-18 lei: check for IMAP auth errors
We need to ensure authentication failures and error codes get propagated to the parent process(es) properly.
v2: update MANIFEST
v3: LeiAuth.pm ->_lei_cfg bit moved to a previous commit
2021-02-08 tests: favor IPv6
IPv4 gets plenty of real-world coverage, and apparently there's Debian buildd hosts which lack IPv4(*). So ensure everything can work on IPv6 and not cause problems for odd setups. (*) https://bugs.debian.org/979432
2021-02-07 ipc: wq_do => wq_io_do
We will have a ->wq_do that doesn't pass FDs for I/O.
2021-02-03 lei q: do not leave temporary files after oneshot exit
Avoid on-stack shortcuts which may prevent destructors from firing since we're not inside the event loop. We'll also tidy up the unlink mechanism in LeiOverview while we're at it.
2021-02-03 lei q: emit progress and counting via PktOp
Sometimes it can be confusing for "lei q" to finish writing to a Maildir|mbox and not know if it did anything. So show some per-external progress and stats. These can be disabled via the new --quiet/-q switch. We differ slightly from mairix(1) here, as we use stderr instead of stdout for reporting totals (and we support parallel queries from various sources).
2021-02-01 sharedkv: use lock_for_scope_fast
This allows us to avoid repeated open() and close() syscalls and speeds up the new xt/stress-sharedkv.t maintainer test by roughly 7%.
2021-01-26 use defined-or in a few more places
Mainly around fork() calls, but some nearby places as well.
2021-01-21 lei q: fix SIGPIPE handling from lei2mail workers
We need to properly propagate SIGPIPE to the top-level lei-daemon process and avoid relying on auto-close, since auto-close triggers Perl warnings when explicit close() does not.
2021-01-14 lei: test SIGPIPE, stop xsearch workers on client abort
The new test ensures consistency between oneshot and client/daemon users. Cancelling an in-progress result now also stops xsearch workers to avoid wasted CPU and I/O. Note the lei->atfork_child_wq usage changes; they work around a bug in Perl 5:
http://nntp.perl.org/group/perl.perl5.porters/258784
<CAHhgV8hPbcmkzWizp6Vijw921M5BOXixj4+zTh3nRS9vRBYk8w@mail.gmail.com>
This switches the internal protocol to use SOCK_SEQPACKET AF_UNIX sockets to prevent merging of messages sent from the daemon to the client to run the pager and kill/exit the client script.
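The message-boundary property of SOCK_SEQPACKET can be seen in a small Python sketch (not the project's Perl). The two payloads are made-up stand-ins for daemon-to-client commands, and the example assumes a platform such as Linux where AF_UNIX supports SOCK_SEQPACKET:

```python
import socket

# AF_UNIX + SOCK_SEQPACKET is connection-oriented like SOCK_STREAM,
# but each send() is delivered as one record, so two back-to-back
# messages are never merged into a single read.
daemon, client = socket.socketpair(socket.AF_UNIX, socket.SOCK_SEQPACKET)
try:
    daemon.send(b"run pager")  # hypothetical command
    daemon.send(b"exit 0")     # hypothetical command
    first = client.recv(4096)  # one record per recv(), even with a big buffer
    second = client.recv(4096)
finally:
    daemon.close()
    client.close()
```

With a SOCK_STREAM socket pair, the first recv() could legitimately return both payloads concatenated, which is the merging the commit avoids.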
2021-01-01 update copyrights for 2021
Using "make update-copyrights" after setting GNULIB_PATH in my config.mak
2020-12-23 xt: add create-many-inboxes helper test
I've been using something like this to mock out thousands of inboxes for testing.
2020-12-09 treewide: replace {-inbox} with {ibx} for consistency
{ibx} is shorter and is the most prevalent abbreviation in indexing and IMAP code, and the `$ibx' local variable is already prevalent throughout. In general, the codebase favors removal of vowels in variable and field names to denote non-references (because references are "lighter" than non-references). So update WWW and Filter users to use the same code since it reduces confusion and may allow easier code sharing.
2020-10-17 actually remove xt/eml_check_roundtrip.t
Fixes: 6550226296e9db79 ("xt: remove eml_check_roundtrip")
2020-09-26 xt: add eml ->as_string round trip checker
Unlike Email::MIME, PublicInbox::Eml::as_string should be able to round trip from the Perl object to a raw scalar and back without changes.
2020-09-10 use "\&" where possible when referring to subroutines
"*foo" is ambiguous in that it may refer to a bareword file handle; so we'll use "\&foo" where we can without triggering warnings. PublicInbox::TestCommon::run_script_exit required dropping the prototype, however. We'll also future-proof by dropping "use warnings" in Cgit.pm and use the less-ambiguous "//=" in Inbox.pm while we're in the area.
2020-09-10 xt/solver: test with public-inbox-httpd, too
We'll be making changes to solver to make it even fairer to slow clients on slow storage. Ensure we test with public-inbox-httpd-specific codepaths, since the generic PSGI code paths are rare in production use.
2020-09-03 tests: add "use strict" and declare v5.10.1 compatibility
strict.pm helped me find a typo in an upcoming change, so ensure we use it since it does more good than harm. We'll also take the opportunity here to declare v5.10.1 compatibility level to future-proof against Perl incompatibilities.
2020-09-03 use more idiomatic internal API for ->over access
{over_ro} being a part of the Search object is a historical oddity which will go away, soon. Let's start removing its use in tests and rarely-used helper scripts.
2020-07-26 xt/imapd-mbsync-oimapd: fix noop due to case sensitivity
mbsync was not retrieving anything since it was looking for "inbox" when we need to return "INBOX" as a special case for IMAP. Fixes: 8af34015e9aa94e5 (imap: LIST shows "INBOX" in all caps)
2020-07-13 xt/mem-imapd-tls: avoid EMFILE in -imapd process
Test::More dups standard FDs and may create FDs for other purposes. run_mode => 0 lets us rely on FD_CLOEXEC to ensure -imapd has enough FDs to accept all incoming connections at the cost of higher (one-off) startup time.
2020-07-06 xt/httpd-async-stream: allow more options
We want to be able to parallelize and stress test more endpoints and toggle `--compressed' and possibly other options in curl.
2020-07-06 mboxgz: do asynchronous git blob retrievals
This lets the -httpd worker process make better use of time instead of waiting for git-cat-file to respond. With 4 jobs in the new test case against a clone of <https://public-inbox.org/meta/>, a speedup of 10-12% is shown. Even a single job shows a 2-5% improvement on an SSD.
2020-06-28 ds: remove fields.pm usage
Since the removal of pseudo-hash support in Perl 5.10, the "fields" module no longer provides the space or speed benefits it did in 5.8. It also does not allow for compile-time checks, only run-time checks. To me, the extra developer overhead in maintaining "use fields" args has become a hassle. None of our non-DS-related code uses fields.pm, nor do any of our current dependencies. In fact, Danga::Socket (which DS was originally forked from) and its subclasses are the only fields.pm users I've ever encountered in the wild. Removing fields may make our code more approachable to other Perl hackers. So stop using fields.pm and locked hashes, but continue to document what fields do for non-trivial classes.
2020-06-16 imap: *SEARCH: use Parse::RecDescent
For properly parsing IMAP search requests, it's easier to use a recursive descent parser generator to deal with subqueries and the "OR" statement. Parse::RecDescent was chosen since it's mature, well-known, widely available and already used by our optional dependencies: Inline::C and Mail::IMAPClient. While it's possible to build Xapian queries without using the Xapian string query parser, this iteration of the IMAP parser still builds a string which is passed to Xapian's query parser for ease-of-diagnostics. Since this is a recursive descent parser dealing with untrusted inputs, subqueries have a nesting limit of 10. I expect that is more than adequate for real-world use.
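To illustrate the nesting limit, here is a hand-written recursive descent sketch in Python. It is not the Parse::RecDescent grammar from the commit: the toy grammar (bare words, parenthesized subqueries, and prefix "OR" taking two keys), the tuple-based tree, and all function names are assumptions made for this example. Only parenthesized subqueries count toward the limit here.

```python
MAX_DEPTH = 10  # subquery nesting limit mentioned in the commit message


def tokenize(text):
    """Split a search string into words and parentheses."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse_keys(tokens, depth):
    """Parse search keys until a closing ')' or the end of input."""
    if depth > MAX_DEPTH:
        raise ValueError("subquery nesting too deep")
    keys = []
    while tokens and tokens[0] != ")":
        keys.append(parse_key(tokens, depth))
    return ("AND", keys)


def parse_key(tokens, depth):
    """Parse one key: a word, a parenthesized subquery, or OR key key."""
    if not tokens:
        raise ValueError("unexpected end of input")
    tok = tokens.pop(0)
    if tok == ")":
        raise ValueError("unexpected ')'")
    if tok == "(":
        sub = parse_keys(tokens, depth + 1)  # descending into a subquery
        if not tokens or tokens.pop(0) != ")":
            raise ValueError("missing ')'")
        return sub
    if tok.upper() == "OR":
        return ("OR", [parse_key(tokens, depth), parse_key(tokens, depth)])
    return tok


def parse_search(text):
    """Parse a whole search string, rejecting leftover tokens."""
    tokens = tokenize(text)
    tree = parse_keys(tokens, 0)
    if tokens:
        raise ValueError("unbalanced ')'")
    return tree
```

Rejecting deep nesting before recursing keeps untrusted input from driving the parser arbitrarily deep, which is the concern the commit message raises.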
2020-06-15 testcommon: allow OR-ing module dependencies
IMAP requires either the Email::Address::XS or Mail::Address package (part of perl-MailTools RPM or libmailtools-perl deb); and Email::Address::XS is not officially packaged for some older distros, most notably CentOS 7.x.
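The idea of OR-ed dependencies can be sketched in Python (not the project's Perl test helper). The "||" separator, the function names, and the module names in the usage note are assumptions for illustration only:

```python
import importlib.util


def have_mod(name):
    """Return True if a module with this name can be located."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        return False


def missing_mods(*specs):
    """Return the specs for which none of the "||" alternatives exist."""
    missing = []
    for spec in specs:
        alternatives = spec.split("||")
        if not any(have_mod(mod) for mod in alternatives):
            missing.append(spec)
    return missing
```

A test could then be skipped when `missing_mods("json", "fast_parser||json")` returns a non-empty list; a spec is satisfied as soon as any one alternative is available.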
2020-06-13 imap: introduce memory-efficient uo2m mapping
Since we limit our mailbox slices to 50K and can guarantee a contiguous UID space for those mailboxes, we can store a mapping of "UID offsets" (not full UIDs) to Message Sequence Numbers as an array of 16-bit unsigned integers in a 100K scalar. For UID-only FETCH responses, we can momentarily unpack the compact 100K representation to a ~1.6M Perl array of IV/UV elements for a slight speedup. Furthermore, we can (ab)use hash key deduplication in Perl5 to deduplicate this 100K scalar across all clients with the same mailbox slice open. Technically we can increase our slice size to 64K w/o increasing our storage overhead, but I suspect humans are more accustomed to slices easily divisible by 10.
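The size arithmetic above (50,000 slots of 16 bits each is 100,000 bytes) can be sketched in Python. This is an illustration, not the Perl implementation: the function names, the use of 0 to mean "no message at this UID", and the native byte order are all assumptions.

```python
import struct
from array import array

SLICE = 50000  # messages per mailbox slice, per the commit message


def build_uo2m(uid_min, uids):
    """Pack a (UID - uid_min) -> sequence number table into bytes.

    Sequence numbers run 1..50000, so each fits in 16 bits.
    A value of 0 marks a UID with no message (assumption of this sketch).
    """
    table = array("H", [0]) * SLICE  # 50,000 unsigned 16-bit slots
    for msn, uid in enumerate(sorted(uids), start=1):
        table[uid - uid_min] = msn
    return table.tobytes()


def uid_to_msn(packed, uid_min, uid):
    """Look up one sequence number without unpacking the whole table."""
    return struct.unpack_from("=H", packed, 2 * (uid - uid_min))[0]
```

Because the packed table is an immutable byte string keyed only by the slice contents, identical tables can be shared between clients, which is the deduplication opportunity the commit message describes for Perl.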
2020-06-13 xt/*: show some tunable parameters
This will make it easier to show parameters used for testing and potential tweaks to be made.
2020-06-13 imap: omit $UID_END from mailbox name, use index
Having two large numbers separated by a dash can make visual comparisons difficult when numbers are in the 3,000,000 range for LKML. So avoid the $UID_END value, since it can be calculated from $UID_MIN. And we can avoid large values of $UID_MIN, too, by instead storing the block index and just multiplying it by 50000 (and adding 1) on the server side. Of course, LKML still goes up to 72, at the moment.
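The index arithmetic described above can be written out as a short Python sketch. The function names and the "$NEWSGROUP.$INDEX" name layout used in `mailbox_name` are assumptions for illustration:

```python
SLICE = 50000  # UIDs per mailbox slice


def uid_range(index):
    """Return (UID_MIN, UID_END) for a block index."""
    uid_min = index * SLICE + 1  # multiply by 50000 and add 1
    uid_end = uid_min + SLICE - 1
    return uid_min, uid_end


def mailbox_name(newsgroup, uid):
    """Name the slice holding a UID as "<newsgroup>.<index>" (assumed layout)."""
    return f"{newsgroup}.{(uid - 1) // SLICE}"
```

For index 72, `uid_range(72)` gives (3600001, 3650000), which matches the 3,000,000 range mentioned for LKML.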
2020-06-13 imap: require ".$UID_MIN-$UID_END" suffix
Finish up the IMAP-only portion of iterative config reloading, which allows us to create all sub-ranges of an inbox up front. The InboxIdler still uses ->each_inbox which will struggle with 100K inboxes. Having messages in the top-level newsgroup name of an inbox will still waste bandwidth for clients which want to do full syncs once there's a rollover to a new 50K range. So instead, make every inbox accessible exclusively via 50K slices in the form of "$NEWSGROUP.$UID_MIN-$UID_END". This introduces the DummyInbox, which makes $NEWSGROUP and every parent component a selectable, empty inbox. This aids navigation with mutt and possibly other MUAs. Finally, the xt/perf-imap-list maintainer test is broken, now, so remove it. The grep perlfunc is already proven effective, and we'll have separate tests for mocking out ~100k inboxes.
2020-06-13 xt/perf-imap-list: time refresh_inboxlist
It's useful to know how fast SIGHUP can be handled, too.
2020-06-13 xt: add imapd-validate and imapd-mbsync-oimap
imapd-validate is a beefed up version of our nntpd-validate test which hammers the server with parallel connections over regular IMAP, IMAPS, IMAP+STARTTLS; and COMPRESS=DEFLATE variants of each of those. It uses $START_UID:$END_UID fetch ranges to reduce requests and slurp many responses at once to saturate "git cat-file --batch" processes. mbsync(1) also uses pipelining extensively (but IMHO unnecessarily), so it was able to shake out some bugs in the async git code. Finally, we remove xt/cmp-imapd-compress.t since it's redundant now that we have PublicInbox::IMAPClient to work around bugs in Mail::IMAPClient.
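Splitting a UID span into $START_UID:$END_UID ranges might look like the following Python sketch. The function name and the batch size are hypothetical; the test's actual range size is not stated in the commit message:

```python
def uid_ranges(first_uid, last_uid, batch):
    """Yield "START:END" strings covering first_uid..last_uid inclusive."""
    start = first_uid
    while start <= last_uid:
        end = min(start + batch - 1, last_uid)
        yield f"{start}:{end}"
        start = end + 1
```

Each yielded string can serve as the sequence-set argument of one UID FETCH command, so a few requests cover many messages.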
2020-06-13 add imapd compression test
Include a test for Mail::IMAPTalk, here, since Mail::IMAPClient stalls with compression enabled: https://rt.cpan.org/Ticket/Display.html?id=132720