public-inbox.git - an "archives first" approach to mailing lists

Date	Commit message
2020-10-17	actually remove xt/eml_check_roundtrip.t
	Fixes: 6550226296e9db79 ("xt: remove eml_check_roundtrip")
2020-09-26	xt: add eml ->as_string round trip checker
	Unlike Email::MIME, PublicInbox::Eml::as_string should be able to round trip from the Perl object to a raw scalar and back without changes.
2020-09-10	use "\&" where possible when referring to subroutines
	"*foo" is ambiguous in that it may refer to a bareword file handle, so we'll use "\&foo" where we can without triggering warnings. PublicInbox::TestCommon::run_script_exit required dropping the prototype, however. We'll also future-proof by dropping "use warnings" in Cgit.pm and use the less-ambiguous "//=" in Inbox.pm while we're in the area.
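The difference between the two spellings can be sketched as follows. This is only an illustration; the `greet` subroutine is a hypothetical name and is not taken from the public-inbox tree.

```perl
use strict;
use warnings;

# Hypothetical subroutine, used only for this illustration.
sub greet { "hello, $_[0]" }

# \&greet can only mean "a reference to the subroutine named greet".
my $cb = \&greet;

# *greet is a typeglob: it names every "greet" symbol at once
# (scalar, array, hash, subroutine and file handle).
my $glob_ref = \*greet;

print ref($cb), "\n";         # CODE
print ref($glob_ref), "\n";   # GLOB
print $cb->('world'), "\n";   # hello, world
```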
2020-09-10	xt/solver: test with public-inbox-httpd, too
	We'll be making changes to solver to make it even fairer to slow clients on slow storage. Ensure we test with public-inbox-httpd-specific codepaths, since the generic PSGI code paths are rare in production use.
2020-09-03	tests: add "use strict" and declare v5.10.1 compatibility
	strict.pm helped me find a typo in an upcoming change, so ensure we use it since it does more good than harm. We'll also take the opportunity here to declare v5.10.1 compatibility level to future-proof against Perl incompatibilities.
2020-09-03	use more idiomatic internal API for ->over access
	{over_ro} being a part of the Search object is a historical oddity which will go away soon. Let's start removing its use in tests and rarely-used helper scripts.
2020-07-26	xt/imapd-mbsync-oimapd: fix noop due to case sensitivity
	mbsync was not retrieving anything since it was looking for "inbox" when we need to return "INBOX" as a special case for IMAP. Fixes: 8af34015e9aa94e5 (imap: LIST shows "INBOX" in all caps)
2020-07-13	xt/mem-imapd-tls: avoid EMFILE in -imapd process
	Test::More dups standard FDs and may create FDs for other purposes. run_mode => 0 lets us rely on FD_CLOEXEC to ensure -imapd has enough FDs to accept all incoming connections at the cost of higher (one-off) startup time.
2020-07-06	xt/httpd-async-stream: allow more options
	We want to be able to parallelize and stress test more endpoints and toggle `--compressed' and possibly other options in curl.
2020-07-06	mboxgz: do asynchronous git blob retrievals
	This lets the -httpd worker process make better use of time instead of waiting for git-cat-file to respond. With 4 jobs in the new test case against a clone of <https://public-inbox.org/meta/>, a speedup of 10-12% is shown. Even a single job shows a 2-5% improvement on an SSD.
2020-06-28	ds: remove fields.pm usage
	Since the removal of pseudo-hash support in Perl 5.10, the "fields" module no longer provides the space or speed benefits it did in 5.8. It also does not allow for compile-time checks, only run-time checks. To me, the extra developer overhead in maintaining "use fields" args has become a hassle. None of our non-DS-related code uses fields.pm, nor do any of our current dependencies. In fact, Danga::Socket (which DS was originally forked from) and its subclasses are the only fields.pm users I've ever encountered in the wild. Removing fields may make our code more approachable to other Perl hackers. So stop using fields.pm and locked hashes, but continue to document what fields do for non-trivial classes.
2020-06-16	imap: *SEARCH: use Parse::RecDescent
	For properly parsing IMAP search requests, it's easier to use a recursive descent parser generator to deal with subqueries and the "OR" statement. Parse::RecDescent was chosen since it's mature, well-known, widely available and already used by our optional dependencies: Inline::C and Mail::IMAPClient. While it's possible to build Xapian queries without using the Xapian string query parser, this iteration of the IMAP parser still builds a string which is passed to Xapian's query parser for ease-of-diagnostics. Since this is a recursive descent parser dealing with untrusted inputs, subqueries have a nesting limit of 10. I expect that is more than adequate for real-world use.
2020-06-15	testcommon: allow OR-ing module dependencies
	IMAP requires either the Email::Address::XS or Mail::Address package (part of perl-MailTools RPM or libmailtools-perl deb); and Email::Address::XS is not officially packaged for some older distros, most notably CentOS 7.x.
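One way to express "either of these modules" in a test helper is sketched below. The `have_any_mod` name and the "||" separator are assumptions made for this illustration; the commit message does not show the actual TestCommon syntax.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the first loadable module from a
# "||"-separated list of alternatives, or undef if none can be loaded.
sub have_any_mod {
	my ($spec) = @_;
	for my $mod (split(/\|\|/, $spec)) {
		# Only accept plain package names before handing them to eval.
		next unless $mod =~ /\A\w+(?:::\w+)*\z/;
		return $mod if eval "require $mod; 1";
	}
	undef;
}

my $found = have_any_mod('Email::Address::XS||Mail::Address');
print defined($found) ? "using $found\n" : "neither module is installed\n";
```

A test could then skip itself when the helper returns undef, instead of failing on a distro that only packages one of the two modules.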
2020-06-13	imap: introduce memory-efficient uo2m mapping
	Since we limit our mailbox slices to 50K and can guarantee a contiguous UID space for those mailboxes, we can store a mapping of "UID offsets" (not full UIDs) to Message Sequence Numbers as an array of 16-bit unsigned integers in a 100K scalar. For UID-only FETCH responses, we can momentarily unpack the compact 100K representation to a ~1.6M Perl array of IV/UV elements for a slight speedup. Furthermore, we can (ab)use hash key deduplication in Perl5 to deduplicate this 100K scalar across all clients with the same mailbox slice open. Technically we can increase our slice size to 64K w/o increasing our storage overhead, but I suspect humans are more accustomed to slices easily divisible by 10.
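The packed representation described above can be sketched roughly as follows. The variable names, the use of 0 to mark a missing message, and the sample gaps are assumptions made for this illustration; they are not taken from the actual implementation.

```perl
use strict;
use warnings;

# Assumed slice: UIDs 1..50000, with two removed messages for illustration.
my $uid_base = 0;                   # UID offset 0 corresponds to UID $uid_base + 1
my %gone = map { $_ => 1 } (3, 7);  # hypothetical missing UIDs

# Build the UID offset => Message Sequence Number table (0 = no message).
my @uo2m;
my $msn = 0;
for my $uid (1..50000) {
	$uo2m[$uid - $uid_base - 1] = $gone{$uid} ? 0 : ++$msn;
}

# Compact form: one 16-bit unsigned integer per UID offset.
my $packed = pack('S*', @uo2m);
print length($packed), "\n";        # 100000

# Look up a single UID without unpacking the whole scalar.
sub uid2msn {
	my ($buf, $uid) = @_;
	my $off = $uid - $uid_base - 1;
	unpack('S', substr($buf, $off * 2, 2));
}

print uid2msn($packed, 8), "\n";    # 6

# Momentarily expand to a regular Perl array when many lookups are needed.
my @expanded = unpack('S*', $packed);
```

Since a 16-bit unsigned integer holds values up to 65535, the same layout would still work for slices of up to 64K messages, which matches the remark about slice size in the commit message.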
2020-06-13	xt/*: show some tunable parameters
	This will make it easier to show parameters used for testing and potential tweaks to be made.
2020-06-13	imap: omit $UID_END from mailbox name, use index
	Having two large numbers separated by a dash can make visual comparisons difficult when numbers are in the 3,000,000 range for LKML. So avoid the $UID_END value, since it can be calculated from $UID_MIN. And we can avoid large values of $UID_MIN, too, by instead storing the block index and just multiplying it by 50000 (and adding 1) on the server side. Of course, LKML still goes up to 72, at the moment.
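The arithmetic can be sketched as below. The function names and the "example.list" mailbox name are hypothetical, and $UID_END is assumed to be the last UID of a full 50000-message slice.

```perl
use strict;
use warnings;

use constant SLICE_SIZE => 50000;

# Map a block index to the inclusive UID range it covers.
sub slice_range {
	my ($idx) = @_;
	my $uid_min = $idx * SLICE_SIZE + 1;
	my $uid_end = $uid_min + SLICE_SIZE - 1;
	($uid_min, $uid_end);
}

# Map a UID back to the index of the block containing it.
sub uid_slice {
	my ($uid) = @_;
	int(($uid - 1) / SLICE_SIZE);
}

# prints "example.list.72 covers UIDs 3600001-3650000"
printf("%s.%d covers UIDs %d-%d\n", 'example.list', 72, slice_range(72));
```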
2020-06-13	imap: require ".$UID_MIN-$UID_END" suffix
	Finish up the IMAP-only portion of iterative config reloading, which allows us to create all sub-ranges of an inbox up front. The InboxIdler still uses ->each_inbox which will struggle with 100K inboxes. Having messages in the top-level newsgroup name of an inbox will still waste bandwidth for clients which want to do full syncs once there's a rollover to a new 50K range. So instead, make every inbox accessible exclusively via 50K slices in the form of "$NEWSGROUP.$UID_MIN-$UID_END". This introduces the DummyInbox, which makes $NEWSGROUP and every parent component a selectable, empty inbox. This aids navigation with mutt and possibly other MUAs. Finally, the xt/perf-imap-list maintainer test is broken, now, so remove it. The grep perlfunc is already proven effective, and we'll have separate tests for mocking out ~100k inboxes.
2020-06-13	xt/perf-imap-list: time refresh_inboxlist
	It's useful to know how fast SIGHUP can be handled, too.
2020-06-13	xt: add imapd-validate and imapd-mbsync-oimap
	imapd-validate is a beefed up version of our nntpd-validate test which hammers the server with parallel connections over regular IMAP, IMAPS, IMAP+STARTTLS; and COMPRESS=DEFLATE variants of each of those. It uses $START_UID:$END_UID fetch ranges to reduce requests and slurp many responses at once to saturate "git cat-file --batch" processes. mbsync(1) also uses pipelining extensively (but IMHO unnecessarily), so it was able to shake out some bugs in the async git code. Finally, we remove xt/cmp-imapd-compress.t since it's redundant now that we have PublicInbox::IMAPClient to work around bugs in Mail::IMAPClient.
2020-06-13	add imapd compression test
	Include a test for Mail::IMAPTalk, here, since Mail::IMAPClient stalls with compression enabled: https://rt.cpan.org/Ticket/Display.html?id=132720
2020-06-13	imap: use git-cat-file asynchronously
	This ought to improve overall performance with multiple clients. Single client performance suffers a tiny bit due to extra syscall overhead from epoll. This also makes the existing async interface easier-to-use, since calling cat_async_begin is no longer required.
2020-06-13	imap: split out unit tests and benchmarks
	This makes the test code easier-to-manage and allows us to run faster unit tests which don't involve loading Mail::IMAPClient.
2020-05-12	xt/eml_check_limits: check limits against an inbox
	This allows maintainers to easily check limits against the contents of existing inboxes. This script covers most of the new limits enforced by PublicInbox::Eml. Usage is similar to most xt/*.t scripts:
		GIANT_INBOX_DIR=/path/to/inbox prove -bvw xt/eml_check_limits.t
	Setting `TEST_CLASS=PublicInbox::MIME' allows us to check performance and memory use against the old subclass of Email::MIME.
2020-05-09	remove most internal Email::MIME usage
	We no longer load or use Email::MIME outside of comparison tests.
2020-05-09	xt: eml comparison tests
	While our codebase can still work with either MIME implementation, add comparison tests to ensure we handle corner cases in existing archives.
2020-04-20	testcommon: spawn-aware system() and qx[] workalikes
	Barely noticeable on Linux, but this gives a 1-2% speedup on a FreeBSD 11.3 VM and lets us use built-in redirects rather than relying on /bin/sh.
2020-04-07	xt/perf-msgview: update to use git->cat_async
	It's about 5-10% faster on an SMP machine with an SSD, even on a hot Linux page cache.
2020-04-05	xt/msgtime_cmp: fix false positives from msgtime change
	commit d857e7dc0d816b635a7ead09c3273f8c2d2434be ("msgtime: assume +0000 if TZ missing when using Date::Parse") introduced a behavior change which causes false positives when compared to the old code. Update the "old" implementation to match this overdue behavior change.
2020-02-16	view: remove mhref arg from multipart_text_as_html
	No point in passing something on stack only to stash it into the $ctx which holds most other parameters used for rendering the HTML.
2020-02-06	treewide: run update-copyrights from gnulib for 2019
	I didn't wait until September to do it, this year!
2020-01-27	viewdiff: add "b=" param with non-standard diff prefix
	<20180228012207.GB251290@aiede.svl.corp.google.com> (posted to git@vger) uses "i" and "w" prefixes instead of the standard "a" and "b" prefixes. Ensure we emit a "b=$FILENAME" param for the solver endpoint to improve search accuracy, syntax highlighting, and information density in the URL itself.
2020-01-27	xt/perf-msgview: switch to multipart_text_as_html
	It's a more widely-used (but still internal) API which will probably last longer than msg_html. It also reaches deeper into the stack and avoids the overhead of ->getline via PSGI, so it's faster and gives a more accurate measurement of lower-level parts.
2020-01-13	xt/git_async_cmp: do not slurp large OID list into memory
	I somehow thought "foreach (<$cat>)" could work like "while (<$cat>)" when it came to iterating over file handles...
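The difference comes from context: foreach evaluates <$fh> in list context and reads every line before the first iteration, while while reads one line per iteration. A minimal sketch, using an in-memory file handle as a stand-in for the real pipe (the variable names are illustrative only):

```perl
use strict;
use warnings;

# In-memory stand-in for a pipe producing a list of object IDs.
my $data = join('', map { "oid$_\n" } 1..5);

# foreach: <$fh> is evaluated in list context, so the whole input
# is already consumed when the first iteration runs.
open(my $fh, '<', \$data) or die "open: $!";
my $foreach_at_eof;
foreach my $line (<$fh>) {
	$foreach_at_eof //= eof($fh) ? 1 : 0;
}
close($fh);

# while: <$fh> is evaluated in scalar context, one line at a time,
# so the rest of the input is still unread on the first iteration.
open($fh, '<', \$data) or die "open: $!";
my $while_at_eof;
while (my $line = <$fh>) {
	$while_at_eof //= eof($fh) ? 1 : 0;
}
close($fh);

print "foreach at EOF on first iteration: $foreach_at_eof\n";   # 1
print "while at EOF on first iteration: $while_at_eof\n";       # 0
```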
2020-01-12	xt/mem-msgview.t: change to test one multipart message
	A single multipart message is far more common than a reused Message-ID, so rewrite the test to only have a single multipart message. Memory improvements will be implemented in the next commit.
2020-01-11	make Plack optional for non-WWW and non-httpd users
	Some users just want to run -mda, -watch, and/or -nntpd. Let them run just those without forcing them to pull in a bunch of dependencies.
2020-01-05	view: msg_html: reduce memory use on reused MIDs
	In rare cases where Message-IDs get reused, we do not want to hold onto the large Email::MIME objects in memory after showing the first message. So discard each message as soon as we're done using it so we can save memory for the next message. The new and expensive xt/mem-msgview.t test shows a nearly 14MB reduction for two ~7MB messages. run_script() also gets upgraded to make it easier to pass large inputs via IO GLOBs.
2020-01-04	solver: allow literal '\r' character in diff lines
	While filenames are escaped, the actual diff contents may contain an unescaped "\r" carriage return byte not in front of the "\n" line feed. So just allow "\r" to appear in the middle of a line.
2020-01-04	solver: do not enforce order on extended headers
	This is needed to work with patches with many renames, such as what makes "git/eebf7a8/s/?b=t%2Ftest-lib.sh"
2020-01-04	xt/solver.t: real-world regression tests
	There are a lot of test cases which we should probably make self-contained at some point, but right now it's easier to just mark them off in a maintainer test.
2020-01-01	wwwstatic: implement Last-Modified and If-Modified-Since
	We're already serving static files for cgit, and will serve more static files, soon.
2019-12-24	testcommon: add require_mods method and use it
	This cuts down on lines of code in individual test cases and fixes some misnamed error messages by using "$0" consistently. This will also provide us with a method of swapping out dependencies which provide equivalent functionality (e.g. "Xapian" SWIG can replace "Search::Xapian" XS bindings).
2019-12-19	tests: move t/common.perl to PublicInbox::TestCommon
	We want to be able to use run_script with *.t files, so t/common.perl putting subs into the top-level "main" namespace won't work. Instead, make it a module which uses Exporter like other libraries.
2019-12-12	add msgtime_cmp maintainer test
	Changes will be coming for MsgTime to stop depending on Date::Parse due to lack of package availability on OpenBSD and suboptimal performance on RFC822 dates.
2019-12-12	git: async batch interface
	This is a transitionary interface which does NOT require an event loop. It can be plugged into current synchronous code without major surgery. It allows HTTP/1.1 pipelining-like functionality by taking advantage of predictable and well-specified POSIX pipe semantics by stuffing multiple git cat-file requests into the --batch pipe. With xt/git_async_cmp.t and GIANT_GIT_DIR=git.git, the async interface is 10-25% faster than the synchronous interface since it can keep the "git cat-file" process busier. This is expected to improve performance on systems with slower storage (but multiple cores).
2019-11-24	tests: move giant inbox/git dependent tests to xt/
	xt/ is typically reserved for "eXtended tests" intended for the maintainers and not ordinary users. Since these require special configuration and do nothing but waste cycles during startup, they qualify.