Date Commit message
2019-12-19 tests: move t/common.perl to PublicInbox::TestCommon
We want to be able to use run_script with *.t files, so t/common.perl putting subs into the top-level "main" namespace won't work. Instead, make it a module which uses Exporter like other libraries.
2019-12-19 t/*.t: avoid sharing "my" variables in subs
These usages of file-local global variables make the *.t files incompatible with run_script(). Instead, use anonymous subs, "our", or pass the parameter as appropriate.
2019-12-19 msgiter: msg_part_text returns undef on text/html
We want HTML parts to be downloadable, but not displayed as unreadable (but injection-safe) HTML source in our own web and Atom interfaces. This affects indexing, too, as HTML tags/comments won't be indexed anymore, but existing indices are only cleaned after --reindex. HTML-only mail won't be indexed at all, but we won't cross that bridge until somebody cares about that crap. We'll continue to actively discourage such waste of CPU cycles, bandwidth, cache and storage. Fixes: 7d82a8bc04ce2e68 ('handle "multipart/mixed" messages which are not multipart')
2019-12-18 Makefile.PL: sort target and var lists
Sorting makes it easier to review the generated result.
2019-12-18 viewvcs: flesh out some functionality and test
Exposing MAX_SIZE via "our" will make it possible to use it in tests, and to configure it later. Additionally, a file being too big is not an Internal Server Error (HTTP 500), just a memory limit... Some browsers won't show our HTML response with the link to the raw file in case of errors, either, so we'll return 200 to ensure users can use the link to access the raw blob. Finally, throw in some tests to the existing solver_git testcase, since that was incomplete and was pointlessly loading Plack modules without testing PSGI.
2019-12-18 TODO: add UUCP address item
We should support historical archives from the old days, but I'm not sure how to best go about it, for now, given how tricky correct handling of modern email addresses is. We can deal with it if/when somebody decides to import some ancient archives...
2019-12-17 Makefile.PL: allow overriding "prove" from make CLI
Development versions of Perl install "prove$VERSION" where $VERSION is something like "5.31.7". This makes it easier to test everything we have against development versions of Perl5. Note: I could not find a way to get quoting right to use the "--exec $(PERL)" option of prove(1), but that would be the best option for working transparently after running: perl5.31.7 Makefile.PL
2019-12-17 t/edit.t: drop redundant "delete local $ENV{...}"
"delete local" is only available in Perl v5.11.0 and later, and we only depend on Perl v5.10.1. We already localize and delete it as two separate statements immediately above. I wish this was hidden behind a "use feature" flag like other new-fangled things :<. Oh well, I think the oldest Perl actually in use for this project is 5.16 (CentOS 7.x).
2019-12-16 daemon: drop listeners early in master on graceful shutdown
For users not relying on socket activation via systemd (or similar), we want to drop listeners ASAP so another process can bind to their address. While we're at it, disable TTIN and HUP handlers since we have no chance of starting usable workers without listeners.
2019-12-16 daemon: shorten lifetime of listener_names mapping
Keeping a ref to the IO::Socket handle was preventing close(2) from being invoked on graceful shutdown of worker.
2019-12-15 address: explicitly reject local-only addresses
Apparently, neither our previous address parsing code nor Email::Address::XS recognizes local, username-only addresses in the form of <username> (without "@host"). Without this change, Email::Address::XS->address would return "undef", so we need to filter it out via "grep { defined }". It seems the number of cases where users email each other on the same machine is small, and public-inbox won't be able to index addresses for those cases... Oh well :/
2019-12-15 address: use Email::Address::XS if available
Email::Address::XS is a dependency of modern versions of Email::MIME, so it's likely loaded and installed on newer systems, already; and capable of handling more corner-cases than our pure-Perl fallback. We still fall back to the imperfect-but-good-enough-in-practice pure-Perl code while avoiding the non-XS Email::Address (which was susceptible to DoS attacks (CVE-2015-7686)). We just need to keep "git fast-import" happy.
2019-12-15 address: use comment as name if no phrase available
Some users will set their From: headers in the form of: "<user@example.com> (A U Thor)", where their name is in the parenthesized comment. Use that instead of the email address, if available.
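The fallback order described above (phrase, then parenthesized comment, then the address itself) can be sketched roughly as follows. This is an illustration in Python, not the project's Perl code; `display_name` and its regular expression are hypothetical and only cover the simple header shapes mentioned in the message.

```python
import re

# Hypothetical pattern: optional phrase, an address with or without angle
# brackets, and an optional trailing "(comment)". Real From: headers have
# many more forms than this handles.
_ADDR_RE = re.compile(
    r'^\s*(?P<phrase>[^<(]*?)\s*<?(?P<addr>[^\s<>()]+@[^\s<>()]+)>?\s*'
    r'(?:\((?P<comment>[^)]*)\))?\s*$'
)

def display_name(from_header):
    """Return the phrase, else the comment, else the bare address."""
    m = _ADDR_RE.match(from_header)
    if m is None:
        return None
    phrase = m.group('phrase').strip().strip('"')
    if phrase:
        return phrase
    comment = (m.group('comment') or '').strip()
    if comment:
        return comment
    return m.group('addr')
```

With this sketch, `display_name('<user@example.com> (A U Thor)')` yields the comment text rather than the address.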
2019-12-15 inbox: fix periodic git process cleanup
We need to use $PublicInbox::DS::in_loop instead of ::running(). The latter is not valid for systems with signalfd or kqueue and is now gone, completely. Not needing periodic cleanups at all to deal with unlinked pack indices will be a tougher task...
2019-12-15 searchidx: do not modify read-only $1 via git_unquote
git_unquote works in-place, and we sometimes see strange filenames, or badly munged diffs with terminal escape characters (for colorization) end up in emails.
2019-12-14 daemon: use DESTROY for unlinking --pid-file
This gets rid of the last "END{}" block in our code and cleans up a (temporary) circular reference. Furthermore, ensure the cleanup code still works in all configurations by adding tests and testing both the -W1 (default, 1 worker) and -W0 (no workers) code paths.
2019-12-14 ds: move NNTP-only expiration code into DS
We'll be supporting idle timeout for the HTTP code in the future to deal directly with Internet-exposed clients w/o Varnish or nginx.
2019-12-14 ds: move EvCleanup code into DS
EvCleanup only existed because Danga::Socket was a separate component, and cleanup code belongs with the event loop.
2019-12-12 Makefile.PL: fix "dsyn" target
The "dsyn" target needs to remain working, despite still being dependent on GNU-isms at the moment. Fixes: 73fe3421f1ecbdc8 ("build: support doc generation w/o GNU make")
2019-12-12 mbox: do not try to create filename from empty string
This was causing warnings to pop up in syslogs for messages with empty Subject headers.
2019-12-12 msgtime: avoid obviously out-of-range dates (for now)
Wacky dates show up in lore for valid messages. Let's ignore them and let future generations deal with Y10K and time-travel problems.
2019-12-12 Date::Parse is now optional
-mda should not be dealing with broken Date: headers nowadays, so deprioritize Date::Parse in our documentation and internal checks.
2019-12-12 msgtime: drop Date::Parse for RFC2822
Date::Parse is not optimized for RFC2822 dates and isn't packaged on OpenBSD. It's still useful for historical email when email clients were less conformant, but is less relevant for new emails.
2019-12-12 add msgtime_cmp maintainer test
Changes will be coming for MsgTime to stop depending on Date::Parse due to lack of package availability on OpenBSD and suboptimal performance on RFC822 dates.
2019-12-12 git: async batch interface
This is a transitionary interface which does NOT require an event loop. It can be plugged into current synchronous code without major surgery. It allows HTTP/1.1 pipelining-like functionality by taking advantage of predictable and well-specified POSIX pipe semantics by stuffing multiple git cat-file requests into the --batch pipe. With xt/git_async_cmp.t and GIANT_GIT_DIR=git.git, the async interface is 10-25% faster than the synchronous interface since it can keep the "git cat-file" process busier. This is expected to improve performance on systems with slower storage (but multiple cores).
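The pipelining idea can be illustrated with a small Python sketch (not the project's Perl interface): several object IDs are written to the `git cat-file --batch` input before any reply is read, and replies are then consumed in request order. The function names and the `max_inflight` limit are assumptions made for this example. The reply format (`<oid> <type> <size>`, the contents, then a trailing LF, or `<oid> missing`) is the documented `--batch` format.

```python
def read_batch_reply(out):
    """Read one reply from a `git cat-file --batch` output stream."""
    header = out.readline().rstrip(b"\n").split(b" ")
    if header[-1] == b"missing":
        return header[0], None, None
    oid, obj_type, size = header
    data = out.read(int(size))
    out.read(1)  # consume the LF that follows the object contents
    return oid, obj_type, data

def cat_file_pipelined(inp, out, oids, max_inflight=16):
    """Yield (oid, type, data) for each oid, keeping requests in flight.

    max_inflight is a hypothetical cap so queued requests stay far below
    the pipe buffer size and writing a request never blocks.
    """
    inflight = 0
    for oid in oids:
        if inflight >= max_inflight:
            yield read_batch_reply(out)
            inflight -= 1
        inp.write(oid + b"\n")
        inp.flush()
        inflight += 1
    while inflight:
        yield read_batch_reply(out)
        inflight -= 1
```

Against a real repository, `inp` and `out` would be the `stdin` and `stdout` of `subprocess.Popen(["git", "cat-file", "--batch"], stdin=subprocess.PIPE, stdout=subprocess.PIPE)`.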
2019-12-11 build: support doc generation w/o GNU make
We can replace the GNU-isms for building docs with Perl5 equivalents. The only downside is the resulting Makefile gets larger, but that's the price of portability.
2019-12-11 import: (cleanup) drop redundant env arg to run_die
run_die() doesn't require an $env arg, so there's no point passing "undef" to it.
2019-12-11 spawn: remove support for clearing the env
It's unnecessary code which I'm not sure we ever used. In retrospect, completely clearing the environment doesn't make sense for the processes we spawn. We don't need to clobber individual environment variables in our code, either (and if we did for tests, we can use 'local').
2019-12-11 tests: don't repeatly validate NEWS.atom
We can create a stamp to avoid rerunning the check unless NEWS.atom changes (and it will, soon, I hope :>).
2019-12-11 TODO: update and add a few more items
SpamAssassin has used re2c (via sa-compile) for many years, now, and it seems to work fine, there. GMime also looks promising when combined with Inline::C since GMime can operate on mmap-ed regions. Given the inevitable demise of many .orgs when prices rise, supporting a URL rewriter similar to .mailmap makes sense. And HTTP CONNECT seems like something our -httpd can support to let firewalled users read over NNTP.
2019-12-11 ds: ->Reset initializes $nextq
I haven't noticed this being a problem in practice, but be consistent with the rest of the singleton stuff. Since we always call Reset() at load time, only do initialization in that sub and not at declaration.
2019-12-11 t/common: set $0 when running script w/o fork
We can localize changes to $0 so $0 is restored when the "script" sub is done. This will be helpful when we encounter stuck/slow processes during our tests (hopefully never!)
2019-12-11 t: localize the PI_CONFIG env
We don't want the user's ~/.public-inbox/config to be read from during tests. I only noticed this because I had a non-existent pathname for one of my inboxes :x I've also verified this change by running "inotifywait ~/.public-inbox/config -m" in another terminal while running "make check"; (perhaps a portable solution could make it into the test suite).
2019-11-29 replace: quiet "git gc" invocation
Since we give users no indication or control of how "git gc" runs, showing its progress is confusing.
2019-11-29 t/replace: quiet "git fsck" invocation
Test output can be a terminal if running as "perl -I lib t/$FOO.t", and showing fsck progress is pointless for tests.
2019-11-28 t/httpd-unix: FreeBSD expects to fail with EADDRINUSE
Tested on FreeBSD 11.2. I'm starting to think I'm too conservative with this check and it could be safely expanded to cover any OS with UNIX sockets.
2019-11-27 Makefile.PL: MANIFEST dependency fix
We need to force an update to Makefile (not Makefile.PL) when MANIFEST changes. Since "Makefile" (aka. "$(FIRST_MAKEFILE)") is already a single-colon make target, we can't create a double-colon rule to augment it. So we'll continue using a "Makefile.PL" rule, but have it recreate the resulting Makefile. Finally, change the "check" target to use "prove -b" instead of "prove -l" so we test against "blib/lib", since what's in the "blib" dir will be installed. Fixes: 4c20de0694d06ff3 ("Makefile.PL: add dependency on MANIFEST contents")
2019-11-27 httpd|nntpd: avoid missed signal wakeups
Our attempt at using a self-pipe in signal handlers was ineffective, since pure Perl code execution is deferred and Perl doesn't use an internal self-pipe/eventfd. In retrospect, I actually prefer the simplicity of Perl in this regard... We can use sigprocmask() from Perl, so we can introduce signalfd(2) and EVFILT_SIGNAL support on Linux and *BSD-based systems, respectively. These OS primitives allow us to avoid a race where Perl checks for signals right before epoll_wait() or kevent() puts the process to sleep. The (few) systems nowadays without signalfd(2) or IO::KQueue will now see wakeups every second to avoid missed signals.
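The premise behind signalfd(2) and EVFILT_SIGNAL is that a blocked signal stays pending until the process collects it synchronously, so there is no window between "check for signals" and "go to sleep". The Python sketch below illustrates only that premise; it swaps in `signal.sigtimedwait` for signalfd, which is not what the commit does, and it depends on a platform that provides `sigtimedwait` (Linux and the BSDs do). The helper names are hypothetical.

```python
import signal

def block_signals(signums):
    """Block the signals up front so they queue as pending instead of
    interrupting the process asynchronously."""
    signal.pthread_sigmask(signal.SIG_BLOCK, signums)

def next_signal(signums, timeout):
    """Wait up to `timeout` seconds for a pending blocked signal.

    Returns the signal number, or None if the timeout expired. A signal
    that arrived before this call is still pending, so it is not lost.
    """
    info = signal.sigtimedwait(signums, timeout)
    return None if info is None else info.si_signo
```

A daemon following this pattern would call `block_signals` once at startup and make the wait itself the only place where signals are observed.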
2019-11-27 dskqxs: fix missing EV_DISPATCH define
Oops, IO::KQueue support was broken due to this missing constant. Add a new ds-kqxs.t test case to ensure we test the IO::KQueue path if IO::KQueue is available.
2019-11-27 msgtime: deal with strange minutes in TZ offsets
I'm not sure if TZ minute offsets aside from '00' or '30' exist, but let's just deal with them properly when negative. Examples taken from various inboxes on lore.kernel.org. These are mostly messages from spammers, but some are legitimate messages.
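As a hedged illustration of the arithmetic involved (in Python, not the project's Perl), the sign of a numeric offset such as `-0330` has to apply to the hours and the minutes together. Applying it to the hours alone is one plausible way to get negative offsets wrong; whether that was the actual bug is an assumption. `tz_offset_seconds` is a hypothetical helper.

```python
import re

def tz_offset_seconds(tz):
    """Convert a numeric offset like "+0545" or "-0330" to seconds."""
    m = re.fullmatch(r'([+-])(\d{2})(\d{2})', tz)
    if m is None:
        raise ValueError(f"not a numeric TZ offset: {tz!r}")
    sign = -1 if m.group(1) == '-' else 1
    hours, minutes = int(m.group(2)), int(m.group(3))
    # The sign covers the whole offset, minutes included.
    return sign * (hours * 3600 + minutes * 60)
```

For `-0330` this gives -(3 * 3600 + 30 * 60) = -12600 seconds; negating only the hours would give -10800 + 1800 = -9000 instead.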
2019-11-27 t/msgtime: add more checks for known cases
Broken email clients send the darndest things; make sure we can still support them when we make Date::Parse optional.
2019-11-27 t/msgtime: show date in test descriptions
Otherwise it's hard to figure out what fails.
2019-11-24 tests: move giant inbox/git dependent tests to xt/
xt/ is typically reserved for "eXtended tests" intended for the maintainers and not ordinary users. Since these require special configuration and do nothing but waste cycles during startup, they qualify.
2019-11-24 t/perf-*.t: use $ENV{GIANT_INBOX_DIR} consistently
It's more consistent with our current terminology and "PI_DIR" is already used to override ~/.public-inbox/ (which holds "config" and possibly other files which affect all inboxes for a particular user, but is not an inbox itself); so stop advertising GIANT_PI_DIR in skip messages.
2019-11-24 tests: quiet down commit graph
Newer versions of git enable the commit graph by default. Since we blow away our temporary directories every test, generating graphs is a waste and clutters stderr with "Computing commit graph generation numbers" messages.
2019-11-24 tests: use File::Temp->newdir instead of tempdir()
We'll also introduce a tmpdir() API to give tempdirs consistent names.
2019-11-24 xapcmd: replace Xtmpdirs with File::Temp->newdir
Since we're using Perl 5.10.1 and File::Temp 0.19+, we don't need Xtmpdirs at all for cleaning up tempdirs on failure and can just rely on the DESTROY handler provided by File::Temp.
2019-11-24 t/nntpd-validate: get rid of threads dependency
Threads are officially discouraged by perl5-porters and prove problematic with my Perl installation when using run_mode=1 to speed up tests. So just use fork() and pipes to share results from Net::NNTP.
2019-11-24 t/common: start_script replaces spawn_listener
We can shave several hundred milliseconds off tests which spawn daemons by preloading and avoiding startup time for common modules which are already loaded in the parent process. This also gives ENV{TAIL} support to all tests which support daemons which log to stdout/stderr.
2019-11-24 daemon: avoid race when quitting workers
While the master process has a self-pipe to avoid missing signals, worker processes lack that aside from a pipe to detect master death. That pipe doesn't exist when there's no master process, so it's possible DS::close never finishes because it never woke up from epoll_wait. So create a pipe on the worker_quit signal and force it into epoll/kevent so it wakes up right away.
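One way to picture the wakeup trick, sketched in Python rather than the project's Perl: registering the write end of a freshly created, empty pipe for writability gives the event loop a descriptor that is ready immediately, so a blocked wait returns on its next pass. Which pipe end the real code registers, and for which event, is an assumption here, and `force_wakeup` is a hypothetical name.

```python
import os
import selectors

def force_wakeup(sel):
    """Add an always-ready descriptor to `sel` so select() returns at once.

    The write end of an empty pipe is writable right away. Returns both
    pipe ends so the caller can close them once the loop has woken up.
    """
    r, w = os.pipe()
    sel.register(w, selectors.EVENT_WRITE, data="quit")
    return r, w
```

In a worker, something like this would run from the quit handler, and the loop would treat the `"quit"` event as the cue to finish closing its sockets.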