path: root/lib
Date       Commit message
2019-12-24 searchidx: call "++" on PostingIterator instead of "->inc"
The "->inc" method is not yet available in the SWIG-based "Xapian.pm" Perl bindings, so use "++", which is supported by both the XS (Search::Xapian) and SWIG-based Xapian bindings.
2019-12-24 testcommon: add require_mods method and use it
This cuts down on lines of code in individual test cases and fixes some misnamed error messages by using "$0" consistently. It will also give us a way to swap in dependencies which provide equivalent functionality (e.g. the "Xapian" SWIG bindings can replace the "Search::Xapian" XS bindings).
2019-12-24 remove "no warnings 'once'" in a few places
We can use "use" to load the namespace during the interpreter's "BEGIN" phase. While we're at it, use \&coderef syntax explicitly instead of globbing everything.
2019-12-22 nntp: cmd_xover: use named sub for long_response
Introduce xover_i, which does the same thing as the anonymous sub it replaces.
2019-12-22 nntp: hdr_msg_id: use named sub for long_response
Introduce hdr_msgid_range_i, which does the same thing as the anonymous sub it replaces.
2019-12-22 nntp: cmd_newnews: use named sub for long_response
Introduce newnews_i, which does the same thing as the anonymous sub it replaces.
2019-12-22 nntp: cmd_listgroup: use named subs for long_response
Introduce listgroup_range_i and listgroup_all_i subs which do the same things as the anonymous subs they replace.
2019-12-22 nntp: cmd_xrover: use named sub for long_response
Introduce xrover_i which does the same thing as the anonymous sub it replaces.
2019-12-22 nntp: hdr_searchmsg: use named sub for numeric range response
Introduce searchmsg_range_i, which does the same thing as the anonymous sub it replaces.
2019-12-22 nntp: remove cyclic refs from long_response
Leftover cyclic references are a source of memory leaks. While our code is AFAIK unaffected by such leaks at the moment, eliminating a potential source of bugs will make maintenance easier. We make the long_response API cycle-free by stashing the callback into the NNTP object. However, callers need to be updated to get rid of their circular references to $self. We do that by replacing anonymous subs with named subroutine references, such as xref_range_i replacing the formerly anonymous sub inside hdr_xref.
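The cycle-free pattern described above can be sketched in plain Perl. This is a minimal illustration, not the actual PublicInbox::NNTP code: the package HypotheticalConn, its methods, and the count_down_i callback are all made-up names. The object stores a reference to a named sub plus plain arguments, and the callback receives $self as an argument instead of closing over it, so the object is freed as soon as it goes out of scope.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a connection object; all names are illustrative.
package HypotheticalConn;
our $destroyed = 0;    # counts DESTROY calls so the sketch can check for leaks

sub new { return bless { long_cb => undef }, shift }

# Stash a *named* sub reference plus plain arguments in the object.
# Nothing stored here refers back to $self, so no reference cycle forms.
sub long_response {
    my ($self, $cb, @args) = @_;
    $self->{long_cb} = [ $cb, @args ];
    return;
}

# An event loop would call this repeatedly; $self is passed explicitly.
sub run_once {
    my ($self) = @_;
    my ($cb, @args) = @{ $self->{long_cb} };
    return $cb->($self, @args);
}

sub DESTROY { $destroyed++ }

package main;

# Named callback: receives $self as an argument instead of closing over it.
sub count_down_i {
    my ($self, $counter) = @_;    # $counter: scalar ref shared across calls
    return $$counter-- > 0;       # true while work remains
}

{
    my $conn = HypotheticalConn->new;
    my $n = 3;
    $conn->long_response(\&count_down_i, \$n);
    1 while $conn->run_once;
}
# With no cycle, $conn was freed as soon as it went out of scope.
print "destroyed: $HypotheticalConn::destroyed\n";    # destroyed: 1
```

Had the callback been an anonymous sub referring to $conn and stored in $conn->{long_cb}, the refcount would never reach zero and DESTROY would not run until global destruction. Passing a scalar reference ($counter) is also what lets named subs share mutable state across calls.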
2019-12-22 nntp: get_range: return scalarref for $beg
...instead of just returning a plain scalar inside an arrayref. This is because we usually pass the result of NNTP::get_range to Msgmap::msg_range. Upcoming changes will move us away from anonymous subroutines, so this change makes the followup commits easier to digest, since modifications to the underlying scalar can be propagated more easily between named subs.
2019-12-22 http: avoid anonymous sub for getline callback
We can avoid the danger of self-referential subs entirely for code internal to PublicInbox::HTTP. This change was only made possible by commit 8e1c3155da4edc082e8e3d8b30351f0c861757a7 ("ds: pass $self to code references")
2019-12-22 http: get rid of anonymous subs for write/close
Each sub costs us several kilobytes of memory for every response we make. An arrayref only costs 80 bytes on 64-bit, so bless that to packages with appropriate ->write and ->close methods.
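A rough sketch of the "blessed arrayref instead of closures" idea above, assuming a hypothetical HypotheticalBodyWriter package (not PublicInbox::HTTP's actual classes) and an arrayref standing in for the client socket. Each response allocates one small array rather than a pair of anonymous subs, and the ->write and ->close methods are shared by every instance.

```perl
use strict;
use warnings;

# Hypothetical response writer: rather than allocating a pair of closures
# (one for write, one for close) per response, bless a small arrayref that
# holds only the state those methods need.
package HypotheticalBodyWriter;

sub new {
    my ($class, $sink) = @_;              # $sink: arrayref standing in for a socket
    return bless [ $sink, 0 ], $class;    # [0] = sink, [1] = bytes written
}

sub write {
    my ($self, $buf) = @_;
    push @{ $self->[0] }, $buf;
    $self->[1] += length($buf);
    return 1;
}

sub close {
    my ($self) = @_;
    push @{ $self->[0] }, "<done after $self->[1] bytes>";
    return 1;
}

package main;

my @sink;
my $body = HypotheticalBodyWriter->new(\@sink);
$body->write('hello ');
$body->write('world');
$body->close;
print join('|', @sink), "\n";    # hello |world|<done after 11 bytes>
```

Method names such as write and close shadow builtins only when called as plain functions inside the same package; called as methods, as here, there is no ambiguity.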
2019-12-22 nntp: get rid of some unused imports
Our NNTP code no longer relies on search or Xapian. Msgmap and Git modules are loaded anyway through the Inbox->(git|mm|over) methods, however.
2019-12-22 nntp: simplify method detection using UNIVERSAL::can
No need to do an eval dance or disable strict refs.
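For readers unfamiliar with the idiom, here is a small sketch of command dispatch through UNIVERSAL::can. HypotheticalServer and its cmd_* methods are illustrative names, not the real PublicInbox::NNTP code. can() returns a code reference (or undef), so no string eval or "no strict 'refs'" is needed.

```perl
use strict;
use warnings;

# Hypothetical dispatcher: map a client command to a "cmd_*" method.
# $self->can() is UNIVERSAL::can, inherited by every class, so no string
# eval and no "no strict 'refs'" are needed.
package HypotheticalServer;

sub new      { return bless {}, shift }
sub cmd_date { return '111 20191222000000' }
sub cmd_help { return '100 help text follows' }

sub dispatch {
    my ($self, $cmd, @args) = @_;
    my $meth = $self->can('cmd_' . lc($cmd))
        or return '500 command not recognized';
    return $self->$meth(@args);
}

package main;

my $srv = HypotheticalServer->new;
print $srv->dispatch('DATE'), "\n";     # 111 20191222000000
print $srv->dispatch('BOGUS'), "\n";    # 500 command not recognized
```

Prefixing the looked-up name ("cmd_") keeps clients from invoking arbitrary methods such as new or dispatch.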
2019-12-22 testcommon: require_git: use "plan" from Test::More
require_git is no longer in the "::main" namespace, so we must call Test::More::plan() explicitly.
2019-12-21 searchview: save a column in &x=t thread skeleton
Displaying "100%" wastes a precious column. Show "99%" instead, since there's little practical difference and <xapian/mset.h> states: "Note that these generally aren't percentages of anything meaningful (unless you use a custom weighting formula where they are!)". And we're not using a custom weighting formula.
2019-12-20 view: show percentage in search results thread skeleton
This displays the Xapian ->get_percent value in the skeleton to make relevancy easier to scan; irrelevant results do not display it. This also fixes the broken #anchor links introduced in the previous commit: irrelevant messages now link to the /$INBOX/$MESSAGE_ID page.
2019-12-20 searchthread: fix usage of user-supplied parameter
Instead of only passing an Inbox object, we'll pass the $ctx reference, as PublicInbox::SearchView::mset_thread did. So although mset_thread was wrong, we now make its usage of SearchThread::thread correct and update other callers to favor the new style of passing the entire $ctx (with ->{-inbox}) instead of just the Inbox object. This makes the thread skeleton at the bottom of the search page show the subjects of messages, but unfortunately it links to non-existent #anchors. The next commit will fix that. While we're at it, favor "\&foo" over "*foo", since the former makes the code reference (aka "function pointer") obvious, so it won't be confused with other things named "foo" in that scope (e.g. $foo/@foo/%foo).
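To make the "\&foo" versus "*foo" distinction concrete, here is a short illustration (the names are arbitrary). A typeglob covers every slot sharing the name, while \&foo refers to the subroutine alone.

```perl
use strict;
use warnings;

# Several things can share the name "foo" in one package.
our $foo = 'scalar foo';
our @foo = (1, 2, 3);
sub foo { return 'sub foo' }

# *foo names the whole typeglob (the $foo, @foo, %foo and &foo slots),
# while \&foo is unambiguously a reference to the sub alone.
my $glob = \*foo;
my $code = \&foo;

print ref($glob), "\n";    # GLOB
print ref($code), "\n";    # CODE
print $code->(), "\n";     # sub foo
```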
2019-12-20 testcommon: fix run_script for older Perls
Using Perl "open" to dup(2) and save the old handles is required since "local *STDIN = *STDIN" does not work on old Perls. Even worse, this was silently a no-op when tested with Perl 5.24.1 on Debian 9.x and led to confusing failures in the t/httpd-corner.t lsof(1) tests when run after t/v2mirror.t from the same worker process using t/run.perl.
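A minimal sketch of saving and restoring a standard handle with a real dup(2), as the change above describes. with_stdout_to is a hypothetical helper written for this illustration; it is not the actual run_script implementation.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Run $code with STDOUT redirected to $path, then restore the original.
# The old handle is saved with a real dup(2) via open(..., '>&', ...)
# rather than "local *STDOUT = *STDOUT". Hypothetical helper, not the
# actual run_script implementation.
sub with_stdout_to {
    my ($path, $code) = @_;
    open(my $saved, '>&', \*STDOUT) or die "dup STDOUT: $!";
    open(STDOUT, '>', $path) or die "redirect STDOUT: $!";
    my $ok = eval { $code->(); 1 };
    my $err = $@;
    # Re-opening STDOUT closes (and flushes) the redirected handle first.
    open(STDOUT, '>&', $saved) or die "restore STDOUT: $!";
    die $err unless $ok;
    return;
}

my (undef, $path) = tempfile(UNLINK => 1);
with_stdout_to($path, sub { print "captured\n" });

open(my $in, '<', $path) or die "open $path: $!";
my $got = do { local $/; <$in> };
print "file contains: $got";    # file contains: captured
```

Because open() on STDOUT reuses file descriptor 1, child processes spawned inside $code would also write to the redirected file, which a glob-level "local" cannot guarantee.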
2019-12-19 t/run.perl: to avoid repeated process spawning for *.t
Spawning a new Perl interpreter for every test case means Perl has to reparse and recompile every single file it needs, costing us performance and development time. Now that we've modified our code to avoid global state, we can preload everything we need. The new "check-run" test target is now 20-30% faster than the original "check" target.
2019-12-19 tests: move t/common.perl to PublicInbox::TestCommon
We want to be able to use run_script with *.t files, so t/common.perl putting subs into the top-level "main" namespace won't work. Instead, make it a module which uses Exporter like other libraries.
2019-12-19 msgiter: msg_part_text returns undef on text/html
We want HTML parts to be downloadable, but not displayed as unreadable (but injection-safe) HTML source in our own web and Atom interfaces. This affects indexing, too, as HTML tags/comments won't be indexed anymore, but existing indices are only cleaned after --reindex. HTML-only mail won't be indexed at all, but we won't cross that bridge until somebody cares about that crap. We'll continue to actively discourage such waste of CPU cycles, bandwidth, cache and storage. Fixes: 7d82a8bc04ce2e68 ("handle "multipart/mixed" messages which are not multipart")
2019-12-18 viewvcs: flesh out some functionality and test
Exposing MAX_SIZE via "our" will make it possible to use it in tests, and to make it configurable later. Additionally, returning HTTP 500 for big files is wrong: hitting a memory limit is not an Internal Server Error. Some browsers won't show our HTML response with the link to the raw file in case of errors, either, so we'll return 200 to ensure users can use the link to access the raw blob. Finally, throw some tests into the existing solver_git test case, since it was incomplete and was pointlessly loading Plack modules without testing PSGI.
2019-12-16 daemon: drop listeners early in master on graceful shutdown
For users not relying on socket activation via systemd (or similar), we want to drop listeners ASAP so another process can bind to their address. While we're at it, disable TTIN and HUP handlers since we have no chance of starting usable workers without listeners.
2019-12-16 daemon: shorten lifetime of listener_names mapping
Keeping a ref to the IO::Socket handle was preventing close(2) from being invoked on graceful shutdown of a worker.
2019-12-15 address: explicitly reject local-only addresses
Apparently, neither our previous address parsing code nor Email::Address::XS recognizes local, username-only addresses in the form of <username> (without "@host"). Without this change, Email::Address::XS->address would return "undef", so we need to filter it out via "grep { defined }". Cases where users email each other on the same machine seem rare, and public-inbox won't be able to index addresses for those cases... Oh well :/
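The "grep { defined }" filter mentioned above works like this. addr_or_undef is a hypothetical, deliberately simplistic extractor standing in for a real address parser; it returns undef for a local-only "<username>", which the grep then drops.

```perl
use strict;
use warnings;

# Hypothetical extractor standing in for a real address parser: it returns
# undef for a local-only "<username>" with no "@host".
sub addr_or_undef {
    my ($s) = @_;
    return $s =~ /([^<>\s]+\@[^<>\s]+)/ ? $1 : undef;
}

my @input = ('<a@example.com>', '<username>', 'b@example.org');
my @addrs = grep { defined } map { addr_or_undef($_) } @input;
print "@addrs\n";    # a@example.com b@example.org
```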
2019-12-15 address: use Email::Address::XS if available
Email::Address::XS is a dependency of modern versions of Email::MIME, so it's likely already installed (and loaded) on newer systems, and it handles more corner cases than our pure-Perl fallback. We still fall back to the imperfect-but-good-enough-in-practice pure-Perl code while avoiding the non-XS Email::Address (which was susceptible to DoS attacks (CVE-2015-7686)). We just need to keep "git fast-import" happy.
2019-12-15 address: use comment as name if no phrase available
Some users will set their From: headers in the form of: "<user@example.com> (A U Thor)", where their name is in the parenthesized comment. Use that instead of the email address, if available.
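A simplified sketch of the fallback order described above: phrase first, then the parenthesized comment, then the address. display_name is a hypothetical helper with regex-based parsing; it is not an RFC 5322 parser and not the code in PublicInbox::Address.

```perl
use strict;
use warnings;

# Hypothetical helper: pick a display name from a From: header value.
# Preference order: phrase, then parenthesized comment, then the address.
# A simplified sketch, not an RFC 5322 parser.
sub display_name {
    my ($hdr) = @_;
    # A U Thor <user@example.com>, optionally with the name in double quotes
    return $1 if $hdr =~ /\A\s*"?([^"<]+?)"?\s*<[^>]+>/;
    # <user@example.com> (A U Thor)
    return $1 if $hdr =~ /\(\s*([^)]+?)\s*\)/;
    # No phrase or comment: fall back to the address.
    return $1 if $hdr =~ /<([^>]+)>/;
    $hdr =~ s/\A\s+|\s+\z//g;
    return $hdr;
}

print display_name('<user@example.com> (A U Thor)'), "\n";    # A U Thor
print display_name('A U Thor <user@example.com>'), "\n";      # A U Thor
print display_name('<user@example.com>'), "\n";               # user@example.com
```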
2019-12-15 inbox: fix periodic git process cleanup
We need to use $PublicInbox::DS::in_loop instead of ::running(). The latter is not valid for systems with signalfd or kqueue and is now gone, completely. Not needing periodic cleanups at all to deal with unlinked pack indices will be a tougher task...
2019-12-15 searchidx: do not modify read-only $1 via git_unquote
git_unquote works in-place, so it must not be handed the read-only $1. This matters because strange filenames, or badly munged diffs with terminal escape characters (for colorization), sometimes end up in emails.
2019-12-14 daemon: use DESTROY for unlinking --pid-file
This gets rid of the last "END{}" block in our code and cleans up a (temporary) circular reference. Furthermore, ensure the cleanup code still works in all configurations by adding tests and testing both the -W1 (default, 1 worker) and -W0 (no workers) code paths.
2019-12-14 ds: move NNTP-only expiration code into DS
We'll be supporting idle timeout for the HTTP code in the future to deal directly with Internet-exposed clients w/o Varnish or nginx.
2019-12-14 ds: move EvCleanup code into DS
EvCleanup only existed since Danga::Socket was a separate component, and cleanup code belongs with the event loop.
2019-12-12 mbox: do not try to create filename from empty string
This was causing warnings to pop up in syslogs for messages with empty Subject headers.
2019-12-12 msgtime: avoid obviously out-of-range dates (for now)
Wacky dates show up in lore for valid messages. Let's ignore them and let future generations deal with Y10K and time-travel problems.
2019-12-12 Date::Parse is now optional
-mda should not be dealing with broken Date: headers nowadays, so deprioritize Date::Parse in our documentation and internal checks.
2019-12-12 msgtime: drop Date::Parse for RFC2822
Date::Parse is not optimized for RFC2822 dates and isn't packaged on OpenBSD. It's still useful for historical email when email clients were less conformant, but is less relevant for new emails.
2019-12-12 git: async batch interface
This is a transitional interface which does NOT require an event loop. It can be plugged into current synchronous code without major surgery. It allows HTTP/1.1 pipelining-like functionality by taking advantage of predictable and well-specified POSIX pipe semantics, stuffing multiple git cat-file requests into the --batch pipe. With xt/git_async_cmp.t and GIANT_GIT_DIR=git.git, the async interface is 10-25% faster than the synchronous interface since it can keep the "git cat-file" process busier. This is expected to improve performance on systems with slower storage (but multiple cores).
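The pipelining idea above can be sketched without an event loop: write several requests before reading any response, keep a FIFO of callbacks, and rely on the pipe preserving order. In this illustration `cat` stands in for "git cat-file --batch" so it runs without a repository; queue_request and drain_responses are hypothetical names and the sketch ignores git's actual response format.

```perl
use strict;
use warnings;
use IO::Handle;
use IPC::Open2 qw(open2);

# Sketch of pipelined requests over one long-lived child process. `cat`
# stands in for "git cat-file --batch" so this runs without a repository:
# each request line comes back unchanged, in the order it was written.
my $pid = open2(my $rd, my $wr, 'cat');
my @inflight;    # FIFO of callbacks, one per request already written

# Write the request immediately; do not wait for its response.
sub queue_request {
    my ($req, $cb) = @_;
    print {$wr} "$req\n" or die "write: $!";
    push @inflight, $cb;
    return;
}

# Pipe semantics keep responses in request order, so the oldest
# callback always owns the next response.
sub drain_responses {
    $wr->flush or die "flush: $!";
    while (my $cb = shift @inflight) {
        defined(my $line = <$rd>) or die "premature EOF";
        chomp $line;
        $cb->($line);
    }
    return;
}

my @got;
queue_request($_, sub { push @got, "resp:$_[0]" }) for qw(a b c);
drain_responses();
close $wr;
waitpid($pid, 0);
print "@got\n";    # resp:a resp:b resp:c
```

With a real batch process, unbounded queuing can deadlock once both pipe buffers fill, so a production version would need to cap the number of in-flight requests.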
2019-12-11 import: (cleanup) drop redundant env arg to run_die
run_die() doesn't require an $env arg, so there's no point passing "undef" to it.
2019-12-11 spawn: remove support for clearing the env
It's unnecessary code which I'm not sure we ever used. In retrospect, completely clearing the environment doesn't make sense for the processes we spawn. We don't need to clobber individual environment variables in our code, either (and if we did for tests, we can use 'local').
2019-12-11 ds: ->Reset initializes $nextq
I haven't noticed this being a problem in practice, but be consistent with the rest of the singleton stuff. Since we always call Reset() at load time, only do initialization in that sub and not at declaration.
2019-11-29 replace: quiet "git gc" invocation
Since we give users no indication or control of how "git gc" runs, showing its progress is confusing.
2019-11-27 httpd|nntpd: avoid missed signal wakeups
Our attempt at using a self-pipe in signal handlers was ineffective, since pure Perl code execution is deferred and Perl doesn't use an internal self-pipe/eventfd. In retrospect, I actually prefer the simplicity of Perl in this regard... We can use sigprocmask() from Perl, so we can introduce signalfd(2) and EVFILT_SIGNAL support on Linux and *BSD-based systems, respectively. These OS primitives allow us to avoid a race where Perl checks for signals right before epoll_wait() or kevent() puts the process to sleep. The (few) systems nowadays without signalfd(2) or IO::KQueue will now see wakeups every second to avoid missed signals.
2019-11-27 dskqxs: fix missing EV_DISPATCH define
Oops, IO::KQueue support was broken due to this missing constant. Add a new ds-kqxs.t test case to ensure we test the IO::KQueue path if IO::KQueue is available.
2019-11-27 msgtime: deal with strange minutes in TZ offsets
I'm not sure if TZ minute offsets aside from '00' or '30' exist, but let's just deal with them properly when negative. Examples were taken from various inboxes on lore.kernel.org. These are mostly messages from spammers, but some are legitimate messages.
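The sign handling for negative offsets can be shown with a small hypothetical helper (tz_offset_seconds is not a function from msgtime). The key point is to combine hours and minutes first and apply the sign afterward, so "-0130" becomes -(1h30m) rather than -1h + 30m.

```perl
use strict;
use warnings;

# Hypothetical helper: convert a "+HHMM"/"-HHMM" offset to seconds.
# Combine hours and minutes first, then apply the sign, so that
# "-0130" means -(1h30m) rather than -1h + 30m.
sub tz_offset_seconds {
    my ($tz) = @_;
    my ($sign, $hh, $mm) = ($tz =~ /\A([+-])([0-9]{2})([0-9]{2})\z/)
        or return;
    my $secs = $hh * 3600 + $mm * 60;
    return $sign eq '-' ? -$secs : $secs;
}

print tz_offset_seconds('-0130'), "\n";    # -5400
print tz_offset_seconds('+0530'), "\n";    # 19800
print tz_offset_seconds('-0017'), "\n";    # -1020
```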
2019-11-24 xapcmd: replace Xtmpdirs with File::Temp->newdir
Since we're using Perl 5.10.1 and File::Temp 0.19+, we don't need Xtmpdirs at all for cleaning up tempdirs on failure and can just rely on the DESTROY handler provided by File::Temp.
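A brief sketch of relying on File::Temp->newdir for cleanup on failure, as described above. The failing "rebuild" step and the xapcmd-XXXXXXXX template are made up for illustration. The File::Temp::Dir object removes its directory tree in DESTROY, which also runs when the scope is left via die.

```perl
use strict;
use warnings;
use File::Temp ();

# Hypothetical rebuild step that fails midway. The File::Temp::Dir object
# removes its directory (recursively) in DESTROY when it goes out of scope,
# including when the scope is left via die.
my $path;
eval {
    my $tmp = File::Temp->newdir('xapcmd-XXXXXXXX', TMPDIR => 1);
    $path = $tmp->dirname;
    open(my $fh, '>', "$path/partial") or die "open: $!";
    print {$fh} "half-written\n";
    close($fh) or die "close: $!";
    die "simulated failure\n";
};
print((-d $path) ? "leaked\n" : "cleaned up\n");    # cleaned up
```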
2019-11-24 daemon: avoid race when quitting workers
While the master process has a self-pipe to avoid missing signals, worker processes lack that aside from a pipe to detect master death. That pipe doesn't exist when there's no master process, so it's possible DS::close never finishes because it never woke up from epoll_wait. So create a pipe on the worker_quit signal and force it into epoll/kevent so it wakes up right away.
2019-11-24 daemon: use sigprocmask when respawning workers
We need to block signals in workers during respawns until they're ready to receive signals.
2019-11-24 daemon: use sigprocmask to block signals at startup
`$SIG{FOO} = "IGNORE"' will cause the daemon to miss signals entirely. Instead, we can use sigprocmask to block signal delivery until our signal handlers are set up. This closes a race where a PID file can be written for an init script and a signal dropped via "IGNORE".
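The difference between ignoring and blocking can be demonstrated with the POSIX module. This is a standalone sketch, not the daemon's startup code, and $got_hup is an illustrative name. A HUP sent while blocked stays pending and reaches the handler once it is installed and the signal is unblocked; with "IGNORE" it would simply be discarded.

```perl
use strict;
use warnings;
use POSIX qw(:signal_h);

# Sketch: block SIGHUP during startup instead of setting
# $SIG{HUP} = 'IGNORE'. A blocked signal stays pending, while an ignored
# one is discarded.
my $got_hup = 0;
my $block = POSIX::SigSet->new(SIGHUP);
sigprocmask(SIG_BLOCK, $block) or die "sigprocmask: $!";

# Simulate a HUP arriving early, e.g. right after a PID file is written.
kill('HUP', $$) or die "kill: $!";

$SIG{HUP} = sub { $got_hup++ };    # install handlers while still blocked
sigprocmask(SIG_UNBLOCK, $block) or die "sigprocmask: $!";    # then unblock

# Perl defers handler execution to a safe point; allow a moment for it.
for (1 .. 100) {
    last if $got_hup;
    select(undef, undef, undef, 0.01);
}
print "HUP handled: $got_hup\n";    # HUP handled: 1
```

A real daemon would more likely save the old mask with a third SigSet argument and restore it with SIG_SETMASK; SIG_UNBLOCK keeps this sketch independent of whatever mask it inherited.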