public-inbox.git - an "archives first" approach to mailing lists

Date	Commit message
2020-09-01	watch: block signals before fork on non-signalfd/kevent systems
	In case there are non-Linux or BSD users w/o IO::KQueue, we shouldn't let signal handlers fire in the child processes. The child processes always assumed signals were blocked by the parent, so no changes were necessary there.
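A minimal sketch of the block-before-fork pattern described above, written in Python rather than the project's Perl. The signal set and the `spawn_with_signals_blocked` helper are hypothetical names chosen for illustration; they are not public-inbox code.

```python
import os
import signal

# Hypothetical set of signals the parent normally handles.
BLOCKED = {signal.SIGTERM, signal.SIGINT, signal.SIGUSR1}

def spawn_with_signals_blocked(child_main):
    """Fork with BLOCKED masked so parent handlers cannot fire in the child."""
    # Block first: the child inherits the signal mask at fork() time.
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, BLOCKED)
    try:
        pid = os.fork()
        if pid == 0:
            # Child: signals are still blocked here.
            status = 1
            try:
                status = int(child_main() or 0)
            except BaseException:
                status = 1
            finally:
                os._exit(status)  # never return into the parent's code path
        return pid
    finally:
        # Parent only: restore the original mask after the fork.
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
```

The parent restores its mask only after `fork()` returns, so there is no window in which the child runs with the parent's handlers active and signals unblocked.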
2020-09-01	watch: avoid unnecessary spawning on spam removals
	This should further mitigate lock contention problems when -watch is configured to watch a Maildir for spam while performing a large NNTP import. There is now a small risk that a message won't get removed if it's in the current (uncommitted) fast-import batch, but that is unlikely given the batch size is now only 10 messages. If that small window is hit, flipping the \Seen flag (e.g. marking it unread, and then read again) will trigger another removal attempt via IMAP or Maildir.
2020-09-01	rename WatchMaildir => Watch
	This is no longer limited to Maildirs now that IMAP and NNTP support exist; so give it a shorter name.
2020-09-01	watchmaildir: use v5.10.1, drop warnings
	Declare 5.10.1 to avoid potential compatibility problems with Perl 7/8 down the line. We'll rely on the command-line to set or drop warnings during development, at least.
2020-09-01	watch: limit batch size of NNTP and IMAP workers, too
	We don't want to monopolize locks because processes can easily block each other if using `watchspam' on a Maildir while a big NNTP or IMAP import is happening. This can also happen if somebody configured a single inbox to watch from several sources to merge several mailboxes into one (e.g. both an IMAP and Maildir are watched).
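The idea of limiting batch size so that one importer cannot monopolize a lock can be sketched as follows. This is an illustration in Python, not the project's Perl; `batched`, `import_in_batches`, and the `commit` callback are hypothetical names, and the batch size of 10 is taken from the "avoid unnecessary spawning on spam removals" entry in this log.

```python
import threading
from itertools import islice

BATCH_SIZE = 10  # batch size mentioned elsewhere in this log

def batched(items, size=BATCH_SIZE):
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch

def import_in_batches(messages, lock, commit):
    """Hold `lock` for one batch at a time so other writers can interleave."""
    for batch in batched(messages):
        with lock:
            commit(batch)  # hypothetical callback that writes one batch
```

Because the lock is released between batches, a process handling spam removal only has to wait for at most one small batch rather than for the whole import.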
2020-08-30	imapd: filter out unusable flags from search
	Quiet down logs from -imapd when clients are blindly sending some unsupported flag conditions (e.g. "DRAFT", "DELETED") specified in RFC 3501.
2020-08-28	imaptracker: update_last: simplify callers
	By making it a no-op if last_uid is not defined. This isn't a hot code path, so the extra method dispatch isn't an issue. It'll save some indentation/wrapping in future commits.
2020-08-28	watch: flush changes to inbox before updating IMAPTracker
	Data needs to hit inboxes, first. Otherwise it's possible to skip messages in case git-fast-import is killed before it sees "done\n". Now, -watch will just waste a little bandwidth in re-downloading a seen message if it's interrupted immediately before updating IMAPTracker.
2020-08-28	www: more descriptive pagination
	Being an easily confused person, I find "next" and "prev" ambiguous as to whether messages on the next or previous page will be newer or older than the current page. Clarify that for the threaded /$INBOX/ view and search results. For search results sorted by relevance, we'll use "[>= $SCORE]" or "[<= $SCORE]" to indicate directionality. This also fixes $INBOX/new.html for unindexed v1 inboxes.
2020-08-28	www: improve navigation around contemporary threads
	Sometimes it's useful to quickly get to threads and messages which are contemporaries of the current thread/message being focused on. This hopefully improves navigation by: a) making the top line (where $INBOX_DIR/description is shown) a link to the latest topics in search results and per-thread/per-message views. b) providing a link to contemporaries ("~YYYY-MM-DD") around the thread overview skeleton area for per-thread and per-message views.
2020-08-27	watch: imap: only remove \Seen spam
	This matches the behavior of Maildir `watchspam' handling in not removing unseen messages. NNTP can't match this behavior, since NNTP servers don't store flags, clients do.
2020-08-27	overidx: inline create_ghost sub
	There's no need for this to be a separate sub since there's only a single caller. This saves a few kilobytes at least in short-lived processes.
2020-08-27	imaptracker: preserve WAL journal_mode if set by user
	It's no problem for most users to enable WAL, here, since there's only a single process doing both reading and writing (unlike the read-only daemons). However, WAL doesn't work on network filesystems, so it can't be enabled by default.
2020-08-27	watchmaildir: ensure I:/W:/E: prefixes in warnings
	For consistency in output, any URL/path-context-dependent prefixes should have the same prefix as the actual warning which triggered it.
2020-08-27	git: show more context info on failures
	I'm seeing "read: Connection timed out" in my syslog from -httpd. The fail() calls in PublicInbox::Git seem to be the only code path of ours which could trigger it... ETIMEDOUT shouldn't happen on pipes, only sockets; and all of our socket operations are non-blocking. So this could be cgit-wwwhighlight-filter.lua, but that's connecting over localhost, though on fairly loaded HW.
2020-08-27	search: allow testing with current xapian.git and 1.5.x
	A `PI_XAPIAN' environment variable is now exposed for testing purposes. We'll also deal with the removal of `NumberValueRangeProcessor' and use `NumberRangeProcessor' in its place, but continue favoring the old Search::Xapian since that's all that's packaged for Debian 10.x stable.
2020-08-27	msgmap: use v5.10.1
	We use the defined-or (`//', `//=') operators in 5.10, so require 5.10.1 like the rest of our codebase. Update an outdated comment while we're at it.
2020-08-27	over*: use v5.10.1, drop warnings
	v5.10.1 lets us use the lighter parent.pm instead of base.pm, and we'll rely on the shebang to enable warnings (or not). While we're in the area, drop a no-longer-necessary import for PublicInbox::Search, since OverIdx doesn't require search.
2020-08-27	over: recent: remove expensive COUNT query
	As noted in commit 87dca6d8d5988c5eb54019cca342450b0b7dd6b7 ("www: rework query responses to avoid COUNT in SQLite"), COUNT on many rows is expensive on big SQLite DBs. We've already stopped using that code path long ago in WWW while -imapd and -nntpd never used it. So we'll adjust our remaining test cases to not need it, either.
2020-08-27	over: rename ->disconnect to ->dbh_close
	Since we got rid of over->connect, `disconnect' no longer pairs with it. So name it after the `close(2)' syscall it ultimately issues.
2020-08-27	over: rename ->connect method to ->dbh
	`->connect' is confused with the perlfunc for the `connect(2)' syscall, and also `DBI->connect'. Since SQLite doesn't use sockets, the word "connect" needlessly confuses me. Give it a short name to match the field name we use for it, which also matches the variable name used by the DBI(3pm) and DBD::SQLite(3pm) manpages.
2020-08-26	v2writable: compatibility with SWIG Xapian binding
	The SWIG binding won't auto-convert IV/UV to PV like the XS Search::Xapian binding would, so work around that shortcoming for now. Fixes: a367ec1b15a2458 ("mbox: disable "&t" on existing Xapian until full reindex")
2020-08-26	over+msgmap: respect WAL journal_mode if set
	WAL actually seems to have ideal locking characteristics given concurrency problems I'm experiencing with --reindex running in parallel with expensive read-only SQLite queries: <https://public-inbox.org/meta/20200825001204.GA840@dcvr/> Unfortunately, we cannot blindly use WAL while preserving compatibility with existing setups nor our guarantees that read-only daemons are indeed "read-only". However, respect a user's choice to set WAL on their own if they're comfortable with giving -nntpd/-httpd/-imapd processes write permission to the directory storing SQLite DBs.
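A rough sketch of "respect WAL if the user already set it", using Python's `sqlite3` module rather than the project's Perl DBI code. The `open_db` helper is hypothetical, and the choice of TRUNCATE as the fallback mode is an assumption made for this example.

```python
import sqlite3

def open_db(path, default_mode="TRUNCATE"):
    """Open `path`, keeping WAL if the database file is already in WAL mode."""
    conn = sqlite3.connect(path)
    # WAL is persistent: it is recorded in the database file itself.
    current = conn.execute("PRAGMA journal_mode").fetchone()[0]
    if current.lower() != "wal":
        # PRAGMA values cannot be bound as parameters, so interpolate a
        # trusted constant (assumed fallback mode for this sketch).
        conn.execute(f"PRAGMA journal_mode = {default_mode}")
    return conn
```

An administrator who wants WAL would enable it once (for example with `PRAGMA journal_mode = WAL` from the `sqlite3` shell), and later opens through a helper like this would leave that choice alone.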
2020-08-26	msgmap: use "CREATE TABLE IF NOT EXISTS"
	It's fewer queries and matches what we do in OverIdx.
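For illustration, an idempotent schema setup with `CREATE TABLE IF NOT EXISTS` might look like the sketch below (Python `sqlite3`, not the project's Perl). The table and column definitions are assumptions for this example and may not match the real msgmap schema.

```python
import sqlite3

# Hypothetical schema, for illustration only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS msgmap (
    num INTEGER PRIMARY KEY AUTOINCREMENT,
    mid TEXT NOT NULL UNIQUE
)
"""

def ensure_schema(conn):
    """Create the table if missing; a no-op when it already exists."""
    conn.execute(SCHEMA)
```

A single statement replaces the pattern of first checking whether the table exists and then creating it, and it is safe to run on every open.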
2020-08-26	over: skip nodatacow on the journal
	This file gets truncated anyhow, so it won't fragment.
2020-08-25	searchidx: croak for Xapian DB open failure
	croak() can give more context on the failure, and setting `PERL5OPT=-MCarp=verbose' can force a stacktrace.
2020-08-23	index: --sequential-shard checkpoints after each shard
	There's no reason we'd want Xapian to defer flushing once we've indexed everything belonging to a particular shard.
2020-08-23	mbox: disable "&t" on existing Xapian until full reindex
	Expanding threads via over.sqlite3 for mbox.gz downloads without Xapian effectively collapsing on the THREADID column leads to repeated messages getting downloaded. To avoid that situation, use a "has_threadid" Xapian metadata flag that's only set on --reindex (and brand new Xapian DBs). This allows admins to upgrade WWW or do --reindex in any order; without worrying about users eating up bandwidth and CPU cycles.
2020-08-23	search: support downloading mboxes results with full thread
	Finally, the addition of THREADID for collapsing results in Xapian lets us emulate the "mairix --threads" feature. That is, instead of returning only the matching messages, the entire thread is included in the downloaded mbox.gz. This requires a "public-inbox-index --reindex" to be usable.
2020-08-23	searchidx: index THREADID in Xapian
	This is the `tid' column from over.sqlite3; and will be used for IMAP and JMAP search (among other things).
2020-08-23	searchidx: put all shard-related stuff in SearchIdxShard.pm
	We'll also rename the /^remote_/ prefix to "shard_", since remote implies the process is on a different host. These methods only pass messages to a child process on the same host OR perform operations within the same process.
2020-08-23	searchidxshard: clear $msgref buffer properly
	Merely assigning `undef' to a scalar does not free the underlying buffer memory of a scalar.
2020-08-22	searchview: fix mbox.gz downloads for lynx users
	Unlike w3m and links, the lynx browser seems to require a `name' attribute for `<input type=submit>' elements. Maybe some other browsers do, too. The `name' attribute for submit elements doesn't seem to cause any harm for w3m or links users, either; despite not (AFAIK) being part of historical or current HTML specs.
2020-08-20	search: add mset_to_artnums method
	We can avoid importing mdocid() in several places by using this method, simplifying callers.
2020-08-20	init+index: support --skip-docdata for Xapian
	Since we no longer read document data from Xapian, allow users to opt-out of storing it. This breaks compatibility with previous releases of public-inbox, but gives us a ~1.5% space savings on Xapian storage (and associated I/O and page cache pressure reduction).
2020-08-20	smsg: remove from_mitem
	We no longer read docdata.glass from anywhere in our code base. Some adjustments were needed to t/search.t to deal with the Xapian::WritableDatabase committing at different times, since our ->query is avoided from PublicInbox::SearchIdx to avoid needing a {over_ro} field.
2020-08-20	mbox: avoid Xapian docdata in search results
	Another place where we can reduce kernel page cache overhead by hitting over.sqlite3 instead of docdata.glass.
2020-08-20	extmsg: avoid using Xapian docdata
	Once again, over.sqlite3 contains everything necessary for Message-ID resolution. Also, Xapian may be completely unnecessary with the advent of over.sqlite3, but that's for another time.
2020-08-20	searchview: convert nested and Atom display to over.sqlite3
	git blob retrieval dominates on these, "&x=t" (nested) is roughly the same due to increased overhead for ->get_percent storage balancing out the mass-loading from SQLite. Atom "&x=A" is sped up slightly and uses less memory in the long-lived response.
2020-08-20	searchview: speed up search summary by ~10%
	Instead of loading one article at-a-time from over.sqlite3, we can use SQL to mass-load IN (?,?, ...) all results with a single SQLite query. Despite SQLite being in-process and having no network latency, the reduction in SQL query executions from loading multiple rows at once speeds things up significantly. We'll keep the over->get_art optimizations from the previous commit, since it still speeds up long-lived responses, slightly.
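The mass-loading technique described above can be sketched with Python's `sqlite3` module. The `over` table layout (`num`, `subject`) and the `load_articles` helper are assumptions for this example and do not reflect the actual over.sqlite3 schema.

```python
import sqlite3

def load_articles(conn, nums):
    """Fetch all rows for `nums` in one query, returned in the order of `nums`."""
    nums = list(nums)
    if not nums:
        return []
    # One "?" placeholder per article number: IN (?,?,...)
    placeholders = ",".join("?" * len(nums))
    sql = f"SELECT num, subject FROM over WHERE num IN ({placeholders})"
    by_num = dict(conn.execute(sql, nums))
    # SQL does not guarantee row order here, so restore the caller's order
    # (e.g. search ranking) and skip numbers that were not found.
    return [(n, by_num[n]) for n in nums if n in by_num]
```

SQLite limits the number of bound parameters per statement, so very large result pages would need to be split into several queries.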
2020-08-20	searchview: use over.sqlite3 instead of Xapian docdata
	This is a step towards improving kernel page cache hit rates by relying on over.sqlite3 for document data instead of Xapian. Some micro-optimization to over->get_art was required to maintain performance.
2020-08-20	smsg: reduce utf8::decode call sites
	Both callers of load_from_data call utf8::decode, so just do utf8::decode in load_from_data.
2020-08-20	search: make qparse_new an internal function
	We'll probably be reusing it from another package in a future commit.
2020-08-20	searchquery: split off from searchview
	Since this was already a separate package, split it off into its own file since SearchView may not handle inbox groups.
2020-08-20	search: export mdocid subroutine
	No need to have awkward globrefs for this.
2020-08-20	search: improve comments around constants
	We'll probably be adding more value columns like THREADID to sort on.
2020-08-20	www: reduce long-lived PublicInbox::Search references
	While this is unlikely to be a problem in current practice, keeping Xapian DBs open for long responses can interfere with free space recovery after -compact. In the future, it will interfere with inbox search grouping and lead to unexpected results.
2020-08-20	xapcmd: simplify {reindex} parameter passing
	No need to localize it, here, since we can just refer to it in the `$opt' hashref. Hopefully this improves readability for others like it does for me. I sometimes wonder if the concept of a stack in high-level languages is even necessary...
2020-08-20	search: v2: ensure shards are numerically sorted
	This seems required to correctly get the NNTP article number from Xapian docid on combined Xapian DBs. The default (ASCII-betical) sorting was only acceptable for -imapd users until somebody hit 11 (or more) shards, which is a rare case.
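The difference between ASCII-betical and numeric ordering of shard names can be seen in a few lines of Python. The assumption that shard directories are named with plain decimal numbers, and the `sort_shards` helper itself, are illustrative only.

```python
def sort_shards(names):
    """Return numeric shard names in numeric order, ignoring other entries."""
    return sorted((n for n in names if n.isdigit()), key=int)

shards = ["0", "1", "10", "11", "2"]
lexical = sorted(shards)       # "10" and "11" sort before "2"
numeric = sort_shards(shards)  # "2" sorts before "10"
```

With ten or fewer shards ("0" through "9") both orderings agree, which is why the problem only appears once an eleventh shard exists.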
2020-08-20	admin: progress shows the inbox being indexed
	This is helpful with --all, or when multiple inboxes are being indexed.