Date Commit message
2020-09-01 rename WatchMaildir => Watch
This is no longer limited to Maildirs now that IMAP and NNTP support exists, so give it a shorter name.
2020-09-01 watchmaildir: use v5.10.1, drop warnings
Declare 5.10.1 to avoid potential compatibility problems with Perl 7/8 down the line. We'll rely on the command-line to set or drop warnings during development, at least.
2020-09-01 watch: limit batch size of NNTP and IMAP workers, too
We don't want to monopolize locks because processes can easily block each other if using `watchspam' on a Maildir while a big NNTP or IMAP import is happening. This can also happen if somebody configured a single inbox to watch from several sources to merge several mailboxes into one (e.g. both an IMAP and Maildir are watched).
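The batching idea can be sketched in Python (not the project's Perl code). The function name, the `store` list, and the use of `threading.Lock` are assumptions for illustration only; the real workers contend on inbox locks between processes.

```python
import threading
from itertools import islice

def import_in_batches(messages, lock, batch_size, store):
    """Import messages, holding the lock for at most batch_size items at a time."""
    it = iter(messages)
    batches = 0
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            break
        with lock:  # released after every batch so other writers get a turn
            for msg in batch:
                store.append(msg)  # stand-in for writing one message to the inbox
        batches += 1
    return batches
```

A smaller `batch_size` means more lock round-trips but shorter waits for a competing writer such as a `watchspam' handler.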
2020-08-31 doc: expand on indexBatchSize regarding fragmentation
And change the documentation reference in -tuning to point to the -index manpage while we're at it.
2020-08-30 imapd: filter out unusable flags from search
Quiet down logs from -imapd when clients are blindly sending some unsupported flag conditions (e.g. "DRAFT", "DELETED") specified in RFC 3501.
2020-08-29 tests: check-run: fixup un-squashed simplification
Link: https://public-inbox.org/meta/20200828221803.GA89978@dcvr/
2020-08-28 tests: check-run: show skipped tests
We'll deduplicate redundant lines and show counts of skipped tests to ensure it's easy to notice if something is unexpectedly skipped.
2020-08-28 imaptracker: update_last: simplify callers
By making it a no-op if last_uid is not defined. This isn't a hot code path, so the extra method dispatch isn't an issue. It'll save some indentation/wrapping in future commits.
2020-08-28 watch: flush changes to inbox before updating IMAPTracker
Data needs to hit inboxes, first. Otherwise it's possible to skip messages in case git-fast-import is killed before it sees "done\n". Now, -watch will just waste a little bandwidth in re-downloading a seen message if it's interrupted immediately before updating IMAPTracker.
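The ordering described above can be illustrated with a minimal Python sketch. `commit_batch`, the list-like `inbox`, and the dict-based `tracker` are hypothetical stand-ins, not the IMAPTracker API.

```python
def commit_batch(pending, inbox, tracker):
    """Flush messages to the inbox first; advance the tracker only afterwards.

    pending is a list of (uid, message) pairs in ascending UID order.
    """
    if not pending:
        return
    # Stand-in for flushing to the inbox; if this raises, the tracker is untouched.
    inbox.extend(msg for _uid, msg in pending)
    # Only reached once the data is safely stored.
    tracker["last_uid"] = pending[-1][0]
```

If the flush fails, the tracker still points at the old UID, so the next run re-downloads the same messages instead of skipping them.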
2020-08-28 Makefile.PL: run check-man for <= 80 columns on check-run, too
I mostly use "make check-run" instead of the slower "make check" target nowadays, so add this check to ensure the rendered manpage is always visible to more users who need big fonts.
2020-08-28 www: more descriptive pagination
Being an easily confused person, I find "next" and "prev" ambiguous as to whether messages on the next or previous page will be newer or older than the current page. Clarify that for the threaded /$INBOX/ view and search results. For search results sorted by relevance, we'll use "[>= $SCORE]" or "[<= $SCORE]" to indicate directionality. This also fixes $INBOX/new.html for unindexed v1 inboxes.
2020-08-28 www: improve navigation around contemporary threads
Sometimes it's useful to quickly get to threads and messages which are contemporaries of the current thread/message being focused on. This hopefully improves navigation by: a) making the top line (where $INBOX_DIR/description is shown) a link to the latest topics in search results and per-thread/per-message views; b) providing a link to contemporaries ("~YYYY-MM-DD") around the thread overview skeleton area for per-thread and per-message views.
2020-08-28 doc: watch: expand on NNTP and IMAP-specific knobs
There are a few more, but maybe they're too esoteric to be worth documenting at the moment (batch sizes, timeouts, etc).
2020-08-27 doc: move watch config docs to -watch manpage
The -config manpage is a bit long and the -watch stuff is isolated from the rest of it while we start documenting NNTP and IMAP support. I'm not entirely happy with the way IMAP and NNTP are configured, but it's still good enough for small setups. This also fixes a long-standing misplaced comment about `publicinboxwatch.spamcheck' affecting all configured inboxes; that comment was actually for `publicinboxwatch.watchspam'. We'll omit documenting NNTP for `watchspam', for now, given the lack of \Seen flags in NNTP and I'm not sure if it's even useful. There may not be any newsgroups for sharing confirmed spam, either...
2020-08-27 watch: imap: only remove \Seen spam
This matches the behavior of Maildir `watchspam' handling in not removing unseen messages. NNTP can't match this behavior, since NNTP servers don't store flags, clients do.
2020-08-27 doc: speling fickses
2020-08-27 doc: document graceful shutdown signals
Same as the read-only daemons.
2020-08-27 overidx: inline create_ghost sub
There's no need for this to be a separate sub since there's only a single caller. This saves a few kilobytes at least in short-lived processes.
2020-08-27 imaptracker: preserve WAL journal_mode if set by user
It's no problem for most users to enable WAL, here, since there's only a single process doing both reading and writing (unlike the read-only daemons). However, WAL doesn't work on network filesystems, so it can't be enabled by default.
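A rough Python sketch of "preserve WAL if the user already chose it" follows. The function name and the `truncate` fallback are assumptions for illustration; only `PRAGMA journal_mode` itself is standard SQLite.

```python
import sqlite3

def open_preserving_wal(path, default_mode="truncate"):
    """Open an SQLite DB, keeping WAL if it was already enabled on the file."""
    conn = sqlite3.connect(path)
    # WAL is recorded in the database file, so a plain query reveals the user's choice.
    current = conn.execute("PRAGMA journal_mode").fetchone()[0].lower()
    if current != "wal":
        # Assumed fallback mode; never switch to WAL behind the user's back.
        conn.execute(f"PRAGMA journal_mode = {default_mode}")
    return conn
```

A user who wants WAL can enable it once, for example with the sqlite3(1) CLI, and later opens leave it alone.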
2020-08-27 watchmaildir: ensure I:/W:/E: prefixes in warnings
For consistency in output, any URL/path-context-dependent prefixes should have the same prefix as the actual warning which triggered it.
2020-08-27 git: show more context info on failures
I'm seeing "read: Connection timed out" in my syslog from -httpd. The fail() calls in PublicInbox::Git seem to be the only code path of ours which could trigger it... ETIMEDOUT shouldn't happen on pipes, only sockets; and all of our socket operations are non-blocking. So this could be cgit-wwwhighlight-filter.lua, but that's connecting over localhost, though on fairly loaded HW.
2020-08-27 search: allow testing with current xapian.git and 1.5.x
A `PI_XAPIAN' environment variable is now exposed for testing purposes. We'll also deal with the removal of `NumberValueRangeProcessor' and use `NumberRangeProcessor' in its place, but continue favoring the old Search::Xapian since that's all that's packaged for Debian 10.x stable.
2020-08-27 msgmap: use v5.10.1
We use the defined-or (`//', `//=') operators in 5.10, so require 5.10.1 like the rest of our codebase. Update an outdated comment while we're at it.
2020-08-27 over*: use v5.10.1, drop warnings
v5.10.1 lets us use the lighter parent.pm instead of base.pm, and we'll rely on the shebang to enable warnings (or not). While we're in the area, drop a no-longer-necessary import for PublicInbox::Search, since OverIdx doesn't require search.
2020-08-27 over: recent: remove expensive COUNT query
As noted in commit 87dca6d8d5988c5eb54019cca342450b0b7dd6b7 ("www: rework query responses to avoid COUNT in SQLite"), COUNT on many rows is expensive on big SQLite DBs. We've already stopped using that code path long ago in WWW while -imapd and -nntpd never used it. So we'll adjust our remaining test cases to not need it, either.
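One common way to page through results without COUNT is to fetch one extra row and check whether it exists. The Python sketch below shows that technique; it is not necessarily what the referenced commit does, and the `over (num, ts)` table layout is a simplified assumption.

```python
import sqlite3

def recent_page(conn, limit, offset=0):
    """Return (message numbers, has_more) for one page, newest first, without COUNT."""
    rows = conn.execute(
        "SELECT num FROM over ORDER BY ts DESC LIMIT ? OFFSET ?",
        (limit + 1, offset),  # ask for one extra row to learn if another page exists
    ).fetchall()
    has_more = len(rows) > limit
    return [r[0] for r in rows[:limit]], has_more
```

The cost stays proportional to the page size instead of the size of the whole table.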
2020-08-27 over: rename ->disconnect to ->dbh_close
Since we got rid of over->connect, `disconnect' no longer pairs with it. So name it after the `close(2)' syscall it ultimately issues.
2020-08-27 over: rename ->connect method to ->dbh
`->connect' is confused with the perlfunc for the `connect(2)' syscall, and also `DBI->connect'. Since SQLite doesn't use sockets, the word "connect" needlessly confuses me. Give it a short name to match the field name we use for it, which also matches the variable name used by the DBI(3pm) and DBD::SQLite(3pm) manpages.
2020-08-26 v2writable: compatibility with SWIG Xapian binding
The SWIG binding won't auto-convert IV/UV to PV like the XS Search::Xapian binding would, so work around that shortcoming for now. Fixes: a367ec1b15a2458 ("mbox: disable "&t" on existing Xapian until full reindex")
2020-08-26 grok-pull.post_update_hook: flock(2) before SQLite check
Unlike DBD::SQLite, the sqlite3(1) CLI does not have a default busy timeout enabled, so it easily times out while acquiring a SHARED lock for read-only queries. We can avoid battery-wasting polling from the SQLite timeout handler by relying on flock(2) as we do in our Perl code. Furthermore, this avoids triggering some locking problems[1] from a long "SELECT COUNT(*) ..." query and reindex. While there may be other SQLite-related parallelism issues[1], this works around one of them by relying on flock(2). [1] https://public-inbox.org/meta/20200825001204.GA840@dcvr/
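The "take flock(2) first, then query" pattern looks roughly like this in Python (the hook itself is a shell script using the sqlite3(1) CLI). The function name, the lock file path, and the choice of an exclusive lock are assumptions for illustration; `fcntl.flock` is Unix-only.

```python
import fcntl
import sqlite3

def query_under_flock(lock_path, db_path, sql):
    """Run a read-only query while holding an flock(2) lock on lock_path."""
    with open(lock_path, "a") as lock_fh:
        # Blocks in the kernel until the lock is free: no busy-timeout polling.
        fcntl.flock(lock_fh, fcntl.LOCK_EX)
        try:
            conn = sqlite3.connect(db_path)
            try:
                return conn.execute(sql).fetchall()
            finally:
                conn.close()
        finally:
            fcntl.flock(lock_fh, fcntl.LOCK_UN)
```

This only helps if every writer takes the same lock file before touching the database.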
2020-08-26 over+msgmap: respect WAL journal_mode if set
WAL actually seems to have ideal locking characteristics given concurrency problems I'm experiencing with --reindex running in parallel with expensive read-only SQLite queries: <https://public-inbox.org/meta/20200825001204.GA840@dcvr/> Unfortunately, we cannot blindly use WAL while preserving compatibility with existing setups nor our guarantees that read-only daemons are indeed "read-only". However, respect a user's choice to set WAL on their own if they're comfortable with giving -nntpd/-httpd/-imapd processes write permission to the directory storing SQLite DBs.
2020-08-26 msgmap: use "CREATE TABLE IF NOT EXISTS"
It's fewer queries and matches what we do in OverIdx.
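As a small Python illustration of why this takes fewer queries: the statement is idempotent, so no separate "does the table exist?" check is needed. The column definitions below are simplified assumptions, not the exact msgmap schema.

```python
import sqlite3

def ensure_msgmap(conn):
    """Create the table only if it is missing; safe to call on every open."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS msgmap ("
        "num INTEGER PRIMARY KEY AUTOINCREMENT, "
        "mid VARCHAR UNIQUE NOT NULL)"
    )
```

Calling `ensure_msgmap` again on a populated database leaves the existing rows untouched.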
2020-08-26 over: skip nodatacow on the journal
This file gets truncated anyhow, so it won't fragment.
2020-08-26 doc: 1.6.0 release notes update
A few more things happened, here.
2020-08-26 doc: add some more tuning notes
I've learned a thing or three about btrfs in the past few weeks and remembered some old HDD things, too. The Xapian MultiDatabase problem will need to be addressed for 1.7...
2020-08-25 searchidx: croak for Xapian DB open failure
croak() can give more context on the failure, and setting `PERL5OPT=-MCarp=verbose' can force a stacktrace.
2020-08-25 examples: add imapd systemd examples
We've got examples for all the other daemons, too!
2020-08-23 index: --sequential-shard checkpoints after each shard
There's no reason we'd want Xapian to defer flushing once we've indexed everything belonging to a particular shard.
2020-08-23 mbox: disable "&t" on existing Xapian until full reindex
Expanding threads via over.sqlite3 for mbox.gz downloads without Xapian effectively collapsing on the THREADID column leads to repeated messages getting downloaded. To avoid that situation, use a "has_threadid" Xapian metadata flag that's only set on --reindex (and brand new Xapian DBs). This allows admins to upgrade WWW or do --reindex in any order; without worrying about users eating up bandwidth and CPU cycles.
2020-08-23 search: support downloading mboxes results with full thread
Finally, the addition of THREADID for collapsing results in Xapian lets us emulate the "mairix --threads" feature. That is, instead of returning only the matching messages, the entire thread is included in the downloaded mbox.gz. This requires a "public-inbox-index --reindex" to be usable.
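The thread expansion can be pictured with a minimal Python sketch over an SQLite table. The `over (num, tid)` layout and the function name are simplified assumptions; the real code also collapses results on THREADID inside Xapian, which this sketch does not model.

```python
import sqlite3

def expand_to_threads(conn, matched_nums):
    """Return every message number sharing a thread with any matched message."""
    matched_nums = list(matched_nums)
    if not matched_nums:
        return []
    marks = ",".join("?" * len(matched_nums))  # one placeholder per matched number
    rows = conn.execute(
        "SELECT num FROM over WHERE tid IN "
        f"(SELECT tid FROM over WHERE num IN ({marks})) ORDER BY num",
        matched_nums,
    ).fetchall()
    return [r[0] for r in rows]
```

Because the outer query selects by thread rather than by match, two matches in the same thread still yield each message only once.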
2020-08-23 searchidx: index THREADID in Xapian
This is the `tid' column from over.sqlite3; and will be used for IMAP and JMAP search (among other things).
2020-08-23 searchidx: put all shard-related stuff in SearchIdxShard.pm
We'll also rename the /^remote_/ prefix to "shard_", since remote implies the process is on a different host. These methods only pass messages to a child process on the same host OR perform operations within the same process.
2020-08-23 searchidxshard: clear $msgref buffer properly
Merely assigning `undef' to a scalar does not free the underlying buffer memory of a scalar.
2020-08-22 searchview: fix mbox.gz downloads for lynx users
Unlike w3m and links, the lynx browser seems to require a `name' attribute for `<input type=submit>' elements. Maybe some other browsers do, too. The `name' attribute for submit elements doesn't seem to cause any harm for w3m or links users, either, despite not (AFAIK) being part of historical or current HTML specs.
2020-08-20 search: add mset_to_artnums method
We can avoid importing mdocid() in several places by using this method, simplifying callers.
2020-08-20 init+index: support --skip-docdata for Xapian
Since we no longer read document data from Xapian, allow users to opt-out of storing it. This breaks compatibility with previous releases of public-inbox, but gives us a ~1.5% space savings on Xapian storage (and associated I/O and page cache pressure reduction).
2020-08-20 t/nntpd-v2: set PI_TEST_VERSION=2 properly
Numbers are hard :<
2020-08-20 smsg: remove from_mitem
We no longer read docdata.glass from anywhere in our code base. Some adjustments were needed to t/search.t to deal with the Xapian::WritableDatabase committing at different times, since our ->query is avoided from PublicInbox::SearchIdx to avoid needing a {over_ro} field.
2020-08-20 mbox: avoid Xapian docdata in search results
Another place where we can reduce kernel page cache overhead by hitting over.sqlite3 instead of docdata.glass.
2020-08-20 extmsg: avoid using Xapian docdata
Once again, over.sqlite3 contains everything necessary for Message-ID resolution. Also, Xapian may be completely unnecessary with the advent of over.sqlite3, but that's for another time.
2020-08-20 searchview: convert nested and Atom display to over.sqlite3
git blob retrieval dominates on these, "&x=t" (nested) is roughly the same due to increased overhead for ->get_percent storage balancing out the mass-loading from SQLite. Atom "&x=A" is sped up slightly and uses less memory in the long-lived response.