public-inbox.git - an "archives first" approach to mailing lists

Date	Commit message
2020-08-26	over+msgmap: respect WAL journal_mode if set
	WAL actually seems to have ideal locking characteristics given concurrency problems I'm experiencing with --reindex running in parallel with expensive read-only SQLite queries: <https://public-inbox.org/meta/20200825001204.GA840@dcvr/> Unfortunately, we cannot blindly use WAL while preserving compatibility with existing setups nor our guarantees that read-only daemons are indeed "read-only". However, respect a user's choice to set WAL on their own if they're comfortable with giving -nntpd/-httpd/-imapd processes write permission to the directory storing SQLite DBs.
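The idea of respecting an existing WAL setting can be sketched as follows. This is a Python illustration rather than the project's Perl code: the helper names and the TRUNCATE fallback are assumptions made for the example. It relies only on the documented SQLite behaviour that WAL mode, once set, is stored in the database file and is seen by later connections.

```python
import os
import sqlite3
import tempfile


def journal_mode(conn):
    """Return the journal mode currently in effect for this connection."""
    return conn.execute("PRAGMA journal_mode").fetchone()[0]


def open_respecting_wal(path):
    """Hypothetical helper: leave WAL alone, otherwise apply a default."""
    conn = sqlite3.connect(path)
    if journal_mode(conn) != "wal":
        # Assumed non-WAL default, chosen for illustration only.
        conn.execute("PRAGMA journal_mode = TRUNCATE")
    return conn


def make_db(path, wal):
    """Create a small DB; optionally opt in to WAL as an admin would."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE t (x INTEGER)")
    if wal:
        conn.execute("PRAGMA journal_mode = WAL")  # persists in the DB file
    conn.close()


tmpdir = tempfile.mkdtemp()
wal_path = os.path.join(tmpdir, "wal.sqlite3")
plain_path = os.path.join(tmpdir, "plain.sqlite3")
make_db(wal_path, wal=True)
make_db(plain_path, wal=False)

wal_conn = open_respecting_wal(wal_path)
plain_conn = open_respecting_wal(plain_path)
print(journal_mode(wal_conn), journal_mode(plain_conn))
```

The opener never switches a database into WAL itself; it only avoids overriding a choice the admin already made.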
2020-08-26	msgmap: use "CREATE TABLE IF NOT EXISTS"
	It's fewer queries and matches what we do in OverIdx.
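A minimal sketch of why "CREATE TABLE IF NOT EXISTS" needs fewer queries: the statement is idempotent, so there is no need to look the table up before creating it. The example is in Python, and the table layout is a made-up stand-in, not the real msgmap schema.

```python
import sqlite3


def create_tables(conn):
    """Safe to call on every open; does nothing if the table exists."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS msgmap (
            num INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
            mid VARCHAR NOT NULL,
            UNIQUE (mid)
        )
        """
    )


conn = sqlite3.connect(":memory:")
create_tables(conn)
conn.execute("INSERT INTO msgmap (mid) VALUES (?)", ("a@example",))
create_tables(conn)  # second call is a no-op and keeps existing rows
rows = conn.execute("SELECT num, mid FROM msgmap").fetchall()
print(rows)
```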
2020-08-26	over: skip nodatacow on the journal
	This file gets truncated anyhow, so it won't fragment.
2020-08-26	doc: 1.6.0 release notes update
	A few more things happened, here.
2020-08-26	doc: add some more tuning notes
	I've learned a thing or three about btrfs in the past few weeks and remembered some old HDD things, too. The Xapian MultiDatabase problem will need to be addressed for 1.7...
2020-08-25	searchidx: croak for Xapian DB open failure
	croak() can give more context on the failure, and setting `PERL5OPT=-MCarp=verbose' can force a stacktrace.
2020-08-25	examples: add imapd systemd examples
	We've got examples for all the other daemons, too!
2020-08-23	index: --sequential-shard checkpoints after each shard
	There's no reason we'd want Xapian to defer flushing once we've indexed everything belonging to a particular shard.
2020-08-23	mbox: disable "&t" on existing Xapian until full reindex
	Expanding threads via over.sqlite3 for mbox.gz downloads without Xapian effectively collapsing on the THREADID column leads to repeated messages getting downloaded. To avoid that situation, use a "has_threadid" Xapian metadata flag that's only set on --reindex (and brand new Xapian DBs). This allows admins to upgrade WWW or do --reindex in any order, without worrying about users eating up bandwidth and CPU cycles.
2020-08-23	search: support downloading mboxes results with full thread
	Finally, the addition of THREADID for collapsing results in Xapian lets us emulate the "mairix --threads" feature. That is, instead of returning only the matching messages, the entire thread is included in the downloaded mbox.gz. This requires a "public-inbox-index --reindex" to be usable.
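The effect of collapsing on a thread ID before expanding threads can be sketched roughly as below. This is a Python illustration under stated assumptions: the `over` table with `num` and `tid` columns is a simplified stand-in, and the de-duplication is done in Python here, whereas the commits above describe it happening through collapsing in Xapian.

```python
import sqlite3


def expand_threads(conn, matched_nums):
    """Return every message of each matched thread, each thread only once."""
    seen_tids = set()
    out = []
    for num in matched_nums:
        row = conn.execute(
            "SELECT tid FROM over WHERE num = ?", (num,)
        ).fetchone()
        if row is None or row[0] in seen_tids:
            continue  # unknown message, or thread already emitted
        seen_tids.add(row[0])
        cur = conn.execute(
            "SELECT num FROM over WHERE tid = ? ORDER BY num", (row[0],)
        )
        out.extend(n for (n,) in cur)
    return out


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE over (num INTEGER PRIMARY KEY, tid INTEGER)")
conn.executemany(
    "INSERT INTO over (num, tid) VALUES (?, ?)",
    [(1, 1), (2, 1), (3, 2), (4, 1), (5, 3)],
)

# Messages 2 and 4 both matched and share a thread; without the
# seen_tids check, that thread would be emitted twice.
expanded = expand_threads(conn, [2, 4, 3])
print(expanded)
```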
2020-08-23	searchidx: index THREADID in Xapian
	This is the `tid' column from over.sqlite3; and will be used for IMAP and JMAP search (among other things).
2020-08-23	searchidx: put all shard-related stuff in SearchIdxShard.pm
	We'll also rename the /^remote_/ prefix to "shard_", since remote implies the process is on a different host. These methods only pass messages to a child process on the same host OR perform operations within the same process.
2020-08-23	searchidxshard: clear $msgref buffer properly
	Merely assigning `undef' to a scalar does not free the underlying buffer memory of a scalar.
2020-08-22	searchview: fix mbox.gz downloads for lynx users
	Unlike w3m and links, the lynx browser seems to require a `name' attribute for `<input type=submit>' elements. Maybe some other browsers do, too. The `name' attribute for submit elements doesn't seem to cause any harm for w3m or links users, either, despite not (AFAIK) being part of historical or current HTML specs.
2020-08-20	search: add mset_to_artnums method
	We can avoid importing mdocid() in several places by using this method, simplifying callers.
2020-08-20	init+index: support --skip-docdata for Xapian
	Since we no longer read document data from Xapian, allow users to opt-out of storing it. This breaks compatibility with previous releases of public-inbox, but gives us a ~1.5% space savings on Xapian storage (and associated I/O and page cache pressure reduction).
2020-08-20	t/nntpd-v2: set PI_TEST_VERSION=2 properly
	Numbers are hard :<
2020-08-20	smsg: remove from_mitem
	We no longer read docdata.glass from anywhere in our code base. Some adjustments were needed to t/search.t to deal with the Xapian::WritableDatabase committing at different times, since our ->query is avoided from PublicInbox::SearchIdx to avoid needing a {over_ro} field.
2020-08-20	mbox: avoid Xapian docdata in search results
	Another place where we can reduce kernel page cache overhead by hitting over.sqlite3 instead of docdata.glass.
2020-08-20	extmsg: avoid using Xapian docdata
	Once again, over.sqlite3 contains everything necessary for Message-ID resolution. Also, Xapian may be completely unnecessary with the advent of over.sqlite3, but that's for another time.
2020-08-20	searchview: convert nested and Atom display to over.sqlite3
	git blob retrieval dominates on these, "&x=t" (nested) is roughly the same due to increased overhead for ->get_percent storage balancing out the mass-loading from SQLite. Atom "&x=A" is sped up slightly and uses less memory in the long-lived response.
2020-08-20	searchview: speed up search summary by ~10%
	Instead of loading one article at-a-time from over.sqlite3, we can use SQL to mass-load IN (?,?, ...) all results with a single SQLite query. Despite SQLite being in-process and having no network latency, the reduction in SQL query executions from loading multiple rows at once speeds things up significantly. We'll keep the over->get_art optimizations from the previous commit, since it still speeds up long-lived responses, slightly.
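The mass-loading approach can be sketched as below: build one `IN (?,?,...)` query for all wanted article numbers instead of issuing one query per article. This is a Python illustration; the function name and the `over` table layout are assumptions for the example, not the project's actual code or schema.

```python
import sqlite3


def get_all(conn, nums):
    """Load many rows with one query, returned in the order requested."""
    if not nums:
        return []
    placeholders = ",".join("?" * len(nums))
    sql = f"SELECT num, subject FROM over WHERE num IN ({placeholders})"
    by_num = dict(conn.execute(sql, list(nums)))
    # IN (...) does not preserve the caller's order, so restore it here.
    return [(n, by_num[n]) for n in nums if n in by_num]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE over (num INTEGER PRIMARY KEY, subject TEXT)")
conn.executemany(
    "INSERT INTO over (num, subject) VALUES (?, ?)",
    [(1, "a"), (2, "b"), (3, "c")],
)

results = get_all(conn, [3, 1])
print(results)
```

SQLite limits the number of bound parameters per statement, so a very large result set would need to be split into chunks.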
2020-08-20	searchview: use over.sqlite3 instead of Xapian docdata
	This is a step towards improving kernel page cache hit rates by relying on over.sqlite3 for document data instead of Xapian. Some micro-optimization to over->get_art was required to maintain performance.
2020-08-20	smsg: reduce utf8::decode call sites
	Both callers of load_from_data call utf8::decode, so just do utf8::decode in load_from_data.
2020-08-20	search: make qparse_new an internal function
	We'll probably be reusing it from another package in a future commit.
2020-08-20	searchquery: split off from searchview
	Since this was already a separate package, split it off into its own file since SearchView may not handle inbox groups.
2020-08-20	search: export mdocid subroutine
	No need to have awkward globrefs for this.
2020-08-20	search: improve comments around constants
	We'll probably be adding more value columns like THREADID to sort on.
2020-08-20	www: reduce long-lived PublicInbox::Search references
	While this is unlikely to be a problem in current practice, keeping Xapian DBs open for long responses can interfere with free space recovery after -compact. In the future, it will interfere with inbox search grouping and lead to unexpected results.
2020-08-20	xapcmd: simplify {reindex} parameter passing
	No need to localize it, here, since we can just refer to it in the `$opt' hashref. Hopefully this improves readability for others like it does for me. I sometimes wonder if the concept of a stack in high-level languages is even necessary...
2020-08-20	search: v2: ensure shards are numerically sorted
	This seems required to correctly get the NNTP article number from Xapian docid on combined Xapian DBs. The default (ASCII-betical) sorting was only acceptable for -imapd users until somebody hit 11 (or more) shards, which is a rare case.
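A small Python illustration of the sorting problem, assuming shard directories are named by their integer index: string sorting puts "10" before "2" as soon as there are 11 or more shards, while sorting with a numeric key keeps them in shard order.

```python
# Shard directory names for a 12-shard inbox (assumed naming: "0".."11").
names = [str(i) for i in range(12)]

ascii_sorted = sorted(names)             # compares strings character by character
numeric_sorted = sorted(names, key=int)  # compares shard numbers

print(ascii_sorted)
print(numeric_sorted)
```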
2020-08-20	init: drop -N alias for --skip-artnum
	It may be too easily confused for --newsgroup or --ng. This is too rarely used and never made it into a release, so it should be fine.
2020-08-20	init: support --newsgroup option
	We can reduce the need to edit the config file for NNTP group names this way.
2020-08-20	init: support --help and -?
	And speed those up with some lazy loading, too.
2020-08-20	compact: support --help/-? and perform lazy loading
	This probably won't be used much, but --help can still make sense.
2020-08-20	admin: progress shows the inbox being indexed
	This is helpful with --all, or when multiple inboxes are being indexed.
2020-08-20	doc: note -compact and -xcpdb are rarely used
	Slowly improving the learning curve...
2020-08-19	v2writable: show newline after "indexing all of .. " message
	Otherwise things get very confusing when verbosity is enabled :x
2020-08-19	smsg: handle wide characters in raw mail headers
	There may be messages in the wild with wide characters in headers which aren't RFC 2047-encoded. Assume UTF-8 so those fields can round trip through over.sqlite3. This doesn't affect docdata.glass in Xapian, but it does affect how over.sqlite3 stores the same deflated info.
2020-08-16	doc: add public-inbox-tuning(7) manpage
	Determining storage device speed and latencies doesn't seem portable or even possible with the wide variety of storage layers in use. This means we need to write a tuning document and hope users read and improve on it :P
2020-08-14	grok-pull.post_update_hook: favor --sequential-shard for HDD
	--sequential-shard offers better performance on HDD than -j0 since the on-disk active set can be kept small (with -j $HIGH_NUM). --batch-size can also be helpful for systems with much RAM.
2020-08-14	index|compact|xcpdb: support --all switch
	For -index, this is a convenient way to quickly index all inboxes after a grok-pull. Might as well support it for rarely used commands like -compact and -xcpdb, too.
2020-08-13	v2writable: remove IdxStack import
	We use IdxStack via log2stack() from SearchIdx, now.
2020-08-13	xcpdb: wire up new index options and --help
	--sequential-shard also disables the copy parallelism (--jobs), so it can be useful for systems unable to handle parallel random I/O but still want many shards. There was a missing "use strict", too, which is fixed.
2020-08-13	admin: don't warn when --jobs exceeds shards
	Established tools like make(1), prove(1) and xargs(1) don't warn when the desired parallelism level can't be met, either.
2020-08-13	xapcmd: reduce CPU idling when shards exceeds job count
	In case there are unbalanced shards AND we're limiting parallelism while using many shards, spawn the next task in the queue ASAP once a task is done, instead of waiting for all tasks to finish before spawning the next batch. Unbalanced shards probably aren't a big issue for most users; however, many smaller shards with few jobs can be useful for HDD users to reduce the effect of random writes.
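The difference between the two scheduling strategies can be shown with a small simulation. This is a Python sketch, not the project's code: the per-shard costs are invented numbers, and the two functions only compute total wall-clock time for "wait for the whole batch" versus "start the next task as soon as a slot frees up".

```python
import heapq


def batch_makespan(costs, jobs):
    """Run `jobs` tasks at a time and wait for the whole batch to finish."""
    total = 0
    for i in range(0, len(costs), jobs):
        total += max(costs[i:i + jobs])  # batch ends with its slowest task
    return total


def asap_makespan(costs, jobs):
    """Start the next queued task as soon as any job slot becomes free."""
    slots = [0] * jobs  # time at which each job slot becomes free
    heapq.heapify(slots)
    for cost in costs:
        start = heapq.heappop(slots)  # earliest free slot
        heapq.heappush(slots, start + cost)
    return max(slots)


# One large shard and five small ones, limited to two parallel jobs.
costs = [9, 1, 1, 1, 1, 1]
print(batch_makespan(costs, 2), asap_makespan(costs, 2))
```

With balanced shards the two strategies finish at the same time; the gap only appears when one task in a batch runs much longer than the others.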
2020-08-13	xcpdb: support --no-fsync from CLI
	This was omitted in 8b1950055d51d436 :x Fixes: 8b1950055d51d436 ("index+xcpdb: rename `--no-sync' to `--no-fsync'")
2020-08-13	xapcmd: simplify sub reference
	We don't need to fully-qualify when referring to subs in the same namespace, nor do we need to make a SCALAR ref only to dereference it (Yes, still learning Perl :x)
2020-08-10	convert: set No_COW on copied SQLite files
	We'll use our existing logic and use sqlite_backup_from_file, which appeared in 1.39 (along with sqlite_backup_to_file).
2020-08-10	convert: check ARGV more correctly
	Instead of silently ignoring excessive args, don't let a user specify an extra directory. Furthermore, we'll support the odd case where BOFH wants to name an $INBOX_DIR to be `0' :P