public-inbox.git - an "archives first" approach to mailing lists

Date	Commit message
2021-10-15	lei: give workers their own process group
	This lets users Ctrl-Z from their terminal to pause an entire git-clone process hierarchy.
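	A minimal sketch of the idea, written in Python only as an analogy (public-inbox itself is Perl). A worker started as the leader of its own process group can be paused and resumed as a unit with killpg(), which also reaches any children it spawns, such as git helpers. The helper names `spawn_in_own_group`, `pause_group` and `resume_group` are hypothetical, and a POSIX system is assumed:

```python
import os
import signal
import subprocess


def spawn_in_own_group(argv):
    """Start argv as the leader of a new process group (its pgid equals its pid)."""
    # preexec_fn runs in the child after fork() and before exec(), so the
    # child and anything it later spawns share a fresh process group.
    return subprocess.Popen(argv, preexec_fn=os.setpgrp)


def pause_group(pgid, sig=signal.SIGTSTP):
    # Deliver the stop signal to every process in the group, much like a
    # terminal does to its foreground group on Ctrl-Z.
    os.killpg(pgid, sig)


def resume_group(pgid):
    # SIGCONT wakes every stopped member of the group.
    os.killpg(pgid, signal.SIGCONT)
```

	Signalling the group, rather than a single pid, is what makes the whole hierarchy stop together.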
2021-10-14	lei: -d (--dir) and -O (only) shortcuts
	`-d' seems like a no-brainer for --dir with inspect. I find myself using `--only' a bit, too, and `-O' seems like a reasonable shortcut for it.
2021-10-14	lei add-external --mirror: respect client umask
	While lei is intended for non-public mail and runs umask(077) by default, externals are one area which can safely defer to the user's umask. Instead of sending it unconditionally with every command, only have lei-daemon request it when necessary.
2021-10-14	clone+fetch: respect umask for all downloaded files
	Since public inboxes are usually intended to be public, the File::Temp default permission of 0600 is wrong. Just respect the user's umask in this case, as git-clone does. This doesn't work for "lei add-external --mirror" yet, but it will...
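	File::Temp, like Python's tempfile.mkstemp(), creates files with mode 0600. A minimal Python sketch of "respect the user's umask, as git-clone does" is to widen the mode afterwards to 0666 masked by the current umask. The helper names are hypothetical, and this is not public-inbox's Perl code:

```python
import os
import tempfile


def current_umask():
    """Return the process umask without changing it."""
    # os.umask() sets a new mask and returns the old one, so restore it at once.
    # Note: this briefly changes process-wide state and is not thread-safe.
    mask = os.umask(0)
    os.umask(mask)
    return mask


def mkstemp_respecting_umask(dir=None):
    """Like tempfile.mkstemp(), but apply the umask instead of keeping 0600."""
    fd, path = tempfile.mkstemp(dir=dir)  # created with mode 0600
    os.fchmod(fd, 0o666 & ~current_umask())
    return fd, path
```

	With the common umask of 022, files end up 0644, which matches what a plain `open()` or `git clone` would produce.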
2021-10-14	lei inspect: account for non-extindex inboxes
	Inbox->xdb does not exist, but this code path was apparently never tested :x I noticed this on a basic v2 inbox, but it could happen with any v1/v2 inbox. Move ->num2docid into Search so it's less awkward to use.
2021-10-14	extindex: guard against buggy unrefs
	I noticed some unref messages which shouldn't have been happening, but they were, which is troubling. So add a guard around an unref path until we can get to the bottom of this.
2021-10-13	fetch: support --try-remote/-T for alternate remote names
	This allows -fetch to work out-of-the-box with the grokmirror 2.x default remote name of "_grokmirror".
2021-10-13	t/nntpd-tls: change diag() to like() assertion
	This test wasn't finished when I initially wrote it :x
2021-10-13	t/git: avoid "once" warning for async_warn
	No point in testing use_ok when we have neither outside dependencies nor exports in this case.
2021-10-13	t/lei-mirror: avoid reading ~/.public-inbox/config in test
	Oops, we shouldn't attempt to read a user's actual HOME when running -index, since mine has a bunch of invalid entries in there.
2021-10-13	eml: avoid Encode 2.87..3.12 leak
	Encode::FB_CROAK leaks memory in old versions of Encode: <https://rt.cpan.org/Public/Bug/Display.html?id=139622> Since I expect there are still many users on old systems and old Perls, we can use "$SIG{__WARN__} = \&croak" here with Encode::FB_WARN to emulate Encode::FB_CROAK behavior.
2021-10-13	t/www_listing: require opt-in for grokmirror tests
	grokmirror 2.x seems to idle in several places for 5s at a time, causing t/www_listing.t to take longer than "make check-run" on a 4-core system when run without grokmirror. So make it optional, but add some test knobs to allow tailing the log output so I can see what's going on.
2021-10-13	test_common: hoist out tail_f sub
	We'll be reusing this in more places. While we're at it, allow it to tail all run_script() users, including lei() in TestCommon.
2021-10-13	xt/perf-msgview: drop unnecessary use_ok
	require_mods covers it, and we're not testing Plack itself.
2021-10-13	www: preload: load ExtSearch via ->ALL
	This ought to give us more CoW savings and fragmentation avoidance in -httpd.
2021-10-13	extindex: set {current_info} in eidxq processing
	This gives context as to where warnings are coming from.
2021-10-13	treewide: use warn() or carp() instead of env->{psgi.errors}
	Large chunks of our codebase and 3rd-party dependencies do not use ->{psgi.errors}, so trying to standardize on it was a fruitless endeavor. Since warn() and carp() are standard mechanisms within Perl, just use them instead and simplify a bunch of existing code.
2021-10-13	lei: use standard warn() in more places
	warn() is easier to augment with context information, and frankly unavoidable in the presence of 3rd-party libraries we don't control.
2021-10-13	extindex: show OID on bad blob failure
	AFAIK I've never hit these messages, but I might be glad if I ever do.
2021-10-13	daemon: set $SIG{__WARN__} properly
	Eml->warn_ignore_cb itself returns a callback, so creating a reference to it was wrong when assigning it to $SIG{__WARN__}; the callback it returns must be assigned instead. Fixes: 176cd51f9aa81b74 ("daemon: quiet down Eml-related warnings")
2021-10-13	lei up --all: show output for warnings
	This helps users make sense of which saved searches some warnings were coming from. Since I often create and discard externals, some warnings from saved searches were confusing to me without output context: "`$FOO' is unknown" "$FOO not indexed by Xapian"
2021-10-13	doc: relnotes: note some recent improvements

2021-10-13	index: optimize after all SQLite DB commits
	This covers v1 inboxes, as well. We also guard the execution since "PRAGMA optimize" was only introduced in SQLite 3.18.0 (2017-03-30)
2021-10-13	lei/store: use remove_doc to save some LoC

2021-10-13	extindex: flush pending reindex before unref
	This prevents unnecessary message renumbering and I/O. Without this change, there is a small window for long-running WWW streaming requests to miss a message that was unref-ed before reindexing. If we expose an "All Mail" mailbox via IMAP/JMAP, this will save client traffic.
2021-10-12	www: _/text/config/raw Last-Modified: is mm->created_at
	This allows IMAP mirrors to keep UIDVALIDITY synchronized (and "LIST ACTIVE.TIMES" in NNTP). "lei add-external --mirror" will automatically set it, as will the combination of public-inbox-clone + public-inbox-index. This avoids the need for extra endpoints or config entries, at least...
2021-10-12	msgmap: ->new_file supports $ibx arg, drop ->new
	The original Msgmap->new API was v1-specific and not necessary. The ->new_file API now supports an $ibx object being passed to it, which simplifies -no_fsync use. It will also make an upcoming change easier...
2021-10-12	daemon: unconditionally close Xapian shards on cleanup
	The cost of opening a Xapian DB (even with shards) isn't high, so save some FDs and just close it. We hit Xapian far less than over.sqlite3 and we discard the MSet ASAP even when streaming large responses. This simplifies our code a bit and hopefully helps reduce fragmentation by increasing mortality of late allocations.
2021-10-12	msgmap: share most of check_inodes w/ over
	We still need to account for msgmap being open all the time and not having separate read-only vs. read-write packages.
2021-10-12	msgmap: use DBI->prepare_cached
	msgmap is not performance-critical enough to justify doing our own prepared statement caching. Just rely on the functionality of DBI here so future changes will be easier. There are also minor style changes which avoid dirtying cache lines with refcount bumps, by repeating hash lookups rather than storing the results in locals.
2021-10-12	nntp: use defined-OR from Perl 5.10 for msgid check
	"<0>" could be a valid Message-ID, maybe...
2021-10-12	search: delete QueryParser along with DB handle
	Xapian::QueryParser is attached to the Xapian::Database, so holding onto the QueryParser was preventing us from releasing DB handles if a query was performed.
2021-10-12	daemon: quiet down Eml-related warnings
	Email::Address::XS is quite noisy and there's nothing we can really do about messages we're serving from read-only daemons.
2021-10-12	daemon: use v5.10.1, disable local warnings
	We're moving towards relying on "perl -w" for warnings and v5.12 for strict.
2021-10-12	isearch: do not access Extsearch->{over} directly
	It may not exist due to periodic cleanup to avoid excessive FD use.
2021-10-12	extindex: avoid invalid blobs after unref
	When unref-ing a blob from xref3, make sure the "preferred" smsg->{blob} doesn't point to the blob we just unrefed. This is necessary because we periodically checkpoint our extindex process to allow -watch and -mda processes to run. This also gets rid of a lot of redundant code for ->remove_xref3, since it's all handled in ExtSearchIdx, now.
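	The invariant described here can be sketched in Python with hypothetical names: `unref_blob`, a set of blob OIDs referencing one message, and a `preferred` OID standing in for smsg->{blob}. This is an illustration, not ExtSearchIdx's actual code. After a blob is dropped, the preferred blob must be one that is still referenced, or the message must go away entirely:

```python
def unref_blob(blobs, preferred, oid):
    """Drop oid from the set of blobs referencing a message.

    Returns the (possibly new) preferred blob, or None when no blob
    references the message anymore and it should be removed entirely.
    """
    blobs.discard(oid)
    if not blobs:
        return None
    if preferred == oid or preferred not in blobs:
        # Never leave the preferred pointer aimed at an unrefed blob;
        # pick a deterministic replacement from the remaining ones.
        return min(blobs)
    return preferred
```

	Choosing the replacement deterministically (here, the smallest OID) is an arbitrary assumption; any still-referenced blob satisfies the invariant.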
2021-10-12	extindex: more consistent doc removal
	We need to ensure a message is consistently removed from eidxq, over and Xapian in all cases. Removing from eidxq saves users from some noisy error messages.
2021-10-12	extindex: share unref logic in more places
	We can use the same logic for --gc and --reindex and 'd' log entries. They're similar enough, and the actual need to unref should be fairly rare. We could go a lot faster if we didn't show progress for --gc and --reindex, actually.
2021-10-12	extindex: rename var: active => active_shards
	We also have the idea of active inboxes, too, so "active shards" ought to make the purpose of the data structure more obvious.
2021-10-12	sqlite: PRAGMA optimize on close
	As recommended by SQLite documentation[1]: To achieve the best long-term query performance without the need to do a detailed engineering analysis of the application schema and SQL, it is recommended that applications run "PRAGMA optimize" (with no arguments) just before closing each database connection. Hopefully that works for our use cases and can make things faster for us. [1] https://www.sqlite.org/pragma.html#pragma_optimize
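	A minimal Python sketch of running "PRAGMA optimize" just before closing a connection, using the standard sqlite3 module. The version check mirrors the guard described in the later "index: optimize after all SQLite DB commits" entry, since the pragma was only introduced in SQLite 3.18.0. The function name is hypothetical:

```python
import sqlite3

# "PRAGMA optimize" was introduced in SQLite 3.18.0
PRAGMA_OPTIMIZE_MIN = (3, 18, 0)


def close_with_optimize(conn):
    """Let SQLite refresh query-planner statistics if needed, then close."""
    if sqlite3.sqlite_version_info >= PRAGMA_OPTIMIZE_MIN:
        # With no arguments, SQLite decides which (if any) ANALYZE work is worthwhile.
        conn.execute("PRAGMA optimize")
    conn.close()
```

	Skipping the pragma on older libraries keeps the close path working everywhere; those users simply don't get the planner statistics refresh.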
2021-10-12	extindex: speed up --reindex --fast
	This required some tweaking of xref3 indices in over.sqlite3, but the end result is it brings no-op "--reindex --fast --all" checks down to roughly 20 minutes (from 30-40 minutes) on lore/all. This is faster because a bunch of small SQLite queries are still slower en masse than a bunch of perlops. Despite the lack of IPC overhead, crossing .so boundaries and repeating lookups over btrees is still slower than doing the same with Perl hash tables.
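	The trade-off can be sketched in Python with the sqlite3 module: instead of issuing one small SELECT per message, load the needed column pairs once into a dict and do the lookups in memory. The `xref3` layout used here (`ibx_id`, `xnum`, `docid`) is a simplified assumption for illustration, not the real over.sqlite3 schema:

```python
import sqlite3


def lookup_per_row(conn, ibx_id, xnums):
    """One query per message: many small trips through SQLite's B-trees."""
    found = {}
    for xnum in xnums:
        row = conn.execute(
            "SELECT docid FROM xref3 WHERE ibx_id = ? AND xnum = ?",
            (ibx_id, xnum),
        ).fetchone()
        if row is not None:
            found[xnum] = row[0]
    return found


def lookup_batched(conn, ibx_id, xnums):
    """One query for the whole inbox, then plain dict lookups in Python."""
    # Each row is an (xnum, docid) pair, so the cursor feeds dict() directly.
    by_xnum = dict(
        conn.execute("SELECT xnum, docid FROM xref3 WHERE ibx_id = ?", (ibx_id,))
    )
    return {x: by_xnum[x] for x in xnums if x in by_xnum}
```

	Both return the same mapping; the batched form trades memory for far fewer round trips into the library, which pays off when most rows of an inbox get checked anyway.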
2021-10-11	doc: lei-refresh-mail-sync: drop repeated word

2021-10-10	extindex: sync each inbox before checking for missed messages
	Otherwise, it gets too noisy and we repeat some work when we do an actual sync, since the last_commit info will be out-of-date.
2021-10-10	lei/store: keep ".err-XXXX" in stderr tmpfile
	This is slightly more meaningful since the file is already in ~/.local/share/lei/store, so "lei_store" was redundant (and the "XXXX" are random characters replaced by File::Temp)
2021-10-10	extindex: --gc doesn't touch ghost entries
	We were deleting ghost entries. This was usually harmless since other messages could fill in the blanks, but it could cause misthreading in odd cases where a big chunk of a thread is missing and the latest messages only referenced ghosts. We'll also save some cycles when scanning Xapian shards since docids won't be <= 0.
2021-10-10	extindex: minor cost reductions
	Don't bother decoding the 20-byte SHA-1 to a 40-byte hex value since we don't read it, anyways. We can also use the on-stack ibx->eidx_key value instead of dispatching the method again.
2021-10-10	extindex: speed up Xapian cleanup in --gc
	Avoiding repeated SQL statements brings --gc down to 2-3 minutes from around 10. We'll also add some checkpoints around over and xref3 cleanups.
2021-10-10	set nodatacow on more SQLite files
	We'll set nodatacow when detecting existing but empty files, and also their directories in more cases (for auxiliary -wal, -journal, -shm files). Hopefully this keeps performance reasonable on CoW FSes.
2021-10-10	admin: add '# ' prefix for progress messages
	It's more consistent with TAP output and hopefully puts users at ease in case they don't understand the meaning of a message.
2021-10-10	lei_to_mail: show --output on augment progress failure
	Just in case it fails when there are many parallel invocations.