Date Commit message
2021-10-13 t/lei-mirror: avoid reading ~/.public-inbox/config in test
Oops, we shouldn't attempt to read a user's actual HOME when running -index, since mine has a bunch of invalid entries in there.
2021-10-13 eml: avoid Encode 2.87..3.12 leak
Encode::FB_CROAK leaks memory in old versions of Encode: <https://rt.cpan.org/Public/Bug/Display.html?id=139622>. Since I expect there are still many users on old systems and old Perls, we can use "$SIG{__WARN__} = \&croak" here with Encode::FB_WARN to emulate Encode::FB_CROAK behavior.
2021-10-13 t/www_listing: require opt-in for grokmirror tests
grokmirror 2.x seems to idle in several places for 5s at a time, causing t/www_listing.t alone to take longer than an entire "make check-run" without grokmirror on a 4-core system. So make it optional, but add some test knobs to allow tailing the log output so I can see what's going on.
2021-10-13 test_common: hoist out tail_f sub
We'll be reusing this in more places. While we're at it, allow it to tail all run_script() users, including lei() in TestCommon.
2021-10-13 xt/perf-msgview: drop unnecessary use_ok
require_mods covers it, and we're not testing Plack itself.
2021-10-13 www: preload: load ExtSearch via ->ALL
This ought to give us more CoW savings and fragmentation avoidance in -httpd.
2021-10-13 extindex: set {current_info} in eidxq processing
This gives context as to where warnings are coming from.
2021-10-13 treewide: use warn() or carp() instead of env->{psgi.errors}
Large chunks of our codebase and 3rd-party dependencies do not use ->{psgi.errors}, so trying to standardize on it was a fruitless endeavor. Since warn() and carp() are standard mechanisms within Perl, just use those instead and simplify a bunch of existing code.
2021-10-13 lei: use standard warn() in more places
warn() is easier to augment with context information, and frankly unavoidable in the presence of 3rd-party libraries we don't control.
2021-10-13 extindex: show OID on bad blob failure
AFAIK I've never hit these messages, but I might be glad if I ever do.
2021-10-13 daemon: set $SIG{__WARN__} properly
Eml->warn_ignore_cb itself returns a callback, so creating a reference to it was wrong when assigning it to $SIG{__WARN__}. Fixes: 176cd51f9aa81b74 ("daemon: quiet down Eml-related warnings")
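As a rough Python analog of this bug class (not the Perl code itself), a factory that returns a callback must be called before its result is installed as a warning hook; installing the factory itself fails as soon as a warning fires. `warn_ignore_cb` below is a hypothetical stand-in for Eml->warn_ignore_cb, and `warnings.showwarning` stands in for $SIG{__WARN__}.

```python
import warnings


def warn_ignore_cb():
    """Hypothetical factory returning a handler that swallows noisy warnings."""
    seen = []

    def handler(message, category, filename, lineno, file=None, line=None):
        # Record the warning text instead of printing it.
        seen.append(str(message))

    handler.seen = seen
    return handler


# Wrong (the bug class): installing the factory itself. When a warning fires,
# Python calls warn_ignore_cb(message, category, ...) and raises TypeError.
#   warnings.showwarning = warn_ignore_cb

# Right: call the factory once and install the callback it returns.
cb = warn_ignore_cb()
with warnings.catch_warnings():  # restores the original hook on exit
    warnings.simplefilter("always")
    warnings.showwarning = cb
    warnings.warn("noisy header warning")

print(cb.seen)  # ['noisy header warning']
```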
2021-10-13 lei up --all: show output for warnings
This helps users make sense of which saved searches some warnings were coming from. Since I often create and discard externals, some warnings from saved searches were confusing to me without output context: "`$FOO' is unknown" "$FOO not indexed by Xapian"
2021-10-13 doc: relnotes: note some recent improvements
2021-10-13 index: optimize after all SQLite DB commits
This covers v1 inboxes, as well. We also guard the execution since "PRAGMA optimize" was only introduced in SQLite 3.18.0 (2017-03-30)
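A minimal Python sketch of the version guard, using the stdlib sqlite3 module rather than Perl DBI; `optimize_if_supported` is a hypothetical helper name and the table is a toy schema, not public-inbox's msgmap.

```python
import sqlite3


def optimize_if_supported(dbh):
    """Run PRAGMA optimize only if the linked SQLite has it (3.18.0+)."""
    if sqlite3.sqlite_version_info >= (3, 18, 0):
        dbh.execute("PRAGMA optimize")
        return True
    return False


dbh = sqlite3.connect(":memory:")
dbh.execute("CREATE TABLE msgmap (num INTEGER PRIMARY KEY, mid TEXT)")
dbh.execute("INSERT INTO msgmap (mid) VALUES ('a@example')")
dbh.commit()
ran = optimize_if_supported(dbh)  # run after the commit, not before
dbh.close()
```

Checking the library version at runtime (rather than the Python or Perl version) matters because the SQLite library is often linked separately from the language runtime.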
2021-10-13 lei/store: use remove_doc to save some LoC
2021-10-13 extindex: flush pending reindex before unref
This prevents unnecessary message renumbering and I/O. Without this change, there is a small window for long-running WWW streaming requests to miss a message that was unref-ed before reindexing. If we expose an "All Mail" mailbox via IMAP/JMAP, this will save client traffic.
2021-10-12 www: _/text/config/raw Last-Modified: is mm->created_at
This allows IMAP mirrors to keep UIDVALIDITY synchronized (and "LIST ACTIVE.TIMES" in NNTP). "lei add-external --mirror" will automatically set it, as will the combination of public-inbox-clone + public-inbox-index. This avoids the need for extra endpoints or config entries, at least...
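A minimal Python sketch of the round trip, assuming created_at is a Unix epoch in seconds; `last_modified` and `created_at_from_header` are hypothetical names, not public-inbox functions. The server formats the timestamp as an HTTP date, and a mirror can parse it back to the same integer.

```python
from email.utils import formatdate, parsedate_to_datetime


def last_modified(created_at):
    """Hypothetical helper: HTTP Last-Modified value from an epoch created_at."""
    return formatdate(created_at, usegmt=True)


def created_at_from_header(value):
    """Mirror side (hypothetical): recover the epoch, e.g. to seed UIDVALIDITY."""
    return int(parsedate_to_datetime(value).timestamp())


hdr = last_modified(0)
print(hdr)                          # Thu, 01 Jan 1970 00:00:00 GMT
print(created_at_from_header(hdr))  # 0
```

HTTP dates have one-second resolution, which matches an integer epoch, so nothing is lost in the round trip.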
2021-10-12 msgmap: ->new_file supports $ibx arg, drop ->new
The original Msgmap->new API was v1-specific and not necessary. The ->new_file API now supports an $ibx object being passed to it, which simplifies -no_fsync use. It will also make an upcoming change easier...
2021-10-12 daemon: unconditionally close Xapian shards on cleanup
The cost of opening a Xapian DB (even with shards) isn't high, so save some FDs and just close it. We hit Xapian far less than over.sqlite3 and we discard the MSet ASAP even when streaming large responses. This simplifies our code a bit and hopefully helps reduce fragmentation by increasing mortality of late allocations.
2021-10-12 msgmap: share most of check_inodes w/ over
We still need to account for msgmap being open all the time and not having separate read-only vs. read-write packages.
2021-10-12 msgmap: use DBI->prepare_cached
msgmap is not performance-critical enough to justify doing our own prepared statement caching. Just rely on the functionality of DBI here so future changes will be easier. There are also minor style changes that repeat hash lookups rather than storing the results as locals, to avoid dirtying cache lines with refcount bumps.
2021-10-12 nntp: use defined-OR from Perl 5.10 for msgid check
"<0>" could be a valid Message-ID, maybe...
2021-10-12 search: delete QueryParser along with DB handle
Xapian::QueryParser is attached to the Xapian::Database, so holding onto the QueryParser was preventing us from releasing DB handles if a query was performed.
2021-10-12 daemon: quiet down Eml-related warnings
Email::Address::XS is quite noisy and there's nothing we can really do about messages we're serving from read-only daemons.
2021-10-12 daemon: use v5.10.1, disable local warnings
We're moving towards relying on "perl -w" for warnings and v5.12 for strict.
2021-10-12 isearch: do not access Extsearch->{over} directly
It may not exist due to periodic cleanup to avoid excessive FD use.
2021-10-12 extindex: avoid invalid blobs after unref
When unref-ing a blob from xref3, make sure the "preferred" smsg->{blob} doesn't point to the blob we just unrefed. This is necessary because we periodically checkpoint our extindex process to allow -watch and -mda processes to run. This also gets rid of a lot of redundant code for ->remove_xref3, since it's all handled in ExtSearchIdx, now.
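The invariant can be sketched in Python with plain dicts. This assumes a simplified, hypothetical layout (xref3 as a dict mapping docid to a list of (eidx_key, blob) pairs, smsg as a dict) rather than the real over.sqlite3 tables, and `unref_blob` is an illustrative name, not ExtSearchIdx's method.

```python
def unref_blob(xref3, smsg, docid, blob):
    """Drop `blob` from docid's refs; keep smsg["blob"] pointing at a live blob.

    Returns True if the document is still referenced, False if it should be
    removed entirely (no blobs left).
    """
    refs = [r for r in xref3.get(docid, []) if r[1] != blob]
    if refs:
        xref3[docid] = refs
        if smsg.get("blob") == blob:
            # The preferred blob was just unref-ed: repoint to a survivor.
            smsg["blob"] = refs[0][1]
        return True
    xref3.pop(docid, None)
    return False


xref3 = {1: [("inboxA", "aaa"), ("inboxB", "bbb")]}
smsg = {"blob": "aaa"}
print(unref_blob(xref3, smsg, 1, "aaa"), smsg["blob"])  # True bbb
```

Repointing at unref time means a reader that sees the document between checkpoints never gets a blob that was just dropped.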
2021-10-12 extindex: more consistent doc removal
We need to ensure a message is consistently removed from eidxq, over and Xapian in all cases. Removing from eidxq saves users from some noisy error messages.
2021-10-12 extindex: share unref logic in more places
We can use the same logic for --gc, --reindex, and 'd' log entries. They're similar enough, and the actual need to unref should be fairly rare. We could go a lot faster if we didn't show progress for --gc and --reindex, actually.
2021-10-12 extindex: rename var: active => active_shards
We also have the idea of active inboxes, too, so "active shards" ought to make the purpose of the data structure more obvious.
2021-10-12 sqlite: PRAGMA optimize on close
As recommended by SQLite documentation[1]: To achieve the best long-term query performance without the need to do a detailed engineering analysis of the application schema and SQL, it is recommended that applications run "PRAGMA optimize" (with no arguments) just before closing each database connection. Hopefully that works for our use cases and can make things faster for us. [1] https://www.sqlite.org/pragma.html#pragma_optimize
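A minimal Python sketch of "optimize just before closing", using the stdlib sqlite3 module instead of Perl DBI; `sqlite_session` is a hypothetical helper, not part of public-inbox.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def sqlite_session(path):
    """Open a connection and always run PRAGMA optimize right before close."""
    dbh = sqlite3.connect(path)
    try:
        yield dbh
    finally:
        # Per the SQLite recommendation: no arguments, just before closing.
        dbh.execute("PRAGMA optimize")
        dbh.close()


with sqlite_session(":memory:") as dbh:
    dbh.execute("CREATE TABLE t (x INTEGER)")
    dbh.executemany("INSERT INTO t VALUES (?)", [(1,), (2,)])
    dbh.commit()
    count = dbh.execute("SELECT COUNT(*) FROM t").fetchone()[0]
print(count)  # 2
```

Tying the PRAGMA to the close path keeps callers from having to remember it, which is the same motivation as doing it in one shared place in the Perl code.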
2021-10-12 extindex: speed up --reindex --fast
This required some tweaking of xref3 indices in over.sqlite3, but the end result is that it brings no-op "--reindex --fast --all" checks down to roughly 20 minutes (from 30-40 minutes) on lore/all. This is faster because a bunch of small SQLite queries are still slower en masse than a bunch of perlops. Despite the lack of IPC overhead, crossing .so boundaries and repeating lookups over btrees is still slower than doing the same with Perl hash tables.
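The trade-off can be sketched in Python with sqlite3, using a set where the Perl code would use a hash. The xref3 schema here is a simplified hypothetical one, and both function names are illustrative: the first issues one small query per article number, the second loads the known numbers once and does in-process lookups.

```python
import sqlite3


def missing_per_row(dbh, ibx_id, nums):
    """One small query per article number (the slower pattern)."""
    out = []
    for num in nums:
        row = dbh.execute(
            "SELECT 1 FROM xref3 WHERE ibx_id = ? AND xnum = ? LIMIT 1",
            (ibx_id, num)).fetchone()
        if row is None:
            out.append(num)
    return out


def missing_bulk(dbh, ibx_id, nums):
    """One query to load known numbers into a set, then hash lookups."""
    known = {xnum for (xnum,) in dbh.execute(
        "SELECT xnum FROM xref3 WHERE ibx_id = ?", (ibx_id,))}
    return [num for num in nums if num not in known]


dbh = sqlite3.connect(":memory:")
dbh.execute("CREATE TABLE xref3 (docid INTEGER, ibx_id INTEGER, xnum INTEGER)")
dbh.executemany("INSERT INTO xref3 VALUES (?, ?, ?)",
                [(1, 7, 1), (2, 7, 2), (3, 7, 4)])
print(missing_bulk(dbh, 7, range(1, 6)))  # [3, 5]
```

The bulk version trades memory proportional to the inbox size for far fewer trips through the SQLite library.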
2021-10-11 doc: lei-refresh-mail-sync: drop repeated word
2021-10-10 extindex: sync each inbox before checking for missed messages
Otherwise, it gets too noisy and we repeat some work when we do an actual sync, since the last_commit info will be out-of-date.
2021-10-10 lei/store: keep ".err-XXXX" in stderr tmpfile
This is slightly more meaningful since the file is already in ~/.local/share/lei/store, so "lei_store" was redundant (and the "XXXX" are placeholders that File::Temp replaces with random characters).
2021-10-10 extindex: --gc doesn't touch ghost entries
We were deleting ghost entries. This was usually harmless since other messages could fill in the blanks, but it could cause misthreading in odd cases where a big chunk of a thread is missing and the latest messages only referenced ghosts. We'll also save some cycles when scanning Xapian shards since docids won't be <= 0.
2021-10-10 extindex: minor cost reductions
Don't bother decoding the 20-byte SHA-1 to a 40-byte hex value since we don't read it anyway. We can also use the on-stack ibx->eidx_key value instead of dispatching the method again.
2021-10-10 extindex: speed up Xapian cleanup in --gc
Avoiding repeated SQL statements brings --gc down to 2-3 minutes from around 10. We'll also add some checkpoints around over and xref3 cleanups.
2021-10-10 set nodatacow on more SQLite files
We'll set nodatacow when detecting existing but empty files, and also their directories in more cases (for auxiliary -wal, -journal, -shm files). Hopefully this keeps performance reasonable on CoW FSes.
2021-10-10 admin: add '# ' prefix for progress messages
It's more consistent with TAP output and hopefully puts users at ease in case they don't understand the meaning of a message.
2021-10-10 lei_to_mail: show --output on augment progress failure
Just in case it fails when there's many parallel invocations.
2021-10-09 extindex: support --reindex --fast
This mode only checks history for missed/stale messages and doesn't attempt to reindex messages which are already indexed.
2021-10-09 view: save memory by dropping smsg->{from_name} on use
We'll also save a few LoC when generating it. $smsg objects can linger a while when rendering large threads, so saving a few bytes here can add up to several hundred KB saved. I noticed this while chasing the ref cycle leak in commit b28e74c9dc0a (www: fix ref cycle from threading w/ extindex, 2021-10-03). While there's no longer a leak, releasing memory earlier can allow it to be reused sooner and reduce both memory traffic and memory pressure.
2021-10-09 http: avoid Perl target cache for psgi.input
Use syswrite to populate env->{psgi.input}. The substr() call in IO::Handle->write triggers Perl's target/scratchpad cache and results in a permanent allocation. Since this is a cold path, that allocation is pointless, and syswrite() can already write a substring. Allowing Perl to cache a large allocation in a cold path only results in fragmentation and wasted RAM. write(2) on a regular file won't result in short writes unless FS quotas or free space limits are hit, or the buffer is close to overflowing (e.g. the 0x7ffff000-byte Linux limit). Since our HTTP server will never buffer that much in RAM, there's no need to retry syswrite nor rely on the retrying implicit in IO::Handle->write and the "print" perlop.
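A loose Python analog, not the Perl implementation: slicing a bytes buffer copies it (roughly like the substr() temporary), while a memoryview slice passed straight to os.write does not. `spool_body` is a hypothetical name, and treating a short write as an error follows the reasoning above about regular files.

```python
import os
import tempfile


def spool_body(buf, off, length):
    """Write buf[off:off+length] to an unlinked temp file without copying it."""
    fh = tempfile.TemporaryFile()
    view = memoryview(buf)[off:off + length]  # zero-copy, unlike buf[off:...]
    n = os.write(fh.fileno(), view)
    # Regular files should not short-write below quota/free-space limits or
    # huge buffers, so report a short write instead of retrying.
    if n != len(view):
        raise OSError("short write: %d of %d bytes" % (n, len(view)))
    fh.seek(0)
    return fh


req = b"GET / HTTP/1.1\r\n\r\nhello world"
body = spool_body(req, 18, 5)
print(body.read())  # b'hello'
body.close()
```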
2021-10-09 view: discard Eml->{bdy} when done using
We can release the raw body buffer once we've obtained a copy of the decoded buffer. This reduces memory pressure ahead of some expensive diff processing.
2021-10-09 solver_git: shorten scalar lifetimes
Some of these scalar buffers may be large patches, so try to keep them as short-lived as possible to reduce memory pressure.
2021-10-09 net_reader: hoist out _imap_fetch_bodies
We'll be supporting pipelining in a future commit, since Tor is too slow and increasing batch size can use too much memory.
2021-10-08 git: fatalize async callback errors by default
This should help us catch BUG: errors (and then some) in -extindex and other read-write code paths. Only read-only daemons should warn on async callback failures, since those aren't capable of causing data loss.
2021-10-08 git: async_abort includes --batch-check requests
We need to abort both check-only and cat requests when aborting, since we'll be aborting more aggressively in read-write paths.
2021-10-08 git: use async_wait_all everywhere
Some code paths may use maximum size checks, so ensure any checks are waited on, too.