Date Commit message
2021-10-06 xt/perf-msgview: modernize, support TEST_BLOB
This helped me quickly reproduce a bug in Encode[1] and will help me determine performance implications of workarounds for the aforementioned bug.
[1] https://rt.cpan.org/Public/Bug/Display.html?id=139622
2021-10-05 index: --reindex w/ --{since,until,before,after}
This lets administrators reindex specific time ranges according to git "approxidate" formats. These arguments are passed directly to underlying git-log(1) invocations and may still reach into old epochs. Since these options rely on git committer dates (which we infer from the most recent Received: header), they are not guaranteed to be strictly tied to git history and it's possible to over/under-reindex some messages. It's probably not a major problem in practice, though; reindexing a few extra messages is generally harmless aside from some extra device wear. Since this currently relies on git-log, these options do not affect -extindex, yet.
2021-10-05 extsearchidx: favor 20-byte OID comparison
As with most of our internal-only code, favor smaller comparisons to reduce memory traffic.
2021-10-05 overidx: update comment for new sub name
`shard_remove_eidx_info' was made unnecessary with commit 82b805db3ad9 (searchidxshard: IPC conversion, part 2, 2021-01-03) and we now call `remove_eidx_info' directly.
2021-10-04 {dir,inbox}idle: use level-triggered epoll
Both read(2) on inotify and kevent(2) return a finite number of events. Let the kernel notify us again in cases where we'd need to retry instead of looping ourselves. This can prevent missed/delayed notifications while still ensuring fairness in busy event loops.
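A rough Python sketch (not the project's Perl code) of the level-triggered behavior described above: each wakeup does one bounded read and relies on the selector reporting the descriptor again while data remains, instead of retrying in a userspace loop. The pipe and the `drain_in_bounded_reads` name are illustrative assumptions; Python's `selectors` module is level-triggered.

```python
import os
import selectors


def drain_in_bounded_reads(payload: bytes, chunk: int) -> int:
    """Return how many wakeups it takes to drain `payload` from a pipe
    when each wakeup performs exactly one read of at most `chunk` bytes."""
    r, w = os.pipe()
    sel = selectors.DefaultSelector()  # level-triggered notification
    sel.register(r, selectors.EVENT_READ)
    try:
        os.write(w, payload)
        wakeups = 0
        # While unread data remains, the selector keeps reporting `r`,
        # so there is no inner retry loop around read().
        while sel.select(timeout=0):
            os.read(r, chunk)
            wakeups += 1
        return wakeups
    finally:
        sel.close()
        os.close(r)
        os.close(w)


print(drain_in_bounded_reads(b"x" * 10, 4))  # 3 wakeups: 4 + 4 + 2 bytes
```

In a real event loop, other descriptors would get a turn between those wakeups, which is where the fairness comes from.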
2021-10-04 hl_mod: don't memoize highlight::codeGenerator objects
Making them immortal doesn't seem worth it, since doing immortal allocations after process startup leads to fragmentation. While the allocations made by highlight are small, those small allocations can break up contiguous regions and prevent consolidation by the malloc implementation. Since instantiating code generators doesn't seem too expensive, just use and delete them ASAP.
2021-10-04 www: fix ref cycle from threading w/ extindex
Unlike v1 inboxes (which don't accept duplicate Message-IDs at all), and v2 inboxes (which generate a new Message-ID for duplicates), extindex must accept duplicate Message-IDs as-is. This was fine for storage, but prevented the reference-cycle mechanism of our message threading display algorithm from working reliably. It could no longer delete the ->{parent} field from clobbered entries in the %id_table. So we now take into account reused Message-IDs and never clobber entries in %id_table. Instead, we mark reused Message-IDs as "imposters" and special-case them by injecting them as children after all other threading is complete. This cycle was noticed using a pre-release of Devel::Mwrap::PSGI: https://80x24.org/mwrap-perl.git
2021-10-04 t/thread-cycle: make Email::Simple optional
We only use it if Mail::Thread is available, and often it's not.
2021-10-02 extsearchidx: emit diagnostics for missing blobs
I'm not sure why they weren't emitted, earlier.
2021-10-02 content_hash: normalize whitespace before hashing addresses
This should prevent some false duplicates. I noticed this while implementing "lei mail-diff", and only noticed it when I implemented the ContentDigestDbg wrapper for mail-diff.
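A minimal Python sketch of the idea, for illustration only: collapse whitespace in an address header value before hashing it, so values that differ only in spacing or folding produce the same digest. The exact normalization rules and hash inputs used by content_hash are not given in this message; `normalize_address`, `address_digest`, and the choice of SHA-256 here are assumptions.

```python
import hashlib
import re


def normalize_address(value: str) -> str:
    """Collapse each run of whitespace to one space and trim both ends."""
    return re.sub(r"\s+", " ", value).strip()


def address_digest(value: str) -> str:
    """Hash the normalized form (hypothetical stand-in for content_hash)."""
    return hashlib.sha256(normalize_address(value).encode("utf-8")).hexdigest()


a = "Alice Example  <alice@example.com>"
b = "Alice Example\n <alice@example.com>\n"
print(address_digest(a) == address_digest(b))  # True: only whitespace differs
```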
2021-10-02 lei mail-diff: diagnostic command to diff mail contents
This is useful in finding the cause of deduplication bugs, and possibly the cause of missing threads reported by Konstantin in <20211001130527.z7eivotlgqbgetzz@meerkat.local>
usage:
    u=https://yhbt.net/lore/all/87czop5j33.fsf@tynnyri.adurom.net/raw
    lei mail-diff $u
2021-10-02 extsearchidx: attach_config: set {ibx_map} value to $ibx
It doesn't seem to matter, actually, but this matches the behavior of attach_inbox and the comment in ->new.
2021-10-02 lei inspect: fix "mid:" prefix, expand to Xapian
This fixes inspect for uninitialized instances, and adds Xapian ("xdoc") output if available.
Reported-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Message-ID: <20211001204943.l4yl6xvc45c5eapz@meerkat.local>
2021-10-02 lei inspect: integerize "bytes" and "lines" fields
These are always numeric, but none of the Perl code cares; we want to prevent JSON from quoting them.
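The effect on JSON output can be sketched in Python, as an analogy only (the real code is Perl, where the encoder quotes a scalar that was last used as a string). The `integerize` helper and the sample values below are hypothetical.

```python
import json


def integerize(doc: dict, keys=("bytes", "lines")) -> dict:
    """Return a copy of `doc` with the named fields converted to int."""
    out = dict(doc)
    for key in keys:
        if key in out:
            out[key] = int(out[key])
    return out


raw = {"bytes": "1024", "lines": "42"}  # numeric values held as strings
print(json.dumps(raw))                  # {"bytes": "1024", "lines": "42"}
print(json.dumps(integerize(raw)))      # {"bytes": 1024, "lines": 42}
```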
2021-10-02 extsearchidx: do not process eidxq w/o config
When indexing a single inbox, do not attempt reindexing code paths without a full config, otherwise ordering comparisons won't work.
2021-10-02 doc: lei-daemon: new manpage
In case users see "lei-daemon" in ps(1) or syslog and wonder.
Helped-by: Kyle Meyer <kyle@kyleam.com>
2021-10-01 ds: inline set_cloexec
I'm thinking we can drop support for Linux <2.6.27 soonish and just use EPOLL_CLOEXEC. Perl without signalfd (or EVFILT_SIGNAL) is miserable, actually.
2021-10-01 inbox: keep DB handles if git processes are live
Having git processes outlive DB handles is likely to hurt from a fragmentation perspective if the DB handle needs to be recreated immediately due to a git->cat_async callback. So only unref DB handles when we're sure there's no live git users left, otherwise check the inodes. We'll also avoid needless localization checks in git->cleanup and make the return value more obvious since the pid fields are unconditionally deleted nowadays.
2021-10-01 inbox: inline and eliminate git_cleanup
It was probably incorrect to use from max_git_epoch, and it's small enough to inline into do_cleanup. We'll also eliminate the unnecessary deletion of {-altid_map} while we're in the area, since we no longer cache/memoize that.
Fixes: 7e5cea05f061e757 ("inbox: rewrite cleanup to be more aggressive")
2021-10-01 ds: simplify signalfd use
Since signalfd is often combined with our event loop, give it a convenient API and reduce the code duplication required to use it. EventLoop is replaced with ::event_loop to allow consistent parameter passing and avoid needlessly passing the package name on stack. We also avoid exporting SFD_NONBLOCK since it's the only flag we support. There's no sense in having the memory overhead of a constant function when it's in cold code.
2021-10-01 ipc: run Net::SSLeay::randomize
Currently we don't use OpenSSL from child processes of parents which use OpenSSL, but we may in the future. So ensure OpenSSL initializes its PRNG after these forks to avoid one security pitfall down the line.
2021-10-01 daemon: make SO_ACCEPTFILTER a shared variable
Constant subroutines use more memory and there's no need to optimize it for inlining since it's only used at startup.
2021-10-01 listener: switch to level-triggered epoll
On second thought, the ->requeue + accept retry code path isn't worth the userspace complexity and overhead. Level-triggered epoll has always annoyed me since it takes an inefficient code path in the kernel; but taking our less-efficient code path in Perl seems even worse. We also need to take load distribution into account for multi-worker systems.
2021-10-01 doc: lei-security: some more updates
Virtual users will probably be used for read-write IMAP/JMAP support. The potential for various kernel/hardware bugs and attacks also needs to be highlighted.
2021-10-01 search_view: various navigation tweaks
This improves the "&x=t" navigation between the thread overview (skeleton) section at the bottom and jumping back to the top for the mbox download form. The "--links below ..." text ought to be helpful for users unfamiliar with the /$MSGID/T/ and /$MSGID/t/ views.
2021-09-29 git: shorten --git-dir= in CLI with chdir in spawn
Long pathnames are difficult to read and distinguish in ps(1) output. Deep paths can also slow down pathname resolution when dealing with loose objects, so we put "cat-file --batch" deeper into the directory tree. Since v2 processes are in the form of $INBOXDIR/all.git, keep the basename of $INBOXDIR in --git-dir= so it's easy to distinguish between processes just by looking at ps(1). While "git -C" also exists, it's only present in git 1.8.5+. We also need to keep in mind the "directory" pointed to by --git-dir= need not be a directory (nor a symlink pointing to one). This reduces pathname resolution overhead for v1 and v2 inbox git processes, but unfortunately not for extindex since that needs to store alternates as absolute paths.
2021-09-29 ds: drop ::later support
add_uniq_timer seems sufficient, and we'll drop the last user of ::later (IMAP) and switch to unique timers.
2021-09-29 ds: simplify idle time expiry, slightly
While it doesn't look like $EXPMAP can be populated in non-obvious ways via ->DESTROY, it still makes sense to keep it close to some of our other code around cleanup to reduce the likelihood of subtle bugs in case semantics change.
2021-09-29 t/solver_git: fix test to work with git <2.29
'git diff --abbrev=40' did not abbreviate /^index / lines of diff output with git <2.29, and 40 will be insufficient for SHA-256. --full-index has been around since 2005, so it's safe to rely on. Tested git version 2.20.0 (Debian buster).
Fixes: 751df49e7db8ba77 ("lei rediff: add --drq and --dequote-only")
2021-09-29 inbox: do not vivify {-repo_objs} during cleanup
This caused config->repo_objs to not fill in {-repo_objs} properly before starting solver.
Reported-by: Kyle Meyer <kyle@kyleam.com>
Link: https://public-inbox.org/meta/87o88cqobd.fsf@kyleam.com/
Fixes: 63d7b8ceee55a34 ("daemons: revamp periodic cleanup task")
2021-09-29 inbox: drop memoization/preload, cleanup expires caches
cloneurl, description, and base_url are no longer memoized. The non-$env form of base_url is rare in WWW, and is fast enough to not require memoization. cloneurl and description are now expired during cleanup, allowing admins to change these files without restarting (or SIGHUP). -altid_map is no longer cached nor memoized at all, since the endpoint(s) which hit it seem rarely accessed. nntp_url and imap_url are now cached (instead of memoized) in case an inbox is unvisited for a long time. They remain cached since the truthiness check gets called in every per-inbox HTML page, which can potentially be expensive.
2021-09-29 inbox: rewrite cleanup to be more aggressive
Avoid relying on a giant cleanup hash and instead use the new DS->add_uniq_timer API to amortize the pause times associated with having to clean up many inboxes. We can also use smaller intervals for this, as well. We now discard SQLite DB handles at cleanup. Each of these can use several megabytes of memory, which adds up with hundreds/thousands of inboxes. Since per-inbox access intervals are unpredictable and opening an SQLite handle is relatively inexpensive, release memory more aggressively to avoid the heap having to hit swap.
2021-09-29 www: do not bump {over} refcnt on long responses
SQLite files may be replaced or removed by admins while generating large thread or mailbox responses. Ensure we don't hold onto DBI handles and associated file descriptors past their cleanup.
2021-09-28 www+httpd: lower priority of large mbox downloads
While each git blob request is treated fairly w.r.t. other git blob requests, responses triggering thousands of git blob requests can still noticeably increase latency for less-expensive responses. Move large mbox results and the nasty all.mbox endpoint to a low priority queue which only fires once per event loop iteration. This reduces the response time of short HTTP responses while many gigantic mboxes are being downloaded simultaneously, but still maximizes use of available I/O when there are no inexpensive HTTP responses happening. This only affects PublicInbox::WWW users who use public-inbox-httpd, not generic PSGI servers.
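The scheduling idea can be modeled with a toy Python loop. This is a sketch under assumptions, not the httpd implementation: `run_loop` is a hypothetical name, `arrivals` lists the inexpensive responses that become ready in each iteration, and each firing of the low priority queue is assumed to advance one queued mbox step.

```python
from collections import deque


def run_loop(arrivals, low_priority):
    """Return the order in which work items run.

    arrivals: one list per event loop iteration of cheap, ready responses.
    low_priority: expensive steps (e.g. chunks of a large mbox download).
    """
    low = deque(low_priority)
    order = []
    for ready in arrivals:
        order.extend(ready)              # cheap responses always go first
        if low:
            order.append(low.popleft())  # low priority queue fires once
    order.extend(low)                    # idle iterations finish the backlog
    return order


print(run_loop([["a", "b"], [], ["c"]], ["M1", "M2", "M3", "M4"]))
# ['a', 'b', 'M1', 'M2', 'c', 'M3', 'M4']
```

The second iteration has no cheap work, so the mbox step still runs and the available I/O is used.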
2021-09-28 doc: lei-rediff: grammar fixes for --drq and --dequote-only
2021-09-27 lei completion: workaround old Perl bug
While `$argv[-1]' is `undef' on an empty @argv, using `$argv[-1]' as a subroutine argument would fail incorrectly with:
    Modification of non-creatable array value attempted, subscript -1 at ...
...even though we'd never attempt to modify @_ itself in the subroutines being called. Work around the bug (tested on 5.16.3) by passing `undef' explicitly when `$argv[-1]' is already `undef'.
Reported-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Link: https://public-inbox.org/meta/20210927124056.kj5okiefvs4ztk27@meerkat.local/
2021-09-27 t/lei-index: IMAP and NNTP dependencies are optional
"lei index" support for IMAP and NNTP is incomplete, so there's no point in requiring them.
Reported-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Link: https://public-inbox.org/meta/20210927124056.kj5okiefvs4ztk27@meerkat.local/
2021-09-27 fetch: support running as root
The "-w" perlop always succeeds as root, so we need to check st_mode for writability bits to detect directories we shouldn't write to.
Reported-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Link: https://public-inbox.org/meta/20210927124056.kj5okiefvs4ztk27@meerkat.local/
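A Python sketch of the same distinction, for illustration only (the actual change is in Perl): an access(2)-style check reflects the effective privileges of the caller, while inspecting st_mode reflects the permission bits on the directory itself. Which write bits the real code tests is not stated here; `mode_is_writable` is a hypothetical helper that accepts any of the user, group, or other write bits.

```python
import os
import stat
import tempfile


def mode_is_writable(path: str) -> bool:
    """True if any write permission bit is set in the path's st_mode."""
    mode = os.stat(path).st_mode
    return bool(mode & (stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))


with tempfile.TemporaryDirectory() as d:
    os.chmod(d, 0o555)  # read-only directory
    # os.access() may still report True when running as root;
    # the mode-bit check reports False regardless of who runs it.
    print(os.access(d, os.W_OK), mode_is_writable(d))
    os.chmod(d, 0o755)  # restore write permission before cleanup
```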
2021-09-27 t/cmd_ipc: allow extra errors and add diagnostics
Apparently, sendmsg can fail in less common ways when network buffers are gigantic. Add some diagnostics for future failures, as well.
Reported-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Link: https://public-inbox.org/meta/20210927124056.kj5okiefvs4ztk27@meerkat.local/
2021-09-27 xt/net_writer_imap: env knobs for compress/debug/proxy
It can be useful to test with some of these, but we can't enable them universally for all servers (and debug + compress is gross)
2021-09-27 config: get_1: use full parameter name
Instead of passing the prefix section and key separately, pass them together as is commonly done with git-config(1) usage as well as our ->get_all API. This inconsistency in the get_1 API is a needless footgun and confused me a bit while working on "lei up" the other week.
2021-09-27 lei rediff: add --drq and --dequote-only
More switches which can be useful for users who pipe from text editors. --drq can be helpful while writing patch review email replies, and perhaps --dequote-only, too.
2021-09-27 lei rediff: quiet warnings from Import and Eml
lei rediff is expected to see partial patch fragments and such, so silence warnings when something isn't exactly a valid email message.
2021-09-26 net_reader: drop support for IgnoreSizeErrors option
Only the ->message_string method of Mail::IMAPClient uses it, and we have no intention of using ->message_string outside of tests.
2021-09-26 lei: ensure refresh_watches isn't called from workers
Only the top-level lei-daemon will do inotify/kevent.
2021-09-26 t/run.perl: less confusing error reporting
The $sigchld handler was reporting the last test (successful or not) for a given PID in case a worker dies prematurely. Instead, redisplay all failed tests in $run_log to ensure the report only shows failed tests, and not the last started (and possibly successful) one.
2021-09-26 inbox: cloneurl: avoid undef to hash table value
This saves us some memory for the hash slot in the common case the `cloneurl' file doesn't exist.
2021-09-26 lei -f reply: fix Cc: header combining
When combining lines from To: and Cc: headers, ", " needs to be used to separate them.
2021-09-26 www_listing: support /all/ search as a 302 redirect
This allows users to search /all/ from the top-level WwwListing without extra manual steps, although there are still extra network round trips incurred. No vertical whitespace is added, and there are no clumsy radio buttons or menus to deal with. Users only have to use a different <input type=submit /> button. I forgot how to do this until I realized we already do something similar with multiple submit buttons for threaded vs non-threaded mboxrd.gz downloads.
Link: https://public-inbox.org/meta/20210827120845.29682-1-e@80x24.org/
2021-09-26 lei note-event: ignore kw_changed exceptions
The note-event worker may see changes before a Xapian shard commit happens, meaning keyword lookups fail as a result. Just emit the request to the lei/store worker since it's a fairly cheap operation at this point. We'll try harder to look for kw changes, too, since deduplication changes may lead to multiple docids being resolved for a single message.