Date Commit message
2021-10-01 daemon: make SO_ACCEPTFILTER a shared variable
Constant subroutines use more memory and there's no need to optimize it for inlining since it's only used at startup.
2021-10-01 listener: switch to level-triggered epoll
On second thought, the ->requeue + accept retry code path isn't worth the userspace complexity and overhead. Level-triggered epoll has always annoyed me since it takes an inefficient code path in the kernel; but taking our less-efficient code path in Perl seems even worse. We also need to take load distribution into account for multi-worker systems.
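As a rough illustration of the level-triggered behaviour described above (a Python sketch, not the project's Perl code; `accept_one_per_wakeup` and `demo` are hypothetical names, and `select.epoll` is Linux-only), a listener registered without `EPOLLET` keeps being reported as readable while connections remain queued, so a single accept per wakeup is enough and no userspace retry loop is needed:

```python
import select
import socket


def accept_one_per_wakeup(listener, expected, timeout=1.0):
    """Accept up to `expected` connections, taking only one per epoll wakeup."""
    ep = select.epoll()
    # EPOLLIN without EPOLLET is level-triggered: the kernel keeps reporting
    # the listener as readable while its accept queue is non-empty.
    ep.register(listener.fileno(), select.EPOLLIN)
    accepted = []
    try:
        while len(accepted) < expected:
            if not ep.poll(timeout):
                break  # nothing pending within the timeout
            conn, _addr = listener.accept()  # one accept, no retry loop
            accepted.append(conn)
    finally:
        ep.close()
    return accepted


def demo():
    """Queue three connections, then accept them one wakeup at a time."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(8)
    listener.setblocking(False)
    clients = [socket.create_connection(listener.getsockname()) for _ in range(3)]
    conns = accept_one_per_wakeup(listener, 3)
    count = len(conns)
    for sock in clients + conns + [listener]:
        sock.close()
    return count
```

With several worker processes polling the same listener, each wakeup hands out one connection, which is one plausible reading of the load-distribution concern mentioned above.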
2021-10-01 doc: lei-security: some more updates
Virtual users will probably be used for read-write IMAP/JMAP support. The potential for various kernel/hardware bugs and attacks also needs to be highlighted.
2021-10-01 search_view: various navigation tweaks
This improves the "&x=t" navigation between the thread overview (skeleton) section at the bottom and jumping back to the top for the mbox download form. The "--links below ..." text ought to be helpful for users unfamiliar with the /$MSGID/T/ and /$MSGID/t/ views.
2021-09-29 git: shorten --git-dir= in CLI with chdir in spawn
Long pathnames are difficult to read and distinguish in ps(1) output. Deep paths can also slow down pathname resolution when dealing with loose objects, so we put "cat-file --batch" deeper into the directory tree. Since v2 processes are in the form of $INBOXDIR/all.git, keep the basename of $INBOXDIR in --git-dir= so it's easy to distinguish between processes just by looking at ps(1). While "git -C" also exists, it's only present in git 1.8.5+. We also need to keep in mind the "directory" pointed to by --git-dir= need not be a directory (nor a symlink pointing to one). This reduces pathname resolution overhead for v1 and v2 inbox git processes, but unfortunately not for extindex since that needs to store alternates as absolute paths.
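The path splitting described above could look roughly like this. It is a Python sketch under assumptions, not the project's Perl implementation: `short_git_dir` is a hypothetical helper, and the example paths are made up.

```python
import os


def short_git_dir(git_dir):
    """Return (cwd, git_dir_arg) for spawning git with a shorter --git-dir=."""
    git_dir = git_dir.rstrip("/")
    parent, base = os.path.split(git_dir)
    if base == "all.git":
        # v2 layout ($INBOXDIR/all.git): keep the basename of $INBOXDIR so
        # processes stay distinguishable in ps(1) output
        grandparent, inbox = os.path.split(parent)
        return grandparent, os.path.join(inbox, base)
    # v1 layout: the inbox directory is the git directory itself
    return parent, base


# Hypothetical usage: chdir via cwd= instead of relying on "git -C"
# cwd, rel = short_git_dir("/srv/inboxes/meta/all.git")
# subprocess.Popen(["git", "--git-dir=" + rel, "cat-file", "--batch"], cwd=cwd)
```

Because only string handling is involved, the `--git-dir=` target does not have to be a real directory for the split to work.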
2021-09-29 ds: drop ::later support
add_uniq_timer seems sufficient, and we'll drop the last user of ::later (IMAP) and switch to unique timers.
2021-09-29 ds: simplify idle time expiry, slightly
While it doesn't look like $EXPMAP can be populated in non-obvious ways via ->DESTROY, it still makes sense to keep it close to some of our other code around cleanup to reduce the likelihood of subtle bugs in case semantics change.
2021-09-29 t/solver_git: fix test to work with git <2.29
'git diff --abbrev=40' did not abbreviate /^index / lines of diff output with git <2.29, and 40 will be insufficient for SHA-256. --full-index has been around since 2005, so it's safe to rely on. Tested git version 2.20.0 (Debian buster). Fixes: 751df49e7db8ba77 ("lei rediff: add --drq and --dequote-only")
2021-09-29 inbox: do not vivify {-repo_objs} during cleanup
This caused config->repo_objs to not fill in {-repo_objs} properly before starting solver. Reported-by: Kyle Meyer <kyle@kyleam.com> Link: https://public-inbox.org/meta/87o88cqobd.fsf@kyleam.com/ Fixes: 63d7b8ceee55a34 ("daemons: revamp periodic cleanup task")
2021-09-29 inbox: drop memoization/preload, cleanup expires caches
cloneurl, description, and base_url are no longer memoized. The non-$env form of base_url is rare in WWW, and is fast enough to not require memoization. cloneurl and description are now expired during cleanup, allowing admins to change these files without restarting (or SIGHUP). -altid_map is no longer cached nor memoized at all, since the endpoint(s) which hit it seem rarely accessed. nntp_url and imap_url are now cached (instead of memoized) in case an inbox is unvisited for a long time. They remain cached since the truthiness check gets called in every per-inbox HTML page, which can potentially be expensive.
2021-09-29 inbox: rewrite cleanup to be more aggressive
Avoid relying on a giant cleanup hash and instead use the new DS->add_uniq_timer API to amortize the pause times associated with having to clean up many inboxes. We can also use smaller intervals for this. We now discard SQLite DB handles at cleanup. Each of these can use several megabytes of memory, which adds up with hundreds/thousands of inboxes. Since per-inbox access intervals are unpredictable and opening an SQLite handle is relatively inexpensive, release memory more aggressively to avoid the heap having to hit swap.
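The idea of one pending timer per key, so that each inbox schedules its own small cleanup instead of sharing one large pass, can be sketched as follows. This is a Python toy under assumptions: the `UniqTimers` class, its method signatures, and the "first registration wins" rule are illustrative and not taken from the DS->add_uniq_timer source.

```python
import heapq
import itertools


class UniqTimers:
    """Toy timer queue allowing at most one pending timer per key."""

    def __init__(self):
        self._heap = []
        self._pending = set()
        self._seq = itertools.count()  # tie-breaker so callbacks are never compared

    def add_uniq_timer(self, key, when, callback):
        """Schedule callback at time `when` unless `key` is already pending."""
        if key in self._pending:
            return False
        self._pending.add(key)
        heapq.heappush(self._heap, (when, next(self._seq), key, callback))
        return True

    def run_due(self, now):
        """Fire every timer due at or before `now`; return how many fired."""
        fired = 0
        while self._heap and self._heap[0][0] <= now:
            _when, _seq, key, callback = heapq.heappop(self._heap)
            self._pending.discard(key)  # the key may be scheduled again
            callback()
            fired += 1
        return fired
```

Repeated requests to clean up the same inbox collapse into one pending callback, while different inboxes fire at their own times, which spreads the pauses out.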
2021-09-29 www: do not bump {over} refcnt on long responses
SQLite files may be replaced or removed by admins while generating large thread or mailbox responses. Ensure we don't hold onto DBI handles and associated file descriptors past their cleanup.
2021-09-28 www+httpd: lower priority of large mbox downloads
While each git blob request is treated fairly w.r.t. other git blob requests, responses triggering thousands of git blob requests can still noticeably increase latency for less-expensive responses. Move large mbox results and the nasty all.mbox endpoint to a low priority queue which only fires once per-event loop iteration. This reduces the response time of short HTTP responses while many gigantic mboxes are being downloaded simultaneously, but still maximizes use of available I/O when there are no inexpensive HTTP responses happening. This only affects PublicInbox::WWW users who use public-inbox-httpd, not generic PSGI servers.
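One way to picture a low priority queue that fires once per event loop iteration is the toy loop below. It is a Python sketch under assumptions: the `Loop` class and the choice to advance exactly one low-priority step per iteration are illustrative readings of the description, not the public-inbox-httpd implementation.

```python
from collections import deque


class Loop:
    """Toy event loop with a normal queue and a low-priority queue."""

    def __init__(self):
        self.normal = deque()
        self.low = deque()

    def run_once(self):
        """Run one iteration of the loop."""
        # Run every normal-priority task that was queued before this iteration.
        for _ in range(len(self.normal)):
            self.normal.popleft()()
        # Then advance at most one low-priority step (e.g. a large mbox).
        if self.low:
            self.low.popleft()()
```

When the normal queue is empty, each iteration still advances the low-priority work, so idle I/O capacity is not wasted.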
2021-09-28 doc: lei-rediff: grammar fixes for --drq and --dequote-only
2021-09-27 lei completion: workaround old Perl bug
While `$argv[-1]' is `undef' on an empty @argv, using `$argv[-1]' as a subroutine argument would fail incorrectly with: Modification of non-creatable array value attempted, subscript -1 at ... ...even though we'd never attempt to modify @_ itself in the subroutines being called. Work around the bug (tested on 5.16.3) by passing `undef' explicitly when `$argv[-1]' is already `undef'. Reported-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org> Link: https://public-inbox.org/meta/20210927124056.kj5okiefvs4ztk27@meerkat.local/
2021-09-27 t/lei-index: IMAP and NNTP dependencies are optional
"lei index" support for IMAP and NNTP is incomplete, so there's no point in requiring them. Reported-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org> Link: https://public-inbox.org/meta/20210927124056.kj5okiefvs4ztk27@meerkat.local/
2021-09-27 fetch: support running as root
The "-w" perlop always succeeds as root, so we need to check st_mode for writability bits to detect directories we shouldn't write to. Reported-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org> Link: https://public-inbox.org/meta/20210927124056.kj5okiefvs4ztk27@meerkat.local/
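A minimal Python analogue of checking st_mode instead of asking whether the current user can write (the helper names are hypothetical; the project itself is Perl). `os.access(path, os.W_OK)` has the same weakness as Perl's "-w" when run as root, so the sketch inspects the permission bits directly:

```python
import os
import stat
import tempfile


def has_write_bits(path):
    """True if any of the user/group/other write bits is set on path."""
    mode = os.stat(path).st_mode
    return bool(mode & (stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))


def demo():
    """Show the result for a read-only and then a writable directory."""
    d = tempfile.mkdtemp()
    try:
        os.chmod(d, 0o555)
        read_only = has_write_bits(d)
        os.chmod(d, 0o755)
        writable = has_write_bits(d)
    finally:
        os.rmdir(d)
    return read_only, writable
```

The result depends only on the directory's mode, not on the effective UID, so it stays meaningful for root.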
2021-09-27 t/cmd_ipc: allow extra errors and add diagnostics
Apparently, sendmsg can fail in less common ways when network buffers are gigantic. Add some diagnostics for future failures, as well. Reported-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org> Link: https://public-inbox.org/meta/20210927124056.kj5okiefvs4ztk27@meerkat.local/
2021-09-27 xt/net_writer_imap: env knobs for compress/debug/proxy
It can be useful to test with some of these, but we can't enable them universally for all servers (and debug + compress is gross)
2021-09-27 config: get_1: use full parameter name
Instead of passing the prefix section and key separately, pass them together as is commonly done with git-config(1) usage as well as our ->get_all API. This inconsistency in the get_1 API is a needless footgun and confused me a bit while working on "lei up" the other week.
2021-09-27 lei rediff: add --drq and --dequote-only
More switches which can be useful for users who pipe from text editors. --drq can be helpful while writing patch review email replies, and perhaps --dequote-only, too.
2021-09-27 lei rediff: quiet warnings from Import and Eml
lei rediff is expected to see partial patch fragments and such, so silence warnings when something isn't exactly a valid email message.
2021-09-26 net_reader: drop support for IgnoreSizeErrors option
Only the ->message_string method of Mail::IMAPClient uses it, and we have no intention of using ->message_string outside of tests.
2021-09-26 lei: ensure refresh_watches isn't called from workers
Only the top-level lei-daemon will do inotify/kevent.
2021-09-26 t/run.perl: less confusing error reporting
The $sigchld handler was reporting the last test (successful or not) for a given PID in case a worker dies prematurely. Instead, redisplay all failed tests in $run_log to ensure the report only shows failed tests, and not the last started (and possibly successful) one.
2021-09-26 inbox: cloneurl: avoid undef to hash table value
This saves us some memory for the hash slot in the common case where the `cloneurl' file doesn't exist.
2021-09-26 lei -f reply: fix Cc: header combining
When combining lines from To: and Cc: headers, ", " needs to be used to separate them.
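A small Python sketch of the joining rule (`combine_recipients` is a hypothetical helper, not lei's code): address lists taken from separate headers are joined with ", " so the combined value remains a valid comma-separated list.

```python
def combine_recipients(*header_values):
    """Join non-empty To:/Cc: header values into one comma-separated list."""
    parts = [v.strip() for v in header_values if v and v.strip()]
    return ", ".join(parts)
```

Joining with a bare space or newline instead would run the last address of one header into the first address of the next.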
2021-09-26 www_listing: support /all/ search as a 302 redirect
This allows users to search /all/ from the top-level WwwListing without extra manual steps, although there are still extra network round trips incurred. No vertical whitespace is added, and there are no clumsy radio buttons or menus to deal with. Users only have to use a different <input type=submit /> button. I forgot how to do this until I realized we already do something similar with multiple submit buttons for threaded vs non-threaded mboxrd.gz downloads. Link: https://public-inbox.org/meta/20210827120845.29682-1-e@80x24.org/
2021-09-26 lei note-event: ignore kw_changed exceptions
The note-event worker may see changes before a Xapian shard commit happens, meaning keyword lookups fail as a result. Just emit the request to the lei/store worker since it's a fairly cheap operation at this point. We'll try harder to look for kw changes, too, since deduplication changes may lead to multiple docids being resolved for a single message.
2021-09-26 search: avoid setting undef hashtable entries
`undef' entries still take up a slot in the hash table, and cause the `exists' check to false-positive in ->cleanup_shards. This should fully fix the (innocuous) messages introduced in commit 63d7b8ce (daemons: revamp periodic cleanup task, 2021-09-23)
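Python dictionaries behave analogously: a key stored with a `None` value still counts as present, which is the kind of false positive described above for Perl's `exists`. A sketch with a hypothetical `cache_handle` helper that avoids storing empty entries:

```python
def cache_handle(cache, key, handle):
    """Store handle only when there is one, so `key in cache` stays meaningful."""
    if handle is None:
        cache.pop(key, None)  # do not leave a None-valued slot behind
    else:
        cache[key] = handle
```

A plain `cache[key] = None` would make `key in cache` true even though no usable handle is stored.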
2021-09-26 extmsg: search_partial: use ->isrch if available
This allows us to avoid creating ibx->{search}->{xdb} at this spot by using an `undef' value. This is a step towards eliminating the innocuous "/path/to/inboxdir/xap15 has no shards" messages introduced in commit 63d7b8ce (daemons: revamp periodic cleanup task, 2021-09-23)
2021-09-25 lei ls-external: split into separate file
This was written before we had auto-loading and is rarely used.
2021-09-25 lei add-external: split into separate file
This was also written before we had auto-loading and is rarely used.
2021-09-25 lei forget-external: split into separate file
This was written before we had auto-loading, and forget-external should be a rarely-used command that's not worth loading at startup. Do some golfing while we're in the area, too.
2021-09-25 doc: lei-rm: remove unnecessary -F values
-F is really only useful for distinguishing between mbox variants and single message/rfc822 files. URLs and directory-based formats can be auto-detected easily enough.
2021-09-25 lei: make pkt_op easier-to-use and understand
Since switching to SOCK_SEQPACKET, we no longer have to use fixed-width records to guarantee atomic reads. Thus we can maintain more human-readable/searchable PktOp opcodes. Furthermore, we can infer the subroutine name in many cases to avoid repeating ourselves by specifying a command-name twice (e.g. $ops->{CMD} => [ \&CMD, $obj ]; can now simply be written as: $ops->{CMD} => [ $obj ] if CMD is a method of $obj).
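The name inference can be illustrated with a small Python dispatcher. This is a sketch only: the `dispatch` function and `Worker` class are hypothetical, and the real PktOp code is Perl. When an entry does not start with a callable, the opcode itself is used as the method name on the first element.

```python
def dispatch(ops, opcode, *args):
    """Call the handler registered for opcode, inferring the method if needed."""
    entry = ops[opcode]
    if callable(entry[0]):
        func, *bound = entry  # explicit form: [func, obj, ...]
        return func(*bound, *args)
    obj, *bound = entry  # short form: [obj, ...]; method name is the opcode
    return getattr(obj, opcode)(*bound, *args)


class Worker:
    """Hypothetical object whose method names double as opcodes."""

    def incr(self, n):
        return n + 1
```

For example, `dispatch({"incr": [Worker()]}, "incr", 1)` calls `Worker.incr` without naming it twice. A real implementation would need a stricter test than `callable()` when the object itself is callable.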
2021-09-25 lei2mail: augment_inprogress: guard against closed FDs
I'm not sure what caused it, but $err was undef and caused print to fail, leading to an event loop error. Guard the timer with an eval and assume warn() can't trigger an event loop failure.
2021-09-25 lei: restore old sigmask before daemon exit
If the event loop fails, we want blocking waitpid (wait4) calls to be interruptible with SIGTERM via "kill $PID" rather than SIGKILL. Though a failing event loop is something we should avoid...
2021-09-25 lei up: show timezone offset with localtime
Sometimes a user (e.g. me) isn't really sure what timezone they're in...
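For illustration, formatting a local timestamp together with its numeric UTC offset might look like this in Python (`local_stamp` is a hypothetical helper, and the exact format lei prints is not shown in this log):

```python
import time


def local_stamp(epoch):
    """Format epoch seconds as local time with a numeric UTC offset (%z)."""
    return time.strftime("%Y-%m-%d %H:%M:%S %z", time.localtime(epoch))
```

The trailing offset (for example +0000) makes the timestamp unambiguous even when the reader is unsure which timezone the machine is configured for.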
2021-09-25 doc: lei: manpages for export-kw and refresh-mail-sync
Something is better than nothing.
2021-09-25 doc: lei-index: remove --stdin, reword -F
lei-index really only works for Maildir, at the moment.
2021-09-25 doc: lei-overview: implicit stdin, correct Inline::C notes
Implicit stdin based on standard input being a pipe or regular file is here to stay, so save users the trouble of typing '-' or '--stdin'. Inline::C is required as of commit 1d6e1f9a6a66 (lei: require Socket::MsgHdr or Inline::C, drop oneshot, 2021-05-26); but Socket::MsgHdr still gives a noticeable improvement in bash completion speed. Also, spell-out "MESSAGE-ID" since "MID" is actually not a common abbreviation ("MSGID" is used by RFC 3977 and several other RFCs, I recall).
2021-09-25 doc: lei blob+rediff+p2q: add notes about git directory
Try to clarify these commands are intended to be useful for git-using (usually software) projects (and not the bare git repos we use internally). We'll also document some commonly useful git-diff switches in the lei-rediff man page to highlight the usefulness of the command.
2021-09-25 t/v2mirror: check dependencies for legacy test
We still need Email::MIME to test against old revisions. We'll also depend on the revision just prior to the manifest.js.gz introduction to avoid loading Danga::Socket, since it was getting loaded even with `plackup'. Finally, we'll disable Inline::C usage with old Spawn.pm since our old code included alloca.h, which is not portable to FreeBSD.
2021-09-24 fetch: support v2 w/o manifest on old WWW
There may still be pre-manifest.js.gz versions of PublicInbox::WWW running and serving v2 inboxes. While -clone and "add-external --mirror" were working, -fetch was failing due to a 301 redirect to $INBOX_URL/manifest.js.gz/ and not the expected 404. Update the code to deal with a JSON decode error (from the 301) and ensure v2 epochs detection is correct (and not using a shadowed variable).
2021-09-24 clone|fetch|--mirror: cull manifest in partial mirrors
This makes it easier for users to enable fetching on a previously read-only epoch. Prior to this change, users were required to delete manifest.js.gz in addition to adding the writable bit. Now, they just have to "chmod +w $EPOCH_DIR".
2021-09-24 clone|--mirror: fix and test against pre-manifest WWW
There may still be pre-manifest.js.gz versions of PublicInbox::WWW running and serving v2 inboxes. Since $INBOX_URL/manifest.js.gz was not understood, it was assumed to be a Message-ID and 301-ed to "$INBOX_URL/manifest.js.gz/" with a trailing slash, so our 404 checks were invalid. Update our fallbacks to deal with 301 by catching JSON decoding errors to trigger HTML scraping. For HTML parsing, be sure to not be fooled by potential user-generated content and only scan the part after the last <hr>. We also need to avoid propagating $? from curl unnecessarily when we can continue safely. Finally, update v2mirror.t with tests to use PublicInbox::WWW from our "v1.1.0-pre1" tag to ensure these code paths get tested.
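The fallback logic can be sketched in Python as follows. Both function names are hypothetical, and gunzipping the body before JSON decoding is an assumption suggested by the ".gz" suffix; the project's Perl code may be structured differently.

```python
import gzip
import json
import zlib


def parse_manifest(body):
    """Return the decoded manifest, or None when body is not a valid one."""
    try:
        return json.loads(gzip.decompress(body))
    except (OSError, EOFError, zlib.error, ValueError):
        # e.g. an HTML page served after a 301 redirect instead of a manifest
        return None


def trusted_html_tail(html):
    """Return only the part of html after the last <hr>, else an empty string."""
    _head, sep, tail = html.rpartition("<hr>")
    return tail if sep else ""
```

A caller would try `parse_manifest` first and scrape `trusted_html_tail(page)` for epoch links only when it returns None, so user-generated content above the last <hr> is never scanned.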
2021-09-24 fetch: fix skipping with multi-epoch inboxes
We need to check every epoch for writability, so don't break out of the loop when we find a URL.
2021-09-24 clone|--mirror: support --epoch=RANGE for partial clones
Partial (v2) clones should be a useful addition for users wanting to conserve storage while having fast access to recent messages. Continuing work started in 876e74283ff3 (fetch: ignore non-writable epoch dirs, 2021-09-17), this creates bare, read-only epoch git repos. These git repos have the remotes pre-configured, but do not fetch any objects. The goal is to allow users to set the writable bit on a previously-skipped epoch and start fetching it. Shell completion support may not be necessary given how short the epoch ranges are, here. Cc: Luis Chamberlain <mcgrof@kernel.org> Link: https://public-inbox.org/meta/20210917002204.GA13112@dcvr/T/#u
2021-09-23 lei_xsearch: use localtime for user message
It's probably least confusing for user-facing messages to display times in the user's configured timezone. I considered appending "UTC" to the message and sticking with gmtime(), too, but this output isn't intended to be web-cache friendly, nor do we expect users across multiple timezones to view the same output.