path: root/lib/PublicInbox/IPC.pm
Date Commit message
2024-04-03 treewide: avoid getpid() for OnDestroy checks
getpid() isn't cached by glibc nowadays and system calls are more expensive due to CPU vulnerability mitigations. To ensure we switch to the new semantics properly, introduce a new `on_destroy' function to simplify callers. Furthermore, most OnDestroy correctness is often tied to the process which creates it, so make the new API default to guarded against running in subprocesses. For cases which require running in all children, a new PublicInbox::OnDestroy::all call is provided.
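The guard semantics described above can be sketched in Python. This is only an illustration under stated assumptions, not the Perl code from this commit: the `OnDestroy` class, the `all_processes` flag, and the `_cached_pid` variable are hypothetical names. The pid is cached once and refreshed after fork, so nothing calls getpid() at destruction time.

```python
import os

# Cached once at import; refreshed in forked children so that
# destructors never need a getpid() system call of their own.
_cached_pid = os.getpid()


def _refresh_cached_pid():
    global _cached_pid
    _cached_pid = os.getpid()


os.register_at_fork(after_in_child=_refresh_cached_pid)


class OnDestroy:
    """Run cb(*args) when the object is destroyed (hypothetical sketch).

    By default the callback only fires in the process that created the
    object; all_processes=True lets it fire in forked children as well.
    """

    def __init__(self, cb, *args, all_processes=False):
        self._owner = None if all_processes else _cached_pid
        self._cb = cb
        self._args = args

    def __del__(self):
        if self._owner is None or self._owner == _cached_pid:
            self._cb(*self._args)
```

A guard created in the parent stays silent if a forked child drops its copy, while the parent's copy still fires when it goes out of scope.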
2023-11-09 ipc: simplify partial sendmsg fallback
In the rare case sendmsg(2) isn't able to send the full amount (due to buffers >=2GB on Linux), use print + (autodie)close to send the remainder and retry on EINTR. `substr' should be able to avoid a large malloc via offsets and CoW on modern Perl.
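The general idea of resuming a short write from an offset, without copying the payload, can be sketched as follows. This is a Python analogy rather than the Perl fallback itself: `send_all` is a hypothetical helper, and a `memoryview` slice stands in for the offset-based `substr'. Python's own socket calls already retry on EINTR, so no explicit retry appears here.

```python
def send_all(send, data):
    """Call send(buffer) until all of data is written; return the call count.

    `send` is any callable returning the number of bytes it accepted,
    for example the bound method `sock.send` of a stream socket.
    """
    view = memoryview(data)  # slicing a memoryview does not copy the payload
    calls = 0
    while len(view) > 0:
        n = send(view)
        if n <= 0:
            raise ConnectionError("send made no progress")
        view = view[n:]  # continue from the first unsent byte
        calls += 1
    return calls
```

With a real socket this would be called as `send_all(sock.send, payload)`.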
2023-10-18 ds: introduce and use do_fork helper
This ensures we handle RNG reseeding and resetting the event loop properly in child processes after forking.
2023-10-08 ipc: use autodie for most syscalls
I'm not sure how/if we should bother recovering from these, so just croak and let some caller deal with it. `autodie' uses Carp internally, so setting `PERL5OPT=-MCarp=verbose' in the environment gives us full stacktraces.
2023-10-08 ipc: require fork+SOCK_SEQPACKET for wq_* functions
None of the lei internals works properly without forking and sockets. The fallback code increases the potential to accidentally call subs in the wrong process during the teardown phase. We'll still support ipc_do w/o forking for now since forking doesn't benefit small indexing runs from -mda and such.
2023-10-06 finalize DragonFlyBSD support
require_bsd and require_mods(':fcntl_lock') are now supported in TestCommon to make it easier to maintain than a big list of regexps. getsockopt for SO_ACCEPTFILTER seems to always succeed, even if the retrieved struct is all zeroes.
2023-10-06 ipc: lower-level send_cmd/recv_cmd handle EINTR directly
This ensures script/lei $send_cmd usage is EINTR-safe (since I prefer to avoid loading PublicInbox::IPC for startup time). Overall, it saves us some code, too.
2023-10-04 ds: don't pass FD map to post_loop_do callback
It's not used by any post_loop_do callbacks anymore, and the underlying FD map is a global `our' variable accessible from anywhere, anyways.
2023-09-20 ipc: assume SOCK_SEQPACKET exists
The rest of our code does, and we haven't encountered a platform we'd care about without it.
2023-09-20 drop GNU nproc(1) support in favor of getconf(1)
`getconf NPROCESSORS_ONLN' will succeed on GNU/Linux systems anyways; and the non-underscore-prefixed invocation works fine on all BSD flavors tested. Thus the `nproc' and `gnproc' attempts will never be reached. The only downside is we lose the ability to account for CPU affinity, but that's probably not an issue since CPU affinity (AFAIK) isn't a commonly-used feature.
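The trade-off mentioned above (online CPU count versus CPU affinity) can be illustrated with a short Python sketch. It queries the same sysconf value that getconf(1) reports instead of spawning a process; `online_cpus` and `usable_cpus` are hypothetical helper names, and the affinity call is only available on platforms that expose it (Linux, for instance).

```python
import os


def online_cpus():
    """Number of online processors, falling back to 1 if unknown."""
    try:
        n = os.sysconf("SC_NPROCESSORS_ONLN")
    except (ValueError, OSError, AttributeError):
        n = -1  # name unknown or sysconf unavailable on this platform
    return n if n > 0 else 1


def usable_cpus():
    """CPUs this process may run on, honoring affinity where possible."""
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return online_cpus()
```

On a machine where the process is pinned to a subset of CPUs, `usable_cpus()` can be smaller than `online_cpus()`; that difference is what is given up by relying on the online count alone.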
2023-09-09 ipc: define _SC_NPROCESSORS_ONLN for NetBSD
We'll reorganize this into a hash table for ease-of-reading.
2023-09-05 xap_helper: support SIGTTIN+SIGTTOU worker adjustments
Being able to tune worker process counts on-the-fly when xap_helper gets used with -{netd,httpd,imapd} will be useful for tuning new setups.
2023-08-30 treewide: drop MSG_EOR with AF_UNIX+SOCK_SEQPACKET
It's apparently not needed for AF_UNIX + SOCK_SEQPACKET as our receivers never check for MSG_EOR in "struct msghdr".msg_flags anyways. I don't believe POSIX is clear on the exact semantics of MSG_EOR on this socket type. This works around truncation problems on OpenBSD recvmsg when MSG_EOR is used by the sender. Link: https://marc.info/?i=20230826020759.M335788@dcvr
2023-08-19 ipc: support _SC_NPROCESSORS_ONLN on OpenBSD
Tested on both amd64 and i386, and these constants tend to be architecture-independent.
2023-05-03 ipc: get rid of lock support
SOCK_SEQPACKET is used whenever we care about parallel writes to a socket, so there's no need to mess with locks in userspace code.
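The property being relied on is that each send on a SOCK_SEQPACKET socket is delivered as one intact record, so writers do not need a userspace lock to keep their messages from interleaving. A minimal Python sketch of that behavior, assuming a platform with AF_UNIX SOCK_SEQPACKET support such as Linux (`seqpacket_roundtrip` is a hypothetical name):

```python
import socket


def seqpacket_roundtrip(messages):
    """Send each message as one record and read the records back."""
    a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_SEQPACKET)
    try:
        for m in messages:
            a.send(m)  # one send() is one record
        # Each recv() returns exactly one record, never a merged stream.
        return [b.recv(65536) for _ in messages]
    finally:
        a.close()
        b.close()
```

With SOCK_STREAM the same three sends could come back merged into a single read, which is why stream sockets and pipes need extra framing or locking for parallel writers.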
2023-04-05 ipc: support awaitpid in WQ workers
Using signalfd is necessary to get reliable signal wakeups w/o polling on fixed intervals. This change will make it possible to use awaitpid in cidx shard workers so they can perform prune work while waiting on the initial output of `git log -p'.
2023-03-26 Merge branch 'cindex'
* cindex: (29 commits)
  cindex: --prune checkpoints to avoid OOM
  cindex: ignore SIGPIPE
  cindex: respect existing permissions
  cindex: squelch incompatible options
  cindex: implement reindex
  cindex: add support for --prune
  cindex: filter out non-existent git directories
  spawn: show failing directory for chdir failures
  cindex: improve granularity of quit checks
  cindex: attempt to give oldest commits lowest docids
  cindex: truncate or drop body for over-sized commits
  cindex: check for checkpoint before giant messages
  cindex: implement --max-size=SIZE
  sigfd: pass signal name rather than number to callback
  cindex: handle graceful shutdown by default
  cindex: drop `unchanged' progress message
  cindex: show shard number in progress message
  cindex: implement --exclude= like -clone
  ds: @post_loop_do replaces SetPostLoopCallback
  cindex: use DS and workqueues for parallelism
  ...
2023-03-25 ds: @post_loop_do replaces SetPostLoopCallback
This allows us to avoid repeatedly using memory-intensive anonymous subs in CodeSearchIdx where the callback is assigned frequently. Anonymous subs are known to leak memory in old Perls (e.g. 5.16.3 in enterprise distros) and still expensive in newer Perls. So favor the (\&subroutine, @args) form which allows us to eliminate anonymous subs going forward. Only CodeSearchIdx takes advantage of the new API at the moment, since it's the biggest repeat user of post-loop callback changes. Getting rid of the subroutine and relying on a global `our' variable also has two advantages: 1) Perl warnings can detect typos at compile-time, whereas the (now gone) method could only detect errors at run-time. 2) `our' variable assignment can be `local'-ized to a scope
2023-03-25 ipc: move nproc_shards from v2writable
We'll be using nproc_shards for indexing non-Inbox stuff.
2023-03-25 ipc: retry sendmsg + recvmsg calls on EINTR
I'm not sure how this went undetected for so long, but EINTR must be checked for when working with blocking sockets. EINTR shouldn't happen for non-blocking sockets, though, but it's easier to just use the new wrapper in most of those places. I don't know what I was smoking when I left out EINTR checks :x
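The retry pattern itself is small. The Python sketch below makes the loop explicit; `retry_eintr` is a hypothetical helper, not the wrapper added by this commit. Note that Python 3.5 and later already retry most system calls on EINTR internally, so this is an illustration of the idea rather than something Python code usually needs.

```python
def retry_eintr(call, *args):
    """Invoke call(*args), retrying for as long as it fails with EINTR."""
    while True:
        try:
            return call(*args)
        except InterruptedError:
            # A signal handler ran during a blocking call; nothing was
            # transferred, so it is safe to simply issue the call again.
            continue
```

A blocking receive would then be written as `retry_eintr(sock.recv, 4096)`.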
2023-01-30 ipc: drop awaitpid_init to avoid circular refs
This brings t/lei-index.t back down from ~8 to ~3s. I didn't notice this before because the LeiNoteEvent timer was firing every 5s and clearing circular refs, and parallel testing meant the delay got hidden. Fixes: 4a2a95bbc78f99c8 (ipc+lei: switch to awaitpid, 2023-01-17)
2023-01-18 ipc+lei: switch to awaitpid
This avoids awkwardly stuffing an arrayref into callbacks which expect multiple arguments. IPC->awaitpid_init now allows pre-registering callbacks before spawning workers.
2023-01-18 ipc: drop unused $args from ->ipc_worker_stop
It's not used anywhere, and simplifies the next commit.
2023-01-18 ipc: remove {-reap_async} field
We can just test for {-reap_do} instead, to save us a few bytes.
2022-07-20 lei: avoid deadlock on inotify/EVFILT_VNODE wakeups
Enqueuing "note-event" requests from the DS event loop must not wait on workers being able to drain the queue quickly enough. Thus we make the SOCK_SEQPACKET writes nonblocking and rely on the lei-daemon event loop to enqueue writes. This is a unique problem for "note-event" since it reuses workers in between commands, while most lei commands currently fork off new workers.
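The approach described above (never block the event loop on a full socket; queue the message and let the loop write it later) can be sketched in Python. The names `try_send`, `flush`, and `pending` are hypothetical, the socket is assumed to be a non-blocking SOCK_SEQPACKET socket, and a real event loop would call `flush` again once the socket is reported writable.

```python
import collections
import socket


def flush(sock, pending):
    """Send queued messages in order; stop when the socket would block.

    Returns True when the queue was fully drained, False otherwise.
    """
    while pending:
        try:
            sock.send(pending[0])  # SEQPACKET: whole record or nothing
        except BlockingIOError:
            return False  # retry from the event loop when writable
        pending.popleft()
    return True


def try_send(sock, pending, msg):
    """Queue msg behind any earlier messages and send what fits now."""
    pending.append(msg)
    return flush(sock, pending)
```

Appending before flushing keeps messages in order even when earlier ones are still waiting in the queue.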
2022-04-18 lei: wire up pure Perl sendmsg/recvmsg for Linux users
This enables lei-daemon to work without Inline::C nor Socket::MsgHdr installed. Prior to this, only the `lei' client was using the pure Perl implementation. Either C implementation is still marginally faster, however.
2021-11-10 ipc: note failing sub name
Hopefully problems can get diagnosed more quickly with the sub name in the error message.
2021-10-15 lei + ipc: simplify process reaping
Simplify our APIs and force dwaitpid() to work in async mode for all lei workers. This avoids having lingering zombies for parallel searches if one worker finishes soon before another. The old distinction between "old" and "new" workers was needlessly complex, error-prone, and embarrassingly bad. We also never handled v2:// writers properly before on Ctrl-C/Ctrl-Z (SIGINT/SIGTSTP), so add them to @WQ_KEYS to ensure they get handled by $lei when appropriate.
2021-10-01 ds: simplify signalfd use
Since signalfd is often combined with our event loop, give it a convenient API and reduce the code duplication required to use it. EventLoop is replaced with ::event_loop to allow consistent parameter passing and avoid needlessly passing the package name on stack. We also avoid exporting SFD_NONBLOCK since it's the only flag we support. There's no sense in having the memory overhead of a constant function when it's in cold code.
2021-10-01 ipc: run Net::SSLeay::randomize
Currently we don't use OpenSSL from child processes of parents which use OpenSSL, but we may in the future. So ensure OpenSSL initializes its PRNG after these forks to avoid one security pitfall down the line.
2021-09-22 treewide: fix %SIG localization, harder
This fixes the occasional t/lei-sigpipe.t infinite loop under "make check-run". Link: http://nntp.perl.org/group/perl.perl5.porters/258784 <CAHhgV8hPbcmkzWizp6Vijw921M5BOXixj4+zTh3nRS9vRBYk8w@mail.gmail.com> Followup-to: b552bb9150775fe4 ("daemon+watch: fix localization of %SIG for non-signalfd users")
2021-09-22 ipc: do not add "0" to $0 of solo workers
It's needless noise and misleads users reading "ps" into thinking there's more workers when there's only one.
2021-09-19 ipc: drop dynamic WQ process counts
In retrospect, I don't think it's needed; and trying to wire up a user interface for lei to manage process counts doesn't seem worthwhile. It could be resurrected for public-facing daemon use in the future, but that's what version control systems are for. This also lets us automatically avoid setting up broadcast sockets. Followup-to: 7b7939d47b336fb7 ("lei: lock worker counts")
2021-09-19 ipc: allow disabling broadcast for wq_workers
Since some lei worker classes only use a single worker, there's no sense in having broadcast for those cases.
2021-09-19 ipc: wq_do: support synchronous waits and responses
This brings the wq_* SOCK_SEQPACKET API functionality on par with the ipc_do (pipe-based) API.
2021-08-18 ipc: remove WQ_MAX_WORKERS
We no longer rely on IO::FDPass, so there's no longer a reason to limit this internally.
2021-08-14 lei: diagnostics for /Document \d+ not found/ errors
This may help diagnose "Exception: Document \d+ not found" errors I'm seeing from "lei up" with HTTPS endpoints.
2021-05-25 ipc: wq: handle >MAX_ARG_STRLEN && <EMSGSIZE case
WQWorkers are limited roughly to MAX_ARG_STRLEN (the kernel limit of argv + environ) to avoid excessive memory growth. Occasionally, we need to send larger messages via workqueues that are too small to hit EMSGSIZE on the sender. This fixes "lei q" when using HTTP(S) externals, since that code path sends large Eml objects from lei_xsearch workers directly to lei2mail WQ workers.
2021-05-25 ipc: avoid potential stack-not-refcounted bug
This fixes a potential problem with Carp::longmess firing somewhere deeper in the stack. This is not a known problem at this time, but something I noticed while chasing something else.
2021-03-20 lei_store: initialize IPC lock properly
This was causing errors in a mass keyword import patch I'm working on.
2021-02-21 ipc: support setting a locked number of WQ workers
We can use this to ensure sharded work doesn't do unexpected things if workers are added/removed. We currently don't increase/decrease workers once a workqueue is started, but non-lei code (-httpd/imapd) may start doing so. This also fixes a bug where lei2mail workers could not be adjusted via --jobs on the command-line.
2021-02-21 ipc: add wq_broadcast
We'll give workqueues a broadcast mechanism to ensure all workers see a certain message. We'll also tag each worker with {-wq_worker_nr} in preparation for work distribution. This is intended to avoid extra connection and fork() costs from LeiAuth in a future commit.
2021-02-10 tests|lei: fixes for TEST_RUN_MODE=0 and lei oneshot
DESTROY callbacks can clobber $?, so we must take care to preserve it when exiting. We'll also try to make an effort to ensure better DESTROY ordering and delete as much as possible before x_it finishes. We also need to load PublicInbox::Config when setting up public inboxes.
2021-02-08 lei q: improve remote mboxrd UX + MUA
For early MUA spawners using lock-free outputs, we need to wait on the startq pipe to silence progress reporting. For --augment users, we can start the MUA even earlier by creating Maildirs in the pre-augment phase. To improve progress reporting for non-MUA (or late-MUA) spawners, we'll no longer blindly append "--compressed" to the curl(1) command when POST-ing for the gzipped mboxrd. Furthermore, we'll overload stringify ('""') in LeiCurl to ensure the empty -d '' string shows up properly. v2: fix startq waiting with --threads. mset_progress is never shown with early MUA spawning; the plan is to still show progress when augmenting and deduping. This fixes all local search cases. A leftover debug bit is dropped, too.
2021-02-07 lei: more consistent IPC exit and error handling
We're able to propagate $? from wq_workers in a consistent manner, now.
2021-02-07 ipc: wq_do => wq_io_do
We will have a ->wq_do that doesn't pass FDs for I/O.
2021-02-07 Revert "ipc: add support for asynchronous callbacks"
This reverts commit a7e6a8cd68fb6d700337d8dbc7ee2c65ff3d2fc1. It turns out to be unworkable in the face of multiple producer processes, since the lock we make has no effect when calculating pipe capacity.
2021-02-07 ipc: trim down the Storable checks
It's distributed with Perl and our Makefile.PL even declares a dependency on it, just like Encode and all the Compress::* stuff.
2021-02-07 ipc: do not die inside wq_worker child process
die() in a child zips up the stack into the parent, which is undesirable behavior. We're going to exit anyways, just warn and let exit(1) happen due to $@ being set.
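The same hazard exists in any language where a forked child inherits the parent's call stack: an uncaught exception in the child unwinds through frames that belong to the parent's logic. A Python sketch of the "warn, then exit(1)" approach follows; `run_in_child` is a hypothetical helper and `os._exit` plays the role of the unconditional exit.

```python
import os
import sys
import traceback


def run_in_child(func, *args):
    """Fork, run func(*args) in the child, return the child's exit code."""
    pid = os.fork()
    if pid == 0:
        status = 1
        try:
            func(*args)
            status = 0
        except BaseException:
            traceback.print_exc()  # warn, but do not re-raise
        finally:
            sys.stderr.flush()
            # Exit here so the child never unwinds into stack frames
            # (and cleanup handlers) it inherited from the parent.
            os._exit(status)
    _, wstatus = os.waitpid(pid, 0)
    return os.WEXITSTATUS(wstatus)
```

A failing worker then shows up in the parent only as a non-zero exit status, which the parent can handle in one place.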
2021-02-07 lei add-external: handle interrupts with --mirror
This also updates lei_xsearch to follow the same pattern for stopping curl(1) and tail(1) processes it spawns.