path: root/lib/PublicInbox/Spawn.pm
2024-04-28 send_cmd4: make `tries' a per-call parameter
While existing callers (lei, *-index, -watch) are private, we should not block the event loop in public-facing servers when we hit ETOOMANYREFS, ENOMEM, or ENOBUFS.
2024-01-30 spawn: support some rlimit uses via Inline::C
BSD::Resource isn't packaged for Alpine (as of 3.19), but we also have optional Inline::C support and already rely on calling setrlimit(2) directly from the Inline::C version of pi_fork_exec.
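For illustration only, here is a minimal C sketch of applying a resource limit in the child between fork(2) and exec, which is where the entry above says the Inline::C pi_fork_exec calls setrlimit(2). The helper name `run_with_rlimit` and its signature are hypothetical, not public-inbox's API.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork, set one rlimit in the child only, then exec.
 * Returns the child's wait status, or -1 if fork/waitpid fails. */
int run_with_rlimit(char *const argv[], int resource, rlim_t soft, rlim_t hard)
{
	pid_t pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0) {
		struct rlimit rl = { .rlim_cur = soft, .rlim_max = hard };
		/* the limit applies to the child (and what it execs), not the parent */
		if (setrlimit(resource, &rl) != 0)
			_exit(126);
		execvp(argv[0], argv);
		_exit(127); /* exec failed */
	}
	int status;
	if (waitpid(pid, &status, 0) != pid)
		return -1;
	return status;
}
```

For example, running `sh -c 'ulimit -c'` through it with RLIMIT_CORE set to 0/0 should report a core-size limit of 0 inside the child while the parent's limit is unchanged.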
2023-11-30 spawn: drop IO layer support from redirects
When setting up stdin for commands, the write_file API is convenient enough nowadays to not be worth having special support with process spawning. When reading stdout of commands, we should probably be using utf8_maybe everywhere since there'll always be legacy encodings in git repos. Reading regular files with :utf8 also results in worse memory management since the file size cannot be used as a hint.
2023-11-13 spawn: don't append to scalarrefs on stdout/stderr
None of our current code relies on it, and I can't imagine it's something we'd need in the future, actually... This keeps the door open for relying more on Spawn in TestCommon.
2023-11-13 treewide: update read_all to avoid eof|close checks
read_all can be expanded to support FIFOs/pipes/sockets where read-until-EOF behavior is desired. We can also rely on wantarray to support splitting on EOL markers, but it's hard-coded to support only `$/ eq "\n"' since (AFAIK) that's the only way we use the list-context form of `readline'.
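A minimal C sketch of the read-until-EOF loop for pipes, FIFOs and sockets, where no size hint is available up front: grow the buffer, retry on EINTR, and stop at a zero-length read. `read_all_fd` is a hypothetical name; public-inbox's read_all is Perl, and this only illustrates the behavior described above.

```c
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/* Read fd until EOF. Returns a malloc'd, NUL-terminated buffer and stores
 * its length (excluding the NUL) in *lenp, or returns NULL on error. */
char *read_all_fd(int fd, size_t *lenp)
{
	size_t cap = 4096, len = 0;
	char *buf = malloc(cap + 1); /* +1 reserves room for the NUL */
	if (!buf)
		return NULL;
	for (;;) {
		if (len == cap) { /* full: double the capacity */
			char *tmp = realloc(buf, cap * 2 + 1);
			if (!tmp) {
				free(buf);
				return NULL;
			}
			buf = tmp;
			cap *= 2;
		}
		ssize_t r = read(fd, buf + len, cap - len);
		if (r == 0)
			break; /* EOF: the writer closed its end */
		if (r < 0) {
			if (errno == EINTR)
				continue; /* interrupted by a signal, try again */
			free(buf);
			return NULL;
		}
		len += (size_t)r;
	}
	buf[len] = '\0';
	*lenp = len;
	return buf;
}
```

Splitting the result on "\n" (the list-context case) would be a separate pass over the returned buffer.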
2023-11-09 spawn: get rid of wantarray popen_rd/popen_wr
We've updated all of our users to use Process::IO (and to avoid tied handles), so the trade-off for using list context no longer exists.
2023-11-03 treewide: use eof and close to detect readline errors
readline (<FH>) isn't wrapped by autodie, and there's no way to know if read(2) errors truncated the readline output. IO::Handle->error isn't reliable on Perl < v5.34. Thus, combining `eof' and `close' (with autodie) is the only way we can detect read(2) errors (injected via strace) when called via `readline' (aka <$fh>). Neither `eof' nor `close' alone is sufficient; they must be combined to detect errors from buffered `readline'.
2023-11-03 spawn: support PerlIO layer in scalar redirects
We have to deal with UTF-8 data for generating patches, so make it easier to pass Perl utf8 data to git, diff, sdiff, etc. to avoid "Wide character" warnings.
2023-11-03 replace ProcessIO with untied PublicInbox::IO
This fixes two major problems with the use of tie for filehandles:
* no way to do fcntl, stat, etc. calls directly on the tied handle, forcing callers to use the `tied' perlop to access the underlying IO::Handle
* needing separate classes to handle blocking and non-blocking I/O
As a result, Git->cleanup_if_unlinked, InputPipe->consume, and Qspawn->_yield_start have fewer bizarre bits, and we can call `$io->blocking(0)' directly instead of `(tied *$io)->{fh}->blocking(0)'. Having a PublicInbox::IO class will also allow us to support custom read buffering which allows inspecting the current state.
2023-11-03 treewide: use ->close to call ProcessIO->CLOSE
This will open the door for us to drop `tie' usage from ProcessIO completely in favor of OO method dispatch. While OO method dispatches (e.g. `$fh->close') are slower than normal subroutine calls, it hardly matters in this case since process teardown is a fairly rare operation and we continue to use `close($fh)' for Maildir writes.
2023-10-28 spawn: use readline instead of read for scalar redirects
Using `-s $fh' as the length arg for `read' is incorrect for :utf8 and other non-:raw file handles because `read' operates in *characters*, not bytes.
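The byte/character mismatch is easy to see with one accented letter: "é" is one character but two bytes in UTF-8, so a byte count from `-s` does not match the character count `read` works in on a :utf8 handle. A small C sketch that counts code points by skipping UTF-8 continuation bytes; `utf8_chars` is a hypothetical helper and assumes valid UTF-8 input.

```c
#include <stddef.h>

/* Count code points in nbytes of UTF-8 by counting every byte that is
 * not a continuation byte (continuation bytes look like 10xxxxxx). */
size_t utf8_chars(const char *s, size_t nbytes)
{
	size_t n = 0;
	for (size_t i = 0; i < nbytes; i++)
		if (((unsigned char)s[i] & 0xC0) != 0x80)
			n++;
	return n;
}
```

For "h\xc3\xa9llo" ("héllo"), 6 bytes hold 5 characters.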
2023-10-28 spawn: avoid alloca in C pi_fork_exec
We don't have thread-safety to worry about, so just leave a few allocations at process exit at worst. We'll also update some comments about usage while we're at it.
2023-10-28 spawn: croak directly in C pi_fork_exec
This saves us some Perl code in the wrapper, since the SpawnPP implementation also dies directly.
2023-10-25 spawn: support synchronous run_qx
This is similar to `backtick` but supports all our existing spawn functionality (chdir, env, rlimit, redirects, etc.). It also supports SCALAR ref redirects like run_script in our test suite for std{in,out,err}. We can probably use :utf8 by default for these redirects, even.
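As a rough C analogue of a backtick-style helper, the sketch below forks, optionally chdirs, redirects the child's stdout into a pipe, reads until EOF, and always reaps the child. `capture_stdout` is a hypothetical name; it does not reproduce run_qx's other options (env, rlimit, scalar redirects). The chdir error naming the directory mirrors the idea of the 2023-03-25 entry.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run argv (inside dir if non-NULL), return its stdout as a malloc'd
 * NUL-terminated string and store the wait status in *statusp.
 * Returns NULL if setup or reading fails. */
char *capture_stdout(char *const argv[], const char *dir, int *statusp)
{
	int p[2];
	if (pipe(p) != 0)
		return NULL;
	pid_t pid = fork();
	if (pid < 0) {
		close(p[0]);
		close(p[1]);
		return NULL;
	}
	if (pid == 0) { /* child */
		close(p[0]);
		if (dup2(p[1], STDOUT_FILENO) < 0)
			_exit(126);
		if (p[1] != STDOUT_FILENO)
			close(p[1]);
		if (dir && chdir(dir) != 0) {
			/* name the directory so the failure isn't confusing */
			dprintf(STDERR_FILENO, "chdir(%s): %s\n", dir, strerror(errno));
			_exit(126);
		}
		execvp(argv[0], argv);
		_exit(127); /* exec failed */
	}
	close(p[1]); /* parent keeps only the read end */
	size_t cap = 256, len = 0;
	char *buf = malloc(cap);
	while (buf) {
		if (len + 1 >= cap) { /* keep room for the trailing NUL */
			char *tmp = realloc(buf, cap * 2);
			if (!tmp) {
				free(buf);
				buf = NULL;
				break;
			}
			buf = tmp;
			cap *= 2;
		}
		ssize_t r = read(p[0], buf + len, cap - len - 1);
		if (r > 0)
			len += (size_t)r;
		else if (r == 0)
			break; /* EOF: the child closed its stdout */
		else if (errno != EINTR) {
			free(buf);
			buf = NULL;
		}
	}
	close(p[0]);
	/* always reap the child, even if reading failed */
	while (waitpid(pid, statusp, 0) < 0 && errno == EINTR)
		;
	if (buf)
		buf[len] = '\0';
	return buf;
}
```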
2023-10-11 lei rediff: use ProcessIO for --drq support
This required fixing binmode support a few commits ago, along with properly enabling autoflush in popen_wr instead of setting it on the wrapper ProcessIO class.
2023-10-09 spawn: reinstate directory existence check
We must not attempt to use Inline::C unless a user requests it (by creating the directory) or is running lei.
Fixes: ebdccd6666f9 (spawn: drop checks for directory writability)
2023-10-08 process_io: pass args to awaitpid as list
Specifying {cb_args} in the options hash felt awkward to me. Instead, just use the Perl stack like we do with awaitpid() and pass the list down directly.
2023-10-08 rename ProcessPipe to ProcessIO
Since we deal with pipes (of either direction) and bidirectional stream sockets for this class, it's better to remove the `Pipe' from the name and replace it with `IO' to communicate that it works for any form of IO::Handle-like object tied to a process.
2023-10-06 ipc: lower-level send_cmd/recv_cmd handle EINTR directly
This ensures script/lei $send_cmd usage is EINTR-safe (since I prefer to avoid loading PublicInbox::IPC for startup time). Overall, it saves us some code, too.
2023-10-04 spawn: drop checks for directory writability
It's a TOCTTOU bug to do stat or access checks anyways, so just use the file and let autodie::sysopen and PublicInbox::Lock take care of the rest.
2023-10-04 spawn: use autodie and PublicInbox::Lock
It keeps Spawn.pm less noisy and ensures retries on EINTR.
2023-09-28 spawn: add popen_wr support
This makes interesting parts of our code easier to read IMHO. We can take advantage of `local' while avoiding `fileno' calls since it's called in spawn() anyways to reduce LoC even further.
2023-09-26 spawn: add run_wait to simplify spawn+waitpid use
It's basically the `system' perlop with support for env overrides, redirects, chdir, rlimits, and setpgid support.
2023-09-24 ipc: recv_cmd4 clobbers destination buffer on errors
Handling this should be done at the lowest level possible, away from higher-level lei code.
2023-09-11 spawn: do not block ABRT/BUS/ILL/SEGV signals
SIGABRT, SIGBUS, SIGILL, and SIGSEGV may all happen if we introduce bugs in the section where signals are blocked. We can delay handling of SIGFPE since there are no floating point operations, while SIGXCPU and SIGXFSZ are safe to delay, especially in the absence of threads in our current code paths.
2023-08-28 spawn: remove distracting empty line
If anything, it should have been before the $rlim declaration, not after, but the immediately preceding similar block has no empty line, either.
2023-08-28 spawn: remove stray variable $ndc_err
Code that could be setting it was removed in 14fa0abdcc7b. Likewise for the double assignment to $err.
Fixes: 14fa0abdcc7b ("rewrite Linux nodatacow use in pure Perl w/o system")
2023-03-25 spawn: show failing directory for chdir failures
Our use of `git rev-parse --git-dir' depends on our (v)fork+exec wrapper doing chdir, so the error message is required to avoid user confusion. I'm still avoiding `git -C $DIR' for now since ancient versions of git did not support it.
2023-02-22 sendmsg: prefix sleep message with `#'
It's an informative message that's harmless, so hopefully the `#' prefix puts the user's mind at ease. (I saw it on an `lei import' against an IMAP source.)
2023-01-18 ds: introduce awaitpid, switch ProcessPipe users
awaitpid is the new API which will eventually replace dwaitpid. It enables early registration of callback handlers. Eventually (once dwaitpid is gone) it'll be able to use fewer waitpid calls. The avoidance of waitpid(-1) in our earlier days was driven by the belief that threads may eventually become relevant for Perl 5, but that's extremely unlikely at this stage. I will still introduce optional threads via C, but they definitely won't be spawning/reaping processes. Argument order to callbacks is swapped (PID first) to allow flattened multiple arguments more naturally. The previous API (allowing only a single argument, as influenced by pthread_create(3)) was more tedious as it involved packing multiple arguments into yet another array.
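A minimal C sketch of the registration-first idea: callbacks are registered per PID before reaping, reaping uses waitpid(-1, ...), and each callback receives the PID first. The names (`awaitpid_register`, `awaitpid_reap`, `record_exit`), the fixed-size table, and the single `void *` argument are assumptions for illustration; the Perl API passes a flattened argument list instead.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Callback signature: PID first, then the wait status and a user arg. */
typedef void (*await_cb)(pid_t pid, int status, void *arg);

struct await_entry { pid_t pid; await_cb cb; void *arg; };

#define AWAIT_MAX 64
static struct await_entry await_tab[AWAIT_MAX];
static int await_n;

/* Register a callback for pid; may be called before the child exits. */
int awaitpid_register(pid_t pid, await_cb cb, void *arg)
{
	if (await_n >= AWAIT_MAX)
		return -1;
	await_tab[await_n++] = (struct await_entry){ pid, cb, arg };
	return 0;
}

/* Reap children with waitpid(-1, ...) and dispatch registered callbacks.
 * flags may be WNOHANG for non-blocking use. Returns children reaped. */
int awaitpid_reap(int flags)
{
	int reaped = 0, status;
	pid_t pid;
	while ((pid = waitpid(-1, &status, flags)) > 0) {
		reaped++;
		for (int i = 0; i < await_n; i++) {
			if (await_tab[i].pid != pid)
				continue;
			struct await_entry e = await_tab[i];
			await_tab[i] = await_tab[--await_n]; /* unregister before calling */
			e.cb(pid, status, e.arg);
			break;
		}
	}
	return reaped;
}

/* Example callback: store the exit code (or -1) in *(int *)arg. */
void record_exit(pid_t pid, int status, void *arg)
{
	(void)pid;
	*(int *)arg = WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```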
2022-12-24 cleanup pure Perl use
This quiets down tests when the optional Inline::C is missing. We do not currently have a hard dependency on Inline::C, and we should not leave PERL_INLINE_DIRECTORY set in PublicInbox::Spawn if Inline fails to build. Leaving PERL_INLINE_DIRECTORY set by Spawn after it fails (due to missing Inline::C) would cause downstream failures in Gcf2 builds for the same reason. So we should bail out of the Gcf2 build early if Spawn already failed due to missing Inline::C. The only time we want to be noisy is if a user explicitly sets PERL_INLINE_DIRECTORY and Inline::C is missing.
This reverts commit ad8acf7d6484d0a489499742cadadbd4f890ab53 (Gcf2: Create cache folder if missing, 2022-09-08).
2022-04-26 lei: move to v5.12 to avoid "use strict"
Socket.pm still loads strict.pm, unfortunately, which hurts startup time; but we'll save some LoC this way.
2022-01-31 rewrite Linux nodatacow use in pure Perl w/o system
btrfs is Linux-only at the moment (and likely to remain that way for practical purposes). So rely on Linux ABI stability and use the `syscall' and `ioctl' perlops rather than relying on Inline::C. Inline::C (and gcc||clang) are monstrous dependencies which we can't expect users to have. This makes supporting new architectures more difficult, but new architectures come along rarely and this reduces the burden for the majority of Linux users on popular architectures (while still avoiding the distribution of pre-built binaries).
Link: https://public-inbox.org/meta/YbCPWGaJEkV6eWfo@codewreck.org/
2021-11-22 spawn: avoid C++ keyword `try'
This is future-proofing in case we build against Xapian directly in the future, which would require a C++ compiler.
2021-10-23 cmd_ipc4: retry sendmsg on ENOBUFS/ENOMEM/ETOOMANYREFS
I'm seeing ENOBUFS on a RAM-starved system, and slowing the sender down enough for the receiver to drain the buffers seems to work. ENOMEM and ETOOMANYREFS could be in the same boat as ENOBUFS. Watching for POLLOUT events via select/poll/epoll_wait doesn't seem to work, since the kernel can already sleep (or return EAGAIN) for cases where POLLOUT would work.
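A minimal C sketch of this retry policy, combined with the per-call `tries` budget from the 2024-04-28 entry: EINTR is always retried, ENOBUFS/ENOMEM/ETOOMANYREFS are retried after a short sleep while tries remain, and tries == 0 never sleeps (suitable for an event loop). `send_retry`, the 100ms delay, and the loop shape are assumptions, not public-inbox's implementation.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>

/* Nonzero for errors that may clear once the receiver drains its queue. */
static int send_err_transient(int e)
{
	return e == ENOBUFS || e == ENOMEM
#ifdef ETOOMANYREFS
		|| e == ETOOMANYREFS /* not defined on every platform */
#endif
		;
}

/* sendmsg(2) with retries; returns bytes sent or -1 with errno set. */
ssize_t send_retry(int fd, const struct msghdr *msg, int flags, unsigned tries)
{
	/* 100ms pause between attempts: an arbitrary illustrative value */
	struct timespec ts = { .tv_sec = 0, .tv_nsec = 100 * 1000 * 1000 };
	for (;;) {
		ssize_t n = sendmsg(fd, msg, flags);
		if (n >= 0)
			return n;
		if (errno == EINTR)
			continue; /* interrupted before sending: just retry */
		if (!send_err_transient(errno) || tries == 0)
			return -1;
		tries--;
		nanosleep(&ts, NULL); /* slow down so the receiver can catch up */
	}
}
```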
2021-09-14 spawn+gcf2: improve diagnostics for build failures
I'm not sure why, but I noticed one of my latest restarts of public-inbox-httpd wasn't loading the Inline::C .so for Gcf2 or Spawn. I also can't reproduce the problem, as both .so files are loaded fine on a restart with zero config changes. In any case, some extra automatic diagnostics for build errors won't hurt, as no extra noise is introduced for successful builds. This will also make future development of C code more convenient, hopefully.
2021-02-24 treewide: avoid "delete local" construct on hashes
Apparently this feature is only in Perl 5.12+, and we're still on Perl 5.10.
2021-02-07 spawn: pi_fork_exec: support "pgid"
We'll be using this to allow the "git clone" process hierarchy to be killed via Ctrl-C. This also fixes a long-standing bug in error reporting for the Inline::C version, because we're actually testing for errors, now! n.b. strlen(3) is officially async-signal-safe as of POSIX.1-2016, but I can't think of a reason any previous implementation prior to that wouldn't be.
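A C sketch of spawning into a new process group so the whole subtree (e.g. "git clone" and its helpers) can be signalled at once with kill(2) on the negated PGID. `spawn_pgrp` is hypothetical; calling setpgid(2) in both parent and child is a common idiom to avoid racing an early kill(), not something the entry above specifies.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork and exec argv as the leader of a new process group.
 * Returns the child PID (which is also its PGID) or -1. */
pid_t spawn_pgrp(char *const argv[])
{
	pid_t pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0) {
		if (setpgid(0, 0) != 0) /* become leader of a new group */
			_exit(126);
		execvp(argv[0], argv);
		_exit(127); /* exec failed */
	}
	/* set it from the parent too; this may fail harmlessly with EACCES
	 * if the child already exec'd (after setting the group itself) */
	setpgid(pid, pid);
	return pid;
}
```

With the group in place, `kill(-pid, SIGINT)` reaches every process in it, which is how Ctrl-C can stop the whole hierarchy.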
2021-02-07 spawn: pi_fork_exec: restore parent sigmask in child
We continue to unblock SIGCHLD unconditionally, but also unblock any signals not blocked by the parent (wq_worker). Once pi_fork_exec supports setpgid(2), this will allow Ctrl-C (SIGINT) to stop "git clone" and other long-running processes, and allow git-clone cleanup to be performed. This won't affect existing daemons on systems with signalfd(2) or EVFILT_SIGNAL at all, since those run with signals blocked anyways.
2021-02-04 spawn: merge common C code together
There'll probably be more things which work on both GNU and *BSD systems which we don't need separate strings for.
2021-01-30 lei: less error-prone FD mapping
Keeping track of non-standard FDs gets tricky, so make it easier by relying on st_dev/st_ino mapping in the transmitted objects. We'll keep using numbers for the standard FDs since we need to be able to easily redirect them in the producer (main daemon) process for (gzip|bzip2|xz) if writing to a compressed mbox.
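A C sketch of identifying files by their (st_dev, st_ino) pair instead of descriptor numbers, which change once FDs are dup'd or passed to another process. `same_file` is hypothetical. It identifies the underlying file, not the descriptor: on Linux, both ends of one pipe report the same inode.

```c
#include <sys/stat.h>
#include <unistd.h>

/* Returns 1 if fds a and b refer to the same file (device + inode),
 * 0 if they don't, or -1 if either fstat(2) fails. */
int same_file(int a, int b)
{
	struct stat sa, sb;
	if (fstat(a, &sa) != 0 || fstat(b, &sb) != 0)
		return -1;
	return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```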
2021-01-26 use defined-or in a few more places
Mainly around fork() calls, but some nearby places as well.
2021-01-26 spawn: split() on regexp, not a literal string
It doesn't appear Perl (as of 5.32.x) has any internal optimization for splitting on a single-byte string, so give it a regexp instead of letting it compile and discard a new one every single time.
2021-01-21 lei q: start ->mset while query_prepare runs
We don't need the result of query_prepare (for augmenting or mass unlinking) until we're ready to deduplicate and write results to the filesystem. This ought to let us hide some of the cost of Xapian searches on multi-device/core systems for extremely expensive searches.
2021-01-18 lei q: parallelize Maildir and mbox writing
With 4 dedicated workers, this seems to provide a 100-120% speedup on a 4 core machine when writing thousands of search results to a Maildir or mbox. This also sets us up for high-latency IMAP destinations in the future. This opens the door to more speedup opportunities such as optimizing dedupe locking and other ways to reduce contention. This change is fairly complex and convoluted, unfortunately. Further work may allow us to simplify it and even improve performance.
2021-01-14 cmd_ipc: support + test EINTR + EAGAIN, no FDs
We'll ensure our {send,recv}_cmd4 implementations are consistent w.r.t. non-blocking and interrupted sockets. We'll also support receiving messages without FDs associated so we don't have to send dummy FDs to keep receivers from reporting EOF.
2021-01-12 lei_xsearch: transfer 4 FDs internally, drop IO::FDPass
It's easier to make the code more generic by transferring all four FDs (std(in|out|err) + socket) instead of omitting stdin. We'll be reading from stdin on some imports, and possibly outputting to stdout, so omitting stdin now would needlessly complicate things. The differences with IO::FDPass "1" code paths and the "4" code paths used by Inline::C and Socket::MsgHdr are far too much to support and test at the moment.
2021-01-12 ipc: start supporting sending/receiving more than 3 FDs
Actually, sending 4 FDs will be useful for lei internal xsearch work once we start accepting input from stdin. It won't be used with the lightweight lei(1) client, however. For WWW (eventually), a single FD may be enough.
2021-01-12 cmd_ipc: send FDs with buffer payload
For another step in syscall reduction, we'll support transferring 3 FDs and a buffer with a single sendmsg/recvmsg syscall using Socket::MsgHdr if available. Beyond script/lei itself, this will be used for internal IPC between search backends (perhaps with SOCK_SEQPACKET). There's a chance this could make it to the public-facing daemons, too. This adds an optional dependency on the Socket::MsgHdr package, available as libsocket-msghdr-perl on Debian-based distros (but not CentOS 7.x and FreeBSD 11.x, at least). Our Inline::C version in PublicInbox::Spawn remains the last choice for script/lei due to the high startup time, and IO::FDPass remains supported for non-Debian distros. Since the socket name prefix changes from 3 to 4, we'll also take this opportunity to make the argv+env buffer transfer less error-prone by relying on argc instead of designated delimiters.
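A C sketch of sending a buffer together with up to four descriptors in one sendmsg(2) via SCM_RIGHTS, and receiving them with recvmsg(2) while tolerating messages that carry no descriptors (as the 2021-01-14 entry describes). `send_fds`, `recv_fds` and `MAX_FDS` are hypothetical; Socket::MsgHdr and the Inline::C code in PublicInbox::Spawn are the real implementations, and their framing (including the argc-based argv+env encoding) is not reproduced here.

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#define MAX_FDS 4

/* Send len bytes of buf plus nfds (0..MAX_FDS) descriptors in one call. */
ssize_t send_fds(int sock, const void *buf, size_t len, const int *fds, int nfds)
{
	union { /* union guarantees cmsghdr alignment of the control buffer */
		char buf[CMSG_SPACE(MAX_FDS * sizeof(int))];
		struct cmsghdr align;
	} u;
	struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
	struct msghdr msg = { 0 };
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	if (nfds < 0 || nfds > MAX_FDS) {
		errno = EINVAL;
		return -1;
	}
	if (nfds > 0) { /* attach SCM_RIGHTS only when there are FDs */
		memset(&u, 0, sizeof(u));
		msg.msg_control = u.buf;
		msg.msg_controllen = CMSG_SPACE(nfds * sizeof(int));
		struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
		c->cmsg_level = SOL_SOCKET;
		c->cmsg_type = SCM_RIGHTS;
		c->cmsg_len = CMSG_LEN(nfds * sizeof(int));
		memcpy(CMSG_DATA(c), fds, nfds * sizeof(int));
	}
	ssize_t n;
	do {
		n = sendmsg(sock, &msg, 0);
	} while (n < 0 && errno == EINTR);
	return n;
}

/* Receive a buffer and any descriptors into fds[MAX_FDS]; *nfds is 0 when
 * the sender attached none, so senders need not pass dummy FDs. */
ssize_t recv_fds(int sock, void *buf, size_t len, int *fds, int *nfds)
{
	union {
		char buf[CMSG_SPACE(MAX_FDS * sizeof(int))];
		struct cmsghdr align;
	} u;
	struct iovec iov = { .iov_base = buf, .iov_len = len };
	struct msghdr msg = { 0 };
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);
	ssize_t n;
	do {
		n = recvmsg(sock, &msg, 0);
	} while (n < 0 && errno == EINTR);
	*nfds = 0;
	if (n < 0)
		return n;
	for (struct cmsghdr *c = CMSG_FIRSTHDR(&msg); c; c = CMSG_NXTHDR(&msg, c)) {
		if (c->cmsg_level != SOL_SOCKET || c->cmsg_type != SCM_RIGHTS)
			continue;
		int k = (int)((c->cmsg_len - CMSG_LEN(0)) / sizeof(int));
		if (k > MAX_FDS)
			k = MAX_FDS;
		memcpy(fds, CMSG_DATA(c), k * sizeof(int));
		*nfds = k;
	}
	return n;
}
```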
2021-01-04 lei: prefer IO::FDPass over our Inline::C recv_3fds
While our recv_3fds() implementation is more efficient syscall-wise, loading Inline takes nearly 50ms on my machine even after Inline::C memoizes the build. The current ~20ms in the fast path is barely acceptable to me, and 50ms would be unusable. Eventually, script/lei may invoke tcc(1) or cc(1) directly in the fast path, but it needs @INC for the slow path, at least. We'll encode the number of FDs into the socket name to allow parallel installations, for now.