While existing callers (lei, *-index, -watch) are private, we
should not be blocking the event loop in public-facing servers
when we hit ETOOMANYREFS, ENOMEM, or ENOBUFS.
|
|
BSD::Resource isn't packaged for Alpine (as of 3.19), but we
also have optional Inline::C support and already rely on calling
setrlimit(2) directly from the Inline::C version of pi_fork_exec.
|
|
When setting up stdin for commands, the write_file API is
convenient enough nowadays to not be worth having special
support with process spawning.
When reading stdout of commands, we should probably be using
utf8_maybe everywhere since there'll always be legacy encodings
in git repos.
Reading regular files with :utf8 also results in worse memory
management since the file size cannot be used as a hint.
|
|
None of our current code relies on it, and I can't imagine it's
something we'd need in the future, actually... This keeps the
door open for relying more on Spawn in TestCommon.
|
|
read_all can be expanded to support FIFOs/pipes/sockets where
read-until-EOF behavior is desired. We can also rely on
wantarray to support splitting on EOL markers, but it's
hard-coded to support only `$/ eq "\n"' since (AFAIK)
it's the only way we use the wantarray form of `readline'.
|
|
We've updated all of our users to use Process::IO (and avoid
tied handles) so the trade-off for using the array context
no longer exists.
|
|
readline (<FH>) isn't wrapped by autodie, and there's no
way to know if read(2) errors truncated the readline output.
IO::Handle->error isn't reliable on Perl < v5.34.
Thus, combining `eof' and `close' (with autodie) is the only
way we can detect read(2) errors (injected via strace) when
called via `readline' (aka <$fh>). Neither `eof' nor `close'
alone is sufficient; they must be combined to detect errors
from buffered `readline'.
|
|
We have to deal with UTF-8 data for generating patches, so make
it easier to pass Perl utf8 data to git, diff, sdiff, etc. to
avoid "Wide character" warnings.
|
|
This fixes two major problems with the use of tie for filehandles:
* no way to do fcntl, stat, etc. calls directly on the tied handle,
forcing callers to use the `tied' perlop to access the underlying
IO::Handle
* needing separate classes to handle blocking and non-blocking I/O
As a result, Git->cleanup_if_unlinked, InputPipe->consume,
and Qspawn->_yield_start have fewer bizarre bits and we
can call `$io->blocking(0)' directly instead of
`(tied *$io)->{fh}->blocking(0)'
Having a PublicInbox::IO class will also allow us to support
custom read buffering which allows inspecting the current state.
|
|
This will open the door for us to drop `tie' usage from
ProcessIO completely in favor of OO method dispatch. While
OO method dispatches (e.g. `$fh->close') are slower than normal
subroutine calls, it hardly matters in this case since process
teardown is a fairly rare operation and we continue to use
`close($fh)' for Maildir writes.
|
|
Using `-s $fh' as the length arg for `read' is incorrect for
:utf8 and other non-:raw file handles because `read' operates
in *characters*, not bytes.
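A minimal sketch of the same pitfall, written in Python rather than Perl purely for illustration. Python's text-mode `read(n)` also counts characters, so passing the byte size from `fstat` and then comparing the result length against it produces a false "short read". The helper name `read_all_chars` and the sample string are assumptions made for this example.

```python
import os
import tempfile

def read_all_chars(path):
    """Return (byte_size, text) for a UTF-8 file opened in text mode."""
    with open(path, "r", encoding="utf-8") as fh:
        byte_size = os.fstat(fh.fileno()).st_size  # size in bytes
        text = fh.read(byte_size)  # length argument counts characters
    return byte_size, text

# Write a sample file: 5 characters, but 6 bytes once encoded as UTF-8.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write("h\u00e9llo".encode("utf-8"))
    demo_path = tmp.name

byte_size, text = read_all_chars(demo_path)
os.unlink(demo_path)

# Comparing the character count to the byte size wrongly suggests truncation.
short_read_suspected = len(text) != byte_size
```

Reading in raw/binary mode and decoding afterwards keeps the size hint meaningful, since both sides are then measured in bytes.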
|
|
We don't have thread-safety to worry about, so just leave a few
allocations at process exit at worst. We'll also update some
comments about usage while we're at it.
|
|
This saves us some Perl code in the wrapper, since the SpawnPP
implementation also dies directly.
|
|
This is similar to `backtick` but supports all our existing spawn
functionality (chdir, env, rlimit, redirects, etc.). It also
supports SCALAR ref redirects like run_script in our test suite
for std{in,out,err}.
We can probably use :utf8 by default for these redirects, even.
|
|
This required fixing binmode support a few commits ago, along
with properly enabling autoflush in popen_wr instead of setting
it on the wrapper ProcessIO class.
|
|
We must not attempt to use Inline::C unless a user requests
it (by creating the directory) or runs lei.
Fixes: ebdccd6666f9 (spawn: drop checks for directory writability)
|
|
Specifying {cb_args} in the options hash felt awkward to me.
Instead, just use the Perl stack like we do with awaitpid()
and pass the list down directly.
|
|
Since we deal with pipes (of either direction) and bidirectional
stream sockets for this class, it's better to remove the `Pipe'
from the name and replace it with `IO' to communicate that it
works for any form of IO::Handle-like object tied to a process.
|
|
This ensures script/lei $send_cmd usage is EINTR-safe (since
I prefer to avoid loading PublicInbox::IPC for startup time).
Overall, it saves us some code, too.
|
|
It's a TOCTTOU bug to do stat or access checks, anyways,
so just use the file and let autodie::sysopen PublicInbox::Lock
take care of the rest.
|
|
It keeps Spawn.pm less noisy and ensures retries on EINTR.
|
|
This makes interesting parts of our code easier to read IMHO.
We can take advantage of `local' while avoiding `fileno' calls
since it's called in spawn() anyways to reduce LoC even further.
|
|
It's basically the `system' perlop with support for env overrides,
redirects, chdir, rlimits, and setpgid.
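For readers unfamiliar with the shape of such a wrapper, here is a rough Python analogue (not the Perl implementation) of a `system'-like call that accepts env overrides, a stdout redirect, chdir, an rlimit, and setpgid. The function name `run_cmd` and its keyword arguments are hypothetical; it assumes a Unix-like system.

```python
import os
import resource
import subprocess
import sys
import tempfile

def run_cmd(cmd, env=None, cwd=None, stdout=None, rlimit_nofile=None, pgid=None):
    """Run cmd to completion and return its exit status, like system()."""
    def child_setup():  # runs in the child between fork and exec
        if pgid is not None:
            os.setpgid(0, pgid)  # pgid=0 puts the child in its own group
        if rlimit_nofile is not None:
            resource.setrlimit(resource.RLIMIT_NOFILE, rlimit_nofile)
    merged = dict(os.environ, **(env or {}))  # overrides on top of our env
    proc = subprocess.run(cmd, env=merged, cwd=cwd, stdout=stdout,
                          preexec_fn=child_setup)
    return proc.returncode

def demo():
    """Run a child that reports its env override and process group state."""
    code = "import os; print(os.environ['DEMO'], os.getpgrp() == os.getpid())"
    with tempfile.TemporaryFile() as out:
        status = run_cmd([sys.executable, "-c", code], env={"DEMO": "1"},
                         cwd=tempfile.gettempdir(), stdout=out, pgid=0)
        out.seek(0)
        return status, out.read().decode().split()
```

Doing the setpgid and setrlimit work in the child, after fork and before exec, is what keeps the parent's own limits and process group untouched.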
|
|
Handling this should be done at the lowest levels possible;
so away from higher-level lei code.
|
|
SIGABRT, SIGBUS, SIGILL, and SIGSEGV may all happen if we
introduce bugs in the section where signals are blocked.
We can delay handling of SIGFPE since there are no floating
point operations, while SIGXCPU and SIGXFSZ are safe to delay,
especially in the absence of threads in our current code paths.
|
|
If anything, it should have been before the $rlim declaration, not
after, but the immediately preceding similar block has no empty line,
either.
|
|
Code that could be setting it was removed in 14fa0abdcc7b.
Likewise for the double assignment to $err.
Fixes: 14fa0abdcc7b ("rewrite Linux nodatacow use in pure Perl w/o system")
|
|
Our use of `git rev-parse --git-dir' depends on our (v)fork+exec
wrapper doing chdir, so the error message is required to avoid
user confusion. I'm still avoiding `git -C $DIR' for now since
ancient versions of git did not support it.
|
|
It's an informative message that's harmless, so hopefully
the `#' prefix puts the user's mind at ease.
(I saw it on an `lei import' against an IMAP source)
|
|
awaitpid is the new API which will eventually replace dwaitpid.
It enables early registration of callback handlers. Eventually
(once dwaitpid is gone) it'll be able to use fewer waitpid
calls.
The avoidance of waitpid(-1) in our earlier days was driven by
the belief that threads may eventually become relevant for Perl 5,
but that's extremely unlikely at this stage. I will still
introduce optional threads via C, but they definitely won't be
spawning/reaping processes.
Argument order to callbacks is swapped (PID first) to allow
flattened multiple arguments more naturally. The previous API
(allowing only a single argument, as influenced by
pthread_create(3)) was more tedious as it involved packing
multiple arguments into yet another array.
|
|
This quiets down tests when the optional Inline::C is missing.
We do not currently have a hard dependency on Inline::C; and we
should not leave PERL_INLINE_DIRECTORY set in PublicInbox::Spawn
if Inline fails to build.
Leaving PERL_INLINE_DIRECTORY set by Spawn after it fails (due
to missing Inline::C) would cause downstream failures in Gcf2
builds for the same reason. So we should bail out of the Gcf2
build early if Spawn already failed due to missing Inline::C.
The only time we want to be noisy is if a user explicitly sets
PERL_INLINE_DIRECTORY and Inline::C is missing.
This reverts commit ad8acf7d6484d0a489499742cadadbd4f890ab53.
ad8acf7d6484d0a4 (Gcf2: Create cache folder if missing, 2022-09-08)
|
|
Socket.pm still loads strict.pm, unfortunately, which hurts
startup time; but we'll save some LoC this way.
|
|
btrfs is Linux-only at the moment (and likely to remain that way
for practical purposes). So rely on Linux ABI stability and use
the `syscall' and `ioctl' perlops rather than relying on Inline::C.
Inline::C (and gcc||clang) are monstrous dependencies which we
can't expect users to have.
This makes supporting new architectures more difficult, but new
architectures come along rarely and this reduces the burden for
the majority of Linux users on popular architectures (while
still avoiding the distribution of pre-built binaries).
Link: https://public-inbox.org/meta/YbCPWGaJEkV6eWfo@codewreck.org/
|
|
This is future-proofing in case we build against Xapian directly
in the future, which would require a C++ compiler.
|
|
I'm seeing ENOBUFS on a RAM-starved system, and slowing the
sender down enough for the receiver to drain the buffers seems
to work. ENOMEM and ETOOMANYREFS could be in the same boat
as ENOBUFS.
Watching for POLLOUT events via select/poll/epoll_wait doesn't
seem to work, since the kernel can already sleep (or return
EAGAIN) for cases where POLLOUT would work.
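The sleep-and-retry idea can be sketched like this, in Python for illustration rather than the Perl actually used. The function name `send_with_backoff`, the retry count, and the delay are all assumptions; `send` stands in for whatever performs the real sendmsg call.

```python
import errno
import time

# ETOOMANYREFS is missing on some platforms, so look it up defensively.
RETRY_ERRNOS = {errno.ENOBUFS, errno.ENOMEM}
if hasattr(errno, "ETOOMANYREFS"):
    RETRY_ERRNOS.add(errno.ETOOMANYREFS)

def send_with_backoff(send, data, tries=5, delay=0.01, sleep=time.sleep):
    """Call send(data), sleeping and retrying on buffer-exhaustion errors."""
    for attempt in range(tries):
        try:
            return send(data)
        except OSError as exc:
            # Unrelated errors, or running out of tries, propagate as-is.
            if exc.errno not in RETRY_ERRNOS or attempt == tries - 1:
                raise
            sleep(delay)  # give the receiver time to drain its buffers
```

Because this blocks the caller while it sleeps, a loop like this only suits private, non-event-loop callers.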
|
|
I'm not sure why, but I noticed that one of my latest restarts of
public-inbox-httpd wasn't loading the Inline::C .so for Gcf2 or
Spawn. I also can't reproduce the problem as both .so files are
loaded fine on a restart with zero config changes.
In any case, some extra, automatic diagnostics for build errors
won't hurt, as no extra noise is introduced for successful builds.
This will also make future development of C code more convenient,
hopefully.
|
|
Apparently this feature is only in Perl 5.12+, and we're
still on Perl 5.10.
|
|
We'll be using this to allow the "git clone" process hierarchy
to be killed via Ctrl-C. This also fixes a long-standing bug
in error reporting for the Inline::C version, because we're
actually testing for errors, now!
n.b. strlen(3) is officially async-signal-safe as of
POSIX.1-2016, but I can't think of a reason any earlier
implementation wouldn't be.
|
|
We continue to unblock SIGCHLD unconditionally, but also
any signals not blocked by the parent (wq_worker).
This will allow Ctrl-C (SIGINT) to stop "git clone" and other
long-running processes, and allow git-clone cleanup to be
performed, once pi_fork_exec supports setpgid(2). This won't
affect existing daemons on systems with signalfd(2) or
EVFILT_SIGNAL at all, since those run with signals blocked
anyways.
|
|
There'll probably be more things which work on both GNU and
*BSD systems which we don't need separate strings for.
|
|
Keeping track of non-standard FDs gets tricky, so make it easier
by relying on st_dev/st_ino mapping in the transmitted objects.
We'll keep using numbers for the standard FDs since we need to
be able to easily redirect them in the producer (main daemon)
process for (gzip|bzip2|xz) if writing to a compressed mbox.
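A small Python illustration (not the project's Perl) of identifying a descriptor by its st_dev/st_ino pair instead of its number. The helper names `fd_key` and `build_fd_map` are hypothetical, and a temporary regular file stands in for whatever non-standard FD is being tracked.

```python
import os
import tempfile

def fd_key(fd):
    """Identify the open file behind fd by its (st_dev, st_ino) pair."""
    st = os.fstat(fd)
    return (st.st_dev, st.st_ino)

def build_fd_map(fds):
    """Map (st_dev, st_ino) to a local descriptor number."""
    return {fd_key(fd): fd for fd in fds}

def demo():
    """Look up a duplicated descriptor by identity rather than by number."""
    with tempfile.TemporaryFile() as tmp:
        orig_fd = tmp.fileno()
        fd_map = build_fd_map([orig_fd])
        dup_fd = os.dup(orig_fd)  # different number, same underlying file
        try:
            found = fd_map.get(fd_key(dup_fd))
            return found == orig_fd, dup_fd != orig_fd
        finally:
            os.close(dup_fd)
```

A descriptor number changes when an FD is passed between processes, while the device and inode pair does not, which is what makes it usable as a key.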
|
|
Mainly around fork() calls, but some nearby places as well.
|
|
It doesn't appear Perl (as of 5.32.x) has any internal
optimization for splitting on a single-byte, so give it
a regexp instead of letting it compile and discard a
new one every single time.
|
|
We don't need the result of query_prepare (for augmenting or
mass unlinking) until we're ready to deduplicate and write
results to the filesystem. This ought to let us hide some of
the cost of Xapian searches on multi-device/core systems for
extremely expensive searches.
|
|
With 4 dedicated workers, this seems to provide a 100-120%
speedup on a 4 core machine when writing thousands of search
results to a Maildir or mbox. This also sets us up for
high-latency IMAP destinations in the future.
This opens the door to more speedup opportunities such
as optimizing dedupe locking and other ways to reduce
contention.
This change is fairly complex and convoluted, unfortunately.
Further work may allow us to simplify it and even improve
performance.
|
|
We'll ensure our {send,recv}_cmd4 implementations are
consistent w.r.t. non-blocking and interrupted sockets.
We'll also support receiving messages without FDs associated
so we don't have to send dummy FDs to keep receivers from
reporting EOF.
|
|
It's easier to make the code more generic by transferring
all four FDs (std(in|out|err) + socket) instead of omitting
stdin.
We'll be reading from stdin on some imports, and possibly
outputting to stdout, so omitting stdin now would needlessly
complicate things.
The differences between the IO::FDPass "1" code paths and the
"4" code paths used by Inline::C and Socket::MsgHdr are far too
great to support and test at the moment.
|
|
Actually, sending 4 FDs will be useful for lei internal xsearch
work once we start accepting input from stdin. It won't be used
with the lightweight lei(1) client, however.
For WWW (eventually), a single FD may be enough.
|
|
For another step in syscall reduction, we'll support
transferring 3 FDs and a buffer with a single sendmsg/recvmsg
syscall using Socket::MsgHdr if available.
Beyond script/lei itself, this will be used for internal IPC
between search backends (perhaps with SOCK_SEQPACKET). There's
a chance this could make it to the public-facing daemons, too.
This adds an optional dependency on the Socket::MsgHdr package,
available as libsocket-msghdr-perl on Debian-based distros
(but not CentOS 7.x and FreeBSD 11.x, at least).
Our Inline::C version in PublicInbox::Spawn remains the last
choice for script/lei due to the high startup time, and
IO::FDPass remains supported for non-Debian distros.
Since the socket name prefix changes from 3 to 4, we'll also
take this opportunity to make the argv+env buffer transfer less
error-prone by relying on argc instead of designated delimiters.
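As an illustration only (Python 3.9+ on a Unix-like system, not the Socket::MsgHdr code itself), the sketch below sends three FDs and an argv+env buffer in a single sendmsg(2) call, and frames the buffer with a leading argc instead of a delimiter between argv and env. The names `pack_args` and `unpack_args`, the NUL-separated layout, and the sample arguments are assumptions of this sketch, not the wire format lei uses.

```python
import os
import socket

def pack_args(argv, env):
    """Frame argv and env as NUL-separated fields, led by argc."""
    fields = [str(len(argv))] + list(argv) + [f"{k}={v}" for k, v in env.items()]
    return "\0".join(fields).encode()

def unpack_args(buf):
    """Split the buffer by argc instead of a delimiter between argv and env."""
    argc, *rest = buf.decode().split("\0")
    argc = int(argc)
    return rest[:argc], rest[argc:]

def demo():
    """Pass three FDs plus the argument buffer over a Unix socketpair."""
    a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    r, w = os.pipe()
    buf = pack_args(["lei", "q", "s:hello"], {"PWD": "/tmp"})
    socket.send_fds(a, [buf], [w, w, w])  # one sendmsg(2) call
    data, fds, _flags, _addr = socket.recv_fds(b, 4096, 3)  # one recvmsg(2)
    os.write(fds[2], b"ok")  # the received FD refers to the same pipe
    echoed = os.read(r, 2)
    for fd in [r, w, *fds]:
        os.close(fd)
    a.close()
    b.close()
    return unpack_args(data), len(fds), echoed
```

Counting arguments up front means no byte sequence has to be reserved as a separator between argv and env, so an argument can never be mistaken for the boundary.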
|
|
While our recv_3fds() implementation is more efficient
syscall-wise, loading Inline takes nearly 50ms on my machine
even after Inline::C memoizes the build. The current ~20ms in
the fast path is barely acceptable to me, and 50ms would be
unusable.
Eventually, script/lei may invoke tcc(1) or cc(1) directly in
the fast path, but it needs @INC for the slow path, at least.
We'll encode the number of FDs into the socket name to allow
parallel installations, for now.
|