Date | Commit message |
|
getpid() isn't cached by glibc nowadays and system calls are
more expensive due to CPU vulnerability mitigations. To
ensure we switch to the new semantics properly, introduce
a new `on_destroy' function to simplify callers.
Furthermore, OnDestroy correctness is often tied to the
process which creates it, so make the new API default to
being guarded against running in subprocesses.
For cases which require running in all children, a new
PublicInbox::OnDestroy::all call is provided.
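A rough sketch of the guarded-by-default behavior described above. This
is not the actual PublicInbox::OnDestroy code: the `SketchOnDestroy'
package and both helper subs are made-up names, and the only assumption
is that the creator's PID is captured once so the destructor can tell
whether it is running in a subprocess.

```perl
use strict;
use warnings;

package SketchOnDestroy { # hypothetical stand-in, not PublicInbox::OnDestroy
	sub new {
		my ($class, $owner_pid, $cb, @args) = @_;
		bless [ $owner_pid, $cb, @args ], $class;
	}

	sub DESTROY {
		my ($owner_pid, $cb, @args) = @{$_[0]};
		# an undef owner means "run in every process"
		return if defined($owner_pid) && $owner_pid != $$;
		$cb->(@args);
	}
}

# guarded: callback only fires in the process which created it
sub on_destroy_sketch { SketchOnDestroy->new($$, @_) }

# unguarded: callback fires in whichever process drops the last ref
sub on_destroy_all_sketch { SketchOnDestroy->new(undef, @_) }
```

Callers keep the returned object in a lexical and the callback fires
when that lexical goes out of scope.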
|
|
In the rare case sendmsg(2) isn't able to send the full amount
(due to buffers >=2GB on Linux), use print + (autodie)close
to send the remainder and retry on EINTR. `substr' should be
able to avoid a large malloc via offsets and CoW on modern Perl.
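For illustration only, a sketch of finishing a partial send with
EINTR retries. It swaps in syswrite with an OFFSET argument rather
than the print + close + `substr' approach this commit describes, and
`write_rest' is a hypothetical name.

```perl
use strict;
use warnings;
use Errno qw(EINTR);

# write everything in $$bufref from byte offset $off onwards,
# retrying when a signal interrupts the call
sub write_rest {
	my ($fh, $bufref, $off) = @_;
	my $len = length($$bufref);
	while ($off < $len) {
		# LENGTH + OFFSET avoid copying the unsent tail of the buffer
		my $w = syswrite($fh, $$bufref, $len - $off, $off);
		if (defined $w) {
			$off += $w;
		} elsif ($! == EINTR) {
			next; # interrupted before anything was written, retry
		} else {
			die "syswrite: $!";
		}
	}
	$off; # equals length($$bufref) on success
}
```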
|
|
This ensures we handle RNG reseeding and resetting the event
loop properly in child processes after forking.
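A minimal sketch of the reseeding half of this, assuming a plain
fork() wrapper; `fork_reseeded' is a hypothetical name and the event
loop reset is left out. Using POSIX::_exit to skip inherited END
blocks is a choice made for the sketch.

```perl
use strict;
use warnings;
use POSIX ();

# fork, reseed the child's PRNG, run $cb in the child;
# the parent gets the child PID back
sub fork_reseeded {
	my ($cb, @args) = @_;
	my $pid = fork();
	die "fork: $!" unless defined $pid;
	return $pid if $pid; # parent
	srand; # child: pick a fresh seed instead of reusing parent state
	$cb->(@args);
	POSIX::_exit(0); # skip END blocks inherited from the parent
}
```

Without the srand call, parent and child would produce the same
rand() sequence after the fork.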
|
|
I'm not sure how/if we should bother recovering from these,
so just croak and let some caller deal with it. `autodie'
uses Carp internally, so setting `PERL5OPT=-MCarp=verbose'
in the environment gives us full stacktraces.
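As a small illustration of the croak-and-let-the-caller-deal approach,
here is what `autodie' does to a failing open. `count_lines' is a
hypothetical sub, not something from this codebase.

```perl
use strict;
use warnings;
use autodie qw(open close);

# returns the number of lines in $path; open and close throw on failure
sub count_lines {
	my ($path) = @_;
	open(my $fh, '<', $path); # no "or die" needed under autodie
	my $n = 0;
	$n++ while <$fh>;
	close($fh);
	$n;
}
```

A caller which wants to recover can wrap the call in eval and inspect
the autodie::exception object left in $@.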
|
|
None of the lei internals works properly without forking and
sockets. The fallback code increases the potential to accidentally
call subs in the wrong process during the teardown phase.
We'll still support ipc_do w/o forking for now since
forking doesn't benefit small indexing runs from -mda and
such.
|
|
require_bsd and require_mods(':fcntl_lock') are now
supported in TestCommon to make it easier to maintain
than a big list of regexps.
getsockopt for SO_ACCEPTFILTER seems to always succeed,
even if the retrieved struct is all zeroes.
|
|
This ensures script/lei $send_cmd usage is EINTR-safe (since
I prefer to avoid loading PublicInbox::IPC for startup time).
Overall, it saves us some code, too.
|
|
It's not used by any post_loop_do callbacks anymore, and the
underlying FD map is a global `our' variable accessible from
anywhere, anyways.
|
|
The rest of our code does, and we haven't encountered a platform
we'd care about without it.
|
|
`getconf _NPROCESSORS_ONLN' will succeed on GNU/Linux systems
anyways; and the non-underscore-prefixed invocation works fine
on all BSD flavors tested.
Thus the `nproc' and `gnproc' attempts will never be reached.
The only downside is we lose the ability to account for CPU
affinity, but that's probably not an issue since CPU affinity
(AFAIK) isn't a commonly-used feature.
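A sketch of the getconf-only detection this leaves us with, assuming
getconf(1) is in PATH. `detect_nproc_sketch' is a hypothetical name
and the fallback of 1 is an assumption of the sketch.

```perl
use strict;
use warnings;

# try the underscore-prefixed name first, then the bare one
sub detect_nproc_sketch {
	for my $name (qw(_NPROCESSORS_ONLN NPROCESSORS_ONLN)) {
		my $out = `getconf $name 2>/dev/null`;
		next unless defined $out;
		return $1 + 0 if $out =~ /\A(\d+)\s*\z/ && $1 > 0;
	}
	1; # assumed fallback when getconf is missing or unhelpful
}
```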
|
|
We'll reorganize this into a hash table for ease-of-reading.
|
|
Being able to tune worker process counts on-the-fly when
xap_helper gets used with -{netd,httpd,imapd} will be useful
for tuning new setups.
|
|
It's apparently not needed for AF_UNIX + SOCK_SEQPACKET as our
receivers never check for MSG_EOR in "struct msghdr".msg_flags
anyways. I don't believe POSIX is clear on the exact semantics
of MSG_EOR on this socket type. This works around truncation
problems on OpenBSD recvmsg when MSG_EOR is used by the sender.
Link: https://marc.info/?i=20230826020759.M335788@dcvr
|
|
Tested on both amd64 and i386, and these constants tend to be
architecture-independent.
|
|
SOCK_SEQPACKET is used whenever we care about parallel writes to
a socket, so there's no need to mess with locks in userspace
code.
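To illustrate why no userspace locking is needed: each send on a
SOCK_SEQPACKET socket is delivered as one record, so concurrent
writers can't interleave within a message. A minimal sketch with
hypothetical sub names, assuming a platform which supports
AF_UNIX + SOCK_SEQPACKET (Linux does). Flags are 0 throughout, so
MSG_EOR is not involved.

```perl
use strict;
use warnings;
use Socket qw(AF_UNIX SOCK_SEQPACKET);

# returns a connected pair of AF_UNIX SOCK_SEQPACKET sockets
sub seqpacket_pair {
	socketpair(my $s1, my $s2, AF_UNIX, SOCK_SEQPACKET, 0)
		or die "socketpair: $!";
	($s1, $s2);
}

# send one record with no flags set
sub send_record {
	my ($sock, $msg) = @_;
	my $n = send($sock, $msg, 0);
	die "send: $!" unless defined $n;
	$n;
}

# receive exactly one record of up to $max bytes
sub recv_record {
	my ($sock, $max) = @_;
	my $n = sysread($sock, my $buf, $max);
	die "sysread: $!" unless defined $n;
	$buf;
}
```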
|
|
Using signalfd is necessary to get reliable signal wakeups w/o
polling on fixed intervals. This change will make it possible
to use awaitpid in cidx shard workers so they can perform prune
work while waiting on the initial output of `git log -p'.
|
|
* cindex: (29 commits)
cindex: --prune checkpoints to avoid OOM
cindex: ignore SIGPIPE
cindex: respect existing permissions
cindex: squelch incompatible options
cindex: implement reindex
cindex: add support for --prune
cindex: filter out non-existent git directories
spawn: show failing directory for chdir failures
cindex: improve granularity of quit checks
cindex: attempt to give oldest commits lowest docids
cindex: truncate or drop body for over-sized commits
cindex: check for checkpoint before giant messages
cindex: implement --max-size=SIZE
sigfd: pass signal name rather than number to callback
cindex: handle graceful shutdown by default
cindex: drop `unchanged' progress message
cindex: show shard number in progress message
cindex: implement --exclude= like -clone
ds: @post_loop_do replaces SetPostLoopCallback
cindex: use DS and workqueues for parallelism
...
|
|
This allows us to avoid repeatedly using memory-intensive
anonymous subs in CodeSearchIdx where the callback is assigned
frequently. Anonymous subs are known to leak memory in old
Perls (e.g. 5.16.3 in enterprise distros) and are still expensive
in newer Perls. So favor the (\&subroutine, @args) form which
allows us to eliminate anonymous subs going forward.
Only CodeSearchIdx takes advantage of the new API at the moment,
since it's the biggest repeat user of post-loop callback
changes.
Getting rid of the subroutine and relying on a global `our'
variable also has two advantages:
1) Perl warnings can detect typos at compile-time, whereas the
(now gone) method could only detect errors at run-time.
2) `our' variable assignment can be `local'-ized to a scope.
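A sketch of the (\&subroutine, @args) form with a `local'-ized global.
The variable is declared here as a stand-in in package main, and
`loop_should_continue' and `queue_nonempty' are hypothetical names,
not the DS internals.

```perl
use strict;
use warnings;

# stand-in for the global described above:
# first element is a code ref, the rest are its arguments
our @post_loop_do;

# returns true if the loop should keep running
sub loop_should_continue {
	my ($cb, @args) = @post_loop_do;
	$cb ? $cb->(@args) : 0;
}

# a named sub replaces a closure; state arrives via @args
sub queue_nonempty {
	my ($queue) = @_;
	scalar(@$queue);
}
```

A caller would write `local @post_loop_do = (\&queue_nonempty, \@q)'
and the previous value comes back when the enclosing scope exits.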
|
|
We'll be using nproc_shards for indexing non-Inbox stuff.
|
|
I'm not sure how this went undetected for so long, but EINTR
must be checked for when working with blocking sockets. EINTR
shouldn't happen for non-blocking sockets, but it's
easier to just use the new wrapper in most of those places.
I don't know what I was smoking when I left out EINTR checks :x
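The general shape of such a wrapper, as a sketch. `retry_eintr' is a
hypothetical name and signature, not necessarily what the new wrapper
looks like.

```perl
use strict;
use warnings;
use Errno qw(EINTR);

# call $op until it succeeds or fails with something other
# than EINTR; $op must return undef on failure and set $!
sub retry_eintr {
	my ($op) = @_;
	while (1) {
		my $ret = $op->();
		return $ret if defined $ret;
		return if $! != EINTR; # real error, caller checks $!
	}
}
```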
|
|
This brings t/lei-index.t back down from ~8 to ~3s. I didn't
notice this before because the LeiNoteEvent timer was firing
every 5s and clearing circular refs, and parallel testing meant
the delay got hidden.
Fixes: 4a2a95bbc78f99c8 (ipc+lei: switch to awaitpid, 2023-01-17)
|
|
This avoids awkwardly stuffing an arrayref into callbacks
which expect multiple arguments. IPC->awaitpid_init now
allows pre-registering callbacks before spawning workers.
|
|
It's not used anywhere, and simplifies the next commit.
|
|
We can just test for {-reap_do} instead, to save us a few bytes.
|
|
Enqueuing "note-event" requests from the DS event loop must
not wait on workers being able to drain the queue quickly
enough. Thus we make the SOCK_SEQPACKET writes nonblocking
and rely on the lei-daemon event loop to enqueue writes.
This is a unique problem for "note-event" since it reuses
workers in between commands, while most lei commands currently
fork off new workers.
|
|
This enables lei-daemon to work without either Inline::C or
Socket::MsgHdr installed. Prior to this, only the `lei' client
was using the pure Perl implementation. Either C implementation
is still marginally faster, however.
|
|
Hopefully problems can get diagnosed more quickly with
the sub name in the error message.
|
|
Simplify our APIs and force dwaitpid() to work in async mode for
all lei workers. This avoids having lingering zombies for
parallel searches if one worker finishes shortly before another.
The old distinction between "old" and "new" workers was
needlessly complex, error-prone, and embarrassingly bad.
We also never handled v2:// writers properly before on
Ctrl-C/Ctrl-Z (SIGINT/SIGTSTP), so add them to @WQ_KEYS
to ensure they get handled by $lei when appropriate.
|
|
Since signalfd is often combined with our event loop, give it a
convenient API and reduce the code duplication required to use it.
EventLoop is replaced with ::event_loop to allow consistent
parameter passing and avoid needlessly passing the package name
on stack.
We also avoid exporting SFD_NONBLOCK since it's the only flag we
support. There's no sense in having the memory overhead of a
constant function when it's in cold code.
|
|
Currently we don't use OpenSSL from child processes of parents
which use OpenSSL, but we may in the future. So ensure OpenSSL
initializes its PRNG after these forks to avoid one security
pitfall down the line.
|
|
This fixes the occasional t/lei-sigpipe.t infinite loop
under "make check-run".
Link: http://nntp.perl.org/group/perl.perl5.porters/258784
<CAHhgV8hPbcmkzWizp6Vijw921M5BOXixj4+zTh3nRS9vRBYk8w@mail.gmail.com>
Followup-to: b552bb9150775fe4 ("daemon+watch: fix localization of %SIG for non-signalfd users")
|
|
It's needless noise and misleads users reading "ps" into
thinking there's more workers when there's only one.
|
|
In retrospect, I don't think it's needed; and trying to wire up
a user interface for lei to manage process counts doesn't seem
worthwhile. It could be resurrected for public-facing daemon
use in the future, but that's what version control systems are for.
This also lets us automatically avoid setting up broadcast
sockets.
Followup-to: 7b7939d47b336fb7 ("lei: lock worker counts")
|
|
Since some lei worker classes only use a single worker,
there's no sense in having broadcast for those cases.
|
|
This brings the wq_* SOCK_SEQPACKET API functionality
on par with the ipc_do (pipe-based) API.
|
|
We no longer rely on IO::FDPass, so there's no longer a reason
to limit this internally.
|
|
This may help diagnose "Exception: Document \d+ not found"
errors I'm seeing from "lei up" with HTTPS endpoints.
|
|
WQWorkers are limited roughly to MAX_ARG_STRLEN (the kernel
limit of argv + environ) to avoid excessive memory growth.
Occasionally, we need to send larger messages via workqueues
that are too small to hit EMSGSIZE on the sender.
This fixes "lei q" when using HTTP(S) externals, since that
code path sends large Eml objects from lei_xsearch workers
directly to lei2mail WQ workers.
|
|
This fixes a potential problem with Carp::longmess
firing somewhere deeper in the stack. This is not a known
problem at this time, but something I noticed while chasing
something else.
|
|
This was causing errors in a mass keyword import patch
I'm working on.
|
|
We can use this to ensure sharded work doesn't do unexpected
things if workers are added/removed. We currently don't
increase/decrease workers once a workqueue is started, but
non-lei code (-httpd/imapd) may start doing so.
This also fixes a bug where lei2mail workers could not
be adjusted via --jobs on the command-line.
|
|
We'll give workqueues a broadcast mechanism to ensure all
workers see a certain message. We'll also tag each worker
with {-wq_worker_nr} in preparation for work distribution.
This is intended to avoid extra connection and fork() costs
from LeiAuth in a future commit.
|
|
DESTROY callbacks can clobber $?, so we must take care to
preserve it when exiting. We'll also try to make an effort to
ensure better DESTROY ordering and delete as much as possible
before x_it finishes.
We also need to load PublicInbox::Config when setting up
public inboxes.
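To illustrate the $? clobbering: anything in DESTROY which waits on a
child process overwrites $?, so it has to be `local'-ized first. A
minimal sketch with a hypothetical class; it only shows the
preservation idiom, not the DESTROY ordering work.

```perl
use strict;
use warnings;

package SketchCleanup { # hypothetical class for illustration
	sub new { bless {}, shift }

	sub DESTROY {
		local $?; # keep the caller's exit status intact
		# waiting on a child process overwrites $?
		system($^X, '-e', 'exit 3');
	}
}
```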
|
|
For early MUA spawners using lock-free outputs, we need to
wait on the startq pipe to silence progress reporting. For
--augment users, we can start the MUA even earlier by
creating Maildirs in the pre-augment phase.
To improve progress reporting for non-MUA (or late-MUA)
spawners, we'll no longer blindly append "--compressed" to the
curl(1) command when POST-ing for the gzipped mboxrd.
Furthermore, we'll overload stringify ('""') in LeiCurl to
ensure the empty -d '' string shows up properly.
v2: fix startq waiting with --threads
mset_progress is never shown with early MUA spawning.
The plan is to still show progress when augmenting and
deduping. This fixes all local search cases.
A leftover debug bit is dropped, too.
|
|
We're able to propagate $? from wq_workers in a consistent
manner, now.
|
|
We will have a ->wq_do that doesn't pass FDs for I/O.
|
|
This reverts commit a7e6a8cd68fb6d700337d8dbc7ee2c65ff3d2fc1.
It turns out to be unworkable in the face of multiple producer
processes, since the lock we make has no effect when calculating
pipe capacity.
|
|
It's distributed with Perl and our Makefile.PL even declares a
dependency on it, just like Encode and all the Compress::*
stuff.
|
|
die() in a child zips up the stack into the parent, which is
undesirable behavior. We're going to exit anyways, just warn
and let exit(1) happen due to $@ being set.
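A sketch of the idea: trap the exception in the child so it can't
unwind into frames the parent set up, warn, and exit non-zero.
`run_in_child' is a hypothetical helper, and calling POSIX::_exit
directly (to skip inherited END blocks) is a choice made for the
sketch, not what this commit does.

```perl
use strict;
use warnings;
use POSIX ();

# run $cb in a forked child and return the child's exit code
sub run_in_child {
	my ($cb, @args) = @_;
	my $pid = fork();
	die "fork: $!" unless defined $pid;
	if ($pid == 0) {
		# trap die() so it stops here instead of unwinding further
		eval { $cb->(@args) };
		warn "child $$ failed: $@" if $@;
		POSIX::_exit($@ ? 1 : 0);
	}
	waitpid($pid, 0);
	$? >> 8;
}
```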
|
|
This also updates lei_xsearch to follow the same pattern for
stopping curl(1) and tail(1) processes it spawns.
|