git multi-pack-index files were creating swap storms and OOM-ing
on my system, so providing an option to disable them seems prudent
given the minor startup time regression.
|
|
Localizing assignments to *STDERR doesn't always seem to work
with scalar (String) IO objects. Fortunately, actual dup2
redirects always seem reliable, so do that instead of
attempting to understand why PerlIO sometimes fails with the
assignment.
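For illustration, here is the same fd-level technique in C (not the Perl code this commit touches). It saves fd 2 with dup(2), points it at a temporary file with dup2(2), runs the noisy code, then restores the saved descriptor. `capture_stderr' and `noisy' are hypothetical names.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: run fn() with fd 2 pointed at a temporary file
 * via dup2(2), restore the original stderr, then copy the captured
 * bytes into buf (NUL-terminated).  Returns the byte count or -1. */
static ssize_t capture_stderr(void (*fn)(void), char *buf, size_t len)
{
	FILE *tmp = tmpfile();
	if (!tmp)
		return -1;
	int saved = dup(STDERR_FILENO);	/* keep a handle on the real stderr */
	if (saved < 0) {
		fclose(tmp);
		return -1;
	}
	fflush(stderr);
	if (dup2(fileno(tmp), STDERR_FILENO) < 0) {
		close(saved);
		fclose(tmp);
		return -1;
	}
	fn();	/* anything written to fd 2 now lands in tmp */
	fflush(stderr);
	dup2(saved, STDERR_FILENO);	/* put the real stderr back */
	close(saved);

	/* read back from offset 0 regardless of the shared file position */
	ssize_t n = pread(fileno(tmp), buf, len - 1, 0);
	buf[n > 0 ? n : 0] = '\0';
	fclose(tmp);
	return n;
}

/* Example payload: writes via stdio and via the raw descriptor. */
static void noisy(void)
{
	fputs("warning: example\n", stderr);
	if (write(STDERR_FILENO, "raw\n", 4) < 0)
		perror("write");
}
```

Because the redirect happens on the descriptor itself, it also catches writes that bypass the stdio (or PerlIO) layer entirely.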
|
|
alarm(2) delivering SIGALRM seems sufficient for Xapian since
Xapian doesn't block signals (which would necessitate the use of
SIGKILL via RLIMIT_CPU hard limit). When Xapian gets stuck in
`D' state on slow storage, SIGKILL would not make a difference,
either (at least not on Linux).
Relying on RLIMIT_CPU is also trickier since we must account for
CPU time already consumed by a process for unrelated requests.
Thus we just rely on a simple alarm-based timeout. This also
avoids requiring the optional BSD::Resource module in the (mostly)
Perl implementation (and avoids potential bugs given my meager
arithmetic skills).
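A minimal C sketch of an alarm-based timeout, assuming the timed work runs in a separate process as the Xapian helpers do. The child arms alarm(2) and leaves SIGALRM at its default action, so a hung child is simply terminated with no RLIMIT_CPU accounting. `run_with_alarm', `stuck' and `quick' are hypothetical names.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run work() in a child with a wall-clock limit of
 * `secs' seconds enforced by alarm(2).  Returns the waitpid() status,
 * or -1 on error. */
static int run_with_alarm(void (*work)(void), unsigned secs)
{
	pid_t pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0) {
		signal(SIGALRM, SIG_DFL);	/* default action: terminate */
		alarm(secs);	/* wall-clock, not CPU time */
		work();
		_exit(0);
	}
	int status;
	while (waitpid(pid, &status, 0) < 0)
		if (errno != EINTR)
			return -1;
	return status;
}

static void stuck(void) { for (;;) pause(); }	/* stands in for a hung query */
static void quick(void) { }
```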
|
|
When read-only daemons reopen log files via SIGUSR1, be sure to
propagate it to Xapian helper processes to ensure old log files
can be closed and archived.
|
|
Only public-facing daemons use it, currently, and all
public-facing daemons will pre-spawn it as early as feasible.
lei will need it eventually to handle queries requiring C++,
but I'm not certain what path to take with lei, yet...
|
|
We should almost always be calling `check_build' instead of
`build'. Using ccache masked some of the overhead from
this, but various linker implementations are still slow.
|
|
Xapian helper processes are disabled by default once again.
However, they can be enabled via the new `-X INTEGER' parameter.
One big positive of having the top-level daemon spawn the Xapian
helpers is that they can be shared freely across all workers for
improved load balancing and memory reduction.
|
|
We need to be able to handle resource limitation errors in
public-facing daemons.
|
|
While existing callers (lei, *-index, -watch) are private,
we should not be blocking the event loop in public-facing
servers when we hit ETOOMANYREFS, ENOMEM, or ENOBUFS.
|
|
We were already silently relying on v5.10 features (`//') and
on all the regexps working correctly with v5.12 unicode_strings.
|
|
Technically it's not required, but -compact blindly requires
DBD::SQLite at the moment since it was designed with inboxes in
mind. Furthermore, cindex isn't useful at the moment without
inboxes to associate with, and inboxes can't be indexed without
SQLite.
|
|
systemd setups may use role accounts (e.g. `news') with
XDG_CACHE_HOME unset and a non-existent HOME directory
which the user has no permission to create.
In those cases, fall back to using PERL_INLINE_DIRECTORY if
available for building the just-ahead-of-time C++ binary.
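A C sketch of the directory selection under stated assumptions. The lookup order ($XDG_CACHE_HOME, then $HOME/.cache when $HOME exists, then $PERL_INLINE_DIRECTORY) is inferred from this message, and `pick_build_dir' is a hypothetical name, not the actual implementation.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical: choose where to build the JAOT C++ binary.
 * Returns 0 and fills `out', or -1 if no usable location exists. */
static int pick_build_dir(char *out, size_t len)
{
	const char *xdg = getenv("XDG_CACHE_HOME");
	const char *home = getenv("HOME");
	const char *inl = getenv("PERL_INLINE_DIRECTORY");
	struct stat st;
	int n;

	if (xdg && *xdg)
		n = snprintf(out, len, "%s", xdg);
	else if (home && *home && stat(home, &st) == 0 && S_ISDIR(st.st_mode))
		n = snprintf(out, len, "%s/.cache", home);	/* XDG default */
	else if (inl && *inl)
		n = snprintf(out, len, "%s", inl);	/* e.g. `news' role account */
	else
		return -1;
	return (n >= 0 && (size_t)n < len) ? 0 : -1;
}
```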
|
|
The C++ version of xap_helper will allow more complex and
expensive queries. Both the Perl and C++-only versions will
allow offloading search into a separate process which can be
killed via ITIMER_REAL or RLIMIT_CPU in the face of overload.
The xap_helper `mset' command wrapper is simplified to
unconditionally return rank, percentage, and estimated matches
information. This may slightly penalize mbox retrievals and
lei users, but perhaps that can be a different command entirely.
|
|
This makes upcoming changes easier to understand.
|
|
Retrieving Xapian document terms, data (and possibly values) and
transferring to the Perl side would be an increase in complexity
and I/O on both the Perl and C++ sides. It would require more I/O
in C++ and transient memory use on the Perl side where slow mset
iteration gives an opportunity to dictate memory release rate.
So let's ignore the document-related stuff here for now for
ease-of-development. We can reconsider this change if dropping
Xapian Perl bindings entirely and relying on JAOT C++ ever
becomes a possibility.
|
|
It's never straightforward to pick an ideal number of processes
for anything and Xapian helper processes are no exception since
there may be massive disparities in CPU count and I/O
performance. So default to a single worker for now in the C++
version since that's the default for the Perl/(XS|SWIG)
version, and also the same as for our normal public-facing
daemons.
This keeps the behavior between the Perl+(XS|SWIG) and C++
version as similar as possible.
|
|
It hasn't been used since 2016 when we started working on
improved streamability of gigantic responses.
Fixes: 95d4bf7aded4 (atom: switch to getline/close for response bodies, 2016-12-03)
|
|
The 131072 byte lower bound was the old default before the
sliding mmap window was introduced in modern glibc malloc.
While the sliding mmap window was intended to be faster by
reducing syscalls, zeroing and kernel overhead, it is also prone
to fragmentation from allocation patterns seen in evented Perl
servers.
Individual allocations over 128K are rare in our codebase since
there aren't many messages this large, making any performance
impact tiny. Furthermore, the reduction in fragmentation and
memory use will be a speedup for memory-constrained systems
since they can avoid swap and have more leftover for the page
cache.
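This message doesn't say how the threshold is applied. One way to express it in C is glibc's mallopt(3); setting M_MMAP_THRESHOLD explicitly also disables glibc's dynamic threshold adjustment. `pin_mmap_threshold' is a hypothetical name.

```c
#include <malloc.h>
#include <stdlib.h>

/* Pin glibc's mmap threshold at the old 128K (131072 byte) default.
 * Allocations at or above it go straight to mmap(2) and are returned
 * to the kernel on free(), instead of fragmenting the heap. */
static int pin_mmap_threshold(void)
{
#ifdef M_MMAP_THRESHOLD
	return mallopt(M_MMAP_THRESHOLD, 131072);	/* 1 on success */
#else
	return 0;	/* not glibc: nothing to tune */
#endif
}
```

glibc also reads the same parameter from the MALLOC_MMAP_THRESHOLD_ environment variable (note the trailing underscore), which avoids code changes entirely.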
|
|
Write barriers can take a long time to finish, especially when
commands are issued in parallel. So handle it asynchronously
without blocking lei-daemon by making EOFpipe a little more
flexible by supporting arguments to the callback function.
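A rough C analogue of an EOF pipe whose callback takes an argument. The real EOFpipe is Perl; the struct and function names here are hypothetical. The event loop calls `eof_pipe_on_readable' whenever the read end is readable, and the callback runs with its caller-supplied context once the writer closes its end.

```c
#include <errno.h>
#include <unistd.h>

struct eof_pipe {
	int rfd;		/* read end watched by the event loop */
	void (*cb)(void *arg);	/* runs once the writer closes its end */
	void *arg;		/* caller-supplied context */
};

/* Returns 1 once EOF was seen and cb(arg) ran, 0 if the writer is
 * still alive. */
static int eof_pipe_on_readable(struct eof_pipe *ep)
{
	char buf[64];
	ssize_t n = read(ep->rfd, buf, sizeof(buf));

	if (n > 0)
		return 0;	/* discard payload, keep waiting for EOF */
	if (n < 0 && (errno == EAGAIN || errno == EINTR))
		return 0;	/* spurious wakeup: try again later */
	close(ep->rfd);	/* EOF (or hard error): writer is gone */
	ep->cb(ep->arg);
	return 1;
}

/* Example callback: count completions in an int owned by the caller. */
static void mark_done(void *arg)
{
	++*(int *)arg;
}
```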
This is another step towards improving parallel use of lei.
|
|
Schedule a timer to stop shard workers and the git-cat-file
process after a `barrier' command. This allows us to save some
memory again when the lei-daemon is idle but preserves the fork
overhead reduction when issuing many commands in parallel or in
quick succession.
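A toy model of the idle-stop timer in C, assuming a fixed idle period and a single pending deadline. `store_state' and the `on_*' functions are hypothetical; they only show the arm, cancel and expire logic.

```c
#include <stdbool.h>
#include <time.h>

struct store_state {
	bool workers_up;	/* shard workers + git-cat-file running */
	time_t stop_at;		/* 0 = no stop scheduled */
};

/* A new command arrives: workers are (re)started lazily and any
 * pending idle stop is cancelled. */
static void on_command(struct store_state *s)
{
	s->workers_up = true;
	s->stop_at = 0;
}

/* A barrier finished: arm the idle deadline. */
static void on_barrier(struct store_state *s, time_t now, time_t idle)
{
	s->stop_at = now + idle;
}

/* Event-loop timer tick.  Returns true if workers were stopped. */
static bool on_timer(struct store_state *s, time_t now)
{
	if (s->stop_at && now >= s->stop_at && s->workers_up) {
		s->workers_up = false;	/* free memory while idle */
		s->stop_at = 0;
		return true;
	}
	return false;
}
```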
|
|
barrier (synchronous checkpoint) is better than ->done with
parallel lei commands being issued (via '&' or different
terminals), since repeatedly stopping and restarting processes
doesn't play nicely with expensive tasks like `lei reindex'.
This introduces a slight regression in maintaining more
processes (and thus resource use) when lei is idle, but that'll
be fixed in the next commit.
|
|
Since data going to git is the most important, always ensure
data is written to git before attempting to write anything to
SQLite or Xapian.
|
|
Large string processing + concurrency + caching/memoization
really brings out the worst in glibc malloc :<
|
|
Noticed while working on other things...
Fixes: 299aac294ec3 (lei: do label/keyword parsing in optparse, 2023-10-02)
|
|
We shouldn't attempt to reap a process again after it's been
reaped asynchronously in the SIGCHLD handler. Noticed while
working on changes to get lei/store to use checkpointing.
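A C sketch of the bookkeeping, assuming a per-child record shared with the SIGCHLD reaper (names are hypothetical). Once the status has been collected, later callers get the cached result instead of calling waitpid(2) again.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

struct child {
	pid_t pid;
	int reaped;	/* set once waitpid() has collected the status */
	int status;
};

/* Non-blocking reap.  Returns 1 if the child has been reaped (now or
 * earlier), 0 if it is still running, -1 on error.  A reaped PID is
 * never passed to waitpid() again: that fails with ECHILD or, worse,
 * matches a new child that reused the PID. */
static int child_try_reap(struct child *c)
{
	if (c->reaped)
		return 1;
	pid_t r = waitpid(c->pid, &c->status, WNOHANG);
	if (r == c->pid) {
		c->reaped = 1;
		return 1;
	}
	return r == 0 ? 0 : -1;
}
```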
|
|
This should improve `lei blob' and `lei rediff' functionality
for folks relying on `lei index' and allows future work to
improve parallelism via checkpointing in lei/store.
|
|
We need these values in the PSGI $env to generate the cache key,
even if we're not linkifying anything.
Fixes: 48cbe0c3 (www: linkify inbox addresses in To/Cc headers, 2024-01-09)
|
|
This adds support for the "POST /$INBOX/$MSGID/?x=m&q=..."
endpoint added last year to support per-thread searches:
764035c83 (www: support POST /$INBOX/$MSGID/?x=m&q=, 2023-03-30)
This only works against public-inbox instances running 764035c83
or later, but unfortunately there hasn't been a release since then.
|
|
Noticed while trying to make other reliability improvements to
lei...
|
|
By reducing internal event loop iterations, this brings 300+
inboxes down from ~32ms to ~27ms. It should also be more consistent
on servers with busy event loops since all the Xapian DB traffic
happens at once, theoretically improving cache utilization.
|
|
This fixes compile errors on platforms we can't explicitly
support from pure Perl due to the lack of syscall stability
guarantees by the OS developers.
Reported-by: Gaelan Steele <gbs@canishe.com>
Tested-by: Gaelan Steele <gbs@canishe.com>
|
|
There are still some places where on_destroy isn't suitable.
This gets rid of getpid() calls in most of those cases to
reduce syscall costs and clean up syscall trace output.
|
|
getpid() isn't cached by glibc nowadays and system calls are
more expensive due to CPU vulnerability mitigations. To
ensure we switch to the new semantics properly, introduce
a new `on_destroy' function to simplify callers.
Furthermore, OnDestroy correctness is often tied to the
process which creates it, so make the new API guard against
running in subprocesses by default.
For cases which require running in all children, a new
PublicInbox::OnDestroy::all call is provided.
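A C analogue of the PID-guarded default, with hypothetical names. The guard records the creating PID and its callback is skipped in any other process, while the `all' variant (like PublicInbox::OnDestroy::all) records no owner. This sketch still costs one getpid(2) at creation and one at run time.

```c
#include <sys/types.h>
#include <unistd.h>

struct on_destroy {
	void (*fn)(void *arg);
	void *arg;
	pid_t owner;	/* 0 means "run in every process" */
};

/* Default: only the creating process may run the callback. */
static struct on_destroy on_destroy_new(void (*fn)(void *), void *arg)
{
	struct on_destroy d = { fn, arg, getpid() };
	return d;
}

/* Opt-in: run in the creator and in any forked child. */
static struct on_destroy on_destroy_all(void (*fn)(void *), void *arg)
{
	struct on_destroy d = { fn, arg, 0 };
	return d;
}

/* Returns 1 if the callback ran, 0 if it was skipped. */
static int on_destroy_run(struct on_destroy *d)
{
	if (!d->fn)
		return 0;
	if (d->owner && d->owner != getpid())
		return 0;	/* copy inherited by a child: skip */
	d->fn(d->arg);
	d->fn = NULL;	/* run at most once */
	return 1;
}

static void bump(void *arg) { ++*(int *)arg; }
```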
|
|
PID guards for OnDestroy will be the default in an upcoming
change. In the meantime, LeiMirror was the only user and
didn't actually need it.
|
|
INSTALL now covers more of lei since I'm less uncomfortable
about it for 2.0 and points users towards the install/ helpers
if installing from source.
|
|
While PublicInbox::Config is responsible for some instances of
setting $git->{nick}, more PublicInbox::Git objects may be
created from loading the cindex and we should do our best to
reuse that memory, too.
Followup-to: 84ed7ec1c887 (dedupe inbox names, coderepo nicks + git dirs, 2024-03-04)
|
|
I may be mistaken, but I suspect the reason jemalloc handles
long-lived processes better than glibc is due to granularity
reduction being scaled to larger size classes. This can waste
20% of an individual allocation, but increases the likelihood
of reuse (without splitting/consolidating into other sizes).
In other words, glibc seems to try too hard to make the best fit
for initial allocations. This ends up being suboptimal over
time as those allocations are freed and similar (but not
identical) allocations come in. jemalloc sacrifices the best
initial fit for better fits over a long process lifetime.
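To make the size-class idea concrete, here is a simplified C model, an assumption for illustration rather than jemalloc's actual table. Each power-of-two range is split into four equally spaced classes, so rounding up wastes at most about 20% (e.g. a 129-byte request lands in the 160-byte class).

```c
#include <stddef.h>

/* Round a request up to a simplified size class: 16 bytes minimum,
 * then four classes per doubling.  Ignores overflow for huge sizes. */
static size_t size_class(size_t n)
{
	if (n <= 16)
		return 16;
	size_t p = 16;
	while (p * 2 < n)	/* find p with p < n <= 2p */
		p *= 2;
	size_t step = p / 4;	/* four evenly spaced classes */
	return (n + step - 1) / step * step;
}
```

Since 130- and 150-byte requests both map to 160 bytes, a block freed by one can be reused by the other without splitting or consolidation.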
|
|
With my current mirror of lore + gko, this saves over 300K
allocations and brings the allocation count in this area down
to under 5K. The reduction in AV refs saves around 45MB RAM
according to measurements done live via Devel::Mwrap.
|
|
Wrap the entire solver command chain with a dedicated limiter.
The normal limiter is designed for longer-lived commands or ones
which serve a single HTTP request (e.g. git-http-backend or
cgit) and is not effective for the short-lived, memory + CPU
intensive commands used by solver.
Each overall solver request is both memory + CPU intensive: it
spawns several short-lived git processes(*) in addition to a
longer-lived `git cat-file --batch' process.
Thus running parallel solvers from a single -netd/-httpd worker
(which have their own parallelization) results in excessive
parallelism that is both memory and CPU-bound (not network-bound)
and cascades into slowdowns for handling simpler memory/CPU-bound
requests. Parallel solvers were also responsible for the
increased lifetime and frequency of zombies since the event loop
was too saturated to reap them.
We'll also return 503 on excessive solver queueing, since these
require an FD for the client HTTP(S) socket to be held onto.
(*) git (update-index|apply|ls-files) are all run by solver and
are short-lived.
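The queueing policy can be modeled in a few lines of C. The limits, enum and function names are hypothetical, and a real limiter would hold the queued requests themselves rather than just a count.

```c
#include <stdbool.h>

/* At most max_run solver requests run at once and at most max_queue
 * wait; beyond that the request is rejected (HTTP 503) instead of
 * holding yet another client socket open. */
enum lim_result { LIM_RUN, LIM_QUEUED, LIM_REJECT };

struct limiter {
	int max_run, max_queue;
	int running, queued;
};

static enum lim_result limiter_submit(struct limiter *l)
{
	if (l->running < l->max_run) {
		l->running++;
		return LIM_RUN;
	}
	if (l->queued < l->max_queue) {
		l->queued++;
		return LIM_QUEUED;
	}
	return LIM_REJECT;	/* caller responds with 503 */
}

/* A running request finished.  If one is waiting, the slot is handed
 * over (running is unchanged) and true is returned. */
static bool limiter_done(struct limiter *l)
{
	if (l->queued > 0) {
		l->queued--;
		return true;
	}
	if (l->running > 0)
		l->running--;
	return false;
}
```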
|
|
Fortunately, this only affects `--multi-accept=' users, with
`--multi-accept=-1' users getting infinite loops.
I noticed this when EMFILE was reached on my setup, but any
error should cause us to give up accept(2) (at least
temporarily) and allow work for other items in the event loop to
be processed.
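A C sketch of the fixed loop, assuming a non-blocking listener and that a negative limit means "unlimited" as with `--multi-accept=-1'. `accept_many' and `close_client' are hypothetical. The key point is that every accept(2) failure breaks out, not just EAGAIN.

```c
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Accept up to `limit' clients per wakeup (limit < 0: no limit).
 * Any error (EAGAIN, EMFILE, ENFILE, ...) ends the loop so the event
 * loop can run other work and retry on the next readiness event. */
static int accept_many(int lfd, int limit, void (*handle)(int fd))
{
	int n = 0;

	while (limit < 0 || n < limit) {
		int fd = accept(lfd, NULL, NULL);
		if (fd < 0)
			break;	/* never spin on a persistent error */
		handle(fd);
		n++;
	}
	return n;
}

/* Example handler that just closes the client. */
static void close_client(int fd)
{
	close(fd);
}
```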
|
|
We must chomp the newline in the branch name if it's set.
Reported-by: Rob Herring <robh@kernel.org>
Link: https://public-inbox.org/meta/CAL_JsqK7P4gjLPyvzxNEcYmxT4j6Ah5f3Pz1RqDHxmysTg3aEg@mail.gmail.com/
Fixes: 73830410e4336b77 (treewide: use run_qx where appropriate, 2023-10-27)
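For illustration, the C equivalent of Perl's chomp applied to a branch name read from a git command (e.g. `git symbolic-ref --short HEAD', chosen here only as an example). `chomp' is a hypothetical helper.

```c
#include <string.h>

/* Strip one trailing newline in place, like Perl's chomp with the
 * default $/.  Returns the new length. */
static size_t chomp(char *s)
{
	size_t len = strlen(s);
	if (len && s[len - 1] == '\n')
		s[--len] = '\0';	/* "master\n" -> "master" */
	return len;
}
```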
|
|
This allows accurate reporting of the error location and can be
made to dump a Perl backtrace via PERL5OPT='-MCarp=verbose'.
Noticed while tracking down fast-import failures.
Link: https://public-inbox.org/meta/CAL_JsqK7P4gjLPyvzxNEcYmxT4j6Ah5f3Pz1RqDHxmysTg3aEg@mail.gmail.com/
|
|
Noticed while tracking down a fast-import crash bug report.
Link: https://public-inbox.org/meta/CAL_JsqK7P4gjLPyvzxNEcYmxT4j6Ah5f3Pz1RqDHxmysTg3aEg@mail.gmail.com/
|
|
Inbox names, coderepo nicks, git_dir values are used heavily
as hash keys by the read-only coderepo WWW pieces.
Relying on CoW for mutable scalars on newer Perl doesn't work
well since CoW for those scalars is limited to 256 references,
and we blow past that number when mapping thousands of coderepos
and inboxes to each other. Instead, make the hash key up-front
and get the resulting string to point directly to the buffer
used by the hash key.
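Perl shares hash key strings internally; a rough C analogue is a string-interning table where each distinct name is stored once and callers keep the table's pointer instead of their own copy. The table layout, djb2 hash and names below are assumptions for illustration only.

```c
#include <stdlib.h>
#include <string.h>

#define INTERN_BUCKETS 256

struct intern_ent {
	struct intern_ent *next;
	char str[];	/* the single shared copy */
};

static struct intern_ent *intern_tab[INTERN_BUCKETS];

static unsigned long hash_str(const char *s)
{
	unsigned long h = 5381;	/* djb2 */
	while (*s)
		h = h * 33 + (unsigned char)*s++;
	return h;
}

/* Return the table-owned copy of s, creating it on first use. */
static const char *intern(const char *s)
{
	struct intern_ent **b = &intern_tab[hash_str(s) % INTERN_BUCKETS];
	for (struct intern_ent *e = *b; e; e = e->next)
		if (strcmp(e->str, s) == 0)
			return e->str;	/* reuse existing storage */
	size_t len = strlen(s) + 1;
	struct intern_ent *e = malloc(sizeof(*e) + len);
	if (!e)
		return NULL;
	memcpy(e->str, s, len);
	e->next = *b;
	*b = e;
	return e->str;
}
```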
|
|
It's not really relevant at the moment, but a sufficiently
smart implementation could eventually save some memory here.
Perl already optimizes in-place sort (@x = sort @x), so there's
precedent for a potential future where a Perl implementation
could generally optimize in-place operations for non-builtin
subroutines, too.
|
|
Repeatedly allocating an anonymous sub is an expensive operation
and a potential source of leaks in older Perl. Instead,
`local'-ize a global and use a permanent sub to work around the
old Encode 2.87..3.12 leak.
|
|
We are just using the odd ref+deref (`${\...}') syntax and
don't need to calculate line numbers ourselves, nowadays.
|
|
While fast build times from -O0 are critical to my sanity when
actively working on C++, the files installed via package
managers or `make install' aren't likely to change frequently.
In that case, expensive -O2 optimizations make sense since the
10-20s saved from a single large --join more than covers the
cost of waiting on g++ to optimize.
|