Date | Commit message (Collapse) |
|
We'll save ourselves some code here and let the kernel do more
work, instead.
|
|
Large chunks of our codebase and 3rd-party dependencies do not
use ->{psgi.errors}, so trying to standardize on it was a
fruitless endeavor. Since warn() and carp() are standard
mechanism within Perl, just use that instead and simplify a
bunch of existing code.
|
|
Since signalfd is often combined with our event loop, give it a
convenient API and reduce the code duplication required to use it.
EventLoop is replaced with ::event_loop to allow consistent
parameter passing and avoid needlessly passing the package name
on stack.
We also avoid exporting SFD_NONBLOCK since it's the only flag we
support. There's no sense in having the memory overhead of a
constant function when it's in cold code.
|
|
ProcessPipe has a built-in mechanism to prevent siblings from
reaping children.
|
|
Using "make update-copyrights" after setting GNULIB_PATH in my
config.mak
|
|
This simplifies our code and provides a more consistent API for
error handling. PublicInbox::DS can be loaded nowadays on all
*BSDs and Linux distros easily without extra packages to
install.
The downside is possibly increased startup time, but it's
probably not as a big problem with lei being a daemon
(and -mda possibly following suite).
|
|
{pi_config} may be confused with the documented `PI_CONFIG'
environment variable, and we'll favor vowel-removal to be
consistent with our usage of object references.
The `pi_' prefix may stay in some places, for now; since a
separate namespace may come into this codebase for local/private
client-tooling.
For InboxIdle, we'll also remove an invalid comment about
holding a reference to the PublicInbox::Config object, too.
|
|
This will allow us to gzip responses generated by cgit
and any other CGI programs or long-lived streaming
responses we may spawn.
|
|
Making the RLIMITS list a function doesn't allow constant
folding, so just make it an array accessible to other modules.
|
|
It seems no longer necessary to workaround this Perl 5.16.3 bug
after the removal of anonymous subs from all of our internal
code in
https://public-inbox.org/meta/20191225075104.22184-1-e@80x24.org/
Tested with repeated clones (both aborted and completed)
in a CentOS 7.x VM which was once able to reproduce leaks
before the workaround appeared in 2fc42236f72ad16a
("qspawn: workaround Perl 5.16.3 leak, re-enable Deflater")
Cc: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
|
|
User-supplied callbacks may fail, so capture the error instead
of propagating it up the stack into the public-inbox-httpd event
loop.
|
|
As sqlite3(1) and other executables may become unavailable or
uninstalled while a daemon runs, we need to gracefully handle
errors in those cases.
|
|
We'll be supporting gzipped from sqlite3(1) dumps
for altid files in future commits.
In the future (and if we survive), we may replace
Plack::Middleware::Deflater with our own GzipFilter to work
better with asynchronous responses without relying on
memory-intensive anonymous subs.
|
|
I didn't wait until September to do it, this year!
|
|
Perl 5.14+ gained the ability to autoload IO::File
(and IO::Handle) on missing methods, so relying on
this breaks under 5.10.1.
There's no reason to load IO::File or IO::Handle
when built-in perlops work fine and are even a hair
faster.
|
|
popen_rd dies on pipe()/pipe2() failure due to FD exhaustion.
EPOLL_CTL_ADD (via PublicInbox::HTTPD::Async->new) may also fail
due to memory exhaustion or exceeding the value of
/proc/sys/fs/epoll/max_user_watches
|
|
solver can spawn multiple processes per HTTP request, but
"git apply" failures are needlessly noisy due to corrupt
patches. We also don't want to silence "git ls-files"
or "git update-index" errors using $env->{'qspawn.quiet'},
either, so this granularity is needed.
Admins can check for 500 errors in access logs to detect
(and reproduce) solver failures, anyways, so there's no
need to log every time "git apply" rejects a corrupt patch.
|
|
Callers can supply an arg to parse_hdr, now, eliminating the
need for closures to capture local variables.
|
|
This feature was added in preparation for future changes
that have yet to materialize after nearly 3 years. We
can re-add it if needed in the future.
|
|
We can follow what we did in psgi_return to make psgi_qx
allocate less memory on each call.
|
|
Instead of just passing the rpipe to the start_cb, pass the
entire qspawn ref to start_cb. Update existing callers to
avoid circular refs.
|
|
We can take advantage of HTTPD::Async being able to pass
user-supplied args to callbacks to get rid of one (of many)
anonymous subs in the code path.
|
|
rd_hdr() now becomes a named subroutine instead of a per-call
local variable, so kilobytes of memory will not have to be
allocated for it on every ->psgi_return call.
|
|
This will tie into the DS event loop if that's used, but
event_step an be called directly without relying on the
event loop from Apache or other HTTP servers (or PSGI tests).
|
|
Make things easier-to-follow and paves the way for future work
to reduce dependencies on anonymous subs capturing local variables.
|
|
By passing a user-supplied arg to $qx_cb, we can eliminate the
callers' need to capture on-stack variables with a closure.
This saves several kilobytes of memory allocation at the expense
of some extra hash table lookups in user-supplied callbacks. It
also reduces the risk of memory leaks by eliminating a common
source of circular references.
|
|
Another step towards removing anonymous subs to eliminate
a possible source of memory leaks and high memory use.
|
|
We need to detect "git apply" failures reliably when patches
fail. This is necessary for solving for blob 81c1164ae5 in
https://public-inbox.org/git/ when at least two messages can
solve for it (and one of them fails):
1. https://public-inbox.org/git/b9fb52b8-8168-6bf0-9a72-1e6c44a281a5@oracle.com/
2. https://public-inbox.org/git/56664222-6c29-09dc-ef78-7b380b113c4a@oracle.com/
|
|
The httpd-supplied write callback is the leak culprit under Perl
5.16.3. undef-ing it immediately after use keeps a repeated
"git fetch" loop from monotonically increasing memory and FD use
on the Perl shipped with RHEL/CentOS 7.x.
Other endpoints tested showed no increase in memory use under
constant load with "ab -HAccept-Encoding:gzip -k", including the
async psgi_qx code path used by $INBOX_URL/$OBJECT_ID/s/ via
SolverGit module.
|
|
Naming $start_cb consistently helps avoid confusing new readers,
and some comments will help with understanding flow
|
|
All of these circular references are designed to clear
themselves, but these will make actual errors from Devel::Cycle
easier-to-spot.
The circular reference in the limiter {run_queue} is not a real
problem, but we can avoid storing the circular reference until
we actually need to spawn the child, reducing the size of the
Qspawn object while it's in the queue, slightly.
We also do not need to have redundant checks to spawn new
processes, we should only spawn new processes when they're
->start-ed or after waitpid reaps them.
|
|
Generic PSGI servers have $env->{'psgi.errors'}, too,
so ensure they can log errors.
|
|
We don't use the return value in real code since we do waitpid
asynchronously, now. So simplify our runtime code at the cost
of making our test slighly more complex.
|
|
We don't need to hold onto the subprocess environ and
redirects/options for popen_rd after spawning the child process.
I do not expect this to fix problem of leaking unlinked regular
file descriptors (which I still can't reproduce), and it
definitely does not fix the problem of leaking pipe descriptors
(which I also can't reproduce).
This will save an FD sooner on non-public-inbox-httpd servers
which give a non-FD $env->{'psgi.input'}, however
Regardless, it's good to free up memory resources in our own
process ASAP we're done using them.
|
|
EINTR should not happen when using non-blocking sockets like we
do in our daemons, but maybe some OSes allow it to happen and
edge-triggered notifications won't notify us again.
So always retry immediately on EINTR without relying on kqueue
or epoll to notify us, and log any other unrecoverable errors
which may happen while we're at it.
|
|
We rely on DS to do waitpid with WNOHANG, now, and the non-DS
code path won't use WNOHANG.
|
|
Rename the {cleanup} field to {end}, since it's similar
to END {} and is consistent with the variable in Qspawn.pm
And document how certain subs get called, since we have
many subs named "new" and "close".
|
|
I didn't know PerlIO::scalar existed until a few months ago,
but it's been distributed with Perl since 5.8 and doesn't
seem to be split out into it's own package on any distro.
|
|
While we're usually not stuck waiting on waitpid after
seeing a pipe EOF or even triggering SIGPIPE in the process
(e.g. git-http-backend) we're reading from, it MAY happen
and we should be careful to never hang the daemon process
on waitpid calls.
v2: use "eq" for string comparison against 'DEFAULT'
|
|
We need to ensure the BIN_DETECT (8000 byte) check in
ViewVCS can be handled properly when sending giant
files. Otherwise, EPOLLET won't notify us, again,
and responses can get stuck.
While we're at it, bump up the read-size up to 4096
bytes so we make fewer trips to the kernel.
|
|
Linux pipes default to 65536 bytes in size, and we want to read
external processes as fast as possible now that we don't use
Danga::Socket or buffer to heap.
However, drop the buffer ASAP if we need to wait on anything;
since idle buffers can be idle for eons. This lets other
execution contexts can reuse that memory right away.
|
|
Integer comparisions of "$!" are faster than hash lookups.
See commit 6fa2b29fcd0477d126ebb7db7f97b334f74bbcbc
("ds: cleanup Errno imports and favor constant comparisons")
for benchmarks.
|
|
It wasn't immediately obvious to me after several months of
not looking at this code.
|
|
These modules are unmaintained upstream at the moment, but I'll
be able to help with the intended maintainer once/if CPAN
ownership is transferred. OTOH, we've been waiting for that
transfer for several years, now...
Changes I intend to make:
* EPOLLEXCLUSIVE for Linux
* remove unused fields wasting memory
* kqueue bugfixes e.g. https://rt.cpan.org/Ticket/Display.html?id=116615
* accept4 support
And some lower priority experiments:
* switch to EV_ONESHOT / EPOLLONESHOT (incompatible changes)
* nginx-style buffering to tmpfile instead of string array
* sendfile off tmpfile buffers
* io_uring maybe?
|
|
This allows users to configure RLIMIT_{CORE,CPU,DATA} using
our "limiter" config directive when spawning external processes.
|
|
This will become critical for future changes to display
git commits, diffs, and trees.
Use "qspawn.wcb" instead of "qspawn.response" to enhance
readability.
|
|
The raw value of $? isn't very useful, generally.
|
|
This new asynchronous API, will allow us to take
advantage of non-blocking I/O from even small commands;
as those may still need to wait for slow operations.
|
|
|
|
This is intended for wrapping "git show" and "git diff"
processes in the future and to prevent it from monopolizing
callers.
This will us to better handle backpressure from gigantic
commits.
|