path: root/lib/PublicInbox/Qspawn.pm
Date        Commit message
2023-11-03  replace ProcessIO with untied PublicInbox::IO
This fixes two major problems with the use of tie for filehandles:

* no way to do fcntl, stat, etc. calls directly on the tied handle, forcing callers to use the `tied' perlop to access the underlying IO::Handle
* needing separate classes to handle blocking and non-blocking I/O

As a result, Git->cleanup_if_unlinked, InputPipe->consume, and Qspawn->_yield_start have fewer bizarre bits and we can call `$io->blocking(0)' directly instead of `(tied *$io)->{fh}->blocking(0)'.

Having a PublicInbox::IO class will also allow us to support custom read buffering which allows inspecting the current state.
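The difference between the two call styles quoted above can be sketched as follows. `TiedSketch` is a hypothetical stand-in, not the removed ProcessIO class; it only assumes the tied object keeps the real handle in an `{fh}` field, as the quoted `(tied *$io)->{fh}` expression suggests.

```perl
use strict;
use warnings;
use IO::Handle;          # provides ->blocking for plain filehandles
use Symbol qw(gensym);

# Hypothetical tied-handle class: the real handle hides in $self->{fh}
package TiedSketch {
    sub TIEHANDLE { my ($cls, $fh) = @_; bless { fh => $fh }, $cls }
    sub READLINE  { readline($_[0]->{fh}) }
}

pipe(my $r1, my $w1) or die "pipe: $!";
my $tied_io = gensym;
tie *$tied_io, 'TiedSketch', $r1;
# fcntl-style calls must dig out the underlying handle via `tied'
(tied *$tied_io)->{fh}->blocking(0);

# Untied alternative: the handle itself is the object
pipe(my $r2, my $w2) or die "pipe: $!";
$r2->blocking(0);

# ->blocking with no argument reports the current mode (0 == non-blocking)
my $tied_mode   = (tied *$tied_io)->{fh}->blocking;
my $untied_mode = $r2->blocking;
```

Both handles end up non-blocking; the untied form simply needs one less layer of indirection at every call site.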
2023-10-25  qspawn: simplify internal argument passing
Now that psgi_return is gone, we can further simplify our internals to support only psgi_qx and psgi_yield. Internal argument passing is reduced and we keep the command env and redirects in the Qspawn object for as long as it's alive. I wanted to get rid of finalize() entirely, but it seems trickier to do when having to support generic PSGI.
2023-10-25  qspawn: use WwwStatic for fallbacks and error code
This ensures we set directives to disable caching since errors are always transient.
2023-10-25  drop psgi_return, httpd/async and GetlineBody
Now that psgi_yield is used everywhere, the more complex psgi_return and its helper bits can be removed. We'll also fix some outdated comments now that everything on psgi_return has switched to psgi_yield. GetlineResponse replaces GetlineBody and does a better job of isolating generic PSGI-only code.
2023-10-25  qspawn: introduce new psgi_yield API
This is intended to replace psgi_return and HTTPD/Async entirely, hopefully making our code less convoluted while maintaining the ability to handle slow clients on memory-constrained systems. This was made possible by the philosophy shift in commit 21a539a2df0c (httpd/async: switch to buffering-as-fast-as-possible, 2019-06-28). We'll still support generic PSGI via the `pull' model with a GetlineResponse class which is similar to the old GetlineBody.
2023-10-25  qspawn: drop unused err arg for ->event_step
It's no longer needed since psgi_qx doesn't use a pipe anymore.
2023-10-25  qspawn: psgi_return allows list for callback args
This slightly simplifies our GitHTTPBackend wrapper. We can also use shorter variable names to avoid wrapping some lines.
2023-10-25  psgi_qx: use a temporary file rather than pipe
A pipe requires more context switches, syscalls, and code to deal with unpredictable pipe EOF vs waitpid ordering. So just use the new spawn/aspawn features to automatically handle slurping output into a string.
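The idea of capturing output through a temporary file instead of a pipe can be sketched in plain Perl as below. This is a blocking illustration only: `qx_via_tmpfile` is a hypothetical helper, and it does not use or reproduce the spawn/aspawn features the commit refers to.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper: run @cmd with stdout redirected to an anonymous
# temporary file, then slurp the result once the child has exited.
sub qx_via_tmpfile {
    my (@cmd) = @_;
    open(my $tmp, '+>', undef) or die "open(tmpfile): $!";
    my $pid = fork;
    defined($pid) or die "fork: $!";
    if ($pid == 0) { # child
        open(STDOUT, '>&', $tmp) or POSIX::_exit(126);
        exec { $cmd[0] } @cmd or POSIX::_exit(127); # exec failed
    }
    # Only one event to wait for: there is no pipe EOF to race with
    waitpid($pid, 0);
    my $status = $?;
    seek($tmp, 0, 0) or die "seek: $!"; # the child advanced the shared offset
    my $out = do { local $/; <$tmp> };
    ($out, $status);
}

my ($out, $status) = qx_via_tmpfile($^X, '-e', 'print "hello\n"');
```

Because the parent reads only after waitpid returns, the complete output is already in the file and no read loop is needed.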
2023-10-25  limiter: split out from qspawn
It's slightly better organized this way, especially since `publicinboxLimiter' has its own user-facing config section and knobs. I may use it in LeiMirror and CodeSearchIdx for process management.
2023-10-08  process_io: pass args to awaitpid as list
Specifying {cb_args} in the options hash felt awkward to me. Instead, just use the Perl stack like we do with awaitpid() and pass the list down directly.
2023-10-08  rename ProcessPipe to ProcessIO
Since we deal with pipes (of either direction) and bidirectional stream sockets for this class, it's better to remove the `Pipe' from the name and replace it with `IO' to communicate that it works for any form of IO::Handle-like object tied to a process.
2023-01-24  qspawn: drop lineno from command failure warning
git, cgit, or any other command failing isn't an error we can do anything about in qspawn, so don't have Perl emit line number info and needlessly pollute logs.
2023-01-19  qspawn: drop unnecessary awaitpid import
We don't actually need to call awaitpid here; ProcessPipe will take care of that.
2023-01-19  qspawn: psgi_qx: do not call async_pass on errors
This makes control flow slightly less confusing.
2023-01-19  qspawn: {quiet} only affects normal command exit
{quiet} is nice for quieting normal/expected errors (e.g. `git diff'), but we still want to show the command in case there are errors in our own code.
2023-01-18  qspawn: use ->DESTROY to force ->finalize
There are apparently a few places where we do not call ->finalize or ->finish and leave dangling limiter slots occupied. I can't reproduce this easily, so it's likely in error-handling paths. I already made ->finalize idempotent when switching to awaitpid since I wanted to rely entirely on DESTROY. However, DESTROY doesn't always fire soon enough (and the client has already seen a response), but using DESTROY as a fallback seems reasonable. This does the minimum to ensure the limiter is freed up on process exit, but ensuring a finish/finalize call always happens is the goal.
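The combination of an idempotent ->finalize with DESTROY as a safety net can be sketched as below. `SlotSketch`, its `{limiter}` field, and the `{running}` counter are hypothetical names chosen for illustration, not the actual Qspawn or limiter internals.

```perl
use strict;
use warnings;

# Hypothetical class; field names are illustrative only
package SlotSketch {
    sub new {
        my ($cls, $limiter) = @_;
        $limiter->{running}++; # occupy a slot
        bless { limiter => $limiter }, $cls;
    }

    # Idempotent: the slot is released only on the first call
    sub finalize {
        my ($self) = @_;
        my $limiter = delete $self->{limiter} or return;
        $limiter->{running}--;
    }

    # Fallback for code paths which forget to call ->finalize
    sub DESTROY { $_[0]->finalize }
}

my $limiter = { running => 0 };
{
    my $explicit = SlotSketch->new($limiter);
    $explicit->finalize; # normal path
    $explicit->finalize; # harmless repeat
}
{
    my $forgotten = SlotSketch->new($limiter); # never finalized explicitly
}
# both slots are free again at this point
```

Deleting the field before decrementing is what makes repeated calls safe, so DESTROY can call ->finalize unconditionally.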
2023-01-18  ds: introduce awaitpid, switch ProcessPipe users
awaitpid is the new API which will eventually replace dwaitpid. It enables early registration of callback handlers. Eventually (once dwaitpid is gone) it'll be able to use fewer waitpid calls. The avoidance of waitpid(-1) in our earlier days was driven by the belief that threads may eventually become relevant for Perl 5, but that's extremely unlikely at this stage. I will still introduce optional threads via C, but they definitely won't be spawning/reaping processes. Argument order to callbacks is swapped (PID first) to allow flattened multiple arguments more naturally. The previous API (allowing only a single argument, as influenced by pthread_create(3)) was more tedious as it involved packing multiple arguments into yet another array.
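The PID-first, flattened-argument callback convention described above might look like the following. `wait_then_call` is a hypothetical, blocking stand-in written for this sketch; the real awaitpid registers handlers with the event loop and its exact signature is not shown in this log.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical blocking stand-in for an awaitpid-style API: extra
# arguments are passed as a flat list and the PID comes first.
sub wait_then_call {
    my ($pid, $cb, @args) = @_;
    waitpid($pid, 0) == $pid or die "waitpid($pid): $!";
    $cb->($pid, @args); # $? still holds the child's wait status here
}

my @seen;
my $pid = fork;
defined($pid) or die "fork: $!";
POSIX::_exit(3) if $pid == 0; # child exits immediately

wait_then_call($pid, sub {
    my ($reaped, $label, $log) = @_; # PID first, then the caller's args
    push @$log, "$label exited with " . ($? >> 8);
}, 'child', \@seen);
```

With the PID in a fixed first position, any number of extra arguments can follow without being packed into an intermediate array.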
2023-01-18  qspawn: drop {psgi_env} deref
We don't use the assigned variable anywhere, and just access PATH_INFO directly in the subsequent warning message.
2023-01-13  qspawn: import Scalar::Util::blessed properly
Scalar::Util may not be loaded by other modules in the future.
2023-01-06  qspawn: use Perl 5.12 and rely on `perl -w' for warnings
Another step towards making our startup performance faster.
2023-01-06  qspawn: fix EINTR with generic PSGI servers
Using the `next' operator doesn't work with `do {} (until|while)' loops, so change it to use `until {}'. I've never encountered this problem in-the-wild, but I only use -(netd|httpd).
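A minimal sketch of the loop shape described above, assuming a reader callback which returns undef and sets `$!` on failure. `read_retry` and the simulated reader are hypothetical and are not the code from Qspawn.pm.

```perl
use strict;
use warnings;
use Errno qw(EINTR);

# Hypothetical helper: $reader returns a defined value on success,
# or undef with $! set on failure.
sub read_retry {
    my ($reader) = @_;
    my $r;
    # `next' re-evaluates the `until' condition, retrying the read.
    # Inside `do { ... } until (...)' it would not, since that is a
    # statement modifier on a do-BLOCK rather than a real loop.
    until (defined($r = $reader->())) {
        next if $! == EINTR;
        return; # any other error ends this read
    }
    $r;
}

# Simulated reader: interrupted twice, then succeeds
my $calls = 0;
my $got = read_retry(sub {
    return 'data' if ++$calls > 2;
    $! = EINTR;
    undef;
});
```

The read itself lives in the loop condition, so every `next` naturally performs another attempt.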
2023-01-06  qspawn: consistently return 500 on premature EOF
If {parse_hdr} callback doesn't handle it, we need to break the loop if the CGI process dies prematurely. This doesn't fix a currently known problem, but theoretically a SIGKILL could hit (cgit || git-http-backend) while -netd or -httpd survives.
2023-01-06  httpd/async: retry reads properly when parsing headers
While git-http-backend sends headers with one write syscall, upstream cgit still trickles them out line-by-line and we need to account for that and retry Qspawn {parse_hdr} callbacks.
2023-01-06  qspawn: use fallback response code from CGI program
Prefer to use the original (cgit||git-http-backend) HTTP response code if our fallback to WwwCoderepo fails. A 404 code is typically more appropriate than 500 for these things.
2023-01-04  www_coderepo: implement /$CODE_REPO/atom/ endpoint
This should be similar or identical to what's in cgit; and tie into the rest of the www_coderepo stuff.
2023-01-02  qspawn: fix process finalization for generic PSGI server
This fixes the inability to fall back to WwwCoderepo on cgit 404s with generic PSGI servers. Unfortunately, this doesn't seem to get tested with generic PSGI tests, and doesn't happen on public-inbox-httpd, obviously.
2022-12-27  qspawn: more generic command chaining
Move the chaining logic into qspawn so we can gracefully try other commands when cgit or git-http-backend refuses to service a request for us.
2022-12-23  httpd/async + qspawn: rename {fh} fields
Use more unique names within the project to minimize confusion since these packages interact quite a bit and using identical names leads to needless confusion.
2022-12-23  qspawn: shorten life of {hdr_buf} in generic code path
No point in keeping the old buffer around if we don't need to.
2022-10-07  www: cgit: fall back to WwwCoderepo on 404s
We can't rely on 3-element array response when calling WwwCoderepo for ViewVCS endpoints since that uses Qspawn internally. Thus, we have to allow two Qspawn objects to run in parallel and ensure `qspawn.wcb' only gets called once, so we end up duplicating the entire $ctx to ensure this.
2022-08-23  qspawn: improve error reporting and handling
First off, avoid potential circular references (via {qx_arg}) by dropping the {-qsp} field from $ctx and SolverGit objects. Instead, we only share a reference to an optional error buffer string {qsp_err}. We'll also attempt to call qspawn.wcb if qx_cb fails, and warn in more places w/o checking for $env since we now rely on warn() instead of $env->{'psgi.errors'}. This makes error handling simpler and safer in future callers.
2022-08-23  qspawn: add type comments in a few places
This makes things easier-to-follow in a minimally-typed language.
2021-10-16  httpd/async: switch to level-triggered epoll
We'll save ourselves some code here and let the kernel do more work, instead.
2021-10-13  treewide: use warn() or carp() instead of env->{psgi.errors}
Large chunks of our codebase and 3rd-party dependencies do not use ->{psgi.errors}, so trying to standardize on it was a fruitless endeavor. Since warn() and carp() are standard mechanisms within Perl, just use them instead and simplify a bunch of existing code.
2021-10-01  ds: simplify signalfd use
Since signalfd is often combined with our event loop, give it a convenient API and reduce the code duplication required to use it. EventLoop is replaced with ::event_loop to allow consistent parameter passing and avoid needlessly passing the package name on stack. We also avoid exporting SFD_NONBLOCK since it's the only flag we support. There's no sense in having the memory overhead of a constant function when it's in cold code.
2021-01-02  qspawn: switch to ProcessPipe via popen_rd
ProcessPipe has a built-in mechanism to prevent siblings from reaping children.
2021-01-01  update copyrights for 2021
Using "make update-copyrights" after setting GNULIB_PATH in my config.mak.
2021-01-01  use PublicInbox::DS for dwaitpid
This simplifies our code and provides a more consistent API for error handling. PublicInbox::DS can be loaded nowadays on all *BSDs and Linux distros easily without extra packages to install. The downside is possibly increased startup time, but it's probably not as big a problem with lei being a daemon (and -mda possibly following suit).
2020-12-09  rename {pi_config} fields to {pi_cfg}
{pi_config} may be confused with the documented `PI_CONFIG' environment variable, and we'll favor vowel-removal to be consistent with our usage of object references. The `pi_' prefix may stay in some places, for now, since a separate namespace may come into this codebase for local/private client-tooling. For InboxIdle, we'll also remove an invalid comment about holding a reference to the PublicInbox::Config object.
2020-07-06  qspawn: learn to gzip streaming responses
This will allow us to gzip responses generated by cgit and any other CGI programs or long-lived streaming responses we may spawn.
2020-07-02  spawn: make @RLIMITS an array
Making the RLIMITS list a function doesn't allow constant folding, so just make it an array accessible to other modules.
2020-04-21  qspawn: remove Perl 5.16.x leak workaround
It seems no longer necessary to work around this Perl 5.16.3 bug after the removal of anonymous subs from all of our internal code in https://public-inbox.org/meta/20191225075104.22184-1-e@80x24.org/

Tested with repeated clones (both aborted and completed) in a CentOS 7.x VM which was once able to reproduce leaks before the workaround appeared in 2fc42236f72ad16a ("qspawn: workaround Perl 5.16.3 leak, re-enable Deflater").

Cc: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
2020-03-30  qspawn: capture errors from parse_hdr callback
User-supplied callbacks may fail, so capture the error instead of propagating it up the stack into the public-inbox-httpd event loop.
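Capturing a callback failure instead of letting it unwind into the caller can be sketched with a plain `eval` block. `parse_hdr_safely` and the fallback response body are hypothetical; only the three-element PSGI response shape is standard.

```perl
use strict;
use warnings;

# Hypothetical wrapper: run a user-supplied header parser and turn any
# exception into a 500 response instead of letting it unwind the caller.
sub parse_hdr_safely {
    my ($parse_hdr, @args) = @_;
    my $res = eval { $parse_hdr->(@args) };
    if (my $err = $@) {
        warn "parse_hdr failed: $err";
        return [ 500, [ 'Content-Type' => 'text/plain' ], [ "Internal error\n" ] ];
    }
    $res;
}

my $ok  = parse_hdr_safely(sub { [ 200, [], [] ] });
my $bad = do {
    local $SIG{__WARN__} = sub {}; # keep the demonstration quiet
    parse_hdr_safely(sub { die "boom\n" });
};
```

The event loop calling such a wrapper only ever sees a normal return value, whether or not the callback died.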
2020-03-25  qspawn: handle ENOENT (and other errors on exec)
As sqlite3(1) and other executables may become unavailable or uninstalled while a daemon runs, we need to gracefully handle errors in those cases.
2020-03-25  qspawn: reinstate filter support, add gzip filter
We'll be supporting gzipped output from sqlite3(1) dumps for altid files in future commits. In the future (and if we survive), we may replace Plack::Middleware::Deflater with our own GzipFilter to work better with asynchronous responses without relying on memory-intensive anonymous subs.
2020-02-06  treewide: run update-copyrights from gnulib for 2019
I didn't wait until September to do it, this year!
2020-01-28  avoid relying on IO::Handle/IO::File autoload
Perl 5.14+ gained the ability to autoload IO::File (and IO::Handle) on missing methods, so relying on this breaks under 5.10.1. There's no reason to load IO::File or IO::Handle when built-in perlops work fine and are even a hair faster.
2020-01-09  qspawn: catch transient errors on pipe, EPOLL_CTL_ADD
popen_rd dies on pipe()/pipe2() failure due to FD exhaustion. EPOLL_CTL_ADD (via PublicInbox::HTTPD::Async->new) may also fail due to memory exhaustion or exceeding the value of /proc/sys/fs/epoll/max_user_watches.
2020-01-03  qspawn: use per-call quiet flag for solver
solver can spawn multiple processes per HTTP request, but "git apply" failures are needlessly noisy due to corrupt patches. We also don't want to silence "git ls-files" or "git update-index" errors using $env->{'qspawn.quiet'}, either, so this granularity is needed. Admins can check for 500 errors in access logs to detect (and reproduce) solver failures, anyways, so there's no need to log every time "git apply" rejects a corrupt patch.
2019-12-26  qspawn: psgi_return: allow non-anon parse_hdr callback
Callers can supply an arg to parse_hdr, now, eliminating the need for closures to capture local variables.