about summary refs log tree commit homepage
path: root/lib/PublicInbox/Gcf2Client.pm
DateCommit message (Collapse)
2023-11-26git: improve coupling with {sock} and {inflight} fields
While the {inflight} array should be tied to the IO object even more tightly, that's not an easy task with our current code. So take some small steps by introducing a gcf_inflight helper to validate the ownership of the process and to drain the inflight array via the awaitpid callback. This hopefully fix problems with t/lei-q-save.t (still) hanging occasionally on v2 outputs since git->cleanup/->DESTROY was getting called in v2 shard workers.
2023-11-26git: move rbuf handling to PublicInbox::IO
The long-term plan is to share non-blocking read buffering logic with HTTP/NNTP/IMAP/POP3 and also XapClient.
2023-11-15gcf2client: add alias for PublicInbox::Git::fail
Ensure we can ->fail properly from other subs we can within Gcf2Client. This doesn't fix the test failures on CentOS 7.x, but tries to make it easier to fix underlying problems and report OOM errors and other things which the test suite doesn't touch on.
2023-11-03replace ProcessIO with untied PublicInbox::IO
This fixes two major problems with the use of tie for filehandles: * no way to do fcntl, stat, etc. calls directly on the tied handle, forcing callers to use the `tied' perlop to access the underlying IO::Handle * needing separate classes to handle blocking and non-blocking I/O As a result, Git->cleanup_if_unlinked, InputPipe->consume, and Qspawn->_yield_start have fewer bizzare bits and we can call `$io->blocking(0)' directly instead of `(tied *$io)->{fh}->blocking(0)' Having a PublicInbox::IO class will also allow us to support custom read buffering which allows inspecting the current state.
2023-10-08rename ProcessPipe to ProcessIO
Since we deal with pipes (of either direction) and bidirectional stream sockets for this class, it's better to remove the `Pipe' from the name and replace it with `IO' to communicate that it works for any form of IO::Handle-like object tied to a process.
2023-10-01treewide: enable warnings in all exec-ed processes
While forked processes inherit from the parent, exec-ed processes need the `-w' flag passed to them. To determine whether or not we should pass them, we must check the `$^W' global perlvar, first. We'll also favor `perl -e' over `perl -E' in places where we don't rely on the latest features, since `-E' incurs slightly more startup time overhead from loading feature.pm (while `perl -Mv5.12' does not).
2023-10-01gcf2: take non-ref scalar request arg
Asking callers to pass a scalar reference is awkward and doesn't benefit modern Perl with CoW support. Unlike some constant error messages, it can't save any allocations at all since there's no constant strings being passed to libgit2.
2023-10-01git+gcf2client: switch to level-triggered wakeups
Instead of using ->requeue to emulate level-triggered wakeups in userspace, just use level-triggered wakeups in the kernel to save some user time at the expense of system (kernel) time. Of course, the ready list implementation in the kernel via C is faster than a Perl one on our end. We must still use requeue if we've got buffered data, however. Followup-to: 1181a7e6a853 (listener: switch to level-triggered epoll)
2023-10-01git: use Unix stream sockets for `cat-file --batch-*'
The benefit of 1MB potential pipe buffer size (on Linux) doesn't seem noticeable when reading from git (unlike when writing to v2 shards), so Unix stream sockets seem fine, here. This allows us to simplify our process management by using the same socket FD for reads and writes and enables us to use our ProcessPipe class for reaping (as we can do with Gcf2Client). Gcf2Client no longer relies on PublicInbox::DS for write buffering, and instead just waits for requests to complete once the number of inflight requests hits the MAX_INFLIGHT threshold as we do with PublicInbox::Git. We reuse the existing MAX_INFLIGHT limit (18) that was determined by the minimum allowed PIPE_BUF (512). (AFAIK) Unix stream sockets have no analogy to PIPE_BUF, but all *BSDs and Linux I've checked have default SO_RCVBUF and SO_SNDBUF values larger than the previously-required PIPE_BUF size of 512 bytes.
2023-01-18git|gcf2: switch to awaitpid
This is a trivial change compared to Qspawn in the previous commit.
2021-10-08git: async_abort includes --batch-check requests
We need to abort both check-only and cat requests when aborting, since we'll be aborting more aggressively in in read-write paths.
2021-10-01ds: simplify signalfd use
Since signalfd is often combined with our event loop, give it a convenient API and reduce the code duplication required to use it. EventLoop is replaced with ::event_loop to allow consistent parameter passing and avoid needlessly passing the package name on stack. We also avoid exporting SFD_NONBLOCK since it's the only flag we support. There's no sense in having the memory overhead of a constant function when it's in cold code.
2021-01-03gcf2client: split out request API from regular git
While Gcf2Client is designed to mimic what git-cat-file writes to stdout, its request format is different to support requests with a git repository path included. We'll highlight the distinction and make the GitAsyncCat support code easier-to-follow as a result. Since Gcf2Client relies on DS, we can rely on DS-specific code here, too, and use a single Unix socket instead of separate input and output pipes, reducing memory overhead in both users and kernel space. Due to the interactive nature of requests and responses, the buffer size limitations of Unix sockets on Linux seems inconsequential here (just like it is for existing "git cat-file --batch" use).
2021-01-01update copyrights for 2021
Using "make update-copyrights" after setting GNULIB_PATH in my config.mak
2021-01-01avoid calling waitpid from children in DESTROY
Objects with DESTROY callbacks get propagated to children, so we must be careful to not invoke waitpid from children on their sibling processes. Only parents (and their parents...) can reap child processes.
2021-01-01use PublicInbox::DS for dwaitpid
This simplifies our code and provides a more consistent API for error handling. PublicInbox::DS can be loaded nowadays on all *BSDs and Linux distros easily without extra packages to install. The downside is possibly increased startup time, but it's probably not as a big problem with lei being a daemon (and -mda possibly following suite).
2021-01-01gcf2client: reap process on DESTROY
We don't want to leave Xapcmd waitpid(-1, ...) call to hit it.
2020-09-28gcf2: improve error handling and do not ->fail on wbuf
For historical reasons, both Danga::Socket::write and PublicInbox::DS::write will return 0 when data is buffered; so Gcf2Client must not call ->fail when DS::write returns 0. We'll also improve robustness by recreating the entire Gcf2Client object if it does die for other reasons, instead of risking mismatched fields due to deferred close. We also need to ensure we only get one EPOLLERR wakeup and issue EPOLL_CTL_DEL if ->event_step is triggered by a dying Gcf2 process, so always register the FD with EPOLLONESHOT.
2020-09-19gcf2: wire up read-only daemons and rm -gcf2 script
It seems easiest to have a singleton Gcf2Client client object per daemon worker for all inboxes to use. This reduces overall FD usage from pipes. The `public-inbox-gcf2' command + manpage are gone and a `$^X' one-liner is used, instead. This saves inodes for internal commands and hopefully makes it easier to avoid mismatched PERL5LIB include paths (as noticed during development :x). We'll also make the existing cat-file process management infrastructure more resilient to BOFHs on process killing sprees (or in case our libgit2-based code fails on us). (Rare) PublicInbox::WWW PSGI users NOT using public-inbox-httpd won't automatically benefit from this change, and extra configuration will be required (to be documented later).
2020-09-19gcf2: require git dir with OID
This amortizes the cost of recreating PublicInbox::Gcf2 objects when alternates change in v2 all.git.
2020-09-19gcf2*: more descriptive package descriptions
Hopefully this allows others to more quickly figure out what's going on.
2020-09-19gcf2: transparently retry on missing OID
Since we only get OIDs from trusted local data sources (over.sqlite3), we can safely retry within the -gcf2 process without worry about clients spamming us with requests for invalid OIDs and triggering reopens.
2020-09-19add gcf2 client and executable script
This should be able to replace multiple `git cat-file' for blob retrieval, but adjustments may be needed.