Date | Commit message (Collapse) |
|
Repeatedly checking $PATH for `git' when we need to call it
multiple times in quick succession doesn't seem useful. So
avoid some expensive stat(2) syscalls to make things less bad
for systems which require expensive CPU vulnerability
mitigations.
This also saves a bunch of memory allocations since we do the
$PATH lookup in pure Perl to avoid doing the uncacheable lookup
in a vfork-ed child.
|
|
There are still some places where on_destroy isn't suitable,
This gets rid of getpid() calls in most of those cases to
reduce syscall costs and cleanup syscall trace output.
|
|
Inbox names, coderepo nicks, git_dir values are used heavily
as hash keys by the read-only coderepo WWW pieces.
Relying on CoW for mutable scalars on newer Perl doesn't work
well since CoW for those scalars are limited to 256 CoW references
and blow past that number when mapping thousands of coderepos
and inboxes to each other. Instead, make the hash key up-front
and get the resulting string to point directly to the pointer
used by the hash key.
|
|
This will make it more effective for use as a cache key.
I'm not entirely happy with this sub being in the Git module
since it's used by lei and command-line tools, but that's
for another day to deal with...
|
|
I'm not sure how this happens (perl 5.34.1 on FreeBSD 13.2)
but it appears the {sock} check can succeed and then go undef
and become unable to call ->owner_pid.
This happens when libgit2 is in use, so perhaps that's a factor.
In any case, the rest of the tests succeed.
|
|
This fixes t/mda.t with git 1.8.5
|
|
It saves some code in case we keep libgit2 around.
|
|
This becomes noticeable when loading lots of coderepos on
my local mirror of git.kernel.org now that we can load repos
from cindex.
|
|
Only worktrees need to use `git rev-parse --git-path', so avoid
the spawn overhead of a new process. With the SolverGit.pm
limit on coderepo scans disabled and scanning over 800 git repos
for git@vger matches, this reduces up xt/solver.t times by
roughly 25%.
|
|
We don't want hundreds of git cat-file processes for coderepos
lingering around.
|
|
While the {inflight} array should be tied to the IO object even
more tightly, that's not an easy task with our current code. So
take some small steps by introducing a gcf_inflight helper to
validate the ownership of the process and to drain the inflight
array via the awaitpid callback.
This hopefully fix problems with t/lei-q-save.t (still) hanging
occasionally on v2 outputs since git->cleanup/->DESTROY was getting
called in v2 shard workers.
|
|
The long-term plan is to share non-blocking read buffering logic
with HTTP/NNTP/IMAP/POP3 and also XapClient.
|
|
I encountered the odd lack of `return' while chasing Gcf2 bugs
on CentOS 7.x which resulted in commit 7d06b126e939
("gcf2: fix autodie usage for older Perl") and commit e618c7654794
("gcf2client: add alias for PublicInbox::Git::fail") before
realizing the lack of `return' here wasn't the culprit behind
failures on CentOS 7.x.
However, the use of a `return' here appears required in case we
actually hit the error path, since falling through and
attempting my_readline with an undefined filehandle is always a
failure.
Fixes: e97a30e7624d ("lei: fix SIGPIPE on large result sets to pager")
|
|
When dealing with large search results, we need to deal with
EPIPE not just from the pager, but also EPIPE or ECONNRESET
between lei_xsearch and lei2mail processes.
Without this fix, lei_xsearch processes could linger and get
stuck writing to dead lei2mail processes if a user aborts the
pager early during a large result set.
To ensure lei_xsearch processes don't linger around after
lei2mail workers all die, we must close $l2m->{-wq_s2} before
spawning lei_xsearch processes, since $l2m->{-wq_s2} is only
used in lei2mail workers.
For `git cat-file' processes, we also need to trigger
PublicInbox::Git->close to handle unpredictable destructor
ordering to avoid using uninitialized IO refs. This combines
with the `git_to_mail' change to deal with process cleanup
handling from premature shutdowns.
To test all this, we can't just rely on a single message being
large, but also need to rely on the result set being large
enough to saturate the lei_xsearch -> lei2mail socket so we
rely on GIANT_INBOX_DIR once again.
|
|
The IO package seems like a better home for I/O subs than the
Git package. We lose the 60 second read timeout for `git
cat-file --batch-*' processes since it's probably not necessary
given how reliable the code has proven and things would fall
over hard in other ways if the storage device were completely
hosed.
|
|
This fixes two major problems with the use of tie for filehandles:
* no way to do fcntl, stat, etc. calls directly on the tied handle,
forcing callers to use the `tied' perlop to access the underlying
IO::Handle
* needing separate classes to handle blocking and non-blocking I/O
As a result, Git->cleanup_if_unlinked, InputPipe->consume,
and Qspawn->_yield_start have fewer bizzare bits and we
can call `$io->blocking(0)' directly instead of
`(tied *$io)->{fh}->blocking(0)'
Having a PublicInbox::IO class will also allow us to support
custom read buffering which allows inspecting the current state.
|
|
It's easier-to-read and should open the door for us to get rid
of `tie' for ProcessIO without performance penalties for
more frequently-used perlop calls and ability to do `stat' directly
on the object instead of the awkward `tied' thing.
|
|
This is necessary to reliably cleanup cat-file processes for
coderepos in long-lived -netd and -httpd processes if they
haven't been accessed in a while.
Followup-to: 33e99002c552 (git: cleanup un-associated coderepo processes)
|
|
No sane installer will update executable files in place due to
ETXTBSY on execve. So save ourselves a stat(2) call by relying
on the special `CORE::stat(_)' case to reuse the cached result
from the `-x FILE' filetest in which().
|
|
It exists, now, so save us a few lines of code.
|
|
It's possible to have many coderepos with no inbox association
that never see git->cleanup. So instead of tying git->cleanup
to inboxes, ensure it gets armed when ->watch_async is called
(since it's only called in our -netd or -httpd servers).
|
|
While uncommon, some git repos have hundreds of thousands of
refs and slurping that output into memory can bloat the heap.
Introduce a sha_all sub in PublicInbox::SHA to loop until EOF
and rely on autodie for checking sysread errors.
|
|
This is similar to `backtick` but supports all our existing spawn
functionality (chdir, env, rlimit, redirects, etc.). It also
supports SCALAR ref redirects like run_script in our test suite
for std{in,out,err}.
We can probably use :utf8 by default for these redirects, even.
|
|
`readline' ops may not detect errors on partial reads.
This saves us some code to reduce cognitive overhead for
readers. We'll also support reusing a destination buffers so it
can work more nicely with existing code.
|
|
This makes it easier to improve error checking, since the
`do { local $/; readline(FH) }' construct does not detect
errors (autodie does not cover `readline' or `<FH>').
I'm not sure exactly where this should be, but PublicInbox::Git
is used nearly everywhere in our code base and it's probably
not worth creating a new package for it.
|
|
This is required for reliable epoll/kevent/poll/select
wakeup notifications, since we have no visibility into
the buffer states used internally by Perl.
We can safely use sysread here since we never use the :utf8
nor any :encoding Perl IO layers for readable pipes.
I suspect this fixes occasional failures from t/solver_git.t
when retrieving the WwwCoderepo summary.
|
|
Since we deal with pipes (of either direction) and bidirectional
stream sockets for this class, it's better to remove the `Pipe'
from the name and replace it with `IO' to communicate that it
works for any form of IO::Handle-like object tied to a process.
|
|
There's not much we can do about this aside from just ignoring
errors and considering un-stat-able files as zero-sized.
There's no syscalls which expose FUSE3 `readdirplus' type
functionality to userspace to avoid this problem.
|
|
We can use autodie for socketpair to handle errors for us,
but we need Time::HiRes::stat so we must write the error message
ourselves if stat-ing the git executable fails.
|
|
Instead of using ->requeue to emulate level-triggered wakeups in
userspace, just use level-triggered wakeups in the kernel to
save some user time at the expense of system (kernel) time. Of
course, the ready list implementation in the kernel via C is
faster than a Perl one on our end.
We must still use requeue if we've got buffered data, however.
Followup-to: 1181a7e6a853 (listener: switch to level-triggered epoll)
|
|
The benefit of 1MB potential pipe buffer size (on Linux) doesn't
seem noticeable when reading from git (unlike when writing to v2
shards), so Unix stream sockets seem fine, here.
This allows us to simplify our process management by using the
same socket FD for reads and writes and enables us to use our
ProcessPipe class for reaping (as we can do with Gcf2Client).
Gcf2Client no longer relies on PublicInbox::DS for write
buffering, and instead just waits for requests to complete
once the number of inflight requests hits the MAX_INFLIGHT
threshold as we do with PublicInbox::Git.
We reuse the existing MAX_INFLIGHT limit (18) that was
determined by the minimum allowed PIPE_BUF (512). (AFAIK) Unix
stream sockets have no analogy to PIPE_BUF, but all *BSDs and
Linux I've checked have default SO_RCVBUF and SO_SNDBUF values
larger than the previously-required PIPE_BUF size of 512 bytes.
|
|
While pipes guarantee writes of <= 512 bytes to be atomic,
Unix stream sockets (or TCP sockets) have no such guarantees.
Removing the pipe assumption will make it possible for us to
switch to bidirectional Unix stream sockets and save FDs with
`git cat-file' processes as we have with Gcf2Client. The
performance benefit of larger pipe buffers over stream sockets
isn't irrelevant when interacting with git as it is with
SearchIdx shards.
|
|
We haven't used _bidi_pipe idempotently in a while, so
the stderr was never getting reset on reads.
This isn't fully useful when using async eeeae20893a25956
(imap: use git-cat-file asynchronously, 2020-06-10)
So instead of truncating it on reads, we'll truncate
immediately after reading and rely on O_APPEND to keep
new writes at the end.
Fortunately, this stderrr error checking isn't used
outside of solver (which is synchronous).
|
|
Unlike C, Perl automatically converts quotients to double-precision
floating point even with UV/IV numerators and denominators. So
force the intermediate quotient to be an integer before
multiplying it by the size of each inflight array element.
This bug was inconsequential for all platforms since d4ba8828ab23f278
(git: fix asynchronous batching for deep pipelines, 2023-01-04)
and inconsequential on most (or all?) Linux even before that due
to the larger 4096-byte PIPE_BUF on Linux.
|
|
This simplifies Git->cat_async_step and fixes Git->async_abort,
the latter of which was passing arguments improperly for the
--batch-check (or `info') case at the cost of making the few
check_async callers handle an extra argument.
The extra (PublicInbox::Git) $self argument for check_async
callbacks is now gone, as avoiding the temporary cyclic
reference doesn't seem worthwhile since the temporary cyclic
reference appears in the ->cat_async code paths, too.
|
|
This special support is only needed for --prune at the moment
since the indexing side works on a per-repo basis. There's no
automated tests, yet, but it seems to work well on my sha256
projects when sharing a cindex with sha1 projects.
|
|
This saves a few milliseconds per-epoch without incurring
any dependencies on the event loop. It can be parallelized
further, of course, but it may not be worth it for -extindex
users since it's already cached.
|
|
In case it gets confused with Inbox->version or similar.
|
|
This improves readability for me. Instead of checking for `info '
requests of `--batch-command' in multiple places of every
common branch, do it once per-call and stash its result.
We'll also avoid storing `$bc' for now since the only other
check is in a cold path.
|
|
Retrying requests on alternates changing was causing inflight
requests to get lost due to {inflight} getting clobbered by
batch_prepare.
Unfortunately, reproducing this is difficult without mocking
->alternates_changed.
SearchIdx now avoids calling ->batch_prepare directly and
relies on more common API functions.
Fixes: 65db62eb006f ("git: use --batch-command in git 2.36+ to save processes")
|
|
There's too many require_git callsites in t/*.t to change,
but we can make the rest of the code more readable and reuse
PublicInbox::Git::version() in our test suite, too.
|
|
While unlikely, `git --version' may fail, so we must check for
errors and by reaping the process ASAP via tied close().
|
|
This avoids unnecessary writes to the FETCH_HEAD file, which is
worthless in multi-remote mirrors. Actually, I haven't found
FETCH_HEAD useful anywhere since the `/remotes/' namespace
became popular...
|
|
On my x86-64 machine, OpenSSL SHA-256 is nearly twice as fast as
the Digest::SHA implementation from Perl, most likely due to an
optimized assembly implementation. SHA-1 is a few percent
faster, too.
|
|
`ambiguous' was added in git 2.21, and `dangling' was the only
other possible phrase which was inadvertantly slipped in prior
to 2.21. Thus there's no need to check for `notdir' or `loop'
responses since we aren't using `git cat-file --follow-symlinks'
anywhere.
|
|
`git cat-file --batch-command' combines the functionality of
`--batch' and `--batch-check' into a single process. This
reduces the amount of running processes and is primarily
useful for coderepos (e.g. solver).
This also fixes prior use of `print { $git->{out} }' which is
a a potential (but unlikely) bug since commit d4ba8828ab23f278
(git: fix asynchronous batching for deep pipelines, 2023-01-04)
Lack of libgit2 on one of my test machines also uncovered fixes
necessary for t/imapd.t, t/nntpd.t and t/nntpd-v2.t.
|
|
We can avoid some extra returns and branches by just relying on
variadic arguments.
|
|
I imported it in commit 356439a571c536eaa487031802b436d087113f4f
(gcf2 + extsearch: check for unlinked files on Linux, 2021-09-22)
but never used it.
|
|
While most uses of ->DESTROY happens in a predictable order in
long-lived daemons, process teardown on exit is chaotic and not
subject to ordering guarantees, so we must keep both ends of a
`git cat-file --batch*' pipe at the same level in the object
hierarchy.
Drop an old Carp import while I'm in the area.
|
|
This makes it easier to support SHA-256 inboxes in the future.
Tested with both git 2.30.2 (Debian stable) and 2.39.1
|