Date | Commit message |
|
It saves some code in case we keep libgit2 around.
|
|
At least on Perl v5.16.3 on CentOS 7.x, use-ing autodie within
BEGIN {} affects all subroutines in that package, too. So just
use autodie at the top-level and rely on CORE::* and try_cat
to handle cases where autodie isn't desired.
|
|
read_all can be expanded to support FIFOs/pipes/sockets where
read-until-EOF behavior is desired. We can also rely on
wantarray to support splitting on EOL markers, but it's
hard-coded to support only `$/ eq "\n"' since (AFAIK)
it's the only way we use the wantarray form of `readline'.
|
|
The IO package seems like a better home for I/O subs than the
Git package. We lose the 60 second read timeout for `git
cat-file --batch-*' processes since it's probably not necessary
given how reliable the code has proven and things would fall
over hard in other ways if the storage device were completely
hosed.
|
|
readline (<FH>) isn't wrapped by autodie, and there's no
way to know if read(2) errors truncated the readline output.
IO::Handle->error isn't reliable on Perl < v5.34.
Thus, combining `eof' and `close' (the latter under autodie) is
the only way we can detect read(2) errors (injected via strace)
when called via `readline' (aka <$fh>). Neither `eof' nor
`close' alone is sufficient; they must be combined to detect
errors from buffered `readline'.
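A minimal sketch of the `eof' + `close' combination described above. The
helper name `slurp_lines' is hypothetical and is not taken from the
public-inbox source; it only illustrates the check order under autodie.

```perl
use v5.12;
use autodie; # wraps open and close, but not readline

# Hypothetical helper: read every line of $path, dying on read errors.
sub slurp_lines {
    my ($path) = @_;
    open(my $fh, '<', $path);
    my @lines = <$fh>; # a read(2) error may silently truncate this list
    # eof is false when more data remains, meaning readline stopped
    # before the real end of the file
    eof($fh) or die "incomplete read on $path: $!";
    # close returns false (so autodie dies) if a PerlIO layer
    # recorded an error on this handle
    close($fh);
    @lines;
}
```

Both checks run on the success path, so a caller never sees a truncated
list without an exception.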
|
|
We can use run_qx and try_cat to make the build setup simpler.
|
|
`readline' ops may not detect errors on partial reads.
This saves us some code to reduce cognitive overhead for
readers. We'll also support reusing a destination buffer so it
can work more nicely with existing code.
|
|
It auto-retries on EINTR and saves us the trouble of doing so.
|
|
This simplifies much of our code since much of it is
error-handling.
|
|
We need to force Inline::C to rebuild if libgit2 is updated;
otherwise dynamic linking can be broken. Adding the output
from the `--modversion' of pkg-config(1) along with the existing
`--libs' and `--cflags' output seems appropriate for this task.
To force Inline::C into a rebuild, neither CFLAGSEX nor CPPFLAGS
changes are enough. Modifying the source string and adding
comments seems like the most obvious way to force a rebuild.
The `-print-file-name=LIBRARY' feature from gcc+clang could also
be used, but that requires parsing the library name from
`pkg-config --libs' output into a library basename appropriate
for `-print-file-name='. IOW, we'd need to transform:
`-lgit2' => `libgit2.so'; and possibly handle platforms
which deal with static libraries in the future.
So just use pkg-config, since `pkg-config --modversion' is
roughly 2-3x as fast as `gcc-10 -print-file-name=', and
10-20x faster than clang-11.
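A rough sketch of the "modify the source string" approach above, assuming
the C source is held in a Perl string before being handed to Inline::C.
The helper names `pkg_config' and `tag_c_source' are hypothetical and do
not come from Gcf2.pm.

```perl
use v5.12;

# Hypothetical helper: run pkg-config and return its trimmed output.
sub pkg_config {
    my (@args) = @_;
    open(my $fh, '-|', 'pkg-config', @args) or die "pkg-config: $!";
    chomp(my $out = do { local $/; <$fh> } // '');
    close($fh) or die "pkg-config @args failed: \$?=$?";
    $out;
}

# Hypothetical helper: append build metadata to the C source as a
# comment, so a changed library version changes the source text itself.
sub tag_c_source {
    my ($c_src, %meta) = @_;
    my $note = join("\n", map { " * $_: $meta{$_}" } sort keys %meta);
    $note =~ s{\*/}{* /}g; # keep the C comment well-formed
    "$c_src\n/*\n$note\n */\n";
}

# Possible usage (not executed here):
#   my $src = tag_c_source($c_src,
#       modversion => pkg_config(qw(--modversion libgit2)),
#       libs => pkg_config(qw(--libs libgit2)),
#       cflags => pkg_config(qw(--cflags libgit2)));
```

Since only a comment changes, the compiled code stays the same while the
source text differs between libgit2 versions.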
|
|
This quiets down tests when the optional Inline::C is missing.
We do not currently have a hard dependency on Inline::C; and we
should not leave PERL_INLINE_DIRECTORY set in PublicInbox::Spawn
if Inline fails to build.
Leaving PERL_INLINE_DIRECTORY set by Spawn after it fails (due
to missing Inline::C) would cause downstream failures in Gcf2
builds for the same reason. So we should bail out of the Gcf2
build early if Spawn already failed due to missing Inline::C.
The only time we want to be noisy is if a user explicitly sets
PERL_INLINE_DIRECTORY and Inline::C is missing.
This reverts commit ad8acf7d6484d0a489499742cadadbd4f890ab53.
ad8acf7d6484d0a4 (Gcf2: Create cache folder if missing, 2022-09-08)
|
|
This is likely more familiar to readers of TAP (Test Anything
Protocol) output, as well as shell and Perl scripters, who also
use `#' for comments.
AFAIK, nobody is parsing our stderr, and I'm not sure how
standardized the `I:' prefix is (nor how `W:' and `E:' are). It's
already the prevailing style in Lei* code, too, so things have
been moving in that direction for a bit.
|
|
I failed to notice these since I uninstalled libgit2 for
benchmarking and kept it uninstalled since my git(1) install
is faster.
Fixes: 1c0ec857d041 "gcf2: support worktree $GIT_DIR"
|
|
We must use `git rev-parse --git-path objects' instead of
blindly appending '/objects' to $GIT_DIR, since appending
doesn't work when $GIT_DIR is a worktree.
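A small sketch of the lookup described above, assuming git(1) is in
$PATH. The helper name `objects_dir' is hypothetical; the only git
feature relied on is `git rev-parse --git-path'.

```perl
use v5.12;

# Hypothetical helper: ask git where the object directory for
# $git_dir lives instead of appending '/objects' ourselves.
sub objects_dir {
    my ($git_dir) = @_;
    open(my $fh, '-|', 'git', "--git-dir=$git_dir",
            qw(rev-parse --git-path objects))
        or die "failed to spawn git: $!";
    chomp(my $path = <$fh> // '');
    # closing a pipe waits for git and sets $?
    close($fh) or die "git rev-parse failed for $git_dir: \$?=$?";
    $path;
}
```

For a worktree, the returned path should point into the shared
repository rather than the per-worktree $GIT_DIR. The result may be
relative to the current directory if $git_dir is given as a relative path.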
|
|
The code expects that the folder is already present; this patch creates
it if missing.
Without this patch the test fails with:
open(/home/debci/.cache/public-inbox/inline-c/.public-inbox.lock): No such file or directory at /usr/share/perl5/PublicInbox/Gcf2.pm line 20
Signed-off-by: Ricardo Ribalda <ricardo@ribalda.com>
|
|
We were misusing the timer and not expiring it before checking
for unlinked files. Now, we check for unlinked files every 60s,
instead.
|
|
Check for unlinked mmap-ed files via /proc/$PID/maps every 60s
or so.
ExtSearch (extindex) is compatible-enough with Inbox objects to
be wired into the old per-inbox code, but the startup cost is
projected to be much higher down the line when there's >30K
inboxes, so we scan /proc/$PID/maps for deleted files before
unlinking. With old Inbox objects, it was (and is) simpler to
just kill processes w/o checking due to the low startup cost
(and non-portability of checking).
Reported-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Link: https://public-inbox.org/meta/20210921144754.gulkneuulzo27qbw@meerkat.local/
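A simplified sketch of the /proc/$PID/maps check described above, for
Linux only. The helper name `deleted_maps' is hypothetical; it relies on
the kernel appending " (deleted)" to the pathname of an unlinked mapped
file, as documented in proc(5).

```perl
use v5.12;

# Hypothetical helper: given the text of /proc/$PID/maps, return the
# sorted list of file-backed mappings whose files were unlinked.
sub deleted_maps {
    my ($maps_text) = @_;
    my %seen;
    for my $line (split /\n/, $maps_text) {
        # fields: address perms offset dev inode pathname
        my (undef, undef, undef, undef, $inode, $path) =
            split(' ', $line, 6);
        # inode 0 means an anonymous or special mapping
        next unless defined($path) && $inode;
        $seen{$1} = 1 if $path =~ m{\A(/.*) \(deleted\)\z};
    }
    sort keys %seen;
}

# Possible usage from a periodic (e.g. 60s) timer, not executed here:
#   open(my $fh, '<', "/proc/$$/maps") or die "open: $!";
#   my @gone = deleted_maps(do { local $/; <$fh> });
```

A non-empty result suggests the process still maps stale files and is a
candidate for restart.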
|
|
We need to waitpid synchronously on pkg-config to use $?.
When loading Gcf2 inside the event loop, implicit dwaitpid
done by PublicInbox::ProcessPipe would not call waitpid in
time to zero $?. This was causing one of my -httpd to
occasionally fall back to git(1) instead of using Gcf2.
This was noted in:
Link: https://public-inbox.org/meta/20210914085322.25517-1-e@80x24.org/
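To illustrate the synchronous wait described above, here is a sketch
using only Perl built-ins rather than PublicInbox::ProcessPipe. The
helper name `qx_sync' is hypothetical. Closing a pipe opened with
`open' waits for the child, so $? is valid right after `close'.

```perl
use v5.12;

# Hypothetical helper: run @cmd, return its stdout and wait status.
sub qx_sync {
    my (@cmd) = @_;
    open(my $fh, '-|', @cmd) or die "spawn @cmd: $!";
    my $out = do { local $/; <$fh> } // '';
    close($fh); # implicit, synchronous waitpid; sets $?
    ($out, $?);
}

# Possible usage (not executed here):
#   my ($ver, $st) = qx_sync(qw(pkg-config --modversion libgit2));
#   warn "pkg-config failed: \$?=$st\n" if $st;
```

The exit code is in the high byte of the returned status, so callers
would check `$st >> 8' for it.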
|
|
I'm not sure why, but I noticed that one of my latest restarts of
public-inbox-httpd wasn't loading the Inline::C .so for Gcf2 nor
Spawn. I also can't reproduce the problem as both .so files are
loaded fine on a restart with zero config changes.
In any case, some extra, automatic diagnostics for build errors
won't hurt, as no extra noise is introduced for successful builds.
This will also make future development of C code more convenient,
hopefully.
|
|
We can't link properly to libgit2 without pkg-config telling
us which libraries and headers to use.
|
|
And note the PublicInbox::Spawn side effect of setting
PERL_INLINE_DIRECTORY.
|
|
While Gcf2Client is designed to mimic what git-cat-file writes
to stdout, its request format is different to support requests
with a git repository path included.
We'll highlight the distinction and make the GitAsyncCat support
code easier-to-follow as a result.
Since Gcf2Client relies on DS, we can rely on DS-specific code
here, too, and use a single Unix socket instead of separate
input and output pipes, reducing memory overhead in both user
and kernel space. Due to the interactive nature of requests and
responses, the buffer size limitations of Unix sockets on Linux
seem inconsequential here (just like they are for existing "git
cat-file --batch" use).
|
|
Using "make update-copyrights" after setting GNULIB_PATH in my
config.mak
|
|
Reading from regular files (even on STDIN) can fail
when dealing with flakey storage.
|
|
While libgit2 handles alternates with relative paths properly
for v2 epochs, nesting them another layer with extindex uses
the wrong relative path expansion (and is inconsistent with
git(1) behavior).
Fortunately, it's possible to work around this libgit2 bug
entirely within Gcf2 and avoid further special cases throughout
the rest of our code to support extindex.
Link: https://bugs.debian.org/975607
|
|
It seems easiest to have a singleton Gcf2Client object
per daemon worker for all inboxes to use. This reduces overall
FD usage from pipes.
The `public-inbox-gcf2' command + manpage are gone and a `$^X'
one-liner is used, instead. This saves inodes for internal
commands and hopefully makes it easier to avoid mismatched
PERL5LIB include paths (as noticed during development :x).
We'll also make the existing cat-file process management
infrastructure more resilient to BOFHs on process killing
sprees (or in case our libgit2-based code fails on us).
(Rare) PublicInbox::WWW PSGI users NOT using public-inbox-httpd
won't automatically benefit from this change, and extra
configuration will be required (to be documented later).
|
|
Hopefully this allows others to more quickly figure out what's
going on.
|
|
Having tens of thousands of inboxes and associated git processes
won't work well, so we'll use libgit2 to access the object DB
directly. We only care about OID lookups and won't need to rely
on per-repo revision names or paths.
The Git::Raw XS package won't be used since its manpages don't
promise a stable API. Since we already use Inline::C and have
experience with I::C when it comes to compatibility, this only
introduces libgit2 itself as a source of new incompatibilities.
This also provides an excuse for me to use writev(2) to reduce
syscalls, but liburing is on the horizon for next year.
|