path: root/lib/PublicInbox/Gcf2.pm
Date  Commit message
2023-11-30 git: share unlinked pack checking code with gcf2
It saves some code in case we keep libgit2 around.
2023-11-15 gcf2: fix autodie usage for older Perl
At least on Perl v5.16.3 on CentOS 7.x, use-ing autodie within BEGIN {} affects all subroutines in that package, too. So just use autodie at the top-level and rely on CORE::* and try_cat to handle cases where autodie isn't desired.
2023-11-13 treewide: update read_all to avoid eof|close checks
read_all can be expanded to support FIFOs/pipes/sockets where read-until-EOF behavior is desired. We can also rely on wantarray to support splitting on EOL markers, but it's hard-coded to support only `$/ eq "\n"' since (AFAIK) that's the only way we use the wantarray form of `readline'.
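The read-until-EOF behavior described above can be sketched in Python. This is only an illustration of the idea, not the PublicInbox::IO code: the names `read_all` and `read_lines` mirror the commit message, while the `expected` and `chunk` parameters are assumptions added for the example.

```python
import os


def read_all(fd, expected=None, chunk=65536):
    """Read from fd until EOF (or until `expected` bytes have arrived).

    `expected` is a hypothetical parameter for this sketch: when given,
    a short read is reported instead of being silently returned.
    """
    parts = []
    total = 0
    while expected is None or total < expected:
        want = chunk if expected is None else min(chunk, expected - total)
        buf = os.read(fd, want)  # raises OSError if read(2) fails
        if not buf:  # empty result means EOF
            break
        parts.append(buf)
        total += len(buf)
    if expected is not None and total != expected:
        raise EOFError(f"short read: got {total} of {expected} bytes")
    return b"".join(parts)


def read_lines(fd):
    """List-style variant: split on "\n" only, keeping the terminators."""
    data = read_all(fd)
    lines = [part + b"\n" for part in data.split(b"\n")]
    # split() always yields a final piece with no terminator; undo the "\n"
    last = lines.pop()[:-1]
    if last:
        lines.append(last)
    return lines
```

Because every read goes through one loop that only stops at EOF, the caller does not need a separate end-of-file or close check to notice truncation.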
2023-11-03 move read_all, try_cat, and poll_in to PublicInbox::IO
The IO package seems like a better home for I/O subs than the Git package. We lose the 60 second read timeout for `git cat-file --batch-*' processes since it's probably not necessary given how reliable the code has proven and things would fall over hard in other ways if the storage device were completely hosed.
2023-11-03 treewide: use eof and close to detect readline errors
readline (<FH>) isn't wrapped by autodie, and there's no way to know if read(2) errors truncated the readline output. IO::Handle->error isn't reliable on Perl < v5.34. Thus, combining `eof' and `close' (with autodie) is the only way we can detect read(2) errors (injected via strace) when called via `readline' (aka <$fh>). Neither `eof' nor `close' alone is sufficient; they must be combined to detect errors from buffered `readline'.
2023-10-28 gcf2: simplify pkg-config and Inline::C setup
We can use run_qx and try_cat to make the build setup simpler.
2023-10-18 use read_all in more places to improve safety
`readline' ops may not detect errors on partial reads. This saves us some code to reduce cognitive overhead for readers. We'll also support reusing a destination buffer so it can work more nicely with existing code.
2023-10-04 gcf2: use PublicInbox::Lock
It auto-retries on EINTR and saves us the trouble of doing so.
2023-09-12 gcf2: switch build phase to use autodie
This simplifies much of our code since much of it is error-handling.
2023-09-12 gcf2: detect libgit2 version changes
We need to force Inline::C to rebuild if libgit2 is updated; otherwise dynamic linking can be broken. Adding the output from the `--modversion' of pkg-config(1) along with the existing `--libs' and `--cflags' output seems appropriate for this task. To force Inline::C into a rebuild, neither CFLAGSEX nor CPPFLAGS changes are enough. Modifying the source string and adding comments seems like the most obvious way to force a rebuild. The `-print-file-name=LIBRARY' feature from gcc+clang could also be used, but that requires parsing the library name from `pkg-config --libs' output into a library basename appropriate for `-print-file-name='. IOW, we'd need to transform: `-lgit2' => `libgit2.so'; and possibly deal with platforms which deal with static libraries in the future. So just use pkg-config, since `pkg-config --modversion' is roughly 2-3x as fast as `gcc-10 -print-file-name=', and 10-20x faster than clang-11.
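The "modify the source string" approach can be illustrated with a small Python sketch. It assumes a build cache that keys on a digest of the source text; `stamp_source` and `build_key` are hypothetical names for this example, and the SHA-256 digest is a stand-in rather than whatever Inline::C uses internally.

```python
import hashlib


def stamp_source(c_src, modversion, cflags, libs):
    """Append the build inputs as a C comment so the source text changes
    whenever the library version or flags change."""
    stamp = f"libgit2 {modversion} | cflags: {cflags} | libs: {libs}"
    stamp = stamp.replace("*/", "* /")  # keep the comment well-formed
    return c_src.rstrip("\n") + "\n/* " + stamp + " */\n"


def build_key(src):
    """Stand-in for a build cache keyed on a digest of the source text
    (an assumption for this sketch)."""
    return hashlib.sha256(src.encode()).hexdigest()
```

With this arrangement, a libgit2 upgrade that changes only the `--modversion` output still produces a different key, so a cache keyed this way would miss and rebuild, while unchanged inputs keep hitting the cache.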
2022-12-24 cleanup pure Perl use
This quiets down tests when the optional Inline::C is missing. We do not currently have a hard dependency on Inline::C; and we should not leave PERL_INLINE_DIRECTORY set in PublicInbox::Spawn if Inline fails to build. Leaving PERL_INLINE_DIRECTORY set by Spawn after it fails (due to missing Inline::C) would cause downstream failures in Gcf2 builds for the same reason. So we should bail out of the Gcf2 build early if Spawn already failed due to missing Inline::C. The only time we want to be noisy is if a user explicitly sets PERL_INLINE_DIRECTORY and Inline::C is missing. This reverts commit ad8acf7d6484d0a489499742cadadbd4f890ab53. ad8acf7d6484d0a4 (Gcf2: Create cache folder if missing, 2022-09-08)
2022-10-24 treewide: replace /^I: / prefix with /^# /
This is likely more familiar to readers of TAP (Test Anything Protocol) output, as well as shell and Perl scripters who also use `#' for comments. AFAIK, nobody is parsing our stderr, and I'm not sure how standardized the `I:' prefix is (nor `W:' and `E:', for that matter). It's already the prevailing style in Lei* code, too, so things have been moving in that direction for a bit.
2022-09-29 gcf2: fix syntax error and require PublicInbox::Git
I failed to notice these since I uninstalled libgit2 for benchmarking and kept it uninstalled since my git(1) install is faster. Fixes: 1c0ec857d041 "gcf2: support worktree $GIT_DIR"
2022-09-26 gcf2: support worktree $GIT_DIR
We must use `git rev-parse --git-path objects' instead of blindly appending '/objects' to $GIT_DIR, since appending doesn't work when $GIT_DIR is a worktree.
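A minimal Python sketch of the difference between the two approaches follows. The helper names are hypothetical; only the `git --git-dir=... rev-parse --git-path objects` invocation itself comes from git.

```python
import subprocess


def naive_objects_dir(git_dir):
    """The approach the commit moves away from: blind path appending."""
    return git_dir.rstrip("/") + "/objects"


def objects_dir_cmd(git_dir):
    """Command line asking git where the object store really lives."""
    return ["git", f"--git-dir={git_dir}", "rev-parse", "--git-path", "objects"]


def objects_dir(git_dir):
    """Run git and return the resolved objects path.

    The path is returned as git prints it, which may be relative to the
    current working directory.
    """
    res = subprocess.run(objects_dir_cmd(git_dir), check=True,
                         capture_output=True, text=True)
    return res.stdout.rstrip("\n")
```

For a linked worktree, $GIT_DIR points at a per-worktree directory while the object store stays in the main repository, so only the `rev-parse` form can be expected to find it.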
2022-09-08 Gcf2: Create cache folder if missing
The code expects that the folder is already present; this patch creates it if missing. Without this patch the test fails with: open(/home/debci/.cache/public-inbox/inline-c/.public-inbox.lock): No such file or directory at /usr/share/perl5/PublicInbox/Gcf2.pm line 20 Signed-off-by: Ricardo Ribalda <ricardo@ribalda.com>
2022-07-20 gcf2: avoid excessive checks for unlinked files
We were misusing the timer and not expiring it before checking for unlinked files. Now, we check for unlinked files every 60s, instead.
2021-09-23 gcf2 + extsearch: check for unlinked files on Linux
Check for unlinked mmap-ed files via /proc/$PID/maps every 60s or so. ExtSearch (extindex) is compatible-enough with Inbox objects to be wired into the old per-inbox code, but the startup cost is projected to be much higher down the line when there's >30K inboxes, so we scan /proc/$PID/maps for deleted files before unlinking. With old Inbox objects, it was (and is) simpler to just kill processes w/o checking due to the low startup cost (and non-portability of checking). Reported-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org> Link: https://public-inbox.org/meta/20210921144754.gulkneuulzo27qbw@meerkat.local/
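The /proc/$PID/maps check can be sketched in Python as below. This is an illustration under the assumption that Linux marks unlinked mapped files with a trailing " (deleted)" on the pathname field; the function names are hypothetical and this is not the Perl code in the repository.

```python
def deleted_mappings(maps_text):
    """Return pathnames of mapped files that have since been unlinked."""
    suffix = " (deleted)"
    found = []
    for line in maps_text.splitlines():
        # fields: address perms offset dev inode pathname
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping: no pathname field
        path = fields[5]
        if path.endswith(suffix):
            name = path[:-len(suffix)]
            if name not in found:  # one file may be mapped several times
                found.append(name)
    return found


def has_unlinked_files(pid="self"):
    """Linux-only: inspect the live mappings of a process."""
    with open(f"/proc/{pid}/maps") as fh:
        return bool(deleted_mappings(fh.read()))
```

A long-running worker could call `has_unlinked_files()` from a timer (the commit mentions roughly every 60s) and restart or reopen only when it returns true, instead of being killed unconditionally.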
2021-09-20 gcf2: fix loading at runtime
We need to waitpid synchronously on pkg-config to use $?. When loading Gcf2 inside the event loop, implicit dwaitpid done by PublicInbox::ProcessPipe would not call waitpid in time to zero $?. This was causing one of my -httpd to occasionally fall back to git(1) instead of using Gcf2. This was noted in: Link: https://public-inbox.org/meta/20210914085322.25517-1-e@80x24.org/
2021-09-14 spawn+gcf2: improve diagnostics for build failures
I'm not sure why, but I noticed that one of my latest restarts of public-inbox-httpd wasn't loading the Inline::C .so for either Gcf2 or Spawn. I also can't reproduce the problem as both .so files are loaded fine on a restart with zero config changes. In any case, some extra, automatic diagnostics for build errors won't hurt, as no extra noise is introduced for successful builds. This will also make future development of C code more convenient, hopefully.
2021-09-10 gcf2: die if pkg-config is missing
We can't link properly to libgit2 without pkg-config telling us which libraries and headers to use.
2021-01-29 gcf2: rely on Perl 5.10 to avoid needless ++
And note the PublicInbox::Spawn side effect of setting PERL_INLINE_DIRECTORY.
2021-01-03 gcf2client: split out request API from regular git
While Gcf2Client is designed to mimic what git-cat-file writes to stdout, its request format is different to support requests with a git repository path included. We'll highlight the distinction and make the GitAsyncCat support code easier to follow as a result. Since Gcf2Client relies on DS, we can rely on DS-specific code here, too, and use a single Unix socket instead of separate input and output pipes, reducing memory overhead in both user and kernel space. Due to the interactive nature of requests and responses, the buffer size limitations of Unix sockets on Linux seem inconsequential here (just like they are for existing "git cat-file --batch" use).
2021-01-01 update copyrights for 2021
Using "make update-copyrights" after setting GNULIB_PATH in my config.mak
2020-12-28 check defined return value for localized slurp errors
Reading from regular files (even on STDIN) can fail when dealing with flakey storage.
2020-11-24 gcf2: workaround libgit2 alternates bug for extindex
While libgit2 handles alternates with relative paths properly for v2 epochs; nesting them another layer with extindex uses the wrong relative path expansion (and is inconsistent with git(1) behavior). Fortunately, it's possible to work around this libgit2 bug entirely within Gcf2 and avoid further special cases throughout the rest of our code to support extindex. Link: https://bugs.debian.org/975607
2020-09-19 gcf2: wire up read-only daemons and rm -gcf2 script
It seems easiest to have a singleton Gcf2Client client object per daemon worker for all inboxes to use. This reduces overall FD usage from pipes. The `public-inbox-gcf2' command + manpage are gone and a `$^X' one-liner is used, instead. This saves inodes for internal commands and hopefully makes it easier to avoid mismatched PERL5LIB include paths (as noticed during development :x). We'll also make the existing cat-file process management infrastructure more resilient to BOFHs on process killing sprees (or in case our libgit2-based code fails on us). (Rare) PublicInbox::WWW PSGI users NOT using public-inbox-httpd won't automatically benefit from this change, and extra configuration will be required (to be documented later).
2020-09-19 gcf2*: more descriptive package descriptions
Hopefully this allows others to more quickly figure out what's going on.
2020-09-19 gcf2: libgit2-based git cat-file alternative
Having tens of thousands of inboxes and associated git processes won't work well, so we'll use libgit2 to access the object DB directly. We only care about OID lookups and won't need to rely on per-repo revision names or paths. The Git::Raw XS package won't be used since its manpages don't promise a stable API. Since we already use Inline::C and have experience with I::C when it comes to compatibility, this only introduces libgit2 itself as a source of new incompatibilities. This also provides an excuse for me to writev(2) to reduce syscalls, but liburing is on the horizon for next year.