path: root/lib/PublicInbox
Date Commit message
2023-11-26 xap_client: pass arguments to top-level xap_helper
This ensures our tests actually test the -j0 and -j1 cases properly.
2023-11-26 xap_client: attach PID to the IO object
As with our popen_* uses, we can simplify callers by using attach_pid to handle automatic reaping upon close.
2023-11-26 xap_helper_cxx: do not copy xap_helper.h source
No need to waste memory bandwidth when we can just rely on the preprocessor to load the header.
2023-11-26 ds: long_step: eliminate redundant fileno call
We already stash the associated FD for reporting at startup and don't need to call `fileno' again. Found via manual code inspection while considering the effort to make async {forward} from PublicInbox::HTTP more like the generic long_response API and {long_cb} field used by IMAP/NNTP/POP3.
2023-11-26 select+poll: have caller retry on EINTR
We can't assume signals are blocked when neither signalfd nor EVFILT_SIGNAL are in use. So just return an empty result so the caller can recalculate the timeout. I found this bug while making xt/httpd-async-stream.t use our event loop to reap processes but have abandoned that effort for now since it didn't save any code.
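The caller-side retry described above can be sketched as follows. This is a hypothetical Python illustration, not the project's Perl code: `wait_ready`, `poll_once` and `now` are names invented for the example, and `poll_once(timeout)` is assumed to return an empty list both on timeout and when a signal interrupted the wait.

```python
import time

def wait_ready(poll_once, deadline, now=time.monotonic):
    """Call poll_once(timeout) until it reports events or the deadline passes.

    poll_once is assumed (for this sketch) to return [] both on timeout
    and when a signal interrupted the underlying select/poll call (EINTR),
    so an empty result only means "recompute the remaining time and retry".
    """
    while True:
        remaining = deadline - now()
        if remaining <= 0:
            return []            # the deadline really has passed
        events = poll_once(remaining)
        if events:
            return events
        # Empty result: timeout or EINTR. Loop to recalculate the timeout.
```

The wait primitive stays simple because it never has to track how much time an interrupted call consumed; only the caller knows the deadline.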
2023-11-26 http: fix pipelining during long async requests
We must not attempt to read request bodies from the HTTP client while processing a long request since that drains pipelined requests. The NNTP/IMAP/POP3 event_step callbacks follow the same behavior when {long_cb} is present from ->long_response. This bug has little real-world consequence since HTTP/1.1 pipelining is not widely-used, especially when behind varnish or other reverse proxies. I found this bug while randomly strace-ing an active -netd process to see the kind of traffic it was seeing.
2023-11-25 cindex: fix --join=reset and speed up incremental joins
`reset' means we want to ignore existing join data, while the default (non-reset) means we perform an incremental join while taking into account existing (fuzzy) join data.
2023-11-22 lei_saved_search: don't create Git object during ->DESTROY
This fixes t/lei-q-save.t getting stuck since $self->{ale} is already gone by the time DESTROY gets called.
2023-11-22 watch: support `watch=false' to negate watchspam
For users hosting read-only mirrors (via clone|fetch) and feeding inboxes via -watch
2023-11-22 lei_to_mail: don't close STDOUT unless it is a mbox* output
We only care about error checking when stdout is an mbox output pointed to a pathname. This is noticeable with `lei up' with multiple non-mbox* destinations. We'll also ensure that throwing exceptions to trigger lei->x_it from lei->do_env results in the epoll/kqueue watch being discarded, otherwise commands may never terminate (leading to stuck tests).
2023-11-21 cindex: rename --associate to --join, test w/ real repos
The association data is just stored as deflated JSON in Xapian metadata keys of shard[0] for now. It should be reasonably compact and fit in memory for now since we'll assume sane, non-malicious git coderepo history, for now. The new cindex-join.t test requires TEST_REMOTE_JOIN=1 to be set in the environment and tests the joins against the inboxes and coderepos of two small projects with a common history. Internally, we'll use `ibx_off', `root_off' instead of `ibx_id' and `root_id' since `_id' may be mistaken for columns in an SQL database which they are not.
2023-11-21 cindex: avoid unneeded and redundant `local' calls
We only set $MAX_SIZE at startup, and there's no need to use a local $self->{roots} for the per-repo roots array.
2023-11-21 searchidx: run `git patch-id' in parallel
Informal benchmarks show a rough 5% indexing improvement on an SMP system when there are idle cores due to Xapian shards being I/O bound (since `git patch-id' is mainly CPU bound). This is only parallelized on a per-patch basis. Further increasing parallelism would increase complexity and probably not be worth it since `git patch-id' is reasonably fast while our text indexing tends to be slow.
2023-11-20 git: return upon self->close
I encountered the odd lack of `return' while chasing Gcf2 bugs on CentOS 7.x which resulted in commit 7d06b126e939 ("gcf2: fix autodie usage for older Perl") and commit e618c7654794 ("gcf2client: add alias for PublicInbox::Git::fail") before realizing the lack of `return' here wasn't the culprit behind failures on CentOS 7.x. However, the use of a `return' here appears required in case we actually hit the error path, since falling through and attempting my_readline with an undefined filehandle is always a failure. Fixes: e97a30e7624d ("lei: fix SIGPIPE on large result sets to pager")
2023-11-20 test_common: fix excessive wait for GNU tail inotify
We want to use the filenames tail will watch, not the number of args passed to the `tail_f' subroutine. Fixes: 9231d2e7b93f (tests: map CLOFORK->FD_CLOEXEC temporarily for `tail -f')
2023-11-16 extindex: warn and hint about --gc on bad ibx_id
Stale entries from newsgroup name changes (including adding a `publicinbox.<name>.newsgroup' entry when none existed before) can wreak havoc during a --reindex. So give the hint to users about running -extindex with --gc to clean up stale entries.
2023-11-16 lei q|up|convert: common finish_output to detect errors
We need to consistently check the exit code of pigz|gzip|xz|bzip2 when writing to compressed mboxes (or bad storage).
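A minimal sketch of that exit-code check, written in Python rather than the project's Perl. The helper name `write_through_filter` and its signature are assumptions made for illustration; only the idea (wait for the compressor and refuse to ignore a non-zero exit) comes from the commit message.

```python
import subprocess

def write_through_filter(cmd, data, out_path):
    """Pipe data through cmd (e.g. ["gzip", "-c"]) into out_path.

    Raises RuntimeError if the filter exits non-zero, so a failed
    compressor or bad storage is not silently ignored.
    """
    with open(out_path, "wb") as out:
        proc = subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=out)
        try:
            proc.stdin.write(data)
            proc.stdin.close()      # send EOF so the filter can finish
        except BrokenPipeError:
            pass                    # filter died early; status is checked below
        status = proc.wait()
    if status != 0:
        raise RuntimeError(f"{cmd[0]} exited with status {status}")
```

A caller would use it as `write_through_filter(["gzip", "-c"], mbox_bytes, "out.mbox.gz")`, where `mbox_bytes` is whatever data is being written.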
2023-11-16 lei: avoid extra fork for v2 outputs
We've always forced LeiToMail to only have one process for v2 outputs anyways since v2 has its own sharding and IPC. Thus we can use the single LeiToMail process directly to avoid extra IPC overhead.
2023-11-16 lei convert: fix repeat and idempotent v2 output
We should be able to treat v2 outputs just like any other mail format, with the exception that content dedupe is always enforced by the v2 format. This allows users hosting v2 public-inboxes to catch up broken synchronization from alternate archives such as the mbox archives hosted by https://lists.gnu.org/
Link: https://public-inbox.org/meta/20231114-hypersonic-papaya-starling-e1cfc8@nitro/
2023-11-16 lei: fix idempotent STDERR redirect in workers
This is needed to support forking from already-forked lei workers and $lei->{2} is already STDERR. Fixes: e015c3742f91 (lei: use autodie where appropriate, 2023-10-17)
2023-11-15 xap_helper_cxx: accept leading spaces from pkg-config
Eric Wong <e@80x24.org> wrote:
> Avoid mixing autodie use in different scopes since it's likely
> to cause problems like it did in Gcf2. While none of these
> fix known problems with test cases, it's likely worthwhile to
> avoid it anyways to avoid future surprises.

> lib/PublicInbox/XapHelperCxx.pm | 18 ++++++++----------

That XapHelperCxx change was totally necessary for running the C++ build on CentOS 7.x (but the test is auto-skipped on any build failure), as is this one:

--------8<--------
Subject: [PATCH] xap_helper_cxx: accept leading spaces from pkg-config

pkg-config 0.27.1 and xapian14-core-devel (1.4.24-1.el7) on CentOS 7.x will print a leading space when running `pkg-config --libs --cflags xapian-core'. This leading space creates an empty string when `split' is used with /\s+/ as a pattern. Instead, use the documented ' ' (SP) character to put split into "awk mode" which eats leading (and redundant) spaces and tabs.
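The same pitfall can be reproduced in Python, shown here only as an analogy to the Perl fix: splitting on a whitespace regex keeps an empty leading field, while whitespace-mode splitting (`str.split()` with no arguments) skips leading and repeated blanks. The flag string below is a made-up stand-in for real pkg-config output.

```python
import re

# Hypothetical pkg-config output with the leading space described above
out = " -I/usr/include -lxapian"

# Splitting on a whitespace regex keeps an empty first field.
regex_fields = re.split(r"\s+", out)
print(regex_fields)   # ['', '-I/usr/include', '-lxapian']

# Whitespace-mode splitting skips leading and repeated blanks.
awk_fields = out.split()
print(awk_fields)     # ['-I/usr/include', '-lxapian']
```

An empty string passed along as a compiler argument is the kind of breakage the empty first field can cause.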
2023-11-15 treewide: more autodie safety fixes for older Perl
Avoid mixing autodie use in different scopes since it's likely to cause problems like it did in Gcf2. While none of these fix known problems with test cases, it's likely worthwhile to avoid it anyways to avoid future surprises. For Process::IO, we'll add some additional tests in t/io.t to ensure we don't get unintended exceptions for try_cat.
2023-11-15 gcf2: fix autodie usage for older Perl
At least on Perl v5.16.3 on CentOS 7.x, use-ing autodie within BEGIN {} affects all subroutines in that package, too. So just use autodie at the top-level and rely on CORE::* and try_cat to handle cases where autodie isn't desired.
2023-11-15 gcf2client: add alias for PublicInbox::Git::fail
Ensure we can ->fail properly from other subs we call within Gcf2Client. This doesn't fix the test failures on CentOS 7.x, but tries to make it easier to fix underlying problems and report OOM errors and other things which the test suite doesn't touch on.
2023-11-15 ds: run @post_loop_do if any user-queued events run
This ensures we can notice shutdown events in one-shot scripts like -cindex (and eventually -clone/-fetch/-compact) without forcing another real event to fire.
2023-11-15 cindex: fix test when missing time(1) executable
It was only there for development purposes because associate is slow, but it causes the test to get stuck on systems where it's not available. So remove it and just call join(1posix). Note: this is not the `time' builtin found in shells, this executable shows memory and pagefault info (and more with the `-v' switch). Unfortunately, it's not installed on many systems despite being widely-packaged. Reported-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
2023-11-15 lei: use -signal numbers for old Perl
Unlike modern Perls, Perl 5.16.3 on CentOS doesn't accept negative string signals like "-TERM". This only became a problem since commit b231d91f42d7 (treewide: enable warnings in all exec-ed processes) made our code stricter by enabling more warnings. In both cases, the kill is probably unnecessary and safe to remove since we can rely on closing sockets to drop processes.
2023-11-14 TestCommon: older strace does not have --version
The tests will check for strace >= 4.16, but version 4.24 that I have does not accept --version, only -V. This works for both older and newer strace, so switch to using "strace -V" for the check. Signed-off-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
2023-11-14 config: avoid eidx_key and newsgroup conflicts
Start lowercasing newsgroup names automatically since uppercase names are incompatible with IMAP and POP3 and also cause problems with both -extindex and -cindex. We'll also warn on eidx_key and newsgroup conflicts to avoid sometimes subtle breakage when using -extindex and -cindex.
2023-11-14 cindex: fix missing semicolon on broken $GIT_DIR/objects
Noticed while working on another feature...
2023-11-13 cindex: support --associate-aggressive shortcut
This is shorthand for enabling --associate with the most aggressive (and time-consuming) options available, starting from the Unix epoch and having an unlimited window to join on.
2023-11-13 cindex: rename associate-max => window
"window" is probably a better term since it's an inexact thing to match on.
2023-11-13 cindex: do not guess integer maximum for Xapian
We can return an array to allow the caller to omit the internal `-m' arg entirely. We'll also allow any non-positive values to mean there's no limit; and we'll defer the "unlimited" case to the XapHelper implementation. This frees us of having to deal with mismatches between Perl and Xapian if Xapian was compiled with 64-bit docid support and we're stuck on a 32-bit Perl build.
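The "return an array so the caller can omit `-m'" idea might look like the sketch below. It is Python, not the project's Perl, and `max_args` and the example command line are hypothetical names chosen for illustration.

```python
def max_args(limit):
    """Return the argv fragment for a result limit.

    Non-positive limits mean "no limit": return an empty list so the
    caller omits `-m' entirely and the helper decides what unlimited
    means, instead of guessing an integer maximum here.
    """
    return ["-m", str(limit)] if limit > 0 else []

# Hypothetical command line: the fragment is spliced in and may be empty.
cmd = ["dump_roots", *max_args(-1)]
print(cmd)   # ['dump_roots']
```

Because the unlimited case never produces a number, the caller cannot pick a value that overflows or truncates on the other side of the process boundary.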
2023-11-13 xap_helper: better variable naming for key buffer
We'll use `kbuf' for the search object key, since we already use the `fbuf' term in `struct fbuf'. This also adds an extra check for open_memstream(3) failures in case of ENOMEM.
2023-11-13 xap_helper: stricter and harsher error handling
We'll require an error stream for dump_ibx and dump_roots commands; they're too important to ignore. Instead of writing code to provide diagnostics for errors, rely on abort(3) and the -ggdb3 compiler flag to generate nice core dumps for gdb since all commands sent to xap_helper are from internal users. We'll even abort on most usage errors since they could be bugs in split2argv or our use of getopt(3). We'll also just exit on ENOMEM errors since it's the easiest way to recover from those errors by starting a new process which closes all open Xapian DB handles.
2023-11-13 cidx_xap_helper_aux: complain about truncated inputs
This will help us notice bugs and system resource limitations sooner rather than later.
2023-11-13 xap_helper: Perl dump_ibx respects `-m MAX'
The C++ version does, so the Perl/XS version should, too; even if we intentionally avoid using it right now.
2023-11-13 cindex: delay associate until prune+indexing finish
Prune can get rid of invalid commits while indexing can add new candidates for association, so we don't dump coderepo roots for association until those are squared away. However, we can dump inbox info since we don't touch inboxes while -cindex is running.
2023-11-13 cindex: imply --all with --associate w/o -I/--only
I just forgot to use --all with --associate and it wasn't easily apparent what was wrong. We'll also show some extra progress while we're at it.
2023-11-13 spawn: don't append to scalarrefs on stdout/stderr
None of our current code relies on it, and I can't imagine it's something we'd need in the future, actually... This keeps the door open for relying more on Spawn in TestCommon.
2023-11-13 treewide: update read_all to avoid eof|close checks
read_all can be expanded to support FIFOs/pipes/sockets where read-until-EOF behavior is desired. We can also rely on wantarray to support splitting on EOL markers, but it's hard-coded to support only `$/ eq "\n"' since (AFAIK) it's the only way we use the wantarray form of `readline'.
2023-11-13 xap_client: spawn C++ xap_helper directly
No need to suffer through an extra dose of slow Perl load times when we can drive the build in the big parent Perl process and get the executable path name to pass to spawn directly.
2023-11-13 xap_helper_cxx: use -pipe by default in CXXFLAGS
-ggdb3 is already used for g++ and clang, and -pipe is supported by clang even if it's a no-op. So just use it to speed up g++ since it saves me 30-40ms. We'll also get rid of the explicit `-O0' since it's the default for both clang and g++.
2023-11-13 xap_helper_cxx: make the build process ccache-friendly
We need to have stable filenames and separate compilation from the linkage stage for ccache to hit. So avoid the use of a temporary directory and instead rely on a lock file to guard against parallel builds.
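A rough sketch of guarding a fixed build directory with a lock file, in Python for illustration only. `build_locked`, the `lock` filename, and the use of flock(2) are assumptions made for this example; the commit message does not say which locking primitive is used.

```python
import fcntl
import os

def build_locked(build_dir, build):
    """Run build() while holding an exclusive lock on build_dir/lock.

    Output filenames stay stable (no temporary directory), and the
    lock serializes concurrent builders instead.
    """
    os.makedirs(build_dir, exist_ok=True)
    with open(os.path.join(build_dir, "lock"), "w") as lock:
        # Blocks until other builders finish. The lock is dropped when
        # the file is closed on leaving the `with' block.
        fcntl.flock(lock, fcntl.LOCK_EX)
        return build()
```

With stable paths, a compiler cache sees identical inputs and command lines across runs, which a per-build temporary directory would defeat.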
2023-11-13 xap_helper_cxx: use write_file helper
PublicInbox::IO already gets loaded by PublicInbox::Spawn, so there's no avoiding it even if we want fast startup time :< But startup time for this piece will be less relevant in the near future...
2023-11-13 cindex: use `local' for pipes between processes
We can let these pipes get auto-closed upon leaving the process subroutine scope.
2023-11-13 tmpfile: check `stat' errors, use autodie for unlink
`stat' can fail due to bugs on our end or ENOMEM, but there's no autodie support for it. So just die if `unlink' fails, since the FS wouldn't be usable for tmpfiles in that state, anyways.
2023-11-13 cindex: check `say' errors w/ close or ->flush
We actually need to rely on autodie `close' to check for errors, since error-checking with `say' is not useful due to perlio write buffering. We'll also stop relying on `say ... or die' since it's needless noise. Fixes: 19f9089343c9 (cindex: drop redundant close on regular FH)
2023-11-13 xap_helper: reset getopt(3) properly in workers
I only noticed this while doing a full -cindex --associate with --associate-date-range=30.years.ago and --associate-max=-1 (no limit for Xapian) between local mirrors of lore and git.kernel.org on my glibc-based system. Apparently, glibc requires `optind = 0' to reset getopt(3) in our workers. Oddly, glibc appeared to work fine prior to this change for the defaults (--associate-date-range=1.year.ago.. and --associate-max=50000). BSDs and musl have an `optreset' variable which appears to do the same thing, but I don't have space on BSD VMs to test full associations. While we're at it, we'll also keep `opterr' enabled to improve error reporting.
2023-11-13 lei: don't read --stdin terminals from daemon
We must use a foreground process to read from terminals on stdin, otherwise weird things like lost keystrokes and EIO can happen. So take advantage of ->send_exec_cmd to spawn `cat' in the same way we spawn MUAs, pagers, `git config --edit' and `git credential' from script/lei