Date | Commit message |
|
This ensures our tests actually test the -j0 and -j1 cases
properly.
|
|
As with our popen_* uses, we can simplify callers by using
attach_pid to handle automatic reaping upon close.
|
|
No need to waste memory bandwidth when we can just rely on
the preprocessor to load the header.
|
|
We already stash the associated FD for reporting at startup and
don't need to call `fileno' again. Found via manual code
inspection while considering the effort to make async {forward}
from PublicInbox::HTTP more like the generic long_response API
and {long_cb} field used by IMAP/NNTP/POP3.
|
|
We can't assume signals are blocked when neither signalfd nor
EVFILT_SIGNAL is in use. So just return an empty result so
the caller can recalculate the timeout.
I found this bug while making xt/httpd-async-stream.t
use our event loop to reap processes but have abandoned
that effort for now since it didn't save any code.
|
|
We must not attempt to read request bodies from the HTTP client
while processing a long request since that drains pipelined
requests. The NNTP/IMAP/POP3 event_step callbacks follow the
same behavior when {long_cb} is present from ->long_response.
This bug has little real-world consequence since HTTP/1.1
pipelining is not widely used, especially when behind varnish
or other reverse proxies.
I found this bug while randomly strace-ing an active -netd
process to see the kind of traffic it was seeing.
|
|
`reset' means we want to ignore existing join data, while
the default (non-reset) means we perform an incremental
join while taking into account existing (fuzzy) join data.
|
|
This fixes t/lei-q-save.t getting stuck since $self->{ale} is
already gone by the time DESTROY gets called.
|
|
For users hosting read-only mirrors (via clone|fetch) and feeding
inboxes via -watch
|
|
We only care about error checking when stdout is an mbox output
pointed to a pathname. This is noticeable with `lei up' with
multiple non-mbox* destinations. We'll also ensure that throwing
an exception to trigger lei->x_it from lei->do_env discards the
epoll/kqueue watch; otherwise commands may never terminate
(leading to stuck tests).
|
|
The association data is just stored as deflated JSON in Xapian
metadata keys of shard[0] for now. It should be reasonably
compact and fit in memory since we assume sane, non-malicious
git coderepo history.
The new cindex-join.t test requires TEST_REMOTE_JOIN=1 to be
set in the environment and tests the joins against the inboxes
and coderepos of two small projects with a common history.
Internally, we'll use `ibx_off', `root_off' instead of `ibx_id'
and `root_id' since `_id' may be mistaken for columns in an SQL
database which they are not.
|
|
We only set $MAX_SIZE at startup, and there's no need to
use a local $self->{roots} for the per-repo roots array.
|
|
Informal benchmarks show a rough 5% indexing improvement on an
SMP system when there are idle cores due to Xapian shards being
I/O bound (since `git patch-id' is mainly CPU bound).
This is only parallelized on a per-patch basis. Further
increasing parallelism would increase complexity and probably
not be worth it since `git patch-id' is reasonably fast while
our text indexing tends to be slow.
|
|
I encountered the odd lack of `return' while chasing Gcf2 bugs
on CentOS 7.x which resulted in commit 7d06b126e939
("gcf2: fix autodie usage for older Perl") and commit e618c7654794
("gcf2client: add alias for PublicInbox::Git::fail") before
realizing the lack of `return' here wasn't the culprit behind
failures on CentOS 7.x.
However, the use of a `return' here appears required in case we
actually hit the error path, since falling through and
attempting my_readline with an undefined filehandle is always a
failure.
Fixes: e97a30e7624d ("lei: fix SIGPIPE on large result sets to pager")
|
|
We want to use the filenames tail will watch, not the number of
args passed to the `tail_f' subroutine.
Fixes: 9231d2e7b93f (tests: map CLOFORK->FD_CLOEXEC temporarily for `tail -f')
|
|
Stale entries from newsgroup name changes (including adding
a `publicinbox.<name>.newsgroup' entry when none existed
before) can wreak havoc during a --reindex. So give users a
hint about running -extindex with --gc to clean up stale
entries.
|
|
We need to consistently check the exit code of pigz|gzip|xz|bzip2
when writing to compressed mboxes (or bad storage).
|
|
We've always forced LeiToMail to only have one process for v2
outputs anyways since v2 has its own sharding and IPC. Thus we
can use the single LeiToMail process directly to avoid extra IPC
overhead.
|
|
We should be able to treat v2 outputs just like any other mail
format, with the exception that content dedupe is always
enforced by the v2 format.
This allows users hosting v2 public-inboxes to catch up broken
synchronization from alternate archives such as the mbox
archives hosted by https://lists.gnu.org/
Link: https://public-inbox.org/meta/20231114-hypersonic-papaya-starling-e1cfc8@nitro/
|
|
This is needed to support forking from already-forked lei workers
and $lei->{2} is already STDERR.
Fixes: e015c3742f91 (lei: use autodie where appropriate, 2023-10-17)
|
|
Eric Wong <e@80x24.org> wrote:
> Avoid mixing autodie use in different scopes since it's likely
> to cause problems like it did in Gcf2. While none of these
> fix known problems with test cases, it's likely worthwhile to
> avoid it anyways to avoid future surprises.
> lib/PublicInbox/XapHelperCxx.pm | 18 ++++++++----------
That XapHelperCxx change was totally necessary for running the
C++ build on CentOS 7.x (but the test is auto-skipped on any
build failure), as is this one:
--------8<--------
Subject: [PATCH] xap_helper_cxx: accept leading spaces from pkg-config
pkg-config 0.27.1 and xapian14-core-devel (1.4.24-1.el7) on
CentOS 7.x will print a leading space when running
`pkg-config --libs --cflags xapian-core'. This leading
space creates an empty string when `split' is called with /\s+/
as the pattern. Instead, use the documented ' ' (SP) character
to put split into "awk mode" which eats leading (and
redundant) spaces and tabs.
|
|
Avoid mixing autodie use in different scopes since it's likely
to cause problems like it did in Gcf2. While none of these
fix known problems with test cases, it's likely worthwhile to
avoid it anyways to avoid future surprises.
For Process::IO, we'll add some additional tests in t/io.t
to ensure we don't get unintended exceptions for try_cat.
|
|
At least on Perl v5.16.3 on CentOS 7.x, use-ing autodie within
BEGIN {} affects all subroutines in that package, too. So just
use autodie at the top-level and rely on CORE::* and try_cat
to handle cases where autodie isn't desired.
|
|
Ensure we can ->fail properly from other subs we call within
Gcf2Client. This doesn't fix the test failures on CentOS 7.x,
but tries to make it easier to fix underlying problems and
report OOM errors and other things which the test suite doesn't
touch on.
|
|
This ensures we can notice shutdown events in one-shot scripts
like -cindex (and eventually -clone/-fetch/-compact) without
forcing another real event to fire.
|
|
It was only there for development purposes because associate is
slow, but it causes the test to get stuck on systems where it's
not available. So remove it and just call join(1posix).
Note: this is not the `time' builtin found in shells; this
executable shows memory and pagefault info (and more with the
`-v' switch). Unfortunately, it's not installed on many systems
despite being widely-packaged.
Reported-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
|
|
Unlike modern Perls, Perl 5.16.3 on CentOS doesn't accept
negative string signals like "-TERM".
This only became a problem since commit b231d91f42d7
(treewide: enable warnings in all exec-ed processes)
made our code stricter by enabling more warnings.
In both cases, the kill is probably unnecessary and safe
to remove since we can rely on closing sockets to drop
processes.
|
|
The tests will check for strace >= 4.16, but version 4.24 that I have
does not accept --version, only -V. This works for both older and newer
strace, so switch to using "strace -V" for the check.
Signed-off-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
|
|
Start lowercasing newsgroup names automatically since uppercase
names are incompatible with IMAP and POP3 and also cause
problems with both -extindex and -cindex.
We'll also warn on eidx_key and newsgroup conflicts to avoid
sometimes subtle breakage when using -extindex and -cindex.
|
|
Noticed while working on another feature...
|
|
This is shorthand for enabling --associate with the most
aggressive (and time-consuming) options available, starting from
the Unix epoch and having an unlimited window to join on.
|
|
"window" is probably a better term since it's an inexact thing
to match on.
|
|
We can return an array to allow the caller to omit the internal
`-m' arg entirely. We'll also allow any non-positive values to
mean there's no limit; and we'll defer the "unlimited" case to
the XapHelper implementation. This frees us from having to deal
with mismatches between Perl and Xapian if Xapian was compiled
with 64-bit docid support and we're stuck on a 32-bit Perl
build.
|
|
We'll use `kbuf' for the search object key, since we already use
the `fbuf' term in `struct fbuf'. This also adds an extra check
for open_memstream(3) failures in case of ENOMEM.
|
|
We'll require an error stream for dump_ibx and dump_roots
commands; they're too important to ignore. Instead of writing
code to provide diagnostics for errors, rely on abort(3) and the
-ggdb3 compiler flag to generate nice core dumps for gdb since
all commands sent to xap_helper are from internal users.
We'll even abort on most usage errors since they could be
bugs in split2argv or our use of getopt(3).
We'll also just exit on ENOMEM errors since it's the easiest way
to recover from those errors by starting a new process which
closes all open Xapian DB handles.
|
|
This will help us notice bugs and system resource limitations
sooner rather than later.
|
|
The C++ version does, so the Perl/XS version should, too;
even if we intentionally avoid using it right now.
|
|
Prune can get rid of invalid commits while indexing can add new
candidates for association, so we don't dump coderepo roots for
association until those are squared away. However, we can dump
inbox info since we don't touch inboxes while -cindex is running.
|
|
I just forgot to use --all with --associate and it wasn't
easily apparent what was wrong. We'll also show some extra
progress while we're at it.
|
|
None of our current code relies on it, and I can't imagine it's
something we'd need in the future, actually... This keeps the
door open for relying more on Spawn in TestCommon.
|
|
read_all can be expanded to support FIFOs/pipes/sockets where
read-until-EOF behavior is desired. We can also rely on
wantarray to support splitting on EOL markers, but it's
hard-coded to support only `$/ eq "\n"' since (AFAIK)
it's the only way we use the wantarray form of `readline'.
|
|
No need to suffer through an extra dose of slow Perl load times
when we can drive the build in the big parent Perl process and
get the executable path name to pass to spawn directly.
|
|
-ggdb3 is already used for g++ and clang, and -pipe is supported
by clang even if it's a no-op. So just use it to speed up g++
since it saves me 30-40ms.
We'll also get rid of the explicit `-O0' since it's the default
for both clang and g++.
|
|
We need to have stable filenames and separate compilation
from the linkage stage for ccache to hit. So avoid the use
of a temporary directory and instead rely on a lock file to
guard against parallel builds.
|
|
PublicInbox::IO already gets loaded by PublicInbox::Spawn, so
there's no avoiding it even if we want fast startup time :<
But startup time for this piece will be less relevant in the
near future...
|
|
We can let these pipes get auto-closed upon leaving the process
subroutine scope.
|
|
`stat' can fail due to bugs on our end or ENOMEM, but there's
no autodie support for it. So just die if `unlink' fails, since
the FS wouldn't be usable for tmpfiles in that state, anyways.
|
|
We actually need to rely on autodie `close' to check for errors,
since error-checking with `say' is not useful due to perlio
write buffering. We'll also stop relying on `say ... or die'
since it's needless noise.
Fixes: 19f9089343c9 (cindex: drop redundant close on regular FH)
|
|
I only noticed this while doing a full -cindex --associate with
--associate-date-range=30.years.ago and --associate-max=-1 (no
limit for Xapian) between local mirrors of lore and
git.kernel.org on my glibc-based system.
Apparently, glibc requires `optind = 0' to reset getopt(3) in
our workers. Oddly, glibc appeared to work fine prior to this
change for the defaults (--associate-date-range=1.year.ago..
and --associate-max=50000).
BSDs and musl have an `optreset' variable which appears to do
the same thing, but I don't have space on BSD VMs to test full
associations.
While we're at it, we'll also keep `opterr' enabled to improve
error reporting.
|
|
We must use a foreground process to read from terminals
on stdin, otherwise weird things like lost keystrokes and
EIO can happen. So take advantage of ->send_exec_cmd to
spawn `cat' in the same way we spawn MUAs, pagers,
`git config --edit' and `git credential' from script/lei.
|