Date | Commit message |
|
Write barriers can take a long time to finish, especially when
commands are issued in parallel. So handle it asynchronously,
without blocking lei-daemon, by making EOFpipe a little more
flexible: it now supports passing arguments to the callback
function. This is another step towards improving parallel use
of lei.
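A minimal sketch of the idea (a hypothetical EOF watcher, not the
actual PublicInbox::EOFpipe code): the watcher remembers extra
arguments and passes them to the callback once the writer closes
its end.

    # hypothetical illustration, not the actual PublicInbox::EOFpipe API
    package EOFWatcher;

    sub new {
        my ($class, $rd, $cb, @args) = @_;
        # remember the callback *and* the arguments to pass it later
        bless { rd => $rd, cb => $cb, args => \@args }, $class;
    }

    sub event_step { # called by the event loop when {rd} becomes readable
        my ($self) = @_;
        my $r = sysread($self->{rd}, my $buf, 1);
        if (defined($r) && $r == 0) { # EOF: the writer is done
            $self->{cb}->(@{$self->{args}});
        }
    }
    1;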
|
|
A barrier (synchronous checkpoint) is better than ->done when
parallel lei commands are being issued (via '&' or from different
terminals), since repeatedly stopping and restarting processes
doesn't play nicely with expensive tasks like `lei reindex'.
This introduces a slight regression in that more processes (and
thus more resource use) are kept around while lei is idle, but
that'll be fixed in the next commit.
|
|
This adds support for the "POST /$INBOX/$MSGID/?x=m&q=..."
endpoint added last year in
764035c83 (www: support POST /$INBOX/$MSGID/?x=m&q=, 2023-03-30)
to support per-thread searches. This only works against
public-inbox instances running 764035c83 or later, but
unfortunately there hasn't been a release since then.
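For illustration, issuing such a per-thread search with core
HTTP::Tiny might look like this (hypothetical host and Message-ID;
the q= value must be URL-encoded by the caller):

    use HTTP::Tiny;
    # hypothetical inbox URL and Message-ID, for illustration only
    my $url = 'https://example.com/test-inbox/20230330.example@host/?x=m&q=s%3Ahello';
    my $res = HTTP::Tiny->new->request('POST', $url);
    die "POST failed: $res->{status} $res->{reason}\n" unless $res->{success};
    # $res->{content} holds the mboxrd result set (it may be gzip-compressed)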
|
|
The musl strftime(3) implementation on Alpine Linux 3.19.0
doesn't support `%k', and `%k' isn't in POSIX, either. So we
fall back to using the `sprintf' perlop in the user-facing UI,
since leading zeroes in the displayed time require needless
overhead for my eyes and brain to parse.
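A sketch of the difference, with the hour taken from localtime():

    use POSIX qw(strftime);
    my @tm = localtime(time);
    # `%k' (hour, space-padded, no leading zero) is a glibc extension;
    # musl lacks it and it's not in POSIX
    my $fragile  = strftime('%k:%M', @tm);
    # portable fallback: format the hour via sprintf instead
    my $portable = sprintf('%d:%02d', $tm[2], $tm[1]);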
|
|
We need to consistently check the exit code of pigz|gzip|xz|bzip2
when writing to compressed mboxes (or bad storage).
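A minimal sketch of the check, assuming the mbox is written
through a pipe to the compressor (a toy example; compressed
output goes to our stdout here):

    my $mbox = "From x\@example Thu Jan  1 00:00:00 1970\nSubject: hi\n\n";
    open my $zfh, '|-', 'gzip', '-c' or die "fork gzip: $!";
    print $zfh $mbox or die "print: $!";
    # close() on a piped handle waits for the child; $? has its exit status
    close $zfh or die $! ? "close failed: $!" : "compressor exited with \$?=$?";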
|
|
We've always forced LeiToMail to only have one process for v2
outputs anyways since v2 has its own sharding and IPC. Thus we
can use the single LeiToMail process directly to avoid extra IPC
overhead.
|
|
We should be able to treat v2 outputs just like any other mail
format, with the exception that content dedupe is always
enforced by the v2 format.
This allows users hosting v2 public-inboxes to catch up broken
synchronization from alternate archives such as the mbox
archives hosted by https://lists.gnu.org/
Link: https://public-inbox.org/meta/20231114-hypersonic-papaya-starling-e1cfc8@nitro/
|
|
Unlike modern Perls, Perl 5.16.3 on CentOS doesn't accept
negative string signals like "-TERM".
This only became a problem since commit b231d91f42d7
(treewide: enable warnings in all exec-ed processes)
made our code stricter by enabling more warnings.
In both cases, the kill is probably unnecessary and safe
to remove since we can rely on closing sockets to drop
processes.
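For reference, a portable way to signal a whole process group on
older Perls is the numeric negative form (a sketch, not what the
removed code did; SIGCONT is used so running it is harmless):

    use POSIX ();
    my $pgid = getpgrp(0); # our own process group, only for illustration
    # newer Perls accept the string form kill('-TERM', $pgid), but
    # Perl 5.16.3 does not; the numeric negative form works everywhere
    kill(-(POSIX::SIGCONT()), $pgid) or warn "kill: $!";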
|
|
We can rely on ProcessIO->DESTROY to close and reap
in these cases. This is the final step in eliminating
the wantarray invocations of popen_rd (and popen_wr).
|
|
Having queries in the process titles makes it easier to diagnose
stuck queries due to IPC problems. This was used to diagnose
commit e97a30e7624d (lei: fix SIGPIPE on large result sets to pager).
|
|
When dealing with large search results, we need to deal with
EPIPE not just from the pager, but also EPIPE or ECONNRESET
between lei_xsearch and lei2mail processes.
Without this fix, lei_xsearch processes could linger and get
stuck writing to dead lei2mail processes if a user aborts the
pager early during a large result set.
To ensure lei_xsearch processes don't linger around after
lei2mail workers all die, we must close $l2m->{-wq_s2} before
spawning lei_xsearch processes, since $l2m->{-wq_s2} is only
used in lei2mail workers.
For `git cat-file' processes, we also need to trigger
PublicInbox::Git->close to handle unpredictable destructor
ordering to avoid using uninitialized IO refs. This combines
with the `git_to_mail' change to deal with process cleanup
handling from premature shutdowns.
To test all this, we can't just rely on a single message being
large; the result set also needs to be large enough to saturate
the lei_xsearch -> lei2mail socket, so we rely on GIANT_INBOX_DIR
once again.
|
|
This will open the door for us to drop `tie' usage from
ProcessIO completely in favor of OO method dispatch. While
OO method dispatches (e.g. `$fh->close') are slower than normal
subroutine calls, it hardly matters in this case since process
teardown is a fairly rare operation and we continue to use
`close($fh)' for Maildir writes.
|
|
We only need to write one byte at MUA start instead of a byte
for every LeiXSearch worker. Also, make sure it succeeds by
enabling autodie for syswrite.
When reading, we can rely on `:perlio' layer `read' semantics
to retry on EINTR to avoid looping and other error checking.
|
|
We use this in various places to minimize or maximize pipe
size on Linux. So keep it all in one place.
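A sketch of what such a helper does on Linux (the constants are
Linux-specific fcntl(2) commands):

    # Linux-only: F_SETPIPE_SZ == 1031, F_GETPIPE_SZ == 1032
    use constant { F_SETPIPE_SZ => 1031, F_GETPIPE_SZ => 1032 };
    pipe(my $rd, my $wr) or die "pipe: $!";
    # grow the pipe buffer; the kernel rounds the request up as needed
    fcntl($wr, F_SETPIPE_SZ, 1048576) or warn "F_SETPIPE_SZ: $!";
    my $size = fcntl($wr, F_GETPIPE_SZ, 0); # current capacity in bytes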
|
|
We don't want to end up dumping nr_seen/nr_write when progress
is disabled, nor do we want forked-off `lei note-event' workers
to dump them when DS->Reset is called on fork.
|
|
Instead of having tail(1) follow a file when we're in verbose
mode, unconditionally pipe stderr to a Perl 2-liner which tees
its output to a regular file with line buffering.
POSIX tee(1) isn't suitable for this task since it's required
to be completely unbuffered while we want line-buffering when
running parallel processes. Fortunately, Perl makes this easy.
This also means we no longer leave curl-err.XXXX files around
on premature shutdown if we're hit by a SIGKILL or similar and
can't exit normally.
We do need to stop and respawn the Perl process if we hit a curl
error, though, since we need to be certain the output is
flushed.
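The idea, roughly (a sketch, not the exact code lei uses):

    # invoked with curl's stderr piped to our stdin; tee each line to a
    # log file and to our own stderr, flushing per line
    my $logfile = shift @ARGV;
    open my $log, '>>', $logfile or die "open $logfile: $!";
    select((select($log), $| = 1)[0]); # enable autoflush on $log
    while (defined(my $line = <STDIN>)) {
        print $log $line;
        print STDERR $line;
    }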
|
|
The `binmode' perlop can only take two scalars, so passing
`@_' blindly won't work since prototypes are checked. With that
fixed, we can get IO::Uncompress::Gunzip working properly
with ProcessIO and use it for curl.
We'll also just autodie (instead of warn) on FS errors when
dealing with curl stderr, since the process will likely be
in bigger trouble soon, anyways.
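A sketch of the kind of delegation this requires (the wrapper and
its {fh} field are illustrative, not the actual ProcessIO
internals):

    # passing @_ straight through to binmode() won't work because its
    # prototype is checked; unpack and pass plain scalars instead
    sub BINMODE {
        my ($self, $layer) = @_;
        defined($layer) ? binmode($self->{fh}, $layer) : binmode($self->{fh});
    }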
|
|
It's safer against deadlocks and we still get proper error
reporting by passing stderr across in addition to the lei
socket.
|
|
We already have an ->incr callback we can enhance to support
multiple counters with a single request. Furthermore, we can
just flatten the object graph by storing counters directly in
the $lei object itself to reduce hash lookups.
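A minimal sketch of the flattened approach, assuming the counters
live directly in the $lei hashref (field names illustrative):

    # bump one or more counters in $lei with a single request
    sub incr {
        my ($lei, @counters) = @_;
        ++$lei->{$_} for @counters;
    }
    # e.g. $lei->incr(qw(-nr_seen -nr_write));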
|
|
This will make switching $lei contexts less error-prone
and hopefully save us from some surprising bugs in the future.
Followup-to: 759885e60e59 (lei: ensure --stdin sets %ENV and $current_lei, 2023-09-14)
|
|
When using isearch (that is, a v1/v2 inbox relying on extindex
for search), there's actually no guarantee that IMAP UIDs
are in the correct order with regard to Xapian docids.
Thus we must iterate through every UID (num) to see if it's
suitable to display in a saved search. The old grep filter
(before commit a6fe84489127) was not effective since it
didn't account for the mset->items correspondence.
Fortunately, as of a6fe84489127 this bug merely manifests as
reduced performance. Prior to that, it could cause incorrect
keywords and labels to be applied.
Unfortunately, this behavior is hard to test, so no test case
is included.
Followup-to: a6fe84489127 (lei up: fix missing -t/--threads matches w/ saved search)
|
|
We must not filter seen docids out of the mset itself, only out
of the result of over->expand_thread.
|
|
Xapian can't parse every query, so ensure we set the
exit code for the client.
|
|
This brings t/lei-index.t back down from ~8s to ~3s. The reason
I didn't notice this before is that the LeiNoteEvent timer was
firing every 5s and clearing circular refs, and parallel testing
meant the delay got hidden.
Fixes: 4a2a95bbc78f99c8 (ipc+lei: switch to awaitpid, 2023-01-17)
|
|
This avoids awkwardly stuffing an arrayref into callbacks
which expect multiple arguments. IPC->awaitpid_init now
allows pre-registering callbacks before spawning workers.
|
|
While users may specify relative paths for convenience on the
command-line, absolute paths are required for `lei up' since
that (especially `lei up --all') could run from anywhere.
Note that we need to do this when parsing the command-line
options, since shortcuts matching externals by URL path
components are allowed for `lei q', and those same shortcuts may
remain in effect for `lei up' since the underlying external may
be moved to a different URI host.
|
|
I ended up with my $HOME in
~/.cache/lei/all_locals_ever.git/objects/info/alternates
and am trying to avoid that in the future.
|
|
It's a needless wrapper, nowadays. Originally, ->over was added
on an experimental basis to optimize for /$INBOX/ where Xapian
->search is slower on gigantic (LKML-sized) inboxes.
Nowadays with extindex, ->over is here to stay given NNTP and
IMAP both benefit from it. So reduce the interpreter stack
overhead and just access ->over directly.
lxs->recent was never used outside of tests, anyways.
And while we're in the area, avoid needlessly bumping the
refcount of $ctx->{ibx} in View::paginate_recent.
|
|
This may help track down deduplication or other bugs in lei
which lead to occasionally missing messages.
Link: https://public-inbox.org/meta/CAL_JsqJH8xx_2NyZffNsRXbGXiv3kjmCETvKXt3Yfb0uToLm9Q@mail.gmail.com/
|
|
There's no need to check for two fields when one will suffice.
|
|
Following commit 57fed2e4b78ed394 (lei: normalize whitespace in
remote queries, 2021-09-11), leaving the trailing `\n' from
stdin queries to be normalized to ` ' (SP) causes it to appear
as `+' in URLs, which Xapian ignores.
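A sketch of the idea (the actual change may differ): trim the
ends of the query before collapsing internal whitespace, so no
trailing SP ever reaches the URL:

    my $query = do { local $/; <STDIN> } // '';
    # strip leading/trailing whitespace (including the trailing "\n")
    $query =~ s/\A\s+//;
    $query =~ s/\s+\z//;
    $query =~ s/\s+/ /g; # collapse internal runs to a single SP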
|
|
SIGPIPE and SIGTERM are common and user-induced, so they're
not worth warning on. Add the value of "$?", though, since
it can help users notice other errors (e.g. SIGSEGV).
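Roughly the shape of the check (a sketch; the real code and
message wording differ):

    use POSIX ();
    # $? from the reaped child: the low 7 bits are the terminating signal
    my $sig = $? & 127;
    if ($sig && $sig != POSIX::SIGPIPE() && $sig != POSIX::SIGTERM()) {
        warn "child died abnormally: \$?=$?\n"; # e.g. SIGSEGV shows up here
    }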
|
|
We need to update the {-nr_remote_eml} counter regardless
of progress display being enabled since it's needed for
saved searches. We'll also split out the {-imported} flag
separately and only call LeiStore->done if a new message
was imported.
Note: this change is NOT expected to fix errors reported by
Thomas in <ebf92218-1470-4602-b534-6dae59639dc6@t-8ch.de>
Cc: Thomas Weißschuh <thomas@t-8ch.de>
|
|
This will make future developments easier.
|
|
This allows "lei up" to continue processing unrelated externals
if one output fails.
|
|
Relying on $lei->fail is unsustainable since there'll always
be parts of our code and dependencies which can trigger die()
and break the event loop.
|
|
v2w->wq_do('done') may die on I/O errors, and likely other
places. Just guard the entire block with an eval and ->fail
as appropriate.
|
|
Simplify our APIs and force dwaitpid() to work in async mode for
all lei workers. This avoids leaving lingering zombies behind
for parallel searches if one worker finishes well before another.
The old distinction between "old" and "new" workers was
needlessly complex, error-prone, and embarrassingly bad.
We also never handled v2:// writers properly before on
Ctrl-C/Ctrl-Z (SIGINT/SIGTSTP), so add them to @WQ_KEYS
to ensure they get handled by $lei when appropriate.
|
|
When importing several sources in parallel via http(s) mboxrd,
we need to be able to get keywords of uncommitted documents
directly from shard workers. Otherwise, Xapian DocNotFound
errors happen because the read-only LeiSearch won't see
documents from uncommitted transactions. Keep in mind that it's
possible the keywords can be changed on-the-fly even for
uncommitted documents because of inotify watches from LeiNoteEvent.
|
|
By relying more on pgroups for the remaining processes,
this lets us pause all curl+tail subprocesses with a single
kill(2) to avoid cluttering stderr.
We won't bother pausing the pigz/gzip/bzip2/xz compressor
processes nor cat-file processes, though, since those don't
write to the terminal and they idle soon after the workers react
to SIGSTOP.
AutoReap is hoisted out from TestCommon.pm. CLONE_SKIP
is gone since we won't be using Perl threads any time
soon (they're discouraged by the maintainers of Perl).
|
|
This lets users Ctrl-Z from their terminal to pause an entire
git-clone process hierarchy.
|
|
Large chunks of our codebase and 3rd-party dependencies do not
use ->{psgi.errors}, so trying to standardize on it was a
fruitless endeavor. Since warn() and carp() are standard
mechanisms within Perl, just use those instead and simplify a
bunch of existing code.
|
|
warn() is easier to augment with context information, and
frankly unavoidable in the presence of 3rd-party libraries
we don't control.
|
|
Since switching to SOCK_SEQPACKET, we no longer have to use
fixed-width records to guarantee atomic reads. Thus we can
maintain more human-readable/searchable PktOp opcodes.
Furthermore, we can infer the subroutine name in many cases
to avoid repeating ourselves by specifying a command name
twice (e.g. `$ops->{CMD} => [ \&CMD, $obj ]' can now simply be
written as `$ops->{CMD} => [ $obj ]' if CMD is a method of
$obj).
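A sketch of that dispatch convention (illustrative, not the
actual PktOp code):

    # $ops maps opcode names to [ $coderef_or_obj, @extra_args ]
    sub dispatch {
        my ($ops, $op, @payload) = @_;
        my ($cb, @args) = @{$ops->{$op} // return};
        ref($cb) eq 'CODE' ? $cb->(@args, @payload)      # explicit coderef
                           : $cb->$op(@args, @payload);  # infer method from opcode
    }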
|
|
Sometimes a user (e.g. me) isn't really sure what timezone
they're in...
|
|
It's probably least confusing for user-facing messages to
display times in the user's configured timezone. I considered
appending "UTC" to the message and sticking with gmtime(), too,
but this output isn't intended to be web-cache friendly, nor do
we expect users across multiple timezones to view the same
output.
|
|
Avoid slurping gigantic (e.g. 100000) result sets into a single
response if a giant limit is specified, and instead use 10000
as a window for the mset with a given offset. We'll also warn
and hint about the --limit= switch when the estimated
result set is larger than the default limit.
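A sketch of the windowing loop, assuming a Xapian-style enquire
object ($enq), a user-requested $limit, and an estimated total
match count $est (all names illustrative):

    my $window = 10000;
    for (my $off = 0; $off < $limit; ) {
        my $n = $limit - $off;
        $n = $window if $n > $window;
        my $mset = $enq->get_mset($off, $n); # Search::Xapian::Enquire
        last unless $mset->size;
        # ... emit each result in this window ...
        $off += $mset->size;
    }
    warn "showing $limit of ~$est matches, see --limit=\n" if $est > $limit;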
|
|
Overwriting existing destinations is safe (but slow) by default,
so show a progress message noting what we're doing while
a user waits.
|
|
If lcat-ing multiple argument types (blobs vs folders),
maintain the original order of the arguments instead of
dumping all blobs before folder contents.
|
|
We're not using Data::Dumper for JSON output.
|