|
While FD passing is critical for script/lei <=> lei-daemon,
lei-daemon doesn't need to use it internally if FDs are
created in the proper order before forking.
|
|
We don't want circular references giving surprising behavior
during worker exit.
|
|
Now that --stdin support is sorted, we can delay spawning
workers until we know the query is ready-to-run.
|
|
"next" apparently doesn't work in "do {} while" loops,
so just use "while" as it makes no difference, here.
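A minimal demonstration of the quirk, assuming a perl(1) in $PATH (variable names and values are illustrative):

```shell
# Perl's do{}while is not a real loop block, so "next" inside it
# is a runtime error; a plain while loop supports "next" normally:
perl -e 'my $i = 0; while ($i < 3) { $i++; next if $i == 2 } print $i'
# prints "3"

# The do{}while form dies at runtime once "next" is reached:
perl -e 'my $i = 0; do { $i++; next if $i == 2 } while ($i < 3)' \
    2>/dev/null || echo 'next failed as expected'
```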
|
|
|
|
Nginx doesn't appear to be happy with just -XPOST, so use -d '' to
avoid potential confusion about why the instructions aren't working.
cf. commit 533e1234bc03a1ca8754d249aa8c2ce157e26780
(lei_xsearch: use curl -d '' for nginx compatibility, 2021-01-24)
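For reference, the difference on the command line (URL is illustrative; a bare -XPOST may produce a body-less request some nginx configurations reject, while -d '' sends an empty body and implies POST):

```shell
# may be rejected by nginx:
#   curl -XPOST 'https://example.com/inbox/?x=m&q=s:test'
# preferred; empty body, POST implied:
curl -d '' 'https://example.com/inbox/?x=m&q=s:test'
```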
|
|
The features we use for SharedKV could probably be implemented
with GDBM_File or SDBM_File, but that doesn't seem worth it at
the moment since we depend on SQLite elsewhere.
|
|
There'll probably be more things which work on both GNU and
*BSD systems which we don't need separate strings for.
|
|
This will be useful on shared machines when a user doesn't want
search queries visible to other users looking at the ps(1)
output or similar.
|
|
Perl may internally race and miss signals due to a lack of
self-pipe / eventfd / signalfd / EVFILT_SIGNAL usage. While our
event loop paths avoid these problems by using signalfd or
EVFILT_SIGNAL, these sleep() calls are not within the event loop.
|
|
Given the presence of one external on a certain host or prefix
path, it's logical that other inboxes would share a common
prefix. For bash users, attempt to complete that using the
"-o nospace" option of bash.
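A rough sketch of the idea (the function name and URLs are illustrative, not lei's actual completion code):

```shell
# "-o nospace" keeps readline from appending a space after a
# completed match, so the user can keep typing past a shared
# URL prefix:
_lei_url_complete() {
    local cur=${COMP_WORDS[COMP_CWORD]}
    COMPREPLY=($(compgen -W \
        'https://example.com/foo/ https://example.com/bar/' -- "$cur"))
}
complete -o nospace -F _lei_url_complete lei
```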
|
|
Because some commands have many options which take up
multiple screens.
|
|
This will make it even easier for RSI-afflicted users to use,
since many externals may share a common prefix.
|
|
We can do basename matching when it's unambiguous. Since '*?[]'
characters are rare in URLs and pathnames, we'll do glob
matching by default and support a (curl-inspired) --globoff/-g
option to disable globbing.
And fix --exclude while we're at it.
|
|
IO::Uncompress::Gunzip seems to be losing $? when closing
PublicInbox::ProcessPipe. To work around this, do a synchronous
waitpid ourselves to force proper $? reporting, and update tests
to use the new --only feature for testing invalid URLs.
This improves internal code consistency by having {pkt_op}
parse the same ASCII-only protocol script/lei understands.
We no longer pass {sock} to worker processes at all,
further reducing FD pressure on per-user limits.
|
|
|
|
No reason to check for $lei->{oneshot} here.
|
|
We don't need to send errors directly to the client; instead,
go through lei-daemon or the top-level one-shot process.
|
|
lei2mail doesn't need stdin anymore, so we can use the [0] slot
for the $not_done keepalive purposes.
|
|
This comma-delimited parameter allows controlling the number of
lei_xsearch and lei2mail worker processes. With the change
to make IPC wq_* work use the event loop, it's now safe to
run fewer worker processes for searching with no risk of
deadlocks.
MAX_PER_HOST isn't configurable yet for remote hosts,
and maybe it shouldn't be due to potential for abuse.
|
|
We won't be reporting progress when output is going to stdout
since it can clutter up the terminal unless stderr != stdout,
which probably isn't worth checking.
We'll also use a more agnostic mset_progress which may
make it easier to support worker-less invocations.
|
|
We may reuse these objects in the non-worker code paths.
|
|
We can safely rely on exit(0) here when interacting with curl(1)
and git(1), unlike query workers which hit Xapian directly,
where some badness happens when hit with a signal while
retrieving an mset.
|
|
In case we have other bugs in our code.
|
|
|
|
Avoid on-stack shortcuts which may prevent destructors from
firing since we're not inside the event loop. We'll also tidy
up the unlink mechanism in LeiOverview while we're at it.
|
|
The daemon must not be fooled into thinking it's in oneshot
after a lei client disconnects and erases {sock}.
|
|
We may have further URLs to read in that process, so ensure
we don't end up having tail send stale data.
|
|
Because .onion URL names are long!
|
|
Some of our command-line switches take no arguments, and need
no completion for those arguments.
|
|
-I is short for --include since it's standard for C compilers
(along with Perl and Ruby). There are no single-character
shortcuts for --exclude or --only, since I don't expect
--exclude to be used very often and --only is already short (and
will support shell completion).
|
|
Sometimes it can be confusing for "lei q" to finish writing to a
Maildir|mbox and not know if it did anything. So show some
per-external progress and stats.
These can be disabled via the new --quiet/-q switch.
We differ slightly from mairix(1) here, as we use stderr
instead of stdout for reporting totals (and we support
parallel queries from various sources).
|
|
Otherwise, we were only getting 50 matches without (-t)
thread expansion.
|
|
This will allow us to use larger messages and lets progress
reporting accumulate in the main daemon.
|
|
$_ at the top of a potentially deep stack below may cause
surprising behavior as I experienced with ExtSearchIdx. In the
future, we'll limit our $_ usage to easily-auditable bits (e.g.
map, grep, and small for loops).
|
|
The Perl 5 stack is weakly-referenced for performance reasons.
This means it's possible for items in the stack to be freed
while executing further down the stack.
In lei (and perhaps public-facing read-only daemons in the
future), we'll fork and call PublicInbox::DS->Reset in the child
process. This causes %DescriptorMap to be clobbered, allowing
the $DescriptorMap{$fd} arg to be freed inside the child
process.
When Carp::confess or Carp::longmess is called to generate a
backtrace, it may access the @DB::args array. This array access
is not protected by reference counting and is known to cause
segfaults and other weird errors.
While the caller of an unnecessary Carp::confess may be
eliminated in a future commit, we can't guarantee our
dependencies will be free of @DB::args access attempts
in the future.
So guard against this Perl 5 quirk by defensively bumping the
refcount of any object we call ->event_step on.
cf. https://rt.perl.org/Public/Bug/Display.html?id=131046
https://github.com/Perl/perl5/issues/15928
|
|
This avoids a zombie if another step of the event loop
takes too long.
|
|
These DBs will probably be too small to be worth increasing the
cache size of.
|
|
At most, we'll only warn once per worker when a Maildir
disappears from under us. We'll also use the '!' OpPipe
to note the exceptional condition, and '|' for SIGPIPE,
so it'll be a bit easier for hackers to remember.
|
|
This allows us to avoid repeated open() and close() syscalls
and speeds up the new xt/stress-sharedkv.t maintainer test
by roughly 7%.
|
|
PublicInbox::Listener unconditionally sets O_NONBLOCK upon
accept(), so we need a larger timeout under heavy load since
there's no "dataready" accept filter on the listener.
With O_NONBLOCK already set, we don't have to set it at
->event_step_init.
|
|
It may be possible for updates or changes to be uncommitted
until disconnect, so we'll use flock() as we do elsewhere
to avoid the polling retry behavior of SQLite.
We also need to clear CachedKids before disconnecting
to avoid warnings like:
->disconnect invalidates 1 active statement handle
(either destroy statement handles or call finish on
them before disconnecting)
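The locking pattern, sketched with flock(1) instead of Perl's flock() (file names are illustrative):

```shell
# Writers take an exclusive lock up front, so they queue on the
# lock instead of spinning in SQLite's busy/retry handler:
exec 9>/tmp/sharedkv.lock
flock -x 9        # blocks until the previous writer releases
echo "write under lock" > /tmp/sharedkv.out
flock -u 9        # release for the next writer
```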
|
|
We don't need to send the temporary xsearch {git} object over to
workers, just the directory name.
|
|
We use $smsg->populate here, so ensure it's loaded although
PublicInbox::Search currently loads it.
|
|
While it's loaded by ContentHash, we use Digest::SHA directly in
this package for smsg and OID-only deduplication.
|
|
This prevents SharedKV->DESTROY in lei-daemon from triggering
before DB handles are closed in lei2mail processes. The
{each_smsg_not_done} pipe was not sufficient in this case:
that gets closed at the end of the last git_to_mail callback
invocation.
|
|
This may be needed to avoid warnings/errors when
operating in single process mode in the future.
|
|
It doesn't seem necessary now that we redirect and write
stuff to errors.log, which gets checked every run.
|
|
ETOOMANYREFS is probably an unfamiliar error to most users, so
give a hint about RLIMIT_NOFILE. This can be hit on my system
running 3 simultaneous queries with my system default limit of
1024.
There's also no need to import Errno constants for uncommon
errors, so we'll stop using Errno, here.
We'll also try to bump RLIMIT_NOFILE as much as possible
to avoid this error.
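Roughly what bumping the limit looks like in shell terms (lei would do the equivalent via setrlimit internally):

```shell
# Raise the soft RLIMIT_NOFILE to the hard limit so parallel
# queries have headroom for pipes and sockets:
soft=$(ulimit -Sn)
hard=$(ulimit -Hn)
if [ "$hard" != unlimited ] && [ "$soft" != "$hard" ]; then
    ulimit -Sn "$hard"
fi
```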
|
|
It doesn't save us any code, and the action-at-a-distance
element was making it confusing to track down actual problems.
Another potential problem was keeping references alive too long.
So, as we would in a C100K server, check every write while
still ensuring lei(1) exits with a proper SIGPIPE iff needed.
|