The PublicInbox::Eml (and previously Email::MIME) use of confess
was the primary (or only) culprit behind the lei2mail segfaults
fixed by commit 0795b0906cc81f40
("ds: guard against stack-not-refcounted quirk of Perl 5").
We never care about a backtrace when dealing with Eml objects
anyway, so it was just a worthless waste of CPU cycles.
We can also drop confess in a few other places. Since we only
use Perl and Inline::C, users will never be without source
and can apply s/croak/Carp::confess/ on a per-callsite basis
to help report problems.
It's also possible to set PERL5OPT=-MCarp=verbose in the
environment, though that remains potentially risky.
Link: https://public-inbox.org/meta/20210201082833.3293-1-e@80x24.org/
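A hedged sketch of the tradeoff (a hypothetical callsite, not
code from this commit):

    use Carp qw(croak);

    sub check_hdr {
        my ($hdr) = @_;
        # croak reports from the caller's perspective with no
        # backtrace, avoiding the CPU cost confess pays on every
        # failure:
        defined $hdr or croak 'header required';
    }

    # when debugging, s/croak/Carp::confess/ above restores the
    # backtrace; or, without editing source:
    #   PERL5OPT=-MCarp=verbose lei q ...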
---
Only tested with .eml files so far, but Maildir + IMAP
will be supported.
---
While FD passing is critical for script/lei <=> lei-daemon,
lei-daemon doesn't need to use it internally if FDs are
created in the proper order before forking.
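A minimal sketch of the idea: create the pipe before fork() so
both processes inherit it, with no SCM_RIGHTS passing needed:

    pipe(my $r, my $w) or die "pipe: $!";
    my $pid = fork // die "fork: $!";
    if ($pid == 0) { # child: keep the read side only
        close $w;
        print while <$r>;
        exit 0;
    }
    close $r; # parent: keep the write side only
    print $w "created before fork, shared by inheritance\n";
    close $w;
    waitpid($pid, 0);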
---
We don't want circular references giving surprising behavior
during worker exit.
---
This comma-delimited parameter allows controlling the number of
lei_xsearch and lei2mail worker processes. With the change
to make IPC wq_* work use the event loop, it's now safe to
run fewer worker processes for searching with no risk of
deadlocks.
MAX_PER_HOST isn't configurable yet for remote hosts,
and maybe it shouldn't be due to potential for abuse.
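A hypothetical sketch of the parsing (variable names are
illustrative, not from this commit):

    my $jobs = '4,2'; # e.g. from a --jobs=4,2 switch
    my ($nr_xsearch, $nr_l2m) = split(/,/, $jobs);
    $nr_xsearch ||= 4; # fall back to a default if a field
    $nr_l2m ||= 2;     # is empty or zero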
---
It can be confusing for "lei q" to finish writing to a
Maildir|mbox without the user knowing whether it did anything.
So show some per-external progress and stats.
These can be disabled via the new --quiet/-q switch.
We differ slightly from mairix(1) here, as we use stderr
instead of stdout for reporting totals (and we support
parallel queries from various sources).
---
This prevents SharedKV->DESTROY in lei-daemon from triggering
before DB handles are closed in lei2mail processes. The
{each_smsg_not_done} pipe was not sufficient in this case:
that gets closed at the end of the last git_to_mail callback
invocation.
---
ETOOMANYREFS is probably an unfamiliar error to most users, so
give a hint about RLIMIT_NOFILE. This can be hit when running 3
simultaneous queries with my system's default limit of 1024.
There's also no need to import Errno constants for uncommon
errors, so we'll stop using Errno here.
We'll also try to bump RLIMIT_NOFILE as much as possible
to avoid this error.
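A hedged sketch, assuming BSD::Resource is installed (the hint
text is illustrative):

    use strict;
    use Socket qw(AF_UNIX SOCK_STREAM PF_UNSPEC);

    # raise the soft RLIMIT_NOFILE as far as the hard limit allows:
    if (eval { require BSD::Resource; 1 }) {
        my ($soft, $hard) =
            BSD::Resource::getrlimit(BSD::Resource::RLIMIT_NOFILE());
        BSD::Resource::setrlimit(BSD::Resource::RLIMIT_NOFILE(),
            $hard, $hard) if $soft < $hard;
    }

    # checking for an uncommon error without importing Errno
    # constants; referencing %! loads Errno.pm on demand:
    socketpair(my $s1, my $s2, AF_UNIX, SOCK_STREAM, PF_UNSPEC) or die;
    unless (defined send($s1, 'x', 0)) {
        warn $!{ETOOMANYREFS} ?
            "hint: raise RLIMIT_NOFILE (ulimit -n)\n" : "send: $!\n";
    }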
---
It doesn't save us any code, and the action-at-a-distance
element was making it confusing to track down actual problems.
Another potential problem was keeping references alive too long.
So do as we would in a C100K server and check every write,
while still ensuring lei(1) exits with a proper SIGPIPE
iff needed.
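A minimal sketch of the idiom, assuming SIGPIPE is normally
ignored so write failures surface as EPIPE (checked_write is a
hypothetical helper, not this commit's code):

    sub checked_write {
        my ($fh, $buf) = @_;
        return 1 if print $fh $buf;
        if ($!{EPIPE}) { # reader is gone; exit via SIGPIPE iff needed
            $SIG{PIPE} = 'DEFAULT';
            kill('PIPE', $$); # terminates with the expected status
        }
        die "write: $!";
    }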
---
This will let us maximize the capability of our asynchronous
git API. It lets us avoid relying on EOF to notify lei2mail
workers, giving us the option of running fewer lei_xsearch
worker processes in parallel than local sources.
I tried using a synchronous git API, and even libgit2 in the
same process (to avoid the IPC cost) failed to match the
throughput afforded by this change. This is because libgit2 is
built (at least on Debian) with the SHA-1 collision-detection
code enabled, and its ubc_check was dominating my profiles.
---
It saves us a line of code.
---
Localize signals inside the respective worker loops
in case there are circular references.
We'll also rely on OnDestroy to trigger exits from
ipc_worker_loop like we do with wq_worker_loop, and
add some more developer documentation to help future
developers.
The default signals remain different, for now.
Clean up some unnecessary "use" statements while we're
loading OnDestroy.
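A generic sketch of the pattern (DestroyGuard stands in for the
actual PublicInbox::OnDestroy API, which differs):

    package DestroyGuard;
    sub new { my ($cls, $cb) = @_; bless { cb => $cb }, $cls }
    sub DESTROY { $_[0]->{cb}->() }

    package main;
    sub worker_loop {
        my $done;
        # local(ized) handlers can't leak via circular references:
        local $SIG{TERM} = sub { $done = 1 };
        my $guard = DestroyGuard->new(sub { warn "worker exiting\n" });
        sleep 1 until $done; # stand-in for the real dispatch loop
    } # $guard fires here, even if the loop dies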
---
This will be useful for pre-sharing certain file handles.
---
Just open every FD as read/write. Perl (or any non-broken
runtime) won't care and won't attempt to use F_SETFL to alter
file description flags, since changing those would lead to
unpleasant side effects if the file description is
shared with another process.
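A minimal sketch (FD 3 here is an arbitrary example of an
inherited descriptor number):

    # dup FD 3 into a Perl handle opened read/write; no F_SETFL
    # is attempted on the underlying file description:
    open(my $fh, '+<&=', 3) or die "open FD 3: $!";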
---
This should not be needed, but somebody using lei could
theoretically create thousands of external URLs and
only have a handful of workers, which means the per-worker
URI list could be large.
---
This prevents name conflicts leading to retries and slowdowns in
temporary file name generation. No actual data corruption
resulted because all temporary files are opened with O_EXCL
anyway.
This may increase security for IMAP, NNTP, and HTTPS sessions
using TLS, but it's all public data anyway.
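Assuming the underlying issue is forked workers sharing
inherited RNG state, a minimal sketch of why per-worker
reseeding matters:

    my $state = rand; # parent uses the RNG before forking
    for (1..2) {
        my $pid = fork // die "fork: $!";
        if ($pid == 0) {
            srand(); # without this, siblings inherit identical
                     # RNG state and generate colliding names
            printf "%d: %d\n", $$, int(rand(1e9));
            exit 0;
        }
    }
    wait for 1..2;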
---
The signal handlers on the client side were unnecessary;
all we need is to handle socket EOF properly in the daemon
by killing xsearch and l2m workers.
---
$wq->{-ipc_atfork_child_close} needed to be initialized properly.
We also start setting $0 in workers to improve visibility in
ps(1) output.
---
With 4 dedicated workers, this seems to provide a 100-120%
speedup on a 4 core machine when writing thousands of search
results to a Maildir or mbox. This also sets us up for
high-latency IMAP destinations in the future.
This opens the door to more speedup opportunities such
as optimizing dedupe locking and other ways to reduce
contention.
This change is fairly complex and convoluted, unfortunately.
Further work may allow us to simplify it and even improve
performance.
---
Children should not be blindly killing siblings on ->DESTROY
since they're typically shorter-lived than parents. We'll
also be more careful about on-stack variables and now we
can rely exclusively on delete ops to close FDs.
We also need to fix our SIGPIPE handling for the oneshot case
while fixing a typo for delete, so we write "!" to the EOF pipe
to ensure the parent oneshot process exits on the first worker
that hits SIGPIPE, rather than waiting for the last worker to
hit SIGPIPE.
---
The new test ensures consistency between oneshot and
client/daemon users. Cancelling an in-progress result now also
stops xsearch workers to avoid wasted CPU and I/O.
Note the lei->atfork_child_wq usage changes; they work around
a bug in Perl 5: http://nntp.perl.org/group/perl.perl5.porters/258784
<CAHhgV8hPbcmkzWizp6Vijw921M5BOXixj4+zTh3nRS9vRBYk8w@mail.gmail.com>
This switches the internal protocol to SOCK_SEQPACKET AF_UNIX
sockets to prevent merging of the messages the daemon sends the
client to run the pager and to kill/exit the client script.
---
It's easier to make the code more generic by transferring
all four FDs (std(in|out|err) + socket) instead of omitting
stdin.
We'll be reading from stdin on some imports, and possibly
outputting to stdout, so omitting stdin now would needlessly
complicate things.
The differences between the IO::FDPass ("1" FD) code paths and
the "4" FD code paths used by Inline::C and Socket::MsgHdr are
far too much to support and test at the moment.
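A sketch of the "4" code path using Socket::MsgHdr (the function
name is illustrative):

    use Socket qw(SOL_SOCKET SCM_RIGHTS);
    use Socket::MsgHdr qw(sendmsg);

    sub send_std_fds { # stdin, stdout, stderr + client socket
        my ($sock, $client) = @_;
        my $mh = Socket::MsgHdr->new(buf => 'F'); # 1-byte payload
        $mh->cmsghdr(SOL_SOCKET, SCM_RIGHTS,
            pack('i*', 0, 1, 2, fileno($client)));
        sendmsg($sock, $mh) or die "sendmsg: $!";
    }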
---
Do a better job of closing FDs that we don't want shared with
the work queue workers. We'll also fix naming and use
"atfork_prepare" instead of "atfork_parent" to match
pthread_atfork(3) naming.
---
Relying on signal handlers to kill a particular worker was a
laggy/racy idea, so I gave up on targeting workers
explicitly and instead chose to make wq_worker_decr trigger
->wq_exit in the next idle worker.
We will however attempt to support sending signals to
a process group.
---
IO::FDPass is our last choice for implementing the workqueue
because its lack of atomicity makes it impossible to guarantee
all requests of a single group hit a single worker out of many.
So the only way to use IO::FDPass for workqueues is to have
only a single worker. A single worker still buys us a small
amount of parallelism because of the parent process.
---
Actually, sending 4 FDs will be useful for lei internal xsearch
work once we start accepting input from stdin. It won't be used
with the lightweight lei(1) client, however.
For WWW (eventually), a single FD may be enough.
---
Improve interactivity and user experience by allowing the user
to return to the terminal immediately when the pager is exited
(e.g. hitting the `q' key in less(1)).
This is a massive change which restructures query handling to
allow parallel search when --thread expansion is in use, and to
offload to a separate worker when --thread is not in use.
The Xapian query offload changes allow us to reenter the event
loop right away once the search(es) are shipped off to the work
queue workers.
This means the main lei-daemon process can forget the lei(1)
client socket immediately once it's handed off to worker
processes.
We now unblock SIGPIPE in query workers and send an exit(141)
response to the lei(1) client socket to denote SIGPIPE.
This also allows parallelization for users using "lei q" from
multiple terminals.
JSON output is currently broken and will need to be restructured
for more flexibility and fork-safety.
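A minimal sketch of the SIGPIPE plumbing (the daemon-side status
mapping is shown as a comment):

    use POSIX qw(sigprocmask SIG_UNBLOCK SIGPIPE);

    # in a query worker: undo any inherited blocking so a dead
    # pager terminates the writer promptly:
    sigprocmask(SIG_UNBLOCK, POSIX::SigSet->new(SIGPIPE))
        or die "sigprocmask: $!";

    # in the daemon, after reaping a worker ($? from waitpid):
    # my $code = ($? & 127) ? 128 + ($? & 127) : ($? >> 8);
    # SIGPIPE is 13, so a dead pager yields exit(141) for lei(1)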
---
We'll enable automatic cleanup when IPC classes go out-of-scope
to avoid leaving zombies around.
->wq_workers will be a useful convenience method to change
worker counts.
---
It is not used anywhere.
---
Increasing/decreasing worker counts will be useful in
some situations.
---
We can just EOF the pipe, and instead rely on per-class
error handling to deal with uncommitted transactions and
whatnot.
---
This will allow any number of younger sibling processes to
communicate with older siblings directly without relying on a
mediator process. This is intended to be useful for
distributing search work across multiple workers without caring
which worker hits it (we only care about shard members).
And any request sent with this will be able to hit any worker
without locking on our part.
Unix stream sockets with a listener were also considered;
binding to a file on the FS may confuse users given there's
already a socket path for lei(1). Linux-only Abstract or
autobind sockets are rejected due to lack of portability.
SOCK_SEQPACKET via socketpair(2) was chosen since it's POSIX
2008 and available on FreeBSD 9+ in addition to Linux, and
doesn't require filesystem access.
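A minimal sketch of the chosen transport:

    use Socket qw(AF_UNIX SOCK_SEQPACKET PF_UNSPEC);

    socketpair(my $p, my $c, AF_UNIX, SOCK_SEQPACKET, PF_UNSPEC)
        or die "socketpair: $!";
    send($p, 'one request', 0) or die "send: $!";
    # message boundaries are preserved: exactly one request per
    # recv, so any idle sibling sharing $c may take it atomically
    defined(recv($c, my $buf, 4096, 0)) or die "recv: $!";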
---
We should not need an eval for warning with our code base.
Nowadays, dwaitpid() automatically does the right thing
regardless of whether we're in the event loop, so no eval
is needed there, either.
---
Similar to git->cat_async, this will let us deal with responses
asynchronously, as well as being able to mix synchronous and
asynchronous code transparently (though perhaps not optimally).
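A hypothetical illustration of the callback pattern (names here
are illustrative, not the actual PublicInbox::IPC API):

    my @inflight; # FIFO of [ $callback, $arg ] pairs
    sub req_async {
        my ($out, $req, $cb, $arg) = @_;
        print $out $req, "\n" or die "req: $!";
        push @inflight, [ $cb, $arg ];
    }
    sub on_response { # called by the event loop per response
        my ($resp) = @_;
        my ($cb, $arg) = @{shift @inflight};
        $cb->($resp, $arg); # synchronous callers just block here
    }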
---
This lets us call dwaitpid long before a process exits
and not have to wait around for it.
This is advantageous for lei where we can run dwaitpid on the
pager as soon as we spawn it, instead of waiting for a client
socket to go away on DESTROY.
---
This fixes a performance regression in multi-process v2 indexing
due to the switch to PublicInbox::IPC. While Unix sockets are
fewer FDs to manage, pipes allow unprivileged processes to use
larger buffers (up to 1M) on out-of-the-box Linux instances.
A larger buffer via F_SETPIPE_SZ afforded by pipes was proven
valuable during v2 development in 2018 and continues to be
valuable when we get significant amounts of one-way traffic from
the producer parent to worker children.
Compression may be an option for systems without F_SETPIPE_SZ;
but it increases CPU usage with no memory bandwidth savings on
hosts where larger buffers are available.
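A Linux-only sketch (F_SETPIPE_SZ is 1031, i.e.
F_LINUX_SPECIFIC_BASE + 7; the 1M unprivileged cap comes from
/proc/sys/fs/pipe-max-size):

    pipe(my $r, my $w) or die "pipe: $!";
    fcntl($w, 1031, 1048576) or warn "F_SETPIPE_SZ: $!";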
---
It's nice to prove the new code works by swapping it into
the current V2Writable / SearchIdxShard packages. This is
only the first step for the core bits, and we'll be able
to delete more code in a subsequent patch.
---
Fix some comments and add some short summary descriptions to
hopefully make things easier to follow.
---
Using "make update-copyrights" after setting GNULIB_PATH in my
config.mak
---
shutdown(2) on a socket can be preferable if there are multiple
forked processes writing to a single worker and we really want
to shut things down ASAP.
It may also be good to provide an ipc_worker_exit method which
subclasses can override if needed for graceful shutdown. But we
won't need equivalents to atexit(3) since we can rely on DESTROY
handlers, given this is Perl 5.
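A minimal sketch of the difference:

    use Socket qw(AF_UNIX SOCK_STREAM PF_UNSPEC SHUT_RDWR);

    socketpair(my $s1, my $s2, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
        or die "socketpair: $!";
    # close($s1) in one process only drops that process's
    # reference; shutdown() tears down the connection for every
    # process sharing the description:
    shutdown($s1, SHUT_RDWR) or die "shutdown: $!";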
---
Some testing will be needed to see if it's worth the code
and maintenance overhead, but it seems easy-enough to get
working.
---
I intend to use this with LeiStore when importing from multiple
slow sources at once (e.g. curl, IMAP, etc). This is because
over.sqlite3 can only have a single writer, and we'll have
several slow readers running in parallel.
Watch and SearchIdxShard should also be able to use this code
in the future, but this will be proven with LeiStore, first.