path: root/lib/PublicInbox/IPC.pm
Date Commit message
2021-02-07 treewide: replace confess with croak
The PublicInbox::Eml (and previously Email::MIME) use of confess was the primary (or only) culprit behind the lei2mail segfaults fixed by commit 0795b0906cc81f40 ("ds: guard against stack-not-refcounted quirk of Perl 5"). We never care about a backtrace when dealing with Eml objects anyway, so it was just a waste of CPU cycles. We can also drop confess in a few other places. Since we only use Perl and Inline::C, users will never be without source and can replace s/croak/Carp::confess/ on a per-callsite basis to help report problems. It's also possible to use PERL5OPT=-MCarp=verbose in the environment, though that is still potentially risky. Link: https://public-inbox.org/meta/20210201082833.3293-1-e@80x24.org/
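For illustration, a minimal sketch of the difference between the two Carp functions. Sketch::Parser and its subs are hypothetical names, not part of public-inbox; only croak and confess themselves come from Carp.

```perl
use strict;
use warnings;

package Sketch::Parser {    # hypothetical module for this sketch only
    use Carp qw(croak confess);
    sub parse_croak   { croak 'bad header' }
    sub parse_confess { confess 'bad header' }
}

# croak reports a single line, blamed on the caller's file and line
eval { Sketch::Parser::parse_croak() };
my $short = $@;

# confess reports the same message followed by a full backtrace
eval { Sketch::Parser::parse_confess() };
my $long = $@;

my $short_lines = ($short =~ tr/\n//);
my $long_lines  = ($long =~ tr/\n//);
print "croak: $short_lines line(s), confess: $long_lines line(s)\n";
```

Building the backtrace is the extra work confess does on every call, which is why swapping it back in per callsite is only suggested for debugging.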
2021-02-05 lei import: initial implementation
Only tested with .eml files so far, but Maildir + IMAP will be supported.
2021-02-05 lei q: reorder internals to reduce FD passing
While FD passing is critical for script/lei <=> lei-daemon, lei-daemon doesn't need to use it internally if FDs are created in the proper order before forking.
2021-02-05 ipc: localize fields assignment
We don't want circular references giving surprising behavior during worker exit.
2021-02-03 lei q: support --jobs [SEARCHERS],[WRITERS]
This comma-delimited parameter allows controlling the number of lei_xsearch and lei2mail worker processes. With the change to make IPC wq_* work use the event loop, it's now safe to run fewer worker processes for searching with no risk of deadlocks. MAX_PER_HOST isn't configurable yet for remote hosts, and maybe it shouldn't be due to potential for abuse.
2021-02-03 lei q: emit progress and counting via PktOp
Sometimes it can be confusing for "lei q" to finish writing to a Maildir|mbox and not know if it did anything. So show some per-external progress and stats. These can be disabled via the new --quiet/-q switch. We differ slightly from mairix(1) here, as we use stderr instead of stdout for reporting totals (and we support parallel queries from various sources).
2021-02-01 lei: keep $lei around until workers are reaped
This prevents SharedKV->DESTROY in lei-daemon from triggering before DB handles are closed in lei2mail processes. The {each_smsg_not_done} pipe was not sufficient in this case: that gets closed at the end of the last git_to_mail callback invocation.
2021-02-01 ipc: more helpful ETOOMANYREFS error messages
ETOOMANYREFS is probably an unfamiliar error to most users, so give a hint about RLIMIT_NOFILE. This can be hit on my system running 3 simultaneous queries with my system default limit of 1024. There's also no need to import Errno constants for uncommon errors, so we'll stop using Errno here. We'll also try to bump RLIMIT_NOFILE as much as possible to avoid this error.
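A rough sketch of how such a hint can be attached without importing Errno constants: Perl's built-in %! hash can be tested by error name. The helper name and the exact message wording below are assumptions for illustration, not the text used in IPC.pm.

```perl
use strict;
use warnings;

# Hypothetical helper: describe the current $! after a failed sendmsg(2),
# adding a hint for the one error most users will not recognize.
# $!{NAME} is true only when $! currently holds that error, and is
# false on platforms that do not define the name at all.
sub sendmsg_errmsg {
    my $hint = $!{ETOOMANYREFS} ? ' (check RLIMIT_NOFILE)' : '';
    "sendmsg: $!$hint";
}

$! = 2;    # ENOENT on Linux and the BSDs, used here only as an unrelated error
my $plain = sendmsg_errmsg();
print "$plain\n";
```

Any other error passes through unchanged, so the hint only appears where it is relevant.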
2021-02-01 lei: remove SIGPIPE handler
It doesn't save us any code, and the action-at-a-distance element was making it confusing to track down actual problems. Another potential problem was keeping references alive too long. So do like we would a C100K server and check every write while still ensuring lei(1) exits with a proper SIGPIPE iff needed.
2021-02-01 ipc: switch wq to use the event loop
This will let us maximize the capability of our asynchronous git API. This lets us avoid relying on EOF to notify lei2mail workers, thus giving us the option of running fewer lei_xsearch worker processes in parallel than local sources. I tried using a synchronous git API, and even with libgit2 in the same process to avoid the IPC cost, it failed to match the throughput afforded by this change. This is because libgit2 is built (at least on Debian) with the SHA-1 collision code enabled and ubc_check stuff was dominating my profiles.
2021-01-30 ipc: move on_destroy scope to inside the eval
It saves us a line of code.
2021-01-30 ipc: more consistent behavior between worker types
Localize signals inside the respective worker loops in case there are circular references. We'll also rely on OnDestroy to trigger exits from the ipc_worker_loop like we do with wq_worker_loop. Also add some more developer documentation to help future developers. The default signals remain different, for now. Clean up some unnecessary "use" statements while we're loading OnDestroy.
2021-01-30 ipc: wq: support passing fields to workers
This will be useful for pre-sharing certain file handles.
2021-01-24 ipc: get rid of wq_set_recv_modes
Just open every FD as read/write. Perl (or any non-broken runtime) won't care and won't attempt to use F_SETFL to alter file description flags; as attempting to change those would lead to unpleasant side effects if the file description is shared with another process.
2021-01-24 ipc: wq supports arbitrarily large payloads
This should not be needed, but somebody using lei could theoretically create thousands of external URLs and only have a handful of workers, which means the per-worker URI list could be large.
2021-01-24 treewide: reseed RNG in child processes
This prevents name conflicts leading to retries and slowdowns in temporary file name generation. No actual data corruption resulted because all temporary files are opened with O_EXCL anyways. This may increase security for IMAP, NNTP, and HTTPS sessions using TLS, but it's all public data anyways.
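The underlying problem can be reproduced with a short sketch: once the parent has used rand(), every forked child inherits the same generator state unless it calls srand again. The child_rand helper is hypothetical and exists only for this demonstration.

```perl
use strict;
use warnings;
use POSIX ();

# Fork a child, optionally reseed it, and return the child's first rand() value.
sub child_rand {
    my ($reseed) = @_;
    pipe(my $r, my $w) or die "pipe: $!";
    defined(my $pid = fork) or die "fork: $!";
    if ($pid == 0) {
        close $r;
        srand() if $reseed;    # pick a fresh seed instead of the inherited state
        print {$w} int(rand(2**31)), "\n";
        close $w;              # flush explicitly, _exit skips normal cleanup
        POSIX::_exit(0);
    }
    close $w;
    chomp(my $val = <$r>);
    waitpid($pid, 0);
    $val;
}

rand();    # seed the parent once, as any earlier rand() call would
my @inherited = map { child_rand(0) } 1 .. 2;    # same state, same value
my @reseeded  = map { child_rand(1) } 1 .. 2;    # fresh seeds, almost certainly distinct
print "inherited: @inherited\nreseeded: @reseeded\n";
```

Two children producing the same "random" temporary file name is exactly the kind of conflict that forces a retry under O_EXCL.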
2021-01-22 lei: remove INT/QUIT/TERM handlers, fix daemon EOF
The signal handlers on the client side were unnecessary, all we need is to handle socket EOF properly in the daemon by killing xsearch and l2m workers.
2021-01-22 lei: fix inadvertent FD sharing
$wq->{-ipc_atfork_child_close} needed to be initialized properly. And start setting $0 in workers to improve visibility.
2021-01-18 lei q: parallelize Maildir and mbox writing
With 4 dedicated workers, this seems to provide a 100-120% speedup on a 4 core machine when writing thousands of search results to a Maildir or mbox. This also sets us up for high-latency IMAP destinations in the future. This opens the door to more speedup opportunities such as optimizing dedupe locking and other ways to reduce contention. This change is fairly complex and convoluted, unfortunately. Further work may allow us to simplify it and even improve performance.
2021-01-18 ipc: children don't kill on DESTROY, reduce FD sharing
Children should not be blindly killing siblings on ->DESTROY since they're typically shorter-lived than parents. We'll also be more careful about on-stack variables and now we can rely exclusively on delete ops to close FDs. We also need to fix our SIGPIPE handling for the oneshot case while fixing a typo for delete, so we write "!" to the EOF pipe to ensure the parent oneshot process exits on the first worker that hits SIGPIPE, rather than waiting for the last worker to hit SIGPIPE.
2021-01-14 lei: test SIGPIPE, stop xsearch workers on client abort
The new test ensures consistency between oneshot and client/daemon users. Cancelling an in-progress result now also stops xsearch workers to avoid wasted CPU and I/O. Note that the lei->atfork_child_wq usage changes; this is to work around a bug in Perl 5: http://nntp.perl.org/group/perl.perl5.porters/258784 <CAHhgV8hPbcmkzWizp6Vijw921M5BOXixj4+zTh3nRS9vRBYk8w@mail.gmail.com> This switches the internal protocol to use SOCK_SEQPACKET AF_UNIX sockets to prevent merging messages from the daemon to client to run pager and kill/exit the client script.
2021-01-12 lei_xsearch: transfer 4 FDs internally, drop IO::FDPass
It's easier to make the code more generic by transferring all four FDs (std(in|out|err) + socket) instead of omitting stdin. We'll be reading from stdin on some imports, and possibly outputting to stdout, so omitting stdin now would needlessly complicate things. The differences with IO::FDPass "1" code paths and the "4" code paths used by Inline::C and Socket::MsgHdr are far too much to support and test at the moment.
2021-01-12 lei: fork + FD cleanup
Do a better job of closing FDs that we don't want shared with the work queue workers. We'll also fix naming and use "atfork_prepare" instead of "atfork_parent" to match pthread_atfork(3) naming.
2021-01-12 ipc: drop unused fields, default sighandlers for wq
Relying on signal handlers to kill a particular worker was a laggy/racy idea and I gave up on the idea of targeting workers explicitly and instead chose to make wq_worker_decr stop the next idle worker ->wq_exit. We will however attempt to support sending signals to a process group.
2021-01-12 ipc: fix IO::FDPass use with a worker limit of 1
IO::FDPass is our last choice for implementing the workqueue because its lack of atomicity makes it impossible to guarantee all requests of a single group hit a single worker out of many. So the only way to use IO::FDPass for workqueues is to only have a single worker. A single worker still buys us a small amount of parallelism because of the parent process.
2021-01-12 ipc: start supporting sending/receiving more than 3 FDs
Actually, sending 4 FDs will be useful for lei internal xsearch work once we start accepting input from stdin. It won't be used with the lightweight lei(1) client, however. For WWW (eventually), a single FD may be enough.
2021-01-12 lei: query: ensure pager exit is instantaneous
Improve interactivity and user experience by allowing the user to return to the terminal immediately when the pager is exited (e.g. hitting the `q' key in less(1)). This is a massive change which restructures query handling to allow parallel search when --thread expansion is in use and offloading to a separate worker when --thread is not in use. The Xapian query offload changes allow us to reenter the event loop right away once the search(es) are shipped off to the work queue workers. This means the main lei-daemon process can forget the lei(1) client socket immediately once it's handed off to worker processes. We now unblock SIGPIPE in query workers and send an exit(141) response to the lei(1) client socket to denote SIGPIPE. This also allows parallelization for users using "lei q" from multiple terminals. JSON output is currently broken and will need to be restructured for more flexibility and fork-safety.
2021-01-12 ipc: DESTROY and wq_workers methods
We'll enable automatic cleanup when IPC classes go out-of-scope to avoid leaving zombies around. ->wq_workers will be a useful convenience method to change worker counts.
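A minimal sketch of scope-based cleanup, assuming a single worker attached to a pipe. Sketch::Worker and its fields are hypothetical and far simpler than PublicInbox::IPC; the owner-PID check is an extra precaution in this sketch so that forked copies of the object never reap or signal the worker.

```perl
use strict;
use warnings;
use POSIX ();

package Sketch::Worker {    # hypothetical class for this sketch only
    sub new {
        my ($class) = @_;
        pipe(my $r, my $w) or die "pipe: $!";
        defined(my $pid = fork) or die "fork: $!";
        if ($pid == 0) {    # worker: idle until the parent closes its end
            close $w;
            my $eof = <$r>;
            POSIX::_exit(0);
        }
        close $r;
        bless { pid => $pid, w => $w, owner_pid => $$ }, $class;
    }

    sub DESTROY {
        my ($self) = @_;
        return if $self->{owner_pid} != $$;    # forked copies leave the worker alone
        local $?;                   # waitpid must not clobber the caller's $?
        delete $self->{w};          # last reference dropped: pipe closes, worker sees EOF
        waitpid($self->{pid}, 0);   # reap, so no zombie is left behind
    }
}

my $worker_pid;
{
    my $wk = Sketch::Worker->new;
    $worker_pid = $wk->{pid};
}    # $wk goes out of scope here and DESTROY runs

# kill 0 only probes for existence; a reaped PID is gone
my $still_there = kill(0, $worker_pid);
print $still_there ? "worker leaked\n" : "worker reaped\n";
```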
2021-01-12 ipc: drop -ipc_parent_pid field
It is not used anywhere.
2021-01-12 ipc: wq: support dynamic worker count change
Increasing/decreasing the worker count will be useful in some situations.
2021-01-12 ipc: eliminate ipc_worker_stop method
We can just EOF the pipe, and instead rely on per-class error handling to deal with uncommitted transactions and what not.
2021-01-12 ipc: work queue support via SOCK_SEQPACKET
This will allow any number of younger sibling processes to communicate with older siblings directly without relying on a mediator process. This is intended to be useful for distributing search work across multiple workers without caring which worker hits it (we only care about shard members). And any request sent with this will be able to hit any worker without locking on our part. Unix stream sockets with a listener were also considered; binding to a file on the FS may confuse users given there's already a socket path for lei(1). Linux-only Abstract or autobind sockets are rejected due to lack of portability. SOCK_SEQPACKET via socketpair(2) was chosen since it's POSIX 2008 and available on FreeBSD 9+ in addition to Linux, and doesn't require filesystem access.
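The property that makes SOCK_SEQPACKET suitable here is that each send arrives as one intact record, so several workers can recv from the same socket without framing or locking. A small sketch using only the core Socket module, assuming a platform that supports AF_UNIX SOCK_SEQPACKET (Linux or FreeBSD, per the above); the payload strings are placeholders.

```perl
use strict;
use warnings;
use Socket qw(AF_UNIX SOCK_SEQPACKET);

socketpair(my $s1, my $s2, AF_UNIX, SOCK_SEQPACKET, 0)
    or die "socketpair: $!";

# Two back-to-back sends with no reader in between
defined(send($s1, 'request one', 0)) or die "send: $!";
defined(send($s1, 'request two', 0)) or die "send: $!";

# Each recv returns exactly one record, even though the
# buffer is large enough to hold both
defined(recv($s2, my $first, 65536, 0))  or die "recv: $!";
defined(recv($s2, my $second, 65536, 0)) or die "recv: $!";
print "$first | $second\n";
```

With SOCK_STREAM the first recv could have returned both payloads merged together, which is why a stream socket would need explicit framing.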
2021-01-12 ipc: avoid excessive evals
We should not need an eval for warning with our code base. Nowadays, dwaitpid() automatically does the right thing regardless of whether we're in the event loop, so no eval is needed there, either.
2021-01-12 ipc: add support for asynchronous callbacks
Similar to git->cat_async, this will let us deal with responses asynchronously, as well as being able to mix synchronous and asynchronous code transparently (though perhaps not optimally).
2021-01-12 ds: block signals when reaping
This lets us call dwaitpid long before a process exits and not have to wait around for it. This is advantageous for lei where we can run dwaitpid on the pager as soon as we spawn it, instead of waiting for a client socket to go away on DESTROY.
2021-01-03 ipc: switch to one-way pipes
This fixes a performance regression in multi-process v2 indexing due to the switch to PublicInbox::IPC. While Unix sockets are fewer FDs to manage, pipes allow unprivileged processes to use larger buffers (up to 1M) on out-of-the-box Linux instances. A larger buffer via F_SETPIPE_SZ afforded by pipes was proven valuable during v2 development in 2018 and continues to be valuable when we get significant amounts of one-way traffic from the producer parent to worker children. Compression may be an option for systems without F_SETPIPE_SZ; but it increases CPU usage with no memory bandwidth savings on hosts where larger buffers are available.
2021-01-03 searchidxshard: use PublicInbox::IPC to kill lots of code
It's nice to prove the new code works by swapping it into the current V2Writable / SearchIdxShard packages. This is only the first step for the core bits, and we'll be able to delete more code in a subsequent patch.
2021-01-03 ipc: some documentation comments
Fix some comments and add some short summary descriptions to hopefully make things easier-to-follow.
2021-01-01 update copyrights for 2021
Using "make update-copyrights" after setting GNULIB_PATH in my config.mak
2021-01-01 ipc: use shutdown(2), base atfork* callback
shutdown(2) on a socket can be preferable if there are multiple forked processes writing to a single worker and we really want to shut things down ASAP. It may also be good to provide an ipc_worker_exit method which subclasses can override if needed for graceful shutdown. But we won't need equivalents to atexit(3) since we can rely on DESTROY handlers given this is Perl5.
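A sketch of why shutdown(2) differs from close(2) when a socket is shared: close only drops one descriptor, while shutdown acts on the socket itself. To keep the example in one process, a dup'ed handle stands in for the copy a forked process would hold; the variable names are illustrative only.

```perl
use strict;
use warnings;
use Socket qw(AF_UNIX SOCK_STREAM SHUT_WR);

socketpair(my $w, my $r, AF_UNIX, SOCK_STREAM, 0) or die "socketpair: $!";

# Stand-in for a forked process that still holds the writing end
open(my $other_holder, '>&', $w) or die "dup: $!";

# close($w) alone would not produce EOF while $other_holder stays open;
# shutdown affects the shared socket, so the reader sees EOF right away.
shutdown($w, SHUT_WR) or die "shutdown: $!";

my $n = sysread($r, my $buf, 1);
defined($n) or die "sysread: $!";
print $n == 0 ? "reader got EOF\n" : "reader got data\n";
```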
2021-01-01 ipc: support Sereal
Some testing will be needed to see if it's worth the code and maintenance overhead, but it seems easy-enough to get working.
2021-01-01 ipc: generic IPC dispatch based on Storable
I intend to use this with LeiStore when importing from multiple slow sources at once (e.g. curl, IMAP, etc). This is because over.sqlite3 can only have a single writer, and we'll have several slow readers running in parallel. Watch and SearchIdxShard should also be able to use this code in the future, but this will be proven with LeiStore, first.
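The general shape of Storable-based dispatch can be sketched as follows: the parent freezes a method name and arguments, a forked worker thaws and runs them, and the result comes back the same way. The framing (a length line followed by the frozen payload), the helper names, and the method table are all assumptions made for this sketch, not the PublicInbox::IPC wire format or API.

```perl
use strict;
use warnings;
use IO::Handle;
use POSIX ();
use Storable qw(freeze thaw);

# Framing assumed for this sketch: "<byte count>\n" then the frozen payload
sub send_rec {
    my ($fh, $ref) = @_;
    my $buf = freeze($ref);
    print {$fh} length($buf), "\n", $buf or die "print: $!";
}

sub get_rec {
    my ($fh) = @_;
    defined(my $len = <$fh>) or return;    # EOF: the peer closed its end
    chomp $len;
    my $n = read($fh, my $buf, $len);
    defined($n) && $n == $len or die "short read";
    thaw($buf);
}

# Hypothetical worker-side methods standing in for a single-writer store
my %methods = (
    sum  => sub { my $t = 0; $t += $_ for @_; $t },
    join => sub { join('-', @_) },
);

pipe(my $req_r, my $req_w) or die "pipe: $!";
pipe(my $res_r, my $res_w) or die "pipe: $!";
defined(my $pid = fork) or die "fork: $!";
if ($pid == 0) {    # worker loop: thaw request, dispatch, freeze response
    close $req_w;
    close $res_r;
    $res_w->autoflush(1);
    while (my $req = get_rec($req_r)) {
        my ($name, @args) = @$req;
        my $code = $methods{$name};
        send_rec($res_w,
            $code ? [ ok => $code->(@args) ] : [ err => "no method $name" ]);
    }
    close $res_w;
    POSIX::_exit(0);
}
close $req_r;
close $res_w;
$req_w->autoflush(1);

sub ipc_do {    # synchronous call: one request, one response
    my ($name, @args) = @_;
    send_rec($req_w, [ $name, @args ]);
    my ($status, $val) = @{ get_rec($res_r) };
    die "$val\n" if $status ne 'ok';
    $val;
}

my $sum    = ipc_do('sum', 1, 2, 3);
my $joined = ipc_do('join', 'a', 'b');
close $req_w;    # EOF tells the worker to exit
waitpid($pid, 0);
print "sum=$sum joined=$joined\n";
```

Funnelling every call through one worker is what lets several slow readers share a resource that tolerates only a single writer.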