about summary refs log tree commit homepage
path: root/lib/PublicInbox/LeiImportKw.pm
DateCommit message (Collapse)
2021-10-15lei + ipc: simplify process reaping
Simplify our APIs and force dwaitpid() to work in async mode for all lei workers. This avoids having lingering zombies for parallel searches if one worker finishes soon before another. The old distinction between "old" and "new" workers was needlessly complex, error-prone, and embarrasingly bad. We also never handled v2:// writers properly before on Ctrl-C/Ctrl-Z (SIGINT/SIGTSTP), so add them to @WQ_KEYS to ensure they get handled by $lei when appropropriate.
2021-10-13lei: use standard warn() in more places
warn() is easier to augment with context information, and frankly unavoidable in the presence of 3rd-party libraries we don't control.
2021-09-22lei: drop redundant WQ EOF callbacks
Redundant code is noise and therefore confusing :<
2021-09-21lei_mail_sync: account for non-unique cases
NNTP servers, IMAP servers, and various MUAs may recycle "unique" identifiers due to software bugs or careless BOFHs. Warn about them, but always be prepared to account for them.
2021-09-19lei: clamp internal worker processes to 4
"All" my CPUs is only 4, but it's probably ridiculous for somebody with a 16-core system to have 16 processes for accessing SQLite DBs. We do the same thing in Pmdir for parallel Maildir access (and V2Writable).
2021-09-19lei/store: use SOCK_SEQPACKET rather than pipe
This has several advantages: * no need to use ipc.lock to protect a pipe for non-atomic writes * ability to pass FDs. In another commit, this will let us simplify lei->sto_done_request and pass newly-created sockets to lei/store directly. disadvantages: - an extra pipe is required for rare messages over several hundred KB, this is probably a non-issue, though The performance delta is unknown, but I expect shards (which remain pipes) to be the primary bottleneck IPC-wise for lei/store.
2021-08-24lei: non-blocking lei/store->done in lei-daemon
This allows client sockets to wait for "done" commits to lei/store while the daemon reacts asynchronously. The goal of this change is to keep the script/lei client alive until lei/store commits changes to the filesystem, but without blocking the lei-daemon event loop. It depends on Perl refcounting to close the socket. This change also highlighted our over-use of "done" requests to lei/store processes, which is now corrected so we only issue it on collective socket EOF rather than upon reaping every single worker. This also fixes "lei forget-mail-sync" when it is the initial command. This took several iterations and much debugging to arrive at the current implementation: 1. The initial iteration of this change utilized socket passing from lei-daemon to lei/store, which necessitated switching from faster pipes to slower Unix sockets. 2. The second iteration switched to registering notification sockets independently of "done" requests, but that could lead to early wakeups when "done" was requested by other workers. This appeared to work most of the time, but suffered races under high load which were difficult to track down. Finally, this iteration passes the stringified socket GLOB ref to lei/store which is echoed back to lei-daemon upon completion of that particular "done" request.
2021-06-03lei import: speed up kw updates for old IMAP messages
On a 4-core CPU, this speeds up "lei import" on a largish IMAP inbox with 75K messages from ~21 minutes down to 40s. Parallelizing with the new LeiImportKw WQ worker class gives a near-linear speedup and brought the runtime down to ~5:40. The new idx_fid_uid index on the "fid" and "uid" columns of blob2num in mail_sync.sqlite3 brought us the final speedup. An additional index on over.sqlite3#xref3(oidbin) did not help, since idx_nntp already exists and speeds up the new ->oidbin_exists internal API. I initially experimented with a separate "lei import-kw" command but decided against it since it's useless outside of IMAP+JMAP and would require extra cognitive overhead for both users and hackers. So LeiImportKw is just a WQ worker used by "lei import" and not its own user-visible command. v2: fix ikw_done_wait arg handling (ugh, confusing API :x)