about summary refs log tree commit homepage
path: root/lib/PublicInbox/LeiImportKw.pm
DateCommit message (Collapse)
2023-12-30lei: support reading MH for convert+import+index
The MH format is widely-supported and used by various MUAs such as mutt and sylpheed, and a MH-like format is used by mlmmj for archives, as well. Locking implementations for writes are inconsistent, so this commit doesn't support writes, yet. inotify|EVFILT_VNODE watches aren't supported, yet, but that'll have to come since MH allows packing unused integers and renaming files.
2023-06-09add compat package for List::Util::uniqstr
This will make it easier to switch in the far future while making callers easier-to-read (and more callers will be added). Anyways, Perl 5.26 is a long time away for enterprise users; but isolating compatibility code away can improve readability of code we actually care about in the meantime.
2022-04-22lei: commit store on interrupted partial imports
This change prevents lingering shard and git-fast-import processes from remaining after interrupted "lei import" (and similar). It also reduces the likelyhood of data-loss in case of subsequent abnormal termination of the daemon. I think this is the least surprising way to handle users prematurely aborting imports or other similar operations which write to lei/store and will result in reduced bandwidth waste for users with intermittent connections. This is because the lei/store processes may be shared by parallel "lei import" callers, and commits done by any "lei import" caller will inevitably trigger writes for all of them.
2022-04-05lei: always open mail_sync.sqlite3 R/W
This will make transparently upgrading from 1.7.0 -> 1.8.x easier. Only a single user has access to mail_sync.sqlite3, and R/W at the kernel-level is required for WAL, anyways.
2021-10-15lei + ipc: simplify process reaping
Simplify our APIs and force dwaitpid() to work in async mode for all lei workers. This avoids having lingering zombies for parallel searches if one worker finishes soon before another. The old distinction between "old" and "new" workers was needlessly complex, error-prone, and embarrasingly bad. We also never handled v2:// writers properly before on Ctrl-C/Ctrl-Z (SIGINT/SIGTSTP), so add them to @WQ_KEYS to ensure they get handled by $lei when appropropriate.
2021-10-13lei: use standard warn() in more places
warn() is easier to augment with context information, and frankly unavoidable in the presence of 3rd-party libraries we don't control.
2021-09-22lei: drop redundant WQ EOF callbacks
Redundant code is noise and therefore confusing :<
2021-09-21lei_mail_sync: account for non-unique cases
NNTP servers, IMAP servers, and various MUAs may recycle "unique" identifiers due to software bugs or careless BOFHs. Warn about them, but always be prepared to account for them.
2021-09-19lei: clamp internal worker processes to 4
"All" my CPUs is only 4, but it's probably ridiculous for somebody with a 16-core system to have 16 processes for accessing SQLite DBs. We do the same thing in Pmdir for parallel Maildir access (and V2Writable).
2021-09-19lei/store: use SOCK_SEQPACKET rather than pipe
This has several advantages: * no need to use ipc.lock to protect a pipe for non-atomic writes * ability to pass FDs. In another commit, this will let us simplify lei->sto_done_request and pass newly-created sockets to lei/store directly. disadvantages: - an extra pipe is required for rare messages over several hundred KB, this is probably a non-issue, though The performance delta is unknown, but I expect shards (which remain pipes) to be the primary bottleneck IPC-wise for lei/store.
2021-08-24lei: non-blocking lei/store->done in lei-daemon
This allows client sockets to wait for "done" commits to lei/store while the daemon reacts asynchronously. The goal of this change is to keep the script/lei client alive until lei/store commits changes to the filesystem, but without blocking the lei-daemon event loop. It depends on Perl refcounting to close the socket. This change also highlighted our over-use of "done" requests to lei/store processes, which is now corrected so we only issue it on collective socket EOF rather than upon reaping every single worker. This also fixes "lei forget-mail-sync" when it is the initial command. This took several iterations and much debugging to arrive at the current implementation: 1. The initial iteration of this change utilized socket passing from lei-daemon to lei/store, which necessitated switching from faster pipes to slower Unix sockets. 2. The second iteration switched to registering notification sockets independently of "done" requests, but that could lead to early wakeups when "done" was requested by other workers. This appeared to work most of the time, but suffered races under high load which were difficult to track down. Finally, this iteration passes the stringified socket GLOB ref to lei/store which is echoed back to lei-daemon upon completion of that particular "done" request.
2021-06-03lei import: speed up kw updates for old IMAP messages
On a 4-core CPU, this speeds up "lei import" on a largish IMAP inbox with 75K messages from ~21 minutes down to 40s. Parallelizing with the new LeiImportKw WQ worker class gives a near-linear speedup and brought the runtime down to ~5:40. The new idx_fid_uid index on the "fid" and "uid" columns of blob2num in mail_sync.sqlite3 brought us the final speedup. An additional index on over.sqlite3#xref3(oidbin) did not help, since idx_nntp already exists and speeds up the new ->oidbin_exists internal API. I initially experimented with a separate "lei import-kw" command but decided against it since it's useless outside of IMAP+JMAP and would require extra cognitive overhead for both users and hackers. So LeiImportKw is just a WQ worker used by "lei import" and not its own user-visible command. v2: fix ikw_done_wait arg handling (ugh, confusing API :x)