path: root/lib
Date Commit message
2021-01-03 ipc: switch to one-way pipes
This fixes a performance regression in multi-process v2 indexing due to the switch to PublicInbox::IPC. While Unix sockets are fewer FDs to manage, pipes allow unprivileged processes to use larger buffers (up to 1M) on out-of-the-box Linux instances. A larger buffer via F_SETPIPE_SZ afforded by pipes was proven valuable during v2 development in 2018 and continues to be valuable when we get significant amounts of one-way traffic from the producer parent to worker children. Compression may be an option for systems without F_SETPIPE_SZ; but it increases CPU usage with no memory bandwidth savings on hosts where larger buffers are available.
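The buffer enlargement described above can be sketched in Python (not the project's Perl). The helper name `grow_pipe` and the 1 MiB default are illustrative assumptions, and the numeric fallbacks are the Linux values of `F_SETPIPE_SZ` and `F_GETPIPE_SZ` for Python versions that do not export them:

```python
import fcntl
import os

# Linux-specific fcntl commands; newer Python versions expose them in the
# fcntl module, so the numeric values are only a fallback.
F_SETPIPE_SZ = getattr(fcntl, "F_SETPIPE_SZ", 1031)
F_GETPIPE_SZ = getattr(fcntl, "F_GETPIPE_SZ", 1032)


def grow_pipe(fd, want=1 << 20):
    """Try to enlarge the pipe behind fd (hypothetical helper).

    Returns the pipe capacity afterwards, or None if the platform
    cannot report it.
    """
    try:
        fcntl.fcntl(fd, F_SETPIPE_SZ, want)
    except OSError:
        pass  # over the system limit or not Linux: keep the default size
    try:
        return fcntl.fcntl(fd, F_GETPIPE_SZ)
    except OSError:
        return None
```

A producer would call `grow_pipe()` on the write end right after `os.pipe()` and before forking workers; whether the request succeeds depends on the system-wide pipe size limit.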
2021-01-03 use Eml (or MIME) objects for all indexing paths
We don't need to be keeping the raw message around after it hits git. Shard work now relies on Storable (or Sereal) and all of the indexing code relies on the Email::MIME-like API of Eml to access interesting parts of the message. Similarly, smsg->{raw_bytes} is no longer carried around and we do the CRLF adjustment when setting smsg->{bytes}. There's also a small simplification to t/import.t while we're in the area to use xqx instead of spawn/popen_rd.
2021-01-03 searchidxshard: replace index_raw with index_eml
Since Storable and Sereal are designed for lossless serialization, we'll just pass $eml objects to whatever process is running SearchIdx.
2021-01-03 searchidxshard: IPC conversion, part 2
We can remove some now-pointless wrapper functions by using ->ipc_do in even more places.
2021-01-03 searchidxshard: use PublicInbox::IPC to kill lots of code
It's nice to prove the new code works by swapping it into the current V2Writable / SearchIdxShard packages. This is only the first step for the core bits, and we'll be able to delete more code in a subsequent patch.
2021-01-03 ipc: some documentation comments
Fix some comments and add some short summary descriptions to hopefully make things easier-to-follow.
2021-01-03 gcf2client: split out request API from regular git
While Gcf2Client is designed to mimic what git-cat-file writes to stdout, its request format is different to support requests with a git repository path included. We'll highlight the distinction and make the GitAsyncCat support code easier-to-follow as a result. Since Gcf2Client relies on DS, we can rely on DS-specific code here, too, and use a single Unix socket instead of separate input and output pipes, reducing memory overhead in both user and kernel space. Due to the interactive nature of requests and responses, the buffer size limitations of Unix sockets on Linux seem inconsequential here (just like they are for existing "git cat-file --batch" use).
2021-01-03 lei: fix output race in client/daemon mode
The daemon needs to flush stdout before disconnecting or killing clients, otherwise they may reread empty data on redirected outputs. We also don't want to unbuffer stdout too early in case we have lots of small chunks of data to output. The received stderr ($self->{2}) will always have autoflush, matching normal STDERR behavior.
2021-01-03 send and receive all 3 FDs at once
We'll always be transferring stdin, stdout, and stderr together for lei. Perhaps I lack imagination or foresight, but I can't think of a reason to send more or fewer FDs.
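Passing three descriptors in one message can be sketched in Python (not the project's Perl) with the standard `socket.send_fds` and `socket.recv_fds` helpers, available since Python 3.9. The function names `send_stdio` and `recv_stdio` are illustrative assumptions, not the project's API:

```python
import os
import socket


def send_stdio(sock, fds):
    """Send exactly three descriptors in a single sendmsg(2) call."""
    if len(fds) != 3:
        raise ValueError("expected stdin, stdout and stderr")
    # At least one byte of ordinary data must accompany the FDs.
    socket.send_fds(sock, [b"\0"], list(fds))


def recv_stdio(sock):
    """Receive the three descriptors sent by send_stdio()."""
    _data, fds, _flags, _addr = socket.recv_fds(sock, 1, 3)
    if len(fds) != 3:
        raise RuntimeError("expected 3 FDs, got %d" % len(fds))
    return fds
```

The receiver gets new descriptor numbers that refer to the same open files as the sender's, so a daemon can write to a client's terminal or redirected output directly.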
2021-01-03 spawn: support send_fd+recv_fd w/o IO::FDPass
IO::FDPass may be an extra installation burden I don't want to impose on users. We only support Linux and *BSDs, however.
2021-01-03 testcommon: prepare_redirects: fix error message
I never hit these die() calls, but noticed it while debugging another problem on FreeBSD.
2021-01-02 qspawn: switch to ProcessPipe via popen_rd
ProcessPipe has a built-in mechanism to prevent siblings from reaping children.
2021-01-02 git: manifest_entry: use ProcessPipe via popen_rd
Only saves us one line of code, but that's better than nothing.
2021-01-02 import: switch to using ProcessPipe
This saves us a few lines of code, but also prevents misreaping by sibling processes.
2021-01-02 git: qx: waitpid synchronously via ProcessPipe->CLOSE
If we're using ->qx, we're operating synchronously anyways, so there's little point in relying on the event loop for waitpid.
2021-01-02 processpipe: lazy-require PublicInbox::DS for dwaitpid
This saves over 20ms with scripts that only use PublicInbox::Spawn.
2021-01-02 processpipe: allow synchronous close to set $?
To get rid of the ugly $PublicInbox::DS::in_loop localization in MboxReader, we'll distinguish between ->CLOSE and ->DESTROY with ProcessPipe. If we end up closing via ->DESTROY, we'll assume the caller will want to deal with $? asynchronously via the event loop (or not even care about $?). If we hit ->CLOSE directly, we'll assume the caller called close() and wants to check $? synchronously. Note: wantarray doesn't seem to propagate into tied methods, otherwise I'd be relying on that.
2021-01-02 lei_store: alternative unconfigured "git var" workaround
While the changes to git->qx/git->popen from commit 171a9c24022ad7ef will be useful for the lei daemon, hiding git error messages from actual users is probably wrong and we'll just localize GIT_* vars for testing.
2021-01-02 treewide: reduce load_xapian* callsites
Hopefully this will make it easier to spot dependency bugs in the future.
2021-01-02 import: unset GIT_CONFIG with `git config --global'
GIT_CONFIG is set by -convert, and users may have it set for other reasons. In either case, it conflicts with any attempt to use `git config --global`, so we have to unset it. This fixes t/multi-mid.t under TEST_RUN_MODE=0.
2021-01-02 search: do not use $QP_FLAGS until Xapian is loaded
The default $QP_FLAGS won't be set until after Xapian is loaded, duh... This fixes t/imapd.t with TEST_RUN_MODE=0
2021-01-01 lei_store: quiet down "git var" failures
$git->qx and $git->popen now accept $env and $opt for redirects, like the lower-level popen_rd. This may be beneficial in other places.
2021-01-01 update copyrights for 2021
Using "make update-copyrights" after setting GNULIB_PATH in my config.mak
2021-01-01 on_destroy: support PID owner guard
Since we'll be forking for Xapian indexing and maybe other places, having a simple guard in place to ensure OnDestroy doesn't unexpectedly unlink files or similar is a safer option.
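The idea of a PID owner guard can be sketched in Python (not the project's Perl). The class below is a hypothetical analogue, not the project's OnDestroy API: it remembers the PID that created it and skips the callback when it is destroyed in any other process, such as a forked child that inherited the object.

```python
import os


class OnDestroy:
    """Run callback(*args) on destruction, but only in the owning process."""

    def __init__(self, callback, *args, owner_pid=None):
        self.callback = callback
        self.args = args
        # Default owner is the process creating the guard.
        self.owner_pid = os.getpid() if owner_pid is None else owner_pid

    def __del__(self):
        # A forked child inherits this object but has a different PID,
        # so it must not unlink files or reap processes for the parent.
        if os.getpid() == self.owner_pid:
            self.callback(*self.args)
```

This sketch relies on CPython running `__del__` as soon as the last reference goes away; other Python implementations may delay it.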
2021-01-01 ds: clobber $in_loop first at reset
This may help ensure DESTROY callbacks will see in_loop before the others.
2021-01-01 avoid calling waitpid from children in DESTROY
Objects with DESTROY callbacks get propagated to children, so we must be careful to not invoke waitpid from children on their sibling processes. Only parents (and their parents...) can reap child processes.
2021-01-01 lei: avoid Spawn package when starting daemon
Spawn was designed to speed up process spawning inside long-lived daemons with largish memory usage. It does not help for short-lived scripts which only exist to start and connect to a daemon. This change actually speeds up initial lei startup from ~190ms to ~140ms(!). Normal usage once the daemon is running is unaffected, at <20ms for help text. While we're in the area, simplify Cwd error message generation, too.
2021-01-01 syscall: SFD_NONBLOCK can be a constant, again
Since Perl exposes O_NONBLOCK as a constant, we can safely make SFD_NONBLOCK a constant, too. This is not the case for SFD_CLOEXEC, since O_CLOEXEC is not exposed by Perl despite being used internally in the interpreter.
2021-01-01 use PublicInbox::DS for dwaitpid
This simplifies our code and provides a more consistent API for error handling. PublicInbox::DS can be loaded nowadays on all *BSDs and Linux distros easily without extra packages to install. The downside is possibly increased startup time, but it's probably not as big a problem with lei being a daemon (and -mda possibly following suit).
2021-01-01 searchidxshard: call DS->Reset at worker start
The daemon for the local email interface will be inside the DS->EventLoop. -watch currently doesn't trigger this bug since it doesn't enable parallelism, but it may in the future.
2021-01-01 lei_to_mail: open FIFOs O_WRONLY so we block
Opening a FIFO with O_RDWR always succeeds on Linux, which causes the cat(1) process invoked by t/lei_to_mail.t to get stuck. Furthermore, O_APPEND makes no sense on FIFOs and perhaps there's some kernel out there which will reject it.
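The difference between the two open modes can be observed with a small Python probe (not the project's Perl, and assuming Linux behavior). The function name `probe_fifo_open` is a hypothetical helper. It uses O_NONBLOCK so that the write-only open, which would otherwise block until a reader appears, fails with ENXIO instead:

```python
import errno
import os
import tempfile


def probe_fifo_open():
    """Return (errno of an O_WRONLY|O_NONBLOCK open, whether O_RDWR worked)
    for a fresh FIFO that has no reader."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "fifo")
        os.mkfifo(path)
        wronly_errno = None
        try:
            # Without O_NONBLOCK this call would wait for a reader.
            os.close(os.open(path, os.O_WRONLY | os.O_NONBLOCK))
        except OSError as e:
            wronly_errno = e.errno
        try:
            # On Linux this succeeds at once, with or without a reader.
            os.close(os.open(path, os.O_RDWR))
            rdwr_ok = True
        except OSError:
            rdwr_ok = False
        return wronly_errno, rdwr_ok
```

The probe tries the write-only open first because an open O_RDWR descriptor would itself count as a reader.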
2021-01-01 gcf2client: reap process on DESTROY
We don't want to leave it for the Xapcmd waitpid(-1, ...) call to hit.
2021-01-01 spawn: move run_die here from PublicInbox::Import
It seems like a more logical place for it, but we'll favor the newly-added xsys_e() in tests for BAIL_OUT use.
2021-01-01 lei: add --mfolder as an --output alias
This will be helpful for mairix users.
2021-01-01 lei_to_mail: unlink mboxes if not augmenting
This matches mairix(1) behavior and may be safer if there are concurrent readers on the existing mbox, especially since we don't currently implement mbox locking (nor does mairix).
2021-01-01 ipc: use shutdown(2), base atfork* callback
shutdown(2) on a socket can be preferable if there's multiple forked processes writing to a single worker and we really want to shut things down ASAP. It may also be good to provide an ipc_worker_exit method which subclasses can override if needed for graceful shutdown. But we won't need equivalents to atexit(3) since we can rely on DESTROY handlers given this is Perl5.
2021-01-01 lei_store: handle messages without Message-ID at all
For personal mail, unsent draft messages are a common source of messages without Message-IDs.
2021-01-01 mid: hoist out mids_in sub
We'll be using it for Resent-Message-ID with lei, and possibly other places.
2021-01-01 mid: use defined-or with `push' for uniqueness check
As shown recently in commit a05445fb400108e60ede7d377cf3b26a0392eb24 ("config: config_fh_parse: micro-optimize"), relying on the return value of `push' and the defined-or operator can avoid modifying the hash value scalar with an increment.
2021-01-01 lei: rename "extinbox" => "external"
The words "extinbox" and "extindex" are too close and easy to confuse with the other. Rename "extinbox" to "external", since these could be IMAP, JMAP or other non-public-inbox search APIs. Link: https://public-inbox.org/meta/20201226112649.GB6226@dcvr/
2021-01-01 lei_store: add ->set_eml, ->add_eml can return smsg
Add a ->set_eml method which can be a useful fire-and-forget way of either adding new files to store OR setting keywords on them. When seeing brand-new messages, add_eml can afford to return more information in the smsg instead of just the OID.
2021-01-01 ipc: support Sereal
Some testing will be needed to see if it's worth the code and maintenance overhead, but it seems easy-enough to get working.
2021-01-01 ipc: generic IPC dispatch based on Storable
I intend to use this with LeiStore when importing from multiple slow sources at once (e.g. curl, IMAP, etc). This is because over.sqlite3 can only have a single writer, and we'll have several slow readers running in parallel. Watch and SearchIdxShard should also be able to use this code in the future, but this will be proven with LeiStore, first.
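The general shape of serialization-based IPC dispatch can be sketched in Python (not the project's Perl), with pickle standing in for Storable. The `Worker` class, the length-prefixed framing and the method name `ipc_do` are illustrative assumptions and do not reproduce the PublicInbox::IPC API:

```python
import os
import pickle
import socket
import struct


def _send(sock, obj):
    data = pickle.dumps(obj)
    sock.sendall(struct.pack("!I", len(data)) + data)  # length-prefixed frame


def _recv_exact(sock, n):
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise EOFError("peer closed the socket")
        buf += chunk
    return buf


def _recv(sock):
    (n,) = struct.unpack("!I", _recv_exact(sock, 4))
    return pickle.loads(_recv_exact(sock, n))


class Worker:
    """Fork one child that runs methods of target on request."""

    def __init__(self, target):
        parent, child = socket.socketpair()
        self.pid = os.fork()
        if self.pid == 0:  # child: serve requests until the parent closes
            parent.close()
            try:
                while True:
                    try:
                        name, args = _recv(child)
                    except EOFError:
                        break
                    _send(child, getattr(target, name)(*args))
            finally:
                os._exit(0)
        child.close()
        self.sock = parent

    def ipc_do(self, name, *args):
        """Call target.name(*args) in the worker and return its result."""
        _send(self.sock, (name, args))
        return _recv(self.sock)

    def close(self):
        self.sock.close()  # the worker sees EOF and exits
        os.waitpid(self.pid, 0)
```

Only the method name, arguments and return value cross the socket; the target object itself is inherited by the child through fork. Because a single worker serializes all calls, this shape would suit a resource that allows only one writer.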
2021-01-01 lei_to_mail: support Maildir, fix+test --augment
Maildir should be plenty fine for short-lived output folders.
2021-01-01 lei_to_mail: support for non-seekable outputs
Users may wish to pipe output to "git am", "spamc", or similar, so we need to support those cases and not bail out on lseek(2) or ftruncate(2) failures.
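One way to tolerate non-seekable outputs can be sketched in Python (not the project's Perl): treat ESPIPE from lseek(2) as "this is a pipe, carry on" rather than as a fatal error. The helper name `can_seek` is an assumption for illustration:

```python
import errno
import os


def can_seek(fd):
    """Return False for pipes, FIFOs and sockets instead of raising."""
    try:
        os.lseek(fd, 0, os.SEEK_CUR)
        return True
    except OSError as e:
        if e.errno == errno.ESPIPE:
            return False  # non-seekable output: skip seek/truncate steps
        raise  # anything else is a real error
```

A writer could check this once per output and skip any seek or truncate step when it returns False.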
2021-01-01 lei_to_mail: lazy-require LeiDedupe
LeiDedupe requires SQLite, so we may want to be able to test writing mail without DBI or SQLite down the line.
2021-01-01 lei: implement various deduplication strategies
For writing mboxes and Maildirs, users may wish to use stricter or looser deduplication strategies. This gives them more control.
2021-01-01 lei_to_mail: start --augment, dedupe, bz2 and xz
--augment will match the mairix(1) option of the same name to augment existing search results. We'll need to implement deduplication for a better user experience. mutt ships with compressed mbox support for bz2 and xz, at least, so we'll support those out-of-the-box.
2021-01-01 mboxreader: new class for reading various mbox formats
This is only lightly-tested against stuff LeiToMail generates and will need real-world tests to validate.
2021-01-01 lei_to_mail: start atomic and compressed mbox writing
We'll allow using multiple workers to write to a single mbox (which could be compressed). This can be done safely with O_APPEND + syswrite for uncompressed files, and using a lock when piping to pigz/gzip/bzip2/xz.
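The uncompressed case can be sketched in Python (not the project's Perl): each message goes out in a single write(2) on an O_APPEND descriptor, so the kernel positions every write at the current end of file. The helper name `append_message` is an assumption, and the sketch treats a short write as an error instead of retrying, since a retry could interleave with another writer:

```python
import os


def append_message(path, data):
    """Append one complete message to path using a single write(2)."""
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o600)
    try:
        written = os.write(fd, data)
        if written != len(data):
            raise OSError("short write: %d of %d bytes" % (written, len(data)))
    finally:
        os.close(fd)
```

Several worker processes could call this concurrently on the same local file. Compressed output is a different case: the workers would be piping into a compressor, so a lock is needed there, as the commit message notes.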