path: root/t
Date Commit message
2021-01-14 cmd_ipc: support + test EINTR + EAGAIN, no FDs
We'll ensure our {send,recv}_cmd4 implementations are consistent w.r.t. non-blocking and interrupted sockets. We'll also support receiving messages without FDs associated so we don't have to send dummy FDs to keep receivers from reporting EOF.
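The two cases named above (a non-blocking socket with nothing queued, and a message that carries no FDs) can be sketched in Python. This is only an analogue of the behaviour described, not the project's Perl code; the buffer sizes and variable names are assumptions made for illustration.

```python
import array
import socket

# A connected pair of record-oriented Unix sockets.
a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_SEQPACKET)
b.setblocking(False)

# Room for up to four descriptors of ancillary data (an arbitrary choice).
anc_space = socket.CMSG_LEN(4 * array.array("i").itemsize)

# Nothing has been sent yet: a non-blocking receive fails with EAGAIN,
# which Python surfaces as BlockingIOError. The caller is expected to retry.
try:
    b.recvmsg(4096, anc_space)
    got_eagain = False
except BlockingIOError:
    got_eagain = True

# A plain message without SCM_RIGHTS data is still a valid message:
# the payload arrives and the ancillary list is simply empty.
a.send(b"no fds here")
msg, ancdata, flags, addr = b.recvmsg(4096, anc_space)

a.close()
b.close()
```

Python retries EINTR on its own for these calls (PEP 475), so only the EAGAIN path needs explicit handling in this sketch.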
2021-01-12 lei_xsearch: transfer 4 FDs internally, drop IO::FDPass
It's easier to make the code more generic by transferring all four FDs (std(in|out|err) + socket) instead of omitting stdin. We'll be reading from stdin on some imports, and possibly outputting to stdout, so omitting stdin now would needlessly complicate things. The differences with IO::FDPass "1" code paths and the "4" code paths used by Inline::C and Socket::MsgHdr are far too much to support and test at the moment.
2021-01-12 ipc: fix IO::FDPass use with a worker limit of 1
IO::FDPass is our last choice for implementing the workqueue because its lack of atomicity makes it impossible to guarantee all requests of a single group hit a single worker out of many. So the only way to use IO::FDPass for workqueues is to only have a single worker. A single worker still buys us a small amount of parallelism because of the parent process.
2021-01-12 ipc: start supporting sending/receiving more than 3 FDs
Actually, sending 4 FDs will be useful for lei internal xsearch work once we start accepting input from stdin. It won't be used with the lightweight lei(1) client, however. For WWW (eventually), a single FD may be enough.
2021-01-12 ipc: DESTROY and wq_workers methods
We'll enable automatic cleanup when IPC classes go out-of-scope to avoid leaving zombies around. ->wq_workers will be a useful convenience method to change worker counts.
2021-01-12 ipc: wq: support dynamic worker count change
Increasing/decreasing the worker count will be useful in some situations.
2021-01-12 ipc: work queue support via SOCK_SEQPACKET
This will allow any number of younger sibling processes to communicate with older siblings directly without relying on a mediator process. This is intended to be useful for distributing search work across multiple workers without caring which worker hits it (we only care about shard members). And any request sent with this will be able to hit any worker without locking on our part. Unix stream sockets with a listener were also considered; binding to a file on the FS may confuse users given there's already a socket path for lei(1). Linux-only Abstract or autobind sockets are rejected due to lack of portability. SOCK_SEQPACKET via socketpair(2) was chosen since it's POSIX 2008 and available on FreeBSD 9+ in addition to Linux, and doesn't require filesystem access.
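The idea of several workers pulling from one SOCK_SEQPACKET socket without any locking can be sketched as follows. This is a Python illustration of the technique, not the project's Perl code; `run_queue`, the result pipe and the request strings are all hypothetical, and it assumes a platform where AF_UNIX supports SOCK_SEQPACKET (e.g. Linux or FreeBSD).

```python
import os
import socket

def run_queue(requests, nworkers=2):
    # One socketpair: the parent writes to `tx`, every worker reads from `rx`.
    tx, rx = socket.socketpair(socket.AF_UNIX, socket.SOCK_SEQPACKET)
    res_r, res_w = os.pipe()  # workers report which request they handled
    pids = []
    for _ in range(nworkers):
        pid = os.fork()
        if pid == 0:  # worker
            try:
                tx.close()
                os.close(res_r)
                while True:
                    req = rx.recv(4096)  # the kernel hands each record to one reader
                    if not req:          # empty read: all senders are gone
                        break
                    os.write(res_w, b"%d %s\n" % (os.getpid(), req))
            finally:
                os._exit(0)
        pids.append(pid)
    rx.close()
    os.close(res_w)
    for req in requests:  # requests must not contain newlines in this sketch
        tx.send(req)
    tx.close()
    out = b""
    while True:
        chunk = os.read(res_r, 65536)
        if not chunk:
            break
        out += chunk
    os.close(res_r)
    for pid in pids:
        os.waitpid(pid, 0)
    return [line.split(b" ", 1) for line in out.splitlines()]

results = run_queue([b"q1", b"q2", b"q3", b"q4"])
handled = sorted(req for _pid, req in results)
```

Each request is received whole by exactly one worker, since SOCK_SEQPACKET preserves record boundaries; which worker gets it is up to the kernel.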
2021-01-12 cmd_ipc: send FDs with buffer payload
For another step in syscall reduction, we'll support transferring 3 FDs and a buffer with a single sendmsg/recvmsg syscall using Socket::MsgHdr if available. Beyond script/lei itself, this will be used for internal IPC between search backends (perhaps with SOCK_SEQPACKET). There's a chance this could make it to the public-facing daemons, too. This adds an optional dependency on the Socket::MsgHdr package, available as libsocket-msghdr-perl on Debian-based distros (but not CentOS 7.x and FreeBSD 11.x, at least). Our Inline::C version in PublicInbox::Spawn remains the last choice for script/lei due to the high startup time, and IO::FDPass remains supported for non-Debian distros. Since the socket name prefix changes from 3 to 4, we'll also take this opportunity to make the argv+env buffer transfer less error-prone by relying on argc instead of designated delimiters.
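Sending descriptors and a payload in one sendmsg(2) call uses SCM_RIGHTS ancillary data. The Python sketch below shows that mechanism only; it is not the Socket::MsgHdr or Inline::C code the commit refers to, and `send_with_fds`, `recv_with_fds` and the demo pipe are hypothetical.

```python
import array
import os
import socket

def send_with_fds(sock, payload, fds):
    # One sendmsg(2): the payload plus an SCM_RIGHTS control message.
    anc = [(socket.SOL_SOCKET, socket.SCM_RIGHTS, array.array("i", fds))]
    return sock.sendmsg([payload], anc)

def recv_with_fds(sock, bufsize, maxfds):
    # One recvmsg(2): returns the payload and any descriptors that came with it.
    fds = array.array("i")
    msg, ancdata, _flags, _addr = sock.recvmsg(
        bufsize, socket.CMSG_LEN(maxfds * fds.itemsize))
    for level, ctype, data in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            # Ignore a trailing partial int, if any.
            fds.frombytes(data[:len(data) - (len(data) % fds.itemsize)])
    return msg, list(fds)

a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_SEQPACKET)
r, w = os.pipe()
send_with_fds(a, b"hello", [r, w])
msg, got = recv_with_fds(b, 4096, 4)

# The received descriptors are new numbers referring to the same pipe.
os.write(got[1], b"via passed fd")
data = os.read(r, 100)

for fd in (r, w, *got):
    os.close(fd)
a.close()
b.close()
```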
2021-01-12 ipc: add support for asynchronous callbacks
Similar to git->cat_async, this will let us deal with responses asynchronously, as well as being able to mix synchronous and asynchronous code transparently (though perhaps not optimally).
2021-01-12 ds: block signals when reaping
This lets us call dwaitpid long before a process exits and not have to wait around for it. This is advantageous for lei where we can run dwaitpid on the pager as soon as we spawn it, instead of waiting for a client socket to go away on DESTROY.
2021-01-12 lei q: deduplicate smsg
We don't want duplicate messages in results overviews, either.
2021-01-12 lei query + pagination sorta working
Parallelism and interactivity with pager + SIGPIPE need work; but results are shown and phrase search works without shell users having to apply Xapian quoting rules on top of standard shell quoting.
2021-01-06 address: pairs: new helper for JMAP (and maybe lei)
Per JMAP RFC 8621 sec 4.1.2.3, we should be able to denote the lack of a phrase/comment corresponding to an email address with a JSON "null" (or Perl `undef').

  [
    { "name": "James Smythe", "email": "james@example.com" },
    { "name": null, "email": "jane@example.com" },
    { "name": "John Smith", "email": "john@example.com" }
  ]

The new "pairs" method just returns a two-dimensional array and the consumer will fill in the field names if necessary (or not). lei(1) may use the two-dimensional array as-is for JSON output.
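The shape described above (a two-dimensional array of name/address pairs, with a missing name as null) can be sketched in Python with the standard library's address parser. The `pairs` function here is a hypothetical stand-in written for illustration, not the project's Perl method, and the header value is made up from the example addresses.

```python
import json
from email.utils import getaddresses

def pairs(header_value):
    # [[name_or_None, address], ...]; an empty display name becomes None,
    # which json.dumps() serializes as null.
    return [[name or None, addr] for name, addr in getaddresses([header_value])]

hdr = ("James Smythe <james@example.com>, jane@example.com, "
       "John Smith <john@example.com>")
result = pairs(hdr)

# A consumer that wants JMAP-style objects fills in the field names itself.
as_objects = [{"name": n, "email": e} for n, e in result]
encoded = json.dumps(as_objects)
```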
2021-01-06 lei: use client env as-is, drop daemon-env command
There may be subtle misbehaviours when mixing the existing daemon env and the client-supplied env. Just do the simplest thing and use the client env as-is. We'll also start the ->event_step callback since we'll need to remember some things for long-lived commands.
2021-01-05 imap: fix uninitialized var on MSN search miss
It seems only triggered by bots trying to steal information.
2021-01-04 lei: prefer IO::FDPass over our Inline::C recv_3fds
While our recv_3fds() implementation is more efficient syscall-wise, loading Inline takes nearly 50ms on my machine even after Inline::C memoizes the build. The current ~20ms in the fast path is barely acceptable to me, and 50ms would be unusable. Eventually, script/lei may invoke tcc(1) or cc(1) directly in the fast path, but it needs @INC for the slow path, at least. We'll encode the number of FDs into the socket name to allow parallel installations, for now.
2021-01-03 use Eml (or MIME) objects for all indexing paths
We don't need to be keeping the raw message around after it hits git. Shard work now relies on Storable (or Sereal) and all of the indexing code relies on the Email::MIME-like API of Eml to access interesting parts of the message. Similarly, smsg->{raw_bytes} is no longer carried around and we do the CRLF adjustment when setting smsg->{bytes}. There's also a small simplification to t/import.t while we're in the area to use xqx instead of spawn/popen_rd.
2021-01-03 gcf2client: split out request API from regular git
While Gcf2Client is designed to mimic what git-cat-file writes to stdout, its request format is different to support requests with a git repository path included. We'll highlight the distinction and make the GitAsyncCat support code easier-to-follow as a result. Since Gcf2Client relies on DS, we can rely on DS-specific code here, too, and use a single Unix socket instead of separate input and output pipes, reducing memory overhead in both user and kernel space. Due to the interactive nature of requests and responses, the buffer size limitations of Unix sockets on Linux seem inconsequential here (just like they are for existing "git cat-file --batch" use).
2021-01-03 send and receive all 3 FDs at once
We'll always be transferring stdin, stdout, and stderr together for lei. Perhaps I lack imagination or foresight, but I can't think of a reason to send more or fewer FDs.
2021-01-03 spawn: support send_fd+recv_fd w/o IO::FDPass
IO::FDPass may be an extra installation burden I don't want to impose on users. We only support Linux and *BSDs, however.
2021-01-03 t/lei: use $lei->() callback wrapper
This shortens the test and should make it easier to debug and add new tests.
2021-01-02 processpipe: allow synchronous close to set $?
To get rid of the ugly $PublicInbox::DS::in_loop localization in MboxReader, we'll distinguish between ->CLOSE and ->DESTROY with ProcessPipe. If we end up closing via ->DESTROY, we'll assume the caller will want to deal with $? asynchronously via the event loop (or not even care about $?). If we hit ->CLOSE directly, we'll assume the caller called close() and wants to check $? synchronously. Note: wantarray doesn't seem to propagate into tied methods, otherwise I'd be relying on that.
2021-01-02 lei_store: alternative unconfigured "git var" workaround
While the changes to git->qx/git->popen from commit 171a9c24022ad7ef will be useful for the lei daemon, hiding git error messages from actual users is probably wrong and we'll just localize GIT_* vars for testing.
2021-01-02 treewide: reduce load_xapian* callsites
Hopefully this will make it easier to spot dependency bugs in the future.
2021-01-02 t/lei: fix TEST_RUN_MODE=0, simplify oneshot fallback
We need to use an absolute path after chdir in run modes where scripts aren't loaded into in-memory subs. The oneshot test was also failing under TEST_RUN_MODE=0 due to no "lei-oneshot" command existing on the FS. So we force a socket failure by making XDG_RUNTIME_DIR too large to fit into the 108-byte .sun_path field of "struct sockaddr_un". This even lets us simplify lei-oneshot significantly.
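The trick of forcing a socket failure with an over-long path can be reproduced outside the test suite. The Python sketch below assumes a Linux-like `sun_path` limit of 108 bytes; `try_bind` and the directory names are hypothetical and only illustrate the failure mode, not the project's test code.

```python
import os
import socket
import tempfile

def try_bind(path):
    # Returns True if a Unix stream socket can be bound at `path`.
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        s.bind(path)
        return True
    except OSError:
        return False
    finally:
        s.close()

with tempfile.TemporaryDirectory() as d:
    # A short path under the temporary directory binds normally.
    short_ok = try_bind(os.path.join(d, "s"))

    # The directory exists, but the full path cannot fit in sun_path,
    # so bind fails purely because of its length.
    long_dir = os.path.join(d, "x" * 200)
    os.makedirs(long_dir)
    long_ok = try_bind(os.path.join(long_dir, "s"))
```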
2021-01-01 update copyrights for 2021
Using "make update-copyrights" after setting GNULIB_PATH in my config.mak
2021-01-01 on_destroy: support PID owner guard
Since we'll be forking for Xapian indexing and maybe other places, having a simple guard in place to ensure OnDestroy doesn't unexpectedly unlink files or similar is a safer option.
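A PID owner guard can be sketched as a cleanup object that remembers which process created it and refuses to run anywhere else. The Python class below is a hypothetical illustration: `OwnerGuard` and its explicit `fire` method are assumptions, whereas the Perl OnDestroy acts when the object goes out of scope.

```python
import os
import tempfile

class OwnerGuard:
    """Run a cleanup callback only in the process that created the guard."""

    def __init__(self, callback, *args):
        self.owner_pid = os.getpid()
        self.callback = callback
        self.args = args

    def fire(self):
        if os.getpid() != self.owner_pid:
            return False  # a forked child must not clean up the parent's files
        self.callback(*self.args)
        return True

fd, path = tempfile.mkstemp()
os.close(fd)
guard = OwnerGuard(os.unlink, path)

pid = os.fork()
if pid == 0:  # child: inherits the guard, but is not its owner
    code = 2
    try:
        code = 1 if guard.fire() else 0
    finally:
        os._exit(code)

_, status = os.waitpid(pid, 0)
child_skipped = os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0
still_there = os.path.exists(path)  # the child left the file alone
parent_fired = guard.fire()         # the owner removes it
gone = not os.path.exists(path)
```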
2021-01-01 syscall: SFD_NONBLOCK can be a constant, again
Since Perl exposes O_NONBLOCK as a constant, we can safely make SFD_NONBLOCK a constant, too. This is not the case for SFD_CLOEXEC, since O_CLOEXEC is not exposed by Perl despite being used internally in the interpreter.
2021-01-01 t/ipc.t: test for references via `die'
We'll probably start using references as exceptions in some places for more exact matching.
2021-01-01 t/run: avoid uninitialized var on incomplete test
Diagnosing an occasional FIFO failure in t/lei_to_mail.t...
2021-01-01 spawn: move run_die here from PublicInbox::Import
It seems like a more logical place for it, but we'll favor the newly-added xsys_e() in tests for BAIL_OUT use.
2021-01-01 lei_store: handle messages without Message-ID at all
For personal mail, unsent draft messages are a common source of messages without Message-IDs.
2021-01-01 lei: rename "extinbox" => "external"
The words "extinbox" and "extindex" are too close and easy to confuse with the other. Rename "extinbox" to "external", since these could be IMAP, JMAP or other non-public-inbox search APIs.
Link: https://public-inbox.org/meta/20201226112649.GB6226@dcvr/
2021-01-01 lei_store: add ->set_eml, ->add_eml can return smsg
Add a ->set_eml method which can be a useful fire-and-forget way of either adding new files to store OR setting keywords on them. When seeing brand-new messages, add_eml can afford to return more information in the smsg instead of just the OID.
2021-01-01 ipc: support Sereal
Some testing will be needed to see if it's worth the code and maintenance overhead, but it seems easy-enough to get working.
2021-01-01 ipc: generic IPC dispatch based on Storable
I intend to use this with LeiStore when importing from multiple slow sources at once (e.g. curl, IMAP, etc). This is because over.sqlite3 can only have a single writer, and we'll have several slow readers running in parallel. Watch and SearchIdxShard should also be able to use this code in the future, but this will be proven with LeiStore, first.
2021-01-01 lei_to_mail: support Maildir, fix+test --augment
Maildir should be plenty fine for short-lived output folders.
2021-01-01 lei_to_mail: support for non-seekable outputs
Users may wish to pipe output to "git am", "spamc", or similar, so we need to support those cases and not bail out on lseek(2) or ftruncate(2) failures.
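Whether an output can be seeked or truncated is something the writer can probe instead of assuming. A minimal Python sketch follows, using a hypothetical `is_seekable` helper; the project's own handling may differ.

```python
import errno
import os
import tempfile

def is_seekable(fd):
    # lseek(2) on a pipe, FIFO or socket fails with ESPIPE; treat that as
    # "stream output" rather than as a fatal error.
    try:
        os.lseek(fd, 0, os.SEEK_CUR)
        return True
    except OSError as e:
        if e.errno == errno.ESPIPE:
            return False
        raise

r, w = os.pipe()
pipe_seekable = is_seekable(w)  # e.g. output piped to another program
with tempfile.TemporaryFile() as f:
    file_seekable = is_seekable(f.fileno())
os.close(r)
os.close(w)
```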
2021-01-01 lei: implement various deduplication strategies
For writing mboxes and Maildirs, users may wish to use stricter or looser deduplication strategies. This gives them more control.
2021-01-01 lei_to_mail: start --augment, dedupe, bz2 and xz
--augment will match the mairix(1) option of the same name to augment existing search results. We'll need to implement deduplication for a better user experience. mutt ships with compressed mbox support for bz2 and xz, at least, so we'll support those out-of-the-box.
2021-01-01 mboxreader: new class for reading various mbox formats
This is only lightly-tested against stuff LeiToMail generates and will need real-world tests to validate.
2021-01-01 lei_to_mail: start atomic and compressed mbox writing
We'll allow using multiple workers to write to a single mbox (which could be compressed). This can be done safely with O_APPEND + syswrite for uncompressed files, and by using a lock when piping to pigz/gzip/bzip2/xz.
2021-01-01 sharedkv: split out index_values
In most cases, we won't need to index by value, so don't waste cycles or space on it.
2021-01-01 sharedkv: fork()-friendly key-value store
This is intended for maintaining Maildir states, mbox message deduplication, but may be useful for other purposes...
2021-01-01 lei_to_mail: initial implementation for writing mbox formats
No Maildir support yet, but it'll come.
2020-12-31 Merge remote-tracking branch 'origin/master' into lorelei
* origin/master: (58 commits)
  ds: flatten + reuse @events, epoll_wait style fixes
  ds: simplify EventLoop implementation
  check defined return value for localized slurp errors
  import: check for git->qx errors, clearer return values
  git: qx: avoid extra "local" for scalar context case
  search: remove {mset} option for ->mset method
  search: remove pointless {relevance} setting
  miscsearch: take reopen from Search and use it
  extsearch: unconditionally reopen on access
  extindex: allow using --all without EXTINDEX_DIR
  extindex: add undocumented --no-scan switch
  extindex: enable autoflush on STDOUT/STDERR
  extindex: various --watch signal handling fixes
  extindex: --watch for inotify-based updates
  eml: fix undefined vars on <Perl 5.28
  t/config: test --get-urlmatch for git <2.26
  default to CORE::warn in $SIG{__WARN__} handlers
  inbox: name variable for values loop iterator
  inboxidle: avoid needless syscalls on refresh
  inboxidle: clue users into resolving ENOSPC from inotify
  ...
2020-12-31 lei_xsearch: cross-(inbox|extindex) search
While a single extindex combines multiple inboxes into a single search index, extindex still requires up-front indexing on items which can be searched. XSearch has no on-disk footprint itself and uses Xapian DBs of existing publicinbox and extindex ("extinbox") exclusively. XSearch still suffers from the multi-shard Xapian scalability problems which led to the creation of extindex, but I expect the number of shards to remain relatively low. I envision users hosting public-inbox instances on their workstations will only have two extindexes combined by this, one read-only extindex for serving public archives, and one read-write extindex managed by LeiStore for private mail.
2020-12-28 ds: flatten + reuse @events, epoll_wait style fixes
Consistently returning the equivalent of pollfd.revents in a portable manner was never worth the effort for us, as we use the same ->event_step callback regardless of POLLIN/POLLOUT/POLLHUP. Being a Perl array, @events knows its size and we don't have to return a maximum index for the caller to iterate on. We can also avoid redundant integer coercion ("+0") since we ensure everything is an IV in other places. Finally, vec() is preferable to ("\0" x $size) for resizing buffers because it only needs to write the extended portion and not overwrite the entire buffer.
2020-12-28 import: check for git->qx errors, clearer return values
Those git commands can fail and git->qx will set $? when it fails. There's no need for the extra indirection of the @ret array, either. Improve git->qx coverage to check for $? while we're at it.
2020-12-28 git: qx: avoid extra "local" for scalar context case
We can use the ternary operator to avoid an early return, here