path: root/lib
Date Commit message
2021-01-18 lei: q: results output to Maildir and mbox* working
All the augment and deduplication stuff seems to be working based on unit tests. OpPipe is a nice general addition that will probably make future state machines easier.
2021-01-18 ipc: children don't kill on DESTROY, reduce FD sharing
Children should not be blindly killing siblings on ->DESTROY since they're typically shorter-lived than parents. We'll also be more careful about on-stack variables and now we can rely exclusively on delete ops to close FDs. We also need to fix our SIGPIPE handling for the oneshot case while fixing a typo for delete, so we write "!" to the EOF pipe to ensure the parent oneshot process exits on the first worker that hits SIGPIPE, rather than waiting for the last worker to hit SIGPIPE.
2021-01-18 lei_to_mail: prepare for worker offload
We'll be doing most of the work in forked off worker processes, so ensure some of it is fork and serialization-friendly.
2021-01-18 extindex: fix w/ Xapian 1.2.21..1.2.24
Xapian v1.2.21..v1.2.24 failed to set the close-on-exec flag on the flintlock FD, causing "git cat-file" processes to hold onto the lock and prevent subsequent Xapian::WritableDatabase from locking the DB. So clean up git processes after committing the miscidx transaction.
2021-01-18 initialize scalar for `vec' perlop modification
Older Perls (tested 5.16.3) would warn on uninitialized scalars while newer (tested 5.28.1) do not. Just initialize it to an empty string since it'll be filled in by `vec'.
2021-01-18 address: pairs: enable pure Perl version
Oops, this is needed for systems lacking Email::Address::XS
2021-01-15 lei: pass FD to CWD via cmsg, use fchdir on server
Perl chdir() automatically does fchdir(2) if given a file or directory handle since 5.8.8/5.10.0, so we can safely rely on it given our 5.10.1+ requirement. This means we no longer have to waste several milliseconds loading the Cwd.so and making stat() calls to ensure ENV{PWD} is correct and usable in the server. It also lets us work in directories that are no longer accessible via pathname.
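The last point, entering a directory that can no longer be reached by its old pathname, can be sketched in Python with os.fchdir(). This is only an illustration of fchdir(2) semantics, not lei's Perl code; the helper names and the temporary directory layout are made up for the example.

```python
import os
import tempfile


def enter_via_fd(dir_fd):
    """fchdir(2) into an already-open directory; no pathname lookup happens."""
    os.fchdir(dir_fd)


def demo():
    base = tempfile.mkdtemp()
    old = os.path.join(base, "old")
    new = os.path.join(base, "new")
    os.mkdir(old)
    # Remember where we started so the demo can restore the CWD afterwards.
    saved = os.open(".", os.O_RDONLY | os.O_DIRECTORY)
    dir_fd = os.open(old, os.O_RDONLY | os.O_DIRECTORY)
    try:
        os.rename(old, new)   # the "old" pathname no longer exists
        enter_via_fd(dir_fd)  # still works: the FD refers to the directory itself
        return os.path.samefile(".", new)
    finally:
        os.fchdir(saved)
        os.close(saved)
        os.close(dir_fd)
        os.rmdir(new)
        os.rmdir(base)
```

In the client/daemon case the directory FD would arrive over the socket as ancillary data instead of being opened locally.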
2021-01-15 lei: remove temporary var on open
We can place the IO/GLOB ref directly into $self, here.
2021-01-15 leixsearch: remove some commented out code
Dedupe is active, now, and we have $each_smsg->(...)
2021-01-15 lei: q: lock stdout on overview output
Most writes to stdout aren't atomic and we need locking to prevent workers from interleaving and corrupting JSON output. The one case where stdout won't require locking is if it's pointed to a regular file with O_APPEND, as POSIX O_APPEND semantics guarantee atomicity.
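A rough Python sketch of that decision follows. The commit does not say which locking primitive is used, so flock(2) on a separate lock FD is an assumption here, and needs_lock() and write_record() are hypothetical names.

```python
import fcntl
import os
import stat


def needs_lock(fd):
    """True unless fd is a regular file opened with O_APPEND."""
    is_regular = stat.S_ISREG(os.fstat(fd).st_mode)
    is_append = bool(fcntl.fcntl(fd, fcntl.F_GETFL) & os.O_APPEND)
    return not (is_regular and is_append)


def write_all(fd, data):
    """Loop over short writes until every byte is written."""
    view = memoryview(data)
    while view:
        n = os.write(fd, view)
        view = view[n:]


def write_record(fd, lock_fd, data):
    """Write one record, serialising writers with flock(2) when required.

    Assumption: every worker opens the same lock file and passes it as lock_fd.
    """
    if not needs_lock(fd):
        write_all(fd, data)
        return
    fcntl.flock(lock_fd, fcntl.LOCK_EX)
    try:
        write_all(fd, data)
    finally:
        fcntl.flock(lock_fd, fcntl.LOCK_UN)
```

Pipes, terminals and regular files opened without O_APPEND all take the locked path.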
2021-01-14 lei_overview: rename "references" to "refs"
"references" was too long of a name compared to the other field names we output in the JSON. While we currently don't have a "refs:" search prefix for the "References:" header, we may in the future.
2021-01-14 search: rename "ts:" prefix to "rt:"
Meaning "Received time", as it is the best description of the value we use from the "Received:" header, if present. JMAP calls it "receivedAt", but "rt:" seems like a better abbreviation being in line with "dt:" for the "Date" header. "Timestamp" ("ts") was potentially ambiguous given the presence of the "Date" header.
2021-01-14 lei q: reinstate smsg dedupe
Now that dedupe is serialization and fork-safe, we can wire it back up in our query results paths.
2021-01-14 lei_dedupe+shared_kv: ensure round-tripping serialization
We'll be passing these objects via PublicInbox::IPC which uses Storable (or Sereal), so ensure they're safe to use after serialization.
2021-01-14 lei: rely on localized $current_lei for warnings
This lets us get rid of the Sys::Syslog import and __WARN__ override in LeiXSearch, though we still need it with ->atfork_child_wq.
2021-01-14 lei: reduce live FD references in wq child
We can shrink the @TO_CLOSE_ATFORK_CHILD array by two elements, at least. It may be possible to eliminate this array entirely, but clobbering $quit doesn't seem to remove references to $eof_w or the $listener socket.
2021-01-14 lei: do not unlink socket path at exit
This matches existing -httpd/-nntpd/-imapd daemon behavior. From what I can recall, it is less racy for the process doing bind(2) to unlink it if stale.
2021-01-14 daemon+watch: fix localization of %SIG for non-signalfd users
It turns out "local" did not take effect in the way we used it: http://nntp.perl.org/group/perl.perl5.porters/258784 <CAHhgV8hPbcmkzWizp6Vijw921M5BOXixj4+zTh3nRS9vRBYk8w@mail.gmail.com> Fortunately, none of the old use cases seem affected, unlike the previous lei change to ensure consistent SIGPIPE handling.
2021-01-14 lei: test SIGPIPE, stop xsearch workers on client abort
The new test ensures consistency between oneshot and client/daemon users. Cancelling an in-progress result now also stops xsearch workers to avoid wasted CPU and I/O. Note the lei->atfork_child_wq usage changes; they work around a bug in Perl 5: http://nntp.perl.org/group/perl.perl5.porters/258784 <CAHhgV8hPbcmkzWizp6Vijw921M5BOXixj4+zTh3nRS9vRBYk8w@mail.gmail.com> This switches the internal protocol to use SOCK_SEQPACKET AF_UNIX sockets to prevent merging messages from the daemon to client to run pager and kill/exit the client script.
2021-01-14 cmd_ipc: support + test EINTR + EAGAIN, no FDs
We'll ensure our {send,recv}_cmd4 implementations are consistent w.r.t. non-blocking and interrupted sockets. We'll also support receiving messages without FDs associated so we don't have to send dummy FDs to keep receivers from reporting EOF.
2021-01-12 lei: query: restore JSON output overview
This internal API is better suited for fork-friendliness (but locking + dedupe still needs to be re-added). Normal "json" is the default, though stream-friendly "concatjson" and "jsonl" (AKA "ndjson" AKA "ldjson") all seem working (though tests aren't working, yet). For normal "json", the biggest downside is the necessity of a trailing "null" element at the end of the array because of parallel processes, since (AFAIK) regular JSON doesn't allow trailing commas, unlike JavaScript.
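The trailing "null" trick can be illustrated with a short Python sketch. The function names are hypothetical; the only point is that each writer may append a comma unconditionally, since the final "null" absorbs the last one.

```python
import json


def emit(obj):
    """What each parallel worker writes: one element followed by a comma."""
    return json.dumps(obj) + ",\n"


def render(results):
    """Assemble the array: opening bracket, elements, terminating null."""
    return "[" + "".join(emit(r) for r in results) + "null]"
```

A consumer parsing the result with a strict JSON parser gets a valid array and only has to drop the final null element.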
2021-01-12 lei_xsearch: transfer 4 FDs internally, drop IO::FDPass
It's easier to make the code more generic by transferring all four FDs (std(in|out|err) + socket) instead of omitting stdin. We'll be reading from stdin on some imports, and possibly outputting to stdout, so omitting stdin now would needlessly complicate things. The differences with IO::FDPass "1" code paths and the "4" code paths used by Inline::C and Socket::MsgHdr are far too much to support and test at the moment.
2021-01-12 lei: run pager in client script
While most single keystrokes work fine when the pager is launched from the background daemon, Ctrl-C and WINCH can cause strangeness when connected to the wrong terminal.
2021-01-12 lei: fork + FD cleanup
Do a better job of closing FDs that we don't want shared with the work queue workers. We'll also fix naming and use "atfork_prepare" instead of "atfork_parent" to match pthread_atfork(3) naming.
2021-01-12 lei: get rid of client {pid} field
Using kill(2) is too dangerous since extremely long queries may cause the original PID of the aborted lei(1) client process to be recycled by a new process. It would be bad if the lei_xsearch worker process issued a kill on the wrong process. So just rely on sending the exit message via socket.
2021-01-12 ipc: drop unused fields, default sighandlers for wq
Relying on signal handlers to kill a particular worker was a laggy/racy idea and I gave up on the idea of targeting workers explicitly and instead chose to make wq_worker_decr stop the next idle worker ->wq_exit. We will, however, attempt to support sending signals to a process group.
2021-01-12 ipc: fix IO::FDPass use with a worker limit of 1
IO::FDPass is our last choice for implementing the workqueue because its lack of atomicity makes it impossible to guarantee all requests of a single group hit a single worker out of many. So the only way to use IO::FDPass for workqueues is to only have a single worker. A single worker still buys us a small amount of parallelism because of the parent process.
2021-01-12 ipc: start supporting sending/receiving more than 3 FDs
Actually, sending 4 FDs will be useful for lei internal xsearch work once we start accepting input from stdin. It won't be used with the lightweight lei(1) client, however. For WWW (eventually), a single FD may be enough.
2021-01-12 lei: query: ensure pager exit is instantaneous
Improve interactivity and user experience by allowing the user to return to the terminal immediately when the pager is exited (e.g. hitting the `q' key in less(1)). This is a massive change which restructures query handling to allow parallel search when --thread expansion is in use and offloading to a separate worker when --thread is not in use. The Xapian query offload changes allow us to reenter the event loop right away once the search(es) are shipped off to the work queue workers. This means the main lei-daemon process can forget the lei(1) client socket immediately once it's handed off to worker processes. We now unblock SIGPIPE in query workers and send an exit(141) response to the lei(1) client socket to denote SIGPIPE. This also allows parallelization for users using "lei q" from multiple terminals. JSON output is currently broken and will need to be restructured for more flexibility and fork-safety.
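For readers wondering where 141 comes from: shells conventionally report a process killed by signal N as exit status 128 + N, and SIGPIPE is 13 on Linux and most other Unix systems. A tiny Python sketch, with a hypothetical helper name:

```python
import signal


def exit_code_for_signal(signum):
    """Shell-style exit status for a process terminated by `signum`."""
    return 128 + int(signum)


# The status the lei(1) client would report for a query cut short by SIGPIPE.
SIGPIPE_EXIT = exit_code_for_signal(signal.SIGPIPE)
```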
2021-01-12 lei: fix oneshot TTY detection by passing STD*{GLOB}
... instead of STD*{IO}. I'm not sure why *STDOUT{IO} being an IO::File object disqualifies it from the "-t" perlop check returning true on TTY, but it does. So use *STDOUT{GLOB} for now. http://nntp.perl.org/group/perl.perl5.porters/258760 Message-ID: <X/kgIqIuh4ZtUZNR@dcvr>
2021-01-12 lei: rename $w to $wpager for warning message
Perl keeps track of the variable name for error messages when auto-closing an FD fails, so this will help identify the source of a close error.
2021-01-12 ipc: DESTROY and wq_workers methods
We'll enable automatic cleanup when IPC classes go out-of-scope to avoid leaving zombies around. ->wq_workers will be a useful convenience method to change worker counts.
2021-01-12 ipc: drop -ipc_parent_pid field
It is not used anywhere.
2021-01-12 ipc: wq: support dynamic worker count change
Increasing/decreasing workers count will be useful in some situations.
2021-01-12 ipc: eliminate ipc_worker_stop method
We can just EOF the pipe, and instead rely on per-class error handling to deal with uncommitted transactions and what not.
2021-01-12 ipc: work queue support via SOCK_SEQPACKET
This will allow any number of younger sibling processes to communicate with older siblings directly without relying on a mediator process. This is intended to be useful for distributing search work across multiple workers without caring which worker hits it (we only care about shard members). And any request sent with this will be able to hit any worker without locking on our part. Unix stream sockets with a listener were also considered; binding to a file on the FS may confuse users given there's already a socket path for lei(1). Linux-only Abstract or autobind sockets are rejected due to lack of portability. SOCK_SEQPACKET via socketpair(2) was chosen since it's POSIX 2008 and available on FreeBSD 9+ in addition to Linux, and doesn't require filesystem access.
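The idea can be sketched in Python as follows. This is not PublicInbox::IPC: the function name, the result pipe and the uppercase "work" are invented for illustration. It relies only on SOCK_SEQPACKET preserving message boundaries, so several forked workers can recv() from the same socket without any locking.

```python
import os
import socket


def run_queue(requests, nworkers=2):
    """Distribute non-empty byte requests to forked workers, unordered."""
    req_w, req_r = socket.socketpair(socket.AF_UNIX, socket.SOCK_SEQPACKET)
    res_r, res_w = os.pipe()
    pids = []
    for _ in range(nworkers):
        pid = os.fork()
        if pid == 0:  # worker: shares req_r with its siblings
            status = 1
            try:
                req_w.close()
                os.close(res_r)
                while True:
                    req = req_r.recv(65536)  # exactly one whole request
                    if not req:              # EOF: parent closed its end
                        break
                    os.write(res_w, req.upper() + b"\n")
                status = 0
            finally:
                os._exit(status)
        pids.append(pid)
    req_r.close()
    os.close(res_w)
    for req in requests:  # whichever worker is idle picks each one up
        req_w.send(req)
    req_w.close()
    out = b""
    while True:
        chunk = os.read(res_r, 65536)
        if not chunk:
            break
        out += chunk
    os.close(res_r)
    for pid in pids:
        os.waitpid(pid, 0)
    return sorted(out.split())
```

The sketch sends every request before reading results, which is fine for a handful of small messages but would need an event loop for real workloads.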
2021-01-12 ipc: avoid excessive evals
We should not need an eval for warning with our code base. Nowadays, dwaitpid() automatically does the right thing regardless of whether we're in the event loop, so no eval is needed there, either.
2021-01-12 cmd_ipc: send FDs with buffer payload
For another step in syscall reduction, we'll support transferring 3 FDs and a buffer with a single sendmsg/recvmsg syscall using Socket::MsgHdr if available. Beyond script/lei itself, this will be used for internal IPC between search backends (perhaps with SOCK_SEQPACKET). There's a chance this could make it to the public-facing daemons, too. This adds an optional dependency on the Socket::MsgHdr package, available as libsocket-msghdr-perl on Debian-based distros (but not CentOS 7.x and FreeBSD 11.x, at least). Our Inline::C version in PublicInbox::Spawn remains the last choice for script/lei due to the high startup time, and IO::FDPass remains supported for non-Debian distros. Since the socket name prefix changes from 3 to 4, we'll also take this opportunity to make the argv+env buffer transfer less error-prone by relying on argc instead of designated delimiters.
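A minimal Python sketch of sending FDs and a buffer in one sendmsg(2) call is shown below. The wire format (argc, then argv, then "KEY=VALUE" env entries, all NUL-separated) is an assumption made up for this example and is not lei's actual protocol; send_cmd() and recv_cmd() are hypothetical names, not the Perl {send,recv}_cmd4 API.

```python
import array
import socket


def send_cmd(sock, fds, argv, env):
    """One sendmsg(2): payload plus FDs as SCM_RIGHTS ancillary data."""
    fields = [str(len(argv)).encode()]
    fields += [a.encode() for a in argv]
    fields += [("%s=%s" % kv).encode() for kv in env.items()]
    anc = [(socket.SOL_SOCKET, socket.SCM_RIGHTS,
            array.array("i", fds).tobytes())]
    return sock.sendmsg([b"\0".join(fields)], anc)


def recv_cmd(sock, maxfds=4, bufsize=65536):
    """One recvmsg(2): returns (fds, argv, env)."""
    fds = array.array("i")
    data, ancdata, _flags, _addr = sock.recvmsg(
        bufsize, socket.CMSG_LEN(maxfds * fds.itemsize))
    for level, ctype, cdata in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            # Ignore any truncated trailing bytes.
            fds.frombytes(cdata[:len(cdata) - (len(cdata) % fds.itemsize)])
    fields = data.split(b"\0")
    argc = int(fields[0])  # argc tells us where argv ends and env begins
    argv = [f.decode() for f in fields[1:1 + argc]]
    env = dict(f.decode().split("=", 1) for f in fields[1 + argc:])
    return list(fds), argv, env
```

Because argc marks the argv/env boundary, no special delimiter value has to be reserved between the two lists.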
2021-01-12 ipc: add support for asynchronous callbacks
Similar to git->cat_async, this will let us deal with responses asynchronously, as well as being able to mix synchronous and asynchronous code transparently (though perhaps not optimally).
2021-01-12 ds: block signals when reaping
This lets us call dwaitpid long before a process exits and not have to wait around for it. This is advantageous for lei where we can run dwaitpid on the pager as soon as we spawn it, instead of waiting for a client socket to go away on DESTROY.
2021-01-12 lei q: deduplicate smsg
We don't want duplicate messages in results overviews, either.
2021-01-12 lei query + pagination sorta working
Parallelism and interactivity with pager + SIGPIPE need work, but results are shown and phrase search works without shell users having to apply Xapian quoting rules on top of standard shell quoting.
2021-01-09 v2writable: exact discontiguous history handling
We've always temporarily unindexed messages before reindexing them again if there's discontiguous history. This change improves the mechanism we use to prevent NNTP and IMAP clients from seeing duplicate messages. Previously, we relied on mapping Message-IDs to NNTP article numbers to ensure clients would not see the same message twice. This worked for most messages, but not for messages with reused or duplicate Message-IDs. Instead of relying on Message-IDs as a key, we now rely on the git blob object ID for exact content matching. This allows truly different messages to show up for NNTP|IMAP clients, while still preventing those clients from seeing the same message again.
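To make the blob-OID keying concrete, here is a small Python sketch. git_blob_oid() reproduces how git hashes a blob in a SHA-1 repository; unseen() is a hypothetical filter invented for this example and is unrelated to the actual v2writable code.

```python
import hashlib


def git_blob_oid(content):
    """SHA-1 object ID git would assign to `content` stored as a blob."""
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def unseen(messages, seen_oids):
    """Yield only messages whose exact bytes have not been seen before."""
    for raw in messages:
        oid = git_blob_oid(raw)
        if oid in seen_oids:
            continue
        seen_oids.add(oid)
        yield raw
```

Two messages sharing a Message-ID but differing in content hash to different OIDs and are both kept, while a byte-for-byte repeat is dropped.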
2021-01-06 address: pairs: new helper for JMAP (and maybe lei)
Per JMAP RFC 8621 sec 4.1.2.3, we should be able to denote the lack of a phrase/comment corresponding to an email address with a JSON "null" (or Perl `undef').

  [
    { "name": "James Smythe", "email": "james@example.com" },
    { "name": null, "email": "jane@example.com" },
    { "name": "John Smith", "email": "john@example.com" }
  ]

The new "pairs" method just returns a 2 dimensional array and the consumer will fill in the field names if necessary (or not). lei(1) may use the two dimensional array as-is for JSON output.
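The shape of the data can be sketched in Python, using the standard library's email.utils.getaddresses() as a stand-in parser. This is not PublicInbox::Address; pairs() and to_jmap() below are hypothetical names chosen to mirror the description.

```python
from email.utils import getaddresses


def pairs(header_value):
    """Return [[name_or_None, address], ...] for an address header value."""
    return [[name or None, addr]
            for name, addr in getaddresses([header_value])]


def to_jmap(pair_list):
    """Consumer side: attach the JMAP field names to each pair."""
    return [{"name": name, "email": email} for name, email in pair_list]
```

Serialising the to_jmap() output with json.dumps() turns each None into the JSON null shown in the commit message.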
2021-01-06 lei: use client env as-is, drop daemon-env command
There may be subtle misbehaviours when mixing the existing daemon env and the client-supplied env. Just do the simplest thing and use the client env as-is. We'll also start the ->event_step callback since we'll need to remember some things for long-lived commands.
2021-01-06 lei: automatic pager support
Just like git, we'll start a pager when outputting to a terminal for user-friendliness when reading many messages.
2021-01-05 imap: fix uninitialized var on MSN search miss
It seems only triggered by bots trying to steal information.
2021-01-04 lei: improve idempotent "init" error message
Showing "leistore.dir= already initialized" because $cur is undefined isn't useful.
2021-01-04 lei: fix opt_dash to pass non-dash args to @argv
The special "<>" handling in Getopt::Long actually invokes the callback for every single command-line arg, not just those prefixed by "-". This will let us pass arbitrary non-dashed words for search queries so users can type queries naturally without quoting (unless they want phrase search).
2021-01-04 lei: prefer IO::FDPass over our Inline::C recv_3fds
While our recv_3fds() implementation is more efficient syscall-wise, loading Inline takes nearly 50ms on my machine even after Inline::C memoizes the build. The current ~20ms in the fast path is barely acceptable to me, and 50ms would be unusable. Eventually, script/lei may invoke tcc(1) or cc(1) directly in the fast path, but it needs @INC for the slow path, at least. We'll encode the number of FDs into the socket name to allow parallel installations, for now.