Date | Commit message |
|
We want new environment variables when spawning the
pager from oneshot (non-daemon) mode.
|
|
All the augment and deduplication stuff seems to be working
based on unit tests. OpPipe is a nice general addition that
will probably make future state machines easier.
|
|
Children should not be blindly killing siblings on ->DESTROY
since they're typically shorter-lived than parents. We'll
also be more careful about on-stack variables and now we
can rely exclusively on delete ops to close FDs.
We also fix our SIGPIPE handling for the oneshot case while
fixing a typo in the delete op: we now write "!" to the EOF
pipe to ensure the parent oneshot process exits on the first
worker that hits SIGPIPE, rather than waiting for the last
worker to hit SIGPIPE.
|
|
We'll be doing most of the work in forked-off worker processes,
so ensure some of it is fork- and serialization-friendly.
|
|
Xapian v1.2.21..v1.2.24 failed to set the close-on-exec flag
on the flintlock FD, causing "git cat-file" processes to
hold onto the lock and prevent subsequent Xapian::WritableDatabase
from locking the DB. So cleanup git processes after committing
the miscidx transaction.
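The bug above is at the fcntl(2) level; here is a minimal Python sketch of the close-on-exec flag those Xapian versions failed to set (the lock path is invented for illustration, and note that modern Pythons already make new FDs non-inheritable per PEP 446, so the explicit F_SETFD below shows the fix rather than a real gap in Python):

```python
import fcntl, os, tempfile

# A lock file standing in for Xapian's flintlock (illustrative path).
path = os.path.join(tempfile.mkdtemp(), "flintlock")
fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)

# Without FD_CLOEXEC, every child spawned via fork+exec (e.g.
# "git cat-file") inherits fd and keeps the lock held for as
# long as the child lives, blocking later writers.
flags = fcntl.fcntl(fd, fcntl.F_GETFD)
fcntl.fcntl(fd, fcntl.F_SETFD, flags | fcntl.FD_CLOEXEC)

assert fcntl.fcntl(fd, fcntl.F_GETFD) & fcntl.FD_CLOEXEC
```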
|
|
The version of File::Spec shipped with Perl 5.16.3 memoizes the
value of File::Spec->tmpdir, causing changes to $ENV{TMPDIR}
after-the-fact to be ignored.
We'll only work around this in the test since it's innocuous and
unlikely to matter in real-world usage (and there are many places
where we'd have to work around this in non-test code).
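Python's tempfile module memoizes the same way, which makes an easy illustration of both the bug and the cache-reset workaround (a sketch; `tempfile.tempdir` is Python's cache, analogous to but distinct from File::Spec's):

```python
import os, tempfile

new_dir = tempfile.mkdtemp()       # a known-writable directory

cached = tempfile.gettempdir()     # computed once, memoized in tempfile.tempdir
os.environ["TMPDIR"] = new_dir     # change TMPDIR after-the-fact...
assert tempfile.gettempdir() == cached  # ...and it is silently ignored

tempfile.tempdir = None            # the test-only workaround: drop the cache
assert tempfile.gettempdir() == new_dir
```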
|
|
...by avoiding selectall_array in favor of selectall_arrayref.
Tested with DBI 1.627.
|
|
Older Perls (tested 5.16.3) would warn on uninitialized scalars while
newer (tested 5.28.1) do not. Just initialize it to an empty string
since it'll be filled in by `vec'.
|
|
Oops, this is needed for systems lacking Email::Address::XS
|
|
Perl chdir() automatically does fchdir(2) if given a file
or directory handle since 5.8.8/5.10.0, so we can safely
rely on it given our 5.10.1+ requirement.
This means we no longer have to waste several milliseconds
loading the Cwd.so and making stat() calls to ensure
ENV{PWD} is correct and usable in the server. It also lets
us work in directories that are no longer accessible via
pathname.
|
|
We can place the IO/GLOB ref directly into $self here.
|
|
Dedupe is active now, and we have $each_smsg->(...)
|
|
Most writes to stdout aren't atomic and we need locking to
prevent workers from interleaving and corrupting JSON output.
The one case where stdout won't require locking is when it
points to a regular file opened with O_APPEND, since POSIX
O_APPEND semantics guarantee atomicity.
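A Python sketch of the locking idea (not lei's actual code; the lock-file path is invented). Note each worker must open the lock file itself: flock(2) locks belong to the open file description, so an fd inherited across fork would be shared and useless for mutual exclusion:

```python
import fcntl, os, tempfile

tmp = tempfile.mkdtemp()
out_path = os.path.join(tmp, "out.json")
lock_path = os.path.join(tmp, "out.lock")

def write_record(rec: bytes):
    # each caller opens its own fd, so LOCK_EX actually excludes
    lk = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o600)
    out = os.open(out_path, os.O_CREAT | os.O_WRONLY | os.O_APPEND, 0o600)
    try:
        fcntl.flock(lk, fcntl.LOCK_EX)
        os.write(out, rec)   # one whole record per write(2)
    finally:
        os.close(out)
        os.close(lk)         # closing lk drops the lock

write_record(b'{"n":1}\n')
write_record(b'{"n":2}\n')
assert open(out_path, "rb").read() == b'{"n":1}\n{"n":2}\n'
```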
|
|
"references" was too long of a name compared to the other field
names we output in the JSON. While we currently don't have a
"refs:" search prefix for the "References:" header, we may in
the future.
|
|
Meaning "Received time", as it is the best description of the
value we use from the "Received:" header, if present. JMAP
calls it "receivedAt", but "rt:" seems like a better
abbreviation being in line with "dt:" for the "Date" header.
"Timestamp" ("ts") was potentially ambiguous given the presence
of the "Date" header.
|
|
Now that dedupe is serialization and fork-safe, we can
wire it back up in our query results paths.
|
|
We'll be passing these objects via PublicInbox::IPC which uses
Storable (or Sereal), so ensure they're safe to use after
serialization.
|
|
This lets us get rid of the Sys::Syslog import and __WARN__
override in LeiXSearch, though we still need it with
->atfork_child_wq.
|
|
We can shrink the @TO_CLOSE_ATFORK_CHILD array by two
elements, at least. It may be possible to eliminate this
array entirely, but clobbering $quit doesn't seem to
remove references to $eof_w or the $listener socket.
|
|
This matches existing -httpd/-nntpd/-imapd daemon behavior.
From what I can recall, it is less racy for the process doing
bind(2) to unlink it if stale.
|
|
It turns out "local" did not take effect in the way we used it:
http://nntp.perl.org/group/perl.perl5.porters/258784
<CAHhgV8hPbcmkzWizp6Vijw921M5BOXixj4+zTh3nRS9vRBYk8w@mail.gmail.com>
Fortunately, none of the old use cases seem affected, unlike the
previous lei change to ensure consistent SIGPIPE handling.
|
|
The new test ensures consistency between oneshot and
client/daemon users. Cancelling an in-progress result now also
stops xsearch workers to avoid wasted CPU and I/O.
Note the lei->atfork_child_wq usage changes; they work around
a bug in Perl 5: http://nntp.perl.org/group/perl.perl5.porters/258784
<CAHhgV8hPbcmkzWizp6Vijw921M5BOXixj4+zTh3nRS9vRBYk8w@mail.gmail.com>
This switches the internal protocol to SOCK_SEQPACKET AF_UNIX
sockets to keep the daemon's messages to the client (to run the
pager and to kill/exit the client script) from being merged.
|
|
We'll ensure our {send,recv}_cmd4 implementations are
consistent w.r.t. non-blocking and interrupted sockets.
We'll also support receiving messages without FDs associated
so we don't have to send dummy FDs to keep receivers from
reporting EOF.
|
|
This internal API is better suited for fork-friendliness (but
locking + dedupe still needs to be re-added).
Normal "json" is the default, though stream-friendly "concatjson"
and "jsonl" (AKA "ndjson" AKA "ldjson") all seem to work
(though tests aren't working, yet).
For normal "json", the biggest downside is the necessity of a
trailing "null" element at the end of the array because of
parallel processes, since (AFAIK) regular JSON doesn't allow
trailing commas, unlike JavaScript.
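The three output formats and the trailing-"null" trick can be sketched in a few lines of Python (illustrative records, not lei's actual schema):

```python
import json

msgs = [{"m": "a@x"}, {"m": "b@x"}]

# "jsonl"/"ndjson": one object per line; trivially stream-friendly
jsonl = "".join(json.dumps(m) + "\n" for m in msgs)

# "concatjson": objects back-to-back with no array wrapper
concat = "".join(json.dumps(m) for m in msgs)

# plain "json": parallel writers each emit "OBJ," without knowing
# who is last, so a trailing sentinel is needed because JSON
# (unlike JavaScript) forbids trailing commas
plain = "[" + "".join(json.dumps(m) + "," for m in msgs) + "null]"

assert [json.loads(l) for l in jsonl.splitlines()] == msgs
assert json.loads(plain) == msgs + [None]
```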
|
|
It's easier to make the code more generic by transferring
all four FDs (std(in|out|err) + socket) instead of omitting
stdin.
We'll be reading from stdin on some imports, and possibly
outputting to stdout, so omitting stdin now would needlessly
complicate things.
The differences between the IO::FDPass "1" code paths and the
"4" code paths used by Inline::C and Socket::MsgHdr are too
great to support and test at the moment.
|
|
While most single keystrokes work fine when the pager is
launched from the background daemon, Ctrl-C and WINCH can cause
strangeness when connected to the wrong terminal.
|
|
Do a better job of closing FDs that we don't want shared with
the work queue workers. We'll also fix naming and use
"atfork_prepare" instead of "atfork_parent" to match
pthread_atfork(3) naming.
|
|
Using kill(2) is too dangerous since extremely long
queries may allow the original PID of the aborted lei(1)
client process to be recycled by a new process. It would
be bad if the lei_xsearch worker process issued a kill
on the wrong process.
So just rely on sending the exit message via socket.
|
|
Relying on signal handlers to kill a particular worker was a
laggy/racy idea, so I gave up on targeting workers explicitly
and instead chose to have wq_worker_decr stop the next idle
worker via ->wq_exit.
We will however attempt to support sending signals to
a process group.
|
|
IO::FDPass is our last choice for implementing the workqueue
because its lack of atomicity makes it impossible to guarantee
all requests of a single group hit a single worker out of many.
So the only way to use IO::FDPass for workqueues is to have
a single worker. A single worker still buys us a small amount
of parallelism because of the parent process.
|
|
Actually, sending 4 FDs will be useful for lei internal xsearch
work once we start accepting input from stdin. It won't be used
with the lightweight lei(1) client, however.
For WWW (eventually), a single FD may be enough.
|
|
Improve interactivity and user experience by allowing the user
to return to the terminal immediately when the pager is exited
(e.g. hitting the `q' key in less(1)).
This is a massive change which restructures query handling to
allow parallel search when --thread expansion is in use and
offloading to a separate worker when --thread is not in use.
The Xapian query offload changes allow us to reenter the event
loop right away once the search(es) are shipped off to the work
queue workers.
This means the main lei-daemon process can forget the lei(1)
client socket immediately once it's handed off to worker
processes.
We now unblock SIGPIPE in query workers and send an exit(141)
response to the lei(1) client socket to denote SIGPIPE.
This also allows parallelization for users using "lei q" from
multiple terminals.
JSON output is currently broken and will need to be restructured
for more flexibility and fork-safety.
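The exit(141) value is just the shell's 128+signo encoding for SIGPIPE (signal 13 on Linux); a small Python check of that arithmetic and of how signal deaths are reported:

```python
import signal, subprocess

# a child killed by SIGPIPE; Python restores default signal
# dispositions in the child, so "kill -PIPE $$" is fatal to sh
p = subprocess.run(["sh", "-c", "kill -PIPE $$"])
assert p.returncode == -signal.SIGPIPE   # died on signal 13

# shells (and lei's exit(141) here) report that death as 128 + signo
assert 128 + signal.SIGPIPE == 141
```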
|
|
... instead of STD*{IO}. I'm not sure why *STDOUT{IO} being an
IO::File object disqualifies it from the "-t" perlop check
returning true on TTY, but it does. So use *STDOUT{GLOB} for
now.
http://nntp.perl.org/group/perl.perl5.porters/258760
Message-ID: <X/kgIqIuh4ZtUZNR@dcvr>
|
|
Perl keeps track of the variable name for error messages
when auto-closing an FD fails, so this will help identify
the source of a close error.
|
|
We'll enable automatic cleanup when IPC classes go out-of-scope
to avoid leaving zombies around.
->wq_workers will be a useful convenience method to change
worker counts.
|
|
It is not used anywhere.
|
|
Increasing/decreasing worker counts will be useful in
some situations.
|
|
We can just EOF the pipe, and instead rely on per-class
error handling to deal with uncommitted transactions and
what not.
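A minimal Python sketch of the EOF-as-shutdown pattern on a pipe (the pattern, not the lei code): the parent writes nothing; closing the write end is the signal:

```python
import os

r, w = os.pipe()
pid = os.fork()
if pid == 0:                  # worker: block until the parent is done
    os.close(w)               # drop our copy of the write end, or the
                              # read below would never see EOF
    assert os.read(r, 1) == b""   # EOF: all write ends closed
    os._exit(0)

os.close(r)
os.close(w)                   # the "quit" signal: no bytes, just EOF
_, status = os.waitpid(pid, 0)
assert status == 0
```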
|
|
This will allow any number of younger sibling processes to
communicate with older siblings directly without relying on a
mediator process. This is intended to be useful for
distributing search work across multiple workers without caring
which worker hits it (we only care about shard members).
And any request sent with this will be able to hit any worker
without locking on our part.
Unix stream sockets with a listener were also considered;
binding to a file on the FS may confuse users given there's
already a socket path for lei(1). Linux-only Abstract or
autobind sockets are rejected due to lack of portability.
SOCK_SEQPACKET via socketpair(2) was chosen since it's POSIX
2008 and available on FreeBSD 9+ in addition to Linux, and
doesn't require filesystem access.
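A short Python demonstration of why SOCK_SEQPACKET fits here (assumes Linux or FreeBSD 9+, per the portability notes above): message boundaries are preserved, unlike SOCK_STREAM:

```python
import socket

# connection-oriented like a stream, but each send(2) is one message
a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_SEQPACKET)
a.send(b"exit=0")
a.send(b"x_it")                   # a second, separate message

assert b.recv(64) == b"exit=0"    # never merged with the next message
assert b.recv(64) == b"x_it"
```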
|
|
We should not need an eval for warning with our code base.
Nowadays, dwaitpid() automatically does the right thing
regardless of whether we're in the event loop, so no eval
is needed there, either.
|
|
As another step in syscall reduction, we'll support
transferring 3 FDs and a buffer with a single sendmsg/recvmsg
syscall using Socket::MsgHdr if available.
Beyond script/lei itself, this will be used for internal IPC
between search backends (perhaps with SOCK_SEQPACKET). There's
a chance this could make it to the public-facing daemons, too.
This adds an optional dependency on the Socket::MsgHdr package,
available as libsocket-msghdr-perl on Debian-based distros
(but not CentOS 7.x and FreeBSD 11.x, at least).
Our Inline::C version in PublicInbox::Spawn remains the last
choice for script/lei due to the high startup time, and
IO::FDPass remains supported for non-Debian distros.
Since the socket name prefix changes from 3 to 4, we'll also
take this opportunity to make the argv+env buffer transfer less
error-prone by relying on argc instead of designated delimiters.
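Python 3.9+ exposes the same single-sendmsg(2) idea via socket.send_fds/recv_fds, which makes a convenient sketch of transferring FDs plus an argv-style buffer in one syscall (a temp file stands in for the real stdin/stdout/stderr, and the buffer contents are invented, not lei's wire format):

```python
import os, socket, tempfile

parent, child = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

tmp = tempfile.TemporaryFile()     # stand-in; a real client passes 0,1,2
fds = [tmp.fileno()] * 3

# one sendmsg(2): SCM_RIGHTS FDs and the argv/env buffer together
socket.send_fds(parent, [b"lei\0q\0s:hello\0"], fds)

buf, received, msg_flags, addr = socket.recv_fds(child, 4096, maxfds=3)
assert buf == b"lei\0q\0s:hello\0"
assert len(received) == 3

os.write(received[1], b"ok")       # SCM_RIGHTS dups share the file offset
tmp.seek(0)
assert tmp.read() == b"ok"
```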
|
|
Similar to git->cat_async, this will let us deal with responses
asynchronously, as well as being able to mix synchronous and
asynchronous code transparently (though perhaps not optimally).
|
|
This lets us call dwaitpid long before a process exits
and not have to wait around for it.
This is advantageous for lei where we can run dwaitpid on the
pager as soon as we spawn it, instead of waiting for a client
socket to go away on DESTROY.
|
|
We don't want duplicate messages in results overviews, either.
|
|
Parallelism and interactivity with pager + SIGPIPE needs work;
but results are shown and phrase search works without shell
users having to apply Xapian quoting rules on top of standard
shell quoting.
|
|
We've always temporarily unindexed messages before reindexing
them again if there's discontiguous history.
This change improves the mechanism we use to prevent NNTP and
IMAP clients from seeing duplicate messages.
Previously, we relied on mapping Message-IDs to NNTP article
numbers to ensure clients would not see the same message twice.
This worked for most messages, but not for messages with
reused or duplicate Message-IDs.
Instead of relying on Message-IDs as a key, we now rely on the
git blob object ID for exact content matching. This allows
truly different messages to show up for NNTP|IMAP clients,
while still preventing those clients from seeing the same
message again.
|
|
Per JMAP RFC 8621 sec 4.1.2.3, we should be able to
denote the lack of a phrase/comment corresponding to an
email address with a JSON "null" (or Perl `undef').
[
{ "name": "James Smythe", "email": "james@example.com" },
{ "name": null, "email": "jane@example.com" },
{ "name": "John Smith", "email": "john@example.com" }
]
The new "pairs" method just returns a two-dimensional array
and the consumer will fill in the field names if necessary
(or not).
lei(1) may use the two-dimensional array as-is for JSON output.
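A sketch of the "pairs" idea in Python (the [name-or-null, email] row shape follows the description above; the consumer code is hypothetical):

```python
import json

# a hypothetical pairs() result: [name-or-None, email] rows
pairs = [
    ["James Smythe", "james@example.com"],
    [None, "jane@example.com"],          # no phrase/comment -> null
    ["John Smith", "john@example.com"],
]

# a consumer filling in field names, per RFC 8621 sec 4.1.2.3
objs = [{"name": n, "email": e} for n, e in pairs]
out = json.dumps(objs)
assert json.loads(out)[1]["name"] is None   # Perl undef -> JSON null
```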
|
|
There may be subtle misbehaviours when mixing the existing
daemon env and the client-supplied env. Just do the simplest
thing and use the client env as-is.
We'll also start the ->event_step callback since we'll need
to remember some things for long-lived commands.
|
|
Just like git, we'll start a pager when outputting to a terminal
for user-friendliness when reading many messages.
|
|
"-o default" is what we want from "complete"; "-o filename" just
tells readline the result from the "_lei" function might be a
filename and to quote it appropriately.
|