Date | Commit message |
|
We'll ensure our {send,recv}_cmd4 implementations are
consistent w.r.t. non-blocking and interrupted sockets.
We'll also support receiving messages without FDs associated
so we don't have to send dummy FDs to keep receivers from
reporting EOF.
|
|
It's easier to make the code more generic by transferring
all four FDs (std(in|out|err) + socket) instead of omitting
stdin.
We'll be reading from stdin on some imports, and possibly
outputting to stdout, so omitting stdin now would needlessly
complicate things.
The differences between the IO::FDPass "1" code paths and the "4"
code paths used by Inline::C and Socket::MsgHdr are far too
great to support and test at the moment.
|
|
IO::FDPass is our last choice for implementing the workqueue
because its lack of atomicity makes it impossible to guarantee
all requests of a single group hit a single worker out of many.
So the only way to use IO::FDPass for workqueues is to only have
a single worker. A single worker still buys us a small amount
of parallelism because of the parent process.
|
|
Actually, sending 4 FDs will be useful for lei internal xsearch
work once we start accepting input from stdin. It won't be used
with the lightweight lei(1) client, however.
For WWW (eventually), a single FD may be enough.
|
|
We'll enable automatic cleanup when IPC classes go out-of-scope
to avoid leaving zombies around.
->wq_workers will be a useful convenience method to change
worker counts.
|
|
Increasing/decreasing the worker count will be useful in
some situations.
|
|
This will allow any number of younger sibling processes to
communicate with older siblings directly without relying on a
mediator process. This is intended to be useful for
distributing search work across multiple workers without caring
which worker hits it (we only care about shard members).
And any request sent with this will be able to hit any worker
without locking on our part.
Unix stream sockets with a listener were also considered;
binding to a file on the FS may confuse users given there's
already a socket path for lei(1). Linux-only Abstract or
autobind sockets are rejected due to lack of portability.
SOCK_SEQPACKET via socketpair(2) was chosen since it's POSIX
2008 and available on FreeBSD 9+ in addition to Linux, and
doesn't require filesystem access.
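To illustrate the property this relies on, here is a minimal Python sketch (not the project's Perl code) showing that a SOCK_SEQPACKET socketpair preserves record boundaries and needs no filesystem path. The variable names and message contents are made up for illustration.

```python
import socket

# No bind() and no path on the filesystem: both ends come from socketpair(2).
producer, worker = socket.socketpair(socket.AF_UNIX, socket.SOCK_SEQPACKET)

# Each send() is one record; the kernel never merges or splits records,
# so a reader always gets exactly one whole request per recv().
producer.send(b"query shard 0")
producer.send(b"query shard 1")

first = worker.recv(4096)
second = worker.recv(4096)

# Closing the sending side is reported to the reader as a zero-length read.
producer.close()
eof = worker.recv(4096)
worker.close()
```

Because every record is delivered whole, several workers could in principle block in recv() on the same socket and each request would go to exactly one of them, which is the behaviour the message above describes.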
|
|
For another step in syscall reduction, we'll support
transferring 3 FDs and a buffer with a single sendmsg/recvmsg
syscall using Socket::MsgHdr if available.
Beyond script/lei itself, this will be used for internal IPC
between search backends (perhaps with SOCK_SEQPACKET). There's
a chance this could make it to the public-facing daemons, too.
This adds an optional dependency on the Socket::MsgHdr package,
available as libsocket-msghdr-perl on Debian-based distros
(but not CentOS 7.x and FreeBSD 11.x, at least).
Our Inline::C version in PublicInbox::Spawn remains the last
choice for script/lei due to the high startup time, and
IO::FDPass remains supported for non-Debian distros.
Since the socket name prefix changes from 3 to 4, we'll also
take this opportunity to make the argv+env buffer transfer less
error-prone by relying on argc instead of designated delimiters.
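As a rough illustration of the idea (in Python rather than Perl, and not the actual wire format used by lei), the sketch below passes file descriptors and an argc-prefixed argv+env buffer in a single sendmsg(2)/recvmsg(2) pair. The function names send_cmd and recv_cmd and the NUL-joined layout are assumptions made for this example only.

```python
import array
import socket

def send_cmd(sock, fds, argv, env):
    """Send FDs plus an argc-framed argv+env buffer in one sendmsg(2)."""
    # Hypothetical layout: argc, then argv entries, then KEY=VALUE entries,
    # all joined by NUL.  argc tells the receiver where argv ends, so no
    # special delimiter between argv and env is needed.
    fields = [str(len(argv)).encode()] + [a.encode() for a in argv]
    fields += [f"{k}={v}".encode() for k, v in env.items()]
    rights = array.array("i", fds).tobytes()
    anc = [(socket.SOL_SOCKET, socket.SCM_RIGHTS, rights)]
    return sock.sendmsg([b"\0".join(fields)], anc)

def recv_cmd(sock, maxfds=4, bufsize=65536):
    """Receive one message; returns (fds, argv, env)."""
    fds = array.array("i")
    data, ancdata, _flags, _addr = sock.recvmsg(
        bufsize, socket.CMSG_LEN(maxfds * fds.itemsize))
    for level, ctype, cdata in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            # Ignore a truncated trailing integer, if any.
            usable = len(cdata) - (len(cdata) % fds.itemsize)
            fds.frombytes(cdata[:usable])
    fields = data.split(b"\0")
    argc = int(fields[0])
    argv = [f.decode() for f in fields[1:1 + argc]]
    env = dict(f.decode().split("=", 1) for f in fields[1 + argc:])
    return list(fds), argv, env
```

A caller would typically pass its stdin, stdout and stderr descriptors as fds. Arguments containing spaces or "=" survive intact because only argc and NUL bytes carry structure.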
|
|
Similar to git->cat_async, this will let us deal with responses
asynchronously, as well as being able to mix synchronous and
asynchronous code transparently (though perhaps not optimally).
|
|
This lets us call dwaitpid long before a process exits
and not have to wait around for it.
This is advantageous for lei where we can run dwaitpid on the
pager as soon as we spawn it, instead of waiting for a client
socket to go away on DESTROY.
|
|
We don't want duplicate messages in results overviews, either.
|
|
Parallelism and interactivity with pager + SIGPIPE need work;
but results are shown and phrase search works without shell
users having to apply Xapian quoting rules on top of standard
shell quoting.
|
|
Per JMAP RFC 8621 sec 4.1.2.3, we should be able to
denote the lack of a phrase/comment corresponding to an
email address with a JSON "null" (or Perl `undef').
[
{ "name": "James Smythe", "email": "james@example.com" },
{ "name": null, "email": "jane@example.com" },
{ "name": "John Smith", "email": "john@example.com" }
]
The new "pairs" method just returns a two-dimensional array
and the consumer will fill in the field names if necessary
(or not).
lei(1) may use the two-dimensional array as-is for JSON output.
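A small Python sketch of the same shape, using the standard library's address parser instead of the project's Perl code; the function name pairs is borrowed from the message above, but this is only an illustration of the output format.

```python
import json
from email.utils import getaddresses

def pairs(header_value):
    """Return [[name_or_None, address], ...] for an address header value."""
    # getaddresses() yields '' for a missing display name; map that to None
    # so JSON serialization emits null.
    return [[name or None, addr]
            for name, addr in getaddresses([header_value])]

hdr = "James Smythe <james@example.com>, jane@example.com"
print(json.dumps(pairs(hdr)))
```

A consumer that wants named fields can zip each pair with ("name", "email"); one that does not can emit the nested list directly.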
|
|
There may be subtle misbehaviours when mixing the existing
daemon env and the client-supplied env. Just do the simplest
thing and use the client env as-is.
We'll also start the ->event_step callback since we'll need
to remember some things for long-lived commands.
|
|
It seems only triggered by bots trying to steal information.
|
|
While our recv_3fds() implementation is more efficient
syscall-wise, loading Inline takes nearly 50ms on my machine
even after Inline::C memoizes the build. The current ~20ms in
the fast path is barely acceptable to me, and 50ms would be
unusable.
Eventually, script/lei may invoke tcc(1) or cc(1) directly in
the fast path, but it needs @INC for the slow path, at least.
We'll encode the number of FDs into the socket name to allow
parallel installations, for now.
|
|
We don't need to be keeping the raw message around after it hits
git. Shard work now relies on Storable (or Sereal) and all of
the indexing code relies on the Email::MIME-like API of Eml to
access interesting parts of the message.
Similarly, smsg->{raw_bytes} is no longer carried around and we
do the CRLF adjustment when setting smsg->{bytes}.
There's also a small simplification to t/import.t while
we're in the area to use xqx instead of spawn/popen_rd.
|
|
While Gcf2Client is designed to mimic what git-cat-file writes
to stdout, its request format is different to support requests
with a git repository path included.
We'll highlight the distinction and make the GitAsyncCat support
code easier-to-follow as a result.
Since Gcf2Client relies on DS, we can rely on DS-specific code
here, too, and use a single Unix socket instead of separate
input and output pipes, reducing memory overhead in both user
and kernel space. Due to the interactive nature of requests and
responses, the buffer size limitations of Unix sockets on Linux
seem inconsequential here (just like they are for existing "git
cat-file --batch" use).
|
|
We'll always be transferring stdin, stdout, and stderr together
for lei. Perhaps I lack imagination or foresight, but I can't
think of a reason to send more or fewer FDs.
|
|
IO::FDPass may be an extra installation burden I don't want to
impose on users. We only support Linux and *BSDs, however.
|
|
This shortens the test and should make it easier to debug and
add new tests.
|
|
To get rid of the ugly $PublicInbox::DS::in_loop localization
in MboxReader, we'll distinguish between ->CLOSE and ->DESTROY
with ProcessPipe.
If we end up closing via ->DESTROY, we'll assume the caller will
want to deal with $? asynchronously via the event loop (or not
even care about $?).
If we hit ->CLOSE directly, we'll assume the caller called
close() and wants to check $? synchronously.
Note: wantarray doesn't seem to propagate into tied methods,
otherwise I'd be relying on that.
|
|
While the changes to git->qx/git->popen from commit 171a9c24022ad7ef
will be useful for the lei daemon, hiding git error messages from
actual users is probably wrong and we'll just localize GIT_*
vars for testing.
|
|
Hopefully this will make it easier to spot dependency
bugs in the future.
|
|
We need to use an absolute path after chdir in run modes
where scripts aren't loaded into in-memory subs.
The oneshot test was also failing under TEST_RUN_MODE=0 due to
no "lei-oneshot" command existing on the FS. So we force a
socket failure by making XDG_RUNTIME_DIR too large to fit into
the 108-byte .sun_path field of "struct sockaddr_un". This
even lets us simplify lei-oneshot significantly.
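The trick of forcing a socket failure with an over-long path can be sketched in Python as follows. The helper name can_bind is hypothetical, and the directory names are made up; the point is only that a path which cannot fit in sun_path is rejected before any socket file is created.

```python
import os
import socket
import tempfile

def can_bind(path):
    """Return True if a Unix stream socket can be bound at path."""
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        s.bind(path)
        return True
    except OSError:
        # Includes the case where path does not fit in sockaddr_un.sun_path.
        return False
    finally:
        s.close()

with tempfile.TemporaryDirectory() as d:
    short_ok = can_bind(os.path.join(d, "sock"))
    # A runtime directory name far longer than sun_path can hold.
    long_ok = can_bind(os.path.join(d, "x" * 200, "sock"))
```

A test can therefore point XDG_RUNTIME_DIR at such a long directory to exercise the fallback path without needing a separate command on the filesystem.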
|
|
Using "make update-copyrights" after setting GNULIB_PATH in my
config.mak
|
|
Since we'll be forking for Xapian indexing and maybe
other places, having a simple guard in place to ensure
OnDestroy doesn't unexpectedly unlink files or similar
is a safer option.
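One way such a guard could look, sketched in Python under the assumption that the guard compares the current PID against the PID that created the object. The class and parameter names here are illustrative and are not the project's API.

```python
import os

class OnDestroy:
    """Run a cleanup callback once, but only in the process that created it."""

    def __init__(self, callback, owner_pid=None):
        self.callback = callback
        # owner_pid is overridable only to make the guard easy to demonstrate.
        self.owner_pid = os.getpid() if owner_pid is None else owner_pid

    def close(self):
        cb, self.callback = self.callback, None
        # A forked child inherits the object but has a different PID,
        # so it drops the callback without running it.
        if cb is not None and os.getpid() == self.owner_pid:
            cb()

    def __del__(self):
        self.close()
```

With this shape, a child created by fork() can let its copy of the object go out of scope without unlinking files that the parent still needs.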
|
|
Since Perl exposes O_NONBLOCK as a constant, we can safely make
SFD_NONBLOCK a constant, too. This is not the case for
SFD_CLOEXEC, since O_CLOEXEC is not exposed by Perl despite
being used internally in the interpreter.
|
|
We'll probably start using references as exceptions in
some places for more exact matching.
|
|
Diagnosing an occasional FIFO failure in t/lei_to_mail.t...
|
|
It seems like a more logical place for it, but we'll favor the
newly-added xsys_e() in tests for BAIL_OUT use.
|
|
For personal mail, unsent draft messages are a common source of
messages without Message-IDs.
|
|
The words "extinbox" and "extindex" are too close and easy to
confuse with the other. Rename "extinbox" to "external", since
these could be IMAP, JMAP or other non-public-inbox search APIs.
Link: https://public-inbox.org/meta/20201226112649.GB6226@dcvr/
|
|
Add a ->set_eml method which can be a useful fire-and-forget
way of either adding new files to store OR setting keywords
on them.
When seeing brand-new messages, add_eml can afford to return
more information in the smsg instead of just the OID.
|
|
Some testing will be needed to see if it's worth the code
and maintenance overhead, but it seems easy-enough to get
working.
|
|
I intend to use this with LeiStore when importing from multiple
slow sources at once (e.g. curl, IMAP, etc). This is because
over.sqlite3 can only have a single writer, and we'll have
several slow readers running in parallel.
Watch and SearchIdxShard should also be able to use this code
in the future, but this will be proven with LeiStore, first.
|
|
Maildir should be plenty fine for short-lived output folders.
|
|
Users may wish to pipe output to "git am", "spamc",
or similar, so we need to support those cases and
not bail out on lseek(2) or ftruncate(2) failures.
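A minimal Python sketch of tolerating those failures; the helper name rewind_and_truncate is hypothetical. It treats ESPIPE and EINVAL as "this is a pipe or similar" and lets the caller stream output instead.

```python
import errno
import os

def rewind_and_truncate(fd):
    """Try to reset a regular file; return False for pipes and the like."""
    try:
        os.lseek(fd, 0, os.SEEK_SET)
        os.ftruncate(fd, 0)
        return True
    except OSError as e:
        if e.errno in (errno.ESPIPE, errno.EINVAL):
            return False  # not seekable: just stream output to it
        raise

r, w = os.pipe()
pipe_result = rewind_and_truncate(w)
os.close(r)
os.close(w)
```

Any other error (EBADF, EIO and so on) is still raised, since those point at real problems rather than at an unseekable destination.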
|
|
For writing mboxes and Maildirs, users may wish to use
stricter or looser deduplication strategies. This
gives them more control.
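To make "stricter or looser" concrete, here is a Python sketch with two interchangeable key functions. The names key_mid and key_content, and the particular headers hashed, are assumptions for illustration and do not describe the strategies lei actually implements.

```python
import hashlib
from email import message_from_bytes

def key_mid(raw):
    """Strict on identity: two messages match only if Message-ID matches."""
    return str(message_from_bytes(raw)["Message-ID"])

def key_content(raw):
    """Looser on identity: match on selected headers plus the body."""
    msg = message_from_bytes(raw)
    h = hashlib.sha256()
    for name in ("From", "To", "Subject", "Date"):
        h.update(str(msg[name] or "").encode() + b"\0")
    h.update(msg.get_payload(decode=True) or b"")
    return h.hexdigest()

def dedupe(messages, key):
    """Yield each message whose key has not been seen before."""
    seen = set()
    for raw in messages:
        k = key(raw)
        if k in seen:
            continue
        seen.add(k)
        yield raw
```

Swapping the key function changes which messages count as duplicates without touching the writer, which is roughly the kind of control the message above refers to.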
|
|
--augment will match the mairix(1) option of the same
name to augment existing search results. We'll need
to implement deduplication for a better user experience.
mutt ships with compressed mbox support for bz2 and xz,
at least, so we'll support those out-of-the-box.
|
|
This is only lightly-tested against stuff LeiToMail generates
and will need real-world tests to validate.
|
|
We'll allow using multiple workers to write to a single
mbox (which could be compressed). This can be done
safely with O_APPEND + syswrite for uncompressed files,
and using a lock when piping to pigz/gzip/bzip2/xz.
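For the uncompressed case, the approach can be sketched in Python like this. The helper name append_message is hypothetical, and the sketch assumes a local filesystem where an O_APPEND write lands at the current end of file as one unit.

```python
import os

def append_message(path, raw):
    """Append one complete mbox record with a single write(2) call."""
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o600)
    try:
        # One unbuffered write per message: with O_APPEND the kernel
        # positions each write at end-of-file, so writers sharing the
        # file do not overwrite each other.
        written = os.write(fd, raw)
        if written != len(raw):
            raise OSError("short write while appending to mbox")
    finally:
        os.close(fd)
```

The compressed case cannot use this trick because the compressor's input is a pipe, which is why a lock is needed there.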
|
|
In most cases, we won't need to index by value, so
don't waste cycles or space on it.
|
|
This is intended for maintaining Maildir states and mbox message
deduplication, but may be useful for other purposes...
|
|
No Maildir support, yet, but it'll come.
|
|
* origin/master: (58 commits)
ds: flatten + reuse @events, epoll_wait style fixes
ds: simplify EventLoop implementation
check defined return value for localized slurp errors
import: check for git->qx errors, clearer return values
git: qx: avoid extra "local" for scalar context case
search: remove {mset} option for ->mset method
search: remove pointless {relevance} setting
miscsearch: take reopen from Search and use it
extsearch: unconditionally reopen on access
extindex: allow using --all without EXTINDEX_DIR
extindex: add undocumented --no-scan switch
extindex: enable autoflush on STDOUT/STDERR
extindex: various --watch signal handling fixes
extindex: --watch for inotify-based updates
eml: fix undefined vars on <Perl 5.28
t/config: test --get-urlmatch for git <2.26
default to CORE::warn in $SIG{__WARN__} handlers
inbox: name variable for values loop iterator
inboxidle: avoid needless syscalls on refresh
inboxidle: clue users into resolving ENOSPC from inotify
...
|
|
While a single extindex combines multiple inboxes into a single
search index, extindex still requires up-front indexing on items
which can be searched. XSearch has no on-disk footprint itself
and uses Xapian DBs of existing publicinbox and extindex
("extinbox") exclusively.
XSearch still suffers from the multi-shard Xapian scalability
problems which led to the creation of extindex, but I expect the
number of shards to remain relatively low.
I envision users hosting public-inbox instances on their
workstations will only have two extindexes combined by this, one
read-only extindex for serving public archives, and one
read-write extindex managed by LeiStore for private mail.
|
|
Consistently returning the equivalent of pollfd.revents in a
portable manner was never worth the effort for us, as we use the
same ->event_step callback regardless of POLLIN/POLLOUT/POLLHUP.
Being a Perl array, @events knows its size and we don't have to return
a maximum index for the caller to iterate on.
We can also avoid redundant integer coercion ("+0") since we
ensure everything is an IV in other places.
Finally, vec() is preferable to ("\0" x $size) for resizing
buffers because it only needs to write the extended portion
and not overwrite the entire buffer.
|
|
Those git commands can fail and git->qx will set $? when it
fails. There's no need for the extra indirection of the @ret
array, either.
Improve git->qx coverage to check for $? while we're at it.
|
|
We can use the ternary operator to avoid an early return here.
|