Date | Commit message |
|
This has several advantages:
* no need to use ipc.lock to protect a pipe for non-atomic writes
* ability to pass FDs. In another commit, this will let us
simplify lei->sto_done_request and pass newly-created
sockets to lei/store directly.
Disadvantages:
- an extra pipe is required for rare messages over several
hundred KB; this is probably a non-issue, though
The performance delta is unknown, but I expect shards
(which remain pipes) to be the primary bottleneck IPC-wise
for lei/store.
|
|
Since we can't use maxuid for remote externals, automatically
maintaining the last time we got results and appending a dt:
range to the query will prevent HTTP(S) responses from getting
too big.
We could be using "rt:", but no stable release of public-inbox
supports it, yet, so we'll use dt:, instead.
By default, there's a two-day fudge factor to account for MTA
downtime and delays, which is hopefully enough. The fudge
factor may be changed per-invocation with the
--remote-fudge-factor=INTERVAL option.
Since different externals can have different message transport
routes, "lastresult" entries are stored on a per-external basis.
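A minimal sketch of the idea above, not the actual lei code: subtract the
fudge factor from the per-external "lastresult" time and append an
open-ended dt: range. The names dt_range and query_for, the dict layout,
and joining the range to the query with a space are all assumptions made
for illustration; the YYYYMMDDhhmmss form of dt: should be checked against
the public-inbox search documentation.

```python
from datetime import datetime, timedelta, timezone

def dt_range(last_result, fudge=timedelta(days=2)):
    """Open-ended dt: range starting `fudge` before the last result time."""
    start = last_result.astimezone(timezone.utc) - fudge
    return "dt:" + start.strftime("%Y%m%d%H%M%S") + ".."

def query_for(query, external, lastresult, fudge=timedelta(days=2)):
    """Append a dt: range only if we have seen results from this external.

    `lastresult` is a hypothetical dict mapping external URL -> datetime,
    mirroring the per-external storage described above.
    """
    seen = lastresult.get(external)
    if seen is None:
        return query  # first run: no range to append
    return query + " " + dt_range(seen, fudge)
```

Keeping the lookup keyed by external means a slow mail route on one
external cannot shrink the window used for another.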
|
|
We'll be using binary SHA-1 and SHA-256 in-memory since that's
what mail_sync.sqlite3 stores.
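As a rough illustration (the helper name oid_bin is hypothetical and this
is not the lei implementation), converting a hex object ID to its binary
form halves its size: 20 bytes for SHA-1 and 32 bytes for SHA-256.

```python
import hashlib

def oid_bin(hex_oid):
    """Convert a hex SHA-1 or SHA-256 object ID to raw bytes."""
    raw = bytes.fromhex(hex_oid)
    if len(raw) not in (20, 32):  # SHA-1 or SHA-256 digest sizes
        raise ValueError("unexpected OID length: %d bytes" % len(raw))
    return raw

# hashlib can also produce the binary form directly via .digest()
sha1_raw = hashlib.sha1(b"example").digest()
sha256_raw = hashlib.sha256(b"example").digest()
```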
|
|
I've been creating and destroying lots of externals, lately...
|
|
Having redundant "+" in URLs is ugly and can hurt cacheability
of queries. Even with "quoted phrase searches", Xapian seems
unaffected by redundant spaces, so just normalize the ASCII
white spaces to ' ' (%20) when fed via STDIN or saved-search
config file.
Reported-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Link: https://public-inbox.org/meta/20210910141157.6u5adehpx7wftkor@meerkat.local/
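The normalization described above could look roughly like this. It is a
sketch only: normalize_query is a made-up name, and stripping leading and
trailing whitespace is an assumption the commit message does not state.

```python
import re
from urllib.parse import quote

# ASCII whitespace only: space, tab, newline, carriage return, FF, VT
_ASCII_WS = re.compile(r"[ \t\n\r\f\v]+")

def normalize_query(query):
    """Collapse each run of ASCII whitespace into a single space."""
    return _ASCII_WS.sub(" ", query).strip()

def query_component(query):
    """Percent-encode the normalized query; a space becomes %20."""
    return quote(normalize_query(query), safe="")
```

Two inputs which differ only in spacing then map to the same URL, which
is what helps cacheability.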
|
|
"# 0 written to $FOLDER" messages aren't important to the
user, so we can show them in real time and allow them to
be lost in the terminal scroll. When >0 messages are
written to a folder, we'll show them last so a user
will know which folders to open with their MUA.
|
|
We need to use LeiSearch->qparse_new to handle (and filter out)
"L:" and "kw:" search prefixes to avoid hitting false positives
when externals are involved. Unfortunately, this doesn't work
for remote HTTP(S) externals, but those aren't enabled by
default.
|
|
The "# $NR written to $DEST ($total matches)" messages are
arguably the most useful output of "lei up --all=local",
but they get intermixed with progress messages from various
workers. Queue up these finalization messages and only spit
them out on ->DESTROY.
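The queueing above can be sketched as follows. Python has no direct
equivalent of relying on ->DESTROY, so this sketch uses a context manager
as a stand-in; the class and method names are invented for illustration.

```python
import sys

class FinalMessages:
    """Emit progress immediately, but hold summary lines until the end."""

    def __init__(self, out=sys.stderr):
        self.out = out
        self.queued = []

    def progress(self, msg):
        print(msg, file=self.out)  # shown in real time, may scroll away

    def final(self, msg):
        self.queued.append(msg)  # held back until __exit__

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        for msg in self.queued:
            print(msg, file=self.out)
        self.queued.clear()
        return False  # never swallow exceptions
```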
|
|
This allows client sockets to wait for "done" commits to
lei/store while the daemon reacts asynchronously. The goal
of this change is to keep the script/lei client alive until
lei/store commits changes to the filesystem, but without
blocking the lei-daemon event loop. It depends on Perl
refcounting to close the socket.
This change also highlighted our over-use of "done" requests to
lei/store processes, which is now corrected so we only issue it
on collective socket EOF rather than upon reaping every single
worker.
This also fixes "lei forget-mail-sync" when it is the initial
command.
This took several iterations and much debugging to arrive at the
current implementation:
1. The initial iteration of this change utilized socket passing
from lei-daemon to lei/store, which necessitated switching
from faster pipes to slower Unix sockets.
2. The second iteration switched to registering notification sockets
independently of "done" requests, but that could lead to early
wakeups when "done" was requested by other workers. This
appeared to work most of the time, but suffered races under
high load which were difficult to track down.
Finally, this iteration passes the stringified socket GLOB ref
to lei/store which is echoed back to lei-daemon upon completion
of that particular "done" request.
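A toy model of the final iteration, under several assumptions: the Daemon
and Store classes, their method names, and the token format are invented
here, and real IPC is replaced by direct method calls. It only shows why
echoing a per-request token avoids the early wakeups of the second
iteration.

```python
class Store:
    """Stand-in for lei/store: handles "done" requests one at a time."""

    def __init__(self):
        self.pending = []

    def done(self, token, reply):
        self.pending.append((token, reply))

    def run_one(self):
        token, reply = self.pending.pop(0)
        # (the commit to the filesystem would happen here)
        reply(token)  # echo the token back to the daemon

class Daemon:
    """Stand-in for lei-daemon: maps tokens to waiting client sockets."""

    def __init__(self):
        self.waiting = {}

    def request_done(self, store, client):
        token = "client-%#x" % id(client)  # plays the stringified GLOB ref
        self.waiting[token] = client
        store.done(token, self.done_ack)
        return token

    def done_ack(self, token):
        client = self.waiting.pop(token, None)
        if client is not None:
            client.close()  # Perl would close on the last reference drop
```

Only the client whose token was echoed is released; other clients keep
waiting for their own "done" request.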
|
|
PublicInbox::Import never imports @UNWANTED_HEADERS, so ensure
our mock blob OIDs do the same. This ought to prevent
duplicates if the PSGI mboxrd download starts setting
"X-Status: F" like "lei q -tt .."
|
|
This ought to avoid /Document \d+ not found/ errors from Xapian
when seeing a message for the first time by not attempting to
read keywords for totally unseen messages.
|
|
Displaying $! can help users diagnose resource limit problems
such as EMFILE/ENFILE/ENOMEM. $@ is currently useful for XS
Search::Xapian and perhaps future versions of the Xapian.pm SWIG
bindings.
|
|
SQLite COUNT() is a slow operation that does a full table scan
with no conditions. There's no need for it, since lei dedupe
only needs to know if it's empty or not to decide between
new/ and cur/ for Maildir outputs.
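A sketch of the cheaper emptiness check using Python's sqlite3 module.
The table name "seen" is hypothetical and not taken from lei. Asking for
at most one row lets SQLite stop at the first row instead of visiting
every row.

```python
import sqlite3

def is_empty(db):
    """Return True if the (hypothetical) `seen` table has no rows."""
    row = db.execute("SELECT 1 FROM seen LIMIT 1").fetchone()
    return row is None
```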
|
|
This allows us to simplify callers throughout, and exceptions
can no longer be silently hidden. MiscSearch now uses xap_terms
for looking up eidx_key terms for a code reduction.
We also simplify LeiStore->_msg_kw for runtime use by moving the
MsetIterator handling into t/lei_store.t test case.
|
|
op_wait_event is now more lei-specific since we no longer have
to care about oneshot and use a synchronous loop.
{ikw} (import-keywords) started a trend, but LeiPmdir (parallel
Maildir) is an upcoming WQ class that will follow this idea.
Eventually, {l2m} usage may be updated to follow this, too.
|
|
This will make it easier to use internally, such as for
managing Maildir and IMAP IDLE watches.
|
|
lcat can now dump the memoized contents of entire IMAP folders,
not just a single UID. It's now parallelized and pipelined for
multiple lei2mail workers.
Furthermore, various forms of JSON output work consistently
with blob-only output, now.
While working on this, I noticed NetReader was passing UID URLs
to imap_each callbacks, which was causing mail_sync.sqlite3 to
store UIDs in `folders'. That was clearly wrong, so it's now fixed.
|
|
This allows "lei-managed pseudo mailing lists" as described
by Konstantin.
Alternates use is optional and can be enabled via --shared.
This doesn't manage or edit ~/.public-inbox/config; presumably
there'll need to be some tweaking of search parameters before
finalizing and making the inbox publicly accessible via HTTP/NNTP.
Link: https://public-inbox.org/meta/20210426164454.5zd5kgugfhfwfkpo@nitro.local/T/
|
|
Xapian DBs may be modified by a parallel process while we're
reading them, and Xapian's MVCC model places the burden on readers
to retry operations.
We'll also have retry_reopen croak instead of die on errors,
which ought to help us track down some "Document not found"
errors I've occasionally seen when using "lei <q|up>".
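The reader-side retry can be sketched as below. This is not the
retry_reopen from public-inbox: DatabaseModifiedError here is a local
stand-in for the error Xapian raises, the max_tries limit is an
assumption, and `db` only needs a reopen() method.

```python
class DatabaseModifiedError(Exception):
    """Stand-in for the Xapian error raised when a writer changed the DB."""

def retry_reopen(db, op, max_tries=10):
    """Run op(db), reopening the DB and retrying if it was modified."""
    for _ in range(max_tries):
        try:
            return op(db)
        except DatabaseModifiedError:
            db.reopen()  # pick up the latest revision, then try again
    raise RuntimeError("gave up after %d tries" % max_tries)
```

Any other exception propagates to the caller untouched, so unexpected
errors are not hidden by the retry loop.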
|
|
Every tick of the event loop can change the working directory,
so we need to restore it for every client if they operate
in different directories.
This would be easier if we had openat(2) and friends in Perl;
but Inline::C is practically required for lei, now.
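A simplified picture of the problem, assuming a single-process event loop
which serves several clients. The tick function and the "cwd" key are
invented for this sketch.

```python
import os

def tick(clients, handle):
    """Handle each client from its own working directory.

    The process has only one working directory, so it must be set
    before every client is serviced, not just once per client.
    """
    results = []
    for client in clients:
        os.chdir(client["cwd"])  # restore this client's directory
        results.append(handle(client))
    return results
```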
|
|
I'm not 100% sure why, but "lei up" seems to cause uncommitted
transaction errors. LeiToMail calls sto->set_sync_info, but
LeiXSearch should call sto->done and lms_commit, so I'm not
sure where the uncommitted transaction is coming from...
|
|
Despite JMAP not supporting the equivalent of the IMAP \Recent
flag, it is useful for "lei q --augment", and "lei up" users to
be able to distinguish new results from old-but-unread messages
in an mbox or Maildir.
For mbox family messages, we'll drop the "O" status flag when
appending to mboxes, and we'll write to the "new" subdirectory
of Maildirs.
Behavior when writing to initially empty Maildirs and mboxes
remains unchanged since there's no need to distinguish between
new and old results in the initial case. Having users wait
for a rename(2) storm or complete mbox rewrite hurts UX.
With IMAP mailboxes, \Recent is already enforced by the IMAP
server and IMAP clients have no way of changing it(*).
(*) mutt uses the "Old" IMAP flag which isn't part of RFC 3501;
other MUAs may do similar things.
|
|
We must not accumulate mset totals for messages which
have already been counted. Furthermore, the combined
search was being passed an extra arg and causing the
total to go missing.
|
|
Having multiple lines of output means they can be interleaved in
daemon mode. Put stats into one line to reduce screen
real-estate size and improve readability.
|
|
We use the "done" term elsewhere for similar things, and
my easily-confused mind equates "complete" with shell
completion.
|
|
The number of messages we write to --output is usually different
than the mset count due to deduplication from combining multiple
sources.
This change makes the stderr output of "lei up --all=local" way
more useful IMHO.
|
|
"lei import" is probably the only place where users
might care about warnings.
|
|
This makes "lei up --all=local" output easier-to-understand
when it's updating multiple saved searches.
|
|
This will have an over.sqlite3 for content-based deduplication.
It may exhibit ibxish methods, so serving a read-only (or even
R/W) IMAP instance or displaying HTML isn't outside the realm
of possibility.
|
|
We only need the combined mset query when we care about sort
order. When writing to --output destinations intended for MUA
consumption, sort order is irrelevant as MUAs are expected to
offer their own sorting, so run queries to each external in
parallel.
This prepares us for docid-sort-based saved search support.
It will also become faster than the combined mset query for
users with many externals due to current Xapian exhibiting poor
performance with many shards (the same reason -extindex exists)
|
|
IMAP authentication info is only shared amongst lei2mail workers,
so we must ensure all IMAP writes go through lei2mail workers
even if we don't have to access the mail through git.
This allows us to decouple the latency of the remote mboxrd from
the latency of the IMAP --output at the expense of extra IPC
overhead within our own processes.
|
|
Remote results can safely use the same mset progress reporting
as local results, despite not knowing the size of the result
set. We're assuming terminal MUAs, for now.
|
|
This is compatible with default gunzip(1) behavior and
future-proofs us against potential changes in PublicInbox::WWW
to save memory on public-inbox-httpd instances.
|
|
Provide a consistent ->op_wait_event method instead of
forcing callers to loop (or not) at each callsite.
This also avoids a leak possibility by avoiding circular
references.
|
|
It may hide errors/bugs; instead, do it explicitly for each
worker that writes to it. For lei_xsearch, it will be better
to close before spawning the MUA for future use since we may
need it again once the user starts changing keywords.
|
|
JSON outputs won't write to lei/store at all, so there's
no point in forking the store worker if it's not already
running.
LeiSearch object ($lse) is also fork-safe until it opens a
persistent FD for Xapian/SQLite so we can unconditionally
carry it across fork.
|
|
"lei q" now displays labels in JSON output, "lei mark"
can add or remove labels for any messages.
"lei ls-label" is supported, too.
Unfortunately, "lei q" won't handle "kw:" or "L:" for
external messages; they must be imported, first.
|
|
We'll also hoist wait_startq out of the per-message loops
since it's not worth having to check every single message
when filling in smsg info is reasonably fast, anyways.
|
|
We need to consistently ensure pkt_op_c doesn't lead to a
long-lived circular reference if an exception is thrown in
pre_augment. Maybe the API could be better, but this fixes an
FD leak when attempting to --augment a FIFO.
Followup-to: b9524082ba39e665 ("lei_xsearch: cleanup {pkt_op_p} on exceptions")
|
|
Otherwise we could get nonsensical results if somebody tries
running "lei atfork_child" from the command-line.
|
|
This will let us tie keywords from remote externals
to those which only exist in local externals.
|
|
Stop showing `docid' since it's not useful with shards.
`bytes' and `lines' are probably noise, but maybe could be
visible in some "fuller" view.
v2: t/lei_xsearch: fix warnings from {docid} removal
|
|
"lei q" now preserves changes to per-message keywords across
invocations when its --output (Maildir or mbox) is reused
(with or without --augment).
In the future, these changes will be monitored via inotify,
EVFILT_VNODE or IMAP IDLE, too.
Unfortunately, this currently prevents "lei import" from ever
importing a message that's in an external. That will be fixed
in a future change.
|
|
This will be used for keyword (and label) storage for externals.
We'll be using this to ensure we don't redundantly auto-import
messages into lei/store if they're already in a local external
(they can still be imported explicitly via "lei import").
|
|
git 2.11 and earlier could not handle git directories with
newlines in them, nor does libgit2 support them.
Followup-to: d87dd0e679587043 ("config: reject `\n' in `inboxdir'")
|
|
We only want to auto import messages that are exclusively in
remote externals. Messages in local externals are not
auto-imported to save space and reduce wear on storage devices.
|
|
We only want to set `flagged' if a user requests it via
two '-t' switches.
Fixes: 232f8e376fe2856c ("lei q: -tt marks direct hits as flagged")
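One way to picture the intended behavior, using Python's argparse as a
stand-in for lei's own option parsing. The parse function and the result
keys are hypothetical. A repeated switch is counted, and `flagged' is only
requested at two or more.

```python
import argparse

def parse(argv):
    """Count -t switches: one enables threads, two also flag direct hits."""
    p = argparse.ArgumentParser(prog="lei-q-sketch")
    p.add_argument("-t", "--threads", action="count", default=0)
    opts = p.parse_args(argv)
    return {
        "threads": opts.threads >= 1,
        "flag_direct_hits": opts.threads >= 2,  # only with -tt
    }
```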
|
|
So far, searching by size has never been publicly documented,
and is, IMHO, of questionable utility. In any case, "z:" is what
mairix(1) uses, so it may be familiar to existing mairix users
(I've never used this prefix myself).
So far, this prefix is only used internally in tests and in
auto-translated queries from IMAP; thus this incompatible change
is unlikely to affect anyone.
|
|
We must ensure pkt_op_p doesn't live beyond the scope of
->do_query in the top-level lei-daemon, otherwise it can leave a
stray socket hanging around in case of exceptions.
|
|
This will eventually be supported for other mail stores,
but Maildir is the easiest to test and support, here.
This lets us avoid a situation where flag changes get
lost between search results.
|