path: root/lib/PublicInbox/LeiXSearch.pm
Date Commit message
2021-09-19 lei/store: use SOCK_SEQPACKET rather than pipe
This has several advantages:
* no need to use ipc.lock to protect a pipe for non-atomic writes
* ability to pass FDs. In another commit, this will let us simplify lei->sto_done_request and pass newly-created sockets to lei/store directly.

disadvantages:
- an extra pipe is required for rare messages over several hundred KB, this is probably a non-issue, though

The performance delta is unknown, but I expect shards (which remain pipes) to be the primary bottleneck IPC-wise for lei/store.
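The two advantages listed above can be illustrated outside of Perl. The following is a minimal Python sketch, not the lei/store implementation: it assumes a platform with AF_UNIX SOCK_SEQPACKET support (e.g. Linux) and Python 3.9+ for `socket.send_fds`/`socket.recv_fds`. The record contents are made up for illustration.

```python
import os
import socket

# A connected pair of SOCK_SEQPACKET sockets: each send() is delivered as
# one whole record, so writers need no lock to keep records from interleaving.
a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_SEQPACKET)

for rec in (b"first record", b"second", b"third record"):
    a.send(rec)

# Each recv() returns exactly one record, even though three were queued.
received = [b.recv(65536) for _ in range(3)]

# Unix sockets can also carry file descriptors (SCM_RIGHTS), which pipes cannot.
r, w = os.pipe()
socket.send_fds(a, [b"take this fd"], [r])
msg, fds, _flags, _addr = socket.recv_fds(b, 65536, 1)

# The received descriptor refers to the same pipe as `r`.
os.write(w, b"hello via passed fd")
passed_data = os.read(fds[0], 64)

for fd in (r, w, fds[0]):
    os.close(fd)
a.close()
b.close()
```

With a plain pipe, the reader would see one undifferentiated byte stream and would need its own framing.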
2021-09-18 lei up: automatically use dt: for remote externals
Since we can't use maxuid for remote externals, automatically maintaining the last time we got results and appending a dt: range to the query will prevent HTTP(S) responses from getting too big. We could be using "rt:", but no stable release of public-inbox supports it, yet, so we'll use dt:, instead. By default, there's a two day fudge factor to account for MTA downtime and delays, which is hopefully enough. The fudge factor may be changed per-invocation with the --remote-fudge-factor=INTERVAL option. Since different externals can have different message transport routes, "lastresult" entries are stored on a per-external basis.
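As a rough illustration of the idea (not the actual lei code), the sketch below subtracts a fudge interval from a stored last-result time and appends an open-ended dt: range. The function name `append_dt_range`, the sample query, and the assumption that dt: takes a UTC `YYYYMMDDhhmmss..` value are all illustrative.

```python
from datetime import datetime, timedelta, timezone

def append_dt_range(query, lastresult, fudge=timedelta(days=2)):
    """Append an open-ended dt: range starting `fudge` before `lastresult`.

    Hypothetical helper: the dt: timestamp format used here is an assumption.
    """
    since = lastresult - fudge
    return f"{query} dt:{since.strftime('%Y%m%d%H%M%S')}.."

# A per-external "lastresult" value, made up for this example.
last = datetime(2021, 9, 18, 12, 0, 0, tzinfo=timezone.utc)
q = append_dt_range("s:lei", last)
```

Keeping one `lastresult` per external, as the commit describes, would mean calling such a helper once per remote URL with that external's own timestamp.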
2021-09-16 lei: git_oid: replace git_blob_id
We'll be using binary SHA-1 and SHA-256 in-memory since that's what mail_sync.sqlite3 stores.
2021-09-13 lei_xsearch: sensible errors for missing/broken externals
I've been creating and destroying lots of externals, lately...
2021-09-11 lei: normalize whitespace in remote queries
Having redundant "+" in URLs is ugly and can hurt cacheability of queries. Even with "quoted phrase searches", Xapian seems unaffected by redundant spaces, so just normalize the ASCII white spaces to ' ' (%20) when fed via STDIN or saved-search config file.
Reported-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Link: https://public-inbox.org/meta/20210910141157.6u5adehpx7wftkor@meerkat.local/
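The normalization described above might look something like the following Python sketch. It is an illustration only: `normalize_query` is a hypothetical name, and trimming leading/trailing whitespace is an added assumption beyond what the commit message states.

```python
import re
from urllib.parse import quote

# Runs of ASCII whitespace (space, tab, newline, CR, form feed, vertical tab).
_ASCII_WS = re.compile(r"[ \t\n\r\f\v]+")

def normalize_query(q):
    """Collapse each run of ASCII whitespace into a single space.

    Stripping the ends is an assumption made for this sketch.
    """
    return _ASCII_WS.sub(" ", q).strip(" ")

raw = 's:foo \t\n "quoted   phrase"\n'
normalized = normalize_query(raw)

# Percent-encoding the result yields one %20 per gap instead of several.
encoded = quote(normalized, safe="")
```

Note that this also collapses spaces inside the quoted phrase, which relies on the commit's observation that Xapian seems unaffected by redundant spaces.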
2021-09-10 lei up: only delay non-zero "# $NR written to ..."
"# 0 written to $FOLDER" messages aren't important to the user, so we can show them in real time and allow them to be lost in the terminal scroll. When >0 messages are written to a folder, we'll show them last so a user will know which folders to open with their MUA.
2021-09-03 lei_xsearch: avoid false-positives on externals w/ L: and kw:
We need to use LeiSearch->qparse_new to handle (and filter out) "L:" and "kw:" search prefixes to avoid hitting false positives when externals are involved. Unfortunately, this doesn't work for remote HTTP(S) externals, but those aren't enabled by default.
2021-08-25 lei up: improve --all=local stderr output
The "# $NR written to $DEST ($total matches)" messages are arguably the most useful output of "lei up --all=local", but they get intermixed with progress messages from various workers. Queue up these finalization messages and only spit them out on ->DESTROY.
2021-08-24 lei: non-blocking lei/store->done in lei-daemon
This allows client sockets to wait for "done" commits to lei/store while the daemon reacts asynchronously. The goal of this change is to keep the script/lei client alive until lei/store commits changes to the filesystem, but without blocking the lei-daemon event loop. It depends on Perl refcounting to close the socket.

This change also highlighted our over-use of "done" requests to lei/store processes, which is now corrected so we only issue it on collective socket EOF rather than upon reaping every single worker. This also fixes "lei forget-mail-sync" when it is the initial command.

This took several iterations and much debugging to arrive at the current implementation:
1. The initial iteration of this change utilized socket passing from lei-daemon to lei/store, which necessitated switching from faster pipes to slower Unix sockets.
2. The second iteration switched to registering notification sockets independently of "done" requests, but that could lead to early wakeups when "done" was requested by other workers. This appeared to work most of the time, but suffered races under high load which were difficult to track down.

Finally, this iteration passes the stringified socket GLOB ref to lei/store which is echoed back to lei-daemon upon completion of that particular "done" request.
2021-08-14 lei: hexdigest mocks account for unwanted headers
PublicInbox::Import never imports @UNWANTED_HEADERS, so ensure our mock blob OIDs do the same. This ought to prevent duplicates if the PSGI mboxrd download starts setting "X-Status: F" like "lei q -tt .."
2021-08-14 lei <q|up>: wait on remote mboxrd imports synchronously
This ought to avoid /Document \d+ not found/ errors from Xapian when seeing a message for the first time by not attempting to read keywords for totally unseen messages.
2021-08-09 lei_xsearch: improve Xapian open failure messages
Displaying $! can help users diagnose resource limit problems such as EMFILE/ENFILE/ENOMEM. $@ is currently useful for XS Search::Xapian and perhaps future versions of the Xapian.pm SWIG bindings.
2021-07-25 lei: avoid SQLite COUNT() for dedupe
SQLite COUNT() is a slow operation that does a full table scan with no conditions. There's no need for it, since lei dedupe only needs to know if it's empty or not to decide between new/ and cur/ for Maildir outputs.
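A minimal sketch of the alternative, using Python's `sqlite3` module rather than the Perl DBI code in lei: asking for at most one row answers "is it empty?" without visiting the whole table. The table and column names (`dedupe`, `oidbin`) and the helper `is_empty` are hypothetical.

```python
import sqlite3

def is_empty(conn, table):
    """Return True if `table` has no rows.

    `table` must be a trusted identifier; it is interpolated directly.
    LIMIT 1 lets SQLite stop at the first row instead of counting them all.
    """
    row = conn.execute(f"SELECT 1 FROM {table} LIMIT 1").fetchone()
    return row is None

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dedupe (oidbin BLOB PRIMARY KEY)")

empty_before = is_empty(conn, "dedupe")
conn.execute("INSERT INTO dedupe VALUES (?)", (b"\x01" * 20,))
empty_after = is_empty(conn, "dedupe")
```

The boolean result is all a caller needs to pick between two output behaviors, such as the new/ versus cur/ choice the commit mentions.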
2021-06-23 search: make xap_terms easier-to-use and use it more
This allows us to simplify callers throughout, and exceptions can no longer be silently hidden. MiscSearch now uses xap_terms for looking up eidx_key terms for a code reduction. We also simplify LeiStore->_msg_kw for runtime use by moving the MsetIterator handling into t/lei_store.t test case.
2021-06-08 lei: generalize auxiliary WQ handling
op_wait_event is now more lei-specific since we no longer have to care about oneshot and use a synchronous loop. {ikw} (import-keywords) started a trend, but LeiPmdir (parallel Maildir) is an upcoming WQ class that will follow this idea. Eventually, {l2m} usage may be updated to follow this, too.
2021-06-04 pkt_op: make pkt_do an OO method
This will make it easier to use for internal use such as managing Maildir and IMAP IDLE watches.
2021-05-30 lei import|lcat: improve+fix single message IMAP support
lcat can now dump the memoized contents of entire IMAP folders, not just a single UID. It's now parallelized and pipelined for multiple lei2mail workers. Furthermore, various forms of JSON output work consistently with blob-only output, now. While working on this, I noticed NetReader was passing UID URLs to imap_each callbacks, which was causing mail_sync.sqlite3 to store UIDs in `folders'. That was clearly wrong, so it's now fixed.
2021-05-28 lei q|up: support v2:/path/to/inboxdir destination
This allows "lei-managed pseudo mailing lists" as described by Konstantin. Alternates use is optional and can be enabled via --shared. This doesn't manage or edit ~/.public-inbox/config; presumably there'll need to be some tweaking of search parameters before finalizing and making the inbox publicly accessible via HTTP/NNTP.
Link: https://public-inbox.org/meta/20210426164454.5zd5kgugfhfwfkpo@nitro.local/T/
2021-05-28 lei: retry_reopen on read-only Xapian access
Xapian DBs may be modified by a parallel process while we're reading it, and Xapian's MVCC model places the burden on readers to retry operations. We'll also have retry_reopen croak instead of die on errors, which ought to help us track down some "Document not found" errors I've occasionally seen when using "lei <q|up>".
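The reader-side retry pattern can be sketched as below. This is not the PublicInbox retry_reopen code: `DatabaseModifiedError` and `FakeDB` are self-contained stand-ins (modeled loosely on Xapian's error of the same name and its `reopen()` method), and `max_tries` is an arbitrary illustrative limit.

```python
class DatabaseModifiedError(Exception):
    """Stand-in for the error a reader sees when a writer changed the DB."""

def retry_reopen(db, op, max_tries=10):
    """Run op(db); on DatabaseModifiedError, reopen the DB and try again."""
    for _ in range(max_tries):
        try:
            return op(db)
        except DatabaseModifiedError:
            db.reopen()  # pick up the latest revision, then retry
    raise RuntimeError(f"gave up after {max_tries} attempts")

class FakeDB:
    """Hypothetical DB that is stale until reopened once."""
    def __init__(self):
        self.stale = True
        self.reopens = 0

    def reopen(self):
        self.stale = False
        self.reopens += 1

    def get_document(self, docid):
        if self.stale:
            raise DatabaseModifiedError("revision changed underneath reader")
        return f"doc-{docid}"

db = FakeDB()
doc = retry_reopen(db, lambda d: d.get_document(42))
```

Bounding the number of retries and raising a distinct error afterwards makes persistent failures visible to the caller rather than looping forever.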
2021-05-28 lei: restore working directory in more places
Every tick of the event loop can change the working directory, so we need to restore it for every client if they operate in different directories. This would be easier if we had openat(2) and friends in Perl; but Inline::C is practically required for lei, now.
2021-05-28 lei_mail_sync: debug code for uncommitted txn
I'm not 100% sure why, but "lei up" seems to cause uncommitted transaction errors. LeiToMail calls sto->set_sync_info, but LeiXSearch should call sto->done and lms_commit, so I'm not sure where the uncommitted transaction is coming from...
2021-05-23 lei <q|up>: set \Recent on non-empty mbox and Maildir
Despite JMAP not supporting the equivalent of the IMAP \Recent flag, it is useful for "lei q --augment", and "lei up" users to be able to distinguish new results from old-but-unread messages in an mbox or Maildir. For mbox family messages, we'll drop the "O" status flag when appending to mboxes, and we'll write to the "new" subdirectory of Maildirs.

Behavior when writing to initially empty Maildirs and mboxes remains unchanged since there's no need to distinguish between new and old results in the initial case. Having users wait for a rename(2) storm or complete mbox rewrite hurts UX.

With IMAP mailboxes, \Recent is already enforced by the IMAP server and IMAP clients have no way of changing it(*).

(*) mutt uses the "Old" IMAP flag which isn't part of RFC 3501, other MUAs may do similar things.
2021-05-06 lei_xsearch: fix accounting bugs in for remote mboxrd
We must not accumulate mset totals for messages which have already been counted. Furthermore, the combined search was being passed an extra arg and causing the total to go missing.
2021-05-03 lei <q|up>: combine written/results into one line
Having multiple lines of output means they can be interleaved in daemon mode. Put stats into one line to reduce screen real-estate size and improve readability.
2021-05-01 lei_auth: s/net_merge_complete/net_merge_all_done/
We use the "done" term elsewhere for similar things, and my easily-confused mind equates "complete" with shell completion.
2021-05-01 lei <q|up>: distinguish between mset and l2m counts
The number of messages we write to --output is usually different than the mset count due to deduplication from combining multiple sources. This change makes the stderr output of "lei up --all=local" way more useful IMHO.
2021-04-28 lei: quiet down Eml-related warnings consistently
"lei import" is probably the only place where users might care about warnings.
2021-04-25 lei_xsearch: show --output location with match count
This makes "lei up --all=local" output easier-to-understand when it's updating multiple saved searches.
2021-04-13 lei q: start wiring up saved search
This will have an over.sqlite3 for content-based deduplication. It may exhibit ibxish methods, so serving a read-only (or even R/W) IMAP instance or displaying HTML isn't outside the realm of possibility.
2021-04-13 lei_xsearch: use per-external queries when not sorting
We only need the combined mset query when we care about sort order. When writing to --output destinations intended for MUA consumption, sort order is irrelevant as MUAs are expected to offer their own sorting, so run queries to each external in parallel. This prepares us for docid-sort-based saved search support. It will also become faster than the combined mset query for users with many externals due to current Xapian exhibiting poor performance with many shards (the same reason -extindex exists).
2021-04-05 lei q: fix auth IMAP --output with remote mboxrd
IMAP authentication info is only shared amongst lei2mail workers, so we must ensure all IMAP writes go through lei2mail workers even if we don't have to access the mail through git. This allows us to decouple the latency of the remote mboxrd from the latency of the IMAP --output at the expense of extra IPC overhead within our own processes.
2021-04-03 lei q: don't show remote progress if MUA is running
Remote results can safely use the same mset progress reporting as local results, despite not knowing the size of the result set. We're assuming terminal MUAs, for now.
2021-03-29 lei: use IO::Uncompress::Gunzip MultiStream
This is compatible with default gunzip(1) behavior and future-proofs us against potential changes in PublicInbox::WWW to save memory on public-inbox-httpd instances.
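To show what "multi-stream" means here, the sketch below concatenates two independently compressed gzip members and decompresses them as one stream, which is also how gunzip(1) treats such input. It uses Python's `gzip` module purely as an illustration; the mbox-like payloads are made up.

```python
import gzip

# Two separately compressed chunks, e.g. as a server might emit them
# if it flushed and restarted compression mid-response.
part1 = gzip.compress(b"From a@example\n\nfirst message\n")
part2 = gzip.compress(b"From b@example\n\nsecond message\n")

# Concatenated gzip members form a valid gzip stream; gzip.decompress()
# reads every member, not just the first one.
combined = gzip.decompress(part1 + part2)
```

A decompressor that stops after the first member would silently drop the second message, which is the failure mode enabling MultiStream guards against.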
2021-03-28 lei: simplify PktOp callers
Provide a consistent ->op_wait_event method instead of forcing callers to loop (or not) at each callsite. This also avoids a possible leak by avoiding circular references.
2021-03-26 lei: do not blindly commit to lei/store on close
It may hide errors/bugs; instead, do it explicitly for each worker that writes to it. For lei_xsearch, it will be better to close before spawning the MUA for future use since we may need it again once the user starts changing keywords.
2021-03-26 lei q: skip lei/store->write_prepare for JSON outputs
JSON outputs won't write to lei/store at all, so there's no point in forking the store worker if it's not already running. LeiSearch object ($lse) is also fork-safe until it opens a persistent FD for Xapian/SQLite so we can unconditionally carry it across fork.
2021-03-26 lei: add some labels support
"lei q" now displays labels in JSON output, "lei mark" can add or remove labels for any messages. "lei ls-label" is supported, too. Unfortunately, "lei q" won't handle "kw:" or "L:" for external messages, they must be imported, first.
2021-03-26 lei_xsearch: wait for kw updates for non-threaded case, too
We'll also hoist wait_startq out of the per-message loops since it's not worth having to check every single message when filling in smsg info is reasonably fast, anyways.
2021-03-24 lei: clean up pkt_op consumer on exception, too
We need to consistently ensure pkt_op_c doesn't lead to a long-lived circular reference if an exception is thrown in pre_augment. Maybe the API could be better, but this fixes an FD leak when attempting to --augment a FIFO.
Followup-to: b9524082ba39e665 ("lei_xsearch: cleanup {pkt_op_p} on exceptions")
2021-03-24 lei: hide *_atfork_child from command-line
Otherwise we could get nonsensical results if somebody tries running "lei atfork_child" from the command-line.
2021-03-21 lei q: fix warning on remote imports
This will let us tie keywords from remote externals to those which only exist in local externals.
2021-03-21 lei q: trim JSON output
Stop showing `docid' since it's not useful with shards. `bytes' and `lines' are probably noise, but maybe could be visible in some "fuller" view. v2: t/lei_xsearch: fix warnings from {docid} removal
2021-03-21 lei q: support vmd for external-only messages
"lei q" now preserves changes to per-message keywords across invocations when its --output (Maildir or mbox) is reused (with or without --augment). In the future, these changes will be monitored via inotify, EVFILT_VNODE or IMAP IDLE, too. Unfortunately, this currently prevents "lei import" from ever importing a message that's in an external. That will be fixed in a future change.
2021-03-21 lei: All Local Externals: bare git dir for alternates
This will be used for keyword (and label) storage for externals. We'll be using this to ensure we don't redundantly auto-import messages into lei/store if they're already in a local external (they can still be imported explicitly via "lei import").
2021-03-19 lei: disallow "\n" in local externals paths
git 2.11 and earlier could not handle git directories with newlines in them, nor does libgit2 support them.
Followup-to: d87dd0e679587043 ("config: reject `\n' in `inboxdir'")
2021-03-15 lei q: do not import unnecessarily from externals
We only want to auto import messages that are exclusively in remote externals. Messages in local externals are not auto-imported to save space and reduce wear on storage device.
2021-03-05 lei q: one -t shouldn't set `flagged' on external mail
We only want to set `flagged' if a user requests it via two '-t' switches.
Fixes: 232f8e376fe2856c ("lei q: -tt marks direct hits as flagged")
2021-03-05 search: use "z:" instead of "bytes:" prefix
So far, searching by size has never been publicly documented, and IMHO, of questionable utility. In any case, "z:" is what mairix(1) uses, so it may be familiar to existing mairix users (I've never used this prefix myself). So far, this prefix is only used internally in tests and in auto-translated queries from IMAP; thus this incompatible change is unlikely to affect anyone.
2021-03-04 lei_xsearch: cleanup {pkt_op_p} on exceptions
We must ensure pkt_op_p doesn't live beyond the scope of ->do_query in the top-level lei-daemon, otherwise it can leave a stray socket hanging around in case of exceptions.
2021-03-04 lei q: import flags when clobbering/augmenting Maildirs
This will eventually be supported for other mail stores, but Maildir is the easiest to test and support, here. This lets us avoid a situation where flag changes get lost between search results.