path: root/lib/PublicInbox/LeiSearch.pm
Date Commit message
2021-10-26 lei q: enable expensive Xapian flags
FLAG_PURE_NOT is too expensive for public-facing WWW use, but lei isn't public-facing. We'll also unconditionally enable phrase search on old "chert" DBs since lei doesn't need to worry about fairness across 10K users.
2021-10-22 lei_search: try harder to associate "lei index"-ed messages
Allow checking for keyword changes if we have a known OID, even if the blob isn't currently reachable.
2021-10-14 lei inspect: account for non-extindex inboxes
Inbox->xdb does not exist, but this code path was apparently never tested :x I noticed this on a basic v2 inbox, but it could happen with any v1/v2 inbox. Move ->num2docid into Search so it's less awkward to use.
2021-10-08 git: use async_wait_all everywhere
Some code paths may use maximum size checks, so ensure any checks are waited on, too.
2021-09-26 lei note-event: ignore kw_changed exceptions
The note-event worker may see changes before a Xapian shard commit happens, meaning keyword lookups fail as a result. Just emit the request to the lei/store worker since it's a fairly cheap operation at this point. We'll try harder to look for kw changes, too, since deduplication changes may lead to multiple docids being resolved for a single message.
2021-09-06 lei_search: xsmsg_vmd: retry_reopen properly
The deeper eval was preventing retry_reopen from retrying with readers and writers working in parallel:
    FOO=imaps://example.com/INBOX.huge
    lei lcat $FOO -f mboxcl | lei tag -F mboxcl +L:bar -
Fixes: c7bcfe6cd6648ff0 ("lei: diagnostics for /Document \d+ not found/ errors")
2021-08-16 lei_search: avoid unconditional warning when no exception
Oops, we shouldn't warn on "$@" unless "$@" is truthy.
Fixes: c7bcfe6cd6648ff0 ("lei: diagnostics for /Document \d+ not found/ errors")
2021-08-14 lei: diagnostics for /Document \d+ not found/ errors
This may help diagnose "Exception: Document \d+ not found" errors I'm seeing from "lei up" with HTTPS endpoints.
2021-07-25 lei_search: favor binary OID comparisons
Reduce memory traffic and code, too.
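A minimal sketch of the idea, written in Python rather than the Perl used by LeiSearch.pm: a hex object ID is converted to raw bytes once, and comparisons are then done on the shorter binary form. The helper names `oid_bin` and `same_oid` and the sample OID are hypothetical and not taken from the commit.

```python
def oid_bin(hex_oid):
    """Convert a hex object ID (40 hex chars for SHA-1) to raw bytes."""
    return bytes.fromhex(hex_oid)


def same_oid(hex_oid, stored_bin):
    """Compare a hex OID against an OID that is already in binary form."""
    return oid_bin(hex_oid) == stored_bin


# Made-up OID for illustration only
stored = oid_bin("ab" * 20)
print(len("ab" * 20), len(stored))  # hex form is twice as long as the binary form
print(same_oid("AB" * 20, stored))  # hex case no longer matters once decoded
```

Comparing the binary form also sidesteps upper/lower-case differences in hex strings, which is one plausible reason such a change reduces code.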
2021-07-22 lei: start implementing inotify Maildir support
This allows lei to automatically note keyword (message flag) changes made to a Maildir and propagate them into lei/store:
    lei add-watch --state=tag-ro /path/to/Maildir
This doesn't persist across restarts, yet. In the future, it will be applied automatically to "lei q" output Maildirs by default (with an option to disable it). State values of tag-rw, index-<ro|rw>, import-<ro|rw> will all be supported for Maildir. This represents a fairly major internal change that's fairly intrusive, but the whole daemon-oriented design was to facilitate being able to automatically monitor (and propagate) Maildir/IMAP flag changes.
2021-06-23 search: make xap_terms easier-to-use and use it more
This allows us to simplify callers throughout, and exceptions can no longer be silently hidden. MiscSearch now uses xap_terms for looking up eidx_key terms for a code reduction. We also simplify LeiStore->_msg_kw for runtime use by moving the MsetIterator handling into t/lei_store.t test case.
2021-06-01 lei import: reduce writes to lei/store on IMAP sync
We don't need to write VMD changes to lei/store if local keywords are unchanged.
2021-05-28 lei: retry_reopen on read-only Xapian access
Xapian DBs may be modified by a parallel process while we're reading them, and Xapian's MVCC model places the burden on readers to retry operations. We'll also have retry_reopen croak instead of die on errors, which ought to help us track down some "Document not found" errors I've occasionally seen when using "lei <q|up>".
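The reader-side retry described above can be sketched as follows. This is a Python illustration under stated assumptions, not the Perl retry_reopen from public-inbox: `DatabaseModifiedError` here is a local stand-in class, `FakeDB` is a hypothetical test double, and the `max_tries` limit is an arbitrary choice.

```python
class DatabaseModifiedError(Exception):
    """Stand-in for the error a reader gets when a writer commits mid-read."""


def retry_reopen(db, op, max_tries=10):
    """Run op(db); on a stale-revision error, reopen the DB and try again."""
    for _ in range(max_tries):
        try:
            return op(db)
        except DatabaseModifiedError:
            db.reopen()  # pick up the latest committed revision
    raise RuntimeError(f"gave up after {max_tries} reopen attempts")


class FakeDB:
    """Hypothetical test double: fails a fixed number of reads, then succeeds."""

    def __init__(self, stale_reads):
        self.stale_reads = stale_reads
        self.reopens = 0

    def reopen(self):
        self.reopens += 1

    def get_document(self, docid):
        if self.stale_reads > 0:
            self.stale_reads -= 1
            raise DatabaseModifiedError("revision changed")
        return f"doc-{docid}"


db = FakeDB(stale_reads=2)
print(retry_reopen(db, lambda d: d.get_document(7)), db.reopens)
```

Only the stale-revision error is retried; any other exception propagates to the caller, so unrelated failures are not hidden by the loop.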
2021-05-23 lei export-kw: new command to export keywords to Maildirs
IMAP will eventually be supported.
2021-05-23 lei tag: support tagging index-only messages
This will make some of our tests faster and allow users to try more features of lei without high storage requirements.
2021-05-05 lei blob: support "lei index"-ed mail
Normal git retrieval doesn't work for Maildir blobs indexed using "lei index". Fortunately, this oddness is limited to the LeiStore class and we can override smsg_eml with a fallback to read blobs from Maildirs.
2021-04-24 lei import: keep sync info for Maildir and IMAP folders
We aren't using it, yet, but the plan is to be able to use this information to propagate keyword changes back to IMAP and Maildir folders using some to-be-implemented command. "lei inspect" is a half-baked new command to make testing this change easier. It will be updated to support more SQLite+Xapian introspection duties in the future, including public-inbox things independent of lei.
2021-04-22 lei: flesh out `forwarded' kw support for Maildir and IMAP
Maildir and IMAP can both handle `forwarded'. Ensure we don't lose `forwarded' when reading from stores which do not support it, but ensure we can set it when reading from IMAP and Maildir stores.
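To illustrate the behavior described here, the following Python sketch maps standard Maildir info flags to keywords and merges keywords so that ones a mail store cannot express are not lost. The function names, the `supported` parameter, and the example "mbox-like" capability set are assumptions made for this sketch; they are not lei's actual code or a statement of what any particular format supports.

```python
# Standard Maildir info flags mapped to keyword names; "P" (passed) is the
# flag used for forwarded mail. "T" (trashed) has no keyword in this sketch.
MAILDIR_FLAG_TO_KW = {
    "D": "draft",
    "F": "flagged",
    "P": "forwarded",
    "R": "answered",
    "S": "seen",
}


def maildir_keywords(filename):
    """Return the keyword set encoded after ':2,' in a Maildir filename."""
    _, sep, flags = filename.rpartition(":2,")
    if not sep:  # no info section, so no flags
        return set()
    return {MAILDIR_FLAG_TO_KW[f] for f in flags if f in MAILDIR_FLAG_TO_KW}


def merge_keywords(stored, from_store, supported):
    """Take supported keywords from the mail store, keep unsupported stored ones."""
    return (stored - supported) | (from_store & supported)


print(sorted(maildir_keywords("1234.host:2,FPS")))

# Hypothetical store that cannot express `forwarded'
mbox_like = {"draft", "flagged", "answered", "seen"}
print(sorted(merge_keywords({"forwarded", "seen"}, {"answered"}, mbox_like)))
```

In the second call, `forwarded` survives because the hypothetical store has no way to report it, while `seen` is dropped because the store can express it and no longer does.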
2021-04-05 lei_search: ignore Resent-Message-ID for indexing
It currently conflicts with the way OverIdx and SearchIdx index messages, ultimately leading to violating a NOT NULL constraint on id2num.id in over.sqlite3. We may allow searching Resent-* fields separately, though I'm not sure how useful it'll be.
2021-04-03 lei: improve handling of Message-ID-less draft messages
We need a stable fallback time for digest2mid in the presence of messages without Received/Date headers. Furthermore, we must avoid using uninitialized smsg->{mid} when parsing References for draft replies.
2021-04-01 lei q: reduce lei/store work for kw changes to stored mail
We can tweak lse->kw_changed to return docids and reduce IPC traffic and reduce work the lei/store worker needs to do.
2021-03-26 lei: add some labels support
"lei q" now displays labels in JSON output, "lei mark" can add or remove labels for any messages. "lei ls-label" is supported, too. Unfortunately, "lei q" won't handle "kw:" or "L:" for external messages, they must be imported, first.
2021-03-21 lei import: vivify external-only messages
Keyword storage for external-only messages was preventing messages from being explicitly imported. Teach lei_store to vivify keyword-only entries into fully-indexed messages on import.
2021-03-21 lei q: support vmd for external-only messages
"lei q" now preserves changes to per-message keywords across invocations when its --output (Maildir or mbox) is reused (with or without --augment). In the future, these changes will be monitored via inotify, EVFILT_VNODE or IMAP IDLE, too. Unfortunately, this currently prevents "lei import" from ever importing a message that's in an external. That will be fixed in a future change.
2021-03-15 lei q: do not import unnecessarily from externals
We only want to auto-import messages that are exclusively in remote externals. Messages in local externals are not auto-imported to save space and reduce wear on storage devices.
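The selection rule stated above can be sketched in Python as a simple predicate. The data shape (a list of `(name, is_remote)` pairs per message) and the function name are hypothetical, chosen only to make the rule concrete; lei's real bookkeeping is not shown in this log.

```python
def needs_auto_import(locations):
    """Return True only if every place the message was found is remote.

    locations: iterable of (name, is_remote) pairs (hypothetical shape).
    """
    locs = list(locations)
    return bool(locs) and all(is_remote for _, is_remote in locs)


# Found only via a remote external: worth importing
print(needs_auto_import([("https://example.com/inbox/", True)]))

# Also present in a local external: skip, the data is already on disk
print(needs_auto_import([("https://example.com/inbox/", True),
                         ("/path/to/local-inbox", False)]))
```

A message with no known location is not imported in this sketch; that is a design choice made here, not something the commit message specifies.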
2021-03-04 lei q: import flags when clobbering/augmenting Maildirs
This will eventually be supported for other mail stores, but Maildir is the easiest to test and support, here. This lets us avoid a situation where flag changes get lost between search results.
2021-01-22 lei q: retrieve keywords for local, non-external messages
This isn't tested for now, so maybe it works.
2021-01-01 update copyrights for 2021
Using "make update-copyrights" after setting GNULIB_PATH in my config.mak
2020-12-31 lei_xsearch: cross-(inbox|extindex) search
While a single extindex combines multiple inboxes into a single search index, extindex still requires up-front indexing on items which can be searched. XSearch has no on-disk footprint itself and uses Xapian DBs of existing publicinbox and extindex ("extinbox") exclusively. XSearch still suffers from the multi-shard Xapian scalability problems which led to the creation of extindex, but I expect the number of shards to remain relatively low. I envision users hosting public-inbox instances on their workstations will only have two extindexes combined by this, one read-only extindex for serving public archives, and one read-write extindex managed by LeiStore for private mail.
2020-12-19 search: simplify initialization, add ->xdb_shards_flat
This reduces differences between v1 and v2 code, and introduces ->xdb_shards_flat to provide read-only access to shards without using Xapian::MultiDatabase. This will allow us to combine shards of several inboxes AND extindexes for lei.
2020-12-19 lei_store: local storage for Local Email Interface
Still unstable, this builds off the equally unstable extindex :P This will be used for caching/memoization of traditional mail stores (IMAP, Maildir, etc) while providing indexing via Xapian, along with compression, and checksumming from git. Most notably, this adds the ability to add/remove per-message keywords (draft, seen, flagged, answered) as described in the JMAP specification (RFC 8621 section 4.1.1). We'll use `.' (a single period) as an $eidx_key since it's an invalid {inboxdir} or {newsgroup} name.