about summary refs log tree commit homepage
path: root/lib/PublicInbox/LeiStore.pm
DateCommit message (Collapse)
2021-06-02lei export-kw: do not write directly to mail_sync.sqlite3
Only the lei/store process should be writing to files/DBs in lei/store.
2021-05-27lei rm: new command to remove messages from index
This is similar to "public-inbox-learn rm", but it's possible to point an entire Maildir/IMAP/mbox*/newsgroup at it.
2021-05-26lei: require Socket::MsgHdr or Inline::C, drop oneshot
The cost of supporting separate code paths between oneshot and daemon isn't worth the trouble; especially if there are more users to support. The test suite time nearly doubles with oneshot, so that's hurting developer productivity. FD passing is currently required to work efficiently with remote HTTP(S) queries which return large messages, as seen in commit 708b182a57373172f5523f3dc297659d58e03b58 ("ipc: wq: handle >MAX_ARG_STRLEN && <EMSGSIZE case"). Additionally, upcoming support for IMAP IDLE and inotify-based monitoring of Maildirs cannot work properly without a background daemon.
2021-05-04lei index: new command to index mail w/o git storage
Since completely purging blobs from git is slow, users may wish to index messages in Maildirs (and eventually other local storage) without storing data in git. Much code from LeiImport and LeiInput is reused, and a new dummy FakeImport class supplies a non-storing $im->add and minimize changes to LeiStore. The tricky part of this command is to support "lei import" after a message has gone through "lei index". Relying on $smsg->{bytes} == 0 (as we do for external-only vmd storage) does not work here, since it would break searching for "z:" byte-ranges when not using externals. This eventually required PublicInbox::Import::add to use a SharedKV to keep track of imported blobs and prevent duplication.
2021-05-03lei <q|up>: writes to Maildirs and IMAP use mail-sync
This will allow keyword updates from other folders to propagate to folders where search results may be duplicated.
2021-04-30lei_store: fix locking w.r.t epoch creation
Prior to this change, it was possible for oneshot lei processes to race on epoch creation/rollover. lei-daemon normally prevents the problem by funnelling all writes to a single socket, but oneshot lei has no such protection.
2021-04-28lei: quiet down Eml-related warnings consistently
"lei import" is probably the only place where it users might care about warnings.
2021-04-24lei import: keep sync info for Maildir and IMAP folders
We aren't using it, yet, but the plan is to be able to use this information to propagate keyword changes back to IMAP and Maildir folders using some to-be-implemented command. "lei inspect" is a half-baked new command to make testing this change easier. It will be updated to support more SQLite+Xapian introspection duties in the future, including public-inbox things independent of lei.
2021-04-07lei_store: use getpwuid and hostname for ident
It's nicer in case a user transfers lei/store across machines and wants a way to track when/where they imported something.
2021-04-03lei/store: (more) synchronous non-fatal error output
Since every command that writes to lei/store calls ->done to commit its output, we can rely on that to return a pathname for a readable file with errors in it. Errors can still get crossed up if multiple lei commands are writing to the store at once, but reduces the delay in seeing them and ensures it won't get seen when somebody is attempting to use shell completion.
2021-04-03lei_store: update alternates on new epoch
We'll just let the ExtSearchIdx code handle this uncommon case by doing a full commit.
2021-04-01lei_store: quiet down per-message related warnings
It's needless noise when doing augment and output preparation and shows up way too late and out-of-band with lei-daemon.
2021-04-01lei_store: quiet down git user info being unset
lei_store contents aren't intended to become public, so there's no point in nagging users for their email address for git committer information like git does.
2021-04-01lei_store: set_xvmd: don't add if no vmd at all
There's no point in adding vmd information for an external message if it was never stored and there's no vmd at all. We also don't need to check _docids_for for similar messages, either, since we always check lse->kw_changed, first.
2021-04-01lei q: reduce lei/store work for kw changes to stored mail
We can tweak lse->kw_changed to return docids and reduce IPC traffic and reduce work the lei/store worker needs to do.
2021-03-26lei: add some labels support
"lei q" now displays labels in JSON output, "lei mark" can add or remove labels for any messages. "lei ls-label" is supported, too. Unfortunately, "lei q" won't hande "kw:" or "L:" for external messages, they must be imported, first.
2021-03-24lei_store: give process a better name
We'll prioritize the last two components of the path name ("lei/store") since that's how I often refer to the on-disk location. Then, show the XDG_DATA_HOME it belongs to in case a user changes HOME or XDG_* for testing purposes.
2021-03-24lei: hide *_atfork_child from command-line
Otherwise we could get non-sensical results if somebody tries running "lei atfork_child" from the command-line.
2021-03-24lei mark: command for (un)setting keywords and labels
Only tested for keywords and labels with file inputs, so far; but it seems to do what it needs to do. There's a bit more redundant code than I'd like, and more opportunities for code sharing in the future "lei import" will be expanded to support +kw:$KEYWORD and +L:$LABEL in the future.
2021-03-21lei import: vivify external-only messages
Keyword storage for external-only messages was preventing messages from being explicitly imported. Teach lei_store to vivify keyword-only entries into fully-indexed messages on import.
2021-03-21lei q: support vmd for external-only messages
"lei q" now preserves changes per-message keywords across invocations when it's --output (Maildir or mbox) is reused (with or without --augment). In the future, these changes will be monitored via inotify, EVFILT_VNODE or IMAP IDLE, too. Unfortunately, this currently prevents "lei import" from ever importing a message that's in an external. That will be fixed in a future change.
2021-03-21lei: All Local Externals: bare git dir for alternates
This will be used for keyword (and label) storage for externals. We'll be using this to ensure we don't redundantly auto-import messages into lei/store if they're already in a local external (they can still be imported explicitly via "lei import").
2021-03-20lei_store: initialize IPC lock properly
This was causing errors in a mass keyword import patch I'm working on.
2021-03-17lei_store: keywords => vmd (volatile metadata), prepare for labels
Since keywords and mailboxes (AKA labels) are separate things in JMAP; and only keywords can map reliably to Maildir and mbox; we'll keep them separate in our internal data representations, too. I initially wanted to call this just "meta" for "metadata", but that might be confused with our mailing list name. "metadata" is already used in Xapian's own API, to add another layer of confusion. "tags" was also considered, but probably confusing to notmuch users since our "labels" are analogous to "tags" in notmuch, and notmuch doesn't seem to cover "keywords" separately... So "vmd" it is, since we haven't used this particular three-letter-abbreviation anywhere before; and "volatile" seems like a good description of this metadata since everything else up to this point has been mostly WORM (write-once, read-many).
2021-03-16lei_store: remove maildir_keywords
It's redundant and the same functionality is in MdirReader.
2021-03-16mbox: move mbox_keywords to MboxReader
MboxReader is a more appropriate place for it than LeiStore.
2021-03-15lei q: do not import unnecessarily from externals
We only want to auto import messages that are exclusively in remote externals. Messages in local externals are not auto-imported to save space and reduce wear on storage device.
2021-03-10lei import: simplify Maildir handling
Having a one-off Maildir functionality in LeiStore doesn't seem worth the maintenance burden, especially given an upcoming change to skip trashed messages. I expect this will hurt performance slightly with extra IPC overhead for the socket copy, but "lei import" may eventually become rare or at least not hit messages redundantly.
2021-03-04lei q: import flags when clobbering/augmenting Maildirs
This will eventually be supported for other mail stores, but Maildir is the easiest to test and support, here. This lets us avoid a situation where flag changes get lost between search results.
2021-02-22lei_store: populate ALL.git/alternates with new epochs
Since eidx_init updates ALL.git/objects/info/alternates, we need to ensure new epochs we create from LeiStore->importer exist before eidx_init writes alternates. Reported-by: Kyle Meyer <kyle@kyleam.com> Link: https://public-inbox.org/meta/8735xou0gq.fsf@kyleam.com/
2021-02-08lei import: support Maildirs
It seems to be working trivially, though I'm probably going to split out Maildir reading into a separate package rather than using LeiToMail.
2021-02-05lei import: initial implementation
Only tested with .eml files so far, but Maildir + IMAP will be supported.
2021-01-12lei query + pagination sorta working
Parallelism and interactivity with pager + SIGPIPE needs work; but results are shown and phrase search works without shell users having to apply Xapian quoting rules on top of standard shell quoting.
2021-01-03use Eml (or MIME) objects for all indexing paths
We don't need to be keeping the raw message around after it hits git. Shard work now relies on Storable (or Sereal) and all of the indexing code relies on the Email::MIME-like API of Eml to access interesting parts of the message. Similarly, smsg->{raw_bytes} is no longer carried around and we do the CRLF adjustment when setting smsg->{bytes}. There's also a small simplification to t/import.t while we're in the area to use xqx instead of spawn/popen_rd.
2021-01-03searchidxshard: replace index_raw with index_eml
Since Storable and Sereal are designed for lossless serialization, we'll just pass $eml objects to whatever process is running SearchIdx.
2021-01-03searchidxshard: IPC conversion, part 2
We can remove some now-pointless wrapper functions by using ->ipc_do in even more places.
2021-01-02lei_store: alternative unconfigured "git var" workaround
While the changes to git->qx/git->popen from commit 171a9c24022ad7ef will be useful for the lei daemon, hiding git error messages from actual users is probably wrong and we'll just localize GIT_* vars for testing.
2021-01-02treewide: reduce load_xapian* callsites
Hopefully this will make it easier to spot dependency bugs in the future.
2021-01-01lei_store: quiet down "git var" failures
$git->qx and $git->popen now $env and $opt for redirects like lower-level popen_rd. This may be beneficial in other places.
2021-01-01update copyrights for 2021
Using "make update-copyrights" after setting GNULIB_PATH in my config.mak
2021-01-01lei_store: handle messages without Message-ID at all
For personal mail, unsent drafts messages are a common source of messages without Message-IDs.
2021-01-01lei_store: add ->set_eml, ->add_eml can return smsg
Add a ->set_eml method which can be a useful fire-and-forget way of either adding new files to store OR setting keywords on them. When seeing brand-new messages, add_eml can afford to return more information in the smsg instead of just the OID.
2021-01-01ipc: generic IPC dispatch based on Storable
I intend to use this with LeiStore when importing from multiple slow sources at once (e.g. curl, IMAP, etc). This is because over.sqlite3 can only have a single writer, and we'll have several slow readers running in parallel. Watch and SearchIdxShard should also be able to use this code in the future, but this will be proven with LeiStore, first.
2021-01-01revert "lei_store: use per-machine refname as git HEAD"
In retrospect, per-machine HEADs was a bad idea because users of removable storage would be thrown off when moving storage between different machines. This is only a partial revert, the Import::init_bare change to support alternate head names still exists because we may use it for other reasons.
2021-01-01lei_store: use per-machine refname as git HEAD
It may be helpful to identify the source of messages and perhaps avoid conflicting history. On the other hand, this may be a terrible idea for users who move portable storage (e.g. USB sticks) across computers...
2020-12-19lei_store: keyword extraction from mbox and Maildir
Dovecot, mutt, and likely much other software support mbox Status/X-Status headers. Ensure we have a way to extract these headers as JMAP-compatible keywords before removing them for git storage. ->add_eml now accepts setting keywords at import time, and will probably be called like this: $lst->add_eml($eml, $lst->mbox_keywords($eml)); $lst->add_eml($eml, $lst->maildir_keywords($fn));
2020-12-19lei_store: relax GIT_COMMITTER_IDENT check
It's pretty meaningless, since probably nobody notices committer info we extract author info from individual emails, anyways.
2020-12-19lei_store: simplify git_epoch_max, slightly
This follows how we detect the max epoch for v2 and shard count in Xapian.
2020-12-19lei: refine help/option parsing, implement "init"
There's a bunch of work in here as the foundations are being fleshed out. One of the UI/UX is to make it easy to keep built-in help and shell completions consistent
2020-12-19lei_store: local storage for Local Email Interface
Still unstable, this builds off the equally unstable extindex :P This will be used for caching/memoization of traditional mail stores (IMAP, Maildir, etc) while providing indexing via Xapian, along with compression, and checksumming from git. Most notably, this adds the ability to add/remove per-message keywords (draft, seen, flagged, answered) as described in the JMAP specification (RFC 8621 section 4.1.1). We'll use `.' (a single period) as an $eidx_key since it's an invalid {inboxdir} or {newsgroup} name.