path: root/lib/PublicInbox/LeiToMail.pm
Date Commit message
2021-03-30 lei_to_mail: update some comments and style
Note that update_kw_maybe is critical in preventing accidental data loss with default "lei q --output" behavior. Also avoid treating (proposed) MH support as lock-free, since it appears to lack specifications for locking and to be even worse than mbox* in that regard...
2021-03-29 lei_input: support compressed mboxes
Since "lei q" and "lei convert" already support writing these compressed inboxes, it makes sense that all mbox readers support them, as well. Using compression is one reliable way to know an mboxrd or mboxo hasn't been unexpectedly truncated.
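To illustrate why compression doubles as a truncation check, here is a minimal Python sketch (public-inbox itself is written in Perl; the helper name is hypothetical). gzip ends each member with a CRC-32 and length trailer, so a cut-off file fails to decompress instead of silently yielding a shorter mbox.

```python
import gzip


def is_truncated_gzip(data: bytes) -> bool:
    """Return True if a gzip stream ends before its end-of-stream marker."""
    try:
        gzip.decompress(data)
    except EOFError:
        # Python raises EOFError when the compressed stream ends prematurely
        return True
    return False


mbox = b"From a@example.com Thu Jan  1 00:00:00 1970\nSubject: hi\n\nbody\n\n"
packed = gzip.compress(mbox)
print(is_truncated_gzip(packed))        # False
print(is_truncated_gzip(packed[:-10]))  # True: trailer and some data cut off
```

A plain mboxrd cut at the same point would still parse as a valid (shorter) mailbox, which is exactly the failure mode compression exposes.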
2021-03-26 lei: support /dev/fd/[0-2] inputs and outputs in daemon
Since lei-daemon won't have the same FDs as the client, we need to special-case these mappings and won't be able to open arbitrary, non-standard FDs. We also won't attempt to support /proc/self/fd/[0-2] since that's a Linux-ism. /dev/fd/[0-2] and /dev/std{in,out,err} are portable to FreeBSD, at least. mawk(1) also supports /dev/std{out,err}, as does gawk(1) (which supports everything we can support, and arbitrary /dev/fd/$FD).
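A minimal Python sketch of the path mapping described above, assuming a hypothetical helper std_fd_for(). Only the three standard descriptors are recognized; everything else (including the Linux-only /proc/self/fd/N) is rejected, because the daemon cannot reach the client's other FDs.

```python
import re
from typing import Optional

# Portable spellings of the standard streams (hypothetical table)
_STD_NAMES = {"/dev/stdin": 0, "/dev/stdout": 1, "/dev/stderr": 2}


def std_fd_for(path: str) -> Optional[int]:
    """Map a portable stdio path to 0, 1 or 2; return None for anything else."""
    if path in _STD_NAMES:
        return _STD_NAMES[path]
    m = re.fullmatch(r"/dev/fd/([0-2])", path)
    if m:
        return int(m.group(1))
    # /dev/fd/3+ and /proc/self/fd/N are deliberately unsupported
    return None


print(std_fd_for("/dev/fd/1"), std_fd_for("/dev/fd/3"))  # 1 None
```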
2021-03-26 lei q: skip lei/store->write_prepare for JSON outputs
JSON outputs won't write to lei/store at all, so there's no point in forking the store worker if it's not already running. The LeiSearch object ($lse) is also fork-safe until it opens a persistent FD for Xapian/SQLite, so we can unconditionally carry it across fork.
2021-03-24 lei: hide *_atfork_child from command-line
Otherwise we could get nonsensical results if somebody tries running "lei atfork_child" from the command-line.
2021-03-21 lei_to_mail: match mutt order of status headers
These changes may make it easier to do byte-for-byte comparisons with mail copied out of mutt, a popular MUA for our target audience. mutt currently outputs the 'R' (seen) flag before the 'O' character in the Status: header. We'll assume that stays the case (it has been for a while). Status now comes before X-Status, also matching mutt behavior.
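The ordering rule can be sketched in Python as follows. The keyword-to-flag mapping (seen to R, answered to A, flagged to F) and the `old` parameter are assumptions for illustration, not taken from LeiToMail.pm; the point is only that 'R' precedes 'O' and Status precedes X-Status.

```python
def mbox_status_headers(keywords, old=True):
    """Return (name, value) pairs in mutt's order: Status before X-Status.

    Hypothetical mapping: "seen" -> R, "answered" -> A, "flagged" -> F.
    """
    headers = []
    status = "R" if "seen" in keywords else ""
    if old:
        status += "O"  # 'O' follows 'R', giving "RO" as mutt writes it
    if status:
        headers.append(("Status", status))
    xstatus = ""
    if "answered" in keywords:
        xstatus += "A"
    if "flagged" in keywords:
        xstatus += "F"
    if xstatus:
        headers.append(("X-Status", xstatus))
    return headers


print(mbox_status_headers({"seen", "flagged"}))
# [('Status', 'RO'), ('X-Status', 'F')]
```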
2021-03-21 lei q: support vmd for external-only messages
"lei q" now preserves per-message keyword changes across invocations when its --output (Maildir or mbox) is reused (with or without --augment). In the future, these changes will be monitored via inotify, EVFILT_VNODE or IMAP IDLE, too. Unfortunately, this currently prevents "lei import" from ever importing a message that's in an external. That will be fixed in a future change.
2021-03-21 lei: All Local Externals: bare git dir for alternates
This will be used for keyword (and label) storage for externals. We'll be using this to ensure we don't redundantly auto-import messages into lei/store if they're already in a local external (they can still be imported explicitly via "lei import").
2021-03-20 maildir: avoid redundant slashes
Redundant slashes look ugly in strace(1) output.
2021-03-17 lei_store: keywords => vmd (volatile metadata), prepare for labels
Since keywords and mailboxes (AKA labels) are separate things in JMAP, and only keywords can map reliably to Maildir and mbox, we'll keep them separate in our internal data representations, too. I initially wanted to call this just "meta" for "metadata", but that might be confused with our mailing list name. "metadata" is already used in Xapian's own API, to add another layer of confusion. "tags" was also considered, but it's probably confusing to notmuch users since our "labels" are analogous to "tags" in notmuch, and notmuch doesn't seem to cover "keywords" separately... So "vmd" it is, since we haven't used this particular three-letter abbreviation anywhere before; and "volatile" seems like a good description of this metadata since everything else up to this point has been mostly WORM (write-once, read-many).
2021-03-16 mbox: move mbox_keywords to MboxReader
MboxReader is a more appropriate place for it than LeiStore.
2021-03-15 lei q: do not import unnecessarily from externals
We only want to auto-import messages that are exclusively in remote externals. Messages in local externals are not auto-imported, to save space and reduce wear on the storage device.
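The rule reduces to a small predicate. This hypothetical Python sketch assumes each message's known locations are given as (kind, name) pairs; a message qualifies for auto-import only when it has at least one location and every one of them is remote.

```python
def should_auto_import(locations):
    """Hypothetical predicate: import only if every copy is in a remote external.

    `locations` is an iterable of (kind, name) pairs, kind being
    "local" or "remote".
    """
    kinds = {kind for kind, _name in locations}
    # Exactly {"remote"}: non-empty, and no local external holds a copy
    return kinds == {"remote"}


print(should_auto_import([("remote", "https://example.com/a/")]))  # True
print(should_auto_import([("remote", "x"), ("local", "/srv/inbox")]))  # False
```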
2021-03-13 lei q: mbox*: disable changing parallelism, add --rsyncable
Unfortunately, being mairix-compatible with --threads means we can't change the thread count of gzip, bzip2, or xz when writing to compressed mbox with a --threads= parameter. It's probably not worth changing, anyway, so another switch or additional value for --jobs= won't be added. While we're in the area, add --rsyncable support since most installations of gzip support it nowadays. Fixes: 5beb4a5f6585acd ("lei: replace --thread with --threads")
2021-03-05 lei q: fix --import-before default and FIFO output
commit 6c551bffd75afb41d9b5e4774068abe7e06ed0e7 ("lei q: --import-augment for mbox and mbox.gz") added a check in _pre_augment_mbox for the option being a ref() to distinguish between default values and user-supplied values (which are non-ref SCALARs from Getopt::Long). However, LeiQuery failed to use a SCALAR ref as the default value, making the check in _pre_augment_mbox useless. We now update LeiQuery to use \1 instead of 1 as the default value so "lei q -f mboxrd ..." to stdout works once again. Unfortunately, testing with redirects pointing to regular files didn't trigger the code paths being updated. Testing with a FIFO revealed further bugs in the FIFO handling code which are also fixed in this commit. We'll also update the $lei->out error message to be less specific about "stdout" and use the term "output", instead, since LeiToMail replaces stdout for all mbox outputs.
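Python has no SCALAR refs, but the same idea (marking defaults so they can be told apart from user input) can be sketched with a wrapper class. `Default` and `option_value()` are hypothetical names; an option parser only ever returns plain values, so an `isinstance()` check plays the role of Perl's ref() test.

```python
class Default:
    r"""Wraps a built-in default so it can be told apart from user input.

    Analogous to a Perl SCALAR ref default (\1) versus a plain 1.
    """
    __slots__ = ("value",)

    def __init__(self, value):
        self.value = value


def option_value(opts, key, default):
    """Return (value, user_supplied) for a hypothetical option key."""
    raw = opts.get(key, Default(default))
    if isinstance(raw, Default):
        return raw.value, False
    return raw, True


print(option_value({}, "import_before", True))                       # (True, False)
print(option_value({"import_before": True}, "import_before", True))  # (True, True)
```

The bug described above corresponds to passing a bare `True` as the default instead of `Default(True)`: the check then always reports the value as user-supplied.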
2021-03-04 lei q: s/import-augment/import-before/g
Since this importing of keywords is active even when --augment isn't specified, calling it --import-before seems more appropriate. In the future, this will likely default to adding unseen emails to lei/store, not just updating keywords. Link: https://public-inbox.org/meta/20210303222930.GA18597@dcvr/T/
2021-03-04 lei q: --import-augment for mbox and mbox.gz
These are the trickiest output formats we support, due to the possibility of filesystem FIFOs and pipes for <gzip|xz|bzip2>. This completes another phase of keyword sync support.
2021-03-04 lei q: support --import-augment for IMAP
IMAP is similar to Maildir and we can now preserve keyword updates done on IMAP folders.
2021-03-04 lei q: import flags when clobbering/augmenting Maildirs
This will eventually be supported for other mail stores, but Maildir is the easiest to test and support, here. This lets us avoid a situation where flag changes get lost between search results.
2021-03-04 lei: use maildir_each_eml in more places
This saves us some code and redundant callsites for eml_from_path. We'll change maildir_each_eml to include the filename to facilitate an upcoming change to "lei q" without --augment.
2021-02-26 lei q: support mbox locking by default
While this diverges from mairix(1) behavior, it's the safer option. We'll follow Debian policy by supporting fcntl and dotlocks by default (in that order). Users who do not want locking can use "--lock=none". This will be used in a read-only capacity for watching mailboxes for keyword updates via inotify or EVFILT_VNODE.
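A rough Python sketch of the fcntl-then-dotlock order follows. `lock_mbox()` and `unlock_mbox()` are hypothetical helpers; the sketch ignores stale dotlocks, NFS behavior, and the read-only watching case, so treat it as an outline of the ordering rather than a drop-in implementation.

```python
import fcntl
import os
import tempfile
import time


def lock_mbox(path, retries=10, delay=0.1):
    """Take an fcntl lock, then a dotlock; return (fd, dotlock_path)."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    fcntl.lockf(fd, fcntl.LOCK_EX)  # blocking fcntl(2) record lock
    dotlock = path + ".lock"
    for _ in range(retries):
        try:
            # O_EXCL makes creation atomic: only one locker can create it
            os.close(os.open(dotlock, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600))
            return fd, dotlock
        except FileExistsError:
            time.sleep(delay)
    # Give up: release the fcntl lock before reporting failure
    fcntl.lockf(fd, fcntl.LOCK_UN)
    os.close(fd)
    raise TimeoutError(f"could not acquire {dotlock}")


def unlock_mbox(fd, dotlock):
    """Release in reverse order: dotlock first, then the fcntl lock."""
    os.unlink(dotlock)
    fcntl.lockf(fd, fcntl.LOCK_UN)
    os.close(fd)


with tempfile.TemporaryDirectory() as d:
    fd, dotlock = lock_mbox(os.path.join(d, "inbox.mbox"))
    locked = os.path.exists(dotlock)
    unlock_mbox(fd, dotlock)
    released = not os.path.exists(dotlock)
print(locked, released)  # True True
```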
2021-02-24 lei: avoid needless env passing to subcommands
We already localize %ENV before calling dispatch(), so it's needless overhead in spawn() to be checking env for undef values in those cases.
2021-02-23 lei_to_mail: remove unused OnDestroy import
2021-02-22 lei q: reduce wasted IMAP connection for auth
We can rework the first lei2mail worker to authenticate, and then share auth info with the rest of the lei2mail workers. As with "lei import", this uses PktOp and lei-daemon to share updated credentials between the first and subsequent l2m workers.
2021-02-21 lei2mail: parallel augment for lock-free stores
This lets us make use of multiple cores on IMAP and Maildir backed by SSD (or better) storage. This benefits IMAP stores with high network latency, but may still penalize IMAP servers with rotational storage.
2021-02-21 net_reader: use and accept URIimap objects in more places
This flexibility should save us some code down-the-line.
2021-02-21 lei q: move augment into lei2mail workers
This is a step which will allow us to parallelize augment on Maildir and IMAP.
2021-02-21 lei q: support IMAP/IMAPS --output destinations
Augment (and dedupe) aren't parallel yet, so they're more sensitive to high-latency networks.
2021-02-19 lei_to_mail: get rid of empty _post_augment_maildir
We won't have _post_augment_imap when we add IMAP support, either. _pre_augment_imap will not exist, either, since opening an IMAP(S) connection can be time consuming so we'll roll that into imap_common_init.
2021-02-19 t/lei-externals: favor "-o format:$PATHNAME" over "-f"
It'll be less ambiguous for inputs with "lei convert" and "lei import" cf. https://public-inbox.org/meta/20210217044032.GA17934@dcvr/
2021-02-19 lei_to_mail: Maildir: ensure link(2) succeeds
link(2) may fail with errors other than EEXIST; just bail out since something is likely seriously wrong.
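A Python sketch of the policy: retry with a new name only on EEXIST, and let every other error propagate. The random naming scheme and the `deliver()` helper are assumptions for illustration; real Maildir names are built from a timestamp, a unique component, and the hostname.

```python
import os
import tempfile
import uuid


def deliver(tmp_path, dest_dir):
    """Hard-link tmp_path into dest_dir under a unique name, then unlink it."""
    while True:
        # Hypothetical naming scheme, not the Maildir convention
        dst = os.path.join(dest_dir, uuid.uuid4().hex)
        try:
            os.link(tmp_path, dst)
        except FileExistsError:
            continue  # EEXIST: the name is taken, try a fresh one
        # Any other OSError (ENOENT, EXDEV, ENOSPC, ...) propagates uncaught
        os.unlink(tmp_path)
        return dst


with tempfile.TemporaryDirectory() as md:
    for sub in ("tmp", "new", "cur"):
        os.mkdir(os.path.join(md, sub))
    tmp_file = os.path.join(md, "tmp", "msg")
    with open(tmp_file, "wb") as f:
        f.write(b"Subject: hi\n\nbody\n")
    delivered = deliver(tmp_file, os.path.join(md, "new"))
    ok = os.path.exists(delivered) and not os.path.exists(tmp_file)
    try:
        deliver(delivered, os.path.join(md, "missing"))
        bad_dir_raised = False
    except FileNotFoundError:
        bad_dir_raised = True  # a real error bails out instead of looping
print(ok, bad_dir_raised)  # True True
```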
2021-02-18 lei convert: mail format conversion sub-command
This will make testing IMAP support for other commands easier, as it doesn't write to lei/store at all. Like the pager and MUA, "git credential" is always spawned by script/lei (and not lei-daemon) so it has a controlling terminal for password prompts. v2: fix missing requires, correct test ordering v3: ensure config exists for IMAP auth
2021-02-10 lei: split out MdirReader package, lazy-require earlier
We'll do more requires in the top-level lei-daemon process to save work in workers. We can also work towards aborting on user errors in lei-daemon rather than worker processes. "lei import -f mbox*" is finally tested inside t/lei_to_mail.t
2021-02-08 lei q: improve remote mboxrd UX + MUA
For early MUA spawners using lock-free outputs, we need to wait on the startq pipe to silence progress reporting. For --augment users, we can start the MUA even earlier by creating Maildirs in the pre-augment phase. To improve progress reporting for non-MUA (or late-MUA) spawners, we'll no longer blindly append "--compressed" to the curl(1) command when POST-ing for the gzipped mboxrd. Furthermore, we'll overload stringify ('""') in LeiCurl to ensure the empty -d '' string shows up properly. v2: fix startq waiting with --threads. mset_progress is never shown with early MUA spawning. The plan is to still show progress when augmenting and deduping. This fixes all local search cases. A leftover debug bit is dropped, too.
2021-02-08 lei import: support Maildirs
It seems to be working trivially, though I'm probably going to split out Maildir reading into a separate package rather than using LeiToMail.
2021-02-07 ipc: wq_do => wq_io_do
We will have a ->wq_do that doesn't pass FDs for I/O.
2021-02-05 lei q: eliminate $not_done temporary git dir hack
Another step towards simplifying lei internals. None of our current uses of ->wq_do involve FD passing, and the plan is to only rely on FD passing between lei-daemon and lei(1). Internally, it ought to be possible for lei-daemon internal bits to be ordered properly to not need FD passing.
2021-02-05 eml: handle warning ignores for lei
There's nothing we can do about bad emails in our search results, so quiet things down and don't fight the MUA for the terminal.
2021-02-05 lei q: reinstate early MUA spawn for Maildir
Once all files are written, we can use utime() to poke Maildirs to wake up MUAs that fail to account for nanosecond timestamp resolution.
2021-02-05 lei q: reorder internals to reduce FD passing
While FD passing is critical for script/lei <=> lei-daemon, lei-daemon doesn't need to use it internally if FDs are created in the proper order before forking.
2021-02-04 lei: reduce FD pressure from lei2mail worker
lei2mail doesn't need stdin anymore, so we can use the [0] slot for the $not_done keepalive purposes.
2021-02-01 lei_to_mail: reduce spew on Maildir removal
At most, we'll only warn once per worker when a Maildir disappears from under us. We'll also use the '!' OpPipe to note the exceptional condition, and use '|' to SIGPIPE so it'll be a bit easier for hackers to remember.
2021-02-01 lei: remove SIGPIPE handler
It doesn't save us any code, and the action-at-a-distance element was making it confusing to track down actual problems. Another potential problem was keeping references alive too long. So do as we would in a C100K server and check every write, while still ensuring lei(1) exits with a proper SIGPIPE iff needed.
2021-02-01 ipc: switch wq to use the event loop
This will let us maximize the capability of our asynchronous git API. This lets us avoid relying on EOF to notify lei2mail workers, thus giving us the option of running fewer lei_xsearch worker processes in parallel than local sources. I tried using a synchronous git API, but even using libgit2 in the same process to avoid the IPC cost failed to match the throughput afforded by this change. This is because libgit2 is built (at least on Debian) with the SHA-1 collision detection code enabled, and ubc_check stuff was dominating my profiles.
2021-02-01 lei: more consistent dedupe and ovv_buf init
This fixes "--dedupe none" with Maildir where we don't create the object at all.
2021-01-30 lei: less error-prone FD mapping
Keeping track of non-standard FDs gets tricky, so make it easier by relying on st_dev/st_ino mapping in the transmitted objects. We'll keep using numbers for the standard FDs since we need to be able to easily redirect them in the producer (main daemon) process for (gzip|bzip2|xz) if writing to a compressed mbox.
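The idea of keying objects by (st_dev, st_ino) instead of by FD number can be sketched in Python. The registry and `fd_key()` are hypothetical; `os.dup()` stands in for FD passing here, since both produce a new descriptor number that still refers to the same file.

```python
import os
import tempfile


def fd_key(fd):
    """Identify an open file by (st_dev, st_ino) rather than by FD number."""
    st = os.fstat(fd)
    return (st.st_dev, st.st_ino)


with tempfile.TemporaryFile() as f:
    # Hypothetical registry built by the sender before passing FDs
    registry = {fd_key(f.fileno()): "mbox writer"}
    other = os.dup(f.fileno())  # a different FD number, same file
    try:
        found = registry.get(fd_key(other))
        same_number = other == f.fileno()
    finally:
        os.close(other)
print(found, same_number)  # mbox writer False
```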
2021-01-24 ipc: get rid of wq_set_recv_modes
Just open every FD as read/write. Perl (or any non-broken runtime) won't care and won't attempt to use F_SETFL to alter file description flags, since attempting to change those would lead to unpleasant side effects if the file description is shared with another process.
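The hazard is easy to demonstrate: O_NONBLOCK and the other F_SETFL flags live on the open file description, not on the descriptor, so changing them through one FD affects every FD (in any process) that shares the description. A short Python demonstration using a pipe and `os.dup()` on a POSIX system:

```python
import fcntl
import os

r, w = os.pipe()
w2 = os.dup(w)  # a second FD sharing w's open file description

flags = fcntl.fcntl(w, fcntl.F_GETFL)
fcntl.fcntl(w, fcntl.F_SETFL, flags | os.O_NONBLOCK)

# F_SETFL changed the shared description, so w2 sees O_NONBLOCK as well
shared = bool(fcntl.fcntl(w2, fcntl.F_GETFL) & os.O_NONBLOCK)
print(shared)  # True

for fd in (r, w, w2):
    os.close(fd)
```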
2021-01-23 lei: oneshot: preserve stdout if writing mbox
We still need stdout if launching an MUA.
2021-01-23 lei_to_mail: drop cyclic reference if not using IPC
This may fix another interrupt-related segfault I'm occasionally seeing (but so far unable to reproduce).
2021-01-23 lei: support remote externals
Via curl(1), since that lets us easily use tor on a per-connection basis via LD_PRELOAD (torsocks) or proxy. We'll eventually support more curl options which can allow users to get past firewalls and deal with other odd network configurations.
2021-01-22 lei_to_mail: avoid segfault on exit
Worker exit causes DESTROY ordering to become unpredictable and leads to Perl segfaulting. Instead, rely on OnDestroy and explicit triggering after wq_worker_loop to ensure we finish all outstanding git requests before worker exit.