Date | Commit message (Collapse) |
|
Note that update_kw_maybe is critical in preventing accidental
data loss with default "lei q --output" behavior.
Also avoid treating (proposed) MH support as lock-free, since it
appears to lack specifications for locking and to be even worse
than mbox* in that regard...
|
|
Since "lei q" and "lei convert" already support writing these
compressed inboxes, it makes sense that all mbox readers support
them, as well.
Using compression is one reliable way to know an mboxrd or mboxo
hasn't been unexpectedly truncated.
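The truncation point can be sketched roughly as follows in Python. This is
only an illustration, not the lei code: gzip_is_complete is a hypothetical
helper, and it relies on gzip carrying an end-of-stream trailer that a
truncated file lacks.

```python
import gzip
import io
import zlib


def gzip_is_complete(data: bytes) -> bool:
    """Return True if `data` decompresses through to the gzip trailer."""
    try:
        with gzip.GzipFile(fileobj=io.BytesIO(data)) as fh:
            # Read to EOF; a truncated stream raises before we get there.
            while fh.read(65536):
                pass
        return True
    except (EOFError, OSError, zlib.error):
        return False


# A made-up mboxrd body, repeated so the compressed stream is non-trivial.
mbox = b"From mboxrd@z Thu Jan  1 00:00:00 1970\nSubject: hi\n\nbody\n\n" * 100
whole = gzip.compress(mbox)
truncated = whole[: len(whole) // 2]
```

A plain mboxrd cut at a message boundary would look like a valid, shorter
mbox, whereas the truncated gzip stream fails the check.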
|
|
Since lei-daemon won't have the same FDs as the client, we
need to special-case these mappings and won't be able to open
arbitrary, non-standard FDs.
We also won't attempt to support /proc/self/fd/[0-2] since
that's a Linux-ism. /dev/fd/[0-2] and /dev/std{in,out,err}
are portable to FreeBSD, at least. mawk(1) also supports
/dev/std{out,err}, as does gawk(1) (which supports everything
we can support, and arbitrary /dev/fd/$FD).
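A rough Python sketch of the special-casing described above. The function
name and table are hypothetical; only the set of accepted paths comes from
this message.

```python
import re
from typing import Optional

# Paths we are willing to map onto the client's standard FDs.
_STD_NAMES = {"/dev/stdin": 0, "/dev/stdout": 1, "/dev/stderr": 2}
_DEV_FD = re.compile(r"/dev/fd/([0-2])")


def std_fd_for_path(path: str) -> Optional[int]:
    """Return 0, 1 or 2 for a recognized path, else None."""
    if path in _STD_NAMES:
        return _STD_NAMES[path]
    m = _DEV_FD.fullmatch(path)
    # /proc/self/fd/N and /dev/fd/3+ are deliberately not mapped.
    return int(m.group(1)) if m else None
```

Anything returning None would have to be opened as an ordinary pathname by
the daemon, which is why arbitrary FDs cannot be supported this way.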
|
|
JSON outputs won't write to lei/store at all, so there's
no point in forking the store worker if it's not already
running.
LeiSearch object ($lse) is also fork-safe until it opens a
persistent FD for Xapian/SQLite so we can unconditionally
carry it across fork.
|
|
Otherwise we could get nonsensical results if somebody tries
running "lei atfork_child" from the command-line.
|
|
These changes may make it easier to do byte-for-byte comparisons
with mail copied out of mutt, a popular MUA for our target
audience.
mutt currently outputs the 'R' (seen) flag before the 'O'
character in the Status: header. We'll assume that stays
the case (it has been for a while).
Status now comes before X-Status, also matching mutt behavior.
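The header ordering can be sketched like this in Python. The keyword names,
the unconditional 'O', and the letter order within X-Status are assumptions
made for illustration; only "R before O" and "Status before X-Status" come
from this message.

```python
from typing import Iterable, List

# Assumed keyword -> X-Status letter mapping (conventional mbox letters).
_XSTATUS = (("answered", "A"), ("flagged", "F"), ("draft", "T"))


def mbox_status_headers(keywords: Iterable[str]) -> List[str]:
    """Build Status/X-Status header lines in mutt-like order."""
    kw = set(keywords)
    # 'R' (seen) goes before 'O'; 'O' is assumed to always be present.
    status = ("R" if "seen" in kw else "") + "O"
    xstatus = "".join(letter for name, letter in _XSTATUS if name in kw)
    lines = ["Status: " + status]
    if xstatus:
        # X-Status is emitted after Status.
        lines.append("X-Status: " + xstatus)
    return lines
```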
|
|
"lei q" now preserves changes to per-message keywords across
invocations when its --output (Maildir or mbox) is reused
(with or without --augment).
In the future, these changes will be monitored via inotify,
EVFILT_VNODE or IMAP IDLE, too.
Unfortunately, this currently prevents "lei import" from ever
importing a message that's in an external. That will be fixed
in a future change.
|
|
This will be used for keyword (and label) storage for externals.
We'll be using this to ensure we don't redundantly auto-import
messages into lei/store if they're already in a local external
(they can still be imported explicitly via "lei import").
|
|
Redundant slashes look ugly in strace(1) output.
|
|
Since keywords and mailboxes (AKA labels) are separate things in
JMAP; and only keywords can map reliably to Maildir and mbox;
we'll keep them separate in our internal data representations,
too.
I initially wanted to call this just "meta" for "metadata", but
that might be confused with our mailing list name. "metadata"
is already used in Xapian's own API, to add another layer of
confusion.
"tags" was also considered, but probably confusing to notmuch
users since our "labels" are analogous to "tags" in notmuch,
and notmuch doesn't seem to cover "keywords" separately...
So "vmd" it is, since we haven't used this particular
three-letter-abbreviation anywhere before; and "volatile" seems
like a good description of this metadata since everything else
up to this point has been mostly WORM (write-once, read-many).
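To illustrate why the two are kept apart, here is a small Python sketch. The
Vmd class and the keyword names are hypothetical; the flag letters are the
standard Maildir ones. Only keywords have a Maildir representation, so
labels never reach the filename.

```python
from dataclasses import dataclass, field
from typing import Set

# Assumed keyword names mapped to standard Maildir flag letters.
_KW2FLAG = {
    "draft": "D",
    "flagged": "F",
    "forwarded": "P",
    "answered": "R",
    "seen": "S",
}


@dataclass
class Vmd:
    """Volatile metadata: keywords and labels stored separately."""
    keywords: Set[str] = field(default_factory=set)
    labels: Set[str] = field(default_factory=set)


def maildir_info(vmd: Vmd) -> str:
    """Return the ':2,' suffix; Maildir wants flags in ASCII order."""
    flags = sorted(_KW2FLAG[k] for k in vmd.keywords if k in _KW2FLAG)
    return ":2," + "".join(flags)
```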
|
|
MboxReader is a more appropriate place for it than LeiStore.
|
|
We only want to auto import messages that are exclusively in
remote externals. Messages in local externals are not
auto-imported to save space and reduce wear on storage devices.
|
|
Unfortunately, being mairix-compatible with --threads means we
can't change thread-count of gzip, bzip2, or xz when writing to
compressed mbox with a --threads= parameter. It's probably not
worth changing, anyways, so another switch or additional value
for --jobs= won't be added.
While we're in the area, add --rsyncable support since
most installations of gzip support it nowadays.
Fixes: 5beb4a5f6585acd ("lei: replace --thread with --threads")
|
|
commit 6c551bffd75afb41d9b5e4774068abe7e06ed0e7
("lei q: --import-augment for mbox and mbox.gz") added a check
in _pre_augment_mbox for the option being a ref() to distinguish
between default values and user-supplied values (which are
non-ref SCALARs from Getopt::Long).
However, LeiQuery failed to use a SCALAR ref as the default
value, making the check in _pre_augment_mbox useless. We
now update LeiQuery to use \1 instead of 1 as the default
value so "lei q -f mboxrd ..." to stdout works once again.
Unfortunately, testing with redirects pointed to regular
files didn't trigger the code paths being updated. Testing
with a FIFO revealed further bugs in the FIFO handling code
which are also fixed in this commit.
We'll also update the $lei->out error message to be
less-specific about "stdout" and use the term "output", instead,
since LeiToMail replaces stdout for all mbox outputs.
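The default-vs-user-supplied distinction can be sketched in Python, which
has no scalar refs, by wrapping defaults in a marker type. Default and
resolve are hypothetical names standing in for the \1 and ref() check
described above.

```python
from typing import Any, Tuple


class Default:
    """Marks a value as a built-in default rather than user input."""

    def __init__(self, value: Any) -> None:
        self.value = value


def resolve(opt: Any) -> Tuple[Any, bool]:
    """Return (value, user_supplied) for an option."""
    if isinstance(opt, Default):
        return opt.value, False
    return opt, True
```

Using a bare 1 as the default corresponds to resolve(1), which reports the
value as user-supplied; that is the mistake being fixed.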
|
|
Since this importing of keywords is active even when --augment
isn't specified, calling it --import-before seems more
appropriate.
In the future, this will likely default to adding unseen emails
to lei/store, not just updating keywords.
Link: https://public-inbox.org/meta/20210303222930.GA18597@dcvr/T/
|
|
These are the trickiest output formats we support, due to the
possibility of filesystem FIFOs and pipes for <gzip|xz|bzip2>.
This completes another phase of keyword sync support.
|
|
IMAP is similar to Maildir and we can now preserve keyword
updates done on IMAP folders.
|
|
This will eventually be supported for other mail stores,
but Maildir is the easiest to test and support, here.
This lets us avoid a situation where flag changes get
lost between search results.
|
|
This saves us some code and redundant callsites for
eml_from_path. We'll change maildir_each_eml to include the
filename to facilitate an upcoming change to "lei q" without
--augment.
|
|
While this diverges from mairix(1) behavior, it's the safer
option. We'll follow Debian policy by supporting fcntl and
dotlocks by default (in that order). Users who do not want
locking can use "--lock=none".
This will be used in a read-only capacity for watching
mailboxes for keyword updates via inotify or EVFILT_VNODE.
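The lock ordering can be sketched in Python as below. This is a simplified
illustration with hypothetical function names and a hypothetical timeout; it
is not the lei implementation and ignores stale dotlock handling.

```python
import fcntl
import os
import time


def lock_mbox(path: str, timeout: float = 5.0):
    """Take the fcntl lock first, then the dotlock. Returns (fh, dotlock)."""
    fh = open(path, "r+b")
    try:
        fcntl.lockf(fh, fcntl.LOCK_EX)  # 1) blocking fcntl lock
        dot = path + ".lock"
        deadline = time.monotonic() + timeout
        while True:
            try:
                # 2) the dotlock is created atomically with O_EXCL
                fd = os.open(dot, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
                os.close(fd)
                return fh, dot
            except FileExistsError:
                if time.monotonic() >= deadline:
                    raise TimeoutError("dotlock held: " + dot)
                time.sleep(0.1)
    except BaseException:
        fh.close()  # closing the handle drops the fcntl lock
        raise


def unlock_mbox(fh, dot: str) -> None:
    """Release in reverse order: dotlock first, then the fcntl lock."""
    os.unlink(dot)
    fh.close()
```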
|
|
We already localize %ENV before calling dispatch(), so
it's needless overhead in spawn() to be checking env for
undef values in those cases.
|
|
|
|
We can rework the first lei2mail worker to authenticate, and
then share auth info with the rest of the lei2mail workers. As
with "lei import", this uses PktOp and lei-daemon to share
updated credentials between the first and subsequent l2m workers.
|
|
This lets us make use of multiple cores on IMAP and Maildir
backed by SSD (or better) storage. This benefits IMAP stores
with high network latency, but may still penalize IMAP servers
with rotational storage.
|
|
This flexibility should save us some code down-the-line.
|
|
This is a step which will allow us to parallelize augment
on Maildir and IMAP.
|
|
Augment (and dedupe) aren't parallel, yet, so it's more sensitive to
high-latency networks.
|
|
We won't have _post_augment_imap when we add IMAP support,
either.
_pre_augment_imap will not exist, either, since opening an
IMAP(S) connection can be time consuming so we'll roll that
into imap_common_init.
|
|
It'll be less ambiguous for inputs with "lei convert" and "lei import".
cf. https://public-inbox.org/meta/20210217044032.GA17934@dcvr/
|
|
link(2) may fail with errors other than EEXIST; just bail out
since something is likely seriously wrong.
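The error handling can be sketched in Python as follows. The naming scheme
and retry limit are hypothetical; the point is that only EEXIST is retried
and every other error propagates.

```python
import os


def link_unique(src: str, dst_dir: str, base: str, max_tries: int = 100) -> str:
    """Hard-link src into dst_dir, retrying with a new name only on EEXIST."""
    for n in range(max_tries):
        name = base if n == 0 else "%s.%d" % (base, n)
        dst = os.path.join(dst_dir, name)
        try:
            os.link(src, dst)
            return dst
        except FileExistsError:
            continue  # EEXIST: the name is taken, try the next one
        # Any other OSError (ENOENT, EXDEV, ENOSPC, ...) is not caught
        # here, so the caller bails out.
    raise FileExistsError("no free name for %s in %s" % (base, dst_dir))
```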
|
|
This will make testing IMAP support for other commands easier, as
it doesn't write to lei/store at all. Like the pager and MUA,
"git credential" is always spawned by script/lei (and not
lei-daemon) so it has a controlling terminal for password
prompts.
v2: fix missing requires, correct test ordering
v3: ensure config exists for IMAP auth
|
|
We'll do more requires in the top-level lei-daemon process to
save work in workers. We can also work towards aborting on
user errors in lei-daemon rather than worker processes.
"lei import -f mbox*" is finally tested inside t/lei_to_mail.t
|
|
For early MUA spawners using lock-free outputs, we need to wait
on the startq pipe to silence progress reporting. For
--augment users, we can start the MUA even earlier by
creating Maildirs in the pre-augment phase.
To improve progress reporting for non-MUA (or late-MUA)
spawners, we'll no longer blindly append "--compressed" to the
curl(1) command when POST-ing for the gzipped mboxrd.
Furthermore, we'll overload stringify ('""') in LeiCurl to
ensure the empty -d '' string shows up properly.
v2: fix startq waiting with --threads
mset_progress is never shown with early MUA spawning.
The plan is to still show progress when augmenting and
deduping. This fixes all local search cases.
A leftover debug bit is dropped, too.
|
|
It seems to be working trivially, though I'm probably
going to split out Maildir reading into a separate
package rather than using LeiToMail.
|
|
We will have a ->wq_do that doesn't pass FDs for I/O.
|
|
Another step towards simplifying lei internals.
None of our current uses of ->wq_do involve FD passing, and the
plan is to only rely on FD passing between lei-daemon and lei(1).
Internally, it ought to be possible for lei-daemon internal bits
to be ordered properly to not need FD passing.
|
|
There's nothing we can do about bad emails in our search
results, so quiet things down and don't fight the MUA for
the terminal.
|
|
Once all files are written, we can use utime() to poke Maildirs
to wake up MUAs that fail to account for nanosecond timestamp
resolution.
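A minimal Python sketch of the idea. Which subdirectories get touched and
the one-second offset are assumptions made for illustration, not details
taken from this change.

```python
import os
import time


def poke_maildir(maildir: str, offset: float = 1.0) -> float:
    """Bump the mtime of new/ and cur/ once all messages are written."""
    # Assumption: moving the timestamp to a later whole second is enough
    # for MUAs that only compare mtimes at one-second granularity.
    t = time.time() + offset
    for sub in ("new", "cur"):
        os.utime(os.path.join(maildir, sub), (t, t))
    return t
```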
|
|
While FD passing is critical for script/lei <=> lei-daemon,
lei-daemon doesn't need to use it internally if FDs are
created in the proper order before forking.
|
|
lei2mail doesn't need stdin anymore, so we can use the [0] slot
for the $not_done keepalive purposes.
|
|
At most, we'll only warn once per worker when a Maildir
disappears from under us. We'll also use the '!' OpPipe
to note the exceptional condition, and use '|' to SIGPIPE
so it'll be a bit easier for hackers to remember.
|
|
It doesn't save us any code, and the action-at-a-distance
element was making it confusing to track down actual problems.
Another potential problem was keeping references alive too long.
So do like we would a C100K server and check every write
while still ensuring lei(1) exits with a proper SIGPIPE
iff needed.
|
|
This will let us maximize the capability of our asynchronous
git API. This lets us avoid relying on EOF to notify lei2mail
workers; thus giving us the option of running fewer lei_xsearch
worker processes in parallel than local sources.
I tried using a synchronous git API, but even with libgit2 in
the same process to avoid the IPC cost, it failed to match the
throughput afforded by this change. This is because libgit2 is
built (at least on Debian) with the SHA-1 collision code enabled
and ubc_check stuff was dominating my profiles.
|
|
This fixes "--dedupe none" with Maildir where we don't
create the object at all.
|
|
Keeping track of non-standard FDs gets tricky, so make it easier
by relying on st_dev/st_ino mapping in the transmitted objects.
We'll keep using numbers for the standard FDs since we need to
be able to easily redirect them in the producer (main daemon)
process for (gzip|bzip2|xz) if writing to a compressed mbox.
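The st_dev/st_ino idea can be sketched in Python as follows. fd_key and
same_file are hypothetical helpers; how the pair is carried in the
transmitted objects is not shown.

```python
import os
from typing import Tuple


def fd_key(fd: int) -> Tuple[int, int]:
    """Identify the file behind an FD by its (st_dev, st_ino) pair."""
    st = os.fstat(fd)
    return (st.st_dev, st.st_ino)


def same_file(fd_a: int, fd_b: int) -> bool:
    """True if both FDs refer to the same file, whatever their numbers."""
    return fd_key(fd_a) == fd_key(fd_b)
```

The FD number may differ between processes after passing, but the
(st_dev, st_ino) pair stays the same, so it can serve as a stable key.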
|
|
Just open every FD as read/write. Perl (or any non-broken
runtime) won't care and won't attempt to use F_SETFL to alter
file description flags; as attempting to change those would
lead to unpleasant side effects if the file description is
shared with another process.
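The side effect can be demonstrated in Python on a POSIX system. A dup()ed
descriptor stands in for "another process" here, since both share one open
file description in the same way; the helper names are hypothetical.

```python
import fcntl
import os
from typing import Tuple


def has_nonblock(fd: int) -> bool:
    """True if O_NONBLOCK is set on the description behind fd."""
    return bool(fcntl.fcntl(fd, fcntl.F_GETFL) & os.O_NONBLOCK)


def demo_shared_flags() -> Tuple[bool, bool]:
    """Set O_NONBLOCK via one FD and observe it through its duplicate."""
    r, w = os.pipe()
    dup_r = os.dup(r)  # shares the file description with r
    try:
        before = has_nonblock(dup_r)
        flags = fcntl.fcntl(r, fcntl.F_GETFL)
        fcntl.fcntl(r, fcntl.F_SETFL, flags | os.O_NONBLOCK)
        after = has_nonblock(dup_r)  # changed without touching dup_r
        return before, after
    finally:
        for fd in (r, w, dup_r):
            os.close(fd)
```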
|
|
We still need stdout if launching an MUA.
|
|
This may fix another interrupt-related segfault I'm occasionally
seeing (but so far unable to reproduce).
|
|
Via curl(1), since that lets us easily use tor on a
per-connection basis via LD_PRELOAD (torsocks) or proxy.
We'll eventually support more curl options which can allow
users to get past firewalls and deal with other odd network
configurations.
|
|
Worker exit causes DESTROY ordering to become unpredictable and
leads to Perl segfaulting. Instead, rely on OnDestroy and
explicit triggering after wq_worker_loop to ensure we finish
all outstanding git requests before worker exit.
|