Date | Commit message |
|
$wq->{-ipc_atfork_child_close} needed to be initialized properly.
And start setting $0 in workers to improve visibility.
|
|
From_ lines are shown when mbox* variants are output to stdout,
making {oid} and {pct} information visible without risking being
propagated to other importer processes if they were in
lei-specific X-* headers.
Maildirs already had OIDs in the filename, now they gain Xapian
{pct} in case anybody cares.
|
|
It doesn't seem to matter at the moment, but it should
save us from some surprises down the line.
|
|
We may attempt to write an mbox to any terminal, block, or
character device, not just regular files and FIFOs/pipes.
The only thing that is known to not work is a directory.
Sockets may be possible with some OSes (e.g. Plan 9) or
filesystems. This fixes t/lei.t on FreeBSD 11.x
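
As a rough illustration of the rule above (anything but a directory
is acceptable as an mbox destination), here is a minimal Python
sketch. The function name mbox_destination_ok is hypothetical and
this is not the actual lei code, which is Perl:

```python
import os
import stat

def mbox_destination_ok(path):
    """Return True if `path` could plausibly receive mbox output.

    Hypothetical helper: only directories are rejected. Regular
    files, FIFOs, terminals, block and character devices (and
    sockets, where the OS allows it) are all let through.
    """
    try:
        st = os.stat(path)
    except FileNotFoundError:
        # Nothing there yet: it would be created as a regular file.
        return True
    return not stat.S_ISDIR(st.st_mode)
```

A blacklist of one file type is simpler than a whitelist and avoids
surprises on systems where stdout is a device we did not anticipate.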
|
|
Because user errors happen...
|
|
We need to delay writing out the mailbox until the compressor
process is up and running, so have startq wait a bit. This
means we must create the pipe early and hand it off to the
workers before augmenting, despite spawning the
gzip/pigz/xz/bzip2 process after augment is complete.
|
|
I'm not sure why, but mutt sometimes won't detect small
quickly. We'll display a progress bar meter when writing
results, instead.
|
|
We need to properly propagate SIGPIPE to the top-level
lei-daemon process and avoid relying on auto-close,
since auto-close triggers Perl warnings when explicit
close() does not.
|
|
We don't need the result of query_prepare (for augmenting or
mass unlinking) until we're ready to deduplicate and write
results to the filesystem. This ought to let us hide some of
the cost of Xapian searches on multi-device/core systems for
extremely expensive searches.
|
|
Instead of optimizing our own performance, this optimizes
our data to reduce work done by the MUA consumer.
Maildir and mbox destinations no longer support any notion of
the IMAP \Recent flag. JMAP has no functioning \Recent
equivalent, and neither do we.
In practice, having MUAs (e.g. mutt) clear the \Recent flag when
committing changes to the mbox is expensive: it creates a
rename(2) storm with Maildir and overwrites the entire mbox.
For mboxcl2 (and mboxcl), we'll further optimize mutt behavior
by setting the Lines: header in addition to Content-Length.
With these changes, mutt exits instantaneously on mboxcl2,
mboxcl, and Maildirs generated by "lei q".
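
To make the Lines: and Content-Length idea concrete, here is a small
Python sketch of adding both headers to one message. It assumes
Lines: counts the lines in the body and Content-Length counts body
bytes; add_length_headers is a hypothetical name and this is not how
lei itself is written:

```python
def add_length_headers(header: bytes, body: bytes) -> bytes:
    """Return header + Content-Length/Lines + blank line + body.

    `header` is the raw header block without the terminating blank
    line and is expected to end in a newline.
    """
    # Count newline-terminated lines, plus one for an unterminated
    # final line (assumption about how Lines: should be computed).
    nlines = body.count(b"\n")
    if body and not body.endswith(b"\n"):
        nlines += 1
    extra = b"Content-Length: %d\nLines: %d\n" % (len(body), nlines)
    return header + extra + b"\n" + body
```

With both values present, a reader that trusts them can skip over a
message body without scanning it for the next From_ line.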
|
|
With 4 dedicated workers, this seems to provide a 100-120%
speedup on a 4 core machine when writing thousands of search
results to a Maildir or mbox. This also sets us up for
high-latency IMAP destinations in the future.
This opens the door to more speedup opportunities such
as optimizing dedupe locking and other ways to reduce
contention.
This change is fairly complex and convoluted, unfortunately.
Further work may allow us to simplify it and even improve
performance.
|
|
It can be convenient to invoke an MUA as search results
are being written to it, as an eager person may want to
start seeing results ASAP. This lets Maildir users
see results in the MUA as we are writing them. Users
of IMAP will eventually be able to take advantage of
them, too.
Since we don't support mbox locking (yet?), we'll only invoke
the MUA after results are done for mbox formats.
|
|
All the augment and deduplication stuff seems to be working
based on unit tests. OpPipe is a nice general addition that
will probably make future state machines easier.
|
|
We'll be doing most of the work in forked off worker processes,
so ensure some of it is fork and serialization-friendly.
|
|
Parallelism and interactivity with pager + SIGPIPE need work,
but results are shown and phrase search works without shell
users having to apply Xapian quoting rules on top of standard
shell quoting.
|
|
Using "make update-copyrights" after setting GNULIB_PATH in my
config.mak
|
|
Opening a FIFO with O_RDWR always succeeds on Linux, which
causes the cat(1) process invoked by t/lei_to_mail.t to get
stuck. Furthermore, O_APPEND makes no sense on FIFOs and
perhaps there's some kernel out there which will reject it.
|
|
This matches mairix(1) behavior and may be safer if there's
concurrent readers on the existing mbox, especially since
we don't currently implement mbox locking (nor does mairix).
|
|
Maildir should be plenty fine for short-lived output folders.
|
|
Users may wish to pipe output to "git am", "spamc",
or similar, so we need to support those cases and
not bail out on lseek(2) or ftruncate(2) failures.
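
A minimal Python sketch of tolerating those failures follows. It
assumes an unseekable destination reports ESPIPE (or EINVAL for
some non-regular files); truncate_if_possible is a hypothetical
name and this is not the actual lei code:

```python
import errno
import os

def truncate_if_possible(fd):
    """Rewind and truncate `fd`; return False if it cannot be done.

    Pipes and FIFOs cannot seek, so the failure is treated as
    "nothing to clear" instead of a fatal error. Any other error
    is still raised to the caller.
    """
    try:
        os.lseek(fd, 0, os.SEEK_SET)
        os.ftruncate(fd, 0)
    except OSError as e:
        if e.errno in (errno.ESPIPE, errno.EINVAL):
            return False
        raise
    return True
```

The caller can then write messages the same way for both cases; only
the "clear old results first" step is skipped for pipes.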
|
|
LeiDedupe requires SQLite, so we may want to be able to test
writing mail without DBI or SQLite down the line.
|
|
For writing mboxes and Maildirs, users may wish to use
stricter or looser deduplication strategies. This
gives them more control.
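
To show what "stricter or looser" could mean in practice, here is a
Python sketch with three illustrative strategies. The strategy
names ("none", "mid", "content") and both function names are
assumptions made for this example and are not taken from the
commit:

```python
import hashlib
from email import message_from_bytes

def dedupe_key(raw: bytes, strategy: str):
    """Return a hashable key for `raw`, or None to never dedupe."""
    if strategy == "none":
        return None
    content = ("content", hashlib.sha256(raw).hexdigest())
    if strategy == "content":
        # Strict: only byte-identical messages are duplicates.
        return content
    if strategy == "mid":
        # Loose: any two messages sharing a Message-ID are duplicates.
        mid = message_from_bytes(raw).get("Message-ID")
        # Fall back to the content hash when there is no Message-ID.
        return ("mid", str(mid).strip()) if mid else content
    raise ValueError("unknown strategy: %r" % strategy)

def dedupe(messages, strategy):
    """Yield each message whose key has not been seen before."""
    seen = set()
    for raw in messages:
        key = dedupe_key(raw, strategy)
        if key is not None:
            if key in seen:
                continue
            seen.add(key)
        yield raw
```

A looser key drops more messages (e.g. the same mail received via
two lists with different trailers), while a stricter key keeps both.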
|
|
--augment will match the mairix(1) option of the same
name to augment existing search results. We'll need
to implement deduplication for a better user experience.
mutt ships with compressed mbox support for bz2 and xz,
at least, so we'll support those out-of-the-box.
|
|
We'll allow using multiple workers to write to a single
mbox (which could be compressed). This can be done
safely with O_APPEND + syswrite for uncompressed files,
and using a lock when piping to pigz/gzip/bzip2/xz.
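
The uncompressed case can be sketched in Python as below. It relies
on O_APPEND positioning every write(2) at the current end of file,
and on each message going out in a single write call so that
messages from different workers do not interleave. append_message
is a hypothetical name and this is not the actual lei code:

```python
import os

def append_message(path, msg: bytes):
    """Append one complete mbox message to `path` in one write."""
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o600)
    try:
        # One unbuffered write per message; a short write would leave
        # a torn message behind, so treat it as an error.
        n = os.write(fd, msg)
        if n != len(msg):
            raise OSError("short write: %d of %d bytes" % (n, len(msg)))
    finally:
        os.close(fd)
```

A compressor reads from a pipe, where O_APPEND has no meaning and
large writes may be split, which is why that path needs a lock
around each message instead.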
|
|
No Maildir support yet, but it'll come.
|