set_eml will clobber any existing keywords. Since remote
mboxrds cannot (and should not) be sending keywords to us,
we shouldn't let remote external requests clobber already-set
keywords if they exist.
|
|
We must issue LeiStore->done if a client disconnects
while we're streaming from a remote external. This
can happen via SIGPIPE, or if a client process is
interrupted by any other means.
|
|
I was just wondering this myself :x
|
|
While this diverges from mairix(1) behavior, it's the safer
option. We'll follow Debian policy by supporting fcntl and
dotlocks by default (in that order). Users who do not want
locking can use "--lock=none"
This will be used in a read-only capacity for watching
mailboxes for keyword updates via inotify or EVFILT_VNODE.
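lei itself is written in Perl; as a language-neutral sketch of the fcntl-then-dotlock scheme described above (the `lock_mbox` name and the `--lock=` value mapping are hypothetical, and unlock/cleanup is omitted for brevity):

```python
import fcntl
import os

def lock_mbox(path, lock="fcntl"):
    """Open an mbox and take a lock; lock is "fcntl", "dotlock" or "none",
    mirroring a hypothetical --lock= switch."""
    fh = open(path, "a+")
    if lock == "fcntl":
        # advisory POSIX byte-range lock over the whole file
        fcntl.lockf(fh, fcntl.LOCK_EX)
    elif lock == "dotlock":
        # O_EXCL creation of "$path.lock" is the traditional dotlock;
        # it fails if another process already holds the lock file
        fd = os.open(path + ".lock", os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        os.close(fd)
    # lock == "none": no locking at all
    return fh
```

A real implementation would try fcntl first and fall back to dotlocks (Debian policy order), and release the dotlock on exit.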
|
|
This can be used to quickly distinguish messages which were
direct hits when doing thread expansion vs messages that
were merely part of the same thread.
This is NOT mairix-derived behavior, but I occasionally found
it useful when looking at results in an MUA to know whether
a message was a direct hit or not.
This makes "-t" consistent with non-"-t" cases as far as keyword
reading goes.
|
|
This lets users avoid network traffic on subsequent searches at
the expense of local disk space. --no-import-remote may be
specified to reverse this trade-off for users with little
storage.
|
|
We already localize %ENV before calling dispatch(), so
it's needless overhead in spawn() to be checking env for
undef values in those cases.
|
|
We can rework the first lei2mail worker to authenticate, and
then share auth info with the rest of the lei2mail workers. As
with "lei import", this uses PktOp and lei-daemon to share
updated credentials between the first and subsequent l2m workers.
|
|
This lets us make use of multiple cores on IMAP and Maildir
backed by SSD (or better) storage. This benefits IMAP stores
with high network latency, but may still penalize IMAP servers
with rotational storage.
|
|
We can use this to ensure sharded work doesn't do unexpected
things if workers are added/removed. We currently don't
increase/decrease workers once a workqueue is started, but
non-lei code (-httpd/imapd) may start doing so.
This also fixes a bug where lei2mail workers could not
be adjusted via --jobs on the command-line.
|
|
This is a step which will allow us to parallelize augment
on Maildir and IMAP.
|
|
While using utime on the destination Maildir is enough for mutt
to eventually notice new mail, "eventually" isn't good enough.
Send a SIGWINCH to wake mutt (and likely other MUAs)
immediately. This is more portable than relying on MUAs to
support inotify or EVFILT_VNODE.
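The SIGWINCH trick above can be sketched in a few lines of Python (the `wake_mua` helper name is hypothetical; lei itself does this in Perl):

```python
import os
import signal

def wake_mua(mua_pid):
    # SIGWINCH is normally harmless: curses MUAs like mutt repaint
    # (and re-stat the mailbox) on it, making it a portable
    # "check mail now" poke that needs no inotify/EVFILT_VNODE support
    os.kill(mua_pid, signal.SIGWINCH)
```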
|
|
This will probably cover full Atom/HTML feed generation or any
outputs which are order-dependent, but those aren't prioritized
at the moment.
|
|
For early MUA spawners using lock-free outputs, we need to
wait on the startq pipe to silence progress reporting. For
--augment users, we can start the MUA even earlier by
creating Maildirs in the pre-augment phase.
To improve progress reporting for non-MUA (or late-MUA)
spawners, we'll no longer blindly append "--compressed" to the
curl(1) command when POST-ing for the gzipped mboxrd.
Furthermore, we'll overload stringify ('""') in LeiCurl to
ensure the empty -d '' string shows up properly.
v2: fix startq waiting with --threads
mset_progress is never shown with early MUA spawning.
The plan is to still show progress when augmenting and
deduping. This fixes all local search cases.
A leftover debug bit is dropped, too.
|
|
Nobody is expected to use long options, but for consistency
with mairix(1), we'll use the pluralized option throughout
(including existing PublicInbox::{Search,SearchView}).
Link: https://public-inbox.org/meta/20210206090119.GA14519@dcvr/
|
|
We're able to propagate $? from wq_workers in a consistent
manner, now.
|
|
We will have a ->wq_do that doesn't pass FDs for I/O.
|
|
This also updates lei_xsearch to follow the same pattern for
stopping curl(1) and tail(1) processes it spawns.
|
|
This can be useful for users who want to clone and
mirror an existing public-inbox. This doesn't have
update support, yet, so users will need to run
"git fetch && public-inbox-index" for now.
|
|
Only tested with .eml files so far, but Maildir + IMAP
will be supported.
|
|
Reaping is handled by the parent PublicInbox::IPC, and we
have no business using PublicInbox::Import since LeiXSearch
won't write to git directly (it will write via LeiStore).
|
|
Another step towards simplifying lei internals.
None of our current uses of ->wq_do involve FD passing, and the
plan is to only rely on FD passing between lei-daemon and lei(1).
It ought to be possible to order lei-daemon's internal bits
properly so they do not need FD passing.
|
|
Once all files are written, we can use utime() to poke Maildirs
to wake up MUAs that fail to account for nanosecond timestamp
resolution.
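A minimal Python sketch of this utime() poke (the `poke_maildir` name is hypothetical):

```python
import os

def poke_maildir(maildir):
    # bump mtime to "now" so MUAs that only compare whole-second
    # timestamps still notice the change, even if they ignore
    # nanosecond resolution
    for sub in ("cur", "new"):
        d = os.path.join(maildir, sub)
        if os.path.isdir(d):
            os.utime(d, None)  # None => current time
```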
|
|
No need to be starting a pager if we're writing to a regular file.
|
|
While FD passing is critical for script/lei <=> lei-daemon,
lei-daemon doesn't need to use it internally if FDs are
created in the proper order before forking.
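The create-before-fork pattern is the classic alternative to SCM_RIGHTS FD passing; a minimal Python illustration (not lei code, which is Perl):

```python
import os

def fork_with_inherited_pipe():
    # create the pipe *before* forking: both processes then share it
    # by inheritance, so no FD passing over a socket is needed
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:  # child: writer
        os.close(r)
        os.write(w, b"ready")
        os._exit(0)
    os.close(w)  # parent: reader
    data = os.read(r, 16)
    os.close(r)
    os.waitpid(pid, 0)
    return data
```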
|
|
Now that --stdin support is sorted, we can delay spawning
workers until we know the query is ready-to-run.
|
|
This will be useful on shared machines when a user doesn't want
search queries visible to other users looking at the ps(1)
output or similar.
|
|
IO::Uncompress::Gunzip seems to be losing $? when closing
PublicInbox::ProcessPipe. To work around this, do a synchronous
waitpid ourselves to force proper $? reporting, and update tests
to use the new --only feature for testing invalid URLs.
This improves internal code consistency by having {pkt_op}
parse the same ASCII-only protocol script/lei understands.
We no longer pass {sock} to worker processes at all,
further reducing FD pressure on per-user limits.
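The synchronous-waitpid workaround is a general pattern: reap the child yourself instead of trusting a wrapper layer to preserve its status. A Python analogue of the Perl fix (the `run_and_reap` name is hypothetical):

```python
import os
import subprocess

def run_and_reap(argv):
    # spawn the child, then reap it ourselves with a blocking
    # waitpid() so the exit status can't be lost by an intermediate
    # wrapper (as IO::Uncompress::Gunzip lost $? in lei's case)
    child = subprocess.Popen(argv)
    _, status = os.waitpid(child.pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)  # child was killed by a signal
```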
|
|
No reason to check for $lei->{oneshot} here.
|
|
This comma-delimited parameter allows controlling the number of
lei_xsearch and lei2mail worker processes. With the change
to make IPC wq_* work use the event loop, it's now safe to
run fewer worker processes for searching with no risk of
deadlocks.
MAX_PER_HOST isn't configurable yet for remote hosts,
and maybe it shouldn't be due to potential for abuse.
|
|
We won't be reporting progress when output is going to stdout
since it can clutter up the terminal unless stderr != stdout,
which probably isn't worth checking.
We'll also use a more agnostic mset_progress which may
make it easier to support worker-less invocations.
|
|
We can safely rely on exit(0) here when interacting with curl(1)
and git(1), unlike query workers which hit Xapian directly,
where some badness happens when hit with a signal while
retrieving an mset.
|
|
Avoid on-stack shortcuts which may prevent destructors from
firing since we're not inside the event loop. We'll also tidy
up the unlink mechanism in LeiOverview while we're at it.
|
|
The daemon must not be fooled into thinking it's in oneshot
after a lei client disconnects and erases {sock}.
|
|
We may have further URLs to read in that process, so ensure
we don't end up having tail send stale data.
|
|
Sometimes it can be confusing for "lei q" to finish writing to a
Maildir|mbox and not know if it did anything. So show some
per-external progress and stats.
These can be disabled via the new --quiet/-q switch.
We differ slightly from mairix(1) here, as we use stderr
instead of stdout for reporting totals (and we support
parallel queries from various sources).
|
|
This will allow us to use larger messages and do progress
reporting to accumulate in the main daemon.
|
|
At most, we'll only warn once per worker when a Maildir
disappears from under us. We'll also use the '!' OpPipe
to note the exceptional condition, and use '|' to SIGPIPE
so it'll be a bit easier for hackers to remember.
|
|
We use $smsg->populate here, so ensure it's loaded although
PublicInbox::Search currently loads it.
|
|
This prevents SharedKV->DESTROY in lei-daemon from triggering
before DB handles are closed in lei2mail processes. The
{each_smsg_not_done} pipe was not sufficient in this case:
that gets closed at the end of the last git_to_mail callback
invocation.
|
|
It doesn't save us any code, and the action-at-a-distance
element was making it confusing to track down actual problems.
Another potential problem was keeping references alive too long.
So do like we would a C100K server and check every write
while still ensuring lei(1) exits with a proper SIGPIPE
iff needed.
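"Check every write" in the C10K/C100K style means inspecting each write's result rather than relying on destructors at a distance; a Python sketch of the idea (helper name hypothetical; CPython ignores SIGPIPE by default, so EPIPE surfaces as BrokenPipeError):

```python
import os

def checked_write(fd, buf):
    # write in a loop, checking every return value; a vanished reader
    # shows up as BrokenPipeError (EPIPE) instead of silent data loss
    view = memoryview(buf)
    while view:
        try:
            n = os.write(fd, view)
        except BrokenPipeError:
            return False  # reader went away; caller decides how to exit
        view = view[n:]
    return True
```

The caller can then re-raise SIGPIPE on itself if it wants to die with the conventional signal status.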
|
|
Keeping track of non-standard FDs gets tricky, so make it easier
by relying on st_dev/st_ino mapping in the transmitted objects.
We'll keep using numbers for the standard FDs since we need to
be able to easily redirect them in the producer (main daemon)
process for (gzip|bzip2|xz) if writing to a compressed mbox.
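The st_dev/st_ino trick works because an inode identity survives FD renumbering across processes; a small Python illustration (the `fd_key` name is hypothetical):

```python
import os

def fd_key(fd):
    # (st_dev, st_ino) uniquely identifies the underlying open file,
    # regardless of which FD number it lands on in the receiving
    # process after SCM_RIGHTS transfer
    st = os.fstat(fd)
    return (st.st_dev, st.st_ino)
```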
|
|
Copy+paste error :x
|
|
torsocks is just one of many ways to get curl to use Tor,
so we'll continue if we can't find torsocks in our PATH
and assume the user has a proxy configured via curlrc,
the command-line, environment variable, or even firewall
rules.
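The "use torsocks only if present" logic can be sketched as a PATH check before building the curl argv (the `maybe_torsocks` helper is hypothetical; lei's implementation is in Perl):

```python
import shutil

def maybe_torsocks(curl_argv):
    # prepend torsocks(1) only when it's in PATH; otherwise assume
    # the user routes curl through Tor some other way: curlrc,
    # --proxy on the command line, environment variables, or
    # firewall rules
    if shutil.which("torsocks"):
        return ["torsocks"] + curl_argv
    return curl_argv
```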
|
|
This ought to provide a better user experience for
users if they attempt to use remote externals but
don't have curl installed.
We can avoid repeating PATH search in every worker here, too.
|
|
curl(1) writes to stderr one byte-at-a-time (presumably for the
progress bar). This ends up being unreadable on my terminal
when parallel processes are trying to write error messages.
So instead, we'll capture the output to a file and run
'tail -f' on it if --verbose is enabled.
Since HTTP 404s from non-existent results are a common response,
we'll ignore them and stay silent, matching behavior of local
searches.
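The capture-then-tail approach can be sketched generically: point each child's stderr at a regular file, and only stream that file back to the terminal when verbose (helper name hypothetical; lei spawns curl(1) and tail(1) from Perl):

```python
import subprocess

def run_with_stderr_log(argv, log_path, verbose=False):
    # children write stderr to a regular file instead of the
    # terminal, so parallel byte-at-a-time writes can't interleave
    # mid-line on the user's screen
    with open(log_path, "wb") as log:
        tail = None
        if verbose:
            # stream the log to the user while the child runs
            tail = subprocess.Popen(["tail", "-f", log_path])
        try:
            return subprocess.run(argv, stderr=log).returncode
        finally:
            if tail:
                tail.terminate()
                tail.wait()
```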
|
|
Having parse_references in OverIdx was awkward and Smsg is
a better place for it.
|
|
We can't (and don't need to) repeatedly get the $each_smsg
callback for each URI since that clobbers {ovv_buf} before
it can be output.
I initially thought this was a dedupe-related bug and
moved the dedupe code into the $each_smsg callback to
minimize differences. Nevertheless it's a nice code
reduction.
I also thought it was related to incomplete smsg info,
so {references} is now filled in correctly for dedupe.
|
|
It appears Content-Length and/or Content-Type headers are
required by nginx with POST requests.
Varnish alone doesn't have this requirement, and my (perhaps
lossy) reading of RFC 2616, 7230, 7231 didn't note it, either.
In any case, we must support nginx even if it's overly strict.
Reported-By: Kyle Meyer <kyle@kyleam.com>
Link: https://public-inbox.org/meta/87v9bmswkh.fsf@kyleam.com/
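lei's fix is to pass an explicit empty body to curl (`-d ''`) so Content-Length is sent; the same requirement, sketched with Python's urllib (the `build_lei_post` name is hypothetical):

```python
import urllib.request

def build_lei_post(url, body=b""):
    # nginx rejects POSTs lacking Content-Length and/or Content-Type,
    # so send both explicitly, even for an empty body ("0");
    # urllib would derive Content-Length from len(data) anyway, but
    # being explicit documents the nginx requirement
    req = urllib.request.Request(url, data=body, method="POST")
    req.add_header("Content-Type", "application/x-www-form-urlencoded")
    req.add_header("Content-Length", str(len(body)))
    return req  # pass to urllib.request.urlopen() to perform the POST
```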
|
|
This can be useful for testing remote behavior, or for
augmenting local results. It'll also be possible to explicitly
include/exclude externals via CLI switches (once names are
decided).
|