Date | Commit message |
|
Having parse_references in OverIdx was awkward and Smsg is
a better place for it.
|
|
We can't (and don't need to) repeatedly get the $each_smsg
callback for each URI since that clobbers {ovv_buf} before
it can be output.
I initially thought this was a dedupe-related bug and
moved the dedupe code into the $each_smsg callback to
minimize differences. Nevertheless it's a nice code
reduction.
I also thought it was related to incomplete smsg info,
so {references} is now filled in correctly for dedupe.
|
|
It appears Content-Length and/or Content-Type headers are
required by nginx with POST requests.
varnish alone doesn't have this requirement and my (perhaps
lossy) reading of RFC 2616, 7230, 7231 didn't note this, either.
In any case, we must support nginx even if it's overly strict.
Reported-By: Kyle Meyer <kyle@kyleam.com>
Link: https://public-inbox.org/meta/87v9bmswkh.fsf@kyleam.com/
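A minimal sketch of the idea, not the code from this commit: a POST with no
body can still carry explicit Content-Length and Content-Type headers so that
a strict front end accepts it. Python's urllib is used purely for
illustration; the helper name, the URL, and the chosen Content-Type value are
assumptions.

```python
import urllib.request

def build_empty_post(url):
    # Build (but do not send) a POST request with an empty body.
    # Both headers are set explicitly; which of the two a given server
    # actually insists on is not established here.
    return urllib.request.Request(
        url,
        data=b"",
        method="POST",
        headers={
            "Content-Length": "0",
            "Content-Type": "application/x-www-form-urlencoded",  # assumed value
        },
    )

req = build_empty_post("http://example.com/inbox/?q=test&x=m")  # hypothetical URL
print(req.get_method(), req.header_items())
```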
|
|
This can be useful for testing remote behavior, or for
augmenting local results. It'll also be possible to explicitly
include/exclude externals via CLI switches (once names are
decided).
|
|
--remote should be explicitly enabled if local externals are
present, since users may be offline or on expensive + metered
Internet while traveling.
In the future, --remote will probably default to
caching/memoizing all messages it fetches to increase the
usefulness of --local.
|
|
Just open every FD as read/write. Perl (or any non-broken
runtime) won't care and won't attempt to use F_SETFL to alter
file description flags, since attempting to change those would
lead to unpleasant side effects if the file description is
shared with another process.
|
|
This should not be needed, but somebody using lei could
theoretically create thousands of external URLs and
only have a handful of workers, which means the per-worker
URI list could be large.
|
|
Unfortunately, this isn't a per-host limit yet, but it
nevertheless reduces load on existing PublicInbox::WWW
instances, since requesting an mboxrd is one of the more
expensive operations.
|
|
This prevents name conflicts leading to retries and slowdowns in
temporary file name generation. No actual data corruption
resulted because all temporary files are opened with O_EXCL
anyways.
This may increase security for IMAP, NNTP, and HTTPS sessions
using TLS, but it's all public data anyways.
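Why a name conflict costs only a retry and never data can be sketched as
follows. This is an illustration in Python, not the project's code; the
function name, prefix, and retry count are assumptions, and `secrets` stands
in for whatever name generator is really used.

```python
import os
import secrets
import tempfile

def create_exclusive(dirname, prefix="tmp-", attempts=100):
    # O_EXCL makes open(2) fail with EEXIST when the name already exists,
    # so a collision leads to another attempt instead of clobbering a file.
    for _ in range(attempts):
        path = os.path.join(dirname, prefix + secrets.token_hex(8))
        try:
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_RDWR, 0o600)
        except FileExistsError:
            continue  # name already taken: pick another one
        return fd, path
    raise FileExistsError("no unused temporary name found")

with tempfile.TemporaryDirectory() as d:
    fd1, p1 = create_exclusive(d)
    fd2, p2 = create_exclusive(d)
    os.close(fd1)
    os.close(fd2)
    print(p1 != p2)
```

The more often two processes generate the same candidate names, the more
iterations of the loop are wasted, which is the slowdown described above.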
|
|
At least not yet, though we may support mirroring via git.
|
|
Pathname/URL canonicalization may not change the result at
all, so there's no point in trying (and failing) the same
form twice if pre and post-canonicalization are identical.
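The check can be sketched like this. The canonicalization rules below
(lowercasing scheme and host, forcing one trailing slash on URLs, normalizing
paths) are hypothetical stand-ins, as are both function names; only the
"skip the second attempt when both forms are equal" logic is the point.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit

def canonicalize(loc):
    # Hypothetical canonicalization, for illustration only.
    if "://" in loc:
        parts = urlsplit(loc)
        path = parts.path.rstrip("/") + "/"
        return urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                           path, parts.query, ""))
    return posixpath.normpath(loc)

def forms_to_try(loc):
    # Return the distinct forms worth looking up, canonical form first.
    canon = canonicalize(loc)
    return [canon] if canon == loc else [canon, loc]

for loc in ("https://example.com/inbox/", "https://Example.com/inbox", "/a//b/"):
    print(loc, "->", forms_to_try(loc))
```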
|
|
Some of these options will make sense when on weird networks
(behind firewalls, etc.). Some of these options may not make
sense at all.
This accommodates users who prefer the SOCKS5 proxy support in
curl over torsocks(1), but we'll still support torsocks
by default since some Tor instances aren't on the default
127.0.0.1:9050.
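The two ways of reaching Tor can be sketched as a command-line builder.
This is an illustration, not lei's option handling: the function and its
parameters are hypothetical, and only the `curl` flags (`-s`, `-S`, `-f`,
`--proxy`) and the `torsocks COMMAND` wrapper form are real.

```python
def curl_argv(url, proxy=None, torsocks=False):
    # Build an argv list for fetching `url`, optionally via a proxy
    # (e.g. "socks5h://127.0.0.1:9050") or wrapped in torsocks(1).
    argv = ["curl", "-sSf"]
    if proxy:
        argv += ["--proxy", proxy]
    argv.append(url)
    if torsocks:
        argv = ["torsocks"] + argv
    return argv

print(curl_argv("http://example.onion/inbox/", proxy="socks5h://127.0.0.1:9050"))
print(curl_argv("http://example.onion/inbox/", torsocks=True))
```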
|
|
No need to show the full key name since the user mainly
uses the location.
|
|
This improves the experience for developers running local
instances of PublicInbox::WWW without permissions to bind
port 80 or 443.
|
|
At least mail, mailx, mutt, and neomutt follow this convention.
Heirloom mailx doesn't support Maildir (our default), but GNU
mailutils mail/mailx does.
|
|
We still need stdout if launching an MUA.
|
|
This may fix another interrupt-related segfault I'm occasionally
seeing (but so far unable to reproduce).
|
|
Via curl(1), since that lets us easily use tor on a
per-connection basis via LD_PRELOAD (torsocks) or proxy.
We'll eventually support more curl options which can allow
users to get past firewalls and deal with other odd network
configurations.
|
|
This seems like a better place to put it given upcoming
URI support, which starts in this commit.
|
|
The tricky bit was getting around the word splitting bash
does on URLs. This may work with other shells, too.
|
|
We'll do a better job of canonicalizing URLs and path names
so they match properly. Of course, users may edit the file
outside of lei, so ensure we try both the canonicalized form
and the as-is form provided by the user.
I also don't think we'll need to store externals info in
MiscIdx; just the config file is fine.
|
|
..At least limit it to a single file handle. The write end
EOFpipe can be limited in scope and auto-closed when $quit is
clobbered, leaving only the listener. The listener is the only
handle that needs to be closed explicitly due to it being on the
stack in the Listener->event_step => accept_dispatch => lei_$FOO
code path.
Everything else gets clobbered by DS->Reset in children after
forking.
|
|
Having an extra reference to LeiXSearch from the OpPipe $done_op
map is unnecessary and makes the reference graph more complex
than it needs to be. Just use $lei->{lxs} to simplify and
reduce the likelihood of bugs.
|
|
The signal handlers on the client side were unnecessary;
all we need is to handle socket EOF properly in the daemon
by killing xsearch and l2m workers.
|
|
STDERR may actually get closed in ->ipc_atfork_child in
oneshot mode, so ensure we pass in a valid file handle
to avoid warnings in ->wq_do.
|
|
Worker exit causes DESTROY ordering to become unpredictable and
leads to Perl segfaulting. Instead, rely on OnDestroy and
explicit triggering after wq_worker_loop to ensure we finish
all outstanding git requests before worker exit.
|
|
$wq->{-ipc_atfork_child_close} needed to be initialized properly.
And start setting $0 in workers to improve visibility.
|
|
From_ lines are shown when mbox* variants are output to stdout,
making {oid} and {pct} information visible without the risk of
it being propagated to other importer processes, as it could be
if it were in lei-specific X-* headers.
Maildirs already had OIDs in the filename; now they gain Xapian
{pct} in case anybody cares.
|
|
|
|
This isn't tested for now, so maybe it works.
|
|
The old name was too long compared to the rest of the field
names. With the Xapian method being named ->get_percent,
"pct" is a well known abbreviation for "percent" and already
used internally by our wrapper.
..And cleanup some excess whitespace while we're in the area.
|
|
It doesn't seem to matter at the moment, but it should
save us from some surprises down the line.
|
|
This caused a performance regression which made parallel
lei2mail processes fail prematurely and fall back to
writing blobs in the lei_xsearch worker.
|
|
Inspired by "dmesg -c", this should help users report bugs
and avoids eating up $XDG_RUNTIME_DIR.
Once lei is ready for release, the need for this should
hopefully be few and far between, but shit happens.
|
|
Since we no longer leak an FD for over.sqlite3, we can
initialize and actually enable it by default as originally
intended.
|
|
Leaving $dbh in another field was causing over.sqlite3 to
remain open after ->dbh_close. Fix up some minor style
issues while we're at it.
|
|
waitpid() in DESTROY ends up setting $? for the exit status,
thus we must reap IPC children before calling CORE::exit.
This fixes t/lei-oneshot.t with TEST_RUN_MODE=0
|
|
We may attempt to write an mbox to any terminal, block, or
character device, not just regular files and FIFOs/pipes.
The only thing that is known to not work is a directory.
Sockets may be possible with some OSes (e.g. Plan 9) or
filesystems. This fixes t/lei.t on FreeBSD 11.x
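A minimal sketch of such a check, assuming a hypothetical helper name:
rather than whitelisting regular files and FIFOs, reject only what is known
not to work.

```python
import os
import stat

def mbox_destination_ok(path):
    # Accept anything that is not a directory: regular files, FIFOs,
    # terminals, block and character devices, and (possibly) sockets.
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return True  # nothing there yet; it can be created as a regular file
    return not stat.S_ISDIR(st.st_mode)

print(mbox_destination_ok(os.devnull), mbox_destination_ok(os.curdir))
```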
|
|
We'll need it for IMAP support, at least. Proper mbox family
detection will be expensive, so deal with it later.
|
|
Because user errors happen...
|
|
We split out t/lei-oneshot.t and t/lei.t so it's easier
to isolate run-mode specific bugs and behavior and there's
no reason to rerun the socket daemon tests.
|
|
We'll invalidate the {1} (stdout) field on SIGPIPE,
so don't trigger a Perl warning by writing to it.
|
|
We need to delay writing out the mailbox until the compressor
process is up and running, so have startq wait a bit. This
means we must create the pipe early and hand it off to the
workers before augmenting, despite spawning the
gzip/pigz/xz/bzip2 process after augment is complete.
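The ordering can be sketched in a single process as follows. This is an
illustration only: the real writers are separate worker processes that
inherit the pipe, which is not reproduced here, and the function name is
hypothetical. A child Python stands in for gzip/pigz/xz/bzip2 so the sketch
does not depend on any of them being installed, and it is only suitable for
small inputs.

```python
import gzip
import os
import subprocess
import sys

# Stand-in compressor (an assumption): gzip stdin to stdout.
COMPRESSOR = [
    sys.executable, "-c",
    "import gzip, sys; "
    "sys.stdout.buffer.write(gzip.compress(sys.stdin.buffer.read()))",
]

def compress_via_early_pipe(chunks):
    r, w = os.pipe()      # created early; `w` is what writers would be handed
    # ... preparatory work such as augmenting would happen here ...
    proc = subprocess.Popen(COMPRESSOR, stdin=r, stdout=subprocess.PIPE)
    os.close(r)           # only the child needs the read end from now on
    for chunk in chunks:  # writing starts only once the compressor is up
        os.write(w, chunk)
    os.close(w)           # EOF lets the compressor finish
    out, _ = proc.communicate()
    return out

data = compress_via_early_pipe([b"From a@example.com\n", b"body\n"])
print(gzip.decompress(data))
```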
|
|
Most everything should be captured by the __WARN__ handlers and
routed to syslog, but it appears Perl may write to stderr in
some emergency cases, as can libc or other libraries. Just
point it to a small file that's cleared on reboot.
|
|
I'm not sure why, but mutt sometimes won't detect small
quickly. We'll display a progress bar meter when writing
results, instead.
|
|
We need to properly propagate SIGPIPE to the top-level
lei-daemon process and avoid relying on auto-close,
since auto-close triggers Perl warnings while an explicit
close() does not.
|
|
We don't need the result of query_prepare (for augmenting or
mass unlinking) until we're ready to deduplicate and write
results to the filesystem. This ought to let us hide some of
the cost of Xapian searches on multi-device/core systems for
extremely expensive searches.
|
|
Instead of optimizing our own performance, this optimizes
our data to reduce work done by the MUA consumer.
Maildir and mbox destinations no longer support any notion of
the IMAP \Recent flag. JMAP has no functioning \Recent
equivalent, and neither do we.
In practice, having MUAs (e.g. mutt) clear the \Recent flag when
committing changes to the mbox is expensive: it creates a
rename(2) storm with Maildir and overwrites the entire mbox.
For mboxcl2 (and mboxcl), we'll further optimize mutt behavior
by setting the Lines: header in addition to Content-Length.
With these changes, mutt exits instantaneously on mboxcl2,
mboxcl, and Maildirs generated by "lei q".
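How the two headers relate to the body can be sketched as below. This is an
assumption-laden illustration, not the code from this commit: the function
name is hypothetical, the body is taken as the bytes that will actually be
written, and Lines: is counted as the number of body lines.

```python
def add_length_headers(header, body):
    # header: bytes of header lines, each newline-terminated, without the
    #         blank separator line.
    # body:   bytes of the message body as it will be written.
    lines = body.count(b"\n")
    if body and not body.endswith(b"\n"):
        lines += 1  # count a final line that lacks a trailing newline
    extra = b"Content-Length: %d\nLines: %d\n" % (len(body), lines)
    return header + extra + b"\n" + body

msg = add_length_headers(b"Subject: hi\n", b"one\ntwo\n")
print(msg.decode())
```

With both values present, a reader can skip over the body by byte count and
report its line count without scanning it.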
|
|
With 4 dedicated workers, this seems to provide a 100-120%
speedup on a 4 core machine when writing thousands of search
results to a Maildir or mbox. This also sets us up for
high-latency IMAP destinations in the future.
This opens the door to more speedup opportunities such
as optimizing dedupe locking and other ways to reduce
contention.
This change is fairly complex and convoluted, unfortunately.
Further work may allow us to simplify it and even improve
performance.
|
|
It can be convenient to invoke an MUA as search results
are being written to it, as an eager person may want to
start seeing results ASAP. This lets Maildir users
see results in the MUA as we are writing them. Users
of IMAP will eventually be able to take advantage of
this, too.
Since we don't support mbox locking (yet?), we'll only invoke
the MUA after results are done for mbox formats.
|