Date       Commit message
2021-01-24 smsg: make parse_references an object method
Having parse_references in OverIdx was awkward and Smsg is a better place for it.
2021-01-24 lei q: fix JSON overview with remote externals
We can't (and don't need to) repeatedly get the $each_smsg callback for each URI, since that clobbers {ovv_buf} before it can be output. I initially thought this was a dedupe-related bug and moved the dedupe code into the $each_smsg callback to minimize differences. Nevertheless, it's a nice code reduction. I also thought it was related to incomplete smsg info, so {references} is now filled in correctly for dedupe.
2021-01-24 lei_xsearch: use curl -d '' for nginx compatibility
It appears nginx requires Content-Length and/or Content-Type headers with POST requests. Varnish alone doesn't have this requirement, and my (perhaps lossy) reading of RFCs 2616, 7230, and 7231 didn't note it, either. In any case, we must support nginx even if it's overly strict.
Reported-By: Kyle Meyer <kyle@kyleam.com>
Link: https://public-inbox.org/meta/87v9bmswkh.fsf@kyleam.com/
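A minimal sketch of the idea, written in Python purely for illustration (lei itself is Perl, and its real curl argv is not shown in this log): `-d ''` makes curl(1) send a POST with an empty body, and curl then adds `Content-Length: 0` and a `Content-Type` header by itself. The helper name `remote_query_argv` and the example URL are hypothetical.

```python
import shlex

def remote_query_argv(url, extra_opts=()):
    """Build a curl(1) argv for a POST with an empty body (hypothetical helper)."""
    # -d '' turns the request into a POST; curl then sends Content-Length: 0
    # and Content-Type: application/x-www-form-urlencoded, which nginx accepts.
    # -sSf: quiet, but still report errors and fail on HTTP errors.
    return ["curl", "-sSf", *extra_opts, "-d", "", url]

argv = remote_query_argv("https://example.org/inbox/?q=test")
print(shlex.join(argv))  # shows how the empty -d argument must be quoted in a shell
```

Passing the argv as a list (rather than a shell string) keeps the empty argument intact without any quoting concerns.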
2021-01-24 lei q: honor --no-local to force remote searches
This can be useful for testing remote behavior, or for augmenting local results. It'll also be possible to explicitly include/exclude externals via CLI switches (once names are decided).
2021-01-24 lei q: disable remote externals if locals exist
--remote must be enabled explicitly if local externals are present, since users may be offline or on expensive, metered Internet connections while traveling. In the future, --remote will probably default to caching/memoizing all messages it fetches to increase the usefulness of --local.
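The defaulting rule described in the two entries above can be sketched as follows. This is a Python illustration under stated assumptions, not lei's Perl code: the helper `choose_externals` is hypothetical, and treating any `http://`/`https://` location as remote is a simplification.

```python
REMOTE_SCHEMES = ("http://", "https://")  # assumption: URL prefix marks a remote external

def choose_externals(externals, remote=None, local=True):
    """Pick which externals a query should hit (hypothetical helper).

    remote=None models "neither --remote nor --no-remote was given";
    local=False models --no-local.
    """
    remotes = [e for e in externals if e.startswith(REMOTE_SCHEMES)]
    locals_ = [e for e in externals if not e.startswith(REMOTE_SCHEMES)]
    if remote is None:
        # Default: stay offline when anything local exists, unless
        # --no-local leaves remote externals as the only option.
        remote = not locals_ or not local
    return (locals_ if local else []) + (remotes if remote else [])
```

For example, with one local and one remote external, a plain query only searches the local one; adding `remote=True` (i.e. --remote) searches both.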
2021-01-24 ipc: get rid of wq_set_recv_modes
Just open every FD as read/write. Perl (or any non-broken runtime) won't care and won't attempt to use F_SETFL to alter file description flags, since changing those would lead to unpleasant side effects if the file description is shared with another process.
2021-01-24 ipc: wq supports arbitrarily large payloads
This should not be needed, but somebody using lei could theoretically create thousands of external URLs and only have a handful of workers, which means the per-worker URI list could be large.
2021-01-24 lei q: limit concurrency to 4 remote connections
Unfortunately, this isn't a per-host limit yet, but it nevertheless reduces load on existing PublicInbox::WWW instances, since requesting an mboxrd is one of the more expensive operations.
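A global (not per-host) connection cap can be expressed with a counting semaphore. This Python sketch is illustrative only: `fetch` is a hypothetical stand-in that sleeps instead of running curl, and `MAX_REMOTE` simply mirrors the limit of 4 named above.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_REMOTE = 4  # one cap shared by all hosts, matching the limitation noted above
_slots = threading.BoundedSemaphore(MAX_REMOTE)
_lock = threading.Lock()
_active = 0
peak = 0  # highest number of simultaneous "connections" observed

def fetch(uri):
    """Hypothetical stand-in for one remote request."""
    global _active, peak
    with _slots:  # at most MAX_REMOTE callers get past this point at once
        with _lock:
            _active += 1
            peak = max(peak, _active)
        time.sleep(0.01)  # pretend to download an mboxrd
        with _lock:
            _active -= 1
    return uri

uris = [f"https://host{i % 3}.example/inbox/" for i in range(12)]
with ThreadPoolExecutor(max_workers=12) as pool:
    results = list(pool.map(fetch, uris))
```

Because the semaphore is shared rather than tied to the number of workers, the cap holds even when many externals are spread over many workers. A per-host limit would need one semaphore per host instead.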
2021-01-24 treewide: reseed RNG in child processes
This prevents name conflicts that lead to retries and slowdowns in temporary file name generation. No actual data corruption resulted, because all temporary files are opened with O_EXCL anyway. This may also increase security for IMAP, NNTP, and HTTPS sessions using TLS, but it's all public data anyway.
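The failure mode above can be reproduced in a Python sketch (an illustration, not lei's Perl code; `create_temp` is a hypothetical helper). Python's own `random` module already reseeds its global generator after `os.fork()`, so the sketch simulates an un-reseeded child with two generators that share a seed. The colliding name costs a retry, while `O_EXCL` guarantees no existing file is clobbered.

```python
import os
import random
import string
import tempfile

def create_temp(dirpath, rng, prefix="tmp-", max_tries=100):
    """Create a new file with an rng-chosen name; return (path, fd, attempts)."""
    for attempt in range(1, max_tries + 1):
        name = prefix + "".join(rng.choice(string.ascii_letters) for _ in range(8))
        path = os.path.join(dirpath, name)
        try:
            # O_EXCL: creation fails if the name already exists, so a
            # collision costs a retry but never overwrites another file.
            fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
        except FileExistsError:
            continue
        return path, fd, attempt
    raise FileExistsError("could not find an unused temporary name")

with tempfile.TemporaryDirectory() as d:
    parent_rng = random.Random(42)
    child_rng = random.Random(42)  # identical state, as if inherited across fork()
    _, fd1, parent_tries = create_temp(d, parent_rng)
    _, fd2, child_tries = create_temp(d, child_rng)  # first name collides
    reseeded = random.Random(42)
    reseeded.seed(os.urandom(16))  # what reseeding in the child amounts to
    _, fd3, reseeded_tries = create_temp(d, reseeded)
    for fd in (fd1, fd2, fd3):
        os.close(fd)
```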
2021-01-24 lei add-external: don't allow non-existent directories
At least not yet, though we may support mirroring via git.
2021-01-23 lei forget-external: don't show redundant "not found"
Pathname/URL canonicalization may not change the result at all, so there's no point in trying (and failing) the same form twice if the pre- and post-canonicalization forms are identical.
2021-01-23 lei q: support a bunch of curl(1) options
Some of these options will make sense on weird networks (behind firewalls, etc.). Some of these options may not make sense at all. This accommodates users who prefer curl's SOCKS5 proxy support over torsocks(1), but we'll still support torsocks by default, since some Tor instances aren't on the default 127.0.0.1:9050.
2021-01-23 lei forget-external: just show the location
No need to show the full key name since the user mainly uses the location.
2021-01-23 lei completion: handle URLs with port numbers
This improves the experience for developers running local instances of PublicInbox::WWW without permissions to bind port 80 or 443.
2021-01-23 lei: default "-f $mfolder" args for common MUAs
At least mail, mailx, mutt, and neomutt follow this convention. Heirloom mailx doesn't support Maildir (our default), but GNU mailutils mail/mailx does.
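A sketch of how such a default might be applied, in Python for illustration. The helper `mua_argv` is hypothetical, and leaving commands that already carry arguments untouched is an assumption, not something this log states.

```python
import os
import shlex

# MUAs named above as accepting "-f FOLDER" to open a mailbox.
F_FLAG_MUAS = {"mail", "mailx", "mutt", "neomutt"}

def mua_argv(mua_cmd, mfolder):
    """Expand a user-supplied MUA command (hypothetical helper)."""
    argv = shlex.split(mua_cmd)
    # Assumption: only a bare command name gets the default "-f $mfolder";
    # anything with explicit arguments is passed through unchanged.
    if len(argv) == 1 and os.path.basename(argv[0]) in F_FLAG_MUAS:
        argv += ["-f", mfolder]
    return argv
```

For example, `mua_argv("mutt", "/tmp/results")` yields `["mutt", "-f", "/tmp/results"]`, while an unknown MUA such as `thunderbird` is left as-is.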
2021-01-23 lei: oneshot: preserve stdout if writing mbox
We still need stdout if launching an MUA.
2021-01-23 lei_to_mail: drop cyclic reference if not using IPC
This may fix another interrupt-related segfault I'm occasionally seeing (but so far unable to reproduce).
2021-01-23 lei: support remote externals
Via curl(1), since that lets us easily use Tor on a per-connection basis via LD_PRELOAD (torsocks) or a proxy. We'll eventually support more curl options, which can allow users to get past firewalls and deal with other odd network configurations.
2021-01-23 lei: move external vivification to xsearch
This seems like a better place to put it given upcoming URI support, which starts in this commit.
2021-01-22 lei forget-external: bash completion support
The tricky bit was getting around the word splitting bash performs on URLs. This may work with other shells, too.
2021-01-22 lei: forget-external support with canonicalization
For proper matching, we'll do a better job of canonicalizing URLs and pathnames. Of course, users may edit the config file outside of lei, so ensure we try both the canonicalized form and the as-is form provided by the user. I also don't think we'll need to store externals info in MiscIdx; the config file alone is fine.
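The matching strategy from this entry and the later "don't show redundant 'not found'" fix can be sketched together in Python. Both helpers are hypothetical, and the canonicalization rules (lowercased URL scheme/host, absolute pathnames) are assumptions rather than lei's actual rules.

```python
import os
from urllib.parse import urlsplit, urlunsplit

def canonicalize(location):
    """Hypothetical canonicalization: tidy http(s) URLs, absolutize paths."""
    parts = urlsplit(location)  # urlsplit already lowercases the scheme
    if parts.scheme in ("http", "https"):
        return urlunsplit((parts.scheme, parts.netloc.lower(),
                           parts.path or "/", parts.query, ""))
    return os.path.abspath(location)

def forget_external(config, location):
    """Remove location from a dict-like config; return (removed_key, tried)."""
    tried = []
    # Try the canonical form first, then the user's as-is form, since the
    # config file may have been edited by hand outside of lei.
    for candidate in (canonicalize(location), location):
        if candidate in tried:
            continue  # identical before/after canonicalization: don't retry
        tried.append(candidate)
        if candidate in config:
            del config[candidate]
            return candidate, tried
    return None, tried
```

Returning the list of forms actually tried lets the caller print one "not found" message per distinct form instead of repeating it.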
2021-01-22 lei: remove @TO_CLOSE_ATFORK_CHILD
..At least limit it to a single file handle. The write end of the EOFpipe can be limited in scope and auto-closed when $quit is clobbered, leaving only the listener. The listener is the only handle that needs to be closed explicitly, since it is on the stack in the Listener->event_step => accept_dispatch => lei_$FOO code path. Everything else gets clobbered by DS->Reset in children after forking.
2021-01-22 lei_xsearch: reduce reference paths to lxs
Having an extra reference to LeiXSearch from the OpPipe $done_op map is unnecessary and makes the reference graph more complex than it needs to be. Just use $lei->{lxs} to simplify and reduce the likelihood of bugs.
2021-01-22 lei: remove INT/QUIT/TERM handlers, fix daemon EOF
The signal handlers on the client side were unnecessary; all we need is to handle socket EOF properly in the daemon by killing xsearch and l2m workers.
2021-01-22 lei: oneshot: use client $io[2] for placeholder
STDERR may actually get closed in ->ipc_atfork_child in oneshot mode, so ensure we pass in a valid file handle to avoid warnings in ->wq_do.
2021-01-22 lei_to_mail: avoid segfault on exit
Worker exit causes DESTROY ordering to become unpredictable and leads to Perl segfaulting. Instead, rely on OnDestroy and explicit triggering after wq_worker_loop to ensure we finish all outstanding git requests before worker exit.
2021-01-22 lei: fix inadvertent FD sharing
$wq->{-ipc_atfork_child_close} needed to be initialized properly. Also, start setting $0 in workers to improve visibility.
2021-01-22 lei: show {pct} and {oid} in From_ lines and filenames
From_ lines are shown when mbox* variants are output to stdout, making {oid} and {pct} information visible without the risk of it being propagated to other importer processes, as could happen if it were stored in lei-specific X-* headers. Maildirs already had OIDs in their filenames; now they gain Xapian {pct}, too, in case anybody cares.
2021-01-22 lei_xsearch: eliminate some unused, commented-out code
2021-01-22 lei q: retrieve keywords for local, non-external messages
This isn't tested for now, so maybe it works.
2021-01-22 lei_overview: rename {relevance} => {pct}
The old name was too long compared to the rest of the field names. Since the Xapian method is named ->get_percent, "pct" is a natural fit: it is a well-known abbreviation for "percent" and is already used internally by our wrapper. ..And clean up some excess whitespace while we're in the area.
2021-01-21 lei_to_mail: call PublicInbox::IPC::DESTROY
It doesn't seem to matter at the moment, but it should save us from some surprises down the line.
2021-01-21 lei_xsearch: keep l2m->{-wq_s1} while preparing query
This caused a performance regression which made parallel lei2mail processes fail prematurely and fall back to writing blobs in the lei_xsearch worker.
2021-01-21 lei: dump and clear errors.log in daemon mode
Inspired by "dmesg -c", this should help users report bugs and avoid eating up $XDG_RUNTIME_DIR. Once lei is ready for release, the need for this should hopefully be few and far between, but shit happens.
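A "dmesg -c"-style read-then-clear can be sketched in Python (illustrative only; `dump_and_clear` is a hypothetical helper). Truncating in place keeps the same inode, so a daemon that still holds the log open keeps a valid descriptor. This assumes the daemon opened the log with O_APPEND; otherwise its next write would land at the old offset. Anything written between the read and the truncate is lost.

```python
import os
import tempfile

def dump_and_clear(log_path):
    """Return the log's contents and empty it in place (sketch)."""
    try:
        with open(log_path, "r+", encoding="utf-8", errors="replace") as fh:
            contents = fh.read()
            fh.seek(0)
            fh.truncate()  # same inode: the daemon's open fd stays valid
    except FileNotFoundError:
        return ""  # nothing logged yet
    return contents

with tempfile.TemporaryDirectory() as d:
    log = os.path.join(d, "errors.log")
    with open(log, "w") as fh:
        fh.write("oops\n")
    first = dump_and_clear(log)
    size_after = os.path.getsize(log)
    second = dump_and_clear(log)
    missing = dump_and_clear(os.path.join(d, "nope.log"))
```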
2021-01-21 lei q: cleanup store initialization
Since we no longer leak an FD for over.sqlite3, we can initialize and actually enable it by default as originally intended.
2021-01-21 overidx: eidx_prep: fix leftover dbh reference
Leaving $dbh in another field was causing over.sqlite3 to remain open after ->dbh_close. Fix up some minor style issues while we're at it.
2021-01-21 lei: exit code in oneshot mode
waitpid() in DESTROY ends up setting $? for the exit status, so we must reap IPC children before calling CORE::exit. This fixes t/lei-oneshot.t with TEST_RUN_MODE=0.
2021-01-21 lei: allow more mbox inode types
We may attempt to write an mbox to any terminal, block, or character device, not just to regular files and FIFOs/pipes. The only thing known not to work is a directory. Sockets may be possible with some OSes (e.g. Plan 9) or filesystems. This fixes t/lei.t on FreeBSD 11.x.
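The relaxed check amounts to rejecting only directories. A Python sketch for illustration (`check_mbox_dest` is hypothetical, and the return values are arbitrary labels):

```python
import os
import stat
import tempfile

def check_mbox_dest(path):
    """Accept any existing inode type except a directory (sketch)."""
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return "new"  # will be created as a regular file
    if stat.S_ISDIR(st.st_mode):
        raise IsADirectoryError(f"{path}: cannot write an mbox to a directory")
    # regular files, FIFOs, terminals, block and character devices all pass
    return "existing"

with tempfile.TemporaryDirectory() as d:
    try:
        check_mbox_dest(d)
        dir_raised = False
    except IsADirectoryError:
        dir_raised = True
    regular_path = os.path.join(d, "out.mbox")
    new = check_mbox_dest(regular_path)
    open(regular_path, "w").close()
    regular = check_mbox_dest(regular_path)
    devnull_kind = check_mbox_dest(os.devnull)  # a character device on POSIX
```

Checking for the single known-bad type, rather than listing allowed types, is what lets unusual destinations such as devices pass.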
2021-01-21 lei_overview: start implementing format detection
We'll need it for IMAP support, at least. Proper mbox family detection will be expensive, so deal with it later.
2021-01-21 lei: test some likely errors due to misuse
Because user errors happen...
2021-01-21 t/lei: fix double-running of socket test with oneshot
We split out t/lei-oneshot.t and t/lei.t so it's easier to isolate run-mode specific bugs and behavior and there's no reason to rerun the socket daemon tests.
2021-01-21 lei_overview: do not write if $lei->{1} is gone
We'll invalidate the {1} (stdout) field on SIGPIPE, so don't trigger a Perl warning by writing to it.
2021-01-21 lei q: fix augment of compressed mailboxes
We need to delay writing out the mailbox until the compressor process is up and running, so have startq wait a bit. This means we must create the pipe early and hand it off to the workers before augmenting, despite spawning the gzip/pigz/xz/bzip2 process after augment is complete.
2021-01-21 lei: write daemon errors to the sock directory
Most everything should be captured by the __WARN__ handlers and routed to syslog, but it appears Perl may write to stderr in some emergency cases, as can libc or other libraries. Just point it to a small file that's cleared on reboot.
2021-01-21 lei q: do not spawn MUA early
I'm not sure why, but mutt sometimes won't detect small quickly. We'll display a progress bar meter when writing results, instead.
2021-01-21 lei q: fix SIGPIPE handling from lei2mail workers
We need to properly propagate SIGPIPE to the top-level lei-daemon process and avoid relying on auto-close, since auto-close triggers Perl warnings when explicit close() does not.
2021-01-21 lei q: start ->mset while query_prepare runs
We don't need the result of query_prepare (for augmenting or mass unlinking) until we're ready to deduplicate and write results to the filesystem. This ought to let us hide some of the cost of Xapian searches on multi-device/core systems for extremely expensive searches.
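The overlap can be sketched with a future in Python (illustrative only). `run_search` and `query_prepare` here are hypothetical stand-ins that sleep; the dedupe step against messages already in the destination is a simplified assumption.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_search(query):
    """Stand-in for an expensive Xapian ->mset call."""
    time.sleep(0.05)
    return [f"{query}-hit{i}" for i in range(3)]

def query_prepare(dest):
    """Stand-in for augment/unlink preparation of the output destination."""
    time.sleep(0.05)
    return {"dest": dest, "existing": {"q-hit1"}}

def lei_q(query, dest):
    with ThreadPoolExecutor(max_workers=1) as pool:
        search = pool.submit(run_search, query)  # search starts right away
        prep = query_prepare(dest)               # runs while the search is in flight
        hits = search.result()                   # block only when results are needed
    # simplified dedupe against what the destination already holds
    return [h for h in hits if h not in prep["existing"]]

results = lei_q("q", "/tmp/out")
```

With both stand-ins sleeping for the same time, the total wall time is roughly one sleep rather than two, which is the cost-hiding effect the entry describes.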
2021-01-18 lei_to_mail: optimize for MUAs
Instead of optimizing our own performance, this optimizes our data to reduce work done by the MUA consumer. Maildir and mbox destinations no longer support any notion of the IMAP \Recent flag. JMAP has no functioning \Recent equivalent, and neither do we. In practice, having MUAs (e.g. mutt) clear the \Recent flag when committing changes to the mbox is expensive: it creates a rename(2) storm with Maildir and overwrites the entire mbox. For mboxcl2 (and mboxcl), we'll further optimize mutt behavior by setting the Lines: header in addition to Content-Length. With these changes, mutt exits instantaneously on mboxcl2, mboxcl, and Maildirs generated by "lei q".
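Computing the two headers for an mboxcl2 entry can be sketched in Python (illustrative only; `mboxcl2_entry` and its default From_ line are hypothetical). Since mboxcl2 readers rely on Content-Length, "From " lines inside the body are left unquoted.

```python
def mboxcl2_entry(header_lines, body,
                  from_line="From MAILER-DAEMON Thu Jan  1 00:00:00 1970"):
    """Serialize one message as an mboxcl2 entry (sketch; bytes in, bytes out)."""
    # Drop any stale length headers; both are recomputed from the body.
    kept = [h for h in header_lines
            if not h.lower().startswith((b"content-length:", b"lines:"))]
    n_lines = body.count(b"\n")
    if body and not body.endswith(b"\n"):
        n_lines += 1  # count a final unterminated line
    kept.append(b"Content-Length: %d" % len(body))
    kept.append(b"Lines: %d" % n_lines)
    # Body is written verbatim: no ">From " quoting in mboxcl2.
    return (from_line.encode() + b"\n" + b"\n".join(kept) + b"\n\n"
            + body + b"\n")  # trailing newline separates entries

entry = mboxcl2_entry([b"Subject: hi", b"Lines: 99"], b"hello\nFrom me\n")
```

Here the body is 14 bytes over 2 lines, so the entry carries `Content-Length: 14` and `Lines: 2`, and the stale `Lines: 99` header is dropped.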
2021-01-18 lei q: parallelize Maildir and mbox writing
With 4 dedicated workers, this seems to provide a 100-120% speedup on a 4 core machine when writing thousands of search results to a Maildir or mbox. This also sets us up for high-latency IMAP destinations in the future. This opens the door to more speedup opportunities such as optimizing dedupe locking and other ways to reduce contention. This change is fairly complex and convoluted, unfortunately. Further work may allow us to simplify it and even improve performance.
2021-01-18 lei q: add --mua-cmd switch
It can be convenient to invoke an MUA while search results are still being written, since an eager person may want to start seeing results ASAP. This lets Maildir users see results in the MUA as we write them. IMAP users will eventually be able to take advantage of this, too. Since we don't support mbox locking (yet?), we only invoke the MUA for mbox formats after all results are written.