path: root/lib/PublicInbox/LeiQuery.pm
Date Commit message
2021-02-07 lei: remove short switch support for curl(1) options
In particular, -U and -u switches may conflict with diff(1) options we may need for "lei show" which will use solver remotely or locally.
2021-02-07 lei_curl: replace -K/--config with --curl-config
Seeing --config in the command-line for lei may mislead users into thinking we support config file overrides that way. Rename the option to --curl-config and drop the short switch for now.
2021-02-07 lei_query: trim curl options
Get rid of short options which will or may conflict with some of our own. We may switch over to "git -c http.*" options since we need to run "git clone" and "git fetch" anyways.
2021-02-05 lei_query: remove unneeded dwaitpid import
All process management is handled elsewhere.
2021-02-05 lei q: delay worker spawn
Now that --stdin support is sorted, we can delay spawning workers until we know the query is ready-to-run.
2021-02-04 lei q: support reading queries from stdin
This will be useful on shared machines when a user doesn't want search queries visible to other users looking at the ps(1) output or similar.
2021-02-04 lei: complete basenames for include|exclude|only
This will make it even easier for RSI-afflicted users to use, since many externals may share a common prefix.
2021-02-04 lei q: -I/--exclude/--only support globs and basenames
We can do basename matching when it's unambiguous. Since '*?[]' characters are rare in URLs and pathnames, we'll do glob matching by default and support a (curl-inspired) --globoff/-g option to disable globbing. And fix --exclude while we're at it.
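The matching described above might be sketched as follows. This is an illustration in Python, not the project's Perl code; the helper name match_externals, the precedence (exact match, then glob, then unambiguous basename), and the example locations are all assumptions made for the sketch.

```python
from fnmatch import fnmatchcase
from posixpath import basename


def match_externals(pattern, externals, globoff=False):
    """Return the externals selected by pattern (hypothetical helper)."""
    # An exact match always wins.
    if pattern in externals:
        return [pattern]
    # Glob matching is the default; globoff=True disables it.
    if not globoff:
        hits = [e for e in externals if fnmatchcase(e, pattern)]
        if hits:
            return hits
    # Basename matching is accepted only when exactly one external matches.
    hits = [e for e in externals if basename(e.rstrip("/")) == pattern]
    return hits if len(hits) == 1 else []


# Made-up external locations for demonstration.
EXTERNALS = [
    "https://example.com/meta/",
    "https://example.org/meta/",
    "https://example.com/git/",
]
```

With these inputs, "git" selects one external by basename, "meta" is ambiguous and selects nothing, and "*example.com*" selects two externals unless globbing is turned off.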
2021-02-03 lei q: support --jobs [SEARCHERS],[WRITERS]
This comma-delimited parameter allows controlling the number of lei_xsearch and lei2mail worker processes. With the change to make IPC wq_* work use the event loop, it's now safe to run fewer worker processes for searching with no risk of deadlocks. MAX_PER_HOST isn't configurable yet for remote hosts, and maybe it shouldn't be due to potential for abuse.
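A comma-delimited value of this shape could be parsed roughly as below. The sketch is in Python rather than the project's Perl, and the function name parse_jobs, the handling of an omitted half (returning None so the caller applies its own default), and the rejection of counts below 1 are assumptions, not documented lei behavior.

```python
def parse_jobs(arg):
    """Split 'SEARCHERS,WRITERS' into two optional positive integers."""
    searchers, _sep, writers = arg.partition(",")

    def to_count(text):
        # An empty half means "not specified"; the caller picks a default.
        if text == "":
            return None
        count = int(text)  # raises ValueError on non-numeric input
        if count < 1:
            raise ValueError("worker count must be at least 1: %r" % text)
        return count

    return to_count(searchers), to_count(writers)
```

Under these assumptions, "2,4" yields two searchers and four writers, "3" sets only the searcher count, and ",2" sets only the writer count.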
2021-02-03 lei q: do not leave temporary files after oneshot exit
Avoid on-stack shortcuts which may prevent destructors from firing since we're not inside the event loop. We'll also tidy up the unlink mechanism in LeiOverview while we're at it.
2021-02-03 lei: q: shell completion for --(include|exclude|only)
Because .onion URL names are long!
2021-02-03 lei q: support --only, --include and --exclude
-I is short for --include since it's standard for C compilers (along with Perl and Ruby). There are no single-character shortcuts for --exclude or --only, since I don't expect --exclude to be used very often and --only is already short (and will support shell completion).
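One plausible way the three switches combine is sketched here in Python (not the project's Perl). The semantics are an assumption for illustration: --only replaces the configured set, --include adds to it, and --exclude removes from the result. The function name select_externals and the example locations are hypothetical.

```python
def select_externals(defaults, include=(), exclude=(), only=()):
    """Pick the externals to search from the three switch values."""
    # --only, when given, replaces the configured defaults entirely.
    chosen = list(only) if only else list(defaults)
    # --include (-I) adds locations that are not already selected.
    for location in include:
        if location not in chosen:
            chosen.append(location)
    # --exclude removes locations from whatever was selected so far.
    excluded = set(exclude)
    return [location for location in chosen if location not in excluded]


# Made-up configured externals for demonstration.
DEFAULTS = ["/srv/inbox/a", "/srv/inbox/b"]
```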
2021-02-03 lei_query: default to 10000 messages as documented
Otherwise, we were only getting 50 matches without (-t) thread expansion.
2021-01-24 lei q: honor --no-local to force remote searches
This can be useful for testing remote behavior, or for augmenting local results. It'll also be possible to explicitly include/exclude externals via CLI switches (once names are decided).
2021-01-24 lei q: disable remote externals if locals exist
--remote should be explicitly enabled if local externals are present, since users may be offline or on expensive + metered Internet while traveling. In the future, --remote will probably default to caching/memoizing all messages it fetches to increase the usefulness of --local.
2021-01-24 lei q: limit concurrency to 4 remote connections
Unfortunately, this isn't a per-host limit yet, but it nevertheless reduces load on existing PublicInbox::WWW instances, since requesting an mboxrd is one of the more expensive operations.
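A global (not per-host) cap like the one described can be sketched with a counting semaphore. This Python illustration uses threads for brevity, whereas the project uses Perl worker processes; fetch_all, fake_fetch, and the example URLs are hypothetical names, and only the limit of 4 comes from the commit message.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_REMOTE = 4  # the limit named in the commit message


def fetch_all(urls, fetch, limit=MAX_REMOTE):
    """Run fetch(url) for every URL with at most `limit` in flight."""
    slots = threading.BoundedSemaphore(limit)
    lock = threading.Lock()
    state = {"active": 0, "peak": 0}

    def guarded(url):
        with slots:  # blocks while `limit` requests are already running
            with lock:
                state["active"] += 1
                state["peak"] = max(state["peak"], state["active"])
            try:
                return fetch(url)
            finally:
                with lock:
                    state["active"] -= 1

    # Deliberately start more threads than the limit allows to run at once.
    with ThreadPoolExecutor(max_workers=max(1, len(urls))) as pool:
        results = list(pool.map(guarded, urls))
    return results, state["peak"]


def fake_fetch(url):
    """Stand-in for an expensive mboxrd request."""
    time.sleep(0.01)
    return "mboxrd for " + url
```

The returned peak is the highest number of simultaneous requests observed, which never exceeds the limit regardless of how many hosts the URLs point at.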
2021-01-23 lei q: support a bunch of curl(1) options
Some of these options will make sense when on weird networks (behind firewalls, etc.). Some of these options may not make sense at all. This lets users who prefer the SOCKS5 proxy support in curl use it rather than torsocks(1), but we'll still support torsocks by default since some Tor instances aren't on the default 127.0.0.1:9050.
2021-01-23 lei: move external vivification to xsearch
This seems like a better place to put it given upcoming URI support, which starts in this commit.
2021-01-22 lei: fix inadvertent FD sharing
$wq->{-ipc_atfork_child_close} needed to be initialized properly. And start setting $0 in workers to improve visibility.
2021-01-21 lei q: cleanup store initialization
Since we no longer leak an FD for over.sqlite3, we can initialize and actually enable it by default as originally intended.
2021-01-18 lei q: parallelize Maildir and mbox writing
With 4 dedicated workers, this seems to provide a 100-120% speedup on a 4 core machine when writing thousands of search results to a Maildir or mbox. This also sets us up for high-latency IMAP destinations in the future. This opens the door to more speedup opportunities such as optimizing dedupe locking and other ways to reduce contention. This change is fairly complex and convoluted, unfortunately. Further work may allow us to simplify it and even improve performance.
2021-01-18 lei: q: results output to Maildir and mbox* working
All the augment and deduplication stuff seems to be working based on unit tests. OpPipe is a nice general addition that will probably make future state machines easier.
2021-01-14 lei q: reinstate smsg dedupe
Now that dedupe is serialization-safe and fork-safe, we can wire it back up in our query results paths.
2021-01-14 lei: test SIGPIPE, stop xsearch workers on client abort
The new test ensures consistency between oneshot and client/daemon users. Cancelling an in-progress result now also stops xsearch workers to avoid wasted CPU and I/O. Note the lei->atfork_child_wq usage changes, which work around a bug in Perl 5: http://nntp.perl.org/group/perl.perl5.porters/258784 <CAHhgV8hPbcmkzWizp6Vijw921M5BOXixj4+zTh3nRS9vRBYk8w@mail.gmail.com> This switches the internal protocol to use SOCK_SEQPACKET AF_UNIX sockets to prevent merging of the messages the daemon sends to the client to run the pager and kill/exit the client script.
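The property that motivates SOCK_SEQPACKET here is that each send arrives as one separate message, so two back-to-back commands cannot merge into a single read. A minimal Python sketch follows (not the project's Perl); it assumes a platform with AF_UNIX SOCK_SEQPACKET support such as Linux, and the message contents are invented rather than taken from lei's internal protocol.

```python
import socket


def demo_seqpacket():
    """Send two messages back to back and read them as separate records."""
    daemon, client = socket.socketpair(socket.AF_UNIX, socket.SOCK_SEQPACKET)
    try:
        # Both messages are queued before the client reads anything.
        daemon.send(b"exec pager")
        daemon.send(b"exit 0")
        # Each recv returns exactly one message, even with a large buffer.
        first = client.recv(4096)
        second = client.recv(4096)
        return first, second
    finally:
        daemon.close()
        client.close()
```

With SOCK_STREAM in place of SOCK_SEQPACKET, the first recv would be free to return both messages concatenated, which is the merging the commit avoids.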
2021-01-12 lei: query: restore JSON output overview
This internal API is better suited for fork-friendliness (but locking + dedupe still needs to be re-added). Normal "json" is the default, though stream-friendly "concatjson" and "jsonl" (AKA "ndjson" AKA "ldjson") all seem working (though tests aren't working, yet). For normal "json", the biggest downside is the necessity of a trailing "null" element at the end of the array because of parallel processes, since (AFAIK) regular JSON doesn't allow trailing commas, unlike JavaScript.
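The trailing "null" trick can be illustrated as follows. This is a Python sketch, not the project's Perl code, and the names emit_records and assemble plus the sample records are made up. Each worker ends every record with a comma, so no worker has to know whether it wrote the last one; the closing "null" then keeps the array valid JSON.

```python
import json


def emit_records(records):
    """Text one worker would write: every record is followed by a comma."""
    return "".join(json.dumps(record) + ",\n" for record in records)


def assemble(worker_chunks):
    """Wrap the workers' output in an array, closed by a null sentinel."""
    return "[" + "".join(worker_chunks) + "null]"


def load_results(document):
    """Parse the array and drop the trailing null sentinel."""
    parsed = json.loads(document)
    assert parsed[-1] is None
    return parsed[:-1]


# Two hypothetical workers producing one and two records respectively.
chunks = [
    emit_records([{"id": 1}]),
    emit_records([{"id": 2}, {"id": 3}]),
]
document = assemble(chunks)
```

A consumer only needs to ignore the final null element, and an empty result set still parses, as "[null]".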
2021-01-12 lei_xsearch: transfer 4 FDs internally, drop IO::FDPass
It's easier to make the code more generic by transferring all four FDs (std(in|out|err) + socket) instead of omitting stdin. We'll be reading from stdin on some imports, and possibly outputting to stdout, so omitting stdin now would needlessly complicate things. The differences with IO::FDPass "1" code paths and the "4" code paths used by Inline::C and Socket::MsgHdr are far too much to support and test at the moment.
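Passing several descriptors in one message over an AF_UNIX socket can be sketched as below. This is Python 3.9+ using socket.send_fds and socket.recv_fds, not the Inline::C or Socket::MsgHdr paths the project uses; the two pipes are stand-ins for the four descriptors (std(in|out|err) + socket), and the b"4fds" payload is invented.

```python
import os
import socket


def demo_fd_transfer():
    """Send four descriptors in one message and use a received copy."""
    sender, receiver = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    r1, w1 = os.pipe()
    r2, w2 = os.pipe()
    originals = [r1, w1, r2, w2]  # stand-ins for stdin, stdout, stderr, socket
    received = []
    try:
        # All four descriptors travel as ancillary data with a small payload.
        socket.send_fds(sender, [b"4fds"], originals)
        msg, received, _flags, _addr = socket.recv_fds(receiver, 1024, 4)
        # The received descriptors are new numbers for the same open files:
        # writing to the received write end shows up on the original read end.
        os.write(received[1], b"hello")
        data = os.read(r1, 5)
        return msg, len(received), data
    finally:
        for fd in originals + list(received):
            os.close(fd)
        sender.close()
        receiver.close()
```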
2021-01-12 lei: run pager in client script
While most single keystrokes work fine when the pager is launched from the background daemon, Ctrl-C and WINCH can cause strangeness when connected to the wrong terminal.
2021-01-12 lei: fork + FD cleanup
Do a better job of closing FDs that we don't want shared with the work queue workers. We'll also fix naming and use "atfork_prepare" instead of "atfork_parent" to match pthread_atfork(3) naming.
2021-01-12 lei: get rid of client {pid} field
Using kill(2) is too dangerous since extremely long queries may allow the original PID of the aborted lei(1) client process to be recycled by a new process. It would be bad if the lei_xsearch worker process issued a kill on the wrong process. So just rely on sending the exit message via socket.
2021-01-12 lei: query: ensure pager exit is instantaneous
Improve interactivity and user experience by allowing the user to return to the terminal immediately when the pager is exited (e.g. hitting the `q' key in less(1)). This is a massive change which restructures query handling to allow parallel search when --thread expansion is in use and offloading to a separate worker when --thread is not in use. The Xapian query offload changes allow us to reenter the event loop right away once the search(es) are shipped off to the work queue workers. This means the main lei-daemon process can forget the lei(1) client socket immediately once it's handed off to worker processes. We now unblock SIGPIPE in query workers and send an exit(141) response to the lei(1) client socket to denote SIGPIPE. This also allows parallelization for users using "lei q" from multiple terminals. JSON output is currently broken and will need to be restructured for more flexibility and fork-safety.
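The value 141 matches the common shell convention of reporting death by signal as 128 plus the signal number, with SIGPIPE being 13 on Linux and most Unix systems. A small Python sketch of that arithmetic follows; the helper names are hypothetical and this is not code from the project.

```python
import signal


def shell_style_exit_code(signum):
    """Exit status a shell reports for a process killed by signum."""
    return 128 + int(signum)


def signal_name_for_exit_code(code):
    """Map an exit status above 128 back to a signal name, if any."""
    if code <= 128:
        return None
    try:
        return signal.Signals(code - 128).name
    except ValueError:
        # No signal with that number exists on this platform.
        return None
```

A client receiving this status can therefore tell a broken pipe (for example, the pager exiting early) apart from an ordinary failure.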
2021-01-12 lei q: deduplicate smsg
We don't want duplicate messages in results overviews, either.
2021-01-12 lei query + pagination sorta working
Parallelism and interactivity with pager + SIGPIPE need work, but results are shown and phrase search works without shell users having to apply Xapian quoting rules on top of standard shell quoting.