Date | Commit message |
|
In particular, -U and -u switches may conflict with diff(1)
options we may need for "lei show" which will use solver
remotely or locally.
|
|
Seeing --config in the command-line for lei may mislead users
into thinking we support config file overrides that way. Rename
the option to --curl-config and drop the short switch for now.
|
|
Get rid of short options which will or may conflict with
some of our own. We may switch over to "git -c http.*"
options since we need to run "git clone" and "git fetch"
anyways.
|
|
All process management is handled elsewhere.
|
|
Now that --stdin support is sorted, we can delay spawning
workers until we know the query is ready-to-run.
|
|
This will be useful on shared machines when a user doesn't want
search queries visible to other users looking at the ps(1)
output or similar.
|
|
This will make it even easier for RSI-afflicted users to use,
since many externals may share a common prefix.
|
|
We can do basename matching when it's unambiguous. Since '*?[]'
characters are rare in URLs and pathnames, we'll do glob
matching by default and support a (curl-inspired) --globoff/-g
option to disable globbing.
And fix --exclude while we're at it.
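
The matching rules described above can be sketched roughly as follows. This is an illustration in Python, not the actual lei code: the function name `match_externals`, its `globoff` argument, and the order in which the rules are tried (exact match, then glob, then unambiguous basename) are all assumptions made for the example.

```python
from fnmatch import fnmatchcase
from posixpath import basename


def match_externals(pattern, externals, globoff=False):
    """Return the externals selected by pattern (hypothetical helper).

    Assumed order: exact match, then glob matching (unless globoff),
    then basename matching, which is accepted only when unambiguous.
    """
    if pattern in externals:
        return [pattern]
    if not globoff:
        # '*', '?' and '[]' are treated as glob metacharacters
        hits = [e for e in externals if fnmatchcase(e, pattern)]
        if hits:
            return hits
    # strip a trailing slash so URLs ending in "/" still have a basename
    by_base = [e for e in externals if basename(e.rstrip("/")) == pattern]
    return by_base if len(by_base) == 1 else []


# Made-up externals, for illustration only
externals = [
    "https://example.com/meta/",
    "https://example.com/git/",
    "/home/user/mail/meta",
]
unique = match_externals("git", externals)
ambiguous = match_externals("meta", externals)
globbed = match_externals("https://example.com/*", externals)
literal = match_externals("https://example.com/*", externals, globoff=True)
```

With globbing disabled, the pattern containing `*` is only compared literally, so it selects nothing here.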
|
|
This comma-delimited parameter allows controlling the number of
lei_xsearch and lei2mail worker processes. With the change
to make IPC wq_* work use the event loop, it's now safe to
run fewer worker processes for searching with no risk of
deadlocks.
MAX_PER_HOST isn't configurable yet for remote hosts,
and maybe it shouldn't be due to potential for abuse.
|
|
Avoid on-stack shortcuts which may prevent destructors from
firing since we're not inside the event loop. We'll also tidy
up the unlink mechanism in LeiOverview while we're at it.
|
|
Because .onion URL names are long!
|
|
-I is short for --include since it's standard for C compilers
(along with Perl and Ruby). There are no single-character
shortcuts for --exclude or --only, since I don't expect
--exclude to be used very often and --only is already short (and
will support shell completion).
|
|
Otherwise, we were only getting 50 matches without (-t)
thread expansion.
|
|
This can be useful for testing remote behavior, or for
augmenting local results. It'll also be possible to explicitly
include/exclude externals via CLI switches (once names are
decided).
|
|
--remote should be explicitly enabled if local externals are
present, since users may be offline or on expensive + metered
Internet while traveling.
In the future, --remote will probably default to
caching/memoizing all messages it fetches to increase the
usefulness of --local.
|
|
Unfortunately, this isn't a per-host limit yet, but it
nevertheless reduces load on existing PublicInbox::WWW
instances, since requesting a mboxrd is one of the more
expensive operations.
|
|
Some of these options will make sense when on weird networks
(behind firewalls, etc.) Some of these options may not make
sense at all.
This accommodates users who prefer the SOCKS5 proxy support in
curl over torsocks(1), but we'll still support torsocks
by default since some Tor instances aren't on the default
127.0.0.1:9050.
|
|
This seems like a better place to put it given upcoming
URI support, which starts in this commit.
|
|
$wq->{-ipc_atfork_child_close} needed to be initialized properly.
And start setting $0 in workers to improve visibility.
|
|
Since we no longer leak an FD for over.sqlite3, we can
initialize and actually enable it by default as originally
intended.
|
|
With 4 dedicated workers, this seems to provide a 100-120%
speedup on a 4 core machine when writing thousands of search
results to a Maildir or mbox. This also sets us up for
high-latency IMAP destinations in the future.
This opens the door to more speedup opportunities such
as optimizing dedupe locking and other ways to reduce
contention.
This change is fairly complex and convoluted, unfortunately.
Further work may allow us to simplify it and even improve
performance.
|
|
All the augment and deduplication stuff seems to be working
based on unit tests. OpPipe is a nice general addition that
will probably make future state machines easier.
|
|
Now that dedupe is serialization- and fork-safe, we can
wire it back up in our query results paths.
|
|
The new test ensures consistency between oneshot and
client/daemon users. Cancelling an in-progress result now also
stops xsearch workers to avoid wasted CPU and I/O.
Note the lei->atfork_child_wq usage changes; they work around
a bug in Perl 5: http://nntp.perl.org/group/perl.perl5.porters/258784
<CAHhgV8hPbcmkzWizp6Vijw921M5BOXixj4+zTh3nRS9vRBYk8w@mail.gmail.com>
This switches the internal protocol to use SOCK_SEQPACKET
AF_UNIX sockets to prevent merging of the messages the daemon
sends to the client to run the pager and kill/exit the client script.
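
A small sketch of why SOCK_SEQPACKET helps, written in Python rather than the project's Perl. It assumes a platform such as Linux where AF_UNIX supports SOCK_SEQPACKET, and the two message payloads are made up for the example; they are not the real lei protocol.

```python
import socket

# A connected pair of AF_UNIX sockets that preserves record boundaries
daemon, client = socket.socketpair(socket.AF_UNIX, socket.SOCK_SEQPACKET)

# Two back-to-back sends (illustrative payloads only)
daemon.send(b"exec pager")
daemon.send(b"exit 0")

# Each recv() returns exactly one message, even though the buffer is
# large enough to hold both; a stream socket gives no such guarantee.
first = client.recv(4096)
second = client.recv(4096)

daemon.close()
client.close()
```

Because the kernel keeps the records separate, the receiver needs no framing or delimiter parsing of its own.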
|
|
This internal API is better suited for fork-friendliness (but
locking + dedupe still needs to be re-added).
Normal "json" is the default, though stream-friendly "concatjson"
and "jsonl" (AKA "ndjson" AKA "ldjson") all seem to work
(though tests aren't working, yet).
For normal "json", the biggest downside is the necessity of a
trailing "null" element at the end of the array because of
parallel processes, since (AFAIK) regular JSON doesn't allow
trailing commas, unlike JavaScript.
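
To illustrate the trailing "null" element, here is a minimal Python sketch. The function `write_results` and the sample records are hypothetical; the point is only that each writer can emit "object plus comma" without knowing whether it is the last one, and a final "null" keeps the array valid JSON.

```python
import io
import json


def write_results(out, chunks):
    """Write a JSON array where every element is followed by a comma.

    Each chunk stands in for the output of one parallel worker.
    """
    out.write("[")
    for chunk in chunks:
        for obj in chunk:
            out.write(json.dumps(obj) + ",\n")
    # the closing null absorbs the comma left by the last real element
    out.write("null]")


buf = io.StringIO()
write_results(buf, [[{"id": 1}], [{"id": 2}, {"id": 3}]])

parsed = json.loads(buf.getvalue())
# consumers drop the sentinel to recover the real results
results = [r for r in parsed if r is not None]
```

A reader of this output has to be prepared to skip the final null entry.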
|
|
It's easier to make the code more generic by transferring
all four FDs (std(in|out|err) + socket) instead of omitting
stdin.
We'll be reading from stdin on some imports, and possibly
outputting to stdout, so omitting stdin now would needlessly
complicate things.
The differences between the IO::FDPass "1" code paths and the "4"
code paths used by Inline::C and Socket::MsgHdr are far too
great to support and test at the moment.
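
Passing four descriptors in a single message can be sketched as below. This uses Python's `socket.send_fds` and `socket.recv_fds` (available since Python 3.9) purely as an illustration; it is not the Perl code used here, and the pipe descriptors are stand-ins for std(in|out|err) plus the socket.

```python
import os
import socket

parent, child = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

# Four throwaway descriptors standing in for stdin, stdout, stderr, socket
r1, w1 = os.pipe()
r2, w2 = os.pipe()
to_send = [r1, w1, r2, w2]

# One message carries all four descriptors as ancillary data
socket.send_fds(parent, [b"4"], to_send)
msg, received, _flags, _addr = socket.recv_fds(child, 16, 4)

# The received numbers are new descriptors for the same open files:
# writing through the received copy of w1 is readable from r1.
os.write(received[1], b"hello")
data = os.read(r1, 5)

for fd in to_send + received:
    os.close(fd)
parent.close()
child.close()
```

Sending all descriptors together means the receiver always sees the same fixed layout and needs no special case for a missing stdin.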
|
|
While most single keystrokes work fine when the pager is
launched from the background daemon, Ctrl-C and WINCH can cause
strangeness when connected to the wrong terminal.
|
|
Do a better job of closing FDs that we don't want shared with
the work queue workers. We'll also fix naming and use
"atfork_prepare" instead of "atfork_parent" to match
pthread_atfork(3) naming.
|
|
Using kill(2) is too dangerous since extremely long
queries may allow the original PID of the aborted lei(1)
client process to be recycled by a new process. It would
be bad if the lei_xsearch worker process issued a kill
on the wrong process.
So just rely on sending the exit message via socket.
|
|
Improve interactivity and user experience by allowing the user
to return to the terminal immediately when the pager is exited
(e.g. hitting the `q' key in less(1)).
This is a massive change which restructures query handling to
allow parallel search when --thread expansion is in use and
offloading to a separate worker when --thread is not in use.
The Xapian query offload changes allow us to reenter the event
loop right away once the search(es) are shipped off to the work
queue workers.
This means the main lei-daemon process can forget the lei(1)
client socket immediately once it's handed off to worker
processes.
We now unblock SIGPIPE in query workers and send an exit(141)
response to the lei(1) client socket to denote SIGPIPE.
This also allows parallelization for users using "lei q" from
multiple terminals.
JSON output is currently broken and will need to be restructured
for more flexibility and fork-safety.
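
The exit(141) response mentioned above follows the common shell convention of reporting death-by-signal as 128 plus the signal number. A short Python sketch of that arithmetic, assuming a platform where SIGPIPE is signal 13 (as on Linux); the helper `describe_exit` is hypothetical:

```python
import signal

# 128 + signal number is how shells report a process killed by a signal
SIGPIPE_EXIT = 128 + signal.SIGPIPE


def describe_exit(code):
    """Return the signal name encoded in an exit code, or None."""
    if code > 128:
        try:
            return signal.Signals(code - 128).name
        except ValueError:
            return None  # not a known signal number on this platform
    return None


name = describe_exit(141)
```

A client that receives 141 can therefore treat it the same way a shell would treat a pipeline member killed by SIGPIPE.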
|
|
We don't want duplicate messages in results overviews, either.
|
|
Parallelism and interactivity with pager + SIGPIPE need work;
but results are shown and phrase search works without shell
users having to apply Xapian quoting rules on top of standard
shell quoting.
|