Date        Commit message
2021-02-05 lei_xsearch: drop unused imports
Reaping is handled by the parent PublicInbox::IPC, and we have no business using PublicInbox::Import since LeiXSearch won't write to git directly (it will write via LeiStore).
2021-02-05 lei_query: remove unneeded dwaitpid import
All process management is handled elsewhere.
2021-02-05 lei q: eliminate $not_done temporary git dir hack
Another step towards simplifying lei internals. None of our current uses of ->wq_do involve FD passing, and the plan is to only rely on FD passing between lei-daemon and lei(1). Internally, it ought to be possible to order lei-daemon's internal steps so that FD passing isn't needed.
2021-02-05 eml: handle warning ignores for lei
There's nothing we can do about bad emails in our search results, so quiet things down and don't fight the MUA for the terminal.
2021-02-05 lei q: reinstate early MUA spawn for Maildir
Once all files are written, we can use utime() to poke Maildirs to wake up MUAs that fail to account for nanosecond timestamp resolution.
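The utime() poke can be sketched in Python rather than lei's Perl. The sketch assumes a standard Maildir with `new/` and `cur/` subdirectories. `poke_maildir` is a hypothetical name, and touching the directories with the current time is our assumption about what "poke" means here.

```python
import os


def poke_maildir(maildir: str) -> None:
    # Hypothetical helper, not lei's actual code: once the last message file
    # has been written, touch the Maildir's "new" and "cur" directories so a
    # polling MUA sees an updated directory mtime and rescans.
    for sub in ("new", "cur"):
        path = os.path.join(maildir, sub)
        if os.path.isdir(path):
            os.utime(path)  # times=None: set atime and mtime to "now"
```

The helper runs once, after all writes, rather than once per message. This keeps the number of wakeups the MUA sees to a minimum.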
2021-02-05 lei q: only start pager if output is to stdout
No need to be starting a pager if we're writing to a regular file.
2021-02-05 lei q: reorder internals to reduce FD passing
While FD passing is critical for script/lei <=> lei-daemon, lei-daemon doesn't need to use it internally if FDs are created in the proper order before forking.
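Creating FDs before forking can be illustrated with a minimal POSIX-only Python sketch (`run_writer_child` is a hypothetical name). Because the pipe exists before `fork()`, the child simply inherits the write end. Nothing has to be sent over a Unix socket with SCM_RIGHTS.

```python
import os


def run_writer_child(payload: bytes) -> bytes:
    # Hypothetical demo: the pipe is created *before* fork(), so both
    # processes already hold both ends and no FD passing is required.
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:  # child: write everything, then exit immediately
        status = 1
        try:
            os.close(r)
            view = memoryview(payload)
            while view:
                view = view[os.write(w, view):]  # handle short writes
            status = 0
        finally:
            os._exit(status)  # skip the parent's cleanup handlers
    os.close(w)  # parent drops its write end so read() can see EOF
    chunks = []
    while True:
        buf = os.read(r, 4096)
        if not buf:
            break
        chunks.append(buf)
    os.close(r)
    os.waitpid(pid, 0)  # reap the child to avoid a zombie
    return b"".join(chunks)
```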
2021-02-05 ipc: localize fields assignment
We don't want circular references giving surprising behavior during worker exit.
2021-02-05 lei q: delay worker spawn
Now that --stdin support is sorted, we can delay spawning workers until we know the query is ready to run.
2021-02-04 t/lei: skip "lei q" tests on missing dependencies
... for now. It's probably possible to just use send() and recv() without CMSG_* eventually.
2021-02-04 pkt_op: do not exit subroutine via "next"
"next" apparently doesn't work in "do {} while" loops, so just use "while", as it makes no difference here.
2021-02-04 wwwaltid: add missing word to instructions
2021-02-04 www: call curl with -d '' in the altid instructions
Nginx doesn't appear to be happy with just -XPOST, so use -d '' to avoid potential confusion about why the instructions aren't working. cf. commit 533e1234bc03a1ca8754d249aa8c2ce157e26780 (lei_xsearch: use curl -d '' for nginx compatibility, 2021-01-24)
2021-02-04 tests: guard against missing DBD::SQLite
The features we use for SharedKV could probably be implemented with GDBM_File or SDBM_File, but that doesn't seem worth it at the moment since we depend on SQLite elsewhere.
2021-02-04 doc: update dependencies (+Storable, Data::Dumper)
The new IPC stuff doesn't work without Storable or Sereal. Storable has been part of the standard library since Perl 5.8, so we'll put a hard dependency on it for distros that package it separately. Data::Dumper is also part of the standard library; PublicInbox::MboxReader uses it, and it's frequently useful during development. We'll also trim down INSTALL for standard library modules so it's hopefully less daunting for new users. Development dependencies are now noted in HACKING. Email::MIME is only used for maintainer tests, so it's only documented in HACKING.
2021-02-04 spawn: merge common C code together
There'll probably be more things that work on both GNU and *BSD systems and don't need separate strings.
2021-02-04 HACKING: use "just-ahead-of-time" to describe Inline::C
Inline::C works during module load time, so "just-ahead-of-time" is a better description of it than "just-in-time". I don't think "JAOT" is a well-known enough acronym, so it's worth spelling it out.
2021-02-04 lei q: support reading queries from stdin
This will be useful on shared machines when a user doesn't want search queries visible to other users looking at ps(1) output or similar.
2021-02-04 lei: use sleep(1) loop for infinite sleep
Perl may internally race and miss signals due to a lack of self-pipe / eventfd / signalfd / EVFILT_SIGNAL usage. While our event loop paths avoid these problems by using signalfd or EVFILT_SIGNAL, these sleep() calls are not within the event loop.
2021-02-04 lei add-external: completion for existing URL basenames
Given the presence of one external on a certain host or prefix path, it's logical that other inboxes would share a common prefix. For bash users, attempt to complete that prefix using bash's "-o nospace" option.
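Prefix completion can be sketched in Python under stated assumptions. It is not lei's implementation, and `prefix_candidates` is a hypothetical name. For each known external, it offers the URL's parent "directory" as a completion, so a second inbox on the same host only needs its basename typed.

```python
from typing import List


def prefix_candidates(word: str, externals: List[str]) -> List[str]:
    # Hypothetical completer: turn "https://host/inbox/" into "https://host/"
    # and offer it when it starts with what the user has typed so far.
    out: List[str] = []
    for url in externals:
        parent = url.rstrip("/").rsplit("/", 1)[0] + "/"
        if parent.startswith(word) and parent not in out:
            out.append(parent)
    return out
```

The candidates end in "/". Registering the completion with `complete -o nospace` keeps bash from appending a space, so the user can type the inbox name right away.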
2021-02-04 lei: help starts pager
Because some commands have many options, which take up multiple screens.
2021-02-04 lei: complete basenames for include|exclude|only
This will make it even easier for RSI-afflicted users to use, since many externals may share a common prefix.
2021-02-04 lei q: -I/--exclude/--only support globs and basenames
We can do basename matching when it's unambiguous. Since '*?[]' characters are rare in URLs and pathnames, we'll do glob matching by default and support a (curl-inspired) --globoff/-g option to disable globbing. Also fix --exclude while we're at it.
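The matching rules can be sketched in Python, with `match_externals` as a hypothetical name. Three details are our assumptions rather than lei's documented behavior:

- the order of checks is exact match, then glob, then basename;
- the `globoff` flag stands in for --globoff/-g;
- an ambiguous basename yields no match (lei may report an error instead).

```python
import fnmatch
import os.path
from typing import List


def match_externals(pattern: str, externals: List[str],
                    globoff: bool = False) -> List[str]:
    # 1. An exact match always wins.
    if pattern in externals:
        return [pattern]
    # 2. Unless globbing is disabled, treat '*?[' as wildcards.
    #    fnmatch's '*' also matches '/', which suits whole URLs.
    if not globoff and any(c in pattern for c in "*?["):
        return [e for e in externals if fnmatch.fnmatchcase(e, pattern)]
    # 3. Basename fallback, accepted only if exactly one external matches.
    hits = [e for e in externals if os.path.basename(e.rstrip("/")) == pattern]
    return hits if len(hits) == 1 else []
```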
2021-02-04 lei: propagate curl errors, improve internal consistency
IO::Uncompress::Gunzip seems to be losing $? when closing PublicInbox::ProcessPipe. To work around this, do a synchronous waitpid ourselves to force proper $? reporting. Update tests to use the new --only feature for testing invalid URLs. This improves internal code consistency by having {pkt_op} parse the same ASCII-only protocol script/lei understands. We no longer pass {sock} to worker processes at all, further reducing FD pressure on per-user limits.
2021-02-04 lei: err: avoid uninitialized variable warnings
2021-02-04 pkt_op: rely on DS::in_loop global
No reason to check for $lei->{oneshot} here.
2021-02-04 lei: further reduce lei2mail FD pressure
We don't need to be sending errors directly to the client, but can instead go through lei-daemon or the top-level one-shot process.
2021-02-04 lei: reduce FD pressure from lei2mail worker
lei2mail doesn't need stdin anymore, so we can use the [0] slot for $not_done keepalive purposes.
2021-02-03 lei q: support --jobs [SEARCHERS],[WRITERS]
This comma-delimited parameter allows controlling the number of lei_xsearch and lei2mail worker processes. With the change to make IPC wq_* work use the event loop, it's now safe to run fewer worker processes for searching with no risk of deadlocks. MAX_PER_HOST isn't configurable yet for remote hosts, and maybe it shouldn't be due to potential for abuse.
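A parser for the `--jobs [SEARCHERS],[WRITERS]` value can be sketched in Python. `parse_jobs` is hypothetical. Two behaviors are assumptions, not lei's documented rules: an empty side means "use the default" (returned as None), and counts below 1 are rejected.

```python
from typing import Optional, Tuple


def _count(field: str) -> Optional[int]:
    if field == "":
        return None  # assumption: omitted side means "use the default"
    n = int(field)  # raises ValueError on non-numeric input
    if n < 1:  # assumption: at least one worker of each kind
        raise ValueError(f"worker count must be positive: {field!r}")
    return n


def parse_jobs(arg: str) -> Tuple[Optional[int], Optional[int]]:
    # "4,2" -> (4, 2); "4" -> (4, None); ",2" -> (None, 2)
    searchers, _, writers = arg.partition(",")
    return _count(searchers), _count(writers)
```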
2021-02-03 lei q: tidy up progress reporting
We won't be reporting progress when output is going to stdout, since it can clutter up the terminal unless stderr != stdout, which probably isn't worth checking. We'll also use a more agnostic mset_progress, which may make it easier to support worker-less invocations.
2021-02-03 lei_overview: avoid unnecessary {l2m} delete
We may reuse these objects in the non-worker code paths.
2021-02-03 doc: lei-q: note "-a" and link to Xapian QueryParser
"-a" is supported by mairix, too. We should also note somewhere the query parsing features supported by Xapian.
2021-02-03 lei_xsearch: ensure curl.err and tail(1) cleanup happens
We can safely rely on exit(0) here when interacting with curl(1) and git(1), unlike query workers which hit Xapian directly, where some badness happens when hit with a signal while retrieving an mset.
2021-02-03 pktop: fix potential undefined var
In case we have other bugs in our code.
2021-02-03 cmd_ipc4: fix comments and formatting
2021-02-03 lei q: do not leave temporary files after oneshot exit
Avoid on-stack shortcuts which may prevent destructors from firing since we're not inside the event loop. We'll also tidy up the unlink mechanism in LeiOverview while we're at it.
2021-02-03 lib: explicitly distinguish oneshot use
The daemon must not be fooled into thinking it's in oneshot mode after a lei client disconnects and erases {sock}.
2021-02-03 lei_xsearch: truncate curl stderr after reading it
We may have further URLs to read in that process, so ensure we don't end up having tail send stale data.
2021-02-03 lei: q: shell completion for --(include|exclude|only)
Because .onion URL names are long!
2021-02-03 lei: complete: do not complete non-arg options w/ help text
Some of our command-line switches take no arguments, and need no completion for those arguments.
2021-02-03 lei q: support --only, --include and --exclude
-I is short for --include since it's standard for C compilers (along with Perl and Ruby). There are no single-character shortcuts for --exclude or --only, since I don't expect --exclude to be used very often and --only is already short (and will support shell completion).
2021-02-03 lei q: emit progress and counting via PktOp
It can be confusing when "lei q" finishes writing to a Maildir|mbox and the user can't tell whether it did anything. So show some per-external progress and stats. These can be disabled via the new --quiet/-q switch. We differ slightly from mairix(1) here, as we use stderr instead of stdout for reporting totals (and we support parallel queries from various sources).
2021-02-03 lei_query: default to 10000 messages as documented
Otherwise, we were only getting 50 matches without (-t) thread expansion.
2021-02-03 lei: switch to use SEQPACKET socketpair instead of pipe
This will allow us to use larger messages and to accumulate progress reports in the main daemon.
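The property that matters here can be shown in a small Python sketch. It assumes Linux, since AF_UNIX SOCK_SEQPACKET isn't available on every platform (e.g. macOS). Each send() on a SEQPACKET socket arrives as exactly one recv() record. With a pipe, POSIX only guarantees atomic writes of up to PIPE_BUF bytes, so larger messages from concurrent writers could interleave.

```python
import socket


def seqpacket_demo():
    # Two messages sent back-to-back are still received as two separate
    # records; a pipe (byte stream) could hand them back merged.
    a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_SEQPACKET)
    try:
        a.send(b"progress 1/3")
        a.send(b"progress 2/3")
        return [b.recv(4096), b.recv(4096)]
    finally:
        a.close()
        b.close()
```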
2021-02-01 doc: note optional BSD::Resource use
We've actually been capable of using this since 2019(*) in our spawn code for PSGI limiters. And it's been used since 2016 in our tests. It's a dependency of SpamAssassin, and Danga::Socket used it, too. (*) commit 721368cd04bfbd03c0d9173fff633ae34f16409a ("spawn: support RLIMIT_CPU, RLIMIT_DATA and RLIMIT_CORE")
2021-02-01 lei: avoid ETOOMANYREFS, cleanup imports
As with PublicInbox::IPC, we'll attempt to bump RLIMIT_NOFILE and transparently work around ETOOMANYREFS. If that fails, we'll give the user a hint to bump RLIMIT_NOFILE, since ETOOMANYREFS is an uncommon error which users may be unfamiliar with. Found while stress testing for segfaults.
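The two steps described can be sketched in Python, with hypothetical helper names. First, try raising the soft RLIMIT_NOFILE toward the hard limit. Second, if ETOOMANYREFS still shows up, turn it into a hint. As far as we know, Linux checks the number of FDs a user has in flight over Unix sockets against RLIMIT_NOFILE, which is why bumping the limit helps. The 4096 fallback for an unlimited hard limit is an arbitrary assumption.

```python
import errno
import resource


def bump_nofile() -> int:
    # Try to raise the soft RLIMIT_NOFILE; return the soft limit afterwards.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return soft
    # An "unlimited" hard limit can't be used directly as a NOFILE soft
    # limit on Linux, so pick a modest finite target (assumption: 4096).
    target = hard if hard != resource.RLIM_INFINITY else max(soft, 4096)
    if soft < target:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
        except (ValueError, OSError):
            pass  # keep the old limit; the hint below still applies
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]


def etoomanyrefs_hint(err: OSError) -> str:
    # Turn an unfamiliar errno into an actionable message.
    if err.errno == errno.ETOOMANYREFS:
        return ("ETOOMANYREFS: too many FDs in flight; "
                "try raising RLIMIT_NOFILE (ulimit -n)")
    return str(err)
```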
2021-02-01 ds: next_tick: avoid $_ in top-level loop iterator
$_ at the top of a potentially deep stack below may cause surprising behavior, as I experienced with ExtSearchIdx. In the future, we'll limit our $_ usage to easily-auditable bits (e.g. map, grep, and small for loops).
2021-02-01 ds: guard against stack-not-refcounted quirk of Perl 5
The Perl 5 stack is weakly-referenced for performance reasons. This means it's possible for items in the stack to be freed while executing further down the stack. In lei (and perhaps public-facing read-only daemons in the future), we'll fork and call PublicInbox::DS->Reset in the child process. This causes %DescriptorMap to be clobbered, allowing the $DescriptorMap{$fd} arg to be freed inside the child process. When Carp::confess or Carp::longmess is called to generate a backtrace, it may access the @DB::args array. This array access is not protected by reference counting and is known to cause segfaults and other weird errors. While the caller of an unnecessary Carp::confess may be eliminated in a future commit, we can't guarantee our dependencies will be free of @DB::args access attempts in the future. So guard against this Perl 5 quirk by defensively bumping the refcount of any object we call ->event_step on. cf. https://rt.perl.org/Public/Bug/Display.html?id=131046 https://github.com/Perl/perl5/issues/15928
2021-02-01 import: reap git-config(1) synchronously
This avoids a zombie if another step of the event loop takes too long.
2021-02-01 sharedkv: do not set cache_size by default
These DBs will probably be too small to be worth increasing the cache size of.