path: root/lib
Date Commit message
2021-02-05 lei q: reorder internals to reduce FD passing
While FD passing is critical for script/lei <=> lei-daemon, lei-daemon doesn't need to use it internally if FDs are created in the proper order before forking.
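The idea of creating FDs before forking can be illustrated with a minimal Python sketch (not the project's Perl code). The function name `child_reports_via_inherited_pipe` is hypothetical. Because the pipe exists before `fork()`, the child simply inherits it, and no descriptor has to be passed over a socket.

```python
import os


def child_reports_via_inherited_pipe(message: bytes) -> bytes:
    """Fork a child that writes `message` to a pipe created before the fork."""
    # Create the pipe BEFORE forking: both processes then share the FDs
    # by inheritance, so no SCM_RIGHTS-style FD passing is required.
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:  # child process
        try:
            os.close(r)
            os.write(w, message)
        finally:
            os._exit(0)  # never return into the parent's code path
    # parent process
    os.close(w)  # close our copy so read() sees EOF when the child exits
    chunks = []
    while True:
        buf = os.read(r, 4096)
        if not buf:
            break
        chunks.append(buf)
    os.close(r)
    os.waitpid(pid, 0)  # reap the child to avoid a zombie
    return b"".join(chunks)
```

FD passing remains necessary between unrelated processes (such as a client and an already-running daemon), since those cannot inherit descriptors from each other.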
2021-02-05 ipc: localize fields assignment
We don't want circular references giving surprising behavior during worker exit.
2021-02-05 lei q: delay worker spawn
Now that --stdin support is sorted, we can delay spawning workers until we know the query is ready-to-run.
2021-02-04 pkt_op: do not exit subroutine via "next"
"next" apparently doesn't work in "do {} while" loops, so just use "while" as it makes no difference, here.
2021-02-04 wwwaltid: add missing word to instructions
2021-02-04 www: call curl with -d '' in the altid instructions
Nginx doesn't appear to be happy with just -XPOST, so use -d '' to avoid potential confusion about why the instructions aren't working. cf. commit 533e1234bc03a1ca8754d249aa8c2ce157e26780 (lei_xsearch: use curl -d '' for nginx compatibility, 2021-01-24)
2021-02-04 tests: guard against missing DBD::SQLite
The features we use for SharedKV could probably be implemented with GDBM_File or SDBM_File, but that doesn't seem worth it at the moment since we depend on SQLite elsewhere.
2021-02-04 spawn: merge common C code together
There'll probably be more things which work on both GNU and *BSD systems which we don't need separate strings for.
2021-02-04 lei q: support reading queries from stdin
This will be useful on shared machines when a user doesn't want search queries visible to other users looking at the ps(1) output or similar.
2021-02-04 lei: use sleep(1) loop for infinite sleep
Perl may internally race and miss signals due to a lack of self-pipe / eventfd / signalfd / EVFILT_SIGNAL usage. While our event loop paths avoid these problems by using signalfd or EVFILT_SIGNAL, these sleep() calls are not within the event loop.
2021-02-04 lei add-external: completion for existing URL basenames
Given the presence of one external on a certain host or prefix path, it's logical other inboxes would share a common prefix. For bash users, attempt to complete that using the "-o nospace" option of bash.
2021-02-04 lei: help starts pager
Because some commands have many options which take up multiple screens.
2021-02-04 lei: complete basenames for include|exclude|only
This will make it even easier for RSI-afflicted users to use, since many externals may share a common prefix.
2021-02-04 lei q: -I/--exclude/--only support globs and basenames
We can do basename matching when it's unambiguous. Since '*?[]' characters are rare in URLs and pathnames, we'll do glob matching by default and support a (curl-inspired) --globoff/-g option to disable globbing. And fix --exclude while we're at it.
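The matching rules described above can be sketched roughly as follows. This is an illustrative Python sketch under stated assumptions, not lei's actual implementation: the function `match_externals`, its precedence order (exact match, then glob, then basename), and the example URLs are all hypothetical.

```python
import posixpath
from fnmatch import fnmatchcase

GLOB_CHARS = set("*?[]")


def match_externals(arg, externals, globoff=False):
    """Return the entries of `externals` selected by `arg`.

    Assumed precedence (hypothetical): exact match, then glob matching
    (unless `globoff` is set), then an unambiguous basename match.
    """
    if arg in externals:
        return [arg]
    if not globoff and GLOB_CHARS & set(arg):
        # fnmatchcase() lets "*" match "/" too, which suits URLs
        return [e for e in externals if fnmatchcase(e, arg)]
    hits = [e for e in externals
            if posixpath.basename(e.rstrip("/")) == arg]
    if len(hits) == 1:
        return hits
    # zero hits or several hits: refuse to guess
    raise ValueError(f"{arg!r} matched {len(hits)} externals")
```

With externals `https://example.com/git/` and `https://example.com/meta/`, the argument `meta` selects the second entry by basename, while `https://example.com/*` selects both by glob. Passing `globoff=True` makes a `*` be taken literally.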
2021-02-04 lei: propagate curl errors, improve internal consistency
IO::Uncompress::Gunzip seems to be losing $? when closing PublicInbox::ProcessPipe. To work around this, do a synchronous waitpid ourselves to force proper $? reporting, and update tests to use the new --only feature for testing invalid URLs. This improves internal code consistency by having {pkt_op} parse the same ASCII-only protocol script/lei understands. We no longer pass {sock} to worker processes at all, further reducing FD pressure on per-user limits.
2021-02-04 lei: err: avoid uninitialized variable warnings
2021-02-04 pkt_op: rely on DS::in_loop global
No reason to check for $lei->{oneshot} here.
2021-02-04 lei: further reduce lei2mail FD pressure
We don't need to be sending errors directly to the client, but instead go through lei-daemon or the top-level one-shot process.
2021-02-04 lei: reduce FD pressure from lei2mail worker
lei2mail doesn't need stdin anymore, so we can use the [0] slot for the $not_done keepalive purposes.
2021-02-03 lei q: support --jobs [SEARCHERS],[WRITERS]
This comma-delimited parameter allows controlling the number of lei_xsearch and lei2mail worker processes. With the change to make IPC wq_* work use the event loop, it's now safe to run fewer worker processes for searching with no risk of deadlocks. MAX_PER_HOST isn't configurable yet for remote hosts, and maybe it shouldn't be due to potential for abuse.
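A comma-delimited value with two optional fields, as in `[SEARCHERS],[WRITERS]`, could be parsed along these lines. This Python sketch is only an illustration: `parse_jobs`, its default handling, and the rule that counts must be at least 1 are assumptions, not taken from lei.

```python
def parse_jobs(value, default_searchers, default_writers):
    """Parse "[SEARCHERS],[WRITERS]" into a (searchers, writers) tuple.

    Either field may be empty, in which case the matching default is used.
    """
    searchers, _sep, writers = value.partition(",")

    def to_count(field, default):
        if field == "":
            return default
        count = int(field)  # raises ValueError on non-numeric input
        if count < 1:
            raise ValueError(f"worker count must be >= 1, got {count}")
        return count

    return (to_count(searchers, default_searchers),
            to_count(writers, default_writers))
```

Under these assumptions, `4,2` yields 4 searchers and 2 writers, `,2` keeps the default searcher count, and a bare `4` keeps the default writer count.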
2021-02-03 lei q: tidy up progress reporting
We won't be reporting progress when output is going to stdout since it can clutter up the terminal unless stderr != stdout, which probably isn't worth checking. We'll also use a more agnostic mset_progress which may make it easier to support worker-less invocations.
2021-02-03 lei_overview: avoid unnecessary {l2m} delete
We may reuse these objects in the non-worker code paths.
2021-02-03 lei_xsearch: ensure curl.err and tail(1) cleanup happens
We can safely rely on exit(0) here when interacting with curl(1) and git(1), unlike query workers which hit Xapian directly, where some badness happens when hit with a signal while retrieving an mset.
2021-02-03 pktop: fix potential undefined var
In case we have other bugs in our code.
2021-02-03 cmd_ipc4: fix comments and formatting
2021-02-03 lei q: do not leave temporary files after oneshot exit
Avoid on-stack shortcuts which may prevent destructors from firing since we're not inside the event loop. We'll also tidy up the unlink mechanism in LeiOverview while we're at it.
2021-02-03 lib: explicitly distinguish oneshot use
The daemon must not be fooled into thinking it's in oneshot after a lei client disconnects and erases {sock}.
2021-02-03 lei_xsearch: truncate curl stderr after reading it
We may have further URLs to read in that process, so ensure we don't end up having tail send stale data.
2021-02-03 lei: q: shell completion for --(include|exclude|only)
Because .onion URL names are long!
2021-02-03 lei: complete: do not complete non-arg options w/ help text
Some of our command-line switches take no arguments, and need no completion for those arguments.
2021-02-03 lei q: support --only, --include and --exclude
-I is short for --include since it's standard for C compilers (along with Perl and Ruby). There are no single-character shortcuts for --exclude or --only, since I don't expect --exclude to be used very often and --only is already short (and will support shell completion).
2021-02-03 lei q: emit progress and counting via PktOp
Sometimes it can be confusing for "lei q" to finish writing to a Maildir|mbox and not know if it did anything. So show some per-external progress and stats. These can be disabled via the new --quiet/-q switch. We differ slightly from mairix(1) here, as we use stderr instead of stdout for reporting totals (and we support parallel queries from various sources).
2021-02-03 lei_query: default to 10000 messages as documented
Otherwise, we were only getting 50 matches without (-t) thread expansion.
2021-02-03 lei: switch to use SEQPACKET socketpair instead of pipe
This will allow us to use larger messages and do progress reporting to accumulate in the main daemon.
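For readers unfamiliar with SOCK_SEQPACKET, the following Python sketch (not the project's Perl code; `seqpacket_roundtrip` is a hypothetical name) shows the property that makes it convenient for progress messages: each send arrives as exactly one message on the other end, with its boundaries preserved, unlike the byte stream of a pipe. It assumes a platform where AF_UNIX supports SOCK_SEQPACKET, such as Linux.

```python
import socket


def seqpacket_roundtrip(messages):
    """Send each message over a SEQPACKET socketpair and read them back.

    Assumes the messages are non-empty and small enough to fit in the
    default socket buffers, so the sends do not block.
    """
    a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_SEQPACKET)
    try:
        for msg in messages:
            a.send(msg)
        # One recv() returns exactly one message, never a partial or
        # merged one, as long as the buffer size is large enough.
        return [b.recv(65536) for _ in messages]
    finally:
        a.close()
        b.close()
```

With a pipe, the reader would instead have to frame messages itself, and atomic writes are only guaranteed up to PIPE_BUF bytes.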
2021-02-01 ds: next_tick: avoid $_ in top-level loop iterator
$_ at the top of a potentially deep stack below may cause surprising behavior as I experienced with ExtSearchIdx. In the future, we'll limit our $_ usage to easily-auditable bits (e.g. map, grep, and small for loops).
2021-02-01 ds: guard against stack-not-refcounted quirk of Perl 5
The Perl 5 stack is weakly-referenced for performance reasons. This means it's possible for items in the stack to be freed while executing further down the stack. In lei (and perhaps public-facing read-only daemons in the future), we'll fork and call PublicInbox::DS->Reset in the child process. This causes %DescriptorMap to be clobbered, allowing the $DescriptorMap{$fd} arg to be freed inside the child process. When Carp::confess or Carp::longmess is called to generate a backtrace, it may access the @DB::args array. This array access is not protected by reference counting and is known to cause segfaults and other weird errors. While the caller of an unnecessary Carp::confess may be eliminated in a future commit, we can't guarantee our dependencies will be free of @DB::args access attempts in the future. So guard against this Perl 5 quirk by defensively bumping the refcount of any object we call ->event_step on. cf. https://rt.perl.org/Public/Bug/Display.html?id=131046 https://github.com/Perl/perl5/issues/15928
2021-02-01 import: reap git-config(1) synchronously
This avoids a zombie if another step of the event loop takes too long.
2021-02-01 sharedkv: do not set cache_size by default
These DBs will probably be too small to be worth increasing the cache size of.
2021-02-01 lei_to_mail: reduce spew on Maildir removal
At most, we'll only warn once per worker when a Maildir disappears from under us. We'll also use the '!' OpPipe to note the exceptional condition, and use '|' to SIGPIPE so it'll be a bit easier for hackers to remember.
2021-02-01 sharedkv: use lock_for_scope_fast
This allows us to avoid repeated open() and close() syscalls and speeds up the new xt/stress-sharedkv.t maintainer test by roughly 7%.
2021-02-01 lei: increase initial timeout
PublicInbox::Listener unconditionally sets O_NONBLOCK upon accept(), so we need a larger timeout under heavy load since there's no "dataready" accept filter on the listener. With O_NONBLOCK already set, we don't have to set it at ->event_step_init.
2021-02-01 sharedkv: lock and explicitly disconnect {dbh}
It may be possible for updates or changes to be uncommitted until disconnect, so we'll use flock() as we do elsewhere to avoid the polling retry behavior of SQLite. We also need to clear CachedKids before disconnecting to avoid warnings like: ->disconnect invalidates 1 active statement handle (either destroy statement handles or call finish on them before disconnecting)
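The general pattern of holding a blocking flock() around SQLite access and disconnecting explicitly before releasing the lock might look like this in Python. It is a loose sketch rather than a port of SharedKV: the helper names, the `.flock` lock-file suffix, and the `kv` table layout are all hypothetical, and the CachedKids step is specific to Perl's DBI and has no counterpart here.

```python
import fcntl
import os
import sqlite3
import tempfile


def locked_set(db_path, key, value):
    """Store key=value while holding an exclusive flock()."""
    with open(db_path + ".flock", "a") as lock_fh:
        # Blocks in the kernel until the lock is free, instead of
        # relying on SQLite's busy-retry polling.
        fcntl.flock(lock_fh, fcntl.LOCK_EX)
        dbh = sqlite3.connect(db_path)
        try:
            dbh.execute(
                "CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)")
            dbh.execute(
                "INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (key, value))
            dbh.commit()
        finally:
            dbh.close()  # disconnect explicitly, before the lock is dropped
    # closing lock_fh releases the flock


def locked_get(db_path, key):
    """Fetch the value for `key` under a shared flock(), or None."""
    with open(db_path + ".flock", "a") as lock_fh:
        fcntl.flock(lock_fh, fcntl.LOCK_SH)
        dbh = sqlite3.connect(db_path)
        try:
            dbh.execute(
                "CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)")
            row = dbh.execute(
                "SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
        finally:
            dbh.close()
    return row[0] if row else None


def roundtrip_demo():
    """Write twice and read back in a throwaway directory."""
    with tempfile.TemporaryDirectory() as tmpdir:
        db_path = os.path.join(tmpdir, "kv.sqlite3")
        locked_set(db_path, "k", "v1")
        locked_set(db_path, "k", "v2")
        return locked_get(db_path, "k")
```

Closing the connection while the lock is still held means no other lock holder can observe the database with uncommitted changes pending.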
2021-02-01 lei: deep clone {ovv} for l2m workers
We don't need to send the temporary xsearch {git} object over to workers, just the directory name.
2021-02-01 lei_xsearch: load PublicInbox::Smsg
We use $smsg->populate here, so ensure it's loaded although PublicInbox::Search currently loads it.
2021-02-01 lei_dedupe: use Digest::SHA
While it's loaded by ContentHash, we use Digest::SHA directly in this package for smsg and OID-only deduplication.
2021-02-01 lei: keep $lei around until workers are reaped
This prevents SharedKV->DESTROY in lei-daemon from triggering before DB handles are closed in lei2mail processes. The {each_smsg_not_done} pipe was not sufficient in this case: that gets closed at the end of the last git_to_mail callback invocation.
2021-02-01 sharedkv: release {dbh} before rmtree
This may be needed to avoid warnings/errors when operating in single process mode in the future.
2021-02-01 lei: remove syslog dependency
It doesn't seem necessary now that we redirect and write stuff to errors.log, which gets checked every run.
2021-02-01 ipc: more helpful ETOOMANYREFS error messages
ETOOMANYREFS is probably an unfamiliar error to most users, so give a hint about RLIMIT_NOFILE. This can be hit on my system running 3 simultaneous queries with my system default limit of 1024. There's also no need to import Errno constants for uncommon errors, so we'll stop using Errno, here. We'll also try to bump RLIMIT_NOFILE as much as possible to avoid this error.
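Bumping RLIMIT_NOFILE "as much as possible" generally means raising the soft limit to the hard limit, which an unprivileged process is allowed to do. A minimal Python sketch of that idea follows. The function name `bump_nofile` is hypothetical, and this is not the project's Perl code.

```python
import resource


def bump_nofile():
    """Raise the soft RLIMIT_NOFILE to the hard limit if possible.

    Returns the soft limit in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # Compare with != rather than <, since an unlimited hard limit
    # (RLIM_INFINITY) may be represented as a negative number.
    if soft != hard:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
        except (ValueError, OSError):
            # Some platforms reject certain values; keep the old limit.
            pass
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]
```

The change only affects the calling process and any children it starts afterwards.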
2021-02-01 lei: remove SIGPIPE handler
It doesn't save us any code, and the action-at-a-distance element was making it confusing to track down actual problems. Another potential problem was keeping references alive too long. So do like we would a C100K server and check every write while still ensuring lei(1) exits with a proper SIGPIPE iff needed.