about summary refs log tree commit homepage
path: root/script
DateCommit message (Collapse)
2021-01-12cmd_ipc: send FDs with buffer payload
For another step in in syscall reduction, we'll support transferring 3 FDs and a buffer with a single sendmsg/recvmsg syscall using Socket::MsgHdr if available. Beyond script/lei itself, this will be used for internal IPC between search backends (perhaps with SOCK_SEQPACKET). There's a chance this could make it to the public-facing daemons, too. This adds an optional dependency on the Socket::MsgHdr package, available as libsocket-msghdr-perl on Debian-based distros (but not CentOS 7.x and FreeBSD 11.x, at least). Our Inline::C version in PublicInbox::Spawn remains the last choice for script/lei due to the high startup time, and IO::FDPass remains supported for non-Debian distros. Since the socket name prefix changes from 3 to 4, we'll also take this opportunity to make the argv+env buffer transfer less error-prone by relying on argc instead of designated delimiters.
2021-01-12ds: block signals when reaping
This lets us call dwaitpid long before a process exits and not have to wait around for it. This is advantageous for lei where we can run dwaitpid on the pager as soon as we spawn it, instead of waiting for a client socket to go away on DESTROY.
2021-01-04lei: prefer IO::FDPass over our Inline::C recv_3fds
While our recv_3fds() implementation is more efficient syscall-wise, loading Inline takes nearly 50ms on my machine even after Inline::C memoizes the build. The current ~20ms in the fast path is barely acceptable to me, and 50ms would be unusable. Eventually, script/lei may invoke tcc(1) or cc(1) directly in the fast path, but it needs @INC for the slow path, at least. We'll encode the number of FDs into the socket name allow parallel installations, for now.
2021-01-03send and receive all 3 FDs at once
We'll always be transferring stdin, stdout, and stderr together for lei. Perhaps I lack imagination or foresight, but I can't think of a reason to send more or less FDs.
2021-01-03spawn: support send_fd+recv_fd w/o IO::FDPass
IO::FDPass may be an extra installation burden I don't want to impose on users. We only support Linux and *BSDs, however.
2021-01-01update copyrights for 2021
Using "make update-copyrights" after setting GNULIB_PATH in my config.mak
2021-01-01on_destroy: support PID owner guard
Since we'll be forking for Xapian indexing and maybe other places, having a simple guard in place to ensure OnDestroy doesn't unexpectedly unlink files or similar is a safer option.
2021-01-01lei: avoid Spawn package when starting daemon
Spawn was designed to speed up process spawning inside long-lived daemons with largish memory usage. It does not help for short-lived scripts which only exist to start and connect to a daemon. This change actually speeds up initial lei startup from ~190ms to ~140ms(!). Normal usage once the daemon is running is unaffected, at <20ms for help text. While we're in the area, simplify Cwd error message generation, too.
2021-01-01syscall: SFD_NONBLOCK can be a constant, again
Since Perl exposes O_NONBLOCK as a constant, we can safely make SFD_NONBLOCK a constant, too. This is not the case for SFD_CLOEXEC, since O_CLOEXEC is not exposed by Perl despite being used internally in the interpreter.
2021-01-01init: remove embedded UnlinkMe package
PublicInbox::OnDestroy can do the same thing
2021-01-01spawn: move run_die here from PublicInbox::Import
It seems like a more logical place for it, but we'll favor the newly-added xsys_e() in tests for BAIL_OUT use.
2020-12-31Merge remote-tracking branch 'origin/master' into lorelei
* origin/master: (58 commits) ds: flatten + reuse @events, epoll_wait style fixes ds: simplify EventLoop implementation check defined return value for localized slurp errors import: check for git->qx errors, clearer return values git: qx: avoid extra "local" for scalar context case search: remove {mset} option for ->mset method search: remove pointless {relevance} setting miscsearch: take reopen from Search and use it extsearch: unconditionally reopen on access extindex: allow using --all without EXTINDEX_DIR extindex: add undocumented --no-scan switch extindex: enable autoflush on STDOUT/STDERR extindex: various --watch signal handling fixes extindex: --watch for inotify-based updates eml: fix undefined vars on <Perl 5.28 t/config: test --get-urlmatch for git <2.26 default to CORE::warn in $SIG{__WARN__} handlers inbox: name variable for values loop iterator inboxidle: avoid needless syscalls on refresh inboxidle: clue users into resolving ENOSPC from inotify ...
2020-12-28check defined return value for localized slurp errors
Reading from regular files (even on STDIN) can fail when dealing with flakey storage.
2020-12-27extindex: allow using --all without EXTINDEX_DIR
If "--all" is specified to index all inboxes, implicitly choose the configured [extindex "all"] external index since "--all" is incompatible with specifying inbox directories on the command-line.
2020-12-27extindex: add undocumented --no-scan switch
This makes diagnosing --watch problems easier when there's 50K inboxes by avoiding the lengthy scan (which is the reason --watch exists in the first place).
2020-12-27extindex: enable autoflush on STDOUT/STDERR
With --watch, the output may be redirected to a pipe or socket which Perl may decide to buffer. Ensure Perl doesn't buffer these outputs since they can provide real-time status updates in response to signals or FS activity.
2020-12-27extindex: --watch for inotify-based updates
This reuses existing InboxIdle infrastructure to update external indices based on per-inbox updates. This is an alternative to auto-updating external indices via the -index command and also works with existing uses of -mda and public-inbox-watch. Using inotify (or EVFILT_VNODE) allows watching thousands of inboxes without having to scan every single one at every invocation. This is especially beneficial in cases where an external index is not writable to the users writing to per-inbox indices.
2020-12-26index: filter out indexlevel=basic from extindex
extindex users will likely want to use indexlevel=basic for per-inbox indices, however extindex itself doesn't support basic index level (yet?). Let's ensure we don't trip up extindex users who specify "-L basic" on the -index command-line.
2020-12-26index: fix --no-fsync flag propagation to extindex
Negation in flag names are confusing, but trying to deviate from the DB_NO_SYNC name used by Xapian is also confusing.
2020-12-26index: do not attach inbox to extindex unless updated
We'll count the number of log changes (regardless of index or unindex) and only attach inboxes to ExtSearchIdx objects when they get new work. We'll also reduce lock bouncing and only update external indices after all per-inbox indexing is done. This also updates existing v2 indexing/unindexing callers to be more consistent and ensures unindex log entries update per-inbox last commit information.
2020-12-26index: disable --fast-noop on --reindex
These options make no sense when used together, just inform the user and move on since it's probably harmless to continue.
2020-12-26init: use the return value of rel2abs_collapsed
:x Fixes: 9fcce78e40b0a7c6 ("script/public-inbox-*: favor caller-provided pathnames")
2020-12-25index: support --fast-noop / -F switch
Note: I'm not sure if it's worth documenting and supporting this long-term. We can can avoid taking locks for invocations of "index --all" and rely on high-resolution ctime (struct timespec st_ctim) comparisons of msgmap.sqlite3 and the packed-refs + refs/heads directory of the newest epoch. This cuts public-inbox-index invocations with "--all --no-update-extindex -L basic" down from 0.92s to 0.31s. The change with "-L medium" or "-L full" and (default) non-zero jobs is even more drastic, reducing a 12-13s no-op invocation down to the same 0.31s
2020-12-25inboxwritable: delay umask_prepare calls
This simplifies all ->with_umask callers and opens the door for further optimizations to delay/elide process spawning.
2020-12-24index: update [extindex "all"] by default, support -E
In most cases, this ensures users will only have to opt-in to using -extindex once and won't have to issue extra commands to keep external indices up-to-date when using public-inbox-index. Since we support arbitrary numbers of external indices for ease-of-development, we'll support repeating "-E" ("--update-extindex=") in case users want to test changes in parallel.
2020-12-21use rel2abs_collapsed when loading Inbox objects
We need to canonicalize paths for inboxes which do not have a newsgroup defined, otherwise ->eidx_key matches can fail in unexpected ways.
2020-12-20script/public-inbox-*: favor caller-provided pathnames
We'll try to avoid calling Cwd::abs_path and use File::Spec->rel2abs instead, since abs_path will resolve symlinks the user specified on the command-line. Unfortunately, ->rel2abs still leaves "/.." and "/../" uncollapsed, so we still need to fall back to Cwd::abs_path in those cases. While we are at it, we'll also resolve inboxdir from deep inside v2 directories instead of misdetecting them as v1 bare git repos. In any case, stop matching directories by name and instead rely on the unique combination of st_dev + st_ino on stat() as we started doing in the extindex code.
2020-12-19lei: drop $SIG{__DIE__}, add oneshot fallbacks
We'll force stdout+stderr to be a pipe the spawning client controls, thus there's no need to lose error reporting by prematurely redirecting stdout+stderr to /dev/null. We can now rely exclusively on OnDestroy to write to syslog() on uncaught die failures. Also support falling back to oneshot mode on socket and cwd failures, since some commands may still be useful if the current working directory goes missing :P
2020-12-19lei: micro-optimize startup time
We'll use lower-level Socket and avoid IO::Socket::UNIX, use Cwd::fastcwd(*), avoid IO::Handle->autoflush by using the select operator, and reuse buffer for reading the socket while avoiding unnecessary $/ localization in a tiny script. All these things adds up to ~5-10 ms savings on my loaded system. (*) caveats about fastcwd won't apply since lei won't work in removed directories.
2020-12-19rename LeiDaemon package to PublicInbox::LEI
"LEI" is an acronym, and ALL CAPS is consistent with existing PublicInbox::{IMAP,HTTP,NNTP,WWW} naming for top-level modules, 3 of 4 old ones which deal directly with sockets and requests.
2020-12-19lei: refine help/option parsing, implement "init"
There's a bunch of work in here as the foundations are being fleshed out. One of the UI/UX is to make it easy to keep built-in help and shell completions consistent
2020-12-19lei: use spawn (vfork + execve) for lazy start
This allows us to rely on FD_CLOEXEC being set on pipes from prove(1), so forgetting `daemon-stop' won't cause tests to hang. Unfortunately, daemon tests will be slower with this.
2020-12-19lei: FD-passing and IPC basics
The start of lei, a Local Email Interface. It'll support a daemon via FD passing to avoid startup time penalties if IO::FDPass is installed, but fall back to a slow one-shot mode if not. Compared to traditional socket daemon, FD passing should allow us to eventually do stuff like run "git show" and still have proper terminal support for pager and color.
2020-12-09extindex: do not use current dir like -index does
At least not for resolving inboxes, since there's no good way for a user to specify what is an inbox or extindex directory without a command-line switch. Instead of changing the -extindex command, we change the -index command internals to rely on the new {-use_cwd} flag to avoid internal use of negation, since double-negatives and the like are confusing to me.
2020-12-09rename {pi_config} fields to {pi_cfg}
{pi_config} may be confused with the documented `PI_CONFIG' environment variable, and we'll favor vowel-removal to be consistent with our usage of object references. The `pi_' prefix may stay in some places, for now; since a separate namespace may come into this codebase for local/private client-tooling. For InboxIdle, we'll also remove an invalid comment about holding a reference to the PublicInbox::Config object, too.
2020-11-29extindex: support `--gc' to remove dead inboxes
Inboxes may be removed or newsgroups renamed over time. Introduce a switch to do garbage collection and eliminate stale search and xref3 results based on inboxes which remain in the config file. This may also fixup stale results leftover from any bugs which may leave stale data around. This is also useful in case a clumsy BOFH (me :P) is swapping between several PI_CONFIGs and accidentally indexed a bunch of inboxes they didn't intend to.
2020-11-28*index: more consistent graceful shutdown checks
v1 and v2 inbox indexing now supports graceful shutdown checks just like ExtSearchIdx. Additionally, we'll consistently perform quit checks at the top of loops for consistency. Interaction with the --xapian-only and --sequential-shard options are a bit lacking, and will warn the user to use "--reindex --xapian-only" to fix.
2020-11-24miscidx: put grokmirror manifest entries in Xapian docdata
This should make it possible for us quickly generate manifest.js.gz files with less random I/O and process spawning in the WWW code.
2020-11-19extindex: remove skip-docdata option
Since extindex is entirely new, it doesn't have backwards compatibility concerns and never stored docdata, anyways.
2020-11-08extindex: fix --batch-size support
Calling PublicInbox::Admin::index_prepare is required for --batch-size (k|m|g) modifiiers and indexBatchSize in the config file. Otherwise, the default 1m batch size stuck and led to unexpectedly bad performance on a machine which could index v2 inboxes faster with larger batch sizes.
2020-11-08extindex: SIGUSR1 supports checkpoint
Matching the behavior of git-fast-import(1), we'll allow a user to send SIGUSR1 to checkpoint over.sqlite3 and Xapian.
2020-11-08v2writable: more accurate {current_info} warnings/progress
With async git blob retrievals, the OID being enqueued and the OID being processed can be totally unrelated and misleading. We'll also prefix $INBOX_DIR for v2, and not just the epoch since we could be indexing multiple inboxes via both -index and -extindex.
2020-11-08extsearch: rename -eindex to -extindex
Upon "eindex" rhymes with "reindex", which could be confusing; so name the command and config prefix to use "extindex" which is hopefully less confusing.
2020-11-07index: eindex wiring
This doesn't do anything, yet, but it will once the rest of the eindex stuff works.
2020-11-07script: add preliminary eindex implementation
Not documented, yet, but it runs...
2020-09-19gcf2: wire up read-only daemons and rm -gcf2 script
It seems easiest to have a singleton Gcf2Client client object per daemon worker for all inboxes to use. This reduces overall FD usage from pipes. The `public-inbox-gcf2' command + manpage are gone and a `$^X' one-liner is used, instead. This saves inodes for internal commands and hopefully makes it easier to avoid mismatched PERL5LIB include paths (as noticed during development :x). We'll also make the existing cat-file process management infrastructure more resilient to BOFHs on process killing sprees (or in case our libgit2-based code fails on us). (Rare) PublicInbox::WWW PSGI users NOT using public-inbox-httpd won't automatically benefit from this change, and extra configuration will be required (to be documented later).
2020-09-19gcf2: require git dir with OID
This amortizes the cost of recreating PublicInbox::Gcf2 objects when alternates change in v2 all.git.
2020-09-19gcf2: transparently retry on missing OID
Since we only get OIDs from trusted local data sources (over.sqlite3), we can safely retry within the -gcf2 process without worry about clients spamming us with requests for invalid OIDs and triggering reopens.
2020-09-19add gcf2 client and executable script
This should be able to replace multiple `git cat-file' for blob retrieval, but adjustments may be needed.
2020-09-14sigfd: fix typos and scoping on systems w/o epoll+kqueue
Unfortunately, I'm not sure how easy catching these at compile-time, is. Prototypes do not seem to check these at compile time when crossing packages (not even with exported subroutines).