Date | Commit message |
|
Perl chdir() automatically does fchdir(2) if given a file
or directory handle since 5.8.8/5.10.0, so we can safely
rely on it given our 5.10.1+ requirement.
This means we no longer have to waste several milliseconds
loading the Cwd.so and making stat() calls to ensure
ENV{PWD} is correct and usable in the server. It also lets
us work in directories that are no longer accessible via
pathname.
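
A rough illustration of the idea, sketched in Python rather than Perl
(the helper name run_in_saved_dir is hypothetical and not part of this
codebase): once a directory handle is held, fchdir(2) can return to it
without any pathname lookup.

```python
import os


def run_in_saved_dir(dir_fd, func):
    """Switch to the directory behind dir_fd, call func(), then switch back."""
    # Remember where we are by handle, not by name.
    prev_fd = os.open(".", os.O_RDONLY | os.O_DIRECTORY)
    try:
        os.fchdir(dir_fd)  # fchdir(2): no pathname lookup involved
        return func()
    finally:
        os.fchdir(prev_fd)
        os.close(prev_fd)
```

Because only the descriptor is used, this should keep working even if
the directory was renamed after it was opened.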
|
|
It turns out "local" did not take effect in the way we used it:
http://nntp.perl.org/group/perl.perl5.porters/258784
<CAHhgV8hPbcmkzWizp6Vijw921M5BOXixj4+zTh3nRS9vRBYk8w@mail.gmail.com>
Fortunately, none of the old use cases seem affected, unlike the
previous lei change to ensure consistent SIGPIPE handling.
|
|
The new test ensures consistency between oneshot and
client/daemon users. Cancelling an in-progress result now also
stops xsearch workers to avoid wasted CPU and I/O.
Note the lei->atfork_child_wq usage changes; they work around
a bug in Perl 5: http://nntp.perl.org/group/perl.perl5.porters/258784
<CAHhgV8hPbcmkzWizp6Vijw921M5BOXixj4+zTh3nRS9vRBYk8w@mail.gmail.com>
This switches the internal protocol to SOCK_SEQPACKET AF_UNIX
sockets so that messages from the daemon telling the client to
run the pager or kill/exit the client script are never merged.
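
A minimal Python sketch of the property being relied on (the function
name is made up for illustration, and SOCK_SEQPACKET over AF_UNIX is
assumed to be available, as it is on Linux): each recv() returns
exactly one record, even when several were queued back-to-back.

```python
import socket


def seqpacket_roundtrip(messages):
    """Send each message on a SOCK_SEQPACKET pair; return what recv() yields."""
    a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_SEQPACKET)
    try:
        for m in messages:
            a.send(m)
        # All messages are queued by now, yet each recv() returns one record.
        return [b.recv(4096) for _ in messages]
    finally:
        a.close()
        b.close()
```

With SOCK_STREAM, the same two sends could arrive merged in a single
recv(), which is the case this change avoids.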
|
|
It's easier to make the code more generic by transferring
all four FDs (std(in|out|err) + socket) instead of omitting
stdin.
We'll be reading from stdin on some imports, and possibly
outputting to stdout, so omitting stdin now would needlessly
complicate things.
The differences with IO::FDPass "1" code paths and the "4"
code paths used by Inline::C and Socket::MsgHdr are far too
much to support and test at the moment.
|
|
While most single keystrokes work fine when the pager is
launched from the background daemon, Ctrl-C and WINCH can cause
strangeness when connected to the wrong terminal.
|
|
Using kill(2) is too dangerous since extremely long
queries may allow the original PID of the aborted lei(1)
client process to be recycled by a new process. It would
be bad if the lei_xsearch worker process issued a kill
on the wrong process.
So just rely on sending the exit message via socket.
|
|
Actually, sending 4 FDs will be useful for lei internal xsearch
work once we start accepting input from stdin. It won't be used
with the lightweight lei(1) client, however.
For WWW (eventually), a single FD may be enough.
|
|
For another step in syscall reduction, we'll support
transferring 3 FDs and a buffer with a single sendmsg/recvmsg
syscall using Socket::MsgHdr if available.
Beyond script/lei itself, this will be used for internal IPC
between search backends (perhaps with SOCK_SEQPACKET). There's
a chance this could make it to the public-facing daemons, too.
This adds an optional dependency on the Socket::MsgHdr package,
available as libsocket-msghdr-perl on Debian-based distros
(but not CentOS 7.x and FreeBSD 11.x, at least).
Our Inline::C version in PublicInbox::Spawn remains the last
choice for script/lei due to the high startup time, and
IO::FDPass remains supported for non-Debian distros.
Since the socket name prefix changes from 3 to 4, we'll also
take this opportunity to make the argv+env buffer transfer less
error-prone by relying on argc instead of designated delimiters.
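
To make the two ideas concrete, here is a Python sketch (not the Perl
code from this change). The wire format is an assumption made up for
illustration: NUL-separated fields with argc first. It uses
socket.send_fds/recv_fds from Python 3.9+, which wrap a single
sendmsg/recvmsg with SCM_RIGHTS.

```python
import os
import socket


def pack_request(argv, env):
    """Frame argv+env as NUL-separated fields, prefixed with argc."""
    fields = [str(len(argv))] + list(argv) + [f"{k}={v}" for k, v in env.items()]
    return "\0".join(fields).encode()


def unpack_request(buf):
    """Inverse of pack_request: argc says where argv ends and env begins."""
    fields = buf.decode().split("\0")
    argc = int(fields[0])
    argv = fields[1:1 + argc]
    env = dict(f.split("=", 1) for f in fields[1 + argc:])
    return argv, env


def send_request(sock, fds, argv, env):
    # One sendmsg(2) carries the buffer plus the FDs as ancillary data.
    socket.send_fds(sock, [pack_request(argv, env)], fds)


def recv_request(sock, maxfds=3, bufsize=65536):
    # One recvmsg(2) returns both the buffer and the received FDs.
    buf, fds, _flags, _addr = socket.recv_fds(sock, bufsize, maxfds)
    argv, env = unpack_request(buf)
    return fds, argv, env
```

Counting arguments up front means no argv element can be mistaken for
a delimiter between argv and env.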
|
|
This lets us call dwaitpid long before a process exits
and not have to wait around for it.
This is advantageous for lei where we can run dwaitpid on the
pager as soon as we spawn it, instead of waiting for a client
socket to go away on DESTROY.
|
|
While our recv_3fds() implementation is more efficient
syscall-wise, loading Inline takes nearly 50ms on my machine
even after Inline::C memoizes the build. The current ~20ms in
the fast path is barely acceptable to me, and 50ms would be
unusable.
Eventually, script/lei may invoke tcc(1) or cc(1) directly in
the fast path, but it needs @INC for the slow path, at least.
We'll encode the number of FDs into the socket name to allow
parallel installations, for now.
|
|
We'll always be transferring stdin, stdout, and stderr together
for lei. Perhaps I lack imagination or foresight, but I can't
think of a reason to send more or fewer FDs.
|
|
IO::FDPass may be an extra installation burden I don't want to
impose on users. We only support Linux and *BSDs, however.
|
|
Using "make update-copyrights" after setting GNULIB_PATH in my
config.mak
|
|
Since we'll be forking for Xapian indexing and maybe
other places, having a simple guard in place to ensure
OnDestroy doesn't unexpectedly unlink files or similar
is a safer option.
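
The guard amounts to remembering the creating PID. A Python sketch of
the concept (this is not the PublicInbox::OnDestroy implementation; the
class below is a stand-in and relies on CPython's reference counting to
trigger __del__ promptly):

```python
import os


class OnDestroy:
    """Run a callback on destruction, but only in the creating process."""

    def __init__(self, callback, *args):
        self.pid = os.getpid()  # remember who created us
        self.callback = callback
        self.args = args

    def __del__(self):
        # A forked child inherits a copy of this object; skip the callback
        # there so it cannot unlink files the parent still needs.
        if os.getpid() == self.pid:
            self.callback(*self.args)
```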
|
|
Spawn was designed to speed up process spawning inside
long-lived daemons with largish memory usage. It does not help
for short-lived scripts which only exist to start and connect to
a daemon.
This change actually speeds up initial lei startup from
~190ms to ~140ms(!). Normal usage once the daemon is running
is unaffected, at <20ms for help text.
While we're in the area, simplify Cwd error message generation,
too.
|
|
Since Perl exposes O_NONBLOCK as a constant, we can safely make
SFD_NONBLOCK a constant, too. This is not the case for
SFD_CLOEXEC, since O_CLOEXEC is not exposed by Perl despite
being used internally in the interpreter.
|
|
PublicInbox::OnDestroy can do the same thing
|
|
It seems like a more logical place for it, but we'll favor the
newly-added xsys_e() in tests for BAIL_OUT use.
|
|
* origin/master: (58 commits)
ds: flatten + reuse @events, epoll_wait style fixes
ds: simplify EventLoop implementation
check defined return value for localized slurp errors
import: check for git->qx errors, clearer return values
git: qx: avoid extra "local" for scalar context case
search: remove {mset} option for ->mset method
search: remove pointless {relevance} setting
miscsearch: take reopen from Search and use it
extsearch: unconditionally reopen on access
extindex: allow using --all without EXTINDEX_DIR
extindex: add undocumented --no-scan switch
extindex: enable autoflush on STDOUT/STDERR
extindex: various --watch signal handling fixes
extindex: --watch for inotify-based updates
eml: fix undefined vars on <Perl 5.28
t/config: test --get-urlmatch for git <2.26
default to CORE::warn in $SIG{__WARN__} handlers
inbox: name variable for values loop iterator
inboxidle: avoid needless syscalls on refresh
inboxidle: clue users into resolving ENOSPC from inotify
...
|
|
Reading from regular files (even on STDIN) can fail
when dealing with flakey storage.
|
|
If "--all" is specified to index all inboxes, implicitly choose
the configured [extindex "all"] external index since "--all" is
incompatible with specifying inbox directories on the
command-line.
|
|
This makes diagnosing --watch problems easier when there's
50K inboxes by avoiding the lengthy scan (which is the reason
--watch exists in the first place).
|
|
With --watch, the output may be redirected to a pipe or socket
which Perl may decide to buffer. Ensure Perl doesn't buffer
these outputs since they can provide real-time status updates
in response to signals or FS activity.
|
|
This reuses existing InboxIdle infrastructure to update external
indices based on per-inbox updates. This is an alternative to
auto-updating external indices via the -index command and also
works with existing uses of -mda and public-inbox-watch.
Using inotify (or EVFILT_VNODE) allows watching thousands of
inboxes without having to scan every single one at every
invocation.
This is especially beneficial in cases where an external index
is not writable to the users writing to per-inbox indices.
|
|
extindex users will likely want to use indexlevel=basic for
per-inbox indices; however, extindex itself doesn't support basic
index level (yet?). Let's ensure we don't trip up extindex
users who specify "-L basic" on the -index command-line.
|
|
Negation in flag names is confusing, but trying to deviate from
the DB_NO_SYNC name used by Xapian is also confusing.
|
|
We'll count the number of log changes (regardless of index or
unindex) and only attach inboxes to ExtSearchIdx objects when
they get new work. We'll also reduce lock bouncing and only
update external indices after all per-inbox indexing is done.
This also updates existing v2 indexing/unindexing callers
to be more consistent and ensures unindex log entries update
per-inbox last commit information.
|
|
These options make no sense when used together; just inform the
user and move on since it's probably harmless to continue.
|
|
Fixes: 9fcce78e40b0a7c6 ("script/public-inbox-*: favor caller-provided pathnames")
|
|
Note: I'm not sure if it's worth documenting and supporting this
long-term.
We can avoid taking locks for invocations of "index --all"
and rely on high-resolution ctime (struct timespec st_ctim)
comparisons of msgmap.sqlite3 and the packed-refs + refs/heads
directory of the newest epoch.
This cuts public-inbox-index invocations with
"--all --no-update-extindex -L basic" down from 0.92s to 0.31s.
The change with "-L medium" or "-L full" and (default) non-zero
jobs is even more drastic, reducing a 12-13s no-op invocation
down to the same 0.31s.
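
A Python sketch of such a lock-free check. The function names and the
direction of the comparison are assumptions for illustration; only the
use of nanosecond-resolution ctime comes from the description above.

```python
import os


def newer_ctime(path, reference):
    """True if path's inode changed after reference's (nanosecond ctime)."""
    return os.stat(path).st_ctime_ns > os.stat(reference).st_ctime_ns


def index_is_current(msgmap, git_paths):
    """Hypothetical no-op check: nothing in git_paths changed after msgmap."""
    return not any(newer_ctime(p, msgmap) for p in git_paths)
```

Here git_paths would stand for packed-refs and refs/heads of the
newest epoch, and msgmap for msgmap.sqlite3.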
|
|
This simplifies all ->with_umask callers and opens the
door for further optimizations to delay/elide process spawning.
|
|
In most cases, this ensures users will only have to opt-in to
using -extindex once and won't have to issue extra commands
to keep external indices up-to-date when using
public-inbox-index.
Since we support arbitrary numbers of external indices for
ease-of-development, we'll support repeating "-E"
("--update-extindex=") in case users want to test changes in
parallel.
|
|
We need to canonicalize paths for inboxes which do not have
a newsgroup defined, otherwise ->eidx_key matches can fail
in unexpected ways.
|
|
We'll try to avoid calling Cwd::abs_path and use
File::Spec->rel2abs instead, since abs_path will resolve
symlinks the user specified on the command-line.
Unfortunately, ->rel2abs still leaves "/.." and "/../"
uncollapsed, so we still need to fall back to Cwd::abs_path in
those cases.
While we are at it, we'll also resolve inboxdir from deep inside
v2 directories instead of misdetecting them as v1 bare git
repos.
In any case, stop matching directories by name and instead rely
on the unique combination of st_dev + st_ino on stat() as we
started doing in the extindex code.
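
For illustration, matching by st_dev + st_ino looks roughly like this
in Python (dir_id and same_dir are hypothetical helper names):

```python
import os


def dir_id(path):
    """Identity of a directory, independent of how it was named."""
    st = os.stat(path)  # follows symlinks, like stat(2)
    return (st.st_dev, st.st_ino)


def same_dir(a, b):
    """True if both pathnames refer to the same directory."""
    return dir_id(a) == dir_id(b)
```

Symlinked names and paths containing "/../" compare equal this way
without rewriting the pathname the user typed.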
|
|
We'll force stdout+stderr to be a pipe the spawning client
controls, thus there's no need to lose error reporting by
prematurely redirecting stdout+stderr to /dev/null.
We can now rely exclusively on OnDestroy to write to syslog() on
uncaught die failures.
Also support falling back to oneshot mode on socket and cwd
failures, since some commands may still be useful if the current
working directory goes missing :P
|
|
We'll use lower-level Socket and avoid IO::Socket::UNIX,
use Cwd::fastcwd(*), avoid IO::Handle->autoflush by
using the select operator, and reuse buffer for reading
the socket while avoiding unnecessary $/ localization
in a tiny script.
All these things add up to ~5-10 ms savings on my loaded
system.
(*) caveats about fastcwd won't apply since lei won't work
in removed directories.
|
|
"LEI" is an acronym, and ALL CAPS is consistent with existing
PublicInbox::{IMAP,HTTP,NNTP,WWW} naming for top-level modules,
3 of the 4 old ones deal directly with sockets and requests.
|
|
There's a bunch of work in here as the foundations are being
fleshed out. One of the UI/UX goals is to make it easy to keep
built-in help and shell completions consistent.
|
|
This allows us to rely on FD_CLOEXEC being set on pipes
from prove(1), so forgetting `daemon-stop' won't cause
tests to hang.
Unfortunately, daemon tests will be slower with this.
|
|
The start of lei, a Local Email Interface. It'll support a
daemon via FD passing to avoid startup time penalties if
IO::FDPass is installed, but fall back to a slow one-shot mode
if not.
Compared to a traditional socket daemon, FD passing should allow
us to eventually do stuff like run "git show" and still have
proper terminal support for pager and color.
|
|
At least not for resolving inboxes, since there's no good way
for a user to specify what is an inbox or extindex directory
without a command-line switch.
Instead of changing the -extindex command, we change the -index
command internals to rely on the new {-use_cwd} flag to avoid
internal use of negation, since double-negatives and the like
are confusing to me.
|
|
{pi_config} may be confused with the documented `PI_CONFIG'
environment variable, and we'll favor vowel-removal to be
consistent with our usage of object references.
The `pi_' prefix may stay in some places for now, since a
separate namespace may come into this codebase for local/private
client-tooling.
For InboxIdle, we'll also remove an invalid comment about
holding a reference to the PublicInbox::Config object, too.
|
|
Inboxes may be removed or newsgroups renamed over time.
Introduce a switch to do garbage collection and eliminate stale
search and xref3 results based on inboxes which remain in the
config file.
This may also fixup stale results leftover from any bugs which
may leave stale data around.
This is also useful in case a clumsy BOFH (me :P) is swapping
between several PI_CONFIGs and accidentally indexed a bunch of
inboxes they didn't intend to.
|
|
v1 and v2 inbox indexing now supports graceful shutdown checks
just like ExtSearchIdx. Additionally, we'll consistently
perform quit checks at the top of loops.
Interaction with the --xapian-only and --sequential-shard
options is a bit lacking; we'll warn the user to use
"--reindex --xapian-only" to fix.
|
|
This should make it possible for us to quickly generate
manifest.js.gz files with less random I/O and process
spawning in the WWW code.
|
|
Since extindex is entirely new, it doesn't have backwards
compatibility concerns and never stored docdata, anyways.
|
|
Calling PublicInbox::Admin::index_prepare is required for
--batch-size (k|m|g) modifiers and indexBatchSize in the config
file. Otherwise, the default 1m batch size stuck and led
to unexpectedly bad performance on a machine which could index
v2 inboxes faster with larger batch sizes.
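
A Python sketch of what handling the (k|m|g) modifiers involves. The
function name is hypothetical, and treating the units as powers of
1024 (as git does for its size options) is an assumption:

```python
import re

# Assumed multipliers: powers of 1024, following git's convention.
_UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_batch_size(val):
    """Parse '1m', '512k', '2g' or a bare integer into a byte count."""
    m = re.fullmatch(r"(\d+)([kmg]?)", val.strip().lower())
    if m is None:
        raise ValueError(f"invalid batch size: {val!r}")
    return int(m.group(1)) * _UNITS[m.group(2)]
```

Skipping this step leaves a string such as "8m" unparsed, so the
default stays in effect, which matches the symptom described above.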
|
|
Matching the behavior of git-fast-import(1), we'll allow a user
to send SIGUSR1 to checkpoint over.sqlite3 and Xapian.
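
The usual shape of such a handler, sketched in Python with a
hypothetical Indexer class (not the Perl implementation): the signal
handler only sets a flag, and the indexing loop acts on it at a safe
point.

```python
import os
import signal


class Indexer:
    """Hypothetical indexing loop that checkpoints when SIGUSR1 arrives."""

    def __init__(self):
        self.want_checkpoint = False
        self.checkpoints = 0
        self.done = 0
        signal.signal(signal.SIGUSR1, self._on_usr1)

    def _on_usr1(self, signum, frame):
        # Keep the handler trivial; the real work happens in the loop.
        self.want_checkpoint = True

    def checkpoint(self):
        self.checkpoints += 1  # stand-in for committing SQLite and Xapian
        self.want_checkpoint = False

    def run(self, items):
        for _item in items:
            if self.want_checkpoint:  # checked at the top of each iteration
                self.checkpoint()
            self.done += 1  # stand-in for indexing one message
```

A user would then run something like "kill -USR1 $PID" against the
indexing process to force a checkpoint mid-run.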
|
|
With async git blob retrievals, the OID being enqueued and the
OID being processed can be totally unrelated and misleading.
We'll also prefix $INBOX_DIR for v2, and not just the epoch
since we could be indexing multiple inboxes via both -index
and -extindex.
|
|
"eindex" rhymes with "reindex", which could be confusing;
so name the command and config prefix to use "extindex" which
is hopefully less confusing.
|