Date | Commit message |
|
This may save us a small bit of startup time since there's
fewer args and opcodes should be smaller.
|
|
While lei is intended for non-public mail and runs umask(077)
by default, externals are one area which can safely defer to
the user's umask.
Instead of sending it unconditionally with every command, only
have lei-daemon request it when necessary.
|
|
This allows -fetch to work out-of-the-box when using the
grokmirror 2.x default of "_grokmirror".
|
|
This mode only checks history for missed/stale messages
and doesn't attempt to reindex messages which are already
indexed.
|
|
This lets administrators reindex specific time ranges
according to git "approxidate" formats. These arguments
are passed directly to underlying git-log(1) invocations
and may still reach into old epochs.
Since these options rely on git committer dates (which we infer
from the most recent Received: header), they are not guaranteed
to be strictly tied to git history and it's possible to
over/under-reindex some messages. It's probably not a major
problem in practice, though; reindexing a few extra messages
is generally harmless aside from some extra device wear.
Since this currently relies on git-log, these options do not
affect -extindex, yet.
|
|
Since signalfd is often combined with our event loop, give it a
convenient API and reduce the code duplication required to use it.
EventLoop is replaced with ::event_loop to allow consistent
parameter passing and avoid needlessly passing the package name
on stack.
We also avoid exporting SFD_NONBLOCK since it's the only flag we
support. There's no sense in having the memory overhead of a
constant function when it's in cold code.
|
|
Partial (v2) clones should be a useful addition for users wanting
to conserve storage while having fast access to recent messages.
Continuing work started in 876e74283ff3 (fetch: ignore
non-writable epoch dirs, 2021-09-17), this creates bare,
read-only epoch git repos. These git repos have the remotes
pre-configured, but do not fetch any objects.
The goal is to allow users to set the writable bit on a
previously-skipped epoch and start fetching it.
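The opt-in described above can be sketched in Python (not the
project's Perl). The helper name, the "N.git" directory naming and
the use of the owner write bit as the test are assumptions made
for illustration only:

```python
import os
import stat
import tempfile


def writable_epochs(git_dir):
    """Return epoch repo names under git_dir whose owner write bit is set."""
    found = []
    for name in sorted(os.listdir(git_dir)):
        path = os.path.join(git_dir, name)
        if not (name.endswith(".git") and os.path.isdir(path)):
            continue
        # Assumption: "writable" means the owner write bit on the directory.
        if os.stat(path).st_mode & stat.S_IWUSR:
            found.append(name)
    return found


def demo():
    """Create one read-only and one writable epoch, then opt in to the first."""
    with tempfile.TemporaryDirectory() as top:
        for name, mode in (("0.git", 0o555), ("1.git", 0o755)):
            os.mkdir(os.path.join(top, name))
            os.chmod(os.path.join(top, name), mode)
        before = writable_epochs(top)
        os.chmod(os.path.join(top, "0.git"), 0o755)  # opt in to epoch 0
        after = writable_epochs(top)
        return before, after
```

A fetch loop built on such a helper would skip epoch 0 until the
user sets its write bit.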
Shell completion support may not be necessary given how short
the epoch ranges are, here.
Cc: Luis Chamberlain <mcgrof@kernel.org>
Link: https://public-inbox.org/meta/20210917002204.GA13112@dcvr/T/#u
|
|
Redundant code is noise and therefore confusing :<
|
|
It looks dumb, but I'm not about to take a runtime penalty to
use signalfd|EVFILT_SIGNAL, here, either.
|
|
Sometimes it's useful to pause an expensive query or
refresh-mail-sync to do something else. While lei-daemon and
lei/store can't be paused since they're shared across clients,
per-invocation WQ workers can be paused safely using the
unblockable SIGSTOP.
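A minimal Python sketch of pausing and resuming a worker with
SIGSTOP and SIGCONT. The sleeping child stands in for a
per-invocation worker; pause(), resume() and demo() are
hypothetical names, not part of lei:

```python
import os
import signal
import subprocess
import sys


def pause(pid):
    """Stop a worker; SIGSTOP cannot be caught, blocked, or ignored."""
    os.kill(pid, signal.SIGSTOP)


def resume(pid):
    """Let a stopped worker continue where it left off."""
    os.kill(pid, signal.SIGCONT)


def demo():
    # Stand-in worker: a child that just sleeps.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(30)"])
    try:
        pause(child.pid)
        # WUNTRACED makes waitpid report stopped (not only exited) children.
        _, status = os.waitpid(child.pid, os.WUNTRACED)
        stopped = os.WIFSTOPPED(status)
        resume(child.pid)
        return stopped
    finally:
        child.kill()
        child.wait()
```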
While we're at it, drop the ETOOMANYREFS hint since it
hasn't been a problem since we drastically reduced FD passing
early in development.
|
|
While my MUA also runs umask(077) unconditionally, not all
MUAs do. Additionally, a pager may support writing its buffer
to disk, so ensure anything else we spawn has umask(077).
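As an illustration only (Python, not the project's Perl), a child
can be spawned with a restrictive umask so that any file it writes
is private to the user. spawn_private() is a hypothetical helper,
and the umask= argument assumes Python 3.9 or later:

```python
import os
import stat
import subprocess
import tempfile


def spawn_private(argv, **kwargs):
    """Run argv with umask 077 so files it creates are owner-only."""
    return subprocess.run(argv, umask=0o077, check=True, **kwargs)


def demo():
    """Have a shell child create a file and return its permission bits."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "saved-buffer")
        # The shell stands in for a pager or MUA saving data to disk.
        spawn_private(["sh", "-c", ': > "$1"', "sh", path])
        return stat.S_IMODE(os.stat(path).st_mode)
```

The parent's own umask is left untouched; only the spawned
process inherits the restrictive value.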
|
|
Because make(1), git(1), tar(1) all support -C in this form, as
do our newer commands such as lei, public-inbox-{clone,fetch}.
|
|
As noted in the new manpage entry, this is useful for avoiding
public-inbox-index invocations when there's nothing to update.
We use 127 to match "grok-pull", and also because it doesn't
conflict with any of the current curl(1) exit codes.
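A wrapper script could use that exit status roughly as follows.
This is a sketch in Python: fetch_then_index() is a hypothetical
helper, and both command lines are supplied by the caller rather
than assumed here:

```python
import subprocess

NOTHING_TO_UPDATE = 127  # exit status described in the text above


def fetch_then_index(fetch_argv, index_argv):
    """Run the fetch command; only run the indexer if something changed."""
    rc = subprocess.call(fetch_argv)
    if rc == NOTHING_TO_UPDATE:
        return "skipped"
    if rc != 0:
        raise RuntimeError(f"fetch failed with exit status {rc}")
    subprocess.check_call(index_argv)
    return "indexed"
```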
|
|
IMHO, this greatly improves code sharing and organization
between v2, extindex, and lei/store. Common git-related
logic for these is lightly-refactored and easier to reason
about.
The impetus for this big change was to ensure inboxes
created+managed by public-inbox-{clone,fetch} could have
alternates and configs set up properly without depending on
SQLite (via V2Writable). This change does that while
making old code shorter and better factored.
|
|
"Unnamed repository" for v1 inboxes was misleading, and having a
non-existent description for v2 was equally annoying, so set a
short description based on the primary address.
We remove descriptions when setting up new test inboxes to
preserve the behavior of the t/lei-mirror.t test case.
|
|
Setting up and maintaining git-only mirrors of v2 inboxes is
complex since multiple commands are required to clone and fetch
into epochs.
Unlike grokmirror, these commands do not require any
configuration. Instead, they rely on existing git config files
and work like "git clone --mirror" and "git fetch",
respectively.
Like grokmirror, they use manifest.js.gz, but only on a
per-inbox basis so users won't have to clone every inbox of a
large instance nor edit config files to include/exclude inboxes
they're interested in.
|
|
lei shouldn't become unusable if a config file is invalid.
Instead, show the "git config" stderr and attempt to continue
gracefully.
Reported-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Link: https://public-inbox.org/meta/20210910141157.6u5adehpx7wftkor@meerkat.local/
|
|
None of our code elsewhere accounts for non-*nix pathnames and
it's not worth our time to start. So stop wasting CPU cycles
giving the illusion that we'd care about non-*nix pathnames.
|
|
For users using the native TLS functionality of -httpd (instead
of using nginx + Plack::Middleware::ReverseProxy),
psgi.url_scheme=http was wrong and would lead to improper
redirects.
|
|
Boost relies on knowledge of all inboxes in a given config file
to work properly. So while we support indexing a subset of
inboxes, we must still account for boost in inboxes we're not
indexing. So split internal inbox groups into "known" and
"active", where previously we only cared for inboxes which were
being actively indexed.
Furthermore, boost checks need to be applied when a
message arrives in different inboxes across multiple
invocations.
Reported-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Link: https://public-inbox.org/meta/20210802204058.vscbxs5q7xyolyu2@nitro.local/
|
|
Since extindex uses Xapian shards in a similar way to
v2 inboxes, we'll support -xcpdb (reshard+upgrade) and
-compact all the same to give admins tuning+upgrade
options.
|
|
ECONNRESET should be rare on a private local socket, and if
we hit it, it's because we're hitting the listen() limit.
|
|
The underscore variant was never documented and maintaining
the difference between the command-line and internal hash
is not worth it.
|
|
It turns out `--fixed-value' is a relatively new git-config(1)
feature in git 2.30+ (December 2020). So use the quotemeta
perlop for now since it seems compatible-enough for POSIX ERE
used by git.
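The same idea, sketched in Python rather than Perl: backslash-escape
ERE metacharacters so a literal value can be used where
git-config(1) expects a value pattern. ere_quote() is a
hypothetical helper; unlike Perl's quotemeta, it escapes only the
characters listed in ERE_SPECIAL:

```python
import re

# Characters with special meaning in POSIX extended regular expressions.
ERE_SPECIAL = frozenset(".[]()*+?{}|^$\\")


def ere_quote(literal):
    """Return literal with every ERE metacharacter backslash-escaped."""
    return "".join("\\" + ch if ch in ERE_SPECIAL else ch for ch in literal)


def matches_literally(literal, candidate):
    """Check the quoted pattern with Python's re as a stand-in for ERE."""
    return re.fullmatch(ere_quote(literal), candidate) is not None
```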
|
|
Sometimes I just want to dedupe a single Message-ID to test
something, and this lets me do it.
This patch appears to do what it's supposed to. But it also
appears to be finding duplicates that were previously missed.
That's a good thing, but I wish I understood what seems to be
fixed :x
I'm not sure why the previous ExtSearchIdx.pm (blob 357312b8)
was causing messages to be missed, even, and why this patch
seems to fix it... And it's not infinite looping, either.
Anyways, before this patch, "-extindex --dedupe" was taking ~5
min to no-op every message (after the initial full --dedupe run
which took over a day to run). No-op --dedupes now take just
under 2 hours to scan every single cross-posted message for a
no-op dedupe. The initial dedupe took nearly 44 hours on my
system for <https://yhbt.net/lore/all/> due to SATA-2 TLC SSD
latency on 3 gigantic Xapian shards.
Running --dedupe with this change seems to prevent
/BUG\?.*?not deduplicated properly/ stderr messages from being
triggered by View.pm. Current versions of -extindex do not
seem susceptible to introducing duplicates.
|
|
This won't blindly append identical key=values, but
allows specifying multiple, different key=value pairs
as long as the values are different.
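The append rule can be sketched in Python with a dict of lists
standing in for the config; append_unique() is a hypothetical name
and not the project's API:

```python
def append_unique(config, key, value):
    """Append value under key unless that exact key=value already exists.

    Returns True if the value was added, False if it was a duplicate.
    """
    values = config.setdefault(key, [])
    if value in values:
        return False
    values.append(value)
    return True
```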
|
|
Since we require separate PublicInbox::HTTPD instances for each
listen socket address (in order to support {SERVER_<NAME|PORT>}
for PSGI env), the old cache needed to be invalidated on rare
app refreshes.
SIGHUP has always been broken in -httpd (but not -imapd or
-nntpd) due to this cache.
Update the daemon documentation and 5.10.1-ize some bits while
we're in the area.
|
|
This is intended to fix older indices that had deduplication
bugs for matching content. It'll also make dealing with
future changes to ContentHash easier since that's never
guaranteed stable.
It also supports --dry-run to print changes only without
making them.
|
|
Non-daemon lei isn't implemented, anymore.
|
|
The cost of supporting separate code paths between oneshot and
daemon isn't worth the trouble; especially if there are more
users to support. The test suite time nearly doubles with
oneshot, so that's hurting developer productivity.
FD passing is currently required to work efficiently with
remote HTTP(S) queries which return large messages, as seen in
commit 708b182a57373172f5523f3dc297659d58e03b58
("ipc: wq: handle >MAX_ARG_STRLEN && <EMSGSIZE case").
Additionally, upcoming support for IMAP IDLE and inotify-based
monitoring of Maildirs cannot work properly without a background
daemon.
|
|
Everything else that's intended to be executable at some
point has the executable bit set. Remove an inaccurate
comment while we're at it.
|
|
The contents of the old lei.q.output will not be removed,
but will be converted into the new one.
|
|
This lets us share more code and reduces cognitive overhead when
it comes to picking names (because {lsss} was ridiculous).
We'll need to ensure the first error set in lei is the actual
error we exit with, otherwise things can get confusing and
errors may get lost.
|
|
We'll support this mode of operation for now to quiet down
testing of oneshot mode where the daemon doesn't persist.
|
|
Since "lei q" may read queries from stdin, we must reconnect a
known terminal before spawning terminal MUAs. Attempt to use
stdout as stdin for this purpose, since terminal MUAs tend to
expect stdout to be a terminal.
Reported-By: Kyle Meyer <kyle@kyleam.com>
Link: https://public-inbox.org/meta/87v98klxg3.fsf@kyleam.com/
|
|
We need to ensure we reap things we spawn.
|
|
I completely forgot about git-credential prompting when
making lei background the client process for MUA.
Now it backgrounds itself only for the MUA when no FDs are
passed, since the MUA is the final command run. Otherwise, it
relies on FD passing as before.
Fixes: c790a75439f3a1db ("script/lei: background ourselves on MUA/pager exec")
|
|
This ought to give the MUA or pager exclusive access to the
controlling terminal. The downside is we can only exec the
pager or MUA once per invocation, but I can't imagine a valid
case for running those things multiple times, either.
Note: I'm no expert when it comes to terminal control matters,
but this allows a Ctrl-Z-ed mutt instance to come back and is
a nice code reduction, as well.
|
|
File::Temp only requires four 'X' characters (unlike mkstemp(3),
which requires six). So only give it 4 to avoid an
80-column violation and maybe save metadata space on FSes.
|
|
It's no longer necessary with the changes to stop doing
FD passing in our backend.
cf. commits 5180ed0a1cd65139 and 7d440bf3667b8ef5
("lei q: eliminate $not_done temporary git dir hack")
("lei q: reorder internals to reduce FD passing")
|
|
While using utime on the destination Maildir is enough for mutt
to eventually notice new mail, "eventually" isn't good enough.
Send a SIGWINCH to wake mutt (and likely other MUAs)
immediately. This is more portable than relying on MUAs to
support inotify or EVFILT_VNODE.
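A rough Python sketch of the two-step notification: update the
directory timestamps, then send SIGWINCH. notify_mua() is a
hypothetical helper, and which directory gets the utime call is an
assumption (here, whatever path the caller passes):

```python
import os
import signal


def notify_mua(maildir, mua_pid):
    """Bump the Maildir timestamps, then nudge the MUA to look right away."""
    os.utime(maildir)                   # set atime/mtime to the current time
    os.kill(mua_pid, signal.SIGWINCH)   # wake the MUA from its input wait
```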
|
|
We only spawn one process to be reaped at the moment. Tests
will run the contents of script/* in the same process if
possible, so any test scripts which spawn -httpd or other
read-only daemons can cause us to stall with waitpid(-1, ...).
|
|
This is taken from common implementations of make(1)
and only affected people using the command-line help
output.
|
|
Perl may internally race and miss signals due to a lack of
self-pipe / eventfd / signalfd / EVFILT_SIGNAL usage. While our
event loop paths avoid these problems by using signalfd or
EVFILT_SIGNAL, these sleep() calls are not within the event loop.
|
|
As with PublicInbox::IPC, we'll attempt to bump RLIMIT_NOFILE
and transparently work around ETOOMANYREFS. If that fails,
we'll give the user a hint to bump RLIMIT_NOFILE since
ETOOMANYREFS is an uncommon error which users may be unfamiliar
with.
Found while stress testing for segfaults.
|
|
PublicInbox::Listener unconditionally sets O_NONBLOCK upon
accept(), so we need a larger timeout under heavy load since
there's no "dataready" accept filter on the listener.
With O_NONBLOCK already set, we don't have to set it at
->event_step_init
|
|
Just open every FD as read/write. Perl (or any non-broken
runtime) won't care and won't attempt to use F_SETFL to alter
file description flags; as attempting to change those would
lead to unpleasant side effects if the file description is
shared with another process.
|
|
Via curl(1), since that lets us easily use tor on a
per-connection basis via LD_PRELOAD (torsocks) or proxy.
We'll eventually support more curl options which can allow
users to get past firewalls and deal with other odd network
configurations.
|
|
The signal handlers on the client side were unnecessary,
all we need is to handle socket EOF properly in the daemon
by killing xsearch and l2m workers.
|
|
Perl chdir() automatically does fchdir(2) if given a file
or directory handle since 5.8.8/5.10.0, so we can safely
rely on it given our 5.10.1+ requirement.
This means we no longer have to waste several milliseconds
loading the Cwd.so and making stat() calls to ensure
ENV{PWD} is correct and usable in the server. It also lets
us work in directories that are no longer accessible via
pathname.
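The handle-based chdir can be illustrated in Python, where
os.fchdir() plays the role of Perl's chdir() on a directory
handle. enter_dir() and demo() are hypothetical names for this
sketch:

```python
import os
import tempfile


def enter_dir(dir_fd):
    """Change directory via an open descriptor; no pathname lookup needed."""
    os.fchdir(dir_fd)


def demo():
    """Enter a directory by descriptor, then restore the original cwd."""
    orig = os.open(".", os.O_RDONLY)
    try:
        with tempfile.TemporaryDirectory() as tmp:
            fd = os.open(tmp, os.O_RDONLY)
            try:
                enter_dir(fd)
                inside = os.path.samefile(".", tmp)
            finally:
                os.close(fd)
                os.fchdir(orig)  # go back before tmp is removed
        return inside
    finally:
        os.close(orig)
```

Because the descriptor refers to the directory itself, this keeps
working even if the directory has been renamed or can no longer be
reached by its old pathname.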
|