Date | Commit message (Collapse) |
|
Simplify our APIs and force dwaitpid() to work in async mode for
all lei workers. This avoids having lingering zombies for
parallel searches if one worker finishes soon before another.
The old distinction between "old" and "new" workers was
needlessly complex, error-prone, and embarrasingly bad.
We also never handled v2:// writers properly before on
Ctrl-C/Ctrl-Z (SIGINT/SIGTSTP), so add them to @WQ_KEYS
to ensure they get handled by $lei when appropropriate.
|
|
By relying more on pgroups for remaining remaining processes,
this lets us pause all curl+tail subprocesses with a single
kill(2) to avoid cluttering stderr.
We won't bother pausing the pigz/gzip/bzip2/xz compressor
process not cat-file processes, though, since those don't write
to the terminal and they idle soon after the workers react to
SIGSTOP.
AutoReap is hoisted out from TestCommon.pm. CLONE_SKIP
is gone since we won't be using Perl threads any time
soon (they're discouraged by the maintainers of Perl).
|
|
This lets users Ctrl-Z from their terminal to pause an entire
git-clone process hierarchy.
|
|
While lei is intended for non-public mail and runs umask(077)
by default, externals are one area which can safely defer to
the user's umask.
Instead of sending it unconditionally with every command, only
have lei-daemon request it when necessary.
|
|
Since public inboxes are usually intended to be public,
the File::Temp default permission of 0600 is wrong.
Just respect the user's umask in this case as git-clone
does.
This doesn't work for "lei add-external --mirror", yet;
but it will...
|
|
warn() is easier to augment with context information, and
frankly unavoidable in the presence of 3rd-party libraries
we don't control.
|
|
This allows IMAP mirrors to keep UIDVALIDITY synchronized (and
"LIST ACTIVE.TIMES" in NNTP). "lei add-external --mirror" will
automatically set it, as will the combination of
public-inbox-clone + public-inbox-index.
This avoids the need for extra endpoints or config entries,
at least...
|
|
Also was written before we had auto-loading and rarely used.
|
|
This makes it easier for users to enable fetching on a
previously read-only epoch. Prior to this change, users were
required to delete manifest.js.gz in addition to adding the
writable bit. Now, they just have to "chmod +w $EPOCH_DIR".
|
|
There may still be pre-manifest.js.gz versions of PublicInbox::WWW.
running and serving v2 inboxes.
Since $INBOX_URL/manifest.js.gz was not understood, it was
assumed to be a Message-ID and 301-ed to
"$INBOX_URL/manifest.js.gz/" with a trailing slash, so our 404
checks were invalid. Update our fallbacks to deal with 301
by catching JSON decoding errors to trigger HTML scraping.
For HTML parsing, be sure to not be fooled by potential
user-generated content and only scan the part after the last
<hr>.
We also need to avoid propagating $? from curl unnecessarily
when we can continue safely.
Finally, update v2mirror.t with tests to use PublicInbox::WWW
from our "v1.1.0-pre1" tag to ensure these code paths get tested
|
|
Partial (v2) clones should be useful addition for users wanting
to conserve storage while having fast access to recent messages.
Continuing work started in 876e74283ff3 (fetch: ignore
non-writable epoch dirs, 2021-09-17), this creates bare,
read-only epoch git repos. These git repos have the remotes
pre-configured, but does not fetch any objects.
The goal is to allow users to set the writable bit on a
previously-skipped epoch and start fetching it.
Shell completion support may not be necessary given how short
the epoch ranges are, here.
Cc: Luis Chamberlain <mcgrof@kernel.org>
Link: https://public-inbox.org/meta/20210917002204.GA13112@dcvr/T/#u
|
|
Redundant code is noise and therefore confusing :<
|
|
The full pathname for "curl -o ..." was too noisy and confusing.
Reduce confusion by adding the ".tmp" suffix and relying on
"-C". We'll also avoid displaying "-C" in run_reap() and
rely on "--git-dir=" with "git fetch" to display progress for
users.
|
|
Since the beginning of time, I've been dropping Makefiles
in $INBOX_DIR (and above hiearchies) to organize groups
of commands.
make(1) is widely available in various flavors and a familiar
tool for our target audience. It is easy to run in the right
directory, typically has built-in shell completion, and doesn't
silently ignore errors by default like Bourne shell.
|
|
IMHO, this greatly improves code sharing and organization
between v2, extindex, and lei/store. Common git-related
logic for these is lightly-refactored and easier to reason
about.
The impetus for this big change was to ensure inboxes
created+managed by public-inbox-{clone,fetch} could have
alternates and configs setup properly without depending on
SQLite (via V2Writable). This change does that while
making old code shorter and better factored.
|
|
Timestamp comparisons only have 1 second granularity, which
isn't nearly enough for our test cases, and probably not for
real world use for "git send-email" bursts and fast SMTP
servers.
We'll continue to check modification times inside the manifest,
though, in case an extremely rare SHA-1 collision is found...
|
|
The v1 code path was totally half-baked after the change
to use manifest.js.gz :x
Fixes: ffb7fbda6869db4b ("fetch: use manifest.js.gz for v1")
|
|
This is gentler to the remote HTTP server in the no-op case and
will allow client migrations to some v2-ish format without
forcing the client to redownload everything.
|
|
Instead of generic "Unnamed repository" or "missing" messages,
show "mirror of $URL" since it seems like a better default when
creating a mirror.
|
|
Setting up and maintaining git-only mirrors of v2 inboxes is
complex since multiple commands are required to clone and fetch
into epochs.
Unlike grokmirror, these commands do not require any
configuration. Instead, they rely on existing git config files
and work like "git clone --mirror" and "git fetch",
respectively.
Like grokmirror, they use manifest.js.gz, but only on a
per-inbox basis so users won't have to clone every inbox of a
large instance nor edit config files to include/exclude inboxes
they're interested in.
|
|
We're using rename(2) rather than link(2)
|
|
Slowly transitioning to using die() more, which hopefully
improves code reusability between lei and non-lei parts of our
code.
|
|
This should improve the users' chances of seeing errors in
various git config files we use.
|
|
If the mirror.done file doesn't exist for unlink, it's because
we already got another error, so don't confuse users by noting
an unlink error since the ENOENT is expected in the face of
other errors.
|
|
The current manifest.js.gz generation in WWW doesn't account for
PSGI mount prefixes (and grokmirror 1.x appears to work fine).
In other words, <https://yhbt.net/lore/lkml/manifest.js.gz>
currently has keys like "/lkml/git/0.git" and not
"/lore/lkml/git/0.git" where "/lore" is the PSGI mount prefix.
This works fine with the prefix accounted for in my grokmirror
(1.x) repos.conf like this:
site = https://yhbt.net/lore/
manifest = https://yhbt.net/lore/manifest.js.gz
Adding the PSGI mount prefix in manifest.js.gz is probably not
desirable since it would force the prefix into the locally
cloned path by grokmirror, and all the cloned directories
would have the remote PSGI mount prefix prepended to the
toplevel.
So, "lei add-external --mirror" needs to account for PSGI
mount prefixes by deducing the prefix based on available keys
in the manifest.js.gz hash table.
|
|
op_wait_event is now more lei-specific since we no longer have
to care about oneshot and use a synchronous loop.
{ikw} (import-keywords) started a trend, but LeiPmdir (parallel
Maildir) is an upcoming WQ class that will follow this idea.
Eventually, {l2m} usage may be updated to follow this, too.
|
|
In most cases, we just name the worker process based
on the command. The only change is for LeiMirror
vs "lei add-external --mirror", but I doubt it matters.
|
|
This lets us share more code and reduces cognitive overhead when
it comes to picking names (because {lsss} was ridiculous).
We'll need to ensure the first error set in lei is the actual
error we exit with, otherwise things can get confusing and
errors may get lost.
|
|
Simplify our internals a little bit.
|
|
I don't know if it's worth it to sub (or super)class
PublicInbox::Config into something more generic for
lei, but this change simplifies a good chunk of lei
code that reuses the public-inbox config parsing.
|
|
Provide a consistent ->op_wait_event method instead of
forcing callers to loop (or not) at each callsite.
This also avoid a leak possibility by avoiding circular
references.
|
|
While we were exiting with a error code, showing a successful
"# mirrored $URL" message is misleading and wrong. Don't show
success until everything is complete and the config is written.
|
|
All of our $lei->workers_start callers can simply rely on
that wrapper to do the right thing and pass fields to
->wq_worker_start children, only.
This could manifest as a unbound memory growth if somebody is
constantly mirroring, and was causing tests to get stuck when
experimenting with a persistent lei-daemon for the entire
test suite.
|
|
Instead of creating a short-lived circular reference,
ensure they don't exist in the first place.
Note the following changes to hold an extra ref to $sto:
- $self->_lei_store(1)->write_prepare($self);
+ my $sto = $self->_lei_store(1);
+ $sto->write_prepare($self);
I'm not a perlguts expert, but I actually wanted to switch
to the one-line version for LeiImport, but xt/lei-auth-fail.t
was getting stuck for some reason. It seems the extra ref
to the LeiStore ($sto) object is necessary.
|
|
Otherwise we could get non-sensical results if somebody tries
running "lei atfork_child" from the command-line.
|
|
We already localize %ENV before calling dispatch(), so
it's needless overhead in spawn() to be checking env for
undef values in those cases.
|
|
The backends for "lei add-external --mirror", "lei convert", and
"lei import" all share a similar pattern for spawning background
workers. Hoist out the common parts to slim down our code base
a bit.
The LeiXSearch and LeiToMail workers for "lei q" remains a the
odd duck due to the deep pipelining and parallelization.
|
|
For early MUA spawners using lock-free outputs, we we need to
on the startq pipe to silence progress reporting. For
--augment users, we can start the MUA even earlier by
creating Maildirs in the pre-augment phase.
To improve progress reporting for non-MUA (or late-MUA)
spawners, we'll no longer blindly append "--compressed" to the
curl(1) command when POST-ing for the gzipped mboxrd.
Furthermore, we'll overload stringify ('""') in LeiCurl to
ensure the empty -d '' string shows up properly.
v2: fix startq waiting with --threads
mset_progress is never shown with early MUA spawning,
The plan is to still show progress when augmenting and
deduping. This fixes all local search cases.
A leftover debug bit is dropped, too
|
|
We will have a ->wq_do that doesn't pass FDs for I/O.
|
|
This also updates lei_xsearch to follow the same pattern for
stopping curl(1) and tail(1) processes it spawns.
|
|
This can be useful for users who want to clone and
mirror an existing public-inbox. This doesn't have
update support, yet, so users will need to run
"git fetch && public-inbox-index" for now.
|