Date | Commit message (Collapse) |
|
getpid() isn't cached by glibc nowadays and system calls are
more expensive due to CPU vulnerability mitigations. To
ensure we switch to the new semantics properly, introduce
a new `on_destroy' function to simplify callers.
Furthermore, most OnDestroy correctness is often tied to the
process which creates it, so make the new API default to
guarded against running in subprocesses.
For cases which require running in all children, a new
PublicInbox::OnDestroy::all call is provided.
|
|
The IO package seems like a better home for I/O subs than the
Git package. We lose the 60 second read timeout for `git
cat-file --batch-*' processes since it's probably not necessary
given how reliable the code has proven and things would fall
over hard in other ways if the storage device were completely
hosed.
|
|
This hurts startup time a bit, but our tests use run_script
by default and I don't think normal users call -init enough
to care.
|
|
It's actually valid Perl syntax, but still confusing to look at.
Fixes: add90b9504f4 ("support -C (chdir) for most non-daemon commands")
|
|
`readline' ops may not detect errors on partial reads.
This saves us some code to reduce cognitive overhead for
readers. We'll also support reusing a destination buffers so it
can work more nicely with existing code.
|
|
Creating config 0600 disregarding umask breaks scenarios where daemons
run with credentials different from config owner (but need to read the
config).
File::Temp defaults to 0600, which is unsuitable for the
recommended/typical scenario of daemons running unprivileged and with
UID different from $PI_CONFIG owner, as the deamons need to read
$PI_CONFIG.
Respecting umask might end up creating world-unreadable config, too,
but for people who use such umask that's expected behavior.
|
|
I noticed a description for a new inbox had st_mode=0600.
|
|
Because make(1), git(1), tar(1) all support -C in this form, as
do our newer commands such as lei, public-inbox-{clone,fetch}.
|
|
"Unnamed repository" for v1 inboxes was misleading, and having a
non-existent description for v2 was equally annoying, so set a
short description based on the primary address.
We remove descriptions when setting up new test inboxes to
preserve the behavior of the t/lei-mirror.t test case.
|
|
None of our code elsewhere accounts for non-*nix pathnames and
it's not worth our time to start. So stop wasting CPU cycles
giving the illusion that we'd care about non-*nix pathnames.
|
|
It turns out `--fixed-value' is a relatively new git-config(1)
feature in git 2.30+ (December 2020). So use the quotemeta
perlop for now since it seems compatible-enough for POSIX ERE
used by git.
|
|
This won't blindly append identical key=values, but
allows specifying multiple, different key=value pairs
as long as the values are different.
|
|
File::Temp only requires four 'X' characters (unlike mkstemp(3),
which requires six). So only so only give it 4 to avoid an
80-column violation and maybe save metadata space on FSes.
|
|
This is taken from common implementations of make(1)
and only affected people using the command-line help
output.
|
|
Using "make update-copyrights" after setting GNULIB_PATH in my
config.mak
|
|
Since we'll be forking for Xapian indexing and maybe
other places, having a simple guard in place to ensure
OnDestroy doesn't unexpectedly unlink files or similar
is a safer option.
|
|
PublicInbox::OnDestroy can do the same thing
|
|
It seems like a more logical place for it, but we'll favor the
newly-added xsys_e() in tests for BAIL_OUT use.
|
|
Reading from regular files (even on STDIN) can fail
when dealing with flakey storage.
|
|
:x
Fixes: 9fcce78e40b0a7c6 ("script/public-inbox-*: favor caller-provided pathnames")
|
|
We need to canonicalize paths for inboxes which do not have
a newsgroup defined, otherwise ->eidx_key matches can fail
in unexpected ways.
|
|
We'll try to avoid calling Cwd::abs_path and use
File::Spec->rel2abs instead, since abs_path will resolve
symlinks the user specified on the command-line.
Unfortunately, ->rel2abs still leaves "/.." and "/../"
uncollapsed, so we still need to fall back to Cwd::abs_path in
those cases.
While we are at it, we'll also resolve inboxdir from deep inside
v2 directories instead of misdetecting them as v1 bare git
repos.
In any case, stop matching directories by name and instead rely
on the unique combination of st_dev + st_ino on stat() as we
started doing in the extindex code.
|
|
Following "git init" as an example, we'll create every parent
path up to the one specified, instead of attempting to continue
on when Cwd::abs_path returns `undef'.
|
|
`-h' doesn't conflict with anything, and some users (including
git users) may be more accustomed to using it rather than the
rarely-seen-outside-of-Getopt::Long `-?' switch.
We can also rely on the GetOptions() function to emit a proper
error message instead of just "bad command-line args".
|
|
Since we no longer read document data from Xapian, allow users
to opt-out of storing it.
This breaks compatibility with previous releases of
public-inbox, but gives us a ~1.5% space savings on Xapian
storage (and associated I/O and page cache pressure reduction).
|
|
It may be too easily confused for --newsgroup or --ng. This is
too rarely used and never made it into a release, so it should
be fine.
|
|
We can reduce the need to edit the config file for NNTP group names
this way.
|
|
And speed those up with some lazy loading, too.
|
|
Move away from hard-to-read alllowercase naming and favor
snake_case or separated-by-dashes.
We'll keep `--indexlevel' as-is for now, since it's been around
for several releases; but we'll support `--index-level' in the
CLI and update our documentation in a few months.
We'll also clarify that publicInbox.indexMaxSize is only
intended for -index, and not -watch or -mda.
|
|
We can use open(..., undef) natively in Perl in t/import.t
In places where we need a pathname, the File::Temp OO API
gives us auto-unlinking for free.
|
|
Tests for failures should not leave junk temporary files lying
around in a users' ~/.public-inbox/.
On a side note, I'm not sure if PI_DIR is or was ever
necessary. It's never been documented, so perhaps
using $HOME for this is better...
|
|
"\n" and other characters requiring quoting and/or escaping in
in $GIT_DIR/objects/info/alternates was not supported in git 2.11
and earlier; nor does it seem supported at all in libgit2.
This will allow us to support sharing git-cat-file or similar
endpoints across multiple inboxes via alternates.
This breaks an existing use case for anybody wacky
enough to put `\n' in the `inboxdir' pathname; but I doubt
this affects anybody.
|
|
For archivists with only newer mail archives, this option allows
reserving reserve NNTP article numbers for yet-to-be-archived
old messages. Indexers will need to be updated to support this
feature in future commits.
-V1 inboxes will now be initialized with SQLite and Xapian
support if this option is used, or if --indexlevel= is
specified.
|
|
Since V2 uses multiple git repositories, stop using
the word "repo" when referring to inboxes.
|
|
On a powerful (by my standards) machine with 16GB RAM and an
7200 RPM HDD marketed for "enterprise" use, indexing a 8.1G (in
git) LKML snapshot from Sep 2019 did not finish after 7 days
with the default number (3) of Xapian shards (`--jobs=4') and
`--batch-size=10m'.
Indexing starts off fast, but progressively get slower as
contents of the inbox (including Xapian + SQLite DBs) could no
longer be cached by the kernel. Once the on-disk size
increased, HDD seek contention between the Xapian shard workers
slowed the process down to a crawl.
With a single shard, it still took around 3.5 days to index on
the HDD. That's not good, but it's far better than not
finishing after 7 days. So allow unfortunate HDD users to
easily specify a single shard on public-inbox-init.
For reference, a freshly TRIM-ed low-end TLC SSD on the SATA II
bus on the same machine indexes that same snapshot of LKML in
~7 hours with 3 shards and the same 10m batch size. In the past,
a higher-end consumer grade MLC SSDs on similar hardware indexed
a similarly sized-data set in ~4 hours.
|
|
I didn't wait until September to do it, this year!
|
|
We already load PublicInbox::Import via
PublicInbox::InboxWritable, so it's not an extra module
to load. This can give us a slight speedup in tests.
|
|
There's a bunch of leftover "require" and "use" statements we no
longer need and can get rid of, along with some excessive
imports via "use".
IO::Handle usage isn't always obvious, so add comments
describing why a package loads it. Along the same lines,
document the tmpdir support as the reason we depend on
File::Temp 0.19, even though every Perl 5.10.1+ user has it.
While we're at it, favor "use" over "require", since it it gives
us extra compile-time checking.
|
|
Avoid 'Variable "%s" will not stay shared' warnings
when the contents of this script eval'ed into a sub.
We also need to rely on ->DESTROY instead of END{}
to unlink the lock file on sub exit.
|
|
"mainrepo" ws a bad name and artifact from the early days when I
intended for there to be a "spamrepo" (now just the
ENV{PI_EMERGENCY} Maildir). With v2, "mainrepo" can be
especially confusing, since v2 needs at least two git
repositories (epoch + all.git) to function and we shouldn't
confuse users by having them point to a git repository for v2.
Much of our documentation already references "INBOX_DIR" for
command-line arguments, so use "inboxdir" as the
git-config(1)-friendly variant for that.
"mainrepo" remains supported indefinitely for compatibility.
Users may need to revert to old versions, or may be referring
to old documentation and must not be forced to change config
files to account for this change.
So if you're using "mainrepo" today, I do NOT recommend changing
it right away because other bugs can lurk.
Link: https://public-inbox.org/meta/874l0ice8v.fsf@alyssa.is/
|
|
First, we use flock(2) to wait on parallel public-inbox-init(1)
invocations while we make multiple changes using git-config(1).
This flock allows -init processes to wait on each other if using
reasonable POSIX filesystems.
Then, we also need a git-config(1)-compatible lock to prevent
user-invoked git-config(1) processes from clobbering our
changes while we're holding the flock.
|
|
Since I intend to add support for --skip-artnum, disambiguating
the long option name makes sense. We'll support --skip
indefinitely for compatibility.
|
|
|
|
-index documentation avoid redundant v1 information and refers
readers to apropriate v1/v2 manpages. Search::Xapian can also
be optional, now, as only the PSGI search interface uses it.
Favor "INBOX_DIR" where appropriate, since "REPO_DIR" can be
confused for code repos which we also support.
XAPIAN_FLUSH_THRESHOLD is documented for all relevant
bulk commands.
|
|
In retrospect, introducing V1Writable was unnecessary and
InboxWritable->importer is in a better position to abstract
away differences between v1 and v2 writers.
So teach InboxWritable to initialize inboxes and get rid
of V1Writable.
|
|
"git config" did not preserve permissions of the config file it
modifies prior to git 2.1.0, so workaround that.
|
|
Since we lazy-load Xapian now, some errors may become
more cryptic or buried. Try to improve that by making
Admin show better errors.
|
|
More tests work without Search::Xapian, now.
Usability issues still need to be fixed
|
|
Import initialization is a little strange from history, but we
also can't change it too much because it's technically a public
API which external code may rely on...
And we may need to support v1 repos indefinitely. This should
make it easier to write tests for both formats.
|
|
This allows archivists to publish incomplete archives with newer
mail while allowing "0.git" (or "1.git" and so on) epochs to be
added-after-the-fact (without affecting "git clone" followers).
A reindex will be necessary for Xapian and SQLite to catch up
once the old epochs are added; but the reindexing code is also
capable of tolerating missing epochs.
|