Date | Commit message (Collapse) |
|
Using "make update-copyrights" after setting GNULIB_PATH in my
config.mak
|
|
As with CORE::die and $SIG{__DIE__}, it turns out CORE::warn is
safe to use inside $SIG{__WARN__} handlers without triggering
infinite recursion. So fall back to reusing CORE::warn instead
of creating a new sub.
|
|
This simplifies all ->with_umask callers and opens the
door for further optimizations to delay/elide process spawning.
|
|
This ensures we have UIDVALIDITY to index earlier
rather than later for v1 inboxes, matching v2 behavior.
|
|
There's only one caller, unlikely to be any more, and
should be harmless to open code.
|
|
Perl readdir detects list context and can return an array
suitable for the grep op. From there, we can rely on
substr to remove the ".git" suffix and integerize the value
to save a few bytes before letting List::Util::max return
the value.
This is how we detect Xapian shards nowadays, too, and
we'll also use defined-or (//) to simplify the return
value there.
We'll also simplify InboxWritable->git_dir_latest,
remove some callers, and consider removing it entirely.
|
|
As with the other messages in this callback, there's
nothing we can do about invalid messages ending up in
our Maildirs for -watch.
|
|
{ibx} is shorter and is the most prevalent abbreviation
in indexing and IMAP code, and the `$ibx' local variable
is already prevalent throughout.
In general, the codebase favors removal of vowels in variable
and field names to denote non-references (because references are
"lighter" than non-references).
So update WWW and Filter users to use the same code since
it reduces confusion and may allow easier code sharing.
|
|
For a mirror of lore.kernel.org with >140 inboxes, this speeds
up manifest.js.gz generation from ~1s to 40ms on my HW. This
is still unacceptable when dealing with thousands of inboxes,
but gets us closer to where we need to be.
|
|
This is preferable to open-coding "newsgroup // inboxdir" everywhere.
|
|
We'll be using per-sync-state {ibx} refs instead, so make parts
of the v2 indexing code less-dependent on $self->{ibx} where
$self is a V2Writable object.
|
|
Email::Address::XS and PublicInbox::MsgTime both emit warnings
which are likely to trigger from spam messages. Since this can
be configured to remove spam, just filter out those warnings to
avoid cluttering up stderr with useless information.
|
|
This is more accurate given we use PublicInbox::Eml instead
of Email::MIME/PublicInbox::MIME, nowadays.
|
|
We don't want `local $/' affecting Eml->new, and we can
use implicit returns which may be faster on older Perl.
|
|
While it makes the code flow slightly less well in some places,
it saves us runtime allocations and indentation.
|
|
The default (and fast) TEST_RUN_MODE=2 preloads most modules,
but TEST_RUN_MODE=0 is more realistic and can catch some
problems which may show up in real-world use.
|
|
This will allow us to use InboxIdle on empty/unindexed v1 inboxes.
|
|
For archivists with only newer mail archives, this option allows
reserving reserve NNTP article numbers for yet-to-be-archived
old messages. Indexers will need to be updated to support this
feature in future commits.
-V1 inboxes will now be initialized with SQLite and Xapian
support if this option is used, or if --indexlevel= is
specified.
|
|
InboxWritable should only set $v2w->{parallel} if the $parallel
flag is defined to 0 or 1. We want indexing a new inbox to
utilize SMP, just like --reindex.
-index once again allows -j0/--jobs=0 to force single-process
use, and we'll be ensuring that works in tests to maintain
performance on small systems.
Fixes: 61a2fff5b34a3e32 ("admin: move index_inbox over")
|
|
PublicInbox::Eml has enough functionality to replace the
Email::MIME-based PublicInbox::MIME.
|
|
There's nothing Maildir-specific about the function, so
`maildir_path_load' was a bad name. So give it a more
appropriate name and use it in our tests.
This save ourselves some code and inconsistency by reusing an
existing internal library routine in more places. We can drop
the "From_" line in some of our (formerly) mbox sample files.
|
|
We can't rely on Email::MIME noticing the change to our
scalar ref after calling `PublicInbox::MIME->new'.
This is because Email::MIME::body_set (unlike
Email::Simple::body_set) will copy the contents of the body into
`->{body_raw}' as a new scalar.
Furthermore, we need to escape multiple From lines in the body,
not just the first one, using the `g' modifier to `s//'.
Reported-by: Kyle Meyer <kyle@kyleam.com>
|
|
It's more convenient to specify `-c' / `--compact' on the
command-line when reindexing than it is to invoke
public-inbox-compact(1) separately.
This is especially convenient in low-space situations when
public-inbox-index is operating on multiple inboxes
sequentially, as compaction can happen immediately after
indexing each inbox, instead of waiting until all inboxes are
indexed.
|
|
I didn't wait until September to do it, this year!
|
|
This allows us to simplify version checking by avoiding
"//" or "||" operators sprinkled around.
|
|
And update callers to use it, as it makes the code a bit cleaner.
Probably irrelvant, but it should be faster, too, as
"perl -I lib -w -MO=Deparse $FILE" shows REJECT() calls are
constant-folded.
|
|
We've been using this in -edit, and will be using it in some
more scripts and tests to optimize for run_mode=2 with
run_script.
Keeping this in the *Writable modules since I don't see it being
useful for the WWW and NNTP read-only interfaces which use
PublicInbox::Inbox.
|
|
InboxWritable caching the result of ->importer leads to a
circular references with returned (V2Writable|Import) object
holds onto the calling InboxWritable object.
With public-inbox-watch, this leads to a memory leak if a user
is reloading via SIGHUP after a message is imported (it would
only become noticeable with SIGHUPs after every message imported).
I would not expect anybody to to notice this in real-world
usage. I only noticed this since I was making -xcpdb suitable
for long-lived process use (e.g. "mod_perl style") and a flock
remained unreleased on v1 inboxes after resharding.
WatchMaildir (used by -watch) already handles caching of the
importer object itself, and all of our other real-world uses of
->importer are short-lived or designed for batch scripts, so
there's no need to cache the importer result internally.
|
|
And use it for mda, since "0" could be a usable directory
if somebody insists on using relative paths...
|
|
I'm not sure if this will get used anywhere, but at least
call a function which exists in dead code.
|
|
"mainrepo" ws a bad name and artifact from the early days when I
intended for there to be a "spamrepo" (now just the
ENV{PI_EMERGENCY} Maildir). With v2, "mainrepo" can be
especially confusing, since v2 needs at least two git
repositories (epoch + all.git) to function and we shouldn't
confuse users by having them point to a git repository for v2.
Much of our documentation already references "INBOX_DIR" for
command-line arguments, so use "inboxdir" as the
git-config(1)-friendly variant for that.
"mainrepo" remains supported indefinitely for compatibility.
Users may need to revert to old versions, or may be referring
to old documentation and must not be forced to change config
files to account for this change.
So if you're using "mainrepo" today, I do NOT recommend changing
it right away because other bugs can lurk.
Link: https://public-inbox.org/meta/874l0ice8v.fsf@alyssa.is/
|
|
|
|
More work towards being consistent with Xapian's own terminology
|
|
In retrospect, introducing V1Writable was unnecessary and
InboxWritable->importer is in a better position to abstract
away differences between v1 and v2 writers.
So teach InboxWritable to initialize inboxes and get rid
of V1Writable.
|
|
Clearly the AltId stuff was never tested for v2. Ensure
this tricky filter (which reuses Msgmap to avoid introducing
new serial numbers) doesn't trigger deadlocks SQLite due
to opening a DB for writing multiple times.
I went through several iterations of this change before
going with this one, which is the least intrusive I could
fine.
|
|
|
|
It's a convenient wrapper nowadays, so get rid of some legacy
code and minimize differences from the -watch code.
|
|
This is consistent with git itself and the previous behavior
was a result of misunderstanding of how git interprets this.
And adjust tests slightly to match the new behavior.
Reported-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
<38873789-ab42-65a1-20c9-12c30b171f4f@linuxfoundation.org>
|
|
Ensure -convert and -compact do not make repositories
unreadable on live servers.
|
|
We'll be making sure V2Writable uses this.
|
|
This will make it easier to as well as supporting future
Filter API users. It allows simplifying our ad-hoc
import_vger_from_mbox script.
|
|
This code will be shared with future mass-import tools.
|