Date | Commit message (Collapse) |
|
This matches the behavior of Maildir `watchspam' handling in not
removing unseen messages. NNTP can't match this behavior, since
NNTP servers don't store flags; clients do.
|
|
There's no need for this to be a separate sub since there's
only a single caller. This saves a few kilobytes at least
in short-lived processes.
|
|
It's no problem for most users to enable WAL, here, since
there's only a single process doing both reading and writing
(unlike the read-only daemons). However, WAL doesn't work on
network filesystems, so it can't be enabled by default.
|
|
For consistency in output, any URL/path-context-dependent
prefixes should have the same prefix as the actual warning which
triggered it.
|
|
I'm seeing "read: Connection timed out" in my syslog from
-httpd. The fail() calls in PublicInbox::Git seem to be the
only code path of ours which could trigger it...
ETIMEDOUT shouldn't happen on pipes, only sockets; and all of
our socket operations are non-blocking. So this could be
cgit-wwwhighlight-filter.lua, but that's connecting over
localhost, though on fairly loaded HW.
|
|
A `PI_XAPIAN' environment variable is now exposed for testing
purposes. We'll also deal with the removal of
`NumberValueRangeProcessor' and use `NumberRangeProcessor'
in its place, but continue favoring the old Search::Xapian
since that's all that's packaged for Debian 10.x stable.
|
|
We use the defined-or (`//', `//=') operators in 5.10,
so require 5.10.1 like the rest of our codebase. Update
an outdated comment while we're at it.
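
A minimal sketch of why defined-or matters, assuming a hypothetical
`%opt' hash that is not taken from the codebase: `||' discards a
legitimate 0, while `//' and `//=' only fall back on undef.

```perl
use 5.010_001; # defined-or is unavailable before 5.10
use strict;
use warnings;

# hypothetical option hash, for illustration only
my %opt = (jobs => 0, verbose => undef);

# `||' treats the valid value 0 as false and replaces it
my $jobs_or = $opt{jobs} || 4;

# `//' only falls back when the left side is undefined
my $jobs_dor = $opt{jobs} // 4;

# `//=' assigns only when the left side is undefined
$opt{verbose} //= 1; # was undef, becomes 1
$opt{jobs} //= 8;    # was 0 (defined), stays 0

print "jobs_or=$jobs_or jobs_dor=$jobs_dor ",
	"verbose=$opt{verbose} jobs=$opt{jobs}\n";
```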
|
|
v5.10.1 lets us use the lighter parent.pm instead of base.pm,
and we'll rely on the shebang to enable warnings (or not).
While we're in the area, drop a no-longer-necessary import for
PublicInbox::Search, since OverIdx doesn't require search.
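
For reference, a small sketch of parent.pm usage. The `My::Base' and
`My::Child' packages are hypothetical and only illustrate the pragma;
they are not part of this codebase.

```perl
use strict;
use warnings;

package My::Base; # hypothetical base class
sub new { my ($class, %args) = @_; bless { %args }, $class }
sub name { $_[0]->{name} }

package My::Child; # hypothetical subclass
# parent.pm only sets up @ISA (and loads the file unless told not to).
# -norequire is needed here because My::Base lives in this same file.
use parent -norequire, 'My::Base';
sub greet { 'hello ' . $_[0]->name }

package main;
my $obj = My::Child->new(name => 'over');
print $obj->greet, "\n";
```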
|
|
As noted in commit 87dca6d8d5988c5eb54019cca342450b0b7dd6b7
("www: rework query responses to avoid COUNT in SQLite"),
COUNT on many rows is expensive on big SQLite DBs.
We've already stopped using that code path long ago in WWW
while -imapd and -nntpd never used it. So we'll adjust our
remaining test cases to not need it, either.
|
|
Since we got rid of over->connect, `disconnect' no longer pairs
with it. So name it after the `close(2)' syscall it ultimately
issues.
|
|
`->connect' is confused with the perlfunc for the `connect(2)'
syscall, and also `DBI->connect'. Since SQLite doesn't use
sockets, the word "connect" needlessly confuses me. Give
it a short name to match the field name we use for it, which
also matches the variable name used by the DBI(3pm) and
DBD::SQLite(3pm) manpages.
|
|
The SWIG binding won't auto-convert IV/UV to PV like the XS
Search::Xapian binding would, so workaround that shortcoming
for now.
Fixes: a367ec1b15a2458 ("mbox: disable "&t" on existing Xapian until full reindex")
|
|
WAL actually seems to have ideal locking characteristics given
concurrency problems I'm experiencing with --reindex running
in parallel with expensive read-only SQLite queries:
<https://public-inbox.org/meta/20200825001204.GA840@dcvr/>
Unfortunately, we cannot blindly use WAL while preserving
compatibility with existing setups nor our guarantees that
read-only daemons are indeed "read-only".
However, respect a user's choice to set WAL on their
own if they're comfortable with giving -nntpd/-httpd/-imapd
processes write permission to the directory storing SQLite DBs.
|
|
It's fewer queries and matches what we do in OverIdx.
|
|
This file gets truncated anyhow, so it won't fragment.
|
|
croak() can give more context on the failure, and setting
`PERL5OPT=-MCarp=verbose' can force a stacktrace.
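
A rough sketch of the difference, using a hypothetical `My::Check'
package (not from this codebase): croak() blames the caller's file and
line rather than the line inside the helper.

```perl
use strict;
use warnings;

package My::Check; # hypothetical helper package
use Carp qw(croak);

sub positive {
	my ($n) = @_;
	# reported at the caller's location, not this line
	croak "not positive: $n" if $n <= 0;
	$n;
}

package main;

my $ok = eval { My::Check::positive(-1); 1 };
my $err = $@;
print "error: $err";
```

Running the same script with `PERL5OPT=-MCarp=verbose' in the
environment should turn the short message into a full backtrace.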
|
|
There's no reason we'd want Xapian to defer flushing once we've
indexed everything belonging to a particular shard.
|
|
Expanding threads via over.sqlite3 for mbox.gz downloads without
Xapian effectively collapsing on the THREADID column leads to
repeated messages getting downloaded.
To avoid that situation, use a "has_threadid" Xapian metadata
flag that's only set on --reindex (and brand new Xapian DBs).
This allows admins to upgrade WWW or do --reindex in any order,
without worrying about users eating up bandwidth and CPU cycles.
|
|
Finally, the addition of THREADID for collapsing results
in Xapian lets us emulate the "mairix --threads" feature.
That is, instead of returning only the matching messages,
the entire thread is included in the downloaded mbox.gz.
This requires a "public-inbox-index --reindex" to be usable.
|
|
This is the `tid' column from over.sqlite3, and will be used for
IMAP and JMAP search (among other things).
|
|
We'll also rename the /^remote_/ prefix to "shard_", since
remote implies the process is on a different host. These
methods only pass messages to a child process on the same host
OR perform operations within the same process.
|
|
Merely assigning `undef' to a scalar does not free the
underlying buffer memory of a scalar.
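
One way to observe this is via the core B module, which can report
the allocated length of a scalar's string buffer. This is only a
sketch: the `buf_len' helper is hypothetical, and exact sizes depend
on the perl build.

```perl
use strict;
use warnings;
use B ();

# allocated size of the string buffer behind a scalar reference
sub buf_len { B::svref_2object($_[0])->LEN }

my $buf = '';
$buf .= 'x' x 1024 for 1..1024; # grow a buffer of roughly 1 MiB

my $before = buf_len(\$buf);

$buf = undef; # the value is gone, but the allocation is kept for reuse
my $after_assign = buf_len(\$buf);

undef $buf;   # the undef() operator releases the buffer itself
my $after_undef = buf_len(\$buf);

print "before=$before after_assign=$after_assign ",
	"after_undef=$after_undef\n";
```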
|
|
Unlike w3m and links, the lynx browser seems to require a `name'
attribute for `<input type=submit>' elements. Maybe some other
browsers do, too. The `name' attribute for submit elements
doesn't seem to cause any harm for w3m or links users, either;
despite not (AFAIK) being part of historical or current HTML
specs.
|
|
We can avoid importing mdocid() in several places by using
this method, simplifying callers.
|
|
Since we no longer read document data from Xapian, allow users
to opt-out of storing it.
This breaks compatibility with previous releases of
public-inbox, but gives us a ~1.5% space savings on Xapian
storage (and associated I/O and page cache pressure reduction).
|
|
We no longer read docdata.glass from anywhere in our code base.
Some adjustments were needed to t/search.t to deal with the
Xapian::WritableDatabase committing at different times, since
our ->query is avoided from PublicInbox::SearchIdx to avoid
needing a {over_ro} field.
|
|
Another place where we can reduce kernel page cache overhead
by hitting over.sqlite3 instead of docdata.glass.
|
|
Once again, over.sqlite3 contains everything necessary for
Message-ID resolution. Also, Xapian may be completely
unnecessary with the advent of over.sqlite3, but that's for
another time.
|
|
git blob retrieval dominates on these, "&x=t" (nested) is
roughly the same due to increased overhead for ->get_percent
storage balancing out the mass-loading from SQLite.
Atom "&x=A" is sped up slightly and uses less memory in the
long-lived response.
|
|
Instead of loading one article at-a-time from over.sqlite3, we
can use SQL to mass-load IN (?,?, ...) all results with a single
SQLite query. Despite SQLite being in-process and having no
network latency, the reduction in SQL query executions from
loading multiple rows at once speeds things up significantly.
We'll keep the over->get_art optimizations from the previous
commit, since they still speed up long-lived responses slightly.
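
The mass-loading idea can be sketched as building one statement with a
placeholder per article number. The `mass_load_sql' helper and the
`over' table with `num' and `ddd' columns are assumptions made for
illustration; the actual query in the code may differ.

```perl
use strict;
use warnings;

# Build a single SELECT for a batch of article numbers,
# instead of issuing one query per article.
# Table and column names are assumptions for this sketch.
sub mass_load_sql {
	my (@nums) = @_;
	my $in = join(',', ('?') x scalar(@nums)); # "?,?,?"
	("SELECT num,ddd FROM over WHERE num IN ($in)", @nums);
}

my ($sql, @bind) = mass_load_sql(3, 5, 8);
print "$sql\n";

# With DBI, the whole batch would then be a single call, e.g.:
#   my $rows = $dbh->selectall_arrayref($sql, { Slice => {} }, @bind);
```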
|
|
This is a step towards improving kernel page cache hit rates by
relying on over.sqlite3 for document data instead of Xapian.
Some micro-optimization to over->get_art was required to
maintain performance.
|
|
Both callers of load_from_data call utf8::decode, so just
do utf8::decode in load_from_data.
|
|
We'll probably be reusing it from another package in a future commit.
|
|
Since this was already a separate package, split it off
into its own file since SearchView may not handle inbox
groups.
|
|
No need to have awkward globrefs for this.
|
|
We'll probably be adding more value columns like THREADID to sort
on.
|
|
While this is unlikely to be a problem in current practice,
keeping Xapian DBs open for long responses can interfere with
free space recovery after -compact.
In the future, it will interfere with inbox search grouping
and lead to unexpected results.
|
|
No need to localize it, here, since we can just refer to it
in the `$opt' hashref. Hopefully this improves readability
for others like it does for me.
I sometimes wonder if the concept of a stack in high-level
languages is even necessary...
|
|
This seems required to correctly get the NNTP article number
from Xapian docid on combined Xapian DBs. The default
(ASCII-betical) sorting was only acceptable for -imapd users
until somebody hit 11 (or more) shards, which is a rare case.
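
To illustrate why 11 shards is the threshold, here is a small sketch
with hypothetical shard names 0 through 10; with only 0 through 9,
both orderings agree.

```perl
use strict;
use warnings;

my @shards = (0..10); # 11 hypothetical shard directory names

my @ascii = sort @shards;                 # default string comparison
my @numeric = sort { $a <=> $b } @shards; # explicit numeric comparison

print "ascii:   @ascii\n";   # 0 1 10 2 3 4 5 6 7 8 9
print "numeric: @numeric\n"; # 0 1 2 3 4 5 6 7 8 9 10
```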
|
|
This is helpful with --all, or when multiple inboxes
are being indexed.
|
|
Otherwise things get very confusing when verbosity is enabled :x
|
|
There may be messages in the wild with wide characters in
headers which aren't RFC2047-encoded. Assume UTF-8 so
those fields can round trip through over.sqlite3.
This doesn't affect docdata.glass in Xapian, but it does
affect how over.sqlite3 stores the same deflated info.
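
A minimal sketch of the round trip, using a hypothetical header value
with raw UTF-8 bytes; the real code paths are more involved.

```perl
use strict;
use warnings;

# raw header bytes containing unencoded UTF-8 (hypothetical value)
my $raw = "Subject: caf\xc3\xa9";

my $str = $raw;
# decode bytes to characters in place; returns false on invalid UTF-8
my $valid = utf8::decode($str);

my $nbytes = length($raw); # counts bytes
my $nchars = length($str); # counts characters after decoding

# encoding again should give back the original bytes
my $out = $str;
utf8::encode($out);

print "valid=", ($valid ? 1 : 0), " bytes=$nbytes chars=$nchars ",
	"roundtrip=", ($out eq $raw ? 'ok' : 'mismatch'), "\n";
```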
|
|
We use IdxStack via log2stack() from SearchIdx, now.
|
|
--sequential-shard also disables the copy parallelism (--jobs),
so it can be useful for systems unable to handle parallel random
I/O but still want many shards.
There was a missing "use strict", too, which is fixed.
|
|
Established tools like make(1), prove(1) and xargs(1) don't warn
when the desired parallelism level can't be met, either.
|
|
In case there are unbalanced shards AND we're limiting parallelism
while using many shards, spawn the next task in the queue ASAP
once a task is done, instead of waiting for all tasks to finish
before spawning the next batch.
Unbalanced shards probably aren't a big issue for most users;
however many smaller shards with few jobs can be useful for HDD
users to reduce the effect of random writes.
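
The scheduling change can be sketched with plain fork/waitpid. The
`run_queue' helper below is hypothetical and much simpler than the
real code; it only shows refilling a free slot as soon as any one
child exits, rather than waiting for a whole batch.

```perl
use strict;
use warnings;
use POSIX ();

# Run @tasks (code refs) with at most $jobs children at a time.
# Returns the number of children reaped.
sub run_queue {
	my ($jobs, @tasks) = @_;
	my (%running, $reaped);
	$reaped = 0;
	while (@tasks || %running) {
		# fill every free slot before blocking
		while (@tasks && keys(%running) < $jobs) {
			my $task = shift @tasks;
			my $pid = fork();
			die "fork: $!" unless defined $pid;
			if ($pid == 0) { # child
				$task->();
				POSIX::_exit(0); # skip END blocks and buffer flushing
			}
			$running{$pid} = 1;
		}
		my $pid = waitpid(-1, 0); # blocks until any one child exits
		last if $pid < 0;         # no children left to wait for
		$reaped++ if delete $running{$pid};
	}
	$reaped;
}

# one slow task and three fast ones, limited to two jobs
my @tasks = map {
	my $secs = $_;
	sub { select(undef, undef, undef, $secs) };
} (0.2, 0.05, 0.05, 0.05);
my $n = run_queue(2, @tasks);
print "reaped $n children\n";
```

With batching, the three fast tasks would wait behind the slow one;
here they run back-to-back in the second slot.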
|
|
We don't need to fully-qualify when referring to subs in
the same namespace, nor do we need to make a SCALAR ref only
to dereference it.
(Yes, still learning Perl :x)
|
|
Converting v1 inboxes to v2 can be a painful experience
on HDD. Some of the new options in the CLI or config
file make it less painful.
|
|
The rest of our indexing code uses `$opt' instead of `$opts'.
|
|
Move away from hard-to-read alllowercase naming and favor
snake_case or separated-by-dashes.
We'll keep `--indexlevel' as-is for now, since it's been around
for several releases; but we'll support `--index-level' in the
CLI and update our documentation in a few months.
We'll also clarify that publicInbox.indexMaxSize is only
intended for -index, and not -watch or -mda.
|