Date | Commit message (Collapse) |
|
This is important for people running mirrors via "git fetch",
as they need to be kept up-to-date. Purging is also now
supported in mirrors.
The short-lived "--regenerate" option is gone and is now
implicitly enabled as a result. It's still cheap when
article number regeneration is unnecessary, as we track
the range for each git repository.
|
|
We we worked around the default range/termination conditions of
long_response in many cases to reduce calls to SQLite or Xapian.
So continue that trend and become more like the PSGI API
which doesn't force callers to specify an article range or
work inside a loop.
|
|
id_batch had a an overly complicated interface, replace it
with id_batch which is simpler and takes advantage of
selectcol_arrayref in DBI. This allows simplification of
callers and the diffstat agrees with me.
|
|
This ought to provide better performance and scalability
which is less dependent on inbox size. Xapian does not
seem optimized for some queries used by the WWW homepage,
Atom feeds, XOVER and NEWNEWS NNTP commands.
This can actually make Xapian optional for NNTP usage,
and allow more functionality to work without Xapian
installed.
Indexing performance was extremely bad at first, but
DBI::Profile helped me optimize away problematic queries.
|
|
This still requires a msgmap.sqlite3 file to exist, but
it allows us to tweak Xapian indexing rules and reindex
the Xapian database online while -watch is running.
|
|
This will be used to keep track of Message-ID <-> NNTP Article
numbers to prevent article number reuse when reindexing.
|
|
We need to hide removals from anybody hitting the search engine.
|
|
Using update-copyrights from gnulib
While we're at it, use the SPDX identifier for AGPL-3.0+ to
ease mechanical processing.
|
|
It is needless bloat and doesn't seem to help with readability,
in retrospect, either.
|
|
This prevents public-inbox-watch from dying when reloading
(and thus rescanning) already-imported directories.
|
|
This will allow smoother imports as occasional Message-ID
duplicates happen and the best we can do is ignore the
second one.
|
|
For some existing mailing list archives, messages are identified
by serial number (such as NNTP article numbers in gmane). Those
links may become inaccessible (as is the current case for
gmane), so ensure users can still search based on old serial
numbers.
Now, I run the following periodically to get article numbers
from gmane (while news.gmane.org remains):
NNTPSERVER=news.gmane.org
export NNTPSERVER
GROUP=gmane.comp.version-control.git
perl -I lib scripts/xhdr-num2mid $GROUP --msgmap=/path/to/gmane.sqlite3
(I might integrate this further with public-inbox-* scripts one day).
My ~/.public-inbox/config as an added "altid" snippet which now
looks like this:
[publicinbox "git"]
address = git@vger.kernel.org
mainrepo = /path/to/git.vger.git
newsgroup = inbox.comp.version-control.git
; relative pathnames expand to $mainrepo/public-inbox/$file
altid = serial:gmane:file=gmane.sqlite3
And run "public-inbox-index --reindex /path/to/git.vger.git"
periodically.
This ought to allow searching for "gmane:12345" to work for
Xapian-enabled instances.
Disclaimer: while public-inbox supports NNTP and stable article
serial numbers, use of those for public links is discouraged
since it encourages centralization.
|
|
We want transactions to be the responsibility of the
caller when possible; this fixes the potential for
the msgmap to internally become inconsistent when
using it from inside searchidx.
|
|
Hopefully this gives new hackers a better overview of
how the components relate to each other.
|
|
This doesn't seem to do anything on my older system, but maybe it
will in newer or future versions of DBD::SQLite. Anyways it can
be helpful for documentation purposes, too.
|
|
It doesn't actually give performance improvements unless we
use types with "my", but we don't do that. We'll only continue
using fields with Danga::Socket-derived classes where they're
required.
|
|
This doesn't actually change anything as the constant is still
usable in other subroutines, but helps with consistency and
readability IMHO.
|
|
LISTGROUP can be expensive for giant groups, too. Use the
long response API to improve fairness and prevent excessive
buffering.
|
|
Implementing NEWNEWS, XHDR, XOVER efficiently will require
additional caching on top of msgmap.
This seems to work with lynx and slrnpull, haven't tried clients.
DO NOT run in production, yet, denial-of-service vulnerabilities
await!
|
|
This will allow us to maintain stable article numbers for an
NNTP server independently of Xapian.
|