Date | Commit message (Collapse) |
|
I didn't wait until September to do it, this year!
|
|
|
|
RFC3977 6.1.2.2 LISTGROUP allows a [range] arg after [group],
and supporting it allows NNTP support in neomutt to work again.
Tested with NeoMutt 20170113 (1.7.2) on Debian stretch
(oldstable)
|
|
"INSERT OR IGNORE" still bumps the auto-increment counter in
SQLite, which causes gaps to appear in NNTP article numbering.
This bug appeared in v2 repos where V2Writable may call ->add
repeatedly on the same message. This bug is apparent with
public-inbox-watch and work-in-progress IMAP watchers which may
rescan and (attempt to) reinsert the same message on mailbox
changes.
Most uses of public-inbox-mda were not affected, unless the
same message is actually delivered multiple times to the mda.
v1 is not affected, either, since deduplication is only based
on Message-ID and msgmap never sees the duplicate.
Reported-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
|
I have never not found double negatives to be confusing...
|
|
Today the only thing that prevents public-inbox not reusing the
message numbers of deleted messages is the sqlite autoincrement magic
and that only works part of the time. The new incremental indexing
test has revealed areas where today public-inbox does try to reuse
numbers of deleted messages.
Reusing the message numbers of existing messages is a problem because
if a client ever sees messages that are subsequently deleted the
client will not see the new messages with their old numbers.
In practice this is difficult to trigger because it requires the most
recently added message to be removed and have the removal show up in a
separate pull request. Still it can happen and it should be handled.
Instead of infering the highset number ever used by finding the maximum
number in the message map, track the largest number ever assigned directly.
Update Msgmap to track this value and update the indexers to use this
value.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
|
All callers in expect to iterate through results. This
was causing unfairness when fetching large ranges via XHDR
as rtin does :<
Fixes: b8c41362f2a5c8fc "nntp: simplify the long_response API"
|
|
"LIKE" in SQLite (and other SQL implementations I've seen) is
expensive with nearly 3 million messages in the archives.
This caused some partial Message-ID lookups to take over 600ms
on my workstation (~300ms on a faster Xeon). Cut that to below
under 30ms on average on my workstation by relying exclusively
on Xapian for partial Message-ID lookups as we have in the past.
Unlike in the past when we tried using Xapian to match partial
Message-IDs; we now optimize our indexing of Message-IDs to
break apart "words" in Message-IDs for searching, yielding
(hopefully) "good enough" accuracy for folks who get long URLs
broken across lines when copy+pasting.
We'll also drop the (in retrospect) pointless stripping of
"/[tTf]" suffixes for the partial match, since anybody who
hits that codepath would be hitting an invalid message ID.
Finally, limit wildcard expansion to prevent easy DoS vectors
on short terms.
And blame Pine and alpine for generating Message-IDs with
low-entropy prefixes :P
|
|
We can't have files with permissions inconsistent with what's
in git objects.
|
|
* origin/master:
nntp: allow and ignore empty commands
mbox: do not barf on queries which return no results
nntp: fix NEWNEWS command
searchview: fix non-numeric comparison
Allow specification of the number of search results to return
githttpbackend: avoid infinite loop on generic PSGI servers
http: fix modification of read-only value
extmsg: use news.gmane.org for Message-ID lookups
extmsg: rework partial MID matching to favor current inbox
Update the installation instructions with Fedora package names
nntp: do not drain rbuf if there is a command pending
nntp: improve fairness during XOVER and similar commands
searchidx: do not modify Xapian DB while iterating
Don't use LIMIT in UPDATE statements
|
|
Some messages to git@vger went missing from Msgmap from old bugs
and became inaccessible via NNTP. Forcing NNTP article numbers
when the overview DB came about made the problem more visible when
reindexing old (v1) repositories as all removed spam messages
took up AUTOINCREMENT numbers again before they were removed.
Having large gaps in NNTP article numbers is not good since it
throws off NNTP clients. This does NOT prevent NNTP clients from
seeing some messages twice, but is better than having them
miss several messages entirely.
We also avoid depending on --reverse in git-log, as
git requires storing an entire commit list in memory for
--reverse, so it's cheaper to store only deleted blobs in the %D
hash since they do not live long.
|
|
This significantly improves the performance of the NNTP GROUP
command with 2.7 million messages from over 250ms to 700us.
SQLite is weird about this, but at least there's a way to
optimize it.
|
|
For upgrades, this will let users keep an old version
running while performing "public-inbox-index" on the
newest version.
|
|
This is important for people running mirrors via "git fetch",
as they need to be kept up-to-date. Purging is also now
supported in mirrors.
The short-lived "--regenerate" option is gone and is now
implicitly enabled as a result. It's still cheap when
article number regeneration is unnecessary, as we track
the range for each git repository.
|
|
We we worked around the default range/termination conditions of
long_response in many cases to reduce calls to SQLite or Xapian.
So continue that trend and become more like the PSGI API
which doesn't force callers to specify an article range or
work inside a loop.
|
|
id_batch had a an overly complicated interface, replace it
with id_batch which is simpler and takes advantage of
selectcol_arrayref in DBI. This allows simplification of
callers and the diffstat agrees with me.
|
|
This ought to provide better performance and scalability
which is less dependent on inbox size. Xapian does not
seem optimized for some queries used by the WWW homepage,
Atom feeds, XOVER and NEWNEWS NNTP commands.
This can actually make Xapian optional for NNTP usage,
and allow more functionality to work without Xapian
installed.
Indexing performance was extremely bad at first, but
DBI::Profile helped me optimize away problematic queries.
|
|
This still requires a msgmap.sqlite3 file to exist, but
it allows us to tweak Xapian indexing rules and reindex
the Xapian database online while -watch is running.
|
|
This will be used to keep track of Message-ID <-> NNTP Article
numbers to prevent article number reuse when reindexing.
|
|
We need to hide removals from anybody hitting the search engine.
|
|
...not all distributions build SQLite with that enabled.
[ew: LIMIT shouldn't be necessary because `key' is primary]
|
|
Using update-copyrights from gnulib
While we're at it, use the SPDX identifier for AGPL-3.0+ to
ease mechanical processing.
|
|
It is needless bloat and doesn't seem to help with readability,
in retrospect, either.
|
|
This prevents public-inbox-watch from dying when reloading
(and thus rescanning) already-imported directories.
|
|
This will allow smoother imports as occasional Message-ID
duplicates happen and the best we can do is ignore the
second one.
|
|
For some existing mailing list archives, messages are identified
by serial number (such as NNTP article numbers in gmane). Those
links may become inaccessible (as is the current case for
gmane), so ensure users can still search based on old serial
numbers.
Now, I run the following periodically to get article numbers
from gmane (while news.gmane.org remains):
NNTPSERVER=news.gmane.org
export NNTPSERVER
GROUP=gmane.comp.version-control.git
perl -I lib scripts/xhdr-num2mid $GROUP --msgmap=/path/to/gmane.sqlite3
(I might integrate this further with public-inbox-* scripts one day).
My ~/.public-inbox/config as an added "altid" snippet which now
looks like this:
[publicinbox "git"]
address = git@vger.kernel.org
mainrepo = /path/to/git.vger.git
newsgroup = inbox.comp.version-control.git
; relative pathnames expand to $mainrepo/public-inbox/$file
altid = serial:gmane:file=gmane.sqlite3
And run "public-inbox-index --reindex /path/to/git.vger.git"
periodically.
This ought to allow searching for "gmane:12345" to work for
Xapian-enabled instances.
Disclaimer: while public-inbox supports NNTP and stable article
serial numbers, use of those for public links is discouraged
since it encourages centralization.
|
|
We want transactions to be the responsibility of the
caller when possible; this fixes the potential for
the msgmap to internally become inconsistent when
using it from inside searchidx.
|
|
Hopefully this gives new hackers a better overview of
how the components relate to each other.
|
|
This doesn't seem to do anything on my older system, but maybe it
will in newer or future versions of DBD::SQLite. Anyways it can
be helpful for documentation purposes, too.
|
|
It doesn't actually give performance improvements unless we
use types with "my", but we don't do that. We'll only continue
using fields with Danga::Socket-derived classes where they're
required.
|
|
This doesn't actually change anything as the constant is still
usable in other subroutines, but helps with consistency and
readability IMHO.
|
|
LISTGROUP can be expensive for giant groups, too. Use the
long response API to improve fairness and prevent excessive
buffering.
|
|
Implementing NEWNEWS, XHDR, XOVER efficiently will require
additional caching on top of msgmap.
This seems to work with lynx and slrnpull, haven't tried clients.
DO NOT run in production, yet, denial-of-service vulnerabilities
await!
|
|
This will allow us to maintain stable article numbers for an
NNTP server independently of Xapian.
|