path: root/lib/PublicInbox/Msgmap.pm
Date  Commit message
2018-04-04  v2: support incremental indexing + purge
This is important for people running mirrors via "git fetch", as they need to be kept up-to-date. Purging is also now supported in mirrors. The short-lived "--regenerate" option is gone and is now implicitly enabled as a result. It's still cheap when article number regeneration is unnecessary, as we track the range for each git repository.
2018-04-03  nntp: simplify the long_response API
We worked around the default range/termination conditions of long_response in many cases to reduce calls to SQLite or Xapian. So continue that trend and become more like the PSGI API, which doesn't force callers to specify an article range or work inside a loop.
2018-04-03  msgmap: replace id_batch with ids_after
id_batch had an overly complicated interface; replace it with ids_after, which is simpler and takes advantage of selectcol_arrayref in DBI. This allows simplification of callers, and the diffstat agrees with me.
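To illustrate the idea, here is a minimal sketch in Python using the stdlib sqlite3 module rather than Perl DBI. The `msgmap` table with a `num` column, the default `limit`, and the `each_id` helper are assumptions for illustration, not the project's actual code. DBI's selectcol_arrayref returns the first column of every row as a flat list; the list comprehension below plays the same role.

```python
import sqlite3

def ids_after(db, num, limit=1000):
    # Return up to `limit` article numbers greater than `num`, ascending.
    # Assumes a hypothetical table msgmap(num INTEGER PRIMARY KEY, ...).
    rows = db.execute(
        "SELECT num FROM msgmap WHERE num > ? ORDER BY num ASC LIMIT ?",
        (num, limit),
    )
    return [row[0] for row in rows]

def each_id(db, batch=1000):
    # Hypothetical caller: walk all article numbers one batch at a time,
    # resuming after the last number seen.
    last = 0
    while True:
        ids = ids_after(db, last, batch)
        if not ids:
            break
        yield from ids
        last = ids[-1]
```

The caller only needs to remember the last number it saw, so no range or termination condition has to be passed in.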
2018-04-02  replace Xapian skeleton with SQLite overview DB
This ought to provide better performance and scalability which is less dependent on inbox size. Xapian does not seem optimized for some queries used by the WWW homepage, Atom feeds, XOVER and NEWNEWS NNTP commands. This can actually make Xapian optional for NNTP usage, and allow more functionality to work without Xapian installed. Indexing performance was extremely bad at first, but DBI::Profile helped me optimize away problematic queries.
2018-03-22  v2writable: support reindexing Xapian
This still requires a msgmap.sqlite3 file to exist, but it allows us to tweak Xapian indexing rules and reindex the Xapian database online while -watch is running.
2018-03-22  msgmap: add tmp_clone to create an anonymous copy
This will be used to keep track of Message-ID <-> NNTP Article numbers to prevent article number reuse when reindexing.
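A rough sketch of the cloning idea, in Python with the stdlib sqlite3 module. This log does not say how tmp_clone builds its copy, so the in-memory backup below is a swapped-in stand-in, and the function body is an assumption rather than the project's implementation.

```python
import sqlite3

def tmp_clone(src):
    # Copy every page of `src` into a private in-memory database.
    # The clone has no name on disk, so it vanishes when closed.
    dst = sqlite3.connect(":memory:")
    src.backup(dst)
    return dst
```

Writes to the clone never touch the source, which is what makes it usable as a scratch copy of the Message-ID <-> article number mapping during a reindex.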
2018-03-19  v2writable: implement remove correctly
We need to hide removals from anybody hitting the search engine.
2018-02-07  update copyrights for 2018
Using update-copyrights from gnulib. While we're at it, use the SPDX identifier for AGPL-3.0+ to ease mechanical processing.
2017-06-26  msgmap: reduce constant usage
It is needless bloat and doesn't seem to help with readability, in retrospect, either.
2017-06-23  msgmap: ignore duplicates instead of dying
This prevents public-inbox-watch from dying when reloading (and thus rescanning) already-imported directories.
2017-06-22  msgmap: mid_insert ignores duplicates instead of die-ing
This will allow smoother imports as occasional Message-ID duplicates happen and the best we can do is ignore the second one.
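The behavior described here can be sketched as follows, in Python with the stdlib sqlite3 module. The table layout and the `None` return value for a duplicate are assumptions for illustration; the real mid_insert is Perl and may signal duplicates differently.

```python
import sqlite3

def mid_insert(db, mid):
    # Assumes a hypothetical table:
    #   msgmap(num INTEGER PRIMARY KEY AUTOINCREMENT, mid VARCHAR NOT NULL UNIQUE)
    try:
        cur = db.execute("INSERT INTO msgmap (mid) VALUES (?)", (mid,))
    except sqlite3.IntegrityError:
        # Duplicate Message-ID: keep the first mapping, skip the second.
        return None
    return cur.lastrowid  # newly assigned article number
```

An importer can then keep going when a Message-ID shows up a second time instead of aborting the whole run.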
2016-08-11  search: support alt-ID for mapping legacy serial numbers
For some existing mailing list archives, messages are identified by serial number (such as NNTP article numbers in gmane). Those links may become inaccessible (as is the current case for gmane), so ensure users can still search based on old serial numbers.

Now, I run the following periodically to get article numbers from gmane (while news.gmane.org remains):

	NNTPSERVER=news.gmane.org
	export NNTPSERVER
	GROUP=gmane.comp.version-control.git
	perl -I lib scripts/xhdr-num2mid $GROUP --msgmap=/path/to/gmane.sqlite3

(I might integrate this further with public-inbox-* scripts one day).

My ~/.public-inbox/config has an added "altid" snippet which now looks like this:

	[publicinbox "git"]
		address = git@vger.kernel.org
		mainrepo = /path/to/git.vger.git
		newsgroup = inbox.comp.version-control.git

		; relative pathnames expand to $mainrepo/public-inbox/$file
		altid = serial:gmane:file=gmane.sqlite3

And run "public-inbox-index --reindex /path/to/git.vger.git" periodically. This ought to allow searching for "gmane:12345" to work for Xapian-enabled instances.

Disclaimer: while public-inbox supports NNTP and stable article serial numbers, use of those for public links is discouraged since it encourages centralization.
2016-07-31  msgmap: fix use of transactions
We want transactions to be the responsibility of the caller when possible; this fixes the potential for the msgmap to internally become inconsistent when using it from inside searchidx.
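A small sketch of caller-owned transactions, in Python with the stdlib sqlite3 module. The function names and table layout are hypothetical. The point is that the low-level insert never commits by itself, so the caller decides where a unit of work begins and ends.

```python
import sqlite3

def mid_insert(db, mid):
    # No commit here: the caller owns the transaction.
    # Assumes a hypothetical table msgmap(num INTEGER PRIMARY KEY, mid UNIQUE).
    db.execute("INSERT INTO msgmap (mid) VALUES (?)", (mid,))

def import_batch(db, mids):
    # Hypothetical caller. `with db` commits if the block succeeds
    # and rolls back everything in it if an exception escapes.
    with db:
        for mid in mids:
            mid_insert(db, mid)
```

If any insert in a batch fails, the whole batch is rolled back, so the map is never left half-updated relative to whatever else the caller was writing.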
2015-11-20  various internal documentation updates
Hopefully this gives new hackers a better overview of how the components relate to each other.
2015-10-02  Msgmap: pass ReadOnly DBI flag for non-writable opens
This doesn't seem to do anything on my older system, but maybe it will in newer or future versions of DBD::SQLite. Anyways it can be helpful for documentation purposes, too.
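For comparison, a read-only open can be sketched in Python with the stdlib sqlite3 module. This uses SQLite's `mode=ro` URI parameter rather than the DBI ReadOnly flag the commit refers to, and the helper name is hypothetical.

```python
import sqlite3
from pathlib import Path

def open_readonly(path):
    # Build a file: URI and ask SQLite itself to refuse writes.
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)
```

With this form, an attempted write raises sqlite3.OperationalError, and the call site also documents that the handle is not meant for writing.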
2015-09-30  remove unnecessary fields usage
It doesn't actually give performance improvements unless we use types with "my", but we don't do that. We'll only continue using fields with Danga::Socket-derived classes where they're required.
2015-09-21  msgmap: minor cleanup to move constant declaration
This doesn't actually change anything as the constant is still usable in other subroutines, but helps with consistency and readability IMHO.
2015-09-19  nntp: use long response API for LISTGROUP
LISTGROUP can be expensive for giant groups, too. Use the long response API to improve fairness and prevent excessive buffering.
2015-09-18  read-only NNTP server
Implementing NEWNEWS, XHDR, XOVER efficiently will require additional caching on top of msgmap. This seems to work with lynx and slrnpull, haven't tried clients. DO NOT run in production, yet, denial-of-service vulnerabilities await!
2015-09-15  msgmap: add message mapping via SQLite
This will allow us to maintain stable article numbers for an NNTP server independently of Xapian.
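As a rough picture of what such a mapping involves, here is a minimal sketch in Python with the stdlib sqlite3 module. The schema and function names are assumptions for illustration and are not copied from Msgmap.pm. AUTOINCREMENT is used because SQLite then never hands out a number again, even after the highest row is deleted.

```python
import sqlite3

def open_msgmap(path=":memory:"):
    # Hypothetical schema: one row per message; num is the article number.
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS msgmap ("
        "num INTEGER PRIMARY KEY AUTOINCREMENT, "
        "mid VARCHAR NOT NULL UNIQUE)"
    )
    return db

def mid_insert(db, mid):
    # Assign the next article number to a new Message-ID.
    cur = db.execute("INSERT INTO msgmap (mid) VALUES (?)", (mid,))
    return cur.lastrowid

def num_for(db, mid):
    # Message-ID -> article number, or None if unknown.
    row = db.execute("SELECT num FROM msgmap WHERE mid = ?", (mid,)).fetchone()
    return row[0] if row else None

def mid_for(db, num):
    # Article number -> Message-ID, or None if unknown.
    row = db.execute("SELECT mid FROM msgmap WHERE num = ?", (num,)).fetchone()
    return row[0] if row else None
```

Because the numbers live in their own SQLite file, they stay the same no matter how the search index is rebuilt.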