about summary refs log tree commit homepage
path: root/lib/PublicInbox/AltId.pm
DateCommit message (Collapse)
2021-01-01update copyrights for 2021
Using "make update-copyrights" after setting GNULIB_PATH in my config.mak
2020-03-25www: add endpoint to retrieve altid dumps
This ensures all our indexed data, including data from altid searches (e.g. "gmane:$ARTNUM") is retrievable. It uses a "POST" request to avoid wasting cycles when invoked by crawlers, since it could potentially be several megabytes of data not indexable by search engines.
2020-03-25altid: warn about non-word prefixes
We only support searching on prefixes matching /\A\w+\z/ because Xapian requires ':' to delimit the prefix and splits on spaces without quotes. I've also verified Xapian supports multibyte UTF-8 characters, underscores, and bare numbers as search prefixes, so there's no need to restrict it beyond what Perl's UTF-8 aware \w character class offers.
2020-02-06treewide: run update-copyrights from gnulib for 2019
I didn't wait until September to do it, this year!
2020-01-27inbox: add ->version method
This allows us to simplify version checking by avoiding "//" or "||" operators sprinkled around.
2020-01-06altid: use msgmap at compile time
AltId requires Msgmap to work, which requires SQLite. Search also requires SQLite3 (for Over), nowadays, so there's no reason for us to lazy-load Msgmap and SQLite anymore.
2019-10-16config: support "inboxdir" in addition to "mainrepo"
"mainrepo" ws a bad name and artifact from the early days when I intended for there to be a "spamrepo" (now just the ENV{PI_EMERGENCY} Maildir). With v2, "mainrepo" can be especially confusing, since v2 needs at least two git repositories (epoch + all.git) to function and we shouldn't confuse users by having them point to a git repository for v2. Much of our documentation already references "INBOX_DIR" for command-line arguments, so use "inboxdir" as the git-config(1)-friendly variant for that. "mainrepo" remains supported indefinitely for compatibility. Users may need to revert to old versions, or may be referring to old documentation and must not be forced to change config files to account for this change. So if you're using "mainrepo" today, I do NOT recommend changing it right away because other bugs can lurk. Link: https://public-inbox.org/meta/874l0ice8v.fsf@alyssa.is/
2019-09-09run update-copyrights from gnulib for 2019
2019-04-16cleanup: use '$ibx' consistently when referring to Inbox refs
'$inbox' is more human-readable, so that is for the more human-readable name in most cases. Making our variable naming more consistent should make the code easier-to-review and harder to screw up.
2019-01-09doc: various overview-level module comments
Hopefully this helps people familiarize themselves with the source code.
2018-04-06altid: fix miscopied field name
Oops :x
2018-04-05support altid mechanism for v2
There's enough gmane links out there in wild that it makes sense to maintain support for these mappings.
2018-02-07update copyrights for 2018
Using update-copyrights from gnulib While we're at it, use the SPDX identifier for AGPL-3.0+ to ease mechanical processing.
2017-06-22add filter for RubyLang lists
Unfortunately, it appears we have to reject this and instead add support filtering at View time(*), due to DKIM signatures in messages from ruby-lang.org. (*) which may not be worth it
2016-08-11search: support alt-ID for mapping legacy serial numbers
For some existing mailing list archives, messages are identified by serial number (such as NNTP article numbers in gmane). Those links may become inaccessible (as is the current case for gmane), so ensure users can still search based on old serial numbers. Now, I run the following periodically to get article numbers from gmane (while news.gmane.org remains): NNTPSERVER=news.gmane.org export NNTPSERVER GROUP=gmane.comp.version-control.git perl -I lib scripts/xhdr-num2mid $GROUP --msgmap=/path/to/gmane.sqlite3 (I might integrate this further with public-inbox-* scripts one day). My ~/.public-inbox/config as an added "altid" snippet which now looks like this: [publicinbox "git"] address = git@vger.kernel.org mainrepo = /path/to/git.vger.git newsgroup = inbox.comp.version-control.git ; relative pathnames expand to $mainrepo/public-inbox/$file altid = serial:gmane:file=gmane.sqlite3 And run "public-inbox-index --reindex /path/to/git.vger.git" periodically. This ought to allow searching for "gmane:12345" to work for Xapian-enabled instances. Disclaimer: while public-inbox supports NNTP and stable article serial numbers, use of those for public links is discouraged since it encourages centralization.