path: root/lib/PublicInbox/ExtMsg.pm
Date Commit message
2020-01-25 s/news.gmane.org/news.gmane.io/
gmane still has an NNTP server, so update links to point to it. cf. https://lars.ingebrigtsen.no/2020/01/06/whatever-happened-to-news-gmane-org/
2020-01-06 treewide: "require" + "use" cleanup and docs
There are a bunch of leftover "require" and "use" statements we no longer need and can get rid of, along with some excessive imports via "use". IO::Handle usage isn't always obvious, so add comments describing why a package loads it. Along the same lines, document the tmpdir support as the reason we depend on File::Temp 0.19, even though every Perl 5.10.1+ user has it. While we're at it, favor "use" over "require", since it gives us extra compile-time checking.
2020-01-06 hval: export prurl and add prototype
This allows us to do some compile-time checking and fills in a missing "use" in PublicInbox::NewsWWW, allowing it to be used standalone and independently of PublicInbox::WWW.
2019-12-28 search: retry_reopen passes user arg to callback
This allows callers to pass named (not anonymous) subs. Update all retry_reopen callers to use this feature, and fix some places where we failed to use retry_reopen :x
2019-12-27 config: each_inbox: pass user arg to callback
Another place where we can replace anonymous subs with named subs by passing a user-supplied arg.
2019-10-09 extmsg: drop unused $have_mm variable
We rely on Inbox::mm nowadays.
2019-09-09 run update-copyrights from gnulib for 2019
2019-04-27 extmsg: escape ampersands in @EXT_URL array
We already escape the user-provided Message-IDs (so there's no security problem AFAIK), but the URL templates which exist in our source code were not escaped properly. This quiets down tidy(1).
2019-01-20 extmsg: don't bother partial matching with <16 chars
It's not worth it, and attempts to wildcard off single-character Message-IDs(*) cause Xapian to error out in unpredictable ways:
    something terrible happened at /usr/lib/x86_64-linux-gnu/perl5/5.24/Search/Xapian/Enquire.pm line 54.
    ...propagated at lib/PublicInbox/Search.pm line 209.
So don't bother.
(*) because people blindly hit 'y' or 'n' when git-send-email prompted them for In-Reply-To.
2018-04-22 extmsg: use Xapian only for partial matches
"LIKE" in SQLite (and other SQL implementations I've seen) is expensive with nearly 3 million messages in the archives. This caused some partial Message-ID lookups to take over 600ms on my workstation (~300ms on a faster Xeon). Cut that to under 30ms on average on my workstation by relying exclusively on Xapian for partial Message-ID lookups, as we have in the past. Unlike in the past, when we tried using Xapian to match partial Message-IDs, we now optimize our indexing of Message-IDs to break apart "words" in Message-IDs for searching, yielding (hopefully) "good enough" accuracy for folks who get long URLs broken across lines when copy+pasting. We'll also drop the (in retrospect) pointless stripping of "/[tTf]" suffixes for the partial match, since anybody who hits that codepath would be hitting an invalid Message-ID. Finally, limit wildcard expansion to prevent easy DoS vectors on short terms. And blame Pine and alpine for generating Message-IDs with low-entropy prefixes :P
2018-04-18 Merge remote-tracking branch 'origin/master' into v2
* origin/master:
  nntp: allow and ignore empty commands
  mbox: do not barf on queries which return no results
  nntp: fix NEWNEWS command
  searchview: fix non-numeric comparison
  Allow specification of the number of search results to return
  githttpbackend: avoid infinite loop on generic PSGI servers
  http: fix modification of read-only value
  extmsg: use news.gmane.org for Message-ID lookups
  extmsg: rework partial MID matching to favor current inbox
  Update the installation instructions with Fedora package names
  nntp: do not drain rbuf if there is a command pending
  nntp: improve fairness during XOVER and similar commands
  searchidx: do not modify Xapian DB while iterating
  Don't use LIMIT in UPDATE statements
2018-04-18 extmsg: remove expensive git path checks
Searching across different inboxes is expensive without SQLite (or Xapian) installed, so avoid doing expensive tree lookups in git. Since SQLite is required for Xapian support anyways, we won't need to check Xapian, either. Sites without SQLite installed will simply 404 if somebody requests a message which isn't in the current inbox.
2018-03-27 extmsg: use news.gmane.org for Message-ID lookups
http://mid.gmane.org/ has not worked for a while, but their NNTP server continues to work. Use that and perhaps give NNTP more exposure.
Reported-by: Jonathan Corbet <corbet@lwn.net>
2018-03-19 extmsg: rework partial MID matching to favor current inbox
The current inbox is more important for partial Message-ID matching, so we try harder on that to fix common errors before moving onto other inboxes. Then, prevent expensive scanning of other inboxes by requiring a Message-ID length of at least 16 bytes. Finally, we limit the overall partial responses to 200 when scanning other inboxes to avoid excessive memory usage.
2018-03-19 extmsg: rework partial MID matching to favor current inbox
The current inbox is more important for partial Message-ID matching, so we try harder on that to fix common errors before moving onto other inboxes. Then, prevent expensive scanning of other inboxes by requiring a Message-ID length of at least 16 bytes. Finally, we limit the overall partial responses to 200 when scanning other inboxes to avoid excessive memory usage.
2018-03-02 search: revert to using 'Q' as a uniQue id per-Xapian conventions
'Q' is merely a convention in the Xapian world, and is close enough to unique for practical purposes, so stop using XMID and gain a little more term length as a result.
2018-02-16 search: stop assuming Message-ID is unique
In general, they are, but there's no way for a general-purpose mail server to enforce that. This is a step toward handling more of the corner cases which existing lists throw at us.
2018-02-16 extmsg: fix broken Xapian MID lookup
This likely has no real world implications, though, as we fall back to Msgmap lookups anyways. Broken since commit 7eeadcb62729b0efbcb53cd9b7b181897c92cf9a ("search: remove unnecessary abstractions and functionality")
2018-02-07 update copyrights for 2018
Using update-copyrights from gnulib. While we're at it, use the SPDX identifier for AGPL-3.0+ to ease mechanical processing.
2017-03-22 extmsg: use updated mail-archive.com URL
Apparently mid.mail-archive.com does not support HTTPS, and the HTTP version redirects to the search query, anyways.
2016-08-14 www: do not unnecessarily escape some chars in paths
Based on reading RFC 3986, it seems '@', ':', '!', '$', '&', "'", '(', ')', '*', '+', ',', ';', '=' are all allowed in path-absolute where we have the Message-ID. In any case, it seems '@' is fairly common in path components nowadays and too common in Message-IDs.
2016-08-13 extmsg: reorder and add more Message-ID lookup services
gmane is down at the moment, so lower it in priority (hopefully it will be brought back up again). Wikipedia also lists a few more project-specific list providers, so include those as well: https://en.wikipedia.org/wiki/Message-ID
2016-07-17 extmsg: favor user-provided URL on partial matches
While an inbox may have multiple URLs, we will favor the existing URL for the current inbox on partial matches to avoid confusing users or slowing them down by requiring a new TCP connection.
2016-07-09 www: cleanup parameter passing
Reduce the size of hashes a bit and drop some unneeded hash lookups for uncommon paths.
2016-07-06 extmsg: switch to wwwstream for partial match, too
Another step towards a consistent WWW UI...
2016-07-06 extmsg: disable automatic inbox switching
Automatic inbox switching is a potentially deceptive pattern and surprises readers who do not check the URL bar closely. Furthermore, a message could be cross-posted to multiple lists.
2016-07-06 hval: get rid of unused parameter for new_msgid
Exposing compressed Message-IDs in URLs was a mistake; remove a remnant of it.
2016-07-02 config: introduce each_inbox for iteration
This fills in the internal lookup hashes and simplifies callers.
2016-07-02 extmsg: rework to use Inbox objects
This is less code and hopefully easier to understand.
2016-05-14 rename most instances of "list" to "inbox"
A public-inbox is NOT necessarily a mailing list, but it could serve as an input point for zero, one, or infinite mailing lists :D
2016-02-26 support protocol-relative URLs in publicinbox.$LISTNAME.url
All URL generation in dynamic HTTP pages should be capable of generating "https" or "http" URLs depending on the user's preference.
2016-02-26 extmsg: do not modify shared array via prurl
We cannot modify elements of any data structures shared between requests. Oops!
2016-02-26 extmsg: allow returning 404 responses
We will be falling back and cascading to newsgroup lookups, later.
2016-02-25 hval: implement common UI for protocol-relative URLs
This allows users to avoid HTTPS -> HTTP downgrade warnings, but we will also avoid encouraging them towards HTTPS, for now. IMHO: the CA system gives a false sense of security, TLS libraries (e.g. OpenSSL) can introduce new bugs and problems (even to attack clients), and TLS libraries also eat memory on cheap servers.
2016-02-25 remove direct CGI.pm support
Relying on Plack::Handler::CGI is much easier for long-term maintenance and development. Nowadays, we even include our own httpd implementation to facilitate easier deployment with PSGI/Plack.
2016-02-22 extmsg: support "//" protocol-relative URLs
Avoid unintentionally switching protocols if the external site we're linking to supports both HTTP and HTTPS. We do not want to force HTTPS everywhere because potential bugs and performance problems in the TLS stack may outweigh the privacy benefits. Leave it up to site authors and users to decide whether they want HTTPS or plain old HTTP.
2016-01-09 hval: use more appropriate hvals for documentation
Not needed, but this is good documentation. Some of these values should never have newlines.
2016-01-03 www: comments for denoting Plack::Request vs CGI
We'll probably want to continue supporting CGI for mod_perl compatibility.
2015-12-25 extmsg: fixup comparison for unknown message types
Fixes commit 4c2c2325d2948ec5340e2fcafbee798cf568f5fd ("rename 'GitCatFile' package to 'Git'")
2015-12-22 rename 'GitCatFile' package to 'Git'
We'll be using it for more than just cat-file. Adding a `popen' API for internal use allows us to save a bunch of code in other places.
2015-12-08 extmsg: try to fixup common errors
Sometimes users (me :x) blindly append "raw" to a /t/ URL...
2015-11-20 various internal documentation updates
Hopefully this gives new hackers a better overview of how the components relate to each other.
2015-09-15 extmsg: wire up to use msgmap for prefixes
DBI + DBD::SQLite has much better handling of prefix lookups than Xapian. While we're at it, avoid linking blatantly wrong Message-IDs to external services.
2015-09-05 extmsg: add note about the deficiency of the implementation
ref: http://public-inbox.org/meta/20150905091457.GA27857@dcvr.yhbt.net/
2015-09-05 extmsg: fall back to partial Message-ID matching
In case a URL gets truncated (as is common with long URLs), we can rely on Xapian for partial matches and bring the user to their destination.
2015-09-04 extmsg: close HTML tag in response
Oops, browsers normally render this fine, though.
2015-09-03 get rid of Message-ID compression entirely
Provide a fallback for legacy SHA-1 messages, but do not advertise shorter URLs anymore for data portability concerns. This fixes a regression introduced in commit 81a9c1b476987d845b340ab9013d26cf4487cb9a ("search: disable Message-ID compression in Xapian") which ended up breaking thread-related endpoints for large Message-IDs, as lookups on the SHA-1 message no longer worked.
2015-09-03 ExtMsg: 300 to external mailing list archives
Since cross-posting is inevitable, we shall link to external message archives for interoperability.
2015-09-03 search: disable Message-ID compression in Xapian
We'll continue to compress long Message-IDs in URLs (which we know about), but we will store entire Message-IDs in the Xapian database to facilitate ease-of-lookups in external databases.
2015-09-02 implement external Message-ID finder
Currently, this looks at other public-inbox configurations served in the same process. In the future, it will generate links to other Message-ID lookup endpoints.