about summary refs log tree commit homepage
path: root/lib/PublicInbox/NNTP.pm
DateCommit message (Collapse)
2020-11-04nntp: attempt RFC 5536 3.1.5-conformant Path: headers
Perhaps some NNTP clients would be unhappy with the old value "y". So use a bit more bandwidth+space to use the server-name and historical "!not-for-mail" tail-entry to better conform to a published RFC. Reported-by: Andrey Melnikov <temnota.am@gmail.com>
2020-11-04nntp: delimit Newsgroup: header with commas
...instead of spaces. This is specified in RFC 5536 3.1.4. Include references to RFC 1036, 5536 and 5537 in our docs while we're at it. Reported-by: Andrey Melnikov <temnota.am@gmail.com> Link: https://public-inbox.org/meta/CA+PODjpUN5Q4gBFQhAzUNuMasVEdmp9f=8Uo0Ej0mFumdSwi4w@mail.gmail.com/
2020-10-30tls: epollbit: account for miscellaneous OpenSSL errors
Apparently they happen (triggered by my -imapd instance), so bail out by closing the underlying socket rather than stopping the event loop and daemon process.
2020-09-12nntp: share more code between art_lookup callers
This prepares us for future changes to improve scalability to many inboxes.
2020-09-10nntp: fix cross-newsgroup Message-ID lookups
We cannot blindly use the selected newsgroup for HEAD/ARTICLE/BODY requests using Message-ID, since those commands look across all newsgroups; not just the selected one (if any). So stuff a reference to the Inbox object into $smsg. We can reduce args passed into set_nntp_headers() and msg_hdr_write(), too. Fixes: 0e6ceff37fc38f28 ("nntp: support slow blob retrievals")
2020-08-02nntp: fix STAT command
The return value of art_lookup changed but this command wasn't updated since it wasn't tested. Fixes: 0e6ceff37fc38f28 ("nntp: support slow blob retrievals")
2020-07-06daemon: warn on missing blobs
Since -edit and -purge should be rare and TOCTOU around them rarer still; missing {blobs} could be indicative of a real bug elsewhere. Warn on them. And I somehow ended up with 3 different field names for Inbox objects. Perhaps they'll be made consistent in the future.
2020-06-28ds: remove fields.pm usage
Since the removal of pseudo-hash support in Perl 5.10, the "fields" module no longer provides the space or speed benefits it did in 5.8. It also does not allow for compile-time checks, only run-time checks. To me, the extra developer overhead in maintaining "use fields" args has become a hassle. None of our non-DS-related code uses fields.pm, nor do any of our current dependencies. In fact, Danga::Socket (which DS was originally forked from) and its subclasses are the only fields.pm users I've ever encountered in the wild. Removing fields may make our code more approachable to other Perl hackers. So stop using fields.pm and locked hashes, but continue to document what fields do for non-trivial classes.
2020-06-25git_async_cat: remove circular reference
While this circular reference was carefully managed to not leak memory; it was still triggering a warning at -imapd/-nntpd shutdown due to the EPOLL_CTL_DEL op failing after the $Epoll FD gets closed. So remove the circular reference by providing a ref to `undef', instead.
2020-06-21nntp: support slow blob retrievals
Having `git cat-file' as a separate process naturally lends itself to asynchronous dispatch. Our event loop for -nntpd no longer blocks on slow git storage. Pipelining in -imapd was tricky and bugs were exposed by mbsync(1). Update t/nntpd.t to support pipelining ARTICLE requests to ensure we don't have the same problems -imapd did during development.
2020-06-21nntp: event_step: prepare for async git reads
This matches PublicInbox::IMAP::event_step and will allow us to handle blob retrievals from git asynchronously without falling over on pipelined requests.
2020-06-21daemon: use ->can to check for IO::Socket::SSL
Doing a ref($obj) string comparison ties us to IO::Socket::SSL (and OpenSSL) In the future, we may support GnuTLS or other TLS implementations. This was already done in the IMAP code.
2020-06-13nntpd+imapd: detect replaced over.sqlite3
For v1 inboxes (and possibly v2 in the future, for VACUUM), public-inbox-compact replaces over.sqlite3 with a new file. This currently doesn't need an extra inotify watch descriptor (or FD for kevent) at the moment, so it can coexist nicely for systems w/o IO::KQueue or Linux::Inotify2.
2020-06-03smsg: remove remaining accessor methods
We'll continue to favor simpler data models that can be used directly rather than wasting time and memory with accessor APIs. The ->from, ->to, -cc, ->mid, ->subject, >references methods can all be trivially replaced by hash lookups since all their values are stored in doc_data. Most remaining callers of those methods were test cases, anyways. ->from_name is only used in the PSGI code, so we can just use ->psgi_cull to take care of populating the {from_name} field.
2020-06-03nntp: smsg_range_i: favor ->{$field} lookups when possible
PublicInbox::Smsg::date remains the only exception which requires any subroutine calls, here, so we'll just have a branch just for that.
2020-05-09switch read-only Email::Simple users to Eml
Since PublicInbox::Eml doesn't parse MIME subparts up front, it can replace most uses of Email::Simple without performance penalty. This will eventually allow us to lower overall internal API footprint by not having to keep the MIME vs Simple distinction.
2020-04-22make zlib-related modules a hard dependency
This allows us to simplify some of our existing code and make future changes easier. I doubt anybody goes through the trouble to have a Perl installation without zlib support. The zlib source code is even bundled with Perl since 5.9.3 for systems without existing zlib development headers and libraries. Of course, zlib is also a requirement of git, too; and we're not going to stop using git :) [squashed: "wwwaltid: use gzipfilter up front"]
2020-04-19reduce scope of mbox From_ line removal
It's unnecessary overhead for anything which does Email::MIME parsing. It was never done for v2 indexing, even though v1->v2 conversions did NOT remove those From_ lines. There was never a need to remote From_ lines the v1 SearchIdx paths, either. Hitting a /$INBOX_URL/$MSGID/T/ endpoint with an 18 message thread reveals a ~0.5% speed improvement. This will become more apparent when we have a faster MIME parser.
2020-04-02nntp: allow multiple spaces or tabs to delimit args
While this is not a known problem in practice, RFC 3977 section 3.1 states: Keywords and arguments MUST each be separated by one or more space or TAB characters.
2020-04-02mid: add $MID_EXTRACT regexp for export
This allows us to consistently enforce the same Message-ID extraction rules everywhere and makes it easier for us to make changes in the future. Update scripts/ssoma-replay, as well, but don't rely on PublicInbox::* modules in that since it's legacy and public-inbox was never a dependency of ssoma.
2020-03-22rename PublicInbox::SearchMsg => PublicInbox::Smsg
Since the introduction of over.sqlite3, SearchMsg is not tied to our search functionality in any way, so stop confusing ourselves and future hackers by just calling it "PublicInbox::Smsg". Add a missing "use" in ExtMsg while we're at it.
2020-02-06treewide: run update-copyrights from gnulib for 2019
I didn't wait until September to do it, this year!
2020-01-24nntp: simplify setting X-Alt-Message-ID
We can cut down on the number of operations required using "grep" instead of "foreach".
2020-01-13ds|http|nntp: simplify {wbuf} population
We can rely on autovification to turn `undef' value of {wbuf} into an arrayref. Furthermore, "push" returns the (new) size of the array since at least Perl 5.0 (I didn't look further back), so we can use that return value instead of calling "scalar" again.
2020-01-08nntp: correctly log long response errors
We cannot safely call "fileno(undef)" without bringing down the entire -nntpd process :x. To ensure no logging regression, we now stash the FD for the duration of the long response to ensure the error can be matched to the original command in logs. Fixes: 207b89615a1a0c06 ("nntp: remove cyclic refs from long_response")
2020-01-01nntp: handle 2-digit year "70" properly
Time::Local has the concept of a "rolling century" which is defined at 50 years on either side of the current year. Since it's now 2020 and >50 years since the Unix epoch, the year "70" gets interpreted by Time::Local as 2070-01-01 instead of 1970-01-01. Since NNTP servers are unlikely to store messages from the future, we'll feed 4-digit year to Time::Local::{timegm,timelocal} and hopefully not have to worry about things until Y10K. This fixes test failures on t/v2writable.t and t/nntpd.t since 2020-01-01.
2019-12-22nntp: cmd_xover: use named sub for long_response
Introduce xover_i, which does the same thing as the anonymous sub it replaces.
2019-12-22nntp: hdr_msg_id: use named sub for long_response
Introduce hdr_msgid_range_i, which does the same thing as the anonymous sub it replaces.
2019-12-22nntp: cmd_newnews: use named sub for long_response
Introduce newnews_i, which does the same thing as the anonymous sub it replaces.
2019-12-22nntp: cmd_listgroup: use named subs for long_response
Introduce listgroup_range_i and listgroup_all_i subs which do the same things as the anonymous subs they replace.
2019-12-22nntp: cmd_xrover: use named sub for long_response
Introduce xrover_i which does the same thing as the anonymous sub it replaces.
2019-12-22nntp: hdr_searchmsg: use named sub for numeric range response
Introduce searchmsg_range_i, which does the same thing as the anonymous sub it replaces.
2019-12-22nntp: remove cyclic refs from long_response
Leftover cyclic references are a source of memory leaks. While our code is AFAIK unaffected by such leaks at the moment, eliminating a potential source of bugs will make maintenance easier. We make the long_response API cycle-free by stashing the callback into the NNTP object. However, callers will need to be updated to get rid of the circular reference to $self. We do that be replacing anonymous subs with name subroutine references, such as xref_range_i replacing the formerly anonymous sub inside hdr_xref.
2019-12-22nntp: get_range: return scalarref for $beg
...Instead of just returning a plain scalar inside an arrayref. This is because we usually pass the result of NNTP::get_range to Msgmap::msg_range. Upcoming changes will move us away from anonymous subroutines, so this change will make followup commits easier-to-digest as modifications to the underlying scalar can be more easily propagated between non-anonymous-subs.
2019-12-22nntp: get rid of some unused imports
Our NNTP code no longer relies on search or Xapian. Msgmap and Git modules are loaded anyways through Inbox->(git|mm|over) methods, however.
2019-12-22nntp: simplify method detection using UNIVERSAL::can
No need to do an eval dance or disable strict refs.
2019-12-14ds: move NNTP-only expiration code into DS
We'll be supporting idle timeout for the HTTP code in the future to deal directly with Internet-exposed clients w/o Varnish or nginx.
2019-12-14ds: move EvCleanup code into DS
EvCleanup only existed since Danga::Socket was a separate component, and cleanup code belongs with the event loop.
2019-09-08nntp: regexp always consumes rbuf if "\n" exists
We don't want to get hung into a state where we see "\n" via index(), yet cannot consume rbuf in the while loop. So tweak the regexp to ensure we always consume rbuf. I suspect this is what causes occasional 100% CPU usage of -nntpd, but reproducing it's been difficult..
2019-09-08nntp: fix redundant CRLF from "LISTGROUP GROUP RANGE"
Since Net::NNTP::listgroup doesn't support the range parameter, I had to test this manually and noticed extra CRLF were emitted.
2019-07-13nntp: support optional [range] arg in LISTGROUP
RFC3977 6.1.2.2 LISTGROUP allows a [range] arg after [group], and supporting it allows NNTP support in neomutt to work again. Tested with NeoMutt 20170113 (1.7.2) on Debian stretch (oldstable)
2019-07-13nntp: fix LIST OVERVIEW.FMT ordering and format
RFC3977 8.4.2 mandates the order of non-standard headers to be after the first seven standard headers/metadata; so "Xref:" must appear after "Lines:"|":lines". Additionally, non-required header names must be followed by ":full". Cc: Jonathan Corbet <corbet@lwn.net> Reported-by: Urs Janßen <E1hmKBw-0008Bq-8t@akw>
2019-07-12nntp: clear local timer on idle client expiry
We need to ensure further timers can be registered if there's currently no idle clients.
2019-07-10http|nntp: avoid recursion inside ->write
In HTTP.pm, we can use the same technique NNTP.pm uses with long_response with the $long_cb callback and avoid storing $pull in the per-client structure at all. We can also reuse the same logic to push the callback into wbuf from NNTP. This does NOT introduce a new circular reference, but documents it more clearly.
2019-07-08http|nntp: "use PublicInbox::DS" instead of ->import
Relying on "use" to import during BEGIN means we get to take advantage of prototype checking of function args during the rest of the compilation phase.
2019-07-07nntp: improve error reporting for COMPRESS
Add some checks for errors at initialization, though there's not much that can be done with ENOMEM-type errors aside from dropping clients. We can also get rid of the scary FIXME for MemLevel=8. It was a stupid error on my part in the original per-client deflate stream implementation calling C::R::Z::{Inflate,Deflate} in array context and getting the extra dualvar error code as a string result, causing the {zin}/{zout} array refs to have extra array elements.
2019-07-06nntp: reduce memory overhead of zlib
Using Z_FULL_FLUSH at the right places in our event loop, it appears we can share a single zlib deflate context across ALL clients in a process. The zlib deflate context is the biggest factor in per-client memory use, so being able to share that across many clients results in a large memory savings. With 10K idle-but-did-something NNTP clients connected to a single process on a 64-bit system, TLS+DEFLATE used around 1.8 GB of RSS before this change. It now uses around 300 MB. TLS via IO::Socket::SSL alone uses <200MB in the same situation, so the actual memory reduction is over 10x. This makes compression less efficient and bandwidth increases around 45% in informal testing, but it's far better than no compression at all. It's likely around the same level of compression gzip gives on the HTTP side. Security implications with TLS? I don't know, but I don't really care, either... public-inbox-nntpd doesn't support authentication and it's up to the client to enable compression. It's not too different than Varnish caching gzipped responses on the HTTP side and having responses go to multiple HTTPS clients.
2019-07-06nntp: support COMPRESS DEFLATE per RFC 8054
This is only tested so far with my patches to Net::NNTP at: https://rt.cpan.org/Ticket/Display.html?id=129967 Memory use in C10K situations is disappointing, but that's the nature of compression. gzip compression over HTTPS does have the advantage of not keeping zlib streams open when clients are idle, at the cost of worse compression.
2019-07-06nntp: move LINE_MAX constant to the top
It'll be accessible from other places, and there was no real point in having it declared inside a sub.
2019-07-06nntp: use msg_more as a method
It's a tad slower, but we'll be able to subclass this to rely on zlib deflate buffering. This is advantageous for TLS clients since (AFAIK) IO::Socket::SSL/OpenSSL doesn't give us ways to use MSG_MORE or writev(2) like like GNUTLS does.