about summary refs log tree commit homepage
path: root/lib/PublicInbox/NNTP.pm
DateCommit message (Collapse)
2019-06-24http|nntp: favor "$! == EFOO" over $!{EFOO} checks
Integer comparisions of "$!" are faster than hash lookups. See commit 6fa2b29fcd0477d126ebb7db7f97b334f74bbcbc ("ds: cleanup Errno imports and favor constant comparisons") for benchmarks.
2019-06-24ds: get rid of event_watch field
We don't need to keep track of that field since we always know what events we're interested in when using one-shot wakeups.
2019-06-24ds: set event flags directly at initialization
We can avoid the EPOLL_CTL_ADD && EPOLL_CTL_MOD sequence with a single EPOLL_CTL_ADD.
2019-06-24ds: share send(..., MSG_MORE) logic
No sense in having similar Linux-specific functionality in both our NNTP.pm and HTTP.pm
2019-06-24ds: lazy-initialize wbuf
We don't need write buffering unless we encounter slow clients requesting large responses. So don't waste a hash slot or (empty) arrayref for it.
2019-06-24ds: use and export monotonic now()
All of our internal timing code should use monotonic clocks for consistency against system clock adjustments. This can be shared by our Daemon and NNTP packages.
2019-06-24ds: get rid of {closed} field
Merely checking the presence of the {sock} field is enough, and having multiple sources of truth increases confusion and the likelyhood of bugs.
2019-06-16ds: stop distinguishing event read and write callbacks
Having separate read/write callbacks in every class is too confusing to my easily-confused mind. Instead, give every class an "event_step" callback which is easier to wrap my head around. This will make future code to support IO::Socket::SSL-wrapped sockets easier-to-digest, since SSL_write() can require waiting on POLLIN events, and SSL_read() can require waiting on POLLOUT events.
2019-06-15Merge remote-tracking branch 'origin/ds'
* origin/ds: ds: stop caring about event flags set by epoll/poll/kqueue ds: do not distinguish between POLLHUP and POLLERR ds: remove read method, here, too nntp: use sysread to append to existing buffer ds: remove steal_socket method ds: remove {fd} field ds: reduce Errno imports and drop ->close reason ds: cleanup Errno imports and favor constant comparisons ds: simplify write buffer accounting
2019-06-14nntp: filter out duplicate Message-IDs for leafnode
It's the unfortunate reality that there are some clients which reuse Message-IDs (in which we generate + use another) or set multiple Message-IDs on their own. While the v2 format addresses that, NNTP clients such as leafnode are not always prepared to deal with that case. So, ensure NNTP clients only see a single Message-ID, and show the others as 'X-Alt-Message-ID'.
2019-06-13nntp: ensure Message-ID is not folded for leafnode
Leafnode cannot handle Message-ID headers which are too long and require folding via Email::Simple::Header. Since there are already many of these messages in git with the header already folded, we need to handle the unfolding when emitting the message via NNTP. As far as we know, Leafnode is the only client software incapable of handling this case.
2019-06-13nntp: add Path: header for leafnode
Apparently leafnode just needs any junk in the Path: header. Lets not waste bandwidth and just use a single byte to keep leafnode happy. Cc: Dave Taht <dave@taht.net>
2019-06-10ds: do not distinguish between POLLHUP and POLLERR
In my experience, both are worthless as any normal read/write call path will be wanting to check errors and deal with them appropriately; so we can just call event_read, for now. Eventually, there'll probably be only one callback for dealing with all in/out/err/hup events to simplify logic, especially w.r.t TLS socket negotiation.
2019-06-10nntp: use sysread to append to existing buffer
We already do this in PublicInbox::HTTP, as it's superior to DS::read in this regard. Initially (when I started writing NNTP.pm, I wanted to use Danga::Socket's read buffering and push_back_read (removed in DS) but quickly figured out it wasn't useful at all for dealing with trickling clients.
2019-06-10ds: remove {fd} field
Storing the file descriptor was redundant as we can quickly call fileno($self->{sock}) and not have to store an extra hash table entry. Multiple sources of truth leads to confusion, confusion leads to bugs.
2019-06-10ds: simplify write buffer accounting
Keeping track of write_buf_size was redundant and pointless when we can simply check the number of elements in the buffer array. Multiple sources of truth leads to confusion; confusion leads to bugs. Finally, rename the prefixes to 'wbuf' to ensure we loudly (instead of silently) break any external dependencies being ported over from Danga::Socket, as further changes are pending.
2019-06-04nntp: ensure we only handle ASCII whitespace
RFC3977 does not have provisions for whitespace beyond ASCII TAB, SP, CR and LF. I doubt there's any NNTP clients broken enough to be sending non-ASCII whitespace delimiters. We're probably excessively liberal regarding TAB acceptance, even; but it's probably too late to change at this point...
2019-06-04nntp: be explicit about ASCII digit matches
We aren't able to make sense of non-ASCII digits cf. perlrecharclass(1) / "Digits" section
2019-05-15nntp: use Inbox->over directly
None of the NNTP code actually relies on Xapian, anymore.
2019-05-08Merge remote-tracking branch 'origin/danga-bundle'
* origin/danga-bundle: DS: epoll: fix misordered EPOLL_CTL_DEL call DS: drop unused "_undef" sub syscall: drop readahead wrapper build: do not manify DS and Syscall pods DS: handle EINTR in IO::Poll path, too DS: workaround IO::Kqueue EINTR (mis-)handling DS: drop profiling support DS: remove unused fields and functions listener: use EPOLLEXCLUSIVE for listen sockets bundle Danga::Socket and Sys::Syscall
2019-05-08nntp: avoid uninitialized variable from blank requests
We'll ignore blank lines from clients, since that's what innd seems to do.
2019-05-04bundle Danga::Socket and Sys::Syscall
These modules are unmaintained upstream at the moment, but I'll be able to help with the intended maintainer once/if CPAN ownership is transferred. OTOH, we've been waiting for that transfer for several years, now... Changes I intend to make: * EPOLLEXCLUSIVE for Linux * remove unused fields wasting memory * kqueue bugfixes e.g. https://rt.cpan.org/Ticket/Display.html?id=116615 * accept4 support And some lower priority experiments: * switch to EV_ONESHOT / EPOLLONESHOT (incompatible changes) * nginx-style buffering to tmpfile instead of string array * sendfile off tmpfile buffers * io_uring maybe?
2019-02-07nntp: get rid of long_response_limit constant
Unused since commit b8c41362f2a5c8fcc6b1846a79c72bfa77565297 ("nntp: simplify the long_response API")
2019-01-16nntp: header responses use CRLF consistently
Alpine is apparently stricter than other clients I've tried w.r.t. using CRLF for headers. So do the same thing we do for bodies to ensure we only emit CRLFs and no bare LFs. Reported-by: Wang Kang <i@scateu.me> https://public-inbox.org/meta/alpine.DEB.2.21.99.1901161043430.29788@la.scateu.me/
2019-01-08nntp: fix uninitialized variable in event_read
do_write must return 0 or 1.
2018-12-06nntp: prevent event_read from firing twice in a row
When a client starts pipelining requests to us which trigger long responses, we need to keep socket readiness checks disabled and only enable them when our socket rbuf is drained. Failure to do this caused aborted clients with "BUG: nested long response" when Danga::Socket calls event_read for read-readiness after our "next_tick" sub fires in the same event loop iteration. Reported-by: Jonathan Corbet <corbet@lwn.net> cf. https://public-inbox.org/meta/20181013124658.23b9f9d2@lwn.net/
2018-10-16Add Xrefs to over/xover lines
Putting the Xref field into xover lines allows newsreaders to mark cross-posted messages read when catching up a group. That, in turn, massively improves the life of crazy people who try to follow dozens of kernel lists, where emails are often heavily cross-posted.
2018-10-16Put the NNTP server name into Xref lines
RFC 5536 sec 3.2.14 says that the server-name in an Xref line is "which news server generated the header field"; indeed, that is necessary for newsreaders like gnus to handle references properly. So pick up the server name from the config if available (the first name if there's more than one), from the host name otherwise, and use it rather than the domain name of the list server. Tests have been adjusted to match the new behavior.
2018-04-18Merge remote-tracking branch 'origin/master' into v2
* origin/master: nntp: allow and ignore empty commands mbox: do not barf on queries which return no results nntp: fix NEWNEWS command searchview: fix non-numeric comparison Allow specification of the number of search results to return githttpbackend: avoid infinite loop on generic PSGI servers http: fix modification of read-only value extmsg: use news.gmane.org for Message-ID lookups extmsg: rework partial MID matching to favor current inbox Update the installation instructions with Fedora package names nntp: do not drain rbuf if there is a command pending nntp: improve fairness during XOVER and similar commands searchidx: do not modify Xapian DB while iterating Don't use LIMIT in UPDATE statements
2018-04-18nntp: allow and ignore empty commands
Somebody hitting "\n" into telnet shouldn't hold a client up indefinitely and prevent shutdown.
2018-04-07store less data in the Xapian document
Since we only query the SQLite over DB for OVER/XOVER; do not need to waste space storing fields To/Cc/:bytes/:lines or the XNUM term. We only use From/Subject/References/Message-ID/:blob in various places of the PSGI code. For reindexing, we will take advantage of docid stability in "xapian-compact --no-renumber" to ensure duplicates do not show up in search results. Since the PSGI interface is the only consumer of Xapian at the moment, it has no need to search based on NNTP article number.
2018-04-06nntp: set Xref across multiple inboxes
Noted by Jonathan Corbet in https://lwn.net/Articles/748184/
2018-04-03nntp: simplify the long_response API
We we worked around the default range/termination conditions of long_response in many cases to reduce calls to SQLite or Xapian. So continue that trend and become more like the PSGI API which doesn't force callers to specify an article range or work inside a loop.
2018-04-03msgmap: replace id_batch with ids_after
id_batch had a an overly complicated interface, replace it with id_batch which is simpler and takes advantage of selectcol_arrayref in DBI. This allows simplification of callers and the diffstat agrees with me.
2018-04-03nntp: make XOVER, XHDR, OVER, HDR and NEWNEWS faster
While SQLite is faster than Xapian for some queries we use, it sucks at handling OFFSET. Fortunately, we do not need offsets when retrieving sorted results and can bake it into the query. For inbox.comp.version-control.git (v1 Xapian), XOVER and XHDR are over 20x faster.
2018-04-03nntp: fix NEWNEWS command
I guess nobody uses this command (slrnpull does not), and the breakage was not noticed until I started writing new tests for multi-MID handling. Fixes: 3fc411c772a21d8f ("search: drop pointless range processors for Unix timestamp")
2018-04-02replace Xapian skeleton with SQLite overview DB
This ought to provide better performance and scalability which is less dependent on inbox size. Xapian does not seem optimized for some queries used by the WWW homepage, Atom feeds, XOVER and NEWNEWS NNTP commands. This can actually make Xapian optional for NNTP usage, and allow more functionality to work without Xapian installed. Indexing performance was extremely bad at first, but DBI::Profile helped me optimize away problematic queries.
2018-03-14nntp: do not drain rbuf if there is a command pending
Some clients pipeline requests aggressively (enough to match LINE_MAX) and we should not read from the client socket until we know there's no pending command in our read buffer. Reported-and-tested-by: Sergey Organov <sorganov@gmail.com>
2018-03-07nntp: improve fairness during XOVER and similar commands
For other commands generating long responses, we generally want to yield to another client after emitting 100 . However, XOVER-based responses already query 200 lines worth of responses at a time, so we were sending 20000 lines before yielding to other clients. This may help avoid timeouts for some clients.
2018-03-03nntp: fix NEWNEWS command
I guess nobody uses this command (slrnpull does not), and the breakage was not noticed until I started writing new tests for multi-MID handling. Fixes: 3fc411c772a21d8f ("search: drop pointless range processors for Unix timestamp")
2018-03-03nntp: use NNTP article numbers for lookups
Since Message-IDs are no longer unique within Xapian (but are within the SQLite Msgmap); favor NNTP article numbers for internal lookups. This will prevent us from finding the "wrong" internal Message-ID.
2018-02-07update copyrights for 2018
Using update-copyrights from gnulib While we're at it, use the SPDX identifier for AGPL-3.0+ to ease mechanical processing.
2016-12-13nntp: avoid useless use of strftime
There's no need to use strftime if we'll be converting the date by hand, anyways.
2016-09-09nntp: cleanup: move use statements out of sub scope
This clarifies the code somewhat, and we don't care to lazy-load in NNTP.pm anyways since this is only used for a long-lived daemon.
2016-08-14www: do not unecessarily escape some chars in paths
Based on reading RFC 3986, it seems '@', ':', '!', '$', '&', "'", '; '(', ')', '*', '+', ',', ';', '=' are all allowed in path-absolute where we have the Message-ID. In any case, it seems '@' is fairly common in path components nowadays and too common in Message-IDs.
2016-07-27localize $/ when using chomp
Callers may have localized $/ to something else, so make sure we chomp the expected character(s) when calling chomp.
2016-07-09nntp: return if a client drops on us
Danga::Socket::write will set the closed flag on a socket, automatically, and we do not need to bring down an entire server when one client breaks the connection :P
2016-07-07inbox: cleanup and consolidate object weakening
This fixes some layering violations and consolidates the cleanup into the inbox object itself. Keeping in mind weakening does not work at all without our PSGI server.
2016-07-02nntp: respect 3 minute idle time for shutdown
This avoids breaking clients on graceful shutdown since NNTP responses should usually be quick.
2016-07-02nntp: simplify update_idle_time
This ought to make things easier when we add TLS support.