about summary refs log tree commit homepage
path: root/lib/PublicInbox/NNTP.pm
DateCommit message (Collapse)
2023-10-11treewide: consolidate "From " line removal
Aside from our prior import bugs (fixed in a0c07cba0e5d8b6a (mda: drop leading "From " lines again, 2016-06-26)), we'll always have to be dealing with mutt piping messages to us and `git format-patch' output. So just share the regexp so we can use it everywhere. In may be desirable to allow importing messages with a leading "From " line for FUSE, even. Additionally, some instances of this regexp needlessly added optional `\r?' (CR) checks ahead of the `\n' (LF) element; but they're pointless anyways since [^\n]* is enough to exclude all non-LF bytes.
2023-05-02daemon: improve handling of Git->async_abort
The $oid arg for Git->cat_async is defined on async_abort using the original request, so use undefined $type to distinguish that case in caller-supplied callbacks. async_abort isn't common, of course, but sometimes git subprocesses can die unexpectedly.
2023-01-30use Net::SSLeay (OpenSSL) for SHA-(1|256) if installed
On my x86-64 machine, OpenSSL SHA-256 is nearly twice as fast as the Digest::SHA implementation from Perl, most likely due to an optimized assembly implementation. SHA-1 is a few percent faster, too.
2022-11-28nntpd: fix LISTGROUP with range
This reverts 0c62cffc2389 ("nntp: listgroup_range_i: remove useless `map' op") and adds a test that demonstrates the breakage: the server returns lines like ARRAY(0x556dace73f08) instead of message numbers. Fixes: 0c62cffc2389 ("nntp: listgroup_range_i: remove useless `map' op")
2022-08-10daemon: rely on $SIG{__WARN__} for error output
warn/carp usage is unavoidable given Perl itself and standard libraries, so just rely on localized $SIG{__WARN__} from 60d262483a4d6ddf (daemon: use per-listener SIG{__WARN__} callbacks, 2022-08-08) for all error reporting. While we're in the area, make some of the error handling more consistent between IMAP/NNTP/POP3.
2022-08-09daemon: use per-listener SIG{__WARN__} callbacks
This allows "-l $ADDRESS?err=/path/to/err.log to isolate normal warn() (and carp()) messages for a particular listen address to track down errors more easily.
2022-08-04nntp: speed up group listings via ->ALL->misc
By taking advantage of the new ART_MIN/ART_MAX value in MiscIdx, we can avoid the overhead of opening per-inbox msgmap DB files. The result gives us a ~40 speedup with 50K newgroups.
2022-08-03daemon: reload TLS certs and keys on SIGHUP
This allows new TLS certificates to be loaded for new clients without having to timeout nor drop existing clients with established connections made with the old certs. This should benefit users with admins who expire certificates frequently (as encouraged by Let's Encrypt).
2022-08-03ds: use ->dflush to distinguish from ->zflush
->zflush is already for GzipFilter in PublicInbox::WWW, while we use DEFLATE for NNTP and IMAP. This ought to make the code easier-to-follow.
2022-07-23nntp: use substr to check for trailing CRLF
Regexps consume more CPU cycles and memory, and aren't necessary here since we just converted the entire buffer to CRLF.
2022-07-23imap+nntp: share COMPRESS implementation
Their code was nearly identical to begin with, so save some memory in -netd and disk space for all of our tarball/distro users, at least. And I seem to have used multiple inheritance successfully, here, maybe...
2022-07-23nntp: resolve inboxes immediately on group listings
This prevents potential races between SIGHUP config reloads while gigantic group listings are streaming, allowing us to avoid many invalidation checks. This also reduces send(2) syscalls and avoid Perl internal pad allocations in a few places where it's not beneficial. There might be a slight (0.5%) speedup, but I'm not sure if that's down to system noise, power/thermal management, or other users on my VM.
2022-07-23ds: share long_step between NNTP and IMAP
It's not actually used by our POP3 code at the moment, but it may be soon to reduce memory usage when loading 50K smsg objects into memory.
2022-07-23nntp: inline CRLF in all response lines
This brings NNTP closer to POP3 and IMAP implementations to allow CoW avoidance on constants.
2022-07-23nntp: listgroup_range_i: remove useless `map' op
No need to iterate through the array twice; and this even seems a hair faster than what I got with commit 726d6e71aee5d974 (nntp: small speed up for multi-line responses, 2020-12-04)
2022-07-23ds: move requeue_once
It's the same subroutine everywhere.
2022-07-23ds: move no-op ->zflush to common base class
More deduplication, and POP3 never needed it.
2022-07-23ds: support greeting protocols
We can share some common code between IMAP, NNTP, and POP3 without too much trouble, so cut down our LoC.
2022-07-23nntp: remove more() wrapper
Using PublicInbox::DS->msg_more directly can avoid unnecessary CoW memory traffic since there's no appending "\r\n".
2022-07-23nntp: start adding CRLF to responses natively
With IMAP and POP3, I've started to embed CRLF into constant response codes to avoid triggering CoW and extra memory traffic in Perl. The end goal is to enable more code sharing between IMAP, NNTP, and POP3 inside one -netd process.
2022-07-23nntp: pass regexp to split() callers
Current implementations of Perl5 don't have optimizations for single-character field separators.
2021-10-16imapd+nntpd: drop timer-based expiration
It's needlessly complex and O(n), so it doesn't scale well to a high number of clients nor is it easy-to-scale with the data structures available to us in pure Perl. In any case, I see no evidence of either -imapd nor -nntpd experiencing high connection loads on public-facing sites. -httpd has never had its own timer-based expiration, either. Fwiw, public-inbox.org itself has been running a public-facing HTTP/HTTPS server with no userspace idle client expiration for the past 8 years or with no ill effect. Clients can come and go as they wish, and SO_KEEPALIVE takes care of truly broken connections if they're gone for ~2 hours. Internet connections drop all time, so it should be harmless to drop connections w/o warning since both NNTP and IMAP protocols have well-defined semantics for determining if a message was truncated (as does HTTP/1.1+).
2021-10-12nntp: use defined-OR from Perl 5.10 for msgid check
"<0>" could be a valid Message-ID, maybe...
2021-08-28move ->ids_after from mm to over
Since we favor ->over in WWW and IMAP, move this method to ->over to reduce open files in common cases. This fixes the /$EXTINDEX_NAME/all.mbox.gz endpoint for extindex entries (which may get expensive...).
2021-08-28get rid of unnecessary bytes::length usage
The only place where we could return wide characters with -httpd was the raw $INBOX_DIR/description text, which is now converted to octets. All daemon (HTTP/NNTP/IMAP) sockets are opened in binary mode, so length() and bytes::length() are equivalent on reads. For socket writes, any non-octet data would warn about wide characters and we are strict in warnings with test_httpd. All gzipped buffers are also octets, as is PublicInbox::Eml->body, and anything from PerlIO objects ("git cat-file --batch" output, filesystems), so bytes::length was unnecessary in all those places.
2021-08-25imap+nntp: die loudly if ->mm or ->over disappear
While the WWW front-end can gracefully handle ->mm and ->over disappearing (in most cases), IMAP+NNTP front-ends are completely dependent on these and failed mysteriously when they go missing after startup. These will hopefully make issues like what Konstantin encountered more obvious: Link: https://public-inbox.org/meta/20210824204855.ejspej4z7r2rpu63@nitro.local/
2021-06-24favor git(1) rather than libgit2 for ExtSearch
While both git and libgit2 take around 16 minutes to load 100K alternates there's already a proposed patch to make git faster: <https://lore.kernel.org/git/20210624005806.12079-1-e@80x24.org/> It's also easier to patch and install git locally since the git.git build system defaults to prefix=$HOME and dealing with dynamic linking with libgit2 is more difficult for end users relying on Inline::C. libgit2 remains in use for the non-ALL.git case, but maybe it's not necessary (libgit2 is significantly slower than git in Debian 10 due to SHA-1 collision checking).
2021-03-16nntp: remove unused header_append method
It was unused since 1bf653ad139bf7bb3d853ab0b5eae3eaa1b13a95 ("nntp+www: drop List-* and Archived-At headers")
2021-01-01update copyrights for 2021
Using "make update-copyrights" after setting GNULIB_PATH in my config.mak
2020-12-11nntp+www: drop List-* and Archived-At headers
These headers can conflict with headers in the DKIM signature; and parsing the DKIM-Signature header to determine whether or not we can safely add a header would be more code and CPU cycles. Since IMAP seems fine without these headers (and JMAP will likely be, too), there's likely no need to continue appending these to every message. Nowadays, developers seem sufficiently trained to use URLs with Message-IDs in them. So drop the headers and save some cycles and bandwidth all around.
2020-12-10www+nntp: deal with lack of addresses for ->ALL
Since extindex is an amalgamation of several inboxes, discerning an appropriate address for List-Post: would be expensive and most likely unnecessary. Some legacy/historical inboxes may have no active address, either, so don't attempt to set the List-Post header if no addresses are configured.
2020-12-09rename {pi_config} fields to {pi_cfg}
{pi_config} may be confused with the documented `PI_CONFIG' environment variable, and we'll favor vowel-removal to be consistent with our usage of object references. The `pi_' prefix may stay in some places, for now; since a separate namespace may come into this codebase for local/private client-tooling. For InboxIdle, we'll also remove an invalid comment about holding a reference to the PublicInbox::Config object, too.
2020-12-09nntp: replace {ng} with {ibx} for consistency
They're PublicInbox::Inbox objects just like the rest of the non-NNTP code. So rename the NNTP code for consistency with the rest of the codebase. Furthermore, {ng} and $ng may be confused with the `--ng' switch for -init, and that's a non-ref scalar string.
2020-12-05nntp: small speed up for multi-line responses
Using a non-zero-length separator for `join' requires extra work inside Perl. We can shove the cost of appending "\r\n" into the `map' loop, instead. This speeds up the `join' operation. The "deferred" log entry for a "LISTGROUP org.kernel.vger.linux-kernel" command (with nearly 3.8 million messages) goes from ~3.96s to 3.86s on my workstation.
2020-12-05nntp: xref_by_tc: simplify slightly
We can invalidate ibx->{newsgroup} at config load-time to avoid having to check ibx->{newsgroup} validity in To/Cc: matching. This saves us some hash lookups in all cases.
2020-12-01nntp: make ->ALL Xref generation more fuzzy
For ->ALL users, this mitigates the regression introduced by commit 811b8d3cbaa790f59b7b107140b86248da16499b ("nntp: xref: use ->ALL extindex if available"), since it's common to cross post messages to some mailing lists with per-list trailers for unsubscribe information. We won't bother dealing with Bcc-ed messages since those are nearly all spam when it comes to public mailing lists. Fixes: 811b8d3cbaa790f5 ("nntp: xref: use ->ALL extindex if available") Link: https://public-inbox.org/meta/20201130194201.GA6687@dcvr/
2020-11-29nntpd: remove redundant {groups} shortcut
It's not worth confusing hackers reading the source to have two ways to access the same (large) hash table. So just go through PublicInbox::Config objects for now since the extra hash lookup isn't going to be noticeable. I've also started favoring "for" instead of "foreach" since they're the equivalent perlop and less wear on my fingers + keyboard.
2020-11-29nntp: XPATH uses ->ALL extindex, too
Another 30-40% speedup when testing against a local lore.kernel.org mirror. In either case, we'll consistently sort the response for ease-of-testing and client-side cache-friendliness.
2020-11-29nntp: art_lookup: use mid_lookup and simplify
This lets us take advantage of mid_lookup speedup from the previous commit. While we're at it, start moving towards using `$ibx' as the abbreviation for PublicInbox::Inbox objects even in the NNTP code, since they've been shared with the WWW code for several years, now.
2020-11-29nntp: speed up mid_lookup() using ->ALL extindex
We can reuse "xref3" information in extindex to quickly match messages matching a given Message-ID across hundreds or thousands of newsgroups with a few SQL statements. "XHDR Xref $MESSAGE_ID" is around 40% faster, on top of previous speedups.
2020-11-29nntp: NEWGROUPS uses long_response
We can amortize the cost of NEWGROUPS time filtering using the long_response API. This lets us handle hundreds/thousands of inboxes without monopolizing the event loop for this command. Further speedup is possible using MiscSearch, but that requires not-yet-done indexing changes to MiscIdx.
2020-11-28nntp: xref: use ->ALL extindex if available
Getting Xref for cross-posted messages is an O(n) operation where `n' is the number of newsgroups on the server. This works acceptably when there are dozens of groups, but would be unnacceptable when there's tens of thousands of newsgroups. With ~140 newsgroups, a lore.kernel.org mirror already handles "XHDR Xref $MESSAGE_ID" requests around 30% faster after creating the xref3.idx_nntp index. The SQL additions to ExtSearch.pm may be a bit strange and seem more appropriate for Over.pm; however it currently makes sense to me since those bits of over.sqlite3 access are exclusive to ExtSearch and can't be used by traditional v1/v2 inboxes...
2020-11-28nntp: xref: simplify sub signature
We'll be using the `xref3' table in extindex to speed up xref(), and that'll require comparisons against $smsg->{blob}. So pass the entire $smsg through.
2020-11-28nntp: some minor golfing
Reduce screen real estate usage to reduce human attention span requirements.
2020-11-28nntp: move LIST iterators to long_response
Iterating through many newsgroups can hog the event loop if many random seeks are required. Avoid monopolizing the event loop in that case by using the long_response API. For now, we can still rely on grep() since it seems to work reasonably well with 50K test newsgroup names.
2020-11-28nntp: LIST ACTIVE.TIMES use angle brackets around address
This matches the example shown in RFC 3977, section 7.6.1.3
2020-11-28nntp: NEWNEWS: speed up filtering
With 50K newsgroups, the filtering phase goes from ~2000 seconds to ~90 MILLISECONDS by relying on the grep perlop. This moves ->over checking out of the main dispatch and amortizes the cost via long_response. (Fairly scheduled) long_response time in newnews_i now takes ~360 seconds as opposed to ~30 seconds before this change, however; but the initial filtering speedup eliminating 2000s is more than worth it.
2020-11-28nntp: use grep operation for wildmat matching
Based on experiences with the IMAP server, this ought to be significantly faster (as to be demonstrated in the next commit).
2020-11-28mm: min/max: return 0 instead of undef
This simplifies callers and allows empty newsgroups to be represented (the WWW UI may be insufficient there, too).
2020-11-28nntp: use Inbox->uidvalidity instead of ->mm->created_at
This is memoized, and may allow us some future flexibility w.r.t PublicInbox::Inbox-like objects. While we're at it, use defined-or ("//") in case somebody really set a public-inbox creation time to the Unix epoch.