Date: Sat, 13 Oct 2018 12:46:58 -0600
From: Jonathan Corbet
To: meta@public-inbox.org
Cc: Eric Wong
Subject: Race condition in public-inbox-nntpd?
Message-ID: <20181013124658.23b9f9d2@lwn.net>
Organization: LWN.net

So for a little while, I've been seeing occasional NNTP weirdness, associated with log lines like:

  Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: error from: XOVER 196731-196731 (BUG: nested long response at /usr/local/share/perl5/PublicInbox/NNTP.pm line 588.

Such complaints tend to be immediately followed by significant disgruntlement on the client side.

I use gnus to read the various NNTP feeds, and I mark articles (with "u") when I want to keep them around; I typically have quite a few of them marked at any given time in a group like linux-kernel. When I open the group in gnus, it does an XOVER on each of those marked articles, generating dozens of single-article XOVERs in quick succession. It's always the single-article ones that fail; the big XOVER for all of the new stuff at the end works just fine.
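To make the failing pattern concrete: if those one-article XOVERs arrive pipelined (an assumption here, not something the log shows), the server has to leave the remaining buffered commands alone until the current multi-line response has been completely written. Below is a hypothetical Perl model of that ordering rule only. SimConn, start_long, event_write and the pending/guard fields are made-up names, not PublicInbox::NNTP internals, and the model says nothing about where the real race is. With the guard switched off, the second pipelined XOVER trips the same "nested long response" message as in the log.

```perl
#!/usr/bin/perl
# Hypothetical model, NOT PublicInbox::NNTP: it only illustrates the rule
# that pipelined commands must wait until the current multi-line ("long")
# response has been completely sent.
use strict;
use warnings;

package SimConn;

sub new {
    my ($class, %opt) = @_;
    # guard => 0 disables the "don't parse while a long response is
    # pending" check, to show what breaks without it.
    return bless { rbuf => '', pending => undef, out => [],
                   guard => $opt{guard} // 1 }, $class;
}

# Begin a response that is emitted over several event-loop steps.
sub start_long {
    my ($self, $cmd, $steps) = @_;
    die "BUG: nested long response\n" if $self->{pending};
    my $n = 0;
    $self->{pending} = sub {
        push @{ $self->{out} }, "$cmd: chunk " . ++$n;
        return $n < $steps;    # true while more chunks remain
    };
}

# Parse complete command lines from the read buffer; with the guard on,
# stop as soon as a long response is pending and leave the rest buffered.
sub process_rbuf {
    my ($self) = @_;
    while (!($self->{guard} && $self->{pending})
           && $self->{rbuf} =~ s/\A\s*([^\r\n]*)\r?\n//) {
        $self->start_long($1, 2);
    }
}

# Socket readable: buffer the bytes, then parse what is complete.
sub event_read {
    my ($self, $bytes) = @_;
    $self->{rbuf} .= $bytes;
    $self->process_rbuf;
}

# Socket writable: emit one chunk; once the response is finished,
# terminate it with "." and resume any pipelined commands.
sub event_write {
    my ($self) = @_;
    my $cb = $self->{pending} or return;
    return if $cb->();
    $self->{pending} = undef;
    push @{ $self->{out} }, '.';
    $self->process_rbuf;
}

package main;

# Several one-article XOVERs sent back to back in a single read.
my $burst = join '', map { "XOVER $_-$_\r\n" } 196731 .. 196733;

my $c = SimConn->new;
$c->event_read($burst);
$c->event_write while $c->{pending};
print "$_\n" for @{ $c->{out} };

my $unguarded = SimConn->new(guard => 0);
my $ok = eval { $unguarded->event_read($burst); 1 };
print "without guard: $@" unless $ok;
```

Run as-is, the guarded connection prints nine lines (two chunks and a terminating "." per command, strictly in order), followed by "without guard: BUG: nested long response".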
Another user has complained that things fail with Thunderbird, again with the same symptoms on the server side.

I have "fixed" the problem with this little patch:

diff --git a/lib/PublicInbox/NNTP.pm b/lib/PublicInbox/NNTP.pm
index 022bb80..017ad88 100644
--- a/lib/PublicInbox/NNTP.pm
+++ b/lib/PublicInbox/NNTP.pm
@@ -951,6 +951,10 @@ sub event_read {
 		$self->{rbuf} .= $$buf;
 	}
 	my $r = 1;
+	if ($self->{long_res}) {
+		err($self, "long-res on event read");
+		$r = 0;
+	}
 	while ($r > 0 && $self->{rbuf} =~ s/\A\s*([^\r\n]*)\r?\n//) {
 		my $line = $1;
 		return $self->close if $line =~ /[[:cntrl:]]/s;

That makes things work, but it is clearly papering over the real problem.

I've spent a fair while staring at the code. As far as I can tell, the logic there should be sufficient to prevent this from happening; it's not supposed to be reading while a long response is in the works. But somehow it happens.

Does anybody have any thoughts on how this could be coming about?

Thanks,

jon