From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on dcvr.yhbt.net
X-Spam-Level:
X-Spam-ASN:
X-Spam-Status: No, score=-4.0 required=3.0 tests=ALL_TRUSTED,BAYES_00
	shortcircuit=no autolearn=ham autolearn_force=no version=3.4.1
Received: from localhost (dcvr.yhbt.net [127.0.0.1])
	by dcvr.yhbt.net (Postfix) with ESMTP id ABF021F453;
	Tue, 16 Oct 2018 06:36:20 +0000 (UTC)
Date: Tue, 16 Oct 2018 06:36:20 +0000
From: Eric Wong
To: Jonathan Corbet
Cc: meta@public-inbox.org
Subject: Re: Race condition in public-inbox-nntpd?
Message-ID: <20181016063620.bm34ts45yp5irqmh@untitled>
References: <20181013124658.23b9f9d2@lwn.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
In-Reply-To: <20181013124658.23b9f9d2@lwn.net>
List-Id:

Jonathan Corbet wrote:
> So for a little while, I've been seeing occasional NNTP weirdness,
> associated with log lines like:
>
>   Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: error from:
>   XOVER 196731-196731 (BUG: nested long response at /usr/local/share/perl5/PublicInbox/NNTP.pm line 588.
>
> Such complaints tend to be immediately followed by significant
> disgruntlement on the client side.

Do you have any logs of client commands which hit this?
public-inbox-nntpd should log commands + timings to stdout.

> I use gnus to read the various NNTP feeds, and I mark articles (with "u")
> when I want to keep them around; I typically have quite a few of them
> marked at any given time in a group like linux-kernel.  When I open the
> group in gnus, it will do an XOVER on each of those marked articles,
> generating dozens of single-article XOVERs in quick succession.  It's
> always the single-article ones that fail; the big XOVER for all of the new
> stuff at the end works just fine.
>
> Another user has complained that things fail with Thunderbird, again with
> the same symptoms on the server side.

No idea how to use either gnus or tbird, here; so a command
sequence which can be fed into nc or socat would be helpful
in reproducing the problem.

Any relation to group size in reproducing this?

> I have "fixed" the problem with this little patch:

> That makes things work, but it is clearly papering over the real
> problem.

Agreed.

> I've spent a fair while staring at the code.  As far as I can tell, the
> logic there should be sufficient to prevent this from happening; it's not
> supposed to be reading while a long response is in the works.  But somehow
> it happens.

I stared at it some, too; but I'm not seeing it right now,
either; but it's been a tiring few weeks for me so I'm not at my
sharpest.

Thanks for bringing this to everyone's attention.
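
A command sequence of the kind asked for above could be generated along
these lines.  This is only a minimal sketch and not something taken from
the thread: it assumes the failure is provoked by many single-article
XOVER commands arriving back to back without waiting for replies, which
is a guess based on the gnus description.  The group name is a
placeholder, the count of 50 is arbitrary, and the starting article
number is simply the one from the quoted log line.

```python
import sys

# All three values are assumptions for illustration; adjust to taste.
GROUP = "example.group"   # hypothetical placeholder, not a real group name
FIRST_ARTICLE = 196731    # article number seen in the quoted log line
COUNT = 50                # "dozens" of single-article XOVERs


def build_xover_burst(group, article_numbers):
    """Build one GROUP command, one single-article XOVER per number, and QUIT.

    Everything is returned as a single byte string so it can be written to
    the socket in one go, without waiting for the server between commands.
    """
    lines = [f"GROUP {group}"]
    lines += [f"XOVER {n}-{n}" for n in article_numbers]
    lines.append("QUIT")
    # NNTP commands are terminated by CRLF.
    return "".join(line + "\r\n" for line in lines).encode("ascii")


if __name__ == "__main__":
    numbers = range(FIRST_ARTICLE, FIRST_ARTICLE + COUNT)
    sys.stdout.buffer.write(build_xover_burst(GROUP, numbers))
```

Saved under some name such as xover_burst.py (hypothetical), the output
can be piped into nc or socat pointed at the NNTP port of the server.
Depending on the nc variant, an option may be needed to keep the
connection open after stdin is closed so the replies can still be read.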