user/dev discussion of public-inbox itself
 help / Atom feed
* Race condition in public-inbox-nntpd?
@ 2018-10-13 18:46 Jonathan Corbet
  2018-10-16  6:36 ` Eric Wong
  2018-12-06  3:59 ` [PATCH] nntp: prevent event_read from firing twice in a row Eric Wong
  0 siblings, 2 replies; 4+ messages in thread
From: Jonathan Corbet @ 2018-10-13 18:46 UTC (permalink / raw)
  To: meta; +Cc: Eric Wong

So for a little while, I've been seeing occasional NNTP weirdness,
associated with log lines like:

  Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: error from:
  XOVER 196731-196731 (BUG: nested long response at /usr/local/share/perl5/PublicInbox/NNTP.pm line 588.

Such complaints tend to be immediately followed by significant
disgruntlement on the client side.

I use gnus to read the various NNTP feeds, and I mark articles (with "u")
when I want to keep them around; I typically have quite a few of them
marked at any given time in a group like linux-kernel.  When I open the
group in gnus, it will do an XOVER on each of those marked articles,
generating dozens of single-article XOVERs in quick succession.  It's
always the single-article ones that fail; the big XOVER for all of the new
stuff at the end works just fine.

Another user has complained that things fail with Thunderbird, again with
the same symptoms on the server side.

I have "fixed" the problem with this little patch:

diff --git a/lib/PublicInbox/NNTP.pm b/lib/PublicInbox/NNTP.pm
index 022bb80..017ad88 100644
--- a/lib/PublicInbox/NNTP.pm
+++ b/lib/PublicInbox/NNTP.pm
@@ -951,6 +951,10 @@ sub event_read {
 		$self->{rbuf} .= $$buf;
 	}
 	my $r = 1;
+	if ($self->{long_res}) {
+		err($self, "long-res on event read");
+		$r = 0;
+	}
 	while ($r > 0 && $self->{rbuf} =~ s/\A\s*([^\r\n]*)\r?\n//) {
 		my $line = $1;
 		return $self->close if $line =~ /[[:cntrl:]]/s;

That makes things work, but it is clearly papering over the real
problem.

I've spent a fair while staring at the code.  As far as I can tell, the
logic there should be sufficient to prevent this from happening; it's not
supposed to be reading while a long response is in the works.  But somehow
it happens.

Does anybody have any thoughts on how this could be coming about?

Thanks,

jon

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Race condition in public-inbox-nntpd?
  2018-10-13 18:46 Race condition in public-inbox-nntpd? Jonathan Corbet
@ 2018-10-16  6:36 ` Eric Wong
  2018-10-16 14:04   ` Jonathan Corbet
  2018-12-06  3:59 ` [PATCH] nntp: prevent event_read from firing twice in a row Eric Wong
  1 sibling, 1 reply; 4+ messages in thread
From: Eric Wong @ 2018-10-16  6:36 UTC (permalink / raw)
  To: Jonathan Corbet; +Cc: meta

Jonathan Corbet <corbet@lwn.net> wrote:
> So for a little while, I've been seeing occasional NNTP weirdness,
> associated with log lines like:
> 
>   Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: error from:
>   XOVER 196731-196731 (BUG: nested long response at /usr/local/share/perl5/PublicInbox/NNTP.pm line 588.
> 
> Such complaints tend to be immediately followed by significant
> disgruntlement on the client side.

Do you have any logs of client commands which hit this?

public-inbox-nntpd should log commands + timings to stdout.

> I use gnus to read the various NNTP feeds, and I mark articles (with "u")
> when I want to keep them around; I typically have quite a few of them
> marked at any given time in a group like linux-kernel.  When I open the
> group in gnus, it will do an XOVER on each of those marked articles,
> generating dozens of single-article XOVERs in quick succession.  It's
> always the single-article ones that fail; the big XOVER for all of the new
> stuff at the end works just fine.
> 
> Another user has complained that things fail with Thunderbird, again with
> the same symptoms on the server side.

No idea how to use either gnus or tbird, here; so a command
sequence which can be fed into nc or socat would be helpful in
reproducing the problem.

Any relation to group size in reproducing this?

> I have "fixed" the problem with this little patch:

<snip>

> That makes things work, but it is clearly papering over the real
> problem.

Agreed.

> I've spent a fair while staring at the code.  As far as I can tell, the
> logic there should be sufficient to prevent this from happening; it's not
> supposed to be reading while a long response is in the works.  But somehow
> it happens.

I stared at it some, too; but I'm not seeing it right now, either;
but it's been a tiring few weeks for me so I'm not at my sharpest.

Thanks for bringing this to everyones' attention.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Race condition in public-inbox-nntpd?
  2018-10-16  6:36 ` Eric Wong
@ 2018-10-16 14:04   ` Jonathan Corbet
  0 siblings, 0 replies; 4+ messages in thread
From: Jonathan Corbet @ 2018-10-16 14:04 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta

On Tue, 16 Oct 2018 06:36:20 +0000
Eric Wong <e@80x24.org> wrote:

> Jonathan Corbet <corbet@lwn.net> wrote:
> > So for a little while, I've been seeing occasional NNTP weirdness,
> > associated with log lines like:
> > 
> >   Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: error from:
> >   XOVER 196731-196731 (BUG: nested long response at /usr/local/share/perl5/PublicInbox/NNTP.pm line 588.
> > 
> > Such complaints tend to be immediately followed by significant
> > disgruntlement on the client side.  
> 
> Do you have any logs of client commands which hit this?

I'll attach a snippet.  The pattern always looks about the same - a lot of
XOVER commands; at some point they start failing.

> Any relation to group size in reproducing this?

It tends to happen on big groups, but the real common factor seems to be a
lot of saved messages, resulting in a lot of single-article XOVER commands
in rapid sequence.

> I stared at it some, too; but I'm not seeing it right now, either;
> but it's been a tiring few weeks for me so I'm not at my sharpest.

I can relate to that!

Thanks,

jon

Oct 13 18:01:04 archive.lwn.net public-inbox-nntpd[12352]: [6] 1 GROUP lwn.kernel.lkml - 0.001696
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: [6] 0 XOVER 153433-153447 - 0.087349 deferred[6]
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: [6] 0 XOVER 158027-158032 - 0.009294 deferred[6]
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: [6] 0 XOVER 181909-181914 - 0.009046 deferred[6]
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: [6] 0 XOVER 182996-183003 - 0.010729 deferred[6]
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: [6] 0 XOVER 187687-187687 - 0.001941 deferred[6]
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: [6] 0 XOVER 188205-188205 - 0.003165 deferred[6]
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: [6] 0 XOVER 188351-188351 - 0.002194 deferred[6]
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: [6] 0 XOVER 188571-188573 - 0.005668 deferred[6]
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: [6] 0 XOVER 188580-188580 - 0.001533 deferred[6]
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: [6] 0 XOVER 188680-188684 - 0.005570 deferred[6]
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: [6] 0 XOVER 188691-188697 - 0.008083 deferred[6]
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: [6] 0 XOVER 188792-188792 - 0.003481 deferred[6]
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: [6] 0 XOVER 188798-188798 - 0.001609 deferred[6]
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: [6] 0 XOVER 188808-188808 - 0.001487 deferred[6]
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: [6] 0 XOVER 188818-188818 - 0.002147 deferred[6]
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: [6] 0 XOVER 188841-188841 - 0.001679 deferred[6]
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: [6] 0 XOVER 189272-189272 - 0.001514 deferred[6]
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: [6] 0 XOVER 189524-189528 - 0.008616 deferred[6]
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: [6] 0 XOVER 189547-189547 - 0.001627 deferred[6]
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: [6] 0 XOVER 189736-189736 - 0.002618 deferred[6]
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: [6] 0 XOVER 189868-189877 - 0.012032 deferred[6]
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: [6] 0 XOVER 189950-189952 - 0.004263 deferred[6]
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: [6] 0 XOVER 190022-190022 - 0.001705 deferred[6]
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: [6] 0 XOVER 190123-190123 - 0.001533 deferred[6]
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: [6] 0 XOVER 190207-190207 - 0.001467 deferred[6]
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: [6] 0 XOVER 190224-190224 - 0.002197 deferred[6]
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: [6] 0 XOVER 190452-190452 - 0.001644 deferred[6]
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: [6] 0 XOVER 190603-190603 - 0.001524 deferred[6]
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: [6] 0 XOVER 190771-190773 - 0.003532 deferred[6]
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: [6] 0 XOVER 191276-191276 - 0.001874 deferred[6]
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: [6] 0 XOVER 191290-191290 - 0.001543 deferred[6]
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: [6] 0 XOVER 191448-191448 - 0.001545 deferred[6]
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: [6] 0 XOVER 191519-191519 - 0.001578 deferred[6]
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: [6] 0 XOVER 192078-192078 - 0.001462 deferred[6]
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: [6] 0 XOVER 192278-192278 - 0.001610 deferred[6]
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: [6] 0 XOVER 192385-192385 - 0.001401 deferred[6]
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: [6] 0 XOVER 192458-192458 - 0.001678 deferred[6]
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: [6] 0 XOVER 192534-192534 - 0.001726 deferred[6]
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: [6] 0 XOVER 193216-193224 - 0.009579 deferred[6]
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: [6] 0 XOVER 193719-193719 - 0.001444 deferred[6]
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: [6] 0 XOVER 193736-193743 - 0.008615 deferred[6]
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: [6] 0 XOVER 194152-194152 - 0.001519 deferred[6]
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: [6] 0 XOVER 194261-194264 - 0.004747 deferred[6]
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: [6] 0 XOVER 194371-194371 - 0.001522 deferred[6]
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: [6] 0 XOVER 194390-194390 - 0.001397 deferred[6]
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: [6] 0 XOVER 194840-194856 - 0.017661 deferred[6]
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: [6] 0 XOVER 194991-195022 - 0.031130 deferred[6]
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: [6] 0 XOVER 196023-196023 - 0.001578 deferred[6]
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: error from: XOVER 196034-196034 (BUG: nested long response at /usr/local/share/perl5/PublicInbox/NNTP.pm line 588.
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: )
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: [6] 1 XOVER 196034-196034 - 0.000694 deferred[6]
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: error from: XOVER 196073-196073 (BUG: nested long response at /usr/local/share/perl5/PublicInbox/NNTP.pm line 588.
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: )
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: [6] 1 XOVER 196073-196073 - 0.000563 deferred[6]

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCH] nntp: prevent event_read from firing twice in a row
  2018-10-13 18:46 Race condition in public-inbox-nntpd? Jonathan Corbet
  2018-10-16  6:36 ` Eric Wong
@ 2018-12-06  3:59 ` Eric Wong
  1 sibling, 0 replies; 4+ messages in thread
From: Eric Wong @ 2018-12-06  3:59 UTC (permalink / raw)
  To: Jonathan Corbet; +Cc: meta

Jonathan Corbet <corbet@lwn.net> wrote:
> index 022bb80..017ad88 100644
> --- a/lib/PublicInbox/NNTP.pm
> +++ b/lib/PublicInbox/NNTP.pm
> @@ -951,6 +951,10 @@ sub event_read {
>  		$self->{rbuf} .= $$buf;
>  	}
>  	my $r = 1;
> +	if ($self->{long_res}) {
> +		err($self, "long-res on event read");
> +		$r = 0;
> +	}

It's actually fine, too; and I considered returning early from
event_read if long_res was set once I figured out the problem.

I would at least have made it exit the function before calling
$self->read a few lines above, however; otherwise rbuf might get
too huge if a client kept sending.

But my alternative patch below is fewer syscalls with epoll[*],
since it avoids the needless registration and immediate wakeup

---------8<---------
Subject: [PATCH] nntp: prevent event_read from firing twice in a row

When a client starts pipelining requests to us which trigger
long responses, we need to keep socket readiness checks disabled
and only enable them when our socket rbuf is drained.

Failure to do this caused aborted clients with
"BUG: nested long response" when Danga::Socket calls event_read
for read-readiness after our "next_tick" sub fires in the
same event loop iteration.

Reported-by: Jonathan Corbet <corbet@lwn.net>

cf. https://public-inbox.org/meta/20181013124658.23b9f9d2@lwn.net/
---
 lib/PublicInbox/NNTP.pm | 26 ++++++++++++++++++--------
 t/nntpd.t               | 21 +++++++++++++++++++++
 2 files changed, 39 insertions(+), 8 deletions(-)

diff --git a/lib/PublicInbox/NNTP.pm b/lib/PublicInbox/NNTP.pm
index 022bb80..90a5a3a 100644
--- a/lib/PublicInbox/NNTP.pm
+++ b/lib/PublicInbox/NNTP.pm
@@ -51,8 +51,16 @@ ()
 		# before finishing reading:
 		if (my $long_cb = $nntp->{long_res}) {
 			$nntp->write($long_cb);
-		} elsif (&Danga::Socket::POLLIN & $nntp->{event_watch}) {
+		} else {
+			# pipelined request, we bypassed socket-readiness
+			# checks to get here:
 			event_read($nntp);
+
+			# maybe there's more pipelined data, or we'll have
+			# to register it for socket-readiness notifications
+			if (!$nntp->{long_res} && !$nntp->{closed}) {
+				check_read($nntp);
+			}
 		}
 	}
 }
@@ -609,7 +617,7 @@ ($$)
 				           now() - $t0);
 			} else {
 				update_idle_time($self);
-				$self->watch_read(1);
+				check_read($self);
 			}
 		} elsif ($more) { # $self->{write_buf_size}:
 			# no recursion, schedule another call ASAP
@@ -620,7 +628,7 @@ ($$)
 			$nextt ||= PublicInbox::EvCleanup::asap(*next_tick);
 		} else { # all done!
 			$self->{long_res} = undef;
-			$self->watch_read(1);
+			check_read($self);
 			res($self, '.');
 			out($self, " deferred[$fd] done - %0.6f", now() - $t0);
 		}
@@ -968,10 +976,9 @@ sub event_read {
 	update_idle_time($self);
 }
 
-sub watch_read {
-	my ($self, $bool) = @_;
-	my $rv = $self->SUPER::watch_read($bool);
-	if ($bool && index($self->{rbuf}, "\n") >= 0) {
+sub check_read {
+	my ($self) = @_;
+	if (index($self->{rbuf}, "\n") >= 0) {
 		# Force another read if there is a pipelined request.
 		# We don't know if the socket has anything for us to read,
 		# and we must double-check again by the time the timer fires
@@ -979,8 +986,11 @@ sub watch_read {
 		# another long response.
 		push @$nextq, $self;
 		$nextt ||= PublicInbox::EvCleanup::asap(*next_tick);
+	} else {
+		# no pipelined requests available, let the kernel know
+		# to wake us up if there's more
+		$self->watch_read(1); # Danga::Socket::watch_read
 	}
-	$rv;
 }
 
 sub not_idle_long ($$) {
diff --git a/t/nntpd.t b/t/nntpd.t
index 9c1d076..ffed437 100644
--- a/t/nntpd.t
+++ b/t/nntpd.t
@@ -252,6 +252,27 @@ EOF
 		ok($date <= $t1, 'valid date before stop');
 	}
 
+	# pipelined requests:
+	{
+		my $nreq = 90;
+		syswrite($s, "GROUP $group\r\n");
+		my $res = <$s>;
+		my $rdr = fork;
+		if ($rdr == 0) {
+			use POSIX qw(_exit);
+			for (1..$nreq) {
+				<$s> =~ /\A224 / or _exit(1);
+				<$s> =~ /\A1/ or _exit(2);
+				<$s> eq ".\r\n" or _exit(3);
+			}
+			_exit(0);
+		}
+		for (1..$nreq) {
+			syswrite($s, "XOVER 1\r\n");
+		}
+		is($rdr, waitpid($rdr, 0), 'reader done');
+		is($? >> 8, 0, 'no errors');
+	}
 	{
 		setsockopt($s, IPPROTO_TCP, TCP_NODELAY, 1);
 		syswrite($s, 'HDR List-id 1-');



[*] With my noisy system, the time it took to do all the git@vger
XOVER one-by-one with the below script went from ~3:58 to ~3:52
on my system by avoiding unnecessary watch_read (epoll_ctl) calls
and epoll_wait wakeups.

Quick-and-dirty Ruby script I wrote to trigger the problem
(Ruby since it's what I've been spending too much of my time on :x)
--------------------------------
#!/usr/bin/ruby
# frozen_string_literal: true
require 'socket'
require 'uri'
usage = "usage: #$0 news://example.com/inbox.group.name"
uri = ARGV.shift or abort usage
uri = URI(uri)
uri.port ||= 119
group = uri.path.sub(%r{\A/+}, '') # String#delete_prefix requires Ruby 2.5+
s = Socket.tcp(uri.host, uri.port)
l = s.gets
l =~ /\A2\d\d / or abort "bad greeting: #{l}"

s.puts "GROUP #{group}"
l = s.gets
code, _, min, max = l.chomp!.split.map!(&:to_i)
code == 211 or abort "bad GROUP response: #{l}"
rdr = Thread.new do
  nres = 0
  while l = s.gets
    l.start_with?('205 ') and break # cmd_quit
    l.start_with?('224 ') or abort "bad OVER response: #{l}"
    while l = s.gets
      if l == ".\r\n"
        nres += 1
        break
      end
    end
  end
  nres
end

range = min..max
range.each { |i| s.puts "XOVER #{i}" }
puts "requests=#{range.size} #{Time.now}"
s.puts "QUIT"
puts "responses=#{rdr.value} #{Time.now}"

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, back to index

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-10-13 18:46 Race condition in public-inbox-nntpd? Jonathan Corbet
2018-10-16  6:36 ` Eric Wong
2018-10-16 14:04   ` Jonathan Corbet
2018-12-06  3:59 ` [PATCH] nntp: prevent event_read from firing twice in a row Eric Wong

user/dev discussion of public-inbox itself

Archives are clonable:
	git clone --mirror https://public-inbox.org/meta
	git clone --mirror http://czquwvybam4bgbro.onion/meta
	git clone --mirror http://hjrcffqmbrq6wope.onion/meta
	git clone --mirror http://ou63pmih66umazou.onion/meta

Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.mail.public-inbox.meta
	nntp://ou63pmih66umazou.onion/inbox.comp.mail.public-inbox.meta
	nntp://czquwvybam4bgbro.onion/inbox.comp.mail.public-inbox.meta
	nntp://hjrcffqmbrq6wope.onion/inbox.comp.mail.public-inbox.meta
	nntp://news.gmane.org/gmane.mail.public-inbox.general

 note: .onion URLs require Tor: https://www.torproject.org/
       or Tor2web: https://www.tor2web.org/

AGPL code for this site: git clone https://public-inbox.org/ public-inbox