[ruby-core:82311] [Ruby trunk Bug#13794] Infinite loop of sched

ruby-core@ruby-lang.org archive (unofficial mirror)

* [ruby-core:82311] [Ruby trunk Bug#13794] Infinite loop of sched_yield
       [not found] <redmine.issue-13794.20170809152106@ruby-lang.org>
@ 2017-08-09 15:21 ` charlie
  2017-08-09 17:07   ` [ruby-core:82314] " Eric Wong
  2017-08-09 23:33   ` [ruby-core:82319] " Eric Wong
  2017-08-09 23:49 ` [ruby-core:82321] " charlie
                   ` (13 subsequent siblings)
  14 siblings, 2 replies; 22+ messages in thread
From: charlie @ 2017-08-09 15:21 UTC (permalink / raw
  To: ruby-core

Issue #13794 has been reported by catphish (Charlie Smurthwaite).

----------------------------------------
Bug #13794: Infinite loop of sched_yield
https://bugs.ruby-lang.org/issues/13794

* Author: catphish (Charlie Smurthwaite)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 
* ruby -v: ruby 2.3.4p301 (2017-03-30 revision 58214) [x86_64-linux]
* Backport: 2.2: UNKNOWN, 2.3: UNKNOWN, 2.4: UNKNOWN
----------------------------------------
I have been encountering an issue with processes hanging in an infinite loop of calling sched_yield(). The looping code can be found at https://github.com/ruby/ruby/blob/v2_3_4/thread_pthread.c#L1663

while (ATOMIC_CAS(timer_thread_pipe.writing, (rb_atomic_t)0, 0)) {
  native_thread_yield();
}
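
For readers unfamiliar with the idiom: a compare-and-swap of 0 with 0 never changes the value, so the loop above uses it only as an atomic read and spins until the counter reaches zero. The following is a minimal sketch rather than Ruby's code. It assumes a GCC or Clang toolchain and that ATOMIC_CAS returns the previous value, as `__sync_val_compare_and_swap` does. The names `read_via_cas` and `spin_until_zero`, and the spin bound, are hypothetical additions for illustration.

```c
#include <assert.h>
#include <sched.h>

/* Hypothetical stand-in for timer_thread_pipe.writing. */
static volatile int writing;

/* CAS(0 -> 0): stores 0 only when the value is already 0, so the value
 * never changes; the builtin returns what was there before the call. */
static int read_via_cas(volatile int *p)
{
    return __sync_val_compare_and_swap(p, 0, 0);
}

/* Yield while the counter is non-zero, giving up after max_spins yields.
 * Returns the number of yields performed. The loop quoted above has no
 * such bound, which is why a stuck counter makes it spin forever. */
static int spin_until_zero(volatile int *p, int max_spins)
{
    int spins = 0;

    assert(max_spins >= 0);
    while (read_via_cas(p) != 0 && spins < max_spins) {
        sched_yield();
        spins++;
    }
    return spins;
}
```

With the counter stuck at 1 and nobody left to decrement it, `spin_until_zero(&writing, n)` always exhausts its bound, which corresponds to the endless sched_yield() calls seen in the backtraces.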

It is my belief that, by some mechanism I have not been able to identify, timer_thread_pipe.writing is incremented but never decremented, causing this loop to run infinitely.

I am not able to create a reproducible test case; however, this issue occurs regularly in my production application. I have attached backtraces and thread lists from 2 processes exhibiting this behaviour. gdb confirms that timer_thread_pipe.writing = 1 in these processes.

I believe one possibility of the cause is that rb_thread_wakeup_timer_thread() or rb_thread_wakeup_timer_thread_low() is called, and before it returns, another thread calls fork(), leaving the value of timer_thread_pipe.writing incremented, but leaving behind the thread that would normally decrement it.

If this is correct, one solution would be to reset timer_thread_pipe.writing to 0 in native_reset_timer_thread() immediately after a fork.
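
The suspected mechanism can be illustrated with a small single-threaded simulation. This is a sketch under stated assumptions, not a reproduction of the bug: the "other thread" that is between increment and decrement is modelled simply by where fork() is placed, and `writing` and `counter_seen_by_child` are hypothetical names. Because fork() copies memory but only the calling thread, the child keeps the incremented value with nobody left to decrement it.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for timer_thread_pipe.writing. */
static volatile int writing;

/* Forks while the counter is incremented and returns the value the child
 * observes, or -1 on error. If reset_in_child is non-zero, the child
 * zeroes the counter first, as the proposed fix would. */
static int counter_seen_by_child(int reset_in_child)
{
    int status;
    pid_t pid;

    writing = 1;                 /* a waker has incremented, not yet decremented */
    pid = fork();                /* fork lands inside that window */
    if (pid < 0)
        return -1;
    if (pid == 0) {
        if (reset_in_child)
            writing = 0;         /* proposed fix: reset right after fork */
        _exit(writing);          /* report the child's view via exit status */
    }
    writing = 0;                 /* parent: the waker finishes and decrements */
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    assert(writing == 0);        /* the parent's copy is back to zero */
    return WEXITSTATUS(status);
}
```

Calling `counter_seen_by_child(0)` shows the child still seeing 1 after the parent has returned to 0, while `counter_seen_by_child(1)` shows the effect of resetting the counter in the child.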

Other examples of similar bugs being reported:
https://github.com/resque/resque/issues/578
https://github.com/zk-ruby/zk/issues/50

---Files--------------------------------
backtrace_1.txt (14 KB)
backtrace_2.txt (10.9 KB)

-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [ruby-core:82314] Re: [Ruby trunk Bug#13794] Infinite loop of sched_yield
  2017-08-09 15:21 ` [ruby-core:82311] [Ruby trunk Bug#13794] Infinite loop of sched_yield charlie
@ 2017-08-09 17:07   ` Eric Wong
  2017-08-09 23:33   ` [ruby-core:82319] " Eric Wong
  1 sibling, 0 replies; 22+ messages in thread
From: Eric Wong @ 2017-08-09 17:07 UTC (permalink / raw
  To: ruby-core

Thanks for the report.  I'll take a look at it in a few hours.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [ruby-core:82319] Re: [Ruby trunk Bug#13794] Infinite loop of sched_yield
  2017-08-09 15:21 ` [ruby-core:82311] [Ruby trunk Bug#13794] Infinite loop of sched_yield charlie
  2017-08-09 17:07   ` [ruby-core:82314] " Eric Wong
@ 2017-08-09 23:33   ` Eric Wong
  2017-08-09 23:38     ` [ruby-core:82320] " Eric Wong
  1 sibling, 1 reply; 22+ messages in thread
From: Eric Wong @ 2017-08-09 23:33 UTC (permalink / raw
  To: ruby-core

charlie@atech.media wrote:
> Bug #13794: Infinite loop of sched_yield
> https://bugs.ruby-lang.org/issues/13794
> ----------------------------------------
> I have been encountering an issue with processes hanging in an infinite loop of calling sched_yield(). The looping code can be found at https://github.com/ruby/ruby/blob/v2_3_4/thread_pthread.c#L1663
> 
> while (ATOMIC_CAS(timer_thread_pipe.writing, (rb_atomic_t)0, 0)) {
>   native_thread_yield();
> }
> 
> It is my belief that, by some mechanism I have not been able to identify, timer_thread_pipe.writing is incremented but never decremented, causing this loop to run infinitely.
> 
> I am not able to create a reproducible test case; however, this issue occurs regularly in my production application. I have attached backtraces and thread lists from 2 processes exhibiting this behaviour. gdb confirms that timer_thread_pipe.writing = 1 in these processes.

Can you also check the value of timer_thread_pipe.owner_process?

> I believe one possibility of the cause is that rb_thread_wakeup_timer_thread() or rb_thread_wakeup_timer_thread_low() is called, and before it returns, another thread calls fork(), leaving the value of timer_thread_pipe.writing incremented, but leaving behind the thread that would normally decrement it.

That is a likely possibility.

> If this is correct, one solution would be to reset timer_thread_pipe.writing to 0 in native_reset_timer_thread() immediately after a fork.

How about checking owner_process before incrementing?
Can you try the following patch to check owner_process?

   https://80x24.org/spew/20170809232533.14932-1-e@80x24.org/raw

timer_thread_pipe.writing was introduced in August 2015 with r51576,
so this bug would definitely be my fault.

> Other examples of similar bugs being reported:
> https://github.com/resque/resque/issues/578
> https://github.com/zk-ruby/zk/issues/50

That also means these bugs from 2012 are from other causes.


Thanks again for this report.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [ruby-core:82320] Re: [Ruby trunk Bug#13794] Infinite loop of sched_yield
  2017-08-09 23:33   ` [ruby-core:82319] " Eric Wong
@ 2017-08-09 23:38     ` Eric Wong
  0 siblings, 0 replies; 22+ messages in thread
From: Eric Wong @ 2017-08-09 23:38 UTC (permalink / raw
  To: ruby-core

Eric Wong wrote:
>    https://80x24.org/spew/20170809232533.14932-1-e@80x24.org/raw

Also, a disclaimer: I've been far from my usual self lately.
Everything about my proposed patch could be wrong and bogus, too.

Extra eyes are, as always, very welcome.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [ruby-core:82321] [Ruby trunk Bug#13794] Infinite loop of sched_yield
       [not found] <redmine.issue-13794.20170809152106@ruby-lang.org>
  2017-08-09 15:21 ` [ruby-core:82311] [Ruby trunk Bug#13794] Infinite loop of sched_yield charlie
@ 2017-08-09 23:49 ` charlie
  2017-08-28 23:51   ` [ruby-core:82495] " Eric Wong
  2017-08-09 23:56 ` [ruby-core:82322] " charlie
                   ` (12 subsequent siblings)
  14 siblings, 1 reply; 22+ messages in thread
From: charlie @ 2017-08-09 23:49 UTC (permalink / raw
  To: ruby-core

Issue #13794 has been updated by catphish (Charlie Smurthwaite).

> Can you also check the value of timer_thread_pipe.owner_process?

I don't have any broken processes available right now, but I will check as soon as I can.

> How about checking owner_process before incrementing?

I'm afraid this fix doesn't quite match up in my mind. To clarify, I am suggesting that timer_thread_pipe.writing is being incremented in the parent process before the fork occurs. This would still occur because the PID would match at that point.

----------------------------------------
Bug #13794: Infinite loop of sched_yield
https://bugs.ruby-lang.org/issues/13794#change-66120


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [ruby-core:82322] [Ruby trunk Bug#13794] Infinite loop of sched_yield
       [not found] <redmine.issue-13794.20170809152106@ruby-lang.org>
  2017-08-09 15:21 ` [ruby-core:82311] [Ruby trunk Bug#13794] Infinite loop of sched_yield charlie
  2017-08-09 23:49 ` [ruby-core:82321] " charlie
@ 2017-08-09 23:56 ` charlie
  2017-08-15 12:04 ` [ruby-core:82386] " charlie
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 22+ messages in thread
From: charlie @ 2017-08-09 23:56 UTC (permalink / raw
  To: ruby-core

Issue #13794 has been updated by catphish (Charlie Smurthwaite).

>> How about checking owner_process before incrementing?

> I'm afraid this fix doesn't quite match up in my mind. To clarify, I am suggesting that timer_thread_pipe.writing is being incremented in the parent process before the fork occurs. This would still occur because the PID would match at that point.

Checking it before the while loop might work though.

----------------------------------------
Bug #13794: Infinite loop of sched_yield
https://bugs.ruby-lang.org/issues/13794#change-66121


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [ruby-core:82386] [Ruby trunk Bug#13794] Infinite loop of sched_yield
       [not found] <redmine.issue-13794.20170809152106@ruby-lang.org>
                   ` (2 preceding siblings ...)
  2017-08-09 23:56 ` [ruby-core:82322] " charlie
@ 2017-08-15 12:04 ` charlie
  2017-08-17 11:57 ` [ruby-core:82414] " charlie
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 22+ messages in thread
From: charlie @ 2017-08-15 12:04 UTC (permalink / raw
  To: ruby-core

Issue #13794 has been updated by catphish (Charlie Smurthwaite).

> Can you also check the value of timer_thread_pipe.owner_process?

(gdb) print timer_thread_pipe.writing
$1 = 1
(gdb) print timer_thread_pipe.owner_process
$2 = 0

(gdb) info threads
  Id   Target Id         Frame 
  2    Thread 0x7f1f98a2f700 (LWP 19597) "ruby-timer-thr" 0x00007f1f976e9c5d in poll ()
    at ../sysdeps/unix/syscall-template.S:81
* 1    Thread 0x7f1f98a24740 (LWP 19595) "ruby" 0x00007f1f976c81d7 in sched_yield ()
    at ../sysdeps/unix/syscall-template.S:81

----------------------------------------
Bug #13794: Infinite loop of sched_yield
https://bugs.ruby-lang.org/issues/13794#change-66184


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [ruby-core:82414] [Ruby trunk Bug#13794] Infinite loop of sched_yield
       [not found] <redmine.issue-13794.20170809152106@ruby-lang.org>
                   ` (3 preceding siblings ...)
  2017-08-15 12:04 ` [ruby-core:82386] " charlie
@ 2017-08-17 11:57 ` charlie
  2017-08-17 15:07 ` [ruby-core:82417] " charlie
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 22+ messages in thread
From: charlie @ 2017-08-17 11:57 UTC (permalink / raw
  To: ruby-core

Issue #13794 has been updated by catphish (Charlie Smurthwaite).

I am now testing the following patch:

~~~
diff --git a/thread_pthread.c b/thread_pthread.c
index 4aa2d620a2..fe99524a54 100644
--- a/thread_pthread.c
+++ b/thread_pthread.c
@@ -1685,6 +1685,7 @@ native_stop_timer_thread(void)
 static void
 native_reset_timer_thread(void)
 {
+    timer_thread_pipe.writing = 0;
     if (TT_DEBUG)  fprintf(stderr, "reset timer thread\n");
 }
~~~

I don't know if this is the correct way to solve the problem, but I will update this thread when I know whether it is effective. Even if this is the wrong way to solve the issue, it will serve to provide more information about it.

----------------------------------------
Bug #13794: Infinite loop of sched_yield
https://bugs.ruby-lang.org/issues/13794#change-66214


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [ruby-core:82417] [Ruby trunk Bug#13794] Infinite loop of sched_yield
       [not found] <redmine.issue-13794.20170809152106@ruby-lang.org>
                   ` (4 preceding siblings ...)
  2017-08-17 11:57 ` [ruby-core:82414] " charlie
@ 2017-08-17 15:07 ` charlie
  2017-08-17 15:08 ` [ruby-core:82418] " charlie
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 22+ messages in thread
From: charlie @ 2017-08-17 15:07 UTC (permalink / raw
  To: ruby-core

Issue #13794 has been updated by catphish (Charlie Smurthwaite).

File sched_yield_1.patch added

The patch above does not work because native_reset_timer_thread runs after fork in the parent. Attached is an alternative patch that runs in gvl_atfork and appears not to run in the parent. Again, I do not believe this is necessarily the correct way to fix this, but I am also looking for something that works for me in the meantime.
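
The distinction between code that runs after fork in the parent and code that runs only in the child can be sketched with POSIX pthread_atfork(). This is an illustration only: the attached patch is not shown in this thread, gvl_atfork is Ruby-internal, and pthread_atfork() is used here as a stand-in. The names `writing`, `after_fork_parent`, `after_fork_child`, `install_fork_handlers` and `fork_with_stuck_counter` are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for timer_thread_pipe.writing. */
static volatile int writing;
static int parent_handler_runs;   /* after-fork calls seen in the parent */

/* Runs in the parent after fork(): must not touch the counter, because
 * the parent still has a thread that will decrement it. */
static void after_fork_parent(void) { parent_handler_runs++; }

/* Runs in the child after fork(): no thread is left to decrement. */
static void after_fork_child(void) { writing = 0; }

/* Registers both handlers; returns 0 on success. */
static int install_fork_handlers(void)
{
    return pthread_atfork(NULL, after_fork_parent, after_fork_child);
}

/* Forks with the counter stuck at 1. Stores the parent's post-fork value
 * in *parent_value and returns the child's value, or -1 on error. */
static int fork_with_stuck_counter(int *parent_value)
{
    int status;
    pid_t pid;

    writing = 1;
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(writing);           /* child handler has already run */
    *parent_value = writing;      /* parent handler has already run */
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

After `install_fork_handlers()`, a call to `fork_with_stuck_counter(&v)` returns 0 for the child while leaving `v` at 1, which is the behaviour a child-only reset is meant to have. A reset placed in the parent-side handler would instead clear a counter that a live thread is still about to decrement.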

Comments appreciated.

----------------------------------------
Bug #13794: Infinite loop of sched_yield
https://bugs.ruby-lang.org/issues/13794#change-66217


---Files--------------------------------
backtrace_1.txt (14 KB)
backtrace_2.txt (10.9 KB)
sched_yield_1.patch (1.04 KB)

-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [ruby-core:82418] [Ruby trunk Bug#13794] Infinite loop of sched_yield
       [not found] <redmine.issue-13794.20170809152106@ruby-lang.org>
                   ` (5 preceding siblings ...)
  2017-08-17 15:07 ` [ruby-core:82417] " charlie
@ 2017-08-17 15:08 ` charlie
  2017-08-23  0:04   ` [ruby-core:82452] " Eric Wong
  2017-08-29 10:17 ` [ruby-core:82510] " charlie
                   ` (7 subsequent siblings)
  14 siblings, 1 reply; 22+ messages in thread
From: charlie @ 2017-08-17 15:08 UTC (permalink / raw
  To: ruby-core

Issue #13794 has been updated by catphish (Charlie Smurthwaite).

File sched_yield_1.patch added

The patch above does not work because native_reset_timer_thread runs after fork in the parent. Attached is an alternative patch that runs in gvl_atfork and appears not to run in the parent. Again, I do not believe this is necessarily the correct way to fix this, but I am also looking for something that works for me in the meantime.

Comments appreciated.

----------------------------------------
Bug #13794: Infinite loop of sched_yield
https://bugs.ruby-lang.org/issues/13794#change-66219


---Files--------------------------------
backtrace_1.txt (14 KB)
backtrace_2.txt (10.9 KB)
sched_yield_1.patch (738 Bytes)


-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [ruby-core:82452] Re: [Ruby trunk Bug#13794] Infinite loop of sched_yield
  2017-08-17 15:08 ` [ruby-core:82418] " charlie
@ 2017-08-23  0:04   ` Eric Wong
  0 siblings, 0 replies; 22+ messages in thread
From: Eric Wong @ 2017-08-23  0:04 UTC (permalink / raw
  To: ruby-core

charlie@atech.media wrote:
> File sched_yield_1.patch added

> The patch above does not work because
> native_reset_timer_thread runs after fork in the parent.
> Attached an alternative patch that runs in gvl_atfork and
> appears not to run in the parent. Again I do not believe this
> is necessarily the correct way to fix this but I am also
> looking for something that works for me in the meantime.

Agreed that it may not necessarily be an optimal way to fix the
problem; but have you been able to reproduce the problem with
your patch applied?

> Comments appreciated.

Sorry, I've been too distracted this summer.  Will attempt to
concentrate on it, now.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [ruby-core:82495] Re: [Ruby trunk Bug#13794] Infinite loop of sched_yield
  2017-08-09 23:49 ` [ruby-core:82321] " charlie
@ 2017-08-28 23:51   ` Eric Wong
  0 siblings, 0 replies; 22+ messages in thread
From: Eric Wong @ 2017-08-28 23:51 UTC (permalink / raw
  To: ruby-core

charlie@atech.media wrote:
> Eric Wong <normalperson@yhbt.net> wrote:
> > How about checking owner_process before incrementing?
> 
> I'm afraid this fix doesn't quite match up in my mind. To
> clarify, I am suggesting that timer_thread_pipe.writing is
> being incremented in the parent process before the fork
> occurs. This would still occur because the PID would match at
> that point.

I am now more sure that checking .owner_process before
incrementing is sufficient.  We zero .owner_process before
looping on the .writing check in native_stop_timer_thread, so if
sighandler fires after the .writing check loop in the parent, a
non-zero .writing value could be carried over to the child.
Merely checking .owner_process as in my original patch is enough
to stop a non-zero .writing value from being carried over to a
child.
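
The ordering argument above can be walked through with a deterministic, single-threaded model. This is a sketch under assumptions and not the patch itself: `tt_pipe`, `wakeup`, `fork_lands_here` and `inherited_after_stop` are hypothetical names, the signal handler is modelled as a plain function call made after the wait loop, and a fork landing between increment and decrement is modelled by recording the counter at that point.

```c
#include <assert.h>
#include <sched.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-ins for the timer_thread_pipe fields discussed here. */
static struct {
    volatile int writing;
    volatile pid_t owner_process;
} tt_pipe;

static int snapshot;   /* value a child forked mid-wakeup would inherit */

/* Models a fork() landing between the increment and the decrement. */
static void fork_lands_here(void) { snapshot = tt_pipe.writing; }

/* Wakeup path. With check_owner_first set, the counter is only touched
 * while this process still owns a running timer thread. */
static void wakeup(int check_owner_first)
{
    if (check_owner_first && tt_pipe.owner_process != getpid())
        return;                            /* stopping, or a fresh child */
    __sync_fetch_and_add(&tt_pipe.writing, 1);
    fork_lands_here();
    __sync_fetch_and_sub(&tt_pipe.writing, 1);
}

/* Stop sequence followed by a late signal; returns the inherited value. */
static int inherited_after_stop(int check_owner_first)
{
    snapshot = 0;
    tt_pipe.writing = 0;
    tt_pipe.owner_process = getpid();      /* timer thread is running */

    tt_pipe.owner_process = 0;             /* stop: zero the owner first */
    while (__sync_val_compare_and_swap(&tt_pipe.writing, 0, 0)) {
        sched_yield();                     /* then wait for writers to finish */
    }
    wakeup(check_owner_first);             /* sighandler fires after the loop */
    return snapshot;
}
```

In this model `inherited_after_stop(0)` reports a non-zero inherited value, while `inherited_after_stop(1)` reports zero because the owner check makes the late wakeup leave the counter alone.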

A stronger version which zeroes .writing when re-initializing the
timer thread pipe would be:

  https://80x24.org/spew/20170828232657.GA22848@dcvr/raw

However, I think zero-ing .writing is unnecessary, so I stand
by my original patch:

  https://80x24.org/spew/20170809232533.14932-1-e@80x24.org/raw

Perhaps the following to ensure (.writing == 0) in
rb_thread_create_timer_thread will be useful, at least:

  https://80x24.org/spew/20170828234743.GA27737@dcvr/raw

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [ruby-core:82510] [Ruby trunk Bug#13794] Infinite loop of sched_yield
       [not found] <redmine.issue-13794.20170809152106@ruby-lang.org>
                   ` (6 preceding siblings ...)
  2017-08-17 15:08 ` [ruby-core:82418] " charlie
@ 2017-08-29 10:17 ` charlie
  2017-09-01 16:09 ` [ruby-core:82626] " charlie
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 22+ messages in thread
From: charlie @ 2017-08-29 10:17 UTC (permalink / raw
  To: ruby-core

Issue #13794 has been updated by catphish (Charlie Smurthwaite).

Apologies for my delay in replying. I have not yet had an opportunity to fully test any of these patches (apart from my initial hack which did not work). I will aim to test your patch as soon as possible. Thank you for your assistance!

----------------------------------------
Bug #13794: Infinite loop of sched_yield
https://bugs.ruby-lang.org/issues/13794#change-66321


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [ruby-core:82626] [Ruby trunk Bug#13794] Infinite loop of sched_yield
       [not found] <redmine.issue-13794.20170809152106@ruby-lang.org>
                   ` (7 preceding siblings ...)
  2017-08-29 10:17 ` [ruby-core:82510] " charlie
@ 2017-09-01 16:09 ` charlie
  2017-09-29 10:20 ` [ruby-core:83061] " greg.potamianos
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 22+ messages in thread
From: charlie @ 2017-09-01 16:09 UTC (permalink / raw
  To: ruby-core

Issue #13794 has been updated by catphish (Charlie Smurthwaite).

Hi Eric,

I have been testing your original patch (just the PID check) for a couple of days and it appears to have resolved the problem. I will report on this again in 1 week as the issue occurs quite randomly but I am currently hopeful. Thank you very much again for looking into this.

Charlie

----------------------------------------
Bug #13794: Infinite loop of sched_yield
https://bugs.ruby-lang.org/issues/13794#change-66453


* [ruby-core:83061] [Ruby trunk Bug#13794] Infinite loop of sched_yield
       [not found] <redmine.issue-13794.20170809152106@ruby-lang.org>
                   ` (8 preceding siblings ...)
  2017-09-01 16:09 ` [ruby-core:82626] " charlie
@ 2017-09-29 10:20 ` greg.potamianos
  2017-09-30  3:21 ` [ruby-core:83064] " xkernigh
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 22+ messages in thread
From: greg.potamianos @ 2017-09-29 10:20 UTC
  To: ruby-core

Issue #13794 has been updated by gregoryp (Gregory Potamianos).

File sched_yield_loop.rb added

Hi,

We are also affected by this bug on Debian's ruby 2.3.3p222. We encounter it when Exec resources of Puppet agents time out and leave stray processes in a busy loop. We have tried to reproduce it, without much success. The attached simple fork/exec script seems to reproduce it sporadically when run as `while true; do nice -n19 ruby sched_yield_loop.rb; [ $? -ne 0 ] && break; done`. It eventually raises a timeout error and leaves a child behind spinning on sched_yield.

catphish (Charlie Smurthwaite) wrote:
> Hi Eric,
> 
> I have been testing your original patch (just the PID check) for a couple of days and it appears to have resolved the problem. I will report on this again in 1 week as the issue occurs quite randomly but I am currently hopeful. Thank you very much again for looking into this.
> 
> Charlie

We have been running ruby with this patch for 2 weeks now and it seems to solve the problem for us also. Is there any chance of this being merged/backported to ruby 2.3?

Regards,
Gregory

----------------------------------------
Bug #13794: Infinite loop of sched_yield
https://bugs.ruby-lang.org/issues/13794#change-66992

* Author: catphish (Charlie Smurthwaite)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 
* ruby -v: ruby 2.3.4p301 (2017-03-30 revision 58214) [x86_64-linux]
* Backport: 2.2: UNKNOWN, 2.3: UNKNOWN, 2.4: UNKNOWN

* [ruby-core:83064] [Ruby trunk Bug#13794] Infinite loop of sched_yield
       [not found] <redmine.issue-13794.20170809152106@ruby-lang.org>
                   ` (9 preceding siblings ...)
  2017-09-29 10:20 ` [ruby-core:83061] " greg.potamianos
@ 2017-09-30  3:21 ` xkernigh
  2017-09-30 21:54   ` [ruby-core:83067] " Eric Wong
  2017-10-01  2:32   ` [ruby-core:83069] " Eric Wong
  2017-10-19 10:51 ` [ruby-core:83392] " charlie
                   ` (3 subsequent siblings)
  14 siblings, 2 replies; 22+ messages in thread
From: xkernigh @ 2017-09-30  3:21 UTC
  To: ruby-core

Issue #13794 has been updated by kernigh (George Koehler).

Gregory Potamianos wrote:
> `while true; do nice -n19 ruby sched_yield_loop.rb; [ $? -ne 0 ] && break; done`

With your script, it is easy to reproduce the bug.
I shortened your shell loop to
`while nice -n19 ruby sched_yield_loop.rb; do done`

I tested both patches by Eric Wong,
the weaker version with PID check
> https://80x24.org/spew/20170809232533.14932-1-e@80x24.org/raw

and the stronger version with PID check and zeroing .writing
> https://80x24.org/spew/20170828232657.GA22848@dcvr/raw
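
The contents of the two patches are not reproduced in this thread, so the following C sketch is only a guess at the general shape of the two variants. All names (`struct pipe_model`, `owner_process`, `wakeup_if_owner`, `reinit_in_child`) are hypothetical, and the caller's PID is passed as a parameter where real code would presumably call getpid().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical model of the shared state; not Ruby's real structure. */
struct pipe_model {
    atomic_int writing;  /* wakers currently inside the write path */
    pid_t owner_process; /* PID of the process that set up the pipe */
};

/* "Weaker" variant (PID check only): a caller that is not the owning
 * process never touches the counter. Returns true if a wakeup was
 * attempted, false if it was skipped. */
static bool wakeup_if_owner(struct pipe_model *p, pid_t caller)
{
    assert(p);
    if (p->owner_process != caller)
        return false;
    atomic_fetch_add(&p->writing, 1);
    /* real code would write to the pipe here */
    atomic_fetch_sub(&p->writing, 1);
    return true;
}

/* "Stronger" variant adds this step: after fork, the child zeroes the
 * counter and takes ownership, so a stale count cannot survive. */
static void reinit_in_child(struct pipe_model *p, pid_t child)
{
    assert(p);
    atomic_store(&p->writing, 0);
    p->owner_process = child;
}
```

With the PID check alone, a non-owner caller leaves `writing` untouched. The stronger variant also clears any count that was already non-zero when the child started.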

I used this Ruby,
```
$ ruby -v
ruby 2.5.0dev (2017-09-30 trunk 60064) [x86_64-openbsd6.1]
```

The shell loop running sched_yield_loop.rb can run for a few minutes before the bug happens. It happens when sched_yield_loop.rb raises a timeout error; then I find a child Ruby spinning the CPU, as Gregory Potamianos described. Gregory, running Debian, reported that the weaker patch seems to fix the bug. I, running OpenBSD, observe that neither patch fixes the bug. I can still get the timeout error and the spinning child when Ruby is without patch, with the weaker patch, or with the stronger patch.

But I might have found a different bug. I did kill -ABRT a spinning child and gave the core dump to gdb; it seemed that both threads were stuck inside OpenBSD's thread library. The main thread was stuck in pthread_join(), and the other thread was stuck in _rthread_tls_destructors(). I did not find any thread stuck in the loop `while (ATOMIC_CAS(timer_thread_pipe.writing, (rb_atomic_t)0, 0))` identified by Charlie Smurthwaite in the original bug report.

Anyone can use Gregory's sched_yield_loop.rb to check for the bug. If the weaker patch from Eric Wong fixes the bug on Linux, I suggest putting the weaker patch in trunk and backporting it to older Ruby versions.

----------------------------------------
Bug #13794: Infinite loop of sched_yield
https://bugs.ruby-lang.org/issues/13794#change-67001

* Author: catphish (Charlie Smurthwaite)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 
* ruby -v: ruby 2.3.4p301 (2017-03-30 revision 58214) [x86_64-linux]
* Backport: 2.2: UNKNOWN, 2.3: UNKNOWN, 2.4: UNKNOWN

* [ruby-core:83067] Re: [Ruby trunk Bug#13794] Infinite loop of sched_yield
  2017-09-30  3:21 ` [ruby-core:83064] " xkernigh
@ 2017-09-30 21:54   ` Eric Wong
  2017-10-01  2:32   ` [ruby-core:83069] " Eric Wong
  1 sibling, 0 replies; 22+ messages in thread
From: Eric Wong @ 2017-09-30 21:54 UTC
  To: ruby-core

xkernigh@netscape.net wrote:
> Anyone can use Gregory's sched_yield_loop.rb to check for the
> bug. If the weaker patch from Eric Wong fixes the bug for
> Linux, I suggest to put the weaker patch in trunk, and to
> backport it to older Ruby versions.

Thanks to all of you for the feedback; I've committed my original patch
as r60079.

> But I might have found a different bug. I did kill -ABRT a
> spinning child and gave the core dump to gdb; it seemed that
> both threads were stuck inside OpenBSD's thread library. The
> main thread was stuck in pthread_join(), and the other thread
> was stuck in _rthread_tls_destructors(). I did not find any
> thread stuck in the loop `while
> (ATOMIC_CAS(timer_thread_pipe.writing, (rb_atomic_t)0, 0))`
> identified by Charlie Smurthwaite in the original bug report.

Yes, this seems like a different bug.  Maybe OpenBSD pthreads
doesn't work well with fork/vfork (glibc barely does), and
that's a bug for OpenBSD guys to fix.

* [ruby-core:83069] Re: [Ruby trunk Bug#13794] Infinite loop of sched_yield
  2017-09-30  3:21 ` [ruby-core:83064] " xkernigh
  2017-09-30 21:54   ` [ruby-core:83067] " Eric Wong
@ 2017-10-01  2:32   ` Eric Wong
  1 sibling, 0 replies; 22+ messages in thread
From: Eric Wong @ 2017-10-01  2:32 UTC
  To: ruby-core

xkernigh@netscape.net wrote:
> But I might have found a different bug. I did kill -ABRT a
> spinning child and gave the core dump to gdb; it seemed that
> both threads were stuck inside OpenBSD's thread library. The
> main thread was stuck in pthread_join(), and the other thread
> was stuck in _rthread_tls_destructors(). I did not find any
> thread stuck in the loop `while
> (ATOMIC_CAS(timer_thread_pipe.writing, (rb_atomic_t)0, 0))`
> identified by Charlie Smurthwaite in the original bug report.

Oh, it might be related to wanabe's patch for [Bug #13887]
for systems with FIBER_USE_NATIVE=0
https://bugs.ruby-lang.org/issues/13887

https://80x24.org/spew/20170928004228.4538-2-e@80x24.org/raw

* [ruby-core:83392] [Ruby trunk Bug#13794] Infinite loop of sched_yield
       [not found] <redmine.issue-13794.20170809152106@ruby-lang.org>
                   ` (10 preceding siblings ...)
  2017-09-30  3:21 ` [ruby-core:83064] " xkernigh
@ 2017-10-19 10:51 ` charlie
  2018-01-08 10:26 ` [ruby-core:84699] " charlie
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 22+ messages in thread
From: charlie @ 2017-10-19 10:51 UTC
  To: ruby-core

Issue #13794 has been updated by catphish (Charlie Smurthwaite).

I'd just like to confirm that after several weeks, I have not seen a recurrence of this issue while running the original PID check patch. Thanks all!

----------------------------------------
Bug #13794: Infinite loop of sched_yield
https://bugs.ruby-lang.org/issues/13794#change-67353

* Author: catphish (Charlie Smurthwaite)
* Status: Closed
* Priority: Normal
* Assignee: 
* Target version: 
* ruby -v: ruby 2.3.4p301 (2017-03-30 revision 58214) [x86_64-linux]
* Backport: 2.2: UNKNOWN, 2.3: REQUIRED, 2.4: REQUIRED

* [ruby-core:84699] [Ruby trunk Bug#13794] Infinite loop of sched_yield
       [not found] <redmine.issue-13794.20170809152106@ruby-lang.org>
                   ` (11 preceding siblings ...)
  2017-10-19 10:51 ` [ruby-core:83392] " charlie
@ 2018-01-08 10:26 ` charlie
  2018-01-15 13:27 ` [ruby-core:84870] " nagachika00
  2018-01-31 13:49 ` [ruby-core:85307] " usa
  14 siblings, 0 replies; 22+ messages in thread
From: charlie @ 2018-01-08 10:26 UTC
  To: ruby-core

Issue #13794 has been updated by catphish (Charlie Smurthwaite).

I notice that this bug has been closed for a while but has not been backported into 2.3. Is this likely to happen? Thanks!

----------------------------------------
Bug #13794: Infinite loop of sched_yield
https://bugs.ruby-lang.org/issues/13794#change-69420

* Author: catphish (Charlie Smurthwaite)
* Status: Closed
* Priority: Normal
* Assignee: 
* Target version: 
* ruby -v: ruby 2.3.4p301 (2017-03-30 revision 58214) [x86_64-linux]
* Backport: 2.2: UNKNOWN, 2.3: REQUIRED, 2.4: REQUIRED

* [ruby-core:84870] [Ruby trunk Bug#13794] Infinite loop of sched_yield
       [not found] <redmine.issue-13794.20170809152106@ruby-lang.org>
                   ` (12 preceding siblings ...)
  2018-01-08 10:26 ` [ruby-core:84699] " charlie
@ 2018-01-15 13:27 ` nagachika00
  2018-01-31 13:49 ` [ruby-core:85307] " usa
  14 siblings, 0 replies; 22+ messages in thread
From: nagachika00 @ 2018-01-15 13:27 UTC
  To: ruby-core

Issue #13794 has been updated by nagachika (Tomoyuki Chikanaga).

Backport changed from 2.2: UNKNOWN, 2.3: REQUIRED, 2.4: REQUIRED to 2.2: UNKNOWN, 2.3: REQUIRED, 2.4: DONE

ruby_2_4 r61854 merged revision(s) 60079.

----------------------------------------
Bug #13794: Infinite loop of sched_yield
https://bugs.ruby-lang.org/issues/13794#change-69581

* Author: catphish (Charlie Smurthwaite)
* Status: Closed
* Priority: Normal
* Assignee: 
* Target version: 
* ruby -v: ruby 2.3.4p301 (2017-03-30 revision 58214) [x86_64-linux]
* Backport: 2.2: UNKNOWN, 2.3: REQUIRED, 2.4: DONE

* [ruby-core:85307] [Ruby trunk Bug#13794] Infinite loop of sched_yield
       [not found] <redmine.issue-13794.20170809152106@ruby-lang.org>
                   ` (13 preceding siblings ...)
  2018-01-15 13:27 ` [ruby-core:84870] " nagachika00
@ 2018-01-31 13:49 ` usa
  14 siblings, 0 replies; 22+ messages in thread
From: usa @ 2018-01-31 13:49 UTC
  To: ruby-core

Issue #13794 has been updated by usa (Usaku NAKAMURA).

Backport changed from 2.2: UNKNOWN, 2.3: REQUIRED, 2.4: DONE to 2.2: UNKNOWN, 2.3: DONE, 2.4: DONE

ruby_2_3 r62142 merged revision(s) 60079.

----------------------------------------
Bug #13794: Infinite loop of sched_yield
https://bugs.ruby-lang.org/issues/13794#change-70091

* Author: catphish (Charlie Smurthwaite)
* Status: Closed
* Priority: Normal
* Assignee: 
* Target version: 
* ruby -v: ruby 2.3.4p301 (2017-03-30 revision 58214) [x86_64-linux]
* Backport: 2.2: UNKNOWN, 2.3: DONE, 2.4: DONE

Thread overview: 22+ messages
     [not found] <redmine.issue-13794.20170809152106@ruby-lang.org>
2017-08-09 15:21 ` [ruby-core:82311] [Ruby trunk Bug#13794] Infinite loop of sched_yield charlie
2017-08-09 17:07   ` [ruby-core:82314] " Eric Wong
2017-08-09 23:33   ` [ruby-core:82319] " Eric Wong
2017-08-09 23:38     ` [ruby-core:82320] " Eric Wong
2017-08-09 23:49 ` [ruby-core:82321] " charlie
2017-08-28 23:51   ` [ruby-core:82495] " Eric Wong
2017-08-09 23:56 ` [ruby-core:82322] " charlie
2017-08-15 12:04 ` [ruby-core:82386] " charlie
2017-08-17 11:57 ` [ruby-core:82414] " charlie
2017-08-17 15:07 ` [ruby-core:82417] " charlie
2017-08-17 15:08 ` [ruby-core:82418] " charlie
2017-08-23  0:04   ` [ruby-core:82452] " Eric Wong
2017-08-29 10:17 ` [ruby-core:82510] " charlie
2017-09-01 16:09 ` [ruby-core:82626] " charlie
2017-09-29 10:20 ` [ruby-core:83061] " greg.potamianos
2017-09-30  3:21 ` [ruby-core:83064] " xkernigh
2017-09-30 21:54   ` [ruby-core:83067] " Eric Wong
2017-10-01  2:32   ` [ruby-core:83069] " Eric Wong
2017-10-19 10:51 ` [ruby-core:83392] " charlie
2018-01-08 10:26 ` [ruby-core:84699] " charlie
2018-01-15 13:27 ` [ruby-core:84870] " nagachika00
2018-01-31 13:49 ` [ruby-core:85307] " usa
