ruby-core@ruby-lang.org archive (unofficial mirror)
 help / color / mirror / Atom feed
* [ruby-core:88503] [Ruby trunk Bug#14999] ConditionVariable doesn't require the Mutex if Thread#kill-ed
       [not found] <redmine.issue-14999.20180816125954@ruby-lang.org>
@ 2018-08-16 12:59 ` eregontp
  2018-08-16 20:04   ` [ruby-core:88505] " Eric Wong
  2018-08-17 23:54 ` [ruby-core:88523] [Ruby trunk Bug#14999] ConditionVariable doesn't reacquire " eregontp
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 21+ messages in thread
From: eregontp @ 2018-08-16 12:59 UTC (permalink / raw
  To: ruby-core

Issue #14999 has been reported by Eregon (Benoit Daloze).

----------------------------------------
Bug #14999: ConditionVariable doesn't require the Mutex if Thread#kill-ed
https://bugs.ruby-lang.org/issues/14999

* Author: Eregon (Benoit Daloze)
* Status: Open
* Priority: Normal
* Assignee: normalperson (Eric Wong)
* Target version: 
* ruby -v: ruby 2.6.0dev (2018-08-16 trunk 64394) [x86_64-linux]
* Backport: 2.3: UNKNOWN, 2.4: UNKNOWN, 2.5: UNKNOWN
----------------------------------------
These specs reproduce the issue:
https://raw.githubusercontent.com/oracle/truffleruby/master/spec/ruby/library/conditionvariable/wait_spec.rb

I can commit them, but of course they will fail.
Feel free to just copy it to MRI's spec/ruby/library/conditionvariable/wait_spec.rb
(I would do it whens synchronizing specs anyway)

I'd guess it's caused by the recent timer thread changes or so.
This spec works fine on 2.5.1 and older.

Just copy-pasting the spec out allows for a standalone reproducer:

~~~ ruby
m = Mutex.new
cv = ConditionVariable.new
in_synchronize = false
owned = nil

th = Thread.new do
  m.synchronize do
    in_synchronize = true
    begin
      cv.wait(m)
    ensure
      owned = m.owned?
      $stderr.puts "\nThe Thread doesn't own the Mutex!" unless owned
    end
  end
end

# wait for m to acquire the mutex
Thread.pass until in_synchronize
# wait until th is sleeping (ie waiting)
Thread.pass while th.status and th.status != "sleep"

m.synchronize {
  cv.signal
  # Wait that the thread is blocked on acquiring the Mutex
  sleep 0.001
  # Kill the thread, yet the thread should first acquire the Mutex before going on
  th.kill
}

th.join
~~~

Result on trunk:
~~~
The Thread doesn't own the Mutex!
#<Thread:0x0000565216f4ba28@cv_bug.rb:6 aborting> terminated with exception (report_on_exception is true):
Traceback (most recent call last):
	1: from cv_bug.rb:7:in `block in <main>'
cv_bug.rb:7:in `synchronize': Attempt to unlock a mutex which is not locked (ThreadError)
Traceback (most recent call last):
	1: from cv_bug.rb:7:in `block in <main>'
cv_bug.rb:7:in `synchronize': Attempt to unlock a mutex which is not locked (ThreadError)
~~~



-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [ruby-core:88505] Re: [Ruby trunk Bug#14999] ConditionVariable doesn't require the Mutex if Thread#kill-ed
  2018-08-16 12:59 ` [ruby-core:88503] [Ruby trunk Bug#14999] ConditionVariable doesn't require the Mutex if Thread#kill-ed eregontp
@ 2018-08-16 20:04   ` Eric Wong
  0 siblings, 0 replies; 21+ messages in thread
From: Eric Wong @ 2018-08-16 20:04 UTC (permalink / raw
  To: ruby-core

eregontp@gmail.com wrote:
> Bug #14999: ConditionVariable doesn't require the Mutex if Thread#kill-ed
> https://bugs.ruby-lang.org/issues/14999

r64398

Thank you; it was actually a regression introduced while fixing
[Bug #14841].  I was going to investigate this today (same as
[ruby-core:88498]), anyways; but you saved me a bunch of time.

Btw, I haven't come up with a better reproducer for [Bug #14841]
in the test suite.  It takes forever :/

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [ruby-core:88523] [Ruby trunk Bug#14999] ConditionVariable doesn't reacquire the Mutex if Thread#kill-ed
       [not found] <redmine.issue-14999.20180816125954@ruby-lang.org>
  2018-08-16 12:59 ` [ruby-core:88503] [Ruby trunk Bug#14999] ConditionVariable doesn't require the Mutex if Thread#kill-ed eregontp
@ 2018-08-17 23:54 ` eregontp
  2018-08-18  0:31   ` [ruby-core:88524] " Eric Wong
  2018-08-18 11:05 ` [ruby-core:88540] " eregontp
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 21+ messages in thread
From: eregontp @ 2018-08-17 23:54 UTC (permalink / raw
  To: ruby-core

Issue #14999 has been updated by Eregon (Benoit Daloze).


@normalperson I added the specs in r64409.
However I just saw that the spec failed once on Ubuntu:
https://rubyci.org/logs/rubyci.s3.amazonaws.com/ubuntu/ruby-trunk/log/20180817T213003Z.fail.html.gz
Any idea?

----------------------------------------
Bug #14999: ConditionVariable doesn't reacquire the Mutex if Thread#kill-ed
https://bugs.ruby-lang.org/issues/14999#change-73580

* Author: Eregon (Benoit Daloze)
* Status: Closed
* Priority: Normal
* Assignee: normalperson (Eric Wong)
* Target version: 
* ruby -v: ruby 2.6.0dev (2018-08-16 trunk 64394) [x86_64-linux]
* Backport: 2.3: UNKNOWN, 2.4: UNKNOWN, 2.5: UNKNOWN
----------------------------------------
These specs reproduce the issue:
https://raw.githubusercontent.com/oracle/truffleruby/master/spec/ruby/library/conditionvariable/wait_spec.rb

I can commit them, but of course they will fail.
Feel free to just copy it to MRI's spec/ruby/library/conditionvariable/wait_spec.rb
(I would do it when synchronizing specs anyway)

I'd guess it's caused by the recent timer thread changes or so.
This spec works fine on 2.5.1 and older.

Just copy-pasting the spec out allows for a standalone reproducer:

~~~ ruby
m = Mutex.new
cv = ConditionVariable.new
in_synchronize = false
owned = nil

th = Thread.new do
  m.synchronize do
    in_synchronize = true
    begin
      cv.wait(m)
    ensure
      owned = m.owned?
      $stderr.puts "\nThe Thread doesn't own the Mutex!" unless owned
    end
  end
end

# wait for m to acquire the mutex
Thread.pass until in_synchronize
# wait until th is sleeping (ie waiting)
Thread.pass while th.status and th.status != "sleep"

m.synchronize {
  cv.signal
  # Wait that the thread is blocked on acquiring the Mutex
  sleep 0.001
  # Kill the thread, yet the thread should first acquire the Mutex before going on
  th.kill
}

th.join
~~~

Result on trunk:
~~~
The Thread doesn't own the Mutex!
#<Thread:0x0000565216f4ba28@cv_bug.rb:6 aborting> terminated with exception (report_on_exception is true):
Traceback (most recent call last):
	1: from cv_bug.rb:7:in `block in <main>'
cv_bug.rb:7:in `synchronize': Attempt to unlock a mutex which is not locked (ThreadError)
Traceback (most recent call last):
	1: from cv_bug.rb:7:in `block in <main>'
cv_bug.rb:7:in `synchronize': Attempt to unlock a mutex which is not locked (ThreadError)
~~~



-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [ruby-core:88524] Re: [Ruby trunk Bug#14999] ConditionVariable doesn't reacquire the Mutex if Thread#kill-ed
  2018-08-17 23:54 ` [ruby-core:88523] [Ruby trunk Bug#14999] ConditionVariable doesn't reacquire " eregontp
@ 2018-08-18  0:31   ` Eric Wong
  2018-08-18  4:35     ` [ruby-core:88533] " Eric Wong
  0 siblings, 1 reply; 21+ messages in thread
From: Eric Wong @ 2018-08-18  0:31 UTC (permalink / raw
  To: ruby-core

eregontp@gmail.com wrote:
> @normalperson I added the specs in r64409.
> However I just saw that the spec failed once on Ubuntu:
> https://rubyci.org/logs/rubyci.s3.amazonaws.com/ubuntu/ruby-trunk/log/20180817T213003Z.fail.html.gz

It might be because of rb_funcallv switching and hitting
interrupts.  I think this should fix it, but it can make
an incompatibility if somebody relies on redefining
Mutex#sleep...


```
diff --git a/thread_sync.c b/thread_sync.c
index 5e511af0db..87c17316a9 100644
--- a/thread_sync.c
+++ b/thread_sync.c
@@ -1358,7 +1358,7 @@ static VALUE
 do_sleep(VALUE args)
 {
     struct sleep_call *p = (struct sleep_call *)args;
-    return rb_funcallv(p->mutex, id_sleep, 1, &p->timeout);
+    return rb_mutex_sleep(p->mutex, p->timeout);
 }
 
 static VALUE
```

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [ruby-core:88533] Re: [Ruby trunk Bug#14999] ConditionVariable doesn't reacquire the Mutex if Thread#kill-ed
  2018-08-18  0:31   ` [ruby-core:88524] " Eric Wong
@ 2018-08-18  4:35     ` Eric Wong
  2018-08-18  6:29       ` [ruby-core:88536] " Eric Wong
  0 siblings, 1 reply; 21+ messages in thread
From: Eric Wong @ 2018-08-18  4:35 UTC (permalink / raw
  To: ruby-core

Eric Wong wrote:
> eregontp@gmail.com wrote:
> > @normalperson I added the specs in r64409.
> > However I just saw that the spec failed once on Ubuntu:
> > https://rubyci.org/logs/rubyci.s3.amazonaws.com/ubuntu/ruby-trunk/log/20180817T213003Z.fail.html.gz
> 
> It might be because of rb_funcallv switching and hitting
> interrupts.  I think this should fix it, but it can make
> an incompatibility if somebody relies on redefining
> Mutex#sleep...

Committed a version which only avoids switching on r64436
We'll see if CI alerts are quieter, now.

However, I don't know what to do about Mutex_m since that still
hits the funcall path.  We can't blindly mask out interrupts
because we need Mutex#sleep to react to Thread#run.

Also, please don't add similar specs for Mutex_m, yet.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [ruby-core:88536] Re: [Ruby trunk Bug#14999] ConditionVariable doesn't reacquire the Mutex if Thread#kill-ed
  2018-08-18  4:35     ` [ruby-core:88533] " Eric Wong
@ 2018-08-18  6:29       ` Eric Wong
  0 siblings, 0 replies; 21+ messages in thread
From: Eric Wong @ 2018-08-18  6:29 UTC (permalink / raw
  To: ruby-core

Eric Wong wrote:
> Committed a version which only avoids switching on r64436
> We'll see if CI alerts are quieter, now.

No good.  Now testing:
  https://80x24.org/spew/20180818062657.3424-1-e@80x24.org/raw
(should commit when done testing, but I need food :x)

Also, https://bugs.ruby-lang.org/issues/15002 should
fix spurious wakeups from ConditionVariable#wait which
might cause some spec failures under MJIT.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [ruby-core:88540] [Ruby trunk Bug#14999] ConditionVariable doesn't reacquire the Mutex if Thread#kill-ed
       [not found] <redmine.issue-14999.20180816125954@ruby-lang.org>
  2018-08-16 12:59 ` [ruby-core:88503] [Ruby trunk Bug#14999] ConditionVariable doesn't require the Mutex if Thread#kill-ed eregontp
  2018-08-17 23:54 ` [ruby-core:88523] [Ruby trunk Bug#14999] ConditionVariable doesn't reacquire " eregontp
@ 2018-08-18 11:05 ` eregontp
  2018-08-18 11:36   ` [ruby-core:88541] " Eric Wong
  2018-08-18 13:50 ` [ruby-core:88542] " eregontp
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 21+ messages in thread
From: eregontp @ 2018-08-18 11:05 UTC (permalink / raw
  To: ruby-core

Issue #14999 has been updated by Eregon (Benoit Daloze).


What was wrong with the first patch?
It looks good from a quick glance, although indeed it doesn't deal with Mutex_m.
Maybe Thread.handle_interrupt(Object => :on_blocking) (or the equivalent in C) would help?

> No good. Now testing:
> https://80x24.org/spew/20180818062657.3424-1-e@80x24.org/raw

I don't think that's OK for semantics.
IMHO, anything in the synchronize block should hold the Mutex, otherwise we break users' assumptions.
So even if there is an exception waking ConditionVariable#wait, I believe users expect to own the Mutex during that exception, until getting out of the synchronize.
Otherwise, it's very surprising if the ensure block sometimes has, sometimes doesn't have the Mutex and so can't reliably modify data structures protected by the Mutex.
I also believe this was the behavior on Ruby 2.5 and older, but I might be wrong (at least I couldn't observe the spec to fail in thousands of tries).

----------------------------------------
Bug #14999: ConditionVariable doesn't reacquire the Mutex if Thread#kill-ed
https://bugs.ruby-lang.org/issues/14999#change-73598

* Author: Eregon (Benoit Daloze)
* Status: Closed
* Priority: Normal
* Assignee: normalperson (Eric Wong)
* Target version: 
* ruby -v: ruby 2.6.0dev (2018-08-16 trunk 64394) [x86_64-linux]
* Backport: 2.3: UNKNOWN, 2.4: UNKNOWN, 2.5: UNKNOWN
----------------------------------------
These specs reproduce the issue:
https://raw.githubusercontent.com/oracle/truffleruby/master/spec/ruby/library/conditionvariable/wait_spec.rb

I can commit them, but of course they will fail.
Feel free to just copy it to MRI's spec/ruby/library/conditionvariable/wait_spec.rb
(I would do it when synchronizing specs anyway)

I'd guess it's caused by the recent timer thread changes or so.
This spec works fine on 2.5.1 and older.

Just copy-pasting the spec out allows for a standalone reproducer:

~~~ ruby
m = Mutex.new
cv = ConditionVariable.new
in_synchronize = false
owned = nil

th = Thread.new do
  m.synchronize do
    in_synchronize = true
    begin
      cv.wait(m)
    ensure
      owned = m.owned?
      $stderr.puts "\nThe Thread doesn't own the Mutex!" unless owned
    end
  end
end

# wait for m to acquire the mutex
Thread.pass until in_synchronize
# wait until th is sleeping (ie waiting)
Thread.pass while th.status and th.status != "sleep"

m.synchronize {
  cv.signal
  # Wait that the thread is blocked on acquiring the Mutex
  sleep 0.001
  # Kill the thread, yet the thread should first acquire the Mutex before going on
  th.kill
}

th.join
~~~

Result on trunk:
~~~
The Thread doesn't own the Mutex!
#<Thread:0x0000565216f4ba28@cv_bug.rb:6 aborting> terminated with exception (report_on_exception is true):
Traceback (most recent call last):
	1: from cv_bug.rb:7:in `block in <main>'
cv_bug.rb:7:in `synchronize': Attempt to unlock a mutex which is not locked (ThreadError)
Traceback (most recent call last):
	1: from cv_bug.rb:7:in `block in <main>'
cv_bug.rb:7:in `synchronize': Attempt to unlock a mutex which is not locked (ThreadError)
~~~



-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [ruby-core:88541] Re: [Ruby trunk Bug#14999] ConditionVariable doesn't reacquire the Mutex if Thread#kill-ed
  2018-08-18 11:05 ` [ruby-core:88540] " eregontp
@ 2018-08-18 11:36   ` Eric Wong
  0 siblings, 0 replies; 21+ messages in thread
From: Eric Wong @ 2018-08-18 11:36 UTC (permalink / raw
  To: ruby-core

eregontp@gmail.com wrote:
> What was wrong with the first patch?

I still saw CI failures with it:

http://ci.rvm.jp/results/trunk_gcc7@silicon-docker/1234097

In retrospect, it could've also been fixed with [Feature #15002]

> It looks good from a quick glance, although indeed it doesn't deal with Mutex_m.
> Maybe Thread.handle_interrupt(Object => :on_blocking) (or the equivalent in C) would help?

I considered that, but the problem might stem from blocking in
an unexpected place (because CV#wait is being spuriously woken).

> > No good. Now testing:
> > https://80x24.org/spew/20180818062657.3424-1-e@80x24.org/raw
> 
> I don't think that's OK for semantics.
> IMHO, anything in the synchronize block should hold the Mutex, otherwise we break users' assumptions.
> So even if there is an exception waking ConditionVariable#wait, I believe users expect to own the Mutex during that exception, until getting out of the synchronize.
> Otherwise, it's very surprising if the ensure block sometimes has, sometimes doesn't have the Mutex and so can't reliably modify data structures protected by the Mutex.
> I also believe this was the behavior on Ruby 2.5 and older, but I might be wrong (at least I couldn't observe the spec to fail in thousands of tries).

OK, I didn't consider the ensure case.  So maybe you can revert
r64441 (I'm going AFK for a while), because there's a chance
r64444 also fixed the issue.

Last alert email I got was 2018-08-18 08:59Z, so it's been a
while since I got an alert (this is from ko1's ruby-alerts
list, I don't know if you're on it).

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [ruby-core:88542] [Ruby trunk Bug#14999] ConditionVariable doesn't reacquire the Mutex if Thread#kill-ed
       [not found] <redmine.issue-14999.20180816125954@ruby-lang.org>
                   ` (2 preceding siblings ...)
  2018-08-18 11:05 ` [ruby-core:88540] " eregontp
@ 2018-08-18 13:50 ` eregontp
  2018-08-18 15:56 ` [ruby-core:88543] " eregontp
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 21+ messages in thread
From: eregontp @ 2018-08-18 13:50 UTC (permalink / raw
  To: ruby-core

Issue #14999 has been updated by Eregon (Benoit Daloze).


OK, I'll revert r64441 since I think we should preserve the semantics of always being locked inside Mutex#synchronize, even if an exception interrupts ConditionVariable#wait.

Let's keep thinking about how to implement that correctly.
Mutex_m would be nice to have correct too, but I think it's less critical as I suspect very few people use it.

FWIW, we're trying to optimize ConditionVariable in TruffleRuby and it's proving quite tricky to:
* not miss ConditionVariable#signal / ConditionVariable#broadcast
* handles other interrupts in #wait
* make sure to reacquire the Mutex when going out of #wait
* still handle interrupts/thread switching in the re-acquire

Maybe the original Ruby code can be an inspiration here: see lib/thread.rb at r42801 or
https://github.com/ruby/ruby/blob/324df61eea3e4c107e419ea3c685eaea4da7fd5e/lib/thread.rb#L65-L81

----------------------------------------
Bug #14999: ConditionVariable doesn't reacquire the Mutex if Thread#kill-ed
https://bugs.ruby-lang.org/issues/14999#change-73601

* Author: Eregon (Benoit Daloze)
* Status: Open
* Priority: Normal
* Assignee: normalperson (Eric Wong)
* Target version: 
* ruby -v: ruby 2.6.0dev (2018-08-16 trunk 64394) [x86_64-linux]
* Backport: 2.3: UNKNOWN, 2.4: UNKNOWN, 2.5: UNKNOWN
----------------------------------------
These specs reproduce the issue:
https://raw.githubusercontent.com/oracle/truffleruby/master/spec/ruby/library/conditionvariable/wait_spec.rb

I can commit them, but of course they will fail.
Feel free to just copy it to MRI's spec/ruby/library/conditionvariable/wait_spec.rb
(I would do it when synchronizing specs anyway)

I'd guess it's caused by the recent timer thread changes or so.
This spec works fine on 2.5.1 and older.

Just copy-pasting the spec out allows for a standalone reproducer:

~~~ ruby
m = Mutex.new
cv = ConditionVariable.new
in_synchronize = false
owned = nil

th = Thread.new do
  m.synchronize do
    in_synchronize = true
    begin
      cv.wait(m)
    ensure
      owned = m.owned?
      $stderr.puts "\nThe Thread doesn't own the Mutex!" unless owned
    end
  end
end

# wait for m to acquire the mutex
Thread.pass until in_synchronize
# wait until th is sleeping (ie waiting)
Thread.pass while th.status and th.status != "sleep"

m.synchronize {
  cv.signal
  # Wait that the thread is blocked on acquiring the Mutex
  sleep 0.001
  # Kill the thread, yet the thread should first acquire the Mutex before going on
  th.kill
}

th.join
~~~

Result on trunk:
~~~
The Thread doesn't own the Mutex!
#<Thread:0x0000565216f4ba28@cv_bug.rb:6 aborting> terminated with exception (report_on_exception is true):
Traceback (most recent call last):
	1: from cv_bug.rb:7:in `block in <main>'
cv_bug.rb:7:in `synchronize': Attempt to unlock a mutex which is not locked (ThreadError)
Traceback (most recent call last):
	1: from cv_bug.rb:7:in `block in <main>'
cv_bug.rb:7:in `synchronize': Attempt to unlock a mutex which is not locked (ThreadError)
~~~



-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [ruby-core:88543] [Ruby trunk Bug#14999] ConditionVariable doesn't reacquire the Mutex if Thread#kill-ed
       [not found] <redmine.issue-14999.20180816125954@ruby-lang.org>
                   ` (3 preceding siblings ...)
  2018-08-18 13:50 ` [ruby-core:88542] " eregontp
@ 2018-08-18 15:56 ` eregontp
  2018-08-18 21:15   ` [ruby-core:88547] " Eric Wong
  2018-08-19  0:13 ` [ruby-core:88549] " eregontp
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 21+ messages in thread
From: eregontp @ 2018-08-18 15:56 UTC (permalink / raw
  To: ruby-core

Issue #14999 has been updated by Eregon (Benoit Daloze).


It failed for http://ci.rvm.jp/results/trunk_gcc6@silicon-docker/1235550

----------------------------------------
Bug #14999: ConditionVariable doesn't reacquire the Mutex if Thread#kill-ed
https://bugs.ruby-lang.org/issues/14999#change-73604

* Author: Eregon (Benoit Daloze)
* Status: Open
* Priority: Normal
* Assignee: normalperson (Eric Wong)
* Target version: 
* ruby -v: ruby 2.6.0dev (2018-08-16 trunk 64394) [x86_64-linux]
* Backport: 2.3: UNKNOWN, 2.4: UNKNOWN, 2.5: UNKNOWN
----------------------------------------
These specs reproduce the issue:
https://raw.githubusercontent.com/oracle/truffleruby/master/spec/ruby/library/conditionvariable/wait_spec.rb

I can commit them, but of course they will fail.
Feel free to just copy it to MRI's spec/ruby/library/conditionvariable/wait_spec.rb
(I would do it when synchronizing specs anyway)

I'd guess it's caused by the recent timer thread changes or so.
This spec works fine on 2.5.1 and older.

Just copy-pasting the spec out allows for a standalone reproducer:

~~~ ruby
m = Mutex.new
cv = ConditionVariable.new
in_synchronize = false
owned = nil

th = Thread.new do
  m.synchronize do
    in_synchronize = true
    begin
      cv.wait(m)
    ensure
      owned = m.owned?
      $stderr.puts "\nThe Thread doesn't own the Mutex!" unless owned
    end
  end
end

# wait for m to acquire the mutex
Thread.pass until in_synchronize
# wait until th is sleeping (ie waiting)
Thread.pass while th.status and th.status != "sleep"

m.synchronize {
  cv.signal
  # Wait that the thread is blocked on acquiring the Mutex
  sleep 0.001
  # Kill the thread, yet the thread should first acquire the Mutex before going on
  th.kill
}

th.join
~~~

Result on trunk:
~~~
The Thread doesn't own the Mutex!
#<Thread:0x0000565216f4ba28@cv_bug.rb:6 aborting> terminated with exception (report_on_exception is true):
Traceback (most recent call last):
	1: from cv_bug.rb:7:in `block in <main>'
cv_bug.rb:7:in `synchronize': Attempt to unlock a mutex which is not locked (ThreadError)
Traceback (most recent call last):
	1: from cv_bug.rb:7:in `block in <main>'
cv_bug.rb:7:in `synchronize': Attempt to unlock a mutex which is not locked (ThreadError)
~~~



-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [ruby-core:88547] Re: [Ruby trunk Bug#14999] ConditionVariable doesn't reacquire the Mutex if Thread#kill-ed
  2018-08-18 15:56 ` [ruby-core:88543] " eregontp
@ 2018-08-18 21:15   ` Eric Wong
  0 siblings, 0 replies; 21+ messages in thread
From: Eric Wong @ 2018-08-18 21:15 UTC (permalink / raw
  To: ruby-core

eregontp@gmail.com wrote:
> It failed for http://ci.rvm.jp/results/trunk_gcc6@silicon-docker/1235550

Maybe wishful thinking, but r64464 might be the right fix.

Time for Me#sleep

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [ruby-core:88549] [Ruby trunk Bug#14999] ConditionVariable doesn't reacquire the Mutex if Thread#kill-ed
       [not found] <redmine.issue-14999.20180816125954@ruby-lang.org>
                   ` (4 preceding siblings ...)
  2018-08-18 15:56 ` [ruby-core:88543] " eregontp
@ 2018-08-19  0:13 ` eregontp
  2018-08-19  9:57   ` [ruby-core:88552] " Eric Wong
  2018-08-19 21:42 ` [ruby-core:88557] " eregontp
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 21+ messages in thread
From: eregontp @ 2018-08-19  0:13 UTC (permalink / raw
  To: ruby-core

Issue #14999 has been updated by Eregon (Benoit Daloze).


normalperson (Eric Wong) wrote:
>  Maybe wishful thinking, but r64464 might be the right fix.

Seems it's not yet fully fixed unfortunately:
http://ci.rvm.jp/results/trunk_clang_39@silicon-docker/1236189
http://ci.rvm.jp/results/trunk_gcc5@silicon-docker/1236243
http://ci.rvm.jp/results/trunk-asserts-nopara@silicon-docker/1236250
http://ci.rvm.jp/results/trunk_gcc5@silicon-docker/1236262
http://ci.rvm.jp/results/trunk-mjit-wait@silicon-docker/1236304

I think we need to give some thinking on this,
and I don't want to stress you to fix it to fix the build.
(although this should be solved soon IMHO, latest before the next preview/RC)

So I propose to "tag" the spec so it won't run in CI, so you can focus on it when there is time.
I'll do that tomorrow if you agree.

It would be also be interesting to see if Ruby <= 2.5 actually ensured the semantics I described above.

----------------------------------------
Bug #14999: ConditionVariable doesn't reacquire the Mutex if Thread#kill-ed
https://bugs.ruby-lang.org/issues/14999#change-73610

* Author: Eregon (Benoit Daloze)
* Status: Closed
* Priority: Normal
* Assignee: normalperson (Eric Wong)
* Target version: 
* ruby -v: ruby 2.6.0dev (2018-08-16 trunk 64394) [x86_64-linux]
* Backport: 2.3: UNKNOWN, 2.4: UNKNOWN, 2.5: UNKNOWN
----------------------------------------
These specs reproduce the issue:
https://raw.githubusercontent.com/oracle/truffleruby/master/spec/ruby/library/conditionvariable/wait_spec.rb

I can commit them, but of course they will fail.
Feel free to just copy it to MRI's spec/ruby/library/conditionvariable/wait_spec.rb
(I would do it when synchronizing specs anyway)

I'd guess it's caused by the recent timer thread changes or so.
This spec works fine on 2.5.1 and older.

Just copy-pasting the spec out allows for a standalone reproducer:

~~~ ruby
m = Mutex.new
cv = ConditionVariable.new
in_synchronize = false
owned = nil

th = Thread.new do
  m.synchronize do
    in_synchronize = true
    begin
      cv.wait(m)
    ensure
      owned = m.owned?
      $stderr.puts "\nThe Thread doesn't own the Mutex!" unless owned
    end
  end
end

# wait for m to acquire the mutex
Thread.pass until in_synchronize
# wait until th is sleeping (ie waiting)
Thread.pass while th.status and th.status != "sleep"

m.synchronize {
  cv.signal
  # Wait that the thread is blocked on acquiring the Mutex
  sleep 0.001
  # Kill the thread, yet the thread should first acquire the Mutex before going on
  th.kill
}

th.join
~~~

Result on trunk:
~~~
The Thread doesn't own the Mutex!
#<Thread:0x0000565216f4ba28@cv_bug.rb:6 aborting> terminated with exception (report_on_exception is true):
Traceback (most recent call last):
	1: from cv_bug.rb:7:in `block in <main>'
cv_bug.rb:7:in `synchronize': Attempt to unlock a mutex which is not locked (ThreadError)
Traceback (most recent call last):
	1: from cv_bug.rb:7:in `block in <main>'
cv_bug.rb:7:in `synchronize': Attempt to unlock a mutex which is not locked (ThreadError)
~~~



-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [ruby-core:88552] Re: [Ruby trunk Bug#14999] ConditionVariable doesn't reacquire the Mutex if Thread#kill-ed
  2018-08-19  0:13 ` [ruby-core:88549] " eregontp
@ 2018-08-19  9:57   ` Eric Wong
  2018-08-19 19:44     ` [ruby-core:88556] " Eric Wong
  0 siblings, 1 reply; 21+ messages in thread
From: Eric Wong @ 2018-08-19  9:57 UTC (permalink / raw
  To: ruby-core

eregontp@gmail.com wrote:
> I think we need to give some thinking on this,
> and I don't want to stress you to fix it to fix the build.
> (although this should be solved soon IMHO, latest before the next preview/RC)

It needs to be fixed ASAP.

r64467 seems to make CI happy, at least (along with r64398,
which was an absolutely necessary fix).  The key difference with
r64467 fix is I figured out I needed to reproduce the failure
reliably by using a single CPU (schedtool -a 0x1 -e ...)

> So I propose to "tag" the spec so it won't run in CI, so you can focus on it when there is time.
> I'll do that tomorrow if you agree.

Please don't.  CI is happy at the moment, at least.

> It would be also be interesting to see if Ruby <= 2.5 actually
> ensured the semantics I described above.

I was able to reproduce the problem with the old
timer-thread using "schedtool -a 0x1".

"git bisect" points to r63498 ("thread_pthread.c: enable thread
cache by default"), which is HIGHLY unexpected.

I'm not seeing how thread caching can be the cause of the bug,
but rather it makes an existing bug more apparent by being
faster and hitting the race sooner.

The thread cache only operates at the pthreads level, so there's
no way any Ruby-level interrupt could hit across Ruby Thread
lifetimes.  pthread_kill is never used in the
Mutex/ConditionVariable code paths; and different pthreads
condvars are used for sleeping/wakeups.  So it is a mystery as
to how thread caching ends up affecting Ruby-level interrupt
handling...

What thread-caching does do is make the MSPECOPT "-R" option
noticeably faster

Fwiw, I can still reproduce the failures with low timeouts:

```
diff --git a/thread_pthread.c b/thread_pthread.c
index 2fd60ddd4a..e8da3ee9c2 100644
--- a/thread_pthread.c
+++ b/thread_pthread.c
@@ -187,7 +187,7 @@ do_gvl_timer(rb_vm_t *vm, rb_thread_t *th)
 
     if (vm->gvl.timer_err == ETIMEDOUT) {
         ts.tv_sec = 0;
-        ts.tv_nsec = TIME_QUANTUM_NSEC;
+        ts.tv_nsec = 1;
         ts = native_cond_timeout(&nd->cond.gvlq, ts);
     }
     vm->gvl.timer = th;
```

So there's definitely more to investigate until I'm satisified
with the reliability of this (but I'm too tired, now).

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [ruby-core:88556] Re: [Ruby trunk Bug#14999] ConditionVariable doesn't reacquire the Mutex if Thread#kill-ed
  2018-08-19  9:57   ` [ruby-core:88552] " Eric Wong
@ 2018-08-19 19:44     ` Eric Wong
  0 siblings, 0 replies; 21+ messages in thread
From: Eric Wong @ 2018-08-19 19:44 UTC (permalink / raw
  To: ruby-core

Eric Wong wrote:
> Fwiw, I can still reproduce the failures with low timeouts:
> 
> ```
> diff --git a/thread_pthread.c b/thread_pthread.c
> index 2fd60ddd4a..e8da3ee9c2 100644
> --- a/thread_pthread.c
> +++ b/thread_pthread.c
> @@ -187,7 +187,7 @@ do_gvl_timer(rb_vm_t *vm, rb_thread_t *th)
>  
>      if (vm->gvl.timer_err == ETIMEDOUT) {
>          ts.tv_sec = 0;
> -        ts.tv_nsec = TIME_QUANTUM_NSEC;
> +        ts.tv_nsec = 1;
>          ts = native_cond_timeout(&nd->cond.gvlq, ts);
>      }
>      vm->gvl.timer = th;
> ```
> 
> So there's definitely more to investigate until I'm satisified
> with the reliability of this (but I'm too tired, now).

I think the only way to work reliably is to disable interrupt
checking when attempting to lock a mutex in ensure:

  https://80x24.org/spew/20180819193954.13807-1-e@80x24.org/raw

We can rely on deadlock checking to prevent problems; but it
could still affect user-experience if Ctrl-C takes a while
*shrug*

Currently testing, will take a while.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [ruby-core:88557] [Ruby trunk Bug#14999] ConditionVariable doesn't reacquire the Mutex if Thread#kill-ed
       [not found] <redmine.issue-14999.20180816125954@ruby-lang.org>
                   ` (5 preceding siblings ...)
  2018-08-19  0:13 ` [ruby-core:88549] " eregontp
@ 2018-08-19 21:42 ` eregontp
  2018-08-19 23:06   ` [ruby-core:88558] " Eric Wong
  2018-08-20 13:30 ` [ruby-core:88570] " eregontp
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 21+ messages in thread
From: eregontp @ 2018-08-19 21:42 UTC (permalink / raw
  To: ruby-core

Issue #14999 has been updated by Eregon (Benoit Daloze).


normalperson (Eric Wong) wrote:
> r64467 seems to make CI happy, at least

Great!

> I figured out I needed to reproduce the failure
>  reliably by using a single CPU (schedtool -a 0x1 -e ...)

I could also verify it fails before your commit, and passes after with:

    taskset -c 1 mspec-run -R1000 library/conditionvariable/wait_spec.rb
  
>  "git bisect" points to r63498 ("thread_pthread.c: enable thread
>  cache by default"), which is HIGHLY unexpected.

So it seems it doesn't happen in Ruby <= 2.5 due to thread caching being disabled then?
 
>  I'm not seeing how thread caching can be the cause of the bug,
>  but rather it makes an existing bug more apparent by being
>  faster and hitting the race sooner.

No idea either, but I guess a bug could lurk in there.
I did experience a few weird bugs when doing thread caching in TruffleRuby
(e.g., inline caches based on pthread_self() are incorrect as multiple Ruby threads could use the same pthread thread)

>  What thread-caching does do is make the MSPECOPT "-R" option
>  noticeably faster

Right, because creating threads is so much faster.
Maybe the fact pthread startup is done each time has some effect on races
(e.g., the pthread thread has a perfect view of the caller thread when starting,
but not true with caching, but maybe the GVL still ensure that)

> I think the only way to work reliably is to disable interrupt
> checking when attempting to lock a mutex in ensure:

At least if the interrupt causes to escape the function (e.g., Thread#raise/Thread#kill),
then that escape must be delayed after re-acquiring the Mutex.

In normal Mutex#lock and Mutex#synchronize, it's fine to raise/kill before taking the Mutex,
because it's before entering the critical section, but unfortunately here we are inside the critical section.

> it could still affect user-experience if Ctrl-C takes a while

When would that happen?
If some other Thread holds the Mutex too long?
It's probably never a good idea to hold a Mutex for seconds or a long time, especially when using a ConditionVariable.
It seems fair enough to delay the interrupt in such a case, as the main Thread is waiting for the Mutex.
Maybe we could warn if re-acquiring takes too long and it becomes a problem in practice (I think it won't).

----------------------------------------
Bug #14999: ConditionVariable doesn't reacquire the Mutex if Thread#kill-ed
https://bugs.ruby-lang.org/issues/14999#change-73615

* Author: Eregon (Benoit Daloze)
* Status: Closed
* Priority: Normal
* Assignee: normalperson (Eric Wong)
* Target version: 
* ruby -v: ruby 2.6.0dev (2018-08-16 trunk 64394) [x86_64-linux]
* Backport: 2.3: UNKNOWN, 2.4: UNKNOWN, 2.5: UNKNOWN
----------------------------------------
These specs reproduce the issue:
https://raw.githubusercontent.com/oracle/truffleruby/master/spec/ruby/library/conditionvariable/wait_spec.rb

I can commit them, but of course they will fail.
Feel free to just copy it to MRI's spec/ruby/library/conditionvariable/wait_spec.rb
(I would do it when synchronizing specs anyway)

I'd guess it's caused by the recent timer thread changes or so.
This spec works fine on 2.5.1 and older.

Just copy-pasting the spec out allows for a standalone reproducer:

~~~ ruby
m = Mutex.new
cv = ConditionVariable.new
in_synchronize = false
owned = nil

th = Thread.new do
  m.synchronize do
    in_synchronize = true
    begin
      cv.wait(m)
    ensure
      owned = m.owned?
      $stderr.puts "\nThe Thread doesn't own the Mutex!" unless owned
    end
  end
end

# wait for m to acquire the mutex
Thread.pass until in_synchronize
# wait until th is sleeping (ie waiting)
Thread.pass while th.status and th.status != "sleep"

m.synchronize {
  cv.signal
  # Wait that the thread is blocked on acquiring the Mutex
  sleep 0.001
  # Kill the thread, yet the thread should first acquire the Mutex before going on
  th.kill
}

th.join
~~~

Result on trunk:
~~~
The Thread doesn't own the Mutex!
#<Thread:0x0000565216f4ba28@cv_bug.rb:6 aborting> terminated with exception (report_on_exception is true):
Traceback (most recent call last):
	1: from cv_bug.rb:7:in `block in <main>'
cv_bug.rb:7:in `synchronize': Attempt to unlock a mutex which is not locked (ThreadError)
Traceback (most recent call last):
	1: from cv_bug.rb:7:in `block in <main>'
cv_bug.rb:7:in `synchronize': Attempt to unlock a mutex which is not locked (ThreadError)
~~~



-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [ruby-core:88558] Re: [Ruby trunk Bug#14999] ConditionVariable doesn't reacquire the Mutex if Thread#kill-ed
  2018-08-19 21:42 ` [ruby-core:88557] " eregontp
@ 2018-08-19 23:06   ` Eric Wong
  0 siblings, 0 replies; 21+ messages in thread
From: Eric Wong @ 2018-08-19 23:06 UTC (permalink / raw
  To: ruby-core

eregontp@gmail.com wrote:
> normalperson (Eric Wong) wrote:
> >  "git bisect" points to r63498 ("thread_pthread.c: enable
> >  thread cache by default"), which is HIGHLY unexpected.
> 
> So it seems it doesn't happen in Ruby <= 2.5 due to thread
> caching being disabled then?

Right; but that might just be "luck"...

> >  I'm not seeing how thread caching can be the cause of the
> >  bug, but rather it makes an existing bug more apparent by
> >  being faster and hitting the race sooner.
> 
> No idea either, but I guess a bug could lurk in there.  I did
> experience a few weird bugs when doing thread caching in
> TruffleRuby (e.g., inline caches based on pthread_self() are
> incorrect as multiple Ruby threads could use the same pthread
> thread)

At least for the core VM and standard library; I don't think
pthread_self is a factor; and we already clobber our TSD
pointer since r63836

> >  What thread-caching does do is make the MSPECOPT "-R"
> >  option noticeably faster
> 
> Right, because creating threads is so much faster.  Maybe the
> fact pthread startup is done each time has some effect on
> races (e.g., the pthread thread has a perfect view of the
> caller thread when starting, but not true with caching, but
> maybe the GVL still ensure that)

Yes, it might affect accounting done in the kernel; particularly
when bound to a single CPU.  And it still takes -R1000 to
reliably reproduce the problem with a single CPU; -R500 was
sometimes successful, even.

> > I think the only way to work reliably is to disable
> > interrupt checking when attempting to lock a mutex in
> > ensure:
> 
> At least if the interrupt causes to escape the function (e.g.,
> Thread#raise/Thread#kill), then that escape must be delayed
> after re-acquiring the Mutex.

Yes, so I've committed r64476

> In normal Mutex#lock and Mutex#synchronize, it's fine to
> raise/kill before taking the Mutex, because it's before
> entering the critical section, but unfortunately here we are
> inside the critical section.

So this is why pthread_mutex_lock is forbidden from returning
EINTR by POSIX.

Mutex#lock/synchronize should have been uninterruptible, too;
but it's too late to change that as it would break compatibility
with existing code (bootstraptest/test_thread.rb already breaks).

> > it could still affect user-experience if Ctrl-C takes a while
> 
> When would that happen?
> If some other Thread holds the Mutex too long?

Yes, but I suppose it's not a real problem.

> It's probably never a good idea to hold a Mutex for seconds or
> a long time, especially when using a ConditionVariable.
> It seems fair enough to delay the interrupt in such a case, as
> the main Thread is waiting for the Mutex.

Agreed(*).

> Maybe we could warn if re-acquiring takes too long and it
> becomes a problem in practice (I think it won't).

We already have deadlock checking to alert users to real
problems, so it's probably not a problem in practice.


(*) Along those lines, I think the "lock" idiom for GVL is not
    ideal for performance (kosaki rewrote the GVL for 1.9.3 to
    optimize for contention as a result).  I might try replacing
    the GVL with a platform-independent scheduler which limits
    parallelism to the data structures the GVL currently protects.

    Note: This will NOT solve the parallelism problem which
    exists in C Ruby; that is a much bigger task (but still
    doable with a scheduler, and perhaps even more so).

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [ruby-core:88570] [Ruby trunk Bug#14999] ConditionVariable doesn't reacquire the Mutex if Thread#kill-ed
       [not found] <redmine.issue-14999.20180816125954@ruby-lang.org>
                   ` (6 preceding siblings ...)
  2018-08-19 21:42 ` [ruby-core:88557] " eregontp
@ 2018-08-20 13:30 ` eregontp
  2018-08-20 20:05   ` [ruby-core:88577] " Eric Wong
  2018-09-05  7:05 ` [ruby-core:88852] " lars
  2018-09-05 12:18 ` [ruby-core:88860] " eregontp
  9 siblings, 1 reply; 21+ messages in thread
From: eregontp @ 2018-08-20 13:30 UTC (permalink / raw
  To: ruby-core

Issue #14999 has been updated by Eregon (Benoit Daloze).


normalperson (Eric Wong) wrote:
>  (*) Along those lines, I think the "lock" idiom for GVL is not
>  ideal for performance (kosaki rewrote the GVL for 1.9.3 to
>  optimize for contention as a result).  I might try replacing
>  the GVL with a platform-independent scheduler which limits
>  parallelism to the data structures the GVL currently protects.

I don't follow, which data structures would the scheduler protect?
Internal VM data structures but not e.g. Ruby Hash?

>  Note: This will NOT solve the parallelism problem which
>  exists in C Ruby; that is a much bigger task (but still
>  doable with a scheduler, and perhaps even more so).

Now I am curious, how would a scheduler solve the
lack of parallelism at the Ruby level in CRuby?

I'm particularly interested since I looked at concurrency in Ruby
and specifically TruffleRuby during my PhD, and how to have GIL-like
guarantees and yet allow to run Ruby code in parallel.

----------------------------------------
Bug #14999: ConditionVariable doesn't reacquire the Mutex if Thread#kill-ed
https://bugs.ruby-lang.org/issues/14999#change-73627

* Author: Eregon (Benoit Daloze)
* Status: Closed
* Priority: Normal
* Assignee: normalperson (Eric Wong)
* Target version: 
* ruby -v: ruby 2.6.0dev (2018-08-16 trunk 64394) [x86_64-linux]
* Backport: 2.3: UNKNOWN, 2.4: UNKNOWN, 2.5: UNKNOWN
----------------------------------------
These specs reproduce the issue:
https://raw.githubusercontent.com/oracle/truffleruby/master/spec/ruby/library/conditionvariable/wait_spec.rb

I can commit them, but of course they will fail.
Feel free to just copy it to MRI's spec/ruby/library/conditionvariable/wait_spec.rb
(I would do it when synchronizing specs anyway)

I'd guess it's caused by the recent timer thread changes or so.
This spec works fine on 2.5.1 and older.

Just copy-pasting the spec out allows for a standalone reproducer:

~~~ ruby
m = Mutex.new
cv = ConditionVariable.new
in_synchronize = false
owned = nil

th = Thread.new do
  m.synchronize do
    in_synchronize = true
    begin
      cv.wait(m)
    ensure
      owned = m.owned?
      $stderr.puts "\nThe Thread doesn't own the Mutex!" unless owned
    end
  end
end

# wait for m to acquire the mutex
Thread.pass until in_synchronize
# wait until th is sleeping (ie waiting)
Thread.pass while th.status and th.status != "sleep"

m.synchronize {
  cv.signal
  # Wait that the thread is blocked on acquiring the Mutex
  sleep 0.001
  # Kill the thread, yet the thread should first acquire the Mutex before going on
  th.kill
}

th.join
~~~

Result on trunk:
~~~
The Thread doesn't own the Mutex!
#<Thread:0x0000565216f4ba28@cv_bug.rb:6 aborting> terminated with exception (report_on_exception is true):
Traceback (most recent call last):
	1: from cv_bug.rb:7:in `block in <main>'
cv_bug.rb:7:in `synchronize': Attempt to unlock a mutex which is not locked (ThreadError)
Traceback (most recent call last):
	1: from cv_bug.rb:7:in `block in <main>'
cv_bug.rb:7:in `synchronize': Attempt to unlock a mutex which is not locked (ThreadError)
~~~



-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [ruby-core:88577] Re: [Ruby trunk Bug#14999] ConditionVariable doesn't reacquire the Mutex if Thread#kill-ed
  2018-08-20 13:30 ` [ruby-core:88570] " eregontp
@ 2018-08-20 20:05   ` Eric Wong
  0 siblings, 0 replies; 21+ messages in thread
From: Eric Wong @ 2018-08-20 20:05 UTC (permalink / raw
  To: ruby-core

eregontp@gmail.com wrote:
> normalperson (Eric Wong) wrote:
> >  (*) Along those lines, I think the "lock" idiom for GVL is not
> >  ideal for performance (kosaki rewrote the GVL for 1.9.3 to
> >  optimize for contention as a result).  I might try replacing
> >  the GVL with a platform-independent scheduler which limits
> >  parallelism to the data structures the GVL currently protects.
> 
> I don't follow, which data structures would the scheduler protect?
> Internal VM data structures but not e.g. Ruby Hash?

Everything the GVL currently protects, including the Ruby Hash.

> >  Note: This will NOT solve the parallelism problem which
> >  exists in C Ruby; that is a much bigger task (but still
> >  doable with a scheduler, and perhaps even more so).
> 
> Now I am curious, how would a scheduler solve the
> lack of parallelism at the Ruby level in CRuby?

It might allow cheaper entry/exit for blocking regions;
so we can use more blocking regions to achieve parallelism.

GVL release/acquire is expensive at the moment; so there's
places where we hold the GVL (e.g. readdir calls) and sacrifice
parallelism when dealing with slow filesystems.

Also, pthread_cond_wait is just like ConditionVariable#wait in
that it must re-lock the mutex after sleeping.  This re-lock
can be contended, and it should not be necessary if we move
away from condvars for some platforms (Linux and win32).
The signaller (FUTEX_WAKE/SetEvent) can remove from the waitqueue
while it's holding the lock, so the sleeper won't have to.

Anyways, just a small idea in my head and totally unproven.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [ruby-core:88852] [Ruby trunk Bug#14999] ConditionVariable doesn't reacquire the Mutex if Thread#kill-ed
       [not found] <redmine.issue-14999.20180816125954@ruby-lang.org>
                   ` (7 preceding siblings ...)
  2018-08-20 13:30 ` [ruby-core:88570] " eregontp
@ 2018-09-05  7:05 ` lars
  2018-09-06 17:33   ` [ruby-core:88884] " Eric Wong
  2018-09-05 12:18 ` [ruby-core:88860] " eregontp
  9 siblings, 1 reply; 21+ messages in thread
From: lars @ 2018-09-05  7:05 UTC (permalink / raw
  To: ruby-core

Issue #14999 has been updated by larskanis (Lars Kanis).


> >    So it seems it doesn't happen in Ruby <= 2.5 due to thread
> >    caching being disabled then?
>
> Right; but that might just be "luck"...

I get this error on ruby-2.4.4 [i386-mingw32] and [i386-mingw32] while nightly builds of RubyInstaller: https://ci.appveyor.com/project/larskanis/rubyinstaller2-hbuor/build/1.0.722/job/dva7ug4vako5jhrl#L1295
It appears with a probability of around 50%.

----------------------------------------
Bug #14999: ConditionVariable doesn't reacquire the Mutex if Thread#kill-ed
https://bugs.ruby-lang.org/issues/14999#change-73891

* Author: Eregon (Benoit Daloze)
* Status: Closed
* Priority: Normal
* Assignee: normalperson (Eric Wong)
* Target version: 
* ruby -v: ruby 2.6.0dev (2018-08-16 trunk 64394) [x86_64-linux]
* Backport: 2.3: UNKNOWN, 2.4: UNKNOWN, 2.5: UNKNOWN
----------------------------------------
These specs reproduce the issue:
https://raw.githubusercontent.com/oracle/truffleruby/master/spec/ruby/library/conditionvariable/wait_spec.rb

I can commit them, but of course they will fail.
Feel free to just copy it to MRI's spec/ruby/library/conditionvariable/wait_spec.rb
(I would do it when synchronizing specs anyway)

I'd guess it's caused by the recent timer thread changes or so.
This spec works fine on 2.5.1 and older.

Just copy-pasting the spec out allows for a standalone reproducer:

~~~ ruby
m = Mutex.new
cv = ConditionVariable.new
in_synchronize = false
owned = nil

th = Thread.new do
  m.synchronize do
    in_synchronize = true
    begin
      cv.wait(m)
    ensure
      owned = m.owned?
      $stderr.puts "\nThe Thread doesn't own the Mutex!" unless owned
    end
  end
end

# wait for m to acquire the mutex
Thread.pass until in_synchronize
# wait until th is sleeping (ie waiting)
Thread.pass while th.status and th.status != "sleep"

m.synchronize {
  cv.signal
  # Wait that the thread is blocked on acquiring the Mutex
  sleep 0.001
  # Kill the thread, yet the thread should first acquire the Mutex before going on
  th.kill
}

th.join
~~~

Result on trunk:
~~~
The Thread doesn't own the Mutex!
#<Thread:0x0000565216f4ba28@cv_bug.rb:6 aborting> terminated with exception (report_on_exception is true):
Traceback (most recent call last):
	1: from cv_bug.rb:7:in `block in <main>'
cv_bug.rb:7:in `synchronize': Attempt to unlock a mutex which is not locked (ThreadError)
Traceback (most recent call last):
	1: from cv_bug.rb:7:in `block in <main>'
cv_bug.rb:7:in `synchronize': Attempt to unlock a mutex which is not locked (ThreadError)
~~~



-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [ruby-core:88860] [Ruby trunk Bug#14999] ConditionVariable doesn't reacquire the Mutex if Thread#kill-ed
       [not found] <redmine.issue-14999.20180816125954@ruby-lang.org>
                   ` (8 preceding siblings ...)
  2018-09-05  7:05 ` [ruby-core:88852] " lars
@ 2018-09-05 12:18 ` eregontp
  9 siblings, 0 replies; 21+ messages in thread
From: eregontp @ 2018-09-05 12:18 UTC (permalink / raw
  To: ruby-core

Issue #14999 has been updated by Eregon (Benoit Daloze).


@larskanis Thanks for the report.
I also noticed it happens on RubyCI for Ruby 2.4 and earlier
(example: https://rubyci.org/logs/rubyci.s3.amazonaws.com/arch/ruby-2.4/log/20180905T105615Z.fail.html.gz)

I'm not sure what's best.

@normalperson Do you think we could backport the fix (or the logic behind the fix since the code changed quite a bit) to Ruby 2.4 and 2.3?
Interestingly enough, it seems to not fail on Ruby 2.5 at least in recent RubyCI builds.

For now, I'll mark the spec as ruby_bug on Ruby < 2.5.

----------------------------------------
Bug #14999: ConditionVariable doesn't reacquire the Mutex if Thread#kill-ed
https://bugs.ruby-lang.org/issues/14999#change-73899

* Author: Eregon (Benoit Daloze)
* Status: Closed
* Priority: Normal
* Assignee: normalperson (Eric Wong)
* Target version: 
* ruby -v: ruby 2.6.0dev (2018-08-16 trunk 64394) [x86_64-linux]
* Backport: 2.3: UNKNOWN, 2.4: UNKNOWN, 2.5: UNKNOWN
----------------------------------------
These specs reproduce the issue:
https://raw.githubusercontent.com/oracle/truffleruby/master/spec/ruby/library/conditionvariable/wait_spec.rb

I can commit them, but of course they will fail.
Feel free to just copy it to MRI's spec/ruby/library/conditionvariable/wait_spec.rb
(I would do it when synchronizing specs anyway)

I'd guess it's caused by the recent timer thread changes or so.
This spec works fine on 2.5.1 and older.

Just copy-pasting the spec out allows for a standalone reproducer:

~~~ ruby
m = Mutex.new
cv = ConditionVariable.new
in_synchronize = false
owned = nil

th = Thread.new do
  m.synchronize do
    in_synchronize = true
    begin
      cv.wait(m)
    ensure
      owned = m.owned?
      $stderr.puts "\nThe Thread doesn't own the Mutex!" unless owned
    end
  end
end

# wait for m to acquire the mutex
Thread.pass until in_synchronize
# wait until th is sleeping (ie waiting)
Thread.pass while th.status and th.status != "sleep"

m.synchronize {
  cv.signal
  # Wait that the thread is blocked on acquiring the Mutex
  sleep 0.001
  # Kill the thread, yet the thread should first acquire the Mutex before going on
  th.kill
}

th.join
~~~

Result on trunk:
~~~
The Thread doesn't own the Mutex!
#<Thread:0x0000565216f4ba28@cv_bug.rb:6 aborting> terminated with exception (report_on_exception is true):
Traceback (most recent call last):
	1: from cv_bug.rb:7:in `block in <main>'
cv_bug.rb:7:in `synchronize': Attempt to unlock a mutex which is not locked (ThreadError)
Traceback (most recent call last):
	1: from cv_bug.rb:7:in `block in <main>'
cv_bug.rb:7:in `synchronize': Attempt to unlock a mutex which is not locked (ThreadError)
~~~



-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [ruby-core:88884] Re: [Ruby trunk Bug#14999] ConditionVariable doesn't reacquire the Mutex if Thread#kill-ed
  2018-09-05  7:05 ` [ruby-core:88852] " lars
@ 2018-09-06 17:33   ` Eric Wong
  0 siblings, 0 replies; 21+ messages in thread
From: Eric Wong @ 2018-09-06 17:33 UTC (permalink / raw
  To: ruby-core

lars@greiz-reinsdorf.de wrote:
> Issue #14999 has been updated by larskanis (Lars Kanis).
> > >    So it seems it doesn't happen in Ruby <= 2.5 due to thread
> > >    caching being disabled then?
> >
> > Right; but that might just be "luck"...
> 
> I get this error on ruby-2.4.4 [i386-mingw32] and [i386-mingw32] while nightly builds of RubyInstaller: https://ci.appveyor.com/project/larskanis/rubyinstaller2-hbuor/build/1.0.722/job/dva7ug4vako5jhrl#L1295
> It appears with a probability of around 50%.

Thanks for the reminder (this bugfix already feels like a year ago):

Backport requested: https://bugs.ruby-lang.org/issues/15084

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2018-09-06 17:34 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <redmine.issue-14999.20180816125954@ruby-lang.org>
2018-08-16 12:59 ` [ruby-core:88503] [Ruby trunk Bug#14999] ConditionVariable doesn't require the Mutex if Thread#kill-ed eregontp
2018-08-16 20:04   ` [ruby-core:88505] " Eric Wong
2018-08-17 23:54 ` [ruby-core:88523] [Ruby trunk Bug#14999] ConditionVariable doesn't reacquire " eregontp
2018-08-18  0:31   ` [ruby-core:88524] " Eric Wong
2018-08-18  4:35     ` [ruby-core:88533] " Eric Wong
2018-08-18  6:29       ` [ruby-core:88536] " Eric Wong
2018-08-18 11:05 ` [ruby-core:88540] " eregontp
2018-08-18 11:36   ` [ruby-core:88541] " Eric Wong
2018-08-18 13:50 ` [ruby-core:88542] " eregontp
2018-08-18 15:56 ` [ruby-core:88543] " eregontp
2018-08-18 21:15   ` [ruby-core:88547] " Eric Wong
2018-08-19  0:13 ` [ruby-core:88549] " eregontp
2018-08-19  9:57   ` [ruby-core:88552] " Eric Wong
2018-08-19 19:44     ` [ruby-core:88556] " Eric Wong
2018-08-19 21:42 ` [ruby-core:88557] " eregontp
2018-08-19 23:06   ` [ruby-core:88558] " Eric Wong
2018-08-20 13:30 ` [ruby-core:88570] " eregontp
2018-08-20 20:05   ` [ruby-core:88577] " Eric Wong
2018-09-05  7:05 ` [ruby-core:88852] " lars
2018-09-06 17:33   ` [ruby-core:88884] " Eric Wong
2018-09-05 12:18 ` [ruby-core:88860] " eregontp

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).