ruby-core@ruby-lang.org archive (unofficial mirror)
 help / color / mirror / Atom feed
* [ruby-core:108774] [Ruby master Bug#18818] SEGV (Fiber scheduler?)
@ 2022-06-05 21:39 nevans (Nicholas Evans)
  2022-06-06  0:47 ` [ruby-core:108776] " mame (Yusuke Endoh)
                   ` (6 more replies)
  0 siblings, 7 replies; 8+ messages in thread
From: nevans (Nicholas Evans) @ 2022-06-05 21:39 UTC (permalink / raw)
  To: ruby-core

Issue #18818 has been reported by nevans (Nicholas Evans).

----------------------------------------
Bug #18818: SEGV (Fiber scheduler?)
https://bugs.ruby-lang.org/issues/18818

* Author: nevans (Nicholas Evans)
* Status: Open
* Priority: Normal
* ruby -v: 3.1.2, 3.0.4, master
* Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN
----------------------------------------
The attached script (and/or others like it) can cause SEGV in 3.0, 3.1, and master.  It has always behaved as expected when I use `optflags=-O0`.

When I use it with `make run` on `master`:
```
./miniruby -I../lib -I. -I.ext/common  -r./x86_64-linux-fake  ../test.rb 
========================================================================
fiber_queue
completed in 0.00031349004711955786
========================================================================
fiber_sized_queue
../test.rb:62: [BUG] Segmentation fault at 0x0000000000000000
ruby 3.2.0dev (2022-06-05T06:18:26Z master 5ce0be022f) [x86_64-linux]

-- Control frame information -----------------------------------------------
c:0005 p:---- s:0023 e:000022 CFUNC  :%
c:0004 p:0031 s:0018 e:000015 METHOD ../test.rb:62 [FINISH]
c:0003 p:---- s:0010 e:000009 CFUNC  :pop
c:0002 p:0009 s:0006 e:000005 BLOCK  ../test.rb:154 [FINISH]
c:0001 p:---- s:0003 e:000002 (none) [FINISH]

-- Ruby level backtrace information ----------------------------------------
../test.rb:154:in `block (2 levels) in <main>'
../test.rb:154:in `pop'
../test.rb:62:in `unblock'
../test.rb:62:in `%'

-- Machine register context ------------------------------------------------
 RIP: 0x000055eae9ffa417 RBP: 0x00007f80aba855d8 RSP: 0x00007f80a9789598
 RAX: 0x000000000000009b RBX: 0x00007f80a9789628 RCX: 0x00007f80ab9c37a0
 RDX: 0x00007f80a97895c0 RDI: 0x0000000000000000 RSI: 0x000000000000009b
  R8: 0x0000000000000000  R9: 0x00007f80a97895c0 R10: 0x0000000055550083
 R11: 0x00007f80ac32ace0 R12: 0x00007f80aba855d8 R13: 0x00007f80ab9c3780
 R14: 0x00007f80a97895c0 R15: 0x000000000000009b EFL: 0x0000000000010202

-- C level backtrace information -------------------------------------------
./miniruby(rb_vm_bugreport+0x5cf) [0x55eaea06b0ef]
./miniruby(rb_bug_for_fatal_signal+0xec) [0x55eae9e4fc2c]
./miniruby(sigsegv+0x4d) [0x55eae9fba30d]
[0x7f80ac153520]
./miniruby(rb_id_table_lookup+0x7) [0x55eae9ffa417]
./miniruby(callable_method_entry+0x103) [0x55eaea046bd3]
./miniruby(vm_respond_to+0x3f) [0x55eaea056c1f]
./miniruby(rb_check_funcall_default_kw+0x19c) [0x55eaea05788c]
./miniruby(rb_check_convert_type_with_id+0x8e) [0x55eae9f1b85e]
./miniruby(rb_str_format_m+0x1a) [0x55eae9fce82a]
./miniruby(vm_call_cfunc_with_frame+0x127) [0x55eaea041ac7]
./miniruby(vm_exec_core+0x114) [0x55eaea05d684]
./miniruby(rb_vm_exec+0x187) [0x55eaea04e747]
./miniruby(rb_funcallv_scope+0x1b0) [0x55eaea05a770]
./miniruby(rb_fiber_scheduler_unblock+0x3e) [0x55eae9fb979e]
./miniruby(sync_wakeup+0x10d) [0x55eae9ffd45d]
./miniruby(rb_szqueue_pop+0xf5) [0x55eae9ffefd5]
./miniruby(vm_call_cfunc_with_frame+0x127) [0x55eaea041ac7]
./miniruby(vm_exec_core+0x114) [0x55eaea05d684]
./miniruby(rb_vm_exec+0x187) [0x55eaea04e747]
./miniruby(rb_vm_invoke_proc+0x5f) [0x55eaea05584f]
./miniruby(rb_fiber_start+0x1da) [0x55eae9e1e24a]
./miniruby(fiber_entry+0x0) [0x55eae9e1e550]

```

I've attached the rest of the VM dump.  `make runruby` gives a nearly identical dump.  I can post a core dump or `rr` recording, if needed.
_
I'm sorry I didn't simplify the script more; small, seemingly irrelevant changes can change the failure or allow it to pass.  Sometimes it raises a bizarre exception instead of SEGV, most commonly a NoMethodError which seemingly indicates that the local vars have been shifted or scrambled.  For example, this particular SEGV was caused by a guard clause checking that `unblock(blocker, fiber)` was given a Fiber object.  Here, that object is invalid, but I've seen it be a string or some other object from elsewhere in the process.

For comparison, this is what the script output should look like:
```
========================================================================
fiber_queue
completed in 0.00031569297425448895
========================================================================
fiber_sized_queue
completed in 0.1176840600091964
========================================================================
fiber_sized_queue2
completed in 0.19209402799606323
========================================================================
fiber_sized_queue3
completed in 0.21404067997355014
========================================================================
fiber_sized_queue4
completed in 0.30277197097893804
```

I was attempting to create some simple benchmarks for `Queue` and `SizedQueue` with fibers, to mimic `benchmark/vm_thread_*queue*.rb`.  I never completed the benchmarks because of this SEGV.  :)

---Files--------------------------------
test.rb (5.6 KB)
segv-master-5ce0be022f.txt (11.8 KB)


-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [ruby-core:108776] [Ruby master Bug#18818] SEGV (Fiber scheduler?)
  2022-06-05 21:39 [ruby-core:108774] [Ruby master Bug#18818] SEGV (Fiber scheduler?) nevans (Nicholas Evans)
@ 2022-06-06  0:47 ` mame (Yusuke Endoh)
  2022-06-06  5:17 ` [ruby-core:108781] " ioquatix (Samuel Williams)
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: mame (Yusuke Endoh) @ 2022-06-06  0:47 UTC (permalink / raw)
  To: ruby-core

Issue #18818 has been updated by mame (Yusuke Endoh).

Assignee set to ioquatix (Samuel Williams)

----------------------------------------
Bug #18818: SEGV (Fiber scheduler?)
https://bugs.ruby-lang.org/issues/18818#change-97841

* Author: nevans (Nicholas Evans)
* Status: Open
* Priority: Normal
* Assignee: ioquatix (Samuel Williams)
* ruby -v: 3.1.2, 3.0.4, master
* Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN
----------------------------------------
The attached script (and/or others like it) can cause SEGV in 3.0, 3.1, and master.  It has always behaved as expected when I use `optflags=-O0`.

When I use it with `make run` on `master`:
```
./miniruby -I../lib -I. -I.ext/common  -r./x86_64-linux-fake  ../test.rb 
========================================================================
fiber_queue
completed in 0.00031349004711955786
========================================================================
fiber_sized_queue
../test.rb:62: [BUG] Segmentation fault at 0x0000000000000000
ruby 3.2.0dev (2022-06-05T06:18:26Z master 5ce0be022f) [x86_64-linux]

-- Control frame information -----------------------------------------------
c:0005 p:---- s:0023 e:000022 CFUNC  :%
c:0004 p:0031 s:0018 e:000015 METHOD ../test.rb:62 [FINISH]
c:0003 p:---- s:0010 e:000009 CFUNC  :pop
c:0002 p:0009 s:0006 e:000005 BLOCK  ../test.rb:154 [FINISH]
c:0001 p:---- s:0003 e:000002 (none) [FINISH]

-- Ruby level backtrace information ----------------------------------------
../test.rb:154:in `block (2 levels) in <main>'
../test.rb:154:in `pop'
../test.rb:62:in `unblock'
../test.rb:62:in `%'

-- Machine register context ------------------------------------------------
 RIP: 0x000055eae9ffa417 RBP: 0x00007f80aba855d8 RSP: 0x00007f80a9789598
 RAX: 0x000000000000009b RBX: 0x00007f80a9789628 RCX: 0x00007f80ab9c37a0
 RDX: 0x00007f80a97895c0 RDI: 0x0000000000000000 RSI: 0x000000000000009b
  R8: 0x0000000000000000  R9: 0x00007f80a97895c0 R10: 0x0000000055550083
 R11: 0x00007f80ac32ace0 R12: 0x00007f80aba855d8 R13: 0x00007f80ab9c3780
 R14: 0x00007f80a97895c0 R15: 0x000000000000009b EFL: 0x0000000000010202

-- C level backtrace information -------------------------------------------
./miniruby(rb_vm_bugreport+0x5cf) [0x55eaea06b0ef]
./miniruby(rb_bug_for_fatal_signal+0xec) [0x55eae9e4fc2c]
./miniruby(sigsegv+0x4d) [0x55eae9fba30d]
[0x7f80ac153520]
./miniruby(rb_id_table_lookup+0x7) [0x55eae9ffa417]
./miniruby(callable_method_entry+0x103) [0x55eaea046bd3]
./miniruby(vm_respond_to+0x3f) [0x55eaea056c1f]
./miniruby(rb_check_funcall_default_kw+0x19c) [0x55eaea05788c]
./miniruby(rb_check_convert_type_with_id+0x8e) [0x55eae9f1b85e]
./miniruby(rb_str_format_m+0x1a) [0x55eae9fce82a]
./miniruby(vm_call_cfunc_with_frame+0x127) [0x55eaea041ac7]
./miniruby(vm_exec_core+0x114) [0x55eaea05d684]
./miniruby(rb_vm_exec+0x187) [0x55eaea04e747]
./miniruby(rb_funcallv_scope+0x1b0) [0x55eaea05a770]
./miniruby(rb_fiber_scheduler_unblock+0x3e) [0x55eae9fb979e]
./miniruby(sync_wakeup+0x10d) [0x55eae9ffd45d]
./miniruby(rb_szqueue_pop+0xf5) [0x55eae9ffefd5]
./miniruby(vm_call_cfunc_with_frame+0x127) [0x55eaea041ac7]
./miniruby(vm_exec_core+0x114) [0x55eaea05d684]
./miniruby(rb_vm_exec+0x187) [0x55eaea04e747]
./miniruby(rb_vm_invoke_proc+0x5f) [0x55eaea05584f]
./miniruby(rb_fiber_start+0x1da) [0x55eae9e1e24a]
./miniruby(fiber_entry+0x0) [0x55eae9e1e550]

```

I've attached the rest of the VM dump.  `make runruby` gives a nearly identical dump.  I can post a core dump or `rr` recording, if needed.
_
I'm sorry I didn't simplify the script more; small, seemingly irrelevant changes can change the failure or allow it to pass.  Sometimes it raises a bizarre exception instead of SEGV, most commonly a NoMethodError which seemingly indicates that the local vars have been shifted or scrambled.  For example, this particular SEGV was caused by a guard clause checking that `unblock(blocker, fiber)` was given a Fiber object.  Here, that object is invalid, but I've seen it be a string or some other object from elsewhere in the process.

For comparison, this is what the script output should look like:
```
========================================================================
fiber_queue
completed in 0.00031569297425448895
========================================================================
fiber_sized_queue
completed in 0.1176840600091964
========================================================================
fiber_sized_queue2
completed in 0.19209402799606323
========================================================================
fiber_sized_queue3
completed in 0.21404067997355014
========================================================================
fiber_sized_queue4
completed in 0.30277197097893804
```

I was attempting to create some simple benchmarks for `Queue` and `SizedQueue` with fibers, to mimic `benchmark/vm_thread_*queue*.rb`.  I never completed the benchmarks because of this SEGV.  :)

---Files--------------------------------
test.rb (5.6 KB)
segv-master-5ce0be022f.txt (11.8 KB)


-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [ruby-core:108781] [Ruby master Bug#18818] SEGV (Fiber scheduler?)
  2022-06-05 21:39 [ruby-core:108774] [Ruby master Bug#18818] SEGV (Fiber scheduler?) nevans (Nicholas Evans)
  2022-06-06  0:47 ` [ruby-core:108776] " mame (Yusuke Endoh)
@ 2022-06-06  5:17 ` ioquatix (Samuel Williams)
  2022-06-06  5:18 ` [ruby-core:108782] " ioquatix (Samuel Williams)
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: ioquatix (Samuel Williams) @ 2022-06-06  5:17 UTC (permalink / raw)
  To: ruby-core

Issue #18818 has been updated by ioquatix (Samuel Williams).


Unfortunately Mutex, Queue and probably other objects don't take a reference to the waiting fiber, so your fiber is being garbage collected after it's in the wait list. Then some other object is allocated and unblock is called with it (or not = SEGV).

You need to keep track of blocked fibers:

```ruby
def block...
  @blocked[fiber] = true
...

def unblock
  @blocked.delete(fiber)
...
```

This prevents the fiber from going out of scope.

----------------------------------------
Bug #18818: SEGV (Fiber scheduler?)
https://bugs.ruby-lang.org/issues/18818#change-97846

* Author: nevans (Nicholas Evans)
* Status: Open
* Priority: Normal
* Assignee: ioquatix (Samuel Williams)
* ruby -v: 3.1.2, 3.0.4, master
* Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN
----------------------------------------
The attached script (and/or others like it) can cause SEGV in 3.0, 3.1, and master.  It has always behaved as expected when I use `optflags=-O0`.

When I use it with `make run` on `master`:
```
./miniruby -I../lib -I. -I.ext/common  -r./x86_64-linux-fake  ../test.rb 
========================================================================
fiber_queue
completed in 0.00031349004711955786
========================================================================
fiber_sized_queue
../test.rb:62: [BUG] Segmentation fault at 0x0000000000000000
ruby 3.2.0dev (2022-06-05T06:18:26Z master 5ce0be022f) [x86_64-linux]

-- Control frame information -----------------------------------------------
c:0005 p:---- s:0023 e:000022 CFUNC  :%
c:0004 p:0031 s:0018 e:000015 METHOD ../test.rb:62 [FINISH]
c:0003 p:---- s:0010 e:000009 CFUNC  :pop
c:0002 p:0009 s:0006 e:000005 BLOCK  ../test.rb:154 [FINISH]
c:0001 p:---- s:0003 e:000002 (none) [FINISH]

-- Ruby level backtrace information ----------------------------------------
../test.rb:154:in `block (2 levels) in <main>'
../test.rb:154:in `pop'
../test.rb:62:in `unblock'
../test.rb:62:in `%'

-- Machine register context ------------------------------------------------
 RIP: 0x000055eae9ffa417 RBP: 0x00007f80aba855d8 RSP: 0x00007f80a9789598
 RAX: 0x000000000000009b RBX: 0x00007f80a9789628 RCX: 0x00007f80ab9c37a0
 RDX: 0x00007f80a97895c0 RDI: 0x0000000000000000 RSI: 0x000000000000009b
  R8: 0x0000000000000000  R9: 0x00007f80a97895c0 R10: 0x0000000055550083
 R11: 0x00007f80ac32ace0 R12: 0x00007f80aba855d8 R13: 0x00007f80ab9c3780
 R14: 0x00007f80a97895c0 R15: 0x000000000000009b EFL: 0x0000000000010202

-- C level backtrace information -------------------------------------------
./miniruby(rb_vm_bugreport+0x5cf) [0x55eaea06b0ef]
./miniruby(rb_bug_for_fatal_signal+0xec) [0x55eae9e4fc2c]
./miniruby(sigsegv+0x4d) [0x55eae9fba30d]
[0x7f80ac153520]
./miniruby(rb_id_table_lookup+0x7) [0x55eae9ffa417]
./miniruby(callable_method_entry+0x103) [0x55eaea046bd3]
./miniruby(vm_respond_to+0x3f) [0x55eaea056c1f]
./miniruby(rb_check_funcall_default_kw+0x19c) [0x55eaea05788c]
./miniruby(rb_check_convert_type_with_id+0x8e) [0x55eae9f1b85e]
./miniruby(rb_str_format_m+0x1a) [0x55eae9fce82a]
./miniruby(vm_call_cfunc_with_frame+0x127) [0x55eaea041ac7]
./miniruby(vm_exec_core+0x114) [0x55eaea05d684]
./miniruby(rb_vm_exec+0x187) [0x55eaea04e747]
./miniruby(rb_funcallv_scope+0x1b0) [0x55eaea05a770]
./miniruby(rb_fiber_scheduler_unblock+0x3e) [0x55eae9fb979e]
./miniruby(sync_wakeup+0x10d) [0x55eae9ffd45d]
./miniruby(rb_szqueue_pop+0xf5) [0x55eae9ffefd5]
./miniruby(vm_call_cfunc_with_frame+0x127) [0x55eaea041ac7]
./miniruby(vm_exec_core+0x114) [0x55eaea05d684]
./miniruby(rb_vm_exec+0x187) [0x55eaea04e747]
./miniruby(rb_vm_invoke_proc+0x5f) [0x55eaea05584f]
./miniruby(rb_fiber_start+0x1da) [0x55eae9e1e24a]
./miniruby(fiber_entry+0x0) [0x55eae9e1e550]

```

I've attached the rest of the VM dump.  `make runruby` gives a nearly identical dump.  I can post a core dump or `rr` recording, if needed.
_
I'm sorry I didn't simplify the script more; small, seemingly irrelevant changes can change the failure or allow it to pass.  Sometimes it raises a bizarre exception instead of SEGV, most commonly a NoMethodError which seemingly indicates that the local vars have been shifted or scrambled.  For example, this particular SEGV was caused by a guard clause checking that `unblock(blocker, fiber)` was given a Fiber object.  Here, that object is invalid, but I've seen it be a string or some other object from elsewhere in the process.

For comparison, this is what the script output should look like:
```
========================================================================
fiber_queue
completed in 0.00031569297425448895
========================================================================
fiber_sized_queue
completed in 0.1176840600091964
========================================================================
fiber_sized_queue2
completed in 0.19209402799606323
========================================================================
fiber_sized_queue3
completed in 0.21404067997355014
========================================================================
fiber_sized_queue4
completed in 0.30277197097893804
```

I was attempting to create some simple benchmarks for `Queue` and `SizedQueue` with fibers, to mimic `benchmark/vm_thread_*queue*.rb`.  I never completed the benchmarks because of this SEGV.  :)

---Files--------------------------------
test.rb (5.6 KB)
segv-master-5ce0be022f.txt (11.8 KB)


-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [ruby-core:108782] [Ruby master Bug#18818] SEGV (Fiber scheduler?)
  2022-06-05 21:39 [ruby-core:108774] [Ruby master Bug#18818] SEGV (Fiber scheduler?) nevans (Nicholas Evans)
  2022-06-06  0:47 ` [ruby-core:108776] " mame (Yusuke Endoh)
  2022-06-06  5:17 ` [ruby-core:108781] " ioquatix (Samuel Williams)
@ 2022-06-06  5:18 ` ioquatix (Samuel Williams)
  2022-06-06 16:40 ` [ruby-core:108784] " nevans (Nicholas Evans)
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: ioquatix (Samuel Williams) @ 2022-06-06  5:18 UTC (permalink / raw)
  To: ruby-core

Issue #18818 has been updated by ioquatix (Samuel Williams).


@ko1 I saw this problem because fiber is not retained while waiting, because we have waiting threads but not waiting fibers at VM level IIRC. Probably we need to make mutex/queue mark the wait list correctly? Is there performance issue?

----------------------------------------
Bug #18818: SEGV (Fiber scheduler?)
https://bugs.ruby-lang.org/issues/18818#change-97847

* Author: nevans (Nicholas Evans)
* Status: Open
* Priority: Normal
* Assignee: ioquatix (Samuel Williams)
* ruby -v: 3.1.2, 3.0.4, master
* Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN
----------------------------------------
The attached script (and/or others like it) can cause SEGV in 3.0, 3.1, and master.  It has always behaved as expected when I use `optflags=-O0`.

When I use it with `make run` on `master`:
```
./miniruby -I../lib -I. -I.ext/common  -r./x86_64-linux-fake  ../test.rb 
========================================================================
fiber_queue
completed in 0.00031349004711955786
========================================================================
fiber_sized_queue
../test.rb:62: [BUG] Segmentation fault at 0x0000000000000000
ruby 3.2.0dev (2022-06-05T06:18:26Z master 5ce0be022f) [x86_64-linux]

-- Control frame information -----------------------------------------------
c:0005 p:---- s:0023 e:000022 CFUNC  :%
c:0004 p:0031 s:0018 e:000015 METHOD ../test.rb:62 [FINISH]
c:0003 p:---- s:0010 e:000009 CFUNC  :pop
c:0002 p:0009 s:0006 e:000005 BLOCK  ../test.rb:154 [FINISH]
c:0001 p:---- s:0003 e:000002 (none) [FINISH]

-- Ruby level backtrace information ----------------------------------------
../test.rb:154:in `block (2 levels) in <main>'
../test.rb:154:in `pop'
../test.rb:62:in `unblock'
../test.rb:62:in `%'

-- Machine register context ------------------------------------------------
 RIP: 0x000055eae9ffa417 RBP: 0x00007f80aba855d8 RSP: 0x00007f80a9789598
 RAX: 0x000000000000009b RBX: 0x00007f80a9789628 RCX: 0x00007f80ab9c37a0
 RDX: 0x00007f80a97895c0 RDI: 0x0000000000000000 RSI: 0x000000000000009b
  R8: 0x0000000000000000  R9: 0x00007f80a97895c0 R10: 0x0000000055550083
 R11: 0x00007f80ac32ace0 R12: 0x00007f80aba855d8 R13: 0x00007f80ab9c3780
 R14: 0x00007f80a97895c0 R15: 0x000000000000009b EFL: 0x0000000000010202

-- C level backtrace information -------------------------------------------
./miniruby(rb_vm_bugreport+0x5cf) [0x55eaea06b0ef]
./miniruby(rb_bug_for_fatal_signal+0xec) [0x55eae9e4fc2c]
./miniruby(sigsegv+0x4d) [0x55eae9fba30d]
[0x7f80ac153520]
./miniruby(rb_id_table_lookup+0x7) [0x55eae9ffa417]
./miniruby(callable_method_entry+0x103) [0x55eaea046bd3]
./miniruby(vm_respond_to+0x3f) [0x55eaea056c1f]
./miniruby(rb_check_funcall_default_kw+0x19c) [0x55eaea05788c]
./miniruby(rb_check_convert_type_with_id+0x8e) [0x55eae9f1b85e]
./miniruby(rb_str_format_m+0x1a) [0x55eae9fce82a]
./miniruby(vm_call_cfunc_with_frame+0x127) [0x55eaea041ac7]
./miniruby(vm_exec_core+0x114) [0x55eaea05d684]
./miniruby(rb_vm_exec+0x187) [0x55eaea04e747]
./miniruby(rb_funcallv_scope+0x1b0) [0x55eaea05a770]
./miniruby(rb_fiber_scheduler_unblock+0x3e) [0x55eae9fb979e]
./miniruby(sync_wakeup+0x10d) [0x55eae9ffd45d]
./miniruby(rb_szqueue_pop+0xf5) [0x55eae9ffefd5]
./miniruby(vm_call_cfunc_with_frame+0x127) [0x55eaea041ac7]
./miniruby(vm_exec_core+0x114) [0x55eaea05d684]
./miniruby(rb_vm_exec+0x187) [0x55eaea04e747]
./miniruby(rb_vm_invoke_proc+0x5f) [0x55eaea05584f]
./miniruby(rb_fiber_start+0x1da) [0x55eae9e1e24a]
./miniruby(fiber_entry+0x0) [0x55eae9e1e550]

```

I've attached the rest of the VM dump.  `make runruby` gives a nearly identical dump.  I can post a core dump or `rr` recording, if needed.
_
I'm sorry I didn't simplify the script more; small, seemingly irrelevant changes can change the failure or allow it to pass.  Sometimes it raises a bizarre exception instead of SEGV, most commonly a NoMethodError which seemingly indicates that the local vars have been shifted or scrambled.  For example, this particular SEGV was caused by a guard clause checking that `unblock(blocker, fiber)` was given a Fiber object.  Here, that object is invalid, but I've seen it be a string or some other object from elsewhere in the process.

For comparison, this is what the script output should look like:
```
========================================================================
fiber_queue
completed in 0.00031569297425448895
========================================================================
fiber_sized_queue
completed in 0.1176840600091964
========================================================================
fiber_sized_queue2
completed in 0.19209402799606323
========================================================================
fiber_sized_queue3
completed in 0.21404067997355014
========================================================================
fiber_sized_queue4
completed in 0.30277197097893804
```

I was attempting to create some simple benchmarks for `Queue` and `SizedQueue` with fibers, to mimic `benchmark/vm_thread_*queue*.rb`.  I never completed the benchmarks because of this SEGV.  :)

---Files--------------------------------
test.rb (5.6 KB)
segv-master-5ce0be022f.txt (11.8 KB)


-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [ruby-core:108784] [Ruby master Bug#18818] SEGV (Fiber scheduler?)
  2022-06-05 21:39 [ruby-core:108774] [Ruby master Bug#18818] SEGV (Fiber scheduler?) nevans (Nicholas Evans)
                   ` (2 preceding siblings ...)
  2022-06-06  5:18 ` [ruby-core:108782] " ioquatix (Samuel Williams)
@ 2022-06-06 16:40 ` nevans (Nicholas Evans)
  2022-06-06 18:31 ` [ruby-core:108788] " nevans (Nicholas Evans)
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: nevans (Nicholas Evans) @ 2022-06-06 16:40 UTC (permalink / raw)
  To: ruby-core

Issue #18818 has been updated by nevans (Nicholas Evans).


Aha! Thanks, that makes perfect sense.  And it does indeed fix it.  I knew this toy scheduler wasn't *good*, and my original version did retain references to the waiting fibers, but I was slowly golfing it down to the smallest readable version that could possibly work.

IMO, pure ruby code should never need to worry about SEGV nor should a ruby method ever be called with garbage collected or reallocated values.  And the most obvious answer (to me) is that the mutex/queue wait lists should mark all waiting fibers.  I had already assumed that they did.

Probably there are scenarios where it's useful to allow waiting fibers to be GCed?  In that case, the fiber scheduler should still never be passed "fiber" arguments unless they really truly are fibers (and the correct fibers, in-case some other fiber is allocated).  Perhaps the fiber scheduler should be given a callback, e.g. `fiber_will_gc(fiber)`: not only would this give the scheduler an opportunity to do something before they are simply abandoned, it would make explicit to fiber scheduler users and implementers the expectation that they can be GCed and abandoned.  But that sounds a lot more complicated than simply adding it to the wait list, and I can't recall the scenarios where it would be useful.

----------------------------------------
Bug #18818: SEGV (Fiber scheduler?)
https://bugs.ruby-lang.org/issues/18818#change-97849

* Author: nevans (Nicholas Evans)
* Status: Open
* Priority: Normal
* Assignee: ioquatix (Samuel Williams)
* ruby -v: 3.1.2, 3.0.4, master
* Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN
----------------------------------------
The attached script (and/or others like it) can cause SEGV in 3.0, 3.1, and master.  It has always behaved as expected when I use `optflags=-O0`.

When I use it with `make run` on `master`:
```
./miniruby -I../lib -I. -I.ext/common  -r./x86_64-linux-fake  ../test.rb 
========================================================================
fiber_queue
completed in 0.00031349004711955786
========================================================================
fiber_sized_queue
../test.rb:62: [BUG] Segmentation fault at 0x0000000000000000
ruby 3.2.0dev (2022-06-05T06:18:26Z master 5ce0be022f) [x86_64-linux]

-- Control frame information -----------------------------------------------
c:0005 p:---- s:0023 e:000022 CFUNC  :%
c:0004 p:0031 s:0018 e:000015 METHOD ../test.rb:62 [FINISH]
c:0003 p:---- s:0010 e:000009 CFUNC  :pop
c:0002 p:0009 s:0006 e:000005 BLOCK  ../test.rb:154 [FINISH]
c:0001 p:---- s:0003 e:000002 (none) [FINISH]

-- Ruby level backtrace information ----------------------------------------
../test.rb:154:in `block (2 levels) in <main>'
../test.rb:154:in `pop'
../test.rb:62:in `unblock'
../test.rb:62:in `%'

-- Machine register context ------------------------------------------------
 RIP: 0x000055eae9ffa417 RBP: 0x00007f80aba855d8 RSP: 0x00007f80a9789598
 RAX: 0x000000000000009b RBX: 0x00007f80a9789628 RCX: 0x00007f80ab9c37a0
 RDX: 0x00007f80a97895c0 RDI: 0x0000000000000000 RSI: 0x000000000000009b
  R8: 0x0000000000000000  R9: 0x00007f80a97895c0 R10: 0x0000000055550083
 R11: 0x00007f80ac32ace0 R12: 0x00007f80aba855d8 R13: 0x00007f80ab9c3780
 R14: 0x00007f80a97895c0 R15: 0x000000000000009b EFL: 0x0000000000010202

-- C level backtrace information -------------------------------------------
./miniruby(rb_vm_bugreport+0x5cf) [0x55eaea06b0ef]
./miniruby(rb_bug_for_fatal_signal+0xec) [0x55eae9e4fc2c]
./miniruby(sigsegv+0x4d) [0x55eae9fba30d]
[0x7f80ac153520]
./miniruby(rb_id_table_lookup+0x7) [0x55eae9ffa417]
./miniruby(callable_method_entry+0x103) [0x55eaea046bd3]
./miniruby(vm_respond_to+0x3f) [0x55eaea056c1f]
./miniruby(rb_check_funcall_default_kw+0x19c) [0x55eaea05788c]
./miniruby(rb_check_convert_type_with_id+0x8e) [0x55eae9f1b85e]
./miniruby(rb_str_format_m+0x1a) [0x55eae9fce82a]
./miniruby(vm_call_cfunc_with_frame+0x127) [0x55eaea041ac7]
./miniruby(vm_exec_core+0x114) [0x55eaea05d684]
./miniruby(rb_vm_exec+0x187) [0x55eaea04e747]
./miniruby(rb_funcallv_scope+0x1b0) [0x55eaea05a770]
./miniruby(rb_fiber_scheduler_unblock+0x3e) [0x55eae9fb979e]
./miniruby(sync_wakeup+0x10d) [0x55eae9ffd45d]
./miniruby(rb_szqueue_pop+0xf5) [0x55eae9ffefd5]
./miniruby(vm_call_cfunc_with_frame+0x127) [0x55eaea041ac7]
./miniruby(vm_exec_core+0x114) [0x55eaea05d684]
./miniruby(rb_vm_exec+0x187) [0x55eaea04e747]
./miniruby(rb_vm_invoke_proc+0x5f) [0x55eaea05584f]
./miniruby(rb_fiber_start+0x1da) [0x55eae9e1e24a]
./miniruby(fiber_entry+0x0) [0x55eae9e1e550]

```

I've attached the rest of the VM dump.  `make runruby` gives a nearly identical dump.  I can post a core dump or `rr` recording, if needed.
_
I'm sorry I didn't simplify the script more; small, seemingly irrelevant changes can change the failure or allow it to pass.  Sometimes it raises a bizarre exception instead of SEGV, most commonly a NoMethodError which seemingly indicates that the local vars have been shifted or scrambled.  For example, this particular SEGV was caused by a guard clause checking that `unblock(blocker, fiber)` was given a Fiber object.  Here, that object is invalid, but I've seen it be a string or some other object from elsewhere in the process.

For comparison, this is what the script output should look like:
```
========================================================================
fiber_queue
completed in 0.00031569297425448895
========================================================================
fiber_sized_queue
completed in 0.1176840600091964
========================================================================
fiber_sized_queue2
completed in 0.19209402799606323
========================================================================
fiber_sized_queue3
completed in 0.21404067997355014
========================================================================
fiber_sized_queue4
completed in 0.30277197097893804
```

I was attempting to create some simple benchmarks for `Queue` and `SizedQueue` with fibers, to mimic `benchmark/vm_thread_*queue*.rb`.  I never completed the benchmarks because of this SEGV.  :)

---Files--------------------------------
test.rb (5.6 KB)
segv-master-5ce0be022f.txt (11.8 KB)


-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [ruby-core:108788] [Ruby master Bug#18818] SEGV (Fiber scheduler?)
  2022-06-05 21:39 [ruby-core:108774] [Ruby master Bug#18818] SEGV (Fiber scheduler?) nevans (Nicholas Evans)
                   ` (3 preceding siblings ...)
  2022-06-06 16:40 ` [ruby-core:108784] " nevans (Nicholas Evans)
@ 2022-06-06 18:31 ` nevans (Nicholas Evans)
  2022-07-09 22:44 ` [ruby-core:109174] " nevans (Nicholas Evans)
  2022-09-20 22:51 ` [ruby-core:109968] " ioquatix (Samuel Williams)
  6 siblings, 0 replies; 8+ messages in thread
From: nevans (Nicholas Evans) @ 2022-06-06 18:31 UTC (permalink / raw)
  To: ruby-core

Issue #18818 has been updated by nevans (Nicholas Evans).

File 0001-Mark-Fibers-in-Mutex-Queue-SizedQueue-wait-lists.patch added

There might be a good reason we don't want to do this (or want to do it differently).  But I can confirm that marking the fibers in the wait queues makes the SEGV go away! :D

Patch also at https://github.com/ruby/ruby/pull/5979

----------------------------------------
Bug #18818: SEGV (Fiber scheduler?)
https://bugs.ruby-lang.org/issues/18818#change-97855

* Author: nevans (Nicholas Evans)
* Status: Open
* Priority: Normal
* Assignee: ioquatix (Samuel Williams)
* ruby -v: 3.1.2, 3.0.4, master
* Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN
----------------------------------------
The attached script (and/or others like it) can cause SEGV in 3.0, 3.1, and master.  It has always behaved as expected when I use `optflags=-O0`.

When I use it with `make run` on `master`:
```
./miniruby -I../lib -I. -I.ext/common  -r./x86_64-linux-fake  ../test.rb 
========================================================================
fiber_queue
completed in 0.00031349004711955786
========================================================================
fiber_sized_queue
../test.rb:62: [BUG] Segmentation fault at 0x0000000000000000
ruby 3.2.0dev (2022-06-05T06:18:26Z master 5ce0be022f) [x86_64-linux]

-- Control frame information -----------------------------------------------
c:0005 p:---- s:0023 e:000022 CFUNC  :%
c:0004 p:0031 s:0018 e:000015 METHOD ../test.rb:62 [FINISH]
c:0003 p:---- s:0010 e:000009 CFUNC  :pop
c:0002 p:0009 s:0006 e:000005 BLOCK  ../test.rb:154 [FINISH]
c:0001 p:---- s:0003 e:000002 (none) [FINISH]

-- Ruby level backtrace information ----------------------------------------
../test.rb:154:in `block (2 levels) in <main>'
../test.rb:154:in `pop'
../test.rb:62:in `unblock'
../test.rb:62:in `%'

-- Machine register context ------------------------------------------------
 RIP: 0x000055eae9ffa417 RBP: 0x00007f80aba855d8 RSP: 0x00007f80a9789598
 RAX: 0x000000000000009b RBX: 0x00007f80a9789628 RCX: 0x00007f80ab9c37a0
 RDX: 0x00007f80a97895c0 RDI: 0x0000000000000000 RSI: 0x000000000000009b
  R8: 0x0000000000000000  R9: 0x00007f80a97895c0 R10: 0x0000000055550083
 R11: 0x00007f80ac32ace0 R12: 0x00007f80aba855d8 R13: 0x00007f80ab9c3780
 R14: 0x00007f80a97895c0 R15: 0x000000000000009b EFL: 0x0000000000010202

-- C level backtrace information -------------------------------------------
./miniruby(rb_vm_bugreport+0x5cf) [0x55eaea06b0ef]
./miniruby(rb_bug_for_fatal_signal+0xec) [0x55eae9e4fc2c]
./miniruby(sigsegv+0x4d) [0x55eae9fba30d]
[0x7f80ac153520]
./miniruby(rb_id_table_lookup+0x7) [0x55eae9ffa417]
./miniruby(callable_method_entry+0x103) [0x55eaea046bd3]
./miniruby(vm_respond_to+0x3f) [0x55eaea056c1f]
./miniruby(rb_check_funcall_default_kw+0x19c) [0x55eaea05788c]
./miniruby(rb_check_convert_type_with_id+0x8e) [0x55eae9f1b85e]
./miniruby(rb_str_format_m+0x1a) [0x55eae9fce82a]
./miniruby(vm_call_cfunc_with_frame+0x127) [0x55eaea041ac7]
./miniruby(vm_exec_core+0x114) [0x55eaea05d684]
./miniruby(rb_vm_exec+0x187) [0x55eaea04e747]
./miniruby(rb_funcallv_scope+0x1b0) [0x55eaea05a770]
./miniruby(rb_fiber_scheduler_unblock+0x3e) [0x55eae9fb979e]
./miniruby(sync_wakeup+0x10d) [0x55eae9ffd45d]
./miniruby(rb_szqueue_pop+0xf5) [0x55eae9ffefd5]
./miniruby(vm_call_cfunc_with_frame+0x127) [0x55eaea041ac7]
./miniruby(vm_exec_core+0x114) [0x55eaea05d684]
./miniruby(rb_vm_exec+0x187) [0x55eaea04e747]
./miniruby(rb_vm_invoke_proc+0x5f) [0x55eaea05584f]
./miniruby(rb_fiber_start+0x1da) [0x55eae9e1e24a]
./miniruby(fiber_entry+0x0) [0x55eae9e1e550]

```

I've attached the rest of the VM dump.  `make runruby` gives a nearly identical dump.  I can post a core dump or `rr` recording, if needed.
_
I'm sorry I didn't simplify the script more; small, seemingly irrelevant changes can change the failure or allow it to pass.  Sometimes it raises a bizarre exception instead of SEGV, most commonly a NoMethodError which seemingly indicates that the local vars have been shifted or scrambled.  For example, this particular SEGV was caused by a guard clause checking that `unblock(blocker, fiber)` was given a Fiber object.  Here, that object is invalid, but I've seen it be a string or some other object from elsewhere in the process.

For comparison, this is what the script output should look like:
```
========================================================================
fiber_queue
completed in 0.00031569297425448895
========================================================================
fiber_sized_queue
completed in 0.1176840600091964
========================================================================
fiber_sized_queue2
completed in 0.19209402799606323
========================================================================
fiber_sized_queue3
completed in 0.21404067997355014
========================================================================
fiber_sized_queue4
completed in 0.30277197097893804
```

I was attempting to create some simple benchmarks for `Queue` and `SizedQueue` with fibers, to mimic `benchmark/vm_thread_*queue*.rb`.  I never completed the benchmarks because of this SEGV.  :)

---Files--------------------------------
test.rb (5.6 KB)
segv-master-5ce0be022f.txt (11.8 KB)
0001-Mark-Fibers-in-Mutex-Queue-SizedQueue-wait-lists.patch (1.66 KB)


-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [ruby-core:109174] [Ruby master Bug#18818] SEGV (Fiber scheduler?)
  2022-06-05 21:39 [ruby-core:108774] [Ruby master Bug#18818] SEGV (Fiber scheduler?) nevans (Nicholas Evans)
                   ` (4 preceding siblings ...)
  2022-06-06 18:31 ` [ruby-core:108788] " nevans (Nicholas Evans)
@ 2022-07-09 22:44 ` nevans (Nicholas Evans)
  2022-09-20 22:51 ` [ruby-core:109968] " ioquatix (Samuel Williams)
  6 siblings, 0 replies; 8+ messages in thread
From: nevans (Nicholas Evans) @ 2022-07-09 22:44 UTC (permalink / raw)
  To: ruby-core

Issue #18818 has been updated by nevans (Nicholas Evans).

File 0001-Mark-blocked-fibers-in-waitq-Mutex-Queue-etc.patch added

IMO, neither ruby nor C should *ever* be in a position where they are referencing freed memory.  And, although `sync_waiter` lists check `waiter->th->status != THREAD_KILLED` when they are being traversed, that would be dangerous if the waiter's thread or fiber (including the stack the `sync_waiter` lives on) has been freed, right?

At first I thought this `SEGV` could only happen while using FiberScheduler with non-blocking fibers.  But dead threads can be GCed too, right?  So I suspect it can also happen if threads have been killed abnormally (skipping the `delete_from_waitq` code).  Forking is handled (`rb_thread_atfork` for mutex and `fork_gen` for queues and condition variables), but are there other scenarios where a thread could end without a chance to run its cleanup (e.g. `delete_from_waitq`)?  I'm more concerned with scenarios that can be created from within ruby alone than with C extensions or FFI.

One alternative: maintain "weak ref" relationship between wait queues and their blocked fibers.  If we added a waiter(s) field to `rb_fiber_t` we could then remove it(them) in `fiber_free`.  But weak pointers are inherently strange and surprising, even from the wait list of a queue or mutex (etc).  Not only would it need to be documented, I believe most developers will simply *assume* (like I did) that fibers can be reached by the queue or mutex (etc) that they are waiting on.  Anything else is surprising, *especially* if you've spent time in go (i.e. named channels connecting unnamed processes).

And, as far as I can tell, fibers are marked by the other two places in places in standard lib that use them:
* `enumerator.c`: `enumerator_mark` marks both `fib` (the enumerator fiber) and `dst` (the resuming fiber)
* `ext/monitor/monitor.c`: `monitor_mark` and `VALUE owner`

--------

I did look at Haskell's GC to try and understand how they use it for deadlock detection.  The simple version, translated slightly for ruby:
1. mvars and stm (for ruby: mutexes, queues, and condition variables) will mark blocked threads (fibers) as reachable.
2. thread/fiber schedulers do *not* mark blocked threads/fibers.
3. GC needs to have a special case for blocking threads/fibers:
   * After marking everything reachable from the GC roots, but before running any finalizers:
   * All living but unreachable (i.e. blocked and deadlocked) threads/fibers are "resurrected"
   * Marking continues for everything that is reachable from the resurrected threads/fibers.
   * The appropriate deadlock exception is raised in all resurrected threads (which unblocks them).
   * GC can move on to finalizers and sweeping *only after the resurrected fibers have been marked.*

Ruby's common memory use and object reference patterns are *very* different from Haskell's, and I suspect this would be less likely to catch deadlocks for ruby than it is for Haskell.  But if ruby *did* add something like this to its GC, I suspect that libraries would shift their usage patterns to make it more useful!

At any rate, adding deadlock detection to GC would still require marking blocked fibers as reachable from the object they are waiting for.  :)

--------

Also, I updated my PR and attached the patch here.

----------------------------------------
Bug #18818: SEGV (Fiber scheduler?)
https://bugs.ruby-lang.org/issues/18818#change-98314

* Author: nevans (Nicholas Evans)
* Status: Open
* Priority: Normal
* Assignee: ioquatix (Samuel Williams)
* ruby -v: 3.1.2, 3.0.4, master
* Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN
----------------------------------------
The attached script (and/or others like it) can cause SEGV in 3.0, 3.1, and master.  It has always behaved as expected when I use `optflags=-O0`.

When I use it with `make run` on `master`:
```
./miniruby -I../lib -I. -I.ext/common  -r./x86_64-linux-fake  ../test.rb 
========================================================================
fiber_queue
completed in 0.00031349004711955786
========================================================================
fiber_sized_queue
../test.rb:62: [BUG] Segmentation fault at 0x0000000000000000
ruby 3.2.0dev (2022-06-05T06:18:26Z master 5ce0be022f) [x86_64-linux]

-- Control frame information -----------------------------------------------
c:0005 p:---- s:0023 e:000022 CFUNC  :%
c:0004 p:0031 s:0018 e:000015 METHOD ../test.rb:62 [FINISH]
c:0003 p:---- s:0010 e:000009 CFUNC  :pop
c:0002 p:0009 s:0006 e:000005 BLOCK  ../test.rb:154 [FINISH]
c:0001 p:---- s:0003 e:000002 (none) [FINISH]

-- Ruby level backtrace information ----------------------------------------
../test.rb:154:in `block (2 levels) in <main>'
../test.rb:154:in `pop'
../test.rb:62:in `unblock'
../test.rb:62:in `%'

-- Machine register context ------------------------------------------------
 RIP: 0x000055eae9ffa417 RBP: 0x00007f80aba855d8 RSP: 0x00007f80a9789598
 RAX: 0x000000000000009b RBX: 0x00007f80a9789628 RCX: 0x00007f80ab9c37a0
 RDX: 0x00007f80a97895c0 RDI: 0x0000000000000000 RSI: 0x000000000000009b
  R8: 0x0000000000000000  R9: 0x00007f80a97895c0 R10: 0x0000000055550083
 R11: 0x00007f80ac32ace0 R12: 0x00007f80aba855d8 R13: 0x00007f80ab9c3780
 R14: 0x00007f80a97895c0 R15: 0x000000000000009b EFL: 0x0000000000010202

-- C level backtrace information -------------------------------------------
./miniruby(rb_vm_bugreport+0x5cf) [0x55eaea06b0ef]
./miniruby(rb_bug_for_fatal_signal+0xec) [0x55eae9e4fc2c]
./miniruby(sigsegv+0x4d) [0x55eae9fba30d]
[0x7f80ac153520]
./miniruby(rb_id_table_lookup+0x7) [0x55eae9ffa417]
./miniruby(callable_method_entry+0x103) [0x55eaea046bd3]
./miniruby(vm_respond_to+0x3f) [0x55eaea056c1f]
./miniruby(rb_check_funcall_default_kw+0x19c) [0x55eaea05788c]
./miniruby(rb_check_convert_type_with_id+0x8e) [0x55eae9f1b85e]
./miniruby(rb_str_format_m+0x1a) [0x55eae9fce82a]
./miniruby(vm_call_cfunc_with_frame+0x127) [0x55eaea041ac7]
./miniruby(vm_exec_core+0x114) [0x55eaea05d684]
./miniruby(rb_vm_exec+0x187) [0x55eaea04e747]
./miniruby(rb_funcallv_scope+0x1b0) [0x55eaea05a770]
./miniruby(rb_fiber_scheduler_unblock+0x3e) [0x55eae9fb979e]
./miniruby(sync_wakeup+0x10d) [0x55eae9ffd45d]
./miniruby(rb_szqueue_pop+0xf5) [0x55eae9ffefd5]
./miniruby(vm_call_cfunc_with_frame+0x127) [0x55eaea041ac7]
./miniruby(vm_exec_core+0x114) [0x55eaea05d684]
./miniruby(rb_vm_exec+0x187) [0x55eaea04e747]
./miniruby(rb_vm_invoke_proc+0x5f) [0x55eaea05584f]
./miniruby(rb_fiber_start+0x1da) [0x55eae9e1e24a]
./miniruby(fiber_entry+0x0) [0x55eae9e1e550]

```

I've attached the rest of the VM dump.  `make runruby` gives a nearly identical dump.  I can post a core dump or `rr` recording, if needed.
_
I'm sorry I didn't simplify the script more; small, seemingly irrelevant changes can change the failure or allow it to pass.  Sometimes it raises a bizarre exception instead of SEGV, most commonly a NoMethodError which seemingly indicates that the local vars have been shifted or scrambled.  For example, this particular SEGV was caused by a guard clause checking that `unblock(blocker, fiber)` was given a Fiber object.  Here, that object is invalid, but I've seen it be a string or some other object from elsewhere in the process.

For comparison, this is what the script output should look like:
```
========================================================================
fiber_queue
completed in 0.00031569297425448895
========================================================================
fiber_sized_queue
completed in 0.1176840600091964
========================================================================
fiber_sized_queue2
completed in 0.19209402799606323
========================================================================
fiber_sized_queue3
completed in 0.21404067997355014
========================================================================
fiber_sized_queue4
completed in 0.30277197097893804
```

I was attempting to create some simple benchmarks for `Queue` and `SizedQueue` with fibers, to mimic `benchmark/vm_thread_*queue*.rb`.  I never completed the benchmarks because of this SEGV.  :)

---Files--------------------------------
test.rb (5.6 KB)
segv-master-5ce0be022f.txt (11.8 KB)
0001-Mark-Fibers-in-Mutex-Queue-SizedQueue-wait-lists.patch (1.66 KB)
0001-Mark-blocked-fibers-in-waitq-Mutex-Queue-etc.patch (5.9 KB)


-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [ruby-core:109968] [Ruby master Bug#18818] SEGV (Fiber scheduler?)
  2022-06-05 21:39 [ruby-core:108774] [Ruby master Bug#18818] SEGV (Fiber scheduler?) nevans (Nicholas Evans)
                   ` (5 preceding siblings ...)
  2022-07-09 22:44 ` [ruby-core:109174] " nevans (Nicholas Evans)
@ 2022-09-20 22:51 ` ioquatix (Samuel Williams)
  6 siblings, 0 replies; 8+ messages in thread
From: ioquatix (Samuel Williams) @ 2022-09-20 22:51 UTC (permalink / raw)
  To: ruby-core

Issue #18818 has been updated by ioquatix (Samuel Williams).


@ko1 you mentioned you had some ideas to fix this. What would you like to do?

----------------------------------------
Bug #18818: SEGV (Fiber scheduler?)
https://bugs.ruby-lang.org/issues/18818#change-99218

* Author: nevans (Nicholas Evans)
* Status: Open
* Priority: Normal
* Assignee: ioquatix (Samuel Williams)
* ruby -v: 3.1.2, 3.0.4, master
* Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN
----------------------------------------
The attached script (and/or others like it) can cause SEGV in 3.0, 3.1, and master.  It has always behaved as expected when I use `optflags=-O0`.

When I use it with `make run` on `master`:
```
./miniruby -I../lib -I. -I.ext/common  -r./x86_64-linux-fake  ../test.rb 
========================================================================
fiber_queue
completed in 0.00031349004711955786
========================================================================
fiber_sized_queue
../test.rb:62: [BUG] Segmentation fault at 0x0000000000000000
ruby 3.2.0dev (2022-06-05T06:18:26Z master 5ce0be022f) [x86_64-linux]

-- Control frame information -----------------------------------------------
c:0005 p:---- s:0023 e:000022 CFUNC  :%
c:0004 p:0031 s:0018 e:000015 METHOD ../test.rb:62 [FINISH]
c:0003 p:---- s:0010 e:000009 CFUNC  :pop
c:0002 p:0009 s:0006 e:000005 BLOCK  ../test.rb:154 [FINISH]
c:0001 p:---- s:0003 e:000002 (none) [FINISH]

-- Ruby level backtrace information ----------------------------------------
../test.rb:154:in `block (2 levels) in <main>'
../test.rb:154:in `pop'
../test.rb:62:in `unblock'
../test.rb:62:in `%'

-- Machine register context ------------------------------------------------
 RIP: 0x000055eae9ffa417 RBP: 0x00007f80aba855d8 RSP: 0x00007f80a9789598
 RAX: 0x000000000000009b RBX: 0x00007f80a9789628 RCX: 0x00007f80ab9c37a0
 RDX: 0x00007f80a97895c0 RDI: 0x0000000000000000 RSI: 0x000000000000009b
  R8: 0x0000000000000000  R9: 0x00007f80a97895c0 R10: 0x0000000055550083
 R11: 0x00007f80ac32ace0 R12: 0x00007f80aba855d8 R13: 0x00007f80ab9c3780
 R14: 0x00007f80a97895c0 R15: 0x000000000000009b EFL: 0x0000000000010202

-- C level backtrace information -------------------------------------------
./miniruby(rb_vm_bugreport+0x5cf) [0x55eaea06b0ef]
./miniruby(rb_bug_for_fatal_signal+0xec) [0x55eae9e4fc2c]
./miniruby(sigsegv+0x4d) [0x55eae9fba30d]
[0x7f80ac153520]
./miniruby(rb_id_table_lookup+0x7) [0x55eae9ffa417]
./miniruby(callable_method_entry+0x103) [0x55eaea046bd3]
./miniruby(vm_respond_to+0x3f) [0x55eaea056c1f]
./miniruby(rb_check_funcall_default_kw+0x19c) [0x55eaea05788c]
./miniruby(rb_check_convert_type_with_id+0x8e) [0x55eae9f1b85e]
./miniruby(rb_str_format_m+0x1a) [0x55eae9fce82a]
./miniruby(vm_call_cfunc_with_frame+0x127) [0x55eaea041ac7]
./miniruby(vm_exec_core+0x114) [0x55eaea05d684]
./miniruby(rb_vm_exec+0x187) [0x55eaea04e747]
./miniruby(rb_funcallv_scope+0x1b0) [0x55eaea05a770]
./miniruby(rb_fiber_scheduler_unblock+0x3e) [0x55eae9fb979e]
./miniruby(sync_wakeup+0x10d) [0x55eae9ffd45d]
./miniruby(rb_szqueue_pop+0xf5) [0x55eae9ffefd5]
./miniruby(vm_call_cfunc_with_frame+0x127) [0x55eaea041ac7]
./miniruby(vm_exec_core+0x114) [0x55eaea05d684]
./miniruby(rb_vm_exec+0x187) [0x55eaea04e747]
./miniruby(rb_vm_invoke_proc+0x5f) [0x55eaea05584f]
./miniruby(rb_fiber_start+0x1da) [0x55eae9e1e24a]
./miniruby(fiber_entry+0x0) [0x55eae9e1e550]

```

I've attached the rest of the VM dump.  `make runruby` gives a nearly identical dump.  I can post a core dump or `rr` recording, if needed.
_
I'm sorry I didn't simplify the script more; small, seemingly irrelevant changes can change the failure or allow it to pass.  Sometimes it raises a bizarre exception instead of SEGV, most commonly a NoMethodError which seemingly indicates that the local vars have been shifted or scrambled.  For example, this particular SEGV was caused by a guard clause checking that `unblock(blocker, fiber)` was given a Fiber object.  Here, that object is invalid, but I've seen it be a string or some other object from elsewhere in the process.

For comparison, this is what the script output should look like:
```
========================================================================
fiber_queue
completed in 0.00031569297425448895
========================================================================
fiber_sized_queue
completed in 0.1176840600091964
========================================================================
fiber_sized_queue2
completed in 0.19209402799606323
========================================================================
fiber_sized_queue3
completed in 0.21404067997355014
========================================================================
fiber_sized_queue4
completed in 0.30277197097893804
```

I was attempting to create some simple benchmarks for `Queue` and `SizedQueue` with fibers, to mimic `benchmark/vm_thread_*queue*.rb`.  I never completed the benchmarks because of this SEGV.  :)

---Files--------------------------------
test.rb (5.6 KB)
segv-master-5ce0be022f.txt (11.8 KB)
0001-Mark-Fibers-in-Mutex-Queue-SizedQueue-wait-lists.patch (1.66 KB)
0001-Mark-blocked-fibers-in-waitq-Mutex-Queue-etc.patch (5.9 KB)


-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2022-09-20 22:51 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-05 21:39 [ruby-core:108774] [Ruby master Bug#18818] SEGV (Fiber scheduler?) nevans (Nicholas Evans)
2022-06-06  0:47 ` [ruby-core:108776] " mame (Yusuke Endoh)
2022-06-06  5:17 ` [ruby-core:108781] " ioquatix (Samuel Williams)
2022-06-06  5:18 ` [ruby-core:108782] " ioquatix (Samuel Williams)
2022-06-06 16:40 ` [ruby-core:108784] " nevans (Nicholas Evans)
2022-06-06 18:31 ` [ruby-core:108788] " nevans (Nicholas Evans)
2022-07-09 22:44 ` [ruby-core:109174] " nevans (Nicholas Evans)
2022-09-20 22:51 ` [ruby-core:109968] " ioquatix (Samuel Williams)

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).