ruby-core@ruby-lang.org archive (unofficial mirror)
* [ruby-core:90193] [Ruby trunk Bug#15362] [PATCH] Avoid GCing dead stack after switching away from a fiber
       [not found] <redmine.issue-15362.20181130202250@ruby-lang.org>
@ 2018-11-30 20:22 ` alanwucanada
  2018-11-30 22:47 ` [ruby-core:90196] " aselder
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 15+ messages in thread
From: alanwucanada @ 2018-11-30 20:22 UTC (permalink / raw)
  To: ruby-core

Issue #15362 has been reported by alanwu (Alan Wu).

----------------------------------------
Bug #15362: [PATCH] Avoid GCing dead stack after switching away from a fiber
https://bugs.ruby-lang.org/issues/15362

* Author: alanwu (Alan Wu)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 
* ruby -v: 
* Backport: 2.4: UNKNOWN, 2.5: UNKNOWN
----------------------------------------
Hello! I have a patch that fixes Bug #14561. It's not a platform-specific issue, but
it affects the default build configuration for MacOS and is causing segfaults on 2.5.x.
I've put the test for this in a separate patch because I'm not sure if we want to have
a 5-second test that only matters for non-default build configs and doesn't catch things reliably on Linux.

Please let me know if anything in my understanding is wrong. I've pasted my commit message below.

----

Fibers save execution contexts, and execution contexts include a native
stack pointer. It may happen that a Fiber outlives the native thread
it executed on. Consider the following code adapted from Bug #14561:

```ruby
enum = Enumerator.new { |y| y << 1 }
thread = Thread.new { enum.peek }  # fiber constructed inside the
                                   # block and saved inside `enum`
thread.join
sleep 5      # thread finishes and thread cache wait time runs out.
             # Native thread exits, possibly freeing its stack.
GC.start     # segfault because GC tries to mark the dangling stack pointer
             # inside `enum`'s fiber

```
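
The same lifetime relationship can be observed without an Enumerator. The following is a minimal sketch, not part of the patch or its test: it creates a plain Fiber inside a thread, lets the thread finish, and checks that the suspended fiber is still reachable when the GC runs. The variable names are illustrative only, and on an interpreter that includes the fix the `GC.start` call is simply expected to return.

```ruby
# Fiber#alive? needed `require "fiber"` on older Rubies; newer ones build it in.
begin
  require "fiber"
rescue LoadError
  # Already part of core on this Ruby.
end

fiber = nil
thread = Thread.new do
  fiber = Fiber.new do
    Fiber.yield :suspended   # park the fiber in the middle of its block
    :finished
  end
  fiber.resume               # runs inside this thread until the yield
end
thread.join

thread_alive = thread.alive?   # the thread that ran the fiber has finished
fiber_alive  = fiber.alive?    # the suspended fiber is still referenced here
GC.start                       # the GC has to mark `fiber` after its thread is gone
```

The fiber is kept alive only by the `fiber` local, much like the Enumerator above keeps its internal fiber alive after `thread` has been joined.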

The problem is masked by FIBER_USE_COROUTINE and FIBER_USE_NATIVE,
as those implementations already do what this commit does.
Generally on Linux systems, FIBER_USE_NATIVE is 1 even when
one uses `./configure --disable-fiber-coroutine`, since most
Linux systems have getcontext() and setcontext(), which
turn on FIBER_USE_NATIVE. (Compile with
`make DEFS="-DFIBER_USE_NATIVE=0"` to explicitly disable it.)

Furthermore, when both FIBER_USE_COROUTINE and FIBER_USE_NATIVE
are off, and the GC reads from the stack of a dead native
thread, MRI does not segfault on Linux. This is probably due to
libpthread not marking the page where the dead stack lives as
unreadable. Nevertheless, this use-after-free is visible through
Valgrind.

On ruby_2_5, this is an acute problem, since it doesn't have FIBER_USE_COROUTINE.
Thread cache is also unavailable for 2.5.x, triggering this issue
more often. (thread cache gives this bug a grace period since
it makes native threads wait a little before exiting)

This issue is very visible on MacOS on 2.5.x since libpthread marks
the dead stack as unreadable, consistently turning this use-after-free
into a segfault.

Fixes Bug #14561

 * cont.c: Set saved_ec.machine.stack_end to NULL when switching away from a
           fiber to keep the GC from marking it. `saved_ec` gets rehydrated with a
           stack pointer if/when the fiber runs again.
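
The idea behind the change can be illustrated with a toy model in Ruby. This is only a sketch under stated assumptions: `ToyContext`, `LIVE_STACKS`, `switch_away`, and `toy_mark` are hypothetical names that loosely echo the description above and are not MRI's C structures or functions. It shows why clearing the saved stack end on switch-away makes the marking step skip a stack that no longer exists.

```ruby
# Toy model: every name here is hypothetical and only loosely echoes cont.c.
ToyContext = Struct.new(:stack_owner, :stack_end)

# Stacks that still exist. A real process has no such registry, which is
# exactly why a stale pointer cannot be detected at mark time.
LIVE_STACKS = { main: true }

# Model of switching away from a fiber, with and without the fix.
def switch_away(ctx, clear_stack_end:)
  ctx.stack_end = nil if clear_stack_end
end

# Model of the GC marking a fiber's saved machine stack.
def toy_mark(ctx)
  return :skipped if ctx.stack_end.nil?   # nothing recorded, nothing to scan
  LIVE_STACKS[ctx.stack_owner] ? :scanned : :use_after_free
end

LIVE_STACKS[:worker] = true
unpatched = ToyContext.new(:worker, 0x7000)
patched   = ToyContext.new(:worker, 0x7000)

switch_away(unpatched, clear_stack_end: false)
switch_away(patched,   clear_stack_end: true)
LIVE_STACKS.delete(:worker)               # the worker's native thread exits

unpatched_result = toy_mark(unpatched)    # reads a stack that is gone
patched_result   = toy_mark(patched)      # skips the scan entirely
```

In this model the unpatched context still carries a stack end that points into a stack that has been released, while the patched one carries nothing to scan until the fiber runs again.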


---Files--------------------------------
0001-Add-a-test-for-Bug-14561.patch (1.21 KB)
0001-Avoid-GCing-dead-stack-after-switching-away-from-a-f.patch (2.63 KB)


-- 
https://bugs.ruby-lang.org/


* [ruby-core:90196] [Ruby trunk Bug#15362] [PATCH] Avoid GCing dead stack after switching away from a fiber
       [not found] <redmine.issue-15362.20181130202250@ruby-lang.org>
  2018-11-30 20:22 ` [ruby-core:90193] [Ruby trunk Bug#15362] [PATCH] Avoid GCing dead stack after switching away from a fiber alanwucanada
@ 2018-11-30 22:47 ` aselder
  2018-12-01  3:21 ` [ruby-core:90199] " samuel
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 15+ messages in thread
From: aselder @ 2018-11-30 22:47 UTC (permalink / raw)
  To: ruby-core

Issue #15362 has been updated by aselder (Andrew Selder).


Alan,

You're amazing... We're completely blocked on upgrading by this bug. You're a savior.

Thanks,

Andrew


----------------------------------------
Bug #15362: [PATCH] Avoid GCing dead stack after switching away from a fiber
https://bugs.ruby-lang.org/issues/15362#change-75318

* Author: alanwu (Alan Wu)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 
* ruby -v: 
* Backport: 2.4: UNKNOWN, 2.5: UNKNOWN
----------------------------------------
Hello! I have a patch that fixes Bug #14561. It's not a platform-specific issue, but
it affects the default build configuration for MacOS and is causing segfaults on 2.5.x.
I've put the test for this in a separate patch because I'm not sure if we want to have
a 5-second test that only matters for non-default build configs and doesn't catch things reliably on Linux.
I tested this on both trunk and ruby_2_5, on MacOS and on Linux, on various build configs.


* [ruby-core:90199] [Ruby trunk Bug#15362] [PATCH] Avoid GCing dead stack after switching away from a fiber
       [not found] <redmine.issue-15362.20181130202250@ruby-lang.org>
  2018-11-30 20:22 ` [ruby-core:90193] [Ruby trunk Bug#15362] [PATCH] Avoid GCing dead stack after switching away from a fiber alanwucanada
  2018-11-30 22:47 ` [ruby-core:90196] " aselder
@ 2018-12-01  3:21 ` samuel
  2018-12-01  3:51 ` [ruby-core:90200] " samuel
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 15+ messages in thread
From: samuel @ 2018-12-01  3:21 UTC (permalink / raw)
  To: ruby-core

Issue #15362 has been updated by ioquatix (Samuel Williams).

Assignee set to ioquatix (Samuel Williams)
Target version set to 2.6
Backport changed from 2.4: UNKNOWN, 2.5: UNKNOWN to 2.5: REQUIRED

I will take a look at it.


* [ruby-core:90200] [Ruby trunk Bug#15362] [PATCH] Avoid GCing dead stack after switching away from a fiber
       [not found] <redmine.issue-15362.20181130202250@ruby-lang.org>
                   ` (2 preceding siblings ...)
  2018-12-01  3:21 ` [ruby-core:90199] " samuel
@ 2018-12-01  3:51 ` samuel
  2018-12-01  4:57 ` [ruby-core:90201] " samuel
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 15+ messages in thread
From: samuel @ 2018-12-01  3:51 UTC (permalink / raw)
  To: ruby-core

Issue #15362 has been updated by ioquatix (Samuel Williams).


I looked at the patch and it seemed okay.

I committed it to trunk as r66111.

Once we've confirmed it solves the issue, I will backport it to the 2.5 branch; hopefully it can be included in 2.5.4 or the next patch release.


* [ruby-core:90201] [Ruby trunk Bug#15362] [PATCH] Avoid GCing dead stack after switching away from a fiber
       [not found] <redmine.issue-15362.20181130202250@ruby-lang.org>
                   ` (3 preceding siblings ...)
  2018-12-01  3:51 ` [ruby-core:90200] " samuel
@ 2018-12-01  4:57 ` samuel
  2018-12-01  5:07   ` [ruby-core:90203] " Eric Wong
  2018-12-01  5:01 ` [ruby-core:90202] " samuel
                   ` (7 subsequent siblings)
  12 siblings, 1 reply; 15+ messages in thread
From: samuel @ 2018-12-01  4:57 UTC (permalink / raw)
  To: ruby-core

Issue #15362 has been updated by ioquatix (Samuel Williams).


After I merged this patch, https://travis-ci.org/ruby/ruby/jobs/462071967 fails. I am testing locally in more detail.


* [ruby-core:90202] [Ruby trunk Bug#15362] [PATCH] Avoid GCing dead stack after switching away from a fiber
       [not found] <redmine.issue-15362.20181130202250@ruby-lang.org>
                   ` (4 preceding siblings ...)
  2018-12-01  4:57 ` [ruby-core:90201] " samuel
@ 2018-12-01  5:01 ` samuel
  2018-12-01  5:26 ` [ruby-core:90204] " samuel
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 15+ messages in thread
From: samuel @ 2018-12-01  5:01 UTC (permalink / raw)
  To: ruby-core

Issue #15362 has been updated by ioquatix (Samuel Williams).


Okay, it was a random failure. That concerns me too. But it has passed now. I'll merge to ruby_2_5 too.


* [ruby-core:90203] Re: [Ruby trunk Bug#15362] [PATCH] Avoid GCing dead stack after switching away from a fiber
  2018-12-01  4:57 ` [ruby-core:90201] " samuel
@ 2018-12-01  5:07   ` Eric Wong
  0 siblings, 0 replies; 15+ messages in thread
From: Eric Wong @ 2018-12-01  5:07 UTC (permalink / raw)
  To: ruby-core

samuel@oriontransfer.net wrote:
> After I merged this patch,
> https://travis-ci.org/ruby/ruby/jobs/462071967 fails. I am
> testing locally in more detail.

Alan + Sam: thanks for investigating this.

Perhaps an alternative fix would be to avoid calling
rb_gc_mark_machine_stack() if the owner thread is dead.

I couldn't reproduce the original problem on FreeBSD 11.2
with both FIBER_USE_NATIVE and FIBER_USE_COROUTINE disabled,
so I can't verify my hypothesis.
(I didn't bother testing on Linux.)
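
The alternative described above, skipping the machine-stack mark when the owning thread is dead, can be sketched as a toy model in Ruby. This is an illustration under stated assumptions only: `ToyFiberContext` and `toy_mark_with_owner_check` are hypothetical names, and the sketch is not the linked patch or MRI's `rb_gc_mark_machine_stack()`.

```ruby
# Toy model: hypothetical names, not MRI's data structures.
ToyFiberContext = Struct.new(:owner, :stack_end)

# Model of a mark step that first asks whether the owning thread still exists.
def toy_mark_with_owner_check(ctx)
  return :skipped_dead_owner unless ctx.owner.alive?   # the extra branch
  :scanned
end

worker = Thread.new { }
worker.join                                            # the owner has finished

dead_ctx = ToyFiberContext.new(worker, 0x7000)
live_ctx = ToyFiberContext.new(Thread.current, 0x7000)

dead_result = toy_mark_with_owner_check(dead_ctx)
live_result = toy_mark_with_owner_check(live_ctx)
```

Unlike clearing the saved stack end at switch time, this variant keeps the stale value around and pays for a liveness check on every mark.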

> https://bugs.ruby-lang.org/issues/15362#change-75323


* [ruby-core:90204] [Ruby trunk Bug#15362] [PATCH] Avoid GCing dead stack after switching away from a fiber
       [not found] <redmine.issue-15362.20181130202250@ruby-lang.org>
                   ` (5 preceding siblings ...)
  2018-12-01  5:01 ` [ruby-core:90202] " samuel
@ 2018-12-01  5:26 ` samuel
  2018-12-01  6:05 ` [ruby-core:90205] " samuel
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 15+ messages in thread
From: samuel @ 2018-12-01  5:26 UTC (permalink / raw)
  To: ruby-core

Issue #15362 has been updated by ioquatix (Samuel Williams).


@normalperson I am open to ideas. I don't know a lot about how threads and the GC interact within Ruby; unfortunately, it's not an area I have spent much time on. So, if you have a better proposal, I suggest you implement it. This change included a spec, so I was happy to merge it even though I didn't fully understand what was going wrong. I've confirmed a green build matrix, so I am happy to close. If you want to make a change, let me know; otherwise I will close this and it will be backported to 2.5.


* [ruby-core:90205] [Ruby trunk Bug#15362] [PATCH] Avoid GCing dead stack after switching away from a fiber
       [not found] <redmine.issue-15362.20181130202250@ruby-lang.org>
                   ` (6 preceding siblings ...)
  2018-12-01  5:26 ` [ruby-core:90204] " samuel
@ 2018-12-01  6:05 ` samuel
  2018-12-01  6:42   ` [ruby-core:90206] " Eric Wong
  2018-12-01  6:50 ` [ruby-core:90207] [Ruby trunk Bug#15362][Closed] " samuel
                   ` (4 subsequent siblings)
  12 siblings, 1 reply; 15+ messages in thread
From: samuel @ 2018-12-01  6:05 UTC (permalink / raw)
  To: ruby-core

Issue #15362 has been updated by ioquatix (Samuel Williams).


@normalperson I thought about it a bit more; perhaps both ideas should be implemented for maximum protection. We could also add some notes in the source code about the issue/invariants.

----------------------------------------
Bug #15362: [PATCH] Avoid GCing dead stack after switching away from a fiber
https://bugs.ruby-lang.org/issues/15362#change-75327

* Author: alanwu (Alan Wu)
* Status: Open
* Priority: Normal
* Assignee: ioquatix (Samuel Williams)
* Target version: 2.6
* ruby -v: 
* Backport: 2.5: REQUIRED
----------------------------------------
Hello! I have a patch that fixes Bug #14561. It's not a platform specific issue but
it affects the default build configuration for MacOS and is causing segfaults on 2.5.x.
I've put the test for this in a separate patch because I'm not sure if we want to have
a 5 second test that only matters for non-default build configs and doesn't catch things reliably on Linux.  
I tested this on both trunk and ruby_2_5, on MacOS and on Linux, on various build configs.

Please let me know if anything in my understanding is wrong. I've pasted my commit message below.

----

Fibers save execution contextes, and execution contexts include a native
stack pointer. It may happen that a Fiber outlive the native thread
it executed on. Consider the following code adapted from Bug #14561:

```ruby
enum = Enumerator.new { |y| y << 1 }
thread = Thread.new { enum.peek }  # fiber constructed inside the
                                   # block and saved inside `enum`
thread.join
sleep 5      # thread finishes and thread cache wait time runs out.
             # Native thread exits, possibly freeing its stack.
GC.start     # segfault because GC tires to mark the dangling stack pointer
             # inside `enum`'s fiber

```

The problem is masked by FIBER_USE_COROUTINE and FIBER_USE_NATIVE,
as those implementations already do what this commit does.
Generally on Linux systems, FIBER_USE_NATIVE is 1 even when
one uses `./configure --disable-fiber-coroutine`, since most
Linux systems have getcontext() and setcontext() which
turns on FIBER_USE_NATIVE. (compile with `make
DEFS="-DFIBER_USE_NATIVE=0" to explicitly disable it)

Furthermore, when both FIBER_USE_COROUTINE and FIBER_USE_NATIVE
are off, and the GC reads from the stack of a dead native
thread, MRI does not segfault on Linux. This is probably due to
libpthread not marking the page where the dead stack lives as
unreadable. Nevertheless, this use-after-free is visible through
Valgrind.

On ruby_2_5, this is an acute problem, since it doesn't have FIBER_USE_COROUTINE.
The thread cache is also unavailable on 2.5.x, which triggers this issue
more often. (The thread cache gives this bug a grace period, since
it makes native threads wait a little before exiting.)

This issue is very visible on macOS on 2.5.x since libpthread marks
the dead stack as unreadable, consistently turning this use-after-free
into a segfault.

Fixes Bug #14561

 * cont.c: Set saved_ec.machine.stack_end to NULL when switching away from a
           fiber to keep the GC from marking it. `saved_ec` gets rehydrated
           with a stack pointer if/when the fiber runs again.
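
The idea behind the change can be illustrated with a toy model in plain Ruby. None of this is MRI code: `MachineStack`, `ToyFiber`, and `stack_range_to_mark` are hypothetical stand-ins for the C structures and the collector's stack scan, and the addresses are made up.

```ruby
# Toy model only: plain Ruby objects standing in for MRI's C structures.
MachineStack = Struct.new(:stack_start, :stack_end)

class ToyFiber
  attr_reader :saved_stack

  def initialize
    @saved_stack = MachineStack.new(nil, nil)
  end

  # Running the fiber "rehydrates" the saved context with live stack bounds.
  def switch_to(stack_start, stack_end)
    @saved_stack.stack_start = stack_start
    @saved_stack.stack_end = stack_end
  end

  # Switching away clears stack_end so a later scan has nothing to read.
  def switch_away
    @saved_stack.stack_end = nil
  end
end

# Stand-in for the collector's stack scan: returns the address range it would
# read, or nil when the saved context says there is no stack to mark.
def stack_range_to_mark(fiber)
  stack = fiber.saved_stack
  return nil if stack.stack_start.nil? || stack.stack_end.nil?

  low, high = [stack.stack_start, stack.stack_end].minmax
  low...high
end

fiber = ToyFiber.new
fiber.switch_to(0x7000, 0x6f00)
p stack_range_to_mark(fiber)   # a range while the fiber is running
fiber.switch_away
p stack_range_to_mark(fiber)   # nil after switching away
```

In this model a stale pointer can never be scanned after a switch, regardless of what later happens to the thread the fiber ran on.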


---Files--------------------------------
0001-Avoid-GCing-dead-stack-after-switching-away-from-a-f.patch (2.63 KB)
0001-Add-a-test-for-Bug-14561.patch (1.21 KB)


-- 
https://bugs.ruby-lang.org/


* [ruby-core:90206] Re: [Ruby trunk Bug#15362] [PATCH] Avoid GCing dead stack after switching away from a fiber
  2018-12-01  6:05 ` [ruby-core:90205] " samuel
@ 2018-12-01  6:42   ` Eric Wong
  0 siblings, 0 replies; 15+ messages in thread
From: Eric Wong @ 2018-12-01  6:42 UTC (permalink / raw)
  To: ruby-core

samuel@oriontransfer.net wrote:
> @normalperson I thought about it a bit more, perhaps both
> ideas should be implemented for maximum protection. We could
> add some notes in the source code too about the
> issue/invariants.

NAK.  I'm against belt-and-suspenders fixes because they favor
lack of understanding and make Ruby slower with more overhead.

Looking at Alan's original fix (which you committed), it seems
to make sense.  Anyways, my hypothesis is here:

  https://80x24.org/spew/20181201063852.30438-1-e@80x24.org/raw

It may allow Alan's simple one-line fix to be reverted.
However, I prefer Alan's one-liner fix since it's smaller
in instruction count and adds no extra branches.


* [ruby-core:90207] [Ruby trunk Bug#15362][Closed] [PATCH] Avoid GCing dead stack after switching away from a fiber
       [not found] <redmine.issue-15362.20181130202250@ruby-lang.org>
                   ` (7 preceding siblings ...)
  2018-12-01  6:05 ` [ruby-core:90205] " samuel
@ 2018-12-01  6:50 ` samuel
  2018-12-01 17:16 ` [ruby-core:90216] [Ruby trunk Bug#15362] " alanwucanada
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 15+ messages in thread
From: samuel @ 2018-12-01  6:50 UTC (permalink / raw)
  To: ruby-core

Issue #15362 has been updated by ioquatix (Samuel Williams).

Status changed from Open to Closed

Okay, I think we are in agreement then.

----------------------------------------
Bug #15362: [PATCH] Avoid GCing dead stack after switching away from a fiber
https://bugs.ruby-lang.org/issues/15362#change-75329

* Author: alanwu (Alan Wu)
* Status: Closed
* Priority: Normal
* Assignee: ioquatix (Samuel Williams)
* Target version: 2.6
* ruby -v: 
* Backport: 2.5: REQUIRED
----------------------------------------

* [ruby-core:90216] [Ruby trunk Bug#15362] [PATCH] Avoid GCing dead stack after switching away from a fiber
       [not found] <redmine.issue-15362.20181130202250@ruby-lang.org>
                   ` (8 preceding siblings ...)
  2018-12-01  6:50 ` [ruby-core:90207] [Ruby trunk Bug#15362][Closed] " samuel
@ 2018-12-01 17:16 ` alanwucanada
  2018-12-14 21:32 ` [ruby-core:90534] " dazuma
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 15+ messages in thread
From: alanwucanada @ 2018-12-01 17:16 UTC (permalink / raw)
  To: ruby-core

Issue #15362 has been updated by alanwu (Alan Wu).


A comment for future maintainers:
Since `&fib->cont.saved_ec == ruby_current_execution_context_ptr` at the point where my patch makes the change, if GC starts after I set `stack_end` to NULL, the stack doesn't get marked.
This is not a problem right now since there is no call to any function that could trigger GC (correct me if I'm wrong), but future changes should be careful not to introduce such calls.
I wish there were some way to codify this constraint and have the compiler statically check this for us.
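
As a loose Ruby-level analogy only (the real constraint lives in C and this is not a static check), a "no GC in this section" rule can at least be verified at run time by comparing `GC.count` before and after a block. The helper name `assert_no_gc` is hypothetical.

```ruby
# Hypothetical helper: raises if any collection ran while the block executed.
# GC.count returns the number of collections since the process started.
def assert_no_gc
  before = GC.count
  result = yield
  ran = GC.count - before
  raise "GC ran #{ran} time(s) inside a no-GC section" unless ran.zero?

  result
end

GC.disable                      # avoid incidental collections during the demo
begin
  p assert_no_gc { 1 + 1 }      # passes: nothing here triggers a collection
  begin
    assert_no_gc { GC.start }   # GC.start collects even while GC is disabled
  rescue RuntimeError => e
    puts "caught: #{e.message}"
  end
ensure
  GC.enable
end
```

A check like this only catches violations on code paths that actually run, which is weaker than the compile-time guarantee wished for above.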

----------------------------------------
Bug #15362: [PATCH] Avoid GCing dead stack after switching away from a fiber
https://bugs.ruby-lang.org/issues/15362#change-75338

* Author: alanwu (Alan Wu)
* Status: Closed
* Priority: Normal
* Assignee: ioquatix (Samuel Williams)
* Target version: 2.6
* ruby -v: 
* Backport: 2.5: REQUIRED
----------------------------------------

* [ruby-core:90534] [Ruby trunk Bug#15362] [PATCH] Avoid GCing dead stack after switching away from a fiber
       [not found] <redmine.issue-15362.20181130202250@ruby-lang.org>
                   ` (9 preceding siblings ...)
  2018-12-01 17:16 ` [ruby-core:90216] [Ruby trunk Bug#15362] " alanwucanada
@ 2018-12-14 21:32 ` dazuma
  2018-12-31 19:30 ` [ruby-core:90840] " aselder
  2019-01-10 14:18 ` [ruby-core:90998] " nagachika00
  12 siblings, 0 replies; 15+ messages in thread
From: dazuma @ 2018-12-14 21:32 UTC (permalink / raw)
  To: ruby-core

Issue #15362 has been updated by dazuma (Daniel Azuma).


Did this get backported to ruby_2_5? I don't see a corresponding commit in the GitHub mirror: https://github.com/ruby/ruby/commits/ruby_2_5

----------------------------------------
Bug #15362: [PATCH] Avoid GCing dead stack after switching away from a fiber
https://bugs.ruby-lang.org/issues/15362#change-75687

* Author: alanwu (Alan Wu)
* Status: Closed
* Priority: Normal
* Assignee: ioquatix (Samuel Williams)
* Target version: 2.6
* ruby -v: 
* Backport: 2.5: REQUIRED
----------------------------------------

* [ruby-core:90840] [Ruby trunk Bug#15362] [PATCH] Avoid GCing dead stack after switching away from a fiber
       [not found] <redmine.issue-15362.20181130202250@ruby-lang.org>
                   ` (10 preceding siblings ...)
  2018-12-14 21:32 ` [ruby-core:90534] " dazuma
@ 2018-12-31 19:30 ` aselder
  2019-01-10 14:18 ` [ruby-core:90998] " nagachika00
  12 siblings, 0 replies; 15+ messages in thread
From: aselder @ 2018-12-31 19:30 UTC (permalink / raw)
  To: ruby-core

Issue #15362 has been updated by aselder (Andrew Selder).


This still looks like it's waiting on a backport to Ruby 2.5. Also, does anyone know when the next release of the Ruby 2.5 branch will be done?

Thanks!

----------------------------------------
Bug #15362: [PATCH] Avoid GCing dead stack after switching away from a fiber
https://bugs.ruby-lang.org/issues/15362#change-76025

* Author: alanwu (Alan Wu)
* Status: Closed
* Priority: Normal
* Assignee: ioquatix (Samuel Williams)
* Target version: 2.6
* ruby -v: 
* Backport: 2.5: REQUIRED
----------------------------------------

* [ruby-core:90998] [Ruby trunk Bug#15362] [PATCH] Avoid GCing dead stack after switching away from a fiber
       [not found] <redmine.issue-15362.20181130202250@ruby-lang.org>
                   ` (11 preceding siblings ...)
  2018-12-31 19:30 ` [ruby-core:90840] " aselder
@ 2019-01-10 14:18 ` nagachika00
  12 siblings, 0 replies; 15+ messages in thread
From: nagachika00 @ 2019-01-10 14:18 UTC (permalink / raw)
  To: ruby-core

Issue #15362 has been updated by nagachika (Tomoyuki Chikanaga).

Backport changed from 2.5: REQUIRED to 2.5: DONE

ruby_2_5 r66777 merged revision(s) 66111.

----------------------------------------
Bug #15362: [PATCH] Avoid GCing dead stack after switching away from a fiber
https://bugs.ruby-lang.org/issues/15362#change-76218

* Author: alanwu (Alan Wu)
* Status: Closed
* Priority: Normal
* Assignee: ioquatix (Samuel Williams)
* Target version: 2.6
* ruby -v: 
* Backport: 2.5: DONE
----------------------------------------

end of thread, other threads:[~2019-01-10 14:18 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
     [not found] <redmine.issue-15362.20181130202250@ruby-lang.org>
2018-11-30 20:22 ` [ruby-core:90193] [Ruby trunk Bug#15362] [PATCH] Avoid GCing dead stack after switching away from a fiber alanwucanada
2018-11-30 22:47 ` [ruby-core:90196] " aselder
2018-12-01  3:21 ` [ruby-core:90199] " samuel
2018-12-01  3:51 ` [ruby-core:90200] " samuel
2018-12-01  4:57 ` [ruby-core:90201] " samuel
2018-12-01  5:07   ` [ruby-core:90203] " Eric Wong
2018-12-01  5:01 ` [ruby-core:90202] " samuel
2018-12-01  5:26 ` [ruby-core:90204] " samuel
2018-12-01  6:05 ` [ruby-core:90205] " samuel
2018-12-01  6:42   ` [ruby-core:90206] " Eric Wong
2018-12-01  6:50 ` [ruby-core:90207] [Ruby trunk Bug#15362][Closed] " samuel
2018-12-01 17:16 ` [ruby-core:90216] [Ruby trunk Bug#15362] " alanwucanada
2018-12-14 21:32 ` [ruby-core:90534] " dazuma
2018-12-31 19:30 ` [ruby-core:90840] " aselder
2019-01-10 14:18 ` [ruby-core:90998] " nagachika00
