ruby-core@ruby-lang.org archive (unofficial mirror)
 help / color / mirror / Atom feed
* [ruby-core:116114] [Ruby master Bug#20169] `GC.compact` can raises `EFAULT` on IO
@ 2024-01-09 10:37 ko1 (Koichi Sasada) via ruby-core
  2024-01-09 11:05 ` [ruby-core:116115] " kjtsanaktsidis (KJ Tsanaktsidis) via ruby-core
                   ` (10 more replies)
  0 siblings, 11 replies; 12+ messages in thread
From: ko1 (Koichi Sasada) via ruby-core @ 2024-01-09 10:37 UTC (permalink / raw
  To: ruby-core; +Cc: ko1 (Koichi Sasada)

Issue #20169 has been reported by ko1 (Koichi Sasada).

----------------------------------------
Bug #20169: `GC.compact` can raises `EFAULT` on IO
https://bugs.ruby-lang.org/issues/20169

* Author: ko1 (Koichi Sasada)
* Status: Open
* Priority: Normal
* Backport: 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN
----------------------------------------
1. `GC.compact` introduces read barriers to detect read accesses to the pages.
2. I/O operations release GVL to pass the control while their execution, and another thread can call `GC.compact` (or auto compact feature I guess, but not checked yet).
3. Call `write(ptr)` can return `EFAULT` when `GC.compact` is running because `ptr` points read-barrier protected pages.

Reproducible steps:


Apply the following patch to increase possibility:

```patch
diff --git a/io.c b/io.c
index f6cd2c1a56..83d67ba2dc 100644
--- a/io.c
+++ b/io.c
@@ -1212,8 +1212,12 @@ internal_write_func(void *ptr)
         }
     }

+    int cnt = 0;
   retry:
-    do_write_retry(write(iis->fd, iis->buf, iis->capa));
+    for (; cnt < 1000; cnt++) {
+        do_write_retry(write(iis->fd, iis->buf, iis->capa));
+        if (result <= 0) break;
+    }

     if (result < 0 && !iis->nonblock) {
         int e = errno;
```

Run the following code:

```ruby
t1 = Thread.new{ 10_000.times.map{"#{_1}"}; GC.compact while true }
t2 = Thread.new{
  i=0
  $stdout.write "<#{i+=1}>" while true
}
t2.join
```

and 

```
$ make run
(snip)
4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4>#<Thread:0x00007fa61b4dd758 ../../src/trunk/test.rb:3 run> terminated with exception (report_on_exception is true):
../../src/trunk/test.rb:5:in `write': Bad address @ io_write - <STDOUT> (Errno::EFAULT)
        from ../../src/trunk/test.rb:5:in `block in <main>'
../../src/trunk/test.rb:5:in `write': Bad address @ io_write - <STDOUT> (Errno::EFAULT)
        from ../../src/trunk/test.rb:5:in `block in <main>'
make: *** [uncommon.mk:1383: run] Error 1
```

I think this is why we get `EFAULT` on CI. To increase possibilities running many busy processes (`ruby -e 'loop{}'` for example) will help (and on CI environment there are such busy processes accidentally).



-- 
https://bugs.ruby-lang.org/
 ______________________________________________
 ruby-core mailing list -- ruby-core@ml.ruby-lang.org
 To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
 ruby-core info -- https://ml.ruby-lang.org/mailman3/postorius/lists/ruby-core.ml.ruby-lang.org/

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [ruby-core:116115] [Ruby master Bug#20169] `GC.compact` can raises `EFAULT` on IO
  2024-01-09 10:37 [ruby-core:116114] [Ruby master Bug#20169] `GC.compact` can raises `EFAULT` on IO ko1 (Koichi Sasada) via ruby-core
@ 2024-01-09 11:05 ` kjtsanaktsidis (KJ Tsanaktsidis) via ruby-core
  2024-01-10 23:39 ` [ruby-core:116164] " kjtsanaktsidis (KJ Tsanaktsidis) via ruby-core
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: kjtsanaktsidis (KJ Tsanaktsidis) via ruby-core @ 2024-01-09 11:05 UTC (permalink / raw
  To: ruby-core; +Cc: kjtsanaktsidis (KJ Tsanaktsidis)

Issue #20169 has been updated by kjtsanaktsidis (KJ Tsanaktsidis).


I need to think about this a bit more, but I wonder if we can fix this on Linux at least by using `userfaultfd` instead of registering SIGBUS/SIGSEGV handlers…

----------------------------------------
Bug #20169: `GC.compact` can raises `EFAULT` on IO
https://bugs.ruby-lang.org/issues/20169#change-106114

* Author: ko1 (Koichi Sasada)
* Status: Open
* Priority: Normal
* Backport: 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN
----------------------------------------
1. `GC.compact` introduces read barriers to detect read accesses to the pages.
2. I/O operations release GVL to pass the control while their execution, and another thread can call `GC.compact` (or auto compact feature I guess, but not checked yet).
3. Call `write(ptr)` can return `EFAULT` when `GC.compact` is running because `ptr` can point read-barrier protected pages (embed strings).

Reproducible steps:


Apply the following patch to increase possibility:

```patch
diff --git a/io.c b/io.c
index f6cd2c1a56..83d67ba2dc 100644
--- a/io.c
+++ b/io.c
@@ -1212,8 +1212,12 @@ internal_write_func(void *ptr)
         }
     }

+    int cnt = 0;
   retry:
-    do_write_retry(write(iis->fd, iis->buf, iis->capa));
+    for (; cnt < 1000; cnt++) {
+        do_write_retry(write(iis->fd, iis->buf, iis->capa));
+        if (result <= 0) break;
+    }

     if (result < 0 && !iis->nonblock) {
         int e = errno;
```

Run the following code:

```ruby
t1 = Thread.new{ 10_000.times.map{"#{_1}"}; GC.compact while true }
t2 = Thread.new{
  i=0
  $stdout.write "<#{i+=1}>" while true
}
t2.join
```

and 

```
$ make run
(snip)
4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4>#<Thread:0x00007fa61b4dd758 ../../src/trunk/test.rb:3 run> terminated with exception (report_on_exception is true):
../../src/trunk/test.rb:5:in `write': Bad address @ io_write - <STDOUT> (Errno::EFAULT)
        from ../../src/trunk/test.rb:5:in `block in <main>'
../../src/trunk/test.rb:5:in `write': Bad address @ io_write - <STDOUT> (Errno::EFAULT)
        from ../../src/trunk/test.rb:5:in `block in <main>'
make: *** [uncommon.mk:1383: run] Error 1
```

I think this is why we get `EFAULT` on CI. To increase possibilities running many busy processes (`ruby -e 'loop{}'` for example) will help (and on CI environment there are such busy processes accidentally).



-- 
https://bugs.ruby-lang.org/
 ______________________________________________
 ruby-core mailing list -- ruby-core@ml.ruby-lang.org
 To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
 ruby-core info -- https://ml.ruby-lang.org/mailman3/postorius/lists/ruby-core.ml.ruby-lang.org/

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [ruby-core:116164] [Ruby master Bug#20169] `GC.compact` can raises `EFAULT` on IO
  2024-01-09 10:37 [ruby-core:116114] [Ruby master Bug#20169] `GC.compact` can raises `EFAULT` on IO ko1 (Koichi Sasada) via ruby-core
  2024-01-09 11:05 ` [ruby-core:116115] " kjtsanaktsidis (KJ Tsanaktsidis) via ruby-core
@ 2024-01-10 23:39 ` kjtsanaktsidis (KJ Tsanaktsidis) via ruby-core
  2024-01-11  3:28 ` [ruby-core:116167] " luke-gru (Luke Gruber) via ruby-core
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: kjtsanaktsidis (KJ Tsanaktsidis) via ruby-core @ 2024-01-10 23:39 UTC (permalink / raw
  To: ruby-core; +Cc: kjtsanaktsidis (KJ Tsanaktsidis)

Issue #20169 has been updated by kjtsanaktsidis (KJ Tsanaktsidis).


I did a bit of experimentation with the `userfaultfd(2)` system call here: https://gist.github.com/KJTsanaktsidis/40e2a8e23012bf16af823db9ff9a890e

The SIGBUS/SIGSEGV handling we currently do only traps userspace accesses to memory, but it seems with userfaultfd it's possible to trap kernel access to memory too - my example above works both when accessing `page` directly, or when `write(2)`'ing it into a memfd.

Instead of read-protecting pages with `mprotect(2)` and then handling the trap, we can do the following:

* Register pages with userfaultfd when we allocate them
* When we would readprotect a page, instead, remap it somewhere else with `mremap(2)` `MREMAP_MAYMOVE`, and leave a faulting region behind with `MREMAP_DONTUNMAP`.
* When someone tries to read that page, we'll get the fault in the userfaultfd thread
* The userfaultfd thread can then remap the page back into its original position with `mremap(2)` `MAP_FIXED` and re-attempt the faulting access.

This pattern is actually documented in the Linux manpage for `mremap(2)`: https://man7.org/linux/man-pages/man2/mremap.2.html

> Garbage collection: `MREMAP_DONTUNMAP` can be used in conjunction with `userfaultfd(2)` to implement garbage collection algorithms (e.g., in a Java virtual machine). Such an implementation can be cheaper (and simpler) than conventional garbage collection techniques that involve marking pages with protection `PROT_NONE` in conjunction with the use of a `SIGSEGV` handler to catch accesses to those pages.

So that's the good part. The bad parts are:

* This will, of course, only work on Linux.
* It will only work on Linux kernels >= 5.7
* It requires that the calling process either have `CAP_SYS_PTRACE` or that `/proc/sys/vm/unprivileged_userfaultfd` be set to 1. It seems common  distributions default this to 0 :(

So... if we did want to go down the userfaultfd handling path, we would need to either:

* Only support GC compaction when the above conditions are met?
* Or, have both a userfaultfd implementation and the current signal-based implementation, and just accept that GC compaction can cause rare crashes on non-linux? (this sounds bad)


The only other more portable option I can think of is to define symbols for `read`, `write`, and other system calls that take userspace buffers which overwrite and wrap the libc versions, handle EFAULT return values, and invoke the "cancel GC compaction" logic if the faulting address is in the Ruby heap. I'm open to other bright ideas anybody might have...

----------------------------------------
Bug #20169: `GC.compact` can raises `EFAULT` on IO
https://bugs.ruby-lang.org/issues/20169#change-106169

* Author: ko1 (Koichi Sasada)
* Status: Open
* Priority: Normal
* Backport: 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN
----------------------------------------
1. `GC.compact` introduces read barriers to detect read accesses to the pages.
2. I/O operations release GVL to pass the control while their execution, and another thread can call `GC.compact` (or auto compact feature I guess, but not checked yet).
3. Call `write(ptr)` can return `EFAULT` when `GC.compact` is running because `ptr` can point read-barrier protected pages (embed strings).

Reproducible steps:


Apply the following patch to increase possibility:

```patch
diff --git a/io.c b/io.c
index f6cd2c1a56..83d67ba2dc 100644
--- a/io.c
+++ b/io.c
@@ -1212,8 +1212,12 @@ internal_write_func(void *ptr)
         }
     }

+    int cnt = 0;
   retry:
-    do_write_retry(write(iis->fd, iis->buf, iis->capa));
+    for (; cnt < 1000; cnt++) {
+        do_write_retry(write(iis->fd, iis->buf, iis->capa));
+        if (result <= 0) break;
+    }

     if (result < 0 && !iis->nonblock) {
         int e = errno;
```

Run the following code:

```ruby
t1 = Thread.new{ 10_000.times.map{"#{_1}"}; GC.compact while true }
t2 = Thread.new{
  i=0
  $stdout.write "<#{i+=1}>" while true
}
t2.join
```

and 

```
$ make run
(snip)
4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4>#<Thread:0x00007fa61b4dd758 ../../src/trunk/test.rb:3 run> terminated with exception (report_on_exception is true):
../../src/trunk/test.rb:5:in `write': Bad address @ io_write - <STDOUT> (Errno::EFAULT)
        from ../../src/trunk/test.rb:5:in `block in <main>'
../../src/trunk/test.rb:5:in `write': Bad address @ io_write - <STDOUT> (Errno::EFAULT)
        from ../../src/trunk/test.rb:5:in `block in <main>'
make: *** [uncommon.mk:1383: run] Error 1
```

I think this is why we get `EFAULT` on CI. To increase possibilities running many busy processes (`ruby -e 'loop{}'` for example) will help (and on CI environment there are such busy processes accidentally).



-- 
https://bugs.ruby-lang.org/
 ______________________________________________
 ruby-core mailing list -- ruby-core@ml.ruby-lang.org
 To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
 ruby-core info -- https://ml.ruby-lang.org/mailman3/postorius/lists/ruby-core.ml.ruby-lang.org/

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [ruby-core:116167] [Ruby master Bug#20169] `GC.compact` can raises `EFAULT` on IO
  2024-01-09 10:37 [ruby-core:116114] [Ruby master Bug#20169] `GC.compact` can raises `EFAULT` on IO ko1 (Koichi Sasada) via ruby-core
  2024-01-09 11:05 ` [ruby-core:116115] " kjtsanaktsidis (KJ Tsanaktsidis) via ruby-core
  2024-01-10 23:39 ` [ruby-core:116164] " kjtsanaktsidis (KJ Tsanaktsidis) via ruby-core
@ 2024-01-11  3:28 ` luke-gru (Luke Gruber) via ruby-core
  2024-01-13  2:58 ` [ruby-core:116188] " kjtsanaktsidis (KJ Tsanaktsidis) via ruby-core
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: luke-gru (Luke Gruber) via ruby-core @ 2024-01-11  3:28 UTC (permalink / raw
  To: ruby-core; +Cc: luke-gru (Luke Gruber)

Issue #20169 has been updated by luke-gru (Luke Gruber).


Another idea is to unembed the string (undesirable) or copy the buffer (on the stack, preferably).

----------------------------------------
Bug #20169: `GC.compact` can raises `EFAULT` on IO
https://bugs.ruby-lang.org/issues/20169#change-106175

* Author: ko1 (Koichi Sasada)
* Status: Open
* Priority: Normal
* Backport: 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN
----------------------------------------
1. `GC.compact` introduces read barriers to detect read accesses to the pages.
2. I/O operations release GVL to pass the control while their execution, and another thread can call `GC.compact` (or auto compact feature I guess, but not checked yet).
3. Call `write(ptr)` can return `EFAULT` when `GC.compact` is running because `ptr` can point read-barrier protected pages (embed strings).

Reproducible steps:


Apply the following patch to increase possibility:

```patch
diff --git a/io.c b/io.c
index f6cd2c1a56..83d67ba2dc 100644
--- a/io.c
+++ b/io.c
@@ -1212,8 +1212,12 @@ internal_write_func(void *ptr)
         }
     }

+    int cnt = 0;
   retry:
-    do_write_retry(write(iis->fd, iis->buf, iis->capa));
+    for (; cnt < 1000; cnt++) {
+        do_write_retry(write(iis->fd, iis->buf, iis->capa));
+        if (result <= 0) break;
+    }

     if (result < 0 && !iis->nonblock) {
         int e = errno;
```

Run the following code:

```ruby
t1 = Thread.new{ 10_000.times.map{"#{_1}"}; GC.compact while true }
t2 = Thread.new{
  i=0
  $stdout.write "<#{i+=1}>" while true
}
t2.join
```

and 

```
$ make run
(snip)
4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4>#<Thread:0x00007fa61b4dd758 ../../src/trunk/test.rb:3 run> terminated with exception (report_on_exception is true):
../../src/trunk/test.rb:5:in `write': Bad address @ io_write - <STDOUT> (Errno::EFAULT)
        from ../../src/trunk/test.rb:5:in `block in <main>'
../../src/trunk/test.rb:5:in `write': Bad address @ io_write - <STDOUT> (Errno::EFAULT)
        from ../../src/trunk/test.rb:5:in `block in <main>'
make: *** [uncommon.mk:1383: run] Error 1
```

I think this is why we get `EFAULT` on CI. To increase possibilities running many busy processes (`ruby -e 'loop{}'` for example) will help (and on CI environment there are such busy processes accidentally).



-- 
https://bugs.ruby-lang.org/
 ______________________________________________
 ruby-core mailing list -- ruby-core@ml.ruby-lang.org
 To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
 ruby-core info -- https://ml.ruby-lang.org/mailman3/postorius/lists/ruby-core.ml.ruby-lang.org/

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [ruby-core:116188] [Ruby master Bug#20169] `GC.compact` can raises `EFAULT` on IO
  2024-01-09 10:37 [ruby-core:116114] [Ruby master Bug#20169] `GC.compact` can raises `EFAULT` on IO ko1 (Koichi Sasada) via ruby-core
                   ` (2 preceding siblings ...)
  2024-01-11  3:28 ` [ruby-core:116167] " luke-gru (Luke Gruber) via ruby-core
@ 2024-01-13  2:58 ` kjtsanaktsidis (KJ Tsanaktsidis) via ruby-core
  2024-01-15  5:25 ` [ruby-core:116212] " kjtsanaktsidis (KJ Tsanaktsidis) via ruby-core
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: kjtsanaktsidis (KJ Tsanaktsidis) via ruby-core @ 2024-01-13  2:58 UTC (permalink / raw
  To: ruby-core; +Cc: kjtsanaktsidis (KJ Tsanaktsidis)

Issue #20169 has been updated by kjtsanaktsidis (KJ Tsanaktsidis).


Well I did a bit more thinking about this.

Firstly, I had a very unproductive morning trying to see if Mach exceptions on MacOS could catch invalid accesses in system calls, the way that `userfaultfd` can on Linux. Short answer: no.

The second insight I had, though, is that if these objects are on the machine stack, then they actually should be pinned anyway. If you have a C extension and you take a pointer to some internal part of an object, it's already a requirement that you ensure the Ruby value gets spilled to the stack - i.e. you need to do something like this.

```
VALUE str = rb_sprintf("i am a cool string");
write(fd, RSTRING_PTR(str), RSTRING_LEN(str));
RB_GC_GUARD(str); // spill string to stack; NOT OPTIONAL!
```

Mabye we could change the GC compaction algorithm to not move _any_ objects on a page (and hence skip protecting the page) if any objects in the page are live on the machine stack? That _would_ substantially lessen the effectiveness of GC compaction I suppose, but we could maybe get that effectiveness back if userfaultfd is available?


----------------------------------------
Bug #20169: `GC.compact` can raises `EFAULT` on IO
https://bugs.ruby-lang.org/issues/20169#change-106203

* Author: ko1 (Koichi Sasada)
* Status: Open
* Priority: Normal
* Backport: 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN
----------------------------------------
1. `GC.compact` introduces read barriers to detect read accesses to the pages.
2. I/O operations release GVL to pass the control while their execution, and another thread can call `GC.compact` (or auto compact feature I guess, but not checked yet).
3. Call `write(ptr)` can return `EFAULT` when `GC.compact` is running because `ptr` can point read-barrier protected pages (embed strings).

Reproducible steps:


Apply the following patch to increase possibility:

```patch
diff --git a/io.c b/io.c
index f6cd2c1a56..83d67ba2dc 100644
--- a/io.c
+++ b/io.c
@@ -1212,8 +1212,12 @@ internal_write_func(void *ptr)
         }
     }

+    int cnt = 0;
   retry:
-    do_write_retry(write(iis->fd, iis->buf, iis->capa));
+    for (; cnt < 1000; cnt++) {
+        do_write_retry(write(iis->fd, iis->buf, iis->capa));
+        if (result <= 0) break;
+    }

     if (result < 0 && !iis->nonblock) {
         int e = errno;
```

Run the following code:

```ruby
t1 = Thread.new{ 10_000.times.map{"#{_1}"}; GC.compact while true }
t2 = Thread.new{
  i=0
  $stdout.write "<#{i+=1}>" while true
}
t2.join
```

and 

```
$ make run
(snip)
4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4>#<Thread:0x00007fa61b4dd758 ../../src/trunk/test.rb:3 run> terminated with exception (report_on_exception is true):
../../src/trunk/test.rb:5:in `write': Bad address @ io_write - <STDOUT> (Errno::EFAULT)
        from ../../src/trunk/test.rb:5:in `block in <main>'
../../src/trunk/test.rb:5:in `write': Bad address @ io_write - <STDOUT> (Errno::EFAULT)
        from ../../src/trunk/test.rb:5:in `block in <main>'
make: *** [uncommon.mk:1383: run] Error 1
```

I think this is why we get `EFAULT` on CI. To increase possibilities running many busy processes (`ruby -e 'loop{}'` for example) will help (and on CI environment there are such busy processes accidentally).



-- 
https://bugs.ruby-lang.org/
 ______________________________________________
 ruby-core mailing list -- ruby-core@ml.ruby-lang.org
 To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
 ruby-core info -- https://ml.ruby-lang.org/mailman3/postorius/lists/ruby-core.ml.ruby-lang.org/

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [ruby-core:116212] [Ruby master Bug#20169] `GC.compact` can raises `EFAULT` on IO
  2024-01-09 10:37 [ruby-core:116114] [Ruby master Bug#20169] `GC.compact` can raises `EFAULT` on IO ko1 (Koichi Sasada) via ruby-core
                   ` (3 preceding siblings ...)
  2024-01-13  2:58 ` [ruby-core:116188] " kjtsanaktsidis (KJ Tsanaktsidis) via ruby-core
@ 2024-01-15  5:25 ` kjtsanaktsidis (KJ Tsanaktsidis) via ruby-core
  2024-01-16  9:34 ` [ruby-core:116224] " kjtsanaktsidis (KJ Tsanaktsidis) via ruby-core
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: kjtsanaktsidis (KJ Tsanaktsidis) via ruby-core @ 2024-01-15  5:25 UTC (permalink / raw
  To: ruby-core; +Cc: kjtsanaktsidis (KJ Tsanaktsidis)

Issue #20169 has been updated by kjtsanaktsidis (KJ Tsanaktsidis).


I spent some more time today studying this problem. This is what I've learned so far. I'm definitely far from an expert in the mechanics of how GC's work, so please jump in and correct me if you are!

1. Obviously, as @luke-gru suggested, we can fix this particular manifestation of the problem by passing a stack buffer to the system calls in `io.c` and copying that to/from the Ruby heap in userspace. However, the really concerning part of this bug is what it means for the C extension API.

2. From my research, it seems that read barriers (either `SIGSEGV`/`userfaultfd(2)` based barriers like Ruby's, or explicit barriers around every pointer read emitted by a JIT compiler) are mostly used to support _incremental_ compaction. The relationship between read barriers and incremental compaction is somewhat analogous to the relationship between write barriers (like Ruby's `RB_OBJ_WRITE` macro) and incremental marking. Ruby does not actually do incremental compaction at the moment (although presumably we would like to add that someday).

3. The reason _why_ we need read barriers for incrmental compaction is so that user code can see a consistent view of the world even if the heap is half-compacted. I think there are two specific things read barriers need to do in incremental compaction. Imagine a read barrier macro as `VALUE RB_OBJ_READ(VALUE owning_object, VALUE *object_to_read)`. It needs to, firstly, detect if `object_to_read` is `T_MOVED`, and update its reference in `owning_object` if so, to point to its new location. But, secondly, it also needs to set the pin bit on `*object_to_read`, so that it can't be compacted elsewhere in a subsequent step of GC compaction - once `RB_OBJ_READ` has returned, the value of `object_to_read` is on the machine stack, so it needs to be pinned just as if it was live on the machine stack during the marking phase.

4. Although Ruby doesn't have incremental GC compaction (yet), it _does_ have a mechanism for user code to run in between GC compaction steps. If a page is swept during compaction because of a call to `gc_sweep_page` [here](https://github.com/ruby/ruby/blob/9c3299896e7ea0aa1d9cc0d4786d7be2908ca9be/gc.c#L8644), it might free some objects on that page. If a `T_DATA` is freed, and it has the `RUBY_TYPED_FREE_IMMEDIATELY` flag, then we will call its free function right there. At this point, we have not yet called the `dcompact` function on the object to fix up its references to other objects; so, it might try and read VALUEs that are now `T_MOVED`. We need read barriers in place to detect this, just as much as if we had incremental marking.

5. Rather than explicit read barriers based on `RB_OBJ_READ`, which react to already-happened moves and prevent future moves, Ruby is using `mprotect(2)` wizadry to trap access to pages with `T_MOVED` objects and invalidate moves after-the-fact. I actually can't find a discussion comparing the two anywhere, but presumably the main reason for wanting to do this automagically with page protection is that it doesn't require any changes to how C extensions work.

6. The specific problem in this issue in `io.c` involves access to a pointer to the Ruby heap from inside a thread which doesn't hold the GVL. However, I believe it would be entirely possible to construct a similar problem without any threading at all, because of point 4. Imagine you had a `T_DATA`, which held a reference to an integer file descriptor, a string VALUE, and had the `RUBY_TYPED_FREE_IMMEDIATELY` flag. In its `dfree` function, imagine it called `write(self->fd, RSTRING_PTR(self->str), RSTRING_LEN(self->str))`. As far as I can tell, this is perfectly legal according to our C API rules today; the thread holds the GVL, so use of the `RSTRING_` macros is OK, and it does not allocate anything, which makes it a legal `dfree` function. However, if `str` had been moved, in a prior compaction step, then the page containing `self->str` will be locked, and so the `write` call will return `EFAULT` instead of triggering the read barrier.

7. I think it's currently illegal to access pointers to objects embedded in the Ruby heap without the GVL, and the accesses in `io.c` need to be changed to either un-embed the string or copy it through the C stack. There is no possible way we can make this safe in the presence of compaction without fundamentally rearchitecting the GC to be concurrent. However, we still need to do something about the perfectly legal non-concurrent code in point 6.

8. So I think we also need to do one or more of the following:
  a) use an API like userfaultfd, when it's available, which allows kernel-mode accesses to be trapped just like user-mode ones. However, this is obviously only viable for Linux (and not even all the time then - the feature must be enabled in `/proc/sys/vm/unprivileged_userfaultfd`).
  b) overwrite all of the common libc symbols used to perform syscalls with userspace buffers, and provide wrapped versions that detect EFAULT return values, invoke the page-moved logic, and retry. 
  c) declare that it is illegal to access Ruby heap memory in kernel mode. This is kind of impossible to detect, however, without also doing something like b), so it'll just lead to incredibly rare failures like the one which spawned this issue throughout the ecosystem.
  d) Implement explicit read barriers with a `RB_OBJ_READ` macro and a `RB_TYPED_RB_PROT` flag analogous to how we implement write barriers, and delete our SIGSEGV based automatic read barriers. This would be an incredibly disruptive ecosystem change (I think we'd probably need to disable compaction for any object marked by an object that didn't use read barriers). But on the plus side, it makes explicit what the requirements are for C extensions, allows the barriers to work at VALUE granularity rather than page granularity, and perhaps brings us closer to incremental compaction.


----------------------------------------
Bug #20169: `GC.compact` can raises `EFAULT` on IO
https://bugs.ruby-lang.org/issues/20169#change-106232

* Author: ko1 (Koichi Sasada)
* Status: Open
* Priority: Normal
* Backport: 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN
----------------------------------------
1. `GC.compact` introduces read barriers to detect read accesses to the pages.
2. I/O operations release GVL to pass the control while their execution, and another thread can call `GC.compact` (or auto compact feature I guess, but not checked yet).
3. Call `write(ptr)` can return `EFAULT` when `GC.compact` is running because `ptr` can point read-barrier protected pages (embed strings).

Reproducible steps:


Apply the following patch to increase possibility:

```patch
diff --git a/io.c b/io.c
index f6cd2c1a56..83d67ba2dc 100644
--- a/io.c
+++ b/io.c
@@ -1212,8 +1212,12 @@ internal_write_func(void *ptr)
         }
     }

+    int cnt = 0;
   retry:
-    do_write_retry(write(iis->fd, iis->buf, iis->capa));
+    for (; cnt < 1000; cnt++) {
+        do_write_retry(write(iis->fd, iis->buf, iis->capa));
+        if (result <= 0) break;
+    }

     if (result < 0 && !iis->nonblock) {
         int e = errno;
```

Run the following code:

```ruby
t1 = Thread.new{ 10_000.times.map{"#{_1}"}; GC.compact while true }
t2 = Thread.new{
  i=0
  $stdout.write "<#{i+=1}>" while true
}
t2.join
```

and 

```
$ make run
(snip)
4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4>#<Thread:0x00007fa61b4dd758 ../../src/trunk/test.rb:3 run> terminated with exception (report_on_exception is true):
../../src/trunk/test.rb:5:in `write': Bad address @ io_write - <STDOUT> (Errno::EFAULT)
        from ../../src/trunk/test.rb:5:in `block in <main>'
../../src/trunk/test.rb:5:in `write': Bad address @ io_write - <STDOUT> (Errno::EFAULT)
        from ../../src/trunk/test.rb:5:in `block in <main>'
make: *** [uncommon.mk:1383: run] Error 1
```

I think this is why we get `EFAULT` on CI. To increase possibilities running many busy processes (`ruby -e 'loop{}'` for example) will help (and on CI environment there are such busy processes accidentally).



-- 
https://bugs.ruby-lang.org/
 ______________________________________________
 ruby-core mailing list -- ruby-core@ml.ruby-lang.org
 To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
 ruby-core info -- https://ml.ruby-lang.org/mailman3/postorius/lists/ruby-core.ml.ruby-lang.org/

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [ruby-core:116224] [Ruby master Bug#20169] `GC.compact` can raises `EFAULT` on IO
  2024-01-09 10:37 [ruby-core:116114] [Ruby master Bug#20169] `GC.compact` can raises `EFAULT` on IO ko1 (Koichi Sasada) via ruby-core
                   ` (4 preceding siblings ...)
  2024-01-15  5:25 ` [ruby-core:116212] " kjtsanaktsidis (KJ Tsanaktsidis) via ruby-core
@ 2024-01-16  9:34 ` kjtsanaktsidis (KJ Tsanaktsidis) via ruby-core
  2024-02-02 17:03 ` [ruby-core:116560] " peterzhu2118 (Peter Zhu) via ruby-core
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: kjtsanaktsidis (KJ Tsanaktsidis) via ruby-core @ 2024-01-16  9:34 UTC (permalink / raw
  To: ruby-core; +Cc: kjtsanaktsidis (KJ Tsanaktsidis)

Issue #20169 has been updated by kjtsanaktsidis (KJ Tsanaktsidis).


Thought: we could probably remove the need for the read barrier if we swept the heap, and _then_ compacted. So we wouldn't free any objects in between moving things and updating refs. It would mean scanning the first half of each heap twice (once to sweep it, and once again to find the holes to fill with compacted objects). But maybe `GC.compact` is not that performance sensitive since it’s mostly used before forking multiprocess web servers?

It would hurt perf for `GC.auto_compact`, but I don't really know if there's much to be done about it. I can't find any literature on conservative, incremental, moving GC's, which is what you'd want for `auto_compact`. Immix is conservative and moving, but it moves objects whilst stopping the world. Incremental or concurrent moving GC's like Shenandoah work by inserting compiler-provided read barriers around all accesses (which we can't do to C extensions) to fix up moved objects as they're loaded from the heap, and using compiler generated stackmaps (which we can't have for C extensions) to directly update existing moved references on the stack. 

CC @eightbitraptor since I'm pretty sure I've heard you talk about immix before, maybe you have thoughts?

----------------------------------------
Bug #20169: `GC.compact` can raises `EFAULT` on IO
https://bugs.ruby-lang.org/issues/20169#change-106244

* Author: ko1 (Koichi Sasada)
* Status: Open
* Priority: Normal
* Backport: 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN
----------------------------------------
1. `GC.compact` introduces read barriers to detect read accesses to the pages.
2. I/O operations release GVL to pass the control while their execution, and another thread can call `GC.compact` (or auto compact feature I guess, but not checked yet).
3. Call `write(ptr)` can return `EFAULT` when `GC.compact` is running because `ptr` can point read-barrier protected pages (embed strings).

Reproducible steps:


Apply the following patch to increase possibility:

```patch
diff --git a/io.c b/io.c
index f6cd2c1a56..83d67ba2dc 100644
--- a/io.c
+++ b/io.c
@@ -1212,8 +1212,12 @@ internal_write_func(void *ptr)
         }
     }

+    int cnt = 0;
   retry:
-    do_write_retry(write(iis->fd, iis->buf, iis->capa));
+    for (; cnt < 1000; cnt++) {
+        do_write_retry(write(iis->fd, iis->buf, iis->capa));
+        if (result <= 0) break;
+    }

     if (result < 0 && !iis->nonblock) {
         int e = errno;
```

Run the following code:

```ruby
t1 = Thread.new{ 10_000.times.map{"#{_1}"}; GC.compact while true }
t2 = Thread.new{
  i=0
  $stdout.write "<#{i+=1}>" while true
}
t2.join
```

and 

```
$ make run
(snip)
4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4>#<Thread:0x00007fa61b4dd758 ../../src/trunk/test.rb:3 run> terminated with exception (report_on_exception is true):
../../src/trunk/test.rb:5:in `write': Bad address @ io_write - <STDOUT> (Errno::EFAULT)
        from ../../src/trunk/test.rb:5:in `block in <main>'
../../src/trunk/test.rb:5:in `write': Bad address @ io_write - <STDOUT> (Errno::EFAULT)
        from ../../src/trunk/test.rb:5:in `block in <main>'
make: *** [uncommon.mk:1383: run] Error 1
```

I think this is why we get `EFAULT` on CI. To increase possibilities running many busy processes (`ruby -e 'loop{}'` for example) will help (and on CI environment there are such busy processes accidentally).



-- 
https://bugs.ruby-lang.org/
 ______________________________________________
 ruby-core mailing list -- ruby-core@ml.ruby-lang.org
 To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
 ruby-core info -- https://ml.ruby-lang.org/mailman3/postorius/lists/ruby-core.ml.ruby-lang.org/

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [ruby-core:116560] [Ruby master Bug#20169] `GC.compact` can raises `EFAULT` on IO
  2024-01-09 10:37 [ruby-core:116114] [Ruby master Bug#20169] `GC.compact` can raises `EFAULT` on IO ko1 (Koichi Sasada) via ruby-core
                   ` (5 preceding siblings ...)
  2024-01-16  9:34 ` [ruby-core:116224] " kjtsanaktsidis (KJ Tsanaktsidis) via ruby-core
@ 2024-02-02 17:03 ` peterzhu2118 (Peter Zhu) via ruby-core
  2024-02-03  1:37 ` [ruby-core:116562] " kjtsanaktsidis (KJ Tsanaktsidis) via ruby-core
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: peterzhu2118 (Peter Zhu) via ruby-core @ 2024-02-02 17:03 UTC (permalink / raw
  To: ruby-core; +Cc: peterzhu2118 (Peter Zhu)

Issue #20169 has been updated by peterzhu2118 (Peter Zhu).


I implemented a fix here: https://github.com/ruby/ruby/pull/9817

----------------------------------------
Bug #20169: `GC.compact` can raises `EFAULT` on IO
https://bugs.ruby-lang.org/issues/20169#change-106577

* Author: ko1 (Koichi Sasada)
* Status: Open
* Priority: Normal
* Backport: 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN
----------------------------------------
1. `GC.compact` introduces read barriers to detect read accesses to the pages.
2. I/O operations release GVL to pass the control while their execution, and another thread can call `GC.compact` (or auto compact feature I guess, but not checked yet).
3. Call `write(ptr)` can return `EFAULT` when `GC.compact` is running because `ptr` can point read-barrier protected pages (embed strings).

Reproducible steps:


Apply the following patch to increase possibility:

```patch
diff --git a/io.c b/io.c
index f6cd2c1a56..83d67ba2dc 100644
--- a/io.c
+++ b/io.c
@@ -1212,8 +1212,12 @@ internal_write_func(void *ptr)
         }
     }

+    int cnt = 0;
   retry:
-    do_write_retry(write(iis->fd, iis->buf, iis->capa));
+    for (; cnt < 1000; cnt++) {
+        do_write_retry(write(iis->fd, iis->buf, iis->capa));
+        if (result <= 0) break;
+    }

     if (result < 0 && !iis->nonblock) {
         int e = errno;
```

Run the following code:

```ruby
t1 = Thread.new{ 10_000.times.map{"#{_1}"}; GC.compact while true }
t2 = Thread.new{
  i=0
  $stdout.write "<#{i+=1}>" while true
}
t2.join
```

and 

```
$ make run
(snip)
4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4>#<Thread:0x00007fa61b4dd758 ../../src/trunk/test.rb:3 run> terminated with exception (report_on_exception is true):
../../src/trunk/test.rb:5:in `write': Bad address @ io_write - <STDOUT> (Errno::EFAULT)
        from ../../src/trunk/test.rb:5:in `block in <main>'
../../src/trunk/test.rb:5:in `write': Bad address @ io_write - <STDOUT> (Errno::EFAULT)
        from ../../src/trunk/test.rb:5:in `block in <main>'
make: *** [uncommon.mk:1383: run] Error 1
```

I think this is why we get `EFAULT` on CI. To increase possibilities running many busy processes (`ruby -e 'loop{}'` for example) will help (and on CI environment there are such busy processes accidentally).



-- 
https://bugs.ruby-lang.org/
 ______________________________________________
 ruby-core mailing list -- ruby-core@ml.ruby-lang.org
 To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
 ruby-core info -- https://ml.ruby-lang.org/mailman3/postorius/lists/ruby-core.ml.ruby-lang.org/

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [ruby-core:116562] [Ruby master Bug#20169] `GC.compact` can raises `EFAULT` on IO
  2024-01-09 10:37 [ruby-core:116114] [Ruby master Bug#20169] `GC.compact` can raises `EFAULT` on IO ko1 (Koichi Sasada) via ruby-core
                   ` (6 preceding siblings ...)
  2024-02-02 17:03 ` [ruby-core:116560] " peterzhu2118 (Peter Zhu) via ruby-core
@ 2024-02-03  1:37 ` kjtsanaktsidis (KJ Tsanaktsidis) via ruby-core
  2024-02-04 22:14 ` [ruby-core:116577] " peterzhu2118 (Peter Zhu) via ruby-core
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: kjtsanaktsidis (KJ Tsanaktsidis) via ruby-core @ 2024-02-03  1:37 UTC (permalink / raw
  To: ruby-core; +Cc: kjtsanaktsidis (KJ Tsanaktsidis)

Issue #20169 has been updated by kjtsanaktsidis (KJ Tsanaktsidis).


So I think we can go with that fix, but I want to try and spell out what it means for "the rules" for extensions if we accept it.

AFAICT, by merging https://github.com/ruby/ruby/pull/9817, we are saying the following:

1. It is illegal to dereference or store into any pointer to the Ruby heap in a no-GVL context. This implies:
   a) It is illegal to directly access any field in a `struct RBasic`, `struct RString`, etc,
   b) Thus it implies that things like `RSTRING_PTR(str)` are 100% illegal in a no-GVL context (this shouldn't be controversial)
   c) This rule _does_ allow you to manipulate a C structure/array/etc you received a pointer to from a `struct RThing` (an "external pointer"), provided
       i) you _know_ that this structure is stored in the C heap and not the Ruby heap,
      ii) you obtained this pointer out of the `struct RThing` whilst holding the GVL.
     iii) and you obey the restrictions spelled out in Rule 2 below.
2. You can dereference and read from an "external pointer" in a no-GVL context, provided that
  a) the VALUE from which you obtained the pointer is guaranteed to be GC live AND GC pinned from the beginning of your no-GVL context through to the last access of the pointer.
      i) In practice, normally extension developers will adhere to this rule by inserting an `RB_GC_GUARD` macro after a call to `rb_thread_call_without_gvl` to instruct the compiler to keep the VALUE live on the thread/fiber's C stack and thus visible to Ruby's GC.
     ii) However, there are other ways to meet this requirement too (e.g. adding the object directly as a GC root with `rb_global_variable` or such).
    iii) The reason why it's important for the object to be GC pinned is that if the object is moved, the object's `dcompact` function is called, and this function is allowed to mutate the "external pointer".
  b) No Ruby thread can be using this object, either in C or in Ruby. If an object's "external pointer"s are being used in a no-GVL context, then the object as a whole cannot be used in a GVL context.
  c) the VALUE has not at any point in its lifetime been made visible to Ruby code, IF the value is not frozen.
      i) This obviously means it e.g. cannot have been yielded back to Ruby with `rb_yield`, or previously returned, etc.
     ii) It _also_ means that the object must have been hidden with a call to `rb_gc_hide` _immediately after_ being allocated by the GC. There cannot be any call to Ruby, nor call to the GC, in between where the program allocates the VALUE and hides it. In practice, APIs like the `rb_str_tmp_*` family will return pre-hidden objects which meet these requirements. This ensures that a mutable value being accessed from C was hidden from calls like `ObjectSpace.each_object`.
    iii) However, if an object is frozen, then this requirement does not apply - it _is_ legal to read an "external pointer" in a no-GVL context if the object it was obtained from is frozen, even if that object is visible to Ruby.
     iv) A corollary of iii is that an extension which exposes "external pointers" to C-heap allocated data owned by its T_DATA object cannot mutate this data if the underlying `T_DATA` is frozen. It's fairly uncommon for extensions to actually expose C-level APIs like this, but it is possible by e.g. exporting symbols with `__attribute__((visibility(default)))`.
3. You can write to an "external pointer" in a no-GVL context, provided that you adhere to rule 2, AND that the object is not frozen.
4. When reading from and writing to "external pointers", extension developers are responsible for appropriately synchronizing their reads and writes to this pointer (in practice, most usage of such "external pointers" will be restricted to a single thread, however).


These rules are pretty complicated, but I think this is what's implied by the patch. In particular, i think 2.c.iii is implied by

```
if (OBJ_FROZEN_RAW(orig) && !STR_EMBED_P(orig) && !rb_str_reembeddable_p(orig)) return orig;
```

but was kind of surprising to me. Possibly, although these are the rules that seem to apply _within_ cruby, we want to document a more restrictive set of rules for extension developers which are easier to reason about (perhaps even as far as "don't store results of no-GVL routines into Ruby strings at all in your extension, you must bounce everything through native memory and copy it back into Ruby with the GVL").  

Whatever these rules are, I think we need a clearer description of them in extension.rdoc. If they're different from the internal rules, let's document the "real" messy internal rules in the source tree as well. I'm happy to open a PR to do this once we get some agreement on the broad outlines.

As an aside, I'm researching whether there might be ways to automatically enforce some of this in CI (both for CRuby and for extension developers) by leveraging LLVM's sanitizer infrastructure somehow - I think it would be really helpful to have an executable spec of what the rules are like this!
 

----------------------------------------
Bug #20169: `GC.compact` can raises `EFAULT` on IO
https://bugs.ruby-lang.org/issues/20169#change-106579

* Author: ko1 (Koichi Sasada)
* Status: Open
* Priority: Normal
* Backport: 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN
----------------------------------------
1. `GC.compact` introduces read barriers to detect read accesses to the pages.
2. I/O operations release GVL to pass the control while their execution, and another thread can call `GC.compact` (or auto compact feature I guess, but not checked yet).
3. Call `write(ptr)` can return `EFAULT` when `GC.compact` is running because `ptr` can point read-barrier protected pages (embed strings).

Reproducible steps:


Apply the following patch to increase possibility:

```patch
diff --git a/io.c b/io.c
index f6cd2c1a56..83d67ba2dc 100644
--- a/io.c
+++ b/io.c
@@ -1212,8 +1212,12 @@ internal_write_func(void *ptr)
         }
     }

+    int cnt = 0;
   retry:
-    do_write_retry(write(iis->fd, iis->buf, iis->capa));
+    for (; cnt < 1000; cnt++) {
+        do_write_retry(write(iis->fd, iis->buf, iis->capa));
+        if (result <= 0) break;
+    }

     if (result < 0 && !iis->nonblock) {
         int e = errno;
```

Run the following code:

```ruby
t1 = Thread.new{ 10_000.times.map{"#{_1}"}; GC.compact while true }
t2 = Thread.new{
  i=0
  $stdout.write "<#{i+=1}>" while true
}
t2.join
```

and 

```
$ make run
(snip)
4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4>#<Thread:0x00007fa61b4dd758 ../../src/trunk/test.rb:3 run> terminated with exception (report_on_exception is true):
../../src/trunk/test.rb:5:in `write': Bad address @ io_write - <STDOUT> (Errno::EFAULT)
        from ../../src/trunk/test.rb:5:in `block in <main>'
../../src/trunk/test.rb:5:in `write': Bad address @ io_write - <STDOUT> (Errno::EFAULT)
        from ../../src/trunk/test.rb:5:in `block in <main>'
make: *** [uncommon.mk:1383: run] Error 1
```

I think this is why we get `EFAULT` on CI. To increase possibilities running many busy processes (`ruby -e 'loop{}'` for example) will help (and on CI environment there are such busy processes accidentally).



-- 
https://bugs.ruby-lang.org/
 ______________________________________________
 ruby-core mailing list -- ruby-core@ml.ruby-lang.org
 To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
 ruby-core info -- https://ml.ruby-lang.org/mailman3/postorius/lists/ruby-core.ml.ruby-lang.org/

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [ruby-core:116577] [Ruby master Bug#20169] `GC.compact` can raises `EFAULT` on IO
  2024-01-09 10:37 [ruby-core:116114] [Ruby master Bug#20169] `GC.compact` can raises `EFAULT` on IO ko1 (Koichi Sasada) via ruby-core
                   ` (7 preceding siblings ...)
  2024-02-03  1:37 ` [ruby-core:116562] " kjtsanaktsidis (KJ Tsanaktsidis) via ruby-core
@ 2024-02-04 22:14 ` peterzhu2118 (Peter Zhu) via ruby-core
  2024-02-04 23:41 ` [ruby-core:116578] " kjtsanaktsidis (KJ Tsanaktsidis) via ruby-core
  2024-05-28 21:24 ` [ruby-core:118049] " k0kubun (Takashi Kokubun) via ruby-core
  10 siblings, 0 replies; 12+ messages in thread
From: peterzhu2118 (Peter Zhu) via ruby-core @ 2024-02-04 22:14 UTC (permalink / raw
  To: ruby-core; +Cc: peterzhu2118 (Peter Zhu)

Issue #20169 has been updated by peterzhu2118 (Peter Zhu).


> I want to try and spell out what it means for "the rules" for extensions if we accept it.

I don't think this implies any new rules. IMO it's undefined behaviour to use a Ruby object in a non-GVL scenario, this just fixes the issue inside of Ruby using a hack. Extension developers should not be relying on anything implied by this patch.

> It is illegal to directly access any field in a struct RBasic, struct RString, etc

It's already illegal (or undefined behaviour) to do this.


> Possibly, although these are the rules that seem to apply within cruby

They're not exactly rules in CRuby. The implementation inside of CRuby is tightly coupled, so changing the implementation of one thing can lead to many other things needing their implementation changed.


----------------------------------------
Bug #20169: `GC.compact` can raises `EFAULT` on IO
https://bugs.ruby-lang.org/issues/20169#change-106592

* Author: ko1 (Koichi Sasada)
* Status: Open
* Priority: Normal
* Backport: 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN
----------------------------------------
1. `GC.compact` introduces read barriers to detect read accesses to the pages.
2. I/O operations release GVL to pass the control while their execution, and another thread can call `GC.compact` (or auto compact feature I guess, but not checked yet).
3. Call `write(ptr)` can return `EFAULT` when `GC.compact` is running because `ptr` can point read-barrier protected pages (embed strings).

Reproducible steps:


Apply the following patch to increase possibility:

```patch
diff --git a/io.c b/io.c
index f6cd2c1a56..83d67ba2dc 100644
--- a/io.c
+++ b/io.c
@@ -1212,8 +1212,12 @@ internal_write_func(void *ptr)
         }
     }

+    int cnt = 0;
   retry:
-    do_write_retry(write(iis->fd, iis->buf, iis->capa));
+    for (; cnt < 1000; cnt++) {
+        do_write_retry(write(iis->fd, iis->buf, iis->capa));
+        if (result <= 0) break;
+    }

     if (result < 0 && !iis->nonblock) {
         int e = errno;
```

Run the following code:

```ruby
t1 = Thread.new{ 10_000.times.map{"#{_1}"}; GC.compact while true }
t2 = Thread.new{
  i=0
  $stdout.write "<#{i+=1}>" while true
}
t2.join
```

and 

```
$ make run
(snip)
4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4>#<Thread:0x00007fa61b4dd758 ../../src/trunk/test.rb:3 run> terminated with exception (report_on_exception is true):
../../src/trunk/test.rb:5:in `write': Bad address @ io_write - <STDOUT> (Errno::EFAULT)
        from ../../src/trunk/test.rb:5:in `block in <main>'
../../src/trunk/test.rb:5:in `write': Bad address @ io_write - <STDOUT> (Errno::EFAULT)
        from ../../src/trunk/test.rb:5:in `block in <main>'
make: *** [uncommon.mk:1383: run] Error 1
```

I think this is why we get `EFAULT` on CI. To increase possibilities running many busy processes (`ruby -e 'loop{}'` for example) will help (and on CI environment there are such busy processes accidentally).



-- 
https://bugs.ruby-lang.org/
 ______________________________________________
 ruby-core mailing list -- ruby-core@ml.ruby-lang.org
 To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
 ruby-core info -- https://ml.ruby-lang.org/mailman3/postorius/lists/ruby-core.ml.ruby-lang.org/

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [ruby-core:116578] [Ruby master Bug#20169] `GC.compact` can raises `EFAULT` on IO
  2024-01-09 10:37 [ruby-core:116114] [Ruby master Bug#20169] `GC.compact` can raises `EFAULT` on IO ko1 (Koichi Sasada) via ruby-core
                   ` (8 preceding siblings ...)
  2024-02-04 22:14 ` [ruby-core:116577] " peterzhu2118 (Peter Zhu) via ruby-core
@ 2024-02-04 23:41 ` kjtsanaktsidis (KJ Tsanaktsidis) via ruby-core
  2024-05-28 21:24 ` [ruby-core:118049] " k0kubun (Takashi Kokubun) via ruby-core
  10 siblings, 0 replies; 12+ messages in thread
From: kjtsanaktsidis (KJ Tsanaktsidis) via ruby-core @ 2024-02-04 23:41 UTC (permalink / raw
  To: ruby-core; +Cc: kjtsanaktsidis (KJ Tsanaktsidis)

Issue #20169 has been updated by kjtsanaktsidis (KJ Tsanaktsidis).


> IMO it's undefined behaviour to use a Ruby object in a non-GVL scenario, this just fixes the issue inside of Ruby using a hack. 

In this case, shouldn’t we fix the original issuing by copying the write through a C-allocated buffer? But…

> They're not exactly rules in CRuby

I guess what you’re saying is “CRuby itself can do what it wants as long as it’s internally consistent”. Which makes sense. 

I guess it would be good to find a way to define and enforce what “internally consistent” means, but perhaps there isn’t much value in strictly defining it in the absence of a way to strictly enforce it. Things like VM_CHECK_MODE help, but couldn’t catch something like this. If I actually come up with an actionable idea, I’ll open a new proposal for that.

———

Anyway, fixing this bug with your patch sounds good to me. I definitely am not trying to hold up real fixes just to have academic arguments about memory models.

----------------------------------------
Bug #20169: `GC.compact` can raises `EFAULT` on IO
https://bugs.ruby-lang.org/issues/20169#change-106593

* Author: ko1 (Koichi Sasada)
* Status: Open
* Priority: Normal
* Backport: 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN
----------------------------------------
1. `GC.compact` introduces read barriers to detect read accesses to the pages.
2. I/O operations release GVL to pass the control while their execution, and another thread can call `GC.compact` (or auto compact feature I guess, but not checked yet).
3. Call `write(ptr)` can return `EFAULT` when `GC.compact` is running because `ptr` can point read-barrier protected pages (embed strings).

Reproducible steps:


Apply the following patch to increase possibility:

```patch
diff --git a/io.c b/io.c
index f6cd2c1a56..83d67ba2dc 100644
--- a/io.c
+++ b/io.c
@@ -1212,8 +1212,12 @@ internal_write_func(void *ptr)
         }
     }

+    int cnt = 0;
   retry:
-    do_write_retry(write(iis->fd, iis->buf, iis->capa));
+    for (; cnt < 1000; cnt++) {
+        do_write_retry(write(iis->fd, iis->buf, iis->capa));
+        if (result <= 0) break;
+    }

     if (result < 0 && !iis->nonblock) {
         int e = errno;
```

Run the following code:

```ruby
t1 = Thread.new{ 10_000.times.map{"#{_1}"}; GC.compact while true }
t2 = Thread.new{
  i=0
  $stdout.write "<#{i+=1}>" while true
}
t2.join
```

and 

```
$ make run
(snip)
4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4>#<Thread:0x00007fa61b4dd758 ../../src/trunk/test.rb:3 run> terminated with exception (report_on_exception is true):
../../src/trunk/test.rb:5:in `write': Bad address @ io_write - <STDOUT> (Errno::EFAULT)
        from ../../src/trunk/test.rb:5:in `block in <main>'
../../src/trunk/test.rb:5:in `write': Bad address @ io_write - <STDOUT> (Errno::EFAULT)
        from ../../src/trunk/test.rb:5:in `block in <main>'
make: *** [uncommon.mk:1383: run] Error 1
```

I think this is why we get `EFAULT` on CI. To increase possibilities running many busy processes (`ruby -e 'loop{}'` for example) will help (and on CI environment there are such busy processes accidentally).



-- 
https://bugs.ruby-lang.org/
 ______________________________________________
 ruby-core mailing list -- ruby-core@ml.ruby-lang.org
 To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
 ruby-core info -- https://ml.ruby-lang.org/mailman3/postorius/lists/ruby-core.ml.ruby-lang.org/

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [ruby-core:118049] [Ruby master Bug#20169] `GC.compact` can raises `EFAULT` on IO
  2024-01-09 10:37 [ruby-core:116114] [Ruby master Bug#20169] `GC.compact` can raises `EFAULT` on IO ko1 (Koichi Sasada) via ruby-core
                   ` (9 preceding siblings ...)
  2024-02-04 23:41 ` [ruby-core:116578] " kjtsanaktsidis (KJ Tsanaktsidis) via ruby-core
@ 2024-05-28 21:24 ` k0kubun (Takashi Kokubun) via ruby-core
  10 siblings, 0 replies; 12+ messages in thread
From: k0kubun (Takashi Kokubun) via ruby-core @ 2024-05-28 21:24 UTC (permalink / raw
  To: ruby-core; +Cc: k0kubun (Takashi Kokubun)

Issue #20169 has been updated by k0kubun (Takashi Kokubun).

Backport changed from 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN to 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: DONE

ruby_3_3 commit:b77b5c191513f5f281e72a51e6b2de29e2d2d7a6 merged revision(s) 5e0c17145131e073814c7e5b15227d0b4e73cabe.

----------------------------------------
Bug #20169: `GC.compact` can raises `EFAULT` on IO
https://bugs.ruby-lang.org/issues/20169#change-108472

* Author: ko1 (Koichi Sasada)
* Status: Closed
* Backport: 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: DONE
----------------------------------------
1. `GC.compact` introduces read barriers to detect read accesses to the pages.
2. I/O operations release GVL to pass the control while their execution, and another thread can call `GC.compact` (or auto compact feature I guess, but not checked yet).
3. Call `write(ptr)` can return `EFAULT` when `GC.compact` is running because `ptr` can point read-barrier protected pages (embed strings).

Reproducible steps:


Apply the following patch to increase possibility:

```patch
diff --git a/io.c b/io.c
index f6cd2c1a56..83d67ba2dc 100644
--- a/io.c
+++ b/io.c
@@ -1212,8 +1212,12 @@ internal_write_func(void *ptr)
         }
     }

+    int cnt = 0;
   retry:
-    do_write_retry(write(iis->fd, iis->buf, iis->capa));
+    for (; cnt < 1000; cnt++) {
+        do_write_retry(write(iis->fd, iis->buf, iis->capa));
+        if (result <= 0) break;
+    }

     if (result < 0 && !iis->nonblock) {
         int e = errno;
```

Run the following code:

```ruby
t1 = Thread.new{ 10_000.times.map{"#{_1}"}; GC.compact while true }
t2 = Thread.new{
  i=0
  $stdout.write "<#{i+=1}>" while true
}
t2.join
```

and 

```
$ make run
(snip)
4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4>#<Thread:0x00007fa61b4dd758 ../../src/trunk/test.rb:3 run> terminated with exception (report_on_exception is true):
../../src/trunk/test.rb:5:in `write': Bad address @ io_write - <STDOUT> (Errno::EFAULT)
        from ../../src/trunk/test.rb:5:in `block in <main>'
../../src/trunk/test.rb:5:in `write': Bad address @ io_write - <STDOUT> (Errno::EFAULT)
        from ../../src/trunk/test.rb:5:in `block in <main>'
make: *** [uncommon.mk:1383: run] Error 1
```

I think this is why we get `EFAULT` on CI. To increase possibilities running many busy processes (`ruby -e 'loop{}'` for example) will help (and on CI environment there are such busy processes accidentally).



-- 
https://bugs.ruby-lang.org/
 ______________________________________________
 ruby-core mailing list -- ruby-core@ml.ruby-lang.org
 To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
 ruby-core info -- https://ml.ruby-lang.org/mailman3/postorius/lists/ruby-core.ml.ruby-lang.org/

^ permalink raw reply related	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2024-05-28 21:24 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-01-09 10:37 [ruby-core:116114] [Ruby master Bug#20169] `GC.compact` can raises `EFAULT` on IO ko1 (Koichi Sasada) via ruby-core
2024-01-09 11:05 ` [ruby-core:116115] " kjtsanaktsidis (KJ Tsanaktsidis) via ruby-core
2024-01-10 23:39 ` [ruby-core:116164] " kjtsanaktsidis (KJ Tsanaktsidis) via ruby-core
2024-01-11  3:28 ` [ruby-core:116167] " luke-gru (Luke Gruber) via ruby-core
2024-01-13  2:58 ` [ruby-core:116188] " kjtsanaktsidis (KJ Tsanaktsidis) via ruby-core
2024-01-15  5:25 ` [ruby-core:116212] " kjtsanaktsidis (KJ Tsanaktsidis) via ruby-core
2024-01-16  9:34 ` [ruby-core:116224] " kjtsanaktsidis (KJ Tsanaktsidis) via ruby-core
2024-02-02 17:03 ` [ruby-core:116560] " peterzhu2118 (Peter Zhu) via ruby-core
2024-02-03  1:37 ` [ruby-core:116562] " kjtsanaktsidis (KJ Tsanaktsidis) via ruby-core
2024-02-04 22:14 ` [ruby-core:116577] " peterzhu2118 (Peter Zhu) via ruby-core
2024-02-04 23:41 ` [ruby-core:116578] " kjtsanaktsidis (KJ Tsanaktsidis) via ruby-core
2024-05-28 21:24 ` [ruby-core:118049] " k0kubun (Takashi Kokubun) via ruby-core

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).