ruby-core@ruby-lang.org archive (unofficial mirror)
 help / color / mirror / Atom feed
From: "kjtsanaktsidis (KJ Tsanaktsidis) via ruby-core" <ruby-core@ml.ruby-lang.org>
To: ruby-core@ml.ruby-lang.org
Cc: "kjtsanaktsidis (KJ Tsanaktsidis)" <noreply@ruby-lang.org>
Subject: [ruby-core:116212] [Ruby master Bug#20169] `GC.compact` can raises `EFAULT` on IO
Date: Mon, 15 Jan 2024 05:25:27 +0000 (UTC)	[thread overview]
Message-ID: <redmine.journal-106232.20240115052527.17@ruby-lang.org> (raw)
In-Reply-To: redmine.issue-20169.20240109103749.17@ruby-lang.org

Issue #20169 has been updated by kjtsanaktsidis (KJ Tsanaktsidis).


I spent some more time today studying this problem. This is what I've learned so far. I'm definitely far from an expert in the mechanics of how GC's work, so please jump in and correct me if you are!

1. Obviously, as @luke-gru suggested, we can fix this particular manifestation of the problem by passing a stack buffer to the system calls in `io.c` and copying that to/from the Ruby heap in userspace. However, the really concerning part of this bug is what it means for the C extension API.

2. From my research, it seems that read barriers (either `SIGSEGV`/`userfaultfd(2)` based barriers like Ruby's, or explicit barriers around every pointer read emitted by a JIT compiler) are mostly used to support _incremental_ compaction. The relationship between read barriers and incremental compaction is somewhat analogous to the relationship between write barriers (like Ruby's `RB_OBJ_WRITE` macro) and incremental marking. Ruby does not actually do incremental compaction at the moment (although presumably we would like to add that someday).

3. The reason _why_ we need read barriers for incrmental compaction is so that user code can see a consistent view of the world even if the heap is half-compacted. I think there are two specific things read barriers need to do in incremental compaction. Imagine a read barrier macro as `VALUE RB_OBJ_READ(VALUE owning_object, VALUE *object_to_read)`. It needs to, firstly, detect if `object_to_read` is `T_MOVED`, and update its reference in `owning_object` if so, to point to its new location. But, secondly, it also needs to set the pin bit on `*object_to_read`, so that it can't be compacted elsewhere in a subsequent step of GC compaction - once `RB_OBJ_READ` has returned, the value of `object_to_read` is on the machine stack, so it needs to be pinned just as if it was live on the machine stack during the marking phase.

4. Although Ruby doesn't have incremental GC compaction (yet), it _does_ have a mechanism for user code to run in between GC compaction steps. If a page is swept during compaction because of a call to `gc_sweep_page` [here](https://github.com/ruby/ruby/blob/9c3299896e7ea0aa1d9cc0d4786d7be2908ca9be/gc.c#L8644), it might free some objects on that page. If a `T_DATA` is freed, and it has the `RUBY_TYPED_FREE_IMMEDIATELY` flag, then we will call its free function right there. At this point, we have not yet called the `dcompact` function on the object to fix up its references to other objects; so, it might try and read VALUEs that are now `T_MOVED`. We need read barriers in place to detect this, just as much as if we had incremental marking.

5. Rather than explicit read barriers based on `RB_OBJ_READ`, which react to already-happened moves and prevent future moves, Ruby is using `mprotect(2)` wizadry to trap access to pages with `T_MOVED` objects and invalidate moves after-the-fact. I actually can't find a discussion comparing the two anywhere, but presumably the main reason for wanting to do this automagically with page protection is that it doesn't require any changes to how C extensions work.

6. The specific problem in this issue in `io.c` involves access to a pointer to the Ruby heap from inside a thread which doesn't hold the GVL. However, I believe it would be entirely possible to construct a similar problem without any threading at all, because of point 4. Imagine you had a `T_DATA`, which held a reference to an integer file descriptor, a string VALUE, and had the `RUBY_TYPED_FREE_IMMEDIATELY` flag. In its `dfree` function, imagine it called `write(self->fd, RSTRING_PTR(self->str), RSTRING_LEN(self->str))`. As far as I can tell, this is perfectly legal according to our C API rules today; the thread holds the GVL, so use of the `RSTRING_` macros is OK, and it does not allocate anything, which makes it a legal `dfree` function. However, if `str` had been moved, in a prior compaction step, then the page containing `self->str` will be locked, and so the `write` call will return `EFAULT` instead of triggering the read barrier.

7. I think it's currently illegal to access pointers to objects embedded in the Ruby heap without the GVL, and the accesses in `io.c` need to be changed to either un-embed the string or copy it through the C stack. There is no possible way we can make this safe in the presence of compaction without fundamentally rearchitecting the GC to be concurrent. However, we still need to do something about the perfectly legal non-concurrent code in point 6.

8. So I think we also need to do one or more of the following:
  a) use an API like userfaultfd, when it's available, which allows kernel-mode accesses to be trapped just like user-mode ones. However, this is obviously only viable for Linux (and not even all the time then - the feature must be enabled in `/proc/sys/vm/unprivileged_userfaultfd`).
  b) overwrite all of the common libc symbols used to perform syscalls with userspace buffers, and provide wrapped versions that detect EFAULT return values, invoke the page-moved logic, and retry. 
  c) declare that it is illegal to access Ruby heap memory in kernel mode. This is kind of impossible to detect, however, without also doing something like b), so it'll just lead to incredibly rare failures like the one which spawned this issue throughout the ecosystem.
  d) Implement explicit read barriers with a `RB_OBJ_READ` macro and a `RB_TYPED_RB_PROT` flag analogous to how we implement write barriers, and delete our SIGSEGV based automatic read barriers. This would be an incredibly disruptive ecosystem change (I think we'd probably need to disable compaction for any object marked by an object that didn't use read barriers). But on the plus side, it makes explicit what the requirements are for C extensions, allows the barriers to work at VALUE granularity rather than page granularity, and perhaps brings us closer to incremental compaction.


----------------------------------------
Bug #20169: `GC.compact` can raises `EFAULT` on IO
https://bugs.ruby-lang.org/issues/20169#change-106232

* Author: ko1 (Koichi Sasada)
* Status: Open
* Priority: Normal
* Backport: 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN
----------------------------------------
1. `GC.compact` introduces read barriers to detect read accesses to the pages.
2. I/O operations release GVL to pass the control while their execution, and another thread can call `GC.compact` (or auto compact feature I guess, but not checked yet).
3. Call `write(ptr)` can return `EFAULT` when `GC.compact` is running because `ptr` can point read-barrier protected pages (embed strings).

Reproducible steps:


Apply the following patch to increase possibility:

```patch
diff --git a/io.c b/io.c
index f6cd2c1a56..83d67ba2dc 100644
--- a/io.c
+++ b/io.c
@@ -1212,8 +1212,12 @@ internal_write_func(void *ptr)
         }
     }

+    int cnt = 0;
   retry:
-    do_write_retry(write(iis->fd, iis->buf, iis->capa));
+    for (; cnt < 1000; cnt++) {
+        do_write_retry(write(iis->fd, iis->buf, iis->capa));
+        if (result <= 0) break;
+    }

     if (result < 0 && !iis->nonblock) {
         int e = errno;
```

Run the following code:

```ruby
t1 = Thread.new{ 10_000.times.map{"#{_1}"}; GC.compact while true }
t2 = Thread.new{
  i=0
  $stdout.write "<#{i+=1}>" while true
}
t2.join
```

and 

```
$ make run
(snip)
4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4><4>#<Thread:0x00007fa61b4dd758 ../../src/trunk/test.rb:3 run> terminated with exception (report_on_exception is true):
../../src/trunk/test.rb:5:in `write': Bad address @ io_write - <STDOUT> (Errno::EFAULT)
        from ../../src/trunk/test.rb:5:in `block in <main>'
../../src/trunk/test.rb:5:in `write': Bad address @ io_write - <STDOUT> (Errno::EFAULT)
        from ../../src/trunk/test.rb:5:in `block in <main>'
make: *** [uncommon.mk:1383: run] Error 1
```

I think this is why we get `EFAULT` on CI. To increase possibilities running many busy processes (`ruby -e 'loop{}'` for example) will help (and on CI environment there are such busy processes accidentally).



-- 
https://bugs.ruby-lang.org/
 ______________________________________________
 ruby-core mailing list -- ruby-core@ml.ruby-lang.org
 To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
 ruby-core info -- https://ml.ruby-lang.org/mailman3/postorius/lists/ruby-core.ml.ruby-lang.org/

  parent reply	other threads:[~2024-01-15  5:25 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-09 10:37 [ruby-core:116114] [Ruby master Bug#20169] `GC.compact` can raises `EFAULT` on IO ko1 (Koichi Sasada) via ruby-core
2024-01-09 11:05 ` [ruby-core:116115] " kjtsanaktsidis (KJ Tsanaktsidis) via ruby-core
2024-01-10 23:39 ` [ruby-core:116164] " kjtsanaktsidis (KJ Tsanaktsidis) via ruby-core
2024-01-11  3:28 ` [ruby-core:116167] " luke-gru (Luke Gruber) via ruby-core
2024-01-13  2:58 ` [ruby-core:116188] " kjtsanaktsidis (KJ Tsanaktsidis) via ruby-core
2024-01-15  5:25 ` kjtsanaktsidis (KJ Tsanaktsidis) via ruby-core [this message]
2024-01-16  9:34 ` [ruby-core:116224] " kjtsanaktsidis (KJ Tsanaktsidis) via ruby-core
2024-02-02 17:03 ` [ruby-core:116560] " peterzhu2118 (Peter Zhu) via ruby-core
2024-02-03  1:37 ` [ruby-core:116562] " kjtsanaktsidis (KJ Tsanaktsidis) via ruby-core
2024-02-04 22:14 ` [ruby-core:116577] " peterzhu2118 (Peter Zhu) via ruby-core
2024-02-04 23:41 ` [ruby-core:116578] " kjtsanaktsidis (KJ Tsanaktsidis) via ruby-core
2024-05-28 21:24 ` [ruby-core:118049] " k0kubun (Takashi Kokubun) via ruby-core

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-list from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.ruby-lang.org/en/community/mailing-lists/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=redmine.journal-106232.20240115052527.17@ruby-lang.org \
    --to=ruby-core@ruby-lang.org \
    --cc=noreply@ruby-lang.org \
    --cc=ruby-core@ml.ruby-lang.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).