ruby-core@ruby-lang.org archive (unofficial mirror)
 help / color / mirror / Atom feed
* [ruby-core:89848] [Ruby trunk Bug#15316] rb_postponed_job_register not thread-safe
       [not found] <redmine.issue-15316.20181118044928@ruby-lang.org>
@ 2018-11-18  4:49 ` normalperson
  2018-11-18  6:49   ` [ruby-core:89851] " Eric Wong
  2018-11-18  8:09 ` [ruby-core:89852] " takashikkbn
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 10+ messages in thread
From: normalperson @ 2018-11-18  4:49 UTC (permalink / raw)
  To: ruby-core

Issue #15316 has been reported by normalperson (Eric Wong).

----------------------------------------
Bug #15316: rb_postponed_job_register not thread-safe
https://bugs.ruby-lang.org/issues/15316

* Author: normalperson (Eric Wong)
* Status: Open
* Priority: Normal
* Assignee: k0kubun (Takashi Kokubun)
* Target version: 
* ruby -v: 
* Backport: 2.3: UNKNOWN, 2.4: UNKNOWN, 2.5: UNKNOWN
----------------------------------------
Consider following execution timeline for two threads, t1 and t2.
(should be 2 columns, I only have fixed-width fonts for display)
```
t1                                      t2 (mjit-worker)
-----------------------------------------------------------------------------

 vm->postponed_job_index increment #1
 pjob[0]->func = ...
 pjob[0]->data = ...
                                        vm->postponed_job_index increment #2

 RUBY_VM_SET_POSTPONED_JOB_INTERRUPT

 rb_postponed_job_flush
 sees result of increment #2 from t2
 tries to access pjob[1] => CRASH
                                       pjob[1]->func = ...
                                       pjob[1]->data = ...

                                       RUBY_VM_SET_POSTPONED_JOB_INTERRUPT
```

So it looks like we need a THREAD-SAFE (not necessarily async-safe)
version of rb_postponed_job_register.  Or a one-off API for MJIT...
job registration for MJIT.




-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [ruby-core:89851] Re: [Ruby trunk Bug#15316] rb_postponed_job_register not thread-safe
  2018-11-18  4:49 ` [ruby-core:89848] [Ruby trunk Bug#15316] rb_postponed_job_register not thread-safe normalperson
@ 2018-11-18  6:49   ` Eric Wong
  0 siblings, 0 replies; 10+ messages in thread
From: Eric Wong @ 2018-11-18  6:49 UTC (permalink / raw)
  To: ruby-core

> https://bugs.ruby-lang.org/issues/15316

> So it looks like we need a THREAD-SAFE (not necessarily async-safe)
> version of rb_postponed_job_register.  Or a one-off API for MJIT...
> job registration for MJIT.

I think I will introduce a new "workqueue" API dedicated to
thread-safety (but not async-signal-safety).  We can use the
same interrupt flag, but it will use a separate data structure.

My goal is also to support MJIT in forked processes,
because (AFAIK) some production Ruby instances use fork
with long-running processes.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [ruby-core:89852] [Ruby trunk Bug#15316] rb_postponed_job_register not thread-safe
       [not found] <redmine.issue-15316.20181118044928@ruby-lang.org>
  2018-11-18  4:49 ` [ruby-core:89848] [Ruby trunk Bug#15316] rb_postponed_job_register not thread-safe normalperson
@ 2018-11-18  8:09 ` takashikkbn
  2018-11-18  8:19   ` [ruby-core:89853] " Eric Wong
  2018-11-18  8:40 ` [ruby-core:89854] " takashikkbn
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 10+ messages in thread
From: takashikkbn @ 2018-11-18  8:09 UTC (permalink / raw)
  To: ruby-core

Issue #15316 has been updated by k0kubun (Takashi Kokubun).


Thanks to report that.

> My goal is also to support MJIT in forked processes

Oh by the way, I have a patch which was created yesterday but I was too busy to commit that yet. (I'll commit that shortly.)

I think thread-safety of postponed_job and supporting MJIT in forked Ruby process are independent issues, but maybe your "workqueue" will be helpful for forked child somehow?

----------------------------------------
Bug #15316: rb_postponed_job_register not thread-safe
https://bugs.ruby-lang.org/issues/15316#change-74910

* Author: normalperson (Eric Wong)
* Status: Open
* Priority: Normal
* Assignee: k0kubun (Takashi Kokubun)
* Target version: 
* ruby -v: 
* Backport: 2.3: UNKNOWN, 2.4: UNKNOWN, 2.5: UNKNOWN
----------------------------------------
Consider following execution timeline for two threads, t1 and t2.
(should be 2 columns, I only have fixed-width fonts for display)
```
t1                                      t2 (mjit-worker)
-----------------------------------------------------------------------------

 vm->postponed_job_index increment #1
 pjob[0]->func = ...
 pjob[0]->data = ...
                                        vm->postponed_job_index increment #2

 RUBY_VM_SET_POSTPONED_JOB_INTERRUPT

 rb_postponed_job_flush
 sees result of increment #2 from t2
 tries to access pjob[1] => CRASH
                                       pjob[1]->func = ...
                                       pjob[1]->data = ...

                                       RUBY_VM_SET_POSTPONED_JOB_INTERRUPT
```

So it looks like we need a THREAD-SAFE (not necessarily async-safe)
version of rb_postponed_job_register.  Or a one-off API for MJIT...
job registration for MJIT.




-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [ruby-core:89853] Re: [Ruby trunk Bug#15316] rb_postponed_job_register not thread-safe
  2018-11-18  8:09 ` [ruby-core:89852] " takashikkbn
@ 2018-11-18  8:19   ` Eric Wong
  0 siblings, 0 replies; 10+ messages in thread
From: Eric Wong @ 2018-11-18  8:19 UTC (permalink / raw)
  To: ruby-core

takashikkbn@gmail.com wrote:
> Oh by the way, I have a patch which was created yesterday but
> I was too busy to commit that yet. (I'll commit that shortly.)

OK, please do.

I am also testing a minor cleanup:
https://80x24.org/spew/20181118081847.16293-1-e@80x24.org/raw
(but will probably be afk next 12 hours or so)

> I think thread-safety of postponed_job and supporting MJIT in
> forked Ruby process are independent issues, but maybe your
> "workqueue" will be helpful for forked child somehow?

Somewhat related.  We need to ensure a consistent state of the
queue (either postponed_job or workqueue).

Right now, we cannot guarantee a consistent state after forking;
either.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [ruby-core:89854] [Ruby trunk Bug#15316] rb_postponed_job_register not thread-safe
       [not found] <redmine.issue-15316.20181118044928@ruby-lang.org>
  2018-11-18  4:49 ` [ruby-core:89848] [Ruby trunk Bug#15316] rb_postponed_job_register not thread-safe normalperson
  2018-11-18  8:09 ` [ruby-core:89852] " takashikkbn
@ 2018-11-18  8:40 ` takashikkbn
  2018-11-19  0:07 ` [ruby-core:89858] " takashikkbn
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 10+ messages in thread
From: takashikkbn @ 2018-11-18  8:40 UTC (permalink / raw)
  To: ruby-core

Issue #15316 has been updated by k0kubun (Takashi Kokubun).


> OK, please do.

Done, in r65785. Comments from your point of view would be much appreciated.

> I am also testing a minor cleanup

looks fine.

> Somewhat related. We need to ensure a consistent state of the queue (either postponed_job or workqueue).

In a sense that postponed_job could be in an unexpected state by creating a child Ruby process which suddenly lost MJIT worker thread modifying postponed_job queue, agreed.

> Right now, we cannot guarantee a consistent state after forking; either.

Yeah, I found that issue when I was creating r65785. In r65785, I added `mjit_pause` before fork and `mjit_resume`s after fork (in different places depending on whether it's child or parent). As long as MJIT worker thread is absent when Ruby is forked, I think the situation became a little better.

----------------------------------------
Bug #15316: rb_postponed_job_register not thread-safe
https://bugs.ruby-lang.org/issues/15316#change-74912

* Author: normalperson (Eric Wong)
* Status: Open
* Priority: Normal
* Assignee: k0kubun (Takashi Kokubun)
* Target version: 
* ruby -v: 
* Backport: 2.3: UNKNOWN, 2.4: UNKNOWN, 2.5: UNKNOWN
----------------------------------------
Consider following execution timeline for two threads, t1 and t2.
(should be 2 columns, I only have fixed-width fonts for display)
```
t1                                      t2 (mjit-worker)
-----------------------------------------------------------------------------

 vm->postponed_job_index increment #1
 pjob[0]->func = ...
 pjob[0]->data = ...
                                        vm->postponed_job_index increment #2

 RUBY_VM_SET_POSTPONED_JOB_INTERRUPT

 rb_postponed_job_flush
 sees result of increment #2 from t2
 tries to access pjob[1] => CRASH
                                       pjob[1]->func = ...
                                       pjob[1]->data = ...

                                       RUBY_VM_SET_POSTPONED_JOB_INTERRUPT
```

So it looks like we need a THREAD-SAFE (not necessarily async-safe)
version of rb_postponed_job_register.  Or a one-off API for MJIT...
job registration for MJIT.




-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [ruby-core:89858] [Ruby trunk Bug#15316] rb_postponed_job_register not thread-safe
       [not found] <redmine.issue-15316.20181118044928@ruby-lang.org>
                   ` (2 preceding siblings ...)
  2018-11-18  8:40 ` [ruby-core:89854] " takashikkbn
@ 2018-11-19  0:07 ` takashikkbn
  2018-11-19  0:22   ` [ruby-core:89859] " Eric Wong
  2018-11-19  0:52 ` [ruby-core:89860] " takashikkbn
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 10+ messages in thread
From: takashikkbn @ 2018-11-19  0:07 UTC (permalink / raw)
  To: ruby-core

Issue #15316 has been updated by k0kubun (Takashi Kokubun).


Well, I seem to have introduced at least two deadlocks by that. I already fixed the first one, and I'm going to fix the second one http://ci.rvm.jp/results/trunk-mjit@silicon-docker/1467162.

I'll take a deeper look when I come home today, but do you know who wakes up rb_sigwait_sleep and/or how the hang there could happen?

----------------------------------------
Bug #15316: rb_postponed_job_register not thread-safe
https://bugs.ruby-lang.org/issues/15316#change-74917

* Author: normalperson (Eric Wong)
* Status: Open
* Priority: Normal
* Assignee: k0kubun (Takashi Kokubun)
* Target version: 
* ruby -v: 
* Backport: 2.3: UNKNOWN, 2.4: UNKNOWN, 2.5: UNKNOWN
----------------------------------------
Consider following execution timeline for two threads, t1 and t2.
(should be 2 columns, I only have fixed-width fonts for display)
```
t1                                      t2 (mjit-worker)
-----------------------------------------------------------------------------

 vm->postponed_job_index increment #1
 pjob[0]->func = ...
 pjob[0]->data = ...
                                        vm->postponed_job_index increment #2

 RUBY_VM_SET_POSTPONED_JOB_INTERRUPT

 rb_postponed_job_flush
 sees result of increment #2 from t2
 tries to access pjob[1] => CRASH
                                       pjob[1]->func = ...
                                       pjob[1]->data = ...

                                       RUBY_VM_SET_POSTPONED_JOB_INTERRUPT
```

So it looks like we need a THREAD-SAFE (not necessarily async-safe)
version of rb_postponed_job_register.  Or a one-off API for MJIT...
job registration for MJIT.




-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [ruby-core:89859] Re: [Ruby trunk Bug#15316] rb_postponed_job_register not thread-safe
  2018-11-19  0:07 ` [ruby-core:89858] " takashikkbn
@ 2018-11-19  0:22   ` Eric Wong
  0 siblings, 0 replies; 10+ messages in thread
From: Eric Wong @ 2018-11-19  0:22 UTC (permalink / raw)
  To: ruby-core

takashikkbn@gmail.com wrote:
> Well, I seem to have introduced at least two deadlocks by
> that. I already fixed the first one, and I'm going to fix the
> second one
> http://ci.rvm.jp/results/trunk-mjit@silicon-docker/1467162.
> 
> I'll take a deeper look when I come home today, but do you
> know who wakes up rb_sigwait_sleep and/or how the hang there
> could happen?

Any signal should wake it up...

If you can't solve it, can you wait until after the 11/22
developers meeting?

It's a small chance, but I really want to get Thread::Light
(auto-fiber) ready and into 2.6 (at least for Linux/FreeBSD),
and I barely have time for Ruby as it is.

It's not looking like I'll be able to afford to hack on Ruby
next year :<

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [ruby-core:89860] [Ruby trunk Bug#15316] rb_postponed_job_register not thread-safe
       [not found] <redmine.issue-15316.20181118044928@ruby-lang.org>
                   ` (3 preceding siblings ...)
  2018-11-19  0:07 ` [ruby-core:89858] " takashikkbn
@ 2018-11-19  0:52 ` takashikkbn
  2018-11-20  1:12 ` [ruby-core:89892] " takashikkbn
  2018-11-30  5:19 ` [ruby-core:90186] " takashikkbn
  6 siblings, 0 replies; 10+ messages in thread
From: takashikkbn @ 2018-11-19  0:52 UTC (permalink / raw)
  To: ruby-core

Issue #15316 has been updated by k0kubun (Takashi Kokubun).


I see.

> If you can't solve it, can you wait until after the 11/22
developers meeting?

Sure. I hope we'll release the fix on rc1, so I'll try to fix that anyway.

I hope auto-fiber will be in 2.6 too :)

----------------------------------------
Bug #15316: rb_postponed_job_register not thread-safe
https://bugs.ruby-lang.org/issues/15316#change-74919

* Author: normalperson (Eric Wong)
* Status: Open
* Priority: Normal
* Assignee: k0kubun (Takashi Kokubun)
* Target version: 
* ruby -v: 
* Backport: 2.3: UNKNOWN, 2.4: UNKNOWN, 2.5: UNKNOWN
----------------------------------------
Consider following execution timeline for two threads, t1 and t2.
(should be 2 columns, I only have fixed-width fonts for display)
```
t1                                      t2 (mjit-worker)
-----------------------------------------------------------------------------

 vm->postponed_job_index increment #1
 pjob[0]->func = ...
 pjob[0]->data = ...
                                        vm->postponed_job_index increment #2

 RUBY_VM_SET_POSTPONED_JOB_INTERRUPT

 rb_postponed_job_flush
 sees result of increment #2 from t2
 tries to access pjob[1] => CRASH
                                       pjob[1]->func = ...
                                       pjob[1]->data = ...

                                       RUBY_VM_SET_POSTPONED_JOB_INTERRUPT
```

So it looks like we need a THREAD-SAFE (not necessarily async-safe)
version of rb_postponed_job_register.  Or a one-off API for MJIT...
job registration for MJIT.




-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [ruby-core:89892] [Ruby trunk Bug#15316] rb_postponed_job_register not thread-safe
       [not found] <redmine.issue-15316.20181118044928@ruby-lang.org>
                   ` (4 preceding siblings ...)
  2018-11-19  0:52 ` [ruby-core:89860] " takashikkbn
@ 2018-11-20  1:12 ` takashikkbn
  2018-11-30  5:19 ` [ruby-core:90186] " takashikkbn
  6 siblings, 0 replies; 10+ messages in thread
From: takashikkbn @ 2018-11-20  1:12 UTC (permalink / raw)
  To: ruby-core

Issue #15316 has been updated by k0kubun (Takashi Kokubun).


> If you can't solve it, can you wait until after the 11/22 developers meeting?

As tracked in [Bug #15320], I succeeded to fix it in r 65817. So, please never mind about that.
http://ci.rvm.jp/results/trunk-mjit-wait@silicon-docker/1468912
http://ci.rvm.jp/results/trunk-mjit@silicon-docker/1468910

I'll take a look at postponed_job race then.

----------------------------------------
Bug #15316: rb_postponed_job_register not thread-safe
https://bugs.ruby-lang.org/issues/15316#change-74968

* Author: normalperson (Eric Wong)
* Status: Open
* Priority: Normal
* Assignee: k0kubun (Takashi Kokubun)
* Target version: 
* ruby -v: 
* Backport: 2.3: UNKNOWN, 2.4: UNKNOWN, 2.5: UNKNOWN
----------------------------------------
Consider following execution timeline for two threads, t1 and t2.
(should be 2 columns, I only have fixed-width fonts for display)
```
t1                                      t2 (mjit-worker)
-----------------------------------------------------------------------------

 vm->postponed_job_index increment #1
 pjob[0]->func = ...
 pjob[0]->data = ...
                                        vm->postponed_job_index increment #2

 RUBY_VM_SET_POSTPONED_JOB_INTERRUPT

 rb_postponed_job_flush
 sees result of increment #2 from t2
 tries to access pjob[1] => CRASH
                                       pjob[1]->func = ...
                                       pjob[1]->data = ...

                                       RUBY_VM_SET_POSTPONED_JOB_INTERRUPT
```

So it looks like we need a THREAD-SAFE (not necessarily async-safe)
version of rb_postponed_job_register.  Or a one-off API for MJIT...
job registration for MJIT.




-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [ruby-core:90186] [Ruby trunk Bug#15316] rb_postponed_job_register not thread-safe
       [not found] <redmine.issue-15316.20181118044928@ruby-lang.org>
                   ` (5 preceding siblings ...)
  2018-11-20  1:12 ` [ruby-core:89892] " takashikkbn
@ 2018-11-30  5:19 ` takashikkbn
  6 siblings, 0 replies; 10+ messages in thread
From: takashikkbn @ 2018-11-30  5:19 UTC (permalink / raw)
  To: ruby-core

Issue #15316 has been updated by k0kubun (Takashi Kokubun).


> r66100

Thanks to work on this, Eric.

----------------------------------------
Bug #15316: rb_postponed_job_register not thread-safe
https://bugs.ruby-lang.org/issues/15316#change-75307

* Author: normalperson (Eric Wong)
* Status: Closed
* Priority: Normal
* Assignee: k0kubun (Takashi Kokubun)
* Target version: 
* ruby -v: 
* Backport: 2.3: UNKNOWN, 2.4: UNKNOWN, 2.5: UNKNOWN
----------------------------------------
Consider following execution timeline for two threads, t1 and t2.
(should be 2 columns, I only have fixed-width fonts for display)
```
t1                                      t2 (mjit-worker)
-----------------------------------------------------------------------------

 vm->postponed_job_index increment #1
 pjob[0]->func = ...
 pjob[0]->data = ...
                                        vm->postponed_job_index increment #2

 RUBY_VM_SET_POSTPONED_JOB_INTERRUPT

 rb_postponed_job_flush
 sees result of increment #2 from t2
 tries to access pjob[1] => CRASH
                                       pjob[1]->func = ...
                                       pjob[1]->data = ...

                                       RUBY_VM_SET_POSTPONED_JOB_INTERRUPT
```

So it looks like we need a THREAD-SAFE (not necessarily async-safe)
version of rb_postponed_job_register.  Or a one-off API for MJIT...
job registration for MJIT.




-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2018-11-30  5:19 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <redmine.issue-15316.20181118044928@ruby-lang.org>
2018-11-18  4:49 ` [ruby-core:89848] [Ruby trunk Bug#15316] rb_postponed_job_register not thread-safe normalperson
2018-11-18  6:49   ` [ruby-core:89851] " Eric Wong
2018-11-18  8:09 ` [ruby-core:89852] " takashikkbn
2018-11-18  8:19   ` [ruby-core:89853] " Eric Wong
2018-11-18  8:40 ` [ruby-core:89854] " takashikkbn
2018-11-19  0:07 ` [ruby-core:89858] " takashikkbn
2018-11-19  0:22   ` [ruby-core:89859] " Eric Wong
2018-11-19  0:52 ` [ruby-core:89860] " takashikkbn
2018-11-20  1:12 ` [ruby-core:89892] " takashikkbn
2018-11-30  5:19 ` [ruby-core:90186] " takashikkbn

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).