ruby-core@ruby-lang.org archive (unofficial mirror)
* [ruby-core:91816] [Ruby trunk Feature#15663] Documenting autoload semantics
       [not found] <redmine.issue-15663.20190313182518@ruby-lang.org>
@ 2019-03-13 18:25 ` eregontp
  2019-03-13 21:49 ` [ruby-core:91820] " shevegen
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 7+ messages in thread
From: eregontp @ 2019-03-13 18:25 UTC (permalink / raw)
  To: ruby-core

Issue #15663 has been reported by Eregon (Benoit Daloze).

----------------------------------------
Feature #15663: Documenting autoload semantics
https://bugs.ruby-lang.org/issues/15663

* Author: Eregon (Benoit Daloze)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 
----------------------------------------
The semantics of autoload are extremely complicated.

As far as I can see, they are unfortunately not documented.

ruby/spec tries to test [many aspects](https://github.com/ruby/spec/blob/master/core/module/autoload_spec.rb) of it, `test/ruby/test_autoload.rb` has a few tests, and e.g. zeitwerk tests [some other parts](https://github.com/fxn/zeitwerk/blob/master/test/lib/zeitwerk/test_ruby_compatibility.rb).
One could of course read the MRI source code, but I find the code around `autoload` very hard to follow.

For context, I'm trying to implement `autoload` as correctly as possible in TruffleRuby and finding it very difficult given the inconsistencies (see below) and lack of documentation.

There is no document anywhere describing how it should behave, and given its complexity I am not even sure MRI behaves as expected.
Could we create this document?
For instance, there is such a [document for refinements](https://github.com/ruby/ruby/blob/trunk/doc/syntax/refinements.rdoc).

Here is an example of how confusing autoload can be, and I would love to hear the rationale or have some written semantics on why it is that way.

main.rb:
```ruby
require "pp"

$: << __dir__

Object.autoload(:Foo, "foo")

CHECK = -> state {
  checks = -> {
    {
      defined: defined?(Foo),
      const_defined: Object.const_defined?(:Foo),
      autoload?: Object.autoload?(:Foo),
      in_constants: Object.constants.include?(:Foo),
    }
  }

  pp when: state, **checks.call, other_thread: Thread.new { checks.call }.value
}

CHECK.call(:before_require)

if ARGV.first == "require"
  require "foo"
else
  Foo # trigger the autoload
end

CHECK.call(:after)

p Foo
```

foo.rb:
```ruby
CHECK.call(:during_before_defining)

module Foo
end

CHECK.call(:during_after_defining)
```

Here are the results for MRI 2.6.1:
```ruby
$ ruby main.rb        
{:when=>:before_require,
 :defined=>"constant",
 :const_defined=>true,
 :autoload?=>"foo",
 :in_constants=>true,
 :other_thread=>
  {:defined=>"constant",
   :const_defined=>true,
   :autoload?=>"foo",
   :in_constants=>true}}
{:when=>:during_before_defining,
 :defined=>nil,
 :const_defined=>false,
 :autoload?=>nil,
 :in_constants=>true,
 :other_thread=>
  {:defined=>"constant",
   :const_defined=>true,
   :autoload?=>"foo",
   :in_constants=>true}}
{:when=>:during_after_defining,
 :defined=>"constant",
 :const_defined=>true,
 :autoload?=>nil,
 :in_constants=>true,
 :other_thread=>
  {:defined=>"constant",
   :const_defined=>true,
   :autoload?=>"foo",
   :in_constants=>true}}
{:when=>:after,
 :defined=>"constant",
 :const_defined=>true,
 :autoload?=>nil,
 :in_constants=>true,
 :other_thread=>
  {:defined=>"constant",
   :const_defined=>true,
   :autoload?=>nil,
   :in_constants=>true}}
Foo
```

Looking at during_before_defining, the constant appears not defined to the thread performing the autoload, but appears defined, and still registered as an autoload, to other threads.

Now we can discover other subtle semantics, by using `require` on the autoload file instead of accessing the constant:

```ruby
$ ruby main.rb require 
{:when=>:before_require,
 :defined=>"constant",
 :const_defined=>true,
 :autoload?=>"foo",
 :in_constants=>true,
 :other_thread=>
  {:defined=>"constant",
   :const_defined=>true,
   :autoload?=>"foo",
   :in_constants=>true}}
{:when=>:during_before_defining,
 :defined=>nil,
 :const_defined=>false,
 :autoload?=>nil,
 :in_constants=>true,
 :other_thread=>
  {:defined=>nil, :const_defined=>false, :autoload?=>nil, :in_constants=>true}}
{:when=>:during_after_defining,
 :defined=>"constant",
 :const_defined=>true,
 :autoload?=>nil,
 :in_constants=>true,
 :other_thread=>
  {:defined=>"constant",
   :const_defined=>true,
   :autoload?=>nil,
   :in_constants=>true}}
{:when=>:after,
 :defined=>"constant",
 :const_defined=>true,
 :autoload?=>nil,
 :in_constants=>true,
 :other_thread=>
  {:defined=>"constant",
   :const_defined=>true,
   :autoload?=>nil,
   :in_constants=>true}}
Foo
```

Looking at during_before_defining, the other threads now also see the constant as not defined, although it is still in `Object.constants`.
But of course, the constant cannot be removed, as otherwise that would not be thread-safe and other threads would raise NameError when accessing the constant.
In fact, we can see other threads actually wait for the constant, by changing to `Thread.new { Foo; checks.call }`, and then we get a deadlock:

```
Traceback (most recent call last):
	2: from main.rb:20:in `<main>'
	1: from main.rb:17:in `block in <main>'
main.rb:17:in `value': No live threads left. Deadlock? (fatal)
3 threads, 3 sleeps current:0x00007f0124004cb0 main thread:0x000055929cc2c470
* #<Thread:0x000055929cc5b348 sleep_forever>
   rb_thread_t:0x000055929cc2c470 native:0x00007f013381d700 int:0
   main.rb:17:in `value'
   main.rb:17:in `block in <main>'
   main.rb:20:in `<main>'
* #<Thread:0x000055929ce2b380@main.rb:17 sleep_forever>
   rb_thread_t:0x000055929ce026d0 native:0x00007f0129007700 int:0
    depended by: tb_thread_id:0x000055929cc2c470
   main.rb:17:in `value'
   main.rb:17:in `block in <main>'
   foo.rb:1:in `<top (required)>'
   /home/eregon/.rubies/ruby-2.6.1/lib/ruby/2.6.0/rubygems/core_ext/kernel_require.rb:54:in `require'
   /home/eregon/.rubies/ruby-2.6.1/lib/ruby/2.6.0/rubygems/core_ext/kernel_require.rb:54:in `require'
   main.rb:17:in `block (2 levels) in <main>'
* #<Thread:0x000055929ce29c38@main.rb:17 sleep_forever>
   rb_thread_t:0x00007f0124004cb0 native:0x00007f0128e05700 int:0
    depended by: tb_thread_id:0x000055929ce026d0
   main.rb:17:in `block (2 levels) in <main>'
```

This is quite weird. Is the second behavior a bug?
Why should other threads suddenly see the constant as "not defined" while it is loading via `require` in the main thread?
It's also inconsistent with the first case.

I would have thought `require autoload_path` would basically do the same as triggering the autoload of the constant (such as `Foo`). But the results above show they differ.

There are many more complex cases for autoload, such as [this spec](https://github.com/ruby/spec/blob/72bd058b5cf0a9d9de5a188052db2fba021581cc/core/module/autoload_spec.rb#L360-L375), or how thread-safety is achieved when methods are defined incrementally in Ruby but the module is defined immediately.
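
To illustrate the last point, here is a minimal sketch (not taken from ruby/spec; the names `Partial` and `OBSERVED` are made up for illustration). It shows that the module constant is already assigned while the `module` body is still running, before all of its methods exist:

```ruby
# Hypothetical names, for illustration only.
OBSERVED = []

module Partial
  def self.first_method
    :first
  end

  # The constant Partial is already assigned at this point,
  # but second_method has not been defined yet.
  OBSERVED << [defined?(Partial), Partial.respond_to?(:second_method)]

  def self.second_method
    :second
  end
end

# After the body has finished, both methods exist.
OBSERVED << [defined?(Partial), Partial.respond_to?(:second_method)]
p OBSERVED
```

A thread that resolved the constant in the middle of the body would see a module with only some of its methods, which is presumably why other threads have to wait until the autoloaded file has finished loading.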

Who is knowledgeable about `autoload` and could answer these questions?
Could we start a document specifying the semantics?



-- 
https://bugs.ruby-lang.org/


* [ruby-core:91820] [Ruby trunk Feature#15663] Documenting autoload semantics
       [not found] <redmine.issue-15663.20190313182518@ruby-lang.org>
  2019-03-13 18:25 ` [ruby-core:91816] [Ruby trunk Feature#15663] Documenting autoload semantics eregontp
@ 2019-03-13 21:49 ` shevegen
  2019-03-20  2:48 ` [ruby-core:91890] " akr
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 7+ messages in thread
From: shevegen @ 2019-03-13 21:49 UTC (permalink / raw)
  To: ruby-core

Issue #15663 has been updated by shevegen (Robert A. Heiler).


May explain why matz wants to remove autoload in the long run. :)

----------------------------------------
Feature #15663: Documenting autoload semantics
https://bugs.ruby-lang.org/issues/15663#change-77092



* [ruby-core:91890] [Ruby trunk Feature#15663] Documenting autoload semantics
       [not found] <redmine.issue-15663.20190313182518@ruby-lang.org>
  2019-03-13 18:25 ` [ruby-core:91816] [Ruby trunk Feature#15663] Documenting autoload semantics eregontp
  2019-03-13 21:49 ` [ruby-core:91820] " shevegen
@ 2019-03-20  2:48 ` akr
  2019-03-20  3:33 ` [ruby-core:91891] " akr
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 7+ messages in thread
From: akr @ 2019-03-20  2:48 UTC (permalink / raw)
  To: ruby-core

Issue #15663 has been updated by akr (Akira Tanaka).


Eregon (Benoit Daloze) wrote:

> I would have thought `require autoload_path` would basically do the same as triggering the autoload of the constant (such as `Foo`). But the results above show they differ.

Agreed.

I think no one has seriously considered requiring a library that is used for autoload.

> Who is knowledgeable about `autoload` and could answer these questions?
> Could we start a document specifying the semantics?

I think we should start to make autoload semantics simpler by introducing a "global autoload lock"
as I described in https://bugs.ruby-lang.org/issues/15598 .
This is needed because Ruby does not (and cannot) know the dependencies of autoloaded libraries before loading them.
It makes autoload-related procedures single-threaded, which is much simpler than the multi-threaded case.
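
As a rough sketch of the idea (this is not the design from #15598; `GLOBAL_AUTOLOAD_LOCK`, `LOAD_ORDER` and `locked_load` are hypothetical names), a single re-entrant lock around every load would let the loading thread trigger nested loads while all other threads wait:

```ruby
require "monitor"

# Hypothetical: one process-wide, re-entrant lock for all autoload-triggered loads.
GLOBAL_AUTOLOAD_LOCK = Monitor.new
LOAD_ORDER = []

# Stand-in for "load the file behind an autoload".
# The block plays the role of the file body.
def locked_load(name)
  GLOBAL_AUTOLOAD_LOCK.synchronize do
    LOAD_ORDER << [:start, name]
    yield if block_given?
    LOAD_ORDER << [:finish, name]
  end
end

threads = [
  # The nested load re-enters the lock held by the same thread.
  Thread.new { locked_load(:a) { locked_load(:a_dependency) } },
  Thread.new { locked_load(:b) },
]
threads.each(&:join)

# Which thread goes first is not deterministic, but loads never interleave.
p LOAD_ORDER
```

A `Monitor` is used rather than a `Mutex` because a file being loaded may itself trigger further autoloads on the same thread.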





----------------------------------------
Feature #15663: Documenting autoload semantics
https://bugs.ruby-lang.org/issues/15663#change-77214



* [ruby-core:91891] [Ruby trunk Feature#15663] Documenting autoload semantics
       [not found] <redmine.issue-15663.20190313182518@ruby-lang.org>
                   ` (2 preceding siblings ...)
  2019-03-20  2:48 ` [ruby-core:91890] " akr
@ 2019-03-20  3:33 ` akr
  2019-03-20 11:03 ` [ruby-core:91896] " eregontp
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 7+ messages in thread
From: akr @ 2019-03-20  3:33 UTC (permalink / raw)
  To: ruby-core

Issue #15663 has been updated by akr (Akira Tanaka).


akr (Akira Tanaka) wrote:

> > Could we start a document specifying the semantics?
> 
> I think we should start to make autoload semantics simpler by introducing a "global autoload lock"
> as I described in https://bugs.ruby-lang.org/issues/15598 .

Of course, there is no problem with describing the single-threaded semantics.

----------------------------------------
Feature #15663: Documenting autoload semantics
https://bugs.ruby-lang.org/issues/15663#change-77215



* [ruby-core:91896] [Ruby trunk Feature#15663] Documenting autoload semantics
       [not found] <redmine.issue-15663.20190313182518@ruby-lang.org>
                   ` (3 preceding siblings ...)
  2019-03-20  3:33 ` [ruby-core:91891] " akr
@ 2019-03-20 11:03 ` eregontp
  2019-03-20 13:31 ` [ruby-core:91900] " akr
  2019-04-20  1:44 ` [ruby-core:92337] " fxn
  6 siblings, 0 replies; 7+ messages in thread
From: eregontp @ 2019-03-20 11:03 UTC (permalink / raw)
  To: ruby-core

Issue #15663 has been updated by Eregon (Benoit Daloze).


akr (Akira Tanaka) wrote:
> Eregon (Benoit Daloze) wrote:
> 
> > I would have thought `require autoload_path` would basically do the same as triggering the autoload of the constant (such as `Foo`). But the results above show they differ.
> 
> Agreed.

It seems tricky implementation-wise: constant resolution (e.g. #const_get) has the constant data structure, and #require has the expanded file path and the related lock, but neither has both.

I was thinking that maybe the per-path require lock should be used for everything. But since #autoload calls `require` dynamically, how do we keep track of which thread is loading the constant, and so should observe the autoload constant as "not defined" while loading it, without deadlocks?
Other threads might try to load the constant too, or require the autoload path, and only one thread should be the loading thread for that constant.
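
One way to picture the bookkeeping in question is the following sketch. It is not how MRI or TruffleRuby implement it; `AutoloadEntry` and its methods are hypothetical, and it only models the per-thread view of `autoload?` seen in the first experiment.

```ruby
# Hypothetical model of one autoload constant; not an actual implementation.
class AutoloadEntry
  def initialize(path)
    @path = path
    @loading_thread = nil
    @mutex = Mutex.new
  end

  # Returns true only for the single thread that becomes the loader.
  def claim_loading
    @mutex.synchronize do
      return false if @loading_thread
      @loading_thread = Thread.current
      true
    end
  end

  # Models autoload?: nil for the loading thread, the path for every other thread.
  def autoload_path_for(thread = Thread.current)
    @mutex.synchronize { @loading_thread.equal?(thread) ? nil : @path }
  end
end

entry = AutoloadEntry.new("foo")
main_claimed = entry.claim_loading
other_claimed = Thread.new { entry.claim_loading }.value
p main_claimed: main_claimed, other_claimed: other_claimed
p main_view: entry.autoload_path_for,
  other_view: Thread.new { entry.autoload_path_for }.value
```

The sketch leaves out the hard part: `claim_loading` would have to be called from inside `Kernel#require`, which knows the path but not the entry.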

I'd be tempted to have constant resolution do nothing more than call `require`, but then `Kernel#require`, when it starts loading the file, would also need to mark the autoload constant as being loaded by the current thread (so that the constant appears "not defined").
How would that information (e.g., the constant data structure) be passed from constant resolution down to Kernel#require?
The require `feature` could be changed by a user-defined `require`. And a user-defined `require` might very well `require` other files for its own logic (e.g., rubygems files).

> I think no one seriously considered about requiring a library used for autoload.

`lib/net/http.rb` has `autoload :OpenSSL, 'openssl'` and therefore just `require "net/http"; require "openssl"` produces such a case.

> I think we should start to make autoload semantics simpler by introducing "global autoload lock"
> as I described in https://bugs.ruby-lang.org/issues/15598 .
> This is needed because ruby doesn't (cannot) know dependencies of autoloaded libraries before loading.
> It makes autoload related procedure single threaded which is much simpler than multi threads.

That would simplify things, but then I think it would also need to be a global require lock, and I think that can be problematic for compatibility:
e.g., what if a required file starts a server and so the `require` never ends, and later on another thread wants to `require` some code?
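
To make the concern concrete, here is a rough model, not MRI's actual implementation: a single `Mutex` stands in for a hypothetical global require lock, one thread plays a `require` whose file never returns because it runs a server loop, and a second thread then tries to `require` something else. The constant `GLOBAL_LOAD_LOCK` and the method name are assumptions made up for the sketch.

```ruby
# Hypothetical stand-in for a process-wide require lock.
GLOBAL_LOAD_LOCK = Mutex.new

# Returns true if a second "require" cannot get the lock while a
# long-running "require" still holds it.
def blocked_by_long_running_require?
  started = Queue.new
  release = Queue.new

  # Plays the role of `require "server"`, whose top-level code never returns.
  server = Thread.new do
    GLOBAL_LOAD_LOCK.synchronize do
      started << true
      release.pop # stands in for the server loop; ended only for this demo
    end
  end

  started.pop # wait until the lock is actually held

  # Plays the role of another thread calling `require` later on.
  # try_lock is used so the demo reports the problem instead of hanging.
  acquired = Thread.new { GLOBAL_LOAD_LOCK.try_lock }.value

  release << true
  server.join
  !acquired
end

p blocked_by_long_running_require?
```

With a blocking lock instead of `try_lock`, the second thread would simply wait for as long as the server runs.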

----------------------------------------
Feature #15663: Documenting autoload semantics
https://bugs.ruby-lang.org/issues/15663#change-77222

* Author: Eregon (Benoit Daloze)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 
----------------------------------------


-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [ruby-core:91900] [Ruby trunk Feature#15663] Documenting autoload semantics
       [not found] <redmine.issue-15663.20190313182518@ruby-lang.org>
                   ` (4 preceding siblings ...)
  2019-03-20 11:03 ` [ruby-core:91896] " eregontp
@ 2019-03-20 13:31 ` akr
  2019-04-20  1:44 ` [ruby-core:92337] " fxn
  6 siblings, 0 replies; 7+ messages in thread
From: akr @ 2019-03-20 13:31 UTC (permalink / raw)
  To: ruby-core

Issue #15663 has been updated by akr (Akira Tanaka).


Eregon (Benoit Daloze) wrote:

> I was thinking that maybe the per-path require lock should be used for everything. But since #autoload calls `require` dynamically, how do we keep track of which thread is loading the constant, and so should observe the autoload constant as "not defined" while loading it, without deadlocks?
> Other threads might try to load the constant too, or require the autoload path, and only one thread should be the loading thread for that constant.

I think "lock per path" can cause a deadlock with "mutual require", similar to
https://bugs.ruby-lang.org/issues/15598 .
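
As a rough illustration of that kind of deadlock (a model with plain `Mutex` objects, not the code from the linked issue): assume each path has its own lock, thread 1 is loading `a.rb` and needs `b.rb`, while thread 2 is loading `b.rb` and needs `a.rb`. The method name and the two-lock setup are assumptions for the sketch, and `try_lock` is used so that it reports the situation rather than hanging.

```ruby
# Returns true if neither thread can take the other's per-path lock
# while both still hold their own, i.e. blocking locks would deadlock.
def mutual_require_would_deadlock?
  lock_a = Mutex.new # hypothetical per-path lock for a.rb
  lock_b = Mutex.new # hypothetical per-path lock for b.rb

  holding = Queue.new
  tried = Queue.new
  start = [Queue.new, Queue.new]
  finish = [Queue.new, Queue.new]

  pairs = [[lock_a, lock_b], [lock_b, lock_a]]
  threads = pairs.each_with_index.map do |(own, other), i|
    Thread.new do
      own.synchronize do       # "loading" its own file
        holding << i
        start[i].pop           # wait until both threads hold their lock
        acquired = other.try_lock
        other.unlock if acquired
        tried << i
        finish[i].pop          # keep holding until both have tried
        acquired
      end
    end
  end

  2.times { holding.pop }
  start.each { |queue| queue << :go }
  2.times { tried.pop }
  finish.each { |queue| queue << :done }

  threads.map(&:value).none?
end

p mutual_require_would_deadlock?
```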

> That would simplify things, but then I think it would also need to be a global require lock, and I think that can be problematic for compatibility:
> e.g., what if a required file starts a server and so the `require` never ends, and later on another thread wants to `require` some code?

Why would it "also need to be a global require lock"?




----------------------------------------
Feature #15663: Documenting autoload semantics
https://bugs.ruby-lang.org/issues/15663#change-77226

* Author: Eregon (Benoit Daloze)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 
----------------------------------------


-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [ruby-core:92337] [Ruby trunk Feature#15663] Documenting autoload semantics
       [not found] <redmine.issue-15663.20190313182518@ruby-lang.org>
                   ` (5 preceding siblings ...)
  2019-03-20 13:31 ` [ruby-core:91900] " akr
@ 2019-04-20  1:44 ` fxn
  6 siblings, 0 replies; 7+ messages in thread
From: fxn @ 2019-04-20  1:44 UTC (permalink / raw)
  To: ruby-core

Issue #15663 has been updated by fxn (Xavier Noria).


Let me share some thoughts that won't help much, but I would like to contribute anyway :).

To me it is a surprise that constants for which there is an autoload are treated as existing by the constants API. My basic observation is that you don't know if the constant will actually be there until you execute the require. The require could fail for many reasons, from non-existent files, to syntax errors, to files that do not actually define the constant, so the constant may never materialize.

For me the semantics would be easier if autoloads were treated separately. For example, `const_defined?` and `defined?` would return `false` for autoloads, `constants` would not include them, etc. You would have _actually existing constants_ on one side and autoloads on the other, with `autoload?` to introspect the autoloads if you need to. You would need `remove_autoload` perhaps... you see the mental model: two separate collections.

Of course, none of this is backwards compatible, so it is surely of no practical value for this thread.
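
The "two separate collections" model above could be sketched roughly like this. `TwoTableNamespace` and all of its methods are hypothetical, including `remove_autoload`; this is a toy model of the idea, not how `Module` works or a proposed API.

```ruby
# Toy model: actual constants and pending autoloads live in separate tables.
class TwoTableNamespace
  def initialize
    @constants = {} # name => value, only constants that really exist
    @autoloads = {} # name => path, only pending autoloads
  end

  # Registers an autoload unless the constant already exists.
  def autoload(name, path)
    @autoloads[name] = path unless @constants.key?(name)
  end

  # Introspects the autoload table only.
  def autoload?(name)
    @autoloads[name]
  end

  def remove_autoload(name)
    @autoloads.delete(name)
  end

  # Consults the constants table only, so a pending autoload is "not defined".
  def const_defined?(name)
    @constants.key?(name)
  end

  def constants
    @constants.keys
  end

  # Defining the constant moves the name out of the autoload table.
  def const_set(name, value)
    @autoloads.delete(name)
    @constants[name] = value
  end
end

ns = TwoTableNamespace.new
ns.autoload(:Foo, "foo")
p ns.const_defined?(:Foo) # pending autoload, not a constant yet
ns.const_set(:Foo, Module.new)
p ns.autoload?(:Foo)      # no longer an autoload once defined
```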

----------------------------------------
Feature #15663: Documenting autoload semantics
https://bugs.ruby-lang.org/issues/15663#change-77680

* Author: Eregon (Benoit Daloze)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 
----------------------------------------


-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2019-04-20  1:44 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <redmine.issue-15663.20190313182518@ruby-lang.org>
2019-03-13 18:25 ` [ruby-core:91816] [Ruby trunk Feature#15663] Documenting autoload semantics eregontp
2019-03-13 21:49 ` [ruby-core:91820] " shevegen
2019-03-20  2:48 ` [ruby-core:91890] " akr
2019-03-20  3:33 ` [ruby-core:91891] " akr
2019-03-20 11:03 ` [ruby-core:91896] " eregontp
2019-03-20 13:31 ` [ruby-core:91900] " akr
2019-04-20  1:44 ` [ruby-core:92337] " fxn
