ruby-core@ruby-lang.org archive (unofficial mirror)
* [ruby-core:95344] [Ruby master Feature#16254] MRI internal: Define built-in classes in Ruby with `__intrinsic__` syntax
       [not found] <redmine.issue-16254.20191015185758@ruby-lang.org>
@ 2019-10-15 18:58 ` ko1
  2019-10-15 20:01 ` [ruby-core:95345] " eregontp
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 6+ messages in thread
From: ko1 @ 2019-10-15 18:58 UTC (permalink / raw)
  To: ruby-core

Issue #16254 has been reported by ko1 (Koichi Sasada).

----------------------------------------
Feature #16254: MRI internal: Define built-in classes in Ruby with `__intrinsic__` syntax
https://bugs.ruby-lang.org/issues/16254

* Author: ko1 (Koichi Sasada)
* Status: Open
* Priority: Normal
* Assignee: ko1 (Koichi Sasada)
* Target version: 
----------------------------------------
# Abstract

MRI defines most of its built-in classes in C, using C-APIs like `rb_define_method()`.
However, using the C-APIs has several issues.

A few methods are defined in Ruby, in `prelude.rb`.
However, we cannot define all classes this way because Ruby code cannot touch the deep internal data structures.
Furthermore, there are performance issues if we write everything in Ruby.

To solve this situation, I suggest writing built-in classes in Ruby with C intrinsic functions.
This proposal is the same as my RubyKaigi 2019 talk <https://rubykaigi.org/2019/presentations/ko1.html>.

# Terminology

* C-methods: methods defined in C (defined with `rb_define_method()`, etc).
* Ruby-methods: methods defined in Ruby.
* ISeq: The body of a `RubyVM::InstructionSequence` object, which represents bytecode for the VM.

# Background / Problem / Idea

## Written in C

As you MRI developers know, most methods are written in C with C-APIs.
However, there are several issues.

### (1) Annotation issues (compare with Ruby methods)

For example, C-methods defined by C-APIs don't have the `parameters` information returned by `Method#parameters`, because there is no way to define parameter names for C-methods.
There are proposals to add parameter name information for C-methods; however, I think they would introduce new, complex C-APIs and additional overhead at boot time.

-> Idea: Writing methods in Ruby will solve this issue.
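
As a small illustration of the difference (not part of the proposal itself), the sketch below compares the reflection data of a Ruby-method with that of a C-method. `ParamsDemo` is a made-up module, and the comment about `String#center` assumes an MRI where that method is still defined in C.

```ruby
# A Ruby-method: parameter kinds and names are available via reflection.
module ParamsDemo
  def self.center(str, width, padstr = " ")
    str.center(width, padstr)
  end
end

ruby_params = ParamsDemo.method(:center).parameters
c_params    = String.instance_method(:center).parameters

p ruby_params # => [[:req, :str], [:req, :width], [:opt, :padstr]]
p c_params    # carries no parameter names while String#center is a C-method
```
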

### (2) Annotation issues (for further optimization)

It is useful to know a method's attributes, for example, that the method causes no side effects (a pure method).
Labeling all methods, including methods in user programs, doesn't seem like a good idea (not the Ruby way). But I think annotating built-in methods is a good approach because we can manage them (and we can remove the annotations once we have a good analyzer).

There is currently no way to annotate this kind of attribute.

-> Idea: Writing methods in Ruby will make it easy to introduce new annotations.

### (3) Performance issue

Several features are slower in C-methods than in Ruby-methods.

* Exception handling (`rb_ensure()`, etc.), because C-methods need to capture the context with `setjmp`. Ruby-methods don't need to capture any context for exception handling.
* Passing keyword parameters, because Ruby-methods don't need to make a Hash object when the keyword parameters are passed explicitly (`foo(k1: v1, k2: v2)`).

-> Idea: Writing methods in Ruby makes them faster.

### (4) Productivity

It is tough to write some features in C.

For example, it is easy to write `rescue` syntax in Ruby:

```ruby
# in Ruby
def dummy_func_rescue
  nil
rescue
  nil
end
```

But it is difficult to write/read in C:

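```C
#include "ruby.h"

/* Protected body: runs first. */
static VALUE
dummy_body(VALUE self)
{
    return Qnil;
}

/* Rescue handler: runs only if dummy_body raises a StandardError.
 * It receives the data argument and the exception object. */
static VALUE
dummy_rescue(VALUE self, VALUE exc)
{
    return Qnil;
}

static VALUE
dummy_func_rescue(VALUE self)
{
    return rb_rescue(dummy_body, self, dummy_rescue, self);
}
```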

(A trained MRI developer might say it is not tough, though :p)

-> Idea: Writing methods in Ruby makes this easy.

### (5) API change

To introduce `Guild`, I want to pass a "context" parameter (as the first parameter) to each C function, like `mrb_state` in mruby.
This is because getting it from TLS (thread-local storage) is a high-cost operation in a dynamic library (libruby).

Probably nobody will allow me to change the specification of the functions used by `rb_define_method()`.

-> Idea: By introducing a new method definition framework, we can move to and change the specification, I hope.
Of course, we can keep the current `rb_define_method()` APIs (with additional cost on a `Guild`-enabled MRI).

## Written in Ruby in `prelude.rb`

There is a file `prelude.rb` which is loaded at boot time.
This file is used to define several methods, for example to reduce keyword parameter overhead (`IO#read_nonblock`, `TracePoint#enable`).

However, writing all methods in Ruby is not possible because of:

* (1) feasibility issue (we cannot access internal data structures)
* (2) performance issue (slow in general, of course)
* (3) atomicity issue (GVL/GIL)

To solve (1), we can provide low-level C-methods to implement high-level (normal built-in) methods. However, issues (2) and (3) are not solved.
(From a CS researcher's perspective, a clever compiler would solve them, as on the JVM, etc., but we don't have one yet.)

-> Idea: Writing the method body in C is feasible.

# Proposal

(1) Introduce an `intrinsic` mechanism to define built-in methods in Ruby.
(2) Load from a binary format to reduce startup time.

## (1) Intrinsic function

### Calling intrinsic function syntax in Ruby

To define built-in methods, introduce a special Ruby syntax `__intrinsic__.func(args)`.
In this case, the registered intrinsic function `func()` is called with `args`.

In a normal Ruby program, `__intrinsic__` is a local variable or a method.
However, when running in a special mode, it is parsed as an intrinsic function call.

Intrinsic functions cannot be called with:

* block 
* keyword arguments
* splat arguments

### Development step with intrinsic functions

(1) Write a class/module in Ruby with intrinsic functions.

```ruby
# string.rb
class String
  def length
    __intrinsic__.str_length
  end
end
```

(2) Implement intrinsic functions.

They are almost the same as the functions used by `rb_define_method()`.
However, they accept a context parameter as the first parameter.

(`rb_execution_context_t` is too long, so we can rename it, to `rb_state` for example.)

```C
static VALUE
str_length(rb_execution_context_t *ec, VALUE self)
{
  return LONG2NUM(RSTRING_LEN(self));
}
```

(3) Define an intrinsic function table and load the `.rb` file with the table.

```C
void
Init_String(void)
{
  ...
  static const rb_export_intrinsic_t table[] = {
    RB_EXPORT_INTRINSIC(str_length, 0), // 0 is arity
    ...
  };
  rb_vm_builtin_load("string", table);
}
```

### Example

There are two examples:

(1) Comparable module: https://gist.github.com/ko1/7f18e66d1ae25bb30c7e823aa57f0d31
(2) TracePoint class: https://gist.github.com/ko1/969e5690cda6180ed989eb79619ca612

## (2) Load from binary file with lazy loading

Loading many ".rb" files slows down startup.

We have the `ISeq#to_binary` method to generate compiled binary data, so we can eliminate parse/compile time.
Fortunately, [Feature #16163] makes the binary data small.
Furthermore, enabling the "lazy loading" feature improves startup time because we don't need to generate complete ISeqs. `USE_LAZY_LOAD` in vm_core.h enables this feature.

We need to combine the binary data with the executable. There are several ways (converting it into a C array, concatenating with objcopy if available, and so on).
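
For readers unfamiliar with the binary format, here is a minimal round trip using the existing `RubyVM::InstructionSequence` API on MRI. `BinaryDemo` is a made-up class, and the sketch only shows serialization and loading at the Ruby level; how the proposal embeds the data in `ruby` (`libruby`) is a separate matter.

```ruby
src = <<~RUBY
  class BinaryDemo
    def answer
      42
    end
  end
RUBY

iseq   = RubyVM::InstructionSequence.compile(src)
binary = iseq.to_binary # String holding the serialized bytecode

# Later (for example at boot), load the bytecode without parsing or compiling.
loaded = RubyVM::InstructionSequence.load_from_binary(binary)
loaded.eval # runs the class definition

p BinaryDemo.new.answer # => 42
```
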

# Evaluation

The evaluation is described in my RubyKaigi 2019 presentation: <https://rubykaigi.org/2019/presentations/ko1.html>

Points:

* Calling overhead of Ruby methods with intrinsic functions
  * In the normal case, it is almost the same as C-methods, using optimized VM instructions.
  * With keyword parameters, it is faster than C-methods.
  * With optional parameters, it is 2x slower, so this should be solved (*1).

* Loading overhead
  * Requiring ".rb" files is about 15x slower than defining C-methods.
  * Loading binary data with the lazy loading technique is about 2x slower than C-methods. Not such a bad result.
  * At RubyKaigi 2019, the binary data was huge, but [Feature #16163] reduces its size.

[*1] Introducing a special "overloading" specifier can solve it because we don't need to assign optional parameters. The first method lookup can be slower, but we can cache the method lookup results (with arity).

```ruby
# example syntax
overload def foo(a)
  __intrinsic__.foo1(a)
end
overload def foo(a, b)
  __intrinsic__.foo2(a, b)
end
```
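
To make the intended semantics concrete, arity-based dispatch can be emulated in today's Ruby. This is a sketch only: `OverloadSketch` and `OverloadDemo` are hypothetical names, and the dispatcher below is slower than a normal call, whereas the proposal would resolve the overload inside the VM and cache the result.

```ruby
# Emulates `overload def ...` by storing one implementation per arity.
module OverloadSketch
  def overload(name)
    impl = instance_method(name) # the method that `def` just defined
    @overloads ||= Hash.new { |hash, key| hash[key] = {} }
    table = @overloads[name]
    table[impl.arity] = impl

    # Replace the method with a dispatcher that selects by argument count.
    define_method(name) do |*args|
      target = table[args.size]
      raise ArgumentError, "no overload of #{name} takes #{args.size} arguments" unless target
      target.bind(self).call(*args)
    end
    name
  end
end

class OverloadDemo
  extend OverloadSketch

  overload def foo(a)
    [:foo1, a]
  end

  overload def foo(a, b)
    [:foo2, a, b]
  end
end

p OverloadDemo.new.foo(1)    # => [:foo1, 1]
p OverloadDemo.new.foo(1, 2) # => [:foo2, 1, 2]
```
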

# Implementation

Done:
* Compiling calls to intrinsic functions (.rb)
* Exporting the intrinsic function table (.c)

Not yet:
* Loading-from-binary mechanism
* Attribute syntax
* Replacement of most built-in classes

Now, miniruby and ruby (libruby) load '*.rb' files directly. However, ruby (libruby) should load the compiled binary file.

# Discussion

## Do we rewrite all built-in classes at once?

No. We can try it and migrate them step by step.

## Do we support the intrinsic mechanism for C-extension libraries?

Maybe in the future. For now, we can try it on the MRI core.

## `__intrinsic__` keyword

In my RubyKaigi 2019 talk, I proposed `__C__`, but I think `__intrinsic__` is more descriptive (though a bit long).
Another idea is `RubyVM::intrinsic.func(...)`.

I have no strong opinion. We can change this syntax until we expose it to C-extensions.

## Can we support `__intrinsic__` in normal Ruby script?

No. This feature is only for built-in features.
As I described, the intrinsic function call syntax has several restrictions compared with normal method calls, so I think it should not be exposed to normal Ruby programs, IMO.

## Should we maintain intrinsic function table?

For now, yes. And we need to generate this table automatically because manual operations can introduce mistakes very easily.

The corresponding ".rb" file (`trace_point.rb`, for example) knows which intrinsic functions are needed.
Parsing the ".rb" file can generate the table automatically.
However, we need the latest version of Ruby to parse the scripts if they use syntax which is supported only by the latest version of Ruby.

For example, we need Ruby 2.7 (master) to parse a script which uses pattern matching syntax.
However, the system's ruby (`BASE_RUBY`) is likely to be an older version.
This is a bootstrap ("chicken-and-egg") problem.

There are several ideas.

(1) Parse a ".c" file to generate a table using a function attribute.

```C
INTRINSIC_FUNCTION static VALUE
str_length(...)
...
```

(2) Build another Ruby parser from the source code, "parse-ruby".

* 1. Generate parse-ruby from C code.
* 2. Run parse-ruby to generate the tables by parsing ".rb" files. This process is written in C.
* 3. Build miniruby and ruby with the generated tables.

We can do it, but it introduces a new, complex build process.

(3) Restrict ".rb" syntax

Restrict built-in ".rb" files to the syntax which `BASE_RUBY` can parse.
It is easy to list the intrinsic functions using Ripper, AST, or `ISeq#to_a`.

(3) is the easiest but not so cool.
(2) is flexible, but it has an implementation cost and increases build complexity.
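
As a rough sketch of idea (3), the intrinsic function names used by a ".rb" file can be collected with Ripper's lexer. `intrinsic_names` is a hypothetical helper, and the approach assumes the script only uses syntax that the running Ruby can tokenize; generating the actual C table from the names is left out.

```ruby
require "ripper"

# Collects NAME from every `__intrinsic__.NAME` token sequence in the source.
def intrinsic_names(source)
  tokens = Ripper.lex(source).map { |_pos, type, text| [type, text] }
  names = tokens.each_cons(3).map do |(type1, text1), (type2, _), (type3, text3)|
    next unless type1 == :on_ident && text1 == "__intrinsic__"
    next unless type2 == :on_period && type3 == :on_ident
    text3
  end
  names.compact.uniq
end

source = <<~'RUBY'
  class String
    def length
      __intrinsic__.str_length
    end

    def foo(a, b)
      __intrinsic__.foo2(a, b)
    end
  end
RUBY

p intrinsic_names(source) # => ["str_length", "foo2"]
```
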


## Path of '*.rb' files, and whether to install them

The path of `prelude.rb` is `<internal:prelude>`. We have several options.

* (1) Don't install ".rb" files and make these paths `<internal:trace_point.rb>`, for example.
* (2) Install ".rb" files and make these paths non-existent paths such as `<internal>/installdir/lib/builtin/trace_point.rb`.
* (3) Install ".rb" files and make these paths real paths.

We will translate ".rb" files into binary data and link them into `ruby` (`libruby`).
So modifying the installed ".rb" files does not affect the behavior. That can cause confusion, which is why I wrote (1) and (2).

For (3), it is possible to detect modified ".rb" files (maybe by modification date) and load from them. But that introduces an overhead (disk access).

## Compatibility issue?

There are several compatibility issues. For example, `TracePoint` `c-call` events change to `call` events.
There are more incompatibilities.
We need to check them carefully.
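
The `TracePoint` difference can be observed on current MRI with a short script. `ruby_upcase` is a made-up Ruby-method, and the sketch assumes `String#upcase` is a C-method on the running interpreter.

```ruby
# A Ruby-method wrapping a C-method.
def ruby_upcase(str)
  str.upcase
end

events = []
trace = TracePoint.new(:call, :c_call) do |tp|
  events << [tp.event, tp.method_id]
end

trace.enable do
  "abc".upcase       # reported as :c_call while String#upcase is a C-method
  ruby_upcase("abc") # reported as :call, followed by the inner :c_call
end

p events
```

A tool that filters on `:c_call` would stop seeing a built-in method once it is rewritten in Ruby, which is the kind of incompatibility meant here.
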

## Bootstrap issue?

Yes, there are.

Loading `.rb` files during interpreter boot can cause problems.
For example, before the String class is initialized, the class of a String literal is 0 (because the String class has not been created yet).

I introduced several workarounds, but more modifications are needed.

# Conclusion

How about introducing this mechanism and trying it in Ruby 2.7?
We can revert these changes if we find any trouble, as long as we don't expose this mechanism and keep the changes internal.




-- 
https://bugs.ruby-lang.org/


* [ruby-core:95345] [Ruby master Feature#16254] MRI internal: Define built-in classes in Ruby with `__intrinsic__` syntax
       [not found] <redmine.issue-16254.20191015185758@ruby-lang.org>
  2019-10-15 18:58 ` [ruby-core:95344] [Ruby master Feature#16254] MRI internal: Define built-in classes in Ruby with `__intrinsic__` syntax ko1
@ 2019-10-15 20:01 ` eregontp
  2019-10-16  2:04 ` [ruby-core:95350] " daniel
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 6+ messages in thread
From: eregontp @ 2019-10-15 20:01 UTC (permalink / raw)
  To: ruby-core

Issue #16254 has been updated by Eregon (Benoit Daloze).


This sounds great.

We have a very similar mechanism in TruffleRuby, inherited from Rubinius, which is called "primitives" (à la Smalltalk).
Compared to Rubinius, we changed the syntax and always use the `invoke_primitive` form (like `__intrinsic__` above), not the "try the intrinsic, and if it fails fall back to the Ruby code below" form (Smalltalk-style).

It looks like this:
```ruby
class WeakRef < Delegator
  def initialize(obj)
    TrufflePrimitive.weakref_set_object(self, obj)
  end
end
```
(from https://github.com/oracle/truffleruby/blob/44e61173f0661c41dbf9a4c7a229091cf6ab83e3/lib/truffle/weakref.rb
see more examples with https://github.com/oracle/truffleruby/search?q=TrufflePrimitive&unscoped_q=TrufflePrimitive )

We recently changed from `Truffle.invoke_primitive(:weakref_set_object, self, obj)` to `TrufflePrimitive.weakref_set_object(self, obj)` because:
* It's more concise and arguably easier to read.
* As it's like a normal method call, we can generate stub definitions in IDEs (e.g., in IntelliJ), which allows jumping from Ruby code to the corresponding Java code of TruffleRuby.

Implementation-wise, the code above will generate an AST of the form
```
MethodNode(name=initialize, args=obj, body=[
  ReadArgumentNode,
  WeakRefSetObjectNode(children=[ReadSelfNode, ReadLocalNode(name=obj)])
])
```
I.e., the intrinsic node is directly in the AST of the method in TruffleRuby.
Only a fixed number of positional arguments is allowed. That way, there is no code for argument handling; argument values are passed directly to the WeakRefSetObjectNode, since arguments are simply child nodes.

The main reason I'm detailing this is that I think this is an opportunity to standardize the syntax for "primitives"/"intrinsics".
No matter the implementation language, it seems the concept of "primitives"/"intrinsics" is universal.

A common syntax for intrinsics/primitives would allow sharing Ruby code for core classes using these intrinsics/primitives.
It might look like it's little Ruby code, but I'm confident it will grow.
For instance, I wouldn't be surprised to see some of the argument processing/validation moved to Ruby, as it might just be easier.
At the very least, method definitions (the argument names and their default values) could be shared and avoid duplication.

What do you think?

----------------------------------------
Feature #16254: MRI internal: Define built-in classes in Ruby with `__intrinsic__` syntax
https://bugs.ruby-lang.org/issues/16254#change-82054

* Author: ko1 (Koichi Sasada)
* Status: Open
* Priority: Normal
* Assignee: ko1 (Koichi Sasada)
* Target version: 
----------------------------------------




-- 
https://bugs.ruby-lang.org/


* [ruby-core:95350] [Ruby master Feature#16254] MRI internal: Define built-in classes in Ruby with `__intrinsic__` syntax
       [not found] <redmine.issue-16254.20191015185758@ruby-lang.org>
  2019-10-15 18:58 ` [ruby-core:95344] [Ruby master Feature#16254] MRI internal: Define built-in classes in Ruby with `__intrinsic__` syntax ko1
  2019-10-15 20:01 ` [ruby-core:95345] " eregontp
@ 2019-10-16  2:04 ` daniel
  2019-10-18  6:08 ` [ruby-core:95413] " naruse
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 6+ messages in thread
From: daniel @ 2019-10-16  2:04 UTC (permalink / raw)
  To: ruby-core

Issue #16254 has been updated by Dan0042 (Daniel DeLorme).


There's something I'm not sure I understood so I'd like to clarify if this proposal can be described as

A) write the boilerplate `rb_define_class` and `rb_define_method` using a ruby-like macro language;

B) write core classes and methods with the full power of ruby plus a few "invoke C function" macros, which are then compiled to VM instructions and serialized to become part of the binary.

It sounds to me like the proposal is (B) which is really an amazing idea and implementation, but the examples provided for Comparable and TracePoint could be trivially written as (A) so I'm not sure what is the advantage of the heavyweight (B) approach. It would really help to have an example that does more than just wrap the intrinsic function calls.


----------------------------------------
Feature #16254: MRI internal: Define built-in classes in Ruby with `__intrinsic__` syntax
https://bugs.ruby-lang.org/issues/16254#change-82058

* Author: ko1 (Koichi Sasada)
* Status: Open
* Priority: Normal
* Assignee: ko1 (Koichi Sasada)
* Target version: 
----------------------------------------
# Abstract

MRI defines most of built-in classes in C with C-APIs like `rb_define_method()`.
However, there are several issues using C-APIs.

A few methods are defined in Ruby written in `prelude.rb`.
However, we can not define all of classes because we can not touch deep data structure in Ruby.
Furthermore, there are performance issues if we write all of them in Ruby.

To solve this situation, I want to suggest written in Ruby with C intrinsic functions.
This proposal is same as my RubyKaigi 2019 talk <https://rubykaigi.org/2019/presentations/ko1.html>.

# Terminology

* C-methods: methods defined in C (defined with `rb_define_method()`, etc).
* Ruby-methods: methods defined in Ruby.
* ISeq: The body of `RUbyVM::InstructionSequence` object which represents bytecode for VM.

# Background / Problem / Idea

## Written in C

As you MRI developers know, most of methods are written in C with C-APIs.
However, there are several issues.

### (1) Annotation issues (compare with Ruby methods)

For example, C-methods defined by C-APIs doesn't have `parameters` information which are returned by `Method#parameters`, because there is way to define parameters for C methods.
There are proposals to add parameter name information for C-methods, however, I think it will introduce new complex C-APIs and introduce additional overhead on boot time.

-> Idea; Writing methods in Ruby will solve this issue.

### (2) Annotation issues (for further optimization)

It is useful to know the methods attribute, for example, the method causes no side-effect (a pure method).
Labeling all of methods including user program's methods doesn't seem good idea (not Ruby-way). But I think annotating built-in methods is good way because we can manage (and we can remove them when we can make good analyzer).

There are no way to annotate this kind of attributes.

-> Idea: Writing methods in Ruby will make it easy to introduce new annotations.

### (3) Performance issue

There are several features which are slower in C than written in Ruby.

* exception handling (`rb_ensure()`, etc) because we need to capture context with `setjmp` on C-methods. Ruby-methods doesn't need to capture any context for exception handling.
* Passing keyword parameters because Ruby-methods doesn't need to make a Hash object to pass the keyword parameters if they are passed with explicit keyword parameters (`foo(k1: v1, k2: v2)`).

-> Idea: Writing methods in Ruby makes them faster.

### (4) Productivity

It is tough to write some features in C:

For example, it is easy to write `rescue` syntax in Ruby:

```ruby
# in Ruby
def dummy_func_rescue
  nil
rescue
  nil
end
```

But it is difficult to write/read in C:

```C
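/* Body of the protected region; receives the data1 argument of rb_rescue(). */
static VALUE
dummy_body(VALUE self)
{
    return Qnil;
}
/* Rescue handler; receives the data2 argument and the raised exception. */
static VALUE
dummy_rescue(VALUE self, VALUE exc)
{
    return Qnil;
}
static VALUE
dummy_func_rescue(VALUE self)
{
    return rb_rescue(dummy_body, self, dummy_rescue, self);
}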
```

(A trained MRI developer may say it is not tough, though :p)

-> Idea: Writing methods in Ruby makes them easy.

### (5) API change

To introduce `Guild`, I want to pass a "context" parameter (as the first parameter) to each C-function, like `mrb_state` in mruby.
This is because getting it from TLS (thread-local storage) is a high-cost operation in a dynamic library (libruby).

Probably nobody will allow me to change the specification of the functions used by `rb_define_method()`.

-> Idea: But by introducing a new method definition framework, we can move to it and change the specification, I hope.
Of course, we can keep the current `rb_define_method()` APIs (with additional cost on a `Guild`-enabled MRI).

## Written in Ruby in `prelude.rb`

There is a file `prelude.rb` which is loaded at boot time.
This file is used to define several methods, for example to reduce keyword parameter overhead (`IO#read_nonblock`, `TracePoint#enable`).
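
The pattern used there can be sketched in plain Ruby: a public method written in Ruby accepts the keyword parameter and forwards it positionally to a low-level method. `Reader`, `read_chunk` and `__read_chunk` are hypothetical names for illustration; in MRI the low-level part would be a C-method.

```ruby
# Hypothetical class standing in for a built-in one.
class Reader
  def initialize(data)
    @data = data
  end

  # Public API with a keyword parameter, written in Ruby.
  def read_chunk(len, exception: true)
    __read_chunk(len, exception)
  end

  private

  # Stand-in for the low-level C-method: positional arguments only.
  def __read_chunk(len, exception)
    if @data.empty?
      raise EOFError, "end of data" if exception
      return nil
    end
    @data.slice!(0, len)
  end
end

reader = Reader.new(+"hello")
p reader.read_chunk(3)
p reader.read_chunk(3)
p reader.read_chunk(3, exception: false)
```

Because the keyword is handled by a Ruby-method, the low-level method never has to unpack a Hash.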

However, writing all methods in Ruby is not possible because of:

* (1) a feasibility issue (we cannot access internal data structures)
* (2) a performance issue (slow in general, of course)
* (3) an atomicity issue (GVL/GIL)

To solve (1), we can provide low-level C-methods to implement high-level (normal built-in) methods. However, issues (2) and (3) remain unsolved.
(From a CS researcher's perspective, a clever compiler would solve them, as on the JVM, but we don't have one yet.)

-> Idea: Writing the method body in C is feasible.

# Proposal

(1) Introduce an `intrinsic` mechanism to define built-in methods in Ruby.
(2) Load from a binary format to reduce startup time.

## (1) Intrinsic function

### Calling intrinsic function syntax in Ruby

To define built-in methods, introduce a special Ruby syntax, `__intrinsic__.func(args)`.
With this syntax, the registered intrinsic function `func()` is called with `args`.

In a normal Ruby program, `__intrinsic__` is a local variable or a method.
However, when running in a special mode, it is parsed as an intrinsic function call.

Intrinsic functions can not be called with:

* block 
* keyword arguments
* splat arguments
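
For comparison, here is what happens outside the special mode. This sketch assumes an ordinary script in which nothing named `__intrinsic__` has been defined, so the name is resolved as a local variable or method and the lookup fails.

```ruby
# In an ordinary script `__intrinsic__` is just an identifier.
def try_intrinsic
  __intrinsic__.str_length
rescue NameError => e
  e
end

error = try_intrinsic
p error.class
puts error.message
```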

### Development step with intrinsic functions

(1) Write a class/module in Ruby with intrinsic functions.

```ruby
# string.rb
class String
  def length
    __intrinsic__.str_length
  end
end
```

(2) Implement intrinsic functions

An intrinsic function is almost the same as a function used by `rb_define_method()`.
However, it accepts a context parameter as the first parameter.

(`rb_execution_context_t` is too long, so we can rename it, to `rb_state` for example.)

```C
static VALUE
str_length(rb_execution_context_t *ec, VALUE self)
{
  return LONG2NUM(RSTRING_LEN(self));
}
```

(3) Define an intrinsic function table and load the `.rb` file with the table.

```C
void
Init_String(void)
{
  ...
  static const rb_export_intrinsic_t table[] = {
    RB_EXPORT_INTRINSIC(str_length, 0), // 0 is arity
    ...
  };
  rb_vm_builtin_load("string", table);
}
```
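
To make the role of the table concrete, the three steps can be modeled in plain Ruby. This is only a model under stated assumptions: `IntrinsicEntry`, `STRING_TABLE` and `call_intrinsic` are hypothetical names, the `:ec` symbol stands in for the execution context, and where MRI actually performs the lookup and the arity check is not specified here.

```ruby
# Hypothetical model of one table entry: name, arity, and the function.
IntrinsicEntry = Struct.new(:name, :arity, :func)

# Model of the table passed to rb_vm_builtin_load(); `ec` stands for the
# execution context that the C function receives as its first parameter.
STRING_TABLE = {
  str_length: IntrinsicEntry.new(:str_length, 0, ->(ec, recv) { recv.length }),
}.freeze

# Model of what a call written as `__intrinsic__.name(args)` has to do.
def call_intrinsic(table, ec, recv, name, *args)
  entry = table.fetch(name) { raise NameError, "unknown intrinsic function: #{name}" }
  if args.size != entry.arity
    raise ArgumentError, "wrong number of arguments (given #{args.size}, expected #{entry.arity})"
  end
  entry.func.call(ec, recv, *args)
end

p call_intrinsic(STRING_TABLE, :ec, "hello", :str_length)
```

Keeping the arity in the table is what lets a mismatch between the `.rb` file and the C function be detected instead of silently misreading arguments.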

### Example

There are two examples:

(1) Comparable module: https://gist.github.com/ko1/7f18e66d1ae25bb30c7e823aa57f0d31
(2) TracePoint class: https://gist.github.com/ko1/969e5690cda6180ed989eb79619ca612

## (2) Load from binary file with lazy loading

Loading many ".rb" files slows down startup time.

We have the `ISeq#to_binary` method to generate compiled binary data, so we can eliminate parse/compile time.
Fortunately, [Feature #16163] makes the binary data small.
Furthermore, enabling the "lazy loading" feature improves startup time because we don't need to generate complete ISeqs. `USE_LAZY_LOAD` in vm_core.h enables this feature.

We need to combine the binary data with the interpreter. There are several ways (convert it into a C array, concatenate it with objcopy if available, and so on).
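
The round trip through binary data can be tried from a script on MRI. This is a minimal sketch: `Greeter` is a hypothetical class, and the binary format is assumed to be usable only by the same interpreter build that produced it.

```ruby
source = <<~RUBY
  class Greeter
    def hello
      "hello"
    end
  end
RUBY

# Parse and compile once, then serialize the ISeq.
iseq   = RubyVM::InstructionSequence.compile(source)
binary = iseq.to_binary

# Later (for example at boot), rebuild the ISeq without parsing or compiling.
loaded = RubyVM::InstructionSequence.load_from_binary(binary)
loaded.eval

p binary.bytesize
p Greeter.new.hello
```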

# Evaluation

Evaluations are written in my RubyKaigi 2019 presentation: <https://rubykaigi.org/2019/presentations/ko1.html>

Points:

* Calling overhead of Ruby methods with intrinsic functions
  * In the normal case, it is almost the same as C-methods using optimized VM instructions.
  * With keyword parameters, it is faster than C-methods.
  * With optional parameters, it is 2x slower, so this should be solved (*1).

* Loading overhead
  * Requiring ".rb" files is about 15x slower than defining C-methods.
  * Loading binary data with the lazy loading technique is about 2x slower than defining C-methods. Not such a bad result.
  * At RubyKaigi 2019, the binary data was very large, but [Feature #16163] reduces its size.

[*1] Introducing a special "overloading" specifier can solve it because we don't need to assign optional parameters. The first method lookup can be slower, but we can cache the method lookup results (with arity).

```ruby
# example syntax
overload def foo(a)
  __intrinsic__.foo1(a)
end
overload def foo(a, b)
  __intrinsic__.foo2(a, b)
end
```

# Implementation

Done:
* Compiling calls to intrinsic functions (.rb)
* Exporting the intrinsic function table (.c)

Not yet:
* Mechanism for loading from binary
* Attribute syntax
* Replacement of most built-in classes

Currently, miniruby and ruby (libruby) load '*.rb' files directly. However, ruby (libruby) should load the compiled binary file.

# Discussion

## Do we rewrite all of built-in classes at once?

No. We can try it and migrate them gradually.

## Do we support intrinsic mechanism for C-extension libraries?

Maybe in the future. For now we can try it in MRI core.

## `__intrinsic__` keyword

In my RubyKaigi 2019 talk, I proposed `__C__`, but I think `__intrinsic__` is more descriptive (though a bit long).
Another idea is `RubyVM::intrinsic.func(...)`.

I have no strong opinion. We can change this syntax until we expose it to C-extensions.

## Can we support `__intrinsic__` in normal Ruby script?

No. This feature is only for built-in features.
As I described, the intrinsic function call syntax has several restrictions compared with normal method calls, so I think it should not be exposed to normal Ruby programs.

## Should we maintain intrinsic function table?

For now, yes. And we need to generate this table automatically because manual maintenance can introduce mistakes very easily.

The corresponding ".rb" file (`trace_point.rb`, for example) knows which intrinsic functions are needed.
Parsing the ".rb" file can generate the table automatically.
However, we need the latest version of Ruby to parse the scripts if they use syntax which is supported only by the latest version.

For example, we need Ruby 2.7 (master) to parse a script which uses pattern matching syntax.
However, the system's ruby (`BASE_RUBY`) is likely to be an older version. This is a bootstrap ("chicken-and-egg") problem.

There are several ideas.

(1) Parse the ".c" files to generate the table using a function attribute.

```C
INTRINSIC_FUNCTION static VALUE
str_length(...)
...
```

(2) Build another Ruby parser from source code, "parse-ruby".

* 1. Generate parse-ruby from C code.
* 2. Run parse-ruby to generate the tables by parsing ".rb" files. This process is written in C.
* 3. Build miniruby and ruby with the generated tables.

We can make it, but it introduces a new, complex build process.

(3) Restrict ".rb" syntax

Restrict built-in ".rb" files to syntax which `BASE_RUBY` can parse.
It is easy to list intrinsic functions using Ripper, AST, or `ISeq#to_a`.

(3) is the easiest but not so cool.
(2) is flexible, but it has an implementation cost and increases build complexity.
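
As a sketch of idea (3), the intrinsic functions used by a ".rb" file can be listed with Ripper's lexer alone. `intrinsic_names` is a hypothetical helper, not the tool used in the build, and it assumes the call is always written as `__intrinsic__.name` with no whitespace around the period.

```ruby
require "ripper"

# Return the names of intrinsic functions called as `__intrinsic__.name`.
def intrinsic_names(source)
  # Keep only the event type and the token text of each lexed token.
  tokens = Ripper.lex(source).map { |tok| [tok[1], tok[2]] }
  names = []
  tokens.each_cons(3) do |recv, dot, name|
    next unless recv == [:on_ident, "__intrinsic__"]
    next unless dot.first == :on_period && name.first == :on_ident
    names << name.last
  end
  names.uniq
end

source = <<~'RUBY'
  class String
    def length
      __intrinsic__.str_length
    end
  end
  def foo(a)
    __intrinsic__.foo1(a)
  end
RUBY

p intrinsic_names(source)
```

The lexer works only if `BASE_RUBY` can tokenize the file, which is exactly the restriction that idea (3) accepts.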


## Path of '*.rb' files and whether to install them

The path of `prelude.rb` is `<internal:prelude>`. We have several options.

* (1) Don't install ".rb" files and make their paths `<internal:trace_point.rb>`, for example.
* (2) Install ".rb" files and make their paths non-existent paths such as `<internal>/installdir/lib/builtin/trace_point.rb`.
* (3) Install ".rb" files and make their paths real paths.

We will translate ".rb" files into binary data and link them into `ruby` (`libruby`).
So modifying the installed ".rb" files does not affect the behavior. This can cause confusion, which is why I listed (1) and (2).

For (3), it is possible to detect modified ".rb" files (maybe by modification date) and load from them instead. But this introduces overhead (disk access).

## Compatibility issue?

There are several compatibility issues. For example, `TracePoint` `c-call` events are changed to `call` events.
There are more incompatibilities as well.
We need to check them carefully.
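
The `TracePoint` difference can be observed from a script. In this sketch `ruby_helper` is a hypothetical Ruby-method, and `String#upcase` is assumed to be a C-method on the interpreter that runs it; a method rewritten in Ruby would show up as `call` instead of `c_call`.

```ruby
# Hypothetical Ruby-method that calls one C-method.
def ruby_helper(str)
  str.upcase
end

events = []
tp = TracePoint.new(:call, :c_call) do |t|
  events << [t.event, t.method_id]
end

# Trace only while the block runs.
tp.enable { ruby_helper("abc") }

events.each { |event, name| puts "#{event} #{name}" }
```

Tools that filter on `c_call` (profilers, coverage of built-in calls) are the ones that would notice the change.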

## Bootstrap issue?

Yes, there are some.

Loading `.rb` files while the interpreter is booting can cause problems.
For example, before the String class is initialized, the class of a String literal is 0 (because the String class has not been created yet).

I introduced several workarounds, but more changes are needed.

# Conclusion

How about introducing this mechanism and trying it in Ruby 2.7?
We can revert these changes if we find any trouble, as long as we don't expose this mechanism and keep the changes internal.




-- 
https://bugs.ruby-lang.org/


* [ruby-core:95413] [Ruby master Feature#16254] MRI internal: Define built-in classes in Ruby with `__intrinsic__` syntax
       [not found] <redmine.issue-16254.20191015185758@ruby-lang.org>
                   ` (2 preceding siblings ...)
  2019-10-16  2:04 ` [ruby-core:95350] " daniel
@ 2019-10-18  6:08 ` naruse
  2019-11-07 21:32 ` [ruby-core:95749] " ko1
  2019-11-07 21:35 ` [ruby-core:95750] " ko1
  5 siblings, 0 replies; 6+ messages in thread
From: naruse @ 2019-10-18  6:08 UTC (permalink / raw)
  To: ruby-core

Issue #16254 has been updated by naruse (Yui NARUSE).


Dan0042 (Daniel DeLorme) wrote:
> It would really help to have an example that does more than just wrap the intrinsic function calls.

Below is an example which solves the problem described in "Written in C: (3) Performance issue" for keyword parameters.
https://gist.github.com/ko1/969e5690cda6180ed989eb79619ca612#file-trace_point-rb-L195-L197

----------------------------------------
Feature #16254: MRI internal: Define built-in classes in Ruby with `__intrinsic__` syntax
https://bugs.ruby-lang.org/issues/16254#change-82167

* Author: ko1 (Koichi Sasada)
* Status: Open
* Priority: Normal
* Assignee: ko1 (Koichi Sasada)
* Target version: 
----------------------------------------

-- 
https://bugs.ruby-lang.org/


* [ruby-core:95749] [Ruby master Feature#16254] MRI internal: Define built-in classes in Ruby with `__intrinsic__` syntax
       [not found] <redmine.issue-16254.20191015185758@ruby-lang.org>
                   ` (3 preceding siblings ...)
  2019-10-18  6:08 ` [ruby-core:95413] " naruse
@ 2019-11-07 21:32 ` ko1
  2019-11-07 21:35 ` [ruby-core:95750] " ko1
  5 siblings, 0 replies; 6+ messages in thread
From: ko1 @ 2019-11-07 21:32 UTC (permalink / raw)
  To: ruby-core

Issue #16254 has been updated by ko1 (Koichi Sasada).


Design change:

(1) Table auto generation 

> Should we maintain intrinsic function table?

I wrote "yes", but I found that it is too difficult for a human to maintain. So I decided to generate this table by parsing the .rb files.

As I wrote:

> Restrict syntax which can be used by BASE_RUBY for built-in ".rb" files.
> It is easy to list up intrinsic functions using Ripper or AST or ISeq#to_a.

So this restriction applies: you cannot use pattern matching in built-in .rb files :p

(2) Rename `__intrinsic__.func(...)` to `__builtin_func(...)`

Reasons:
(a) It is similar to gcc's intrinsic format.
(b) It is easy to introduce special inline pragmas with the `__builtin_` prefix, such as `__builtin_attribute(:pure)`, to pass special information to the Ruby interpreter. In this case, `__builtin_attribute(:pure)` can specify that the method is "pure" (has no side effects).
(c) It is easy for external tools to parse (to find this format). Without the AST module, it is a bit difficult to find `__intrinsic__.foo()` in the compiled VM asm. However, the AST module was introduced in Ruby 2.6 and BASERUBY can be an older version (the oldest BASERUBY on rubyci is ruby 2.2). This restriction could be relaxed by building an analyzing microruby from source code (microruby would be a small subset of the ruby interpreter used to generate miniruby).
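
To illustrate reason (c), a prefix-based name can be found with a plain text scan that works on any BASERUBY. This is a sketch, not the generator in the pull request: `builtin_function_names` is a hypothetical helper, the function names in the sample are made up, and a textual scan would also match occurrences inside comments or strings.

```ruby
# Collect the names used as `__builtin_<name>(...)` in a source string.
def builtin_function_names(source)
  source.scan(/\b__builtin_([A-Za-z_]\w*)\s*\(/).flatten.uniq
end

sample = <<~'RUBY'
  class String
    def length
      __builtin_str_length()
    end
  end
  def foo(a)
    __builtin_foo1(a)
  end
RUBY

p builtin_function_names(sample)
```

Note that `__builtin_attribute(:pure)` would be reported too, so a real generator would have to treat such pragma names separately.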


Completed code is https://github.com/ruby/ruby/pull/2655
I'll merge it soon.

----------------------------------------
Feature #16254: MRI internal: Define built-in classes in Ruby with `__intrinsic__` syntax
https://bugs.ruby-lang.org/issues/16254#change-82566

* Author: ko1 (Koichi Sasada)
* Status: Open
* Priority: Normal
* Assignee: ko1 (Koichi Sasada)
* Target version: 
----------------------------------------
# Abstract

MRI defines most of built-in classes in C with C-APIs like `rb_define_method()`.
However, there are several issues using C-APIs.

A few methods are defined in Ruby written in `prelude.rb`.
However, we can not define all of classes because we can not touch deep data structure in Ruby.
Furthermore, there are performance issues if we write all of them in Ruby.

To solve this situation, I want to suggest written in Ruby with C intrinsic functions.
This proposal is same as my RubyKaigi 2019 talk <https://rubykaigi.org/2019/presentations/ko1.html>.

# Terminology

* C-methods: methods defined in C (defined with `rb_define_method()`, etc).
* Ruby-methods: methods defined in Ruby.
* ISeq: The body of `RUbyVM::InstructionSequence` object which represents bytecode for VM.

# Background / Problem / Idea

## Written in C

As you MRI developers know, most of methods are written in C with C-APIs.
However, there are several issues.

### (1) Annotation issues (compare with Ruby methods)

For example, C-methods defined by C-APIs doesn't have `parameters` information which are returned by `Method#parameters`, because there is way to define parameters for C methods.
There are proposals to add parameter name information for C-methods, however, I think it will introduce new complex C-APIs and introduce additional overhead on boot time.

-> Idea; Writing methods in Ruby will solve this issue.

### (2) Annotation issues (for further optimization)

It is useful to know the methods attribute, for example, the method causes no side-effect (a pure method).
Labeling all of methods including user program's methods doesn't seem good idea (not Ruby-way). But I think annotating built-in methods is good way because we can manage (and we can remove them when we can make good analyzer).

There are no way to annotate this kind of attributes.

-> Idea: Writing methods in Ruby will make it easy to introduce new annotations.

### (3) Performance issue

There are several features which are slower in C than written in Ruby.

* exception handling (`rb_ensure()`, etc) because we need to capture context with `setjmp` on C-methods. Ruby-methods doesn't need to capture any context for exception handling.
* Passing keyword parameters because Ruby-methods doesn't need to make a Hash object to pass the keyword parameters if they are passed with explicit keyword parameters (`foo(k1: v1, k2: v2)`).

-> Idea: Writing methods in Ruby makes them faster.

### (4) Productivity

It is tough to write some features in C:

For example, it is easy to write `rescue` syntax in Ruby:

```ruby
# in Ruby
def dummy_func_rescue
  nil
rescue
  nil
end
```

But it is difficult to write/read in C:

```C
static VALUE
dummy_body(VALUE self)
{
    return Qnil;
}
static VALUE
dummy_rescue(VALUE self)
{
    return Qnil;
}
static VALUE
tdummy_func_rescue(VALUE self)
{
    return rb_rescue(dummy_body, self, dummy_rescue, self);
}
```

(trained MRI developer can say it is not tough, though :p)

-> Idea: Writing methods in Ruby makes them easy.

### (5) API change

To introduce `Guild`, I want to pass a "context" parameter (as a first parameter) for each C-functions like `mrb_state` on mruby.
This is because getting it from TLS (Thread-local-storage) is high-cost operation on dynamic library (libruby).

Maybe nobody allow me to change the specification of functions used by `rb_define_method()`.

-> Idea: But introduce new method definition framework, we can move and change the specification, I hope.
Of course, we can remain current `rb_define_method()` APIs (with additional cost on `Guild` available MRI).

## Written in Ruby in `prelude.rb`

There is a file `prelude.rb` which are loaded at boot time.
This file is used to define several methods, to reduce keyword parameters overhead, for example (`IO#read_nonblock`, `TracePoint#enable`).

However, writing all of methods in Ruby is not possible because:

* (1) feasibility issue (we can not access internal data structure)
* (2) performance issue (slow in general, of course)
* (3) atomicity issue (GVL/GIL)

To solve (1), we can provide low-level C-methods to implement high-level (normal built-in) methods. However issues (2) and (3) are not solved.
(From CS researchers perspective, making clever compiler will solve them, like JVM, etc, But we don't have it yet)

-> Idea: Writing method body in C is feasible.

# Proposal

(1) Introducing `intrinsic` mechanism to define built-in methods in Ruby.
(2) Load from binary format to reduce startup time.

## (1) Intrinsic function

### Calling intrinsic function syntax in Ruby

To define built-in methods, introduce special Ruby syntax `__intrinsic__.func(args)`.
In this case, registered intrinsic function `func()` is called with `args`.

In normal Ruby program, `__intrinsic__` is a local variable or a method.
However, running on special mode, they are parsed as intrinsic function call.

Intrinsic functions can not be called with:

* block 
* keyword arguments
* splat arguments

### Development step with intrinsic functions

(1) Write a class/module in Ruby with intrinsic function.

```ruby
# string.rb
class String
  def length
    __intrinsic__.str_length
  end
end
```

(2) Implement intrinsic functions

It is almost same as functions used by `rb_define_method()`.
However it will accept context parameter as the first parameter.

(`rb_execution_context_t` is too long, so we can rename it, `rb_state` for example)

```C
static VALUE
str_length(rb_execution_context_t *ec, VALUE self)
{
  return LONG2NUM(RSTRING_LEN(self));
}
```

(3) Define an intrinsic function table and load `.rb` file with the table.

```C
Init_String(void)
{
  ...
  static const rb_export_intrinsic_t table[] = {
    RB_EXPORT_INTRINSIC(str_length, 0), // 0 is arity
    ...
  };
  rb_vm_builtin_load("string", table);
}
```

### Example

There are two examples:

(1) Comparable module: https://gist.github.com/ko1/7f18e66d1ae25bb30c7e823aa57f0d31
(2) TracePoint class: https://gist.github.com/ko1/969e5690cda6180ed989eb79619ca612

## (2) Load from binary file with lazy loading

Loading many ".rb" files slows down startup time.

We have `ISeq#to_binary` method to generate compiled binary data so that we can eliminate parse/compile time.
Fortunately, [Feature #16163] makes binary data small.
Furthermore, enabling "lazy loading" feature improves startup time because we don't need to generate complete ISeqs. `USE_LAZY_LOAD` in vm_core.h enables this feature.

We need to combine binary. There are several way (convert into C's array, concat with objcopy if available and so on).

# Evaluation

Evaluations are written in my RubyKaigi 2019 presentation: <https://rubykaigi.org/2019/presentations/ko1.html>

Points:

* Calling overhead of Ruby mehtods with intrinsic functions
  * Normal case, it is almost same as C-methods using optimized VM instructions.
  * With keyword parameters, it is faster than C-methods.
  * With optional parameters, it is x2 slower so it should be solved (*1).

* Loading overhead
  * Requiring ".rb" files is about x15 slower than defining C methods.
  * Loading binary data with lazy loading technique is about x2 slower than C methods. Not so bad result.
  * At RubyKaigi 2019, the binary data was very huge, but [Feature #16163] reduces the size of binary data.

[*1] Introducing special "overloading" specifier can solve it because we don't need to assign optional parameters. First method lookup can be slowed down, but we can cache the method lookup results (with arity).

```ruby
# example syntax
overload def foo(a)
  __intrinsic__.foo1(a)
end
overload def foo(a, b)
  __intrinsic__.foo2(a, b)
end
```
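
The dispatch semantics of such a specifier can be emulated in today's Ruby, which may help when discussing it. This is only a sketch under my own assumptions: the `Overload` module is hypothetical, it supports required positional parameters only, and because it dispatches through a splat it shows the behavior, not the performance, of the proposed specifier.

```ruby
# Hypothetical emulation of `overload`: keep one implementation per arity
# and dispatch on the number of arguments actually passed.
module Overload
  def overload(name)
    impl = instance_method(name)                 # the method that was just defined
    @overload_tables ||= Hash.new { |hash, key| hash[key] = {} }
    table = @overload_tables[name]
    table[impl.arity] = impl

    define_method(name) do |*args|
      candidate = table.fetch(args.size) do
        raise ArgumentError, "no overload of #{name} takes #{args.size} arguments"
      end
      candidate.bind(self).call(*args)
    end
    name
  end
end

class OverloadDemo
  extend Overload

  overload def foo(a)
    [:foo1, a]
  end

  overload def foo(a, b)
    [:foo2, a, b]
  end
end

OverloadDemo.new.foo(1)
OverloadDemo.new.foo(1, 2)
```

In the real proposal the table lookup would happen once per call site and then be cached with the arity, instead of on every call as in this emulation.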

# Implementation

Done:
* Compiling calls to intrinsic functions (.rb)
* Exporting the intrinsic function table (.c)

Not yet:
* Mechanism for loading from binary
* Attribute syntax
* Replacement of most built-in classes

Currently, miniruby and ruby (libruby) load '*.rb' files directly. However, ruby (libruby) should load a compiled binary file.

# Discussion

## Do we rewrite all of the built-in classes at once?

No. We can try it and migrate them gradually.

## Do we support the intrinsic mechanism for C-extension libraries?

Maybe in the future. For now we can try it on the MRI core.

## `__intrinsic__` keyword

In my RubyKaigi 2019 talk, I proposed `__C__`, but I think `__intrinsic__` is more descriptive (but a bit long).
Another idea is `RubyVM::intrinsic.func(...)`.

I have no strong opinion. We can change this syntax until we expose it to C-extensions.

## Can we support `__intrinsic__` in normal Ruby scripts?

No. This feature is only for built-in features.
As I described, the intrinsic function call syntax has several restrictions compared with normal method calls, so I think it should not be exposed to normal Ruby programs, IMO.

## Should we maintain the intrinsic function table?

For now, yes. But we need to generate this table automatically, because manual maintenance can introduce mistakes very easily.

The corresponding ".rb" file (`trace_point.rb`, for example) knows which intrinsic functions are needed.
Parsing the ".rb" file can generate the table automatically.
However, we need the latest version of Ruby to parse the scripts if they use syntax that is supported only by the latest version of Ruby.

For example, we need Ruby 2.7 master to parse a script which uses pattern matching syntax.
However, the system's ruby (`BASE_RUBY`) is likely to be an older version.
This is a bootstrap problem, a "chicken-and-egg" problem.

There are several ideas.

(1) Parse a ".c" file to generate a table using a function attribute.

```C
INTRINSIC_FUNCTION static VALUE
str_length(...)
...
```

(2) Build another Ruby parser from the source code, "parse-ruby".

* 1. Generate parse-ruby from C code.
* 2. Run parse-ruby to generate tables by parsing ".rb" files. This process is written in C.
* 3. Build miniruby and ruby with the generated table.

We can do it, but it introduces a new, complex build process.

(3) Restrict ".rb" syntax

Restrict built-in ".rb" files to syntax that `BASE_RUBY` can parse.
Then it is easy to list the intrinsic functions using Ripper or AST or `ISeq#to_a`.

(3) is the easiest but not so cool.
(2) is flexible, but it has an implementation cost and increases build complexity.
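
For idea (3), listing the intrinsic functions could look roughly like the sketch below. It uses `Ripper.lex` from the standard library and simply looks for the token sequence `__intrinsic__` `.` `name`; the helper name `intrinsic_function_names` is made up here, and a real table generator would also have to work out each function's arity.

```ruby
require "ripper"

# Hypothetical helper: collect names called as `__intrinsic__.name` in a script.
def intrinsic_function_names(source)
  tokens = Ripper.lex(source)
                 .map { |token| [token[1], token[2]] }   # keep [type, text] only
                 .reject { |type, _text| type == :on_sp }

  names = []
  tokens.each_cons(3) do |(type1, text1), (type2, _text2), (type3, text3)|
    next unless type1 == :on_ident && text1 == "__intrinsic__"
    next unless type2 == :on_period && type3 == :on_ident
    names << text3
  end
  names.uniq
end

script = <<~'RUBY'
  class String
    def length
      __intrinsic__.str_length
    end
  end
RUBY

intrinsic_function_names(script)
```

Because it works on tokens, this only needs `BASE_RUBY` to be able to lex the file, which is the restriction that idea (3) asks for.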


## Path of '*.rb' files and whether to install them

The path of `prelude.rb` is `<internal:prelude>`. We have several options.

* (1) Don't install ".rb" files and make their paths `<internal:trace_point.rb>`, for example.
* (2) Install ".rb" files and make their paths non-existing paths such as `<internal>/installdir/lib/builtin/trace_point.rb`.
* (3) Install ".rb" files and make their paths real paths.

We will translate ".rb" files into binary data and link them into `ruby` (`libruby`).
So modifying the installed ".rb" files does not affect the behavior. This can cause confusion, which is why I wrote (1) and (2).

For (3), it is possible to detect modification of the ".rb" files (maybe by modification time) and load from them instead. But this will introduce overhead (disk access).

## Compatibility issue?

There are several compatibility issues. For example, `TracePoint` `c-call` events are changed to `call` events.
And there are more incompatibilities.
We need to check them carefully.
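
The difference is visible with an ordinary `TracePoint` today. In this sketch, `demo_length` and `traced_events` are names invented for the example, and it assumes `String#length` is still a C-method in the Ruby that runs it.

```ruby
# A Ruby-method that calls a C-method.
def demo_length(str)
  str.length   # assumed to be a C-method in this Ruby
end

# Record [event, method_id] pairs for :call and :c_call while running the demo.
def traced_events
  events = []
  tp = TracePoint.new(:call, :c_call) { |trace| events << [trace.event, trace.method_id] }
  tp.enable { demo_length("abc") }
  events
end

traced_events
```

Here `demo_length` is reported with a `call` event and `length` with a `c_call` event. If `length` were rewritten in Ruby with an intrinsic, code that filters on `c_call` would stop seeing it.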

## Bootstrap issue?

Yes, there are some.

Loading `.rb` files while the interpreter is booting can cause problems.
For example, before the String class is initialized, the class of a String literal is 0 (because the String class has not been created yet).

I introduced several workarounds, but more changes are needed.

# Conclusion

How about introducing this mechanism and trying it in Ruby 2.7?
As long as we don't expose this mechanism and keep the changes internal, we can revert them if we find any trouble.




-- 
https://bugs.ruby-lang.org/


* [ruby-core:95750] [Ruby master Feature#16254] MRI internal: Define built-in classes in Ruby with `__intrinsic__` syntax
From: ko1 @ 2019-11-07 21:35 UTC
  To: ruby-core

Issue #16254 has been updated by ko1 (Koichi Sasada).


Eregon (Benoit Daloze) wrote:
> A common syntax for intrinsics/primitives would allow to share Ruby code for core classes using these intrinsics/primitives.
> It might look like it's little Ruby code, but I'm confident it will grow.
> For instance, I wouldn't be surprised to see some of the argument processing/validation moved to Ruby, as it might just be easier.
> At the very least, method definitions (the argument names and their default values) could be shared and avoid duplication.
> 
> What do you think?

I understand your concern. But I'm not sure we can share this code, because it is "implementation" and depends on the backend interpreter.

Anyway, this is the very first stage; if this approach matures, we can discuss it again.

----------------------------------------
Feature #16254: MRI internal: Define built-in classes in Ruby with `__intrinsic__` syntax
https://bugs.ruby-lang.org/issues/16254#change-82567

* Author: ko1 (Koichi Sasada)
* Status: Open
* Priority: Normal
* Assignee: ko1 (Koichi Sasada)
* Target version: 
----------------------------------------

-- 
https://bugs.ruby-lang.org/


Thread overview: 6+ messages
     [not found] <redmine.issue-16254.20191015185758@ruby-lang.org>
2019-10-15 18:58 ` [ruby-core:95344] [Ruby master Feature#16254] MRI internal: Define built-in classes in Ruby with `__intrinsic__` syntax ko1
2019-10-15 20:01 ` [ruby-core:95345] " eregontp
2019-10-16  2:04 ` [ruby-core:95350] " daniel
2019-10-18  6:08 ` [ruby-core:95413] " naruse
2019-11-07 21:32 ` [ruby-core:95749] " ko1
2019-11-07 21:35 ` [ruby-core:95750] " ko1
