ruby-core@ruby-lang.org archive (unofficial mirror)
* [ruby-core:71943] [Ruby trunk - Feature #11788] [Open] New ISeq serialize binary format
       [not found] <redmine.issue-11788.20151208105615@ruby-lang.org>
@ 2015-12-08 10:56 ` ko1
  2015-12-08 11:45   ` [ruby-core:71945] " Eric Wong
  2015-12-11  8:15   ` [ruby-core:72052] " Vít Ondruch
  2015-12-08 12:55 ` [ruby-core:71947] [Ruby trunk - Feature #11788] " ko1
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 10+ messages in thread
From: ko1 @ 2015-12-08 10:56 UTC
  To: ruby-core

Issue #11788 has been reported by Koichi Sasada.

----------------------------------------
Feature #11788: New ISeq serialize binary format
https://bugs.ruby-lang.org/issues/11788

* Author: Koichi Sasada
* Status: Open
* Priority: Normal
* Assignee: Koichi Sasada
----------------------------------------
# Abstract

I wrote a new serializer and de-serializer for RubyVM::InstructionSequence (ISeq) objects, using a new binary format.
Matz has approved introducing this feature in Ruby 2.3 as an *experimental* feature.
So I'll commit it.

There are two methods, one to serialize and one to de-serialize.

* RubyVM::InstructionSequence#to_binary_format returns the binary format data as a String object.
* RubyVM::InstructionSequence.from_binary_format(data) de-serializes it.
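
As a rough illustration of this serialize/de-serialize pair, here is a minimal sketch. It assumes the method names found in released MRI (2.3 and later), `to_binary` and `load_from_binary`, rather than the names used in this proposal. `round_trip` is a name made up for the example.

```ruby
# Sketch only: round-trips a snippet through the binary format.
# Assumes MRI 2.3+, where the methods are named to_binary / load_from_binary.
def round_trip(source)
  iseq   = RubyVM::InstructionSequence.compile(source)
  binary = iseq.to_binary # String holding the serialized ISeq
  RubyVM::InstructionSequence.load_from_binary(binary)
end

loaded = round_trip("[1, 2, 3].map { |x| x * 2 }")
p loaded.eval
```

The loaded object is an ordinary ISeq, so it can be evaluated like one produced by `compile`.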

The goal of this project is to provide a "machine dependent" binary file in order to achieve:

* fast bootstrap time for big applications
* reduced memory consumption, using several techniques

"Machine dependent" means you can't migrate compiled binaries to other machines.
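
Because a binary is tied to the interpreter that produced it, a storage layer may want to keep binaries from different builds apart. The sketch below is one possible way to do that, not something this proposal defines: both helper names and the file naming scheme are hypothetical.

```ruby
# Hypothetical helper: a tag identifying the interpreter that produced a binary.
def iseq_binary_tag
  [RUBY_ENGINE, RUBY_VERSION, RUBY_PATCHLEVEL, RUBY_PLATFORM].join("-")
end

# Hypothetical naming scheme: foo.rb -> foo.rb.<tag>.yarb
def tagged_cache_name(script_path)
  "#{script_path}.#{iseq_binary_tag}.yarb"
end

puts tagged_cache_name("foo.rb")
```

With such a tag in the file name, a binary written by one Ruby build is simply never found by another.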

The following are not goals of this project:

* packing scripts into one package
* migrating an obfuscated binary to another node to hide source code

To achieve such goals, we would need to consider compatibility issues such as `__FILE__`, `__dir__`, `DATA`, and so on (for example, consider this code: `Dir.glob(File.join(__dir__, '*.rb'))`).
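
To make the `__FILE__` issue concrete, the following sketch compiles a snippet under a made-up path and runs it after a round trip through the binary format. It assumes the released method names `to_binary` / `load_from_binary`. `OLD_PATH` and `file_seen_by_binary` are hypothetical names for this example.

```ruby
# Sketch: compile a snippet as if it lived at a (made-up) path, then
# round-trip it through the binary format and run it.
OLD_PATH = "/old/location/foo.rb" # hypothetical path

def file_seen_by_binary(path)
  iseq   = RubyVM::InstructionSequence.compile("__FILE__", path, path)
  loaded = RubyVM::InstructionSequence.load_from_binary(iseq.to_binary)
  loaded.eval # the value __FILE__ takes when the loaded code runs
end

p file_seen_by_binary(OLD_PATH)
```

The path is fixed when the code is compiled, so a binary moved elsewhere would still report its original location.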

This proposal doesn't cover how to store compiled binaries.
For example, Rubinius makes *.rbc files automatically.
However, Matz does not like such automatic compilation.

So my proposal only shows an interface for a user-defined storage class.
People can try making their own ISeq binary storage.

For example,

* making compiled binary files automatically in the same directory as the script files, like Rubinius,
* storing compiled binaries in some DB,
* making your own storage data structure.

I wrote several samples:

* dbm: use dbm.
* fs: [default] use the file system. Locate the compiled file in the same directory as the script file, like Rubinius. foo.rb.yarb will be created for foo.rb.
* fs2: use the file system. Locate the compiled file in a specified directory.
* nothing: do nothing.

You can see my sample implementation:
https://github.com/ko1/ruby/blob/iseq_p1/sample/iseq_loader.rb

The key interface is `RubyVM::InstructionSequence.load_iseq(fname)`.
When MRI tries to load a script named `fname`, it calls this method with `fname` if the method is defined.
The return value is an ISeq object, and MRI uses this ISeq object instead of parsing and compiling the `fname` file.
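
A minimal sketch of such a hook follows, loosely in the spirit of the "fs2" sample but not taken from it. `IseqFileCache` and its methods are hypothetical names. The code assumes the released method names `to_binary` / `load_from_binary`, and assumes that returning `nil` from `load_iseq` makes MRI fall back to normal compilation.

```ruby
require "fileutils"

# Hypothetical helper; not part of Ruby or of the sample loader.
module IseqFileCache
  class << self
    attr_accessor :dir # cache directory; the cache is inactive while nil
  end

  # Flatten the script path into a single file name inside the cache dir.
  def self.cache_path(fname)
    File.join(dir, fname.gsub(/[^A-Za-z0-9._-]/, "_") + ".yarb")
  end

  # Return an ISeq for fname, from the cache when it is fresh enough,
  # otherwise by compiling and storing it. Returns nil on any failure.
  def self.fetch(fname)
    return nil if dir.nil?
    path = cache_path(fname)
    if File.exist?(path) && File.mtime(path) >= File.mtime(fname)
      return RubyVM::InstructionSequence.load_from_binary(File.binread(path))
    end
    iseq = RubyVM::InstructionSequence.compile_file(fname)
    FileUtils.mkdir_p(dir)
    File.binwrite(path, iseq.to_binary)
    iseq
  rescue SystemCallError, RuntimeError
    nil
  end
end

class RubyVM::InstructionSequence
  # The hook described above: called with each script MRI is about to load.
  def self.load_iseq(fname)
    IseqFileCache.fetch(fname)
  end
end
```

Setting `IseqFileCache.dir` to a directory that only trusted users can write to turns the cache on. Until then the hook returns `nil` for every file.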

Note that this proposal is "experimental".
These interfaces are only for experiments.
For example, if we want to use several binary storages at once,
this interface doesn't support that (lack of extensibility).

# Current status

The current implementation is not mature: the binary size is very big because each pointer consumes 32/64 bits.
It is easy to reduce, but I have left this weak point as it is.

For now, one goal, "reduction of memory consumption", is not achieved, because no techniques for sharing or unloading have been introduced yet.
This is future work.

# Evaluation

Several evaluation results:

## resolv.rb

Load resolv.rb 1,000 times (removing the Resolv class each time).

```
                 user     system      total        real
compile     12.360000   0.310000  12.670000 ( 13.413011)
compile     12.120000   0.300000  12.420000 ( 13.195313)
compile     12.230000   0.270000  12.500000 ( 13.242140)

eager load
load         3.750000   0.180000   3.930000 (  3.918169)
load         4.000000   0.170000   4.170000 (  4.178442)
load         4.120000   0.200000   4.320000 (  4.320233)

lazy load
load         2.410000   0.090000   2.500000 (  2.609716)
load         2.280000   0.210000   2.490000 (  2.518892)
load         2.310000   0.110000   2.420000 (  2.419687)
```

Eager loading is about 3.25 times faster than normal compilation.
With the lazy loading technique, it is about 5.2 times faster.
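
A comparison of this kind can be approximated with the sketch below. It is not the script used for the numbers above: it times only compilation against de-serialization and does not execute the code or remove classes. The method names are hypothetical, and `to_binary` / `load_from_binary` are the released method names.

```ruby
# Sketch: time compiling a source string n times versus de-serializing a
# previously produced binary n times. The code is never executed, so only
# compile and load costs are compared.
def monotonic_seconds
  Process.clock_gettime(Process::CLOCK_MONOTONIC)
end

def time_compile_vs_load(source, n)
  binary = RubyVM::InstructionSequence.compile(source).to_binary

  t0 = monotonic_seconds
  n.times { RubyVM::InstructionSequence.compile(source) }
  t1 = monotonic_seconds
  n.times { RubyVM::InstructionSequence.load_from_binary(binary) }
  t2 = monotonic_seconds

  { compile: t1 - t0, load: t2 - t1 }
end

source = "def add(a, b)\n  a + b\nend\n" * 200
result = time_compile_vs_load(source, 100)
printf("compile %.6f s, load %.6f s\n", result[:compile], result[:load])
```

Results will vary by machine and Ruby build, so treat any ratio it prints as indicative only.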

## fileutils.rb

Same procedure as for resolv.rb.

```
                 user     system      total        real
compile      8.540000   0.130000   8.670000 (  8.703615)
compile      8.540000   0.150000   8.690000 (  8.693870)
compile      8.430000   0.120000   8.550000 (  8.547480)

eager load
load         4.470000   0.150000   4.620000 (  4.659934)
load         4.500000   0.140000   4.640000 (  4.640365)
load         4.610000   0.100000   4.710000 (  4.708825)

lazy load
load         3.510000   0.140000   3.650000 (  3.694146)
load         3.470000   0.130000   3.600000 (  3.609040)
load         3.550000   0.150000   3.700000 (  3.831015)
```

Only 1.8 times faster (eager) and 2.4 times faster (lazy).
This is because the initialization of the FileUtils class takes a long time.
It uses module_eval(str) to add methods.
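
The effect of `module_eval(str)` can be seen in a small sketch. The module and method names are made up, and `to_binary` / `load_from_binary` are the released method names. In the binary, the method body is only a String literal. It is parsed and compiled when the module body runs, on every load.

```ruby
# The method body below exists only as a String in the compiled code;
# module_eval parses and compiles it when the module body is executed.
STRING_EVAL_SOURCE = <<~'RUBY'
  module StringDefinedSketch
    module_eval("def self.greet; :hello; end")
  end
RUBY

def run_from_binary(source)
  binary = RubyVM::InstructionSequence.compile(source).to_binary
  RubyVM::InstructionSequence.load_from_binary(binary).eval
end

run_from_binary(STRING_EVAL_SOURCE)
p StringDefinedSketch.greet
```

Precompiling the outer script therefore saves nothing for methods defined this way.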

## Simple rails application

Run `time rails r ''` on a simple Rails application (https://github.com/ko1/tracer_demo_rails_app, with tracers disabled).

```
compile:
real    0m2.049s
user    0m1.601s
sys     0m0.402s

eager:
real    0m1.544s
user    0m1.094s
sys     0m0.422s

lazy:
$ time rails r ''
real    0m1.536s
user    0m1.112s
sys     0m0.388s
```

Not such an impressive result. It seems there is a lot of initialization code.




-- 
https://bugs.ruby-lang.org/


* [ruby-core:71945] Re: [Ruby trunk - Feature #11788] [Open] New ISeq serialize binary format
  2015-12-08 10:56 ` [ruby-core:71943] [Ruby trunk - Feature #11788] [Open] New ISeq serialize binary format ko1
@ 2015-12-08 11:45   ` Eric Wong
  2015-12-11  8:15   ` [ruby-core:72052] " Vít Ondruch
  1 sibling, 0 replies; 10+ messages in thread
From: Eric Wong @ 2015-12-08 11:45 UTC
  To: ruby-core

ko1@atdot.net wrote:
> * RubyVM::InstructionSequence#to_binary_format returns binary format data as String object.
> * RubyVM::InstructionSequence.from_binary_format(data) de-serialize it.

Why a new custom binary format instead of existing iseq.to_a + marshal?
Is performance improved enough to be worth extra code?


* [ruby-core:71947] [Ruby trunk - Feature #11788] New ISeq serialize binary format
       [not found] <redmine.issue-11788.20151208105615@ruby-lang.org>
  2015-12-08 10:56 ` [ruby-core:71943] [Ruby trunk - Feature #11788] [Open] New ISeq serialize binary format ko1
@ 2015-12-08 12:55 ` ko1
  2015-12-09  0:53 ` [ruby-core:71962] [Ruby trunk - Feature #11788] [Assigned] " ko1
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 10+ messages in thread
From: ko1 @ 2015-12-08 12:55 UTC
  To: ruby-core

Issue #11788 has been updated by Koichi Sasada.


Good question. I had not considered that.

I tried resolv.rb using to_a and iseq_load().

```
                 user     system      total        real
compile     10.590000   0.270000  10.860000 ( 11.452102)
compile     10.580000   0.250000  10.830000 ( 11.455050)
compile     10.630000   0.330000  10.960000 ( 11.580943)

Use iseq_load()
load        26.380000   0.690000  27.070000 ( 27.768315)
load        27.220000   0.660000  27.880000 ( 28.576489)
load        29.860000   0.630000  30.490000 ( 31.242912)
```

To use to_a, we need to use Marshal.dump().
And loading needs Marshal.load and ISeq.load.
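
For reference, the dump side of the to_a + Marshal route looks roughly like the sketch below. `marshal_iseq_array` is a made-up name. The final loading step is omitted because, as far as I know, released MRI does not expose the array loader as a Ruby method.

```ruby
# Sketch: serialize an ISeq via its array form and Marshal.
def marshal_iseq_array(source)
  iseq = RubyVM::InstructionSequence.compile(source)
  Marshal.dump(iseq.to_a)
end

data  = marshal_iseq_array("puts 1 + 2")
array = Marshal.load(data) # back to the nested array form, not an ISeq
p array.first
puts "marshal: #{data.bytesize} bytes"
```

Going from this array back to a runnable ISeq is the extra step that the timings above include.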

>  Is performance improved enough to be worth extra code?

Yes.


----------------------------------------
Feature #11788: New ISeq serialize binary format
https://bugs.ruby-lang.org/issues/11788#change-55355

* Author: Koichi Sasada
* Status: Open
* Priority: Normal
* Assignee: Koichi Sasada
----------------------------------------

-- 
https://bugs.ruby-lang.org/


* [ruby-core:71962] [Ruby trunk - Feature #11788] [Assigned] New ISeq serialize binary format
       [not found] <redmine.issue-11788.20151208105615@ruby-lang.org>
  2015-12-08 10:56 ` [ruby-core:71943] [Ruby trunk - Feature #11788] [Open] New ISeq serialize binary format ko1
  2015-12-08 12:55 ` [ruby-core:71947] [Ruby trunk - Feature #11788] " ko1
@ 2015-12-09  0:53 ` ko1
  2015-12-09  4:41 ` [ruby-core:71970] [Ruby trunk - Feature #11788] " usa
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 10+ messages in thread
From: ko1 @ 2015-12-09  0:53 UTC
  To: ruby-core

Issue #11788 has been updated by Koichi Sasada.

Status changed from Closed to Assigned

Note that the current implementation lacks error checking and verification,
so broken binary data can easily cause a SEGV (or an out-of-bounds access).

This is another reason why I wrote that it is not for "migration" purposes
(malicious binary data can come from outside).
It is the same reason why we don't publish rb_iseq_load() as a Ruby method.

It will be a security risk if a malicious person can pass modified binary data to MRI.
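
One way a storage layer could limit this exposure is to refuse binaries that other users could have written. The sketch below is an assumption about how one might do that, not part of this proposal. It checks only the file itself, not its parent directories, and does not validate the binary contents. Both method names are hypothetical, and `load_from_binary` is the released method name.

```ruby
# Hypothetical guard: accept a cache file only if the current user owns it
# and neither group nor others can write to it.
def trusted_cache_file?(path)
  stat = File.stat(path)
  stat.file? && stat.owned? && (stat.mode & 0o022).zero?
rescue SystemCallError
  false
end

# Returns an ISeq, or nil when the file does not pass the guard.
def load_trusted_binary(path)
  return nil unless trusted_cache_file?(path)
  RubyVM::InstructionSequence.load_from_binary(File.binread(path))
end
```

A caller that gets `nil` back would fall back to compiling the source normally.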


----------------------------------------
Feature #11788: New ISeq serialize binary format
https://bugs.ruby-lang.org/issues/11788#change-55371

* Author: Koichi Sasada
* Status: Assigned
* Priority: Normal
* Assignee: Koichi Sasada
----------------------------------------

-- 
https://bugs.ruby-lang.org/


* [ruby-core:71970] [Ruby trunk - Feature #11788] New ISeq serialize binary format
       [not found] <redmine.issue-11788.20151208105615@ruby-lang.org>
                   ` (2 preceding siblings ...)
  2015-12-09  0:53 ` [ruby-core:71962] [Ruby trunk - Feature #11788] [Assigned] " ko1
@ 2015-12-09  4:41 ` usa
  2015-12-09  5:05   ` [ruby-core:71972] " SASADA Koichi
  2015-12-09  5:14 ` [ruby-core:71973] " usa
  2016-04-11  6:28 ` [ruby-core:74876] [Ruby trunk Feature#11788][Closed] " ko1
  5 siblings, 1 reply; 10+ messages in thread
From: usa @ 2015-12-09  4:41 UTC
  To: ruby-core

Issue #11788 has been updated by Usaku NAKAMURA.


Koichi Sasada wrote:
> Note that current implementation lacks error checking and verification, 
> so that broken binary data cause SEGV (or access overflow) easily.
(snip)
> It will be a security risk if malicious person can pass modified binary data to MRI.

You should mention it in NEWS.

----------------------------------------
Feature #11788: New ISeq serialize binary format
https://bugs.ruby-lang.org/issues/11788#change-55379

* Author: Koichi Sasada
* Status: Assigned
* Priority: Normal
* Assignee: Koichi Sasada
----------------------------------------

-- 
https://bugs.ruby-lang.org/


* [ruby-core:71972] Re: [Ruby trunk - Feature #11788] New ISeq serialize binary format
  2015-12-09  4:41 ` [ruby-core:71970] [Ruby trunk - Feature #11788] " usa
@ 2015-12-09  5:05   ` SASADA Koichi
  0 siblings, 0 replies; 10+ messages in thread
From: SASADA Koichi @ 2015-12-09  5:05 UTC
  To: ruby-core

On 2015/12/09 13:41, usa@garbagecollect.jp wrote:
> You should mention it at NEWS.

Updated.
I'm not sure how detailed the description in the entry needs to be.

-- 
// SASADA Koichi at atdot dot net


* [ruby-core:71973] [Ruby trunk - Feature #11788] New ISeq serialize binary format
       [not found] <redmine.issue-11788.20151208105615@ruby-lang.org>
                   ` (3 preceding siblings ...)
  2015-12-09  4:41 ` [ruby-core:71970] [Ruby trunk - Feature #11788] " usa
@ 2015-12-09  5:14 ` usa
  2016-04-11  6:28 ` [ruby-core:74876] [Ruby trunk Feature#11788][Closed] " ko1
  5 siblings, 0 replies; 10+ messages in thread
From: usa @ 2015-12-09  5:14 UTC
  To: ruby-core

Issue #11788 has been updated by Usaku NAKAMURA.


Koichi Sasada wrote:
>  Updated.

Thanks.


>  I'm not sure how detail description is needed on an entry.

I'm not sure, either.
But I can say that if a feature introduces a known security risk, users must be informed through documentation.


----------------------------------------
Feature #11788: New ISeq serialize binary format
https://bugs.ruby-lang.org/issues/11788#change-55382

* Author: Koichi Sasada
* Status: Assigned
* Priority: Normal
* Assignee: Koichi Sasada
----------------------------------------

-- 
https://bugs.ruby-lang.org/


* [ruby-core:72052] Re: [Ruby trunk - Feature #11788] [Open] New ISeq serialize binary format
  2015-12-08 10:56 ` [ruby-core:71943] [Ruby trunk - Feature #11788] [Open] New ISeq serialize binary format ko1
  2015-12-08 11:45   ` [ruby-core:71945] " Eric Wong
@ 2015-12-11  8:15   ` Vít Ondruch
  2015-12-11  8:26     ` [ruby-core:72053] " SASADA Koichi
  1 sibling, 1 reply; 10+ messages in thread
From: Vít Ondruch @ 2015-12-11  8:15 UTC
  To: ruby-core



On 8.12.2015 at 11:56, ko1@atdot.net wrote:
> The goal of this project is to provide "machine dependent" binary file to achieve:

Could you please elaborate on this? For example, on Fedora/RHEL,
Python byte code is shipped precompiled per platform. Is this machine
dependent code more fine grained than Python byte code, so that this
would not work?

> * making compiled binary files automatically in the same directory as the script files, like Rubinius,

This does not work for packaged Ruby applications, because they are
installed somewhere in read-only /usr, so the compiled files should
probably go somewhere into /var/cache IMO. Or the location should be
configurable at runtime, with some fallback paths.
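
A minimal sketch of what such a fallback could look like, purely as an illustration: the helper `first_writable_dir`, the `ISEQ_CACHE_DIR` environment variable and the directory names are all hypothetical and not part of the proposal.

```ruby
require "fileutils"
require "tmpdir"

# Return the first candidate directory that exists (or can be created)
# and is writable, or nil if none qualifies.
def first_writable_dir(candidates)
  candidates.find do |dir|
    begin
      FileUtils.mkdir_p(dir)
      File.writable?(dir)
    rescue SystemCallError
      false # e.g. read-only file system or permission denied
    end
  end
end

# Hypothetical candidate list: an explicit override first, then a
# system-wide cache, then a per-user cache, then the temp directory.
def default_cache_candidates
  [
    ENV["ISEQ_CACHE_DIR"],
    "/var/cache/ruby-iseq",
    File.join(Dir.home, ".cache", "ruby-iseq"),
    File.join(Dir.tmpdir, "ruby-iseq"),
  ].compact
end
```

A storage class could then call `first_writable_dir(default_cache_candidates)` once at startup and skip caching entirely when it returns nil.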


Vít


* [ruby-core:72053] Re: [Ruby trunk - Feature #11788] [Open] New ISeq serialize binary format
  2015-12-11  8:15   ` [ruby-core:72052] " Vít Ondruch
@ 2015-12-11  8:26     ` SASADA Koichi
  0 siblings, 0 replies; 10+ messages in thread
From: SASADA Koichi @ 2015-12-11  8:26 UTC
  To: ruby-core

On 2015/12/11 17:15, Vít Ondruch wrote:
> On 8.12.2015 at 11:56, ko1@atdot.net wrote:
>> The goal of this project is to provide "machine dependent" binary file to achieve:
> 
> Could you please elaborate on this? For example, on Fedora/RHEL,
> Python byte code is shipped precompiled per platform. Is this machine
> dependent code more fine grained than Python byte code, so that this
> would not work?

You are right.
This proposal doesn't support "shipped precompiled per platform".

>> * making compiled binary files automatically in the same directory as the script files, like Rubinius,
> 
> This does not work for packaged Ruby applications, because they are
> installed somewhere in read-only /usr, so the compiled files should
> probably go somewhere into /var/cache IMO. Or the location should be
> configurable at runtime, with some fallback paths.

Yes.

-- 
// SASADA Koichi at atdot dot net


* [ruby-core:74876] [Ruby trunk Feature#11788][Closed] New ISeq serialize binary format
       [not found] <redmine.issue-11788.20151208105615@ruby-lang.org>
                   ` (4 preceding siblings ...)
  2015-12-09  5:14 ` [ruby-core:71973] " usa
@ 2016-04-11  6:28 ` ko1
  5 siblings, 0 replies; 10+ messages in thread
From: ko1 @ 2016-04-11  6:28 UTC
  To: ruby-core

Issue #11788 has been updated by Koichi Sasada.

Status changed from Assigned to Closed

MRI 2.3 was shipped with this feature.


----------------------------------------
Feature #11788: New ISeq serialize binary format
https://bugs.ruby-lang.org/issues/11788#change-58005

* Author: Koichi Sasada
* Status: Closed
* Priority: Normal
* Assignee: Koichi Sasada
----------------------------------------
# Abstract

I wrote a new serializer and deserializer, with a binary format, for RubyVM::InstructionSequence (ISeq) objects.
Matz has approved introducing this feature in Ruby 2.3 as an *experimental* feature,
so I'll commit it.

There are two methods, one to serialize and one to deserialize:

* RubyVM::InstructionSequence#to_binary_format returns the binary format data as a String object.
* RubyVM::InstructionSequence.from_binary_format(data) deserializes it.
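
A minimal round-trip sketch follows. Note one assumption: it uses the names `to_binary` and `load_from_binary`, which is how current MRI spells this pair of methods, rather than the `to_binary_format` / `from_binary_format` names written in this proposal. The source string is an arbitrary example.

```ruby
# Compile a small snippet into an ISeq.
iseq = RubyVM::InstructionSequence.compile("1 + 2")

# Serialize it to a binary String. The data is tied to the Ruby build
# that produced it, which is what "machine dependent" refers to.
binary = iseq.to_binary

# Deserialize the binary and evaluate the restored ISeq.
restored = RubyVM::InstructionSequence.load_from_binary(binary)
ROUND_TRIP_BINARY = binary
ROUND_TRIP_RESULT = restored.eval
puts "#{binary.bytesize} bytes, result: #{ROUND_TRIP_RESULT}"
```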

The goal of this project is to provide a "machine dependent" binary file in order to achieve:

* fast bootstrap time for big applications
* reduced memory consumption through several techniques

"Machine dependent" means you can't migrate compiled binaries to other machines.

The following are not goals of this project:

* packing scripts into one package
* migrating an obfuscated binary to another node to hide the source code

To achieve such goals, we would need to consider compatibility issues such as `__FILE__`, `__dir__`, `DATA`, and so on (for example, consider this code: `Dir.glob(File.join(__dir__, '*.rb'))`).

This proposal doesn't cover "how to store compiled binaries".
For example, Rubinius creates *.rbc files automatically.
However, Matz does not like such automatic compilation.

So my proposal only exposes an interface for a user-defined storage class.
People can try making their own ISeq binary storage.

For example,

* making compiled binary files automatically in the same directory as the script files, like Rubinius,
* storing compiled binaries in some DB,
* making a storage data structure of your own.

I wrote several samples:

* dbm: use dbm.
* fs: [default] use the file system. The compiled file is placed in the same directory as the script file, like Rubinius; foo.rb.yarb will be created for foo.rb.
* fs2: use the file system. The compiled file is placed in a specified directory.
* nothing: do nothing.

You can see my sample implementation:
https://github.com/ko1/ruby/blob/iseq_p1/sample/iseq_loader.rb

The key interface is `RubyVM::InstructionSequence.load_iseq(fname)`.
When MRI tries to load a script named `fname`, it calls this method with `fname` if the method is defined.
The return value is an ISeq object, and MRI uses this ISeq object instead of parsing/compiling the `fname` file.
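
As a rough sketch of a storage similar in spirit to "fs2" (this is not the code from the linked sample): the `IseqCacheSketch` module, its cache directory and its file naming are hypothetical. It also assumes the `to_binary` / `load_from_binary` spellings used by current MRI, and that returning nil from `load_iseq` makes MRI fall back to normal compilation.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical helper module; not part of Ruby or of the linked sample.
module IseqCacheSketch
  class << self
    attr_writer :cache_dir

    def cache_dir
      @cache_dir ||= File.join(Dir.tmpdir, "iseq_cache_sketch")
    end

    # Flatten the absolute script path into a single cache file name.
    def cache_path(fname)
      flat = File.expand_path(fname).gsub(/[^A-Za-z0-9._-]/, "_")
      File.join(cache_dir, flat + ".yarb")
    end

    # Return an ISeq for fname, from the cache when it is fresh enough,
    # otherwise compile the file and store the binary for next time.
    def fetch(fname)
      path = cache_path(fname)
      if File.exist?(path) && File.mtime(path) >= File.mtime(fname)
        return RubyVM::InstructionSequence.load_from_binary(File.binread(path))
      end
      iseq = RubyVM::InstructionSequence.compile_file(fname)
      FileUtils.mkdir_p(cache_dir)
      File.binwrite(path, iseq.to_binary)
      iseq
    rescue StandardError
      nil # on any cache problem, let MRI compile the file itself
    end
  end
end

class RubyVM::InstructionSequence
  # The hook described above: called with the script path on load.
  def self.load_iseq(fname)
    IseqCacheSketch.fetch(fname)
  end
end
```

The mtime comparison is only the simplest possible invalidation rule; a real storage would probably also key on the Ruby version, since a binary from another build cannot be loaded.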

Note that this proposal is "experimental".
These interfaces are only for experiments.
For example, if we want to use several binary storages at once,
this interface doesn't support that (a lack of extensibility).

# Current status

The current implementation is not mature: the binary size is very big because each pointer-sized value consumes 32/64 bits.
This is easy to reduce, but I have left this weak point as it is.

For now, one goal, "reduction of memory consumption", is not achieved because no techniques for sharing or unloading have been introduced.
This is future work.

# Evaluation

Several evaluation results:

## resolv.rb

Load resolv.rb 1,000 times (removing the Resolv class each time).

```
compile     12.360000   0.310000  12.670000 ( 13.413011)
compile     12.120000   0.300000  12.420000 ( 13.195313)
compile     12.230000   0.270000  12.500000 ( 13.242140)

eager load
load         3.750000   0.180000   3.930000 (  3.918169)
load         4.000000   0.170000   4.170000 (  4.178442)
load         4.120000   0.200000   4.320000 (  4.320233)

lazy load
load         2.410000   0.090000   2.500000 (  2.609716)
load         2.280000   0.210000   2.490000 (  2.518892)
load         2.310000   0.110000   2.420000 (  2.419687)
```

Loading the binary is 3.25 times faster than normal compilation.
With the lazy loading technique, it is 5.2 times faster.
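
The benchmark script itself is not included in this message. A much smaller measurement in the same spirit might look like the sketch below, under several assumptions: it uses a synthetic source string instead of resolv.rb, only 100 iterations, the `to_binary` / `load_from_binary` method names of current MRI, and a made-up `time_it` helper. Absolute numbers will differ from the tables here.

```ruby
# Measure the wall-clock time of a block in seconds.
def time_it
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

# Synthetic stand-in for a library file: 200 small method definitions.
SKETCH_SOURCE = (1..200).map { |i| "def sketch_m#{i}(x); x + #{i}; end" }.join("\n")
SKETCH_BINARY = RubyVM::InstructionSequence.compile(SKETCH_SOURCE).to_binary
SKETCH_RUNS = 100

# "compile": parse and compile the source each time, then evaluate it.
COMPILE_TIME = time_it do
  SKETCH_RUNS.times { RubyVM::InstructionSequence.compile(SKETCH_SOURCE).eval }
end

# "load": restore the ISeq from the stored binary, then evaluate it.
LOAD_TIME = time_it do
  SKETCH_RUNS.times { RubyVM::InstructionSequence.load_from_binary(SKETCH_BINARY).eval }
end

puts format("compile: %.4fs  load: %.4fs", COMPILE_TIME, LOAD_TIME)
```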

## fileutils.rb

The same experiment as for resolv.rb.

```
                 user     system      total        real
compile      8.540000   0.130000   8.670000 (  8.703615)
compile      8.540000   0.150000   8.690000 (  8.693870)
compile      8.430000   0.120000   8.550000 (  8.547480)

eager load
load         4.470000   0.150000   4.620000 (  4.659934)
load         4.500000   0.140000   4.640000 (  4.640365)
load         4.610000   0.100000   4.710000 (  4.708825)

lazy load
load         3.510000   0.140000   3.650000 (  3.694146)
load         3.470000   0.130000   3.600000 (  3.609040)
load         3.550000   0.150000   3.700000 (  3.831015)
```

Only 1.8 times faster (eager) and 2.4 times faster (lazy).
This is because the initialization of the FileUtils class takes a long time:
it uses module_eval(str) to add methods, and that code is still compiled at load time.

## Simple rails application

Run `time rails r ''` on a simple Rails application (https://github.com/ko1/tracer_demo_rails_app, with tracers disabled).

```
compile:
real    0m2.049s
user    0m1.601s
sys     0m0.402s

eager:
real    0m1.544s
user    0m1.094s
sys     0m0.422s

lazy:
$ time rails r ''
real    0m1.536s
user    0m1.112s
sys     0m0.388s
```

Not such an impressive result. It seems there is a lot of initialization code.




-- 
https://bugs.ruby-lang.org/

